Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add data plane wrapper and accept sparse indices as unsigned 32-bit integer #70

Merged
merged 17 commits into from
Mar 1, 2024

Conversation

rohanshah18
Copy link
Contributor

@rohanshah18 rohanshah18 commented Feb 26, 2024

Problem

  1. The SDK is missing a wrapper for data plane operations that helps improve user interaction.
  2. The sparse indices in Pinecone are unsigned 32-bit integer but since Java doesn't offer the support for it natively, so there is a datatype mismatch issue.

Solution

  1. Add a wrapper for data plane operations and broke down the input request objects to java native datatypes except for Struct. The wrapper contains operations for both blocking and the async stub. Also I have added overloaded methods so its easier for users to call the data plane operations.
    Example:
    public UpsertResponse upsert(UpsertRequest request)
    vs.
public UpsertResponse upsert(String id,
            List<Float> values,
            List<Long> sparseIndices,
            List<Float> sparseValues,
            com.google.protobuf.Struct metadata,
            String namespace)
  1. The datatype for sparse indices in Pinecone db is unsigned 32-bit integer so as a part of this PR, the sparse indices will be now accepted as Java long instead of Java int with the input range of unsigned 32-bit integer [0, 2^32 - 1] and everything outside of this range will throw PineconeValidationException. As a part of this change, I have added 3 classes relating to the Query Response in the unsigned_indices_model package since we don't want to edit the java generated classes:
  • QueryResponseWithUnsignedIndices.java (reflects QueryResponse)
  • ScoredVectorWithUnsignedIndices.java (reflects ScoredVector)
  • SparseValuesWithUnsignedIndices.java (reflects SparseValues)

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update
  • Infrastructure change (CI configs, etc)
  • Non-code change (docs, etc)
  • None of the above: (explain here)

Test Plan

Added unit and integration tests.

@rohanshah18 rohanshah18 marked this pull request as ready for review February 28, 2024 20:58
Copy link
Contributor

@jhamon jhamon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not done going through this, but thought I would submit some initial feedback for you to work on.

Copy link
Contributor

@jhamon jhamon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's some more feedback. Something to keep in mind in general is that namespaces are not used by everyone. We should have tests exercising the default and non-default namespace cases to make sure the usage looks reasonably concise for both use cases. Otherwise when we go to show how to do things in docs, it's going to look poor from a UX perspective.

I didn't have a chance to go line-by-line through all the tests, but since I made a lot of suggestions in the non-test code I'm expecting a lot of test code to change. I don't love how many tests are commented in this PR... as a general rule, we want to keep main in a shippable state at all times and disabling a large amount of tests seems like moving away from that goal but I leave it up to you. I'm sure you already know, but there's a substantial amount of testing needed to cover all the combinations of optional parameters and make sure we aren't going to NPE anywhere when nulls are being passed around.

@rohanshah18
Copy link
Contributor Author

rohanshah18 commented Feb 29, 2024

@jhamon Thanks for the detailed feedback. I have made all of the changes and my intention was to get to the level where our expectations are same about the thin-wrapper. I 100% agree that the tests should not be commented so instead of a separate PR for future stub, I'll push a commit tomorrow which will re-enable all of the tests.

@rohanshah18
Copy link
Contributor Author

The original plan was to split this PR into 3 individual PR's (data plane changes for blocking and future stubs, and converting sparseIndices to unsigned 32 bit int) so they are focused and easy to review but I have now merged them all into 1 PR.

Copy link
Contributor

@austin-denoble austin-denoble left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for implementing all of @jhamon's feedback, nice progress on getting the data wrapper in place.

Is there anything you'd like me to look at in relation to the integration tests? You unassigned the control plane integration ticket from me so I'm not sure, but it's clear we need some clean-up and hardening, specifically the collection tests if you're disabling them all. Let me know if there's anything there you'd like me to work on.

See my comments on the PineconeFutureDataPlaneClient file around having a super class for these two classes to consolidate some of the validation logic and let me know if that makes sense.

@rohanshah18 rohanshah18 merged commit 31036f0 into main Mar 1, 2024
6 checks passed
@rohanshah18 rohanshah18 deleted the rshah/dataPlaneWrapper branch March 1, 2024 19:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants