Add support for OpenSearch as a database #300

Open: wants to merge 23 commits into main


daverigby
Collaborator

Problem

Describe the purpose of this change. What problem is being solved and why?

Solution

Describe the approach you took. Link to any relevant bugs, issues, docs, or other resources.

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update
  • Infrastructure change (CI configs, etc)
  • Non-code change (docs, etc)
  • None of the above: (explain here)

Test Plan

Describe specific steps for validating this change.

Collaborator Author

@daverigby daverigby left a comment


Generally looks good. As discussed before, I would try to get some basic integration tests working for OpenSearch; see tests/integration/test_pgvector.py. If we can get a local Docker image working, it should be possible to run the tests against that.
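A minimal sketch of the local-Docker approach, assuming the official `opensearchproject/opensearch` image is started with its security plugin disabled (for example `docker run -d -p 9200:9200 -e "discovery.type=single-node" -e "DISABLE_SECURITY_PLUGIN=true" opensearchproject/opensearch:latest`). The helper `wait_for_opensearch`, its URL and its timeouts are hypothetical; an integration test could call it first and skip when no container is reachable.

```python
import time
import urllib.error
import urllib.request


def wait_for_opensearch(url: str = "http://localhost:9200",
                        timeout_s: float = 60.0,
                        poll_s: float = 1.0) -> bool:
    """Hypothetical readiness check: poll the root endpoint until it answers 200.

    Returns True once OpenSearch responds, or False if timeout_s elapses first.
    Assumes the security plugin is disabled, so plain HTTP without auth works.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(url, timeout=poll_s) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            # Connection refused, timeout, or HTTP error: the node isn't ready yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

In a pytest module this could back a fixture that calls `pytest.skip(...)` when it returns False, so the OpenSearch tests skip cleanly instead of failing when no container is running.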

To your question on metadata filtering: YFCC makes use of metadata.
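One possible translation for metadata filtering, assuming the workload's filters arrive in a Pinecone-style form such as `{"tags": {"$all": ["a", "b"]}}` (an assumption, not confirmed in this thread) and that `tags` is mapped as a `keyword` field. The vector field name `embedding` and both function names are hypothetical. The filter is placed inside the `knn` clause, which OpenSearch treats as efficient filtering for the Lucene and Faiss engines in recent versions; check which engine the index actually uses.

```python
from typing import Any, Dict, List, Optional, Sequence


def translate_filter(vsb_filter: Dict[str, Dict[str, List[str]]]) -> Dict[str, Any]:
    """Translate a hypothetical {"field": {"$all": [...]}} filter into an OpenSearch bool query."""
    must: List[Dict[str, Any]] = []
    for field, condition in vsb_filter.items():
        if set(condition) != {"$all"}:
            # Only the "$all" operator is handled in this sketch.
            raise NotImplementedError(f"unsupported filter on {field!r}: {condition}")
        # One term clause per required value: a document must match every clause.
        must.extend({"term": {field: value}} for value in condition["$all"])
    return {"bool": {"must": must}}


def build_knn_query(vector: Sequence[float], k: int,
                    vsb_filter: Optional[Dict[str, Dict[str, List[str]]]] = None,
                    field: str = "embedding") -> Dict[str, Any]:
    """Build a k-NN search body, optionally restricted by a translated metadata filter."""
    knn_clause: Dict[str, Any] = {"vector": list(vector), "k": k}
    if vsb_filter:
        knn_clause["filter"] = translate_filter(vsb_filter)
    return {"size": k, "query": {"knn": {field: knn_clause}}}
```

The resulting dict would be passed as the `body` of an opensearch-py `client.search(index=..., body=...)` call.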

# None specified, default to "vsb-<workload>"
self.index_name = f"vsb-{name}"

self.create_index()
Collaborator Author


You shouldn't need to call create_index() here; it should be sufficient to do it in initialise_population.
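A sketch of the suggested structure, with index creation deferred from the constructor to `initialise_population`. The class name, the `dimensions` parameter, the `embedding` field and the mapping are hypothetical; `client` is assumed to be an `opensearchpy.OpenSearch` instance, whose `indices.exists` and `indices.create` methods are used here.

```python
from typing import Any, Optional


class OpenSearchIndexSketch:
    """Hypothetical wrapper illustrating lazy index creation."""

    def __init__(self, client: Any, name: str, dimensions: int,
                 index_name: Optional[str] = None) -> None:
        self.client = client
        self.dimensions = dimensions
        if index_name is None:
            # None specified, default to "vsb-<workload>"
            index_name = f"vsb-{name}"
        self.index_name = index_name
        # No create_index() call here: constructing the object touches nothing on the cluster.

    def create_index(self) -> None:
        """Create a k-NN enabled index unless it already exists."""
        if self.client.indices.exists(index=self.index_name):
            return
        body = {
            "settings": {"index": {"knn": True}},
            "mappings": {
                "properties": {
                    "embedding": {"type": "knn_vector", "dimension": self.dimensions}
                }
            },
        }
        self.client.indices.create(index=self.index_name, body=body)

    def initialise_population(self) -> None:
        # The index is created only when the population phase starts.
        self.create_index()
```

The `exists()` check keeps `initialise_population` idempotent if it ends up being called more than once.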

@daverigby daverigby changed the title "Opensearch db" to "Add support for OpenSearch as a database" on Feb 21, 2025
actions.append(action)
actions.append(vector_document)
# Bulk ingest documents
return self.client.bulk(body=actions,request_timeout=600)
Collaborator Author


Note: you don't need to retry the creation of the actions list. Move that to the start of insert_batch, and have your do_insert_with_retry method call only self.client.bulk.
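A sketch of the suggested split: the bulk body is built once at the start of `insert_batch`, and only the `client.bulk` call sits inside the retry loop. The `(id, vector)` record shape, the `embedding` field, the attempt count and the exponential backoff are assumptions for illustration; the body alternates an action line with its document, which is the shape the OpenSearch bulk API expects.

```python
import time
from typing import Any, Dict, List, Sequence, Tuple


def build_bulk_actions(index_name: str,
                       batch: Sequence[Tuple[str, Sequence[float]]]) -> List[Dict[str, Any]]:
    """Build the bulk body once: an action line followed by the document, per record."""
    actions: List[Dict[str, Any]] = []
    for doc_id, vector in batch:
        actions.append({"index": {"_index": index_name, "_id": doc_id}})
        actions.append({"embedding": list(vector)})
    return actions


def do_insert_with_retry(client: Any, actions: List[Dict[str, Any]],
                         max_attempts: int = 3, backoff_s: float = 1.0) -> Any:
    """Retry only the bulk request; the same actions list is reused on every attempt."""
    for attempt in range(1, max_attempts + 1):
        try:
            return client.bulk(body=actions, request_timeout=600)
        except Exception:
            # Real code should narrow this to the client's connection/transport errors.
            # Note: per-document failures come back as "errors": true without raising.
            if attempt == max_attempts:
                raise
            time.sleep(backoff_s * 2 ** (attempt - 1))


def insert_batch(client: Any, index_name: str,
                 batch: Sequence[Tuple[str, Sequence[float]]]) -> Any:
    actions = build_bulk_actions(index_name, batch)  # built once, outside the retry
    return do_insert_with_retry(client, actions)
```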


2 participants