Update Azure AI Search Vector Store #17651

Open: wants to merge 6 commits into base: main

ZachHandley

Description

  • Updated metadata handling so you can pass either a metadata string (LlamaIndex) or a dict/JSON object; for a dict, it extracts the default fields, treats the remainder as metadata, and matches it to your fields
  • Added new SearchField option configs for the _metadata_filterable_keys
  • Added new semantic_config (SemanticConfiguration | str | None) and semantic_config_name (semantic_config_name was added for backwards compatibility)
  • Added new vector_search_profile (VectorSearchProfile | None) and kept the existing vector_search_profile_name for backwards compatibility
  • Changed key_fields to doc_id_field (from doc_id_field_str) to support the new str | SearchField type
  • All changes should be backwards compatible, in keeping with LlamaIndex's docs
  • Added more helper methods to try to make it a bit easier to use; I may have gone a bit overboard, so feel free to scale back

All tests pass, but I would imagine this still needs testing against an actual Azure vector store
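
For reviewers, the metadata behavior in the first bullet can be sketched roughly as below. This is only an illustration under stated assumptions, not the code in this PR: `split_metadata` and the `DEFAULT_FIELDS` tuple are hypothetical names, and the real list of default fields is whatever the vector store defines.

```python
import json
from typing import Any, Dict, Tuple, Union

# Hypothetical list of "default" top-level fields, for illustration only.
DEFAULT_FIELDS = ("id", "chunk", "embedding", "doc_id")


def split_metadata(
    raw: Union[str, Dict[str, Any]]
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return (default_fields, metadata) from a JSON string or a dict."""
    # Accept either a serialized metadata string or an already-parsed dict.
    data = json.loads(raw) if isinstance(raw, str) else dict(raw)
    if not isinstance(data, dict):
        raise ValueError("metadata must decode to a JSON object")
    # Pull out the known default fields...
    defaults = {k: data[k] for k in DEFAULT_FIELDS if k in data}
    # ...and treat everything else as user metadata.
    metadata = {k: v for k, v in data.items() if k not in DEFAULT_FIELDS}
    return defaults, metadata
```

Either input form goes through the same path, so `split_metadata('{"id": "1", "author": "a"}')` and `split_metadata({"id": "1", "author": "a"})` give the same split.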

Fixes # (issue)

New Package?

Did I fill in the tool.llamahub section in the pyproject.toml and provide a detailed README.md for my new integration or package?

  • Yes
  • No (I can!)

Version Bump?

Did I bump the version in the pyproject.toml file of the package I am updating? (Except for the llama-index-core package)

  • Yes
  • No (I can!)

Type of Change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Your pull-request will likely not be merged unless it is covered by some form of impactful unit testing.

  • I added new unit tests to cover this change
  • I believe this change is already covered by existing unit tests

Suggested Checklist:

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added Google Colab support for the newly added notebooks.
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I ran make format; make lint to appease the lint gods

@dosubot dosubot bot added the size:XXL This PR changes 1000+ lines, ignoring generated files. label Jan 27, 2025
ZachHandley and others added 4 commits January 27, 2025 15:23
It wasn't in there originally so I presume it needs to be gone
Turns out the metadata was fine, hopefully this is a beneficial patch
@ZachHandley
Author

ZachHandley commented Jan 28, 2025

So I guess Azure was returning metadata without closing brackets, which means I took this on without much of a reason. After checking my work, though, I would remove the following:

  • The semantic config and vector search don't need to be passed as parameters; they can be given in the SearchIndex, and the client can read the name and configuration from there (should I do this? It would be less user-friendly). I'm just going to swap them for an optional VectorSearch/SemanticSearch instead, since those contain all of the data and allow for a drop-in replacement, which is what I was going for anyway
  • The spread object I added was, I think, unnecessary

I still stand by the metadata improvements, namely letting the filterable_metadata_field_keys optionally be SearchField-based since those map 1:1, as well as the other QOL changes like using a custom semantic config, but I can reduce the code (and I will be doing that)
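
To make the "object or name" idea concrete, here is a rough sketch of how a store could accept a config object, a plain name, or only the legacy name parameter. It is an assumption-laden illustration rather than the PR's implementation: `NamedConfig` is a stand-in for an SDK object that carries a `name`, and `resolve_config_name` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class NamedConfig:
    """Stand-in for an SDK config object; only `name` matters here."""

    name: str


def resolve_config_name(
    config: Union[NamedConfig, str, None],
    legacy_name: Optional[str] = None,
) -> Optional[str]:
    """Pick the config name, preferring the new parameter over the legacy one."""
    if isinstance(config, NamedConfig):
        return config.name  # a full object was passed; use its name
    if isinstance(config, str):
        return config  # a bare name was passed
    return legacy_name  # fall back to the older *_name parameter
```

Callers that only ever set the old name parameter keep working, since it is used whenever the new parameter is left as `None`.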

Swap VectorSearchProfile for VectorSearch, SemanticSearchConfig for SemanticSearch
Rollback a few things

Works in my testing, but my vector index's metadata was messed up
@logan-markewich
Collaborator

Not sure if you are still updating this, but for the sake of keeping PRs smaller, I'm going to go ahead and merge this #17683
