Vector DB Lookup is a vector search tool that allows users to search top k similar vectors from vector database. This tool is a wrapper for multiple third-party vector databases. The list of current supported databases is as follows.
Name | Description |
---|---|
Azure Cognitive Search | Microsoft's cloud search service with built-in AI capabilities that enrich all types of information to help identify and explore relevant content at scale. |
Qdrant | Qdrant is a vector similarity search engine that provides a production-ready service with a convenient API to store, search and manage points (i.e. vectors) with an additional payload. |
Weaviate | Weaviate is an open source vector database that stores both objects and vectors. This allows for combining vector search with structured filtering. |
This tool will support more vector databases.
-
For AzureML users, the tool is installed in default image, you can use the tool without extra installation.
-
For local users,
pip install promptflow-vectordb
The tool searches data from a third-party vector database. To use it, you should create resources in advance and establish connection between the tool and the resource.
-
Azure Cognitive Search:
- Create resource Azure Cognitive Search.
- Add "Cognitive search" connection. Fill "API key" field with "Primary admin key" from "Keys" section of created resource, and fill "API base" field with the URL, the URL format is
https://{your_serive_name}.search.windows.net
.
-
Qdrant:
- Follow the installation to deploy Qdrant to a self-maintained cloud server.
- Add "Qdrant" connection. Fill "API base" with your self-maintained cloud server address and fill "API key" field.
-
Weaviate:
- Follow the installation to deploy Weaviate to a self-maintained instance.
- Add "Weaviate" connection. Fill "API base" with your self-maintained instance address and fill "API key" field.
The tool accepts the following inputs:
-
Azure Cognitive Search:
Name Type Description Required connection CognitiveSearchConnection The created connection for accessing to Cognitive Search endpoint. Yes index_name string The index name created in Cognitive Search resource. Yes text_field string The text field name. The returned text field will populate the text of output. No vector_field string The vector field name. The target vector is searched in this vector field. Yes search_params dict The search parameters. It's key-value pairs. Except for parameters in the tool input list mentioned above, additional search parameters can be formed into a JSON object as search_params. For example, use {"select": ""}
as search_params to select the returned fields, use{"search": ""}
to perform a hybrid search.No search_filters dict The search filters. It's key-value pairs, the input format is like {"filter": ""}
No vector list The target vector to be queried, which can be generated by Embedding tool. Yes top_k int The count of top-scored entities to return. Default value is 3 No -
Qdrant:
Name Type Description Required connection QdrantConnection The created connection for accessing to Qdrant server. Yes collection_name string The collection name created in self-maintained cloud server. Yes text_field string The text field name. The returned text field will populate the text of output. No search_params dict The search parameters can be formed into a JSON object as search_params. For example, use {"params": {"hnsw_ef": 0, "exact": false, "quantization": null}}
to set search_params.No search_filters dict The search filters. It's key-value pairs, the input format is like {"filter": {"should": [{"key": "", "match": {"value": ""}}]}}
No vector list The target vector to be queried, which can be generated by Embedding tool. Yes top_k int The count of top-scored entities to return. Default value is 3 No -
Weaviate:
Name Type Description Required connection WeaviateConnection The created connection for accessing to Weaviate. Yes class_name string The class name. Yes text_field string The text field name. The returned text field will populate the text of output. No vector list The target vector to be queried, which can be generated by Embedding tool. Yes top_k int The count of top-scored entities to return. Default value is 3 No
The following is an example JSON format response returned by the tool, which includes the top-k scored entities. The entity follows a generic schema of vector search result provided by promptflow-vectordb SDK.
-
Azure Cognitive Search:
For Azure Cognitive Search, the following fields are populated:
Field Name Type Description original_entity dict the original response json from search REST API score float @search.score from the original entity, which evaluates the similarity between the entity and the query vector text string text of the entity vector list vector of the entity Output
[ { "metadata": null, "original_entity": { "@search.score": 0.5099789, "id": "", "your_text_filed_name": "sample text1", "your_vector_filed_name": [-0.40517663431890405, 0.5856996257406859, -0.1593078462266455, -0.9776269170785785, -0.6145604369828972], "your_additional_field_name": "" }, "score": 0.5099789, "text": "sample text1", "vector": [-0.40517663431890405, 0.5856996257406859, -0.1593078462266455, -0.9776269170785785, -0.6145604369828972] } ]
-
Qdrant:
For Qdrant, the following fields are populated:
Field Name Type Description original_entity dict the original response json from search REST API metadata dict payload from the original entity score float score from the original entity, which evaluates the similarity between the entity and the query vector text string text of the payload vector list vector of the entity Output
[ { "metadata": { "text": "sample text1" }, "original_entity": { "id": 1, "payload": { "text": "sample text1" }, "score": 1, "vector": [0.18257418, 0.36514837, 0.5477226, 0.73029673], "version": 0 }, "score": 1, "text": "sample text1", "vector": [0.18257418, 0.36514837, 0.5477226, 0.73029673] } ]
-
Weaviate:
For Weaviate, the following fields are populated:
Field Name Type Description original_entity dict the original response json from search REST API score float certainty from the original entity, which evaluates the similarity between the entity and the query vector text string text in the original entity vector list vector of the entity Output
[ { "metadata": null, "original_entity": { "_additional": { "certainty": 1, "distance": 0, "vector": [ 0.58, 0.59, 0.6, 0.61, 0.62 ] }, "text": "sample text1." }, "score": 1, "text": "sample text1.", "vector": [ 0.58, 0.59, 0.6, 0.61, 0.62 ] } ]