This repository now includes an example of integrating a GPT vision model with Azure AI Search. This feature enables indexing and searching images and graphs, such as those found in financial documents, in addition to text-based content, and then sending the retrieved content to the GPT vision model for response generation.
- Document Handling: Source documents are split into pages and saved as PNG files in Blob storage. Each file's name and page number are recorded for citation purposes.
- Data Extraction: Text data is extracted using OCR.
- Data Indexing: Text and image embeddings, generated using the Azure AI Vision multimodal embeddings API, are indexed in Azure AI Search along with the raw text.
- Search and Response: Searches can be conducted using vector or hybrid methods (see the sketch after this list). Responses are generated by the GPT vision model based on the retrieved content.
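To make the retrieval step concrete, here is a minimal sketch of a hybrid query against the index. It assumes the field and index names used by this sample (`imageEmbedding`, `sourcepage`, `gptkbindex`) and placeholder endpoints and keys; it is an illustration, not the app's actual code. The query text is embedded into the same multimodal space as the page images via the Azure AI Vision `retrieval:vectorizeText` API, then combined with full-text search:

```python
import requests
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery

# Placeholder endpoints/keys -- substitute your own resources.
AI_VISION_ENDPOINT = "https://<your-computer-vision>.cognitiveservices.azure.com"
AI_VISION_KEY = "<computer-vision-key>"
SEARCH_ENDPOINT = "https://<your-search-service>.search.windows.net"
SEARCH_KEY = "<search-admin-key>"

def vectorize_text(query: str) -> list[float]:
    """Embed query text into the same multimodal space as the indexed page images."""
    response = requests.post(
        f"{AI_VISION_ENDPOINT}/computervision/retrieval:vectorizeText",
        params={"api-version": "2024-02-01", "model-version": "2023-04-15"},
        headers={"Ocp-Apim-Subscription-Key": AI_VISION_KEY},
        json={"text": query},
    )
    response.raise_for_status()
    return response.json()["vector"]

search_client = SearchClient(SEARCH_ENDPOINT, "gptkbindex", AzureKeyCredential(SEARCH_KEY))
query = "What was the operating margin trend across 2023?"

# Hybrid retrieval: keyword search over the OCR text plus a vector query
# against the image embedding field; Azure AI Search fuses both rankings.
results = search_client.search(
    search_text=query,
    vector_queries=[
        VectorizedQuery(
            vector=vectorize_text(query),
            k_nearest_neighbors=3,
            fields="imageEmbedding",
        )
    ],
    top=3,
)
for doc in results:
    print(doc["sourcepage"])
```

A pure vector search would simply omit `search_text`; the hybrid form shown here lets Azure AI Search combine keyword and vector rankings before the results are handed to the GPT vision model.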
- Create a Computer Vision account in the Azure Portal first, so that you can agree to the Responsible AI terms for that resource. You can delete the account after agreeing.
- The ability to deploy a gpt-4o model in one of the supported regions. If you're not sure, try creating a gpt-4o deployment from your Azure OpenAI deployments page.
- Ensure that you can deploy the Azure OpenAI resource group in a region where all required components are available:
  - Azure OpenAI models
    - gpt-35-turbo
    - text-embedding-ada-002
    - gpt-4o
  - Azure AI Vision
- Update repository: Pull the latest changes.
- Enable GPT vision approach: First, make sure you do not have integrated vectorization enabled, since that is currently incompatible:

  ```shell
  azd env set USE_FEATURE_INT_VECTORIZATION false
  ```

  Then set the environment variable for enabling vision support:

  ```shell
  azd env set USE_GPT4V true
  ```

  When set, that flag will provision a Computer Vision resource and gpt-4o model, upload image versions of PDFs to Blob storage, upload embeddings of images in a new `imageEmbedding` field, and enable the vision approach in the UI (a sketch of how that field is populated follows this list).
- Clean old deployments (optional): Run `azd down --purge` for a fresh setup.
- Start the application: Execute `azd up` to build, provision, deploy, and initiate document preparation.
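For illustration, here is a hedged sketch of what populating the new `imageEmbedding` field involves: each page image is embedded with the Azure AI Vision `retrieval:vectorizeImage` API and uploaded alongside the OCR text. The file name, endpoints, and exact schema below are assumptions for this example; the repo's data preparation scripts handle this automatically when `USE_GPT4V` is set.

```python
import requests
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

# Placeholder endpoints/keys -- substitute your own resources.
AI_VISION_ENDPOINT = "https://<your-computer-vision>.cognitiveservices.azure.com"
AI_VISION_KEY = "<computer-vision-key>"
SEARCH_ENDPOINT = "https://<your-search-service>.search.windows.net"
SEARCH_KEY = "<search-admin-key>"

def vectorize_image(png_bytes: bytes) -> list[float]:
    """Return a multimodal embedding for one PNG page image."""
    response = requests.post(
        f"{AI_VISION_ENDPOINT}/computervision/retrieval:vectorizeImage",
        params={"api-version": "2024-02-01", "model-version": "2023-04-15"},
        headers={
            "Ocp-Apim-Subscription-Key": AI_VISION_KEY,
            "Content-Type": "application/octet-stream",
        },
        data=png_bytes,
    )
    response.raise_for_status()
    return response.json()["vector"]

# Hypothetical page image produced by the document-splitting step.
with open("financial-report-page-12.png", "rb") as f:
    embedding = vectorize_image(f.read())

search_client = SearchClient(SEARCH_ENDPOINT, "gptkbindex", AzureKeyCredential(SEARCH_KEY))
search_client.upload_documents([
    {
        "id": "financial-report-page-12",
        "content": "<OCR text extracted from this page>",  # raw text field
        "imageEmbedding": embedding,  # vector from Azure AI Vision
        "sourcepage": "financial-report.pdf#page=12",  # citation reference
    }
])
```

Because `retrieval:vectorizeImage` and `retrieval:vectorizeText` map into the same embedding space, text queries can retrieve visually relevant pages directly.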
- Access the developer options in the web app and select "Use GPT vision model".
- The sample questions in the UI are updated to ones suited for testing the vision approach.
- Ask these questions (or your own) to view the generated responses.
- The 'Thought Process' tab shows the retrieved content and how it was processed by the GPT vision model.
Feel free to explore and contribute to enhancing this feature. For questions or feedback, use the repository's issue tracker.