Multimodal embeddings #260
Is there an interface planned for multimodal embeddings? We'd love to contribute one that accepts interleaved text and images, similar to how Anthropic does content blocks.
We don't have anything planned yet! So given the content blocks example, is the idea that you would accept an interleaved array of text and images, and then generate embeddings based on that content? I assume this means the model would be a multimodal embedding model, like CLIP, for example?
Yup, that's exactly what I'm thinking. The ability to accept content blocks would help a lot with RAG applications as well, since the full retrieved documents could be sent directly to the LLM.
Definitely open to exploring this. If you have a proposal for a multimodal embeddings interface, I'm definitely curious. I also realize the current structure of text-only vectorizers is a bit rigid. A better solution might be to package support for text, image, and multimodal embeddings into a single streamlined interface. Open to suggestions!
@tylerhutcherson I created a proposal for how multimodal embeddings could work, added a reference implementation with VoyageAI, and opened a draft PR: #294
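For reference, here is a rough sketch of what a content-block style interface could look like. All names below (`TextBlock`, `ImageBlock`, `MultiModalVectorizer`, `embed_content`) are hypothetical illustrations of the idea, not the actual API from PR #294:

```python
# Hypothetical sketch of a multimodal vectorizer interface that accepts
# interleaved text and image content blocks. Names are illustrative only.
from dataclasses import dataclass
from typing import List, Union


@dataclass
class TextBlock:
    text: str


@dataclass
class ImageBlock:
    data: bytes       # raw image bytes
    media_type: str   # e.g. "image/png"


ContentBlock = Union[TextBlock, ImageBlock]


class MultiModalVectorizer:
    """Base interface: encode a sequence of interleaved content blocks
    into a single embedding vector."""

    def embed_content(self, blocks: List[ContentBlock]) -> List[float]:
        # A concrete subclass (e.g. backed by CLIP or VoyageAI's
        # multimodal models) would encode the blocks here.
        raise NotImplementedError

    def embed_many_content(
        self, items: List[List[ContentBlock]]
    ) -> List[List[float]]:
        # Naive default; subclasses could batch requests instead.
        return [self.embed_content(blocks) for blocks in items]


# Example usage with a hypothetical concrete vectorizer:
#
# blocks = [
#     TextBlock(text="A photo of a red bicycle"),
#     ImageBlock(data=open("bike.png", "rb").read(), media_type="image/png"),
# ]
# vector = SomeClipVectorizer().embed_content(blocks)
```

One nice property of this shape is that the same block list used for embedding could be forwarded as-is to an LLM in a RAG pipeline, which is the interleaving use case mentioned above.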