Multimodal embeddings #260

Open
fzliu opened this issue Jan 22, 2025 · 4 comments

@fzliu

fzliu commented Jan 22, 2025

Is there an interface planned for multimodal embeddings? We'd love to contribute one that accepts interleaved text and images, similar to how Anthropic does content blocks.

@tylerhutcherson
Collaborator

We don't have anything planned yet! Given the content blocks example, is the idea that you would accept an interleaved array of text and images, and that embeddings would then be generated from that content? I assume this means the model would be a multimodal embedding model, like CLIP, for example?

@tylerhutcherson tylerhutcherson self-assigned this Jan 24, 2025
@fzliu
Author

fzliu commented Jan 24, 2025

Yup, that's exactly what I'm thinking. The ability to accept content blocks would help a lot with RAG applications as well, as the full retrieved documents could be sent directly to the LLM.

@tylerhutcherson
Collaborator

Definitely open to exploring this. If you have a proposal for a multimodal embeddings interface, I'm curious to see it. I also realize the current structure of text-only vectorizers is a bit rigid. A better solution might be to package support for text, image, and multimodal inputs in a single streamlined interface. Open to suggestions!
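
For illustration only, here is a minimal sketch of what a single interface over text, image, and interleaved inputs might look like. Every name in it (`TextBlock`, `ImageBlock`, `MultimodalVectorizer`, `ToyHashVectorizer`) is hypothetical and is not part of this project or of any provider SDK. The toy vectorizer just hashes bytes so the example runs without a model; it says nothing about how a real multimodal model would embed content.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Protocol, Sequence, Union


@dataclass(frozen=True)
class TextBlock:
    """Hypothetical content block holding a piece of text."""
    text: str


@dataclass(frozen=True)
class ImageBlock:
    """Hypothetical content block holding raw image bytes."""
    data: bytes


ContentBlock = Union[TextBlock, ImageBlock]


class MultimodalVectorizer(Protocol):
    """Hypothetical interface: one embedding per interleaved sequence of blocks."""
    dims: int

    def embed(self, content: Sequence[ContentBlock]) -> List[float]:
        ...


class ToyHashVectorizer:
    """Stand-in for a real model. Output is deterministic but NOT semantic."""

    def __init__(self, dims: int = 8) -> None:
        self.dims = dims

    def embed(self, content: Sequence[ContentBlock]) -> List[float]:
        if not content:
            raise ValueError("content must contain at least one block")
        vec = [0.0] * self.dims
        for position, block in enumerate(content):
            if isinstance(block, TextBlock):
                payload = b"text:" + block.text.encode("utf-8")
            elif isinstance(block, ImageBlock):
                payload = b"image:" + block.data
            else:
                raise TypeError(f"unsupported block type: {type(block).__name__}")
            # Mix the block position in so that interleaving order matters.
            digest = hashlib.sha256(position.to_bytes(4, "big") + payload).digest()
            for i in range(self.dims):
                vec[i] += digest[i % len(digest)] / 255.0
        # Average over blocks so every component stays in [0, 1].
        return [v / len(content) for v in vec]


vectorizer: MultimodalVectorizer = ToyHashVectorizer(dims=8)

# Text-only and image-only inputs are just single-block sequences.
text_vec = vectorizer.embed([TextBlock("a red bicycle")])
image_vec = vectorizer.embed([ImageBlock(b"\x89PNG placeholder bytes")])

# Interleaved text and image blocks go through the same call.
mixed_vec = vectorizer.embed([
    TextBlock("Figure 1:"),
    ImageBlock(b"\x89PNG placeholder bytes"),
    TextBlock("a red bicycle"),
])
```

Treating text-only and image-only inputs as single-block sequences keeps one `embed` entry point, so a concrete provider would only need to map blocks to its own request format.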

@fzowl
Contributor

fzowl commented Mar 6, 2025

@tylerhutcherson I created a proposal on how multimodal embeddings could work and added a reference implementation with VoyageAI. I created a Draft PR: #294

3 participants