[RFC] Support System Generated Ingest Pipeline/Processor #17509
Comments
Are you referring to Flow Framework? Wouldn't we just modify the Semantic Search Template (and related templates) with whatever improvements you're proposing?
The proposal is about adding a semantic field. I think all of this can be covered by the neural-search plugin, so why do we need any code change in core? Can you elaborate?
No. We want to simplify it further by providing a new field type, where the user only needs to provide the model ID during index creation. We will then automatically add embedding fields to the index mapping, generate embeddings during ingestion, and rewrite queries against the embedding fields. Below is an example:
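The original example appears to have been lost in this copy of the thread. As a hypothetical sketch only (the field name, parameter names, and model ID below are assumptions for illustration, not the actual proposal), the user-supplied mapping might look like:

```python
# Hypothetical mapping the user would supply at index creation: a single
# "semantic" field plus a model ID. All names here are illustrative
# assumptions, not the proposal's actual schema.
user_mapping = {
    "mappings": {
        "properties": {
            "product_description": {
                "type": "semantic",
                "model_id": "my-embedding-model-id",
            }
        }
    }
}

# The key point: the user declares only the semantic field and the model;
# no embedding fields and no ingest pipeline are configured by hand.
semantic_field = user_mapping["mappings"]["properties"]["product_description"]
print(semantic_field["type"])
```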
Then we will create the index like:
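The index example is also missing from this copy. A hedged sketch of what the resulting mapping could look like after the plugin auto-adds embedding fields (the sub-field names, nested layout, and dimension are assumptions for illustration):

```python
# Hypothetical mapping after index creation: alongside the user's semantic
# field, the plugin has auto-added a structure holding chunked text and
# per-chunk embeddings. The exact shape is an assumption.
resulting_mapping = {
    "mappings": {
        "properties": {
            "product_description": {
                "type": "semantic",
                "model_id": "my-embedding-model-id",
            },
            "product_description_semantic_info": {
                "properties": {
                    "chunks": {
                        "type": "nested",
                        "properties": {
                            "text": {"type": "text"},
                            "embedding": {"type": "knn_vector", "dimension": 768},
                        },
                    }
                }
            },
        }
    }
}

props = resulting_mapping["mappings"]["properties"]
chunk_props = props["product_description_semantic_info"]["properties"]["chunks"]
print(chunk_props["type"])
```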
Then, during ingest, we will automatically do text chunking and embedding generation. We don't want to create a concrete ingest pipeline and ask the user to manage it. That's why we propose internally creating the ingest processor based on the index mapping and injecting it into the ingest process.
Yeah, the proposal is about a new field type in the neural-search plugin. But we need support from core to allow us to inject the auto-generated ingest processor into the ingest process. This could be generic behavior: any plugin could systematically create an ingest processor based on the index and inject it into the ingest process. The main reason we need this is that we want to do some ingest work without a real ingest pipeline, which lets us simplify the neural search setup.
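The generic behavior described above could be pictured as an extension point where plugins register factories that inspect an index's mapping and optionally contribute a processor. This is a minimal sketch under assumed names (none of these functions exist in OpenSearch core; the real mechanism would be a Java interface):

```python
# Hypothetical sketch of a generic extension point: each plugin registers
# a factory; core asks every factory whether it wants to contribute a
# system-generated processor for a given index mapping.
from typing import Callable, Optional

ProcessorFactory = Callable[[dict], Optional[dict]]
_factories: list[ProcessorFactory] = []

def register_system_processor_factory(factory: ProcessorFactory) -> None:
    """Called by a plugin at startup to register its factory."""
    _factories.append(factory)

def system_processors_for_index(mapping: dict) -> list[dict]:
    """Collect processors generated by all registered plugin factories."""
    processors = []
    for factory in _factories:
        processor = factory(mapping)
        if processor is not None:
            processors.append(processor)
    return processors

# Example: a neural-search-like plugin that contributes an embedding
# processor only when the mapping contains a semantic field.
def _neural_factory(mapping: dict) -> Optional[dict]:
    props = mapping.get("properties", {})
    if any(spec.get("type") == "semantic" for spec in props.values()):
        return {"embedding": {"model_id": "demo"}}
    return None

register_system_processor_factory(_neural_factory)

with_semantic = system_processors_for_index(
    {"properties": {"title": {"type": "semantic"}}}
)
without_semantic = system_processors_for_index(
    {"properties": {"title": {"type": "text"}}}
)
print(with_semantic)
```

The design choice here is that core stays plugin-agnostic: it only iterates registered factories, and the mapping-inspection logic lives entirely in the plugin.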
Hi @dbwiddis @model-collapse. Do you still have any concerns here? If you need more clarification, you can take a look at:
In your proposal @bzhangam, how is the embedding configuration below provided?
Is your feature request related to a problem? Please describe
I'm working on a proposal in the neural-search plugin to simplify the neural search setup. We want to remove the step where the user needs to set up an ingest pipeline in order to use an ML model to generate embeddings.
Describe the solution you'd like
We propose to create a new field type, semantic, for original data. During indexing, OpenSearch will check whether the index contains a semantic field. If it does, it will automatically create an ingest processor and append it to the final ingest pipeline; if there is no final ingest pipeline, we will create one with that processor as the final ingest pipeline. This auto-generated ingest processor is invisible to users, and they don't need to manage it. In this solution we auto-generate the ingest processor based only on the index configuration, and we will limit the scope to that.
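The steps above can be sketched in a few lines. This is an illustrative model of the described behavior, not actual OpenSearch code; the function names and processor shape are assumptions:

```python
# Minimal sketch of the proposed flow: scan the index mapping for
# semantic fields, and if any exist, append a system-generated embedding
# processor to the final ingest pipeline (creating the pipeline if none
# exists). Names are hypothetical.
def find_semantic_fields(mapping: dict) -> list:
    """Return the paths of all fields whose type is 'semantic'."""
    found = []

    def walk(props: dict, prefix: str = "") -> None:
        for name, spec in props.items():
            path = prefix + name
            if spec.get("type") == "semantic":
                found.append(path)
            if "properties" in spec:
                walk(spec["properties"], path + ".")

    walk(mapping.get("properties", {}))
    return found

def inject_system_processor(mapping: dict, final_pipeline) -> list:
    """Append an auto-generated processor; create the pipeline if absent."""
    fields = find_semantic_fields(mapping)
    pipeline = list(final_pipeline or [])
    if fields:
        pipeline.append({"system_generated_embedding": {"fields": fields}})
    return pipeline

mapping = {"properties": {"desc": {"type": "semantic"},
                          "title": {"type": "text"}}}
pipeline = inject_system_processor(mapping, None)
print(pipeline)
```

Because the processor is derived purely from the index configuration, it never needs to be stored or managed as a user-visible pipeline.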
But we are also considering whether we should build a more generic mechanism for system-generated ingest pipelines/processors, for any use case where auto-generating them would simplify the user experience.
Related component
No response
Describe alternatives you've considered
No response
Additional context
[RFC] Support Semantic Field Type to Simplify Neural Search Set Up HLD
[RFC] Support Semantic Field Type to Simplify Neural Search Set Up LLD