Adds EP-10494: AI APIs enhancement proposal #10495
Conversation
docs/content/enhancements/10494.md (Outdated)

    This EP proposes adding support for AI Gateway APIs to the K8s Gateway.
    The AI Gateway APIs will enable users to route traffic to LLM providers while applying advanced policies, such as
    prompt enrichment, prompt guard, streaming, and user-invoked function calling. These APIs will be implemented using the
Hi @npolshakova - what is an example of user-invoked function calling surfaced in the API you proposed? You might have it already in the doc, but I couldn't find it.
Good call! Added some additional examples in this commit: 5c3a41a
Thank you @npolshakova ! this looks great!
This proposal LGTM
docs/content/enhancements/10494.md (Outdated)

    kind: HTTPRoute
    name: openai
    ai:
      backupModels:
I see the model is specified on the Upstream, but backup models on the route? Should they all be in one place?
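To make the split concrete, here is a minimal sketch of the shape being questioned, with the primary model on the Upstream and the backups on a route-attached policy. The apiVersion, metadata, and fallback semantics are assumptions for illustration, not the final API:

```yaml
# Upstream: the primary model is configured here (assumed apiVersion).
apiVersion: gateway.kgateway.dev/v1alpha1
kind: Upstream
metadata:
  name: openai
spec:
  ai:
    openai:
      model: gpt-4o            # primary model
---
# RoutePolicy: backup models are attached to the HTTPRoute instead.
apiVersion: gateway.kgateway.dev/v1alpha1
kind: RoutePolicy
metadata:
  name: openai-fallback
spec:
  targetRef:
    kind: HTTPRoute
    name: openai
  ai:
    backupModels:
      - gpt-4o-mini            # assumed: tried in order if the primary fails
```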
docs/content/enhancements/10494.md (Outdated)

    promptGuard:
      request:
        webhook:
          host: 172.17.0.1
Should we use an Upstream here?
Have an upstream ref in the webhook instead of specifying the host? I think that should work, right @EItanya @andy-fong?
I don't think so, because the webhook call is initiated from ext_proc outside of Envoy, so it cannot use an Envoy upstream cluster, if that's what you mean.
Unless we front the webhook service with Envoy, but that would mean we are looping back: envoy -> ext_proc -> envoy -> webhook.
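For context, a minimal sketch of the webhook prompt guard as currently proposed, where the ext_proc server calls the webhook endpoint directly (the port field is an assumption):

```yaml
ai:
  promptGuard:
    request:
      webhook:
        # Called by the ext_proc server, not through an Envoy cluster,
        # so the host must be resolvable from the ext_proc pod.
        host: 172.17.0.1
        port: 8000   # assumed field
```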
docs/content/enhancements/10494.md (Outdated)

    authToken:
      secretRef:
        name: openai-secret
        namespace: ai-test
Can we keep secretRefs local, i.e. not support namespaces?
I think we just have the namespace because we're not using corev1.LocalObjectReference. I think we can scope it down to just the local namespace. @EItanya
docs/content/enhancements/10494.md (Outdated)

    request:
      moderation:
        openai:
          authToken:
Instead of authToken.secretRef.{name/namespace}, can we simplify with just secretName?
Yeah I think so #10495 (comment)
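If both simplifications land (local-only references and a plain secret name), the auth block could collapse to something like this sketch; secretName is an assumed field name, not the final API:

```yaml
authToken:
  # Local reference only: the secret is resolved in the resource's
  # own namespace, so no namespace field is needed.
  secretName: openai-secret
```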
    A request going through the kgateway will follow the flow below:

    ![ai gateway request flow](resourcesi_request_flow.png "ai gateway request flow")
Suggested change:
- ![ai gateway request flow](resourcesi_request_flow.png "ai gateway request flow")
+ ![ai gateway request flow](ai_request_flow.png "ai gateway request flow")
    - **Automatic Secret Integration:** kgateway reads the API key from a Kubernetes secret to handle authentication for requests on the specified path.
    - **Inline API Key:** The API key is included directly in the Upstream definition.
    - **Passthrough Mode:** The API key is passed through to the LLM provider as a header.
nit:
Suggested change:
- **Passthrough Mode:** The API key is passed through to the LLM provider as a header.
+ **Passthrough Mode:** The API key in the downstream request header is passed through to the LLM provider.
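To contrast the three modes, here is a rough sketch of how each might be expressed on an Upstream; only secretRef appears in the quoted examples, so the inline and passthrough field names are assumptions:

```yaml
# 1. Automatic secret integration: the key is read from a Kubernetes secret.
authToken:
  secretRef:
    name: openai-secret
---
# 2. Inline API key: the key is embedded directly in the Upstream (assumed field).
authToken:
  inline: sk-example-key
---
# 3. Passthrough mode: the key already present on the downstream request
#    is forwarded to the LLM provider (assumed field).
authToken:
  passthrough: {}
```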
    The AI apis allow you to configure prompt guards to block unwanted requests to the LLM provider and mask sensitive data.
    For example, you can use the AI API RoutePolicy to configure a prompt guard that parses requests sent to the LLM
    provider to identify a regex pattern match. The AI gateway blocks any requests that contain the that pattern in the
Suggested change:
- provider to identify a regex pattern match. The AI gateway blocks any requests that contain the that pattern in the
+ provider to identify a regex pattern match. The AI gateway blocks any requests containing that pattern in the
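As an illustration, a prompt guard that blocks requests containing a credit-card-like pattern might look like the sketch below; the regex and action field names are assumptions:

```yaml
ai:
  promptGuard:
    request:
      regex:
        # Assumed fields: reject any request whose prompt matches.
        matches:
          - '\b(?:\d[ -]*?){13,16}\b'   # crude credit card number pattern
        action: REJECT
```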
    ai:
      promptGuard:
        request:
          moderation:
Do you mean to have `moderation` in this example? You have a separate example and explanation for moderation below, so this seems to be duplicated.
    request should receive a streamed response.
    * Other providers, such as the Gemini and Vertex AI providers, change the path to determine
    streaming, such as the streamGenerateContent segment of the path in the Vertex AI streaming endpoint
    https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash-latest:streamGenerateContent?key=<key>.
Suggested change:
- https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash-latest:streamGenerateContent?key=<key>.
+ `https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash-latest:streamGenerateContent?key=<key>`.
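By contrast, OpenAI-style chat completion APIs mark streaming with a flag in the request body rather than the path; the JSON body is rendered as YAML here for readability:

```yaml
# OpenAI chat completions request body: the stream flag,
# not the URL path, marks this as a streaming request.
model: gpt-4o-mini
stream: true
messages:
  - role: user
    content: "Summarize the release notes."
```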
    based on the conversation history. The user then can execute those functions on the application side, and provide results
    back to the model.

    1. Fetching data:
Suggested change:
- 1. Fetching data:
+ Here are some common use cases for Function Calling:
+ 1. Fetching data:
Note that I added 2 spaces at the end after `:` because markdown by default ignores a single newline.
    1. Fetching data:
    Retrieve data from internal systems before sending final response, like checking the weather or number of vacation days in an HR system.
    2. Taking action
Suggested change:
- 2. Taking action
+ 2. Taking action:
    Retrieve data from internal systems before sending final response, like checking the weather or number of vacation days in an HR system.
    2. Taking action
    Trigger actions based on the conversation, like scheduling meetings or initiating order returns.
    3. Building multi-step workflows
Suggested change:
- 3. Building multi-step workflows
+ 3. Building multi-step workflows:
    Trigger actions based on the conversation, like scheduling meetings or initiating order returns.
    3. Building multi-step workflows
    Execute multi-step workflows, like data extraction pipelines or content personalization.
    4. Interacting with Application UIs
Suggested change:
- 4. Interacting with Application UIs
+ 4. Interacting with Application UIs:
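To make the flow concrete, here is a sketch of an OpenAI-style function-calling request, with the JSON body rendered as YAML for readability; the tool name and parameters are illustrative:

```yaml
# The client advertises a tool the model may choose to call.
model: gpt-4o
messages:
  - role: user
    content: "How many vacation days do I have left?"
tools:
  - type: function
    function:
      name: get_vacation_days          # illustrative tool name
      description: Look up remaining vacation days in the HR system
      parameters:
        type: object
        properties:
          employee_id:
            type: string
        required: [employee_id]
# Instead of a final answer, the model responds with a tool call;
# the application runs get_vacation_days and returns the result in a
# follow-up "tool" role message so the model can produce its final reply.
```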
    spec:
      ai:
        vertexAi:
          model: gemini-1.5-flash-001
Should we mention what would happen if the URL or the request body contains a model value that's not the same as the model value in the Upstream?
Description
Adds Enhancement Proposal 10494, which proposes adding support for AI Gateway APIs.
Supports #10494
Initial APIs can be found in this draft PR: #10493