
Adds EP-10494: AI APIs enhancement proposal #10495

Open
wants to merge 10 commits into main
Conversation

npolshakova
Contributor

Description

Adds Enhancement Proposal 10494, which proposes adding support for AI Gateway APIs.

Supports #10494

Initial APIs can be found in this draft PR: #10493


This EP proposes adding support for AI Gateway APIs to the K8s Gateway.
The AI Gateway APIs will enable users to route traffic to LLM providers while applying advanced policies, such as
prompt enrichment, prompt guard, streaming, and user-invoked function calling. These APIs will be implemented using the
Contributor

Hi @npolshakova - what is an example of user-invoked function calling surfaced in the API you proposed? You might have it already in the doc, but I couldn't find it.

Contributor Author

Good call! Added some additional examples in this commit: 5c3a41a

Contributor

Thank you @npolshakova ! this looks great!

Contributor

@EItanya EItanya left a comment

This proposal LGTM

@npolshakova npolshakova requested a review from linsun January 28, 2025 21:57
@npolshakova
Contributor Author

@linsun / @yuval-k can I get another review?

@npolshakova npolshakova requested a review from lgadban January 31, 2025 16:29
kind: HTTPRoute
name: openai
ai:
backupModels:
Contributor

I see the model is specified on the upstream, but backup models on the route? Should they all be in one place?

Contributor Author

I think this is old config, actually; it's not in the initial set of AI APIs. Will remove.

@EItanya, this isn't used anymore right?

promptGuard:
request:
webhook:
host: 172.17.0.1
Contributor

should we use upstream here?

Contributor Author

Have an upstream ref in the webhook instead of specifying the host? I think that should work, right @EItanya @andy-fong?

Contributor

I don't think so, because the webhook call is initiated from the ext_proc outside of envoy, so it cannot use an envoy upstream cluster, if that's what you mean.

Contributor

Unless we front the webhook service with envoy, but that would mean we are looping back: envoy -> ext_proc -> envoy -> webhook.
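For context, a minimal sketch of the direct-call shape being discussed (only `host` appears in the EP snippet above; the `port` field is an assumption):

```yaml
# The webhook call is made by the ext_proc process itself, outside of
# Envoy, so the target is addressed as a plain host rather than as an
# Envoy upstream cluster.
promptGuard:
  request:
    webhook:
      host: 172.17.0.1   # must be reachable from ext_proc directly
      port: 8000         # illustrative; not shown in the snippet above
```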

authToken:
secretRef:
name: openai-secret
namespace: ai-test
Contributor

can we keep secretRefs local? i.e. not support namespaces?

Contributor Author

I think we just have the namespace because we're using corev1.LocalObjectReference. I think we can scope it down to just the local namespace. @EItanya

request:
moderation:
openai:
authToken:
Contributor

instead of authToken.SecretRef.{name/namespace}, can we simplify with just secretName?

Contributor Author

Yeah I think so #10495 (comment)
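For illustration, a hedged before/after of the simplification being agreed on here (the `secretName` field is a proposal in this thread, not a final API):

```yaml
# Current shape: a full object reference with name and namespace
authToken:
  secretRef:
    name: openai-secret
    namespace: ai-test

# Proposed simplification: a name-only, namespace-local reference
authToken:
  secretName: openai-secret   # resolved in the Upstream's own namespace
```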

@npolshakova npolshakova mentioned this pull request Feb 6, 2025

A request going through the kgateway will follow the flow below:

![ai gateway request flow](resourcesi_request_flow.png "ai gateway request flow")
Contributor

Suggested change
![ai gateway request flow](resourcesi_request_flow.png "ai gateway request flow")
![ai gateway request flow](ai_request_flow.png "ai gateway request flow")


- **Automatic Secret Integration:** kgateway reads the API key from a Kubernetes secret to handle authentication for requests on the specified path.
- **Inline API Key:** The API key is included directly in the Upstream definition.
- **Passthrough Mode:** The API key is passed through to the LLM provider as a header.
Contributor

nit:

Suggested change
- **Passthrough Mode:** The API key is passed through to the LLM provider as a header.
- **Passthrough Mode:** The API key in the downstream request header is passed through to the LLM provider.
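A hedged sketch of what the three authentication modes could look like on an Upstream spec (the `kind` discriminator and the `inline`/`Passthrough` field names are assumptions, not the final API):

```yaml
# 1. Automatic secret integration: key read from a Kubernetes secret
ai:
  openai:
    authToken:
      kind: SecretRef
      secretRef:
        name: openai-secret
---
# 2. Inline API key: key embedded directly in the Upstream definition
ai:
  openai:
    authToken:
      kind: Inline
      inline: "<api-key>"     # placeholder, not a real key
---
# 3. Passthrough: the downstream request's auth header is forwarded
ai:
  openai:
    authToken:
      kind: Passthrough
```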

The AI apis allow you to configure prompt guards to block unwanted requests to the LLM provider and mask sensitive data.
For example, you can use the AI API RoutePolicy to configure a prompt guard that parses requests sent to the LLM
provider to identify a regex pattern match. The AI gateway blocks any requests that contain the that pattern in the
Contributor

Suggested change
provider to identify a regex pattern match. The AI gateway blocks any requests that contain the that pattern in the
provider to identify a regex pattern match. The AI gateway blocks any requests containing that pattern in the
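To make this concrete, a hedged sketch of a regex prompt guard on the AI RoutePolicy (field names beyond `promptGuard.request` are assumptions based on the snippets in this thread):

```yaml
ai:
  promptGuard:
    request:
      # Reject any request whose prompt matches one of these patterns
      regex:
        action: REJECT              # assumed enum value
        matches:
        - pattern: "credit card"    # illustrative pattern
      customResponse:
        message: "Request rejected by prompt guard"   # assumed field
```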

ai:
promptGuard:
request:
moderation:
Contributor

Do you mean to have moderation in this example? You have a separate example and explanation for moderation below, so this seems to be duplicated.

request should receive a streamed response.
* Other providers, such as the Gemini and Vertex AI providers, change the path to determine
streaming, such as the streamGenerateContent segment of the path in the Vertex AI streaming endpoint
https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash-latest:streamGenerateContent?key=<key>.
Contributor

Suggested change
https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash-latest:streamGenerateContent?key=<key>.
`https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash-latest:streamGenerateContent?key=<key>`.
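For contrast, the two streaming conventions mentioned here side by side (endpoints abbreviated; the OpenAI `stream` flag is standard OpenAI API behavior, not something defined by this EP):

```
# OpenAI-style: same endpoint, streaming toggled by a request-body flag
#   POST https://api.openai.com/v1/chat/completions
#   { "model": "...", "stream": true, ... }

# Gemini/Vertex AI-style: streaming selected by a different path segment
#   POST .../models/<model>:generateContent          # unary
#   POST .../models/<model>:streamGenerateContent    # streaming
```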

based on the conversation history. The user then can execute those functions on the application side, and provide results
back to the model.

1. Fetching data:
Contributor

Suggested change
1. Fetching data:
Here are some common use cases for Function Calling:
1. Fetching data:

Contributor

Note that I added 2 spaces at the end after the colon, because markdown by default ignores a single newline.


1. Fetching data:
Retrieve data from internal systems before sending final response, like checking the weather or number of vacation days in an HR system.
2. Taking action
Contributor

Suggested change
2. Taking action
2. Taking action:

Retrieve data from internal systems before sending final response, like checking the weather or number of vacation days in an HR system.
2. Taking action
Trigger actions based on the conversation, like scheduling meetings or initiating order returns.
3. Building multi-step workflows
Contributor

Suggested change
3. Building multi-step workflows
3. Building multi-step workflows:

Trigger actions based on the conversation, like scheduling meetings or initiating order returns.
3. Building multi-step workflows
Execute multi-step workflows, like data extraction pipelines or content personalization.
4. Interacting with Application UIs
Contributor

Suggested change
4. Interacting with Application UIs
4. Interacting with Application UIs:

spec:
ai:
vertexAi:
model: gemini-1.5-flash-001
Contributor

Should we mention what would happen if the URL or the request body contains the model value that's not the same as the model value in the Upstream?
