
Adds EP-10494: AI APIs enhancement proposal #10495

Open
wants to merge 10 commits into main
Conversation

npolshakova
Contributor

Description

Adds Enhancement Proposal 10494, which proposes adding support for AI Gateway APIs.

Supports #10494

Initial APIs can be found in this draft PR: #10493


This EP proposes adding support for AI Gateway APIs to the K8s Gateway.
The AI Gateway APIs will enable users to route traffic to LLM providers while applying advanced policies, such as
prompt enrichment, prompt guard, streaming, and user-invoked function calling. These APIs will be implemented using the
Contributor

Hi @npolshakova - what is an example of user-invoked function calling surfaced in the API you proposed? You might have it already in the doc, but I couldn't find it.

Contributor Author

Good call! Added some additional examples in this commit: 5c3a41a

Contributor

Thank you @npolshakova ! this looks great!

Contributor

@EItanya EItanya left a comment

This proposal LGTM

@npolshakova npolshakova requested a review from linsun January 28, 2025 21:57
@npolshakova
Contributor Author

@linsun / @yuval-k can I get another review?

@npolshakova npolshakova requested a review from lgadban January 31, 2025 16:29
kind: HTTPRoute
name: openai
ai:
backupModels:
Contributor

I see the model is specified on the upstream, but backup models on the route? Should they all be in one place?

Contributor Author

I think this is old config, actually; it's not in the initial set of AI APIs. Will remove.

@EItanya, this isn't used anymore right?

promptGuard:
request:
webhook:
host: 172.17.0.1
Contributor

should we use upstream here?

Contributor Author

Have an upstream ref in the webhook instead of specifying the host? I think that should work, right @EItanya @andy-fong?

Contributor

I don't think so, because the webhook call is initiated from the ext_proc outside of envoy, so it cannot use an envoy upstream cluster, if that's what you mean.

Contributor

Unless we front the webhook service with envoy, but that would mean we are looping back: envoy -> ext_proc -> envoy -> webhook.
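For context, a minimal sketch of the direct-call shape being discussed (only `host` appears in the EP snippet above; the `port` field is an assumption):

```yaml
# The webhook call is made by the ext_proc process itself, outside of
# Envoy, so the target is addressed as a plain host rather than as an
# Envoy upstream cluster.
promptGuard:
  request:
    webhook:
      host: 172.17.0.1   # must be reachable from ext_proc directly
      port: 8000         # illustrative; not shown in the snippet above
```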

authToken:
secretRef:
name: openai-secret
namespace: ai-test
Contributor

can we keep secretRefs local? i.e. not support namespaces?

Contributor Author

I think we just have the namespace because we're using corev1.LocalObjectReference. I think we can scope it down to just the local namespace. @EItanya

request:
moderation:
openai:
authToken:
Contributor

instead of authToken.SecretRef.{name/namespace}, can we simplify with just secretName?

Contributor Author

Yeah I think so #10495 (comment)
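For illustration, a hedged before/after of the simplification being agreed on here (the `secretName` field is a proposal in this thread, not a final API):

```yaml
# Current shape: a full object reference with name and namespace
authToken:
  secretRef:
    name: openai-secret
    namespace: ai-test

# Proposed simplification: a name-only, namespace-local reference
authToken:
  secretName: openai-secret   # resolved in the Upstream's own namespace
```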

@npolshakova npolshakova mentioned this pull request Feb 6, 2025

A request going through the kgateway will follow the flow below:

![ai gateway request flow](resourcesi_request_flow.png "ai gateway request flow")
Contributor

Suggested change
![ai gateway request flow](resourcesi_request_flow.png "ai gateway request flow")
![ai gateway request flow](ai_request_flow.png "ai gateway request flow")


- **Automatic Secret Integration:** kgateway reads the API key from a Kubernetes secret to handle authentication for requests on the specified path.
- **Inline API Key:** The API key is included directly in the Upstream definition.
- **Passthrough Mode:** The API key is passed through to the LLM provider as a header.
Contributor

nit:

Suggested change
- **Passthrough Mode:** The API key is passed through to the LLM provider as a header.
- **Passthrough Mode:** The API key in the downstream request header is passed through to the LLM provider.
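A hedged sketch of what the three authentication modes could look like on an Upstream spec (the `kind` discriminator and the `inline`/`Passthrough` field names are assumptions, not the final API):

```yaml
# 1. Automatic secret integration: key read from a Kubernetes secret
ai:
  openai:
    authToken:
      kind: SecretRef
      secretRef:
        name: openai-secret
---
# 2. Inline API key: key embedded directly in the Upstream definition
ai:
  openai:
    authToken:
      kind: Inline
      inline: "<api-key>"     # placeholder, not a real key
---
# 3. Passthrough: the downstream request's auth header is forwarded
ai:
  openai:
    authToken:
      kind: Passthrough
```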

The AI apis allow you to configure prompt guards to block unwanted requests to the LLM provider and mask sensitive data.
For example, you can use the AI API RoutePolicy to configure a prompt guard that parses requests sent to the LLM
provider to identify a regex pattern match. The AI gateway blocks any requests that contain the that pattern in the
Contributor

Suggested change
provider to identify a regex pattern match. The AI gateway blocks any requests that contain the that pattern in the
provider to identify a regex pattern match. The AI gateway blocks any requests containing that pattern in the
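To make this concrete, a hedged sketch of a regex prompt guard on the AI RoutePolicy (field names beyond `promptGuard.request` are assumptions based on the snippets in this thread):

```yaml
ai:
  promptGuard:
    request:
      # Reject any request whose prompt matches one of these patterns
      regex:
        action: REJECT              # assumed enum value
        matches:
        - pattern: "credit card"    # illustrative pattern
      customResponse:
        message: "Request rejected by prompt guard"   # assumed field
```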

ai:
promptGuard:
request:
moderation:
Contributor

Do you mean to have moderation in this example? You have a separate example and explanation for moderation below, so this seems to be duplicated.

request should receive a streamed response.
* Other providers, such as the Gemini and Vertex AI providers, change the path to determine
streaming, such as the streamGenerateContent segment of the path in the Vertex AI streaming endpoint
https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash-latest:streamGenerateContent?key=<key>.
Contributor

Suggested change
https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash-latest:streamGenerateContent?key=<key>.
`https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash-latest:streamGenerateContent?key=<key>`.
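For contrast, the two streaming conventions mentioned here side by side (endpoints abbreviated; the OpenAI `stream` flag is standard OpenAI API behavior, not something defined by this EP):

```
# OpenAI-style: same endpoint, streaming toggled by a request-body flag
#   POST https://api.openai.com/v1/chat/completions
#   { "model": "...", "stream": true, ... }

# Gemini/Vertex AI-style: streaming selected by a different path segment
#   POST .../models/<model>:generateContent          # unary
#   POST .../models/<model>:streamGenerateContent    # streaming
```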

based on the conversation history. The user then can execute those functions on the application side, and provide results
back to the model.

1. Fetching data:
Contributor

Suggested change
1. Fetching data:
Here are some common use cases for Function Calling:
1. Fetching data:

Contributor

Note that I added 2 spaces at the end after the colon, because markdown by default ignores a single newline.


1. Fetching data:
Retrieve data from internal systems before sending final response, like checking the weather or number of vacation days in an HR system.
2. Taking action
Contributor

Suggested change
2. Taking action
2. Taking action:

Retrieve data from internal systems before sending final response, like checking the weather or number of vacation days in an HR system.
2. Taking action
Trigger actions based on the conversation, like scheduling meetings or initiating order returns.
3. Building multi-step workflows
Contributor

Suggested change
3. Building multi-step workflows
3. Building multi-step workflows:

Trigger actions based on the conversation, like scheduling meetings or initiating order returns.
3. Building multi-step workflows
Execute multi-step workflows, like data extraction pipelines or content personalization.
4. Interacting with Application UIs
Contributor

Suggested change
4. Interacting with Application UIs
4. Interacting with Application UIs:

spec:
ai:
vertexAi:
model: gemini-1.5-flash-001
Contributor

Should we mention what would happen if the URL or the request body contains the model value that's not the same as the model value in the Upstream?
