3 files changed: +35 -40 lines changed
@@ -38,7 +38,6 @@
 ```
 Additionally, if you would like to enable the admin interface, you can uncomment the admin lines and run this again.
 
-
 1. **Deploy Gateway**
 
 ```bash
@@ -52,17 +51,14 @@
 1. **Deploy Ext-Proc**
 
 ```bash
-kubectl apply -f ./manifests/gateway/ext_proc.yaml
-kubectl apply -f ./manifests/gateway/patch_policy.yaml
+kubectl apply -f ./manifests/ext_proc.yaml
 ```
-> **_NOTE:_** This is also per InferencePool, and will need to be configured to support the new pool should you wish to experiment further.
-
-1. **OPTIONALLY**: Apply Traffic Policy
 
-For high-traffic benchmarking you can apply this manifest to avoid any defaults that can cause timeouts/errors.
+1. **Deploy Envoy Gateway Custom Policies**
 
 ```bash
-kubectl apply -f ./manifests/gateway/traffic_policy.yaml
+kubectl apply -f ./manifests/extension_policy.yaml
+kubectl apply -f ./manifests/patch_policy.yaml
 ```
 
 1. **Try it out**
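
The deploy steps on the `+` side of this README hunk can be sketched as one small script. This is only an illustration: the `KUBECTL` override and the `apply_manifests` function name are assumptions that do not appear in the README, and the Gateway step is left out because its command is not visible in this diff.

```shell
#!/bin/sh
set -eu

# KUBECTL is a hypothetical override (not part of the README).
# Setting KUBECTL=echo previews the commands instead of running them.
KUBECTL="${KUBECTL:-kubectl}"

# Apply the manifests named on the '+' lines of the README diff,
# in the order the updated README lists them.
apply_manifests() {
  for manifest in \
    ./manifests/ext_proc.yaml \
    ./manifests/extension_policy.yaml \
    ./manifests/patch_policy.yaml
  do
    "$KUBECTL" apply -f "$manifest"
  done
}
```

Calling `apply_manifests` with `KUBECTL=echo` prints the three `apply -f` invocations without contacting a cluster, which is a cheap way to check the paths after the move out of `./manifests/gateway/`.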
@@ -0,0 +1,31 @@
+apiVersion: gateway.envoyproxy.io/v1alpha1
+kind: EnvoyExtensionPolicy
+metadata:
+  name: ext-proc-policy
+  namespace: default
+spec:
+  extProc:
+    - backendRefs:
+      - group: ""
+        kind: Service
+        name: inference-gateway-ext-proc
+        port: 9002
+      processingMode:
+        request:
+          body: Buffered
+        response:
+      # The timeouts are likely not needed here. We can experiment with removing/tuning them slowly.
+      # The connection limits are more important and will cause the opaque: ext_proc_gRPC_error_14 error in Envoy GW if not configured correctly.
+      messageTimeout: 1000s
+      backendSettings:
+        circuitBreaker:
+          maxConnections: 40000
+          maxPendingRequests: 40000
+          maxParallelRequests: 40000
+        timeout:
+          tcp:
+            connectTimeout: 24h
+  targetRef:
+    group: gateway.networking.k8s.io
+    kind: HTTPRoute
+    name: llm-route
@@ -103,35 +103,3 @@ spec:
       port: 9002
       targetPort: 9002
   type: ClusterIP
----
-apiVersion: gateway.envoyproxy.io/v1alpha1
-kind: EnvoyExtensionPolicy
-metadata:
-  name: ext-proc-policy
-  namespace: default
-spec:
-  extProc:
-    - backendRefs:
-      - group: ""
-        kind: Service
-        name: inference-gateway-ext-proc
-        port: 9002
-      processingMode:
-        request:
-          body: Buffered
-        response:
-      # The timeouts are likely not needed here. We can experiment with removing/tuning them slowly.
-      # The connection limits are more important and will cause the opaque: ext_proc_gRPC_error_14 error in Envoy GW if not configured correctly.
-      messageTimeout: 1000s
-      backendSettings:
-        circuitBreaker:
-          maxConnections: 40000
-          maxPendingRequests: 40000
-          maxParallelRequests: 40000
-        timeout:
-          tcp:
-            connectTimeout: 24h
-  targetRef:
-    group: gateway.networking.k8s.io
-    kind: HTTPRoute
-    name: llm-route
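
Because this change moves the `EnvoyExtensionPolicy` out of the ext-proc manifest and into its own file, a quick local check along the following lines could confirm the split before applying anything. `has_kind` is a hypothetical helper written for this sketch, and it only matches an unindented `kind:` line, so it is a rough text check rather than a YAML parser.

```shell
#!/bin/sh

# has_kind FILE KIND
# Succeeds (exit 0) if FILE contains an unindented "kind: KIND" line.
# Hypothetical helper: a plain-text match, not a real YAML parse.
has_kind() {
  grep -Eq "^kind:[[:space:]]*$2[[:space:]]*\$" "$1"
}
```

Assuming the file paths used in the README commands, `has_kind ./manifests/extension_policy.yaml EnvoyExtensionPolicy` should succeed after this change, while `has_kind ./manifests/ext_proc.yaml EnvoyExtensionPolicy` should fail.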