Commit 786f48f

Document generate HTTP endpoint (#6412)
* Document generate HTTP endpoint
* Address comment
* Fix up
* format
* Address comment
1 parent 85487a1 commit 786f48f

1 file changed

+178
-0
lines changed

docs/protocol/extension_generate.md

+178
@@ -0,0 +1,178 @@
<!--
# Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#  * Redistributions of source code must retain the above copyright
#    notice, this list of conditions and the following disclaimer.
#  * Redistributions in binary form must reproduce the above copyright
#    notice, this list of conditions and the following disclaimer in the
#    documentation and/or other materials provided with the distribution.
#  * Neither the name of NVIDIA CORPORATION nor the names of its
#    contributors may be used to endorse or promote products derived
#    from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE COPYRIGHT OWNER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
# OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-->

# Generate Extension

> [!NOTE]
> The Generate Extension is *provisional* and likely to change in future versions.

This document describes Triton's generate extension. The generate
extension provides a simple text-oriented endpoint schema for interacting with
large language models (LLMs). The generate endpoint is specific to the HTTP/REST
frontend.

## HTTP/REST

In all JSON schemas shown in this document, `$number`, `$string`, `$boolean`,
`$object` and `$array` refer to the fundamental JSON types. `#optional`
indicates an optional JSON field.

Triton exposes the generate endpoint at the following URLs. The client sends an
HTTP POST request to one of these URLs, depending on the desired response
behavior. The endpoint returns the generate results on success or an error on
failure.

```
POST v2/models/${MODEL_NAME}[/versions/${MODEL_VERSION}]/generate

POST v2/models/${MODEL_NAME}[/versions/${MODEL_VERSION}]/generate_stream
```

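The URL templates above can be expanded with a small helper. This is a minimal sketch, not part of Triton: the function name `generate_path` is hypothetical, and percent-encoding the model name and version is an assumption made here for safety rather than something the extension specifies.

```python
from typing import Optional
from urllib.parse import quote


def generate_path(
    model_name: str,
    model_version: Optional[str] = None,
    stream: bool = False,
) -> str:
    """Build the request path for the generate or generate_stream endpoint.

    Hypothetical helper: the version segment is added only when a version
    is given, mirroring the optional [/versions/${MODEL_VERSION}] part.
    """
    # Assumption: percent-encode path segments so unusual names stay valid.
    path = f"v2/models/{quote(model_name, safe='')}"
    if model_version is not None:
        path += f"/versions/{quote(str(model_version), safe='')}"
    return path + ("/generate_stream" if stream else "/generate")
```

Calling `generate_path("mymodel")` yields the unversioned `generate` path, while passing `stream=True` selects `generate_stream`.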
### generate vs. generate_stream

Both URLs expect the same request JSON object and produce the same response
JSON object. However, `generate` returns exactly one response JSON object, while
`generate_stream` may return multiple responses, depending on the inference
results. `generate_stream` returns the responses as
[Server-Sent Events](https://html.spec.whatwg.org/multipage/server-sent-events.html#server-sent-events)
(SSE), where each response is a "data" chunk in the HTTP response body.
Note that the HTTP status code is set when the first SSE response is sent, so
an error that occurs later during inference can arrive as an
[error object](#generate-response-json-error-object) even though the status
code indicates success (200). Therefore, the user must always check whether an
error object has been received when generating responses through
`generate_stream`.

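The per-event error check described above might look like the following on the client side. This is a sketch under stated assumptions: `iter_generate_stream` and `GenerateError` are hypothetical names, the input is assumed to be the already-decoded text lines of the SSE body, and an object is treated as an error whenever it contains an `"error"` key.

```python
import json
from typing import Iterable, Iterator


class GenerateError(Exception):
    """Raised when the stream delivers an error object (hypothetical helper)."""


def _decode(payload: str) -> dict:
    obj = json.loads(payload)
    # Assumption: any object carrying an "error" key is an error object,
    # regardless of the HTTP status code that was sent earlier.
    if "error" in obj:
        raise GenerateError(obj["error"])
    return obj


def iter_generate_stream(lines: Iterable[str]) -> Iterator[dict]:
    """Yield each response object from the text lines of an SSE body."""
    data_lines = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            value = line[len("data:"):]
            # SSE removes a single leading space after the colon.
            if value.startswith(" "):
                value = value[1:]
            data_lines.append(value)
        elif line == "" and data_lines:
            # A blank line terminates the current event.
            yield _decode("\n".join(data_lines))
            data_lines = []
    # Lenient choice: also decode a trailing event with no final blank line.
    if data_lines:
        yield _decode("\n".join(data_lines))
```

Because the check runs on every event, an error that arrives after several successful responses still surfaces as an exception instead of being mistaken for output.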
### Generate Request JSON Object

The generate request object, identified as *$generate_request*, is
required in the HTTP body of the POST request. The model name and
(optionally) version must be available in the URL. If a version is not
provided, the server may choose a version based on its own policies or
return an error.

    $generate_request =
    {
      "text_input" : $string,
      "parameters" : $parameters #optional
    }

* "text_input" : The text input that the model should generate output from.
* "parameters" : An optional object containing zero or more parameters for this
  generate request expressed as key/value pairs. See
  [Parameters](#parameters) for more information.

> [!NOTE]
> Any additional properties in the request object are passed either as
> parameters or as tensors, depending on the model specification.

#### Parameters

The *$parameters* JSON describes zero or more "name"/"value" pairs,
where the "name" is the name of the parameter and the "value" is a
$string, $number, or $boolean.

    $parameters =
    {
      $parameter, ...
    }

    $parameter = $string : $string | $number | $boolean

Parameters are model-specific. The user should consult the model
specification when setting the parameters.

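The type restriction on parameter values can be checked before a request is sent. The following is an illustrative sketch only: `build_generate_request` is a hypothetical helper, and rejecting non-scalar values on the client is a design choice made here, not behavior the extension requires.

```python
import json
from typing import Mapping, Optional, Union

# The schema allows $string, $number and $boolean parameter values.
ParameterValue = Union[str, int, float, bool]


def build_generate_request(
    text_input: str,
    parameters: Optional[Mapping[str, ParameterValue]] = None,
) -> str:
    """Serialize a $generate_request object to a JSON string."""
    body = {"text_input": text_input}
    if parameters is not None:
        for name, value in parameters.items():
            if not isinstance(name, str):
                raise TypeError(f"parameter name must be a string: {name!r}")
            if not isinstance(value, (str, int, float, bool)):
                raise TypeError(
                    f"parameter {name!r} must be a string, number or boolean"
                )
        body["parameters"] = dict(parameters)
    return json.dumps(body)
```

For example, `build_generate_request("client input", {"temperature": 0})` produces a body with both fields, while a list-valued parameter raises `TypeError` before anything reaches the server.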
#### Example Request

Below is an example of sending a generate request with the additional model
parameters `stream` and `temperature`.

```
$ curl -X POST localhost:8000/v2/models/mymodel/generate -d '{"text_input": "client input", "parameters": {"stream": false, "temperature": 0}}'

POST /v2/models/mymodel/generate HTTP/1.1
Host: localhost:8000
Content-Type: application/json
Content-Length: <xx>
{
  "text_input": "client input",
  "parameters" :
    {
      "stream": false,
      "temperature": 0
    }
}
```

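The same request can be issued from Python's standard library. This is a rough sketch, assuming a server reachable at a base URL such as `http://localhost:8000` as in the curl example; `make_generate_request` and `send` are hypothetical helper names, and the timeout value is arbitrary.

```python
import json
import urllib.request


def make_generate_request(base_url: str, model_name: str, body: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST to the generate endpoint."""
    return urllib.request.Request(
        url=f"{base_url}/v2/models/{model_name}/generate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the prepared request and decode the JSON response body.

    urlopen raises urllib.error.HTTPError for 4xx and 5xx statuses.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

A call such as `send(make_generate_request("http://localhost:8000", "mymodel", {"text_input": "client input", "parameters": {"stream": False, "temperature": 0}}))` would mirror the curl command above, provided a model named `mymodel` is actually being served.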
### Generate Response JSON Object

A successful generate request is indicated by a 200 HTTP status code.
The generate response object, identified as *$generate_response*, is returned in
the HTTP body.

    $generate_response =
    {
      "model_name" : $string,
      "model_version" : $string,
      "text_output" : $string
    }

* "model_name" : The name of the model used for inference.
* "model_version" : The specific model version used for inference.
* "text_output" : The output of the inference.

#### Example Response

```
200
{
  "model_name" : "mymodel",
  "model_version" : "1",
  "text_output" : "model output"
}
```

### Generate Response JSON Error Object

A failed generate request must be indicated by an HTTP error status
(typically 400). The HTTP body must contain the
*$generate_error_response* object.

    $generate_error_response =
    {
      "error": <error message string>
    }

* "error" : The descriptive message for the error.

#### Example Error

```
400
{
  "error" : "error message"
}
```
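
A client can map the two response shapes onto a result type and an exception. The sketch below is one possible approach, not Triton's client API: `GenerateResponse`, `GenerateRequestFailed` and `parse_generate_response` are hypothetical names, and the fallback message for an error body without an `"error"` field is an assumption.

```python
import json
from dataclasses import dataclass


@dataclass
class GenerateResponse:
    model_name: str
    model_version: str
    text_output: str


class GenerateRequestFailed(Exception):
    """Carries the HTTP status and the server's error message."""

    def __init__(self, status: int, message: str):
        super().__init__(f"HTTP {status}: {message}")
        self.status = status
        self.message = message


def parse_generate_response(status: int, body: str) -> GenerateResponse:
    """Return the parsed response, or raise if the body is an error object."""
    payload = json.loads(body)
    # Check for an error object even on 200, since a streamed response can
    # carry one after the status code has already been sent.
    if status != 200 or "error" in payload:
        raise GenerateRequestFailed(status, payload.get("error", "unknown error"))
    return GenerateResponse(
        model_name=payload["model_name"],
        model_version=payload["model_version"],
        text_output=payload["text_output"],
    )
```

With the example bodies shown above, the 200 response parses into a `GenerateResponse` whose `text_output` is `"model output"`, and the 400 response raises `GenerateRequestFailed` carrying `"error message"`.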
