| 1 | +<!-- |
| 2 | +# Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved. |
| 3 | +# |
| 4 | +# Redistribution and use in source and binary forms, with or without |
| 5 | +# modification, are permitted provided that the following conditions |
| 6 | +# are met: |
| 7 | +# * Redistributions of source code must retain the above copyright |
| 8 | +# notice, this list of conditions and the following disclaimer. |
| 9 | +# * Redistributions in binary form must reproduce the above copyright |
| 10 | +# notice, this list of conditions and the following disclaimer in the |
| 11 | +# documentation and/or other materials provided with the distribution. |
| 12 | +# * Neither the name of NVIDIA CORPORATION nor the names of its |
| 13 | +# contributors may be used to endorse or promote products derived |
| 14 | +# from this software without specific prior written permission. |
| 15 | +# |
| 16 | +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY |
| 17 | +# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE |
| 18 | +# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR |
| 19 | +# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR |
| 20 | +# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, |
| 21 | +# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, |
| 22 | +# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR |
| 23 | +# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY |
| 24 | +# OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT |
| 25 | +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE |
| 26 | +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. |
| 27 | +--> |
| 28 | + |
| 29 | +# Generate Extension |
| 30 | + |
| 31 | +> [!NOTE] |
| 32 | +> The Generate Extension is *provisional* and likely to change in future versions. |
| 33 | + |
| 34 | +This document describes Triton's generate extension. The generate |
| 35 | +extension provides a simple text-oriented endpoint schema for interacting with |
| 36 | +large language models (LLMs). The generate endpoint is specific to the |
| 37 | +HTTP/REST frontend. |
| 38 | + |
| 39 | +## HTTP/REST |
| 40 | + |
| 41 | +In all JSON schemas shown in this document, `$number`, `$string`, `$boolean`, |
| 42 | +`$object` and `$array` refer to the fundamental JSON types. #optional |
| 43 | +indicates an optional JSON field. |
| 44 | + |
| 45 | +Triton exposes the generate endpoint at the following URLs. The client sends an |
| 46 | +HTTP POST request to one URL or the other depending on the response behavior it |
| 47 | +wants. The endpoint returns the generate results on success or an error on |
| 48 | +failure. |
| 49 | + |
| 50 | +``` |
| 51 | +POST v2/models/${MODEL_NAME}[/versions/${MODEL_VERSION}]/generate |
| 52 | + |
| 53 | +POST v2/models/${MODEL_NAME}[/versions/${MODEL_VERSION}]/generate_stream |
| 54 | +``` |
| 55 | + |
| 56 | +### generate vs. generate_stream |
| 57 | + |
| 58 | +Both URLs expect the same request JSON object and produce the same response |
| 59 | +JSON object. However, `generate` returns exactly one response JSON object, while |
| 60 | +`generate_stream` may return multiple responses based on the inference |
| 61 | +results. `generate_stream` returns the responses as |
| 62 | +[Server-Sent Events](https://html.spec.whatwg.org/multipage/server-sent-events.html#server-sent-events) |
| 63 | +(SSE), where each response is a "data" chunk in the HTTP response body. |
| 64 | +Note that an error may occur during inference after the HTTP status code has |
| 65 | +already been sent with the first SSE response. The client can therefore receive |
| 66 | +an [error object](#generate-response-json-error-object) while the status code |
| 67 | +shows success (200), so the user must always check whether an error object is |
| 68 | +received when generating responses through `generate_stream`. |
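The error check described above can be sketched in Python. This is a minimal illustration rather than Triton client code: the helper names `iter_sse_data` and `collect_text` are hypothetical, the sample body is made up, and joining `text_output` across responses assumes the model streams incremental pieces, which depends on the model.

```python
import json


def iter_sse_data(body: str):
    """Yield the decoded JSON payload of each SSE event in a response body."""
    # SSE events are separated by a blank line.
    for event in body.replace("\r\n", "\n").split("\n\n"):
        data_lines = []
        for line in event.split("\n"):
            if line.startswith("data:"):
                value = line[len("data:"):]
                # A single leading space after the colon is not part of the value.
                if value.startswith(" "):
                    value = value[1:]
                data_lines.append(value)
        if data_lines:
            yield json.loads("\n".join(data_lines))


def collect_text(body: str) -> str:
    """Join text_output across responses, raising if an error object appears."""
    pieces = []
    for payload in iter_sse_data(body):
        if "error" in payload:
            # The HTTP status may already have been sent as 200 at this point.
            raise RuntimeError(payload["error"])
        pieces.append(payload["text_output"])
    return "".join(pieces)


# Hypothetical response body with two streamed responses.
sample = (
    'data: {"model_name": "mymodel", "model_version": "1", "text_output": "Hello"}\n\n'
    'data: {"model_name": "mymodel", "model_version": "1", "text_output": " world"}\n\n'
)
print(collect_text(sample))
```

Checking every payload for an `error` key, instead of relying on the status code, is the point of the sketch.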
| 69 | + |
| 70 | +### Generate Request JSON Object |
| 71 | + |
| 72 | +The generate request object, identified as *$generate_request*, is |
| 73 | +required in the HTTP body of the POST request. The model name and |
| 74 | +(optionally) version must be available in the URL. If a version is not |
| 75 | +provided, the server may choose a version based on its own policies or |
| 76 | +return an error. |
| 77 | + |
| 78 | + $generate_request = |
| 79 | + { |
| 80 | + "text_input" : $string, |
| 81 | + "parameters" : $parameters #optional |
| 82 | + } |
| 83 | + |
| 84 | +* "text_input" : The text input that the model should generate output from. |
| 85 | +* "parameters" : An optional object containing zero or more parameters for this |
| 86 | + generate request expressed as key/value pairs. See |
| 87 | + [Parameters](#parameters) for more information. |
| 88 | + |
| 89 | +> [!NOTE] |
| 90 | +> Any additional properties in the request object are passed either as |
| 91 | +> parameters or tensors based on model specification. |
| 92 | + |
| 93 | +#### Parameters |
| 94 | + |
| 95 | +The *$parameters* JSON describes zero or more "name"/"value" pairs, |
| 96 | +where the "name" is the name of the parameter and the "value" is a |
| 97 | +$string, $number, or $boolean. |
| 98 | + |
| 99 | + $parameters = |
| 100 | + { |
| 101 | + $parameter, ... |
| 102 | + } |
| 103 | + |
| 104 | + $parameter = $string : $string | $number | $boolean |
| 105 | + |
| 106 | +Parameters are model-specific. Consult the model's specification for the |
| 107 | +parameters it accepts. |
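As a rough client-side illustration of the `$parameters` schema, the following sketch checks that every value is a string, number, or boolean before a request is sent. The function name `validate_parameters` is hypothetical, and the check covers only the value types named in the schema; which parameter names a model accepts is still up to the model.

```python
def validate_parameters(parameters: dict) -> dict:
    """Check that parameters match the $parameters schema: scalar values only."""
    for name, value in parameters.items():
        if not isinstance(name, str):
            raise TypeError(f"parameter name {name!r} is not a string")
        # $string | $number | $boolean; nested objects, arrays, and null are rejected.
        if not isinstance(value, (str, int, float, bool)):
            raise TypeError(
                f"parameter {name!r} has unsupported type {type(value).__name__}"
            )
    return parameters


print(validate_parameters({"stream": False, "temperature": 0}))
```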
| 108 | + |
| 109 | +#### Example Request |
| 110 | + |
| 111 | +The following example sends a generate request with the additional model parameters `stream` and `temperature`. |
| 112 | + |
| 113 | +``` |
| 114 | +$ curl -X POST localhost:8000/v2/models/mymodel/generate -d '{"text_input": "client input", "parameters": {"stream": false, "temperature": 0}}' |
| 115 | + |
| 116 | +POST /v2/models/mymodel/generate HTTP/1.1 |
| 117 | +Host: localhost:8000 |
| 118 | +Content-Type: application/json |
| 119 | +Content-Length: <xx> |
| 120 | +{ |
| 121 | + "text_input": "client input", |
| 122 | + "parameters" : |
| 123 | + { |
| 124 | + "stream": false, |
| 125 | + "temperature": 0 |
| 126 | + } |
| 127 | +} |
| 128 | +``` |
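The same request can be assembled with Python's standard library. This is a sketch under assumptions: `build_generate_request` and its `use_stream_endpoint` argument are hypothetical names, and the host, port, and model name are copied from the curl example above. The code only builds the request; sending it needs a running server.

```python
import json
import urllib.request


def build_generate_request(base_url, model_name, text_input, parameters=None,
                           model_version=None, use_stream_endpoint=False):
    """Build (but do not send) a POST request for generate or generate_stream."""
    path = f"/v2/models/{model_name}"
    if model_version is not None:
        # The version segment is optional in the URL.
        path += f"/versions/{model_version}"
    path += "/generate_stream" if use_stream_endpoint else "/generate"

    body = {"text_input": text_input}
    if parameters:
        body["parameters"] = parameters

    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_generate_request(
    "http://localhost:8000", "mymodel", "client input",
    parameters={"stream": False, "temperature": 0},
)
print(request.get_method(), request.full_url)
# With a server running, the request could be sent with:
# urllib.request.urlopen(request)
```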
| 129 | + |
| 130 | +### Generate Response JSON Object |
| 131 | + |
| 132 | +A successful generate request is indicated by a 200 HTTP status code. |
| 133 | +The generate response object, identified as *$generate_response*, is returned in |
| 134 | +the HTTP body. |
| 135 | + |
| 136 | + $generate_response = |
| 137 | + { |
| 138 | + "model_name" : $string, |
| 139 | + "model_version" : $string, |
| 140 | + "text_output" : $string |
| 141 | + } |
| 142 | + |
| 143 | +* "model_name" : The name of the model used for inference. |
| 144 | +* "model_version" : The specific model version used for inference. |
| 145 | +* "text_output" : The output of the inference. |
| 146 | + |
| 147 | +#### Example Response |
| 148 | + |
| 149 | +``` |
| 150 | +200 |
| 151 | +{ |
| 152 | + "model_name" : "mymodel", |
| 153 | + "model_version" : "1", |
| 154 | + "text_output" : "model output" |
| 155 | +} |
| 156 | +``` |
| 157 | + |
| 158 | +### Generate Response JSON Error Object |
| 159 | + |
| 160 | +A failed generate request must be indicated by an HTTP error status |
| 161 | +(typically 400). The HTTP body must contain the |
| 162 | +*$generate_error_response* object. |
| 163 | + |
| 164 | + $generate_error_response = |
| 165 | + { |
| 166 | + "error" : $string |
| 167 | + } |
| 168 | + |
| 169 | +* "error" : The descriptive message for the error. |
| 170 | + |
| 171 | +#### Example Error |
| 172 | + |
| 173 | +``` |
| 174 | +400 |
| 175 | +{ |
| 176 | + "error" : "error message" |
| 177 | +} |
| 178 | +``` |