Chunk tolerance annotation in streaming completion docs #5190

Open · wants to merge 4 commits into base: main
@@ -184,15 +184,10 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Comparing usage returns in the above Non Streaming `model_client.create(messages=messages)` vs streaming `model_client.create_stream(messages=messages)` we see differences.\n",
"The non streaming response by default returns valid prompt and completion token usage counts. \n",
"The streamed response by default returns zero values.\n",
"Comparing usage returns in the above non-streaming `model_client.create(messages=messages)` to streaming `model_client.create_stream(messages=messages)`, we see differences. The non-streaming response by default returns a valid prompt and completion token usage counts. The streamed response by default returns zero values.\n",
"\n",
"as documented in the OPENAI API Reference an additional parameter `stream_options` can be specified to return valid usage counts. see [stream_options](https://platform.openai.com/docs/api-reference/chat/create#chat-create-stream_options)\n",
"As documented in the OpenAI API Reference, an additional parameter `stream_options` can be specified to return valid usage counts. See [stream_options](https://platform.openai.com/docs/api-reference/chat/create#chat-create-stream_options). Only set this when using streaming, i.e. when using `create_stream`. To enable this, set `extra_create_args={\"stream_options\": {\"include_usage\": True}},` when calling `create_stream`. Depending on which completion client is being used, the maximum empty chunks allowed may need to be adjusted, e.g. `max_consecutive_empty_chunk_tolerance=2`, to account for the trailing empty message containing usage information.\n",
"\n",
"Only set this when you using streaming ie , using `create_stream` \n",
"\n",
"to enable this in `create_stream` set `extra_create_args={\"stream_options\": {\"include_usage\": True}},`\n",
"\n",
"```{note}\n",
"Note whilst other API's like LiteLLM also support this, it is not always guarenteed that it is fully supported or correct.\n",
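Putting the cell's advice together, here is a minimal sketch, assuming `autogen_ext`'s `OpenAIChatCompletionClient`, an `OPENAI_API_KEY` in the environment, and an illustrative model name; the tolerance value follows the text above:

```python
# A sketch comparing usage from create() vs create_stream(); the model name,
# message, and tolerance value are illustrative assumptions.
import asyncio

from autogen_core.models import UserMessage
from autogen_ext.models.openai import OpenAIChatCompletionClient


async def main() -> None:
    model_client = OpenAIChatCompletionClient(model="gpt-4o-mini")
    messages = [UserMessage(content="Hello!", source="user")]

    # Non-streaming: prompt and completion token counts are populated by default.
    result = await model_client.create(messages=messages)
    print(result.usage)

    # Streaming: usage stays zero unless stream_options requests it; the usage
    # arrives in a trailing empty chunk, so the empty-chunk tolerance is raised.
    last = None
    async for item in model_client.create_stream(
        messages=messages,
        extra_create_args={"stream_options": {"include_usage": True}},
        max_consecutive_empty_chunk_tolerance=2,
    ):
        last = item  # string chunks, then a final CreateResult carrying usage
    print(last.usage)


asyncio.run(main())
```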
@@ -1008,6 +1008,8 @@ class OpenAIChatCompletionClient(BaseOpenAIChatCompletionClient, Component[OpenA

client = ChatCompletionClient.load_component(config)

Note: When usage information is requested (see the `documentation <https://platform.openai.com/docs/api-reference/chat/streaming#chat/streaming-choices>`_) with the ``create_stream`` method, ``max_consecutive_empty_chunk_tolerance`` should be increased to permit the trailing empty chunk carrying the usage information, e.g. ``completion_client.create_stream(..., max_consecutive_empty_chunk_tolerance=2, extra_create_args={"stream_options": {"include_usage": True}})``.
Collaborator: Also, apologies for the confusion in my comment on a different issue. I think this should go in the API doc of the `create_stream` method in `BaseOpenAIChatCompletionClient`.


To view the full list of available configuration options, see the :py:class:`OpenAIClientConfigurationConfigModel` class.

"""
@@ -1117,7 +1119,7 @@ class AzureOpenAIChatCompletionClient(
# api_key="sk-...", # For key-based authentication. `AZURE_OPENAI_API_KEY` environment variable can also be used instead.
)

To load the client that uses identity based aith from a configuration, you can use the `load_component` method:
To load the client that uses identity based auth from a configuration, you can use the `load_component` method:

.. code-block:: python

@@ -1142,7 +1144,8 @@ class AzureOpenAIChatCompletionClient(

client = ChatCompletionClient.load_component(config)


Note: When usage information is requested (see the `documentation <https://platform.openai.com/docs/api-reference/chat/streaming#chat/streaming-choices>`_) with the ``create_stream`` method, ``max_consecutive_empty_chunk_tolerance`` should be increased to permit the trailing empty chunk carrying the usage information, e.g. ``completion_client.create_stream(..., max_consecutive_empty_chunk_tolerance=2, extra_create_args={"stream_options": {"include_usage": True}})``.

To view the full list of available configuration options, see the :py:class:`AzureOpenAIClientConfigurationConfigModel` class.


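A hedged sketch combining the identity-based setup with the usage-enabled stream described in the note; the deployment, model, endpoint, and scope values are placeholders:

```python
# Sketch: identity-based auth plus the usage-enabled stream from the note.
# Deployment, model, endpoint, and scope values are placeholders.
import asyncio

from autogen_core.models import UserMessage
from autogen_ext.auth.azure import AzureTokenProvider
from autogen_ext.models.openai import AzureOpenAIChatCompletionClient
from azure.identity import DefaultAzureCredential

token_provider = AzureTokenProvider(
    DefaultAzureCredential(),
    "https://cognitiveservices.azure.com/.default",
)

client = AzureOpenAIChatCompletionClient(
    azure_deployment="my-deployment",  # placeholder
    model="gpt-4o",
    api_version="2024-06-01",
    azure_endpoint="https://my-endpoint.openai.azure.com/",  # placeholder
    azure_ad_token_provider=token_provider,
)


async def main() -> None:
    last = None
    async for item in client.create_stream(
        messages=[UserMessage(content="Hello!", source="user")],
        max_consecutive_empty_chunk_tolerance=2,  # admit the trailing usage-only chunk
        extra_create_args={"stream_options": {"include_usage": True}},
    ):
        last = item
    print(last.usage)


asyncio.run(main())
```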