diff --git a/docs/python.md b/docs/python.md
index 5957a99de6..8060c2d250 100644
--- a/docs/python.md
+++ b/docs/python.md
@@ -220,9 +220,9 @@ class Predictor(BasePredictor):
 
 ### Streaming output
 
-Cog models can stream output as the `predict()` method is running. For example, a language model can output tokens as they're being generated and an image generation model can output a images they are being generated.
+Cog models can stream output as the `predict()` method is running. For example, a language model can output tokens as they're being generated and an image generation model can output images as they are being generated.
 
-To support streaming output in your Cog model, add `from typing import Iterator` to your predict.py file. The `typing` package is a part of Python's standard library so it doesn't need to be installed. Then add a return type annotation to the `predict()` method in the form `-> Iterator[<type>]` where `<type>` can be one of `str`, `int`, `float`, `bool`, `cog.File`, or `cog.Path`.
+To support streaming output in your Cog model, add `from typing import Iterator` to your predict.py file. The `typing` package is a part of Python's standard library so it doesn't need to be installed. Then add a return type annotation to the `predict()` method in the form `-> Iterator[<type>]` where `<type>` can be one of `str`, `int`, `float`, `bool`, or `cog.Path`.
 
 ```py
 from cog import BasePredictor, Path
diff --git a/docs/yaml.md b/docs/yaml.md
index f33c913f90..c8c98f773d 100644
--- a/docs/yaml.md
+++ b/docs/yaml.md
@@ -154,6 +154,13 @@ This stanza describes the concurrency capabilities of the model. It has one opti
 
 The maximum number of concurrent predictions the model can process. If this is set, the model must specify an [async `predict()` method](python.md#async-predictors-and-concurrency).
 
+For example:
+
+```yaml
+concurrency:
+  max: 10
+```
+
 ## `image`
 
 The name given to built Docker images. If you want to push to a registry, this should also include the registry name.