Support fastapi engine in pfserving container (#2968)
# Description

- support the fastapi engine in the local pf serving container; customers can
choose fastapi by setting the `PROMPTFLOW_SERVING_ENGINE` env variable
- support controlling gunicorn worker and thread counts via env variables

# All Promptflow Contribution checklist:
- [x] **The pull request does not introduce [breaking changes].**
- [ ] **CHANGELOG is updated for new features, bug fixes or other
significant changes.**
- [x] **I have read the [contribution guidelines](../CONTRIBUTING.md).**
- [ ] **Create an issue and link to the pull request to get dedicated
review from promptflow team. Learn more: [suggested
workflow](../CONTRIBUTING.md#suggested-workflow).**

## General Guidelines and Best Practices
- [ ] Title of the pull request is clear and informative.
- [ ] There are a small number of commits, each of which has an
informative message. This means that previously merged commits do not
appear in the history of the PR. For more information on cleaning up the
commits in your PR, [see this
page](https://github.com/Azure/azure-powershell/blob/master/documentation/development-docs/cleaning-up-commits.md).

### Testing Guidelines
- [ ] Pull request includes test coverage for the included changes.

---------

Co-authored-by: xiaopwan <[email protected]>
wxpjimmy and xiaopwan authored Apr 24, 2024
1 parent 1232bef commit c1d9141
Showing 9 changed files with 51 additions and 10 deletions.
18 changes: 16 additions & 2 deletions docs/how-to-guides/deploy-a-flow/deploy-using-docker.md
@@ -87,11 +87,25 @@ You'll need to set up the environment variables in the container to make the connection work
### Run with `docker run`

#### Run with `flask` serving engine
You can run the docker image directly via the commands below; by default this uses the `flask` serving engine:
```bash
# The started service will listen on port 8080. You can map the port to any port on the host machine as you want.
docker run -p 8080:8080 -e OPEN_AI_CONNECTION_API_KEY=<secret-value> -e PROMPTFLOW_WORKER_NUM=<expect-worker-num> -e PROMPTFLOW_WORKER_THREADS=<expect-thread-num-per-worker> web-classification-serve
```
Note that:
- `PROMPTFLOW_WORKER_NUM`: optional; controls how many workers are started in your container. Default value is 8.
- `PROMPTFLOW_WORKER_THREADS`: optional; controls how many threads are started in each worker. Default value is 1. **This setting only works for the flask engine.**
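
The two knobs above are resolved inside the container's entrypoint with the standard shell parameter-default pattern. This is an illustrative sketch (not the actual entrypoint, which execs gunicorn instead of printing the command) of how unset variables fall back to the documented defaults:

```shell
# Illustrative sketch of the entrypoint's env-var handling: unset vars
# fall back to the documented defaults (8 workers, 1 thread per worker).
WORKER_NUM=${PROMPTFLOW_WORKER_NUM:-"8"}
WORKER_THREADS=${PROMPTFLOW_WORKER_THREADS:-"1"}
# Print the command instead of exec'ing it, so the sketch runs anywhere.
echo "gunicorn -w ${WORKER_NUM} --threads ${WORKER_THREADS} -b 0.0.0.0:8080 --timeout 300"
```

Setting `-e PROMPTFLOW_WORKER_NUM=4` on `docker run` would make the first expansion yield `4` instead of the default.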

#### Run with `fastapi` serving engine
Starting from pf 1.10.0, a new `fastapi`-based serving engine is supported; you can choose the `fastapi` serving engine via the commands below:
```bash
# The started service will listen on port 8080. You can map the port to any port on the host machine as you want.
docker run -p 8080:8080 -e OPEN_AI_CONNECTION_API_KEY=<secret-value> -e PROMPTFLOW_SERVING_ENGINE=fastapi -e PROMPTFLOW_WORKER_NUM=<expect-worker-num> web-classification-serve
```
Note that:
- `PROMPTFLOW_WORKER_NUM`: optional; controls how many workers are started in your container. Default value is 8.
- `PROMPTFLOW_SERVING_ENGINE`: optional; controls which serving engine to use in your container. Default value is `flask`; currently only `flask` and `fastapi` are supported.
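
Under the hood, the engine choice determines the gunicorn worker class: flask keeps the default sync workers (with threads), while fastapi is ASGI and needs the uvicorn worker. A minimal sketch of that selection logic, printing rather than exec'ing the final command:

```shell
# Sketch of the engine-based worker-class selection done by the
# container entrypoint (printing instead of exec'ing gunicorn).
SERVING_ENGINE=${PROMPTFLOW_SERVING_ENGINE:-"flask"}
gunicorn_app="promptflow.core._serving.app:create_app(engine='${SERVING_ENGINE}')"
if [ "$SERVING_ENGINE" = "flask" ]; then
    worker_args="-w 8 --threads 1"
else
    # fastapi is an ASGI app, so gunicorn needs the uvicorn worker class.
    worker_args="--worker-class uvicorn.workers.UvicornWorker -w 8"
fi
echo "gunicorn ${worker_args} -b 0.0.0.0:8080 ${gunicorn_app}"
```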

### Test the endpoint
After starting the service, you can use curl to test it:
1 change: 1 addition & 0 deletions src/promptflow-core/CHANGELOG.md
@@ -1,6 +1,7 @@
# promptflow-core package

## v1.10.0 (Upcoming)
- Add fastapi serving engine support.

## v1.9.0 (2024.04.17)

1 change: 1 addition & 0 deletions src/promptflow-devkit/CHANGELOG.md
@@ -6,6 +6,7 @@
- Expose `--ui` to trigger a chat window; see [here](https://microsoft.github.io/promptflow/reference/pf-command-reference.html#pf-flow-test) for more details.
- `pf config set <key=value>` supports setting the folder where the config is saved via the `--path config_folder` parameter,
and the config takes effect when **os.getcwd** is a subdirectory of the specified folder.
- The local serving container supports the fastapi engine and tuning worker/thread numbers via environment variables; see [here](https://microsoft.github.io/promptflow/how-to-guides/deploy-a-flow/deploy-using-docker.html) for more details.

## v1.9.0 (2024.04.17)

@@ -38,6 +38,7 @@ RUN conda create -n {{env.conda_env_name}} python=3.9.16 pip=23.0.1 -q -y && \
{% endif %}
conda run -n {{env.conda_env_name}} pip install keyrings.alt && \
conda run -n {{env.conda_env_name}} pip install gunicorn==20.1.0 && \
conda run -n {{env.conda_env_name}} pip install 'uvicorn>=0.27.0,<1.0.0' && \
conda run -n {{env.conda_env_name}} pip cache purge && \
conda clean -a -y
@@ -13,6 +13,15 @@ ls /connections
{% for connection_yaml_path in connection_yaml_paths %}
pf connection create --file /{{ connection_yaml_path }}
{% endfor %}
WORKER_NUM=${PROMPTFLOW_WORKER_NUM:-"8"}
WORKER_THREADS=${PROMPTFLOW_WORKER_THREADS:-"1"}
SERVING_ENGINE=${PROMPTFLOW_SERVING_ENGINE:-"flask"}
gunicorn_app="promptflow.core._serving.app:create_app(engine='${SERVING_ENGINE}')"
cd /flow
if [ "$SERVING_ENGINE" = "flask" ]; then
    echo "start promptflow serving with worker_num: ${WORKER_NUM}, worker_threads: ${WORKER_THREADS}, app: ${gunicorn_app}"
    gunicorn -w ${WORKER_NUM} --threads ${WORKER_THREADS} -b "0.0.0.0:8080" --timeout 300 ${gunicorn_app}
else
    echo "start promptflow serving with worker_num: ${WORKER_NUM}, app: ${gunicorn_app}"
    gunicorn --worker-class uvicorn.workers.UvicornWorker -w ${WORKER_NUM} -b "0.0.0.0:8080" --timeout 300 ${gunicorn_app}
fi
2 changes: 2 additions & 0 deletions src/promptflow/CHANGELOG.md
@@ -3,6 +3,8 @@
## v1.10.0 (Upcoming)
### Features Added
- [promptflow-devkit]: Expose `--ui` to trigger a chat window; see [here](https://microsoft.github.io/promptflow/reference/pf-command-reference.html#pf-flow-test) for more details.
- [promptflow-devkit]: The local serving container supports the fastapi engine and tuning worker/thread numbers via environment variables; see [here](https://microsoft.github.io/promptflow/how-to-guides/deploy-a-flow/deploy-using-docker.html) for more details.
- [promptflow-core]: Add fastapi serving engine support.

## v1.9.0 (2024.04.17)

10 changes: 7 additions & 3 deletions src/promptflow/tests/test_configs/flows/export/linux/Dockerfile
@@ -3,19 +3,23 @@ FROM docker.io/continuumio/miniconda3:latest

WORKDIR /

COPY ./flow/requirements_txt /flow/requirements_txt

# gcc is required to build psutil
RUN apt-get update && apt-get install -y runit gcc

# create conda environment
RUN conda create -n promptflow-serve python=3.9.16 pip=23.0.1 -q -y && \
    conda run -n promptflow-serve \
    pip install -r /flow/requirements_txt && \
    conda run -n promptflow-serve pip install keyrings.alt && \
    conda run -n promptflow-serve pip install gunicorn==20.1.0 && \
    conda run -n promptflow-serve pip install 'uvicorn>=0.27.0,<1.0.0' && \
    conda run -n promptflow-serve pip cache purge && \
    conda clean -a -y

COPY ./flow /flow

EXPOSE 8080

@@ -28,4 +32,4 @@ COPY ./runit /var/runit
RUN chmod -R +x /var/runit

COPY ./start.sh /
CMD ["bash", "./start.sh"]
@@ -10,4 +10,4 @@ while pgrep gunicorn >/dev/null; do
sleep 1
done

echo "$(date -uIns) - Stopped all Gunicorn processes"
@@ -6,6 +6,15 @@ export PATH="$CONDA_ENV_PATH/bin:$PATH"
ls
ls /connections
pf connection create --file /connections/custom_connection.yaml
WORKER_NUM=${PROMPTFLOW_WORKER_NUM:-"8"}
WORKER_THREADS=${PROMPTFLOW_WORKER_THREADS:-"1"}
SERVING_ENGINE=${PROMPTFLOW_SERVING_ENGINE:-"flask"}
gunicorn_app="promptflow.core._serving.app:create_app(engine='${SERVING_ENGINE}')"
cd /flow
if [ "$SERVING_ENGINE" = "flask" ]; then
    echo "start promptflow serving with worker_num: ${WORKER_NUM}, worker_threads: ${WORKER_THREADS}, app: ${gunicorn_app}"
    gunicorn -w ${WORKER_NUM} --threads ${WORKER_THREADS} -b "0.0.0.0:8080" --timeout 300 ${gunicorn_app}
else
    echo "start promptflow serving with worker_num: ${WORKER_NUM}, app: ${gunicorn_app}"
    gunicorn --worker-class uvicorn.workers.UvicornWorker -w ${WORKER_NUM} -b "0.0.0.0:8080" --timeout 300 ${gunicorn_app}
fi
