Support fastapi engine in pfserving container (#2968)
# Description

- support the fastapi engine in the local pf serving container; customers can
choose fastapi by setting the `PROMPTFLOW_SERVING_ENGINE` env variable
- support controlling gunicorn worker and thread counts via env variables

# All Promptflow Contribution checklist:
- [x] **The pull request does not introduce [breaking changes].**
- [ ] **CHANGELOG is updated for new features, bug fixes or other
significant changes.**
- [x] **I have read the [contribution guidelines](../CONTRIBUTING.md).**
- [ ] **Create an issue and link to the pull request to get dedicated
review from promptflow team. Learn more: [suggested
workflow](../CONTRIBUTING.md#suggested-workflow).**

## General Guidelines and Best Practices
- [ ] Title of the pull request is clear and informative.
- [ ] There are a small number of commits, each of which has an
informative message. This means that previously merged commits do not
appear in the history of the PR. For more information on cleaning up the
commits in your PR, [see this
page](https://github.com/Azure/azure-powershell/blob/master/documentation/development-docs/cleaning-up-commits.md).

### Testing Guidelines
- [ ] Pull request includes test coverage for the included changes.

---------

Co-authored-by: xiaopwan <[email protected]>
wxpjimmy and xiaopwan authored Apr 24, 2024
1 parent 1232bef commit c1d9141
Showing 9 changed files with 51 additions and 10 deletions.
18 changes: 16 additions & 2 deletions docs/how-to-guides/deploy-a-flow/deploy-using-docker.md
@@ -87,11 +87,25 @@ You'll need to set up the environment variables in the container to make the connection work
### Run with `docker run`

#### Run with `flask` serving engine
You can run the docker image directly via the commands below; by default this uses the `flask` serving engine:
```bash
# The started service will listen on port 8080. You can map the port to any port on the host machine as you want.
docker run -p 8080:8080 -e OPEN_AI_CONNECTION_API_KEY=<secret-value> -e PROMPTFLOW_WORKER_NUM=<expect-worker-num> -e PROMPTFLOW_WORKER_THREADS=<expect-thread-num-per-worker> web-classification-serve
```
Note that:
- `PROMPTFLOW_WORKER_NUM`: optional; controls how many workers are started in your container. Default value is 8.
- `PROMPTFLOW_WORKER_THREADS`: optional; controls how many threads are started in each worker. Default value is 1. **This setting only works for the flask engine.**
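
The two knobs above are resolved inside the container's entrypoint with the standard shell parameter-default pattern. This is an illustrative sketch (not the actual entrypoint, which execs gunicorn instead of printing the command) of how unset variables fall back to the documented defaults:

```shell
# Illustrative sketch of the entrypoint's env-var handling: unset vars
# fall back to the documented defaults (8 workers, 1 thread per worker).
WORKER_NUM=${PROMPTFLOW_WORKER_NUM:-"8"}
WORKER_THREADS=${PROMPTFLOW_WORKER_THREADS:-"1"}
# Print the command instead of exec'ing it, so the sketch runs anywhere.
echo "gunicorn -w ${WORKER_NUM} --threads ${WORKER_THREADS} -b 0.0.0.0:8080 --timeout 300"
```

Setting `-e PROMPTFLOW_WORKER_NUM=4` on `docker run` would make the first expansion yield `4` instead of the default.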

#### Run with `fastapi` serving engine
Starting from pf 1.10.0, a new `fastapi`-based serving engine is supported; you can choose the `fastapi` serving engine via the commands below:
```bash
# The started service will listen on port 8080. You can map the port to any port on the host machine as you want.
docker run -p 8080:8080 -e OPEN_AI_CONNECTION_API_KEY=<secret-value> -e PROMPTFLOW_SERVING_ENGINE=fastapi -e PROMPTFLOW_WORKER_NUM=<expect-worker-num> web-classification-serve
```
Note that:
- `PROMPTFLOW_WORKER_NUM`: optional; controls how many workers are started in your container. Default value is 8.
- `PROMPTFLOW_SERVING_ENGINE`: optional; controls which serving engine to use in your container. Default value is `flask`; currently only `flask` and `fastapi` are supported.
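
Under the hood, the engine choice determines the gunicorn worker class: flask keeps the default sync workers (with threads), while fastapi is ASGI and needs the uvicorn worker. A minimal sketch of that selection logic, printing rather than exec'ing the final command:

```shell
# Sketch of the engine-based worker-class selection done by the
# container entrypoint (printing instead of exec'ing gunicorn).
SERVING_ENGINE=${PROMPTFLOW_SERVING_ENGINE:-"flask"}
gunicorn_app="promptflow.core._serving.app:create_app(engine='${SERVING_ENGINE}')"
if [ "$SERVING_ENGINE" = "flask" ]; then
    worker_args="-w 8 --threads 1"
else
    # fastapi is an ASGI app, so gunicorn needs the uvicorn worker class.
    worker_args="--worker-class uvicorn.workers.UvicornWorker -w 8"
fi
echo "gunicorn ${worker_args} -b 0.0.0.0:8080 ${gunicorn_app}"
```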

### Test the endpoint
After starting the service, you can use curl to test it:
1 change: 1 addition & 0 deletions src/promptflow-core/CHANGELOG.md
@@ -1,6 +1,7 @@
# promptflow-core package

## v1.10.0 (Upcoming)
- Add fastapi serving engine support.

## v1.9.0 (2024.04.17)

1 change: 1 addition & 0 deletions src/promptflow-devkit/CHANGELOG.md
@@ -6,6 +6,7 @@
- Expose `--ui` to trigger a chat window; see [here](https://microsoft.github.io/promptflow/reference/pf-command-reference.html#pf-flow-test) for more details.
- `pf config set <key=value>` supports setting the folder where the config is saved via the `--path config_folder` parameter,
and the config takes effect when **os.getcwd** is a subdirectory of the specified folder.
- The local serving container supports the fastapi engine and tuning worker/thread numbers via environment variables; see [here](https://microsoft.github.io/promptflow/how-to-guides/deploy-a-flow/deploy-using-docker.html) for more details.

## v1.9.0 (2024.04.17)

@@ -38,6 +38,7 @@ RUN conda create -n {{env.conda_env_name}} python=3.9.16 pip=23.0.1 -q -y && \
{% endif %}
conda run -n {{env.conda_env_name}} pip install keyrings.alt && \
conda run -n {{env.conda_env_name}} pip install gunicorn==20.1.0 && \
conda run -n {{env.conda_env_name}} pip install 'uvicorn>=0.27.0,<1.0.0' && \
conda run -n {{env.conda_env_name}} pip cache purge && \
conda clean -a -y
@@ -13,6 +13,15 @@ ls /connections
{% for connection_yaml_path in connection_yaml_paths %}
pf connection create --file /{{ connection_yaml_path }}
{% endfor %}
WORKER_NUM=${PROMPTFLOW_WORKER_NUM:-"8"}
WORKER_THREADS=${PROMPTFLOW_WORKER_THREADS:-"1"}
SERVING_ENGINE=${PROMPTFLOW_SERVING_ENGINE:-"flask"}
gunicorn_app="promptflow.core._serving.app:create_app(engine='${SERVING_ENGINE}')"
cd /flow
if [ "$SERVING_ENGINE" = "flask" ]; then
    echo "start promptflow serving with worker_num: ${WORKER_NUM}, worker_threads: ${WORKER_THREADS}, app: ${gunicorn_app}"
    gunicorn -w ${WORKER_NUM} --threads ${WORKER_THREADS} -b "0.0.0.0:8080" --timeout 300 ${gunicorn_app}
else
    echo "start promptflow serving with worker_num: ${WORKER_NUM}, app: ${gunicorn_app}"
    gunicorn --worker-class uvicorn.workers.UvicornWorker -w ${WORKER_NUM} -b "0.0.0.0:8080" --timeout 300 ${gunicorn_app}
fi
2 changes: 2 additions & 0 deletions src/promptflow/CHANGELOG.md
@@ -3,6 +3,8 @@
## v1.10.0 (Upcoming)
### Features Added
- [promptflow-devkit]: Expose `--ui` to trigger a chat window; see [here](https://microsoft.github.io/promptflow/reference/pf-command-reference.html#pf-flow-test) for more details.
- [promptflow-devkit]: The local serving container supports the fastapi engine and tuning worker/thread numbers via environment variables; see [here](https://microsoft.github.io/promptflow/how-to-guides/deploy-a-flow/deploy-using-docker.html) for more details.
- [promptflow-core]: Add fastapi serving engine support.

## v1.9.0 (2024.04.17)

10 changes: 7 additions & 3 deletions src/promptflow/tests/test_configs/flows/export/linux/Dockerfile
@@ -3,19 +3,23 @@ FROM docker.io/continuumio/miniconda3:latest

WORKDIR /

COPY ./flow/requirements_txt /flow/requirements_txt

# gcc is required to build psutil
RUN apt-get update && apt-get install -y runit gcc

# create conda environment
RUN conda create -n promptflow-serve python=3.9.16 pip=23.0.1 -q -y && \
    conda run -n promptflow-serve \
    pip install -r /flow/requirements_txt && \
    conda run -n promptflow-serve pip install keyrings.alt && \
    conda run -n promptflow-serve pip install gunicorn==20.1.0 && \
    conda run -n promptflow-serve pip install 'uvicorn>=0.27.0,<1.0.0' && \
    conda run -n promptflow-serve pip cache purge && \
    conda clean -a -y

COPY ./flow /flow

EXPOSE 8080

@@ -28,4 +32,4 @@ COPY ./runit /var/runit
RUN chmod -R +x /var/runit

COPY ./start.sh /
CMD ["bash", "./start.sh"]
@@ -10,4 +10,4 @@ while pgrep gunicorn >/dev/null; do
sleep 1
done

echo "$(date -uIns) - Stopped all Gunicorn processes"
@@ -6,6 +6,15 @@ export PATH="$CONDA_ENV_PATH/bin:$PATH"
ls
ls /connections
pf connection create --file /connections/custom_connection.yaml
WORKER_NUM=${PROMPTFLOW_WORKER_NUM:-"8"}
WORKER_THREADS=${PROMPTFLOW_WORKER_THREADS:-"1"}
SERVING_ENGINE=${PROMPTFLOW_SERVING_ENGINE:-"flask"}
gunicorn_app="promptflow.core._serving.app:create_app(engine='${SERVING_ENGINE}')"
cd /flow
if [ "$SERVING_ENGINE" = "flask" ]; then
    echo "start promptflow serving with worker_num: ${WORKER_NUM}, worker_threads: ${WORKER_THREADS}, app: ${gunicorn_app}"
    gunicorn -w ${WORKER_NUM} --threads ${WORKER_THREADS} -b "0.0.0.0:8080" --timeout 300 ${gunicorn_app}
else
    echo "start promptflow serving with worker_num: ${WORKER_NUM}, app: ${gunicorn_app}"
    gunicorn --worker-class uvicorn.workers.UvicornWorker -w ${WORKER_NUM} -b "0.0.0.0:8080" --timeout 300 ${gunicorn_app}
fi
