This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.

Commit 1bdcaa5

Authored Jun 13, 2024
Update docker ENTRYPOINT to ensure proper argument handling (dottxt-ai#962)
## Summary

This PR updates the `ENTRYPOINT` instruction in the Dockerfile to ensure that additional arguments passed to the container via `docker run` are correctly appended to the entrypoint command.

### Before the change:

Parameter `model` is not passed to the entrypoint command and the default model `facebook/opt-125m` is loaded instead.

```bash
> sudo docker run --runtime=nvidia --gpus all -p 8000:8000 my-outlines-image --model="microsoft/phi-2"
/usr/local/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
INFO 06-12 14:45:46 llm_engine.py:161] Initializing an LLM engine (v0.5.0) with config: model='facebook/opt-125m', speculative_config=None, tokenizer='facebook/opt-125m', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=2048, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), seed=0, served_model_name=facebook/opt-125m)
```

### After the change:

Parameter `model` is correctly passed to the entrypoint command.

```bash
> sudo docker run --runtime=nvidia --gpus all -p 8000:8000 my-outlines-image --model="microsoft/phi-2"
/usr/local/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
INFO 06-12 14:59:17 llm_engine.py:161] Initializing an LLM engine (v0.5.0) with config: model='microsoft/phi-2', speculative_config=None, tokenizer='microsoft/phi-2', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=2048, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), seed=0, served_model_name=microsoft/phi-2)
```
1 parent a987159 commit 1bdcaa5

1 file changed: +1 -1 lines changed

‎Dockerfile

@@ -14,4 +14,4 @@ RUN --mount=source=.git,target=.git,type=bind \
     pip install --no-cache-dir .[serve]
 
 # https://outlines-dev.github.io/outlines/reference/vllm/
-ENTRYPOINT python3 -m outlines.serve.serve
+ENTRYPOINT ["python3", "-m", "outlines.serve.serve"]
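
The one-line change works because Docker treats the two `ENTRYPOINT` forms differently. The shell form is run as `/bin/sh -c "<command>"`, so arguments from `docker run` become positional parameters of the shell rather than arguments of the command. The exec form receives them appended to its argument list. A minimal sketch that reproduces this without Docker, using `echo` as a stand-in for launching the server (the stand-in and the variable names are assumptions for illustration, not part of the commit):

```bash
# Shell form: Docker wraps the command in `/bin/sh -c "<command>"`.
# Extra `docker run` arguments become $0, $1, ... of that shell and
# never reach the wrapped command.
shell_form=$(/bin/sh -c 'echo python3 -m outlines.serve.serve' --model="microsoft/phi-2")

# Exec form: the extra arguments are appended directly to the argv array.
exec_form=$(echo python3 -m outlines.serve.serve --model="microsoft/phi-2")

echo "shell form runs: $shell_form"
echo "exec form runs:  $exec_form"
```

Only the exec-form line carries the `--model` flag, which is consistent with the before and after logs in the commit message.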
