Update docker ENTRYPOINT to ensure proper argument handling (dottxt-ai#962)

shashankmangla · web-flow · commit 1bdcaa5f1a67 · 2024-06-13T08:54:27.000+02:00
## Summary

This PR switches the `ENTRYPOINT` instruction in the Dockerfile from shell form to exec form, so that arguments passed after the image name in `docker run` are appended to the entrypoint command. In shell form, Docker runs the command through `/bin/sh -c` and those arguments are ignored.
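The sketch below mimics how Docker assembles the final command line in each form. It uses `echo received:` as a hypothetical stand-in for `python3 -m outlines.serve.serve`, since the stand-in only prints the arguments it is given, and it assumes a POSIX `/bin/sh` is available.

```shell
#!/bin/sh
# Shell form: Docker turns `ENTRYPOINT python3 -m outlines.serve.serve` into
#   /bin/sh -c "python3 -m outlines.serve.serve" <docker run args...>
# The run args become $0, $1, ... of the -c script, which never references
# them, so they are silently dropped.
# `echo received:` is a hypothetical stand-in for the server command.
shell_form=$(/bin/sh -c 'echo received:' --model=microsoft/phi-2)

# Exec form: Docker uses the JSON array as argv and appends the run args:
#   python3 -m outlines.serve.serve <docker run args...>
exec_form=$(echo received: --model=microsoft/phi-2)

echo "shell form -> $shell_form"
echo "exec form  -> $exec_form"
```

Only the exec-form line shows `--model=microsoft/phi-2`, which matches the difference between the two runs below.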

### Before the change:

The `--model` argument is not passed to the entrypoint command, so the
default model `facebook/opt-125m` is loaded instead.

```bash
> sudo docker run --runtime=nvidia --gpus all -p 8000:8000 my-outlines-image --model="microsoft/phi-2"

/usr/local/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
INFO 06-12 14:45:46 llm_engine.py:161] Initializing an LLM engine (v0.5.0) with config: model='facebook/opt-125m', speculative_config=None, tokenizer='facebook/opt-125m', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=2048, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), seed=0, served_model_name=facebook/opt-125m)
```

### After the change:

The `--model` argument is now passed to the entrypoint command, and `microsoft/phi-2` is loaded.

```bash
> sudo docker run --runtime=nvidia --gpus all -p 8000:8000 my-outlines-image --model="microsoft/phi-2"

/usr/local/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
INFO 06-12 14:59:17 llm_engine.py:161] Initializing an LLM engine (v0.5.0) with config: model='microsoft/phi-2', speculative_config=None, tokenizer='microsoft/phi-2', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=2048, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), seed=0, served_model_name=microsoft/phi-2)
```
diff --git a/Dockerfile b/Dockerfile
@@ -14,4 +14,4 @@ RUN --mount=source=.git,target=.git,type=bind \
     pip install --no-cache-dir .[serve]
 
 # https://outlines-dev.github.io/outlines/reference/vllm/
-ENTRYPOINT python3 -m outlines.serve.serve
+ENTRYPOINT ["python3", "-m", "outlines.serve.serve"]