Commit a2ba1c7

Update quickstart llm docker in serve/readme; added ts.llm_launcher example (#3300)
* Update quickstart llm docker readme; added ts.llm_launcher example
* fix wording
1 parent db1a003 commit a2ba1c7

1 file changed: +12 -1 lines changed

README.md

````diff
@@ -62,13 +62,24 @@ Refer to [torchserve docker](docker/README.md) for details.
 
 ### 🤖 Quick Start LLM Deployment
 
+```bash
+# Make sure to install torchserve with pip or conda as described above and login with `huggingface-cli login`
+python -m ts.llm_launcher --model_id meta-llama/Meta-Llama-3-8B-Instruct --disable_token_auth
+
+# Try it out
+curl -X POST -d '{"model":"meta-llama/Meta-Llama-3-8B-Instruct", "prompt":"Hello, my name is", "max_tokens": 200}' --header "Content-Type: application/json" "http://localhost:8080/predictions/model/1.0/v1/completions"
+```
+
+### 🚢 Quick Start LLM Deployment with Docker
+
 ```bash
 #export token=<HUGGINGFACE_HUB_TOKEN>
 docker build --pull . -f docker/Dockerfile.llm -t ts/llm
 
 docker run --rm -ti --shm-size 10g --gpus all -e HUGGING_FACE_HUB_TOKEN=$token -p 8080:8080 -v data:/data ts/llm --model_id meta-llama/Meta-Llama-3-8B-Instruct --disable_token_auth
 
-curl -X POST -d '{"prompt":"Hello, my name is", "max_new_tokens": 50}' --header "Content-Type: application/json" "http://localhost:8080/predictions/model"
+# Try it out
+curl -X POST -d '{"model":"meta-llama/Meta-Llama-3-8B-Instruct", "prompt":"Hello, my name is", "max_tokens": 200}' --header "Content-Type: application/json" "http://localhost:8080/predictions/model/1.0/v1/completions"
 ```
 
 Refer to [LLM deployment](docs/llm_deployment.md) for details and other methods.
````
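
For readers who would rather call the endpoint from Python than from curl, the "Try it out" request can be sketched with the standard library alone. This is a minimal illustration, not part of the commit: the URL, model ID, and payload fields are copied from the curl command in the diff, while the helper names `build_completion_request` and `send` are hypothetical, and the assumption that the server replies with a JSON body is not confirmed by the diff itself.

```python
import json
import urllib.request

# Values copied from the curl command in the README diff.
BASE_URL = "http://localhost:8080"
COMPLETIONS_PATH = "/predictions/model/1.0/v1/completions"
MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"


def build_completion_request(prompt, max_tokens=200, model=MODEL_ID, base_url=BASE_URL):
    """Build (but do not send) the POST request that the curl example issues.

    Hypothetical helper for illustration only.
    """
    payload = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        url=base_url + COMPLETIONS_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and parse the reply, assuming the body is JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server from either quick start running, `send(build_completion_request("Hello, my name is"))` should issue the same request as the curl line. Building the request separately from sending it makes it possible to inspect the URL and payload without a running server.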
