Ensure you have the Hugging Face pre-trained LLM directory, containing the tokenizer, model weights, and config files, before deployment. You can download the model with the following Python code:
from transformers import AutoTokenizer, AutoModelForCausalLM
# Model ID
model_id = "airesearch/LLaMa3-8b-WangchanX-sft-Demo"
# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
# Save tokenizer and model
path = "LLaMa3-8b-WangchanX-sft-Demo"
tokenizer.save_pretrained(path)
model.save_pretrained(path)
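To double-check that the saved directory is complete before building any image, a minimal sketch like the one below can list its contents. The file names in `expected` are only typical examples of a Hugging Face checkpoint and may differ for other models:

```python
import os

path = "LLaMa3-8b-WangchanX-sft-Demo"

# Typical files in a saved Hugging Face checkpoint; exact names can vary
# (for example, weights may be stored as several sharded .safetensors files).
expected = ["config.json", "tokenizer_config.json"]

saved = os.listdir(path)
print("Saved files:", sorted(saved))
for name in expected:
    if name not in saved:
        print(f"Warning: {name} not found in {path}")
```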
Text Generation Inference
Text Generation Inference (TGI) is a toolkit that simplifies the deployment and serving of Large Language Models (LLMs). It offers advanced features such as tensor parallelism, quantization, watermarking, and custom prompt generation, making it easy to deploy and use LLMs in a variety of applications. You can find more details in the official TGI documentation (https://huggingface.co/docs/text-generation-inference).
- At the current working directory location, prepare the following:
  - The directory containing the pre-trained LLM model from Hugging Face. For example, if you are using the `LLaMa3-8b-WangchanX-sft-Demo` model, the directory should be named `LLaMa3-8b-WangchanX-sft-Demo`.
- Create a `Dockerfile` with the following content to build a Docker image:
FROM ghcr.io/huggingface/text-generation-inference:2.0
COPY LLaMa3-8b-WangchanX-sft-Demo /data/LLaMa3-8b-WangchanX-sft-Demo
- Build the image using the following command:
docker build -t text-generation-inference -f <Dockerfile> .
- Alternatively, you can simply build the image using the Dockerfile we already provide in the deployment directory:
docker build -t text-generation-inference -f deployment/TGI/Dockerfile.TextGenerationInference .
- Run the image using this command:
docker run --gpus all -p 8888:80 text-generation-inference --model-id /data/LLaMa3-8b-WangchanX-sft-Demo # add the -d flag to run in the background
- And then you can make requests like this:
curl 127.0.0.1:8888/generate_stream \
-X POST \
-d '{"inputs":"<|user|>ลิเก กับ งิ้ว ต่างกันอย่างไร<|end_of_text|>\n<|assistant|>\n","parameters":{"max_new_tokens":2048}}' \
-H 'Content-Type: application/json'
NOTE
Don't forget to wrap your message in the chat template `<|user|>message ...<|end_of_text|>\n<|assistant|>\n` in the request inputs to get better results.
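For reference, the same streaming request can also be made from Python. This is a minimal sketch, assuming the `requests` package is installed and the container started above is reachable at 127.0.0.1:8888; it applies the chat template from the note and parses the server-sent events returned by `/generate_stream`:

```python
import json
import requests

TGI_URL = "http://127.0.0.1:8888/generate_stream"  # container started above

def build_prompt(message: str) -> str:
    # Chat template expected by LLaMa3-8b-WangchanX-sft-Demo (see NOTE above).
    return f"<|user|>{message}<|end_of_text|>\n<|assistant|>\n"

payload = {
    "inputs": build_prompt("ลิเก กับ งิ้ว ต่างกันอย่างไร"),
    "parameters": {"max_new_tokens": 2048},
}

# /generate_stream answers with server-sent events ("data: {...}" lines).
with requests.post(TGI_URL, json=payload, stream=True) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line or not line.startswith(b"data:"):
            continue
        event = json.loads(line[len(b"data:"):])
        print(event.get("token", {}).get("text", ""), end="", flush=True)
print()
```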
LocalAI
LocalAI is a free, open-source alternative to OpenAI. It provides a drop-in REST API compatible with the OpenAI API specification for local or on-premises inference with LLMs, as well as image and audio generation, across multiple model families on consumer-grade hardware without requiring a GPU. You can find more details in the official LocalAI documentation (https://localai.io/).
- At the current working directory location, prepare the following:
  - The directory containing the pre-trained LLM model from Hugging Face. For example, if you are using the `LLaMa3-8b-WangchanX-sft-Demo` model, the directory should be named `LLaMa3-8b-WangchanX-sft-Demo`.
  - The model YAML file. This file can be found in the `deployment/LocalAI` directory. For the `LLaMa3-8b-WangchanX-sft-Demo` model, the YAML file would be named `LLaMa3-8b-WangchanX-sft-Demo.yaml`.
- Create a `Dockerfile` with the following content to build a Docker image:
FROM localai/localai:latest-aio-gpu-nvidia-cuda-12
COPY LLaMa3-8b-WangchanX-sft-Demo.yaml /build/models
- Build the image using the following command:
docker build -t localai -f <Dockerfile> .
- Alternatively, you can simply build the image using the Dockerfile we already provide in the deployment directory:
docker build -t localai -f deployment/LocalAi/Dockerfile.LocalAi .
- Run the image using this command:
docker run --gpus all -p 8888:8080 localai # add the -d flag to run in the background
- And then you can make requests like this:
curl http://localhost:8888/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{ "model": "LLaMa3-8b-WangchanX-sft-Demo", "messages": [{"role": "user", "content": "ลิเก กับ งิ้ว ต่างกันอย่างไร", "temperature": 0.1}] }'
Ollama
Ollama is an open-source, user-friendly platform that allows you to run large language models (LLMs) locally on your machine. You can find more details in the official Ollama repository (https://github.com/ollama/ollama).
- At the current working directory location, prepare the following:
  - The directory containing the pre-trained LLM model from Hugging Face. For example, if you are using the `LLaMa3-8b-WangchanX-sft-Demo` model, the directory should be named `LLaMa3-8b-WangchanX-sft-Demo`.
- Create a `Dockerfile` with the following content to build a Docker image:
FROM ollama/ollama
COPY LLaMa3-8b-WangchanX-sft-Demo /root/LLaMa3-8b-WangchanX-sft-Demo
RUN apt-get update && apt-get install -y python3 python3-pip python3-venv git
# Clone the ollama repository first
RUN git clone https://github.com/ollama/ollama.git /root/ollama
# Change to the cloned ollama directory
WORKDIR /root/ollama
# Initialize and update git submodules
RUN git submodule update --init --recursive
# Create a virtual environment and install the conversion requirements into it
# (each RUN starts a new shell, so "activate" would not persist; call the venv's binaries directly)
RUN python3 -m venv .venv
RUN .venv/bin/pip install -r llm/llama.cpp/requirements.txt
# Build the submodule
RUN make -C llm/llama.cpp quantize
# Convert the Hugging Face checkpoint to a GGUF file in f16 precision
RUN .venv/bin/python llm/llama.cpp/convert-hf-to-gguf.py /root/LLaMa3-8b-WangchanX-sft-Demo --outtype f16 --outfile /root/LLaMa3-8b-WangchanX-sft-Demo.gguf
- Build the image using the following command:
docker build -t ollama -f <Dockerfile> .
- Alternatively, you can simply build the image using the Dockerfile we already provide in the deployment directory:
docker build -t ollama -f deployment/Ollama/Dockerfile.Ollama .
- Run the image using this command:
docker run -d --gpus all -p 11434:11434 ollama # the -d flag runs the container in the background
- Create the model from the converted GGUF file (see the Python sketch at the end of this section for a more readable version of the Modelfile):
curl http://localhost:11434/api/create -d '{
"name": "LLaMa3-8b-WangchanX-sft-Demo",
"modelfile":"FROM /root/LLaMa3-8b-WangchanX-sft-Demo.gguf\n\n\nTEMPLATE \"\"\"\n{{ if .System }}<|system|>\n{{.System}}<|end_of_text|>\n{{ end }}{{ if .Prompt }}<|user|>\n{{ .Prompt }}<|end_of_text|>\n{{ end }}<|assistant|>\n\"\"\"\n\nPARAMETER stop \"<|end_of_text|>\"\nPARAMETER stop \"<|assistant|>\"\nPARAMETER stop \"<|user|>\"\nPARAMETER stop \"<|system|>\""
}'
- And then you can make requests like this:
curl http://localhost:11434/api/chat -d '{
"model": "LLaMa3-8b-WangchanX-sft-Demo",
"messages": [
{
"role": "user",
"content": "ลิเก กับ งิ้ว ต่างกันอย่างไร"
}
]
}'
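For reference, the two requests above can also be made from Python. The sketch below is a minimal example, assuming the `requests` package is installed and the container is listening on port 11434: it first registers the model with the Modelfile written out as a readable multi-line string, then sends a chat request and prints the reply, which Ollama streams as newline-delimited JSON.

```python
import json
import requests

OLLAMA_URL = "http://localhost:11434"  # container started above

# Same Modelfile as in the /api/create curl command, as a readable multi-line string.
MODELFILE = """FROM /root/LLaMa3-8b-WangchanX-sft-Demo.gguf


TEMPLATE \"\"\"
{{ if .System }}<|system|>
{{.System}}<|end_of_text|>
{{ end }}{{ if .Prompt }}<|user|>
{{ .Prompt }}<|end_of_text|>
{{ end }}<|assistant|>
\"\"\"

PARAMETER stop "<|end_of_text|>"
PARAMETER stop "<|assistant|>"
PARAMETER stop "<|user|>"
PARAMETER stop "<|system|>"
"""

# Register the model with Ollama (equivalent to the /api/create curl above).
create = requests.post(
    f"{OLLAMA_URL}/api/create",
    json={"name": "LLaMa3-8b-WangchanX-sft-Demo", "modelfile": MODELFILE},
)
create.raise_for_status()

# Chat with the model; Ollama streams the answer as JSON lines by default.
payload = {
    "model": "LLaMa3-8b-WangchanX-sft-Demo",
    "messages": [{"role": "user", "content": "ลิเก กับ งิ้ว ต่างกันอย่างไร"}],
}
with requests.post(f"{OLLAMA_URL}/api/chat", json=payload, stream=True) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)
        print(chunk.get("message", {}).get("content", ""), end="", flush=True)
        if chunk.get("done"):
            break
print()
```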