# Support SGLang as Potential Backend for Evaluation #2703

Open
wants to merge 18 commits into
base: main
Choose a base branch
from
Open
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Changes to `README.md`:
> [!Tip]
> Passing `max_model_len=4096` or some other reasonable default to vLLM through model args may cause speedups or prevent out-of-memory errors when trying to use auto batch size, such as for Mistral-7B-v0.1 which defaults to a maximum length of 32k.
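
For example, a minimal `vllm` invocation passing this setting through `--model_args` might look like the following (the model name is illustrative):

```bash
lm_eval --model vllm \
    --model_args pretrained=mistralai/Mistral-7B-v0.1,max_model_len=4096 \
    --tasks gsm8k \
    --batch_size auto
```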

### Tensor + Data Parallel and Fast Offline Batching Inference with `SGLang`
We support SGLang for efficient offline batch inference. Its **[fast backend runtime](https://docs.sglang.ai/index.html)** delivers high throughput through RadixAttention for prefix caching, jump-forward constrained decoding, an overhead-free CPU scheduler, continuous batching, token attention (paged attention), tensor parallelism, FlashInfer kernels, chunked prefill, and quantization (FP8/INT4/AWQ/GPTQ).

To use SGLang as the evaluation backend, please **install it in advance** (it depends on `FlashInfer`, a fast attention kernel library). See the docs [here](https://docs.sglang.ai/start/install.html#install-sglang) for installation instructions. We recommend using `uv` to install the dependencies for faster installation:
```bash
pip install --upgrade pip
pip install uv
uv pip install sgl-kernel --force-reinstall --no-deps
uv pip install "sglang[all]>=0.4.3.post2" --find-links https://flashinfer.ai/whl/cu124/torch2.5/flashinfer-python
```
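
A quick sanity check that the installation succeeded (our suggestion, not part of the official install docs):

```bash
python -c "import sglang; print(sglang.__version__)"
```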


**Review comment:** Remove this. Just say:

> To use SGLang as the evaluation backend, please install it in advance (due to its dependency on `FlashInfer`, a fast attention kernel library). See the docs [here](https://docs.sglang.ai/start/install.html#install-sglang) to install SGLang.

Delete the `uv` commands, since they may be updated in SGLang.


SGLang's server arguments differ slightly from those of other backends; see [here](https://docs.sglang.ai/backend/server_arguments.html) for more information. An example invocation:
```bash
lm_eval --model sglang \
  --model_args pretrained={model_name},tp_size={tensor_parallel_size},dp_size={data_parallel_size},dtype=auto,mem_fraction_static=0.9 \
--tasks gsm8k_cot \
--batch_size auto
```
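
For instance, on an 8-GPU node you could combine 2-way tensor parallelism with 4 data-parallel replicas (the model name is illustrative; `tp_size * dp_size` should generally not exceed the number of available GPUs):

```bash
lm_eval --model sglang \
  --model_args pretrained=meta-llama/Llama-3.1-8B-Instruct,tp_size=2,dp_size=4,dtype=auto,mem_fraction_static=0.9 \
  --tasks gsm8k_cot \
  --batch_size auto
```
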
### Model APIs and Inference Servers

Our library also supports the evaluation of models served via several commercial APIs, and we hope to implement support for the most commonly used performant local/self-hosted inference servers.