`docs/source/models/supported_models.md`

Alongside each architecture, we include some popular models that use it.

By default, vLLM loads models from the [Hugging Face (HF) Hub](https://huggingface.co/models).
To determine whether a given model is natively supported, you can check the `config.json` file inside the HF repository.
If the `"architectures"` field contains a model architecture listed below, then it should be natively supported.

Models do not _need_ to be natively supported to be used in vLLM.
The <project:#transformers-fallback> enables you to run models directly using their Transformers implementation (or even remote code on the Hugging Face Model Hub!).

:::{tip}
The easiest way to check if your model is really supported at runtime is to run the program below:
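
A minimal version of that program might look like the following (the model name is a placeholder; `generate` and `encode` are vLLM's standard entry points for generative and pooling models):

```python
from vllm import LLM

# For generative models (task=generate) only
llm = LLM(model=..., task="generate")  # Name or path of your model
output = llm.generate("Hello, my name is")
print(output)

# For pooling models (task=embed) only
llm = LLM(model=..., task="embed")  # Name or path of your model
output = llm.encode("Hello, my name is")
print(output)
```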

If vLLM successfully returns text (for generative models) or hidden states (for pooling models), it indicates that your model is supported.
:::

Otherwise, please refer to [Adding a New Model](#new-model) for instructions on how to implement your model in vLLM.
Alternatively, you can [open an issue on GitHub](https://github.com/vllm-project/vllm/issues/new/choose) to request vLLM support.

(transformers-fallback)=
### Transformers fallback

vLLM can fall back to model implementations that are available in Transformers. This does not yet work for all models, but most decoder language models are supported, and vision language model support is planned!

To check if the backend is Transformers, you can simply do this:

```python
from vllm import LLM
llm = LLM(model=..., task="generate")  # Name or path of your model
llm.apply_model(lambda model: print(type(model)))  # Prints the class of the underlying implementation
```

If it is `TransformersModel` then it means it's based on Transformers!

:::{note}
vLLM may not fully optimise the Transformers implementation, so you may see degraded performance if comparing a native model to a Transformers model in vLLM.
:::

#### Supported features

The Transformers fallback explicitly supports the following features:

- Quantization (except GGUF); see the [Quantization page](#quantization-index) for more information.
- LoRA; usage is identical to how LoRA works with models natively supported by vLLM (see the sketch after this list). If you encounter any issues, please open an issue.

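As a rough sketch of LoRA under the fallback (the adapter name and path are placeholders; `LoRARequest` is vLLM's standard LoRA API):

```python
from vllm import LLM
from vllm.lora.request import LoRARequest

llm = LLM(model=..., task="generate", enable_lora=True)

# Attach an adapter by name, unique integer id, and local path
output = llm.generate(
    "Hello, my name is",
    lora_request=LoRARequest("my_adapter", 1, "/path/to/adapter"),
)
print(output)
```
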
#### Remote code
Earlier we mentioned that the Transformers fallback enables you to run remote code models directly in vLLM.
If you are interested in this feature, this section is for you!

Simply set `trust_remote_code=True` and vLLM will run any model on the Model Hub that is compatible with Transformers.
Provided that the model writer implements their model in a compatible way, this means that you can run new models before they are officially supported in Transformers or vLLM!

```python
from vllm import LLM
llm = LLM(model=..., task="generate", trust_remote_code=True)  # Name or path of your model
```
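
For a remote-code model to pass these checks, its class must opt in to vLLM's attention backend. A minimal sketch of the opt-in (the class name `MyModel` is illustrative; `_supports_attention_backend` is the flag checked in step 2 below):

```python
from transformers import PreTrainedModel

class MyModel(PreTrainedModel):
    # Declare that this model can run with vLLM's attention backend
    _supports_attention_backend = True
```
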
Here is what happens in the background:

1. The config is loaded
2. `MyModel` Python class is loaded from the `auto_map`, and we check that the model `_supports_attention_backend`.
3. The `TransformersModel` backend is used. See <gh-file:vllm/model_executor/models/transformers.py>, which leverages `self.config._attn_implementation = "vllm"`, hence the need to use `ALL_ATTENTION_FUNCTIONS`.
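
Concretely, this works because a compatible model routes attention through Transformers' `ALL_ATTENTION_FUNCTIONS` registry instead of hard-coding one implementation. Below is a schematic sketch of such an attention module (shapes and projections simplified; not vLLM's actual code):

```python
from torch import nn
from transformers.modeling_utils import ALL_ATTENTION_FUNCTIONS

class MyAttention(nn.Module):
    """Schematic attention module that defers to the registered backend."""

    def __init__(self, config):
        super().__init__()
        self.config = config
        self.num_heads = config.num_attention_heads
        self.head_dim = config.hidden_size // config.num_attention_heads
        self.q_proj = nn.Linear(config.hidden_size, config.hidden_size)
        self.k_proj = nn.Linear(config.hidden_size, config.hidden_size)
        self.v_proj = nn.Linear(config.hidden_size, config.hidden_size)
        self.o_proj = nn.Linear(config.hidden_size, config.hidden_size)

    def forward(self, hidden_states, attention_mask=None, **kwargs):
        batch, seq_len, _ = hidden_states.shape
        # Project and split into (batch, num_heads, seq_len, head_dim)
        shape = (batch, seq_len, self.num_heads, self.head_dim)
        query = self.q_proj(hidden_states).view(shape).transpose(1, 2)
        key = self.k_proj(hidden_states).view(shape).transpose(1, 2)
        value = self.v_proj(hidden_states).view(shape).transpose(1, 2)
        # Dispatch to whichever backend the config selects; under vLLM,
        # self.config._attn_implementation is set to "vllm"
        attention_interface = ALL_ATTENTION_FUNCTIONS[self.config._attn_implementation]
        attn_output, _ = attention_interface(
            self, query, key, value, attention_mask,
            scaling=self.head_dim**-0.5, **kwargs,
        )
        return self.o_proj(attn_output.reshape(batch, seq_len, -1))
```
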
To make your model compatible with tensor parallel, its config needs to define a tensor parallel plan:

```{code-block} python
:caption: configuration_my_model.py

from transformers import PretrainedConfig

class MyConfig(PretrainedConfig):
    base_model_tp_plan = {
        "layers.*.self_attn.q_proj": "colwise",
        ...
    }
```

:::{tip}
`base_model_tp_plan` is a `dict` that maps fully qualified layer name patterns to tensor parallel styles (currently only `"colwise"` and `"rowwise"` are supported).
:::
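
Once the plan is defined, the model can be sharded across GPUs in the usual way (a sketch; `tensor_parallel_size` is vLLM's standard argument for tensor parallelism):

```python
from vllm import LLM

# Shard the remote-code model across two GPUs using its base_model_tp_plan
llm = LLM(
    model=...,  # Name or path of your model
    task="generate",
    trust_remote_code=True,
    tensor_parallel_size=2,
)
```
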
That's it!

:::{note}
Currently the PaliGemma model series is implemented without PrefixLM attention mask.
:::

:::{note}
To use Qwen2.5-VL series models, you have to install the Hugging Face Transformers library from source via `pip install git+https://github.com/huggingface/transformers`.
:::