modelscope · Jintao-Huang · Feb 3, 2025 · Feb 3, 2025 · Feb 3, 2025 · Feb 3, 2025
diff --git a/docs/source/Instruction/导出.md → docs/source/Instruction/导出与推送.md b/docs/source/Instruction/导出.md → docs/source/Instruction/导出与推送.md
@@ -1,4 +1,11 @@
-# 导出
+# 导出与推送
+
+| 量化技术 | 多模态 | 推理加速 | 继续训练 |
+| -------- | ------ | -------- | -------- |
+| GPTQ     | ✅      | ✅        | ✅        |
+| AWQ      | ✅      | ✅        | ✅        |
+| BNB      | ❌      | ✅        | ✅        |
+
 swift支持使用awq、gptq、bnb、hqq、eetq技术对模型进行量化。其中awq、gptq量化技术支持vllm/lmdeploy进行推理加速，需要使用校准数据集，量化性能更好，但量化速度较慢。而bnb、hqq、eetq无需校准数据，量化速度较快。这五种量化方法都支持qlora微调。
 
 awq、gptq、bnb（8bit）支持使用`swift export`进行量化。而bnb、hqq、eetq可以直接在sft和infer时进行快速量化。

diff --git a/docs/source/Instruction/推理和部署.md b/docs/source/Instruction/推理和部署.md
@@ -1,23 +1,181 @@
 # 推理和部署
 
+以下为swift支持的推理引擎以及接入部分的相应能力，三种推理加速引擎为SWIFT的推理、部署、评测模块提供推理加速：
+
+| 推理加速引擎 | OpenAI API | 多模态 |  量化模型 | 多LoRA | QLoRA | Batch推理 | 并行技术       |
+| ------------ | -------------- | ---------- | ------ | -------- | ------ | ----- | ----- |
+| pytorch      | [✅](https://github.com/modelscope/ms-swift/blob/main/examples/deploy/client/llm/chat/openai_client.py) | [✅](https://github.com/modelscope/ms-swift/blob/main/examples/app/mllm.sh) |     ✅        | [✅](https://github.com/modelscope/ms-swift/blob/main/examples/infer/demo_lora.py) | ✅     | [✅](https://github.com/modelscope/ms-swift/blob/main/examples/infer/pt/batch_ddp.sh) |DDP/device_map |
+| [vllm](https://github.com/vllm-project/vllm)         | ✅          | [✅](https://github.com/modelscope/ms-swift/blob/main/examples/infer/vllm/mllm_tp.sh) |    ✅        | [✅](https://github.com/modelscope/ms-swift/blob/main/examples/deploy/lora/server.sh) | ❌    | ✅ |  TP/PP   |
+| [lmdeploy](https://github.com/InternLM/lmdeploy)    | ✅          | [✅](https://github.com/modelscope/ms-swift/blob/main/examples/infer/lmdeploy/mllm_tp.sh) |      ✅        | ❌      | ❌     | ✅ | TP     |
+
+
+## 推理
+ms-swift使用了分层式的设计思想，用户可以使用命令行界面、Web-UI界面和直接使用Python的方式进行推理。
+
+### 使用CLI
+
+全参数模型：
+```shell
+CUDA_VISIBLE_DEVICES=0 swift infer \
+    --model Qwen/Qwen2.5-7B-Instruct \
+    --stream true \
+    --infer_backend pt \
+    --max_new_tokens 2048
+```
+
+LoRA模型：
+```shell
+CUDA_VISIBLE_DEVICES=0 swift infer \
+    --model Qwen/Qwen2.5-7B-Instruct \
+    --adapters swift/test_lora \
+    --stream true \
+    --infer_backend pt \
+    --temperature 0 \
+    --max_new_tokens 2048
+```
+
+
+**命令行推理指令**
+
+以上为交互式命令行界面推理，脚本运行后仅需在terminal中输入query即可。你也可以输入以下特殊指令：
+- `multi-line`: 切换到多行模式，在输入中支持换行输入，以`#`代表输入结束
+- `single-line`: 切换到单行模式，以换行代表输入结束
+- `reset-system`: 重置system并清空历史记录
+- `clear`: 清除历史记录
+- `quit` or `exit`: 退出对话
+
+**多模态模型**
+
+```shell
+CUDA_VISIBLE_DEVICES=0 \
+MAX_PIXELS=1003520 \
+VIDEO_MAX_PIXELS=50176 \
+FPS_MAX_FRAMES=12 \
+swift infer \
+    --model Qwen/Qwen2.5-VL-3B-Instruct \
+    --stream true \
+    --infer_backend pt \
+    --max_new_tokens 2048
+```
+
+如果要进行多模态模型的推理，可以在query中添加`<image>/<video>/<audio>`等标签（代表图像表征在`inputs_embeds`中的位置），例如输入`<image><image>这两张图有啥区别`，`<video>描述这段视频`。然后根据提示输入相应的图像/视频/音频即可。
+
+```
+<<< <image><image>这两张图有什么区别
+Input an image path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png
+Input an image path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png
+这两张图片的区别在于它们所展示的动物和场景。
+
+1. **第一张图片**：
+  - 展示了一只小猫。
+  - 小猫有大大的眼睛，表情显得有些困惑或好奇。
+  - 背景是模糊的，可能是室内环境。
+
+2. **第二张图片**：
+  - 展示了一群羊。
+  - 羊们站在草地上，背景是绿色的山丘和蓝天白云。
+  - 羊的表情看起来很平静，似乎在享受大自然的环境。
+
+总结来说，第一张图片是一只小猫，而第二张图片是一群羊。
+--------------------------------------------------
+<<< clear
+<<< <video>描述这段视频
+Input a video path or URL <<< https://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/baby.mp4
+The video shows a baby wearing sunglasses sitting on a bed and reading a book. The baby is holding the book with both hands and appears to be focused on the pages. The baby's feet are visible in the frame, and they are moving slightly as they read. The background of the video shows a room with a bed and some furniture.
+```
+
+
+数据集推理：
+```shell
+CUDA_VISIBLE_DEVICES=0 swift infer \
+    --model Qwen/Qwen2.5-7B-Instruct \
+    --stream true \
+    --infer_backend pt \
+    --val_dataset AI-ModelScope/alpaca-gpt4-data-zh \
+    --max_new_tokens 2048
+```
+
+以上提供了全参数和LoRA流式推理的例子，以下介绍更多SWIFT中的推理技术：
+- 界面推理：你可以将`swift infer`改成`swift app`
+- batch推理：`infer_backend=pt`可以指定`--max_batch_size`对大模型和多模态大模型进行batch推理，具体参考[这里](https://github.com/modelscope/ms-swift/blob/main/examples/infer/pt/batch_ddp.sh)。在进行batch推理时，你不能设置`--stream true`。
+- DDP/device_map推理：`infer_backend=pt`支持使用DDP/device_map技术进行并行推理，具体参考[这里](https://github.com/modelscope/ms-swift/blob/main/examples/infer/pt/mllm_device_map.sh)。
+- 推理加速：swift支持使用vllm/lmdeploy对推理、部署和评测模块进行推理加速，只需要额外指定`--infer_backend vllm/lmdeploy`即可。
+- 多模态模型：我们提供了[pt](https://github.com/modelscope/ms-swift/blob/main/examples/infer/pt/mllm_device_map.sh)/[vllm](https://github.com/modelscope/ms-swift/blob/main/examples/infer/vllm/mllm_tp.sh)/[lmdeploy](https://github.com/modelscope/ms-swift/blob/main/examples/infer/lmdeploy/mllm_tp.sh)对多模态模型进行多GPU推理的shell脚本。
+- 量化模型：直接选择GPTQ、AWQ、BNB量化的模型，例如：`--model Qwen/Qwen2.5-7B-Instruct-GPTQ-Int4`即可。
+- 更多模型类型：我们提供了[bert](https://github.com/modelscope/ms-swift/blob/main/examples/infer/pt/bert.sh)、[reward_model](https://github.com/modelscope/ms-swift/blob/main/examples/infer/pt/reward_model.sh)、[prm](https://github.com/modelscope/ms-swift/blob/main/examples/infer/pt/prm.sh)的推理脚本
+
+
+小帖士：
+- SWIFT会将推理结果保存起来，你可以通过`--result_path`指定保存路径
+- 如果要输出logprobs，只需要在推理时，指定`--logprobs true`即可。SWIFT会保存。注意，设置`--stream true`将不会存储
+- 使用`--infer_backend vllm`出现OOM，可以通过降低`--max_model_len`，`--max_num_seqs`，选择合适的`--gpu_memory_utilization`，设置`--enforce_eager true`。或者使用tensor并行`--tensor_parallel_size`来解决。
+- 使用`--infer_backend vllm`推理多模态模型，需要传入多张图片。可以设置`--limit_mm_per_prompt`解决，例如：`--limit_mm_per_prompt '{"image": 10, "video": 5}'`。
+- 推理qwen2-vl/qwen2.5-vl出现OOM，可以通过设置`MAX_PIXELS`、`VIDEO_MAX_PIXELS`、`FPS_MAX_FRAMES`解决，可以参考[这里](https://github.com/modelscope/ms-swift/blob/main/examples/app/mllm.sh)。
+- swift内置对话模板与使用transformers运行的对话模板对齐，测试参考[这里](https://github.com/modelscope/ms-swift/blob/main/tests/test_align/test_template/test_vision.py)。如果出现未对齐情况，欢迎提issue/PR修正。
+
+
+### 使用Web-UI
+如果你要使用界面的方式进行推理，可以查看[Web-UI文档](../GetStarted/Web-UI.md)。
+
+### 使用Python
+
+文本模型：
+```python
+import os
+os.environ['CUDA_VISIBLE_DEVICES'] = '0'
+
+from swift.llm import PtEngine, RequestConfig, InferRequest
+model = 'Qwen/Qwen2.5-0.5B-Instruct'
+
+# 加载推理引擎
+engine = PtEngine(model, max_batch_size=2)
+request_config = RequestConfig(max_tokens=512, temperature=0)
+
+# 这里使用了2个infer_request来展示batch推理
+infer_requests = [
+    InferRequest(messages=[{'role': 'user', 'content': 'who are you?'}]),
+    InferRequest(messages=[{'role': 'user', 'content': '浙江的省会在哪？'},
+                           {'role': 'assistant', 'content': '浙江的省会在哪？'},
+                           {'role': 'user', 'content': '这里有什么好玩的地方'},]),
+]
+resp_list = engine.infer(infer_requests, request_config)
+query0 = infer_requests[0].messages[0]['content']
+print(f'query0: {query0}')
+print(f'response0: {resp_list[0].choices[0].message.content}')
+print(f'response1: {resp_list[1].choices[0].message.content}')
+```
+
+多模态模型：
+```
+
+```
+
+
+
+- grounding任务：对多模态模型进行Grounding任务画框，可以参考[这里](https://github.com/modelscope/ms-swift/blob/main/examples/infer/demo_grounding.py)
+- 多LoRA推理：
+- agent推理：参考[这里](https://github.com/modelscope/ms-swift/blob/main/examples/infer/demo_agent.py)。
+
+
+### 
+
+
 SWIFT支持以命令行、Python代码和界面方式进行推理和部署：
 - 使用`engine.infer`或者`engine.infer_async`进行python的方式推理. 参考[这里](https://github.com/modelscope/ms-swift/blob/main/examples/infer/demo.py).
 - 使用`swift infer`使用命令行的方式进行推理. 参考[这里](https://github.com/modelscope/ms-swift/blob/main/examples/infer/cli_demo.sh).
 - 使用`swift deploy`进行服务部署，并使用openai API或者`client.infer`的方式推理. 服务端参考[这里](https://github.com/modelscope/ms-swift/tree/main/examples/deploy/server), 客户端参考[这里](https://github.com/modelscope/ms-swift/tree/main/examples/deploy/client).
 - 使用`swift app`部署模型进行界面推理, 可以查看[这里](../GetStarted/Web-UI.md)
 
 
-## 命令行推理指令
 
-命令行推理可以参考上述第二点给出的链接。脚本运行后仅需在terminal中输入query即可。注意命令行的几个使用方式：
-- `reset-system`命令 重置system
-- `multi-line`命令 切换到多行模式，在输入中支持换行输入，以`#`代表输入结束
-- `single-line`命令 切换到单行模式
-- `clear`命令 清除history
-- `exit`命令 退出
-- 如果query中带有多模态数据，添加`<image>/<video>/<audio>`等标签，例如输入`<image>What is in the image?`，即可在接下来输入图片地址
 
-## 推理加速后端
+
+## 部署
+
+
+
+
+##
 
 可以使用`swift infer/deploy`执行推理和部署。目前SWIFT支持pt（原生torch）、vLLM、LMDeploy三种推理框架，分别可以用`--infer_backend pt/vllm/lmdeploy`进行切换。
 除pt外，vllm和lmdeploy分别有自己的模型支持范围，请查看各自官方文档来确定是否可用，以防出现运行错误。
diff --git a/docs/source/Instruction/预训练与微调.md b/docs/source/Instruction/预训练与微调.md
@@ -225,6 +225,18 @@ print(f'response0: {resp_list[0].choices[0].message.content}')
 print(f'response1: {resp_list[1].choices[0].message.content}')
 ```
 
+如果使用ms-swift训练的模型，可以通过以下方式获取训练的配置：
+```python
+from swift.llm import safe_snapshot_download, BaseArguments
+
+lora_adapters = safe_snapshot_download('swift/test_lora')
+args = BaseArguments.from_pretrained(lora_adapters)
+print(f'args.model: {args.model}')
+print(f'args.model_type: {args.model_type}')
+print(f'args.template_type: {args.template}')
+print(f'args.default_system: {args.system}')
+```
+
 - 对全参数训练的checkpoint进行推理，样式同[大模型推理示例](https://github.com/modelscope/ms-swift/blob/main/examples/infer/demo.py)，修改`model`即可。
 - 使用流式推理以及`VllmEngine`、`LmdeployEngine`进行推理加速，可以参考[大模型](https://github.com/modelscope/ms-swift/blob/main/examples/infer/demo.py)和[多模态大模型](https://github.com/modelscope/ms-swift/blob/main/examples/infer/demo_mllm.py)推理示例。
 - 微调后的模型使用huggingface transformers/peft生态推理，可以参考[这里](https://github.com/modelscope/ms-swift/blob/main/examples/infer/demo_hf.py)。

diff --git a/docs/source/index.rst b/docs/source/index.rst
@@ -23,14 +23,13 @@ Swift DOCUMENTATION
    Instruction/推理和部署.md
    Instruction/采样.md
    Instruction/评测.md
-   Instruction/导出.md
+   Instruction/导出与推送.md
    Instruction/强化微调.md
    Instruction/GRPO.md
    Instruction/支持的模型和数据集.md
    Instruction/使用tuners.md
    Instruction/智能体的支持.md
    Instruction/NPU支持.md
-   Instruction/推送模型.md
    Instruction/ReleaseNote3.0.md
    Instruction/常见问题整理.md
 

diff --git a/docs/source_en/Instruction/Export.md → .../source_en/Instruction/Export-and-push.md b/docs/source_en/Instruction/Export.md → .../source_en/Instruction/Export-and-push.md
diff --git a/docs/source_en/index.rst b/docs/source_en/index.rst
@@ -23,14 +23,13 @@ Swift DOCUMENTATION
    Instruction/Inference-and-deployment.md
    Instruction/Sample.md
    Instruction/Evaluation.md
-   Instruction/Export.md
+   Instruction/Export-and-push.md
    Instruction/Reinforced-Fine-tuning.md
    Instruction/GRPO.md
    Instruction/Supported-models-and-datasets.md
    Instruction/Use-tuners.md
    Instruction/Agent-support.md
    Instruction/NPU-support.md
-   Instruction/Push-model.md
    Instruction/ReleaseNote3.0
    Instruction/Frequently-asked-questions.md