DeepSeek-R1-Distill-Qwen-7B GRPO fine-tuning fails #3114
Comments
num_generations is too small.
Hi, what problems does setting it too small cause, and what should it normally be set to?
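The intuition behind the maintainer's answer: GRPO scores each completion relative to the other completions sampled for the same prompt. A minimal sketch of that group-relative advantage (an illustration of the idea, not ms-swift's actual implementation):

```python
def group_advantages(rewards):
    """rewards: scalar rewards for the num_generations completions of one prompt.
    Returns each completion's reward normalized against the group."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = (sum((r - mean) ** 2 for r in rewards) / n) ** 0.5
    # With num_generations == 1, every reward equals the group mean,
    # so every advantage is 0 and the policy receives no learning signal.
    return [(r - mean) / (std + 1e-8) for r in rewards]

print(group_advantages([1.0]))       # → [0.0]  (no signal)
print(group_advantages([1.0, 0.0]))  # two generations give a nonzero contrast
```

This is why `--num_generations 1` (as in the command below) trains nothing: the group baseline cancels the reward exactly.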
Besides the method above, when I run vLLM-accelerated inference on two GPUs, I get an error saying the tensors are not all on the same device. Could you take another look? swift rlhf
Try upgrading vllm.
Below are my versions: absl-py 2.1.0
Hi, after making the change you suggested it now works. Below is a two-GPU example.
Describe the bug
As the title says.
Training command:
```shell
swift rlhf \
    --rlhf_type grpo \
    --model ./DeepSeek-R1-Distill-Qwen-7B \
    --reward_funcs accuracy format \
    --use_vllm false \
    --train_type full \
    --torch_dtype bfloat16 \
    --dataset 'AI-MO/NuminaMath-TIR#5000' \
    --max_completion_length 2048 \
    --num_train_epochs 1 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --learning_rate 1e-6 \
    --gradient_accumulation_steps 2 \
    --eval_steps 20 \
    --save_steps 20 \
    --save_total_limit 2 \
    --logging_steps 5 \
    --max_length 4096 \
    --output_dir ./output \
    --warmup_ratio 0.05 \
    --dataloader_num_workers 2 \
    --dataset_num_proc 2 \
    --num_generations 1 \
    --temperature 0 \
    --max_steps 5 \
    --system '/home/ubuntu/llama_factory_ft/prompt.txt'
```
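Beyond the zero-signal problem, GRPO-style trainers also impose a consistency constraint between the batch configuration and `num_generations` (TRL's `GRPOTrainer`, which ms-swift builds on, rejects incompatible values). A hedged sketch of that check; function and message names here are illustrative, not ms-swift internals:

```python
def check_grpo_batch(per_device_batch, num_processes, num_generations):
    """Validate a GRPO batch configuration (illustrative sketch)."""
    global_batch = per_device_batch * num_processes
    if num_generations < 2:
        # A single sample per prompt leaves no group to compare against.
        raise ValueError("num_generations must be >= 2 for a group-relative baseline")
    if global_batch % num_generations != 0:
        raise ValueError(
            f"global batch {global_batch} must be divisible by "
            f"num_generations {num_generations}"
        )
    return global_batch // num_generations  # distinct prompts per step

# The command above (2 GPUs, per-device batch 1, num_generations 1)
# fails this check; num_generations=2 would pass with one prompt per step.
```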
Training process:
Inference command:
```shell
CUDA_VISIBLE_DEVICES=0 \
swift infer \
    --stream False \
    --merge_lora False \
    --max_model_len 8192 \
    --temperature 1 \
    --max_new_tokens 2048 \
    --seed 42 \
    --model ./output/v2-20250214-145826/checkpoint-5
```
Output:
Size before fine-tuning: 15 GB
Size after fine-tuning (output/v2-20250214-145826/checkpoint-5): 43 GB
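The size jump is expected for `--train_type full`: a training checkpoint stores optimizer state alongside the bf16 weights. A back-of-the-envelope sketch, assuming roughly 7.6e9 parameters for this model (an approximation; the exact count and the checkpoint layout depend on the model card and on how the trainer shards optimizer state):

```python
def size_gb(n_params, bytes_per_param):
    """Rough tensor size in GB (decimal gigabytes)."""
    return n_params * bytes_per_param / 1e9

N = 7.6e9  # approximate parameter count, for illustration only
print(f"bf16 weights:      {size_gb(N, 2):.1f} GB")  # ~15 GB, matches the base model
print(f"fp32 Adam moments: {size_gb(N, 8):.1f} GB")  # exp_avg + exp_avg_sq, 8 B/param
```

So weights alone account for the 15 GB, and saved optimizer state can add several times that; the observed 43 GB checkpoint is consistent with optimizer state being written out (possibly sharded across the two ranks), not with a bug in the model itself.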
Your hardware and system info
GPU: 2 × H20
CUDA: 12.4
NVIDIA driver: NVIDIA-SMI 550.144.03, Driver Version 550.144.03, CUDA Version 12.4
Thanks.