DeepSeek-R1-Distill-Qwen-7B GRPO fine-tuning failed #3114

Open
yyyiron opened this issue Feb 14, 2025 · 6 comments

@yyyiron

yyyiron commented Feb 14, 2025

Describe the bug
As stated in the title, GRPO fine-tuning fails.
Training command:

swift rlhf \
--rlhf_type grpo \
--model ./DeepSeek-R1-Distill-Qwen-7B \
--reward_funcs accuracy format \
--use_vllm false \
--train_type full \
--torch_dtype bfloat16 \
--dataset 'AI-MO/NuminaMath-TIR#5000' \
--max_completion_length 2048 \
--num_train_epochs 1 \
--per_device_train_batch_size 1 \
--per_device_eval_batch_size 1 \
--learning_rate 1e-6 \
--gradient_accumulation_steps 2 \
--eval_steps 20 \
--save_steps 20 \
--save_total_limit 2 \
--logging_steps 5 \
--max_length 4096 \
--output_dir ./output \
--warmup_ratio 0.05 \
--dataloader_num_workers 2 \
--dataset_num_proc 2 \
--num_generations 1 \
--temperature 0 \
--max_steps 5 \
--system '/home/ubuntu/llama_factory_ft/prompt.txt'
Training log:
[screenshot of the training log]

Inference command:
CUDA_VISIBLE_DEVICES=0 \
swift infer \
--stream False \
--merge_lora False \
--max_model_len 8192 \
--temperature 1 \
--max_new_tokens 2048 \
--seed 42 \
--model ./output/v2-20250214-145826/checkpoint-5

Inference output:

[screenshot of the inference output]

Model size before fine-tuning: 15 GB
Size after fine-tuning (output/v2-20250214-145826/checkpoint-5): 43 GB

[screenshot of the checkpoint directory]
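To see what accounts for the extra size, one option is to list the files inside the checkpoint directory; for a full-parameter run the usual suspects are optimizer states or higher-precision weight copies saved alongside the bf16 weights. A generic sketch (not part of swift; the path is just the checkpoint from the run above):

```python
# Generic sketch: print per-file sizes in the checkpoint directory to see
# which files (weights, optimizer states, etc.) account for the 43 GB.
import os

ckpt = "./output/v2-20250214-145826/checkpoint-5"  # checkpoint path from the run above
for name in sorted(os.listdir(ckpt)):
    path = os.path.join(ckpt, name)
    if os.path.isfile(path):
        print(f"{os.path.getsize(path) / 1e9:8.2f} GB  {name}")
```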

Your hardware and system info
GPU: 2 × H20
CUDA: 12.4
NVIDIA driver: NVIDIA-SMI 550.144.03, Driver Version 550.144.03, CUDA Version 12.4

Thanks in advance.

@Jintao-Huang
Collaborator

num_generations is too small.

@yyyiron
Author

yyyiron commented Feb 14, 2025

Hi, could you explain what problem a value that is too small causes, and what it should normally be set to?
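For context: GRPO estimates each completion's advantage relative to the other completions sampled for the same prompt, so with num_generations set to 1 the group baseline is just the single reward itself and the advantage degenerates to zero, leaving no learning signal. A minimal sketch of that group-relative normalization (an illustration only, not ms-swift's actual implementation):

```python
# Illustrative sketch of GRPO's group-relative advantage (not ms-swift's code).
# Each prompt is sampled num_generations times; every completion's reward is
# normalized against the other completions in its group.
import torch

def group_advantages(rewards: torch.Tensor, num_generations: int, eps: float = 1e-4) -> torch.Tensor:
    grouped = rewards.view(-1, num_generations)            # (num_prompts, num_generations)
    mean = grouped.mean(dim=1, keepdim=True)
    std = grouped.std(dim=1, correction=0, keepdim=True)
    return ((grouped - mean) / (std + eps)).view(-1)

# With num_generations=1 every reward equals its own group mean, so all
# advantages are zero and the policy receives no gradient signal.
print(group_advantages(torch.tensor([0.7]), num_generations=1))                 # tensor([0.])
print(group_advantages(torch.tensor([1.0, 0.0, 0.5, 0.5]), num_generations=4))  # non-zero advantages
```

Typical GRPO setups use several generations per prompt (for example 4 to 16) so the group statistics give a meaningful baseline.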

@yyyiron
Author

yyyiron commented Feb 14, 2025

Besides the issue above: when I now run with two GPUs and vLLM-accelerated generation, the error shows that tensors are not placed on the same device. Could you please take a look at this as well?
CUDA_VISIBLE_DEVICES=0,1 \
NPROC_PER_NODE=1 \
swift rlhf \
--rlhf_type grpo \
--model /deepseek-model/qwen7b4bit/hub/deepseek-ai/DeepSeek-R1-Distill-Qwen-1___5B \
--reward_funcs accuracy format \
--use_vllm true \
--vllm_gpu_memory_utilization 0.9 \
--vllm_max_model_len 8192 \
--vllm_device auto \
--train_type lora \
--torch_dtype bfloat16 \
--dataset 'AI-MO/NuminaMath-TIR#5000' \
--max_completion_length 2048 \
--num_train_epochs 1 \
--per_device_train_batch_size 1 \
--per_device_eval_batch_size 1 \
--learning_rate 1e-6 \
--gradient_accumulation_steps 2 \
--eval_steps 20 \
--save_steps 20 \
--save_total_limit 2 \
--logging_steps 5 \
--max_length 4096 \
--output_dir /deepseek-model/output \
--warmup_ratio 0.05 \
--dataloader_num_workers 2 \
--dataset_num_proc 2 \
--num_generations 1 \
--temperature 0 \
--max_steps 5 \
--system '/home/ubuntu/llama_factory_ft/prompt.txt' \
--deepspeed zero2

[screenshot of the device-placement error]

num_generations is too small

@Jintao-Huang
Collaborator

Try upgrading vllm.

@yyyiron
Author

yyyiron commented Feb 14, 2025

Try upgrading vllm.

Here are my installed package versions:


absl-py 2.1.0
accelerate 1.3.0
addict 2.4.0
aiofiles 23.2.1
aiohappyeyeballs 2.4.6
aiohttp 3.11.12
aiohttp-cors 0.7.0
aiosignal 1.3.2
airportsdata 20241001
aliyun-python-sdk-core 2.16.0
aliyun-python-sdk-kms 2.16.5
annotated-types 0.7.0
antlr4-python3-runtime 4.13.2
anyio 4.8.0
astor 0.8.1
attrdict 2.0.1
attrs 25.1.0
binpacking 1.5.2
blake3 1.0.4
cachetools 5.5.1
certifi 2025.1.31
cffi 1.17.1
charset-normalizer 3.4.1
click 8.1.8
cloudpickle 3.1.1
colorful 0.5.6
compressed-tensors 0.8.1
contourpy 1.3.1
cpm-kernels 1.0.11
crcmod 1.7
cryptography 44.0.1
cycler 0.12.1
dacite 1.9.2
datasets 3.2.0
deepspeed 0.16.3
depyf 0.18.0
dill 0.3.8
diskcache 5.6.3
distlib 0.3.9
distro 1.9.0
einops 0.8.1
et_xmlfile 2.0.0
fastapi 0.115.8
ffmpy 0.5.0
filelock 3.17.0
fonttools 4.56.0
frozenlist 1.5.0
fsspec 2024.9.0
future 1.0.0
gguf 0.10.0
google-api-core 2.24.1
google-auth 2.38.0
googleapis-common-protos 1.67.0
gradio 5.16.0
gradio_client 1.7.0
grpcio 1.70.0
h11 0.14.0
hjson 3.1.0
httpcore 1.0.7
httptools 0.6.4
httpx 0.28.1
huggingface-hub 0.28.1
idna 3.10
importlib_metadata 8.6.1
iniconfig 2.0.0
interegular 0.3.3
jieba 0.42.1
Jinja2 3.1.5
jiter 0.8.2
jmespath 0.10.0
joblib 1.4.2
jsonschema 4.23.0
jsonschema-specifications 2024.10.1
kiwisolver 1.4.8
lark 1.2.2
latex2sympy2_extended 1.0.6
lm-format-enforcer 0.10.9
Markdown 3.7
markdown-it-py 3.0.0
MarkupSafe 2.1.5
math-verify 0.5.2
matplotlib 3.10.0
mdurl 0.1.2
mistral_common 1.5.3
modelscope 1.22.3
mpmath 1.3.0
ms-swift 3.2.0.dev0 /home/ubuntu/llama_factory_ft/ms-swift
msgpack 1.1.0
msgspec 0.19.0
multidict 6.1.0
multiprocess 0.70.16
nest-asyncio 1.6.0
networkx 3.4.2
ninja 1.11.1.3
nltk 3.9.1
numpy 1.26.4
nvidia-cublas-cu12 12.4.5.8
nvidia-cuda-cupti-cu12 12.4.127
nvidia-cuda-nvrtc-cu12 12.4.127
nvidia-cuda-runtime-cu12 12.4.127
nvidia-cudnn-cu12 9.1.0.70
nvidia-cufft-cu12 11.2.1.3
nvidia-curand-cu12 10.3.5.147
nvidia-cusolver-cu12 11.6.1.9
nvidia-cusparse-cu12 12.3.1.170
nvidia-cusparselt-cu12 0.6.2
nvidia-ml-py 12.570.86
nvidia-nccl-cu12 2.21.5
nvidia-nvjitlink-cu12 12.4.127
nvidia-nvtx-cu12 12.4.127
openai 1.62.0
opencensus 0.11.4
opencensus-context 0.1.3
opencv-python-headless 4.11.0.86
openpyxl 3.1.5
orjson 3.10.15
oss2 2.19.1
outlines 0.1.11
outlines_core 0.1.26
packaging 24.2
pandas 2.2.3
partial-json-parser 0.2.1.1.post5
peft 0.14.0
pillow 11.1.0
pip 25.0
platformdirs 4.3.6
pluggy 1.5.0
prometheus_client 0.21.1
prometheus-fastapi-instrumentator 7.0.2
propcache 0.2.1
proto-plus 1.26.0
protobuf 5.29.3
psutil 6.1.1
py-cpuinfo 9.0.0
py-spy 0.4.0
pyarrow 19.0.0
pyasn1 0.6.1
pyasn1_modules 0.4.1
pybind11 2.13.6
pycountry 24.6.1
pycparser 2.22
pycryptodome 3.21.0
pydantic 2.10.6
pydantic_core 2.27.2
pydub 0.25.1
Pygments 2.19.1
pyparsing 3.2.1
pytest 8.3.4
python-dateutil 2.9.0.post0
python-dotenv 1.0.1
python-multipart 0.0.20
pytz 2025.1
PyYAML 6.0.2
pyzmq 26.2.1
ray 2.42.1
referencing 0.36.2
regex 2024.11.6
requests 2.32.3
rich 13.9.4
rouge 1.0.1
rpds-py 0.22.3
rsa 4.9
ruff 0.9.6
safehttpx 0.1.6
safetensors 0.5.2
scipy 1.15.1
semantic-version 2.10.0
sentencepiece 0.2.0
setuptools 69.5.1
shellingham 1.5.4
simplejson 3.19.3
six 1.17.0
smart-open 7.1.0
sniffio 1.3.1
sortedcontainers 2.4.0
starlette 0.45.3
sympy 1.13.1
tensorboard 2.19.0
tensorboard-data-server 0.7.2
tiktoken 0.8.0
tokenizers 0.21.0
tomlkit 0.13.2
torch 2.5.1
torchaudio 2.5.1
torchvision 0.20.1
tqdm 4.67.1
transformers 4.48.3
transformers-stream-generator 0.0.5
triton 3.1.0
trl 0.15.0.dev0
typer 0.15.1
typing_extensions 4.12.2
tzdata 2025.1
urllib3 2.3.0
uvicorn 0.34.0
uvloop 0.21.0
virtualenv 20.29.2
vllm 0.6.5
watchfiles 1.0.4
websockets 14.2
Werkzeug 3.1.3
wheel 0.45.1
wrapt 1.17.2
xformers 0.0.28.post3
xgrammar 0.1.11
xxhash 3.5.0
yarl 1.18.3
zipp 3.21.0
zstandard 0.23.0

@yyyiron
Author

yyyiron commented Feb 14, 2025

num_generations is too small

Hi, after changing it as you suggested, training now succeeds. Below is the dual-GPU example:
swift rlhf \
--rlhf_type grpo \
--model ./DeepSeek-R1-Distill-Qwen-1___5B \
--reward_funcs accuracy format \
--train_type full \
--lora_rank 8 \
--lora_alpha 32 \
--target_modules all-linear \
--torch_dtype bfloat16 \
--dataset 'AI-MO/NuminaMath-TIR#5000' \
--max_completion_length 1024 \
--num_train_epochs 1 \
--per_device_train_batch_size 4 \
--per_device_eval_batch_size 4 \
--learning_rate 1e-5 \
--gradient_accumulation_steps 1 \
--eval_steps 100 \
--save_steps 100 \
--save_total_limit 2 \
--logging_steps 5 \
--max_length 2048 \
--output_dir ./single_output \
--warmup_ratio 0.05 \
--dataloader_num_workers 4 \
--dataset_num_proc 4 \
--num_generations 4 \
--temperature 0.9 \
--system '/home/ubuntu/llama_factory_ft/prompt.txt'
Thanks for your help.
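One extra note, based on the behaviour of the underlying TRL GRPOTrainer rather than anything stated above (so treat it as an assumption to verify): the global batch size, i.e. number of processes × per_device_train_batch_size, should be divisible by num_generations, which the dual-GPU settings above satisfy (2 × 4 = 8, divisible by 4). A small sanity-check sketch:

```python
# Hypothetical sanity check (assumption: TRL's GRPOTrainer requires the global
# train batch size to be evenly divisible by num_generations).
def check_grpo_batch(num_processes: int, per_device_train_batch_size: int,
                     num_generations: int) -> None:
    global_batch = num_processes * per_device_train_batch_size
    if global_batch % num_generations != 0:
        raise ValueError(
            f"global batch size {global_batch} is not divisible by "
            f"num_generations={num_generations}")
    print(f"OK: {global_batch} completions per step over "
          f"{global_batch // num_generations} unique prompt(s)")

check_grpo_batch(num_processes=2, per_device_train_batch_size=4, num_generations=4)
```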
