DeepSeek-R1-Distill-Qwen-7B GRPO fine-tuning fails #3114
Comments
num_generations is too small.
Hi, what problems does setting it too small cause, and what should it normally be set to?
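The intuition behind the maintainer's answer: GRPO scores each completion relative to the other completions sampled for the same prompt. A minimal sketch of that group-relative advantage (an illustration of the idea, not ms-swift's actual implementation):

```python
def group_advantages(rewards):
    """rewards: scalar rewards for the num_generations completions of one prompt.
    Returns each completion's reward normalized against the group."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = (sum((r - mean) ** 2 for r in rewards) / n) ** 0.5
    # With num_generations == 1, every reward equals the group mean,
    # so every advantage is 0 and the policy receives no learning signal.
    return [(r - mean) / (std + 1e-8) for r in rewards]

print(group_advantages([1.0]))       # → [0.0]  (no signal)
print(group_advantages([1.0, 0.0]))  # two generations give a nonzero contrast
```

This is why `--num_generations 1` (as in the command below) trains nothing: the group baseline cancels the reward exactly.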
Besides the method above, when I run vLLM-accelerated inference on two GPUs, I get an error saying the tensors are not all on the same device. Could you take another look? swift rlhf
Try upgrading vllm.
Below are my versions: absl-py 2.1.0
Hi, after making the change you suggested it now works. Below is a two-GPU example.
Describe the bug
As the title says.
Training command:
```shell
swift rlhf \
    --rlhf_type grpo \
    --model ./DeepSeek-R1-Distill-Qwen-7B \
    --reward_funcs accuracy format \
    --use_vllm false \
    --train_type full \
    --torch_dtype bfloat16 \
    --dataset 'AI-MO/NuminaMath-TIR#5000' \
    --max_completion_length 2048 \
    --num_train_epochs 1 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --learning_rate 1e-6 \
    --gradient_accumulation_steps 2 \
    --eval_steps 20 \
    --save_steps 20 \
    --save_total_limit 2 \
    --logging_steps 5 \
    --max_length 4096 \
    --output_dir ./output \
    --warmup_ratio 0.05 \
    --dataloader_num_workers 2 \
    --dataset_num_proc 2 \
    --num_generations 1 \
    --temperature 0 \
    --max_steps 5 \
    --system '/home/ubuntu/llama_factory_ft/prompt.txt'
```
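Beyond the zero-signal problem, GRPO-style trainers also impose a consistency constraint between the batch configuration and `num_generations` (TRL's `GRPOTrainer`, which ms-swift builds on, rejects incompatible values). A hedged sketch of that check; function and message names here are illustrative, not ms-swift internals:

```python
def check_grpo_batch(per_device_batch, num_processes, num_generations):
    """Validate a GRPO batch configuration (illustrative sketch)."""
    global_batch = per_device_batch * num_processes
    if num_generations < 2:
        # A single sample per prompt leaves no group to compare against.
        raise ValueError("num_generations must be >= 2 for a group-relative baseline")
    if global_batch % num_generations != 0:
        raise ValueError(
            f"global batch {global_batch} must be divisible by "
            f"num_generations {num_generations}"
        )
    return global_batch // num_generations  # distinct prompts per step

# The command above (2 GPUs, per-device batch 1, num_generations 1)
# fails this check; num_generations=2 would pass with one prompt per step.
```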
Training process:
Inference command:
```shell
CUDA_VISIBLE_DEVICES=0 \
swift infer \
    --stream False \
    --merge_lora False \
    --max_model_len 8192 \
    --temperature 1 \
    --max_new_tokens 2048 \
    --seed 42 \
    --model ./output/v2-20250214-145826/checkpoint-5
```
Output:
Size before fine-tuning: 15 GB
Size after fine-tuning (output/v2-20250214-145826/checkpoint-5): 43 GB
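The size jump is expected for `--train_type full`: a training checkpoint stores optimizer state alongside the bf16 weights. A back-of-the-envelope sketch, assuming roughly 7.6e9 parameters for this model (an approximation; the exact count and the checkpoint layout depend on the model card and on how the trainer shards optimizer state):

```python
def size_gb(n_params, bytes_per_param):
    """Rough tensor size in GB (decimal gigabytes)."""
    return n_params * bytes_per_param / 1e9

N = 7.6e9  # approximate parameter count, for illustration only
print(f"bf16 weights:      {size_gb(N, 2):.1f} GB")  # ~15 GB, matches the base model
print(f"fp32 Adam moments: {size_gb(N, 8):.1f} GB")  # exp_avg + exp_avg_sq, 8 B/param
```

So weights alone account for the 15 GB, and saved optimizer state can add several times that; the observed 43 GB checkpoint is consistent with optimizer state being written out (possibly sharded across the two ranks), not with a bug in the model itself.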
Your hardware and system info
GPU: 2 × H20
CUDA: 12.4
NVIDIA driver: NVIDIA-SMI 550.144.03, Driver Version 550.144.03, CUDA Version 12.4
Thanks.