[Feature] Support Distributed LogProb for GRPO Training #6247

duanjunwen · 2025-03-14T10:42:21Z

📌 Checklist before creating the PR

I have created an issue for this PR for traceability
The title follows the standard format: [doc/gemini/tensor/...]: A concise description
I have added relevant tags if possible for us to better distinguish different PRs
I have installed pre-commit: pip install pre-commit && pre-commit install

🚨 Issue number

Link this PR to your issue with keywords such as "fixed" to automatically close the linked issue upon merge

e.g. fixed #1234, closed #1234, resolved #1234

📝 What does this PR do?

Summarize your work here.
If you have any plots, diagrams, screenshots, or tables, please attach them here.

💥 Checklist before requesting a review

I have linked my PR to an issue (instruction)
My issue clearly describes the problem/feature/proposal, with diagrams/charts/table/code if possible
I have performed a self-review of my code
I have added thorough tests.
I have added docstrings for all the functions/methods I implemented

⭐️ Do you enjoy contributing to Colossal-AI?

🌝 Yes, I do.
🌚 No, I don't.

Tell us more if you don't enjoy contributing to Colossal-AI.

TongLi3701

Thanks Junwen, I left some comments.

applications/ColossalChat/coati/distributed/consumer.py

applications/ColossalChat/coati/distributed/grpo_consumer.py

applications/ColossalChat/coati/distributed/utils.py

colossalai/shardformer/layer/loss.py

colossalai/shardformer/policies/qwen2.py

tests/test_shardformer/test_layer/test_dist_log_prob.py

TongLi3701 · 2025-03-17T09:58:22Z

Please also compare the peak memory when using these two different methods.

duanjunwen · 2025-03-18T01:27:33Z


We compared peak memory with parallel output set to True and to False under the tp2zero1 strategy (tensor parallel size 2 with ZeRO stage 1).
With parallel output False, peak memory exceeds 93,000 MB, which may cause OOM when other jobs are also using the GPUs.
With parallel output True, peak memory is around 82,000 MB.
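The saving with parallel output True is consistent with computing log-probs directly from vocab-sharded logits instead of first gathering the full-vocabulary logits onto every rank. Below is a minimal single-process sketch of the general vocab-parallel log-softmax technique, not the PR's DistLogProb code. The helper names (shard_vocab, vocab_parallel_log_prob, reference_log_prob) are hypothetical, and the all-reduce steps are simulated with plain Python max/sum over the shards rather than real torch.distributed collectives.

```python
import math
from typing import List


def shard_vocab(logits: List[float], tp_size: int) -> List[List[float]]:
    """Hypothetical helper: split one token's full-vocab logits into contiguous
    per-rank shards, as a vocab-parallel LM head would produce them."""
    vocab = len(logits)
    assert vocab % tp_size == 0, "vocab size must be divisible by tp_size"
    part = vocab // tp_size
    return [logits[r * part:(r + 1) * part] for r in range(tp_size)]


def vocab_parallel_log_prob(shards: List[List[float]], target: int) -> float:
    """Hypothetical helper: log p(target) computed without ever concatenating shards.

    In a real TP group each loop over `shards` would be one local computation
    followed by an all-reduce; here the reductions are simulated in-process.
    """
    part = len(shards[0])
    # Step 1: local max per rank, then all-reduce(MAX) -> global max (for stability).
    global_max = max(max(s) for s in shards)
    # Step 2: local sum of exp(x - global_max), then all-reduce(SUM) -> partition sum.
    global_sum = sum(sum(math.exp(x - global_max) for x in s) for s in shards)
    # Step 3: only the rank owning the target id contributes its logit; the others
    # contribute 0, so an all-reduce(SUM) recovers the target logit on every rank.
    target_logit = 0.0
    for rank, s in enumerate(shards):
        start = rank * part
        if start <= target < start + part:
            target_logit += s[target - start]
    return target_logit - global_max - math.log(global_sum)


def reference_log_prob(logits: List[float], target: int) -> float:
    """Non-distributed baseline: log-softmax over the gathered full-vocab logits."""
    m = max(logits)
    return logits[target] - m - math.log(sum(math.exp(x - m) for x in logits))


if __name__ == "__main__":
    logits = [2.0, -1.0, 0.5, 3.0, 1.5, -0.5, 0.0, 4.0]
    shards = shard_vocab(logits, tp_size=2)
    print(vocab_parallel_log_prob(shards, target=3), reference_log_prob(logits, 3))
```

Each rank only ever holds V / tp_size logits per token, and the collectives exchange scalars per token, which is why this path avoids the full-vocabulary activation that the gather-based path materializes on every rank.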

TongLi3701

Thanks Junwen, I left some comments. Please address them, click "Resolve conversation", and then request review again.

Thanks.

applications/ColossalChat/coati/distributed/utils.py


duanjunwen · 2025-03-18T03:56:17Z

Resolved conflicts.

applications/ColossalChat/coati/distributed/consumer.py

colossalai/shardformer/layer/loss.py

duanjunwen and others added 4 commits March 13, 2025 13:24

[fix] fix qwen VocabParallelLMHead1D and gather output

03ce3c5

fix tp bug

b835d1b

fix consumer

137ec17

[feat] Support Distributed LogProb for GRPO Training

ce8a8b3

duanjunwen requested a review from a team as a code owner March 14, 2025 10:42

duanjunwen requested a review from TongLi3701 March 17, 2025 01:25

duanjunwen added 5 commits March 17, 2025 10:57

Merge branch 'hpcaitech:grpo-latest' into grpo-latest

7b3c310

[fix] fix loss func

a810b20

[fix] fix log prob plugin

c247bd8

[fix] fix qwen modeling param

b78ab3a

[fix] rm comments

dddd062

TongLi3701 requested changes Mar 17, 2025


duanjunwen added 2 commits March 17, 2025 18:09

[fix] rm hard-code;fix non-dist version

74de49d

[fix] fix test file param name and benchmark tp gather output=True/False

188d69d

duanjunwen requested a review from TongLi3701 March 17, 2025 10:39

duanjunwen added 2 commits March 18, 2025 09:34

[fix] rm non-dist version in dist log prob

01bcaca

[fix] fix comments

0277592

TongLi3701 reviewed Mar 18, 2025


applications/ColossalChat/coati/distributed/utils.py (outdated, resolved)

duanjunwen added 5 commits March 18, 2025 11:34

[fix] fix dis log prob plugin

3a8a387

[fix] fix test case

d29f39d

[fix] fix qwen VocabParallelLMHead1D and gather output

dcf3f9b

Merge branch 'grpo-latest' of github.com:duanjunwen/ColossalAI into grpo-latest

d90bf57

Merge branch 'grpo-latest' into grpo-dist-loss

8615b24

duanjunwen requested a review from TongLi3701 March 18, 2025 03:55

TongLi3701 reviewed Mar 18, 2025


applications/ColossalChat/coati/distributed/consumer.py (outdated, resolved)

colossalai/shardformer/layer/loss.py (outdated, resolved)

duanjunwen added 2 commits March 18, 2025 16:16

[fix] fix DistLogProb comments

0ebeebc

[fix] restore tp size

1a7cc25

duanjunwen requested a review from TongLi3701 March 18, 2025 08:22

[fix] fix comments

7e2f058

TongLi3701 approved these changes Mar 18, 2025


colossalai/shardformer/layer/loss.py (outdated, resolved)

colossalai/shardformer/layer/loss.py (outdated, resolved)

[fix] fix comment; fix LogSoftmax usage

f381cea

duanjunwen requested a review from TongLi3701 March 18, 2025 09:32

TongLi3701 approved these changes Mar 18, 2025


TongLi3701 merged commit 7795d4c into hpcaitech:grpo-latest Mar 18, 2025
3 checks passed
