add DeepSpeed tensor parallel initialization. #36114

inkcherry · 2025-02-10T11:20:24Z

This PR adds initialization for DeepSpeed tensor parallelism combined with the ZeRO optimizer. The initialization needs to run before the optimizer initializes the model weights. The related APIs are already supported in DeepSpeed: deepspeedai/DeepSpeed#6922. @muellerzr

cc DeepSpeed team: @tjruwase
Moved to #36825.
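
To illustrate why the ordering described above matters, here is a toy sketch in plain Python. It does not use the DeepSpeed or transformers APIs: `shard` and `ToyOptimizer` are hypothetical stand-ins, and the idea that an optimizer sizes its state from the parameters it sees at construction time is an assumption made for illustration, not something stated in this thread.

```python
# Toy illustration only: plain Python stand-ins, not the DeepSpeed or transformers API.

def shard(weights, rank, world_size):
    """Return this rank's contiguous slice of a weight list (hypothetical helper)."""
    per_rank = len(weights) // world_size
    return weights[rank * per_rank:(rank + 1) * per_rank]


class ToyOptimizer:
    """Allocates one state slot per parameter it is given at construction time."""

    def __init__(self, params):
        self.state = [0.0 for _ in params]


full_weights = [0.1 * i for i in range(8)]

# Intended order: shard the weights first, then build the optimizer over the shard.
local_weights = shard(full_weights, rank=0, world_size=2)
opt_after_sharding = ToyOptimizer(local_weights)

# Reversed order: the optimizer is sized for the full, unsharded weights.
opt_before_sharding = ToyOptimizer(full_weights)

print(len(opt_after_sharding.state), len(opt_before_sharding.state))
```

In this sketch, the optimizer built after sharding tracks only the local slice, while the one built first still holds state for every weight.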

muellerzr

Thanks, cc @ArthurZucker

HuggingFaceDocBuilderDev · 2025-02-28T14:55:33Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

inkcherry · 2025-03-01T04:56:07Z

Thanks for the review, @muellerzr!
The accelerate PR below includes modifications related to the dataloader and saving, and works together with this feature.
huggingface/accelerate#3390

inkcherry · 2025-03-14T01:50:23Z

Thanks for the review, @muellerzr! The related PR has already been merged: huggingface/accelerate#3390. Is this PR ready to merge? Thanks!

inkcherry · 2025-03-19T07:49:34Z

Sorry to bother you again, and I hope this doesn't disturb you : ) The doc deepspeedai/DeepSpeed#7151 is ready but is pending on this PR. I would appreciate it if this PR could be merged. Thank you very much!
FYI @muellerzr @ArthurZucker @shethaadit @tjruwase

abhilash1910

LGTM ! Is the CI stalling @muellerzr ?

inkcherry · 2025-03-19T11:13:29Z

> LGTM ! Is the CI stalling @muellerzr ?

Not quite sure. It has been hanging on "Checking for the ability to merge automatically..." for several hours.

It's strange. I'm not sure whether it's due to the CI or a GitHub service issue (https://www.githubstatus.com/incidents/lg4s05t6ttxb). Force-pushing to refresh also leaves it hanging ):

Moved to #36825 to work around this hang.

github-actions · 2025-03-19T14:23:05Z

Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. When it is ready for review, please click the Ready for review button (at the bottom of the PR page).

ArthurZucker

Happy to merge; let's make sure the CI works.

muellerzr · 2025-03-20T14:21:02Z

@inkcherry I think we can close this one in favor of #36825 yes?

inkcherry · 2025-03-20T14:30:33Z

> @inkcherry I think we can close this one in favor of #36825 yes?

Yes, it is still hanging now. Let me close this one.

inkcherry added 3 commits February 10, 2025 11:15

add ds tp change

63c6e9a

Merge remote-tracking branch 'origin/main' into my_tp_upstream4

da29d9c

Merge branch 'main' into ds_tp_upstream2

93591f4

inkcherry mentioned this pull request Feb 24, 2025

Autotp training deepspeedai/DeepSpeed#6922

Closed

muellerzr approved these changes Feb 28, 2025


muellerzr requested a review from ArthurZucker February 28, 2025 14:29

shethaadit approved these changes Mar 3, 2025


Merge branch 'main' into ds_tp_upstream2

24f1be4

inkcherry force-pushed the ds_tp_upstream2 branch 2 times, most recently from d3c2d1c to 24f1be4 Compare March 19, 2025 09:43

Merge remote-tracking branch 'origin/main' into ds_tp_upstream2

1b70a70

abhilash1910 approved these changes Mar 19, 2025


inkcherry closed this Mar 19, 2025

inkcherry reopened this Mar 19, 2025

github-actions bot marked this pull request as draft March 19, 2025 14:23

inkcherry mentioned this pull request Mar 19, 2025

DeepSpeed tensor parallel+ZeRO #36825

Merged

ArthurZucker marked this pull request as ready for review March 20, 2025 10:50

ArthurZucker approved these changes Mar 20, 2025


inkcherry closed this Mar 20, 2025
