docs: fixed the init_module and deepspeed #20175
base: master
Conversation
Just as a note - I myself was able to test that this is the correct code for init_module and DeepSpeed.
@@ -79,6 +79,17 @@ When training distributed models with :doc:`FSDP/TP <model_parallel/index>` or DeepSpeed
optimizer = torch.optim.Adam(model.parameters())
optimizer = fabric.setup_optimizers(optimizer)

With DeepSpeed Stage 3, the use of the :meth:`~lightning.fabric.fabric.Fabric.init_module` context manager is necessary for the model to be sharded correctly instead of being loaded onto the GPU in its entirety. DeepSpeed requires the model and optimizer to be set up jointly.
The first sentence isn't wrong, but it's not entirely different from FSDP, so the code above was already appropriate. I think what you wanted to mention was the second sentence, "DeepSpeed requires the model and optimizer to be set up jointly", right? Can we do it so we don't repeat ourselves, e.g., just a comment about calling model, optimizer = fabric.setup(model, optimizer)
for DeepSpeed?
Totally. I wrote that sentence by rewriting your words from an issue linked above, but I agree that in general it applies to FSDP as well.
Just to be clear I understood you: the old block of code with a text comment re: DeepSpeed below it? Or still two blocks of code? Or one block, but showing two ways (one commented out)?
I think one code block if possible, with a code comment or a text comment outside the block, would be perfect. Great idea, thanks!
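For reference, a minimal sketch of what such a single block could look like, assuming a placeholder MyModel class and the "deepspeed_stage_3" strategy alias (none of this is taken verbatim from the PR):

import torch
from lightning.fabric import Fabric

fabric = Fabric(strategy="deepspeed_stage_3", devices=2)  # placeholder configuration
fabric.launch()

# Instantiate the model under init_module so its parameters are created
# directly on the target device(s) and, for DeepSpeed Stage 3, sharded
# instead of being materialized in full.
with fabric.init_module():
    model = MyModel()  # MyModel is a placeholder

optimizer = torch.optim.Adam(model.parameters())

# DeepSpeed requires the model and optimizer to be set up jointly:
model, optimizer = fabric.setup(model, optimizer)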
What does this PR do?
Fixes the init_module docs, which suggest setting up the model and the optimizer separately. However, that pattern breaks with DeepSpeed because of the following:
pytorch-lightning/src/lightning/fabric/fabric.py
Lines 1031 to 1037 in cf24a19
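For context, the pattern the previous docs suggested looked roughly like the sketch below (MyModel and the strategy string are placeholders, not the literal docs snippet); the separate setup_optimizers call is what the check linked above rejects for DeepSpeed:

import torch
from lightning.fabric import Fabric

fabric = Fabric(strategy="deepspeed_stage_3")  # placeholder configuration
fabric.launch()

with fabric.init_module():
    model = MyModel()  # MyModel is a placeholder

model = fabric.setup(model)                      # model set up on its own...
optimizer = torch.optim.Adam(model.parameters())
optimizer = fabric.setup_optimizers(optimizer)   # ...optimizer set up separately; per the PR, this is rejected under DeepSpeed

Passing both objects to fabric.setup(model, optimizer) in a single call, as in the sketch above, avoids that path.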
Changed the docs to reflect the correct syntax.
Furthermore, discussed the necessity of using this with DeepSpeed Stage 3, as stated here:
#17792 (comment)
I only discussed DeepSpeed Stage 3, as I cannot find a statement on whether this needs to be done for Stage 2 to work correctly. From personal experience, this led to an improvement by allowing a slightly larger batch size (2 -> 4) for me. I am personally confused as to why this helps, since Stage 2 does not shard the parameters, but my understanding of these strategies is highly limited. Potentially @awaelchli or someone else could enlighten me on if/why it does? Happy to change the docs to include Stage 2 (or also 1?).
Only discussed Stage 3, as that is where this is necessary. As an update - I believe I traced what improved my performance down to the larger batch size, and it wasn't the inclusion of init_module.
Just as a note - I read the docs-editing README, which says that building locally is required. I had some local device issues with building the docs (my local device's issues), but I triple-checked that the change follows the .rst format, and this is a simple change.
Fixes: No issue, as this is a (likely straightforward) documentation fix.
Before submitting
PR review
Anyone in the community is welcome to review the PR.
Before you start reviewing, make sure you have read the review guidelines. In short, see the following bullet-list:
Reviewer checklist
Did you have fun?
Make sure you had fun coding 🙃
[x] Yes!
Debugging why DeepSpeed wasn't working wasn't as fun, though. Potentially this could benefit from an article in the docs, even though this is an experimental feature in Fabric?
📚 Documentation preview 📚: https://pytorch-lightning--20175.org.readthedocs.build/en/20175/