docs: fixed the init_module and deepspeed #20175
base: master
Conversation
Just as a note - I myself was able to test that this is the correct code for init_module and DeepSpeed.
@@ -79,6 +79,17 @@ When training distributed models with :doc:`FSDP/TP <model_parallel/index>` or DeepSpeed
optimizer = torch.optim.Adam(model.parameters())
optimizer = fabric.setup_optimizers(optimizer)

With DeepSpeed Stage 3, the use of the :meth:`~lightning.fabric.fabric.Fabric.init_module` context manager is necessary for the model to be sharded correctly instead of being loaded onto the GPU in its entirety. DeepSpeed requires the model and optimizer to be set up jointly.
The first sentence isn't wrong, but it's not entirely different from FSDP, so the code above was already appropriate. I think what you wanted to mention was the second sentence, "DeepSpeed requires the model and optimizer to be set up jointly", right? Can we do it so we don't repeat ourselves, e.g., just a comment about calling model, optimizer = fabric.setup(model, optimizer)
for DeepSpeed?
Totally. I wrote that sentence by rewriting your words from an issue linked above, but I agree that in general it applies to FSDP as well.
Just to be clear I understood you: the old block of code with a text comment re: DeepSpeed below it? Or still two blocks of code? Or one block, but showing two ways (one commented out)?
I think one code block if possible, with a code comment or a text comment outside the block, would be perfect. Great idea, thanks!
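For reference, a minimal sketch of what such a single block could look like, assuming a placeholder MyModel class and the "deepspeed_stage_3" strategy alias (none of this is taken verbatim from the PR):

import torch
from lightning.fabric import Fabric

fabric = Fabric(strategy="deepspeed_stage_3", devices=2)  # placeholder configuration
fabric.launch()

# Instantiate the model under init_module so its parameters are created
# directly on the target device(s) and, for DeepSpeed Stage 3, sharded
# instead of being materialized in full.
with fabric.init_module():
    model = MyModel()  # MyModel is a placeholder

optimizer = torch.optim.Adam(model.parameters())

# DeepSpeed requires the model and optimizer to be set up jointly:
model, optimizer = fabric.setup(model, optimizer)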
What does this PR do?
Fixes the init_module docs, which suggest setting up the model and the optimizer separately. However, that pattern breaks with DeepSpeed because of the following:
pytorch-lightning/src/lightning/fabric/fabric.py
Lines 1031 to 1037 in cf24a19
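For context, the pattern the previous docs suggested looked roughly like the sketch below (MyModel and the strategy string are placeholders, not the literal docs snippet); the separate setup_optimizers call is what the check linked above rejects for DeepSpeed:

import torch
from lightning.fabric import Fabric

fabric = Fabric(strategy="deepspeed_stage_3")  # placeholder configuration
fabric.launch()

with fabric.init_module():
    model = MyModel()  # MyModel is a placeholder

model = fabric.setup(model)                      # model set up on its own...
optimizer = torch.optim.Adam(model.parameters())
optimizer = fabric.setup_optimizers(optimizer)   # ...optimizer set up separately; per the PR, this is rejected under DeepSpeed

Passing both objects to fabric.setup(model, optimizer) in a single call, as in the sketch above, avoids that path.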
Changed the docs to reflect the correct syntax.
Furthermore, discussed the necessity of using this with DeepSpeed Stage 3, as stated here:
#17792 (comment)
I only discussed DeepSpeed Stage 3, as I cannot find a statement on whether this needs to be done for Stage 2 to work correctly. From personal experience, this led to an improvement by allowing a slightly larger batch size (2 -> 4) for me. I am personally confused as to why this helps, since Stage 2 does not shard the parameters, but my understanding of these strategies is highly limited. Potentially @awaelchli or someone else could enlighten me on if/why it does? Happy to change the docs to include Stage 2 (or also 1?).
Only discussed Stage 3, as that is where this is necessary. As an update - I believe I traced what improved my performance down to the larger batch size, and it wasn't the inclusion of init_module.
Just as a note - I read the docs-editing README, which says that building locally is required. I had some local device issues with building the docs (my local device's issues), but I triple-checked that the change follows the .rst format, and this is a simple change.
Fixes: No issue, as this is a (likely straightforward) documentation fix.
Before submitting
PR review
Anyone in the community is welcome to review the PR.
Before you start reviewing, make sure you have read the review guidelines. In short, see the following bullet-list:
Reviewer checklist
Did you have fun?
Make sure you had fun coding 🙃
[x] Yes!
Debugging why DeepSpeed wasn't working wasn't as fun, though. Potentially this could benefit from an article in the docs, even though this is an experimental feature in Fabric?
📚 Documentation preview 📚: https://pytorch-lightning--20175.org.readthedocs.build/en/20175/