
[Bug] Add sequence_parallel in layernorm init to enable 3D parallelism with DeepSpeed for non-CUDA devices. #468

Open

wants to merge 1 commit into base: main
Conversation

ys950902

When running on a non-CUDA device, 3D parallelism with DeepSpeed fails with the error below:
[rank19]: File "/home/yisheng/anaconda3/envs/llm_pt_25/lib/python3.10/site-packages/deepspeed/runtime/pipe/module.py", line 214, in init
[rank19]: self._build()
[rank19]: File "/home/yisheng/anaconda3/envs/llm_pt_25/lib/python3.10/site-packages/deepspeed/runtime/pipe/module.py", line 270, in _build
[rank19]: module = layer.build()
[rank19]: File "/home/yisheng/anaconda3/envs/llm_pt_25/lib/python3.10/site-packages/deepspeed/runtime/pipe/module.py", line 74, in build
[rank19]: return self.typename(*self.module_args, **self.module_kwargs)
[rank19]: TypeError: LayerNorm.__init__() got an unexpected keyword argument 'sequence_parallel'

This happens because Megatron-DeepSpeed passes a sequence_parallel argument when constructing the layernorm, while the current implementation uses torch.nn.LayerNorm on non-CUDA devices, which has no sequence_parallel parameter. As a result, layer initialization fails on non-CUDA devices.
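
A minimal sketch of the idea behind this change, assuming the fix is to give the non-CUDA layernorm a sequence_parallel-aware constructor (the wrapper class and attribute handling below are illustrative, not the exact diff in this PR):

```python
# Sketch only: a torch.nn.LayerNorm that tolerates Megatron's
# `sequence_parallel` keyword on non-CUDA devices.
from torch.nn import LayerNorm as TorchLayerNorm


class LayerNorm(TorchLayerNorm):
    """torch.nn.LayerNorm accepting Megatron's sequence_parallel kwarg."""

    def __init__(self, normalized_shape, eps=1e-5, sequence_parallel=False, **kwargs):
        super().__init__(normalized_shape, eps=eps, **kwargs)
        # Record the flag on the module and its parameters so that
        # sequence-parallel gradient handling can find it if enabled.
        self.sequence_parallel = sequence_parallel
        for param in (self.weight, self.bias):
            if param is not None:
                setattr(param, 'sequence_parallel', sequence_parallel)
```

With a constructor like this, DeepSpeed's pipeline layer build (the `self.typename(*self.module_args, **self.module_kwargs)` call in the traceback above) can forward the sequence_parallel keyword without raising a TypeError.
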

…run successfully with DeepSpeed

Signed-off-by: yisheng <[email protected]>
@ys950902
Author

Hi @tjruwase, I think we have discussed this question before:
1. It is quite subtle since it does not show the connection to sequence-parallelism.
In Megatron-DeepSpeed, sequence_parallel is added as a layernorm argument, as you can see here:
https://github.com/deepspeedai/Megatron-DeepSpeed/blob/main/megatron/model/gpt_model.py#L406
When running 3D parallelism with DeepSpeed, the keyword argument 'sequence_parallel' is passed at layer build time, and if the layernorm on a non-CUDA device does not accept it, this causes the error above.
2. It is unclear to me that the new LayerNorm is equivalent to torch.nn.LayerNorm for the non-sequence-parallel case. Maintaining parity with torch.nn.LayerNorm imposes an extra development burden.
It is the same: in the fused_layer_norm used on CUDA, when the fused kernel is not used, the computation is identical to torch.nn.LayerNorm (see the sketch after the link below):
http://github.com/deepspeedai/Megatron-DeepSpeed/blob/main/megatron/model/fused_layer_norm.py#L96
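
A hedged sketch of that equivalence claim (the function name and the use_fused_kernel flag below are illustrative, not the actual fused_layer_norm.py code): when the fused CUDA kernel path is not taken, the forward reduces to torch.nn.functional.layer_norm, which is exactly what torch.nn.LayerNorm computes.

```python
# Sketch only: illustrates why the non-fused path matches torch.nn.LayerNorm.
import torch
import torch.nn.functional as F


def layernorm_forward(x, normalized_shape, weight, bias, eps=1e-5, use_fused_kernel=False):
    """Illustrative forward: fused kernel on CUDA, plain layer_norm otherwise."""
    if use_fused_kernel:
        # On CUDA, Megatron-DeepSpeed dispatches to a fused kernel here.
        raise NotImplementedError("fused kernel path is CUDA-only")
    # Non-fused fallback: the same math as torch.nn.LayerNorm.forward.
    return F.layer_norm(x, normalized_shape, weight, bias, eps)


# Example usage with the fallback path:
x = torch.randn(2, 4, 8)
out = layernorm_forward(x, (8,), torch.ones(8), torch.zeros(8))
```
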
