Due to limited GPU resources (8×V100 32 GB), I am trying to use DeepSpeed ZeRO-3 (deepspeed==0.9.5) to train the 13B model, but I run into difficulties when loading it.
When running the command just to load the model, an error is raised.
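The exact launch command was lost in the paste. Judging from the log below (one node, one local process), it was presumably something like the following hypothetical invocation; the config file name and output directory are guesses:

    deepspeed --num_gpus 1 ./muffin/train/debug.py \
        --deepspeed ds_config_zero3.json \
        --output_dir ./debug_output

The run then produces this output and traceback: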
[2024-04-25 04:12:22,346] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-04-25 04:12:24,796] [INFO] [launch.py:145:main] WORLD INFO DICT: {'localhost': [0]}
[2024-04-25 04:12:24,796] [INFO] [launch.py:151:main] nnodes=1, num_local_procs=1, node_rank=0
[2024-04-25 04:12:24,796] [INFO] [launch.py:162:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0]})
[2024-04-25 04:12:24,796] [INFO] [launch.py:163:main] dist_world_size=1
[2024-04-25 04:12:24,797] [INFO] [launch.py:165:main] Setting CUDA_VISIBLE_DEVICES=0
[2024-04-25 04:12:27,946] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-04-25 04:12:28,765] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2024-04-25 04:12:28,765] [INFO] [comm.py:594:init_distributed] cdb=None
[2024-04-25 04:12:28,765] [INFO] [comm.py:625:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
/root/anaconda3/envs/muffin/lib/python3.10/site-packages/torch/nn/init.py:405: UserWarning: Initializing zero-element tensors is a no-op
  warnings.warn("Initializing zero-element tensors is a no-op")
[2024-04-25 04:12:30,365] [INFO] [partition_parameters.py:453:__exit__] finished initializing model with 12.92B parameters
Traceback (most recent call last):
  File "/root/data/yflu/muffin/./muffin/train/debug.py", line 22, in <module>
    load()
  File "/root/data/yflu/muffin/./muffin/train/debug.py", line 16, in load
    model = Beit3LlavaLlamaForCausalLM.from_pretrained(
  File "/root/anaconda3/envs/muffin/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2959, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
  File "/root/anaconda3/envs/muffin/lib/python3.10/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 385, in wrapper
    f(module, *args, **kwargs)
  File "/root/data/yflu/muffin/muffin/model/muffin.py", line 311, in __init__
    self.model = Beit3LlavaLlamaModel(config, mm_vision_tower=mm_vision_tower)
  File "/root/anaconda3/envs/muffin/lib/python3.10/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 385, in wrapper
    f(module, *args, **kwargs)
  File "/root/data/yflu/muffin/muffin/model/muffin.py", line 153, in __init__
    self.vision_tower = timm.create_model(mm_vision_tower)
  File "/root/anaconda3/envs/muffin/lib/python3.10/site-packages/timm/models/factory.py", line 81, in create_model
    model = create_fn(pretrained=pretrained, **kwargs)
  File "/root/data/yflu/muffin/muffin/model/beit3.py", line 135, in beit3_large_patch16_672
    model = BEiT3Wrapper(args, **kwargs)
  File "/root/anaconda3/envs/muffin/lib/python3.10/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 385, in wrapper
    f(module, *args, **kwargs)
  File "/root/data/yflu/muffin/muffin/model/beit3.py", line 51, in __init__
    self.beit3 = BEiT3(args)
  File "/root/anaconda3/envs/muffin/lib/python3.10/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 385, in wrapper
    f(module, *args, **kwargs)
  File "/root/anaconda3/envs/muffin/lib/python3.10/site-packages/torchscale/model/BEiT3.py", line 40, in __init__
    self.encoder = Encoder(
  File "/root/anaconda3/envs/muffin/lib/python3.10/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 385, in wrapper
    f(module, *args, **kwargs)
  File "/root/anaconda3/envs/muffin/lib/python3.10/site-packages/torchscale/architecture/encoder.py", line 209, in __init__
    self.build_encoder_layer(
  File "/root/anaconda3/envs/muffin/lib/python3.10/site-packages/torchscale/architecture/encoder.py", line 296, in build_encoder_layer
    layer = EncoderLayer(
  File "/root/anaconda3/envs/muffin/lib/python3.10/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 385, in wrapper
    f(module, *args, **kwargs)
  File "/root/anaconda3/envs/muffin/lib/python3.10/site-packages/torchscale/architecture/encoder.py", line 30, in __init__
    self.self_attn = self.build_self_attention(self.embed_dim, args)
  File "/root/anaconda3/envs/muffin/lib/python3.10/site-packages/torchscale/architecture/encoder.py", line 103, in build_self_attention
    return MultiheadAttention(
  File "/root/anaconda3/envs/muffin/lib/python3.10/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 385, in wrapper
    f(module, *args, **kwargs)
  File "/root/anaconda3/envs/muffin/lib/python3.10/site-packages/torchscale/component/multihead_attention.py", line 40, in __init__
    self.k_proj = MultiwayWrapper(args, nn.Linear(embed_dim, embed_dim, bias=True))
  File "/root/anaconda3/envs/muffin/lib/python3.10/site-packages/torchscale/component/multiway_network.py", line 12, in MultiwayWrapper
    return MultiwayNetwork(module, dim=dim)
  File "/root/anaconda3/envs/muffin/lib/python3.10/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 385, in wrapper
    f(module, *args, **kwargs)
  File "/root/anaconda3/envs/muffin/lib/python3.10/site-packages/torchscale/component/multiway_network.py", line 30, in __init__
    self.B.reset_parameters()
  File "/root/anaconda3/envs/muffin/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 109, in reset_parameters
    fan_in, _ = init._calculate_fan_in_and_fan_out(self.weight)
  File "/root/anaconda3/envs/muffin/lib/python3.10/site-packages/torch/nn/init.py", line 287, in _calculate_fan_in_and_fan_out
    raise ValueError("Fan in and fan out can not be computed for tensor with fewer than 2 dimensions")
ValueError: Fan in and fan out can not be computed for tensor with fewer than 2 dimensions
It seems that the multihead attention module of BEiT3 cannot be initialized: the weight of the linear layer self.B in the MultiwayNetwork module is empty (a zero-element tensor), so reset_parameters() cannot compute its fan-in/fan-out. Judging from the partition_parameters.py wrapper frames in the traceback, ZeRO-3's zero.Init partitions each submodule's parameters as soon as it is constructed, which is presumably why the layer is already empty by the time it is re-initialized.
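For what it's worth, the failing frame is torchscale's MultiwayNetwork, which (in the torchscale source) deep-copies the wrapped nn.Linear and then calls reset_parameters() on the copy; under ZeRO-3 that copy's weight is a zero-element shard on each rank. One workaround sometimes applied to this class of failure is to gather the full parameters around the re-initialization with deepspeed.zero.GatheredParameters. A minimal sketch of such a local patch follows; it assumes the torchscale file can be edited in place and is a workaround idea, not an official fix:

    # Sketch of a local patch to torchscale/component/multiway_network.py
    # (assumes this mirrors the upstream MultiwayNetwork closely enough).
    import copy

    import deepspeed
    import torch.nn as nn


    class MultiwayNetwork(nn.Module):
        def __init__(self, module, dim=1):
            super().__init__()
            self.dim = dim
            self.A = module
            self.B = copy.deepcopy(module)
            # Under ZeRO-3, zero.Init partitions parameters as soon as each
            # submodule is built, so self.B.weight may be a zero-element shard
            # on this rank. Gather the full parameters, let rank 0 modify them
            # in reset_parameters(), and re-partition on exiting the context.
            with deepspeed.zero.GatheredParameters(
                    list(self.B.parameters()), modifier_rank=0):
                self.B.reset_parameters()
            self.split_position = -1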
The Python script is listed below:
import torch
from muffin import Beit3LlavaLlamaForCausalLM
import transformers
from typing import Optional
from dataclasses import dataclass, field


@dataclass
class ModelArguments:
    model_name_or_path: Optional[str] = field(default="facebook/opt-125m")


def load():
    parser = transformers.HfArgumentParser(
        (ModelArguments, transformers.TrainingArguments))
    model_args, training_args = parser.parse_args_into_dataclasses()
    model = Beit3LlavaLlamaForCausalLM.from_pretrained(
        model_args.model_name_or_path,
        torch_dtype=torch.float16
    )


if __name__ == "__main__":
    load()
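For context: constructing the TrainingArguments above from a command line that carries a --deepspeed config with "stage": 3 activates transformers' ZeRO-3 integration, so from_pretrained builds the model inside deepspeed.zero.Init, which is what partitions parameters mid-__init__. A quick way to confirm which path is taken, as a sketch (the import location has moved between transformers versions):

    # Hypothetical sanity check, placed before the from_pretrained() call:
    from transformers.deepspeed import is_deepspeed_zero3_enabled
    print("zero3 enabled:", is_deepspeed_zero3_enabled())  # True => zero.Init is used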
DeepSpeed configuration:
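The configuration contents did not survive the paste. For reference only, a minimal ZeRO stage-3 config of the kind that exercises this code path looks roughly like the following; this is an illustrative sketch, not the actual file used:

    {
      "fp16": {
        "enabled": true
      },
      "zero_optimization": {
        "stage": 3,
        "overlap_comm": true,
        "contiguous_gradients": true,
        "stage3_gather_16bit_weights_on_model_save": true
      },
      "train_micro_batch_size_per_gpu": "auto",
      "gradient_accumulation_steps": "auto"
    }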
I would greatly appreciate it if this problem could be solved, and I also hope the authors can provide support for DeepSpeed training.
Thanks a lot!