[Bug] Why does LoRA fine-tuning take longer than full-parameter fine-tuning? #254

Open
WangRongsheng opened this issue Mar 10, 2025 · 0 comments

Environment

machine: 4*A800 (80GiB)

Training script:

export WANDB_BASE_URL="https://api.wandb.ai"
export WANDB_MODE=online
torchrun --nnodes 1 --nproc_per_node 4 --master_port 29903 \
    fastvideo/train.py \
    --seed 1024 \
    --pretrained_model_name_or_path /sds_wangby/models/hunyuan_diffusers \
    --model_type hunyuan_hf \
    --cache_dir data/.cache \
    --data_json_path /sds_wangby/models/cjy/med_vid/code-wrs/dataset/Image-Vid-Finetune-HunYuan/videos2caption.json \
    --validation_prompt_dir /sds_wangby/models/cjy/med_vid/code-wrs/dataset/Image-Vid-Finetune-HunYuan/validation \
    --gradient_checkpointing \
    --train_batch_size 16 \
    --num_latent_t 24 \
    --sp_size 4 \
    --train_sp_batch_size 1 \
    --dataloader_num_workers 4 \
    --gradient_accumulation_steps 16 \
    --max_train_steps 8000 \
    --learning_rate 8e-5 \
    --mixed_precision bf16 \
    --checkpointing_steps 500 \
    --validation_steps 100 \
    --validation_sampling_steps 50 \
    --checkpoints_total_limit 3 \
    --allow_tf32 \
    --ema_start_step 0 \
    --cfg 0.0 \
    --ema_decay 0.999 \
    --log_validation \
    --output_dir data/outputs/Finetune-Hunyuan-lora \
    --tracker_project_name Finetune-Hunyuan-lora \
    --num_frames 93 \
    --validation_guidance_scale "1.0" \
    --shift 7 \
    --use_lora \
    --lora_rank 32 \
    --lora_alpha 32 

Describe the bug

We find that LoRA fine-tuning takes longer than full-parameter fine-tuning, and training does not speed up at all when I tweak --train_batch_size or --gradient_accumulation_steps.

When I adjust --train_batch_size or --gradient_accumulation_steps, GPU memory usage stays the same; when I increase --train_sp_batch_size, memory usage goes up, but the training time becomes longer.

[image attachment]
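One thing worth noting about the timing question: LoRA freezes the base weights, but every forward and backward pass still runs through the full base model, so the per-step compute stays close to full fine-tuning (the savings are mainly gradient and optimizer-state memory), and the extra adapter matmuls plus --gradient_checkpointing recomputation can even add overhead. The sketch below is a minimal, self-contained PyTorch toy, not the FastVideo/Hunyuan training path; the sizes (DIM, DEPTH, RANK) are made up. It times one optimizer step with and without a hand-rolled LoRA-style adapter so the two modes can be compared directly on the same hardware.

```python
import time
import torch
import torch.nn as nn

# Toy stand-in for a stack of transformer-ish blocks; all sizes are arbitrary.
DIM, DEPTH, RANK, STEPS = 2048, 8, 32, 20
device = "cuda" if torch.cuda.is_available() else "cpu"


class LoRALinear(nn.Module):
    """A frozen base Linear plus a trainable low-rank A/B adapter (LoRA-style)."""

    def __init__(self, dim: int, rank: int):
        super().__init__()
        self.base = nn.Linear(dim, dim, bias=False)
        self.base.weight.requires_grad_(False)   # base weights are frozen
        self.lora_a = nn.Parameter(torch.randn(rank, dim) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(dim, rank))

    def forward(self, x):
        # The full-rank base matmul still runs; LoRA only adds two small matmuls.
        return self.base(x) + (x @ self.lora_a.t()) @ self.lora_b.t()


def make_model(use_lora: bool) -> nn.Module:
    layers = [
        LoRALinear(DIM, RANK) if use_lora else nn.Linear(DIM, DIM, bias=False)
        for _ in range(DEPTH)
    ]
    return nn.Sequential(*layers).to(device)


def time_steps(model: nn.Module):
    params = [p for p in model.parameters() if p.requires_grad]
    opt = torch.optim.AdamW(params, lr=1e-4)
    x = torch.randn(16, DIM, device=device)
    for i in range(STEPS + 1):
        if i == 1:                                # iteration 0 is warm-up
            if device == "cuda":
                torch.cuda.synchronize()
            start = time.perf_counter()
        loss = model(x).pow(2).mean()
        loss.backward()
        opt.step()
        opt.zero_grad(set_to_none=True)
    if device == "cuda":
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / STEPS, sum(p.numel() for p in params)


for use_lora in (False, True):
    sec_per_step, n_trainable = time_steps(make_model(use_lora))
    print(f"use_lora={use_lora}: {n_trainable / 1e6:.2f}M trainable params, "
          f"{sec_per_step * 1e3:.1f} ms/step")
```

On typical hardware this usually shows the LoRA step taking about the same wall time as the full step, even though it trains far fewer parameters; the benefit shows up as lower gradient and AdamW-state memory, which is what lets you fit a larger micro-batch, rather than as a free per-step speedup.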

Reproduction

None
