
How to save the best performing checkpoint during LoRA fine-tuning on Hunyuan Video? #267

dingangui opened this issue Feb 19, 2025 · 4 comments

Comments

@dingangui

In the HunyuanVideo training scripts, we can save checkpoints every 500 steps by passing --checkpointing_steps 500. The final model is saved through the following code:

if accelerator.is_main_process:
    transformer = unwrap_model(accelerator, self.transformer)

    if self.args.training_type == "lora":
        transformer_lora_layers = get_peft_model_state_dict(transformer)

        self.model_config["pipeline_cls"].save_lora_weights(
            save_directory=self.args.output_dir,
            transformer_lora_layers=transformer_lora_layers,
        )
    else:
        transformer.save_pretrained(os.path.join(self.args.output_dir, "transformer"))

(Reference: https://github.com/a-r-r-o-w/finetrainers/blob/4bb10c62324aef4fbac85bb381acb9f6f39a5076/finetrainers/trainer.py#L837C1-L848C95)

My question is: How can I ensure that I save the best performing model during LoRA fine-tuning? The final saved model might not be the best, as the loss could fluctuate during training. The same applies to intermediate checkpoints. Is there a recommended approach for tracking and saving the best-performing model?
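For context, the generic pattern I have in mind looks something like the sketch below. This is a hypothetical helper, not part of the finetrainers API: compute some validation metric at each checkpoint, and keep a snapshot of whichever checkpoint scores best so far.

```python
# Hypothetical helper (not part of finetrainers): keep a copy of the
# checkpoint that achieved the lowest validation loss seen so far.
import os
import shutil
import tempfile

class BestCheckpointTracker:
    """Track the lowest validation loss and snapshot its checkpoint."""

    def __init__(self, output_dir):
        self.output_dir = output_dir
        self.best_loss = float("inf")
        self.best_step = None

    def update(self, step, val_loss, checkpoint_dir):
        # Snapshot only when this checkpoint beats the best loss so far.
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.best_step = step
            best_dir = os.path.join(self.output_dir, "best_checkpoint")
            if os.path.exists(best_dir):
                shutil.rmtree(best_dir)
            shutil.copytree(checkpoint_dir, best_dir)
            return True
        return False

# Toy usage with dummy checkpoint directories and made-up losses:
output_dir = tempfile.mkdtemp()
tracker = BestCheckpointTracker(output_dir)
for step, loss in [(500, 0.42), (1000, 0.35), (1500, 0.51)]:
    ckpt = os.path.join(output_dir, f"checkpoint-{step}")
    os.makedirs(ckpt, exist_ok=True)
    tracker.update(step, loss, ckpt)
print(tracker.best_step)  # 1000, the step with the lowest loss
```

The open question is what metric to use for `val_loss` when the output is video quality rather than a scalar loss.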

@neph1

neph1 commented Feb 19, 2025

You test them yourself and compare. Or you use the validation videos, if you're fortunate enough to be able to generate them. Look for the best convergence of your concept across a range of subjects.
I haven't been able to overfit so far, but I'm only on a measly 3090.

@dingangui
Author

> You test them yourself and see. Or you use the validation videos if you're fortunate enough to be able to generate them. You look for the best convergence with your concept on a range of subjects. I haven't been able to overfit so far, but I'm only on a measly 3090.

Thanks for your reply. I'm doing exactly what you said: I checked the validation videos and found that results from later checkpoints sometimes perform worse than earlier ones. So I'm worried that the best model might not be the latest one.

@BlackTea-c

Can you set the validation args? For example:

--validation_prompt "BW_STYLE A black and white animated scene unfolds with an anthropomorphic goat surrounded by musical notes and symbols, suggesting a playful environment. Mickey Mouse appears, leaning forward in curiosity as the goat remains still. The goat then engages with Mickey, who bends down to converse or react. The dynamics shift as Mickey grabs the goat, potentially in surprise or playfulness, amidst a minimalistic background. The scene captures the evolving relationship between the two characters in a whimsical, animated setting, emphasizing their interactions and emotions:::BW_STYLE A panda, dressed in a small, red jacket and a tiny hat, sits on a wooden stool in a serene bamboo forest. The panda's fluffy paws strum a miniature acoustic guitar, producing soft, melodic tunes. Nearby, a few other pandas gather, watching curiously and some clapping in rhythm. Sunlight filters through the tall bamboo, casting a gentle glow on the scene. The panda's face is expressive, showing concentration and joy as it plays. The background includes a small, flowing stream and vibrant green foliage, enhancing the peaceful and magical atmosphere of this unique musical performance"
--validation_images "/path/to/image1.png:::/path/to/image2.png"
--validation_prompt_separator :::
--num_validation_videos 1
--validation_epochs 10
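For reference, the `:::` separator just delimits multiple validation prompts; a minimal sketch of the assumed splitting behavior (not the exact finetrainers code, and the prompts here are placeholders):

```python
# Assumed behavior: the trainer splits --validation_prompt on the
# --validation_prompt_separator value, yielding one prompt per video.
prompts_arg = "BW_STYLE prompt one:::BW_STYLE prompt two"
separator = ":::"
prompts = [p.strip() for p in prompts_arg.split(separator)]
print(len(prompts))  # 2
```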

@dingangui
Author

dingangui commented Feb 21, 2025

@BlackTea-c Yes, I already set the validation args. I checked the validation video results, and sometimes the video quality from the latest checkpoint looks worse than an older one. That's why I want to know how to save the best checkpoint.
