
How to save the best performing checkpoint during LoRA fine-tuning on Hunyuan Video? #267

dingangui opened this issue Feb 19, 2025 · 4 comments

Comments

@dingangui

In the HunyuanVideo training scripts, we can save checkpoints every 500 steps by passing --checkpointing_steps 500. The final model is saved through the following code:

if accelerator.is_main_process:
    transformer = unwrap_model(accelerator, self.transformer)

    if self.args.training_type == "lora":
        transformer_lora_layers = get_peft_model_state_dict(transformer)

        self.model_config["pipeline_cls"].save_lora_weights(
            save_directory=self.args.output_dir,
            transformer_lora_layers=transformer_lora_layers,
        )
    else:
        transformer.save_pretrained(os.path.join(self.args.output_dir, "transformer"))

(Reference: https://github.com/a-r-r-o-w/finetrainers/blob/4bb10c62324aef4fbac85bb381acb9f6f39a5076/finetrainers/trainer.py#L837C1-L848C95)

My question is: How can I ensure that I save the best performing model during LoRA fine-tuning? The final saved model might not be the best, as the loss could fluctuate during training. The same applies to intermediate checkpoints. Is there a recommended approach for tracking and saving the best-performing model?
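For context, the generic pattern I have in mind looks something like the sketch below. This is a hypothetical helper, not part of the finetrainers API: compute some validation metric at each checkpoint, and keep a snapshot of whichever checkpoint scores best so far.

```python
# Hypothetical helper (not part of finetrainers): keep a copy of the
# checkpoint that achieved the lowest validation loss seen so far.
import os
import shutil
import tempfile

class BestCheckpointTracker:
    """Track the lowest validation loss and snapshot its checkpoint."""

    def __init__(self, output_dir):
        self.output_dir = output_dir
        self.best_loss = float("inf")
        self.best_step = None

    def update(self, step, val_loss, checkpoint_dir):
        # Snapshot only when this checkpoint beats the best loss so far.
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.best_step = step
            best_dir = os.path.join(self.output_dir, "best_checkpoint")
            if os.path.exists(best_dir):
                shutil.rmtree(best_dir)
            shutil.copytree(checkpoint_dir, best_dir)
            return True
        return False

# Toy usage with dummy checkpoint directories and made-up losses:
output_dir = tempfile.mkdtemp()
tracker = BestCheckpointTracker(output_dir)
for step, loss in [(500, 0.42), (1000, 0.35), (1500, 0.51)]:
    ckpt = os.path.join(output_dir, f"checkpoint-{step}")
    os.makedirs(ckpt, exist_ok=True)
    tracker.update(step, loss, ckpt)
print(tracker.best_step)  # 1000, the step with the lowest loss
```

The open question is what metric to use for `val_loss` when the output is video quality rather than a scalar loss.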

@neph1

neph1 commented Feb 19, 2025

You test them yourself and compare. Or you use the validation videos, if you're fortunate enough to be able to generate them. Look for the best convergence of your concept across a range of subjects.
I haven't been able to overfit so far, but I'm only on a measly 3090.

@dingangui
Author

> You test them yourself and see. Or you use the validation videos if you're fortunate enough to be able to generate them. You look for the best convergence with your concept on a range of subjects. I haven't been able to overfit so far, but I'm only on a measly 3090.

Thanks for your reply. I'm doing exactly what you said: I checked the validation videos and found that results from later checkpoints sometimes perform worse than earlier ones. So I'm worried that the best model might not be the latest one.

@BlackTea-c

Can you set the validation args? For example:

--validation_prompt "BW_STYLE A black and white animated scene unfolds with an anthropomorphic goat surrounded by musical notes and symbols, suggesting a playful environment. Mickey Mouse appears, leaning forward in curiosity as the goat remains still. The goat then engages with Mickey, who bends down to converse or react. The dynamics shift as Mickey grabs the goat, potentially in surprise or playfulness, amidst a minimalistic background. The scene captures the evolving relationship between the two characters in a whimsical, animated setting, emphasizing their interactions and emotions:::BW_STYLE A panda, dressed in a small, red jacket and a tiny hat, sits on a wooden stool in a serene bamboo forest. The panda's fluffy paws strum a miniature acoustic guitar, producing soft, melodic tunes. Nearby, a few other pandas gather, watching curiously and some clapping in rhythm. Sunlight filters through the tall bamboo, casting a gentle glow on the scene. The panda's face is expressive, showing concentration and joy as it plays. The background includes a small, flowing stream and vibrant green foliage, enhancing the peaceful and magical atmosphere of this unique musical performance"
--validation_images "/path/to/image1.png:::/path/to/image2.png"
--validation_prompt_separator :::
--num_validation_videos 1
--validation_epochs 10
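For reference, the `:::` separator just delimits multiple validation prompts; a minimal sketch of the assumed splitting behavior (not the exact finetrainers code, and the prompts here are placeholders):

```python
# Assumed behavior: the trainer splits --validation_prompt on the
# --validation_prompt_separator value, yielding one prompt per video.
prompts_arg = "BW_STYLE prompt one:::BW_STYLE prompt two"
separator = ":::"
prompts = [p.strip() for p in prompts_arg.split(separator)]
print(len(prompts))  # 2
```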

@dingangui
Author

dingangui commented Feb 21, 2025

@BlackTea-c Yes, I already set the validation args. I checked the validation video results, and sometimes the video quality from the latest checkpoint looks worse than an older one. That's why I want to know how to save the best checkpoint.
