Difficult to converge when training hundreds / thousands of videos #265

xinchengshuai · 2025-02-17T06:51:43Z

I tried to fine-tune LTX (I2V) on my custom dataset, but training has difficulty converging. I set the resolution and frame count to 512 × 512 × 9. With only a few dozen videos, training converges within a few hundred iterations. With 500 or more videos, it still has not converged after 20,000 iterations. Has anyone encountered a similar problem?

a-r-r-o-w · 2025-02-19T00:05:13Z

From my experiments, I would suggest using multi-resolution data to avoid model collapse. LTX was trained with a variety of frame counts, heights, and widths, so focusing it on one specific resolution might require more training steps to get right.
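For anyone wanting to try the multi-resolution suggestion, here is a rough sketch of how clips could be grouped into resolution buckets before batching. It is not taken from the trainer: the bucket list, `nearest_bucket`, and `group_by_bucket` are hypothetical names and values chosen for illustration, and the selection rule (closest aspect ratio among buckets the clip has enough frames for) is just one reasonable choice.

```python
from collections import defaultdict

# Hypothetical buckets as (frames, height, width). Illustrative values only,
# not the resolutions LTX was actually trained on.
BUCKETS = [(9, 512, 512), (17, 512, 768), (17, 768, 512), (49, 480, 704)]


def nearest_bucket(frames, height, width, buckets=BUCKETS):
    """Return the bucket with the closest aspect ratio that the clip can fill.

    A clip can only fill a bucket if it has at least as many frames as the
    bucket requires. Ties on aspect ratio go to the bucket with more frames.
    Returns None when the clip is too short for every bucket.
    """
    candidates = [b for b in buckets if b[0] <= frames]
    if not candidates:
        return None
    aspect = width / height
    return min(candidates, key=lambda b: (abs(b[2] / b[1] - aspect), -b[0]))


def group_by_bucket(clips):
    """Group (name, frames, height, width) records by their assigned bucket."""
    groups = defaultdict(list)
    for name, frames, height, width in clips:
        bucket = nearest_bucket(frames, height, width)
        if bucket is not None:  # skip clips too short for any bucket
            groups[bucket].append(name)
    return dict(groups)
```

Each batch would then be drawn from a single group, so every sample in the batch can be resized or cropped to the same shape.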

Also, the current implementation of LTX Video does not account for first-frame conditioning (an essential part of the training algorithm as mentioned in the LTX paper). I've added that in #245, but that PR is not yet ready.
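To make the idea of first-frame conditioning concrete, here is a toy sketch of one common way to do it: keep the first latent frame (nearly) clean while noising the rest, and leave that frame out of the loss. This is an assumption for illustration, not the code in #245 and not necessarily the exact scheme in the LTX paper. All function names are hypothetical, and latents are plain Python lists rather than tensors.

```python
import random


def per_frame_noise_levels(num_frames, sigma, first_frame_sigma=0.0):
    """Noise level per latent frame: frame 0 stays (nearly) clean, the rest share sigma."""
    return [first_frame_sigma] + [sigma] * (num_frames - 1)


def loss_mask(num_frames):
    """Weight 0 for the conditioning frame (it is given, not predicted), 1 elsewhere."""
    return [0.0] + [1.0] * (num_frames - 1)


def noise_frames(latents, sigmas, rng):
    """Interpolate each frame toward Gaussian noise: x_t = (1 - s) * x + s * noise."""
    noisy = []
    for frame, s in zip(latents, sigmas):
        noisy.append([(1 - s) * x + s * rng.gauss(0.0, 1.0) for x in frame])
    return noisy


# Usage: two latent frames with two values each (toy data).
rng = random.Random(0)
latents = [[1.0, 2.0], [3.0, 4.0]]
sigmas = per_frame_noise_levels(len(latents), sigma=0.7)
noisy = noise_frames(latents, sigmas, rng)  # noisy[0] equals latents[0]
```

With `first_frame_sigma=0.0` the first frame passes through unchanged, so the model always sees the conditioning image at training time, matching how I2V inference supplies it.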
