Difficult to converge when training hundreds / thousands of videos #265

Open
xinchengshuai opened this issue Feb 17, 2025 · 1 comment

@xinchengshuai

I tried to fine-tune LTX (I2V) on my custom dataset, but I'm finding it difficult to get training to converge. Specifically, I set the resolution and frame rate to 512 * 512 * 9. With only a few dozen videos, training converges within a few hundred iterations. But with 500 or more videos, it still hasn't converged after 20,000 iterations. Has anyone encountered a similar problem?

@a-r-r-o-w
Owner

From my experiments, I would suggest using multi-resolution data to avoid model collapse. LTX was trained on a variety of frame counts, heights, and widths, so focusing it on one specific resolution might require more training steps to get right.
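For readers who want to try the multi-resolution suggestion, here is a minimal sketch of grouping clips into (frames, height, width) buckets so that each batch shares one shape. The bucket values and the `nearest_bucket` / `group_by_bucket` helpers are hypothetical and purely illustrative; they are not part of this repository's API, and the right bucket set depends on your data.

```python
from collections import defaultdict

# Hypothetical (frames, height, width) buckets; illustrative values only.
BUCKETS = [(9, 512, 512), (17, 384, 640), (25, 320, 512), (9, 640, 384)]


def nearest_bucket(num_frames, height, width, buckets=BUCKETS):
    """Pick the bucket with the closest aspect ratio among those the clip
    has enough frames for. Ties are broken in favor of more frames.
    Returns None when the clip is shorter than every bucket."""
    usable = [b for b in buckets if b[0] <= num_frames]
    if not usable:
        return None
    aspect = width / height
    return min(usable, key=lambda b: (abs(b[2] / b[1] - aspect), -b[0]))


def group_by_bucket(clips):
    """Group (name, frames, height, width) tuples by their assigned bucket.
    Clips that fit no bucket are skipped."""
    groups = defaultdict(list)
    for name, num_frames, height, width in clips:
        bucket = nearest_bucket(num_frames, height, width)
        if bucket is not None:
            groups[bucket].append(name)
    return dict(groups)
```

Each group can then be resized or cropped to its bucket shape and batched separately, so the model sees several shapes over the course of training instead of a single 512 * 512 * 9 configuration.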

Also, the current implementation of LTX Video does not account for first-frame conditioning (an essential part of the training algorithm, as mentioned in the LTX paper). I've added that in #245, but that PR is not yet ready.
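To make the idea of first-frame conditioning concrete, here is a rough sketch of one common formulation: with some probability, the first latent frame is left clean (noise level 0) and excluded from the loss, while the remaining frames are noised as usual. This is an assumption for illustration only, not necessarily what #245 or the LTX paper does; the function name `per_frame_noise_levels` and the `condition_prob` parameter are hypothetical.

```python
import random


def per_frame_noise_levels(num_latent_frames, sigma, condition_prob=0.5, rng=random):
    """Return (noise_levels, loss_mask), one entry per latent frame.

    Assumed scheme (illustrative): with probability `condition_prob`,
    the first frame acts as a clean conditioning frame, so it gets
    noise level 0.0 and loss weight 0.0. Otherwise every frame is
    noised at `sigma` and contributes to the loss."""
    noise_levels = [sigma] * num_latent_frames
    loss_mask = [1.0] * num_latent_frames
    if rng.random() < condition_prob:
        noise_levels[0] = 0.0  # keep the conditioning frame clean
        loss_mask[0] = 0.0     # do not train on the frame that is given
    return noise_levels, loss_mask
```

In a training loop, the per-frame noise levels would be broadcast over the spatial tokens of each latent frame, and the mask would multiply the per-frame loss before averaging.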
