Question about LoRA training #191
This is expected behaviour and surprised me too. HunyuanVideo has been hard to train LoRAs for with good results unless the training is done with more steps. CogVideoX is able to learn concepts in fewer steps: for example, it starts to learn the Steamboat Disney dataset well within 2000-3000 steps, while the same takes HunyuanVideo about 7000-8000 steps. This is most likely because we don't have the optimal training hyperparameters for Hunyuan yet, so I would encourage more experimentation with higher step counts, different hyperparameters, etc. I've tried cakify-effect training for HunyuanVideo (5000 steps, lr 1e-5, 183 videos, resolutions 17x512x768, 49x512x768 and 81x512x768, adamw), but the results are not very promising. After speaking with a few others working on this, multi-aspect-ratio training and lots of examples seem to work better. The most promising run (not mine, but from someone in an art community) first trained only on images of cakes being cut, followed by videos, so I would maybe try doing the same. Here are some examples from my best training run of 5000 steps:
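If you want to sweep over the hyperparameters mentioned above, a minimal Python sketch of such a grid could look like this (the config keys are illustrative, not actual training-script flags):

```python
# Hypothetical hyperparameter grid over the settings discussed above.
from itertools import product

learning_rates = [1e-4, 1e-5, 1e-6]
train_steps = [5000, 8000, 10000]
resolution_buckets = ["17x512x768", "49x512x768", "81x512x768"]  # frames x height x width

for lr, steps in product(learning_rates, train_steps):
    run_config = {
        "lr": lr,
        "train_steps": steps,
        "optimizer": "adamw",
        "resolution_buckets": resolution_buckets,  # multi-resolution, as in the run above
    }
    print(run_config)  # launch one training run per config here
```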
Another thing that's worked for me is training on images alone. Someone else discovered this was possible, so I won't take any credit, but it may also lead to less motion in videos generated with the LoRA enabled. These are my results from a 10000-step run on 400 images of fake Pokémon:

output.mp4 (left to right: 4000 steps, 6000 steps, 8000 steps and 10000 steps)

As can be seen, up to 4000 steps the model had not learned the type of creature I wanted to generate at all, but it eventually converged. This may point to a need for learning_rate/weight_decay/optimizer tuning.
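For anyone wanting to reproduce this kind of side-by-side checkpoint comparison with diffusers, a sketch along these lines should work; the checkpoint paths and prompt are placeholders, not from an actual run:

```python
# Sketch: compare LoRA checkpoints saved at different step counts.
import torch
from diffusers import HunyuanVideoPipeline
from diffusers.utils import export_to_video

pipe = HunyuanVideoPipeline.from_pretrained(
    "hunyuanvideo-community/HunyuanVideo", torch_dtype=torch.bfloat16
).to("cuda")

for step in (4000, 6000, 8000, 10000):
    pipe.load_lora_weights(f"./checkpoints/step-{step}")  # hypothetical layout
    video = pipe(prompt="a drawing of a fake pokemon creature", num_frames=49).frames[0]
    export_to_video(video, f"step_{step}.mp4", fps=15)
    pipe.unload_lora_weights()  # drop this adapter before loading the next
```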
@a-r-r-o-w Would it perhaps be effective to train the LoRA on a subject from images simultaneously with a regularisation dataset containing random video samples unrelated to it, to prevent overfitting to stills?
Hi!
@a-r-r-o-w
@Symbiomatrix Yes, I do plan to add support for prior loss soon. We need to work on some data-loading experience improvements first, after which I'll address this.
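"Prior loss" here refers to DreamBooth-style prior preservation. As a rough illustration of what that could look like once supported (assuming each batch concatenates subject samples with regularisation samples; all names are illustrative):

```python
# Sketch of a DreamBooth-style prior-preservation loss.
import torch
import torch.nn.functional as F

def training_loss(model_pred, target, prior_weight=1.0):
    # First half of the batch: subject samples; second half: regularisation
    # (prior) samples, e.g. random videos unrelated to the subject.
    pred, prior_pred = torch.chunk(model_pred, 2, dim=0)
    tgt, prior_tgt = torch.chunk(target, 2, dim=0)
    subject_loss = F.mse_loss(pred.float(), tgt.float())
    prior_loss = F.mse_loss(prior_pred.float(), prior_tgt.float())
    return subject_loss + prior_weight * prior_loss
```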
Hi @cseti007, nice to see you here! Yes, loading both images and videos should be possible. The dataset format is the same; you just need to point to the image files the same way you do the video files. There's a simple example here - you can combine videos and images however you want, as long as the metadata points to the correct files. The docs may also be helpful, though they're badly written.
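As an illustration of mixing media types, a hypothetical dataset layout could look like the following; the exact file and column names should follow the docs linked above:

```
dataset/
├── prompts.txt   # one caption per line, aligned with the media list below
├── videos.txt    # one relative path per line; images and videos can be mixed
├── videos/
│   ├── cake_cutting_001.mp4
│   └── ...
└── images/
    ├── cake_001.png
    └── ...
```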
I really don't have any perfect recommendations, to be honest, and am still exploring myself to find settings that work fast (in a low number of training steps) and produce the exact effect/character I'm looking for. An LR between 1e-4 and 1e-6 works best. You can also try lowering weight decay to something like 1e-4 or 1e-5 to reduce the weight penalty on the LoRA when trying to overfit it to something specific. Other than that, there's not really much to play with without a better understanding of each model's training dynamics... You should probably use
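In PyTorch terms, the suggested ranges translate to something like this sketch (where `transformer` stands in for the LoRA-injected model; this is not the training script's actual code):

```python
import torch

# `transformer` is assumed to be the model with LoRA layers injected;
# only the LoRA parameters are left trainable.
lora_parameters = [p for p in transformer.parameters() if p.requires_grad]

optimizer = torch.optim.AdamW(
    lora_parameters,
    lr=1e-5,            # try values between 1e-4 and 1e-6
    betas=(0.9, 0.95),
    weight_decay=1e-4,  # lower than torch's 1e-2 default to reduce the weight penalty
)
```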
@a-r-r-o-w Hello, I have trained a LoRA for 10000 steps using LR 1e-5 and the black-and-white Mickey Mouse video dataset, but when using the prompts from the prompt file, the LoRA has no significant effect. When you say a setting 'works best', what difference should there be between videos generated with and without the LoRA? Also, could you provide the prompts you used for generation? Thank you very much!
Hey, have you resolved the issue? I faced a similar problem. I trained a LoRA for 5,000 steps (batch size 8, learning rate 3e-5, betas 0.9 and 0.95, weight decay 1e-5) using the CogVideoX-2b model on the Mickey Mouse video dataset, but the LoRA didn't show any significant effect.
Could you let me know which version of CogVideoX you were fine-tuning and the batch size you used? I experimented with CogVideoX-2b (5,000 steps, batch size 8, learning rate 3e-5, betas 0.9 and 0.95, weight decay 1e-5) and found that LoRA training didn't yield noticeable effects. Thanks in advance!
@a-r-r-o-w Did you try adamw8bit? With adamw8bit, the loss doesn't seem to decrease the way it does with adamw, and the results are worse when training for the same number of steps.
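For context, the two optimizers differ mainly in how they store optimizer state; bitsandbytes' AdamW8bit keeps state in 8-bit, which saves memory but can change convergence. A minimal sketch of the swap (`lora_parameters` is assumed to be the trainable LoRA parameters):

```python
import torch
import bitsandbytes as bnb  # pip install bitsandbytes

opt_adamw = torch.optim.AdamW(lora_parameters, lr=1e-5, weight_decay=1e-4)
opt_adamw_8bit = bnb.optim.AdamW8bit(lora_parameters, lr=1e-5, weight_decay=1e-4)
# If the loss plateaus with the 8-bit variant, rerunning the same settings
# with full-precision AdamW is a fair sanity check.
```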
System Info / 系統信息
diffusers: installed from source
Information / 问题信息
Reproduction / 复现过程
I have trained LoRAs with CogVideoX and Hunyuan using the same dataset. CogVideoX works well, but the Hunyuan LoRA has no effect at all. Is there anything I should pay attention to, or any parameters I should adjust?
Expected behavior / 期待表现
The problem gets resolved.