Memory using Validation #295
Comments
For information, while adding
Another point about the new pre-processing. In the
The problem is that I have a large number of different durations in the corpus I use:
So, 25 different durations, and some, like 257 frames, are shared by 4 videos. If I use the rule indicated in the comment, for my 36-video corpus with 25 duration types, I need to put the
The problem is that the computation is intensive and very long (a lot more than with the previous code). For example, I just put the
The s/it varies depending on the step of course, but this is clearly a bottleneck here. I understand that precomputation can be slow, but it would be good to find a way to avoid precomputing more than needed. What do you think about that point? EDIT: I'm pretty sure the s/it slowdown is due to memory swapping during precomputation; I think it's caused by the number of items to precompute.
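To give an idea of the scale, here is a minimal sketch (not finetrainers code; the videos.csv metadata file and its num_frames column are hypothetical) that counts how many distinct duration buckets a corpus would need:

```python
# Minimal sketch (hypothetical metadata format): count how many distinct
# frame-count buckets a local corpus contains, i.e. how many groups a
# "one bucket per duration" precomputation rule would have to handle.
import csv
from collections import Counter

def count_duration_buckets(metadata_csv: str) -> Counter:
    buckets = Counter()
    with open(metadata_csv, newline="") as f:
        for row in csv.DictReader(f):
            buckets[int(row["num_frames"])] += 1
    return buckets

if __name__ == "__main__":
    buckets = count_duration_buckets("videos.csv")
    print(f"{len(buckets)} distinct durations across {sum(buckets.values())} videos")
    for frames, count in sorted(buckets.items()):
        print(f"  {frames} frames -> {count} video(s)")
```

In my case this kind of count gives 25 distinct durations for 36 videos, which is what makes the precomputation so heavy.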
The base model is trained with a specific max sequence length. You could modify the code to allow a higher value, but there are no guarantees on how it affects quality. If a model hasn't observed data in the range it expects, it would probably not end well. But I haven't tried it myself, so it may be worth exploring!
Do you notice this consistently? Does it ever recover back to 1 s/it? I'm unable to replicate the behaviour in any of the examples I've shared in the
Awesome, thank you for the kind words! I'll think about the cosmetic improvements after some bigger optimizations!
Seems like a bug. I'll try to repro and fix
Working on adding non-precomputation based data loading soon! Also tracking in #296. The
Yes, I understand. It seems that LTXV fixes the max tokens at 128. I need to adapt my automatic prompt generation to stay under that limit.
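For reference, a minimal sketch of how one could check and truncate prompts against the 128-token limit. It assumes the T5 tokenizer shipped with the diffusers-format LTX-Video checkpoint (the Lightricks/LTX-Video repo id and the tokenizer subfolder are assumptions; adjust them for your setup):

```python
# Sketch: check a prompt against LTX-Video's 128-token limit and truncate it
# if needed. The repo id and "tokenizer" subfolder are assumed to match the
# public diffusers-format checkpoint; adjust for your own setup.
from transformers import AutoTokenizer

MAX_SEQUENCE_LENGTH = 128

tokenizer = AutoTokenizer.from_pretrained("Lightricks/LTX-Video", subfolder="tokenizer")

def fit_prompt(prompt: str) -> str:
    ids = tokenizer(prompt)["input_ids"]
    if len(ids) <= MAX_SEQUENCE_LENGTH:
        return prompt
    # Truncate to the limit and decode back to text, so nothing is silently
    # dropped later by the pipeline.
    return tokenizer.decode(ids[:MAX_SEQUENCE_LENGTH], skip_special_tokens=True)

print(fit_prompt("A very long automatically generated prompt ..."))
```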
For the tests I've done, yes. But I think (instinctively) that it's a memory swap problem: I reach the VRAM limit, CUDA starts to use shared memory (memory swap), and it seems to stay in that state. I'm not sure and don't have the expertise, but I will look for information about this soon. Maybe a little trick exists to work around the problem.
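One way to see whether the GPU is really being pushed over its physical limit around validation is to log the allocator statistics before and after the suspect step. A small sketch using plain PyTorch APIs (not specific to finetrainers):

```python
# Sketch: log PyTorch CUDA memory statistics to spot when allocations approach
# the physical VRAM limit (the point where the driver may start spilling into
# shared system memory).
import torch

def log_vram(tag: str) -> None:
    free, total = torch.cuda.mem_get_info()
    allocated = torch.cuda.memory_allocated()
    reserved = torch.cuda.memory_reserved()
    gib = 1024 ** 3
    print(
        f"[{tag}] allocated={allocated / gib:.2f} GiB, "
        f"reserved={reserved / gib:.2f} GiB, "
        f"free={free / gib:.2f} / {total / gib:.2f} GiB"
    )

# Example usage: compare the numbers before and after a validation step.
log_vram("before validation")
# ... run validation ...
log_vram("after validation")
```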
Sure! You deserve lots of encouragement. finetrainers is an excellent project.
Cool.
Yes. This is clearly the only problem for me now. Even with low
I've made some updates to how precomputation is done to save more VRAM, and I'm consistently seeing usage that is lower than or equivalent to the legacy scripts with precomputation. It should hopefully help and avoid spilling into shared memory.
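For context, the general pattern for keeping precomputation VRAM low is to keep each encoder on the GPU only while it is needed and to free the cache afterwards. A simplified sketch of that idea (not the actual finetrainers implementation):

```python
# Simplified sketch of the offloading pattern (not the actual finetrainers
# code): move an encoder to the GPU only for its own pass, then move it back
# to the CPU and release cached allocator blocks before the next stage.
import gc
import torch

def encode_with_offload(encoder: torch.nn.Module, batch: torch.Tensor, device: str = "cuda") -> torch.Tensor:
    encoder.to(device)
    with torch.no_grad():
        output = encoder(batch.to(device))
    encoder.to("cpu")
    gc.collect()
    torch.cuda.empty_cache()
    return output.cpu()
```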
For precomputation, if the dataset remains the same and is a local dataset, I should probably not trigger it to even start. Currently, a new run always runs precomputation if enabled with
These are some of the core cases, but I can bet there will be a lot more cases to cover. In general, dataset handling could use some improvements since there are so many formats to deal with correctly.
If I understand the various cases correctly, we can reuse the precomputed conditions/latents. For local training, it would be cool to check whether the 'precomputed' directory and files already exist and not overwrite them. From my point of view, it's always simpler for the user to delete the directory manually than to edit the script file. What do you think about this? EDIT: Testing with:
The process starts and precomputes. But if I restart, the precomputation restarts and overwrites the previous one. If I remove
The process starts with this message:
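To illustrate the behaviour I'm suggesting, here is a hypothetical sketch of the check (the 'precomputed' directory name comes from this thread; the helper name and paths are just illustrative):

```python
# Hypothetical sketch: skip precomputation when the output directory already
# contains precomputed files, so a restarted run reuses them instead of
# overwriting them. Directory layout and helper name are illustrative only.
from pathlib import Path

def should_precompute(output_dir: str) -> bool:
    precomputed = Path(output_dir) / "precomputed"
    # Reuse existing artifacts if the directory exists and is non-empty.
    if precomputed.is_dir() and any(precomputed.iterdir()):
        print(f"Found existing precomputed data in {precomputed}; skipping precomputation.")
        return False
    return True

if should_precompute("./outputs/my_run"):
    print("Running precomputation...")
```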
I've just updated to the latest version and started a training run (LTXV LoRA) using validation steps this time. I'm using the new bash method (from the sft/ltx_video/crush_smol_lora example). I have some details to report.

- I get this message during training: "The following part of your input was truncated because 'max_sequence_length' is set to 128 tokens". Is the max_sequence_length customizable?
- The validation videos are saved as validation-step-n-n-idtoken-startoftheprompt.mp4. The problem is that if two validation prompts start with the same sequence, one validation video overwrites the other. Maybe it would be cool to replace the 'startoftheprompt' text with the prompt's index in the list, or a hash? (A quick sketch of this idea is below, after the list.)
- The training continues after the validation step, but the s/it goes crazy: it jumps from 1 s/it to roughly 10 s/it depending on the step (probably because of memory swapping between VRAM and shared RAM). I guess there is a reason for this, but is there a way to optimize this part in the new version? What makes the memory usage grow during validation and not return to its initial state, as it does during the previous training steps?
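A minimal sketch of the hash-based naming I'm suggesting above (illustrative only, not the current finetrainers naming scheme):

```python
# Illustrative sketch of hash-based validation filenames, so two prompts that
# share the same opening words no longer collide. Not the current finetrainers
# naming scheme.
import hashlib

def validation_filename(step: int, prompt_index: int, prompt: str, ext: str = "mp4") -> str:
    prompt_hash = hashlib.sha256(prompt.encode("utf-8")).hexdigest()[:8]
    return f"validation-step-{step}-prompt-{prompt_index}-{prompt_hash}.{ext}"

print(validation_filename(500, 0, "A hydraulic press slowly crushes a small toy..."))
# -> validation-step-500-prompt-0-<hash>.mp4
```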
Anyway, you rock. The new version is clean. Maybe some colors in the command-line output would be appreciated for readability, but that's just a cosmetic detail ;)