compatibility with original HunyuanVideo repository #203
Thanks @a-r-r-o-w for the great work!

I had a question about the HunyuanVideo training script. From what I've seen, the Diffusers version of HunyuanVideo is not implemented with flash attention, unlike the original implementation. As a result, generation in Diffusers is slower and takes more VRAM (>80 GB for 720p, 129-frame videos vs. ~60 GB for the flash-attention implementation).

Is this training script compatible with the original flash-attention implementation? And after training, is it possible to use the new checkpoints with the original HunyuanVideo repo?

Comments
From what I understand, flash attention is drop-in, so it should just work. I'm not an expert, though; this is how it works with LLMs.
@YoadTew Thank you for your interest! Yes, it is on my list of TODOs to support flash attention alongside sageattn. In Diffusers, we expose the option of setting a custom attention processor, so all that's required is implementing a processor that deduces/creates the right parameters to pass to flash. It might also be possible to just leverage SDPBackend.FLASH_ATTENTION as a quick test, no? I haven't tried it yet for the trainer, so I'm unsure whether it will work out of the box. You might have to make some changes for it to work, and contributions are welcome, but I will try to support it as soon as I find some extra time.
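For context, a minimal sketch of that quick test, assuming PyTorch 2.3+ (where `torch.nn.attention.sdpa_kernel` is the public API for restricting the SDPA backend); the shapes and dtypes are illustrative:

```python
import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel

# Illustrative (batch, heads, seq_len, head_dim) tensors; the flash backend
# requires half-precision inputs on CUDA.
q = torch.randn(2, 8, 256, 64, device="cuda", dtype=torch.bfloat16)
k = torch.randn(2, 8, 256, 64, device="cuda", dtype=torch.bfloat16)
v = torch.randn(2, 8, 256, 64, device="cuda", dtype=torch.bfloat16)

# Force SDPA to the flash backend for everything inside the context.
# If a dense attn_mask were passed, this backend could not serve the
# call and it would error out, which is exactly the problem noted below.
with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
    out = F.scaled_dot_product_attention(q, k, v)
```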
Oh, we use an attention mask, so SDPBackend.FLASH_ATTENTION won't work :/ The workaround would be to use very long prompts, so that there are no padding tokens, and to not provide an attention mask at all. I think that should work, and I could add some checks that enforce that the user's dataset has enough tokens per caption so that no padding happens.
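A sketch of what such a check could look like; `tokenizer`, `captions`, and `max_sequence_length` are illustrative names, not the trainer's actual API:

```python
def assert_no_padding(captions, tokenizer, max_sequence_length: int) -> None:
    """Raise if any caption would be padded (and hence need an attention mask).

    If every caption tokenizes to at least ``max_sequence_length`` tokens,
    truncation (rather than padding) occurs, the attention mask is all ones,
    and it can safely be dropped so flash attention becomes usable.
    """
    for i, caption in enumerate(captions):
        num_tokens = len(tokenizer(caption).input_ids)
        if num_tokens < max_sequence_length:
            raise ValueError(
                f"Caption {i} tokenizes to {num_tokens} tokens, fewer than "
                f"max_sequence_length={max_sequence_length}; it would be "
                "padded, and flash attention without a mask would attend "
                "to the pad tokens."
            )
```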
Hey @a-r-r-o-w, you are correct: the SDPBackend.FLASH_ATTENTION solution does not work because of the padding mask. Steps to reproduce:
Install flash-attention:
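A typical install command for the flash-attn package, assuming a matching CUDA toolchain is already set up:

```bash
pip install flash-attn --no-build-isolation
```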
Code for the new AttnProcessor:
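A minimal sketch of such a processor, assuming the `flash_attn` package's `flash_attn_func` and the projection names on Diffusers' `Attention` module; the real HunyuanVideo processor additionally applies QK normalization, rotary embeddings, and joint text/video token handling, which are omitted here:

```python
from typing import Optional

import torch
from flash_attn import flash_attn_func


class FlashAttnProcessor:
    """Illustrative Diffusers attention processor backed by flash-attn.

    Assumes half-precision inputs and no padding: flash_attn_func accepts
    no dense attention mask, so padded batches would instead need
    flash_attn_varlen_func with cu_seqlens.
    """

    def __call__(
        self,
        attn,  # diffusers.models.attention_processor.Attention
        hidden_states: torch.Tensor,
        encoder_hidden_states: Optional[torch.Tensor] = None,
        attention_mask: Optional[torch.Tensor] = None,
        **kwargs,
    ) -> torch.Tensor:
        if encoder_hidden_states is None:
            encoder_hidden_states = hidden_states

        query = attn.to_q(hidden_states)
        key = attn.to_k(encoder_hidden_states)
        value = attn.to_v(encoder_hidden_states)

        batch_size, seq_len, _ = query.shape
        head_dim = query.shape[-1] // attn.heads

        # flash_attn_func expects (batch, seq_len, num_heads, head_dim).
        query = query.view(batch_size, -1, attn.heads, head_dim)
        key = key.view(batch_size, -1, attn.heads, head_dim)
        value = value.view(batch_size, -1, attn.heads, head_dim)

        hidden_states = flash_attn_func(query, key, value, causal=False)
        hidden_states = hidden_states.reshape(batch_size, seq_len, -1)

        hidden_states = attn.to_out[0](hidden_states)  # output projection
        hidden_states = attn.to_out[1](hidden_states)  # dropout
        return hidden_states
```

It could then be registered with something like `transformer.set_attn_processor(FlashAttnProcessor())`; a proper PR would also need to reproduce everything the model's default processor does.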
Wow, @YoadTew! Given that it benefits the runtime, I think it could be generally beneficial to the library. Would you be up for opening a PR?
Hey @sayakpaul, sure, I will try to find time to open a PR this week.