[Docs] Update Wan Docs with memory optimizations #11089
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Looks great, thanks!
thanks, looks great, just a couple of comments that aren't blockers; just my opinion.
> We will first need to install some additional dependencies.
>
> ```shell
maybe we should start by telling users what the additional dependencies are, with a link to them, so they feel more secure and understand what they are installing?

we can just add a link to the PyPI page too: https://pypi.org/project/ftfy/

Also, now that I see it, maybe this shouldn't be a required dependency but an optional one? I'll take a look later at how it's used.
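Following up on the optional-dependency idea: a minimal sketch (not diffusers' actual API — the `fix_text` wrapper here is hypothetical) of how a library can degrade gracefully when `ftfy` is not installed:

```python
# Sketch: treat ftfy as an optional dependency with a graceful fallback.
try:
    import ftfy

    def fix_text(text: str) -> str:
        # Use ftfy to repair mojibake/broken Unicode when it is available.
        return ftfy.fix_text(text)
except ImportError:
    def fix_text(text: str) -> str:
        # ftfy is not installed: pass the text through unchanged.
        return text

print(fix_text("hello world"))
```

Either way the caller gets a working `fix_text`, so the hard dependency disappears.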
docs/source/en/api/pipelines/wan.md
Outdated
> ```
> @@ -65,6 +403,11 @@ transformer = WanTransformer3DModel.from_single_file(ckpt_path, torch_dtype=torc
> pipe = WanPipeline.from_pretrained("Wan-AI/Wan2.1-T2V-1.3B-Diffusers", transformer=transformer)
> ```
>
> ## Recommendations for Inference:
> - Keep `AutoencoderKLWan` in `torch.float32` for better decoding quality.
> - `num_frames` should be of the form `4 * k + 1`, for example `49` or `81`.
maybe we can be clearer here and write what `k` corresponds to (e.g. the frames per second, or fps) in more common language?
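Whatever wording the docs settle on, a quick illustrative sketch (not from the PR) of clamping a requested frame count down to the nearest valid `4 * k + 1` value:

```python
def nearest_valid_num_frames(target: int) -> int:
    """Round down to the nearest frame count of the form 4 * k + 1."""
    k = max((target - 1) // 4, 0)
    return 4 * k + 1

print(nearest_valid_num_frames(81))  # 81 (already valid: k = 20)
print(nearest_valid_num_frames(80))  # 77 (k = 19)
```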
> #### Block Level Group Offloading
>
> We can reduce our VRAM requirements by applying group offloading to the larger model components of the pipeline: the `WanTransformer3DModel` and `UMT5EncoderModel`. Group offloading breaks up the individual modules of a model and offloads/onloads them onto your GPU as needed during inference. In this example, we'll apply `block_level` offloading, which groups the modules of a model into blocks of size `num_blocks_per_group` and offloads/onloads them to the GPU. Moving between CPU and GPU does add latency to the inference process. You can trade off between latency and memory savings by increasing or decreasing `num_blocks_per_group`.
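To make the `num_blocks_per_group` trade-off concrete, here is a minimal, illustrative sketch (not diffusers internals) of how a flat list of model blocks gets partitioned into offload groups — fewer, larger groups mean fewer CPU/GPU transfers (less latency) but more weights resident in VRAM at once:

```python
def partition_into_groups(blocks, num_blocks_per_group):
    """Partition a flat list of model blocks into offload groups."""
    return [
        blocks[i:i + num_blocks_per_group]
        for i in range(0, len(blocks), num_blocks_per_group)
    ]

# 10 transformer blocks, onloaded/offloaded 4 at a time -> 3 groups.
groups = partition_into_groups([f"block_{i}" for i in range(10)], 4)
print(len(groups))   # 3
print(groups[0])     # ['block_0', 'block_1', 'block_2', 'block_3']
print(groups[-1])    # ['block_8', 'block_9']
```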
could we apply group offloading on the VAE too?
Thank you for this, super useful information. I have been struggling to get Wan i2v and group offloading working. I've tried many things to get Wan i2v to work properly, and bnb too. Are quantizations (e.g. with bitsandbytes) supposed to work on Wan too?
> ```python
> from diffusers import AutoencoderKLWan, WanTransformer3DModel, WanImageToVideoPipeline
> from diffusers.hooks.group_offloading import apply_group_offloading
> from diffusers.utils import export_to_video, load_image
> from transformers import UMT5EncoderModel, CLIPVisionMode
> ```
`CLIPVisionMode` is missing an `l` — it should be `CLIPVisionModel`.
"An astronaut hatching from an egg, on the surface of the moon, the darkness and depth of space realised in " | ||
"the background. High quality, ultrarealistic detail and breath-taking movie-like camera shot." | ||
) | ||
negative_prompt = "Bright tones, overexposed, static, blurred details, subtitles, style, works, paintings, images, static, overall gray, worst quality, low quality, JPEG compression residue, ugly, incomplete, extra fingers, poorly drawn hands, poorly drawn faces, deformed, disfigured, misshapen limbs, fused fingers, still picture, messy background, three legs, many people in the background, walking backwards |
Is missing "
What does this PR do?
Based on feedback here
https://huggingface.slack.com/archives/C065E480NN9/p1742176300453069
Fixes # (issue)
Before submitting
- documentation guidelines, and tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.