Stable Diffusion 3.5 Large CUDA OUT_OF_MEMORY on RTX 3090 #2597

Open
danielclough opened this issue Nov 5, 2024 · 9 comments

Comments

@danielclough
Contributor

When I run
cargo run --example stable-diffusion-3 --release --features=cuda -- --which 3.5-large --prompt "pretty picture"
I get
Error: DriverError(CUDA_ERROR_OUT_OF_MEMORY, "out of memory")
with Stable Diffusion 3.5 Large and Turbo.

According to this chart from stability.ai they should run on an RTX 3090.

chart

@LaurentMazare
Collaborator

That seems odd. We made a couple of optimizations to memory usage following #2574, and in the end SD 3.5 Large was reported to work well on a GPU with only 20GB of memory. Maybe some other processes are using the memory?
If not, it would be good to run an nsys profile to see when the memory is being used.

@danielclough
Contributor Author

There are no other processes running.

How would you like me to run nsys?

Here's some system info:

cat /etc/os-release
PRETTY_NAME="Ubuntu 24.04.1 LTS"

rustc --version
rustc 1.81.0 (eeb90cda1 2024-09-04)

cargo --version
cargo 1.81.0 (2dbb1af80 2024-08-20)

NVIDIA-SMI 560.35.03              Driver Version: 560.35.03      CUDA Version: 12.6
...

@super-fun-surf

Have you tried it with the cudnn feature in addition to the cuda feature? I found it used less RAM when cudnn was enabled.
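
For reference, a minimal sketch of what this suggestion might look like on the command line. It reuses the invocation from the original report and assumes that the examples expose a Cargo feature named `cudnn` and that cuDNN is installed on the machine. The `run` helper and the `DRY_RUN` variable are hypothetical conveniences, not part of candle: by default the command is only printed.

```shell
# Sketch: the reported command with cudnn enabled next to cuda.
# Assumptions: a Cargo feature named `cudnn` exists for the examples and
# cuDNN is installed. `run` and DRY_RUN are hypothetical helpers.
FEATURES="cuda,cudnn"

# Print the command by default; execute it only when DRY_RUN=0.
run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi; }

run cargo run --example stable-diffusion-3 --release --features="$FEATURES" -- \
  --which 3.5-large --prompt "pretty picture"
```

Setting `DRY_RUN=0` before running the snippet from the candle checkout executes the command instead of printing it.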

@danielclough
Contributor Author

Have you tried it with the cudnn feature in addition to the cuda feature? I found it used less RAM when cudnn was enabled.

I have not.

@LaurentMazare, should cudnn be required to run it properly?

@LaurentMazare
Collaborator

cudnn shouldn't be necessary, but it might indeed help reduce GPU memory usage.
That said, running the command you mentioned only uses ~20GB of memory in my case, so my guess is that something else is off there.
20241112-mem

@danielclough
Contributor Author

The GPU doesn't actually fill up all the memory.

image

Any suggestions for how to troubleshoot this would be welcome.

@LaurentMazare
Collaborator

I'm not sure how much I would trust the memory usage reported by an external tool, especially here, where it seems to measure memory usage only every 10s. It's probably safer to use nsys to get a proper memory profile.
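
As a rough sketch of how such a profile might be collected with the Nsight Systems CLI: the snippet below builds the example first, then runs the binary under `nsys profile` with CUDA tracing and GPU memory tracking enabled. The binary path assumes Cargo's default layout inside the candle checkout, and `sd35_mem` is an arbitrary report name. The `run` helper and `DRY_RUN` are hypothetical conveniences: by default the commands are only printed.

```shell
# Sketch: record a CUDA memory-usage timeline with Nsight Systems.
# Assumptions: run from the candle checkout; the example binary ends up in
# target/release/examples/; `sd35_mem` is an arbitrary report name.
# `run` and DRY_RUN are hypothetical helpers.
BIN="target/release/examples/stable-diffusion-3"
REPORT="sd35_mem"

# Print each command by default; execute it only when DRY_RUN=0.
run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi; }

# Build first so that compilation is not part of the profile.
run cargo build --example stable-diffusion-3 --release --features=cuda

# Trace CUDA activity and track GPU memory usage over the whole run.
run nsys profile --trace=cuda --cuda-memory-usage=true --output="$REPORT" \
  "$BIN" --which 3.5-large --prompt "pretty picture"
```

With `DRY_RUN=0`, the second command should leave a report file that can be opened in the Nsight Systems GUI to see when GPU memory is allocated during the run.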

@super-fun-surf

Are you unable to run it with cudnn? It really did help: my ADA4000 with 20GB won't run SD 3.5 Large without it. I would also recommend nsys for monitoring.

@danielclough
Contributor Author

Unless it is supposed to require cudnn I am not interested in the workaround.

This isn't something that is important to me, so I don't know if I will make time to troubleshoot it without hand holding.

Feel free to close the issue if cudnn is supposed to be required.

Otherwise, I guess someone else will care enough to troubleshoot.
