torch.OutOfMemoryError: CUDA out of memory #30

Lenan22 · 2024-11-26T13:15:09Z

When I run:

python -m deepcompressor.app.diffusion.ptq configs/model/flux.1-schnell.yaml configs/svdquant/int4.yaml --eval-benchmarks MJHQ --eval-num-samples 1024

I get:

torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 384.00 MiB. GPU 0 has a total capacity of 39.56 GiB of which 359.62 MiB is free. Process 4119960 has 448.00 MiB memory in use. Process 1215056 has 38.77 GiB memory in use. Of the allocated memory 38.17 GiB is allocated by PyTorch, and 104.10 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

My A800 has only 40 GB of GPU memory and cannot run this example. Is there a flux.1-schnell.yaml configuration that fits in 40 GB? I want to run it end to end and verify the effect of quantization.

senlyu163 · 2024-12-10T06:12:04Z

Hi, I hit the OOM too. Did you solve it?

lmxyy · 2025-02-01T07:05:15Z

Could you provide more error logs? Typically, this issue can be bypassed by setting a proper calibration batch size.
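Separately from the batch size, the error message quoted above suggests setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True. A minimal sketch of launching the command from the first post with that variable set is below. The helper names build_env and run_ptq are hypothetical, the config paths are assumed to resolve from the current directory, and whether this avoids the OOM in this case is untested.

```python
import os
import subprocess
import sys

# Command copied from the first post in this thread.
PTQ_COMMAND = [
    sys.executable, "-m", "deepcompressor.app.diffusion.ptq",
    "configs/model/flux.1-schnell.yaml",
    "configs/svdquant/int4.yaml",
    "--eval-benchmarks", "MJHQ",
    "--eval-num-samples", "1024",
]


def build_env(base=None):
    """Return a copy of the environment with the allocator setting
    that the PyTorch OOM message recommends."""
    env = dict(os.environ if base is None else base)
    env["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"
    return env


def run_ptq():
    """Launch the PTQ command (hypothetical wrapper, not part of deepcompressor)."""
    subprocess.run(PTQ_COMMAND, env=build_env(), check=True)

# Usage: call run_ptq() from the directory that contains the configs.
```

The variable has to be set before the Python process that imports torch starts, which is why the sketch passes it through the child environment instead of setting it after import.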

yanglianwei · 2025-02-23T03:24:01Z

Hi, I hit the OOM too. Did you solve it?

File "/diffusers/models/attention_processor.py", line 1703, in apply_rope
xk_out = freqs_cis[..., 0] * xk_[..., 0] + freqs_cis[..., 1] * xk_[..., 1]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 864.00 MiB. GPU 0 has a total capacity of 47.43 GiB of which 707.00 MiB is free. Including non-PyTorch memory, this process has 46.73 GiB memory in use. Of the allocated memory 43.40 GiB is allocated by PyTorch, and 3.02 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

yanglianwei · 2025-02-23T03:31:03Z

Command:
python -m deepcompressor.app.diffusion.ptq /examples/diffusion/configs/model/flux.1-dev.yaml /examples/diffusion/configs/svdquant/int4.yaml --eval-benchmarks MJHQ --eval-num-samples 1024

Error:
File "/deepcompressor/app/diffusion/ptq.py", line 397, in <module>
raise e
File "/deepcompressor/app/diffusion/ptq.py", line 389, in <module>
main(config, logging_level=tools.logging.DEBUG)
File "/deepcompressor/app/diffusion/ptq.py", line 314, in main
model = ptq(
^^^^
File "/deepcompressor/app/diffusion/ptq.py", line 128, in ptq
smooth_cache = smooth_diffusion(model, config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/deepcompressor/app/diffusion/quant/smooth.py", line 598, in smooth_diffusion
for _, (layer, layer_cache, layer_kwargs) in tqdm(
File "/python3.11/site-packages/tqdm/std.py", line 1181, in __iter__
for obj in iterable:
File "/deepcompressor/app/diffusion/dataset/calib.py", line 319, in iter_layer_activations
for layer_idx, (layer_name, (layer, layer_cache, layer_inputs)) in enumerate(
File "/python3.11/site-packages/torch/utils/_contextlib.py", line 36, in generator_context
response = gen.send(None)
^^^^^^^^^^^^^^
File "/deepcompressor/dataset/cache.py", line 327, in _iter_layer_activations
model(*sample.args, **sample.kwargs)
File "/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/python3.11/site-packages/diffusers/models/transformers/transformer_flux.py", line 406, in forward
encoder_hidden_states, hidden_states = block(
^^^^^^
File "/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/python3.11/site-packages/torch/nn/modules/module.py", line 1845, in _call_impl
return inner()
^^^^^^^
File "/python3.11/site-packages/torch/nn/modules/module.py", line 1793, in inner
result = forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/python3.11/site-packages/diffusers/models/transformers/transformer_flux.py", line 200, in forward
attn_output, context_attn_output = self.attn(
^^^^^^^^^^
File "/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/python3.11/site-packages/torch/nn/modules/module.py", line 1845, in _call_impl
return inner()
^^^^^^^
File "/python3.11/site-packages/torch/nn/modules/module.py", line 1793, in inner
result = forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/python3.11/site-packages/diffusers/models/attention_processor.py", line 490, in forward
return self.processor(
^^^^^^^^^^^^^^^
File "/python3.11/site-packages/diffusers/models/attention_processor.py", line 1846, in __call__
query, key = apply_rope(query, key, image_rotary_emb)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/python3.11/site-packages/diffusers/models/attention_processor.py", line 1703, in apply_rope
xk_out = freqs_cis[..., 0] * xk_[..., 0] + freqs_cis[..., 1] * xk_[..., 1]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 864.00 MiB. GPU 0 has a total capacity of 47.43 GiB of which 707.00 MiB is free. Including non-PyTorch memory, this process has 46.73 GiB memory in use. Of the allocated memory 43.40 GiB is allocated by PyTorch, and 3.02 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

Hardware: a single 48 GB A6000.

How can I solve this problem? Thanks!
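The two OOM messages in this thread report quite different amounts of memory that PyTorch has reserved but not allocated, which is the figure the message ties to fragmentation. A small sketch for pulling those numbers out of a message so they can be compared is below. The function summarize_oom and its field names are hypothetical and not part of PyTorch or deepcompressor, and the regular expressions only assume the message wording quoted in this thread.

```python
import re

# Conversion factors to MiB for the units that appear in the messages.
_UNITS = {"KiB": 1 / 1024, "MiB": 1.0, "GiB": 1024.0}
_SIZE = r"([\d.]+) (KiB|MiB|GiB)"

_PATTERNS = {
    "requested": rf"Tried to allocate {_SIZE}",
    "capacity": rf"total capacity of {_SIZE}",
    "free": rf"of which {_SIZE} is free",
    "allocated": rf"{_SIZE} is allocated by PyTorch",
    "reserved_unallocated": rf"{_SIZE} is reserved by PyTorch but unallocated",
}


def summarize_oom(message):
    """Extract the sizes from a CUDA OOM message, all converted to MiB.

    Fields whose wording is not found are simply left out of the result.
    """
    summary = {}
    for key, pattern in _PATTERNS.items():
        match = re.search(pattern, message)
        if match:
            value, unit = match.groups()
            summary[key] = float(value) * _UNITS[unit]
    return summary
```

Applied to the A6000 message above, this reports a request of 864 MiB against 707 MiB free, with about 3092 MiB reserved but unallocated. The A800 message from the first post reports only about 104 MiB reserved but unallocated.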

lmxyy added the bug and svdquant labels on Feb 1, 2025
