I think the transformers version should be pinned to `transformers==4.46.0`. Currently, if I install and then run the diffusion PTQ example, I get:
```
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/opt/conda/lib/python3.11/site-packages/deepcompressor/app/diffusion/ptq.py", line 10, in <module>
    from deepcompressor.app.llm.nn.patch import patch_attention, patch_gemma_rms_norm
  File "/opt/conda/lib/python3.11/site-packages/deepcompressor/app/llm/nn/__init__.py", line 3, in <module>
    from .struct import LlmModelStruct, LlmTransformerBlockStruct, LlmTransformerStruct
  File "/opt/conda/lib/python3.11/site-packages/deepcompressor/app/llm/nn/struct.py", line 10, in <module>
    from transformers.models.gemma2.modeling_gemma2 import (
ImportError: cannot import name 'Gemma2FlashAttention2' from 'transformers.models.gemma2.modeling_gemma2' (/opt/conda/lib/python3.11/site-packages/transformers/models/gemma2/modeling_gemma2.py)
```
But if I run:

```
pip3 install transformers==4.46.0
```

then that fixes it. I've tested this on a Runpod H100 and on my local RTX 4090, and the result is the same.
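As a quick sanity check after the pin (just an illustrative snippet, not something from the repo), the class whose removal triggers the ImportError is importable again:

```python
# Illustrative sanity check (not from the deepcompressor repo): confirm the pinned
# transformers still exposes the class whose removal caused the ImportError above.
import transformers
from transformers.models.gemma2.modeling_gemma2 import Gemma2FlashAttention2

print(transformers.__version__)          # expect 4.46.0 after the pin
print(Gemma2FlashAttention2.__module__)  # transformers.models.gemma2.modeling_gemma2
```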
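Alternatively, instead of pinning, I imagine the import in deepcompressor/app/llm/nn/struct.py could be guarded so it works on both older and newer transformers. A rough sketch (my own, not the actual deepcompressor code, assuming only that newer transformers releases no longer export Gemma2FlashAttention2):

```python
# Hypothetical guard, not the actual deepcompressor code: Gemma2FlashAttention2
# exists in transformers 4.46.0, but newer releases consolidated the per-backend
# attention classes, so the old name is gone.
try:
    from transformers.models.gemma2.modeling_gemma2 import Gemma2FlashAttention2
except ImportError:
    # Fall back to the base attention class when the old name is unavailable.
    from transformers.models.gemma2.modeling_gemma2 import (
        Gemma2Attention as Gemma2FlashAttention2,
    )
```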
I'm following this exactly, by the way: https://github.com/mit-han-lab/deepcompressor/tree/main/examples/diffusion (trying to reproduce the Schnell quantization).
However, after fixing the above issue, I then run into this issue: