Possibility of supporting Pascal #66

Open
sorasoras opened this issue Dec 7, 2024 · 4 comments

Comments

@sorasoras

Since Pascal GPUs other than the P100 do support FP32 and INT8 via DP4A, I was wondering if SageAttention is usable via DP4A alone.
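For context, a quick capability check (a minimal sketch; DP4A is exposed on compute capability 6.1 and newer, which is why the sm_60 P100 is the exception):

```python
import torch

# DP4A (4-way int8 dot product with int32 accumulate) needs compute capability 6.1.
# Most Pascal cards (P40, GTX 10xx) are sm_61; the P100 is sm_60 and lacks DP4A.
major, minor = torch.cuda.get_device_capability(0)
print(f"sm_{major}{minor}, DP4A available: {(major, minor) >= (6, 1)}")
```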

@sorasoras
Author


https://github.com/sasha0552/pascal-pkgs-ci/releases
There is a Pascal build repo for Triton there.

@sorasoras
Author

I tried to run this and this happened:

sageattn_cogvideo.py
Couldn't connect to the Hub: (ProtocolError('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer')), '(Request ID: 67caf1ce-8db5-4dd7-b1b3-6756c9d34fe8)').
Will try to load from local cache.
Loading pipeline components...: 20%|████████████████████▏ | 1/5 [00:12<00:50, 12.55s/it] The config attributes {'invert_scale_latents': False} were passed to AutoencoderKLCogVideoX, but are not expected and will be ignored. Please verify your config.json configuration file.
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:01<00:00, 1.40it/s]
Loading pipeline components...: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:03<00:00, 1.61it/s]
0%| | 0/50 [00:00<?, ?it/s]loc(callsite("/usr/local/lib/python3.10/dist-packages/sageattention/attn_qk_int8_per_block.py":18:23 at "/usr/local/lib/python3.10/dist-packages/sageattention/attn_qk_int8_per_block.py":78:55)): error: 'tt.fp_to_fp' op operand #0 must be ranked tensor of floating-point values, but got 'tensor<128x64xi8, #triton_gpu.dot_op<{opIdx = 0, parent = #triton_gpu.blocked<{sizePerThread = [4, 4], threadsPerWarp = [2, 16], warpsPerCTA = [4, 1], order = [1, 0]}>}>>'
0%| | 0/50 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/root/SageAttention/example/sageattn_cogvideo.py", line 19, in
video = pipe(
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/diffusers/pipelines/cogvideo/pipeline_cogvideox.py", line 684, in call
noise_pred = self.transformer(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/diffusers/models/transformers/cogvideox_transformer_3d.py", line 473, in forward
hidden_states, encoder_hidden_states = block(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/diffusers/models/transformers/cogvideox_transformer_3d.py", line 132, in forward
attn_hidden_states, attn_encoder_hidden_states = self.attn1(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/diffusers/models/attention_processor.py", line 495, in forward
return self.processor(
File "/usr/local/lib/python3.10/dist-packages/diffusers/models/attention_processor.py", line 1950, in call
hidden_states = F.scaled_dot_product_attention(
File "/usr/local/lib/python3.10/dist-packages/sageattention/core.py", line 110, in sageattn
o = attn_false(q_int8, k_int8, v, q_scale, k_scale, tensor_layout=tensor_layout, output_dtype=dtype)
File "/usr/local/lib/python3.10/dist-packages/sageattention/attn_qk_int8_per_block.py", line 113, in forward
_attn_fwd[grid](
File "/usr/local/lib/python3.10/dist-packages/triton/runtime/jit.py", line 345, in
return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/triton/runtime/jit.py", line 662, in run
kernel = self.compile(
File "/usr/local/lib/python3.10/dist-packages/triton/compiler/compiler.py", line 282, in compile
next_module = compile_ir(module, metadata)
File "/usr/local/lib/python3.10/dist-packages/triton/backends/nvidia/compiler.py", line 317, in
stages["ttgir"] = lambda src, metadata: self.make_ttgir(src, metadata, options, self.capability)
File "/usr/local/lib/python3.10/dist-packages/triton/backends/nvidia/compiler.py", line 189, in make_ttgir
pm.run(mod)
RuntimeError: PassManager::run failed
root@SORANET:~/SageAttention/example#
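
If it helps, here is a hedged workaround sketch (not part of SageAttention itself): probe sageattn once on dummy tensors at startup and keep the stock SDPA if the Triton kernel fails to compile, as it does in the log above. The tensor shapes below are illustrative assumptions.

```python
import torch
import torch.nn.functional as F
from sageattention import sageattn

_stock_sdpa = F.scaled_dot_product_attention

try:
    # Small dummy (batch, heads, seq, head_dim) tensors; compiling the kernel once
    # here surfaces the RuntimeError from the log before any real inference runs.
    q = torch.randn(1, 8, 128, 64, dtype=torch.float16, device="cuda")
    sageattn(q, q.clone(), q.clone())
    F.scaled_dot_product_attention = sageattn
    print("SageAttention kernel compiled, patching SDPA")
except Exception as exc:
    F.scaled_dot_product_attention = _stock_sdpa
    print(f"SageAttention unavailable on this GPU, using torch SDPA: {exc}")
```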

@Ph0rk0z

Ph0rk0z commented Dec 21, 2024

tensor<128x64xi8

My guess is it got int8 but was expecting floats. The P40 natively supports int8; the P100 doesn't at all. If you are loading a model quantized with FP8 in PyTorch, try a GGUF instead and it's more likely to work.

@sorasoras
Author

sorasoras commented Dec 23, 2024

tensor<128x64xi8

My guess is it got int8 but was expecting floats. The P40 natively supports int8; the P100 doesn't at all. If you are loading a model quantized with FP8 in PyTorch, try a GGUF instead and it's more likely to work.

So maybe just load an FP16 model or a GGUF. A suggestion would be nice.

PS: you are right, this is a P40.
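
For what it's worth, a minimal sketch of the "just load an FP16 model" route. The checkpoint id, prompt, and generation settings below are assumptions for illustration, not taken from the example script:

```python
import torch
import torch.nn.functional as F
from diffusers import CogVideoXPipeline
from sageattention import sageattn

# Same monkey-patch the example uses (visible in the traceback above):
# route F.scaled_dot_product_attention through SageAttention.
F.scaled_dot_product_attention = sageattn

# Load the plain FP16 weights rather than an FP8-quantized checkpoint.
# "THUDM/CogVideoX-2b" is assumed here; substitute whatever checkpoint you use.
pipe = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-2b", torch_dtype=torch.float16
).to("cuda")

video = pipe(
    prompt="A panda playing guitar by a lake",
    num_inference_steps=50,
    num_frames=49,
).frames[0]
```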
