Possibility of supporting Pascal #66

Open
sorasoras opened this issue Dec 7, 2024 · 4 comments

Comments

@sorasoras

Since Pascal GPUs other than the P100 do support FP32 and INT8 via DP4A, I was wondering if SageAttention is usable via DP4A alone.
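For context, a quick capability check (a minimal sketch; DP4A is exposed on compute capability 6.1 and newer, which is why the sm_60 P100 is the exception):

```python
import torch

# DP4A (4-way int8 dot product with int32 accumulate) needs compute capability 6.1.
# Most Pascal cards (P40, GTX 10xx) are sm_61; the P100 is sm_60 and lacks DP4A.
major, minor = torch.cuda.get_device_capability(0)
print(f"sm_{major}{minor}, DP4A available: {(major, minor) >= (6, 1)}")
```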

@sorasoras
Author


https://github.com/sasha0552/pascal-pkgs-ci/releases
There is a Pascal build repo for Triton there.

@sorasoras
Author

I tried to run this and this happened:

sageattn_cogvideo.py
Couldn't connect to the Hub: (ProtocolError('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer')), '(Request ID: 67caf1ce-8db5-4dd7-b1b3-6756c9d34fe8)').
Will try to load from local cache.
Loading pipeline components...: 20%|████████████████████▏ | 1/5 [00:12<00:50, 12.55s/it] The config attributes {'invert_scale_latents': False} were passed to AutoencoderKLCogVideoX, but are not expected and will be ignored. Please verify your config.json configuration file.
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:01<00:00, 1.40it/s]
Loading pipeline components...: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:03<00:00, 1.61it/s]
0%| | 0/50 [00:00<?, ?it/s]loc(callsite("/usr/local/lib/python3.10/dist-packages/sageattention/attn_qk_int8_per_block.py":18:23 at "/usr/local/lib/python3.10/dist-packages/sageattention/attn_qk_int8_per_block.py":78:55)): error: 'tt.fp_to_fp' op operand #0 must be ranked tensor of floating-point values, but got 'tensor<128x64xi8, #triton_gpu.dot_op<{opIdx = 0, parent = #triton_gpu.blocked<{sizePerThread = [4, 4], threadsPerWarp = [2, 16], warpsPerCTA = [4, 1], order = [1, 0]}>}>>'
0%| | 0/50 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/root/SageAttention/example/sageattn_cogvideo.py", line 19, in
video = pipe(
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/diffusers/pipelines/cogvideo/pipeline_cogvideox.py", line 684, in call
noise_pred = self.transformer(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/diffusers/models/transformers/cogvideox_transformer_3d.py", line 473, in forward
hidden_states, encoder_hidden_states = block(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/diffusers/models/transformers/cogvideox_transformer_3d.py", line 132, in forward
attn_hidden_states, attn_encoder_hidden_states = self.attn1(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/diffusers/models/attention_processor.py", line 495, in forward
return self.processor(
File "/usr/local/lib/python3.10/dist-packages/diffusers/models/attention_processor.py", line 1950, in call
hidden_states = F.scaled_dot_product_attention(
File "/usr/local/lib/python3.10/dist-packages/sageattention/core.py", line 110, in sageattn
o = attn_false(q_int8, k_int8, v, q_scale, k_scale, tensor_layout=tensor_layout, output_dtype=dtype)
File "/usr/local/lib/python3.10/dist-packages/sageattention/attn_qk_int8_per_block.py", line 113, in forward
_attn_fwd[grid](
File "/usr/local/lib/python3.10/dist-packages/triton/runtime/jit.py", line 345, in
return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/triton/runtime/jit.py", line 662, in run
kernel = self.compile(
File "/usr/local/lib/python3.10/dist-packages/triton/compiler/compiler.py", line 282, in compile
next_module = compile_ir(module, metadata)
File "/usr/local/lib/python3.10/dist-packages/triton/backends/nvidia/compiler.py", line 317, in
stages["ttgir"] = lambda src, metadata: self.make_ttgir(src, metadata, options, self.capability)
File "/usr/local/lib/python3.10/dist-packages/triton/backends/nvidia/compiler.py", line 189, in make_ttgir
pm.run(mod)
RuntimeError: PassManager::run failed
root@SORANET:~/SageAttention/example#
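
If it helps, here is a hedged workaround sketch (not part of SageAttention itself): probe sageattn once on dummy tensors at startup and keep the stock SDPA if the Triton kernel fails to compile, as it does in the log above. The tensor shapes below are illustrative assumptions.

```python
import torch
import torch.nn.functional as F
from sageattention import sageattn

_stock_sdpa = F.scaled_dot_product_attention

try:
    # Small dummy (batch, heads, seq, head_dim) tensors; compiling the kernel once
    # here surfaces the RuntimeError from the log before any real inference runs.
    q = torch.randn(1, 8, 128, 64, dtype=torch.float16, device="cuda")
    sageattn(q, q.clone(), q.clone())
    F.scaled_dot_product_attention = sageattn
    print("SageAttention kernel compiled, patching SDPA")
except Exception as exc:
    F.scaled_dot_product_attention = _stock_sdpa
    print(f"SageAttention unavailable on this GPU, using torch SDPA: {exc}")
```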

@Ph0rk0z

Ph0rk0z commented Dec 21, 2024

tensor<128x64xi8

My guess is it got int8 but was expecting floats. The P40 natively supports int8; the P100 doesn't at all. If you are loading a model quantized with FP8 in PyTorch, try a GGUF instead and it's more likely to work.

@sorasoras
Author

sorasoras commented Dec 23, 2024

tensor<128x64xi8

My guess is it got int8 but was expecting floats. The P40 natively supports int8; the P100 doesn't at all. If you are loading a model quantized with FP8 in PyTorch, try a GGUF instead and it's more likely to work.

So maybe just load an FP16 model or a GGUF. A suggestion would be nice.

PS: you are right, this is a P40.
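
For what it's worth, a minimal sketch of the "just load an FP16 model" route. The checkpoint id, prompt, and generation settings below are assumptions for illustration, not taken from the example script:

```python
import torch
import torch.nn.functional as F
from diffusers import CogVideoXPipeline
from sageattention import sageattn

# Same monkey-patch the example uses (visible in the traceback above):
# route F.scaled_dot_product_attention through SageAttention.
F.scaled_dot_product_attention = sageattn

# Load the plain FP16 weights rather than an FP8-quantized checkpoint.
# "THUDM/CogVideoX-2b" is assumed here; substitute whatever checkpoint you use.
pipe = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-2b", torch_dtype=torch.float16
).to("cuda")

video = pipe(
    prompt="A panda playing guitar by a lake",
    num_inference_steps=50,
    num_frames=49,
).frames[0]
```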
