[CK] [FP8] Add FP8 support to CK A8W8 GEMM #124

Open: wants to merge 18 commits into base: main
3 changes: 3 additions & 0 deletions README.md
@@ -28,3 +28,6 @@ there are number of op test, you can run them like this: `python3 op_tests/test_
|GEMM | D=AxB+C |
|FusedMoE | bf16 balabala |
|WIP | coming soon... |

## Ops
1. [INT8/FP8 A8W8 Per-Tensor/Rowwise Scaling GEMM](csrc/ck_gemm_a8w8/README.md)
189 changes: 189 additions & 0 deletions aiter/configs/a8w8_gemm_model_config/Llama-3.1-70B-Instruct-TP1.csv
@@ -0,0 +1,189 @@
M,N,K
1,8192,8192
1,10240,8192
1,8192,28672
1,57344,8192
2,8192,8192
2,8192,28672
2,10240,8192
2,57344,8192
4,10240,8192
4,8192,8192
4,57344,8192
4,8192,28672
8,8192,8192
8,8192,28672
8,57344,8192
8,10240,8192
16,8192,8192
16,8192,28672
16,57344,8192
16,10240,8192
17,8192,28672
17,10240,8192
17,57344,8192
17,8192,8192
24,57344,8192
24,8192,8192
24,8192,28672
24,10240,8192
32,8192,28672
32,57344,8192
32,10240,8192
32,8192,8192
40,8192,8192
40,8192,28672
40,10240,8192
40,57344,8192
48,10240,8192
48,8192,8192
48,8192,28672
48,57344,8192
56,8192,8192
56,8192,28672
56,57344,8192
56,10240,8192
64,8192,8192
64,8192,28672
64,57344,8192
64,10240,8192
72,8192,8192
72,8192,28672
72,57344,8192
72,10240,8192
80,8192,28672
80,57344,8192
80,10240,8192
80,8192,8192
88,8192,8192
88,8192,28672
88,10240,8192
88,57344,8192
96,8192,8192
96,8192,28672
96,57344,8192
96,10240,8192
104,8192,28672
104,57344,8192
104,10240,8192
104,8192,8192
112,8192,8192
112,8192,28672
112,57344,8192
112,10240,8192
120,10240,8192
120,8192,8192
120,8192,28672
120,57344,8192
128,57344,8192
128,10240,8192
128,8192,8192
128,8192,28672
136,10240,8192
136,8192,8192
136,8192,28672
136,57344,8192
144,10240,8192
144,8192,8192
144,8192,28672
144,57344,8192
152,10240,8192
152,8192,8192
152,8192,28672
152,57344,8192
160,57344,8192
160,10240,8192
160,8192,8192
160,8192,28672
168,8192,8192
168,8192,28672
168,57344,8192
168,10240,8192
176,10240,8192
176,8192,8192
176,8192,28672
176,57344,8192
184,10240,8192
184,8192,8192
184,8192,28672
184,57344,8192
192,8192,8192
192,8192,28672
192,57344,8192
192,10240,8192
200,8192,28672
200,57344,8192
200,10240,8192
200,8192,8192
208,57344,8192
208,10240,8192
208,8192,8192
208,8192,28672
216,57344,8192
216,10240,8192
216,8192,8192
216,8192,28672
224,10240,8192
224,8192,8192
224,8192,28672
224,57344,8192
232,10240,8192
232,8192,8192
232,8192,28672
232,57344,8192
240,10240,8192
240,8192,8192
240,8192,28672
240,57344,8192
248,8192,28672
248,57344,8192
248,10240,8192
248,8192,8192
256,8192,28672
256,10240,8192
256,57344,8192
256,8192,8192
512,10240,8192
512,8192,8192
512,8192,28672
512,57344,8192
1024,10240,8192
1024,8192,8192
1024,8192,28672
1024,57344,8192
1536,8192,8192
1536,8192,28672
1536,57344,8192
1536,10240,8192
2048,8192,28672
2048,57344,8192
2048,8192,8192
2048,10240,8192
3072,8192,28672
3072,10240,8192
3072,8192,8192
3072,57344,8192
4096,10240,8192
4096,8192,8192
4096,8192,28672
4096,57344,8192
8192,10240,8192
8192,8192,8192
8192,8192,28672
8192,57344,8192
16384,8192,28672
16384,57344,8192
16384,10240,8192
16384,8192,8192
18432,8192,8192
18432,8192,28672
18432,57344,8192
18432,10240,8192
20480,10240,8192
20480,8192,8192
20480,8192,28672
20480,57344,8192
32768,57344,8192
32768,10240,8192
32768,8192,8192
32768,8192,28672
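
The four distinct (N, K) pairs in this file are consistent with the linear layers of Llama-3.1-70B (hidden size 8192, intermediate size 28672, 64 attention heads, 8 KV heads, head dimension 128). The sketch below is a hedged illustration of how such pairs could be derived for a given tensor-parallel degree. It assumes fused QKV and fused gate/up projections with the usual column-parallel and row-parallel split; the PR itself does not state how the shapes were generated, and the names `LlamaDims` and `gemm_nk_shapes` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LlamaDims:
    """Public Llama-3.1-70B dimensions (hypothetical container, not part of the PR)."""
    hidden: int = 8192
    intermediate: int = 28672
    num_heads: int = 64
    num_kv_heads: int = 8
    head_dim: int = 128


def gemm_nk_shapes(dims: LlamaDims, tp: int) -> dict:
    """Return {layer_name: (N, K)} per GPU for tensor-parallel degree `tp`.

    Assumption: column-parallel layers (qkv, gate_up) split N across ranks,
    row-parallel layers (o_proj, down_proj) split K across ranks.
    """
    qkv_n = (dims.num_heads + 2 * dims.num_kv_heads) * dims.head_dim // tp
    attn_out_k = dims.num_heads * dims.head_dim // tp
    gate_up_n = 2 * dims.intermediate // tp
    down_k = dims.intermediate // tp
    return {
        "qkv_proj": (qkv_n, dims.hidden),
        "o_proj": (dims.hidden, attn_out_k),
        "gate_up_proj": (gate_up_n, dims.hidden),
        "down_proj": (dims.hidden, down_k),
    }


if __name__ == "__main__":
    for tp in (1, 2):
        print(f"TP{tp}:", gemm_nk_shapes(LlamaDims(), tp))
```

Under these assumptions, `tp=1` reproduces the (N, K) pairs in the TP1 file and `tp=2` reproduces those in the TP2 file.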
189 changes: 189 additions & 0 deletions aiter/configs/a8w8_gemm_model_config/Llama-3.1-70B-Instruct-TP2.csv
@@ -0,0 +1,189 @@
M,N,K
1,5120,8192
1,8192,14336
1,8192,4096
1,28672,8192
2,8192,4096
2,5120,8192
2,8192,14336
2,28672,8192
4,5120,8192
4,8192,14336
4,8192,4096
4,28672,8192
8,8192,4096
8,8192,14336
8,28672,8192
8,5120,8192
16,8192,14336
16,5120,8192
16,28672,8192
16,8192,4096
17,8192,14336
17,28672,8192
17,8192,4096
17,5120,8192
24,28672,8192
24,8192,4096
24,5120,8192
24,8192,14336
32,8192,4096
32,5120,8192
32,28672,8192
32,8192,14336
40,8192,4096
40,5120,8192
40,8192,14336
40,28672,8192
48,5120,8192
48,8192,4096
48,28672,8192
48,8192,14336
56,8192,4096
56,8192,14336
56,28672,8192
56,5120,8192
64,5120,8192
64,28672,8192
64,8192,14336
64,8192,4096
72,8192,14336
72,8192,4096
72,28672,8192
72,5120,8192
80,8192,4096
80,5120,8192
80,28672,8192
80,8192,14336
88,8192,4096
88,5120,8192
88,28672,8192
88,8192,14336
96,5120,8192
96,8192,14336
96,8192,4096
96,28672,8192
104,8192,14336
104,8192,4096
104,28672,8192
104,5120,8192
112,28672,8192
112,5120,8192
112,8192,14336
112,8192,4096
120,5120,8192
120,8192,4096
120,28672,8192
120,8192,14336
128,5120,8192
128,28672,8192
128,8192,4096
128,8192,14336
136,5120,8192
136,8192,14336
136,8192,4096
136,28672,8192
144,5120,8192
144,8192,14336
144,8192,4096
144,28672,8192
152,8192,4096
152,28672,8192
152,8192,14336
152,5120,8192
160,8192,4096
160,5120,8192
160,28672,8192
160,8192,14336
168,8192,14336
168,5120,8192
168,8192,4096
168,28672,8192
176,5120,8192
176,28672,8192
176,8192,4096
176,8192,14336
184,28672,8192
184,5120,8192
184,8192,14336
184,8192,4096
192,8192,4096
192,5120,8192
192,28672,8192
192,8192,14336
200,8192,14336
200,8192,4096
200,5120,8192
200,28672,8192
208,5120,8192
208,8192,14336
208,8192,4096
208,28672,8192
216,8192,14336
216,5120,8192
216,8192,4096
216,28672,8192
224,5120,8192
224,28672,8192
224,8192,4096
224,8192,14336
232,8192,4096
232,5120,8192
232,28672,8192
232,8192,14336
240,8192,4096
240,28672,8192
240,8192,14336
240,5120,8192
248,8192,4096
248,28672,8192
248,5120,8192
248,8192,14336
256,8192,14336
256,28672,8192
256,8192,4096
256,5120,8192
512,8192,4096
512,5120,8192
512,28672,8192
512,8192,14336
1024,5120,8192
1024,8192,4096
1024,28672,8192
1024,8192,14336
1536,8192,4096
1536,28672,8192
1536,8192,14336
1536,5120,8192
2048,28672,8192
2048,5120,8192
2048,8192,14336
2048,8192,4096
3072,5120,8192
3072,8192,14336
3072,8192,4096
3072,28672,8192
4096,5120,8192
4096,8192,14336
4096,8192,4096
4096,28672,8192
8192,5120,8192
8192,28672,8192
8192,8192,4096
8192,8192,14336
16384,8192,4096
16384,5120,8192
16384,28672,8192
16384,8192,14336
18432,8192,4096
18432,8192,14336
18432,5120,8192
18432,28672,8192
20480,8192,4096
20480,5120,8192
20480,28672,8192
20480,8192,14336
32768,5120,8192
32768,28672,8192
32768,8192,4096
32768,8192,14336
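
Both config files share the same `M,N,K` header, so they can be read with the standard `csv` module. The following is a minimal sketch, not the repository's own loader, of how such a file might be parsed and grouped by weight shape. The helper names `load_shapes`, `group_by_weight`, and `gemm_flops` are hypothetical, and the sample text is just the first few rows of the TP2 file.

```python
import csv
import io
from collections import defaultdict


def load_shapes(text: str) -> list:
    """Parse CSV text with an M,N,K header into a list of (M, N, K) int tuples."""
    reader = csv.DictReader(io.StringIO(text))
    return [(int(row["M"]), int(row["N"]), int(row["K"])) for row in reader]


def group_by_weight(shapes: list) -> dict:
    """Group the M values by their (N, K) weight shape, sorted ascending."""
    groups = defaultdict(list)
    for m, n, k in shapes:
        groups[(n, k)].append(m)
    return {nk: sorted(ms) for nk, ms in groups.items()}


def gemm_flops(m: int, n: int, k: int) -> int:
    """Multiply-add count of an M x K by K x N GEMM, counted as 2*M*N*K."""
    return 2 * m * n * k


if __name__ == "__main__":
    sample = "M,N,K\n1,5120,8192\n1,8192,14336\n2,8192,4096\n2,5120,8192\n"
    shapes = load_shapes(sample)
    for (n, k), ms in group_by_weight(shapes).items():
        print(f"N={n} K={k}: M values {ms}")
```

To process a real file, pass the contents of the CSV (for example, read with `open(path).read()`) to `load_shapes` instead of the inline sample.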