Seqlen of Kernel Benchmark #14

Sekri0 · 2024-09-30T08:53:42Z

I noticed that the seqlen dimension (variable M in engine/test.sh) in the kernel benchmark is very small. Does this mean the test only covers the decode stage and ignores the prefill stage?
I tried increasing M to 4096 and found that the test became very slow to start (GPU utilization stayed at 0 for the first hour).

lswzjuer · 2024-10-03T02:13:17Z

Sorry for the late reply; I was away for the National Day holiday. Our solution mainly targets the decoding stage. As mentioned in the paper, the GEMV problem only arises in the decoding stage, so the M values we tested range from 1 to 16.
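
To illustrate why small M (decode) and large M (prefill) behave so differently, here is a rough sketch that estimates arithmetic intensity (FLOPs per byte moved) for an (M x K) by (K x N) product. The function name, the N = K = 4096 shape, and the 2-bit weight / 8-bit activation / 16-bit output widths are assumptions chosen for illustration; they are not taken from the repository's benchmark script.

```python
def gemm_arithmetic_intensity(m, n, k, weight_bits=2, act_bits=8, out_bits=16):
    """Estimate FLOPs per byte for an (m x k) @ (k x n) product.

    Hypothetical helper: assumes each operand is read once and the
    output is written once, with the given bit widths per element.
    """
    flops = 2 * m * n * k  # one multiply and one add per (m, n, k) triple
    bits_moved = m * k * act_bits + k * n * weight_bits + m * n * out_bits
    return flops / (bits_moved / 8)


# Assumed layer shape (n = k = 4096); m is the seqlen dimension.
for m in (1, 16, 4096):
    print(f"M={m}: {gemm_arithmetic_intensity(m, 4096, 4096):.1f} FLOPs/byte")
```

Under these assumptions, intensity grows with M: at M = 1 the weight matrix dominates the memory traffic and the operation is GEMV-like and memory-bound, while at prefill-sized M the same weights are reused across many rows.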

Sekri0 · 2024-10-07T02:21:08Z

Thanks for the reply. I have one more question: in the end-to-end experiment, which kernel is used in the prefill phase of the w2a8 model?
