I noticed that the sequence-length dimension (variable M in engine/test.sh) in the kernel benchmark is very small. Does this mean the test only covers the decode stage and ignores the prefill stage?
I tried increasing M to 4096 and found that the test became very slow to start (GPU utilization stayed at 0 for the first hour).
Sorry for the late reply; I was away for the National Day holiday. Our solution mainly targets the decoding stage. As mentioned in the paper, the GEMV problem only arises in the decoding stage, so we tested M in the range 1 to 16.
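To illustrate why small M is the interesting case, here is a rough sketch of the arithmetic intensity (FLOPs per byte moved) of an (M x K) @ (K x N) matmul. This is not taken from the benchmark in engine/test.sh: the function name `arithmetic_intensity`, the layer size K = N = 4096, and the 2-byte (FP16) element size are all assumptions made for illustration, and the byte count assumes each matrix is read or written exactly once.

```python
def arithmetic_intensity(m: int, n: int, k: int, bytes_per_elem: int = 2) -> float:
    """Return FLOPs per byte for an (m x k) @ (k x n) matmul.

    Assumes each operand and the output are moved through memory once,
    with bytes_per_elem bytes per element (2 = FP16, an assumption).
    """
    flops = 2 * m * n * k  # one multiply and one add per inner-product term
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)
    return flops / bytes_moved


if __name__ == "__main__":
    # Hypothetical layer size; M = 1 and 16 bracket the tested decode range,
    # M = 4096 stands in for a prefill-sized input.
    for m in (1, 16, 4096):
        print(f"M={m:5d}  FLOPs/byte = {arithmetic_intensity(m, 4096, 4096):.2f}")
```

Under these assumptions, M = 1 gives roughly one FLOP per byte, so the kernel is dominated by reading the weight matrix, while M = 4096 gives more than a thousand FLOPs per byte and is compute-bound, which is a different regime from the GEMV case.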