Seqlen of Kernel Benchmark #14

Open
Sekri0 opened this issue Sep 30, 2024 · 2 comments

Comments

@Sekri0

Sekri0 commented Sep 30, 2024

I noticed that the seqlen dimension (variable M in engine/test.sh) in the kernel benchmark is very small. Does this mean that the test only covers the decode stage and ignores the prefill stage?
I tried increasing M to 4096 and found that the test became very slow to start (GPU utilization stayed at 0 for the first hour).

@lswzjuer
Contributor

lswzjuer commented Oct 3, 2024

Sorry for the late reply; I was away for the National Day holiday. Our solution mainly targets the decoding stage. As mentioned in the paper, the GEMV problem only exists in the decoding stage, so the range of M we tested is 1~16.
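
As a rough illustration of why small M corresponds to decoding, here is a minimal sketch that estimates arithmetic intensity (FLOPs per byte moved) for an (M x K) @ (K x N) matmul with 2-bit weights and 8-bit activations. The function name, the 4096 x 4096 layer size, and the 16-bit output width are assumptions made for illustration, not values taken from the repository or the paper.

```python
def gemm_arithmetic_intensity(m, n, k, weight_bits=2, act_bits=8, out_bits=16):
    """Estimate FLOPs per byte for an (m x k) @ (k x n) matmul.

    Assumes each operand is read once and the output is written once,
    which ignores caching and tiling effects (a simplification).
    """
    flops = 2 * m * n * k                    # one multiply and one add per MAC
    weight_bytes = n * k * weight_bits / 8   # quantized weight matrix
    act_bytes = m * k * act_bits / 8         # input activations
    out_bytes = m * n * out_bits / 8         # output (assumed 16-bit)
    return flops / (weight_bytes + act_bytes + out_bytes)


if __name__ == "__main__":
    n = k = 4096  # hypothetical layer size, not from the repository
    for m in (1, 16, 4096):
        print(f"M={m}: {gemm_arithmetic_intensity(m, n, k):.1f} FLOPs/byte")
```

Under these assumptions, M in the 1~16 range gives a low intensity dominated by weight traffic (the GEMV-like, memory-bound regime), while M=4096 behaves like a compute-bound GEMM typical of prefill.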

@Sekri0
Author

Sekri0 commented Oct 7, 2024

Thanks for the reply. I have one more question: in the end-to-end experiment, which kernel is used in the prefill phase of the w2a8 model?
