MultiheadAttention conversion #64
Comments
Thank you for reaching out.
You mean you need to refactor MultiheadAttention, right?
Yes, you should complete the computation contained in the ... By the way, maybe you can ...
That is, I need to replace the softmax calculation in it, right?
Do I need to do the attention calculation myself, like this? `def forward(self, query: torch.Tensor, key: torch.Tensor, value: torch.Tensor, ...)`
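For reference, below is a minimal sketch of what "doing the attention calculation yourself" could look like: a small self-attention module that bypasses `nn.MultiheadAttention` and calls `sageattn` directly. The module name, the QKV projection layout, and the `tensor_layout="HND"` argument are illustrative assumptions taken from the sageattention README, not the repository's recommended integration; the sketch omits masking, dropout, and attention weights.

```python
import torch
import torch.nn as nn

from sageattention import sageattn  # assumes the sageattention package is installed


class SageSelfAttention(nn.Module):
    """Hypothetical self-attention block that uses sageattn instead of softmax attention.

    Sketch only: no key padding mask, no attention mask, no dropout,
    and no attention weights are returned.
    """

    def __init__(self, embed_dim: int, num_heads: int):
        super().__init__()
        assert embed_dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = embed_dim // num_heads
        self.qkv_proj = nn.Linear(embed_dim, 3 * embed_dim)
        self.out_proj = nn.Linear(embed_dim, embed_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, embed_dim); sageattn expects fp16/bf16 CUDA tensors
        b, n, _ = x.shape
        q, k, v = self.qkv_proj(x).chunk(3, dim=-1)
        # reshape each to (batch, num_heads, seq_len, head_dim) for the "HND" layout
        q = q.view(b, n, self.num_heads, self.head_dim).transpose(1, 2)
        k = k.view(b, n, self.num_heads, self.head_dim).transpose(1, 2)
        v = v.view(b, n, self.num_heads, self.head_dim).transpose(1, 2)
        # sageattn returns only the attention output, never the weights
        out = sageattn(q, k, v, tensor_layout="HND", is_causal=False)
        out = out.transpose(1, 2).reshape(b, n, -1)
        return self.out_proj(out)
```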
I see that the internal calculation of MultiheadAttention looks like this, and this is the part that could be replaced, but I don't know how to change it: it has two return values, while sageattn has only one. `attn_output, attn_output_weights = _scaled_dot_product_attention(q, k, v, attn_mask, dropout_p)`
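One way around the two-return-values mismatch is a small wrapper that keeps the original call signature but returns `None` for the weights, so the rest of the copied code stays intact. This is a sketch under the assumptions that `q`, `k`, `v` are shaped `(bsz * num_heads, seq_len, head_dim)` as in the copied PyTorch code, and that `attn_mask` and `dropout_p` are not needed (sageattn does not support them); the function name is hypothetical.

```python
import torch

from sageattention import sageattn  # assumes the sageattention package is installed


def sage_scaled_dot_product_attention(q, k, v, attn_mask=None, dropout_p=0.0,
                                      bsz=1, num_heads=1):
    """Stand-in for the copied _scaled_dot_product_attention (sketch only).

    Returns (attn_output, attn_output_weights) like the original, but the
    weights slot is always None, so the caller must use need_weights=False.
    attn_mask and dropout_p are not supported by sageattn and are rejected here.
    """
    assert attn_mask is None and dropout_p == 0.0, "not supported in this sketch"
    tgt_len, head_dim = q.shape[-2], q.shape[-1]
    src_len = k.shape[-2]
    # (bsz * num_heads, seq, head_dim) -> (bsz, num_heads, seq, head_dim) for "HND"
    q = q.view(bsz, num_heads, tgt_len, head_dim)
    k = k.view(bsz, num_heads, src_len, head_dim)
    v = v.view(bsz, num_heads, src_len, head_dim)
    out = sageattn(q, k, v, tensor_layout="HND", is_causal=False)
    # back to the (bsz * num_heads, tgt_len, head_dim) shape the copied code expects
    return out.reshape(bsz * num_heads, tgt_len, head_dim), None
```

The quoted line would then become `attn_output, attn_output_weights = sage_scaled_dot_product_attention(q, k, v, attn_mask, dropout_p, bsz=bsz, num_heads=num_heads)`, and any downstream use of the weights (i.e. `need_weights=True`) has to be disabled.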
After timing it this way with `start = time.time()`, I found that my runtime did not go down. Why is that?
Please refer to our FLOPS testing code.
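As a side note on the timing question above: with `time.time()` alone, asynchronous CUDA kernel launches can hide or distort the difference. A generic measurement pattern (not the repository's actual FLOPS testing code) might look like this, with warm-up iterations and explicit synchronization:

```python
import time

import torch


def bench(fn, *args, warmup: int = 10, iters: int = 100) -> float:
    """Return the average runtime of fn(*args) in milliseconds on the current GPU."""
    for _ in range(warmup):           # warm-up excludes one-time compilation/caching costs
        fn(*args)
    torch.cuda.synchronize()          # make sure all queued kernels finished before timing
    start = time.time()
    for _ in range(iters):
        fn(*args)
    torch.cuda.synchronize()          # wait for the timed kernels to actually complete
    return (time.time() - start) / iters * 1000
```

Also note that at short sequence lengths the attention kernel is usually a small fraction of the total model runtime, so an end-to-end speedup can be hard to observe even when the kernel itself is faster.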
Can you provide me with the link?
My sequence length is only 197; is that too short?
How does torch.nn.MultiheadAttention integrate with sageattention?
What should I do?
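For completeness, one more coarse way to experiment, assuming a PyTorch 2.x version in which `nn.MultiheadAttention` routes through `F.scaled_dot_product_attention` when `need_weights=False`: monkey-patch that function. This is only a sketch of that idea, not something the repository documents for MultiheadAttention; the fallback branch keeps behavior unchanged whenever sageattn's constraints (no mask, no dropout, fp16/bf16 inputs) are not met.

```python
import torch
import torch.nn.functional as F

from sageattention import sageattn  # assumes the sageattention package is installed

_orig_sdpa = F.scaled_dot_product_attention


def sdpa_with_sage(query, key, value, attn_mask=None, dropout_p=0.0,
                   is_causal=False, scale=None, **kwargs):
    # Inputs here are (batch, num_heads, seq_len, head_dim), matching the "HND" layout.
    if (attn_mask is None and dropout_p == 0.0 and scale is None
            and query.dtype in (torch.float16, torch.bfloat16)):
        return sageattn(query, key, value, tensor_layout="HND", is_causal=is_causal)
    # Otherwise fall back to the original PyTorch kernel.
    return _orig_sdpa(query, key, value, attn_mask=attn_mask, dropout_p=dropout_p,
                      is_causal=is_causal, scale=scale, **kwargs)


F.scaled_dot_product_attention = sdpa_with_sage

# need_weights=False is required: sageattn cannot return attention weights.
mha = torch.nn.MultiheadAttention(embed_dim=768, num_heads=12, batch_first=True).cuda().half()
x = torch.randn(8, 197, 768, device="cuda", dtype=torch.float16)
out, _ = mha(x, x, x, need_weights=False)
```

Be aware that in eval mode MultiheadAttention may take a fused fast path that bypasses `F.scaled_dot_product_attention` entirely, so whether this patch actually takes effect depends on the PyTorch version and the call configuration.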