[core] Support custom ascendc kernels in vllm-ascend #233
Conversation
setup.py (Outdated)

    # First, run the standard build_ext command to compile the extensions
    super().run()

    # copy vllm/vllm_flash_attn/*.py from self.build_lib to current
Do not leave useless commented-out code.
Sure, I'll remove some of the unused code later.
Force-pushed from c8bfa85 to 9f0dc37.
Add multi-step scheduler support for vllm-ascend. Signed-off-by: new-TonyWang <[email protected]>
Force-pushed from 1fb60ef to 8ae9108.
Force-pushed from e811591 to a52dd05.
Add custom AscendC kernel support in vllm-ascend. This PR mainly includes three parts:
- An AscendC implementation of rotary_embedding, and its unit test.
- A CMakeLists.txt to compile the AscendC kernel, plus the related torch library binding to this kernel.
- Building and packing all the compiled .so files into the vllm-ascend package.

For now, this rotary embedding kernel does not support the scenario with `neoxStyle=False`, so it is not used in the actual modeling parts yet. We will soon add that implementation to vllm-ascend and integrate it into the modeling parts.

No change at all.

Signed-off-by: ganyi <[email protected]>
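For orientation, a minimal sketch of what the torch library binding mentioned above might look like. The launcher name `rotary_embedding_impl`, the library name `_C_ascend`, and the exact schema are assumptions for illustration; the actual names in vllm-ascend may differ. Ascend NPU tensors are assumed to dispatch through the PrivateUse1 key, as torch_npu registers them:

    // Hypothetical sketch, not the actual vllm-ascend binding: names,
    // schema, and dispatch key are assumptions for illustration.
    #include <torch/library.h>

    // Assumed host-side launcher defined alongside the AscendC kernel.
    void rotary_embedding_impl(const at::Tensor &positions, at::Tensor &query,
                               at::Tensor &key, int64_t head_size,
                               const at::Tensor &cos_sin_cache, bool is_neox);

    TORCH_LIBRARY(_C_ascend, ops) {
      // query and key are rotated in place, hence the Tensor! annotations.
      ops.def(
          "rotary_embedding(Tensor positions, Tensor! query, Tensor! key, "
          "int head_size, Tensor cos_sin_cache, bool is_neox) -> ()");
    }

    // Route Ascend NPU tensors (PrivateUse1 dispatch key) to the launcher.
    TORCH_LIBRARY_IMPL(_C_ascend, PrivateUse1, ops) {
      ops.impl("rotary_embedding", &rotary_embedding_impl);
    }

Once the compiled .so is packaged and loaded, an op registered this way would be reachable from Python as torch.ops._C_ascend.rotary_embedding (again, under the hypothetical library name).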
Force-pushed from 3cc0e28 to 29b4aa0.
        reinterpret_cast<TYPE *>(cosSinCache), rotDim, queryStride, keyStride, dstQueryStride, dstKeyStride, \
        numHeads, numKvHeads, headSize, numTokens, loopCnt, blockDim);

    static const int64_t maxParallelSize = 65535;
Can we add some comments here for the magic number? Or can we get the parallel size from a runtime API? Why 65535 here?
Sure.
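A minimal sketch of the clamping logic such a comment might describe, assuming (unconfirmed in this thread) that 65535 is the runtime's cap on blocks per kernel launch; the helper names are hypothetical:

    #include <algorithm>
    #include <cstdint>

    // Assumption: the runtime accepts at most 65535 blocks per kernel
    // launch, so we clamp blockDim and let each block loop over several
    // tokens instead of launching one block per token.
    static const int64_t maxParallelSize = 65535;

    // Hypothetical helper: launch width for numTokens tokens.
    static inline int64_t ComputeBlockDim(int64_t numTokens) {
      return std::min(numTokens, maxParallelSize);
    }

    // Hypothetical helper: tokens each block processes (ceiling division).
    static inline int64_t ComputeLoopCnt(int64_t numTokens, int64_t blockDim) {
      return (numTokens + blockDim - 1) / blockDim;
    }

Deriving the limit from a runtime query, as the reviewer suggests, would avoid hard-coding the constant.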
This PR adds custom AscendC rotary_embedding kernel support in vllm-ascend; the related CMakeLists and setuptools changes are also included in this PR.
Related: #156