[core] Support custom ascendc kernels in vllm-ascend #233

Merged: 11 commits merged into vllm-project:main on Apr 3, 2025

Conversation

ganyi1996ppo (Collaborator) commented Mar 4, 2025

This PR adds custom AscendC kernel rotary_embedding support in vllm-ascend; the related CMakeLists and setuptools changes are also included in this PR.

Related: #156
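For context, kernels built this way are exposed to Python through torch's op registry. Below is a minimal sketch of the loading and dispatch pattern, assuming an illustrative library path and op namespace (`vllm_ascend/_C.so`, `torch.ops._C.rotary_embedding`) rather than names taken from this PR's diff:

    import torch

    # Load the shared library produced by the CMake build so that its
    # TORCH_LIBRARY registrations become visible under torch.ops.
    # The path is hypothetical; the real package decides where the .so lands.
    torch.ops.load_library("vllm_ascend/_C.so")

    def rotary_embedding(positions, query, key, head_size, cos_sin_cache, is_neox):
        # Dispatch to the registered AscendC kernel (namespace assumed);
        # the op mutates query/key in place, mirroring vLLM's convention.
        torch.ops._C.rotary_embedding(
            positions, query, key, head_size, cos_sin_cache, is_neox
        )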

@ganyi1996ppo requested review from wangxiyuan and Yikun March 4, 2025 01:52
@ganyi1996ppo changed the title from "[core] Support custom ascendc kernels in vllm-ascend" to "[core] Support custom ascendc kernels in vllm-ascend [draft]" Mar 4, 2025
setup.py (outdated):
# First, run the standard build_ext command to compile the extensions
super().run()

# copy vllm/vllm_flash_attn/*.py from self.build_lib to current
Collaborator:
Do not leave commented-out, unused code here.

ganyi1996ppo (Collaborator, Author):

Sure, I'll remove the unused code later.
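Since this thread touches setup.py's build machinery: a common pattern for packaging CMake-built kernels with setuptools looks roughly like the sketch below. The class name, the `sourcedir` attribute, and the paths are illustrative assumptions, not this PR's exact code:

    import os
    import subprocess
    from setuptools.command.build_ext import build_ext

    class cmake_build_ext(build_ext):
        def build_extension(self, ext):
            # ext.sourcedir is assumed to be set on a custom Extension subclass.
            build_dir = os.path.abspath(self.build_temp)
            os.makedirs(build_dir, exist_ok=True)
            out_dir = os.path.abspath(self.build_lib)
            # Configure and build; emitting the .so into build_lib lets the
            # standard setuptools steps pack it into the wheel.
            subprocess.check_call(
                ["cmake", ext.sourcedir,
                 f"-DCMAKE_LIBRARY_OUTPUT_DIRECTORY={out_dir}"],
                cwd=build_dir,
            )
            subprocess.check_call(["cmake", "--build", "."], cwd=build_dir)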

wangxiyuan referenced this pull request Mar 18, 2025: Add multi step scheduler support for vllm-ascend
Signed-off-by: new-TonyWang <[email protected]>
@ganyi1996ppo changed the title from "[core] Support custom ascendc kernels in vllm-ascend [draft]" to "[core] Support custom ascendc kernels in vllm-ascend" Mar 20, 2025
@ganyi1996ppo force-pushed the ganyi/cus_ops branch 3 times, most recently from 1fb60ef to 8ae9108 March 21, 2025 03:04
ganyi1996ppo and others added 7 commits April 1, 2025 09:40
Add custom AscendC kernel support in vllm-ascend. This PR mainly includes
three parts:
- AscendC implementation of rotary_embedding, and its unit test.
- CMakeLists.txt to compile the AscendC kernel and the related torch library
  binding to this kernel.
- Build and pack all the compiled .so files into the vllm_ascend package.

For now, this rotary embedding kernel does not support the scenario with
`neoxStyle=False`, so it is not used in the actual modeling parts (see the
reference sketch below). We will soon add that support and integrate the
kernel into the modeling parts.

No change at all

---------

Signed-off-by: ganyi <[email protected]>
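For context on what the kernel computes: neox-style rotary embedding can be written in plain PyTorch along these lines. This is a reference sketch with assumed tensor shapes, useful for checking the custom kernel; it is not the PR's actual unit test:

    import torch

    def ref_rotary_embedding_neox(x, cos, sin):
        # x: [num_tokens, num_heads, rot_dim]
        # cos, sin: [num_tokens, rot_dim // 2], taken from the cos/sin cache
        x1, x2 = x.chunk(2, dim=-1)
        cos = cos.unsqueeze(1)  # broadcast over the head dimension
        sin = sin.unsqueeze(1)
        # Neox style rotates the two halves of the rotary dimension as pairs.
        return torch.cat((x1 * cos - x2 * sin, x2 * cos + x1 * sin), dim=-1)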
reinterpret_cast<TYPE *>(cosSinCache), rotDim, queryStride, keyStride, dstQueryStride, dstKeyStride, \
numHeads, numKvHeads, headSize, numTokens, loopCnt, blockDim);

// Assumption (raised in the review below): 65535 is treated as the upper
// bound on parallel blocks per kernel launch; larger token counts are
// handled by looping (loopCnt) rather than a wider launch.
static const int64_t maxParallelSize = 65535;
Contributor:
Can we add a comment here explaining the magic number? Or can we get the parallel size from a runtime API? Why 65535?

ganyi1996ppo (Collaborator, Author):
Sure.

Signed-off-by: ganyi <[email protected]>
@wangxiyuan merged commit ce82599 into vllm-project:main Apr 3, 2025
12 checks passed