Aligning modeling code for GPT2 to work with vLLM (fallback) #36934
base: main
Conversation
Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. When it is ready for review, please click the Ready for review button.
base_model_prefix = "model" # vllm | ||
# base_model_prefix = "transformer" # transformers |
This is done for weight loading.
We need the prefix set for different platforms.
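For illustration only, here is a minimal sketch of the kind of prefix-dependent key remapping that weight loading involves; remap_keys, the prefix values, and the example key are assumptions made for this sketch, not the actual loader code:

```python
def remap_keys(state_dict: dict, old_prefix: str, new_prefix: str) -> dict:
    """Hypothetical helper: move checkpoint keys from one base-model prefix to another."""
    return {
        (new_prefix + key[len(old_prefix):] if key.startswith(old_prefix) else key): value
        for key, value in state_dict.items()
    }

# Transformers GPT2 checkpoints carry "transformer."-prefixed keys, while the vLLM
# wrapper looks for the base model under "model." (assumption based on the diff above).
weights = {"transformer.h.0.attn.c_attn.weight": 0.0}
print(remap_keys(weights, "transformer.", "model."))
# {'model.h.0.attn.c_attn.weight': 0.0}
```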
I'm planning to investigate this on the vLLM side to see if we can remove this requirement
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
nice thanks!
Co-authored-by: Arthur <[email protected]>
Just make sure CI is green!
@ArthurZucker I don't think the CI errors stem from the modeling changes. Do you want me to investigate further?
I could make this optional so that a model which does not have it simply doesn't support TP, rather than not working at all?
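A rough sketch of what making it optional could look like on the vLLM side; resolve_tp_plan and its arguments are hypothetical names for this sketch, not vLLM's actual API:

```python
def resolve_tp_plan(hf_model, tensor_parallel_size: int) -> dict:
    """Hypothetical check: return the model's TP plan, or refuse TP if it has none."""
    tp_plan = getattr(hf_model, "base_model_tp_plan", None)
    if tp_plan is None and tensor_parallel_size > 1:
        raise ValueError(
            "Model defines no base_model_tp_plan, so tensor parallelism is not "
            "supported; run with tensor_parallel_size=1."
        )
    # An empty dict is fine: the attribute exists, there is simply nothing to shard yet.
    return tp_plan or {}
```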
Co-authored-by: Harry Mellor <[email protected]>
@ArthurZucker would it be okay to merge? The CI is broken for issues not related to the PR it seems 😅
This PR changes the modeling code for GPT2 to support working on vLLM using the transformers fallback backend. The changes are as follows:

- Added kwargs, which is used to propagate information about attention_indices in vLLM
- Added base_model_tp_plan, which is currently empty; this is a required attribute for vLLM
- Changes to attn_outputs (took help from the llama code)

A toy illustration of these changes is sketched after this list. This PR is dependent on vllm-project/vllm#15290 on the vLLM side to work.
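The sketch below only illustrates the two ideas listed above (a class-level TP plan attribute and kwargs plumbed through the forward pass); ToyGPT2Block and its shapes are invented for the example and are not the actual GPT2 diff:

```python
import torch
from torch import nn

class ToyGPT2Block(nn.Module):
    # The attribute has to exist for vLLM even while the plan itself is still empty.
    base_model_tp_plan: dict = {}

    def __init__(self, hidden_size: int = 8):
        super().__init__()
        self.attn = nn.Linear(hidden_size, hidden_size)

    def forward(self, hidden_states: torch.Tensor, **kwargs) -> torch.Tensor:
        # **kwargs is unused in this toy, but in the real model it lets backend-specific
        # information (e.g. attention indices from vLLM) flow down to the attention call.
        return self.attn(hidden_states)

block = ToyGPT2Block()
out = block(torch.randn(1, 4, 8), attention_indices=None)  # extra kwargs are tolerated
print(out.shape)
```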
One can use the following snippet to check the implementation:
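The original snippet is not reproduced here; a minimal sketch of such a check, assuming vLLM's model_impl="transformers" option selects the fallback backend, might look like this:

```python
from vllm import LLM, SamplingParams

# model_impl="transformers" is assumed here to force the transformers fallback backend.
llm = LLM(model="gpt2", model_impl="transformers")

outputs = llm.generate(
    ["The capital of France is"],
    SamplingParams(temperature=0.0, max_tokens=16),
)
for output in outputs:
    print(output.outputs[0].text)
```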