
Aligning modeling code for GPT2 to work with vLLM (fallback) #36934

Open
wants to merge 19 commits into main
Conversation

ariG23498 (Contributor)

This PR changes the modeling code for GPT2 so that it can run on vLLM via the transformers fallback backend.

The changes are as follows (a minimal sketch of them is given after the list):

  • Introduction of kwargs, which are used to propagate information about attention_indices in vLLM
  • Adding a base_model_tp_plan, which is currently empty; this is a required attribute for vLLM
  • Changing the reshape of the attn_outputs (adapted from the Llama modeling code)
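
For intuition, here is a minimal, hypothetical sketch of the shape of these changes. ToyAttention and ToyModel are illustrative stand-ins, not the actual GPT2 classes, and the real attention math is elided; the point is only how kwargs flows down to the attention layer, where the empty base_model_tp_plan lives, and the Llama-style reshape.

import torch
from torch import nn

class ToyAttention(nn.Module):
    # Stand-in for GPT2Attention: **kwargs lets a backend (e.g. vLLM) inject
    # extra data such as attention indices without changing the signature.
    def forward(self, hidden_states, **kwargs):
        input_shape = hidden_states.shape[:-1]
        attn_output = hidden_states  # placeholder for the real attention computation
        # Llama-style reshape: flatten the trailing dims back into the hidden size.
        return attn_output.reshape(*input_shape, -1).contiguous()

class ToyModel(nn.Module):
    # Empty tensor-parallel plan; vLLM currently requires this attribute to exist.
    base_model_tp_plan = {}

    def __init__(self):
        super().__init__()
        self.attn = ToyAttention()

    def forward(self, hidden_states, **kwargs):
        # Propagate kwargs down to the attention layer untouched.
        return self.attn(hidden_states, **kwargs)

x = torch.randn(1, 4, 8)
print(ToyModel()(x, attention_indices=None).shape)  # torch.Size([1, 4, 8])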

This PR depends on vllm-project/vllm#15290 on the vLLM side in order to work.

You can use the following snippets to check the implementation (vLLM first, then a plain transformers comparison):

# 1) Generate with GPT-2 through vLLM's transformers fallback backend.
from vllm import LLM, SamplingParams

llm = LLM("openai-community/gpt2", model_impl="transformers", tensor_parallel_size=1)
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

# 2) Sanity-check the same checkpoint with plain transformers generation.
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tok("Hello there ", return_tensors="pt")
outputs = model.generate(**inputs)
print(tok.batch_decode(outputs))

github-actions bot marked this pull request as draft on March 24, 2025 at 15:55

Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. When it is ready for review, please click the Ready for review button (at the bottom of the PR page).

Comment on lines 464 to 465
base_model_prefix = "model" # vllm
# base_model_prefix = "transformer" # transformers
ariG23498 (Contributor, Author)


This is done for weight loading: the prefix needs to be set differently for the two platforms.
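
For context, assuming stock transformers behavior (not part of this PR), the prefix is visible directly in the checkpoint weight names:

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2")
print(model.base_model_prefix)          # "transformer" in stock transformers
print(next(iter(model.state_dict())))   # first weight name, e.g. "transformer.wte.weight"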

Member


I'm planning to investigate this on the vLLM side to see if we can remove this requirement.

ariG23498 marked this pull request as ready for review on March 24, 2025 at 15:56
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@ArthurZucker (Collaborator) left a comment


nice thanks!

@ArthurZucker (Collaborator)

Just make sure CI is green!

@ariG23498 (Contributor, Author)

@ArthurZucker I don't think the CI errors stem from the modeling changes. Do you want me to investigate further?

@hmellor (Member) commented Mar 24, 2025

> Adding a base_model_tp_plan, which is currently empty; this is a required attribute for vLLM

I could make this optional so that a model which does not have it simply doesn't support TP, rather than not working at all?
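
A rough sketch of that idea (resolve_tp_plan is a hypothetical helper, not actual vLLM code):

def resolve_tp_plan(model, tensor_parallel_size: int) -> dict:
    # Hypothetical: if the model defines no base_model_tp_plan, treat it as
    # "TP not supported" instead of refusing to load the model entirely.
    tp_plan = getattr(model, "base_model_tp_plan", None)
    if tp_plan is None:
        if tensor_parallel_size > 1:
            raise ValueError(
                "Model has no base_model_tp_plan; tensor_parallel_size > 1 is unsupported."
            )
        return {}
    return tp_plan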

@ariG23498 (Contributor, Author)

@ArthurZucker would it be okay to merge? The CI failures seem unrelated to this PR 😅
