[V1] AsyncLLM data parallel #13923
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs will not trigger a full CI run by default; they only run a reduced set of checks. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀 …
Signed-off-by: Nick Hill <[email protected]>
How do we test this? I mean, how do we run the server? I think we need two commands, right?
This pull request has merge conflicts that must be resolved before it can be merged.
the DP-related part looks good to me.
cc @robertgshaw2-redhat I'm not familiar with the frontend processing part, maybe Robert can take a look?
@v-lmn No, for a single node you can run a single command, with …
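For anyone trying this out, here is a rough, unverified sketch of what a single-process launch could look like through the Python API; the data_parallel_size engine argument and the exact AsyncLLM entry points are assumptions based on this PR's description rather than confirmed usage. The equivalent server launch would presumably just add the corresponding data-parallel flag to a single serve command.

```python
import asyncio

from vllm import SamplingParams
from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.v1.engine.async_llm import AsyncLLM


async def main() -> None:
    # Hypothetical single-node launch: one client process fronting
    # data_parallel_size engine core processes (one per DP rank).
    engine_args = AsyncEngineArgs(
        model="meta-llama/Llama-3.2-1B-Instruct",  # placeholder model
        data_parallel_size=2,  # assumed engine arg for this PR
        tensor_parallel_size=1,
    )
    engine = AsyncLLM.from_engine_args(engine_args)

    params = SamplingParams(max_tokens=32)
    async for output in engine.generate("Hello, world", params,
                                        request_id="req-0"):
        if output.finished:
            print(output.outputs[0].text)


asyncio.run(main())
```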
…gine
# Conflicts:
#   vllm/v1/core/scheduler.py
#   vllm/v1/engine/core.py
#   vllm/v1/engine/core_client.py
Signed-off-by: Nick Hill <[email protected]>
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: Nick Hill <[email protected]>
tests/v1/test_async_llm_dp.py (outdated)
if not current_platform.is_cuda():
    pytest.skip(reason="V1 currently only supported on CUDA.",
                allow_module_level=True)
Not sure if DP works on TPU or AMD GPUs, but should we modify this reason string, since V1 works there at least experimentally?
Lines 1669 to 1675 in d0cfec7
# No support for device type other than CUDA, AMD (experimental) or
# TPU (experimental) so far.
if not (current_platform.is_cuda_alike() or current_platform.is_tpu()):
    _raise_or_fallback(
        feature_name=f"device type={current_platform.device_type}",
        recommend_to_remove=False)
    return False
We could actually use supports_v1 now that this PR has landed (probably only want to turn tests on for CUDA and ROCm though).
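For illustration, a minimal sketch of what the relaxed module-level skip could look like if the tests are only turned on for CUDA/ROCm as suggested (the reason string wording is a placeholder, not the PR's final code):

```python
import pytest

from vllm.platforms import current_platform

# Only exercise the DP tests on CUDA/ROCm for now; other platforms
# haven't been validated with data parallelism yet.
if not current_platform.is_cuda_alike():
    pytest.skip(reason="DP tests currently only run on CUDA/ROCm.",
                allow_module_level=True)
```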
tests/v1/test_async_llm_dp.py (outdated)
@pytest.mark.asyncio
async def test_load(monkeypatch, output_kind: RequestOutputKind):
    with monkeypatch.context() as m, ExitStack() as after:
        m.setenv("VLLM_USE_V1", "1")
Remove this now that V1 is on by default?
            outputs=outputs,
            scheduler_stats=self.make_stats(),
        )
        if self.include_finished_set:
+1
This pull request has merge conflicts that must be resolved before it can be merged.
Left a few minor comments. The PR still looks very good, and it would be great to get it landed soon.
Signed-off-by: Nick Hill <[email protected]>
# Conflicts:
#   examples/offline_inference/data_parallel.py
Signed-off-by: Nick Hill <[email protected]>
Thanks @tlrmchlsmth! Have addressed those comments. Also had to make some additional adjustments to ensure compatibility with @youkaichao's offline multi-node scenario added in #15484.
Signed-off-by: Nick Hill <[email protected]>
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: Nick Hill <[email protected]>
# Conflicts:
#   vllm/v1/core/sched/scheduler.py
local_dp_rank = vllm_config.parallel_config.data_parallel_rank_local

assert dp_size > 1
assert 0 <= local_dp_rank <= dp_rank < dp_size
why do we need this check?
It's not strictly needed, I just thought it might be good here to verify that the config is in a coherent state.
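As a concrete illustration of the invariant being asserted (the numbers below are made up): with 4 DP ranks spread over 2 nodes, global rank 3 has local rank 1, so 0 <= 1 <= 3 < 4 holds.

```python
# Illustrative only: relation between local and global DP ranks on a
# hypothetical 2-node deployment with 2 DP ranks per node (dp_size == 4).
dp_size = 4
ranks_per_node = 2
for dp_rank in range(dp_size):
    local_dp_rank = dp_rank % ranks_per_node
    assert 0 <= local_dp_rank <= dp_rank < dp_size  # the asserted invariant
```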
from vllm.platforms import current_platform
if current_platform.is_cuda_alike():
    from vllm.platforms.cuda import device_id_to_physical_device_id
    tp_size = vllm_config.parallel_config.tensor_parallel_size
You can use world_size to be general, not just tp_size.
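A sketch of the suggested generalization; the helper name and the device-offset arithmetic are hypothetical, but it shows world_size (TP x PP) replacing tensor_parallel_size when sizing a DP rank's device slice:

```python
from vllm.platforms import current_platform


def dp_rank_device_ids(vllm_config, local_dp_rank: int) -> list[int]:
    """Hypothetical helper: physical device ids owned by one DP rank,
    sized by world_size (TP x PP) rather than tensor_parallel_size alone."""
    world_size = vllm_config.parallel_config.world_size
    first = local_dp_rank * world_size
    if current_platform.is_cuda_alike():
        from vllm.platforms.cuda import device_id_to_physical_device_id
        return [device_id_to_physical_device_id(first + i)
                for i in range(world_size)]
    return list(range(first, first + world_size))
```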
Signed-off-by: Nick Hill <[email protected]>
Signed-off-by: Nick Hill <[email protected]> Signed-off-by: Kyle Sayers <[email protected]>
Signed-off-by: Nick Hill <[email protected]> Signed-off-by: xinyuxiao <[email protected]>
Signed-off-by: Nick Hill <[email protected]> Signed-off-by: Louis Ulmer <[email protected]>
Signed-off-by: Nick Hill <[email protected]>
The engine core client starts an engine core proc per DP rank and load-balances requests between them. A dummy request is sent to idle ranks when the global request count goes from 0 to 1, and when each engine finishes all of its requests it continues in an idle forward loop.
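A highly simplified, illustrative sketch of that behavior (the class names, the round-robin policy standing in for the real load balancing, and the counters are all made up for illustration):

```python
import itertools
from dataclasses import dataclass


@dataclass
class CoreEngineHandle:
    """Illustrative stand-in for one engine core process (one per DP rank)."""
    dp_rank: int
    num_requests: int = 0

    def send(self, request) -> None:
        # A real implementation would push the request over IPC to the
        # engine core process; here we just count it.
        self.num_requests += 1

    def send_dummy(self) -> None:
        # Nudge an idle rank so it joins the synchronized forward loop.
        pass


class DPClientSketch:
    """Toy model of the client: spread requests over DP ranks and send
    dummy requests to idle ranks when the global count goes 0 -> 1."""

    def __init__(self, dp_size: int):
        self.engines = [CoreEngineHandle(i) for i in range(dp_size)]
        self._next = itertools.cycle(self.engines)  # round-robin stand-in
        self.global_reqs = 0

    def add_request(self, request) -> None:
        target = next(self._next)
        if self.global_reqs == 0:
            # Global request count is about to go from 0 to 1: wake the
            # other, idle ranks so every engine steps its forward loop.
            for engine in self.engines:
                if engine is not target:
                    engine.send_dummy()
        target.send(request)
        self.global_reqs += 1
```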
Working for single node:
I aimed to keep the data-parallel logic as isolated as possible (in subclasses of the core engine and client) to avoid adding complexity/overhead to the more common default dp=1 case.
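One way to picture that isolation (class names are illustrative; the actual classes in the PR may be named and structured differently):

```python
class EngineCoreSketch:
    """Default dp=1 path: no DP coordination and no added overhead."""

    def step(self) -> None:
        ...  # schedule and execute one batch


class DPEngineCoreSketch(EngineCoreSketch):
    """All DP-specific behavior lives only in this subclass."""

    def step(self) -> None:
        super().step()
        self._maybe_idle_forward()  # stay in step with the other DP ranks

    def _maybe_idle_forward(self) -> None:
        # Run a dummy forward pass when this rank has no requests but
        # other ranks are still busy.
        ...
```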
Follow-on after this PR: