Commit af35d3a

[TPU][V1][Bugfix] Fix chunked prefill with padding (vllm-project#15037)
Signed-off-by: NickLucche <[email protected]>
1 parent 3b45714 commit af35d3a

1 file changed: +3 -0 lines changed

vllm/v1/worker/tpu_model_runner.py

@@ -410,6 +410,9 @@ def _prepare_inputs(self, scheduler_output: "SchedulerOutput"):
         # Do the padding and copy the tensors to the TPU.
         padded_total_num_scheduled_tokens = _get_padded_token_len(
             total_num_scheduled_tokens)
+        # Zero out to avoid spurious values from prev iteration (last cp chunk)
+        self.input_ids_cpu[
+            total_num_scheduled_tokens:padded_total_num_scheduled_tokens] = 0
         self.input_ids = self.input_ids_cpu[:
                                             padded_total_num_scheduled_tokens].to(
                                                 self.device)
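
The change above can be illustrated with a minimal sketch of why a reused, padded input buffer needs its padding region cleared. It uses plain Python lists rather than the tensors in tpu_model_runner.py. The names padded_len and prepare_inputs are hypothetical, and the power-of-two padding rule with a minimum of 16 is an assumption made for illustration, not necessarily what _get_padded_token_len does.

```python
def padded_len(n: int, minimum: int = 16) -> int:
    """Hypothetical stand-in for _get_padded_token_len.

    Assumption: pad to the next power of two, with a floor of `minimum`.
    """
    size = minimum
    while size < n:
        size *= 2
    return size


def prepare_inputs(buffer: list[int], token_ids: list[int],
                   zero_padding: bool) -> list[int]:
    """Write token_ids into a persistent buffer and return the padded slice."""
    n = len(token_ids)
    padded = padded_len(n)
    # Only the first n slots are overwritten on each iteration.
    buffer[:n] = token_ids
    if zero_padding:
        # Mirrors the fix: clear the slots between n and the padded length.
        buffer[n:padded] = [0] * (padded - n)
    return buffer[:padded]


# A persistent buffer reused across iterations, like input_ids_cpu.
buffer = [0] * 64

# Iteration 1: a large chunk of 20 tokens, padded to 32.
first = prepare_inputs(buffer, list(range(100, 120)), zero_padding=False)

# Iteration 2: a short final chunk of 5 tokens, padded to 16.
# Without zeroing, slots 5..15 still hold token ids from iteration 1.
stale = prepare_inputs(buffer, [7] * 5, zero_padding=False)

# With zeroing, the padding region is clean.
clean = prepare_inputs(buffer, [7] * 5, zero_padding=True)

print(stale[5:])
print(clean[5:])
```

In this sketch the second call without zeroing returns leftover ids from the earlier, longer chunk in its padding slots, which is the kind of spurious value the commit's comment refers to. The zeroed variant returns a padding region of all zeros.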
