Pipeline Extraction #1279

Draft: kylesayrs wants to merge 18 commits into main

Conversation

@kylesayrs (Collaborator) commented Mar 24, 2025

Purpose

  • Extract data pipelines from modifiers to enable multiple modifiers to be active at the same time

TODO

  • Extract pipeline from quantization modifier
  • Compress any modules that did not receive samples on finalize (accounts for cases like MoEs where some experts may receive zero samples)

Changes

  • Implement data pipeline registry
    • The inferred pipeline is selected based on the active modifiers and can be overridden by the user (see the first sketch after this list)
  • Implement independent pipeline
    • This pipeline treats each modifier as a separate stage and assigns a pipeline to each modifier (see the second sketch after this list)
    • Meant to replicate current LLM Compressor behavior
  • Implement sequential_epoch_end
    • This callback is called after one sequential layer has been calibrated for one epoch
    • This callback triggers compression and replaces passing a callback_modifier
  • Implement calibration_epoch_end
    • This callback fires at the end of a calibration epoch and is used to trigger compression between the pipelines composed by the independent pipeline
    • Originally, these compression events were triggered by reaching the end of each module’s initialize function. Now a separate event is required
  • Implement session.initialize_recipe
    • This is required because modifiers otherwise cannot be created until initialize is called, and initialize cannot be called until the data pipeline has been decided
    • This change aligns with the LLM Compressor design where modifiers are created when the recipe is passed
  • Implement session.get_modifiers
    • Data pipeline inference and sequential pipeline inference need the list of active modifiers before those modifiers are initialized
    • This function gets all the active modifiers across all ModifierStages
  • Prepare smoothquant for pipeline extraction
    • Replace uses of _hf_hook.pre_forward with align_module_device for clarity
    • Specify resolved_mappings_ type hint for clarity
    • Trigger _apply_smoothing on the sequential_epoch_end and calibration_epoch_end events
    • Add a guard which allows the _apply_smoothing function to be called multiple times per session (as is required by the sequential pipeline)
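To make this concrete, here is a minimal sketch of the registry and inference idea. The PIPELINES mapping mirrors the registry quoted in the review below; infer_pipeline, the run_* stubs, and the selection rule are illustrative assumptions rather than this PR's actual code.

from typing import Callable, Dict, List, Optional

from torch.utils.data import DataLoader
from transformers import PreTrainedModel

PipelineFn = Callable[[PreTrainedModel, DataLoader], None]


def run_sequential(model: PreTrainedModel, dataloader: DataLoader) -> None: ...
def run_layer_sequential(model: PreTrainedModel, dataloader: DataLoader) -> None: ...
def run_basic(model: PreTrainedModel, dataloader: DataLoader) -> None: ...
def run_independent(model: PreTrainedModel, dataloader: DataLoader) -> None: ...


# Mirrors the registry quoted in the review below, with local stand-ins
# for the real pipeline functions.
PIPELINES: Dict[str, PipelineFn] = {
    "sequential": run_sequential,
    "layer_sequential": run_layer_sequential,
    "basic": run_basic,
    "independent": run_independent,
}


def infer_pipeline(modifiers: List[object], user_choice: Optional[str] = None) -> PipelineFn:
    # The user's explicit choice always wins over inference.
    if user_choice is not None:
        return PIPELINES[user_choice]
    # Several active modifiers: run each as its own stage.
    if len(modifiers) > 1:
        return PIPELINES["independent"]
    # Single modifier: whether it needs sequential calibration is a
    # hypothetical predicate here, not the PR's actual inference rule.
    needs_sequential = getattr(modifiers[0], "sequential_targets", None) is not None
    return PIPELINES["sequential" if needs_sequential else "basic"]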

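A second sketch, equally hypothetical, of how the independent pipeline could compose per-modifier pipelines and fire calibration_epoch_end between stages. The callback stub only marks where compression would be triggered, and infer_pipeline is assumed to be a helper like the one sketched above.

from typing import Callable, List

from torch.utils.data import DataLoader
from transformers import PreTrainedModel

PipelineFn = Callable[[PreTrainedModel, DataLoader], None]


def calibration_epoch_end() -> None:
    # Stand-in for the new callback: compression for the current stage
    # would be triggered here.
    pass


def run_independent(
    model: PreTrainedModel,
    dataloader: DataLoader,
    modifiers: List[object],
    infer_pipeline: Callable[[List[object]], PipelineFn],
) -> None:
    # Treat each modifier as its own stage with its own inferred pipeline,
    # replicating the previous one-modifier-at-a-time behavior.
    for modifier in modifiers:
        stage_pipeline = infer_pipeline([modifier])
        stage_pipeline(model, dataloader)
        # Fire the callback so the stage is compressed before the next
        # modifier starts calibrating.
        calibration_epoch_end()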
Testing

  • Quantized llama3-8b using both the independent (basic + sequential) and sequential pipelines (see the usage sketch below)
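A hypothetical usage sketch of selecting a pipeline explicitly; the import path and in particular the pipeline argument name are assumptions, not confirmed by this PR.

from llmcompressor import oneshot  # import path may differ by version

# Hypothetical invocation; pipeline= is an assumed override knob.
oneshot(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    recipe="recipe.yaml",               # e.g. a quantization recipe
    dataset="ultrachat_200k",
    max_seq_length=2048,
    num_calibration_samples=512,
    pipeline="independent",             # or "sequential"
)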

kylesayrs changed the title from [WIP] Shared Pipeline Extraction to [WIP] Pipeline Extraction on Mar 25, 2025
vllm-project deleted a comment from github-actions bot on Mar 25, 2025
kylesayrs changed the title from [WIP] Pipeline Extraction to Pipeline Extraction on Mar 25, 2025
kylesayrs marked this pull request as ready for review on March 25, 2025 at 04:43
@brian-dellabetta (Collaborator) left a comment


Definitely looks cleaner this way! Leaving comments rather than approving, as I am still getting up to speed with pipelines

Comment on lines +15 to +20
PIPELINES: Dict[str, PipelineFn] = {
"sequential": sequential.run_pipeline,
"layer_sequential": layer_sequential.run_pipeline,
"basic": basic.run_pipeline,
"independent": independent.run_pipeline,
}
@brian-dellabetta (Collaborator) commented Mar 26, 2025


If we make pipelines a class satisfying an abstract Pipeline base class with

@abstractmethod
def run_pipeline(model: PreTrainedModel, dataloader: DataLoader): ...

would it avoid needing maps like this or types like

PipelineFn = Callable[[PreTrainedModel, torch.utils.data.DataLoader], None]

in the typing.py file?
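A minimal sketch of what such a Pipeline base class could look like (hypothetical, not code from this PR):

from abc import ABC, abstractmethod

from torch.utils.data import DataLoader
from transformers import PreTrainedModel


class Pipeline(ABC):
    # Sketch of the suggested base class; each concrete pipeline would
    # subclass this instead of registering a bare function.
    @abstractmethod
    def run_pipeline(self, model: PreTrainedModel, dataloader: DataLoader) -> None:
        ...


class BasicPipeline(Pipeline):
    def run_pipeline(self, model: PreTrainedModel, dataloader: DataLoader) -> None:
        ...  # placeholder: a plain forward pass over the calibration data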

@kylesayrs (Collaborator, Author) commented Mar 26, 2025


I purposefully avoided adding a base class, since I think it adds more infrastructure than is required. Such a class would only have one method, which imho doesn't justify a class definition.

kylesayrs added the ready (When a PR is ready for review) label on Mar 27, 2025
@brian-dellabetta (Collaborator) left a comment


I know you're looking for feedback on this, but I'm not sure I understand it enough to approve. I do like the removal of all the try/catch code in GPTQ. Maybe we can have a deep dive session on this next week?

dsikka pushed a commit that referenced this pull request Apr 1, 2025
## Purpose ##
* Revert the behavior regression introduced as a result of #1114
* When calibrating a model using the `QuantizationModifier`, quantization
should remain enabled during calibration

## Changes ##
* Remove "disabling quantization" from the calibration forward pass
* Add "disabling quantization" to the sequential pipelines in order to
continue to disable quantization during calibration for GPTQ and SGPT
* When [calibration pipelines become shared between modifiers](#1279),
the decision of whether to disable quantization during calibration
will have to be moved to the calibration pipelines themselves. Some work
needs to be done to demonstrate that GPTQ and SGPT do not suffer
accuracy regression from enabling activation quantization during
calibration (in theory, the change should increase accuracy)

---------

Signed-off-by: Kyle Sayers <[email protected]>
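A minimal sketch of the direction described in that commit message, with the disable-quantization decision living in the calibration pipeline itself; disable_quantization here is a hypothetical stand-in rather than the project's actual utility.

from contextlib import contextmanager

import torch
from torch.utils.data import DataLoader
from transformers import PreTrainedModel


@contextmanager
def disable_quantization(model: PreTrainedModel):
    # Hypothetical stand-in: temporarily turn off fake quantization on the
    # model's quantized modules, restoring it afterwards.
    try:
        yield
    finally:
        pass


def run_sequential(model: PreTrainedModel, dataloader: DataLoader) -> None:
    # The pipeline, rather than the modifier, decides that GPTQ/SGPT
    # calibration runs with quantization disabled; other pipelines could
    # leave it enabled.
    with torch.no_grad(), disable_quantization(model):
        for batch in dataloader:
            model(**batch)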
kylesayrs removed the ready (When a PR is ready for review) label on Apr 2, 2025
kylesayrs marked this pull request as a draft on April 2, 2025 at 05:53