Pipeline Extraction #1279

Draft: kylesayrs wants to merge 18 commits into main

Conversation

@kylesayrs (Collaborator) commented Mar 24, 2025

Purpose

  • Extract data pipelines from modifiers to enable multiple modifiers to be active at the same time

TODO

  • Extract pipeline from quantization modifier
  • Compress any modules that did not receive samples on finalize (accounts for cases like MoEs where some experts may receive zero samples)

Changes

  • Implement data pipeline registry
    • The inferred pipeline is selected based on the active modifiers and can be overridden by the user (see the first sketch after this list)
  • Implement independent pipeline
    • This pipeline treats each modifier as a separate stage and assigns a pipeline to each modifier (see the second sketch after this list)
    • Meant to replicate current LLM Compressor behavior
  • Implement sequential_epoch_end
    • This callback is called after one sequential layer has been calibrated for one epoch
    • This callback triggers compression and replaces passing a callback_modifier
  • Implement calibration_epoch_end
    • This callback fires at the end of a calibration epoch and is used to trigger compression between the pipelines composed by the independent pipeline
    • Originally, these compression events were triggered by reaching the end of each module’s initialize function. Now a separate event is required
  • Implement session.initialize_recipe
    • This is required because modifiers otherwise cannot be created until initialize is called, and initialize cannot be called until the data pipeline has been decided
    • This change aligns with the LLM Compressor design where modifiers are created when the recipe is passed
  • Implement session.get_modifiers
    • Data pipeline inference and sequential pipeline inference need the list of active modifiers before those modifiers are initialized
    • This function gets all the active modifiers across all ModifierStages
  • Prepare smoothquant for pipeline extraction
    • Replace uses of _hf_hook.pre_forward with align_module_device for clarity
    • Specify resolved_mappings_ type hint for clarity
    • Trigger _apply_smoothing on the sequential_epoch_end and calibration_epoch_end events
    • Add a guard which allows the _apply_smoothing function to be called multiple times per session (as is required by the sequential pipeline)
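To make this concrete, here is a minimal sketch of the registry and inference idea. The PIPELINES mapping mirrors the registry quoted in the review below; infer_pipeline, the run_* stubs, and the selection rule are illustrative assumptions rather than this PR's actual code.

from typing import Callable, Dict, List, Optional

from torch.utils.data import DataLoader
from transformers import PreTrainedModel

PipelineFn = Callable[[PreTrainedModel, DataLoader], None]


def run_sequential(model: PreTrainedModel, dataloader: DataLoader) -> None: ...
def run_layer_sequential(model: PreTrainedModel, dataloader: DataLoader) -> None: ...
def run_basic(model: PreTrainedModel, dataloader: DataLoader) -> None: ...
def run_independent(model: PreTrainedModel, dataloader: DataLoader) -> None: ...


# Mirrors the registry quoted in the review below, with local stand-ins
# for the real pipeline functions.
PIPELINES: Dict[str, PipelineFn] = {
    "sequential": run_sequential,
    "layer_sequential": run_layer_sequential,
    "basic": run_basic,
    "independent": run_independent,
}


def infer_pipeline(modifiers: List[object], user_choice: Optional[str] = None) -> PipelineFn:
    # The user's explicit choice always wins over inference.
    if user_choice is not None:
        return PIPELINES[user_choice]
    # Several active modifiers: run each as its own stage.
    if len(modifiers) > 1:
        return PIPELINES["independent"]
    # Single modifier: whether it needs sequential calibration is a
    # hypothetical predicate here, not the PR's actual inference rule.
    needs_sequential = getattr(modifiers[0], "sequential_targets", None) is not None
    return PIPELINES["sequential" if needs_sequential else "basic"]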

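A second sketch, equally hypothetical, of how the independent pipeline could compose per-modifier pipelines and fire calibration_epoch_end between stages. The callback stub only marks where compression would be triggered, and infer_pipeline is assumed to be a helper like the one sketched above.

from typing import Callable, List

from torch.utils.data import DataLoader
from transformers import PreTrainedModel

PipelineFn = Callable[[PreTrainedModel, DataLoader], None]


def calibration_epoch_end() -> None:
    # Stand-in for the new callback: compression for the current stage
    # would be triggered here.
    pass


def run_independent(
    model: PreTrainedModel,
    dataloader: DataLoader,
    modifiers: List[object],
    infer_pipeline: Callable[[List[object]], PipelineFn],
) -> None:
    # Treat each modifier as its own stage with its own inferred pipeline,
    # replicating the previous one-modifier-at-a-time behavior.
    for modifier in modifiers:
        stage_pipeline = infer_pipeline([modifier])
        stage_pipeline(model, dataloader)
        # Fire the callback so the stage is compressed before the next
        # modifier starts calibrating.
        calibration_epoch_end()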
Testing

  • Quantized llama3-8b using both the independent (basic + sequential) and sequential pipelines (see the usage sketch below)
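A hypothetical usage sketch of selecting a pipeline explicitly; the import path and in particular the pipeline argument name are assumptions, not confirmed by this PR.

from llmcompressor import oneshot  # import path may differ by version

# Hypothetical invocation; pipeline= is an assumed override knob.
oneshot(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    recipe="recipe.yaml",               # e.g. a quantization recipe
    dataset="ultrachat_200k",
    max_seq_length=2048,
    num_calibration_samples=512,
    pipeline="independent",             # or "sequential"
)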

kylesayrs changed the title from [WIP] Shared Pipeline Extraction to [WIP] Pipeline Extraction on Mar 25, 2025
vllm-project deleted a comment from github-actions bot on Mar 25, 2025
kylesayrs changed the title from [WIP] Pipeline Extraction to Pipeline Extraction on Mar 25, 2025
kylesayrs marked this pull request as ready for review on March 25, 2025 at 04:43
@brian-dellabetta (Collaborator) left a comment


Definitely looks cleaner this way! Leaving comments rather than approving, as I am still getting up to speed with pipelines

Comment on lines +15 to +20
PIPELINES: Dict[str, PipelineFn] = {
"sequential": sequential.run_pipeline,
"layer_sequential": layer_sequential.run_pipeline,
"basic": basic.run_pipeline,
"independent": independent.run_pipeline,
}
@brian-dellabetta (Collaborator) commented Mar 26, 2025


If we make pipelines a class satisfying an abstract Pipeline base class with

@abstractmethod
def run_pipeline(model: PreTrainedModel, dataloader: DataLoader): ...

would it avoid needing maps like this or types like

PipelineFn = Callable[[PreTrainedModel, torch.utils.data.DataLoader], None]

in the typing.py file?
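A minimal sketch of what such a Pipeline base class could look like (hypothetical, not code from this PR):

from abc import ABC, abstractmethod

from torch.utils.data import DataLoader
from transformers import PreTrainedModel


class Pipeline(ABC):
    # Sketch of the suggested base class; each concrete pipeline would
    # subclass this instead of registering a bare function.
    @abstractmethod
    def run_pipeline(self, model: PreTrainedModel, dataloader: DataLoader) -> None:
        ...


class BasicPipeline(Pipeline):
    def run_pipeline(self, model: PreTrainedModel, dataloader: DataLoader) -> None:
        ...  # placeholder: a plain forward pass over the calibration data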

@kylesayrs (Collaborator, Author) commented Mar 26, 2025


I purposefully avoided adding a base class, since I think it adds more infrastructure than is required. Such a class would only have one method, which imho doesn't justify a class definition.

kylesayrs added the ready (When a PR is ready for review) label on Mar 27, 2025
@brian-dellabetta (Collaborator) left a comment


I know you're looking for feedback on this, but I'm not sure I understand it enough to approve. I do like the removal of all the try/catch code in GPTQ. Maybe we can have a deep dive session on this next week?

dsikka pushed a commit that referenced this pull request Apr 1, 2025
## Purpose ##
* Revert the behavior regression introduced as a result of #1114
* When calibrating a model using the `QuantizationModifier`, quantization
should remain enabled during calibration

## Changes ##
* Remove "disabling quantization" from the calibration forward pass
* Add "disabling quantization" to the sequential pipelines in order to
continue to disable quantization during calibration for GPTQ and SGPT
* When [calibration pipelines become shared between modifiers](#1279),
the decision of whether to disable quantization during calibration
will have to be moved to the calibration pipelines themselves. Some work
needs to be done to demonstrate that GPTQ and SGPT do not suffer
accuracy regression from enabling activation quantization during
calibration (in theory, the change should increase accuracy)

---------

Signed-off-by: Kyle Sayers <[email protected]>
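A minimal sketch of the direction described in that commit message, with the disable-quantization decision living in the calibration pipeline itself; disable_quantization here is a hypothetical stand-in rather than the project's actual utility.

from contextlib import contextmanager

import torch
from torch.utils.data import DataLoader
from transformers import PreTrainedModel


@contextmanager
def disable_quantization(model: PreTrainedModel):
    # Hypothetical stand-in: temporarily turn off fake quantization on the
    # model's quantized modules, restoring it afterwards.
    try:
        yield
    finally:
        pass


def run_sequential(model: PreTrainedModel, dataloader: DataLoader) -> None:
    # The pipeline, rather than the modifier, decides that GPTQ/SGPT
    # calibration runs with quantization disabled; other pipelines could
    # leave it enabled.
    with torch.no_grad(), disable_quantization(model):
        for batch in dataloader:
            model(**batch)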
kylesayrs removed the ready (When a PR is ready for review) label on Apr 2, 2025
kylesayrs marked this pull request as a draft on April 2, 2025 at 05:53