* PUSH NOTE : PyTorch Functionalization.md
* PUSH NOTE : Ahead-of-Time (AOT) Compilation.md
* PUSH NOTE : PyTorch Quantization for TensorRT.md
* PUSH NOTE : Edward Z. Yang.md
* PUSH NOTE : Let's talk about the Python Dispatcher.md
* PUSH NOTE : PyTorch - Functionalization in PyTorch - Everything you need to know.md
* PUSH NOTE : PyTorch - ExecuTorch - Export IR Specification.md
* PUSH NOTE : PyTorch Compilers - What makes PyTorch beloved makes it hard to compile.md
* PUSH ATTACHMENT : Pasted image 20240926160205.png
* PUSH NOTE : PyTorch Eager Mode Quantization TensorRT Acceleration.md
* PUSH NOTE : PyTorch - Quantization.md
* PUSH NOTE : PyTorch - PyTorch 2 Export Post Training Quantization.md
* PUSH NOTE : PyTorch - ExecuTorch - Quantization Overview.md
* PUSH ATTACHMENT : Pasted image 20240925193351.png
* PUSH NOTE : PyTorch - ExecuTorch - How ExecuTorch works?.md
* PUSH NOTE : PyTorch Conference 2024 - What’s new in torch.export?.md
* PUSH NOTE : Reinforcement Learning - An Introduction - Chapter 9.md
Showing 17 changed files with 303 additions and 2 deletions.
Ahead-of-Time (AOT) Compilation.md
---
tags:
- compilers
- pytorch
- optimization
share: true
---
Generally: compilation that occurs before the program is executed.

Specifically in ML (PyTorch):
- When a model is AOT-compiled (using `torch.jit.script`/`torch.jit.trace` or `torch.export`), the entire program is translated from Python into an intermediate representation that is independent of Python. That is, you don't need a Python interpreter to run that IR (a minimal sketch follows below).
- Note: TorchScript is AOT in the sense that it requires capturing the whole graph before runtime, but it performs further optimizations just-in-time.
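A minimal sketch of both capture paths, assuming a toy module (the model and input shapes here are illustrative, not from the note):

```python
import torch

class ToyModel(torch.nn.Module):
    def forward(self, x):
        return torch.relu(x) + 1.0

model = ToyModel().eval()
example_inputs = (torch.randn(2, 3),)

# torch.export: whole-program capture into an ExportedProgram holding an
# ATen-level FX graph; the IR no longer depends on the Python function body.
exported = torch.export.export(model, example_inputs)
print(exported.graph_module.graph)

# TorchScript: also captures the whole program ahead of time, but the TS
# interpreter applies further optimizations just-in-time when it runs.
scripted = torch.jit.script(model)
```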
PyTorch Functionalization.md
---
tags:
- pytorch
- compilers
share: true
---
Given a program/function of PyTorch operators, functionalization will return a new function that:
1. Has the same semantics as the old function
2. Has no mutations in it

Functionalization operates at the level of our ATen API.

More info on [[PyTorch - Functionalization in PyTorch - Everything you need to know|PyTorch - Functionalization in PyTorch - Everything you need to know]]
docs/000 Zettelkasten/PyTorch Quantization for TensorRT.md
---
tags:
- quantization
- efficient_dl
share: true
---

There seem to be quite a few possible ways to do this:
- [[PyTorch Eager Mode Quantization TensorRT Acceleration|PyTorch Eager Mode Quantization TensorRT Acceleration]]
    - 1. torchao quantization, 2. ONNX conversion, 3. graph surgery (changing some ops in the ONNX graph), 4. TensorRT conversion
    - Seems very cumbersome
Edward Z. Yang.md
---
affiliation:
- "[[FAIR|FAIR]]"
- "[[Stanford|Stanford]]"
- "[[MIT|MIT]]"
- "[[PyTorch|PyTorch]]"
share: true
---
Notes:
- Has a pretty cool [YouTube channel](https://www.youtube.com/@edwardzyang) where he shares (bi-weekly) PyTorch meetings
- For me, it's a nice source to get more involved with PyTorch compiler-ish libraries/tools like [[ExecuTorch|ExecuTorch]], [[torch.export|torch.export]]
- Also it is interesting to see the interaction between engineers
docs/100 Reference notes/104 Other/Let's talk about the Python Dispatcher.md
---
authors:
- "[[Edward Z. Yang|Edward Z. Yang]]"
year: 2020
tags:
- blog
url: http://blog.ezyang.com/2020/09/lets-talk-about-the-pytorch-dispatcher/
share: true
---
...100 Reference notes/104 Other/PyTorch - ExecuTorch - Export IR Specification.md
---
authors:
- "[[PyTorch - Functionalization in PyTorch - Everything you need to know|PyTorch - Functionalization in PyTorch - Everything you need to know]]"
year: 2024
tags:
- paper
url: https://pytorch.org/executorch/main/ir-exir.html
share: true
---
The Exported IR is a specification that consists of the following parts:

1. A definition of the computation graph model.
2. The set of operators allowed in the graph.

A dialect also provides further constraints meant for a specific purpose or stage in some compilation phase. Some dialects are:
- ATen dialect
- Edge dialect
- Backend dialect

ExecuTorch compilation first exports to the ATen dialect, then to the Edge dialect, and finally to the Backend dialect.

## ATen Dialect

- [[PyTorch Functionalization|PyTorch Functionalization]] is performed, removing any tensor aliases and mutations and allowing for more flexible graph transformations to be made (see the sketch below).
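A hedged sketch of the dialect progression, assuming the `executorch` Python package is installed (the module and inputs are illustrative):

```python
import torch
from executorch.exir import to_edge

class AddOne(torch.nn.Module):
    def forward(self, x):
        y = x.clone()
        y.add_(1)          # in-place mutation in the source program
        return y

# ATen dialect: torch.export captures the whole graph and functionalizes it,
# so the graph contains functional aten.add rather than mutating aten.add_.
aten_program = torch.export.export(AddOne(), (torch.randn(3),))
print(aten_program.graph)

# Edge dialect: the ATen graph with further, edge-specific constraints applied.
edge_program = to_edge(aten_program)
print(edge_program.exported_program().graph)
```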
docs/100 Reference notes/104 Other/PyTorch - ExecuTorch - How ExecuTorch works?.md
---
authors:
- "[[PyTorch Quantization for TensorRT|PyTorch Quantization for TensorRT]]"
year: 2024
tags:
- pytorch
- compilers
- efficient_dl
- documentation
url: https://pytorch.org/executorch/main/intro-how-it-works
share: true
---
# What are the steps to run a model with ExecuTorch?

## 1. Export the model

- Capture the PyTorch program as a *graph*

## 2. Compile the exported model to an ExecuTorch program

Captured Graph -> ExecuTorch program

Possible Optimizations:
- Compressing the model (e.g., quantization)
- Lowering subgraphs to on-device specialized hardware accelerators to improve latency
- Memory planning, i.e., efficiently planning the location of intermediate tensors to reduce the runtime memory footprint

## 3. Run the ExecuTorch program on a target device

- Light runtime with memory planning for fast inference :) (an end-to-end sketch of the three steps follows below)
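A minimal end-to-end sketch of the three steps, assuming the `executorch` package is available (the model, shapes, and `.pte` file name are illustrative):

```python
import torch
from executorch.exir import to_edge

class TinyNet(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(8, 2)

    def forward(self, x):
        return torch.relu(self.linear(x))

# 1. Export: capture the PyTorch program as a graph.
exported = torch.export.export(TinyNet().eval(), (torch.randn(1, 8),))

# 2. Compile: lower to an ExecuTorch program (quantization, delegation to
#    accelerators, and memory planning happen in this stage).
executorch_program = to_edge(exported).to_executorch()

# Serialize for the on-device runtime, which then loads and runs the .pte file.
with open("tiny_net.pte", "wb") as f:
    f.write(executorch_program.buffer)
```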
## Key Benefits

- Export that is robust and powerful
- Operator Standardization
- Standardization for compiler interfaces (aka delegates) and the OSS ecosystem
- First-party SDK and toolchain
- Ease of customization
- Low overhead runtime and execution
docs/100 Reference notes/104 Other/PyTorch - ExecuTorch - Quantization Overview.md
---
authors:
- "[[PyTorch Quantization for TensorRT|PyTorch Quantization for TensorRT]]"
year: 2024
tags:
- documentation
url: https://pytorch.org/executorch/main/quantization-overview.html
share: true
---

![[Pasted image 20240925193351.png|400]]

> Quantization is usually tied to execution backends that have quantized operators implemented. Thus each backend is opinionated about how the model should be quantized, expressed in a backend specific `Quantizer` class.
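A small sketch of the backend-specific `Quantizer` idea, using the XNNPACK quantizer as an example (import paths follow the PT2E prototype APIs and may move between releases):

```python
from torch.ao.quantization.quantizer.xnnpack_quantizer import (
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)

# The Quantizer encodes the backend's opinion: which ops get quantized,
# symmetric vs. asymmetric, per-tensor vs. per-channel, etc.
quantizer = XNNPACKQuantizer()
quantizer.set_global(get_symmetric_quantization_config(is_per_channel=True))
```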
...4 Other/PyTorch - Functionalization in PyTorch - Everything you need to know.md
---
authors:
- "[[Brian Hirsh|Brian Hirsh]]"
year: 2023
tags:
- documentation
url: https://dev-discuss.pytorch.org/t/functionalization-in-pytorch-everything-you-wanted-to-know/965
share: true
---
Given a program/function of PyTorch operators, functionalization will return a new function that:
1. Has the same semantics as the old function
2. Has no mutations in it

Exposed in the [functorch API](https://pytorch.org/functorch/0.2.0/generated/functorch.experimental.functionalize.html?highlight=functionalize#functorch.experimental.functionalize) (a small sketch follows below).
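A minimal sketch of the transform; recent releases also expose it as `torch.func.functionalize` (the function below is illustrative):

```python
import torch
from torch.func import functionalize
from torch.fx.experimental.proxy_tensor import make_fx

def f(x):
    y = x.clone()
    y.add_(1)      # mutation inside the function
    return y

# Same semantics, but the traced graph of the functionalized version uses the
# functional aten.add instead of the mutating aten.add_.
print(make_fx(functionalize(f))(torch.randn(3)).code)
```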
Functionalization operates at the level of our ATen API.

Why?
- Compilers don't like mutations: graph partitioning is harder if nodes have side effects, etc.

Notes:
- [[PyTorch Functionalization|PyTorch Functionalization]]
...erence notes/104 Other/PyTorch - PyTorch 2 Export Post Training Quantization.md
---
authors:
- "[[Jerry Zhang|Jerry Zhang]]"
year: 2024
tags:
- documentation
url: https://pytorch.org/tutorials/prototype/pt2e_quant_ptq.html
share: true
---
Uses `prepare_pt2e` and `convert_pt2e`.

```
float_model(Python)                        Example Input
        \                                       /
         \                                     /
—--------------------------------------------------------
|                        export                          |
—--------------------------------------------------------
                            |
            FX Graph in ATen     Backend Specific Quantizer
                            |         /
—--------------------------------------------------------
|                     prepare_pt2e                        |
—--------------------------------------------------------
                            |
                     Calibrate/Train
                            |
—--------------------------------------------------------
|                     convert_pt2e                        |
—--------------------------------------------------------
                            |
                     Quantized Model
                            |
—--------------------------------------------------------
|                       Lowering                          |
—--------------------------------------------------------
                            |
          Executorch, Inductor or <Other Backends>
```
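A hedged code sketch of the same flow (the model and the XNNPACK quantizer choice are illustrative; depending on the release, the capture step is `torch._export.capture_pre_autograd_graph` or `torch.export.export_for_training`):

```python
import torch
from torch._export import capture_pre_autograd_graph
from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e
from torch.ao.quantization.quantizer.xnnpack_quantizer import (
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)

float_model = torch.nn.Sequential(torch.nn.Linear(32, 32), torch.nn.ReLU()).eval()
example_inputs = (torch.randn(1, 32),)

# export: capture an FX graph in ATen IR
captured = capture_pre_autograd_graph(float_model, example_inputs)

# prepare_pt2e: insert observers as dictated by the backend-specific quantizer
quantizer = XNNPACKQuantizer()
quantizer.set_global(get_symmetric_quantization_config())
prepared = prepare_pt2e(captured, quantizer)

# calibrate with representative data
prepared(*example_inputs)

# convert_pt2e: produce the quantized model, ready for lowering
# (ExecuTorch, Inductor, or other backends)
quantized = convert_pt2e(prepared)
```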
docs/100 Reference notes/104 Other/PyTorch - Quantization.md
---
authors:
- "[[PyTorch Quantization for TensorRT|PyTorch Quantization for TensorRT]]"
year: 2024
tags:
- documentation
url: https://pytorch.org/docs/main/quantization.html#prototype-pytorch-2-export-quantization
share: true
---
### Backend/Hardware Support

| Hardware   | Kernel Library             | Eager Mode Quantization          | FX Graph Mode Quantization | Quantization Mode Support |
| ---------- | -------------------------- | -------------------------------- | -------------------------- | ------------------------- |
| server CPU | fbgemm/onednn              | Supported                        | Supported                  | All Supported             |
| mobile CPU | qnnpack/xnnpack            | Supported                        | Supported                  | All Supported             |
| server GPU | TensorRT (early prototype) | Not supported (requires a graph) | Supported                  | Static Quantization       |
Today, PyTorch supports the following backends for running quantized operators efficiently (a small selection sketch follows the list):

- x86 CPUs with AVX2 support or higher (without AVX2 some operations have inefficient implementations), via x86 optimized by [fbgemm](https://github.com/pytorch/FBGEMM) and [onednn](https://github.com/oneapi-src/oneDNN) (see the details at [RFC](https://github.com/pytorch/pytorch/issues/83888))
- ARM CPUs (typically found in mobile/embedded devices), via [qnnpack](https://github.com/pytorch/pytorch/tree/main/aten/src/ATen/native/quantized/cpu/qnnpack)
- (early prototype) support for NVidia GPU via [TensorRT](https://developer.nvidia.com/tensorrt) through fx2trt (to be open sourced)
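A small selection sketch, assuming eager-mode static quantization on CPU (the module and qconfig choice are illustrative):

```python
import torch
from torch import nn
from torch.ao.quantization import (
    DeQuantStub,
    QuantStub,
    convert,
    get_default_qconfig,
    prepare,
)

class M(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = QuantStub()
        self.fc = nn.Linear(4, 4)
        self.dequant = DeQuantStub()

    def forward(self, x):
        return self.dequant(self.fc(self.quant(x)))

# Pick the kernel backend: "fbgemm" for x86 server CPUs, "qnnpack" for ARM.
torch.backends.quantized.engine = "fbgemm"

m = M().eval()
m.qconfig = get_default_qconfig("fbgemm")
prepare(m, inplace=True)
m(torch.randn(1, 4))          # calibration pass
convert(m, inplace=True)
print(m(torch.randn(1, 4)))   # runs on the selected backend's quantized kernels
```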
Note:
- This is a bit old, as fx2trt is already available in [torch-tensorrt](https://pytorch.org/TensorRT/_modules/torch_tensorrt/fx/fx2trt.html). However, there
...ther/PyTorch Compilers - What makes PyTorch beloved makes it hard to compile.md
---
authors:
- "[[Peng Wu|Peng Wu]]"
year: 2022
tags:
- presentation
url: https://chips-compilers-mlsys-22.github.io/assets/slides/PyTorch%20Compilers%20(Compiler%20&%20Chips%20Symposium%202022).pdf
share: true
---
**Multiple PyTorch compilers**
- TorchScript (torch.jit.script, torch.jit.trace)
    - supports a Python subset
    - full graph capture = [[Ahead-of-Time (AOT) Compilation|Ahead-of-Time (AOT) Compilation]]
    - executed by the TS interpreter
- nnc, nvfuser
- torch.fx
- torch.package, torch.deploy
- torch-mlir
- TorchDynamo, TorchInductor
    - TorchDynamo captures partial graphs (if strict=False) and falls back to eager.
**What makes TorchDynamo graph capture sound and out-of-the-box?**
- Partial graph capture: ability to skip unwanted parts of eager
- Guarded graphs: ability to check if a captured graph is valid for execution
    - Note: basically, it inserts assertions/runtime checks to verify that the partial graph is sound at runtime; if not, it JIT-recompiles.
- Just-in-time recapture: recapture a graph if the captured graph is invalid for execution

**Dynamo workflow**
- Captures an FX Graph
- Sends the FX Graph to a compiler hook to compile (which can be another compiler like TRT or TorchScript); a tiny sketch of such a hook follows below
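A tiny sketch of the compiler-hook idea using a custom `torch.compile` backend (the backend name and function are illustrative):

```python
import torch

def inspecting_backend(gm: torch.fx.GraphModule, example_inputs):
    # The FX graph TorchDynamo captured (possibly a partial graph).
    gm.graph.print_tabular()
    # Return a callable; a real backend would compile gm here (TRT, Inductor, ...).
    return gm.forward

@torch.compile(backend=inspecting_backend)
def f(x):
    if x.sum() > 0:        # data-dependent control flow -> guard + graph break
        return torch.sin(x)
    return torch.cos(x)

f(torch.randn(8))
```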
![[Pasted image 20240926160205.png|800]]

Note: tbh this seems like an arbitrary separation, because TorchDynamo is also meant for inference (torch.export); but this is probably because this tutorial is 2 years old
...erence notes/104 Other/PyTorch Conference 2024 - What’s new in torch.export?.md
---
authors:
- "[[Avik Chaudhuri|Avik Chaudhuri]]"
year: 2024
tags:
- presentation
url: https://static.sched.com/hosted_files/pytorch2024/6b/What%E2%80%99s%20new%20in%20torch.export_.pptx.pdf?_gl=1*1s5cwnu*_gcl_au*MTk3MjgxODE5OC4xNzI3MjU4NDM2
share: true
---
## [Recap] What is torch.export and why?

- "Sound", whole-graph capture of PyTorch models (a small capture sketch follows below)
- Emits "IR": backend-agnostic
    - For easier backend-specific lowering (TRT, etc.)
    - For Python-free environments
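A small sketch of whole-graph capture with a symbolic batch dimension (the module and the dynamic-shape spec are illustrative, using the public `torch.export` API):

```python
import torch
from torch.export import Dim, export

class MLP(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(16, 4)

    def forward(self, x):
        return self.fc(x).relu()

batch = Dim("batch")
# Whole-graph capture; the batch dimension stays symbolic in the emitted IR.
ep = export(MLP(), (torch.randn(2, 16),), dynamic_shapes={"x": {0: batch}})
print(ep)   # backend-agnostic ExportedProgram, ready for backend-specific lowering
```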
## Composable APIs
- Useful: torch.export.export_for_inference
...erence notes/104 Other/PyTorch Eager Mode Quantization TensorRT Acceleration.md
---
authors:
- "[[Lei Mao|Lei Mao]]"
year: 2024
tags:
- website
- paper
url: https://leimao.github.io/blog/PyTorch-Eager-Mode-Quantization-TensorRT-Acceleration/
share: true
---

> [!tldr] Abstract
> The TensorRT acceleration for the quantized PyTorch model from the PyTorch eager mode quantization interface involves three steps:
>
> 1. Perform PyTorch eager mode quantization on the floating-point PyTorch model in PyTorch and export the quantized PyTorch model to ONNX.
> 2. Fix the quantized ONNX model graph so that it can be parsed by the TensorRT parser.
> 3. Build the quantized ONNX model to a TensorRT engine, profile the performance, and verify the accuracy.
>
> The source code for this post can be found on [GitHub](https://leimao.github.io/blog/PyTorch-Eager-Mode-Quantization-TensorRT-Acceleration/#:~:text=be%20found%20on-,GitHub,-.).