[PUBLISHER] Merge #41
* PUSH NOTE : PyTorch Functionalization.md

* PUSH NOTE : Ahead-of-Time (AOT) Compilation.md

* PUSH NOTE : PyTorch Quantization for TensorRT.md

* PUSH NOTE : Edward Z. Yang.md

* PUSH NOTE : Let's talk about the Python Dispatcher.md

* PUSH NOTE : PyTorch - Functionalization in PyTorch - Everything you need to know.md

* PUSH NOTE : PyTorch - ExecuTorch - Export IR Specification.md

* PUSH NOTE : PyTorch Compilers - What makes PyTorch beloved makes it hard to compile.md

* PUSH ATTACHMENT : Pasted image 20240926160205.png

* PUSH NOTE : PyTorch Eager Mode Quantization TensorRT Acceleration.md

* PUSH NOTE : PyTorch - Quantization.md

* PUSH NOTE : PyTorch - PyTorch 2 Export Post Training Quantization.md

* PUSH NOTE : PyTorch - ExecuTorch - Quantization Overview.md

* PUSH ATTACHMENT : Pasted image 20240925193351.png

* PUSH NOTE : PyTorch - ExecuTorch - How ExecuTorch works?.md

* PUSH NOTE : PyTorch Conference 2024 - What’s new in torch.export?.md

* PUSH NOTE : Reinforcement Learning - An Introduction - Chapter 9.md
dgcnz authored Sep 28, 2024
1 parent 4765516 commit c33cfa6
Showing 17 changed files with 303 additions and 2 deletions.
12 changes: 12 additions & 0 deletions docs/000 Zettelkasten/Ahead-of-Time (AOT) Compilation.md
@@ -0,0 +1,12 @@
---
tags:
- compilers
- pytorch
- optimization
share: true
---
Generally: Compilation that occurs before the program is executed.

Specifically to ML (PyTorch):
- When a model is AOT compiled (using `torch.jit.script`/`torch.jit.trace` or `torch.export`), the entire program is translated from Python into an intermediate representation that is independent of Python. That is, you don't need a Python interpreter to run that IR (see the sketch below).
- Note: TorchScript is AOT in the sense that it requires capturing the whole graph before runtime, but it performs further optimizations just-in-time.
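A minimal sketch of the TorchScript path, assuming a toy `MyModule` (hypothetical, not from the note); the saved artifact runs without a Python interpreter, e.g. from C++ via libtorch:

```python
import torch

class MyModule(torch.nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(x) + 1

# Whole-program AOT capture into TorchScript IR
scripted = torch.jit.script(MyModule())
scripted.save("my_module.pt")  # loadable later without Python (libtorch)
```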
13 changes: 13 additions & 0 deletions docs/000 Zettelkasten/PyTorch Functionalization.md
@@ -0,0 +1,13 @@
---
tags:
- pytorch
- compilers
share: true
---
Given a program/function of PyTorch operators, functionalization will return a new function that:
1. Has the same semantics as the old function
2. Has no mutations in it

Functionalization operates at the level of our ATen API.

More info on [[PyTorch - Functionalization in PyTorch - Everything you need to know|PyTorch - Functionalization in PyTorch - Everything you need to know]]
12 changes: 12 additions & 0 deletions docs/000 Zettelkasten/PyTorch Quantization for TensorRT.md
@@ -0,0 +1,12 @@
---
tags:
- quantization
- efficient_dl
share: true
---

There seem to be quite a few possible ways to do this:
- [[PyTorch Eager Mode Quantization TensorRT Acceleration|PyTorch Eager Mode Quantization TensorRT Acceleration]]
	1. torchao quantization
	2. ONNX conversion
	3. Graph surgery (changing some ops in the ONNX graph)
	4. TensorRT conversion
	- Seems very cumbersome
12 changes: 12 additions & 0 deletions docs/100 Reference notes/102 Authors/Edward Z. Yang.md
@@ -0,0 +1,12 @@
---
affiliation:
- "[[FAIR|FAIR]]"
- "[[Stanford|Stanford]]"
- "[[MIT|MIT]]"
- "[[PyTorch|PyTorch]]"
share: true
---
Notes:
- Has a pretty cool [YouTube channel](https://www.youtube.com/@edwardzyang) where he shares (bi-weekly) PyTorch meetings.
- For me, it's a nice source to get more involved with PyTorch compiler-ish libraries/tools like [[ExecuTorch|ExecuTorch]] and [[torch.export|torch.export]].
- It's also interesting to see the interaction between engineers.
@@ -0,0 +1,9 @@
---
authors:
- "[[Edward Z. Yang|Edward Z. Yang]]"
year: 2020
tags:
- blog
url: http://blog.ezyang.com/2020/09/lets-talk-about-the-pytorch-dispatcher/
share: true
---
@@ -0,0 +1,25 @@
---
authors:
- "[[PyTorch - Functionalization in PyTorch - Everything you need to know|PyTorch - Functionalization in PyTorch - Everything you need to know]]"
year: 2024
tags:
- paper
url: https://pytorch.org/executorch/main/ir-exir.html
share: true
---
The Exported IR is a specification that consists of the following parts:

1. A definition of the computation graph model.
2. A set of operators allowed in the graph.

A dialect additionally provides constraints meant for a specific purpose or stage of compilation. Some dialects are:
- ATen dialect
- Edge dialect
- Backend dialect

ExecuTorch compilation first exports to the ATen dialect, then lowers to the Edge dialect, and finally to the Backend dialect.


## ATen Dialect

- [[PyTorch Functionalization|PyTorch Functionalization]] is performed, removing any tensor aliases and mutations and allowing more flexible graph transformations to be made (see the sketch below).
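A small sketch of that bullet, assuming a toy module `M` (hypothetical): `torch.export` produces a functionalized ATen-dialect graph, so the in-place `add_` disappears:

```python
import torch

class M(torch.nn.Module):
    def forward(self, x):
        y = x.clone()
        y.add_(1)  # in-place mutation in eager code
        return y

ep = torch.export.export(M(), (torch.randn(2),))
# The exported graph contains only functional ops (aten.add, not aten.add_)
print(ep.graph)
```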
@@ -0,0 +1,38 @@
---
authors:
- "[[PyTorch Quantization for TensorRT|PyTorch Quantization for TensorRT]]"
year: 2024
tags:
- pytorch
- compilers
- efficient_dl
- documentation
url: https://pytorch.org/executorch/main/intro-how-it-works
share: true
---
# What are the steps to run a model with ExecuTorch?

## 1. Export the model

- Capture the PyTorch program as a *graph*
## 2. Compile the exported model to an ExecuTorch program

Captured Graph -> ExecuTorch program

Possible Optimizations:
- Compressing the model (e.g., quantization)
- Lowering subgraphs to on-device specialized hardware accelerators to improve latency.
- Memory planning, i.e. efficiently planning the location of intermediate tensors to reduce the runtime memory footprint

## 3. Run the ExecuTorch program on a target device

- Light runtime with memory planning for fast inference :) (the three steps are sketched below)
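A hedged sketch of the full flow, assuming the `executorch.exir` entry points (`to_edge`, `to_executorch`, `.buffer`), whose exact names vary across ExecuTorch releases; `Model` is a toy stand-in:

```python
import torch
from executorch.exir import to_edge  # assumed API; may differ by release

class Model(torch.nn.Module):
    def forward(self, x):
        return torch.nn.functional.relu(x)

# 1. Export the model: capture the PyTorch program as a graph
ep = torch.export.export(Model(), (torch.randn(4),))
# 2. Compile the exported model to an ExecuTorch program
et_program = to_edge(ep).to_executorch()
# 3. Serialize for the on-device runtime
with open("model.pte", "wb") as f:
    f.write(et_program.buffer)
```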

## Key Benefits

- Export that is robust and powerful
- Operator Standardization
- Standardization for compiler interfaces (aka delegates) and the OSS ecosystem
- First-party SDK and toolchain
- Ease of customization
- Low overhead runtime and execution
@@ -0,0 +1,15 @@
---
authors:
- "[[PyTorch Quantization for TensorRT|PyTorch Quantization for TensorRT]]"
year: 2024
tags:
- documentation
url: https://pytorch.org/executorch/main/quantization-overview.html
share: true
---


![[Pasted image 20240925193351.png|400]]

> Quantization is usually tied to execution backends that have quantized operators implemented. Thus each backend is opinionated about how the model should be quantized, expressed in a backend specific `Quantizer` class.
@@ -0,0 +1,22 @@
---
authors:
- "[[Brian Hirsh|Brian Hirsh]]"
year: 2023
tags:
- documentation
url: https://dev-discuss.pytorch.org/t/functionalization-in-pytorch-everything-you-wanted-to-know/965
share: true
---
Given a program/function of PyTorch operators, functionalization will return a new function that:
1. Has the same semantics as the old function
2. Has no mutations in it

Exposed in [functorch API](https://pytorch.org/functorch/0.2.0/generated/functorch.experimental.functionalize.html?highlight=functionalize#functorch.experimental.functionalize).
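A minimal sketch, assuming the newer `torch.func.functionalize` entry point (the successor to the functorch API above); `f` is a toy function:

```python
import torch
from torch.func import functionalize

def f(x):
    y = x.clone()
    y.add_(1)  # in-place mutation
    return y

g = functionalize(f)  # same semantics, but traces to mutation-free ops
x = torch.zeros(3)
assert torch.equal(f(x), g(x))
```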

Functionalization operates at the level of our ATen API.

Why?
- Compilers don't like mutations: Graph partitioning is harder if nodes have side effects, etc.

Notes:
- [[PyTorch Functionalization|PyTorch Functionalization]]
@@ -0,0 +1,39 @@
---
authors:
- "[[Jerry Zhang|Jerry Zhang]]"
year: 2024
tags:
- documentation
url: https://pytorch.org/tutorials/prototype/pt2e_quant_ptq.html
share: true
---
Uses `prepare_pt2e` and `convert_pt2e`.

```
float_model(Python) Example Input
\ /
\ /
—-------------------------------------------------------
| export |
—-------------------------------------------------------
|
FX Graph in ATen Backend Specific Quantizer
| /
—--------------------------------------------------------
| prepare_pt2e |
—--------------------------------------------------------
|
Calibrate/Train
|
—--------------------------------------------------------
| convert_pt2e |
—--------------------------------------------------------
|
Quantized Model
|
—--------------------------------------------------------
| Lowering |
—--------------------------------------------------------
|
Executorch, Inductor or <Other Backends>
```
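A hedged sketch of the flow above, assuming `XNNPACKQuantizer` as the backend-specific quantizer; import paths and the exact capture step (`capture_pre_autograd_graph`, `export_for_training`, ...) have moved between releases:

```python
import torch
from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e
from torch.ao.quantization.quantizer.xnnpack_quantizer import (  # assumed path
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)

class M(torch.nn.Module):
    def forward(self, x):
        return torch.nn.functional.relu(x)

example_inputs = (torch.randn(1, 8),)
# export: capture an FX graph in ATen ops
m = torch.export.export(M(), example_inputs).module()
# prepare_pt2e: insert observers where the quantizer annotates the graph
quantizer = XNNPACKQuantizer().set_global(get_symmetric_quantization_config())
prepared = prepare_pt2e(m, quantizer)
# calibrate with representative data
prepared(*example_inputs)
# convert_pt2e: replace observers with quantize/dequantize ops
quantized = convert_pt2e(prepared)
```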
25 changes: 25 additions & 0 deletions docs/100 Reference notes/104 Other/PyTorch - Quantization.md
@@ -0,0 +1,25 @@
---
authors:
- "[[PyTorch Quantization for TensorRT|PyTorch Quantization for TensorRT]]"
year: 2024
tags:
- documentation
url: https://pytorch.org/docs/main/quantization.html#prototype-pytorch-2-export-quantization
share: true
---
### Backend/Hardware Support

| Hardware   | Kernel Library             | Eager Mode Quantization          | FX Graph Mode Quantization | Quantization Mode Support |
| ---------- | -------------------------- | -------------------------------- | -------------------------- | ------------------------- |
| server CPU | fbgemm/onednn              | Supported                        | Supported                  | All Supported             |
| mobile CPU | qnnpack/xnnpack            | Supported                        | Supported                  | All Supported             |
| server GPU | TensorRT (early prototype) | Not supported (requires a graph) | Supported                  | Static Quantization       |

Today, PyTorch supports the following backends for running quantized operators efficiently (runtime backend selection is sketched after the list):

- x86 CPUs with AVX2 support or higher (without AVX2 some operations have inefficient implementations), via x86 optimized by [fbgemm](https://github.com/pytorch/FBGEMM) and [onednn](https://github.com/oneapi-src/oneDNN) (see the details at [RFC](https://github.com/pytorch/pytorch/issues/83888))
- ARM CPUs (typically found in mobile/embedded devices), via [qnnpack](https://github.com/pytorch/pytorch/tree/main/aten/src/ATen/native/quantized/cpu/qnnpack)
- (early prototype) support for NVidia GPU via [TensorRT](https://developer.nvidia.com/tensorrt) through fx2trt (to be open sourced)
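The kernel library actually used at runtime is controlled by the quantized engine setting; a small sketch (available values depend on the build):

```python
import torch

print(torch.backends.quantized.supported_engines)  # e.g. ['none', 'fbgemm', ...]
torch.backends.quantized.engine = "fbgemm"    # server x86 CPU
# torch.backends.quantized.engine = "qnnpack" # ARM / mobile CPU
```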

Note:
- This is a bit old, as fx2trt is already available in [torch-tensorrt](https://pytorch.org/TensorRT/_modules/torch_tensorrt/fx/fx2trt.html). However, there
@@ -0,0 +1,37 @@
---
authors:
- "[[Peng Wu|Peng Wu]]"
year: 2022
tags:
- presentation
url: https://chips-compilers-mlsys-22.github.io/assets/slides/PyTorch%20Compilers%20(Compiler%20&%20Chips%20Symposium%202022).pdf
share: true
---
**Multiple pytorch compilers**
- TorchScript (torch.jit.script, torch.jit.trace)
- supports python subset
- full graph capture = [[Ahead-of-Time (AOT) Compilation|Ahead-of-Time (AOT) Compilation]]
- executed by TS interpreter
- nnc, nvfuser
- torch.fx
- torch.package, torch.deploy
- torch-mlir
- TorchDynamo, TorchInductor
- TorchDynamo captures partial graphs (if strict=False) and falls back to eager.


**What makes TorchDynamo graph capture sound and out-of-the-box?**
- Partial graph capture: Ability to skip unwanted parts of eager
- Guarded graphs: Ability to check if captured graph is valid for execution
	- Note: Basically, it inserts assertions/runtime checks to verify at runtime that the partial graph is sound; if not, it JIT-recompiles.
- Just-in-time recapture: recapture a graph if captured graph is invalid for execution

**Dynamo workflow**
- Captures an FX Graph
- Sends the FX Graph to a compiler hook to compile it (which can be another compiler like TRT or TorchScript); a minimal backend hook is sketched below
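A minimal sketch of that hook, assuming the `torch.compile(backend=...)` calling convention (`my_backend` is hypothetical):

```python
import torch

def my_backend(gm: torch.fx.GraphModule, example_inputs):
    gm.graph.print_tabular()  # inspect the captured FX graph
    return gm.forward         # run unchanged; a real backend would compile it

@torch.compile(backend=my_backend)
def f(x):
    return torch.sin(x) + 1

f(torch.randn(3))  # first call triggers capture; guard failures recapture
```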

![[Pasted image 20240926160205.png|800]]

Note: tbh this seems like an arbitrary separation, because TorchDynamo is also meant for inference (torch.export), but this is probably because this presentation is 2 years old.


@@ -0,0 +1,20 @@
---
authors:
- "[[Avik Chaudhuri|Avik Chaudhuri]]"
year: 2024
tags:
- presentation
url: https://static.sched.com/hosted_files/pytorch2024/6b/What%E2%80%99s%20new%20in%20torch.export_.pptx.pdf?_gl=1*1s5cwnu*_gcl_au*MTk3MjgxODE5OC4xNzI3MjU4NDM2
share: true
---
## [Recap] What is torch.export and why?

- "Sound", whole-graph capture of pytorch models
- Emits "IR": backend-agnostic
- For easier backend-specific lowering (trt, etc)
- For python-free environments
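A minimal sketch of whole-graph capture plus a backend-agnostic artifact, assuming `torch.export.save`; `M` is a toy module:

```python
import torch

class M(torch.nn.Module):
    def forward(self, x):
        return x * 2

# Sound, whole-graph capture into an ExportedProgram (backend-agnostic IR)
ep = torch.export.export(M(), (torch.randn(2),))
torch.export.save(ep, "model.pt2")  # artifact for later (possibly Python-free) lowering
```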

## Composable APIs
- Useful: `torch.export.export_for_inference`


@@ -0,0 +1,21 @@
---
authors:
- "[[Lei Mao|Lei Mao]]"
year: 2024
tags:
- website
- paper
url: https://leimao.github.io/blog/PyTorch-Eager-Mode-Quantization-TensorRT-Acceleration/
share: true
---

> [!tldr] Abstract
> The TensorRT acceleration for the quantized PyTorch model from the PyTorch eager mode quantization interface involves three steps:
>
> 1. Perform PyTorch eager mode quantization on the floating-point PyTorch model in PyTorch and export the quantized PyTorch model to ONNX.
> 2. Fix the quantized ONNX model graph so that it can be parsed by the TensorRT parser.
> 3. Build the quantized ONNX model into a TensorRT engine, profile the performance, and verify the accuracy.
>
> The source code for this post can be found on [GitHub](https://leimao.github.io/blog/PyTorch-Eager-Mode-Quantization-TensorRT-Acceleration/#:~:text=be%20found%20on-,GitHub,-.) .
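A minimal sketch of step 1 only (eager-mode PTQ with quant/dequant stubs and the fbgemm backend); the ONNX export and graph surgery of steps 2-3 follow the post and are omitted here:

```python
import torch
import torch.ao.quantization as tq

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = tq.QuantStub()      # fp32 -> int8 boundary
        self.fc = torch.nn.Linear(8, 8)
        self.dequant = tq.DeQuantStub()  # int8 -> fp32 boundary

    def forward(self, x):
        return self.dequant(self.fc(self.quant(x)))

m = M().eval()
m.qconfig = tq.get_default_qconfig("fbgemm")
prepared = tq.prepare(m)
prepared(torch.randn(32, 8))      # calibration pass with representative data
quantized = tq.convert(prepared)  # swap in quantized modules
```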

@@ -70,7 +70,7 @@ Where:
- Interpretation of 2 terms: Time is spent in a $s$ if an episode starts in $s$ or if another state transitions into $s$.


- - $\overline{VE}$ only guaranties local optimality.
+ - $\overline{VE}$ only guarantees local optimality.


## 9.3 Stochastic-gradient and Semi-gradient Methods
@@ -125,7 +125,8 @@ Examples of $U_t$:
> Where:
> - $\mathbf{x}(s) = \left(x_1(s), \dots, x_d(s)\right)^\intercal$
- - Chapter also explores the convergence of TD(0) with SGD and linear approximation and finds it converges to the *TD fixed point* (Eqs. 9.11, 9.12), $\mathbf{w}_{TD}$.
+ - The gradient Monte Carlo algorithm converges to the global optimum of the VE under linear function approximation if $\alpha$ is reduced over time according to the usual conditions.
+ - Chapter also explores the convergence of TD(0) with SGD and linear approximation and finds it converges to the *TD fixed point* (Eqs. 9.11, 9.12), $\mathbf{w}_{TD}$. This is not the global optimum, but a point near the local optimum.


> [!NOTE] Equation 9.14
Binary file added docs/images/Pasted image 20240925193351.png
Binary file added docs/images/Pasted image 20240926160205.png
