[PUBLISHER] Merge #41
* PUSH NOTE : PyTorch Functionalization.md

* PUSH NOTE : Ahead-of-Time (AOT) Compilation.md

* PUSH NOTE : PyTorch Quantization for TensorRT.md

* PUSH NOTE : Edward Z. Yang.md

* PUSH NOTE : Let's talk about the Python Dispatcher.md

* PUSH NOTE : PyTorch - Functionalization in PyTorch - Everything you need to know.md

* PUSH NOTE : PyTorch - ExecuTorch - Export IR Specification.md

* PUSH NOTE : PyTorch Compilers - What makes PyTorch beloved makes it hard to compile.md

* PUSH ATTACHMENT : Pasted image 20240926160205.png

* PUSH NOTE : PyTorch Eager Mode Quantization TensorRT Acceleration.md

* PUSH NOTE : PyTorch - Quantization.md

* PUSH NOTE : PyTorch - PyTorch 2 Export Post Training Quantization.md

* PUSH NOTE : PyTorch - ExecuTorch - Quantization Overview.md

* PUSH ATTACHMENT : Pasted image 20240925193351.png

* PUSH NOTE : PyTorch - ExecuTorch - How ExecuTorch works?.md

* PUSH NOTE : PyTorch Conference 2024 - What’s new in torch.export?.md

* PUSH NOTE : Reinforcement Learning - An Introduction - Chapter 9.md
dgcnz authored Sep 28, 2024
1 parent 4765516 commit c33cfa6
Showing 17 changed files with 303 additions and 2 deletions.
12 changes: 12 additions & 0 deletions docs/000 Zettelkasten/Ahead-of-Time (AOT) Compilation.md
@@ -0,0 +1,12 @@
---
tags:
- compilers
- pytorch
- optimization
share: true
---
Generally: Compilation that occurs before the program is executed.

Specifically to ML (PyTorch):
- When a model is AOT compiled (using `torch.jit.script`/`torch.jit.trace` or `torch.export`), the entire program is translated from Python into an intermediate representation that is independent of Python. That is, you don't need a Python interpreter to run that IR (see the sketch below).
- Note: TorchScript is AOT in the sense that it requires capturing the whole graph before runtime, but it performs further optimizations just-in-time.
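A minimal sketch of the TorchScript path, assuming a toy `MyModule` (hypothetical, not from the note); the saved artifact runs without a Python interpreter, e.g. from C++ via libtorch:

```python
import torch

class MyModule(torch.nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(x) + 1

# Whole-program AOT capture into TorchScript IR
scripted = torch.jit.script(MyModule())
scripted.save("my_module.pt")  # loadable later without Python (libtorch)
```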
13 changes: 13 additions & 0 deletions docs/000 Zettelkasten/PyTorch Functionalization.md
@@ -0,0 +1,13 @@
---
tags:
- pytorch
- compilers
share: true
---
Given a program/function of PyTorch operators, functionalization will return a new function that:
1. Has the same semantics as the old function
2. Has no mutations in it

Functionalization operates at the level of our ATen API.

More info on [[PyTorch - Functionalization in PyTorch - Everything you need to know|PyTorch - Functionalization in PyTorch - Everything you need to know]]
12 changes: 12 additions & 0 deletions docs/000 Zettelkasten/PyTorch Quantization for TensorRT.md
@@ -0,0 +1,12 @@
---
tags:
- quantization
- efficient_dl
share: true
---

There seem to be quite a few possible ways to do this:
- [[PyTorch Eager Mode Quantization TensorRT Acceleration|PyTorch Eager Mode Quantization TensorRT Acceleration]]
	1. torchao quantization
	2. ONNX conversion
	3. Graph surgery (changing some ops in the ONNX graph)
	4. TensorRT conversion
	- Seems very cumbersome
12 changes: 12 additions & 0 deletions docs/100 Reference notes/102 Authors/Edward Z. Yang.md
@@ -0,0 +1,12 @@
---
affiliation:
- "[[FAIR|FAIR]]"
- "[[Stanford|Stanford]]"
- "[[MIT|MIT]]"
- "[[PyTorch|PyTorch]]"
share: true
---
Notes:
- Has a pretty cool [YouTube channel](https://www.youtube.com/@edwardzyang) where he shares (bi-weekly) PyTorch meetings.
- For me, it's a nice source to get more involved with PyTorch compiler-ish libraries/tools like [[ExecuTorch|ExecuTorch]] and [[torch.export|torch.export]].
- It's also interesting to see the interaction between engineers.
@@ -0,0 +1,9 @@
---
authors:
- "[[Edward Z. Yang|Edward Z. Yang]]"
year: 2020
tags:
- blog
url: http://blog.ezyang.com/2020/09/lets-talk-about-the-pytorch-dispatcher/
share: true
---
@@ -0,0 +1,25 @@
---
authors:
- "[[PyTorch - Functionalization in PyTorch - Everything you need to know|PyTorch - Functionalization in PyTorch - Everything you need to know]]"
year: 2024
tags:
- paper
url: https://pytorch.org/executorch/main/ir-exir.html
share: true
---
The Exported IR is a specification that consists of the following parts:

1. A definition of the computation graph model.
2. A set of operators allowed in the graph.

A dialect additionally provides constraints meant for a specific purpose or stage of compilation. Some dialects are:
- ATen dialect
- Edge dialect
- Backend dialect

ExecuTorch compilation first exports to the ATen dialect, then lowers to the Edge dialect, and finally to the Backend dialect.


## ATen Dialect

- [[PyTorch Functionalization|PyTorch Functionalization]] is performed, removing any tensor aliases and mutations and allowing more flexible graph transformations to be made (see the sketch below).
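A small sketch of that bullet, assuming a toy module `M` (hypothetical): `torch.export` produces a functionalized ATen-dialect graph, so the in-place `add_` disappears:

```python
import torch

class M(torch.nn.Module):
    def forward(self, x):
        y = x.clone()
        y.add_(1)  # in-place mutation in eager code
        return y

ep = torch.export.export(M(), (torch.randn(2),))
# The exported graph contains only functional ops (aten.add, not aten.add_)
print(ep.graph)
```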
@@ -0,0 +1,38 @@
---
authors:
- "[[PyTorch Quantization for TensorRT|PyTorch Quantization for TensorRT]]"
year: 2024
tags:
- pytorch
- compilers
- efficient_dl
- documentation
url: https://pytorch.org/executorch/main/intro-how-it-works
share: true
---
# What are the steps to run a model with ExecuTorch?

## 1. Export the model

- Capture the PyTorch program as a *graph*
## 2. Compile the exported model to an ExecuTorch program

Captured Graph -> ExecuTorch program

Possible Optimizations:
- Compressing the model (e.g., quantization)
- Lowering subgraphs to on-device specialized hardware accelerators to improve latency.
- Memory planning, i.e. efficiently planning the location of intermediate tensors to reduce the runtime memory footprint

## 3. Run the ExecuTorch program on a target device

- Light runtime with memory planning for fast inference :) (the three steps are sketched below)
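A hedged sketch of the full flow, assuming the `executorch.exir` entry points (`to_edge`, `to_executorch`, `.buffer`), whose exact names vary across ExecuTorch releases; `Model` is a toy stand-in:

```python
import torch
from executorch.exir import to_edge  # assumed API; may differ by release

class Model(torch.nn.Module):
    def forward(self, x):
        return torch.nn.functional.relu(x)

# 1. Export the model: capture the PyTorch program as a graph
ep = torch.export.export(Model(), (torch.randn(4),))
# 2. Compile the exported model to an ExecuTorch program
et_program = to_edge(ep).to_executorch()
# 3. Serialize for the on-device runtime
with open("model.pte", "wb") as f:
    f.write(et_program.buffer)
```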

## Key Benefits

- Export that is robust and powerful
- Operator Standardization
- Standardization for compiler interfaces (aka delegates) and the OSS ecosystem
- First-party SDK and toolchain
- Ease of customization
- Low overhead runtime and execution
@@ -0,0 +1,15 @@
---
authors:
- "[[PyTorch Quantization for TensorRT|PyTorch Quantization for TensorRT]]"
year: 2024
tags:
- documentation
url: https://pytorch.org/executorch/main/quantization-overview.html
share: true
---


![[Pasted image 20240925193351.png|400]]

> Quantization is usually tied to execution backends that have quantized operators implemented. Thus each backend is opinionated about how the model should be quantized, expressed in a backend specific `Quantizer` class.
@@ -0,0 +1,22 @@
---
authors:
- "[[Brian Hirsh|Brian Hirsh]]"
year: 2023
tags:
- documentation
url: https://dev-discuss.pytorch.org/t/functionalization-in-pytorch-everything-you-wanted-to-know/965
share: true
---
Given a program/function of PyTorch operators, functionalization will return a new function that:
1. Has the same semantics as the old function
2. Has no mutations in it

Exposed in [functorch API](https://pytorch.org/functorch/0.2.0/generated/functorch.experimental.functionalize.html?highlight=functionalize#functorch.experimental.functionalize).
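A minimal sketch, assuming the newer `torch.func.functionalize` entry point (the successor to the functorch API above); `f` is a toy function:

```python
import torch
from torch.func import functionalize

def f(x):
    y = x.clone()
    y.add_(1)  # in-place mutation
    return y

g = functionalize(f)  # same semantics, but traces to mutation-free ops
x = torch.zeros(3)
assert torch.equal(f(x), g(x))
```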

Functionalization operates at the level of our ATen API.

Why?
- Compilers don't like mutations: Graph partitioning is harder if nodes have side effects, etc.

Notes:
- [[PyTorch Functionalization|PyTorch Functionalization]]
@@ -0,0 +1,39 @@
---
authors:
- "[[Jerry Zhang|Jerry Zhang]]"
year: 2024
tags:
- documentation
url: https://pytorch.org/tutorials/prototype/pt2e_quant_ptq.html
share: true
---
Uses `prepare_pt2e` and `convert_pt2e`.

```
float_model(Python) Example Input
\ /
\ /
—-------------------------------------------------------
| export |
—-------------------------------------------------------
|
FX Graph in ATen Backend Specific Quantizer
| /
—--------------------------------------------------------
| prepare_pt2e |
—--------------------------------------------------------
|
Calibrate/Train
|
—--------------------------------------------------------
| convert_pt2e |
—--------------------------------------------------------
|
Quantized Model
|
—--------------------------------------------------------
| Lowering |
—--------------------------------------------------------
|
Executorch, Inductor or <Other Backends>
```
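A hedged sketch of the flow above, assuming `XNNPACKQuantizer` as the backend-specific quantizer; import paths and the exact capture step (`capture_pre_autograd_graph`, `export_for_training`, ...) have moved between releases:

```python
import torch
from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e
from torch.ao.quantization.quantizer.xnnpack_quantizer import (  # assumed path
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)

class M(torch.nn.Module):
    def forward(self, x):
        return torch.nn.functional.relu(x)

example_inputs = (torch.randn(1, 8),)
# export: capture an FX graph in ATen ops
m = torch.export.export(M(), example_inputs).module()
# prepare_pt2e: insert observers where the quantizer annotates the graph
quantizer = XNNPACKQuantizer().set_global(get_symmetric_quantization_config())
prepared = prepare_pt2e(m, quantizer)
# calibrate with representative data
prepared(*example_inputs)
# convert_pt2e: replace observers with quantize/dequantize ops
quantized = convert_pt2e(prepared)
```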
25 changes: 25 additions & 0 deletions docs/100 Reference notes/104 Other/PyTorch - Quantization.md
@@ -0,0 +1,25 @@
---
authors:
- "[[PyTorch Quantization for TensorRT|PyTorch Quantization for TensorRT]]"
year: 2024
tags:
- documentation
url: https://pytorch.org/docs/main/quantization.html#prototype-pytorch-2-export-quantization
share: true
---
### Backend/Hardware Support

| Hardware   | Kernel Library             | Eager Mode Quantization          | FX Graph Mode Quantization | Quantization Mode Support |
| ---------- | -------------------------- | -------------------------------- | -------------------------- | ------------------------- |
| server CPU | fbgemm/onednn              | Supported                        | Supported                  | All Supported             |
| mobile CPU | qnnpack/xnnpack            | Supported                        | Supported                  | All Supported             |
| server GPU | TensorRT (early prototype) | Not supported (requires a graph) | Supported                  | Static Quantization       |

Today, PyTorch supports the following backends for running quantized operators efficiently (runtime backend selection is sketched after the list):

- x86 CPUs with AVX2 support or higher (without AVX2 some operations have inefficient implementations), via x86 optimized by [fbgemm](https://github.com/pytorch/FBGEMM) and [onednn](https://github.com/oneapi-src/oneDNN) (see the details at [RFC](https://github.com/pytorch/pytorch/issues/83888))
- ARM CPUs (typically found in mobile/embedded devices), via [qnnpack](https://github.com/pytorch/pytorch/tree/main/aten/src/ATen/native/quantized/cpu/qnnpack)
- (early prototype) support for NVidia GPU via [TensorRT](https://developer.nvidia.com/tensorrt) through fx2trt (to be open sourced)
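The kernel library actually used at runtime is controlled by the quantized engine setting; a small sketch (available values depend on the build):

```python
import torch

print(torch.backends.quantized.supported_engines)  # e.g. ['none', 'fbgemm', ...]
torch.backends.quantized.engine = "fbgemm"    # server x86 CPU
# torch.backends.quantized.engine = "qnnpack" # ARM / mobile CPU
```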

Note:
- This is a bit old, as fx2trt is already available in [torch-tensorrt](https://pytorch.org/TensorRT/_modules/torch_tensorrt/fx/fx2trt.html). However, there
@@ -0,0 +1,37 @@
---
authors:
- "[[Peng Wu|Peng Wu]]"
year: 2022
tags:
- presentation
url: https://chips-compilers-mlsys-22.github.io/assets/slides/PyTorch%20Compilers%20(Compiler%20&%20Chips%20Symposium%202022).pdf
share: true
---
**Multiple pytorch compilers**
- TorchScript (torch.jit.script, torch.jit.trace)
- supports python subset
- full graph capture = [[Ahead-of-Time (AOT) Compilation|Ahead-of-Time (AOT) Compilation]]
- executed by TS interpreter
- nnc, nvfuser
- torch.fx
- torch.package, torch.deploy
- torch-mlir
- TorchDynamo, TorchInductor
- TorchDynamo captures partial graphs (if strict=False) and falls back to eager.


**What makes TorchDynamo graph capture sound and out-of-the-box?**
- Partial graph capture: Ability to skip unwanted parts of eager
- Guarded graphs: Ability to check if captured graph is valid for execution
	- Note: Basically, it inserts assertions/runtime checks to verify at runtime that the partial graph is sound; if not, it JIT-recompiles.
- Just-in-time recapture: recapture a graph if captured graph is invalid for execution

**Dynamo workflow**
- Captures an FX Graph
- Sends the FX Graph to a compiler hook to compile it (which can be another compiler like TRT or TorchScript); a minimal backend hook is sketched below
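A minimal sketch of that hook, assuming the `torch.compile(backend=...)` calling convention (`my_backend` is hypothetical):

```python
import torch

def my_backend(gm: torch.fx.GraphModule, example_inputs):
    gm.graph.print_tabular()  # inspect the captured FX graph
    return gm.forward         # run unchanged; a real backend would compile it

@torch.compile(backend=my_backend)
def f(x):
    return torch.sin(x) + 1

f(torch.randn(3))  # first call triggers capture; guard failures recapture
```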

![[Pasted image 20240926160205.png|800]]

Note: tbh this seems like an arbitrary separation, because TorchDynamo is also meant for inference (torch.export), but this is probably because this presentation is 2 years old.


@@ -0,0 +1,20 @@
---
authors:
- "[[Avik Chaudhuri|Avik Chaudhuri]]"
year: 2024
tags:
- presentation
url: https://static.sched.com/hosted_files/pytorch2024/6b/What%E2%80%99s%20new%20in%20torch.export_.pptx.pdf?_gl=1*1s5cwnu*_gcl_au*MTk3MjgxODE5OC4xNzI3MjU4NDM2
share: true
---
## [Recap] What is torch.export and why?

- "Sound", whole-graph capture of pytorch models
- Emits "IR": backend-agnostic
- For easier backend-specific lowering (trt, etc)
- For python-free environments
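A minimal sketch of whole-graph capture plus a backend-agnostic artifact, assuming `torch.export.save`; `M` is a toy module:

```python
import torch

class M(torch.nn.Module):
    def forward(self, x):
        return x * 2

# Sound, whole-graph capture into an ExportedProgram (backend-agnostic IR)
ep = torch.export.export(M(), (torch.randn(2),))
torch.export.save(ep, "model.pt2")  # artifact for later (possibly Python-free) lowering
```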

## Composable APIs
- Useful: `torch.export.export_for_inference`


@@ -0,0 +1,21 @@
---
authors:
- "[[Lei Mao|Lei Mao]]"
year: 2024
tags:
- website
- paper
url: https://leimao.github.io/blog/PyTorch-Eager-Mode-Quantization-TensorRT-Acceleration/
share: true
---

> [!tldr] Abstract
> The TensorRT acceleration for the quantized PyTorch model from the PyTorch eager mode quantization interface involves three steps:
>
> 1. Perform PyTorch eager mode quantization on the floating-point PyTorch model in PyTorch and export the quantized PyTorch model to ONNX.
> 2. Fix the quantized ONNX model graph so that it can be parsed by the TensorRT parser.
> 3. Build the quantized ONNX model into a TensorRT engine, profile the performance, and verify the accuracy.
>
> The source code for this post can be found on [GitHub](https://leimao.github.io/blog/PyTorch-Eager-Mode-Quantization-TensorRT-Acceleration/#:~:text=be%20found%20on-,GitHub,-.) .
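A minimal sketch of step 1 only (eager-mode PTQ with quant/dequant stubs and the fbgemm backend); the ONNX export and graph surgery of steps 2-3 follow the post and are omitted here:

```python
import torch
import torch.ao.quantization as tq

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = tq.QuantStub()      # fp32 -> int8 boundary
        self.fc = torch.nn.Linear(8, 8)
        self.dequant = tq.DeQuantStub()  # int8 -> fp32 boundary

    def forward(self, x):
        return self.dequant(self.fc(self.quant(x)))

m = M().eval()
m.qconfig = tq.get_default_qconfig("fbgemm")
prepared = tq.prepare(m)
prepared(torch.randn(32, 8))      # calibration pass with representative data
quantized = tq.convert(prepared)  # swap in quantized modules
```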

@@ -70,7 +70,7 @@ Where:
- Interpretation of 2 terms: Time is spent in a $s$ if an episode starts in $s$ or if another state transitions into $s$.


- - $\overline{VE}$ only guaranties local optimality.
+ - $\overline{VE}$ only guarantees local optimality.


## 9.3 Stochastic-gradient and Semi-gradient Methods
@@ -125,7 +125,8 @@ Examples of $U_t$:
> Where:
> - $\mathbf{x}(s) = \left(x_1(s), \dots, x_d(s)\right)^\intercal$
- - Chapter also explores the convergence of TD(0) with SGD and linear approximation and finds it converges to the *TD fixed point* (Eqs. 9.11, 9.12), $\mathbf{w}_{TD}$.
+ - The gradient Monte Carlo algorithm converges to the global optimum of the VE under linear function approximation if $\alpha$ is reduced over time according to the usual conditions.
+ - Chapter also explores the convergence of TD(0) with SGD and linear approximation and finds it converges to the *TD fixed point* (Eqs. 9.11, 9.12), $\mathbf{w}_{TD}$. This is not the global optimum, but a point near the local optimum.


> [!NOTE] Equation 9.14
Binary file added docs/images/Pasted image 20240925193351.png
Binary file added docs/images/Pasted image 20240926160205.png
