Commit e895d0f — olivedevteam committed: Update docs from 152afe3 (1 parent 61e3a7b)

3 files changed: +229 −1 lines
_sources/how-to/configure-workflows/model-opt-and-transform/onnx.md.txt

+70
@@ -5,21 +5,79 @@
Olive provides multiple transformations and optimizations based on various ONNX tools to improve model performance.

## Model Optimizer

`OnnxPeepholeOptimizer` optimizes an ONNX model by fusing nodes. Fusing nodes involves merging multiple nodes in a model into a single node to reduce the computational cost and improve the performance of the model. The optimization process involves analyzing the structure of the ONNX model and identifying nodes that can be fused.

It also inserts a `Cast` operation for cases where the `ArgMax` input type is not supported. For example, before ONNX Runtime 1.20, TensorProto.INT64 was not supported on the CPU or CUDA EP, so a `Cast` operator is inserted to cast the inputs to TensorProto.INT32.

The `OnnxPeepholeOptimizer` integrates `onnxscript` and `onnxoptimizer` to optimize ONNX models. By default, [`onnxscript.optimizer.optimize`](https://onnxscript.ai/tutorial/optimizer/optimize.html) runs automatically. To enable [`onnxoptimizer.optimize`](https://github.com/onnx/optimizer), set `"onnxoptimizer": true` in the pass configuration.
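For instance, a pass configuration that also enables `onnxoptimizer.optimize` might look like this (a minimal sketch; any other fields keep their defaults):

```json
{
    "type": "OnnxPeepholeOptimizer",
    "onnxoptimizer": true
}
```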
### onnxscript.optimizer.optimize

| Optimization | Description |
|------------------------------------|-----------------------------------------------------------------------------|
| **Constant Folding** | Applies constant folding optimization to the model. |
| **Constant Propagation** | Applies constant propagation optimization to the model. Applied as part of constant folding. |
| **Sequence Simplification** | Simplifies Sequence-based ops (e.g., SequenceConstruct, ConcatFromSequence). Applied as part of constant folding. |
| **Remove Unused Nodes** | Removes unused nodes from the model. |
| **Remove Unused Functions** | Removes unused function protos from the model. |
| **Inline Functions with Unused Outputs** | Inlines function nodes that have unused outputs. |
| **Inline Simple Functions** | Inlines simple functions based on a node count threshold. |

### onnxoptimizer

| Optimization | Description |
|------------------------------------|--------------------------------------------------------------------------------------|
| **Eliminate Nop Cast** | Eliminates no-operation (nop) Casts. |
| **Eliminate Nop Dropout** | Eliminates no-operation Dropouts. |
| **Eliminate Nop Flatten** | Eliminates no-operation Flattens. |
| **Extract Constant to Initializer** | Extracts constants to initializers. |
| **Eliminate If with Const Cond** | Eliminates If nodes with constant conditions. |
| **Eliminate Nop Monotone ArgMax** | Eliminates no-operation monotone ArgMax nodes. |
| **Eliminate Nop Pad** | Eliminates no-operation Pads. |
| **Eliminate Nop Concat** | Eliminates no-operation Concats. |
| **Eliminate Nop Split** | Eliminates no-operation Splits. |
| **Eliminate Nop Expand** | Eliminates no-operation Expands. |
| **Eliminate Shape Gather** | Eliminates Shape Gather operations. |
| **Eliminate Slice after Shape** | Eliminates Slice nodes that occur after Shape nodes. |
| **Eliminate Nop Transpose** | Eliminates no-operation Transposes. |
| **Fuse Add Bias into Conv** | Fuses Add operations as biases into Conv layers. |
| **Fuse BN into Conv** | Fuses BatchNormalization into Conv layers. |
| **Fuse Consecutive Concats** | Fuses consecutive Concat operations. |
| **Fuse Consecutive LogSoftmax** | Fuses consecutive LogSoftmax operations. |
| **Fuse Consecutive Reduce+Unsqueeze** | Fuses consecutive Reduce and Unsqueeze operations. |
| **Fuse Consecutive Squeezes** | Fuses consecutive Squeeze operations. |
| **Fuse Consecutive Transposes** | Fuses consecutive Transpose operations. |
| **Fuse MatMul+Add Bias into GEMM** | Fuses MatMul and Add operations into GEMM layers. |
| **Fuse Pad into Conv** | Fuses Pad operations into Conv layers. |
| **Fuse Pad into Pool** | Fuses Pad operations into Pool layers. |
| **Fuse Transpose into GEMM** | Fuses Transpose operations into GEMM layers. |
| **Fuse Concat into Reshape** | Fuses Concat operations into Reshape layers. |
| **Eliminate Nop Reshape** | Eliminates no-operation Reshapes. |
| **Eliminate Nop with Unit** | Eliminates no-operation nodes with unit values. |
| **Eliminate Common Subexpression** | Eliminates common subexpressions. |
| **Fuse QKV** | Fuses query, key, and value layers in transformer models. |
| **Fuse Consecutive Unsqueezes** | Fuses consecutive Unsqueeze operations. |
| **Eliminate Deadend Nodes** | Eliminates dead-end nodes. |
| **Eliminate Identity Nodes** | Eliminates Identity nodes. |
| **Eliminate Shape Ops** | Eliminates Shape operations where possible. |
| **Fuse Consecutive Slices** | Fuses consecutive Slice operations. |
| **Eliminate Unused Initializer** | Eliminates unused initializers. |
| **Eliminate Duplicate Initializer** | Eliminates duplicate initializers. |
Please refer to [OnnxPeepholeOptimizer](../../../reference/pass.rst#onnx_peephole_optimizer) for more details about the pass and its config parameters.

### Example Configuration

```json
{
    "type": "OnnxPeepholeOptimizer"
}
```

## ORT Transformers Optimization

ONNX Runtime automatically applies most optimizations when loading transformer models, but some of the latest optimizations have not yet been integrated into ONNX Runtime. `OrtTransformersOptimization` provides an offline capability to optimize [transformers](https://huggingface.co/docs/transformers/index) models
@@ -32,16 +90,20 @@ for more details on the optimizations done by this tool.
Please refer to [OrtTransformersOptimization](../../../reference/pass.rst#ort_transformers_optimization) for more details about the pass and its config parameters.

### Example Configuration

```json
{
    "type": "OrtTransformersOptimization",
    "model_type": "bert"
}
```
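Beyond `model_type`, the pass exposes further tuning knobs. The configuration below is a sketch only; the parameter names `opt_level` and `float16` are assumptions made here for illustration, so consult the pass reference above for the authoritative list of supported options:

```json
{
    "type": "OrtTransformersOptimization",
    "model_type": "bert",
    "opt_level": 1,
    "float16": true
}
```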
## Append Pre/Post Processing Ops

`AppendPrePostProcessingOps` inserts pre and post processing ops into the ONNX graph.

### Example Configuration

```json
{
    "type": "AppendPrePostProcessingOps",
@@ -51,6 +113,7 @@ Please refer to [OrtTransformersOptimization](../../../reference/pass.rst#ort_tr
    }
}
```

```json
{
    "type": "AppendPrePostProcessingOps",
@@ -66,6 +129,7 @@ You can refer to [here](https://github.com/microsoft/onnxruntime-extensions/blob

* Olive introduces two placeholders to represent the model input/output shape dimension value: `__model_input__` and `__model_output__`.
* To support `IoMapEntry`, the step needs to use the full form. For example:
```json
"YCbCrToPixels": {
    "params": {
@@ -78,6 +142,7 @@ You can refer to [here](https://github.com/microsoft/onnxruntime-extensions/blob
    ],
}
```

* The `tool_command_args` will be used to describe the input parameters to create the `PrePostProcessor` instance. It is a list of `PrePostProcessorInput`. The `name` is the tensor name. The `data_type` and `shape` will be used to create the tensor type. The `shape` can be a list of integers or a list of strings.
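As a sketch of what such an entry might look like (the field keys `name`, `data_type`, and `shape` come from the description above, but the tensor name `image` and its dtype and dimension values are hypothetical):

```json
"tool_command_args": [
    {
        "name": "image",
        "data_type": "uint8",
        "shape": ["num_bytes"]
    }
]
```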

@@ -167,6 +232,7 @@ Here are some examples to describe the pre/post processing which is exactly same
`InsertBeamSearch` chains two model components (for example, encoder and decoder) together by inserting a beam search op between them.

### Example Configuration

```json
{
    "type": "InsertBeamSearch",
@@ -175,13 +241,15 @@ Here are some examples to describe the pre/post processing which is exactly same
```

## ORT Performance Tuning

ONNX Runtime provides high performance across a range of hardware options through its Execution Providers interface for different execution environments. For each model running with each execution provider, there are settings that can be tuned (e.g., thread count, execution mode) to improve performance. `OrtSessionParamsTuning` covers basic knobs that can be leveraged to find the best performance for your model and hardware.

### Example Configuration

```json
{
    "type": "OrtSessionParamsTuning",
@@ -220,6 +288,7 @@ LoRA, QLoRA and related techniques allow us to fine-tune a pre-trained model by
### Example Configuration

a. As external initializers

```json
{
    "type": "ExtractAdapters",
@@ -228,6 +297,7 @@ a. As external initializers
```

b. As constant inputs with packed weights

```json
{
    "type": "ExtractAdapters",
