_sources/how-to/configure-workflows/model-opt-and-transform/onnx.md.txt

@@ -5,21 +5,79 @@

Olive provides multiple ONNX-based transformations and optimizations to improve model performance.

## Model Optimizer

`OnnxPeepholeOptimizer` optimizes an ONNX model by fusing nodes. Fusing nodes involves merging multiple nodes in a model into a single node to reduce the computational cost and improve the performance of the model. The optimization process involves analyzing the structure of the ONNX model and identifying nodes that can be fused.

It also inserts a `Cast` operation for cases where an `ArgMax` input type isn't supported. For example, before ONNX Runtime 1.20, TensorProto.INT64 wasn't supported on the CPU or CUDA execution providers, so a `Cast` operator is inserted to cast the inputs to TensorProto.INT32.

The `OnnxPeepholeOptimizer` integrates `onnxscript` and `onnxoptimizer` to optimize ONNX models. By default, [`onnxscript.optimizer.optimize`](https://onnxscript.ai/tutorial/optimizer/optimize.html) runs automatically. To also enable [`onnxoptimizer.optimize`](https://github.com/onnx/optimizer), set `"onnxoptimizer": true` in the pass configuration.

Please refer to [OnnxPeepholeOptimizer](../../../reference/pass.rst#onnx_peephole_optimizer) for more details about the pass and its config parameters.

### Example Configuration

```json
{
    "type": "OnnxPeepholeOptimizer"
}
```
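
To additionally run `onnxoptimizer.optimize`, set the flag described above in the same pass entry. This is a minimal sketch based on that flag; see the pass reference for the full list of options:

```json
{
    "type": "OnnxPeepholeOptimizer",
    "onnxoptimizer": true
}
```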

## ORT Transformers Optimization

While ONNX Runtime automatically applies most optimizations when loading transformer models, some of the latest optimizations have not yet been integrated into ONNX Runtime. `OrtTransformersOptimization` provides an offline capability to optimize [transformers](https://huggingface.co/docs/transformers/index) models
@@ -32,16 +90,20 @@ for more details on the optimizations done by this tool.

Please refer to [OrtTransformersOptimization](../../../reference/pass.rst#ort_transformers_optimization) for more details about the pass and its config parameters.

### Example Configuration

```json
{
    "type": "OrtTransformersOptimization",
    "model_type": "bert"
}
```
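
As a further illustration, a GPU-targeted configuration might look like the sketch below. The parameter names `opt_level`, `float16`, and `use_gpu` are assumptions drawn from the underlying ONNX Runtime transformers optimizer rather than from this page, so verify them against the pass reference above:

```json
{
    "type": "OrtTransformersOptimization",
    "model_type": "bert",
    "opt_level": 1,
    "float16": true,
    "use_gpu": true
}
```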

## Append Pre/Post Processing Ops

`AppendPrePostProcessingOps` inserts pre- and post-processing ops into the ONNX graph.

### Example Configuration

```json
{
    "type": "AppendPrePostProcessingOps",
@@ -51,6 +113,7 @@ Please refer to [OrtTransformersOptimization](../../../reference/pass.rst#ort_tr
    }
}
```

```json
{
    "type": "AppendPrePostProcessingOps",
@@ -66,6 +129,7 @@ You can refer to [here](https://github.com/microsoft/onnxruntime-extensions/blob

* Olive introduces two placeholders to represent the model input/output shape dimension values: `__model_input__` and `__model_output__`.
* To support the `IoMapEntry`, the step needs to use the full form. For example:

```json
"YCbCrToPixels": {
    "params": {
@@ -78,6 +142,7 @@ You can refer to [here](https://github.com/microsoft/onnxruntime-extensions/blob
    ],
}
```

* The `tool_command_args` will be used to describe the input parameters to create the `PrePostProcessor` instance. It is a list of `PrePostProcessorInput`. The `name` is the tensor name, while the `data_type` and `shape` will be used to create the tensor type. The `shape` can be a list of integers or a list of strings, as in the sketch below.
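
A hypothetical entry, using only the fields described above (the tensor name `image`, the dtype `uint8`, and the dimension label `num_bytes` are invented for this illustration):

```json
"tool_command_args": [
    {
        "name": "image",
        "data_type": "uint8",
        "shape": ["num_bytes"]
    }
]
```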
@@ -167,6 +232,7 @@ Here are some examples to describe the pre/post processing which is exactly same

`InsertBeamSearch` chains two model components (for example, an encoder and a decoder) together by inserting a beam search op between them.

### Example Configuration

```json
{
    "type": "InsertBeamSearch",
@@ -175,13 +241,15 @@ Here are some examples to describe the pre/post processing which is exactly same
```

## ORT Performance Tuning

ONNX Runtime provides high performance across a range of hardware options through its Execution Providers interface for different execution environments. For each model running with each execution provider, there are settings that can be tuned (e.g. thread number, execution mode, etc.) to improve performance. `OrtSessionParamsTuning` covers basic knobs that can be leveraged to find the best performance for your model and hardware.

### Example Configuration

```json
{
    "type": "OrtSessionParamsTuning",
@@ -220,6 +288,7 @@ LoRA, QLoRA and related techniques allow us to fine-tune a pre-trained model by