Describe the issue
Starting with commit 530a2d7, ONNX Runtime (ORT) enables FP16 Clip and handles bias, which roughly doubles inference latency: the average inference time goes from ~0.074 ms with 82036b0 to ~0.157 ms with 530a2d7 (see the results below).
Original Model
Optimized Model of 82036b0
Optimized Model of 530a2d7
Result of 82036b0
Session creation time cost: 0.00417382 s
First inference time cost: 0 ms
Total inference time cost: 1.22059 s
Total inference requests: 16384
Average inference time cost: 0.0744988 ms
Total inference run time: 1.22691 s
Number of inferences per second: 13353.8
Avg CPU usage: 4 %
Peak working set size: 24379392 bytes
Result of 530a2d7
Session creation time cost: 0.00390387 s
First inference time cost: 0 ms
Total inference time cost: 2.57813 s
Total inference requests: 16384
Average inference time cost: 0.157357 ms
Total inference run time: 2.58419 s
Number of inferences per second: 6340.1
Avg CPU usage: 4 %
Peak working set size: 23986176 bytes
To reproduce
./onnxruntime_perf_test -r 16384 -m times -o 99 -x 1 -y 1 -c 1 ./poc/model.onnx
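As a cross-check, a similar measurement can be done with the onnxruntime Python API (the API listed below). This is a minimal sketch, not the exact perf_test harness: it assumes the model lives at ./poc/model.onnx as in the command above and that all inputs are floating-point tensors, feeding zero-filled dummy data; adjust the feeds to match the actual model in poc.zip.

import time

import numpy as np
import onnxruntime as ort

MODEL_PATH = "./poc/model.onnx"  # assumed: same path as in the perf_test command above
RUNS = 16384

sess = ort.InferenceSession(MODEL_PATH, providers=["CPUExecutionProvider"])

# Build zero-filled dummy feeds from the declared input shapes.
# Assumptions: dynamic dimensions are set to 1 and inputs are float16/float32.
feeds = {}
for inp in sess.get_inputs():
    shape = [d if isinstance(d, int) else 1 for d in inp.shape]
    dtype = np.float16 if "float16" in inp.type else np.float32
    feeds[inp.name] = np.zeros(shape, dtype=dtype)

# Warm up once, then time RUNS inferences and report the average latency.
sess.run(None, feeds)
start = time.perf_counter()
for _ in range(RUNS):
    sess.run(None, feeds)
elapsed = time.perf_counter() - start
print(f"Average inference time: {elapsed / RUNS * 1000:.4f} ms")

Running this against builds of the two commits should show the same ~2x gap in average inference time as the perf_test output above.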
Urgency
No response
Platform
Linux
OS Version
6.8.0
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
1.20.1
ONNX Runtime API
Python
Architecture
X64
Execution Provider
Default CPU
Execution Provider Library Version
No response
Model File
poc.zip
Is this a quantized model?
No