[Performance] FP16 Clip and Handle Bias introduces insufficient optimization. #23613

Open
SuhwanSong opened this issue Feb 7, 2025 · 0 comments
Labels
performance issues related to performance regressions

Describe the issue

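Starting with commit 530a2d7, ONNX Runtime (ORT) enables FP16 Clip and handles bias. For the model below, this roughly doubles inference latency compared with commit 82036b0 (average 0.157 ms vs. 0.0745 ms per inference).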

•	ONNX Version: 1.16.0
•	Opset Version: 21
•	Hardware: CPU-only execution (single-threaded)
•	Tested Operator: Clip (with min and max)

Original Model

Image

Optimized Model of 82036b0

Image

Optimized Model of 530a2d7

Image

Result of 82036b0

Session creation time cost: 0.00417382 s
First inference time cost: 0 ms
Total inference time cost: 1.22059 s
Total inference requests: 16384
Average inference time cost: 0.0744988 ms
Total inference run time: 1.22691 s
Number of inferences per second: 13353.8
Avg CPU usage: 4 %
Peak working set size: 24379392 bytes

Result of 530a2d7

Session creation time cost: 0.00390387 s
First inference time cost: 0 ms
Total inference time cost: 2.57813 s
Total inference requests: 16384
Average inference time cost: 0.157357 ms
Total inference run time: 2.58419 s
Number of inferences per second: 6340.1
Avg CPU usage: 4 %
Peak working set size: 23986176 bytes
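
For reference, the size of the regression can be worked out directly from the two outputs above. The snippet below is only a convenience calculation over the reported figures; the dictionary layout and variable names are my own and not part of ORT.

```python
# Figures copied from the two onnxruntime_perf_test outputs above.
baseline = {"commit": "82036b0", "avg_ms": 0.0744988, "ips": 13353.8}
regressed = {"commit": "530a2d7", "avg_ms": 0.157357, "ips": 6340.1}

# How many times slower the newer commit is per inference.
latency_ratio = regressed["avg_ms"] / baseline["avg_ms"]

# How many times fewer inferences per second the newer commit sustains.
throughput_ratio = baseline["ips"] / regressed["ips"]

print(f"{regressed['commit']} vs {baseline['commit']}")
print(f"latency:    {latency_ratio:.2f}x slower")
print(f"throughput: {throughput_ratio:.2f}x lower")
```

Both ratios come out at about 2.11, which is where the "2X latency" figure comes from.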

To reproduce

  1. Download and unzip poc.zip
  2. Run the following command.
./onnxruntime_perf_test -r 16384 -m times -o 99 -x 1 -y 1 -c 1 ./poc/model.onnx
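
Since the Python API is also in use here, the command above can be approximated from Python. This is a rough sketch, not the perf_test implementation: the function names `summarize` and `benchmark` are hypothetical, the input handling assumes float16 or float32 tensors with any symbolic dimension set to 1, and concurrency (`-c 1`) is simply a single sequential loop. Setting `optimized_model_filepath` additionally saves the optimized graph so the two commits can be compared.

```python
import time


def summarize(latencies_s):
    """Return (total seconds, average ms, inferences per second)."""
    total = sum(latencies_s)
    count = len(latencies_s)
    return total, 1000.0 * total / count, count / total


def benchmark(model_path, repeats=16384, optimized_path=None):
    # Imported lazily so summarize() stays usable without these packages.
    import numpy as np
    import onnxruntime as ort

    opts = ort.SessionOptions()
    opts.intra_op_num_threads = 1  # mirrors -x 1
    opts.inter_op_num_threads = 1  # mirrors -y 1
    # mirrors -o 99 (all optimizations enabled)
    opts.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
    if optimized_path:
        # Dump the optimized graph for inspection.
        opts.optimized_model_filepath = optimized_path

    session = ort.InferenceSession(
        model_path, sess_options=opts, providers=["CPUExecutionProvider"]
    )

    # Assumption: inputs are float16/float32; symbolic dims are replaced by 1.
    dtypes = {"tensor(float16)": np.float16, "tensor(float)": np.float32}
    rng = np.random.default_rng(0)
    feeds = {}
    for arg in session.get_inputs():
        shape = [d if isinstance(d, int) else 1 for d in arg.shape]
        feeds[arg.name] = np.asarray(rng.random(shape), dtype=dtypes[arg.type])

    latencies = []
    for _ in range(repeats):  # mirrors -m times -r 16384
        start = time.perf_counter()
        session.run(None, feeds)
        latencies.append(time.perf_counter() - start)
    return summarize(latencies)


if __name__ == "__main__":
    total_s, avg_ms, ips = benchmark("./poc/model.onnx", optimized_path="optimized.onnx")
    print(f"Total: {total_s:.5f} s, average: {avg_ms:.6f} ms, per second: {ips:.1f}")
```

Running this against builds of 82036b0 and 530a2d7 should show the same trend as perf_test, though absolute numbers will differ because of Python call overhead.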

Urgency

No response

Platform

Linux

OS Version

6.8.0

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

1.20.1

ONNX Runtime API

Python

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response

Model File

poc.zip

Is this a quantized model?

No
