Describe the issue
Starting with commit 530a2d7, ONNX Runtime (ORT) enables FP16 Clip and handles bias, which roughly doubles inference latency: the average inference time goes from ~0.074 ms with 82036b0 to ~0.157 ms with 530a2d7 (see the results below).
Original Model
Optimized Model of 82036b0
Optimized Model of 530a2d7
Result of 82036b0
Session creation time cost: 0.00417382 s
First inference time cost: 0 ms
Total inference time cost: 1.22059 s
Total inference requests: 16384
Average inference time cost: 0.0744988 ms
Total inference run time: 1.22691 s
Number of inferences per second: 13353.8
Avg CPU usage: 4 %
Peak working set size: 24379392 bytes
Result of 530a2d7
Session creation time cost: 0.00390387 s
First inference time cost: 0 ms
Total inference time cost: 2.57813 s
Total inference requests: 16384
Average inference time cost: 0.157357 ms
Total inference run time: 2.58419 s
Number of inferences per second: 6340.1
Avg CPU usage: 4 %
Peak working set size: 23986176 bytes
To reproduce
./onnxruntime_perf_test -r 16384 -m times -o 99 -x 1 -y 1 -c 1 ./poc/model.onnx
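As a cross-check, a similar measurement can be done with the onnxruntime Python API (the API listed below). This is a minimal sketch, not the exact perf_test harness: it assumes the model lives at ./poc/model.onnx as in the command above and that all inputs are floating-point tensors, feeding zero-filled dummy data; adjust the feeds to match the actual model in poc.zip.

import time

import numpy as np
import onnxruntime as ort

MODEL_PATH = "./poc/model.onnx"  # assumed: same path as in the perf_test command above
RUNS = 16384

sess = ort.InferenceSession(MODEL_PATH, providers=["CPUExecutionProvider"])

# Build zero-filled dummy feeds from the declared input shapes.
# Assumptions: dynamic dimensions are set to 1 and inputs are float16/float32.
feeds = {}
for inp in sess.get_inputs():
    shape = [d if isinstance(d, int) else 1 for d in inp.shape]
    dtype = np.float16 if "float16" in inp.type else np.float32
    feeds[inp.name] = np.zeros(shape, dtype=dtype)

# Warm up once, then time RUNS inferences and report the average latency.
sess.run(None, feeds)
start = time.perf_counter()
for _ in range(RUNS):
    sess.run(None, feeds)
elapsed = time.perf_counter() - start
print(f"Average inference time: {elapsed / RUNS * 1000:.4f} ms")

Running this against builds of the two commits should show the same ~2x gap in average inference time as the perf_test output above.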
Urgency
No response
Platform
Linux
OS Version
6.8.0
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
1.20.1
ONNX Runtime API
Python
Architecture
X64
Execution Provider
Default CPU
Execution Provider Library Version
No response
Model File
poc.zip
Is this a quantized model?
No