Hello, I am trying to statically quantize my ONNX model. Regular inference and dynamic quantization both work fine, but static quantization fails with the following error:
2025-02-06 10:42:33.768767764 [W:onnxruntime:, execution_frame.cc:651 AllocateMLValueTensorPreAllocateBuffer] Shape mismatch attempting to re-use buffer. {1,1,1,1,1} != {1,1,1,0,1}. Validate usage of dim_value (values should be > 0) and dim_param (all values with the same string should equate to the same size) in shapes in the model.
2025-02-06 10:42:33.768863477 [E:onnxruntime:, sequential_executor.cc:514 ExecuteKernel] Non-zero status code returned while running Reshape node. Name:'/model/down_levels.2/down_levels.2.0/self_attn/Slice_9_output_0_ReduceMax_Reshape' Status Message: /onnxruntime_src/onnxruntime/core/providers/cpu/tensor/reshape_helper.h:44 onnxruntime::ReshapeHelper::ReshapeHelper(const onnxruntime::TensorShape&, onnxruntime::TensorShapeVector&, bool) input_shape_size == size was false. The input tensor cannot be reshaped to the requested shape. Input shape:{1,1,1,0,1}, requested shape:{1}
I'm using onnxruntime==1.16.0 and onnx==1.16.0. With the latest version of onnxruntime I get a similar error (for a different node, but still a ReduceMax operation), and the error message no longer includes the shape mismatch:
onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running ReduceMax node. Name:'/model/up_levels.0/up_levels.0.0/self_attn/Slice_14_output_0_ReduceMax' Status Message:
I have tried both of the fixes described in #17061, but I get the same error in both cases.
I also suspected the allowzero attribute of the Reshape node, so I changed it from 0 to 1, but that didn't help either.
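The shapes reported in the first log hint at why the Reshape fails: a tensor of shape {1,1,1,0,1} holds zero elements, so it cannot be reshaped to {1}, which holds one. The sketch below only does that arithmetic on strings copied from the error message; `parse_shape` and `element_count` are hypothetical helper names, not part of onnxruntime. The `_ReduceMax` and `_ReduceMax_Reshape` suffixes in the node names look like nodes added during calibration, but that is an assumption rather than something the log states.

```python
import math
import re


def parse_shape(text):
    """Parse a shape printed as '{1,1,1,0,1}' into a tuple of ints."""
    match = re.search(r"\{([\d,\s]*)\}", text)
    if match is None:
        raise ValueError(f"no shape found in {text!r}")
    return tuple(int(dim) for dim in match.group(1).split(",") if dim.strip())


def element_count(shape):
    """Number of elements in a tensor of the given shape."""
    return math.prod(shape)


# Strings copied from the error message in this report.
input_shape = parse_shape("Input shape:{1,1,1,0,1}")
requested_shape = parse_shape("requested shape:{1}")

print(element_count(input_shape))      # 0, because one dimension is zero
print(element_count(requested_shape))  # 1

# A reshape is only valid when both shapes hold the same number of elements.
print(element_count(input_shape) == element_count(requested_shape))  # False

# Plain Python shows the same underlying problem: the maximum of an
# empty collection is undefined.
try:
    max([])
except ValueError as error:
    print("max over an empty sequence failed:", error)
```

If this reading is right, the tensor worth inspecting is the Slice output that feeds the failing ReduceMax node, since a slice that selects zero elements along one axis would produce such a shape.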
To reproduce
Here is the Python code I'm running for static quantization, using random data for calibration:
import os
import subprocess

import numpy as np
import onnxruntime
import torch
from PIL import Image
from onnxruntime.quantization import (
    CalibrationDataReader,
    QuantFormat,
    QuantType,
    create_calibrator,
    quantize_static,
    write_calibration_table,
)
class DataReader(CalibrationDataReader):
    def __init__(self,
                 total=1,
                 batch_size=1,
                 model_path='augmented_model.onnx'):
        '''
        :param total: number of batches
        :param batch_size: batch size of inference
        :param model_path: model name and path
        '''
        self.model_path = model_path
        self.preprocess_flag = True
        self.enum_data_dicts = iter([])
        self.datasize = 0
        self.total = total
        self.batch_size = batch_size
        self.sess_options = onnxruntime.SessionOptions()
        self.sess_options.graph_optimization_level = onnxruntime.GraphOptimizationLevel.ORT_DISABLE_ALL
        self.get_input_name()

    def get_dataset_size(self):
        # Unused leftover: self.image_folder is never set, so calling this
        # would raise AttributeError.
        return len(os.listdir(self.image_folder))

    def get_input_name(self):
        # Open a session only to look up the name of the model's first input.
        session = onnxruntime.InferenceSession(self.model_path, self.sess_options, providers=['CPUExecutionProvider'])
        self.input_name = session.get_inputs()[0].name

    def get_next(self):
        # Generate the calibration data once, then hand out one feed dict per call.
        # Returning None after the last item tells the calibrator to stop;
        # regenerating data on every exhausted iterator would never end.
        if self.preprocess_flag:
            self.preprocess_flag = False
            if self.batch_size == 1:
                data = self.load_serial()
            else:
                data = self.load_batches()
            self.enum_data_dicts = iter(data)
        return next(self.enum_data_dicts, None)

    def load_serial(self):
        # One feed dict per sample, each of shape (1, 3, 1024, 1024).
        nchw_data_list = self.preprocess_imagenet(self.batch_size)
        input_name = self.input_name
        data = []
        for i in range(len(nchw_data_list)):
            nhwc_data = nchw_data_list[i]
            data.append({input_name: nhwc_data})
        return data

    def load_batches(self):
        # One feed dict per batch, each of shape (batch_size, 3, 1024, 1024).
        batch_size = self.batch_size
        input_name = self.input_name
        total = self.total
        batches = []
        for _ in range(total):
            nchw_data_list = self.preprocess_imagenet(batch_size)
            nchw_data_batch = []
            for i in range(len(nchw_data_list)):
                nhwc_data = np.squeeze(nchw_data_list[i], 0)
                nchw_data_batch.append(nhwc_data)
            batch_data = np.concatenate(np.expand_dims(nchw_data_batch, axis=0), axis=0)
            data = {input_name: batch_data}
            batches.append(data)
        return batches

    def preprocess_imagenet(self, size_limit=1):
        '''
        Generates a batch of random images in place of real preprocessed data
        parameter size_limit: number of images to generate.
        return: list of matrices characterizing multiple images
        '''
        unconcatenated_batch_data = []
        for _ in range(size_limit):
            image_data = torch.randn(3, 1024, 1024).numpy()
            image_data = np.expand_dims(image_data, 0)
            unconcatenated_batch_data.append(image_data)
        batch_data = np.concatenate(np.expand_dims(unconcatenated_batch_data, axis=0), axis=0)
        return batch_data

loader = DataReader(total=1, batch_size=1, model_path='/home/ubuntu/standard-training/epoch_80.onnx')


def quantize_onnx_model(onnx_model_path, quantized_model_path):
    command = ["python", "-m", "onnxruntime.quantization.preprocess", "--input", f"{onnx_model_path}", "--output", "optimized_model.onnx"]
    _ = subprocess.run(command, capture_output=True, text=True)
    quantize_static("/home/ubuntu/standard-training/optimized_model.onnx",
                    quantized_model_path,
                    calibration_data_reader=loader,
                    activation_type=QuantType.QInt8,
                    weight_type=QuantType.QInt8)
                    # extra_options={'ActivationSymmetric': True})
    print(f"quantized model saved to:{quantized_model_path}")
    print('ONNX full precision model size (MB):', os.path.getsize(onnx_model_path) / (1024 * 1024))
    print('ONNX quantized model size (MB):', os.path.getsize(quantized_model_path) / (1024 * 1024))


quantize_onnx_model('/home/ubuntu/standard-training/epoch_80.onnx', '/home/ubuntu/standard-training/epoch_80_quantized.onnx')
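The script discards the result of the preprocessing subprocess and writes `optimized_model.onnx` relative to the current working directory, while `quantize_static` reads it from an absolute path. If preprocessing fails, or the script runs from another directory, quantization could silently pick up a stale or missing file. A minimal sketch of a stricter variant follows, assuming the optimized model should sit next to the input model; `run_checked`, `preprocess_command`, and `optimized_path_for` are hypothetical helpers introduced here for illustration. Only the `onnxruntime.quantization.preprocess` invocation with `--input` and `--output` comes from the script itself.

```python
import subprocess
import sys
from pathlib import Path


def run_checked(command):
    """Run a command and raise if it exits with a non-zero status."""
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            f"command failed with exit code {result.returncode}:\n{result.stderr}"
        )
    return result.stdout


def optimized_path_for(onnx_model_path):
    """Hypothetical convention: keep the optimized model next to the input model."""
    return Path(onnx_model_path).with_name("optimized_model.onnx")


def preprocess_command(onnx_model_path, optimized_model_path):
    """Build the preprocessing command with an explicit output path."""
    return [
        sys.executable, "-m", "onnxruntime.quantization.preprocess",
        "--input", str(onnx_model_path),
        "--output", str(optimized_model_path),
    ]


if __name__ == "__main__":
    model = "/home/ubuntu/standard-training/epoch_80.onnx"
    optimized = optimized_path_for(model)
    # Raises RuntimeError (with stderr attached) if preprocessing fails.
    run_checked(preprocess_command(model, optimized))
    print("optimized model written to:", optimized)
```

Passing the same `optimized` path on to `quantize_static` would keep the two steps pointing at one file regardless of the working directory.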
Here is a Google Drive link to the ONNX model.
Urgency
No response
Platform
Linux
OS Version
Ubuntu 22.04.5 LTS
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.16.0
ONNX Runtime API
Python
Architecture
X64
Execution Provider
Default CPU
Execution Provider Library Version
No response