Static Quantization "Shape mismatch" Error #23600

ktadgh · 2025-02-06T11:40:01Z

Describe the issue

Hello, I am trying to statically quantize my ONNX model. Regular inference and dynamic quantization both work fine, but static quantization fails with the following error:

2025-02-06 10:42:33.768767764 [W:onnxruntime:, execution_frame.cc:651 AllocateMLValueTensorPreAllocateBuffer] Shape mismatch attempting to re-use buffer. {1,1,1,1,1} != {1,1,1,0,1}. Validate usage of dim_value (values should be > 0) and dim_param (all values with the same string should equate to the same size) in shapes in the model.
2025-02-06 10:42:33.768863477 [E:onnxruntime:, sequential_executor.cc:514 ExecuteKernel] Non-zero status code returned while running Reshape node. Name:'/model/down_levels.2/down_levels.2.0/self_attn/Slice_9_output_0_ReduceMax_Reshape' Status Message: /onnxruntime_src/onnxruntime/core/providers/cpu/tensor/reshape_helper.h:44 onnxruntime::ReshapeHelper::ReshapeHelper(const onnxruntime::TensorShape&, onnxruntime::TensorShapeVector&, bool) input_shape_size == size was false. The input tensor cannot be reshaped to the requested shape. Input shape:{1,1,1,0,1}, requested shape:{1}
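For illustration, here is a NumPy analogy of what the log describes. It is not ONNX Runtime itself. A slice whose range falls outside an axis yields a tensor with a zero-sized dimension such as {1,1,1,0,1}. That tensor holds no elements, so it cannot be reshaped to {1}. The shapes come from the log. The array contents and slice bounds are made up, and it is an assumption that Slice_9 empties out this way in the real model.

```python
import numpy as np

# Hypothetical activation; only the sliced axis (axis 3) matters here.
x = np.zeros((1, 1, 1, 4, 1), dtype=np.float32)

# Like ONNX Slice, NumPy clamps out-of-range bounds, so this slice is empty
# instead of raising an error.
sliced = x[:, :, :, 5:8, :]
print(sliced.shape, sliced.size)

# A Reshape to {1} needs exactly one element; a zero-element tensor fails.
try:
    sliced.reshape(1)
except ValueError as err:
    print("reshape failed:", err)
```

Because the ReduceMax output in the log still has the empty shape {1,1,1,0,1}, the Reshape that follows it runs into the same problem.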

I'm using onnxruntime==1.16.0 and onnx==1.16.0. With the latest version of onnxruntime I get a similar error (on a different node, but still a ReduceMax operation), although the message no longer mentions the shape mismatch:

onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running ReduceMax node. Name:'/model/up_levels.0/up_levels.0.0/self_attn/Slice_14_output_0_ReduceMax' Status Message:
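Both logs involve nodes named "<tensor>_ReduceMax", and the first also involves "<tensor>_ReduceMax_Reshape". Judging by those names, the calibrator records a maximum (and presumably a minimum) for each tensor it calibrates. The NumPy sketch below shows that min/max bookkeeping. It is illustrative only, not ONNX Runtime's implementation, and collect_min_max is a hypothetical name. It also shows why an empty tensor is a problem: the max over zero elements is undefined, so NumPy raises instead of inventing a value.

```python
import numpy as np


def collect_min_max(batches):
    """Track a running (min, max) per tensor name over calibration batches.

    Each batch is a dict {tensor_name: np.ndarray} standing in for the
    intermediate outputs a MinMax-style calibrator records. Empty tensors are
    deliberately not handled, mirroring the failure in this report.
    """
    ranges = {}
    for outputs in batches:
        for name, value in outputs.items():
            lo, hi = float(np.min(value)), float(np.max(value))  # raises on size-0 input
            if name in ranges:
                lo = min(lo, ranges[name][0])
                hi = max(hi, ranges[name][1])
            ranges[name] = (lo, hi)
    return ranges


print(collect_min_max([{"t": np.array([1.0, -2.0])}, {"t": np.array([3.0])}]))

try:
    collect_min_max([{"t": np.zeros((1, 1, 1, 0, 1), dtype=np.float32)}])
except ValueError as err:
    print("empty tensor:", err)
```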

I have tried both of the fixes described in issue #17061, but I get the same error in both cases.

I also suspected that the allowzero attribute of the Reshape node might be the cause, so I changed it from 0 to 1, but that didn't help either.
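As I read the ONNX spec, allowzero only changes how a 0 inside the requested shape is interpreted. With allowzero=0, a 0 copies the corresponding input dimension. With allowzero=1, it is a literal zero. In the failing node the requested shape is {1} and the zero sits in the input shape, so by that reading the attribute cannot make this reshape succeed. Below is a sketch of the output-shape rule. The function name onnx_reshape_shape is hypothetical, and edge cases such as -1 next to a zero-sized dimension are simply rejected here.

```python
from math import prod


def onnx_reshape_shape(input_shape, requested, allowzero=0):
    """Sketch of ONNX Reshape's output-shape rule for 0 and -1 entries.

    Raises ValueError when the element counts cannot match.
    """
    out = []
    for i, d in enumerate(requested):
        if d == 0 and not allowzero:
            out.append(input_shape[i])  # allowzero=0: a 0 copies the input dimension
        else:
            out.append(d)  # literal size (a 0 stays 0 when allowzero=1), or -1
    if -1 in out:
        known = prod(d for d in out if d != -1)
        total = prod(input_shape)
        if known == 0 or total % known:
            raise ValueError(f"cannot infer -1 for {list(input_shape)} -> {list(requested)}")
        out[out.index(-1)] = total // known
    if prod(out) != prod(input_shape):
        raise ValueError(f"cannot reshape {list(input_shape)} into {out}")
    return out


# The failing node: a 0-element input cannot become a 1-element output,
# whatever allowzero is, because the requested shape {1} contains no 0.
for flag in (0, 1):
    try:
        onnx_reshape_shape([1, 1, 1, 0, 1], [1], allowzero=flag)
    except ValueError as err:
        print(f"allowzero={flag}: {err}")
```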

To reproduce

Here is the Python code I'm running for static quantization, using random data as calibration input:

import os
import subprocess
import sys

import numpy as np
import onnxruntime
from onnxruntime.quantization import CalibrationDataReader, QuantType, quantize_static


class DataReader(CalibrationDataReader):
    def __init__(self,
                 total=1,
                 batch_size=1,
                 model_path='augmented_model.onnx'):
        '''
        :param total: number of batches (number of single samples when batch_size == 1)
        :param batch_size: batch size of inference
        :param model_path: model name and path
        '''

        self.model_path = model_path
        self.enum_data_dicts = None  # built on the first get_next() call
        self.total = total
        self.batch_size = batch_size
        self.sess_options = onnxruntime.SessionOptions()
        self.sess_options.graph_optimization_level = onnxruntime.GraphOptimizationLevel.ORT_DISABLE_ALL
        self.get_input_name()

    def get_input_name(self):
        session = onnxruntime.InferenceSession(self.model_path, self.sess_options, providers=['CPUExecutionProvider'])
        self.input_name = session.get_inputs()[0].name

    def get_next(self):
        # Generate the calibration data once. Returning None after the last item
        # tells the calibrator to stop; regenerating the data here would loop forever.
        if self.enum_data_dicts is None:
            if self.batch_size == 1:
                data = self.load_serial()
            else:
                data = self.load_batches()
            self.enum_data_dicts = iter(data)

        return next(self.enum_data_dicts, None)

    def load_serial(self):
        # `total` inputs, each of shape (1, 3, 1024, 1024)
        return [{self.input_name: nchw_data} for nchw_data in self.preprocess_imagenet(self.total)]

    def load_batches(self):
        # `total` inputs, each of shape (batch_size, 3, 1024, 1024)
        batches = []
        for _ in range(self.total):
            nchw_data_list = self.preprocess_imagenet(self.batch_size)
            batches.append({self.input_name: np.concatenate(nchw_data_list, axis=0)})
        return batches

    def preprocess_imagenet(self, size_limit=1):
        '''
        Generates random images in place of a real dataset
        parameter size_limit: number of images to generate
        return: list of float32 arrays, each of shape (1, 3, 1024, 1024)
        '''
        return [np.random.randn(1, 3, 1024, 1024).astype(np.float32) for _ in range(size_limit)]

loader = DataReader(total=1, batch_size=1, model_path='/home/ubuntu/standard-training/epoch_80.onnx')


def quantize_onnx_model(onnx_model_path, quantized_model_path):
    # Write the pre-processed model next to the input model, so quantize_static
    # reads exactly the file that the preprocess step produced.
    optimized_model_path = os.path.join(os.path.dirname(onnx_model_path), 'optimized_model.onnx')
    command = [sys.executable, "-m", "onnxruntime.quantization.preprocess",
               "--input", onnx_model_path, "--output", optimized_model_path]
    subprocess.run(command, check=True)  # raise instead of silently ignoring a failed preprocess

    quantize_static(optimized_model_path,
                    quantized_model_path,
                    calibration_data_reader=loader,
                    activation_type=QuantType.QInt8,
                    weight_type=QuantType.QInt8)
                    # extra_options={'ActivationSymmetric': True})

    print(f"quantized model saved to: {quantized_model_path}")
    print('ONNX full precision model size (MB):', os.path.getsize(onnx_model_path) / (1024 * 1024))
    print('ONNX quantized model size (MB):', os.path.getsize(quantized_model_path) / (1024 * 1024))

quantize_onnx_model('/home/ubuntu/standard-training/epoch_80.onnx', '/home/ubuntu/standard-training/epoch_80_quantized.onnx')
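The calibrator pulls inputs by calling get_next() until it returns None. The sketch below shows that contract with no dependencies. FiniteReader builds its iterator once and never rebuilds it, so draining it terminates. A reader that regenerates its data whenever the iterator runs dry never signals the end. FiniteReader and drain are hypothetical stand-ins, not part of onnxruntime. The max_steps guard exists only so a misbehaving reader fails fast instead of hanging.

```python
class FiniteReader:
    """Hypothetical stand-in for a calibration data reader."""

    def __init__(self, samples):
        self._samples = samples
        self._it = None

    def get_next(self):
        if self._it is None:  # build the iterator once, never rebuild it
            self._it = iter(self._samples)
        return next(self._it, None)


def drain(reader, max_steps=1000):
    """Call get_next() until it returns None; fail if it never does."""
    seen = []
    for _ in range(max_steps):
        item = reader.get_next()
        if item is None:
            return seen
        seen.append(item)
    raise RuntimeError("reader never returned None")


reader = FiniteReader([{"input": 1}, {"input": 2}])
print(drain(reader))  # both samples, then the loop stops
```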

Here is a Google Drive link to the ONNX model.

Urgency

No response

Platform

Linux

OS Version

Ubuntu 22.04.5 LTS

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.16.0

ONNX Runtime API

Python

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response


github-actions bot added the quantization label (issues related to quantization) on Feb 6, 2025.
