Static Quantization "Shape mismatch" Error #23600

Open
ktadgh opened this issue Feb 6, 2025 · 0 comments
Labels
quantization: issues related to quantization

Comments


ktadgh commented Feb 6, 2025

Describe the issue

Hello, I am trying to statically quantize my ONNX model. Regular inference and dynamic quantization both work fine, but when I try static quantization, I get the following error:

2025-02-06 10:42:33.768767764 [W:onnxruntime:, execution_frame.cc:651 AllocateMLValueTensorPreAllocateBuffer] Shape mismatch attempting to re-use buffer. {1,1,1,1,1} != {1,1,1,0,1}. Validate usage of dim_value (values should be > 0) and dim_param (all values with the same string should equate to the same size) in shapes in the model.
2025-02-06 10:42:33.768863477 [E:onnxruntime:, sequential_executor.cc:514 ExecuteKernel] Non-zero status code returned while running Reshape node. Name:'/model/down_levels.2/down_levels.2.0/self_attn/Slice_9_output_0_ReduceMax_Reshape' Status Message: /onnxruntime_src/onnxruntime/core/providers/cpu/tensor/reshape_helper.h:44 onnxruntime::ReshapeHelper::ReshapeHelper(const onnxruntime::TensorShape&, onnxruntime::TensorShapeVector&, bool) input_shape_size == size was false. The input tensor cannot be reshaped to the requested shape. Input shape:{1,1,1,0,1}, requested shape:{1}
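For what it's worth, the zero in {1,1,1,0,1} suggests some intermediate dimension is being resolved to 0. This is a minimal sketch of how I checked the inferred shapes for zero-valued dims with the onnx package (the model path is a placeholder):

import onnx
from onnx import shape_inference

model = onnx.load("model.onnx")  # placeholder path
inferred = shape_inference.infer_shapes(model)
for vi in inferred.graph.value_info:
    shape = vi.type.tensor_type.shape
    dims = [d.dim_value if d.HasField("dim_value") else d.dim_param
            for d in shape.dim]
    # the error message warns about dim_value entries that are not > 0
    if 0 in dims:
        print(vi.name, dims)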

I'm using onnxruntime==1.16.0 and onnx==1.16.0. With the latest version of onnxruntime I get a similar error (for a different node, but still the ReduceMax operation), although the error message no longer includes the shape mismatch:

onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running ReduceMax node. Name:'/model/up_levels.0/up_levels.0.0/self_attn/Slice_14_output_0_ReduceMax' Status Message: 

I have tried both of the fixes described in #17061, but I get the same error in both cases.

I also thought the issue might be due to the allowzero attribute on the Reshape node, so I changed it from 0 to 1, but that didn't help either.
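In case it helps, this is roughly how I flipped the attribute (a sketch using the onnx helper API; the path is a placeholder):

import onnx
from onnx import helper

model = onnx.load("model.onnx")  # placeholder path
for node in model.graph.node:
    if node.op_type == "Reshape":
        # drop any existing allowzero attribute, then set it to 1
        kept = [a for a in node.attribute if a.name != "allowzero"]
        del node.attribute[:]
        node.attribute.extend(kept)
        node.attribute.append(helper.make_attribute("allowzero", 1))
onnx.save(model, "model_allowzero.onnx")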

To reproduce

Here is the Python code I'm running for static quantization, loading random data:

import subprocess
import os
import numpy as np
import torch
import onnxruntime
from onnxruntime.quantization import CalibrationDataReader, quantize_static, QuantType


class DataReader(CalibrationDataReader):
    def __init__(self,
                 total=1,
                 batch_size=1,
                 model_path='augmented_model.onnx'):
        '''
        :param total: number of batches
        :param batch_size: batch size of inference
        :param model_path: model name and path
        '''
        self.model_path = model_path
        self.preprocess_flag = True
        self.enum_data_dicts = iter([])
        self.datasize = 0
        self.total = total
        self.batch_size = batch_size
        self.sess_options = onnxruntime.SessionOptions()
        # disable graph optimizations so calibration runs on the original graph
        self.sess_options.graph_optimization_level = onnxruntime.GraphOptimizationLevel.ORT_DISABLE_ALL
        self.get_input_name()

    def get_input_name(self):
        session = onnxruntime.InferenceSession(self.model_path, self.sess_options, providers=['CPUExecutionProvider'])
        self.input_name = session.get_inputs()[0].name


    def get_next(self):
        iter_data = next(self.enum_data_dicts, None)
        if iter_data:
            return iter_data

        self.enum_data_dicts = None
        if self.batch_size == 1:
            data = self.load_serial()
        else:
            data = self.load_batches()

        self.enum_data_dicts = iter(data)

        return next(self.enum_data_dicts, None)


    def load_serial(self):
        nchw_data_list = self.preprocess_imagenet(self.batch_size)
        input_name = self.input_name

        data = []
        for i in range(len(nchw_data_list)):
            nchw_data = nchw_data_list[i]  # shape (1, 3, 1024, 1024)
            data.append({input_name: nchw_data})
        return data

    def load_batches(self):
        batch_size = self.batch_size
        input_name = self.input_name
        total = self.total

        batches = []
        for _ in range(total):
            nchw_data_list = self.preprocess_imagenet(batch_size)
            nchw_data_batch = []
            for i in range(len(nchw_data_list)):
                nchw_data = np.squeeze(nchw_data_list[i], 0)  # (3, 1024, 1024)
                nchw_data_batch.append(nchw_data)
            # stack into a single (batch_size, 3, 1024, 1024) input
            batch_data = np.stack(nchw_data_batch, axis=0)
            data = {input_name: batch_data}
            batches.append(data)

        return batches

    def preprocess_imagenet(self, size_limit=1):
        '''
        Generates a batch of random inputs as stand-ins for preprocessed images.
        parameter size_limit: number of inputs to generate.
        return: array of shape (size_limit, 1, 3, 1024, 1024)
        '''
        unconcatenated_batch_data = []

        for _ in range(size_limit):
            image_data = torch.randn(3, 1024, 1024).numpy()
            image_data = np.expand_dims(image_data, 0)
            unconcatenated_batch_data.append(image_data)
        batch_data = np.stack(unconcatenated_batch_data, axis=0)
        return batch_data

loader = DataReader(total=1, batch_size=1, model_path='/home/ubuntu/standard-training/epoch_80.onnx')


def quantize_onnx_model(onnx_model_path, quantized_model_path):
    # run the quantization preprocessing step (shape inference + optimization)
    command = ["python", "-m", "onnxruntime.quantization.preprocess",
               "--input", f"{onnx_model_path}", "--output", "optimized_model.onnx"]
    _ = subprocess.run(command, capture_output=True, text=True)

    quantize_static("/home/ubuntu/standard-training/optimized_model.onnx",
                    quantized_model_path,
                    calibration_data_reader=loader,
                    activation_type=QuantType.QInt8,
                    weight_type=QuantType.QInt8)
                    # extra_options={'ActivationSymmetric': True})

    print(f"quantized model saved to: {quantized_model_path}")
    print('ONNX full precision model size (MB):', os.path.getsize(onnx_model_path)/(1024*1024))
    print('ONNX quantized model size (MB):', os.path.getsize(quantized_model_path)/(1024*1024))

quantize_onnx_model('/home/ubuntu/standard-training/epoch_80.onnx', '/home/ubuntu/standard-training/epoch_80_quantized.onnx')
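Note that recent onnxruntime versions also expose the preprocessing step as a Python function, which avoids the subprocess call. A sketch, assuming quant_pre_process is available in the installed version:

from onnxruntime.quantization.shape_inference import quant_pre_process

quant_pre_process("/home/ubuntu/standard-training/epoch_80.onnx",
                  "optimized_model.onnx")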

Here is a Google Drive link to the ONNX model.

Urgency

No response

Platform

Linux

OS Version

Ubuntu 22.04.5 LTS

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.16.0

ONNX Runtime API

Python

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response

github-actions bot added the quantization label on Feb 6, 2025