Incorrect beta read in Softmax node #3039

Open · ShardulNalegave opened this issue Jan 20, 2025 · 2 comments

@ShardulNalegave

I'm trying to run MobileNetV2 on an ESP32-S3 using the esp-tflite-micro framework, which is based on the tflite-micro repository.
The model is fully integer quantized and is around 1.8 MB in size.

I have verified through a Python script that the quantized model works correctly: it gives the same output as the original model.
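For reference, a minimal sketch of that desktop check (the model file name is a placeholder and the preprocessing is simplified to a random input; the actual script is in the repo linked below):

import numpy as np
import tensorflow as tf

# Placeholder file name; use the quantized model from the linked repo.
interpreter = tf.lite.Interpreter(model_path="mobilenet_v2_int8.tflite")
interpreter.allocate_tensors()

inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# Quantize a random float image into the model's int8 input domain.
scale, zero_point = inp["quantization"]
x = np.random.rand(*inp["shape"]).astype(np.float32)
x_q = np.clip(np.round(x / scale + zero_point), -128, 127).astype(np.int8)

interpreter.set_tensor(inp["index"], x_q)
interpreter.invoke()

# Dequantize the 1x1000 output back into confidence scores.
y_q = interpreter.get_tensor(out["index"])
out_scale, out_zp = out["quantization"]
print((y_q.astype(np.float32) - out_zp) * out_scale)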

The problem arises when running on the edge device: the application aborts inside AllocateTensors(). The failure is in quantization_util.cc:

void QuantizeMultiplierSmallerThanOneExp(double double_multiplier,
                                         int32_t* quantized_multiplier,
                                         int* left_shift) {
  TFLITE_CHECK_LT(double_multiplier, 1.); // <----- aborts here
  TFLITE_CHECK_GT(double_multiplier, 0.);
  int shift;
  QuantizeMultiplier(double_multiplier, quantized_multiplier, &shift);
  TFLITE_CHECK_LE(shift, 0);
  *left_shift = shift;
}
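
For intuition, QuantizeMultiplier decomposes the multiplier as q * 2^shift with q in Q0.31 fixed point, so QuantizeMultiplierSmallerThanOneExp can only accept 0 < multiplier < 1 (which forces shift <= 0). Here is a Python mirror of that decomposition (a sketch; TFLite's C++ version also clamps very small shifts to zero, omitted here):

import math

def quantize_multiplier(m):
    # m == q_fixed * 2^(shift - 31), with q_fixed in [2^30, 2^31).
    if m == 0.0:
        return 0, 0
    q, shift = math.frexp(m)             # m = q * 2^shift, 0.5 <= q < 1
    q_fixed = int(round(q * (1 << 31)))  # Q0.31 fixed point
    if q_fixed == (1 << 31):             # rounding overflowed the mantissa
        q_fixed //= 2
        shift += 1
    return q_fixed, shift

print(quantize_multiplier(0.25))  # (1073741824, -1): multiplier < 1 -> shift <= 0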

This function is called by PreprocessSoftmaxScaling:

void PreprocessSoftmaxScaling(double beta, double input_scale,
                              int input_integer_bits,
                              int32_t* quantized_multiplier, int* left_shift) {
  // If the overall multiplier (input and beta) is large, then exp() of an
  // input difference of 1 scaled by this will be large.  In other words, we
  // can cap the multiplier and know that, when it is used, the output will be
  // (round to) zero wherever the input is not at the maximum value.

  // If the overall scale is less than one, and input_integer_bits=0, then the
  // result is double equivalent of Q0.31 (actually with more precision). Thus
  // this generates a Q(input_integer_bits).(31-input_integer_bits)
  // representation.
#if TFLITE_SINGLE_ROUNDING
  const double max_real_multiplier = (1LL << 30) - 1.0;
#else
  const double max_real_multiplier = (1LL << 31) - 1.0;
#endif

#ifdef TFLITE_EMULATE_FLOAT
  const double input_beta = IntegerDoubleMultiply(beta, input_scale);
  int shift;
  int64_t fraction = IntegerFrExp(input_beta, &shift);
  shift += (31 - input_integer_bits);
  double input_beta_real_multiplier =
      DoubleFromFractionAndShift(fraction, shift);
  if (IntegerDoubleCompare(input_beta_real_multiplier, max_real_multiplier) >
      0) {
    input_beta_real_multiplier = max_real_multiplier;
  }
#else   // TFLITE_EMULATE_FLOAT
  const double input_beta_real_multiplier =
      std::min<double>(beta * input_scale * (1 << (31 - input_integer_bits)),
                       max_real_multiplier);
#endif  // TFLITE_EMULATE_FLOAT

  QuantizeMultiplierGreaterThanOne(input_beta_real_multiplier,
                                   quantized_multiplier, left_shift);
}

After some debugging, it looks like the value of input_beta_real_multiplier is zero because beta is zero; all other values involved are non-zero. This shouldn't be the case: when viewed in Netron, the node has beta = 1.
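
To make the failure concrete, here is the non-emulated multiplier arithmetic from PreprocessSoftmaxScaling as a Python sketch (input_integer_bits = 5 matches TFLite's kScaledDiffIntegerBits for softmax; the input scale is an illustrative value, not one read from my model):

# Mirrors: min(beta * input_scale * 2^(31 - input_integer_bits), max)
input_integer_bits = 5
input_scale = 0.058  # illustrative
max_real_multiplier = (1 << 31) - 1.0

for beta in (1.0, 0.0):
    m = min(beta * input_scale * (1 << (31 - input_integer_bits)),
            max_real_multiplier)
    print(beta, m)
# beta = 1.0 gives a large positive multiplier, as expected.
# beta = 0.0 collapses the multiplier to 0.0, which can never pass the
# quantize-multiplier CHECKs, so the runtime aborts in AllocateTensors().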

[Image: Netron view of the Softmax node, showing beta = 1]

Now, I decided to bypass this by manually replacing beta with 1. This does let the model run, with an inference time of around 830 ms, but the output layer is then completely different from that of the Python test code.
The output has dimensions 1x1000, where each value is a confidence score for a particular label; in this case, every label gets the exact same confidence score.

I have uploaded the complete example application on GitHub (it includes the esp-tflite-micro code, the Python code, and the quantized tflite model).
The example contains the modified esp-tflite-micro code (to bypass the beta issue).
Check it out here: https://github.com/ShardulNalegave/esp-mbnetv2-test

@ShardulNalegave (Author)

It looks like no matter what input I give, I get the same output.
To test this, I filled the input tensor with completely random int8 values, and the output was exactly the same as before.

[Image: output tensor dump; every class has confidence 0.996094]

0.996094 is the maximum possible value after dequantizing the outputs, i.e. every output element is just 127.
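
That value checks out arithmetically: TFLite's int8 softmax kernel fixes the output quantization to scale = 1/256 and zero_point = -128, so a saturated output of 127 dequantizes to 255/256:

scale, zero_point = 1.0 / 256.0, -128  # int8 softmax output quantization in TFLite
q = 127                                # every element of the observed output
print((q - zero_point) * scale)        # 0.99609375, shown rounded as 0.996094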

@ShardulNalegave changed the title from "Incorrect beta read in Softmax node, manual override causes output layer mismatch with Python test code" to "Incorrect beta read in Softmax node, manual override causes output layer mismatch with Python test code (edit: output layer is incorrect and doesn't change for any input)" on Jan 23, 2025
@ShardulNalegave (Author)

It looks like the kBeta issue and the output-mismatch issue are not related. I can say this because I am observing deviations in the outputs of earlier nodes as well: the preceding FullyConnected node gives an output in which every value equals -60.

Thus, I will henceforth use this thread to discuss the kBeta issue and will create separate threads for the other issues.
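
For reproducing the per-node comparison on the desktop side, here is a sketch using the Python interpreter's preserve-all-tensors option (the model path is a placeholder; comparing against on-device values still requires dumping tensors on the micro side, which this does not cover):

import tensorflow as tf

# Keep intermediate tensors alive after invoke() so they can be inspected.
interpreter = tf.lite.Interpreter(
    model_path="mobilenet_v2_int8.tflite",  # placeholder name
    experimental_preserve_all_tensors=True,
)
interpreter.allocate_tensors()
# ... set the input tensor and call interpreter.invoke() as usual ...
for t in interpreter.get_tensor_details():
    try:
        v = interpreter.get_tensor(t["index"])
        print(t["index"], t["name"], v.min(), v.max())
    except ValueError:
        pass  # some tensors (e.g. folded constants) cannot be read back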

@ShardulNalegave changed the title from "Incorrect beta read in Softmax node, manual override causes output layer mismatch with Python test code (edit: output layer is incorrect and doesn't change for any input)" to "Incorrect beta read in Softmax node" on Feb 3, 2025