Incorrect beta read in Softmax node #3039

Open · ShardulNalegave opened this issue Jan 20, 2025 · 2 comments

@ShardulNalegave

I'm trying to run MobileNetV2 on an ESP32-S3 using the esp-tflite-micro framework, which is based on the tflite-micro repository.
The model is fully integer quantized and is around 1.8 MB in size.

I have verified through a Python script that the quantized model works correctly: it gives the same output as the original model.
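For reference, a minimal sketch of that desktop check (the model file name is a placeholder and the preprocessing is simplified to a random input; the actual script is in the repo linked below):

import numpy as np
import tensorflow as tf

# Placeholder file name; use the quantized model from the linked repo.
interpreter = tf.lite.Interpreter(model_path="mobilenet_v2_int8.tflite")
interpreter.allocate_tensors()

inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# Quantize a random float image into the model's int8 input domain.
scale, zero_point = inp["quantization"]
x = np.random.rand(*inp["shape"]).astype(np.float32)
x_q = np.clip(np.round(x / scale + zero_point), -128, 127).astype(np.int8)

interpreter.set_tensor(inp["index"], x_q)
interpreter.invoke()

# Dequantize the 1x1000 output back into confidence scores.
y_q = interpreter.get_tensor(out["index"])
out_scale, out_zp = out["quantization"]
print((y_q.astype(np.float32) - out_zp) * out_scale)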

The problem arises when running on the edge device: the application aborts inside AllocateTensors(). The failure is in quantization_util.cc:

void QuantizeMultiplierSmallerThanOneExp(double double_multiplier,
                                         int32_t* quantized_multiplier,
                                         int* left_shift) {
  TFLITE_CHECK_LT(double_multiplier, 1.); // <----- aborts here
  TFLITE_CHECK_GT(double_multiplier, 0.);
  int shift;
  QuantizeMultiplier(double_multiplier, quantized_multiplier, &shift);
  TFLITE_CHECK_LE(shift, 0);
  *left_shift = shift;
}
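
For intuition, QuantizeMultiplier decomposes the multiplier as q * 2^shift with q in Q0.31 fixed point, so QuantizeMultiplierSmallerThanOneExp can only accept 0 < multiplier < 1 (which forces shift <= 0). Here is a Python mirror of that decomposition (a sketch; TFLite's C++ version also clamps very small shifts to zero, omitted here):

import math

def quantize_multiplier(m):
    # m == q_fixed * 2^(shift - 31), with q_fixed in [2^30, 2^31).
    if m == 0.0:
        return 0, 0
    q, shift = math.frexp(m)             # m = q * 2^shift, 0.5 <= q < 1
    q_fixed = int(round(q * (1 << 31)))  # Q0.31 fixed point
    if q_fixed == (1 << 31):             # rounding overflowed the mantissa
        q_fixed //= 2
        shift += 1
    return q_fixed, shift

print(quantize_multiplier(0.25))  # (1073741824, -1): multiplier < 1 -> shift <= 0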

This function is called by PreprocessSoftmaxScaling:

void PreprocessSoftmaxScaling(double beta, double input_scale,
                              int input_integer_bits,
                              int32_t* quantized_multiplier, int* left_shift) {
  // If the overall multiplier (input and beta) is large, then exp() of an
  // input difference of 1 scaled by this will be large.  In other words, we
  // can cap the multiplier and know that, when it is used, the output will be
  // (round to) zero wherever the input is not at the maximum value.

  // If the overall scale is less than one, and input_integer_bits=0, then the
  // result is double equivalent of Q0.31 (actually with more precision). Thus
  // this generates a Q(input_integer_bits).(31-input_integer_bits)
  // representation.
#if TFLITE_SINGLE_ROUNDING
  const double max_real_multiplier = (1LL << 30) - 1.0;
#else
  const double max_real_multiplier = (1LL << 31) - 1.0;
#endif

#ifdef TFLITE_EMULATE_FLOAT
  const double input_beta = IntegerDoubleMultiply(beta, input_scale);
  int shift;
  int64_t fraction = IntegerFrExp(input_beta, &shift);
  shift += (31 - input_integer_bits);
  double input_beta_real_multiplier =
      DoubleFromFractionAndShift(fraction, shift);
  if (IntegerDoubleCompare(input_beta_real_multiplier, max_real_multiplier) >
      0) {
    input_beta_real_multiplier = max_real_multiplier;
  }
#else   // TFLITE_EMULATE_FLOAT
  const double input_beta_real_multiplier =
      std::min<double>(beta * input_scale * (1 << (31 - input_integer_bits)),
                       max_real_multiplier);
#endif  // TFLITE_EMULATE_FLOAT

  QuantizeMultiplierGreaterThanOne(input_beta_real_multiplier,
                                   quantized_multiplier, left_shift);
}

After some debugging, it looks like the value of input_beta_real_multiplier is zero because beta is zero; all other values involved are non-zero. This shouldn't be the case: when viewed in Netron, the node has beta = 1.
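
To make the failure concrete, here is the non-emulated multiplier arithmetic from PreprocessSoftmaxScaling as a Python sketch (input_integer_bits = 5 matches TFLite's kScaledDiffIntegerBits for softmax; the input scale is an illustrative value, not one read from my model):

# Mirrors: min(beta * input_scale * 2^(31 - input_integer_bits), max)
input_integer_bits = 5
input_scale = 0.058  # illustrative
max_real_multiplier = (1 << 31) - 1.0

for beta in (1.0, 0.0):
    m = min(beta * input_scale * (1 << (31 - input_integer_bits)),
            max_real_multiplier)
    print(beta, m)
# beta = 1.0 gives a large positive multiplier, as expected.
# beta = 0.0 collapses the multiplier to 0.0, which can never pass the
# quantize-multiplier CHECKs, so the runtime aborts in AllocateTensors().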

[Image: Netron view of the Softmax node, showing beta = 1]

Now, I decided to bypass this by manually replacing beta with 1. This does let the model run, with an inference time of around 830 ms, but the output layer is then completely different from that of the Python test code.
The output has dimensions 1x1000, where each value is a confidence score for a particular label; in this case, every label gets the exact same confidence score.

I have uploaded the complete example application on GitHub (it includes the esp-tflite-micro code, the Python code, and the quantized tflite model).
The example contains the modified esp-tflite-micro code (to bypass the beta issue).
Check it out here: https://github.com/ShardulNalegave/esp-mbnetv2-test

@ShardulNalegave (Author)

It looks like no matter what input I give, I get the same output.
To test this, I filled the input tensor with completely random int8 values, and the output was exactly the same as before.

[Image: output tensor dump; every class has confidence 0.996094]

0.996094 is the maximum possible value after dequantizing the outputs, i.e. every output element is just 127.
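
That value checks out arithmetically: TFLite's int8 softmax kernel fixes the output quantization to scale = 1/256 and zero_point = -128, so a saturated output of 127 dequantizes to 255/256:

scale, zero_point = 1.0 / 256.0, -128  # int8 softmax output quantization in TFLite
q = 127                                # every element of the observed output
print((q - zero_point) * scale)        # 0.99609375, shown rounded as 0.996094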

@ShardulNalegave changed the title from "Incorrect beta read in Softmax node, manual override causes output layer mismatch with Python test code" to "Incorrect beta read in Softmax node, manual override causes output layer mismatch with Python test code (edit: output layer is incorrect and doesn't change for any input)" on Jan 23, 2025
@ShardulNalegave (Author)

It looks like the kBeta issue and the output-mismatch issue are not related. I can say this because I am observing deviations in the outputs of earlier nodes as well: the preceding FullyConnected node gives an output in which every value equals -60.

Thus, I will henceforth use this thread to discuss the kBeta issue and will create separate threads for the other issues.
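
For reproducing the per-node comparison on the desktop side, here is a sketch using the Python interpreter's preserve-all-tensors option (the model path is a placeholder; comparing against on-device values still requires dumping tensors on the micro side, which this does not cover):

import tensorflow as tf

# Keep intermediate tensors alive after invoke() so they can be inspected.
interpreter = tf.lite.Interpreter(
    model_path="mobilenet_v2_int8.tflite",  # placeholder name
    experimental_preserve_all_tensors=True,
)
interpreter.allocate_tensors()
# ... set the input tensor and call interpreter.invoke() as usual ...
for t in interpreter.get_tensor_details():
    try:
        v = interpreter.get_tensor(t["index"])
        print(t["index"], t["name"], v.min(), v.max())
    except ValueError:
        pass  # some tensors (e.g. folded constants) cannot be read back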

@ShardulNalegave changed the title from "Incorrect beta read in Softmax node, manual override causes output layer mismatch with Python test code (edit: output layer is incorrect and doesn't change for any input)" to "Incorrect beta read in Softmax node" on Feb 3, 2025