[libc] Alternative algorithm for decimal FP printf #123643
Conversation
The existing options for bin→dec float conversion are all based on the Ryū algorithm, which generates 9 output digits at a time using a table lookup. For users who can't afford the space cost of the table, the table-lookup subroutine is replaced with one that computes the needed table entry on demand, but the algorithm is otherwise unmodified.

The performance problem with computing table entries on demand is that now you need to calculate a power of 10 for each 9 digits you output. But if you're calculating a custom power of 10 anyway, it's easier to compute just one, and multiply the _whole_ mantissa by it.

This patch adds a header file alongside `float_dec_converter.h`, which replaces the whole Ryū system instead of just the table-lookup routine, implementing this alternative, simpler algorithm. The result is accurate enough to satisfy (minimally) the accuracy demands of IEEE 754-2019 even in 128-bit long double. The new float128 test cases demonstrate this by testing the cases closest to the 39-digit rounding boundary.

In my tests of generating 39 output digits (the maximum number supported by this algorithm) this code is also both faster and smaller than the USE_DYADIC_FLOAT version of the existing Ryū code.
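As a rough illustration of the idea only — a minimal sketch in plain `double` with invented names, not the patch's code, which does all of this in a 320-bit `DyadicFloat` with correct rounding and up to 39 digits:

```cpp
#include <cmath>
#include <cstdio>

// Estimate the decimal exponent, compute ONE power of 10, scale the whole
// value by it, and read the digits off the resulting integer. (Sketch assumes
// x > 0; the real implementation re-checks the estimate and re-rounds rather
// than trusting floating-point pow().)
static void print_decimal_sketch(double x, int digits) {
  int exp2 = 0;
  std::frexp(x, &exp2);                       // x = m * 2^exp2 with 0.5 <= m < 1
  int exp10 = static_cast<int>(std::floor(exp2 * 0.30103)); // ~ exp2 * log10(2)
  double scaled = x * std::pow(10.0, digits - 1 - exp10);
  // The log10 estimate can be off by one; nudge the scaling if so.
  double lo = std::pow(10.0, digits - 1);
  if (scaled >= lo * 10.0) { scaled /= 10.0; ++exp10; }
  if (scaled < lo)         { scaled *= 10.0; --exp10; }
  std::printf("%.0fe%+d\n", scaled, exp10 - (digits - 1));
}

int main() { print_decimal_sketch(3.141592653589793, 9); } // prints 314159265e-8
```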
@llvm/pr-subscribers-libc

Author: Simon Tatham (statham-arm)

Changes: as in the description above.

Patch is 46.40 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/123643.diff

12 Files Affected:
diff --git a/libc/config/config.json b/libc/config/config.json
index 9a5d5c3c68da60..c38d4242292185 100644
--- a/libc/config/config.json
+++ b/libc/config/config.json
@@ -30,6 +30,10 @@
"value": false,
"doc": "Use the same mode for double and long double in printf."
},
+ "LIBC_CONF_PRINTF_FLOAT_TO_STR_USE_FLOAT320": {
+ "value": false,
+ "doc": "Use an alternative printf float implementation based on 320-bit floats"
+ },
"LIBC_CONF_PRINTF_DISABLE_FIXED_POINT": {
"value": false,
"doc": "Disable printing fixed point values in printf and friends."
diff --git a/libc/docs/configure.rst b/libc/docs/configure.rst
index 3db750b1aed214..940a07754c4582 100644
--- a/libc/docs/configure.rst
+++ b/libc/docs/configure.rst
@@ -43,6 +43,7 @@ to learn about the defaults for your platform and target.
- ``LIBC_CONF_PRINTF_DISABLE_WRITE_INT``: Disable handling of %n in printf format string.
- ``LIBC_CONF_PRINTF_FLOAT_TO_STR_NO_SPECIALIZE_LD``: Use the same mode for double and long double in printf.
- ``LIBC_CONF_PRINTF_FLOAT_TO_STR_USE_DYADIC_FLOAT``: Use dyadic float for faster and smaller but less accurate printf doubles.
+ - ``LIBC_CONF_PRINTF_FLOAT_TO_STR_USE_FLOAT320``: Use an alternative printf float implementation based on 320-bit floats
- ``LIBC_CONF_PRINTF_FLOAT_TO_STR_USE_MEGA_LONG_DOUBLE_TABLE``: Use large table for better printf long double performance.
* **"pthread" options**
- ``LIBC_CONF_RAW_MUTEX_DEFAULT_SPIN_COUNT``: Default number of spins before blocking if a mutex is in contention (default to 100).
diff --git a/libc/src/__support/CPP/algorithm.h b/libc/src/__support/CPP/algorithm.h
index f5dc9067409eb6..7704b3fa81f0c1 100644
--- a/libc/src/__support/CPP/algorithm.h
+++ b/libc/src/__support/CPP/algorithm.h
@@ -26,6 +26,8 @@ template <class T> LIBC_INLINE constexpr const T &min(const T &a, const T &b) {
return (a < b) ? a : b;
}
+template <class T> LIBC_INLINE constexpr T abs(T a) { return a < 0 ? -a : a; }
+
template <class InputIt, class UnaryPred>
LIBC_INLINE constexpr InputIt find_if_not(InputIt first, InputIt last,
UnaryPred q) {
diff --git a/libc/src/__support/FPUtil/dyadic_float.h b/libc/src/__support/FPUtil/dyadic_float.h
index 289fd01680547d..e4397d63e34b29 100644
--- a/libc/src/__support/FPUtil/dyadic_float.h
+++ b/libc/src/__support/FPUtil/dyadic_float.h
@@ -26,6 +26,40 @@
namespace LIBC_NAMESPACE_DECL {
namespace fputil {
+// Decide whether to round up a UInt at a given bit position, based on
+// the current rounding mode. The assumption is that the caller is
+// going to make the integer `value >> rshift`, and then might need to
+// round it up by 1 depending on the value of the bits shifted off the
+// bottom.
+//
+// `logical_sign` causes the behavior of FE_DOWNWARD and FE_UPWARD to
+// be reversed, which is what you'd want if this is the mantissa of a
+// negative floating-point number.
+template <size_t Bits>
+LIBC_INLINE constexpr bool
+need_to_round_up(const LIBC_NAMESPACE::UInt<Bits> &value, size_t rshift,
+ Sign logical_sign) {
+ switch (quick_get_round()) {
+ case FE_TONEAREST:
+ if (rshift > 0 && rshift <= Bits && value.get_bit(rshift - 1)) {
+ // We round up, unless the value is an exact halfway case and
+ // the bit that will end up in the units place is 0, in which
+ // case tie-break-to-even says round down.
+ return value.get_bit(rshift) != 0 || (value << (Bits - rshift + 1)) != 0;
+ } else {
+ return false;
+ }
+ case FE_TOWARDZERO:
+ return false;
+ case FE_DOWNWARD:
+ return logical_sign.is_neg() && (value << (Bits - rshift)) != 0;
+ case FE_UPWARD:
+ return logical_sign.is_pos() && (value << (Bits - rshift)) != 0;
+ default:
+ __builtin_unreachable();
+ }
+}
+
// A generic class to perform computations of high precision floating points.
// We store the value in dyadic format, including 3 fields:
// sign : boolean value - false means positive, true means negative
@@ -101,6 +135,27 @@ template <size_t Bits> struct DyadicFloat {
return exponent + (Bits - 1);
}
+ // Produce a correctly rounded DyadicFloat from a too-large mantissa,
+ // by shifting it down and rounding if necessary.
+ template <size_t MantissaBits>
+ LIBC_INLINE constexpr static DyadicFloat<Bits>
+ round(Sign result_sign, int result_exponent,
+ const LIBC_NAMESPACE::UInt<MantissaBits> &input_mantissa,
+ size_t rshift) {
+ MantissaType result_mantissa(input_mantissa >> rshift);
+ if (need_to_round_up(input_mantissa, rshift, result_sign)) {
+ ++result_mantissa;
+ if (result_mantissa == 0) {
+ // Rounding up made the mantissa integer wrap round to 0,
+ // carrying a bit off the top. So we've rounded up to the next
+ // exponent.
+ result_mantissa.set_bit(Bits - 1);
+ ++result_exponent;
+ }
+ }
+ return DyadicFloat(result_sign, result_exponent, result_mantissa);
+ }
+
#ifdef LIBC_TYPES_HAS_FLOAT16
template <typename T, bool ShouldSignalExceptions>
LIBC_INLINE constexpr cpp::enable_if_t<
@@ -374,6 +429,34 @@ template <size_t Bits> struct DyadicFloat {
return new_mant;
}
+
+ LIBC_INLINE constexpr MantissaType
+ as_mantissa_type_rounded(bool *overflowed = nullptr) const {
+ if (mantissa.is_zero())
+ return 0;
+
+ MantissaType new_mant = mantissa;
+ if (exponent > 0) {
+ new_mant <<= exponent;
+ if (overflowed)
+ *overflowed = (new_mant >> exponent) != mantissa;
+ } else if (exponent < 0) {
+ size_t shift = -exponent;
+ new_mant >>= shift;
+ if (need_to_round_up(mantissa, shift, sign))
+ ++new_mant;
+ }
+
+ if (sign.is_neg()) {
+ new_mant = (~new_mant) + 1;
+ }
+
+ return new_mant;
+ }
+
+ LIBC_INLINE constexpr DyadicFloat operator-() const {
+ return DyadicFloat(sign.negate(), exponent, mantissa);
+ }
};
// Quick add - Add 2 dyadic floats with rounding toward 0 and then normalize the
@@ -433,6 +516,12 @@ LIBC_INLINE constexpr DyadicFloat<Bits> quick_add(DyadicFloat<Bits> a,
return result.normalize();
}
+template <size_t Bits>
+LIBC_INLINE constexpr DyadicFloat<Bits> quick_sub(DyadicFloat<Bits> a,
+ DyadicFloat<Bits> b) {
+ return quick_add(a, -b);
+}
+
// Quick Mul - Slightly less accurate but efficient multiplication of 2 dyadic
// floats with rounding toward 0 and then normalize the output:
// result.exponent = a.exponent + b.exponent + Bits,
@@ -464,6 +553,96 @@ LIBC_INLINE constexpr DyadicFloat<Bits> quick_mul(const DyadicFloat<Bits> &a,
return result;
}
+// Correctly rounded multiplication of 2 dyadic floats, assuming the
+// exponent remains within range.
+template <size_t Bits>
+LIBC_INLINE constexpr DyadicFloat<Bits>
+rounded_mul(const DyadicFloat<Bits> &a, const DyadicFloat<Bits> &b) {
+ using DblMant = LIBC_NAMESPACE::UInt<(2 * Bits)>;
+ Sign result_sign = (a.sign != b.sign) ? Sign::NEG : Sign::POS;
+ int result_exponent = a.exponent + b.exponent + static_cast<int>(Bits);
+ auto product = DblMant(a.mantissa) * DblMant(b.mantissa);
+ // As in quick_mul(), renormalize by 1 bit manually rather than countl_zero
+ if (product.get_bit(2 * Bits - 1) == 0) {
+ product <<= 1;
+ result_exponent -= 1;
+ }
+
+ return DyadicFloat<Bits>::round(result_sign, result_exponent, product, Bits);
+}
+
+// Approximate reciprocal - given a nonzero a, make a good approximation to 1/a.
+// The method is Newton-Raphson iteration, based on quick_mul.
+template <size_t Bits, typename = cpp::enable_if_t<(Bits >= 32)>>
+LIBC_INLINE constexpr DyadicFloat<Bits>
+approx_reciprocal(const DyadicFloat<Bits> &a) {
+ // Given an approximation x to 1/a, a better one is x' = x(2-ax).
+ //
+ // You can derive this by using the Newton-Raphson formula with the function
+ // f(x) = 1/x - a. But another way to see that it works is to say: suppose
+ // that ax = 1-e for some small error e. Then ax' = ax(2-ax) = (1-e)(1+e) =
+ // 1-e^2. So the error in x' is the square of the error in x, i.e. the number
+ // of correct bits in x' is double the number in x.
+
+ // An initial approximation to the reciprocal
+ DyadicFloat<Bits> x(Sign::POS, -32 - a.exponent - Bits,
+ uint64_t(0xFFFFFFFFFFFFFFFF) /
+ static_cast<uint64_t>(a.mantissa >> (Bits - 32)));
+
+ // The constant 2, which we'll need in every iteration
+ DyadicFloat<Bits> two(Sign::POS, 1, 1);
+
+ // We expect at least 31 correct bits from our 32-bit starting approximation
+ size_t ok_bits = 31;
+
+ // The number of good bits doubles in each iteration, except that rounding
+ // errors introduce a little extra each time. Subtract a bit from our
+ // accuracy assessment to account for that.
+ while (ok_bits < Bits) {
+ x = quick_mul(x, quick_sub(two, quick_mul(a, x)));
+ ok_bits = 2 * ok_bits - 1;
+ }
+
+ return x;
+}
+
+// Correctly rounded division of 2 dyadic floats, assuming the
+// exponent remains within range.
+template <size_t Bits>
+LIBC_INLINE constexpr DyadicFloat<Bits>
+rounded_div(const DyadicFloat<Bits> &af, const DyadicFloat<Bits> &bf) {
+ using DblMant = LIBC_NAMESPACE::UInt<(Bits * 2 + 64)>;
+
+ // Make an approximation to the quotient as a * (1/b). Both the
+ // multiplication and the reciprocal are a bit sloppy, which doesn't
+ // matter, because we're going to correct for that below.
+ auto qf = fputil::quick_mul(af, fputil::approx_reciprocal(bf));
+
+ // Switch to BigInt and stop using quick_add and quick_mul: now
+ // we're working in exact integers so as to get the true remainder.
+ DblMant a = af.mantissa, b = bf.mantissa, q = qf.mantissa;
+ q <<= 2; // leave room for a round bit, even if exponent decreases
+ a <<= af.exponent - bf.exponent - qf.exponent + 2;
+ DblMant qb = q * b;
+ if (qb < a) {
+ DblMant too_small = a - b;
+ while (qb <= too_small) {
+ qb += b;
+ ++q;
+ }
+ } else {
+ while (qb > a) {
+ qb -= b;
+ --q;
+ }
+ }
+
+ DyadicFloat<(Bits * 2)> qbig(qf.sign, qf.exponent - 2, q);
+ auto qfinal = DyadicFloat<Bits>::round(qbig.sign, qbig.exponent + Bits,
+ qbig.mantissa, Bits);
+ return qfinal;
+}
+
// Simple polynomial approximation.
template <size_t Bits>
LIBC_INLINE constexpr DyadicFloat<Bits>
diff --git a/libc/src/__support/big_int.h b/libc/src/__support/big_int.h
index a95ab4ff8e1abf..7cf5462f567aaa 100644
--- a/libc/src/__support/big_int.h
+++ b/libc/src/__support/big_int.h
@@ -936,6 +936,18 @@ struct BigInt {
// Return the i-th word of the number.
LIBC_INLINE constexpr WordType &operator[](size_t i) { return val[i]; }
+ // Return the i-th bit of the number.
+ LIBC_INLINE constexpr bool get_bit(size_t i) const {
+ const size_t word_index = i / WORD_SIZE;
+ return 1 & (val[word_index] >> (i % WORD_SIZE));
+ }
+
+ // Set the i-th bit of the number.
+ LIBC_INLINE constexpr void set_bit(size_t i) {
+ const size_t word_index = i / WORD_SIZE;
+ val[word_index] |= WordType(1) << (i % WORD_SIZE);
+ }
+
private:
LIBC_INLINE friend constexpr int cmp(const BigInt &lhs, const BigInt &rhs) {
constexpr auto compare = [](WordType a, WordType b) {
@@ -989,12 +1001,6 @@ struct BigInt {
LIBC_INLINE constexpr void clear_msb() {
val.back() &= mask_trailing_ones<WordType, WORD_SIZE - 1>();
}
-
- LIBC_INLINE constexpr void set_bit(size_t i) {
- const size_t word_index = i / WORD_SIZE;
- val[word_index] |= WordType(1) << (i % WORD_SIZE);
- }
-
  LIBC_INLINE constexpr static Division divide_unsigned(const BigInt &dividend,
                                                        const BigInt &divider) {
BigInt remainder = dividend;
diff --git a/libc/src/__support/sign.h b/libc/src/__support/sign.h
index 4a629e44881956..e0de0e0798acb0 100644
--- a/libc/src/__support/sign.h
+++ b/libc/src/__support/sign.h
@@ -29,6 +29,8 @@ struct Sign {
static const Sign POS;
static const Sign NEG;
+ LIBC_INLINE constexpr Sign negate() const { return Sign(!is_negative); }
+
private:
LIBC_INLINE constexpr explicit Sign(bool is_negative)
: is_negative(is_negative) {}
diff --git a/libc/src/stdio/printf_core/CMakeLists.txt b/libc/src/stdio/printf_core/CMakeLists.txt
index 9eaffe2f7ed621..ea58067c7070a6 100644
--- a/libc/src/stdio/printf_core/CMakeLists.txt
+++ b/libc/src/stdio/printf_core/CMakeLists.txt
@@ -16,6 +16,9 @@ endif()
if(LIBC_CONF_PRINTF_FLOAT_TO_STR_NO_SPECIALIZE_LD)
list(APPEND printf_config_copts "-DLIBC_COPT_FLOAT_TO_STR_NO_SPECIALIZE_LD")
endif()
+if(LIBC_CONF_PRINTF_FLOAT_TO_STR_USE_FLOAT320)
+ list(APPEND printf_config_copts "-DLIBC_COPT_FLOAT_TO_STR_USE_FLOAT320")
+endif()
if(LIBC_CONF_PRINTF_DISABLE_FIXED_POINT)
list(APPEND printf_config_copts "-DLIBC_COPT_PRINTF_DISABLE_FIXED_POINT")
endif()
diff --git a/libc/src/stdio/printf_core/converter_atlas.h b/libc/src/stdio/printf_core/converter_atlas.h
index 18cfe1e717cbea..dfb91b30e80f82 100644
--- a/libc/src/stdio/printf_core/converter_atlas.h
+++ b/libc/src/stdio/printf_core/converter_atlas.h
@@ -26,7 +26,11 @@
// defines convert_float_decimal
// defines convert_float_dec_exp
// defines convert_float_dec_auto
+#ifdef LIBC_COPT_FLOAT_TO_STR_USE_FLOAT320
+#include "src/stdio/printf_core/float_dec_converter_limited.h"
+#else
#include "src/stdio/printf_core/float_dec_converter.h"
+#endif
// defines convert_float_hex_exp
#include "src/stdio/printf_core/float_hex_converter.h"
#endif // LIBC_COPT_PRINTF_DISABLE_FLOAT
diff --git a/libc/src/stdio/printf_core/float_dec_converter_limited.h b/libc/src/stdio/printf_core/float_dec_converter_limited.h
new file mode 100644
index 00000000000000..d3e5ab13cc9490
--- /dev/null
+++ b/libc/src/stdio/printf_core/float_dec_converter_limited.h
@@ -0,0 +1,593 @@
+//===-- Decimal Float Converter for printf (320-bit float) ------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file implements an alternative to the Ryū printf algorithm in
+// float_dec_converter.h. Instead of generating output digits 9 at a time on
+// demand, in this implementation, a float is converted to decimal by computing
+// just one power of 10 and multiplying/dividing the entire input by it,
+// generating the whole string of decimal output digits in one go.
+//
+// This avoids the large constant lookup table of Ryū, making it more suitable
+// for low-memory embedded contexts; but it's also faster than the fallback
+// version of Ryū which computes table entries on demand using DyadicFloat,
+// because those must calculate a potentially large power of 10 per 9-digit
+// output block, whereas this computes just one, which does the whole job.
+//
+// (OK, not quite. It computes one power of 10 per _attempt_. It must start by
+// guessing the decimal exponent of the output, and if it guesses too high or
+// too low then it adjusts the guess and tries again from scratch.)
+//
+// The calculation is done in 320-bit DyadicFloat, which provides enough
+// precision to generate 39 correct digits of output from any floating-point
+// size up to and including 128-bit long double, because the rounding errors in
+// computing the largest necessary power of 10 are still smaller than the
+// distance (in the 320-bit float format) between adjacent 39-decimal-digit
+// outputs.
+//
+// No further digits beyond the 39th are generated: if the printf format string
+// asks for more precision than that, the answer is padded with 0s. This is a
+// permitted option in IEEE 754-2019 (section 5.12.2): you're allowed to define
+// a limit H on the number of decimal digits you can generate, and pad with 0s
+// if asked for more than that, subject to the constraint that H must be
+// consistent across all float formats you support (you can't use a smaller H
+// for single precision than double or long double), and must be large enough
+// that even in the largest supported precision the only numbers misrounded are
+// ones extremely close to a rounding boundary. 39 digits is the smallest
+// permitted value for an implementation supporting binary128.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_LIBC_SRC_STDIO_PRINTF_CORE_FLOAT_DEC_CONVERTER_LIMITED_H
+#define LLVM_LIBC_SRC_STDIO_PRINTF_CORE_FLOAT_DEC_CONVERTER_LIMITED_H
+
+#include "src/__support/CPP/algorithm.h"
+#include "src/__support/CPP/string.h"
+#include "src/__support/CPP/string_view.h"
+#include "src/__support/FPUtil/FPBits.h"
+#include "src/__support/FPUtil/dyadic_float.h"
+#include "src/__support/FPUtil/rounding_mode.h"
+#include "src/__support/integer_to_string.h"
+#include "src/__support/macros/config.h"
+#include "src/stdio/printf_core/core_structs.h"
+#include "src/stdio/printf_core/float_inf_nan_converter.h"
+#include "src/stdio/printf_core/writer.h"
+
+#include <inttypes.h>
+#include <stddef.h>
+#include <string.h>
+
+namespace LIBC_NAMESPACE_DECL {
+namespace printf_core {
+
+enum class ConversionType { E, F, G };
+using StorageType = fputil::FPBits<long double>::StorageType;
+
+constexpr unsigned MAX_DIGITS = 39;
+constexpr size_t DF_BITS = 320;
+constexpr char DECIMAL_POINT = '.';
+
+struct DigitsInput {
+ // Input mantissa, stored with the explicit leading 1 bit (if any) at the
+ // top. So either it has a value in the range [2^127,2^128) representing a
+ // real number in [1,2), or it has the value 0, representing 0.
+ UInt128 mantissa;
+
+ // Input exponent, as a power of 2 to multiply into mantissa.
+ int exponent;
+
+ // Input sign.
+ Sign sign;
+
+ // Constructor which accepts a mantissa direct from a floating-point format,
+ // and shifts it up to the top of the UInt128 so that a function consuming
+ // this struct afterwards doesn't have to remember which format it came from.
+ DigitsInput(int32_t fraction_len, StorageType mantissa_, int exponent_,
+ Sign sign)
+ : mantissa(UInt128(mantissa_) << (127 - fraction_len)),
+ exponent(exponent_), sign(sign) {
+ if (!(mantissa & (UInt128(1) << 127)) && mantissa != 0) {
+ // Normalize a denormalized input.
+ int shift = cpp::countl_zero(mantissa);
+ mantissa <<= shift;
+ exponent -= shift;
+ }
+ }
+};
+
+struct DigitsOutput {
+ // Output from decimal_digits().
+ //
+ // `digits` is a buffer containing nothing but ASCII digits. Even if the
+ // decimal point needs to appear somewhere in the final output string, it
+ // isn't represented in _this_ string; the client of this object will insert
+ // it in an appropriate place. `ndigits` gives the buffer size.
+ //
+ // `exponent` represents the exponent you would display if the decimal point
+ // comes after the first digit of decimal_digits, e.g. if digits == "1234"
+ // and exponent = 3 then this represents 1.234e3, or just the integer 1234.
+ size_t ndigits;
+ int exponent;
+ char digits[MAX_DIGITS + 1];
+};
+
+// Calculate the actual digits of a decimal representation of an FP number.
+//
+// If `e_mode` is true, then `precision` indicates the desired number of output
+// decimal digits. On return, `decimal_digits` will be a string of length
+// exactly `precision` starting with a nonzero digit; `decimal_exponent` will
+// be filled in to indicate the exponent as shown above.
+//
+// If `e_mode` is false, then `precision` indicates the desired number of
+// digits after the decimal point. On return, the last digit in the string
+// `decimal_digits` has a place value of _at least_ 10^-precision. But also, at
+// most `MAX_DIGITS` digits are returned, so the caller may need to pad it at
+// the end with the appropriate number of extra 0s.
+LIBC_INLINE
+DigitsOutput decimal_digits(DigitsInput input, int precision, bool e_mode) {
+ if (input.mantissa == 0) {
+ // Special-case zero, by manually generating the right number of zero
+ // digits and setting an appropriate exponent.
+ DigitsOutput output;
+ if (!e_mode) {
+ // In F mode, it's enough to return an empty string of digits. That's the
+ // same thing we do when given a non...
[truncated]
This PR has a performance dependency on #123580. I haven't listed a formal dependency between the two, because in correctness terms they're independent. But the claim in my commit message here that the new USE_FLOAT320 algorithm is faster than USE_DYADIC_FLOAT is only true once IntegerToString is optimized. So I don't intend to land this until after #123580.
This patch looks very good, I have some comments about the details of the algorithm but I like the direction you're going.
// need to iterate.
fputil::DyadicFloat<DF_BITS> ten(Sign::POS, 1, 5);

while (true) {
I'm pretty sure there's a way to calculate this without the while loop. Specifically, I think that if we cap the number of requested digits (`log10_input - log10_low_digit`) to be at most `min(MAX_DIGITS, length of number)`, you'll be within one step of the true result.
I think there are some other tweaks as well. I'll leave those on the specific spots.
I think you're probably right that we can guarantee to use only one iteration of the main algorithm, but I think it's slightly more complicated than you say, because of the need to round correctly.
For some binary exponents, there's only one choice of power of 10, because everything from 2^n to 2^(n+1) has the same number of decimal digits. Other binary exponents straddle a power-of-10 boundary. For the latter, I think what you'd have to do is to convert based on the smaller exponent, so that the decimal string is either the right length or one digit too long. But if it's one digit too long, you must make sure to repeat the rounding from the full version and not just discard the final decimal digit and use it to re-round what's left. (Or, equivalently, remember whether the rounded result was smaller or larger.)
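For concreteness, here's the double-rounding hazard being described, shown with plain integers and round-half-up — an illustrative example, not code from this patch:

```cpp
#include <cstdio>

// Round x to the nearest multiple of `unit`, halfway cases rounding up.
static long round_to(long x, long unit) { return ((x + unit / 2) / unit) * unit; }

int main() {
  // Rounding 2546 directly to the nearest 100 gives 2500, but rounding to the
  // nearest 10 first (2550) and then re-rounding that shortened result gives
  // 2600 -- the same trap as re-rounding an already-rounded digit string.
  std::printf("%ld\n", round_to(2546, 100));               // 2500
  std::printf("%ld\n", round_to(round_to(2546, 10), 100)); // 2600
}
```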
That's a good point about needing to be careful about the rounding; I wasn't thinking about that at the time. The main thing I'd noticed was that one of the test cases (line 1567 in `sprintf_test.cpp`) seemed to be taking an inefficient path: since the precision was 500 it started with an exponent of -500, which is well out of the realm of possibility. I realize it skips to a precision of `MAX_DIGITS` on the next iteration, but it still seems like a lot more steps than necessary.
Yes, I see what you mean – that's a case where the initial run tries to put the low-order digit where `%f` mode expects it, but as you say, it's ludicrously out of range in a way that we could have recognised just from bounds on the exponent, without spending a whole loop iteration on it!
// In F mode, if the number would generate too many digits to fit, then
// reset to E mode, so that we generate the most significant digits of
// the number, and rely on our caller to append zeroes at the end.
precision = MAX_DIGITS;
e_mode = true;
You can check if an `f` conversion needs to be done in `e` mode before the loop, because that will happen when the number of digits generated is more than `MAX_DIGITS`, and the formula for the max number of digits in the number is `digits = exponent >= 0 ? log10_input + max(0, fract_width - exp) : log10_input - exponent`.
I don't think that formula can be quite right, because of the thing I mention in the comment above, where some binary exponents can go with two possible decimal exponents depending on the mantissa, because a power of 10 lies in their interval. So any test that doesn't even look at the mantissa must be potentially inaccurate.
But I'm sure you're right that most cases of "too many output digits, go to e mode" can be detected before the conversion to decimal digits, and we can narrow it down so that the remaining ones only generate one digit too many, which can be dealt with by re-rounding the same float320 output.
I think you're right about there being possibly one more digit, and I forgot to add `fract_width` to the second term, but I think if you add 1 you'd get a fairly tight upper bound on the number of digits. It's a similar formula to what is used in `float_to_string` for `length_for_num` and `zero_blocks_after_point`.
if (view.size() > size_t(precision)) { /* too big */
  log10_input++;
  continue;
In this case the output buffer contains all the correct digits, it just also has some extras. It's possible to avoid a complete recalculation by decrementing `output.ndigits` and incrementing `output.exponent` the correct number of times before returning.
As I mention in the comment above, I think this needs more care over re-rounding than you mention, but fair enough.
for (int pos = start; pos < limit; ++pos) {
  if (pos >= 0 && pos < int(output.ndigits)) {
    // Fetch a digit from the buffer
    RET_IF_RESULT_NEGATIVE(writer->write(output.digits[pos], 1));
the `write` function has an overload that just takes a char without a length, so anywhere you're doing `writer->write(char, 1)` can just be `writer->write(char)`
template <typename T, cpp::enable_if_t<cpp::is_floating_point_v<T>, int> = 0>
LIBC_INLINE int convert_float_decimal_typed(Writer *writer,
                                            const FormatSection &to_conv,
                                            fputil::FPBits<T> float_bits) {
  return convert_float_typed<T>(writer, to_conv, float_bits, ConversionType::F);
}

template <typename T, cpp::enable_if_t<cpp::is_floating_point_v<T>, int> = 0>
LIBC_INLINE int convert_float_dec_exp_typed(Writer *writer,
                                            const FormatSection &to_conv,
                                            fputil::FPBits<T> float_bits) {
  return convert_float_typed<T>(writer, to_conv, float_bits, ConversionType::E);
}

template <typename T, cpp::enable_if_t<cpp::is_floating_point_v<T>, int> = 0>
LIBC_INLINE int convert_float_dec_auto_typed(Writer *writer,
                                            const FormatSection &to_conv,
                                            fputil::FPBits<T> float_bits) {
  return convert_float_typed<T>(writer, to_conv, float_bits, ConversionType::G);
}
these are unused
I don't think they are – `libc/src/stdlib/str_from_util.h` refers to them? Surely if the usual `float_dec_converter.h` provides them for that purpose, my replacement must provide them too.
ah, I think I understand what happened. I tried deleting them and it still built because the strfrom code isn't getting the compile flags applied. For those to be affected you'd need to update the cmake in `src/stdlib`.
#include <inttypes.h>
#include <stddef.h>
#include <string.h>
we try to avoid including public libc headers internally. For right now I'd recommend just using the `__builtin_mem***` variants since that's simplest.
Thanks for the comments! I've pushed a fixup patch dealing with the two easy ones (avoid including public headers, use the single-character `write` overload).
Compared to the previous version:

The helper function `need_to_round_up` is replaced with `rounding_direction`, which has three rather than two return values: it can distinguish 'round down' from 'already exact, no rounding at all'. The method `DyadicFloat::as_mantissa_type_rounded` no longer returns an overflow indication, because we now avoid overflowing in the first place by early detection of wildly out-of-range exponents. But it does return a rounding direction indicating whether the mantissa was rounded up, rounded down, or was exact. I had to fix `BigInt::decrement()`, which turned out to be accidentally an exact copy of `increment()`.

Then `decimal_digits` is rewritten essentially as suggested in the review comments:

- there's no loop
- the initial estimated exponent is used to detect the need to shift from F to E mode in advance, rather than spotting it after the fact by overflow
- so now the initial decimal exponent is always either correct or one too small
- and if it's one too small, we handle it by truncating a digit off the output decimal mantissa, re-rounding carefully to avoid a double-rounding error.
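A minimal sketch of what a three-way rounding helper like that can look like — invented names and a plain `uint64_t` operand, not the actual `rounding_direction` in this patch, which works on `UInt<Bits>` and honours the current FP rounding mode:

```cpp
#include <cstdint>
#include <cstdio>

// Classify how truncating the low `rshift` bits of `value` relates to the
// exact value under round-to-nearest-even: -1 = keep the truncation (round
// down), 0 = already exact, +1 = add one to the truncated result (round up).
// Sketch assumes 0 < rshift < 64.
static int rounding_direction_sketch(uint64_t value, unsigned rshift) {
  uint64_t low = value & ((uint64_t(1) << rshift) - 1);
  if (low == 0)
    return 0;                               // nothing was discarded: exact
  uint64_t half = uint64_t(1) << (rshift - 1);
  if (low != half)
    return low > half ? +1 : -1;            // clearly above or below halfway
  return ((value >> rshift) & 1) ? +1 : -1; // exact tie: round to even
}

int main() {
  std::printf("%d %d %d\n",
              rounding_direction_sketch(0b1000, 3),  //  0: exact
              rounding_direction_sketch(0b1011, 3),  // -1: below halfway
              rounding_direction_sketch(0b1100, 3)); // +1: tie, unit bit is odd
}
```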
✅ With the latest revision this PR passed the C/C++ code formatter.
@michaelrj-google , I think this is ready for another review. It took me a few days to get the revised algorithm re-rounding correctly in all modes and passing all the tests, but I think it's there now, and I agree it's an improvement on the old algorithm, because it now reliably calculates just one power of 10, which is always good enough to get the right output with only a minor postprocessing cost.
The algorithm looks good to me, I've left some comments but they're mostly code style related. Thank you for figuring out how to get rid of the loop, I think the result is much more elegant.
int log10_input_min = ((input.exponent - 1) * 1292913986LL) >> 32;
int log10_input_max = (input.exponent * 1292913986LL) >> 32;
nit: move the log10 approximation into a helper function
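For reference, the constant in the quoted code is `round(log10(2) * 2^32)`, so a helper along these lines would do it (hypothetical name, not from the patch):

```cpp
#include <cstdint>
#include <cstdio>

// Approximate floor(e * log10(2)) in 32.32 fixed point; 1292913986 is
// round(log10(2) * 2^32). Relies on arithmetic right shift for negative e,
// as the quoted code does.
constexpr int estimate_log10_pow2(int e) {
  return static_cast<int>((static_cast<int64_t>(e) * 1292913986LL) >> 32);
}

int main() {
  std::printf("%d %d\n",
              estimate_log10_pow2(127),    // 127 * log10(2)  ~  38.2 ->  38
              estimate_log10_pow2(-126));  // -126 * log10(2) ~ -37.9 -> -38
}
```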
int log10_input_max = (input.exponent * 1292913986LL) >> 32;

// Make a DyadicFloat containing the value 10, to use as the base for
// exponentation inside the following loop, potentially more than once if we
nit: the loop was removed, update this comment
fputil::DyadicFloat<DF_BITS> flt_mantissa(input.sign, input.exponent - 127,
                                          input.mantissa);
I think the number to subtract from `input.exponent` should be `fract_len - 1` instead of a constant `127`.
No, 127 really is right, because by this time we've normalised away the input FP format, and reduced it to my `DigitsInput` struct, which always puts the mantissa at the top of a `UInt128`. I'll leave the code as it is but add a comment clarifying that.
in that case it might make sense to just subtract `<bits in mantissa> - 1` for clarity.
// X.YZe+NN and instead got WX.YZe+NN. So when we shorten the digit string
// by one, we'll also need to increment the output exponent.
if (output.ndigits > size_t(precision)) {
  assert(output.ndigits == size_t(precision) + 1);
nit: here and elsewhere, replace `assert` with `LIBC_ASSERT`. You'll also need to add `#include "src/__support/libc_assert.h"`
int round_digit = output.digits[--output.ndigits] - '0';
int new_low_digit =
    output.ndigits == 0 ? 0 : output.digits[output.ndigits - 1] - '0';
We're trying to make sure that our code is character-encoding independent. To get the integer value for a character, instead of using `c - '0'` use `b36_char_to_int(c)`.

Also, I personally prefer to split decrement operations into a separate line for ease of reading, but that's more of a nit.
Wow, that's surely beyond the call of duty? The C standard has always guaranteed that the `char` values of the decimal digits are consecutive. Not the alphabet, because EBCDIC was still a thing when C was getting started, but even EBCDIC puts 0,1,...,9 in sensible places.
LIBC_NAMESPACE::UInt<64> round_word = (new_low_digit * 256) +
                                      ((round_digit * 0x19a) >> 4) +
I'm not sure I understand what `(round_digit * 0x19a) >> 4` represents, could you clarify, possibly by defining it as another variable?
// at a time. (A bit painful, but better than going back to the integer
// we made it from and doing the decimal conversion all over again.)
for (size_t i = output.ndigits; i-- > 0;) {
  if (output.digits[i]++ != '9')
nit: same as above, I personally prefer to keep decrements/increments separate.
For character encoding independence this loop is fine, since it's only checking equality.
I'm confused – surely here I'm depending on exactly the same assumption about character encoding that I did in the earlier `char_value - '0'` expression? Namely, that the decimal digit characters appear consecutively, so that I can increment one to the next digit by writing `output.digits[i]++`.

Neither piece of code depends on the specific ASCII encodings of the decimal digits. But both depend on them being consecutive. So either you've misunderstood, or I have: why is one of those OK and the other not?
you're right, I did misunderstand. This should be `output.digits[i] = int_to_b36_char(b36_char_to_int(output.digits[i]) + 1);`
                  radix::Dec::WithWidth<2>::WithSign>
    expcvt{output.exponent};
cpp::string_view expview = expcvt.view();
expbuf[0] = (to_conv.conv_name & 32) | 'E';
For character encoding independence, use `isupper` and `toupper` here.
// We round up, unless the value is an exact halfway case and
// the bit that will end up in the units place is 0, in which
// case tie-break-to-even says round down.
return value.get_bit(rshift) != 0 || (value << (Bits - rshift + 1)) != 0
It looks like `value.get_bit(rshift)` will overflow if `rshift == Bits`?
case FE_TOWARDZERO:
  return -1;
case FE_DOWNWARD:
  return logical_sign.is_neg() && (value << (Bits - rshift)) != 0 ? +1 : -1;
`Bits - rshift` could be negative here?
case FE_DOWNWARD:
  return logical_sign.is_neg() && (value << (Bits - rshift)) != 0 ? +1 : -1;
case FE_UPWARD:
  return logical_sign.is_pos() && (value << (Bits - rshift)) != 0 ? +1 : -1;
ditto
}

DyadicFloat<(Bits * 2)> qbig(qf.sign, qf.exponent - 2, q);
auto qfinal = DyadicFloat<Bits>::round(qbig.sign, qbig.exponent + Bits,
nit: just return DyadicFloat<Bits>::round...
Thanks @statham-arm ! It looks good to me with just a few nits.
LGTM, thanks for writing this patch!
Thank you both for approving! Would you please also look at #123580, which this FP printf strategy depends on? It works correctly with or without that patch, but it's a huge slowdown without it, because this patch depends on `IntegerToString` for generating its decimal digits, and the speed claim only holds once that optimization lands.