struct (un)packing of half-precision `nan` floats is non-invertible #130317

seh-dev · 2025-02-19T19:22:31Z

Bug report

Bug description:

I noticed that chaining struct.unpack() and struct.pack() for IEEE 754 Half Precision floats (e) is non-invertible for nan. E.g.:

import struct

original_bytes = b'\xff\xff'

unpacked_float = struct.unpack('e', original_bytes)[0]  # nan
repacked_bytes = struct.pack('e', unpacked_float)  # b'\x00\xfe'  != b'\xff\xff'

IEEE nans aren't unique, so this isn't that surprising. However, I found it curious that the float (f) and double (d) formats do not exhibit the same behavior: every original bit pattern I tested could be recovered from the unpacked nan object.

Is this by design?
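To compare the three formats side by side, a small helper along these lines can be used. It is only a sketch: the name `roundtrips` and the sample patterns are illustrative and not taken from the report, and the result for `e` depends on the interpreter version, so no output is shown.

```python
import struct


def roundtrips(fmt: str, raw: bytes) -> bool:
    """Return True if unpacking and then repacking `raw` reproduces it exactly."""
    (value,) = struct.unpack(fmt, raw)
    return struct.pack(fmt, value) == raw


# Quiet NaNs with a non-zero payload, big-endian, one per format.
# (Sample values chosen for illustration only.)
samples = {
    ">e": bytes.fromhex("7e01"),
    ">f": bytes.fromhex("7fc00001"),
    ">d": bytes.fromhex("7ff8000000000001"),
}

if __name__ == "__main__":
    for fmt, raw in samples.items():
        print(fmt, raw.hex(), roundtrips(fmt, raw))
```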

Here's a quick pytest script that tests a broad range of nan/inf/-inf bit patterns for each encoding format.

# /// script
# requires-python = ">=3.11"
# dependencies = ["pytest"]
# ///
import struct
import pytest


# Floating-point encodings based on IEEE 754, per https://en.wikipedia.org/wiki/IEEE_754#Basic_and_interchange_formats
# (significand widths count the stored fraction bits, excluding the implicit leading bit)
# binary16 (half precision) - 1 bit sign, 5 bit exponent, 10 bit significand
# binary32 (single precision) - 1 bit sign, 8 bit exponent, 23 bit significand
# binary64 (double precision) - 1 bit sign, 11 bit exponent, 52 bit significand


MAX_TEST_CASES = 100000  # limit number of bit patterns being sampled so we aren't waiting too long


@pytest.mark.parametrize(["precision_format", "precision", "exponent_bits"], [("f", 32, 8), ("d", 64, 11), ("e", 16, 5)])
@pytest.mark.parametrize("sign_bit", [0, 1])
@pytest.mark.parametrize("endianness", ["little", "big"])
def test_struct_floats(precision_format: str, precision: int, exponent_bits: int, sign_bit: int, endianness: str):
    significand_bits = precision - exponent_bits - 1

    n_tests = min(MAX_TEST_CASES, 2**significand_bits)

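    # Zero-pad every sampled pattern to the full field width: bin() drops leading
    # zeros, which would shift the sign and exponent bits out of position.
    significand_patterns = [significand_bits * "0", significand_bits * "1"] + [
        f"{i:0{significand_bits}b}" for i in range(1, 2**significand_bits, 2**significand_bits // n_tests)
    ]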

    for i in range(n_tests):
        binary = str(sign_bit) + "1" * exponent_bits + significand_patterns[i]
        if endianness == "big":
            format = ">" + precision_format
        elif endianness == "little":
            format = "<" + precision_format
        else:
            raise NotImplementedError()

        test_bytes = int(binary, base=2).to_bytes(precision // 8, endianness)

        unpacked = struct.unpack(format, test_bytes)
        assert len(unpacked) == 1

        repacked = struct.pack(format, unpacked[0])

        assert (
            repacked == test_bytes
        ), f"struct pack/unpack was not invertible for format {format} with raw value: {test_bytes} -> unpacks to {unpacked[0]}, repacks to {repacked}"

if __name__ == "__main__":
    pytest.main([__file__])

CPython versions tested on:

3.13, 3.11, 3.12

Operating systems tested on:

Linux, Windows


skirpichev · 2025-02-20T03:57:03Z

It seems you are on an IEEE 754 platform; otherwise, unpacking special values would fail for the float and double formats. On such a platform, the pack/unpack functions for those formats work by copying bits.

PyFloat_Pack2() and PyFloat_Unpack2() do not, however. E.g. the latter ignores the payload of the nan value entirely and maps the data to one of two quiet nans, chosen by sign:

cpython/Objects/floatobject.c

Lines 2402 to 2405 in 12e1d30

    
        else {
            /* NaN */
            return sign ? -fabs(Py_NAN) : fabs(Py_NAN);
        }
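One way to observe this from Python is to look at the binary64 bit pattern of the float that comes out of unpacking. The helper below is a minimal sketch (the name `half_to_double_bits` is made up for illustration) and assumes Python floats are IEEE 754 binary64. The exact fraction bits printed depend on the CPython version and platform, so none are listed.

```python
import struct


def half_to_double_bits(raw: bytes) -> int:
    """Unpack big-endian binary16 bytes and return the resulting
    Python float's binary64 bit pattern as an integer."""
    (value,) = struct.unpack(">e", raw)
    (bits,) = struct.unpack(">Q", struct.pack(">d", value))
    return bits


if __name__ == "__main__":
    # Two half-precision NaNs that differ only in their fraction bits.
    for raw in (bytes.fromhex("7e00"), bytes.fromhex("7fff")):
        print(raw.hex(), "->", f"{half_to_double_bits(raw):016x}")
```

If both lines print the same 64-bit value, the fraction bits of the half-precision input were discarded during unpacking.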

PyFloat_Pack2() likewise ignores the payload of a double nan:

cpython/Objects/floatobject.c

Lines 2050 to 2059 in 12e1d30

    
    else if (isnan(x)) {
        /* There are 2046 distinct half-precision NaNs (1022 signaling and
           1024 quiet), but there are only two quiet NaNs that don't arise by
           quieting a signaling NaN; we get those by setting the topmost bit
           of the fraction field and clearing all other fraction bits. We
           choose the one with the appropriate sign. */
        sign = (copysign(1.0, x) == -1.0);
        e = 0x1f;
        bits = 512;
    }
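The figures in that comment can be checked by enumerating all 16-bit patterns, and the hard-coded `e = 0x1f; bits = 512` accounts for the bytes seen in the report. The following is an illustrative Python sketch, not CPython code; the helper names are invented here.

```python
def classify_half(h: int) -> str:
    """Classify a 16-bit binary16 pattern as 'qnan', 'snan' or 'other'."""
    exponent = (h >> 10) & 0x1F
    fraction = h & 0x3FF
    if exponent != 0x1F or fraction == 0:
        return "other"  # finite values and infinities
    # The topmost fraction bit distinguishes quiet from signaling NaNs.
    return "qnan" if fraction & 0x200 else "snan"


def canonical_nan(sign: int) -> int:
    """Pattern produced by the quoted branch: e = 0x1f, bits = 512."""
    return (sign << 15) | (0x1F << 10) | 512


kinds = [classify_half(h) for h in range(1 << 16)]
quiet, signaling = kinds.count("qnan"), kinds.count("snan")
print(quiet, signaling, quiet + signaling)  # 1024 1022 2046
print(hex(canonical_nan(0)), hex(canonical_nan(1)))  # 0x7e00 0xfe00
```

Stored little-endian, 0xfe00 is the byte string b'\x00\xfe' from the original example.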

> Is this by design?

Looks like a bug to me.

CC @mdickinson

Edit: assuming doubles are binary64, the following patch fixes your tests:

diff --git a/Objects/floatobject.c b/Objects/floatobject.c
index 3b72a1e7c3..e473fb72fe 100644
--- a/Objects/floatobject.c
+++ b/Objects/floatobject.c
@@ -2048,14 +2048,16 @@ PyFloat_Pack2(double x, char *data, int le)
         bits = 0;
     }
     else if (isnan(x)) {
-        /* There are 2046 distinct half-precision NaNs (1022 signaling and
-           1024 quiet), but there are only two quiet NaNs that don't arise by
-           quieting a signaling NaN; we get those by setting the topmost bit
-           of the fraction field and clearing all other fraction bits. We
-           choose the one with the appropriate sign. */
         sign = (copysign(1.0, x) == -1.0);
         e = 0x1f;
-        bits = 512;
+
+        uint64_t v;
+
+        memcpy(&v, &x, sizeof(v));
+        bits = v & 0x1ff;
+        if (v & 0x800000000000) {
+            bits += 0x200;
+        }
     }
     else {
         sign = (x < 0.0);
@@ -2401,7 +2403,16 @@ PyFloat_Unpack2(const char *data, int le)
         }
         else {
             /* NaN */
-            return sign ? -fabs(Py_NAN) : fabs(Py_NAN);
+            uint64_t v = ((sign? 0xff00000000000000 : 0x7f00000000000000)
+                          + 0xf0000000000000);
+
+            if (f & 0x200) {
+                v += 0x800000000000;
+                f -= 0x200;
+            }
+            v += f;
+            memcpy(&x, &v, sizeof(v));
+            return x;
         }
     }
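The bit mapping in this patch can be modelled with plain integers to check that it is invertible over every half-precision nan. The sketch below mirrors the constants in the diff as posted (assuming binary64 doubles, as the comment does). The function names are hypothetical, and it models only this diff, nothing else in CPython.

```python
def unpack2_nan_bits(h: int) -> int:
    """Model of the patched PyFloat_Unpack2 NaN branch:
    16-bit half pattern -> 64-bit double pattern."""
    sign = h >> 15
    f = h & 0x3FF
    v = (0xff00000000000000 if sign else 0x7f00000000000000) + 0xf0000000000000
    if f & 0x200:
        # Topmost half fraction bit is carried in a dedicated double bit.
        v += 0x800000000000
        f -= 0x200
    return v + f


def pack2_nan_bits(v: int) -> int:
    """Model of the patched PyFloat_Pack2 NaN branch:
    64-bit double pattern -> 16-bit half pattern."""
    sign = v >> 63
    bits = v & 0x1ff
    if v & 0x800000000000:
        bits += 0x200
    return (sign << 15) | (0x1F << 10) | bits


# Every half-precision NaN: exponent all ones, fraction non-zero.
nan_patterns = [h for h in range(1 << 16) if (h >> 10) & 0x1F == 0x1F and h & 0x3FF]
assert all(pack2_nan_bits(unpack2_nan_bits(h)) == h for h in nan_patterns)
print(len(nan_patterns))  # 2046
```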

FYI: #55943. The payload was probably ignored because that patch was adapted from the numpy sources.

skirpichev · 2025-02-20T09:09:12Z

@tim-one, does this look like an issue to you?

seh-dev added the type-bug label (An unexpected behavior, bug, or error) Feb 19, 2025

skirpichev added the extension-modules label (C modules in the Modules dir) Feb 20, 2025

skirpichev self-assigned this Feb 20, 2025
