struct (un)packing of half-precision nan floats is non-invertible #130317

Open
seh-dev opened this issue Feb 19, 2025 · 2 comments
Labels: extension-modules (C modules in the Modules dir), type-bug (An unexpected behavior, bug, or error)

@seh-dev

seh-dev commented Feb 19, 2025

Bug report

Bug description:

I noticed that chaining struct.unpack() and struct.pack() for IEEE 754 half-precision floats (format e) is non-invertible for NaNs. E.g.:

import struct

original_bytes = b'\xff\xff'

unpacked_float = struct.unpack('e', original_bytes)[0]  # nan
repacked_bytes = struct.pack('e', unpacked_float)  # b'\x00\xfe' on a little-endian machine, != b'\xff\xff'

IEEE NaNs aren't unique, so this isn't that surprising. However, I found it curious that the same behavior is not exhibited for the float (f) or double (d) formats, where every original bit pattern I tested could be recovered from the unpacked NaN object.
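A small pure-Python sketch of the binary16 layout may help read the two byte strings above. The helper `decode_half` is hypothetical (not part of `struct`); it assumes little-endian input, matching the byte order in which b'\xff\xff' repacks to b'\x00\xfe', and treats a set top fraction bit as "quiet", which is the usual IEEE 754 convention.

```python
def decode_half(raw: bytes, byteorder: str = "little") -> tuple[int, int, int, str]:
    """Split a 2-byte IEEE 754 binary16 value into (sign, exponent, fraction, kind)."""
    bits = int.from_bytes(raw, byteorder)
    sign = bits >> 15
    exponent = (bits >> 10) & 0x1F    # 5 exponent bits
    fraction = bits & 0x3FF           # 10 stored fraction bits
    if exponent == 0x1F:
        if fraction == 0:
            kind = "inf"
        elif fraction & 0x200:        # top fraction bit set: quiet NaN
            kind = "quiet NaN"
        else:
            kind = "signaling NaN"
    else:
        kind = "finite"
    return sign, exponent, fraction, kind


print(decode_half(b"\xff\xff"))  # (1, 31, 1023, 'quiet NaN')
print(decode_half(b"\x00\xfe"))  # (1, 31, 512, 'quiet NaN')
```

Both values are negative quiet NaNs, so the round trip keeps the sign and the quiet bit but loses the other nine payload bits.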

Is this by design?

Here's a quick pytest script that tests over a broad range of nan/inf/-inf cases for each encoding format.

# /// script
# requires-python = ">=3.11"
# dependencies = ["pytest"]
# ///
import struct
import pytest


# Floating Point Encodings Based on IEEE 754 per https://en.wikipedia.org/wiki/IEEE_754#Basic_and_interchange_formats
# binary 16 (half precision) - 1 bit sign, 5 bit exponent, 10 bit significand (11 with the implicit leading bit)
# binary 32 (single precision) - 1 bit sign, 8 bit exponent, 23 bit significand
# binary 64 (double precision) - 1 bit sign, 11 bit exponent, 52 bit significand


MAX_TEST_CASES = 100000  # limit number of bit patterns being sampled so we aren't waiting too long


@pytest.mark.parametrize(["precision_format", "precision", "exponent_bits"], [("f", 32, 8), ("d", 64, 11), ("e", 16, 5)])
@pytest.mark.parametrize("sign_bit", [0, 1])
@pytest.mark.parametrize("endianness", ["little", "big"])
def test_struct_floats(precision_format: str, precision: int, exponent_bits: int, sign_bit: int, endianness: str):
    significand_bits = precision - exponent_bits - 1

    n_tests = min(MAX_TEST_CASES, 2**significand_bits)

    significand_patterns = [significand_bits * "0", significand_bits * "1"] + [
        bin(i + 1)[2:] for i in range(1, 2**significand_bits, 2**significand_bits // n_tests)
    ]

    for i in range(n_tests):
        binary = str(sign_bit) + "1" * exponent_bits + significand_patterns[i]
        if endianness == "big":
            format = ">" + precision_format
        elif endianness == "little":
            format = "<" + precision_format
        else:
            raise NotImplementedError()

        test_bytes = int(binary, base=2).to_bytes(precision // 8, endianness)

        unpacked = struct.unpack(format, test_bytes)
        assert len(unpacked) == 1

        repacked = struct.pack(format, unpacked[0])

        assert (
            repacked == test_bytes
        ), f"struct pack/unpack was not invertible for format {format} with raw value: {test_bytes} -> unpacks to {unpacked[0]}, repacks to {repacked}"

if __name__ == "__main__":
    pytest.main([__file__])


CPython versions tested on:

3.13, 3.11, 3.12

Operating systems tested on:

Linux, Windows

seh-dev added the type-bug label on Feb 19, 2025
skirpichev added the extension-modules label on Feb 20, 2025
skirpichev self-assigned this on Feb 20, 2025
@skirpichev
Member

skirpichev commented Feb 20, 2025

It seems you are on an IEEE platform; otherwise unpacking special values would fail for the float and double formats. On IEEE platforms, the pack/unpack functions for those formats simply copy the bits.
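On such a platform a NaN payload survives the round trip for the `d` and `f` formats. A minimal check, assuming a typical IEEE 754 platform such as x86-64 or AArch64; the byte patterns are arbitrary quiet NaNs with a non-zero payload, chosen by hand. Signaling NaNs are left out on purpose, since widening a C float to double may quiet them.

```python
import struct

# Hand-picked quiet NaNs with non-zero payloads (hypothetical test values):
# all-ones exponent, top fraction bit set, plus a few low payload bits.
CASES = {
    "<d": (0x7FF8000000000123).to_bytes(8, "little"),
    ">d": (0xFFF8000000000ABC).to_bytes(8, "big"),
    "<f": (0x7FC00123).to_bytes(4, "little"),
    ">f": (0xFFC00ABC).to_bytes(4, "big"),
}

for fmt, raw in CASES.items():
    value = struct.unpack(fmt, raw)[0]   # a Python float holding the NaN
    repacked = struct.pack(fmt, value)   # back to raw bytes
    print(fmt, raw.hex(), repacked.hex(), repacked == raw)
```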

PyFloat_Pack2() and PyFloat_Unpack2() don't work that way. For example, the latter just ignores the whole NaN payload and maps every NaN to one of two quiet NaNs, depending on the sign:

cpython/Objects/floatobject.c, lines 2402 to 2405 in 12e1d30:

        else {
            /* NaN */
            return sign ? -fabs(Py_NAN) : fabs(Py_NAN);
        }
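The effect of that branch can be modelled in a few lines of Python. This is a sketch, not CPython code, and it assumes `Py_NAN` has the common quiet-NaN bit pattern 0x7FF8000000000000. Under that assumption every one of the 2046 binary16 NaNs lands on one of just two doubles.

```python
# Pure-Python model of the NaN branch of PyFloat_Unpack2 shown above.
# Assumption: Py_NAN has the usual quiet-NaN bit pattern 0x7FF8000000000000.
QUIET_NAN_BITS = 0x7FF8000000000000
SIGN_BIT_64 = 1 << 63


def unpack2_nan_model(half_bits: int) -> int:
    """Return the binary64 bit pattern produced for a binary16 NaN."""
    sign = half_bits >> 15
    # Only the sign survives; the 10 fraction bits are never read.
    return QUIET_NAN_BITS | (SIGN_BIT_64 if sign else 0)


# Every binary16 NaN: all-ones exponent (0x7C00) and a non-zero fraction.
half_nans = [s | 0x7C00 | f for s in (0x0000, 0x8000) for f in range(1, 0x400)]
outputs = {unpack2_nan_model(h) for h in half_nans}
print(len(half_nans), len(outputs))  # 2046 2
```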

PyFloat_Pack2() likewise ignores the whole payload of a double NaN:

cpython/Objects/floatobject.c, lines 2050 to 2059 in 12e1d30:

    else if (isnan(x)) {
        /* There are 2046 distinct half-precision NaNs (1022 signaling and
           1024 quiet), but there are only two quiet NaNs that don't arise by
           quieting a signaling NaN; we get those by setting the topmost bit
           of the fraction field and clearing all other fraction bits. We
           choose the one with the appropriate sign. */
        sign = (copysign(1.0, x) == -1.0);
        e = 0x1f;
        bits = 512;
    }
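The counts in that comment can be checked by brute force. The sketch below enumerates the binary16 NaN bit patterns directly, treating the top fraction bit (0x200, i.e. the `512` assigned to `bits`) as the quiet bit:

```python
# A binary16 NaN has an all-ones exponent (0x7C00) and a non-zero 10-bit
# fraction; the top fraction bit (0x200 == 512) is the quiet bit.
nans = [s | 0x7C00 | f for s in (0x0000, 0x8000) for f in range(1, 0x400)]
quiet = [b for b in nans if b & 0x200]
signaling = [b for b in nans if not b & 0x200]
print(len(nans), len(signaling), len(quiet))  # 2046 1022 1024

# The two NaNs PyFloat_Pack2 emits: quiet bit set, all other fraction bits clear.
print([hex(s | 0x7C00 | 512) for s in (0x0000, 0x8000)])  # ['0x7e00', '0xfe00']
```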

Is this by design?

Looks like a bug to me.

CC @mdickinson

Edit: assuming doubles are binary64, the following patch fixes your tests:

diff --git a/Objects/floatobject.c b/Objects/floatobject.c
index 3b72a1e7c3..e473fb72fe 100644
--- a/Objects/floatobject.c
+++ b/Objects/floatobject.c
@@ -2048,14 +2048,16 @@ PyFloat_Pack2(double x, char *data, int le)
         bits = 0;
     }
     else if (isnan(x)) {
-        /* There are 2046 distinct half-precision NaNs (1022 signaling and
-           1024 quiet), but there are only two quiet NaNs that don't arise by
-           quieting a signaling NaN; we get those by setting the topmost bit
-           of the fraction field and clearing all other fraction bits. We
-           choose the one with the appropriate sign. */
         sign = (copysign(1.0, x) == -1.0);
         e = 0x1f;
-        bits = 512;
+
+        uint64_t v;
+
+        memcpy(&v, &x, sizeof(v));
+        bits = v & 0x1ff;
+        if (v & 0x8000000000000) {
+            bits += 0x200;
+        }
     }
     else {
         sign = (x < 0.0);
@@ -2401,7 +2403,16 @@ PyFloat_Unpack2(const char *data, int le)
         }
         else {
             /* NaN */
-            return sign ? -fabs(Py_NAN) : fabs(Py_NAN);
+            uint64_t v = ((sign? 0xff00000000000000 : 0x7f00000000000000)
+                          + 0xf0000000000000);
+
+            if (f & 0x200) {
+                v += 0x8000000000000;
+                f -= 0x200;
+            }
+            v += f;
+            memcpy(&x, &v, sizeof(v));
+            return x;
         }
     }
 

FYI: #55943. The payload was probably ignored because that patch was adapted from NumPy sources.
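To make the round-trip property concrete, here is a pure-Python model of a payload-preserving mapping in the spirit of the patch above. It is a hypothetical sketch, not the patch: it assumes binary64 doubles, carries the half's quiet bit (0x200) on the double's quiet bit (1 << 51, i.e. 0x0008000000000000), and copies the low nine payload bits unchanged. Mapping quiet onto quiet means the default `math.nan` still packs to a quiet half NaN.

```python
import math
import struct

# Hypothetical model (not CPython code) of a payload-preserving NaN mapping:
#   half quiet bit 0x200        <-> double quiet bit 1 << 51
#   half low payload bits 0x1FF <-> double low mantissa bits 0x1FF
HALF_QUIET = 0x200
DOUBLE_QUIET = 1 << 51
LOW_PAYLOAD = 0x1FF


def half_nan_to_double_bits(h: int) -> int:
    """Map a binary16 NaN bit pattern to a binary64 NaN bit pattern."""
    sign = (h >> 15) & 1
    frac = h & 0x3FF
    v = (sign << 63) | (0x7FF << 52)   # sign plus all-ones double exponent
    if frac & HALF_QUIET:
        v |= DOUBLE_QUIET
    return v | (frac & LOW_PAYLOAD)


def double_nan_bits_to_half(v: int) -> int:
    """Map a binary64 NaN bit pattern back to a binary16 NaN bit pattern."""
    sign = (v >> 63) & 1
    bits = v & LOW_PAYLOAD
    if v & DOUBLE_QUIET:
        bits |= HALF_QUIET
    return (sign << 15) | (0x1F << 10) | bits


# Every binary16 NaN survives half -> double -> half unchanged.
half_nans = [s | 0x7C00 | f for s in (0x0000, 0x8000) for f in range(1, 0x400)]
assert all(double_nan_bits_to_half(half_nan_to_double_bits(h)) == h for h in half_nans)

# Python's default NaN (quiet, zero payload on typical platforms) maps to 0x7E00
# (0xFE00 if the platform's default NaN has its sign bit set), not to infinity.
default_nan = int.from_bytes(struct.pack("<d", math.nan), "little")
print(hex(double_nan_bits_to_half(default_nan)))
```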

@skirpichev
Member

@tim-one, does this look like an issue to you?
