Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: allow for Null(DataType) equivalency comparison #729

Closed
wants to merge 1 commit into from

Conversation

rtyler
Copy link
Member

@rtyler rtyler commented Mar 5, 2025

The change made in e2378a579790172800d91701c143f4522957a9de modified the equivalency behaviors of Scalars in a way that mean Null(DataType::String) != Null(DataType::String) which I don't believe is correct since we can perform an equivalency on the DataType.

This changes re-introduces that support with a test case that passes on v0.6.1 and fails on v0.7.0, v0.8.0, and main.

The change made in `e2378a579790172800d91701c143f4522957a9de` modified
the equivalency behaviors of Scalars in a way that mean
Null(DataType::String) != Null(DataType::String) which I don't believe
is correct since we can perform an equivalency on the DataType.

This changes re-introduces that support with a test case that passes on
v0.6.1 and fails on v0.7.0, v0.8.0, and `main`.

Signed-off-by: R. Tyler Croy <[email protected]>
@rtyler rtyler marked this pull request as ready for review March 5, 2025 14:16
Copy link
Collaborator

@scovich scovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change made in e2378a5 modified the equivalency behaviors of Scalars in a way that mean Null(DataType::String) != Null(DataType::String) which I don't believe is correct since we can perform an equivalency on the DataType.

The change fixed a day-one bug in the implementation of Scalar and so is a breaking change but not a regression.

Nulls are, by definition, not comparable and comparing NULL with anything (including other NULL) produces NULL. It's a bit of a theoretical stretch to even assign a type to a NULL value, but it's awfully convenient in practice so all systems I know of do it (basically stating, "I expected a string but got NULL instead").

Kernel's PartialOrd implementation had the correct behavior all along, but the derived PartialEq resorted to a physical comparison of the enum variants which was logically incorrect. Further, the inconsistency between the two traits produces a weird pseudo-UB situation:

If PartialOrd or Ord are also implemented for Self and Rhs, their methods must also be consistent with PartialEq (see the documentation of those traits for the exact requirements). It’s easy to accidentally make them disagree by deriving some of the traits and manually implementing others. ... Violating these requirements is a logic error. The behavior resulting from a logic error is not specified, but users of the trait must ensure that such logic errors do not result in undefined behavior. This means that unsafe code must not rely on the correctness of these methods.

References:

Wikipedia's entry on SQL NULL semantics:

Since Null is not a member of any data domain, it is not considered a "value", but rather a marker (or placeholder) indicating the undefined value. Because of this, comparisons with Null can never result in either True or False, but always in a third logical result, Unknown.[9]

Spark NULL handling semantics for comparison operators:

Apache spark supports the standard comparison operators such as >, >=, =, < and <=. The result of these operators is unknown or NULL when one of the operands or both the operands are unknown or NULL. In order to compare the NULL values for equality, Spark provides a null-safe equal operator (<=>), which returns False when one of the operand is NULL and returns ‘True` when both the operands are NULL.

Arrow compute's binary:

If any index is null in either a or b, the corresponding index in the result will also be null

@rtyler
Copy link
Member Author

rtyler commented Mar 5, 2025

I strongly disagree, I think a Nullable of a particular type should be considered equivalent.

I'm not really interested in debating the point however.

Lumping Scalar::Null(DataType) into the same bucket as 0x0 nulls is silly to me.

@rtyler rtyler closed this Mar 5, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants