Core: Fix UNION field nullability tracking #14356
Another name for this one is `try_new_with_coerce_types`, to emphasize that it is coercing the types.
I believe this one should be coercing types, but it's not doing that. It just takes the type from the first UNION branch.
This is not new logic, though. It was just moved here from the plan builder, to avoid duplicating code.
I didn't know why this logic existed in this shape, but it felt intentional enough not to simply replace it in this PR. I would prefer it to be fixed later... Hence this function name, to make the caller wonder what "loose types" means.
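As a rough illustration of the "loose types" behavior described above, here is a minimal Rust sketch. It assumes hypothetical, simplified `DataType`/`Field` types rather than Arrow's, and a hypothetical `loose_union_schema` function: each output column copies its name and data type from the first UNION branch, with no coercion. The nullability rule (nullable if the column is nullable in any branch) is an assumption based on the PR title, not code taken from the PR.

```rust
// Hypothetical, simplified stand-ins for Arrow's `DataType` and `Field`
// (NOT the arrow-rs / DataFusion types).
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Int64,
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
    nullable: bool,
}

/// Sketch of "loose types": the name and data type of each output column are
/// copied from the FIRST union input, with no coercion at all.
/// Assumption: the output column is nullable if it is nullable in ANY input.
fn loose_union_schema(inputs: &[Vec<Field>]) -> Result<Vec<Field>, String> {
    let first = inputs.first().ok_or("UNION needs at least one input")?;
    // All inputs must at least agree on the number of columns.
    if let Some(bad) = inputs.iter().find(|input| input.len() != first.len()) {
        return Err(format!(
            "UNION inputs have different column counts: {} vs {}",
            first.len(),
            bad.len()
        ));
    }
    Ok(first
        .iter()
        .enumerate()
        .map(|(i, f)| Field {
            name: f.name.clone(),
            // Type comes from the first branch only; other branches' types are ignored.
            data_type: f.data_type.clone(),
            // Nullability is merged across all branches.
            nullable: inputs.iter().any(|input| input[i].nullable),
        })
        .collect())
}
```

Note that a branch typed `Int64` silently yields an `Int32` output column if the first branch is `Int32`, which is exactly the missing coercion discussed in this thread.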
ok
Yeah, this was noted once before. Thank you for making it more explicit.
-> #14380
Possible alternative: coerce the types to a common type? That is what PG does: https://www.postgresql.org/docs/17/typeconv-union-case.html
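A heavily simplified sketch of "coerce to a common type" for a single UNION column might look like the following. It assumes a hypothetical numeric widening order (`Int32` < `Int64` < `Float64`) and treats every other mismatch as an error. PostgreSQL's actual rules (type categories, preferred types, implicit casts) are richer, and `common_type` is a hypothetical helper, not a DataFusion API.

```rust
// Hypothetical, simplified type enum (NOT Arrow's `DataType`).
#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Int32,
    Int64,
    Float64,
    Utf8,
}

/// Hypothetical numeric widening rank; `None` means "not numeric".
fn numeric_rank(t: DataType) -> Option<u8> {
    match t {
        DataType::Int32 => Some(0),
        DataType::Int64 => Some(1),
        DataType::Float64 => Some(2),
        DataType::Utf8 => None,
    }
}

/// Simplified, PostgreSQL-inspired common-type resolution for one UNION column:
/// identical types resolve to themselves, numeric types widen to the widest one,
/// and any other mismatch is an error.
fn common_type(types: &[DataType]) -> Result<DataType, String> {
    let mut iter = types.iter().copied();
    let mut acc = iter.next().ok_or("no input types")?;
    for t in iter {
        if t == acc {
            continue;
        }
        acc = match (numeric_rank(acc), numeric_rank(t)) {
            // Both numeric: keep whichever is wider.
            (Some(a), Some(b)) => {
                if a >= b {
                    acc
                } else {
                    t
                }
            }
            _ => {
                return Err(format!(
                    "cannot coerce {:?} and {:?} to a common type",
                    acc, t
                ))
            }
        };
    }
    Ok(acc)
}
```

Unlike taking the first branch's type, the result here does not depend on the order of the UNION branches.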
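I agree this is what should be happening.
There are two places where this code is run:
(1) the plan builder, which constructs a new UNION from inputs whose types may not have been coerced yet, and
(2) the optimizer's schema recompute, which runs after the analyzer.
For now, in case (1), I retained the pre-existing logic; see current main: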
datafusion/datafusion/expr/src/logical_plan/builder.rs (lines 1530 to 1531 in e718c1a)
For (2) we don't want any coercions at all.
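For case (2), a strict derivation could be sketched as below. It assumes hypothetical simplified `Field`/`DataType` types and a hypothetical `strict_union_schema` function: inputs must already agree on column count and data types, and only nullability is merged. It illustrates the "no coercion at all" requirement and is not the actual `Union::try_new` code.

```rust
// Hypothetical, simplified stand-ins (NOT the arrow-rs / DataFusion types).
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Int64,
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
    nullable: bool,
}

/// Strict UNION schema derivation: no coercion is ever performed.
/// Every input must already have the same column count and data types;
/// only nullability is merged (nullable if nullable in any input).
fn strict_union_schema(inputs: &[Vec<Field>]) -> Result<Vec<Field>, String> {
    let first = inputs.first().ok_or("UNION needs at least one input")?;
    let mut schema = first.clone();
    for input in &inputs[1..] {
        if input.len() != schema.len() {
            return Err("UNION inputs have different column counts".to_string());
        }
        for (out, f) in schema.iter_mut().zip(input) {
            // A type mismatch here means coercion did not happen earlier:
            // report it instead of silently fixing it.
            if out.data_type != f.data_type {
                return Err(format!(
                    "UNION column {} has type {:?} in one input and {:?} in another",
                    out.name, out.data_type, f.data_type
                ));
            }
            out.nullable |= f.nullable;
        }
    }
    Ok(schema)
}
```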
-> #14380
I think there is already code that computes the coerced schema in the analyzer:
datafusion/datafusion/optimizer/src/analyzer/type_coercion.rs (line 912 in c077ef5)
Can we reuse the same logic? Maybe we can move the coercion code here.
In the optimizer case, no coercion logic should be invoked; the types must already match.
So we need a version of this code that does exactly that (#14296 (comment)).
For the variant that constructs a new schema from some uncoerced inputs (currently called "loose types", and currently not doing any coercion, to maintain the existing behavior): yes, I agree this one could use `coerce_union_schema` from the analyzer (perhaps moving that logic here).
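If the loose variant were to reuse coercion logic, as suggested here, its behavior might look roughly like this sketch: each output column gets the common type of that column across all inputs, and is nullable if any input column is nullable. `coerce_union_fields`, `wider`, `rank`, and the simplified `Field`/`DataType` types are hypothetical, and the widening rule is an assumption; this does not reproduce the signature or rules of the analyzer's `coerce_union_schema`.

```rust
// Hypothetical, simplified stand-ins (NOT the arrow-rs / DataFusion types).
#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Int32,
    Int64,
    Float64,
    Utf8,
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
    nullable: bool,
}

/// Hypothetical numeric widening rank; `None` means "not numeric".
fn rank(t: DataType) -> Option<u8> {
    match t {
        DataType::Int32 => Some(0),
        DataType::Int64 => Some(1),
        DataType::Float64 => Some(2),
        DataType::Utf8 => None,
    }
}

/// Assumed rule: equal types stay as-is, numeric types widen, anything else fails.
fn wider(a: DataType, b: DataType) -> Option<DataType> {
    if a == b {
        return Some(a);
    }
    match (rank(a), rank(b)) {
        (Some(ra), Some(rb)) => Some(if ra >= rb { a } else { b }),
        _ => None,
    }
}

/// Coercing UNION schema derivation: names come from the first input,
/// types are widened across all inputs, nullability is OR-ed.
fn coerce_union_fields(inputs: &[Vec<Field>]) -> Result<Vec<Field>, String> {
    let first = inputs.first().ok_or("UNION needs at least one input")?;
    let mut schema = first.clone();
    for input in &inputs[1..] {
        if input.len() != schema.len() {
            return Err("UNION inputs have different column counts".to_string());
        }
        for (out, f) in schema.iter_mut().zip(input) {
            let t = wider(out.data_type, f.data_type).ok_or_else(|| {
                format!(
                    "UNION column {}: no common type for {:?} and {:?}",
                    out.name, out.data_type, f.data_type
                )
            })?;
            out.data_type = t;
            out.nullable |= f.nullable;
        }
    }
    Ok(schema)
}
```

In this arrangement the strict, non-coercing check from case (2) would still be what runs inside the optimizer; only the builder-facing variant would coerce.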
Maybe we can file a ticket describing what is desired, and leave a comment in the code referring to that ticket.
Making sure I understand the use case: if I want to construct a UNION logical plan whose inputs have different but coercible types (be it by the current built-in rules or by future user-defined rules), then I would use `Union::try_new_with_loose_types` and have the analyzer pass handle the coercion. Is this right?
Then what exactly is the use case for `Union::try_new`? Since it's used in the schema recompute, which can occur after the analyzer's type coercion, do we therefore need it to always perform proper coercion, including any future user-defined coercion rules?
Correct.
Note: this is not my design. It was exactly the same before this PR; the code just moved around.
"Schema recompute" is an overloaded term. If it runs after the analyzer, it doesn't have to do any type coercion. In fact, it MUST NOT do any type coercion (#14296 (comment)), and indeed `try_new` does not do any coercion. It's still needed for column pruning.
More broadly, IMO we should remove "schema recompute" from the optimizer: #14357. For column pruning, we should explicitly prune the inputs of the union and the union itself using the same set of "required columns/indices". There is no need for a generic "recompute schema".
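The pruning approach proposed above (apply one set of required column indices to every union input and to the union itself, rather than recomputing the schema) can be sketched as follows. `UnionNode`, `Field`, `project`, and `prune_union` are hypothetical simplified stand-ins, not DataFusion's `LogicalPlan` types.

```rust
// Hypothetical, simplified stand-ins (NOT DataFusion's plan or Arrow types).
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    nullable: bool,
}

/// Hypothetical UNION node: each input is represented only by its output
/// fields, and the union's own (already derived) schema is stored alongside.
#[derive(Debug, Clone, PartialEq)]
struct UnionNode {
    inputs: Vec<Vec<Field>>,
    schema: Vec<Field>,
}

/// Keep only the columns at `required`, in that order.
fn project(fields: &[Field], required: &[usize]) -> Result<Vec<Field>, String> {
    required
        .iter()
        .map(|&i| {
            fields
                .get(i)
                .cloned()
                .ok_or_else(|| format!("column index {i} out of range"))
        })
        .collect()
}

/// Prune a UNION by applying the SAME required indices to every input and to
/// the union's own schema. Nothing is re-derived, so no coercion can happen.
fn prune_union(node: &UnionNode, required: &[usize]) -> Result<UnionNode, String> {
    let inputs = node
        .inputs
        .iter()
        .map(|input| project(input, required))
        .collect::<Result<Vec<_>, _>>()?;
    let schema = project(&node.schema, required)?;
    Ok(UnionNode { inputs, schema })
}
```

Because fields are projected rather than re-derived, the union's existing types and nullability carry over unchanged.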
Added a link to #14380 in the code.
While #14357 is also very relevant, I don't see a place where a link would be suitable, so I'm not adding it for now.