Add an example of boundary analysis simple expressions. #14688
The goal of this change is to add an example to explain data flow during boundary analysis of AND and OR expressions.
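To make the data flow concrete, here is a minimal, standard-library-only sketch of how bounds might be pushed through an AND expression. The `Interval`, `Pred`, and `refine` names are hypothetical and exist only for illustration; this is not DataFusion's API. The sketch reports OR as unhandled rather than guessing at a result.

```rust
/// Closed integer interval [lo, hi]. Hypothetical stand-in for per-column
/// bounds; not a DataFusion type.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Interval {
    pub lo: i64,
    pub hi: i64,
}

/// Toy predicate over a single integer column (illustrative only).
pub enum Pred {
    GtEq(i64),
    LtEq(i64),
    And(Box<Pred>, Box<Pred>),
    Or(Box<Pred>, Box<Pred>),
}

/// Narrow `bounds` using `pred`.
/// Ok(Some(_)) = narrowed bounds, Ok(None) = no value can satisfy the
/// predicate, Err(_) = the sketch does not handle this expression.
pub fn refine(pred: &Pred, bounds: Interval) -> Result<Option<Interval>, &'static str> {
    let narrowed = match pred {
        Pred::GtEq(v) => Interval { lo: bounds.lo.max(*v), hi: bounds.hi },
        Pred::LtEq(v) => Interval { lo: bounds.lo, hi: bounds.hi.min(*v) },
        Pred::And(left, right) => {
            // Data flows left to right: the right side starts from the
            // bounds already narrowed by the left side.
            return match refine(left, bounds)? {
                Some(b) => refine(right, b),
                None => Ok(None),
            };
        }
        Pred::Or(_, _) => return Err("OR is not handled in this sketch"),
    };
    // An inverted interval means the predicate cannot be satisfied.
    Ok(if narrowed.lo <= narrowed.hi { Some(narrowed) } else { None })
}
```

For example, starting from assumed column bounds of [1, 10000], the predicate `id >= 5000 AND id <= 8000` narrows to [5000, 8000], while an OR at the top level returns an error.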
Thank you @clflushopt -- this is great. I think it also shows a bit how complex the API is currently to use. I can't wait to see it become easier to use over time ❤️
FYI @ozankabak @hiltontj and @berkaysynnada
)?;

// The analysis will return better bounds thanks to the column statistics.
// TODO:
What is the TODO? Maybe we can say here: "The analysis has concluded that id must be between 5000 (due to the predicate) and 10,000 (due to the statistics)."
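The arithmetic behind that suggested comment can be sketched in a few lines of plain Rust. This assumes the predicate is `id >= 5000` and that the column statistics give a minimum of 1 and a maximum of 10,000; the minimum and the exact comparison operator are assumptions, and `bounds_after_gt_eq` is a hypothetical helper, not part of DataFusion.

```rust
/// Combine a column's min/max statistics with a `column >= threshold`
/// predicate. Returns None when no value in the statistics range can
/// satisfy the predicate. Hypothetical helper for illustration only.
pub fn bounds_after_gt_eq(stat_min: i64, stat_max: i64, threshold: i64) -> Option<(i64, i64)> {
    // The predicate can only raise the lower bound; the upper bound
    // still comes from the statistics.
    let lo = stat_min.max(threshold);
    if lo <= stat_max {
        Some((lo, stat_max))
    } else {
        None
    }
}
```

With the assumed inputs, `bounds_after_gt_eq(1, 10_000, 5_000)` yields `Some((5000, 10000))`: the lower bound comes from the predicate and the upper bound from the statistics.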
Leftover, will remove. For context, I added a TODO to show the case of unsupported boundary analysis on OR conjunctions, but I feel it might be best to make AND and OR examples in a separate change.
P.S. @alamb feel free to merge if this is up to it; I am making a separate PR (easier for review) to add concrete examples for AND/OR conjunctions with documentation in the library guide. Thanks!
Thank you @clflushopt, that's a really nice example, and I hope it will be even more meaningful after #14699 -- for example we can explicitly show the user the distribution is uniform, therefore selectivity is between 0.5 - 0.6
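As a rough illustration of the uniform-distribution point, selectivity can be estimated as the fraction of the original value range that survives the predicate. The sketch below is standard-library-only Rust; `uniform_selectivity` is a hypothetical name, and the integer, closed-interval model is an assumption rather than a description of how DataFusion computes it.

```rust
/// Estimate selectivity as (values in refined range) / (values in original
/// range), assuming integer values spread uniformly over closed intervals.
/// Hypothetical helper for illustration only.
pub fn uniform_selectivity(original: (i64, i64), refined: (i64, i64)) -> f64 {
    // Number of integers in a closed interval; inverted intervals count as empty.
    let count = |(lo, hi): (i64, i64)| (hi - lo + 1).max(0) as f64;
    let total = count(original);
    if total == 0.0 {
        return 0.0;
    }
    count(refined) / total
}
```

Assuming original bounds of [1, 10000] and refined bounds of [5000, 10000], this gives 5001 / 10000, which is just above 0.5.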
Love it -- thanks @clflushopt and @berkaysynnada
Which issue does this PR close?
Rationale for this change
The goal of this change is to add an example to explain data flow during boundary analysis. This is mostly to help us get a better intuition for how we should proceed next for #4158 and #4159.
What changes are included in this PR?
In datafusion-examples/examples/expr_api.rs there is now a new function that explains data flow during boundary analysis and selectivity calculation. To help contributors understand the implementation as it exists today, which is documented in the design doc of #3929, I used the same example as that doc.
Are these changes tested?
The change re-uses parts of the API that is already covered by existing tests.
Are there any user-facing changes?
Yes, this introduces a new example.