Add an example of boundary analysis simple expressions. #14688

Merged (4 commits) on Feb 17, 2025

Conversation

clflushopt
Contributor

Which issue does this PR close?

Rationale for this change

The goal of this change is to add an example that explains data flow during boundary analysis. This is mostly to help us build a better intuition for how we should proceed next on #4158 and #4159.

What changes are included in this PR?

In datafusion-examples/examples/expr_api.rs there is now a new function that explains data flow during boundary analysis and selectivity calculation. To help contributors understand the existing implementation, which is documented in the design doc of #3929, I used the same example as that design doc.

Are these changes tested?

The change reuses parts of the API that are already covered by existing tests.

Are there any user-facing changes?

Yes, this introduces a new example.

The goal of this change is to add an example to explain data flow during
boundary analysis of AND and OR expressions.
@alamb alamb added the documentation Improvements or additions to documentation label Feb 16, 2025
Contributor

@alamb alamb left a comment

Thank you @clflushopt -- this is great. I think it also shows a bit how complex the API is currently to use. I can't wait to see it become easier to use over time ❤️

FYI @ozankabak @hiltontj and @berkaysynnada

)?;

// The analysis will return better bounds thanks to the column statistics.
// TODO:
Contributor

What is the TODO? Maybe we can say here: "The analysis has concluded that id must be between 5000 (due to the predicate) and 10,000 (due to the statistics)".

Contributor Author

Leftover, will remove. For context, I added a TODO to show the case of unsupported boundary analysis on OR conjunctions, but I feel it might be best to add the AND and OR examples in a separate change.

Contributor Author

P.S. @alamb feel free to merge if this is up to it; I am making a separate PR (easier for review) to add concrete examples for AND/OR conjunctions with documentation in the library guide. Thanks!

@github-actions github-actions bot removed the documentation Improvements or additions to documentation label Feb 16, 2025
@clflushopt clflushopt changed the title feat(examples): Add an example of boundary analysis for AND/OR expressions Add an example of boundary analysis simple expressions. Feb 16, 2025
Contributor

@berkaysynnada berkaysynnada left a comment

Thank you @clflushopt, that's a really nice example, and I hope it will be even more meaningful after #14699 -- for example, we can explicitly show the user that the distribution is uniform, and therefore the selectivity is between 0.5 and 0.6.
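
For readers following the thread, the arithmetic behind the bounds and the selectivity range discussed above can be sketched in plain Rust. This is only an illustration and does not use the DataFusion API: `Bounds`, `intersect`, `cardinality`, and `analyze_gt_eq` are hypothetical names, and the sketch assumes that the predicate is `id >= 5000`, that the column statistics give a minimum of 1 and a maximum of 10,000, and that values are uniformly distributed integers.

```rust
/// A closed integer interval [lower, upper].
/// Hypothetical stand-in for illustration; not DataFusion's interval type.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Bounds {
    lower: i64,
    upper: i64,
}

impl Bounds {
    /// Intersect two closed intervals; `None` when they do not overlap.
    fn intersect(self, other: Bounds) -> Option<Bounds> {
        let lower = self.lower.max(other.lower);
        let upper = self.upper.min(other.upper);
        if lower <= upper {
            Some(Bounds { lower, upper })
        } else {
            None
        }
    }

    /// Number of integer values inside the interval.
    fn cardinality(self) -> f64 {
        // Widen to i128 so very wide intervals cannot overflow.
        (self.upper as i128 - self.lower as i128 + 1) as f64
    }
}

/// Narrow the statistics bounds with the predicate `column >= threshold`
/// and estimate selectivity, ASSUMING a uniform distribution of values.
fn analyze_gt_eq(stats: Bounds, threshold: i64) -> (Option<Bounds>, f64) {
    // The predicate alone only constrains the lower bound.
    let predicate = Bounds { lower: threshold, upper: i64::MAX };
    match stats.intersect(predicate) {
        Some(narrowed) => {
            let selectivity = narrowed.cardinality() / stats.cardinality();
            (Some(narrowed), selectivity)
        }
        // The predicate can never be satisfied within the statistics range.
        None => (None, 0.0),
    }
}
```

Under these assumptions, calling `analyze_gt_eq(Bounds { lower: 1, upper: 10_000 }, 5_000)` narrows the interval to 5000 through 10,000 and gives a selectivity of 5001 / 10000, which falls inside the 0.5 to 0.6 range mentioned in the comment.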

Contributor

@alamb alamb left a comment

Love it -- thanks @clflushopt and @berkaysynnada

@alamb alamb merged commit ee2dc83 into apache:main Feb 17, 2025
25 checks passed
3 participants