[EPIC] Improving cost calculations and cost based optimizations #3929

Open
2 of 5 tasks
isidentical opened this issue Oct 22, 2022 · 7 comments
Labels
enhancement New feature or request

Comments

@isidentical
Contributor

isidentical commented Oct 22, 2022

Design document: https://docs.google.com/document/d/1M4mmV7KA1LSj-D-WJA338B4ydlm-8A8D5OPuDE5_SD4/edit

This is a meta issue for improving cost calculations and cost-based optimizations in DataFusion. We already collect some statistics (mainly from the table sources), and some of the execution plan nodes estimate statistics for their outputs. The overall idea is to improve these estimates, as well as the cost-based optimizations (CBOs) that can be built on top of them.

Main Goals

  • Collect enough statistics to start on nested join optimizations (Implement nested join optimization #3843). This involves estimating the weight (size) of each join side and globally re-ordering join sides to minimize the overall cost of parent joins, by reducing the output as much as possible at the bottom levels.
  • Provide a more reliable static analysis phase for physical execution operators (so that range-based pruning and predicate pruning can leverage the existing infrastructure in their implementations)
  • What else?
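As a rough illustration of the join re-ordering goal, here is a minimal Rust sketch of greedy left-deep ordering driven by estimated cardinalities. `RelStats`, `join_card`, and `greedy_order` are hypothetical names, not DataFusion APIs. The size formula is the classic textbook estimate for an equi-join on a single shared key (assuming uniform, "contained" key values), and "cost" is simply taken to be the sum of intermediate result sizes.

```rust
/// Hypothetical per-input statistics (not DataFusion's `Statistics` type).
#[derive(Debug, Clone)]
struct RelStats {
    name: &'static str,
    /// Estimated row count of this join input.
    rows: f64,
    /// Estimated number of distinct values (NDV) of the shared join key.
    key_ndv: f64,
}

/// Textbook equi-join estimate: |L join R| ~= |L| * |R| / max(ndv(L.k), ndv(R.k)).
fn join_card(l_rows: f64, l_ndv: f64, r: &RelStats) -> f64 {
    l_rows * r.rows / l_ndv.max(r.key_ndv).max(1.0)
}

/// Greedy left-deep ordering: start from the smallest input, then repeatedly
/// join the input that yields the smallest estimated intermediate result.
/// Returns the chosen order and the cost (sum of intermediate cardinalities).
fn greedy_order(mut rels: Vec<RelStats>) -> (Vec<&'static str>, f64) {
    assert!(!rels.is_empty(), "need at least one join input");
    let first_idx = rels
        .iter()
        .enumerate()
        .min_by(|a, b| a.1.rows.total_cmp(&b.1.rows))
        .map(|(i, _)| i)
        .unwrap();
    let first = rels.swap_remove(first_idx);
    let mut order = vec![first.name];
    let (mut acc_rows, mut acc_ndv) = (first.rows, first.key_ndv);
    let mut cost = 0.0;

    while !rels.is_empty() {
        // Pick the input whose join with the current result is estimated smallest.
        let (idx, out_rows) = rels
            .iter()
            .enumerate()
            .map(|(i, r)| (i, join_card(acc_rows, acc_ndv, r)))
            .min_by(|a, b| a.1.total_cmp(&b.1))
            .unwrap();
        let next = rels.swap_remove(idx);
        // Containment assumption: the joined key keeps at most the smaller NDV,
        // and never more distinct values than rows.
        acc_ndv = acc_ndv.min(next.key_ndv).min(out_rows);
        acc_rows = out_rows;
        cost += out_rows;
        order.push(next.name);
    }
    (order, cost)
}
```

For inputs of 10, 1,000 and 1,000,000 rows, this starts from the 10-row input and leaves the million-row input for the last join, so intermediate results stay small. A greedy heuristic can miss the optimal order; an exhaustive or dynamic-programming search is the usual alternative when the number of joined inputs is small.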

Work in Progress

Planned

Future

  • Support for histograms, to better capture value distributions when estimating cardinality and filter selectivity. Currently, none of the providers we use can pass histograms to us directly, so we either have to peek at the data ourselves or only expose an API through which other services (like Ballista), which can actually collect them, pass them to us.
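To make the histogram idea concrete, here is a minimal Rust sketch of estimating the selectivity of a `col < v` filter from an equi-width histogram built by peeking at the data (one of the two options above). `Histogram`, `from_values`, and `selectivity_lt` are hypothetical names, not DataFusion APIs, and the equi-width layout with uniform values inside each bucket is an assumption; many systems prefer equi-depth histograms instead.

```rust
/// Hypothetical equi-width histogram over a numeric column. Bucket `i`
/// covers [min + i*w, min + (i+1)*w); the last bucket also holds `max`.
struct Histogram {
    min: f64,
    max: f64,
    counts: Vec<u64>,
}

impl Histogram {
    /// Builds the histogram with a full scan of the values ("peeking at the data").
    fn from_values(values: &[f64], buckets: usize) -> Self {
        assert!(!values.is_empty() && buckets > 0);
        let min = values.iter().copied().fold(f64::INFINITY, f64::min);
        let max = values.iter().copied().fold(f64::NEG_INFINITY, f64::max);
        let width = (max - min) / buckets as f64;
        let mut counts = vec![0u64; buckets];
        for &x in values {
            let idx = if width == 0.0 {
                0 // all values are equal: one bucket holds everything
            } else {
                (((x - min) / width) as usize).min(buckets - 1)
            };
            counts[idx] += 1;
        }
        Histogram { min, max, counts }
    }

    /// Estimated fraction of rows satisfying `col < v`, assuming values are
    /// spread uniformly inside each bucket.
    fn selectivity_lt(&self, v: f64) -> f64 {
        let total: u64 = self.counts.iter().sum();
        if total == 0 || v <= self.min {
            return 0.0;
        }
        if v > self.max {
            return 1.0;
        }
        let width = (self.max - self.min) / self.counts.len() as f64;
        let mut matched = 0.0;
        for (i, &count) in self.counts.iter().enumerate() {
            let lo = self.min + i as f64 * width;
            let hi = lo + width;
            if v >= hi {
                matched += count as f64; // whole bucket lies below v
            } else {
                // Partial bucket: interpolate linearly, then stop.
                matched += count as f64 * (v - lo).max(0.0) / width;
                break;
            }
        }
        matched / total as f64
    }
}
```

Building the histogram this way costs a full scan of the column, which is the trade-off behind choosing between collecting histograms ourselves and accepting them from services that already have them.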

P.S.: feel free to update the text directly or let me know (and I can update it myself)

isidentical added the enhancement (New feature or request) label on Oct 22, 2022
@isidentical
Contributor Author

@alamb @Dandandan @mingmwang I've created the meta/epic issue as we discussed

@alamb
Contributor

alamb commented Oct 25, 2022

I believe the next step is some sort of design document.

@isidentical
Contributor Author

I'd be happy to start one, and if anyone is interested I can also give write access (send me your Google email address at [email protected]).

@Dandandan
Contributor

Maybe you can share the doc publicly so anyone can make suggestions?

@isidentical
Copy link
Contributor Author

It should be publicly accessible now: https://docs.google.com/document/d/1M4mmV7KA1LSj-D-WJA338B4ydlm-8A8D5OPuDE5_SD4/ (also pinning this to the issue)

It is an overall survey of what we are doing right now and how it can help us in the future (as well as some possible next steps), but it is at a very early stage. I'd be thrilled to hear what you think, as well as about other potentially unexplored areas.

@alamb
Contributor

alamb commented Oct 26, 2022

I plan to review the doc carefully tomorrow ❤️

@isidentical
Contributor Author

isidentical commented Oct 26, 2022

Thanks @alamb! I'll also try to walk through it from scratch, with real-world examples, in tomorrow's meeting (if we have time for it at this meetup, and if I can actually make it), in case anyone else here is planning to attend.

alamb changed the title from "[META] Improving cost calculations and cost based optimizations" to "Epic: Improving cost calculations and cost based optimizations" on Nov 15, 2023
alamb changed the title from "Epic: Improving cost calculations and cost based optimizations" to "[EPIC] Improving cost calculations and cost based optimizations" on Apr 30, 2024