[PROPOSAL]: Changes to support for multiple index types #342
Comments
Seems IndexTypeHandler is not necessary, as we should create new rules for the non-covering index / bloom filter index.
Thanks @sezruby, it is still a good choice; e.g. FilterIndexRule can directly use the bloom filter and non-covering index. I think it's generally a better abstraction to separate index type logic from the rules going forward.
Ok these are the points:
And lastly, how can we apply both a covering index and a BF index using one Filter Rule?
+1 to what @sezruby is saying. I'm in favor of separating out the rules per index type since the logic might be totally different. Ideally, we would have:
The important thing to consider is keeping duplication to a minimum. @apoorvedave1 @thugsatbay What are your thoughts on this?
Yeah, so if we do this design, I think the rules will become simpler and clearer. I am not saying we shouldn't write new rules. I am saying that if a rule can be reused, we don't need to duplicate it. One example is the filter rule. If the join rule requires different logic, we can write a new join rule, but the filter rule doesn't require duplication.
We either implement this ranking logic, or we stick with creating different rules for each index type and hardcode their ordering.
Hybrid scan logic gets extracted out of the filter rule into the index type handler. CoveringIndexHandler knows how to handle hybrid scan for a covering index. Same for BF and NC.
If you take a look at the design, I have added an IndexTypeHandler for exactly this question. Given multiple index types, a ranker decides which index to pick. If it is a covering index, the covering handler will update the plan. If it's a BF index, the BF handler will update the plan. Here's the bottom line: given a data source and two eligible index types, e.g. a covering index that has not been updated for a long time and a BF index that was updated most recently, how do we decide which index to choose?
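A rough sketch of the ranker-plus-handler flow described in the comment above, in Python for brevity. This is not Hyperspace code: `IndexEntry`, the handler method `update_plan`, the string plans, and the staleness-based ranking policy are all hypothetical, and the choice of ranking metric is exactly what this thread is still debating.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEntry:
    name: str
    kind: str                  # "covering" or "bloom_filter" (labels assumed here)
    stale_appended_files: int  # source files appended since the last refresh


class CoveringIndexHandler:
    def update_plan(self, plan: str, index: IndexEntry) -> str:
        # A covering index replaces the scan of the source data.
        return f"IndexScan({index.name})"


class BloomFilterIndexHandler:
    def update_plan(self, plan: str, index: IndexEntry) -> str:
        # A bloom filter index only prunes files; the source scan remains.
        return f"PrunedScan({plan}, via={index.name})"


HANDLERS = {
    "covering": CoveringIndexHandler(),
    "bloom_filter": BloomFilterIndexHandler(),
}


def rank(candidates):
    # Assumed policy: the freshest index wins; a covering index wins ties.
    return min(
        candidates,
        key=lambda i: (i.stale_appended_files, i.kind != "covering"),
    )


def apply_best_index(plan, candidates):
    # The ranker picks one index; the handler for its type rewrites the plan.
    best = rank(candidates)
    return HANDLERS[best.kind].update_plan(plan, best)
```

With this policy, a stale covering index loses to a fresh BF index, which may or may not be the behaviour we want; swapping `rank` changes the outcome without touching the handlers.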
If we look at FilterIndexRule or JoinIndexRule, we first try to find whether there is a filter or join condition. Once that is done, we try to figure out whether we can use a covering index. And if there are multiple indexes, we rank them to choose the best. The question is: are we ever going to compare two different types of index? If no, then how do we figure out which we prefer based on type, staleness, etc.? It becomes complex. If yes, then we need to do it inside the rule (which metrics to use to compare two different types of indexes is debatable). What I believe is that changing the For each Rule we would already know what type of index we can apply. Going through each eligible index and applying it in the rule would give us a new For future cost optimization problem (as discussed with @apoorvedave1) which rule+index combination to choose first. Each Rule should expose an internal API telling which index it supports (). Also, each rule should expose an internal API that gives the preference order for index selection.
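To make the last two sentences of the comment above concrete, here is a minimal Python sketch of a rule that declares which index types it supports and in what order it prefers them. The names `FilterRule`, `preferred_index_types`, `supported_index_types` and `order_candidates` are made up for illustration and are not existing Hyperspace APIs; indexes are represented as simple `(name, type)` tuples.

```python
class FilterRule:
    # Hypothetical: index types this rule can apply, most preferred first.
    preferred_index_types = ("covering", "bloom_filter")

    def supported_index_types(self):
        # "Which index types does this rule support?"
        return set(self.preferred_index_types)

    def order_candidates(self, indexes):
        """Drop unsupported indexes and sort the rest by this rule's preference."""
        position = {t: i for i, t in enumerate(self.preferred_index_types)}
        usable = [ix for ix in indexes if ix[1] in position]
        # sorted() is stable, so indexes of the same type keep their input order.
        return sorted(usable, key=lambda ix: position[ix[1]])
```

A caller could then try the candidates in the returned order and stop at the first one that applies.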
This is a great proposal. I would suggest having an Epic for it that includes the work that needs to happen. Off the top of my head, it will require:
I'm sure I missed a lot of other work needed for this.
@andrei-ionescu Yep! We have an epic tracking this work #157 (we are using ZenHub so we can see all the linked issues). The goal is to add a few index types so we can flesh out all the generality and change the design abstractions as needed. What I've requested @apoorvedave1 to do is update this issue with a detailed breakdown of all the steps. He will most likely get to this early next week.
@rapoth, @apoorvedave1: For file-skipping indexing, please have a look at XSkipper, built by IBM.
Closing old issues. Further discussions can continue in #441.
Problem Statement
Code changes required to support multiple index types like bloom filter index and partition elimination index.
Background and Motivation
Why limit ourselves to covering indexes? Let's expand Hyperspace to make it flexible enough for more index types.
Proposed Solution
Make changes to the existing design to allow for flexibility in adding more index types.
Design
Changes to Action classes
Applying Rules: Updated
Have multiple internal rules which work on specialized types of indexes, e.g. CIJoinRule, CIFilterRule, BFFilterRule, PEFilterRule etc.
Generate final plans by applying all rules independently to the current plan.
(global ranker)
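As a sketch only (Python, not the actual implementation), the "apply all rules independently, then rank globally" idea could look like this. The rule functions borrow the names from the list above, but their bodies, the string-based plans, and the cost numbers are placeholders assumed for illustration.

```python
# Each hypothetical rule takes a plan and returns (new_plan, estimated_cost),
# or None when the rule does not apply to that plan.

def ci_filter_rule(plan):
    return (f"CIFilter({plan})", 10.0) if plan.startswith("Filter") else None


def bf_filter_rule(plan):
    return (f"BFFilter({plan})", 25.0) if plan.startswith("Filter") else None


def ci_join_rule(plan):
    return (f"CIJoin({plan})", 5.0) if plan.startswith("Join") else None


def optimize(plan, rules):
    # Apply every rule to the same input plan, independently of the others.
    candidates = [r for r in (rule(plan) for rule in rules) if r is not None]
    if not candidates:
        return plan  # no index is applicable; keep the original plan
    # Global ranker: pick the candidate plan with the lowest estimated cost.
    return min(candidates, key=lambda c: c[1])[0]
```

Because the rules never see each other's output, the ordering between index types lives in one place (the ranker) instead of being hardcoded in the rule sequence.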
PartitionEliminationIndex Design
Extending the new index config defined in this design doc: #341
we can define the PartitionElimination non-covering index as below:
Using PartitionElimination Index
PartitionEliminationIndex is a reverse index that maps values of the indexed columns to the data files which contain those values. It could be especially useful for point lookups and range queries.
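A toy Python illustration of the reverse-index idea, assuming a single indexed column and an in-memory dictionary. The function names and the file names in the usage example are hypothetical; the real index would be stored and queried through Spark rather than built like this.

```python
from collections import defaultdict


def build_pe_index(file_to_values):
    """Invert {file -> values of the indexed column} into {value -> files}."""
    index = defaultdict(set)
    for path, values in file_to_values.items():
        for v in values:
            index[v].add(path)
    return dict(index)


def files_for_point(index, value):
    # Point lookup: only files known to contain the value need to be scanned.
    return set(index.get(value, set()))


def files_for_range(index, low, high):
    # Range query: union of the files for every indexed value in [low, high].
    files = set()
    for v, paths in index.items():
        if low <= v <= high:
            files |= paths
    return files
```

For example, with `idx = build_pe_index({"f1.parquet": [1, 2], "f2.parquet": [2, 7], "f3.parquet": [9]})`, a lookup for the value 2 only needs `f1.parquet` and `f2.parquet`, and a lookup for a value that was never indexed needs no files at all.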
Implementation
Refactoring Tasks
Tasks:
PartitionEliminationIndex specific tasks
PartitionEliminationFilterIndexRule
Creating PEIndex
Refreshing PEIndex
Optimizing PEIndex
Order of PRs:
Refactoring
BFIndex
a. CreateIndex
b. Supported Rules for this index type
PEIndex
a. CreateIndex (2d)
b. Supported Rules for this index type (2w)
Performance Implications (if applicable)
None
Alternate Design Options