Fixing ordering issue in subquery for time column in GAPFILL based queries #15096
The current GAPFILL implementation expects the time column to always be the first column in any subquery executed before the GAPFILL operation. This is not ideal, since it breaks perfectly valid SQL queries that select the time column in any other position.
The current implementation of `findTimeBucketColumnIndex` simply looks for the GAPFILL expression in the relevant subquery and assumes that, once GAPFILL is stripped during execution, the time bucket column sits at the same index that GAPFILL occupied. This assumption only holds for queries where GAPFILL is present in the lowest subquery and is the first to be executed. It fails for queries where a subquery is executed before GAPFILL (AGGREGATE_GAPFILL and AGGREGATE_GAPFILL_AGGREGATE).

This PR fixes this by recomputing the `timeBucketColumnIndex` for the aforementioned GAPFILL types, using the broker response to identify the index of the time bucket column. It also patches the rest of the GAPFILL computation logic to remove the assumption that GAPFILL is the first column in the subquery.

Testing
Tested the fix successfully against the TIMESTAMP quickstart. Also added multiple unit tests to make sure each possible flow works correctly with the fix.
(Note: the existing unit tests had no queries that trigger CountGapfillProcessor and SumAvgGapfillProcessor; this PR adds tests for those query shapes as well.)
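
For reviewers who want a concrete picture of the index recomputation described above, here is a minimal sketch. It is not the code in this PR: the class `TimeBucketIndexSketch`, the method signature, and the column names are all hypothetical, and it assumes the column names returned by the subquery are available as a plain list and that the time bucket column can be matched by name.

```java
import java.util.List;

public class TimeBucketIndexSketch {

  // Hypothetical helper (not Pinot's actual method): locate the time bucket
  // column by name in the subquery's result columns instead of assuming it
  // is at a fixed position.
  public static int findTimeBucketColumnIndex(List<String> columnNames, String timeBucketColumn) {
    int index = columnNames.indexOf(timeBucketColumn);
    if (index < 0) {
      throw new IllegalArgumentException(
          "Time bucket column not found in subquery result: " + timeBucketColumn);
    }
    return index;
  }

  public static void main(String[] args) {
    // Assumed result columns of a subquery where the time column is not first.
    List<String> columns = List.of("status", "time_bucket", "occupied_count");
    System.out.println(findTimeBucketColumnIndex(columns, "time_bucket")); // prints 1
  }
}
```

Looking the column up by name, rather than by the position GAPFILL held in the select list, is what lets the time column appear anywhere in the subquery output.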