Is softDeleted clause might be able to run in 1/3 the time #4322
Raising in case anyone wants to try this on a real, large index.
Without blindly believing profilers, the profiler output suggests that a two-way Boolean query might be replaceable with a more targeted single-field lookup.
When examined in the profile output, the exists `softDeletedMetadata` clause seems to get expanded into this:
`"description" : "-ConstantScore(DocValuesFieldExistsQuery [field=softDeletedMetadata.deletedBy] DocValuesFieldExistsQuery [field=softDeletedMetadata.deleteTime]) #*:*",`
This probably means that if any of the known indexed fields of `softDeletedMetadata` exists, then so does `softDeletedMetadata`. If we assume that `softDeletedMetadata.deleteTime` exists for all `softDeletedMetadata` records and test for that single field instead, the profiled `time_in_nanos` drops to 1/3 of the original value.
It is hard to show real-world results on a dev machine where the index fits in a single node's RAM, but in theory this might speed up the expensive "count all" and "find first page of all" queries on the initial page load.
Given that the initial page load time is in single-digit seconds, this might have a real user-facing impact.
Naming a specific field is more fragile than the current clause, so the improvement needs to be worth it.
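
For anyone who wants to repeat the comparison on a larger index, here is a minimal Python sketch of the two profiled search bodies. It assumes the clause is a `must_not` / `exists` filter, which is only inferred from the leading `-` in the profile description. The function names and the shape of the surrounding query are hypothetical and not taken from this codebase. Only the field names come from the profile output.

```python
from typing import Any

# Field names come from the profile output; everything else is illustrative.
OBJECT_FIELD = "softDeletedMetadata"
LEAF_FIELD = "softDeletedMetadata.deleteTime"


def not_soft_deleted_query(field: str) -> dict[str, Any]:
    """Build a profiled search body matching documents where `field` is absent."""
    return {
        "profile": True,  # ask Elasticsearch to include per-query timings
        "query": {"bool": {"must_not": [{"exists": {"field": field}}]}},
    }


def total_query_nanos(response: dict[str, Any]) -> int:
    """Sum the top-level query time_in_nanos entries across all shards."""
    return sum(
        query["time_in_nanos"]
        for shard in response["profile"]["shards"]
        for search in shard["searches"]
        for query in search["query"]
    )


# Current clause (object-level exists) versus the candidate (single leaf field).
current = not_soft_deleted_query(OBJECT_FIELD)
candidate = not_soft_deleted_query(LEAF_FIELD)
```

Sending both bodies to the same index and comparing `total_query_nanos` over several runs should show whether the reduction holds up outside a dev machine. The two queries return the same documents only if `deleteTime` is set on every soft-deleted record.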
What does this change?
How should a reviewer test this change?
How can success be measured?
Who should look at this?
Tested? Documented?