Is softDeleted clause might be able to run in 1/3 the time #4322

Closed

@tonytw1 tonytw1 commented Aug 23, 2024

Raising in case anyone wants to try this on a real, large index.

Without blindly believing profilers, the profiler output suggests that a 2-way Boolean query could be replaced with a more targeted single-field lookup.

When examined in the profile output, the exists `softDeletedMetadata` clause seems to get expanded into this:
`"description" : "-ConstantScore(DocValuesFieldExistsQuery [field=softDeletedMetadata.deletedBy] DocValuesFieldExistsQuery [field=softDeletedMetadata.deleteTime]) #*:*",`

This probably means that if any of the known indexed fields of `softDeletedMetadata` exists, then so does `softDeletedMetadata`. If we assume that `softDeletedMetadata.deleteTime` exists for all `softDeletedMetadata` records, the clause can check that single field instead, and the profiled `time_in_nanos` drops to 1/3 of the original value.
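As a rough sketch of the two clause shapes being compared, here are the query bodies written as Elasticsearch query DSL in Python dicts. It assumes the clause is a `must_not` wrapped around an `exists` query, which is what the `-ConstantScore(...) #*:*` description suggests. The function names are hypothetical, and the project's actual query builder is not shown here.

```python
# Field names are taken from the profile output; everything else is illustrative.
SOFT_DELETED_OBJECT = "softDeletedMetadata"
SOFT_DELETED_LEAF = "softDeletedMetadata.deleteTime"


def not_exists_clause(field: str) -> dict:
    """Build a bool query matching documents where `field` does not exist."""
    return {"bool": {"must_not": [{"exists": {"field": field}}]}}


def current_clause() -> dict:
    # exists on the object field; per the profile output this expands into
    # one exists check per indexed sub-field (deletedBy and deleteTime).
    return not_exists_clause(SOFT_DELETED_OBJECT)


def targeted_clause() -> dict:
    # exists on a single leaf field; ASSUMES deleteTime is always set
    # whenever softDeletedMetadata is present.
    return not_exists_clause(SOFT_DELETED_LEAF)
```

The only difference between the two bodies is the `field` value, so the change itself is small; the risk sits entirely in the assumption about `deleteTime`.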

It is hard to show real-world results on a dev machine where the index fits in a single node's RAM, but in theory this might speed up the expensive "count all" and "find first page of all" queries on the initial page load.
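For anyone trying this on a larger index, a minimal sketch of how the two profiled runs could be compared. It assumes both searches were sent with `"profile": true` and that the responses have the usual `profile.shards[].searches[].query[]` layout; the function names are made up for illustration.

```python
from typing import Any


def total_query_nanos(response: dict[str, Any]) -> int:
    """Sum time_in_nanos over the top-level query nodes of every shard."""
    total = 0
    for shard in response.get("profile", {}).get("shards", []):
        for search in shard.get("searches", []):
            for node in search.get("query", []):
                # A top-level node's time is inclusive of its children,
                # so children are deliberately not added again.
                total += node["time_in_nanos"]
    return total


def timing_ratio(baseline: dict[str, Any], candidate: dict[str, Any]) -> float:
    """Return candidate query time divided by baseline query time."""
    base = total_query_nanos(baseline)
    if base == 0:
        raise ValueError("baseline profile contains no timed query nodes")
    return total_query_nanos(candidate) / base
```

A ratio near 1/3 on a real index would match what the dev machine showed. Profiler timings vary between runs, so repeating the measurement several times is probably wise.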

Given that the initial page load time is in single-digit seconds, this might have a real user-facing impact.

Naming a specific field is more fragile than the current clause, so the improvement needs to be worth it.
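To make the fragility concrete, here is a small plain-Python model of the two checks (not Elasticsearch itself). The document shapes and function names are hypothetical. It only illustrates that a record carrying `deletedBy` but no `deleteTime` would be classified differently by the two clauses.

```python
SUB_FIELDS = ("deletedBy", "deleteTime")


def soft_deleted_any_field(doc: dict) -> bool:
    """Model of the current clause: any indexed sub-field marks the document."""
    meta = doc.get("softDeletedMetadata") or {}
    return any(meta.get(name) is not None for name in SUB_FIELDS)


def soft_deleted_by_delete_time(doc: dict) -> bool:
    """Model of the targeted clause: only deleteTime is consulted."""
    meta = doc.get("softDeletedMetadata") or {}
    return meta.get("deleteTime") is not None


def disagreements(docs: list[dict]) -> list[dict]:
    """Return documents the two checks classify differently."""
    return [
        doc for doc in docs
        if soft_deleted_any_field(doc) != soft_deleted_by_delete_time(doc)
    ]
```

If an equivalent check against the real index finds no such records, the assumption behind the targeted clause holds for the current data, though nothing stops a future write from breaking it.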

What does this change?

How should a reviewer test this change?

How can success be measured?

Who should look at this?

Tested? Documented?

  • locally by committer
  • locally by Guardian reviewer
  • on the Guardian's TEST environment
  • relevant documentation added or amended (if needed)

@tonytw1 tonytw1 marked this pull request as ready for review August 23, 2024 19:14
@tonytw1 tonytw1 requested review from a team as code owners August 23, 2024 19:14
@tonytw1 tonytw1 closed this Sep 25, 2024