Removing organization join in jobs ingestion to improve performance #1965
The join to `mod_hpcdb.hpcdb_organizations` is not needed in this query and causes MySQL to choose a bad query plan that takes much longer than it should. Removing the join lets MySQL use the intended, better query plan, cutting the action's runtime from 15-20 minutes to under a minute when testing on metrics-dev. Below are the two query plans.
Without organization table join
With organization table join
With the organization join, the query is rewritten to select from `mod_hpcdb.hpcdb_organizations` first and then from `mod_hpcdb.hpcdb_jobs`, which joins to `mod_hpcdb.hpcdb_jobs_to_ingest`. I think the rewriting to join `mod_hpcdb.hpcdb_jobs` to `mod_hpcdb.hpcdb_jobs_to_ingest` is the cause of the performance problem, since I think it tries to join every row in `mod_hpcdb.hpcdb_jobs` to a row in `mod_hpcdb.hpcdb_jobs_to_ingest` instead of the other way around.
The organization value, `organization_id`, from the `mod_hpcdb.resources` table is now used.
Tests performed
Tested in docker and on metrics-dev
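For reviewers who want a quick sanity check of the idea, here is a minimal sketch of why dropping the join should not change the result. It is not the test that was run for this PR and not the real ingestion query: it uses an in-memory SQLite database instead of MySQL, drops the `mod_hpcdb.` schema prefix, and assumes a heavily simplified, hypothetical schema in which every column except `organization_id` is made up for illustration. It also assumes every `organization_id` in `resources` has a matching row in `hpcdb_organizations`; without that, the two queries could differ.

```python
import sqlite3

# Hypothetical, heavily simplified stand-ins for the mod_hpcdb tables.
# Column names other than organization_id are assumptions for illustration.
SCHEMA = """
CREATE TABLE hpcdb_organizations (organization_id INTEGER PRIMARY KEY);
CREATE TABLE resources (
    resource_id INTEGER PRIMARY KEY,
    organization_id INTEGER NOT NULL
        REFERENCES hpcdb_organizations (organization_id)
);
CREATE TABLE hpcdb_jobs (
    job_id INTEGER PRIMARY KEY,
    resource_id INTEGER NOT NULL REFERENCES resources (resource_id)
);
CREATE TABLE hpcdb_jobs_to_ingest (job_id INTEGER PRIMARY KEY);

INSERT INTO hpcdb_organizations VALUES (1), (2);
INSERT INTO resources VALUES (10, 1), (20, 2);
INSERT INTO hpcdb_jobs VALUES (100, 10), (101, 20), (102, 10);
INSERT INTO hpcdb_jobs_to_ingest VALUES (100), (101);
"""

# Shape of the query before the change: organization_id is read from the
# organizations table, which forces an extra join.
WITH_ORG_JOIN = """
SELECT j.job_id, o.organization_id
FROM hpcdb_jobs_to_ingest AS i
JOIN hpcdb_jobs AS j ON j.job_id = i.job_id
JOIN resources AS r ON r.resource_id = j.resource_id
JOIN hpcdb_organizations AS o ON o.organization_id = r.organization_id
ORDER BY j.job_id
"""

# Shape of the query after the change: organization_id is read directly
# from the resources table, so the organizations join is dropped.
WITHOUT_ORG_JOIN = """
SELECT j.job_id, r.organization_id
FROM hpcdb_jobs_to_ingest AS i
JOIN hpcdb_jobs AS j ON j.job_id = i.job_id
JOIN resources AS r ON r.resource_id = j.resource_id
ORDER BY j.job_id
"""


def compare_queries():
    """Run both query shapes on the toy data and return their rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA)
        with_join = conn.execute(WITH_ORG_JOIN).fetchall()
        without_join = conn.execute(WITHOUT_ORG_JOIN).fetchall()
    finally:
        conn.close()
    return with_join, without_join


if __name__ == "__main__":
    rows_with, rows_without = compare_queries()
    print("same rows:", rows_with == rows_without)
```

This only checks that the two query shapes return the same rows on toy data. It says nothing about MySQL's choice of plan, which can be inspected on a real instance by prefixing the actual query with `EXPLAIN`.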
Checklist: