Removing organization join in jobs ingestion to improve performance #1965

Open
wants to merge 3 commits into
base: main

eiffel777
Contributor

The join to mod_hpcdb.hpcdb_organizations is not needed in this query and causes MySQL to choose a bad query plan that takes much longer than it should. Removing the join lets MySQL use the intended, better query plan, reducing the action's runtime from 15-20 minutes to under a minute when testing on metrics-dev. Below are the two query plans.

Without organization table join

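| table | type | rows | filtered | Extra |
|---|---|---|---|---|
| jti | index | 88128 | 100.0 | Using index |
| j | eq_ref | 1 | 100.0 | Using where |
| req | ref | 1 | 100.0 | |
| pi_map | ref | 1 | 100.0 | Using index |
| alloc | eq_ref | 1 | 100.0 | |
| p | eq_ref | 1 | 100.0 | |
| pi | eq_ref | 1 | 100.0 | |
| sa | ref | 1 | 100.0 | Using where |
| res | eq_ref | 1 | 100.0 | |
| ares | eq_ref | 1 | 100.0 | Using index |
| innerj | ref | 1 | 100.0 | Using where; Using index |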

With organization table join

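| table | type | rows | filtered | Extra |
|---|---|---|---|---|
| o | index | 1 | 100.0 | Using index |
| res | ref | 10 | 100.0 | |
| j | ref | 5143 | 100.0 | Using where |
| req | ref | 1 | 100.0 | |
| pi_map | ref | 1 | 100.0 | Using index |
| alloc | eq_ref | 1 | 100.0 | |
| p | eq_ref | 1 | 100.0 | |
| pi | eq_ref | 1 | 100.0 | |
| sa | ref | 1 | 100.0 | Using where |
| jti | eq_ref | 1 | 100.0 | Using index |
| ares | eq_ref | 1 | 100.0 | Using index |
| innerj | ref | 1 | 100.0 | Using where; Using index |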

With the organization join, the query is rewritten to select from mod_hpcdb.hpcdb_organizations first and then from mod_hpcdb.hpcdb_jobs, which joins to mod_hpcdb.hpcdb_jobs_to_ingest. I think this reordering is the cause of the performance problem: MySQL tries to join all rows in mod_hpcdb.hpcdb_jobs to a row in mod_hpcdb.hpcdb_jobs_to_ingest instead of the other way around.
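
One possible reason the optimizer prefers the plan with the organization join is that its estimated row count looks smaller on paper. The sketch below multiplies the `rows` estimates from the two plans above, in join order. It is only a rough approximation and not MySQL's actual cost model; the function name `estimated_row_combinations` is hypothetical, and `filtered` is ignored because it is 100.0 for every step.

```python
from math import prod

# "rows" estimates copied from the two EXPLAIN outputs above, in join order.
plan_without_org = {
    "jti": 88128, "j": 1, "req": 1, "pi_map": 1, "alloc": 1, "p": 1,
    "pi": 1, "sa": 1, "res": 1, "ares": 1, "innerj": 1,
}
plan_with_org = {
    "o": 1, "res": 10, "j": 5143, "req": 1, "pi_map": 1, "alloc": 1,
    "p": 1, "pi": 1, "sa": 1, "jti": 1, "ares": 1, "innerj": 1,
}


def estimated_row_combinations(plan):
    """Multiply the per-table row estimates of a nested-loop join plan."""
    return prod(plan.values())


print(estimated_row_combinations(plan_without_org))  # 88128
print(estimated_row_combinations(plan_with_org))     # 51430
```

By this estimate the plan with the organization join appears cheaper (51430 vs. 88128 row combinations), even though it ran far slower on metrics-dev. That is consistent with the `rows` estimate for `j` being much lower than the number of rows actually read, though this has not been verified here.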

The organization value is now taken from the organization_id column of the mod_hpcdb.resources table.

Tests performed

Tested in docker and on metrics-dev

Checklist:

  • The pull request description is suitable for a Changelog entry
  • The milestone is set correctly on the pull request
  • The appropriate labels have been added to the pull request

@eiffel777 added the enhancement (Enhancement of the functionality of an existing feature) and Category:ETL (Extract Transform Load) labels Jan 27, 2025
@eiffel777 added this to the 11.5.0 milestone Jan 27, 2025
@eiffel777 self-assigned this Jan 27, 2025