Handle constraint mapping failures in data refresh #4916
Labels
- 💻 aspect: code — Concerns the software code in the repository
- ✨ goal: improvement — Improvement to an existing user-facing feature
- 🟩 priority: low — Low priority and doesn't need to be rushed
- 🧱 stack: catalog — Related to the catalog and Airflow DAGs
Problem
In the data refresh `remap_table_constraints` step, we drop constraints from the existing media table and then apply them to the temp table before promotion. However, if anything goes wrong while applying constraints or promoting the new tables/indices, we could be left for an extended period with live tables that are missing constraints. Moreover, if another data refresh is then run from the beginning, it will attempt to copy constraints from the now constraint-less table. The SQL to drop/remap constraints is also applied in a single task per table, so the task is not idempotent and cannot be rerun after a partial failure without manual cleanup.
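One way to reduce the blast radius of a partial failure would be to emit one re-runnable statement per constraint instead of a single monolithic batch. A minimal sketch (all table/constraint names here are illustrative, not the real schema, and the pairing of `DROP ... IF EXISTS` with a re-`ADD` is an assumption about how idempotency could be achieved):

```python
def idempotent_constraint_statements(table, constraints):
    """Yield one SQL statement per constraint, each safe to re-run.

    `constraints` maps constraint name -> definition, e.g.
    {"image_uuid_key": "UNIQUE (identifier)"}.
    """
    for name, definition in constraints.items():
        # DROP ... IF EXISTS is a no-op if a previous (partially failed)
        # run already removed the constraint, so the pair can be retried.
        yield f"ALTER TABLE {table} DROP CONSTRAINT IF EXISTS {name};"
        yield f"ALTER TABLE {table} ADD CONSTRAINT {name} {definition};"
```

Each yielded statement could then map to its own retryable Airflow task, rather than all DDL running inside one task per table.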
Description
One very simple approach would be to hard-code the constraints that should be applied, rather than generating `ALTER TABLE` statements from the existing table as the current implementation does. Updates made directly to the production DB (e.g. a constraint added manually in prod) would not be automatically persisted in the next data refresh, but this is arguably a good thing: it requires additions of constraints to be reflected in code and go through an approval process. This mirrors what we do with the Elasticsearch configuration.
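A rough sketch of what the hard-coded approach could look like, with constraints defined in code per media type (every name and definition below is a hypothetical placeholder, not the real production schema):

```python
# Constraint DDL lives in code, so schema changes are reviewed like any
# other change (mirroring how the Elasticsearch config is managed).
MEDIA_TABLE_CONSTRAINTS = {
    "image": [
        "ALTER TABLE {table} ADD PRIMARY KEY (identifier);",
        "ALTER TABLE {table} ADD CONSTRAINT {table}_provider_fid_key"
        " UNIQUE (provider, foreign_identifier);",
    ],
}


def constraint_statements(media_type, table_name):
    """Render the fixed constraint DDL for a given (temp) table."""
    return [
        stmt.format(table=table_name)
        for stmt in MEDIA_TABLE_CONSTRAINTS[media_type]
    ]
```

Because the statements no longer depend on introspecting the live table, a refresh started against a constraint-less table would still apply the full, correct set.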
We could do the same thing for the indices as well. This would also have the benefit of reducing some code complexity!
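The same pattern could extend to indices; using `CREATE INDEX IF NOT EXISTS` would additionally make each statement safe to rerun. Again a hypothetical sketch with placeholder names:

```python
# Index DDL hard-coded per media type; IF NOT EXISTS makes re-runs no-ops.
MEDIA_TABLE_INDICES = {
    "image": [
        "CREATE INDEX IF NOT EXISTS {table}_provider_idx"
        " ON {table} (provider);",
    ],
}


def index_statements(media_type, table_name):
    return [s.format(table=table_name) for s in MEDIA_TABLE_INDICES[media_type]]
```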
Additional context
#4833 (comment)