Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move some common database import subroutines to Script::Utils #3200

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

mwiencek
Copy link
Member

@mwiencek mwiencek commented Mar 8, 2024

Problem

  • We have some identical or very similar-looking subroutines duplicated between admin/MBImport.pl and admin/replication/ImportReplicationChanges: empty (to tell if a table is empty) and ImportTable (to import a table from a file). The two implementations of the latter function differ slightly.

  • For the incremental dumps (MusicBrainz::Script::Role::IncrementalDump), I'd like to reuse the ImportTable function to import a dbmirror2 packet into some temporary tables (to parse it).

    We currently parse the old packets by hand; I'm surprised it works at all (it probably doesn't in some cases) because I don't believe it fully or properly implement's parsing of PostgreSQL's COPY text format. dbmirror2 packets contain JSON, which may contain JSON escape sequences underneath COPY's text format escape sequences, and the mixing of these can be tricky to parse. It's much, much easier to let PostgreSQL do the parsing!

Solution

This just adds two shared subroutines, is_table_empty and copy_table_from_file to Script::Utils, and modifies admin/MBImport.pl and admin/replication/ImportReplicationChanges to use them.

Testing

We have existing automated tests that make heavy use of these scripts.

I did not test the --fix-broken-utf8 or --ignore-errors flags specifically.

admin/MBImport.pl and admin/replication/ImportReplicationChanges contained very
similar implementations of `ImportTable`, so it would be ideal to share them.
I'd also like to use the same functionality in a future commit (to load
dbmirror2 packets into temporary tables).

The implementations in these two files did diverge slightly. For one,
MBImport.pl's allowed fixing broken UTF-8 byte sequences. I'm not sure how
necessary that is in 2024, or what the historical reasons for adding it were,
but I kept the functionality behind a flag in `copy_table_from_file`.

MBImport.pl's also supported the flags `$delete_first` (to empty the table
before importing) and `$fProgress` (to control whether progress is shown).
I've basically kept all of MBImport.pl's code, with these features behind
`%opts` flags.

I kept the definitions of `ImportTable`, but they now call
`copy_table_from_file` internally. I couldn't replace all of the `ImportTable`
calls with direct calls to `copy_table_from_file`, because `ImportTable` also
updates statistics local to each file and has a different return value.
Copy link
Member

@reosarevok reosarevok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems sensible to me, but I'd prefer if we had a test for the extra flags if they wouldn't be too complex. Although it does kinda feel maybe we can just drop the utf8 flag, but maybe we should check with @mayhem first if he remembers the reasons. The comment mentioning automodsaccepted which I have never even seen suggests that this is ancient and the extra options might be unneeded.

@yvanzo yvanzo marked this pull request as draft May 28, 2024 14:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants