PS-9328: Fix sporadic rpl.rpl_backup_locked_by_applier test failures (8.0 version) #5424

Merged
merged 1 commit into percona:8.0 from ps-8.0-rpl_backup_locked_by_applier-fix
Sep 12, 2024

dlenev (Contributor) commented Sep 12, 2024

Fixed sporadic rpl.rpl_backup_locked_by_applier test failures, which were caused by a race condition in the test case.

The test logic assumes that the conditional debug sync point in Rpl_applier_reader::purge_applied_logs() is activated only once, when the replication thread reaches it for the first time. The test case waits for this event, runs some commands to check that the backup lock cannot be taken, and then resumes execution of the replication thread. Only after that does it disable the conditional sync point.

However, if the replication thread reaches this sync point a second time, after the test has resumed the thread's execution but before the conditional sync point is disabled, it waits on the sync point until the timeout is reached. As a result, the post-test check detects that a debug sync point was reached but not handled, and the test fails.

We fix this problem by disabling the conditional sync point before resuming the replication thread once the sync point has been reached for the first time.

The fix is applied to both the 8.0 and 8.4 trees, since both are affected.
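
For readers who want the ordering issue spelled out, here is a small Python model of the race. It is only an illustrative sketch, not the test case itself: the function name `unhandled_sync_point_hits` and the flag `sync_point_enabled` are hypothetical, and the model assumes the worst-case interleaving in which the replication thread reaches the sync point again immediately after being resumed.

```python
def unhandled_sync_point_hits(disable_before_resume: bool) -> int:
    """Count sync point hits that nobody handles (hypothetical model).

    Models the worst case: the replication thread reaches the
    conditional sync point a second time right after it is resumed.
    """
    sync_point_enabled = True  # conditional sync point is armed by the test
    unhandled = 0

    def applier_blocks_at_sync_point() -> bool:
        # The replication thread stops only while the sync point is armed.
        return sync_point_enabled

    # First pass: the thread blocks, the test sees it and runs its
    # backup lock checks while the thread is parked.
    assert applier_blocks_at_sync_point()

    if disable_before_resume:
        # Fixed order: disarm first, then resume the thread.
        sync_point_enabled = False
        if applier_blocks_at_sync_point():  # second pass
            unhandled += 1
    else:
        # Original order: resume first; the second pass can happen
        # before the test gets to disarm the sync point.
        if applier_blocks_at_sync_point():  # second pass
            unhandled += 1
        sync_point_enabled = False

    return unhandled


print(unhandled_sync_point_hits(disable_before_resume=False))
print(unhandled_sync_point_hits(disable_before_resume=True))
```

In this model the original order leaves one sync point hit unhandled, which corresponds to the wait-until-timeout described above, while the fixed order leaves none.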

dlenev (Contributor, Author) commented Sep 12, 2024

See @percona-ysorokin's approval of the identical 8.4 version: #5422

@dlenev dlenev merged commit f3c6c56 into percona:8.0 Sep 12, 2024
8 of 11 checks passed
@dlenev dlenev deleted the ps-8.0-rpl_backup_locked_by_applier-fix branch September 12, 2024 14:33