Skip to content

raft: ignore setting the lead field from a MsgDeFortify at current term #142997

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Mar 18, 2025

Conversation

iskettaneh
Copy link
Contributor

When receiving a MsgDeFortifyLeader at the same term as we are, we
should not set the lead field to the sender. Mainly for two reasons:

  1. By definition, a MsgDeFortifyLeader is sent by an ex-leader until it
    hears of a new committed term. If we forgot the leader at the current
    term, we shouldn't remember it since we are using the fact that
    lead==None to indicate that this replica has been leaderless for
    some time. Read the leaderlessWatcher for more details.

  2. This could lead to a situation where no replica can win an election
    as it could require votes from some replicas that still know who think
    they know leader is (due to MsgDefortifyLeader), and that have recently
    campaigned and lost, which reset the electionElapsed to 0. Meaning that
    this replica is in a heartbeat lease, and will reject MsgVotes.

Fixes: #142994

Release note: None

@iskettaneh iskettaneh requested a review from a team as a code owner March 17, 2025 18:03
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@iskettaneh iskettaneh requested a review from miraradeva March 17, 2025 18:03
Copy link
Contributor

@miraradeva miraradeva left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 1 of 1 files at r1, 2 of 2 files at r2, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @iskettaneh)


-- commits line 25 at r2:
nit: "that still know who think they know ..." -> "that still know who the leader is"


-- commits line 27 at r2:
Maybe say "lost for a different reason (e.g. not up-to-date log)".

This commit adds a test that reproduces a liveness issue that we
currently have with leader leases where no peer can successfully
campaign after a temporary loss of quorum.

References: cockroachdb#142994

Release note: None
When receiving a MsgDeFortifyLeader at the same term as we are, we
should not set the lead field to the sender. Mainly for two reasons:

1) By definition, a MsgDeFortifyLeader is sent by an ex-leader until it
hears of a new committed term. If we forgot the leader at the current
term, we shouldn't remember it since we are using the fact that
lead==None to indicate that this replica has been leaderless for
some time. Read the leaderlessWatcher for more details.

2) This could lead to a situation where no replica can win an election
as it could require votes from some replicas that still know who know
the leader is (due to MsgDefortifyLeader), and that have recently
campaigned and lost due to not having the most up-to-date log, which
reset the electionElapsed to 0. Meaning that this replica is in a
heartbeat lease, and will reject MsgVotes.

Fixes: cockroachdb#142994

Release note: None
@iskettaneh
Copy link
Contributor Author

TFTR!

bors r+

@craig craig bot merged commit 8ef4468 into cockroachdb:master Mar 18, 2025
24 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

raft: liveness not guaranteed after temporary loss of quorum with leaderLeases
3 participants