fix: Force Lease Expiration When Leader Exits #2379
+80
−11
What type of PR is this?
/kind bug
What this PR does / why we need it:
Currently, when the leader exits (say, after receiving a `SIGINT`), the workers need to wait for its lease to expire before a new leader is elected. This patch mimics how the Go client implementation uses `ctx.Done()`: https://github.com/kubernetes/client-go/blob/1309f64d6648411b4a36a2f7fa84dd8df31884b6/tools/leaderelection/leaderelection.go#L265-L291. It captures the `SIGINT` and forces the lease to expire by setting the expiration to a date in the past, and it also sets `acquire_time` to `None` to force a leader election.
Issue Reproduction
As mentioned in the issue "leaderelection do not stop leading properly" (#2075), to reproduce this issue you can follow `leaderelection/example.py`. Run it on 2-3 nodes (or tmux screens) and, once a leader is elected, hit `Ctrl+C` to force the leader to exit. The workers then wait for the leader's lease to expire before a new leader is elected.
Expected behavior
The leader exiting should trigger a leader election without having the workers wait for the lease to expire.
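The mechanism described above (catch `SIGINT`, then back-date the lease so followers treat it as expired) can be sketched roughly as follows. This is a minimal illustration, not the code in this patch: `LeaseRecord`, its fields other than `acquire_time`, the `expired()` rule, and `install_sigint_handler()` are hypothetical stand-ins, and only the `force_expire_lease` name and the idea of clearing `acquire_time` come from the description.

```python
import signal
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class LeaseRecord:
    """Hypothetical stand-in for the election record (not the client's class)."""
    holder_identity: str
    lease_duration: timedelta
    acquire_time: Optional[datetime]
    renew_time: datetime

    def expired(self, now: datetime) -> bool:
        # Assumed rule: a lease is expired once a full duration has
        # passed since the last renewal.
        return now >= self.renew_time + self.lease_duration


def force_expire_lease(record: LeaseRecord, now: datetime) -> LeaseRecord:
    """Back-date the record so any follower sees it as already expired."""
    record.renew_time = now - record.lease_duration - timedelta(seconds=1)
    record.acquire_time = None
    return record


def install_sigint_handler(record: LeaseRecord) -> None:
    """Expire the lease on Ctrl+C, then let the usual interrupt propagate."""
    def handler(signum, frame):
        force_expire_lease(record, datetime.now(timezone.utc))
        # A real client would also have to write the updated record back
        # to the lock object here; that step is omitted in this sketch.
        raise KeyboardInterrupt

    signal.signal(signal.SIGINT, handler)
```

Keeping the back-dating in a pure function separate from the signal handler makes the expiry logic easy to check without sending real signals.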
Which issue(s) this PR fixes:
Fixes #2075
Special notes for your reviewer:
This is still not a complete fix. It is definitely hacky at the moment and I would love any guidance here! Currently, the patch only handles `SIGINT`, but a leader may exit for various reasons, and there should be a more elegant way of handling this, probably using the thread context, but I was not able to figure that out. Further, the implementation of the `force_expire_lease()` function is not elegant: you shouldn't need to set `acquire_time` to `None`, and setting `expiration` to the past is also a code smell in my opinion. For these reasons, this patch is a proof of concept.
I also had to change the imports to point to my definitions of `electionconfig.py` and `leaderelectionrecord.py` for this to work, and I'm sure there is a better way of handling this.
If it's more sensible to mark this PR as a draft, I'm happy to do so!
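One possible direction for covering exits other than `SIGINT`, offered only as a sketch for discussion: wrap the leading loop so that a release callback runs however the block exits, and turn `SIGTERM` into an ordinary exception so it takes the same path. The names `release_on_exit` and `raise_on_sigterm` are hypothetical and not part of the client. This only covers exits that unwind the Python stack in the thread running the block, so it would not help with `SIGKILL` or a hard crash.

```python
import signal
from contextlib import contextmanager


@contextmanager
def release_on_exit(release):
    """Call `release()` once when the block exits, normally or by exception."""
    try:
        yield
    finally:
        release()


def raise_on_sigterm(signum, frame):
    """Convert SIGTERM into SystemExit so `finally` blocks get to run."""
    raise SystemExit(128 + signum)


def install_sigterm_handler() -> None:
    # Must be called from the main thread, like any signal.signal() call.
    signal.signal(signal.SIGTERM, raise_on_sigterm)
```

A caller would pass whatever releases the lease, for example a function that calls `force_expire_lease()`, as `release`, and run the leader loop inside `with release_on_exit(...)`.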
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: