fix: Force Lease Expiration When Leader Exits #2379

Open
wants to merge 1 commit into
base: master

@RaghavRoy145 RaghavRoy145 commented Mar 30, 2025

What type of PR is this?

/kind bug

What this PR does / why we need it:

Currently, when the leader exits (say, after receiving a SIGINT), the workers must wait for its lease to expire before a new leader is elected. This patch mimics the Go client's use of ctx.Done(): https://github.com/kubernetes/client-go/blob/1309f64d6648411b4a36a2f7fa84dd8df31884b6/tools/leaderelection/leaderelection.go#L265-L291. It captures the SIGINT and forces the lease to expire by setting the expiration to a time in the past, and it also sets acquire_time to None to force a leader election.
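
For illustration, the mechanism described above can be sketched roughly as follows. This is a minimal stand-alone sketch, not the code in this patch: `LeaseRecord`, `is_expired`, and `install_sigint_release` are hypothetical names, the record is an in-memory stand-in for whatever the lock object stores, and backdating `renew_time` is an assumption about how "setting the expiration to the past" could be done. Only `force_expire_lease` and `acquire_time` are names taken from the description.

```python
import signal
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class LeaseRecord:
    """Hypothetical stand-in for the election record kept in the lock object."""
    holder_identity: str
    lease_duration: int  # seconds
    acquire_time: Optional[datetime]
    renew_time: datetime


def is_expired(record: LeaseRecord, now: datetime) -> bool:
    """A follower may take over once the last renewal is older than the lease."""
    return now >= record.renew_time + timedelta(seconds=record.lease_duration)


def force_expire_lease(record: LeaseRecord, now: datetime) -> LeaseRecord:
    """Backdate the renewal so the lease reads as expired, and clear acquire_time."""
    record.renew_time = now - timedelta(seconds=record.lease_duration + 1)
    record.acquire_time = None
    return record


def install_sigint_release(record: LeaseRecord) -> None:
    """On Ctrl+C, expire the lease first, then exit via KeyboardInterrupt."""
    def handler(signum, frame):
        force_expire_lease(record, datetime.now(timezone.utc))
        # A real implementation would write the updated record back to the lock here.
        raise KeyboardInterrupt

    signal.signal(signal.SIGINT, handler)
```

In this sketch the handler only mutates the local record; the step that matters in practice is pushing the updated record to the shared lock before the process exits, so that followers see the expired lease on their next retry.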

  • Issue Reproduction

    As described in issue #2075 ("leaderelection do not stop leading properly"), you can reproduce this with leaderelection/example.py. Run it on 2-3 nodes (or tmux screens) and, once a leader is elected, hit Ctrl+C to force the leader to exit. The workers then wait for the leader's lease to expire before a new leader is elected.

  • Expected behavior

    The leader exiting should trigger a leader election without having the workers wait for the lease to expire.
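
To make the difference concrete, here is a rough back-of-the-envelope model of the takeover delay. It is a simplification under stated assumptions: the leader renewed its lease immediately before exiting, followers poll the lock once every `retry_period` seconds, and the values 17 and 5 are illustrative rather than taken from this PR. The function name is hypothetical.

```python
def worst_case_takeover_delay(lease_duration: int, retry_period: int, released: bool) -> int:
    """Upper bound, in seconds, between the leader exiting and a follower taking over.

    Simplified model: if the lease was not released, a follower must outwait the
    full lease and then reach its next poll; if it was released, only the poll
    interval remains.
    """
    remaining_lease = 0 if released else lease_duration
    return remaining_lease + retry_period


# Illustrative values only (assumed, not from the PR).
without_release = worst_case_takeover_delay(lease_duration=17, retry_period=5, released=False)
with_release = worst_case_takeover_delay(lease_duration=17, retry_period=5, released=True)
print(without_release, with_release)  # prints "22 5"
```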

Which issue(s) this PR fixes:

Fixes #2075

Special notes for your reviewer:

This is not yet a complete fix. It is definitely hacky at the moment, and I would love any guidance here! The patch currently handles only SIGINT, but a leader may exit for various reasons, and there should be a more elegant way of handling this, probably using the thread context, though I was not able to figure that out. Further, the implementation of the force_expire_lease() function is not elegant: you shouldn't need to set acquire_time to None, and setting the expiration to the past is also a code smell in my opinion. For these reasons, this patch is a proof of concept.
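
One possible direction for covering exit paths other than SIGINT, sketched here purely as an illustration and not as part of this patch, is to wrap the leader loop in a context manager that releases the lease in a `finally` block. The names `release_on_exit` and `LeaderExit` are hypothetical, and `release` stands for whatever callable expires the lease (for example, something like force_expire_lease() followed by an update of the lock). Note that Python only allows signal handlers to be installed from the main thread.

```python
import signal
from contextlib import contextmanager


class LeaderExit(Exception):
    """Raised from the signal handler to unwind the leader loop (hypothetical)."""


@contextmanager
def release_on_exit(release, signals=(signal.SIGINT, signal.SIGTERM)):
    """Call release() once when the block exits: normally, by exception, or by signal."""
    def handler(signum, frame):
        raise LeaderExit(signal.Signals(signum).name)

    # Install the handler and remember what was there before.
    previous = {sig: signal.signal(sig, handler) for sig in signals}
    try:
        yield
    except LeaderExit:
        pass  # a handled signal is treated as a clean shutdown
    finally:
        for sig, old in previous.items():
            signal.signal(sig, old)
        release()
```

Usage would look like `with release_on_exit(my_release): run_leader_loop()`, where both names are placeholders. Other exceptions still propagate to the caller, but only after `release()` has run.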

I also had to change the imports to point to my definitions of electionconfig.py and leaderelectionrecord.py for this to work, and I'm sure there is a better way of handling this.

If it's more sensible to mark this PR as a draft, I'm happy to do so!

Does this PR introduce a user-facing change?

None

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

None

Currently, when the leader exits (say, after receiving a SIGINT),
the workers need to wait for its lease to expire before a leader
is re-elected. This patch mimics the behaviour of the Go Client implementation
of using ctx.Done() by capturing the SIGINT, forcing the expiration date to a
past date, and setting acquire_time to None to start the leader election.
@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/bug Categorizes issue or PR as related to a bug. labels Mar 30, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: RaghavRoy145
Once this PR has been reviewed and has the lgtm label, please assign yliaog for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Mar 30, 2025
@k8s-ci-robot
Contributor

Welcome @RaghavRoy145!

It looks like this is your first PR to kubernetes-client/python 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-client/python has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Mar 30, 2025
@RaghavRoy145
Author

/assign @yliaog

@RaghavRoy145
Author

/assign @yliaog

Oops, I was supposed to do that after the reviews 🙃
