
Support Cascade Delete When Removing Karpenter from my Cluster #1040

Open
jonathan-innis opened this issue Feb 22, 2024 · 10 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@jonathan-innis
Member

Description

What problem are you trying to solve?

I'd like to be able to configure cascading delete behavior for Karpenter so that, on NodePool deletion or CRD deletion, I can set values that convey to Karpenter that I want a more expedited termination of my nodes rather than waiting for all nodes to fully drain.

Right now, nodes can hang during termination because of stuck pods or fully blocking PDBs in our graceful drain logic. Because a NodePool deletion or CRD deletion causes all the nodes to gracefully drain, these deletion operations can also hang, halting the whole process. Ideally, a user could pass something like --grace-period when deleting a resource, and Karpenter could reason about how to propagate it down to all the resources the deletion cascades to.

At a minimum, we should allow CRD deletions to be unblocked so that cluster operators can uninstall Karpenter from clusters without getting stuck behind graceful node drains that may hang.

An initial implementation of this was attempted in #466, and there was some discussion in the community about enabling gracePeriod to be passed through to CRs in the same way it can be passed to pods today, affecting the deletionTimestamp for a CR and allowing controller authors to build custom logic around this gracePeriod concept.
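As a rough sketch of the desired UX (the CR-level --grace-period flag shown here is hypothetical and not implemented today; only pods honor it):

```
# Works today: cap a pod's graceful termination at delete time
kubectl delete pod stuck-pod --grace-period=30

# Hypothetical: the same flag honored on a Karpenter CR, cascading the
# deadline down to NodeClaims and their nodes (not implemented today)
kubectl delete nodepool default --grace-period=60
```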

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
@jonathan-innis added the kind/feature Categorizes issue or PR as related to a new feature. label Feb 22, 2024
@sftim

sftim commented Feb 26, 2024

enabling the ability to pass gracePeriod through to CRs in the same way that you can pass them through to pods today to affect the deletionTimestamp for a CR

Building a coalition of supporters for this idea takes effort, but it may pay off really well.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 26, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jun 25, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-ci-robot closed this as not planned (Won't fix, can't repro, duplicate, stale) Jul 25, 2024
@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@jonathan-innis
Member Author

/reopen

@k8s-ci-robot
Contributor

@jonathan-innis: Reopened this issue.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot reopened this Aug 1, 2024
@k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Aug 1, 2024
@jonathan-innis
Member Author

/remove-lifecycle rotten

@k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Aug 1, 2024
@jonathan-innis
Member Author

/triage accepted

@k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Aug 1, 2024
@jonathan-innis
Member Author

Discussed this in WG today: the consensus was that folks generally still want graceful termination of their nodes; they don't want Karpenter to always forcefully terminate all nodes on their behalf. There are workarounds today with the TerminationGracePeriod implementation: users can start the teardown of Karpenter's CRDs, let the NodeClaims begin terminating, and then have a user or automation annotate all of the nodes with karpenter.sh/nodeclaim-termination-timestamp to mark the time by which each NodeClaim must be removed.

If you want forceful termination, you could set the timestamp to the current time, and everything should start forcefully removing itself, with the instances that Karpenter launched torn down as well.
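A rough sketch of that workaround with kubectl (the node label selector and the RFC 3339 timestamp format are assumptions; verify both against your Karpenter version's docs):

```
# Begin teardown; deleting the CRDs starts graceful termination of NodeClaims
kubectl delete crd nodepools.karpenter.sh nodeclaims.karpenter.sh

# To force termination, set the deadline to "now" on all Karpenter-managed nodes
# (assumes nodes carry the karpenter.sh/nodepool label and the annotation
# accepts an RFC 3339 timestamp -- both assumptions, not confirmed here)
kubectl annotate nodes -l karpenter.sh/nodepool \
  karpenter.sh/nodeclaim-termination-timestamp="$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --overwrite
```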
