
Custom Deprovisioning Trigger Mechanism #688

Open
sidewinder12s opened this issue Jul 7, 2023 · 18 comments
Labels
deprovisioning (Issues related to node deprovisioning), help wanted (Denotes an issue that needs help from a contributor; must meet "help wanted" guidelines), kind/feature (Categorizes issue or PR as related to a new feature), needs-design, v1.x (Issues prioritized for post-1.0)

Comments

@sidewinder12s

sidewinder12s commented Jul 7, 2023

Description

Allow users to trigger node Drift

What problem are you trying to solve?

In Karpenter pre-v0.28, I had started using the drift annotation karpenter.sh/voluntary-disruption=drifted as a way to force nodes to be replaced in an orderly fashion whenever I changed configuration that was not covered by Karpenter's Drift detection.

In v0.28 this behavior was removed, and the annotation is now simply stripped by Karpenter.

I found the ability to trigger drift useful for testing and for filling in the gaps in Drift support. I'd also assume that, long term, there will be corner cases where users want to trigger replacement but drift cannot detect the change, or cannot detect it easily.

Perhaps this could be just another annotation indicating user-requested drift, so that Karpenter can replace nodes in an orderly manner while respecting deprovisioning controls.
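For illustration only, the marking step could be as small as a client-go script that stamps a placeholder annotation onto the nodes I want replaced. The annotation key and label selector below are hypothetical, not an existing Karpenter API:

```go
// Hypothetical sketch: mark selected nodes with a user-requested disruption annotation.
// The annotation key and label selector are placeholders, not an existing Karpenter API.
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// Select only the nodes that need an out-of-band replacement.
	nodes, err := client.CoreV1().Nodes().List(context.TODO(), metav1.ListOptions{
		LabelSelector: "runtime=dockerd", // hypothetical selector
	})
	if err != nil {
		panic(err)
	}
	for _, n := range nodes.Items {
		// Placeholder annotation key that a future Karpenter deprovisioner could watch for.
		patch := []byte(`{"metadata":{"annotations":{"karpenter.sh/user-requested-disruption":"true"}}}`)
		if _, err := client.CoreV1().Nodes().Patch(context.TODO(), n.Name,
			types.StrategicMergePatchType, patch, metav1.PatchOptions{}); err != nil {
			panic(err)
		}
		fmt.Println("marked", n.Name)
	}
}
```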

How important is this feature to you?

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
@sidewinder12s sidewinder12s added the kind/feature Categorizes issue or PR as related to a new feature. label Jul 7, 2023
@njtran
Contributor

njtran commented Jul 10, 2023

While you could just do kubectl delete node, it sounds like you want your own custom way to mark nodes for Karpenter to eventually deprovision. We discussed doing this when we migrated all deprovisioning into the deprovisioning controller as it exists today, but we haven't seen enough signal to implement something like this yet.

In the near future, we'll release some additional work needed for the full machine migration, which moves these disruption decisions to the Machine Status conditions here: #319.

In all, I think it's reasonable to allow a custom controller to annotate nodes as needing deprovisioning, rather than letting users mess with the Expiration/Drift annotations.

May I ask what kind of custom signals you use to choose nodes for deprovisioning? Is there a gap in the deprovisioning logic that we need to implement, or is it simply a matter of filling in the gaps where Drift isn't implemented yet?

If you're also willing to design and implement this (both should be relatively simple), I'm happy to walk through this with you and get you ramped up on it.

@njtran njtran added needs-design deprovisioning Issues related to node deprovisioning labels Jul 10, 2023
@sidewinder12s
Author

I mean, right now there are a ton of gaps. It was quite useful that I could just annotate all at once and Karpenter would do a serial replacement to minimize disruption in the cluster. It generally hasn't been clear to me how disruptive it'd be to just delete a pile of nodes wholesale.

Long term, I was sort of expecting there will always be gaps (is Karpenter going to be able to detect drift on userdata changes, for example?).

My current migration is dockerd -> containerd, which is not covered by drift detection.

I don't think I'll have the bandwidth this quarter to get into this.

@njtran
Contributor

njtran commented Jul 12, 2023

Long term, I was sort of expecting there will always be gaps (is Karpenter going to be able to detect drift on userdata changes, for example?).

Yeah, we should be able to detect drift for all provisioning configurations in the future. This work is in flight: documentation here, and the first of a couple of PRs here.

@njtran njtran added the v1.x Issues prioritized for post-1.0 label Aug 9, 2023
@njtran njtran changed the title Allow users to trigger node drift Custom Deprovisioning Trigger Mechanism Aug 14, 2023
@njtran
Contributor

njtran commented Aug 14, 2023

I updated the title to be more general based on our discussion.

This could look like users adding an annotation/label/taint for Karpenter to discover, which would then be implemented as another deprovisioner in the deprovisioning loop.

@sidewinder12s
Author

One workflow that has come up that might be applicable for this is how to eventually force all pods that are using karpenter.sh/do-not-evict: "true" to get consolidated/drifted in a timely manner.

We'd found that these nodes end up getting left around, since Karpenter also does not cordon the nodes these pods are on so they can eventually drain out of the cluster.

Another thought would be to modify how this annotation is treated if maintenance-window support were added to Karpenter (ref: #753). Operationally, allowing a maintenance window to force node replacements may be desirable for a lot of organizations.

@dblackdblack

dblackdblack commented Sep 19, 2023


It generally hasn't been clear to me how disruptive it'd be to just delete a pile of nodes wholesale.

Exactly this. If this is actually a safe operation, I'd be happy to do that, but it just isn't obvious right now whether kubectl delete node is "safe" when I have a routine need to replace all nodes. For example, there was just a 12-hour network problem in us-west-2, and I want to replace all the nodes in the cluster just to be safe.

If kubectl delete node is safe, could you update the docs? If it isn't safe, please also update the docs 😄

This is essentially the equivalent of an ASG's instance refresh feature, which has been notably missing since I switched to Karpenter.

@yuvalavidor

One workflow that has come up that might be applicable for this is how to eventually force all pods that are using karpenter.sh/do-not-evict: "true" to get consolidated/drifted in a timely manner.

I think the obvious use case here is a cluster upgrade where you want to replace your nodes.
kubectl delete node is a way of doing that, but I would like to see a way that spins up a new node of the same size before the old node is deleted, to reduce the time pods spend unscheduled; with a StatefulSet it's even more time consuming.

Also, I'm genuinely asking: what is the best way of doing this right now (v0.30.0)?

@sidewinder12s
Author

One workflow that has come up that might be applicable for this is how to eventually force all pods that are using karpenter.sh/do-not-evict: "true" to get consolidated/drifted in a timely manner.

I think the obvious use case here is a cluster upgrade where you want to replace your nodes. kubectl delete node is a way of doing that, but I would like to see a way that spins up a new node of the same size before the old node is deleted, to reduce the time pods spend unscheduled; with a StatefulSet it's even more time consuming.

Also, I'm genuinely asking: what is the best way of doing this right now (v0.30.0)?

I haven't really found any better way to deal with it than draining or deleting the node. I'm actually not sure whether Karpenter, on delete, would now launch a new node before draining the old one.

@garvinp-stripe
Contributor

Wanted to add more to this. Currently we have an internal system that marks nodes for termination for reasons K8s has no visibility into, for SOX/compliance or CVE-related reasons. We are building a controller to read these and annotate our nodes for termination. We realize that having two systems killing nodes makes it difficult, if not impossible, to understand the termination budget of the fleet, so if we can get Karpenter to reap these nodes instead, we can maintain a realistic termination budget.
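For context, here is a rough sketch of the kind of controller we're building, assuming controller-runtime; the annotation key and the markedForKill lookup are stand-ins for our internal compliance system, not anything Karpenter defines today:

```go
// Sketch of a controller that translates an external kill-list into node annotations.
// Assumes controller-runtime; the annotation key and markedForKill lookup are stand-ins.
package main

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

const annotationKey = "example.com/requested-disruption" // placeholder key

// markedForKill stands in for the internal SOX/CVE system lookup.
func markedForKill(nodeName string) bool { return false }

type NodeMarker struct {
	client.Client
}

func (r *NodeMarker) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	node := &corev1.Node{}
	if err := r.Get(ctx, req.NamespacedName, node); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}
	if !markedForKill(node.Name) {
		return ctrl.Result{}, nil
	}
	// Annotate the node so a single disruption controller (ideally Karpenter) can reap it
	// while respecting one fleet-wide termination budget.
	patch := client.MergeFrom(node.DeepCopy())
	if node.Annotations == nil {
		node.Annotations = map[string]string{}
	}
	node.Annotations[annotationKey] = "true"
	return ctrl.Result{}, r.Patch(ctx, node, patch)
}

func main() {
	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{})
	if err != nil {
		panic(err)
	}
	if err := ctrl.NewControllerManagedBy(mgr).
		For(&corev1.Node{}).
		Complete(&NodeMarker{Client: mgr.GetClient()}); err != nil {
		panic(err)
	}
	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		panic(err)
	}
}
```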

@njtran
Contributor

njtran commented Oct 17, 2023

@garvinp-stripe or @sidewinder12s, let me know if you want to help implement this. I can help guide through this, and it should be fairly straightforward.

@njtran njtran transferred this issue from aws/karpenter-provider-aws Nov 2, 2023
@tasdikrahman
Member

/assign

@njtran
Contributor

njtran commented Nov 3, 2023

Posting thoughts here for you on how to approach this:

Design

  • How should we allow users to signal a node being terminated?
    • Should we use an annotation or a status condition? We mark nodeClaims with status conditions to indicate when they're expired/drifted. Maybe we could do the same? But that's tougher to do from a user's perspective. In my eyes, annotation on the node/NodeClaim may be the easiest and make the most sense.
  • Should this have any special disruption behavior from what's listed here in the docs?

Implementation
Currently, we iterate through the disruption methods one at a time.

There's a Method interface that consists of the following (a rough sketch of a possible implementation follows the list):

  • ShouldDisrupt(): a func used to filter whether a node should be disrupted
  • ComputeCommand(): returns a Command, consisting of a list of candidates and a list of replacements; this should include a scheduling simulation that determines whether we need to spin up a replacement
  • Type(): just a string representation for metrics/logs
  • ConsolidationType(): used for consolidation methods; it can be ignored here
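As promised above, here is a minimal, self-contained sketch of what a user-requested method could look like against that shape. The types and the annotation key below are placeholders rather than the actual karpenter-core signatures:

```go
// Placeholder sketch of a "user-requested" disruption method following the interface
// shape described above; these types are stand-ins, not karpenter-core's real ones.
package disruption

import "context"

// Candidate and Command stand in for the real disruption types.
type Candidate struct {
	NodeName    string
	Annotations map[string]string
}

type Command struct {
	Candidates   []Candidate
	Replacements int // how many replacement nodes the scheduling simulation calls for
}

// Hypothetical annotation key a user or custom controller would set.
const AnnotationUserRequested = "karpenter.sh/user-requested-disruption"

type UserRequested struct{}

// ShouldDisrupt filters candidates down to nodes the user explicitly marked.
func (u *UserRequested) ShouldDisrupt(_ context.Context, c Candidate) bool {
	return c.Annotations[AnnotationUserRequested] == "true"
}

// ComputeCommand would normally run a scheduling simulation to decide how many
// replacements are needed; this sketch naively assumes one replacement per candidate.
func (u *UserRequested) ComputeCommand(_ context.Context, candidates ...Candidate) (Command, error) {
	return Command{Candidates: candidates, Replacements: len(candidates)}, nil
}

// Type is the string representation used for metrics and logs.
func (u *UserRequested) Type() string { return "user-requested" }

// ConsolidationType only matters for consolidation methods, so it is left empty.
func (u *UserRequested) ConsolidationType() string { return "" }
```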

@jeesmon

jeesmon commented Nov 14, 2023

@tasdikrahman Any update on this? We are looking for a similar feature to bulk recycle nodes on-demand. Thanks!

@garvinp-stripe
Contributor

Sorry @njtran, I was out of town for a bit. Let me check with my team to see if I can spend some time on this.

@tasdikrahman
Member

Hey folks, sorry for the delay. I am starting work on this this week. Will keep you posted here.

@k8s-ci-robot k8s-ci-robot added help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. and removed help-wanted labels Nov 22, 2023
@jonathan-innis
Member

@tasdikrahman Feel free to reach out if you need any assistance here. I know that @njtran is on PTO for a bit so he may be slower to respond.

@garvinp-stripe
Contributor

@njtran With forceful expiration coming back, do you think we can simply mark nodes as "expired" for this scenario?

@jonathan-innis
Member

With forceful expiration coming back, do you think we can simply mark nodes as "expired" for this scenario?

Seems like this would basically have the same effect as just doing a kubectl delete at this point. The other thing is that I imagine that this request to disrupt is for a subset of nodes owned by a NodePool, not for the whole thing.

I'm not sure there's a good workaround for this problem right now, other than doing something hacky like changing Karpenter's drift annotation so it differs from the one attached to the NodePool or NodeClass. It still seems reasonable to me to support something like an annotation on the NodeClaim/Node that triggers a status condition, letting the disruption controller mark the node as eligible for removal.
