
Custom Deprovisioning Trigger Mechanism #688

Open
sidewinder12s opened this issue Jul 7, 2023 · 18 comments
Labels
deprovisioning (Issues related to node deprovisioning), help wanted (Denotes an issue that needs help from a contributor; must meet "help wanted" guidelines), kind/feature (Categorizes issue or PR as related to a new feature), needs-design, v1.x (Issues prioritized for post-1.0)

Comments

@sidewinder12s

sidewinder12s commented Jul 7, 2023

Description

Allow users to trigger node Drift

What problem are you trying to solve?

In Karpenter pre-v0.28, I had started using the drift annotation karpenter.sh/voluntary-disruption=drifted as a way to force nodes to be replaced in an orderly fashion whenever I changed configuration that was not covered by Karpenter's Drift detection.

In v0.28 this behavior was removed, and the annotation is now simply stripped by Karpenter.

I found the ability to trigger drift useful for testing and for filling in the gaps in Drift support. I'd also assume that, long term, there will be corner cases where users want to trigger replacement but drift cannot detect the change, or cannot detect it easily.

Perhaps this could be just another annotation indicating user-requested drift, so that Karpenter can replace nodes in an orderly manner while respecting deprovisioning controls.
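For illustration only, the marking step could be as small as a client-go script that stamps a placeholder annotation onto the nodes I want replaced. The annotation key and label selector below are hypothetical, not an existing Karpenter API:

```go
// Hypothetical sketch: mark selected nodes with a user-requested disruption annotation.
// The annotation key and label selector are placeholders, not an existing Karpenter API.
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// Select only the nodes that need an out-of-band replacement.
	nodes, err := client.CoreV1().Nodes().List(context.TODO(), metav1.ListOptions{
		LabelSelector: "runtime=dockerd", // hypothetical selector
	})
	if err != nil {
		panic(err)
	}
	for _, n := range nodes.Items {
		// Placeholder annotation key that a future Karpenter deprovisioner could watch for.
		patch := []byte(`{"metadata":{"annotations":{"karpenter.sh/user-requested-disruption":"true"}}}`)
		if _, err := client.CoreV1().Nodes().Patch(context.TODO(), n.Name,
			types.StrategicMergePatchType, patch, metav1.PatchOptions{}); err != nil {
			panic(err)
		}
		fmt.Println("marked", n.Name)
	}
}
```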

How important is this feature to you?

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
@sidewinder12s sidewinder12s added the kind/feature Categorizes issue or PR as related to a new feature. label Jul 7, 2023
@njtran
Contributor

njtran commented Jul 10, 2023

While you could just do kubectl delete node, it sounds like you want your own custom way to mark nodes for Karpenter to eventually deprovision. We discussed doing this when we migrated all deprovisioning into the deprovisioning controller as it exists today, but we haven't seen enough signal to implement something like this yet.

In the near future, we'll release some additional work needed for the full machine migration, which moves these disruption decisions to the Machine Status conditions here: #319.

In all, I think it's reasonable to allow a custom controller to annotate nodes as needing deprovisioning, rather than letting users mess with the Expiration/Drift annotations.

May I ask what kind of custom signals you use to choose nodes for deprovisioning? Is there a gap in the deprovisioning logic that we need to implement, or is it simply a matter of filling in the gaps where Drift isn't implemented yet?

If you're also willing to design and implement this (both should be relatively simple), I'm happy to walk through this with you and get you ramped up on it.

@njtran njtran added needs-design deprovisioning Issues related to node deprovisioning labels Jul 10, 2023
@sidewinder12s
Author

I mean, right now there are a ton of gaps. It was quite useful that I could just annotate all at once and Karpenter would do a serial replacement to minimize disruption in the cluster. It generally hasn't been clear to me how disruptive it'd be to just delete a pile of nodes wholesale.

Long term, I was sort of expecting there will always be gaps (is Karpenter going to be able to detect drift on userdata changes, for example?).

My current migration is dockerd -> containerd, which is not covered by drift detection.

I don't think I'll have the bandwidth this quarter to get into this.

@njtran
Contributor

njtran commented Jul 12, 2023

Long term, I was sort of expecting there will always be gaps (is Karpenter going to be able to detect drift on userdata changes, for example?).

Yeah, we should be able to detect drift for all provisioning configurations in the future. This work is in flight: documentation here, and the first of a couple of PRs here.

@njtran njtran added the v1.x Issues prioritized for post-1.0 label Aug 9, 2023
@njtran njtran changed the title Allow users to trigger node drift Custom Deprovisioning Trigger Mechanism Aug 14, 2023
@njtran
Contributor

njtran commented Aug 14, 2023

I updated the title to be more general based on our discussion.

This could look like users adding an annotation/label/taint for Karpenter to discover, which would then be implemented as another deprovisioner in the deprovisioning loop.

@sidewinder12s
Author

One workflow that has come up that might be applicable for this is how to eventually force all pods that are using karpenter.sh/do-not-evict: "true" to get consolidated/drifted in a timely manner.

We'd found that these nodes end up getting left around, since Karpenter also does not cordon the nodes these pods are on so they can eventually drain out of the cluster.

Another thought would be to modify how this annotation is treated if maintenance-window support were added to Karpenter (ref: #753). Operationally, allowing a maintenance window to force node replacements may be desirable for a lot of organizations.

@dblackdblack

dblackdblack commented Sep 19, 2023


It generally hasn't been clear to me how disruptive it'd be to just delete a pile of nodes wholesale.

Exactly this. If this is actually a safe operation, I'd be happy to do that, but it just isn't obvious right now whether kubectl delete node is "safe" when I have a routine need to replace all nodes. For example, there was just a 12-hour network problem in us-west-2, and I want to replace all the nodes in the cluster just to be safe.

If kubectl delete node is safe, could you update the docs? If it isn't safe, please also update the docs 😄

This is essentially the equivalent of an ASG's instance refresh feature, which has been notably missing since I switched to Karpenter.

@yuvalavidor

One workflow that has come up that might be applicable for this is how to eventually force all pods that are using karpenter.sh/do-not-evict: "true" to get consolidated/drifted in a timely manner.

I think the obvious use case here is a cluster upgrade where you want to replace your nodes.
kubectl delete node is a way of doing that, but I would like to see a way that spins up a new node of the same size before the old node is deleted, to reduce the time pods spend unscheduled; with a StatefulSet it's even more time consuming.

Also, I'm genuinely asking: what is the best way of doing this right now (v0.30.0)?

@sidewinder12s
Author

One workflow that has come up that might be applicable for this is how to eventually force all pods that are using karpenter.sh/do-not-evict: "true" to get consolidated/drifted in a timely manner.

I think the obvious use case here is a cluster upgrade where you want to replace your nodes. kubectl delete node is a way of doing that, but I would like to see a way that spins up a new node of the same size before the old node is deleted, to reduce the time pods spend unscheduled; with a StatefulSet it's even more time consuming.

Also, I'm genuinely asking: what is the best way of doing this right now (v0.30.0)?

I haven't really found any better way to deal with it than draining or deleting the node. I'm actually not sure whether Karpenter, on delete, would now launch a new node before draining the old one.

@garvinp-stripe
Contributor

Wanted to add more to this. Currently we have an internal system that marks nodes for termination for reasons K8s has no visibility into, for SOX/compliance or CVE-related reasons. We are building a controller to read these and annotate our nodes for termination. We realize that having two systems killing nodes makes it difficult, if not impossible, to understand the termination budget of the fleet, so if we can get Karpenter to reap these nodes instead, we can maintain a realistic termination budget.
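For context, here is a rough sketch of the kind of controller we're building, assuming controller-runtime; the annotation key and the markedForKill lookup are stand-ins for our internal compliance system, not anything Karpenter defines today:

```go
// Sketch of a controller that translates an external kill-list into node annotations.
// Assumes controller-runtime; the annotation key and markedForKill lookup are stand-ins.
package main

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

const annotationKey = "example.com/requested-disruption" // placeholder key

// markedForKill stands in for the internal SOX/CVE system lookup.
func markedForKill(nodeName string) bool { return false }

type NodeMarker struct {
	client.Client
}

func (r *NodeMarker) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	node := &corev1.Node{}
	if err := r.Get(ctx, req.NamespacedName, node); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}
	if !markedForKill(node.Name) {
		return ctrl.Result{}, nil
	}
	// Annotate the node so a single disruption controller (ideally Karpenter) can reap it
	// while respecting one fleet-wide termination budget.
	patch := client.MergeFrom(node.DeepCopy())
	if node.Annotations == nil {
		node.Annotations = map[string]string{}
	}
	node.Annotations[annotationKey] = "true"
	return ctrl.Result{}, r.Patch(ctx, node, patch)
}

func main() {
	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{})
	if err != nil {
		panic(err)
	}
	if err := ctrl.NewControllerManagedBy(mgr).
		For(&corev1.Node{}).
		Complete(&NodeMarker{Client: mgr.GetClient()}); err != nil {
		panic(err)
	}
	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		panic(err)
	}
}
```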

@njtran
Contributor

njtran commented Oct 17, 2023

@garvinp-stripe or @sidewinder12s, let me know if you want to help implement this. I can help guide through this, and it should be fairly straightforward.

@njtran njtran transferred this issue from aws/karpenter-provider-aws Nov 2, 2023
@tasdikrahman
Member

/assign

@njtran
Contributor

njtran commented Nov 3, 2023

Posting thoughts here for you on how to approach this:

Design

  • How should we allow users to signal a node being terminated?
    • Should we use an annotation or a status condition? We mark nodeClaims with status conditions to indicate when they're expired/drifted. Maybe we could do the same? But that's tougher to do from a user's perspective. In my eyes, annotation on the node/NodeClaim may be the easiest and make the most sense.
  • Should this have any special disruption behavior from what's listed here in the docs?

Implementation
Currently, we iterate through the disruption methods one at a time.

There's a Method interface that consists of the following (a rough sketch of a possible implementation follows the list):

  • ShouldDisrupt(): a func used to filter whether a node should be disrupted
  • ComputeCommand(): returns a Command, consisting of a list of candidates and a list of replacements; this should include a scheduling simulation that determines whether we need to spin up a replacement
  • Type(): just a string representation for metrics/logs
  • ConsolidationType(): used for consolidation methods; it can be ignored here
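As promised above, here is a minimal, self-contained sketch of what a user-requested method could look like against that shape. The types and the annotation key below are placeholders rather than the actual karpenter-core signatures:

```go
// Placeholder sketch of a "user-requested" disruption method following the interface
// shape described above; these types are stand-ins, not karpenter-core's real ones.
package disruption

import "context"

// Candidate and Command stand in for the real disruption types.
type Candidate struct {
	NodeName    string
	Annotations map[string]string
}

type Command struct {
	Candidates   []Candidate
	Replacements int // how many replacement nodes the scheduling simulation calls for
}

// Hypothetical annotation key a user or custom controller would set.
const AnnotationUserRequested = "karpenter.sh/user-requested-disruption"

type UserRequested struct{}

// ShouldDisrupt filters candidates down to nodes the user explicitly marked.
func (u *UserRequested) ShouldDisrupt(_ context.Context, c Candidate) bool {
	return c.Annotations[AnnotationUserRequested] == "true"
}

// ComputeCommand would normally run a scheduling simulation to decide how many
// replacements are needed; this sketch naively assumes one replacement per candidate.
func (u *UserRequested) ComputeCommand(_ context.Context, candidates ...Candidate) (Command, error) {
	return Command{Candidates: candidates, Replacements: len(candidates)}, nil
}

// Type is the string representation used for metrics and logs.
func (u *UserRequested) Type() string { return "user-requested" }

// ConsolidationType only matters for consolidation methods, so it is left empty.
func (u *UserRequested) ConsolidationType() string { return "" }
```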

@jeesmon

jeesmon commented Nov 14, 2023

@tasdikrahman Any update on this? We are looking for a similar feature to bulk recycle nodes on-demand. Thanks!

@garvinp-stripe
Contributor

Sorry @njtran, I was out of town for a bit. Let me check with my team to see if I can spend some time on this.

@tasdikrahman
Member

Hey folks, sorry for the delay. I am starting work on this this week. Will keep you posted here.

@k8s-ci-robot k8s-ci-robot added help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. and removed help-wanted labels Nov 22, 2023
@jonathan-innis
Member

@tasdikrahman Feel free to reach out if you need any assistance here. I know that @njtran is on PTO for a bit so he may be slower to respond.

@garvinp-stripe
Contributor

@njtran With forceful expiration coming back, do you think we can simply mark nodes as "expired" for this scenario?

@jonathan-innis
Member

With forceful expiration coming back, do you think we can simply mark nodes as "expired" for this scenario?

Seems like this would basically have the same effect as just doing a kubectl delete at this point. The other thing is that I imagine that this request to disrupt is for a subset of nodes owned by a NodePool, not for the whole thing.

I'm not sure there's a good workaround for this problem right now, other than doing something hacky like changing Karpenter's drift annotation so it differs from the one attached to the NodePool or NodeClass. It still seems reasonable to me to support something like an annotation on the NodeClaim/Node that triggers a status condition, letting the disruption controller mark the node as eligible for removal.
