How to handle spec.startupTaints on Node Restart #896
Comments
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

/remove-lifecycle stale
Note on impact: I imagine this becoming problematic for AKS as well, because we have a component that will attempt to restart, reimage, and then redeploy nodes if they are not ready for too long. See AKS Node Repair.
Description
Observed Behavior:
Certain DaemonSets need to run setup logic before other pods are scheduled, to ensure the pod sandbox is properly configured to receive those pods and run their containers. Cilium was the first example of this for Karpenter: the CNI needed to be fully configured on the node, and some startup processes needed to complete, before pods could actually be bound.
If pods bound ahead of the startup logic running, they would begin to fail because the CNI wasn't fully set up and pod IP assignment wasn't ready. Karpenter therefore implemented a spec.startupTaints field in its NodeClaims to ensure that pods do not schedule to nodes until the nodes are ready to receive them; the DaemonSets are responsible for removing the startupTaints as the various startup processes complete. This works fine on initial node boot.

However, an issue opened in the aws/karpenter-provider-aws repo (aws/karpenter-provider-aws#5293) indicated that when a node restarts after the kubelet has already joined and this process has already run, pods begin to fail because there is no such ordering mechanism on node restart. This is a difficult problem to solve in the context of just Karpenter: Karpenter would have to be aware of node restarts, which is hard to determine by looking only at the apiserver, since the kubelet's readiness signal is heartbeat-based. Realistically, this seems like a behavior change that we should explore in the upstream project. Most notably, what's the expected behavior when a node restarts, all of the processes on it restart, and there is no longer any ordering mechanism?
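For reference, this is roughly how the ordering gate is wired up on initial boot today. A minimal sketch of a NodePool carrying a startup taint for Cilium — the `karpenter.sh/v1beta1` API version and the NodePool name are assumptions here; the taint key is the one Cilium's agent removes once the CNI is ready:

```yaml
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default          # illustrative name
spec:
  template:
    spec:
      startupTaints:
        # Propagated onto each NodeClaim/Node at creation. The Cilium DaemonSet
        # removes this taint once the CNI is ready, which gates pod scheduling
        # on initial boot -- but nothing re-adds it after a node restart.
        - key: node.cilium.io/agent-not-ready
          value: "true"
          effect: NoExecute
```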
Expected Behavior:
Restarted nodes should get the same ordering mechanism that was offered on initial join. Since the pods on the node are disrupted under the hood anyway, it may be possible to clean up the pod bindings as part of the restart, evict the pods, re-add the startup taints, and so on.
I think we should do some exploring of the trade-offs here.
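To make the expected behavior a bit more concrete, here is a minimal, hypothetical sketch of one way a controller could detect a restart and re-apply the startup taints: compare the node's reported boot ID against the last value it recorded. Everything here is an assumption for illustration — the annotation key, function name, and overall approach are not part of Karpenter's API, and it deliberately skips the harder pieces (cleaning up existing pod bindings and evicting pods):

```go
package startuptaints

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// Hypothetical annotation used to remember the boot ID seen on the last reconcile.
const lastBootIDAnnotation = "example.karpenter.sh/last-boot-id"

// reapplyStartupTaintsOnReboot re-adds the given startup taints to the node if its
// reported boot ID differs from the one recorded on a previous reconcile.
func reapplyStartupTaintsOnReboot(ctx context.Context, client kubernetes.Interface, nodeName string, startupTaints []corev1.Taint) error {
	node, err := client.CoreV1().Nodes().Get(ctx, nodeName, metav1.GetOptions{})
	if err != nil {
		return err
	}
	if node.Annotations[lastBootIDAnnotation] == node.Status.NodeInfo.BootID {
		// No reboot observed since the last reconcile; nothing to do.
		return nil
	}
	// Reboot detected: re-add any startup taints that are not already present.
	for _, taint := range startupTaints {
		present := false
		for _, existing := range node.Spec.Taints {
			if existing.Key == taint.Key && existing.Effect == taint.Effect {
				present = true
				break
			}
		}
		if !present {
			node.Spec.Taints = append(node.Spec.Taints, taint)
		}
	}
	if node.Annotations == nil {
		node.Annotations = map[string]string{}
	}
	node.Annotations[lastBootIDAnnotation] = node.Status.NodeInfo.BootID
	_, err = client.CoreV1().Nodes().Update(ctx, node, metav1.UpdateOptions{})
	return err
}
```

Whether this belongs in Karpenter, the kubelet, or elsewhere upstream is exactly the trade-off discussion mentioned above; boot-ID tracking is just one possible restart signal.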