How to handle spec.startupTaints on Node Restart #896

Open
jonathan-innis opened this issue Dec 19, 2023 · 6 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness.

Comments

@jonathan-innis
Member

Description

Observed Behavior:

Certain DaemonSets need to run setup logic before pods are scheduled, to make sure the pod sandbox is properly configured to receive a pod and run its containers once the pod schedules. Cilium was the first example of this for Karpenter: the CNI needed to be fully configured on the node, and some startup processes needed to complete, before pods could actually be bound.

If pods were bound before that startup logic had run, they would begin to fail because the CNI wasn't fully set up and pod IP assignment wasn't ready. Karpenter therefore implemented a spec.startupTaints field in its NodeClaims to ensure that pods do not schedule to nodes until the nodes are ready to receive them. DaemonSets are responsible for pulling off the startupTaints as the various startup processes complete (see the sketch below).
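
As a rough illustration of that contract, here is a minimal sketch of how a startup DaemonSet might clear its taint from the node once its initialization finishes. The taint key follows Cilium's node.cilium.io/agent-not-ready convention; the NODE_NAME environment variable (injected via the downward API) and the overall wiring are assumptions for the example, not Cilium's actual implementation.

```go
// Sketch: a startup agent removing its own startup taint from the node
// once its initialization has completed.
package main

import (
	"context"
	"os"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// startupTaintKey follows Cilium's documented convention for its startup taint.
const startupTaintKey = "node.cilium.io/agent-not-ready"

func removeStartupTaint(ctx context.Context, client kubernetes.Interface, nodeName string) error {
	node, err := client.CoreV1().Nodes().Get(ctx, nodeName, metav1.GetOptions{})
	if err != nil {
		return err
	}
	// Keep every taint except the startup taint this agent is responsible for.
	taints := make([]corev1.Taint, 0, len(node.Spec.Taints))
	for _, t := range node.Spec.Taints {
		if t.Key != startupTaintKey {
			taints = append(taints, t)
		}
	}
	node.Spec.Taints = taints
	_, err = client.CoreV1().Nodes().Update(ctx, node, metav1.UpdateOptions{})
	return err
}

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)
	// NODE_NAME is assumed to be injected via the downward API.
	if err := removeStartupTaint(context.Background(), client, os.Getenv("NODE_NAME")); err != nil {
		panic(err)
	}
}
```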

This works fine on initial node boot; however, an issue opened in the aws/karpenter-provider-aws repo (aws/karpenter-provider-aws#5293) indicated that when a node restarts after the kubelet has joined and this process has already completed, pods begin to fail because there is no such ordering on node restart.

This is a difficult problem to solve within Karpenter alone, since Karpenter would have to be node-restart aware, which is difficult to determine just by looking at the apiserver because the kubelet's liveness signal is heartbeat-based. Realistically, this seems like a behavior change that we should explore in the upstream project. Most notably, what's the expected behavior when a node restarts and all of the processes on it restart and no longer have any ordering mechanism?

Expected Behavior:

Node restarts should allow for the same ordering mechanism that was offered on initial join. Since pods are disrupted anyway under the hood, maybe it's possible to just clean up the pod bindings as part of the restart, evict the pods, re-add the taints, etc.

I think we should do some exploring of the trade-offs here.
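
As one possible shape for that exploration, here is a minimal sketch that re-applies the startup taints and evicts bound pods when a restart is detected by tracking the kubelet-reported boot ID (node.status.nodeInfo.bootID). The annotation key used to record the last observed boot ID is hypothetical, and this is only a sketch of the idea, not an existing Karpenter mechanism.

```go
// Sketch: detect a node restart via a boot ID change, then re-apply the
// startup taints and evict non-DaemonSet pods so the original ordering holds.
package sketch

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// lastBootIDAnnotation is a hypothetical key used only for this sketch.
const lastBootIDAnnotation = "example.karpenter.sh/last-observed-boot-id"

func reconcileNodeRestart(ctx context.Context, client kubernetes.Interface, node *corev1.Node, startupTaints []corev1.Taint) error {
	observed := node.Status.NodeInfo.BootID
	recorded := node.Annotations[lastBootIDAnnotation]
	if recorded == observed {
		return nil // no restart since the last reconcile
	}
	// Note: an empty recorded value means this is the first observation; for
	// brevity, the sketch treats that the same as a restart.

	// Re-apply any startup taints that are no longer present on the node.
	existing := map[string]bool{}
	for _, t := range node.Spec.Taints {
		existing[t.Key] = true
	}
	for _, t := range startupTaints {
		if !existing[t.Key] {
			node.Spec.Taints = append(node.Spec.Taints, t)
		}
	}
	if node.Annotations == nil {
		node.Annotations = map[string]string{}
	}
	node.Annotations[lastBootIDAnnotation] = observed
	if _, err := client.CoreV1().Nodes().Update(ctx, node, metav1.UpdateOptions{}); err != nil {
		return err
	}

	// Remove the pods bound to the node so they reschedule once the
	// startup taints are removed again.
	pods, err := client.CoreV1().Pods("").List(ctx, metav1.ListOptions{
		FieldSelector: "spec.nodeName=" + node.Name,
	})
	if err != nil {
		return err
	}
	for _, pod := range pods.Items {
		// DaemonSet pods tolerate the startup taints and must keep running
		// so they can remove them once their startup logic completes.
		if isDaemonSetPod(&pod) {
			continue
		}
		if err := client.CoreV1().Pods(pod.Namespace).Delete(ctx, pod.Name, metav1.DeleteOptions{}); err != nil {
			return err
		}
	}
	return nil
}

func isDaemonSetPod(pod *corev1.Pod) bool {
	for _, ref := range pod.OwnerReferences {
		if ref.Kind == "DaemonSet" {
			return true
		}
	}
	return false
}
```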

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
@jonathan-innis jonathan-innis added the kind/bug Categorizes issue or PR as related to a bug. label Dec 19, 2023
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 18, 2024
@Bryce-Soghigian
Member

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 19, 2024
@Bryce-Soghigian
Member

Note on impact: I imagine this becoming problematic for AKS as well, because we have a component that will attempt to restart, reimage, and then redeploy nodes if they are not ready for too long. See AKS Node Repair.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 17, 2024
@jmdeal
Member

jmdeal commented Jun 17, 2024

/remove-lifecycle stale
/lifecycle frozen

@k8s-ci-robot k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jun 17, 2024
@daimaxiaxie
Contributor

It seems that karpenter.sh/unregistered also has this problem? What is the current status of this problem?
