How to handle spec.startupTaints on Node Restart #896
Comments
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

/remove-lifecycle stale
Note on impact: I imagine this becoming problematic for AKS as well, because we have a component that will attempt to restart, reimage, and then redeploy nodes if they are not ready for too long. See AKS Node Repair.
Description
Observed Behavior:
Certain DaemonSets need to run setup logic before other pods are scheduled, to ensure the pod sandbox is properly configured to receive those pods and run their containers. Cilium was the first example of this for Karpenter: the CNI needed to be fully configured on the node, and some startup processes needed to complete, before pods could actually be bound.
If pods bound ahead of the startup logic running, they would begin to fail because the CNI wasn't fully set up and pod IP assignment wasn't ready. Karpenter therefore implemented a spec.startupTaints field in its NodeClaims to ensure that pods do not schedule to nodes until the nodes are ready to receive them; the DaemonSets are responsible for removing the startupTaints as the various startup processes complete. This works fine on initial node boot.

However, an issue opened in the aws/karpenter-provider-aws repo (aws/karpenter-provider-aws#5293) indicated that when a node restarts after the kubelet has already joined and this process has already run, pods begin to fail because there is no such ordering mechanism on node restart. This is a difficult problem to solve in the context of just Karpenter: Karpenter would have to be aware of node restarts, which is hard to determine by looking only at the apiserver, since the kubelet's readiness signal is heartbeat-based. Realistically, this seems like a behavior change that we should explore in the upstream project. Most notably, what's the expected behavior when a node restarts, all of the processes on it restart, and there is no longer any ordering mechanism?
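For reference, this is roughly how the ordering gate is wired up on initial boot today. A minimal sketch of a NodePool carrying a startup taint for Cilium — the `karpenter.sh/v1beta1` API version and the NodePool name are assumptions here; the taint key is the one Cilium's agent removes once the CNI is ready:

```yaml
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default          # illustrative name
spec:
  template:
    spec:
      startupTaints:
        # Propagated onto each NodeClaim/Node at creation. The Cilium DaemonSet
        # removes this taint once the CNI is ready, which gates pod scheduling
        # on initial boot -- but nothing re-adds it after a node restart.
        - key: node.cilium.io/agent-not-ready
          value: "true"
          effect: NoExecute
```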
Expected Behavior:
Restarted nodes should get the same ordering mechanism that was offered on initial join. Since the pods on the node are disrupted under the hood anyway, it may be possible to clean up the pod bindings as part of the restart, evict the pods, re-add the startup taints, and so on.
I think we should do some exploring of the trade-offs here.
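To make the expected behavior a bit more concrete, here is a minimal, hypothetical sketch of one way a controller could detect a restart and re-apply the startup taints: compare the node's reported boot ID against the last value it recorded. Everything here is an assumption for illustration — the annotation key, function name, and overall approach are not part of Karpenter's API, and it deliberately skips the harder pieces (cleaning up existing pod bindings and evicting pods):

```go
package startuptaints

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// Hypothetical annotation used to remember the boot ID seen on the last reconcile.
const lastBootIDAnnotation = "example.karpenter.sh/last-boot-id"

// reapplyStartupTaintsOnReboot re-adds the given startup taints to the node if its
// reported boot ID differs from the one recorded on a previous reconcile.
func reapplyStartupTaintsOnReboot(ctx context.Context, client kubernetes.Interface, nodeName string, startupTaints []corev1.Taint) error {
	node, err := client.CoreV1().Nodes().Get(ctx, nodeName, metav1.GetOptions{})
	if err != nil {
		return err
	}
	if node.Annotations[lastBootIDAnnotation] == node.Status.NodeInfo.BootID {
		// No reboot observed since the last reconcile; nothing to do.
		return nil
	}
	// Reboot detected: re-add any startup taints that are not already present.
	for _, taint := range startupTaints {
		present := false
		for _, existing := range node.Spec.Taints {
			if existing.Key == taint.Key && existing.Effect == taint.Effect {
				present = true
				break
			}
		}
		if !present {
			node.Spec.Taints = append(node.Spec.Taints, taint)
		}
	}
	if node.Annotations == nil {
		node.Annotations = map[string]string{}
	}
	node.Annotations[lastBootIDAnnotation] = node.Status.NodeInfo.BootID
	_, err = client.CoreV1().Nodes().Update(ctx, node, metav1.UpdateOptions{})
	return err
}
```

Whether this belongs in Karpenter, the kubelet, or elsewhere upstream is exactly the trade-off discussion mentioned above; boot-ID tracking is just one possible restart signal.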