Leaked pods with topology blocks provisioning #640

Closed
ellistarn opened this issue Oct 26, 2023 · 0 comments
Labels: kind/bug (Categorizes issue or PR as related to a bug.)
Description

Observed Behavior:

2023-10-26T16:47:09.627Z	ERROR	controller.provisioner	creating scheduler, tracking topology counts, getting node ip-192-168-128-105.us-west-2.compute.internal, Node "ip-192-168-128-105.us-west-2.compute.internal" not found	{"commit": "2012cf9"}
2023-10-26T16:47:19.627Z	ERROR	controller.provisioner	creating scheduler, tracking topology counts, getting node ip-192-168-128-105.us-west-2.compute.internal, Node "ip-192-168-128-105.us-west-2.compute.internal" not found	{"commit": "2012cf9"}
2023-10-26T16:47:29.628Z	ERROR	controller.provisioner	creating scheduler, tracking topology counts, getting node ip-192-168-128-105.us-west-2.compute.internal, Node "ip-192-168-128-105.us-west-2.compute.internal" not found	{"commit": "2012cf9"}

Expected Behavior:

Provisioning should not block on pods awaiting garbage collection.

Reproduction Steps (Please include YAML):

Here's what's happening:

  1. Karpenter looks at other pods during scheduling if pod topology spread, pod affinity, or pod anti-affinity is defined.
  2. We retrieve the Node for those pods using pod.Spec.NodeName.
  3. However, if the Node does not exist, Karpenter errors out and retries (see the sketch after this list).
  4. Karpenter attempts to drain pods from Nodes when they are terminated; however, pods that tolerate NoSchedule/NoExecute cannot be drained, as they would immediately reschedule after being evicted.
  5. Karpenter then deletes the EC2 instance and the corresponding Node object.
  6. Any pods that were on that node and could not be drained are leaked: they sit in the API server until they are garbage collected by the Kube Controller Manager.
  7. In the meantime, Karpenter will fail to schedule, since the node cannot be found.
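For illustration, here is a minimal Go sketch of the pattern in steps 2-3 and 7 (not Karpenter's actual code; the function name and client wiring are assumptions): the topology-counting step resolves each pod's node via pod.Spec.NodeName, and a single missing Node fails the whole pass, which is what produces the repeating "creating scheduler, tracking topology counts" errors above.

package topology

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/types"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// trackTopologyCounts is a hypothetical stand-in for Karpenter's topology
// tracking: it counts scheduled pods per zone by looking up each pod's Node.
// If any pod still references a deleted Node (a leaked pod), the Get fails
// and the error aborts scheduler construction entirely.
func trackTopologyCounts(ctx context.Context, kubeClient client.Client, pods []corev1.Pod) (map[string]int, error) {
	counts := map[string]int{}
	for _, pod := range pods {
		if pod.Spec.NodeName == "" {
			continue // unscheduled pods have no topology domain yet
		}
		node := &corev1.Node{}
		if err := kubeClient.Get(ctx, types.NamespacedName{Name: pod.Spec.NodeName}, node); err != nil {
			// Matches the observed logs: "getting node ..., Node ... not found"
			return nil, fmt.Errorf("getting node %s, %w", pod.Spec.NodeName, err)
		}
		counts[node.Labels[corev1.LabelTopologyZone]]++
	}
	return counts, nil
}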

There are two paths forward:

  1. [Short Term] I can make a change to our topology logic to ignore pods whose node cannot be found -- this will prevent the process from locking up in this edge case (sketched below).
  2. [Longer Term] We want to address this holistically via Mega Issue: Node Disruption Lifecycle Taints #624, which changes how nodes are deregistered from the API server so that pods are not leaked.
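A sketch of what the short-term fix (option 1) could look like, under the same assumptions as the sketch above: treat a NotFound error on the node lookup as "ignore this pod" rather than failing the whole pass.

// Replacement for the node lookup in the sketch above. Requires
// apierrors "k8s.io/apimachinery/pkg/api/errors" in the imports.
node := &corev1.Node{}
if err := kubeClient.Get(ctx, types.NamespacedName{Name: pod.Spec.NodeName}, node); err != nil {
	if apierrors.IsNotFound(err) {
		// The pod's node is gone (leaked pod awaiting garbage collection);
		// skip it instead of blocking all provisioning.
		continue
	}
	return nil, fmt.Errorf("getting node %s, %w", pod.Spec.NodeName, err)
}
counts[node.Labels[corev1.LabelTopologyZone]]++

The deployment below is the reproduction: it combines a zonal topology spread with a toleration for node.kubernetes.io/unschedulable, which is exactly the combination described in steps 1 and 4 above.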
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inflate
spec:
  replicas: 0
  selector:
    matchLabels:
      app: inflate
  template:
    metadata:
      labels:
        app: inflate
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: inflate
      tolerations:
      - key: node.kubernetes.io/unschedulable
        effect: "NoSchedule"
      terminationGracePeriodSeconds: 0
      containers:
        - name: inflate
          image: public.ecr.aws/eks-distro/kubernetes/pause:3.7
          resources:
            requests:
              cpu: 1
EOF
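Note on the manifest: the topologySpreadConstraints block is what forces Karpenter to consult other inflate pods (and therefore their Nodes) when scheduling, and the unschedulable toleration is what keeps those pods from being drained when their node is terminated. replicas is 0 here, so presumably the deployment is scaled up after applying it in order to drive provisioning.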

Versions:

  • Chart Version: v0.31.1
  • Kubernetes Version (kubectl version): 1.27
ellistarn added the kind/bug label on Oct 26, 2023
ellistarn self-assigned this on Oct 27, 2023