Leaked pods with topology blocks provisioning #640

Closed
ellistarn opened this issue Oct 26, 2023 · 0 comments
Labels: kind/bug (Categorizes issue or PR as related to a bug.)
Description

Observed Behavior:

2023-10-26T16:47:09.627Z	ERROR	controller.provisioner	creating scheduler, tracking topology counts, getting node ip-192-168-128-105.us-west-2.compute.internal, Node "ip-192-168-128-105.us-west-2.compute.internal" not found	{"commit": "2012cf9"}
2023-10-26T16:47:19.627Z	ERROR	controller.provisioner	creating scheduler, tracking topology counts, getting node ip-192-168-128-105.us-west-2.compute.internal, Node "ip-192-168-128-105.us-west-2.compute.internal" not found	{"commit": "2012cf9"}
2023-10-26T16:47:29.628Z	ERROR	controller.provisioner	creating scheduler, tracking topology counts, getting node ip-192-168-128-105.us-west-2.compute.internal, Node "ip-192-168-128-105.us-west-2.compute.internal" not found	{"commit": "2012cf9"}

Expected Behavior:

Provisioning should not block on pods awaiting garbage collection.

Reproduction Steps (Please include YAML):

Here's what's happening:

  1. Karpenter looks at other pods during scheduling if pod topology spread, pod affinity, or pod anti-affinity is defined.
  2. We retrieve the Node for those pods using pod.Spec.NodeName.
  3. However, if the Node does not exist, Karpenter errors out and retries (see the sketch after this list).
  4. Karpenter attempts to drain pods from Nodes when they are terminated; however, pods that tolerate NoSchedule/NoExecute cannot be drained, as they would immediately reschedule after being evicted.
  5. Karpenter then deletes the EC2 instance and the corresponding Node object.
  6. Any pods that were on that node and could not be drained are leaked: they sit in the API server until they are garbage collected by the Kube Controller Manager.
  7. In the meantime, Karpenter will fail to schedule, since the node cannot be found.
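For illustration, here is a minimal Go sketch of the pattern in steps 2-3 and 7 (not Karpenter's actual code; the function name and client wiring are assumptions): the topology-counting step resolves each pod's node via pod.Spec.NodeName, and a single missing Node fails the whole pass, which is what produces the repeating "creating scheduler, tracking topology counts" errors above.

package topology

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/types"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// trackTopologyCounts is a hypothetical stand-in for Karpenter's topology
// tracking: it counts scheduled pods per zone by looking up each pod's Node.
// If any pod still references a deleted Node (a leaked pod), the Get fails
// and the error aborts scheduler construction entirely.
func trackTopologyCounts(ctx context.Context, kubeClient client.Client, pods []corev1.Pod) (map[string]int, error) {
	counts := map[string]int{}
	for _, pod := range pods {
		if pod.Spec.NodeName == "" {
			continue // unscheduled pods have no topology domain yet
		}
		node := &corev1.Node{}
		if err := kubeClient.Get(ctx, types.NamespacedName{Name: pod.Spec.NodeName}, node); err != nil {
			// Matches the observed logs: "getting node ..., Node ... not found"
			return nil, fmt.Errorf("getting node %s, %w", pod.Spec.NodeName, err)
		}
		counts[node.Labels[corev1.LabelTopologyZone]]++
	}
	return counts, nil
}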

There are two paths forward:

  1. [Short Term] I can make a change to our topology logic to ignore pods whose node cannot be found -- this will prevent the process from locking up in this edge case (sketched below).
  2. [Longer Term] We want to address this holistically via Mega Issue: Node Disruption Lifecycle Taints #624, which changes how nodes are deregistered from the API server so that pods are not leaked.
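A sketch of what the short-term fix (option 1) could look like, under the same assumptions as the sketch above: treat a NotFound error on the node lookup as "ignore this pod" rather than failing the whole pass.

// Replacement for the node lookup in the sketch above. Requires
// apierrors "k8s.io/apimachinery/pkg/api/errors" in the imports.
node := &corev1.Node{}
if err := kubeClient.Get(ctx, types.NamespacedName{Name: pod.Spec.NodeName}, node); err != nil {
	if apierrors.IsNotFound(err) {
		// The pod's node is gone (leaked pod awaiting garbage collection);
		// skip it instead of blocking all provisioning.
		continue
	}
	return nil, fmt.Errorf("getting node %s, %w", pod.Spec.NodeName, err)
}
counts[node.Labels[corev1.LabelTopologyZone]]++

The deployment below is the reproduction: it combines a zonal topology spread with a toleration for node.kubernetes.io/unschedulable, which is exactly the combination described in steps 1 and 4 above.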
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inflate
spec:
  replicas: 0
  selector:
    matchLabels:
      app: inflate
  template:
    metadata:
      labels:
        app: inflate
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: inflate
      tolerations:
      - key: node.kubernetes.io/unschedulable
        effect: "NoSchedule"
      terminationGracePeriodSeconds: 0
      containers:
        - name: inflate
          image: public.ecr.aws/eks-distro/kubernetes/pause:3.7
          resources:
            requests:
              cpu: 1
EOF
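Note on the manifest: the topologySpreadConstraints block is what forces Karpenter to consult other inflate pods (and therefore their Nodes) when scheduling, and the unschedulable toleration is what keeps those pods from being drained when their node is terminated. replicas is 0 here, so presumably the deployment is scaled up after applying it in order to drive provisioning.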

Versions:

  • Chart Version: v0.31.1
  • Kubernetes Version (kubectl version): 1.27
ellistarn added the kind/bug label on Oct 26, 2023
ellistarn self-assigned this on Oct 27, 2023