Add a detailed metric for deprovisioning eligible nodes #695

Open
Tracked by #1051
hamishforbes opened this issue Jun 12, 2023 · 1 comment
Labels
deprovisioning Issues related to node deprovisioning help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/feature Categorizes issue or PR as related to a new feature. metrics-audit

Comments

@hamishforbes

Tell us about your request

Currently, Karpenter exposes karpenter_deprovisioning_eligible_machines as a total count of nodes that are eligible for deprovisioning, broken down by deprovisioner type (e.g. consolidation/emptiness).

This is good, I can easily tell that a cluster has nodes that could be deprovisioned.

What's missing is:
a) Which nodes?
b) What's blocking deprovisioning (if anything)?

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?

Ideally I want a dashboard and/or alert that says "Hey there are x nodes in your cluster that could be deprovisioned but aren't because of Y"

I can choose to ignore this if the reason is that they are spot nodes, or that replacing them would take more nodes, etc.

But I could also easily see that manually migrating a do-not-evict pod would free up a node to be consolidated.

Are you currently working around this issue?

Manually going through every node (with some educated guessing based on node resource consumption) and running kubectl describe to see why that node is or is not consolidatable.
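
Part of this manual check can be scripted. The sketch below groups do-not-evict pods by node from the JSON that `kubectl get pods --all-namespaces -o json` prints. It assumes the pod annotation key is `karpenter.sh/do-not-evict` with the value `"true"`; check that against the Karpenter version in use. It only covers the do-not-evict case, not PDBs or other blockers.

```python
from collections import defaultdict

# Assumed annotation key; verify against your Karpenter version.
DO_NOT_EVICT = "karpenter.sh/do-not-evict"


def do_not_evict_pods_by_node(pod_list):
    """Map node name -> ["namespace/pod", ...] for annotated pods.

    `pod_list` is the parsed JSON of a Kubernetes PodList, e.g. the
    output of `kubectl get pods --all-namespaces -o json`.
    """
    blocked = defaultdict(list)
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        annotations = meta.get("annotations") or {}
        node = pod.get("spec", {}).get("nodeName")
        # Skip unscheduled pods and pods without the annotation.
        if node and annotations.get(DO_NOT_EVICT) == "true":
            blocked[node].append(f'{meta.get("namespace", "default")}/{meta.get("name")}')
    return dict(blocked)
```

To use it, load the kubectl output with `json.load` and pass the result to `do_not_evict_pods_by_node`; any node that appears in the returned mapping has at least one pod that would need to be migrated first.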

Additional Context

A label on the node would be helpful too, or instead. My preference would be for a metric, but if you could at least do kubectl get nodes -l karpenter.sh/deprovisioning-eligible, that would be an improvement!

Attachments

No response

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
@hamishforbes hamishforbes added the kind/feature Categorizes issue or PR as related to a new feature. label Jun 12, 2023
@njtran njtran added deprovisioning Issues related to node deprovisioning metrics-audit labels Jun 12, 2023
@njtran
Contributor

njtran commented Aug 14, 2023

Hey, we're open to this if you want to contribute! Here are my thoughts. Feel free to reach out to me on Slack if you need help.

a) Which nodes?
b) What's blocking deprovisioning (if anything)?

a) If we wanted to expose this per node, we would probably surface it through a node-state metric, similar to the pod-state metrics.
b) Karpenter surfaces what's blocking deprovisioning through events. Can you possibly monitor the events in your cluster? Some of the reasons that deprovisioning is blocked depend on the pods on the node. This could come from things like PDBs or do-not-evict pods, where the blocked condition can change at any second if one of the pods/PDBs changes. I worry about the CPU cost of getting this right, so we'd need to do some profiling as well to make sure there's no large impact.

Specifically for "Hey there are x nodes in your cluster that could be deprovisioned but aren't because of Y", it's also important to note that it may take a while to safely deprovision all the nodes that are eligible, which is a bit different from deprovisioning being blocked.
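
As a rough illustration of the event-based approach in (b), the sketch below keeps the most recent blocking message per node from the JSON that `kubectl get events --all-namespaces -o json` prints. The fields it reads (`involvedObject`, `reason`, `message`, `lastTimestamp`, `eventTime`) are part of the core Kubernetes Event object. The default reason string `DeprovisioningBlocked` is an assumption about what Karpenter emits and is left as a parameter. Since events are point-in-time, the result can be stale the moment a pod or PDB changes, as noted above.

```python
def blocked_reasons_by_node(event_list, reason="DeprovisioningBlocked"):
    """Map node name -> most recent event message with the given reason.

    `event_list` is the parsed JSON of a Kubernetes EventList.
    `reason` is an assumed event reason; adjust it to what your
    Karpenter version actually emits.
    """
    latest = {}
    for ev in event_list.get("items", []):
        obj = ev.get("involvedObject", {})
        if obj.get("kind") != "Node" or ev.get("reason") != reason:
            continue
        # RFC 3339 UTC timestamps of the same precision sort lexicographically.
        ts = ev.get("lastTimestamp") or ev.get("eventTime") or ""
        name = obj.get("name")
        if name not in latest or ts >= latest[name][0]:
            latest[name] = (ts, ev.get("message", ""))
    return {name: message for name, (_, message) in latest.items()}
```

A node that is eligible but simply has not been processed yet would not show up here at all, which matches the distinction between "blocked" and "not yet deprovisioned".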

@njtran njtran transferred this issue from aws/karpenter-provider-aws Nov 2, 2023
@k8s-ci-robot k8s-ci-robot added help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. and removed help-wanted labels Nov 22, 2023