Karpenter consolidation replaces the node with exact same node (EC2 instance) type #4826
Comments
As far as cordoning and draining of selected nodes for disruption are concerned, it will be well handled once issue kubernetes-sigs/karpenter#624 is solved.
You're using an unreleased version of the beta?
Can you provide some logs?
@badrish-s does your t4g.nano node have any node memory pressure taints?
This issue has been inactive for 14 days. StaleBot will close this stale issue after 14 more days of inactivity.
Apologies. I was on vacation and didn't get a chance to respond to this issue earlier. I am back now, and I did try to set up Karpenter v1beta1 freshly (using the instructions) and applied the same set of nodepools, nodeclass and deployments. I am unable to reproduce the issue now, i.e. the node is NOT being replaced on a continuous basis with the exact same node type. I'd like to mention that the KARPENTER VERSION I am currently using is the latest, i.e. v0.32.2 - this was different when I originally tested and reported this issue in October 2023, when I was testing with an internal-only image - v0-2012cf98c2e2e9625e858842c9f2d177efb0c364. I believe I did something incorrect earlier, or the issue has been addressed in the latest version v0.32.2. GitHub Actions has closed this issue due to inactivity and I will let it remain that way until I see this again (hopefully never). Thanks for looking into this!
Sounds good @badrish-s. Glad to hear that the issue appears to be resolved on the latest version!
Description
Observed Behavior:
Karpenter consolidation replaces the node with the exact same node (EC2 instance) type when `spec.disruption.consolidationPolicy: WhenUnderutilized` is set. Also, `eks-node-viewer` doesn't show the node to be deleted/replaced as "Cordoned" - this was at least the behaviour observed during consolidation with earlier versions of Karpenter.

Expected Behavior:
My understanding is that consolidation should kick in under the following situations for OnDemand instance types:
However, I noticed the "Replace node" action happens whenever Karpenter finds the node is underutilized - the node is replaced on a continuous basis and with the exact same node type. In my case a t4g.nano was replaced with a t4g.nano; the replacement node was not more efficient than the original node in any way, rather exactly the same. This behaviour made me think the replacement is happening based on utilization only.
Also, the node to be deleted/replaced should be cordoned first, then drained, and deleted only after its pods are placed onto the new node.
Reproduction Steps:
NodeClass.yaml (karpenter-demo is my Cluster name):
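The original manifest was attached to the issue; a minimal EC2NodeClass sketch along these lines (the role name and discovery tag values are assumptions, not the reporter's actual configuration) would look like:

```yaml
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiFamily: AL2
  # Assumed node role name; substitute the role created for your cluster
  role: "KarpenterNodeRole-karpenter-demo"
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: "karpenter-demo"
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: "karpenter-demo"
```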
Nodepool.yaml
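The reporter's actual NodePool was also attached to the issue; a minimal sketch showing the relevant `disruption` block (the requirements below are illustrative assumptions only) might look like:

```yaml
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        # Illustrative requirements; the nodes in question were arm64 on-demand (t4g.nano)
        - key: kubernetes.io/arch
          operator: In
          values: ["arm64"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
      nodeClassRef:
        name: default
  disruption:
    # The consolidation policy under discussion in this issue
    consolidationPolicy: WhenUnderutilized
```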
Deployment.yaml
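The actual Deployment was likewise attached; a small workload along the lines of the standard Karpenter "inflate" example (replica count and resource requests here are assumptions) is enough to exercise consolidation:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inflate
spec:
  replicas: 1
  selector:
    matchLabels:
      app: inflate
  template:
    metadata:
      labels:
        app: inflate
    spec:
      containers:
        - name: inflate
          # Pause container used only to generate resource requests
          image: public.ecr.aws/eks-distro/kubernetes/pause:3.7
          resources:
            requests:
              cpu: 100m
```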
Screens from eks-node-viewer:
`ip-192-168-151-16.us-west-2.compute.internal` (t4g.nano) is being consolidated (because it is underutilized?) and replaced with `ip-192-168-24-218.us-west-2.compute.internal` (again, t4g.nano). After some time, `ip-192-168-24-218.us-west-2.compute.internal` will again be replaced with another t4g.nano instance, and the cycle repeats continuously.

Additionally, unlike earlier versions of Karpenter, `eks-node-viewer` doesn't show the node to be replaced as "Cordoned". Since the logs were rotating fast, it was hard to check whether pods were being gracefully moved to the new node.

Do I have something misconfigured in the NodePool, NodeClass, or Deployment manifest? Or is this the expected consolidation behaviour in v1beta1 that needs additional configuration to make it work as expected? If there are no misconfigurations or additional configurations to control this, then this is a potential bug that needs attention.
Versions:
Kubernetes Version (`kubectl version`):