
docs: add karpenter preferNoSchedule taint design #649

Closed
wants to merge 1 commit into from

Conversation

sadath-12
Contributor

@sadath-12 sadath-12 commented Oct 31, 2023

Fixes #624, Part 2 (Candidate: PreferNoSchedule Taint)

Description

This design document describes our current issues with the disruption and cordoning of nodes, as a basis for a future implementation.

How was this change tested?

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Contributor

Thanks for the doc. Overall, the doc is missing some nuance and a comprehensive overview of how this affects schedulability. It would be good to explore taint usage in Kubernetes and how users might set pod tolerations to address these cases. On top of that, I think you could benefit from refreshing yourself on the disruption and scheduling docs.

Some high level feedback:

  1. I'm not sure this tackles the core reason we want the PreferNoSchedule taint, or identifies the key scenarios. We already use a karpenter.sh/disruption:NoSchedule=disrupting taint, which should cover Scenario 1.
  2. Scenario 2 is a bit confusing: the do-not-evict annotation applies to pods, not nodes, and the scenario doesn't address the nuance of why PreferNoSchedule is needed. (Quick note: for the v1beta1 APIs you should refer to it as do-not-disrupt.)
  3. I could be misunderstanding the cases, but you should explore the overarching scenarios of how a NoSchedule taint affects a cluster versus a PreferNoSchedule taint. Imagine a set of nodes that are all Drifted: how does that affect schedulability? How does it affect churn and disruption limits? What's the expected behavior?
  4. I'm not sure you need all of the sections here. Think critically about what the format of the doc should be and how it should drive the discussion we'll ultimately have in the working group.
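For reference, the taint named in point 1 and a pod toleration for it could be sketched as follows (the taint key, value, and effect come from the comment above; the node name, pod name, and image are illustrative assumptions):

```yaml
# Taint Karpenter applies to a node it is disrupting
# (key/value/effect as referenced in point 1; node name is illustrative)
apiVersion: v1
kind: Node
metadata:
  name: example-node
spec:
  taints:
    - key: karpenter.sh/disruption
      value: disrupting
      effect: NoSchedule
---
# A pod that explicitly tolerates the taint and may still be
# scheduled on the disrupting node
apiVersion: v1
kind: Pod
metadata:
  name: tolerant-pod
spec:
  tolerations:
    - key: karpenter.sh/disruption
      operator: Equal
      value: disrupting
      effect: NoSchedule
  containers:
    - name: app
      image: public.ecr.aws/docker/library/busybox:latest
```

With a hard NoSchedule effect, any pod lacking this toleration is excluded from the node entirely, which is the behavior the PreferNoSchedule discussion below contrasts against.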

Contributor Author

Honestly, I think NoSchedule should cover most of our use cases. As for the do-not-disrupt annotation on pods: if I understand correctly, a node running a pod with that annotation won't be disrupted. If that happens to be a large node we want to take down, it will be stuck until the user removes that pod manually, and in the meantime some non-critical pods that could bear the churn could still be scheduled there. That's what I thought.


This PR has been inactive for 14 days. StaleBot will close this stale PR after 14 more days of inactivity.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 19, 2023
@jonathan-innis jonathan-innis added needs-design and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Nov 27, 2023
@Legion2

Legion2 commented Dec 5, 2023

Let me add a use case for the preferNoSchedule:
Think about a node pool specifically for jobs (short- or long-running). Some jobs should not be disrupted, so they carry the do-not-evict annotation. This causes big nodes to survive for a long time with just a few small jobs running on them, because Karpenter cannot remove a node while it cannot evict those pods. The jobs will finish eventually, and then the underutilized nodes should be removed. In the meantime, however, new jobs are created and their pods are scheduled onto these nearly empty nodes, because the default scheduler tries to distribute workloads so that all nodes are evenly utilized.
Here the preferNoSchedule taint comes into play: if Karpenter decides that the node pool is underutilized, it tries to find nodes it can remove. It cannot directly remove a node, because every node still has some jobs that are not evictable, but it can taint some nodes and thereby steer the scheduling of new jobs/pods away from them, which eventually leads to empty nodes that can be removed.
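The protected jobs in this use case could be sketched like this (the annotation key karpenter.sh/do-not-evict matches the v1alpha5 API discussed in this thread; the Job name, image, and command are illustrative assumptions):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: protected-job
spec:
  template:
    metadata:
      annotations:
        # Tells Karpenter not to voluntarily disrupt the node
        # while this pod is still running
        karpenter.sh/do-not-evict: "true"
    spec:
      restartPolicy: Never
      containers:
        - name: worker
          image: public.ecr.aws/docker/library/busybox:latest
          command: ["sh", "-c", "sleep 3600"]
```

While any such pod runs on a node, the node survives; a soft taint would keep new jobs off the node so it can eventually drain empty.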

Also, preferNoSchedule is only a preference expressed to the scheduler; it does not block a pod from being scheduled, it only influences where it is scheduled. If there is no capacity on any other node, pods will still be scheduled on nodes with the preferNoSchedule taint.
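A soft taint as described above might look like this (a sketch; the taint key and value follow the karpenter.sh/disruption convention mentioned earlier in the thread, and the node name is illustrative; only the PreferNoSchedule effect is the point):

```yaml
apiVersion: v1
kind: Node
metadata:
  name: underutilized-node
spec:
  taints:
    # Soft taint: the scheduler avoids this node when it can,
    # but will still place pods here if no other capacity exists.
    - key: karpenter.sh/disruption
      value: disrupting
      effect: PreferNoSchedule
```

Unlike NoSchedule, no pod toleration is required as a fallback, which is why this effect suits the "drain gradually without blocking scheduling" use case.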

@sadath-12 sadath-12 closed this Jan 9, 2024

Successfully merging this pull request may close these issues.

Mega Issue: Node Disruption Lifecycle Taints
4 participants