docs: add karpenter preferNoSchedule taint design #649
Conversation
Signed-off-by: sadath-12 <[email protected]>
Thanks for the doc. Overall, the doc is missing some nuance and a comprehensive overview of how this affects schedulability. I think it'd be good to explore taint usage in Kubernetes and how users might set pod tolerations to address these cases. On top of that, I think you could benefit from refreshing yourself on the disruption and scheduling docs.
Some high level feedback:
- I'm not sure this tackles the core reason we want the PreferNoSchedule taint or captures the key scenarios. We already apply a `karpenter.sh/disruption:NoSchedule=disrupting` taint, which should cover Scenario 1.
- Scenario 2 is a bit confusing: it seems you're misunderstanding that the `do-not-evict` annotation applies to pods, not nodes, and the doc doesn't properly address the nuance of why PreferNoSchedule is needed. (Quick note: for the v1beta1 APIs you should refer to it as `do-not-disrupt`.)
- In the cases, I could be misunderstanding, but you should explore the overarching scenarios of how a NoSchedule taint affects a cluster versus PreferNoSchedule. Imagine a set of nodes that are all Drifted: how does that affect schedulability? How does it affect churn and disruption limits? What's the expected behavior here?
- I'm not sure all of the sections here are needed. Think critically about what the format of the doc should be, and how it should drive the discussion we'll ultimately have in the working group.
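For reference, the existing disruption taint mentioned above, and a toleration that opts a pod back into scheduling on a disrupting node, might look like the following sketch (pod name, container name, and image are illustrative, not from the design):

```yaml
# Taint Karpenter applies to candidate nodes during disruption (v1beta1):
apiVersion: v1
kind: Node
metadata:
  name: example-node          # illustrative
spec:
  taints:
    - key: karpenter.sh/disruption
      value: disrupting
      effect: NoSchedule
---
# A pod that explicitly tolerates that taint and can still land on the node:
apiVersion: v1
kind: Pod
metadata:
  name: churn-tolerant        # illustrative
spec:
  tolerations:
    - key: karpenter.sh/disruption
      operator: Equal
      value: disrupting
      effect: NoSchedule
  containers:
    - name: app
      image: nginx            # illustrative
```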
Honestly, I think NoSchedule should cover most of our use cases. As for the `do-not-disrupt` annotation on pods: if I understand correctly, a node hosting a pod with that annotation won't be disrupted. If that node is a larger one we want to take down, it will be stuck because of that single pod until the user removes it manually; in the meantime, non-critical pods that could tolerate the churn could still be scheduled there. That's what I thought.
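The annotation being discussed is set on the pod, not the node. A minimal sketch (pod name, container, and image are illustrative):

```yaml
# Pod carrying the v1beta1 do-not-disrupt annotation; while this pod runs
# on a node, Karpenter will not voluntarily disrupt that node.
apiVersion: v1
kind: Pod
metadata:
  name: critical-job          # illustrative
  annotations:
    karpenter.sh/do-not-disrupt: "true"
spec:
  containers:
    - name: worker            # illustrative
      image: busybox
      command: ["sleep", "3600"]
```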
This PR has been inactive for 14 days. StaleBot will close this stale PR after 14 more days of inactivity.
Let me add a use case for PreferNoSchedule. PreferNoSchedule is only a preference expressed to the scheduler; it does not block a pod from being scheduled, it merely influences where the pod is scheduled. If there is no capacity on any other node, pods will still be scheduled on nodes with the PreferNoSchedule taint.
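That soft behavior is expressed simply by the taint's effect. A sketch of what the proposed taint might look like on a candidate node (node name is illustrative; the key/value mirror the existing disruption taint, which is an assumption about how the design would reuse them):

```yaml
# PreferNoSchedule is a soft taint: the scheduler avoids this node when
# other capacity exists, but still places pods here as a last resort.
apiVersion: v1
kind: Node
metadata:
  name: example-node          # illustrative
spec:
  taints:
    - key: karpenter.sh/disruption
      value: disrupting
      effect: PreferNoSchedule
```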
Fixes #624, Part 2 (Candidate: PreferNoSchedule taint)
Description
This design documents our current issues with disruption and cordoning of nodes, as groundwork for a future implementation.
How was this change tested?
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.