
Make consolidation configurable #1209

Open

hgambarian opened this issue Apr 25, 2024 · 8 comments
Labels: kind/feature · triage/accepted

Comments

@hgambarian commented Apr 25, 2024

Description

What problem are you trying to solve?
Currently, when a node is underutilized or consolidable, the controller cordons the node and drains it immediately. But when a pod has karpenter.sh/do-not-evict: true, the disruption is blocked.

I want something like the following (see the sketch after this list):

  • If the node is underutilized, cordon it
  • When the running jobs finish and complete, the node becomes empty
  • Once the node is empty, drain and terminate it
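
For context, a minimal sketch of how such a workload is annotated today (the Job name, image, and command are placeholders). The annotation blocks voluntary disruption while the pod runs, but nothing cordons the node, so new pods can still be scheduled onto it:

```yaml
# Illustrative batch Job whose pods should not be disrupted until they complete.
apiVersion: batch/v1
kind: Job
metadata:
  name: example-batch-job                     # placeholder name
spec:
  template:
    metadata:
      annotations:
        karpenter.sh/do-not-evict: "true"     # blocks voluntary disruption of this pod
    spec:
      restartPolicy: Never
      containers:
        - name: worker
          image: busybox                      # placeholder image
          command: ["sh", "-c", "run-the-batch-work"]   # placeholder command
```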

How important is this feature to you?

  • Huge cost optimization
  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
@hgambarian hgambarian added kind/feature Categorizes issue or PR as related to a new feature. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Apr 25, 2024
@jonathan-innis (Member)

Does #624's suggestion to use a PreferNoSchedule taint for consolidation, combined with #916, solve this issue? With both of these features we would: 1) ignore the karpenter.sh/do-not-disrupt annotation when considering candidates and only respect it during the drain operation, and 2) taint the node as PreferNoSchedule when we see that it is underutilized, which should cause pods to be pushed away from the node during the consolidateAfter timeframe.
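
A rough sketch of how those two pieces might look together. The disruption block follows the karpenter.sh/v1 NodePool schema (other required NodePool fields omitted), while the taint is hypothetical, only illustrating the PreferNoSchedule behavior proposed in #624:

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 30m        # window during which pods could drift off a candidate node
---
# Hypothetical taint Karpenter could place on an underutilized candidate node:
# new pods prefer other nodes, while existing do-not-disrupt pods keep running.
apiVersion: v1
kind: Node
metadata:
  name: ip-10-0-0-1.example      # illustrative node name
spec:
  taints:
    - key: karpenter.sh/underutilized    # hypothetical key, for illustration only
      effect: PreferNoSchedule
```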

@jonathan-innis (Member)

/assign jonathan-innis

@hgambarian (Author)

> Does #624's suggestion to use a PreferNoSchedule taint for consolidation, combined with #916, solve this issue? With both of these features we would: 1) ignore the karpenter.sh/do-not-disrupt annotation when considering candidates and only respect it during the drain operation, and 2) taint the node as PreferNoSchedule when we see that it is underutilized, which should cause pods to be pushed away from the node during the consolidateAfter timeframe.

Thanks, looks like this is what I want.

@eden881 commented Aug 5, 2024

Our team needs exactly that. We use our cluster to run a mix of workload types: some are services that can be evicted normally, and some are processes (such as Argo Workflows pods) that should not be interrupted if possible.
We annotate the process pods with karpenter.sh/do-not-disrupt, and it blocks consolidation as it should, but as the cluster schedules more process pods, they land freely on the very nodes that should otherwise be disrupted.

If Karpenter could preemptively cordon a node that it wants to disrupt and wait for pods annotated with karpenter.sh/do-not-disrupt to finish, it would be a huge cost saver and a game-changing feature for us.
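
For reference, a minimal sketch of how such process pods might be annotated via an Argo Workflow's podMetadata (the workflow name, image, and command are placeholders). The annotation keeps Karpenter from voluntarily disrupting the pods, but nothing stops new workflow pods from landing on nodes that are already consolidation candidates:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: batch-process-              # placeholder name
spec:
  entrypoint: main
  podMetadata:
    annotations:
      karpenter.sh/do-not-disrupt: "true"   # blocks consolidation while these pods run
  templates:
    - name: main
      container:
        image: busybox                      # placeholder image
        command: ["sh", "-c", "run-batch-step"]   # placeholder command
```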

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 3, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Dec 3, 2024
@frittentheke

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Dec 3, 2024
@engedaam (Contributor)

/assign @jmdeal
/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Dec 12, 2024