Add an example for different budget for different disruption reasons #19

Open
wants to merge 2 commits into
base: main
from

Conversation

InsomniaCoder
Contributor

Issue #, if available:

Description of changes:

  • Add an example of different disruption budgets for different disruption reasons.

I came across this after upgrading to v1, and it is quite useful: we needed to keep disruption strict to limit the blast radius of situations like an AMI update or EKS upgrade, and we noticed that this also restricted consolidation activity.

With this, we can now consolidate more efficiently while keeping the strict policy for updates.

Let me know if it makes sense. If it's not that useful, feel free to close it.

Thank you

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@jakeskyaws
Contributor

Thank you for submitting this contribution! It's great to hear how these blueprints and Karpenter v1 can help solve real-world challenges.

For users to fully benefit from your example, could you share how this was tested and how we could replicate it? Although we are adding this as an example to the README.md, knowing how to validate the configuration would boost users' confidence. Thanks again.

@InsomniaCoder
Contributor Author

I recently applied this new NodePool configuration in production (after finishing the v1 upgrade):

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: multiple-consolidations
spec:
  disruption:
    budgets:
    - nodes: "1"
      reasons:
      - Drifted
    - duration: 14m0s
      nodes: "0"
      reasons:
      - Drifted
      schedule: '*/15 * * * *'
    - nodes: "3"
      reasons:
      - Empty
      - Underutilized
    - duration: 9m0s
      nodes: "0"
      reasons:
      - Empty
      - Underutilized
      schedule: '*/10 * * * *'
    consolidateAfter: 5m0s
    consolidationPolicy: WhenEmptyOrUnderutilized
  template:
    spec:
      expireAfter: 720h
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
      - key: karpenter.k8s.aws/instance-memory
        operator: Gt
        values:
        - "16000"
      - key: karpenter.k8s.aws/instance-category
        operator: In
        values:
        - r
        - m
        - c
      - key: karpenter.k8s.aws/instance-generation
        operator: Gt
        values:
        - "4"
      - key: karpenter.sh/capacity-type
        operator: In
        values:
        - on-demand
      - key: kubernetes.io/arch
        operator: In
        values:
        - amd64
        - arm64
      - key: topology.kubernetes.io/zone
        operator: In
        values:
        - eu-west-1a
        - eu-west-1b
        - eu-west-1c
      - key: kubernetes.io/os
        operator: In
        values:
        - linux
  weight: 1

(Some details, such as taints and selectors, are redacted.)

These metrics show that it behaves as intended. The metric used is karpenter_nodepools_allowed_disruptions.

image

The green line is the Drifted reason: 1 node is allowed every 15 minutes (the window opens at, for example, minutes 44 and 59).

The red and blue lines represent Underutilized and Empty: 3 nodes are allowed every 10 minutes, with windows opening at, for example, minutes 39, 49, and 59.
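To make the timing easier to check, here is a minimal Python sketch of how I read the four budgets. It is not Karpenter's implementation. It assumes that a budget with a schedule is active for its duration from each cron tick, that a budget without a schedule is always active, and that the most restrictive active budget wins for a given reason. The `Budget` class and `allowed_disruptions` function are hypothetical names for illustration, and the sketch ignores nodes that are already being disrupted.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Budget:
    """Illustrative model of one entry under spec.disruption.budgets."""
    nodes: int
    reasons: Tuple[str, ...]
    # N from a '*/N * * * *' schedule; None means the budget is always active.
    every_minutes: Optional[int] = None
    # Length of the active window in whole minutes (e.g. 14 for 14m0s).
    duration_minutes: Optional[int] = None

    def is_active(self, minute_of_hour: int) -> bool:
        if self.every_minutes is None:
            return True
        # Valid because 15 and 10 divide 60 evenly, so '*/N' ticks at multiples of N.
        return minute_of_hour % self.every_minutes < self.duration_minutes


# The four budgets from the NodePool above.
BUDGETS = [
    Budget(nodes=1, reasons=("Drifted",)),
    Budget(nodes=0, reasons=("Drifted",), every_minutes=15, duration_minutes=14),
    Budget(nodes=3, reasons=("Empty", "Underutilized")),
    Budget(nodes=0, reasons=("Empty", "Underutilized"), every_minutes=10, duration_minutes=9),
]


def allowed_disruptions(reason: str, minute_of_hour: int) -> int:
    """Assumed rule: the smallest node count among active budgets for the reason."""
    active = [b.nodes for b in BUDGETS
              if reason in b.reasons and b.is_active(minute_of_hour)]
    return min(active)


def open_minutes(reason: str) -> list:
    """Minutes of the hour in which at least one disruption is allowed."""
    return [m for m in range(60) if allowed_disruptions(reason, m) > 0]


if __name__ == "__main__":
    print("Drifted:", open_minutes("Drifted"))
    print("Underutilized:", open_minutes("Underutilized"))
```

Under these assumptions, the Drifted window opens for one minute at 14, 29, 44, and 59, and the Empty/Underutilized window opens at 9, 19, 29, 39, 49, and 59, which lines up with the minutes mentioned above.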

Let me know if you need me to include these in the doc somehow, or if anything else is needed.

Thank you

@chrismld
Contributor

@InsomniaCoder first of all, THANK YOU so much for not just letting us know these blueprints have been useful for you, but also for contributing. You rock! I have a few recommendations:

  • Can you please make this part of the "Multiple Budgets" section, and move the "Multiple Budgets" section after the "Reasons" section? That way we keep a consistent order and go deeper each time.
  • As Jake suggested (and you already answered), it would be really helpful if you could incorporate what you described here, especially to show the results others will see with this configuration in place.
  • Can you please break down each budget? You're already doing it partially, but it was a bit hard for me to follow. Maybe you can explain the four scenarios, then show the NodePool config, and then the results.
  • Can you also either add a note or make it explicit that a budget means "in a given time frame, at most x nodes can be disrupting at a given moment"?
  • Let's see how long this blueprint ends up being; maybe it will be worth having a dedicated, tested blueprint for this (following Jake's recommendation).

We think that with these contributions the blueprint will end up being even more awesome :)
