Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: scale disruption cost by the node utilization #2028

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

cnmcavoy
Copy link
Contributor

Fixes #N/A

Description
Scales the disruption cost of nodes by their utilization of pod resources. A node with 1 pod and 99% utilization should have a higher disruption cost than a node with 1 pod and 10% utilization.

How was this change tested?
We've been looking at improving utilization of resources in our clusters, and noticed that Karpenter tends to prefer consolidating underutilized nodes with fewer pods rather than nodes with the most wasted resources. It seems like this can happen because the disruptionutils.ReschedulingCost(ctx, pods) produces a value that is roughly equivalent to the pod count (scaled by pod priority class). So nodes with fewer pods and higher utilization will be candidates for consolidation, while underutilized nodes with many smaller pods are not.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: cnmcavoy
Once this PR has been reviewed and has the lgtm label, please assign jonathan-innis for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Feb 24, 2025
@k8s-ci-robot k8s-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Feb 24, 2025
@jonathan-innis
Copy link
Member

cc: @rschalo

@jonathan-innis
Copy link
Member

@rschalo Has been thinking on this problem -- might be worth having a discussion over Slack or in a GH issue to discuss how y'all are both thinking about this

@rschalo
Copy link
Contributor

rschalo commented Feb 25, 2025

Would love to chat via Slack @cnmcavoy

Signed-off-by: Cameron McAvoy <cmcavoy@indeed.com>
@cnmcavoy cnmcavoy force-pushed the disruption-utilization branch from 0374e0d to 9c0d57a Compare February 27, 2025 23:06
@coveralls
Copy link

Pull Request Test Coverage Report for Build 13577616476

Details

  • 12 of 12 (100.0%) changed or added relevant lines in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.09%) to 81.731%

Totals Coverage Status
Change from base Build 13548001391: 0.09%
Covered Lines: 9493
Relevant Lines: 11615

💛 - Coveralls

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants