feat: scale disruption cost by the node utilization #2028
Fixes #N/A
Description
Scales the disruption cost of nodes by their utilization of pod resources. A node with 1 pod and 99% utilization should have a higher disruption cost than a node with 1 pod and 10% utilization.
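The intent can be sketched as below. This is a hypothetical simplification, not the PR's actual diff: `reschedulingCost` and `disruptionCost` are illustrative helper names, and the `1 + utilization` multiplier is just one plausible way to make a fuller node more expensive to disrupt.

```go
package main

import "fmt"

// reschedulingCost approximates the existing per-pod cost: roughly one
// unit per pod (in Karpenter it is also scaled by pod priority class).
func reschedulingCost(podCount int) float64 {
	return float64(podCount)
}

// disruptionCost scales the base rescheduling cost by the node's
// resource utilization (0.0–1.0), so a nearly full node costs more to
// disrupt than an underutilized node with the same number of pods.
func disruptionCost(podCount int, utilization float64) float64 {
	return reschedulingCost(podCount) * (1.0 + utilization)
}

func main() {
	fmt.Printf("1 pod, 99%% utilized: %.2f\n", disruptionCost(1, 0.99))
	fmt.Printf("1 pod, 10%% utilized: %.2f\n", disruptionCost(1, 0.10))
}
```

With this scaling, the 99%-utilized node gets roughly twice the disruption cost of the 10%-utilized one, so consolidation prefers to reclaim the emptier node first.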
How was this change tested?
We've been looking at improving resource utilization in our clusters, and noticed that Karpenter tends to prefer consolidating underutilized nodes with fewer pods rather than nodes with the most wasted resources. This can happen because `disruptionutils.ReschedulingCost(ctx, pods)` produces a value that is roughly equivalent to the pod count (scaled by pod priority class). As a result, nodes with fewer pods and higher utilization become candidates for consolidation, while underutilized nodes with many smaller pods do not.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
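The mis-ordering described above can be shown with a small sketch. The `node` struct and helper here are hypothetical simplifications, not Karpenter's real types; `reschedulingCost` stands in for the roughly-per-pod behavior of `disruptionutils.ReschedulingCost`.

```go
package main

import "fmt"

// node is a simplified view of a consolidation candidate.
type node struct {
	name        string
	podCount    int
	utilization float64 // fraction of allocatable resources requested
}

// reschedulingCost mimics a cost that is roughly the pod count,
// ignoring how full the node actually is.
func reschedulingCost(n node) float64 {
	return float64(n.podCount)
}

func main() {
	busy := node{"busy", 1, 0.99} // one large pod, nearly full
	idle := node{"idle", 10, 0.10} // many small pods, mostly wasted

	// With a pure pod-count cost, the nearly full node looks like the
	// cheaper consolidation candidate, even though consolidating the
	// idle node would reclaim far more wasted capacity.
	fmt.Println(reschedulingCost(busy) < reschedulingCost(idle)) // true
}
```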