Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: decrease minimum allowed size for OsDiskSizeGB from 100 to 30 #688

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

bongi23
Copy link

@bongi23 bongi23 commented Feb 10, 2025

Fixes #

Description
Karpenter supports deploying linux only instances and the minimum disk size allowed for these is 30GB. The changes I am proposing allow Karpenter to request VMs with a minimum OSDiskSize of 30GB instead of 100GB

How was this change tested?
Manually modified AKSNodeClass CRD on my company staging cluster and let karpenter deploy successfully a VM with OSDiskSize less than 100
*

Does this change impact docs?

  • Yes, PR includes docs updates
  • Yes, issue opened: #
  • No

Release Note


@bongi23
Copy link
Author

bongi23 commented Feb 10, 2025

@microsoft-github-policy-service agree

@bongi23 bongi23 marked this pull request as ready for review February 10, 2025 18:00
@Bryce-Soghigian
Copy link
Collaborator

We experience many performance incidents regarding disk sizes smaller than 100gb, can you provide more justification as to why you want to reduce the limit?

@alexmaurizio
Copy link

alexmaurizio commented Feb 11, 2025

We experience many performance incidents regarding disk sizes smaller than 100gb, can you provide more justification as to why you want to reduce the limit?

One example is running a staging environment (like we do, we run two mirror kubernetes for staging and production) where availability is not crucial and we can tolerate destroying a node if any problem arises.

A 100GB disk will allocate a Premium SSD class P10 (128GB) by default on certain Azure tiers, which is 4 times as expensive as a P4 (32GB) disk. There are a lot of instances which do not have (and we don't plan to use) ephemereal disk, or as everyone calls it, local NVME storage. Managed disks are enough and having 5 inexpensive instances running 128GB P10 disks can sometimes cost more in storage than in machines, which is not the smartest of the choices in my opinion.

The default should stay the same, just allow a configuration for users.

PLUS, Azure mandates a 100GB minimum storage for Windows machines, but a 30GB minimum storage for Ubuntu machines.
This reflects this mandated behaviour, not sticking to Windows machines only.

(Just as a comparison, our two other cloud providers runs the same workloads on total 8GB OS disk for each node, or 20GB just for having more room for docker images pull)

@bongi23
Copy link
Author

bongi23 commented Feb 11, 2025

We experience many performance incidents regarding disk sizes smaller than 100gb, can you provide more justification as to why you want to reduce the limit?

What kind of performance incidents? Are they related to disk performance?
In addition to what said by @alexmaurizio, our microservices are stateless (and I think this is a very common scenario) and disk performance are negligible. In any case, I should be free to save money and accept bad performance on staging environments at least 😄

@lmoretto
Copy link

lmoretto commented Feb 11, 2025

+1 for this. The client should have the freedom (and indeed it's the client's responsibility) to define their infrastructure requirements, without facing such opinionated defaults and constraints

@sgtraverso
Copy link

Agree with this. Let the user define infrastructure requirements, pls.

@bongi23
Copy link
Author

bongi23 commented Feb 13, 2025

Maybe related to #518

@bongi23
Copy link
Author

bongi23 commented Feb 18, 2025

@Bryce-Soghigian hei Bryce, sorry to bother you. Can you give us a feedback please?

@Bryce-Soghigian
Copy link
Collaborator

tangentially related to this change,

CC: @tallaxes

image

Perhaps we should be doing better defaulting based on vcpu unless the user specifed the disk size in their nodeclass?

@tallaxes
Copy link
Collaborator

Let's match what AKS is currently doing - at least for minimum, but possibly (and maybe separately) for default as well. @Bryce-Soghigian, please also check GB vs GiB just in case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants