Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

collect memory stats, but it's not really suitable #889

Draft
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

iximeow
Copy link
Member

@iximeow iximeow commented Mar 26, 2025

in case i'm doing anything funky with the approach here, but i think this is as good as it could get at the moment.

not suitable for.. a few reasons. but this does get reasonable numbers! propolis-server memory use is in no small part... the binary itself.

given a 32 GiB VM with a few cores, this takes almost five seconds on my workstation to get through all of /proc/{getpid()}/xmap. given there being no I/O here, that's five seconds of CPU time to calculate memory stats. not good, not something i'd want to merge.

additionally, while walking /proc/{getpid()}/xmap we at various points hold the process' address space lock (for writing!), which can be very much observed by the guest VM itself. i happened to log into the above-mentioned VM while memory stat collection was going on, and login hung for two seconds while that finished. steady-state behavior is less bad (while true; do date; sleep 0.2; done does not show any disruption for example), but, again, not shippable behavior without at least a feature flag.

using pmap -x <pid> does similar operations with respect to the system, opening xmap and reading entries and on, but it seems somewhat faster than the code here. hundreds of milliseconds instead of plural seconds. i'm not sure what's different, but the service-interrupting nature of the metric collection makes it kind of a showstopper either way.

not suitable for.. a few reasons. but this does get reasonable numbers!
`propolis-server` memory use is in no small part... the binary itself.

given a 32 GiB VM with a few cores, this takes almost five seconds on
my workstation to get through all of /proc/{getpid()}/xmap. given
there being no I/O here, that's five seconds of CPU time to calculate
memory stats. not good, not something i'd want to merge.

additionally, while walking /proc/{getpid()}/xmap we at various points
hold the process' address space lock (for writing!), which can be very
much observed by the guest VM itself. i happened to log into the
above-mentioned VM while memory stat collection was going on, and login
hung for two seconds while that finished. steady-state behavior is less
bad (`while true; do date; sleep 0.2; done` does not show any
disruption for example), but, again, not shippable behavior without at
*least* a feature flag.

using `pmap -x <pid>` does similar operations with respect to the
system, opening `xmap` and reading entries and on, but it seems somewhat
faster than the code here. hundreds of milliseconds instead of plural
seconds. i'm not sure what's different, but the service-interrupting
nature of the metric collection makes it kind of a showstopper either
way.
github-actions[bot]

This comment was marked as resolved.

iximeow and others added 2 commits March 26, 2025 16:43

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

// Safety: getpid() does not alter any state, or even read mutable state.
let mypid = unsafe { libc::getpid() };
let xmap_path = format!("/proc/{}/xmap", mypid);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this could just be "/proc/self/xmap" ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i hoped and thought so, but i don't have a /proc/self* at all locally. i was kinda curious about it: if you have it maybe it's added by something downstream of illumos, and helios doesn't get the fun?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pmooney informs me that /proc/self/ in fact there and just not in ls /proc (or, consequently, tab completion for /proc/s :) )

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the s-e-l-f is silent

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants