Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

propolis-server should not crash when failing to start a VM #866

Merged
merged 4 commits into from
Feb 21, 2025
Merged

Conversation

iximeow
Copy link
Member

@iximeow iximeow commented Feb 20, 2025

tripped over this while trying to figure out how it was that propolis-server is now failing to create VMs with a toml that was working fine just an hour ago. what was a crash with stdout having an INFO ... migration: InstanceMigrateStatusResponse { migration_in: None, migration_out: None} (silently truncated early due to panic) now very helpfully does not kill propolis-server and instead has an ERRO ... failed to add low memory region: Not enough space.

Fixes #838

Copy link
Member

@hawkw hawkw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bit of a bummer we have to do this, but yeah, seems good to me!

Base automatically changed from rust-1.85-clippy to master February 20, 2025 22:34
@@ -226,7 +226,26 @@ impl<'a> VmEnsureNotStarted<'a> {
kernel_vm_paused: false,
})
}
Err(e) => Err(self.fail(e).await),
Err(e) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor caveat: this works in part because in today's code, no one ever actually drops a VmEnsureObjectsCreated (both code paths that instantiate one more or less immediately call ensure_active on it, which infallibly moves the runtime to Vm::make_active; after that point it's guaranteed that someone will call shutdown_background on the runtime before it gets dropped).

I think fixing this might be mildly involved (though maybe there's an elegant solution I haven't seen yet), so I don't think I'd hold up the whole PR for it, but it might be good to note this property in a comment somewhere around here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so, from this nudging and our chat, i think being more intentional about creating and initializing tasks on vmm_rt is pretty workable! that's how i've got the PR now. at least this leaves us with a clearly-safe way to handle fallibility on the path to a VmEnsureObjectsCreated.

if we decided to do something fallible between a VmEnsureObjectsCreated and an ensure_active, well, that would be pretty unfortunate.

this approach also fixes the immediate issue, but also makes sure that
we have a clear way to do fallible init on the way to
VmEnsureObjectsCreated.

another approach here could be to move the call tree leading to VMM
runtime creation and init into a separate thread, be it spawn_blocking
or a scoped thread. this would start from `create_and_activate_vm()` to
capture both the migration-based and local-request-based init. i'm just
hesitant about moving more than strictly needed into "strange" scopes
that might make future-us have to think twice about the execution
environment.
Copy link
Contributor

@gjcolombo gjcolombo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for going the additional mile here!

Comment on lines 260 to 261
// `VmEnsureObjectsCreated` (and later state transitions) take care to
// `shutdown_background` the runtime.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this isn't quite right, I think; once you have a VmEnsureActive you are assured that the runtime will be shut down in the background when it's dropped. You get that by calling VmEnsureObjectsCreated::ensure_active.

We might want to put a WARNING: comment on VmEnsureObjectsCreated indicating that it can't currently be dropped (or maybe just add a Drop impl for it that calls shutdown_background, though I would have to convince myself that this won't leak any tasks).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, i'd conflated ActiveVm and VmEnsureActive, yeah you're right.

i'll go with a stern warning. a Drop impl that calls shutdown_background would immediately leak any object's tasks, but they're all also cancelled so they should stop "soon", so i think it's fine. but being in a situation where we're dropping VmEnsureObjectsCreated itself feels like a bug. and if you innocently destructure VmEnsureObjectsCreated outside ensure_active you still have to be careful about the runtime again..

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@iximeow iximeow merged commit bd7a8cf into master Feb 21, 2025
11 checks passed
@iximeow iximeow deleted the ixi/838 branch February 21, 2025 05:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

failing to start a VM produces a "cannot drop a runtime in a context where blocking is not allowed" panic
3 participants