propolis-server should not crash when failing to start a VM #866
Conversation
bit of a bummer we have to do this, but yeah, seems good to me!
```diff
@@ -226,7 +226,26 @@ impl<'a> VmEnsureNotStarted<'a> {
                kernel_vm_paused: false,
            })
        }
-       Err(e) => Err(self.fail(e).await),
+       Err(e) => {
```
One minor caveat: this works in part because in today's code, no one ever actually drops a `VmEnsureObjectsCreated` (both code paths that instantiate one more or less immediately call `ensure_active` on it, which infallibly moves the runtime to `Vm::make_active`; after that point it's guaranteed that someone will call `shutdown_background` on the runtime before it gets dropped).

I think fixing this might be mildly involved (though maybe there's an elegant solution I haven't seen yet), so I don't think I'd hold up the whole PR for it, but it might be good to note this property in a comment somewhere around here.
so, from this nudging and our chat, i think being more intentional about creating and initializing tasks on `vmm_rt` is pretty workable! that's how i've got the PR now. at least this leaves us with a clearly-safe way to handle fallibility on the path to a `VmEnsureObjectsCreated`.

if we decided to do something fallible between a `VmEnsureObjectsCreated` and an `ensure_active`, well, that would be pretty unfortunate.

this approach fixes the immediate issue, but also makes sure that we have a clear way to do fallible init on the way to `VmEnsureObjectsCreated`. another approach here could be to move the call tree leading to VMM runtime creation and init into a separate thread, be it `spawn_blocking` or a scoped thread. this would start from `create_and_activate_vm()` to capture both the migration-based and local-request-based init. i'm just hesitant about moving more than strictly needed into "strange" scopes that might make future-us have to think twice about the execution environment.
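to make the separate-thread alternative concrete, here's a minimal sketch of running fallible init on a scoped OS thread, so that on the error path everything built so far is dropped on that plain thread rather than inside an async executor context. `create_vm_objects`, its parameter, and the error string are all made up for illustration; this is not the propolis call tree:

```rust
// Sketch (illustrative names, not propolis code): run fallible init on a
// scoped OS thread. If init errors, whatever it built is dropped on that
// thread, outside any async executor context, so a blocking teardown
// (e.g. dropping a runtime) can't panic an async task.
fn create_vm_objects(should_fail: bool) -> Result<String, String> {
    std::thread::scope(|s| {
        s.spawn(move || {
            if should_fail {
                // error path: locals created on this thread are torn down here
                Err("failed to add low memory region: Not enough space".to_string())
            } else {
                Ok("vm objects".to_string())
            }
        })
        .join()
        .expect("init thread panicked")
    })
}
```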
Thanks for going the additional mile here!
```rust
// `VmEnsureObjectsCreated` (and later state transitions) take care to
// `shutdown_background` the runtime.
```
nit: this isn't quite right, I think; once you have a `VmEnsureActive` you are assured that the runtime will be shut down in the background when it's dropped. You get that by calling `VmEnsureObjectsCreated::ensure_active`.

We might want to put a `WARNING:` comment on `VmEnsureObjectsCreated` indicating that it can't currently be dropped (or maybe just add a `Drop` impl for it that calls `shutdown_background`, though I would have to convince myself that this won't leak any tasks).
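for reference, the usual shape of such a `Drop` impl: tokio's `Runtime::shutdown_background(self)` consumes the runtime by value, so a guard has to hold it in an `Option` and `take()` it, doing nothing if ownership was already handed off. A minimal sketch with invented names (`RtGuard` is not a propolis type; the `shutdown` function pointer stands in for `Runtime::shutdown_background`):

```rust
// Sketch: a guard for a resource that must be consumed by value on teardown.
struct RtGuard<R> {
    rt: Option<R>,   // Option so Drop can move the resource out
    shutdown: fn(R), // stand-in for tokio's Runtime::shutdown_background
}

impl<R> RtGuard<R> {
    // Hand ownership to the caller; Drop then finds None and does nothing.
    fn into_inner(mut self) -> R {
        self.rt.take().unwrap()
    }
}

impl<R> Drop for RtGuard<R> {
    fn drop(&mut self) {
        // Backstop: only runs if the guard was dropped without into_inner().
        if let Some(rt) = self.rt.take() {
            (self.shutdown)(rt);
        }
    }
}
```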
oh, i'd conflated `ActiveVm` and `VmEnsureActive`, yeah you're right.

i'll go with a stern warning. a `Drop` impl that calls `shutdown_background` would immediately leak any object's tasks, but they're all also cancelled so they should stop "soon", so i think it's fine. but being in a situation where we're dropping `VmEnsureObjectsCreated` itself feels like a bug. and if you innocently destructure `VmEnsureObjectsCreated` outside `ensure_active`, you still have to be careful about the runtime again.
(7053acf)
tripped over this while trying to figure out how it was that `propolis-server` is now failing to create VMs with a toml that was working fine just an hour ago. what was a crash with stdout having an `INFO ... migration: InstanceMigrateStatusResponse { migration_in: None, migration_out: None }` (silently truncated early due to panic) now very helpfully does not kill `propolis-server` and instead has an `ERRO ... failed to add low memory region: Not enough space`.

Fixes #838