Commit 7f7b760

server: add state machine docs

Add some doc comments to the propolis-server state machine modules that describe how the state machine works. Also, rename a couple of routines for clarity. This PR contains no functional changes.

1 parent c849bab

3 files changed (+192, -29 lines)

bin/propolis-server/src/lib/vm/ensure.rs

+82 -18

@@ -4,26 +4,90 @@
 
 //! Tools for handling instance ensure requests.
 //!
-//! To initialize a new VM, the server must (1) create a set of VM objects from
-//! an instance spec, (2) set up VM services that use those objects, (3) use the
-//! objects and services to drive the VM state machine to the `ActiveVm` state,
-//! and (4) notify the original caller of the "instance ensure" API of the
-//! completion of its request. If VM initialization fails, the actions required
-//! to compensate and drive the state machine to `RundownComplete` depend on how
-//! many steps were completed.
+//! This module handles the first high-level phase of a VM's lifecycle, which
+//! creates all of the VM's components and attendant data structures. These are
+//! handed off to a `StateDriver` that implements the main VM event loop. See
+//! the [`state_driver`] module docs for more details.
 //!
-//! When live migrating into an instance, the live migration task interleaves
-//! initialization steps with the steps of the live migration protocol, and
-//! needs to be able to unwind initialization correctly whenever the migration
-//! protocol fails.
+//! This module uses distinct structs that each represent a distinct phase of VM
+//! initialization. When a server receives a new ensure request, it creates the
+//! first of these structures, then hands it off to the procedure described in
+//! the ensure request to drive the rest of the initialization process, as in
+//! the diagram below:
 //!
-//! The `VmEnsure` types in this module exist to hide the gory details of
-//! initializing and unwinding from higher-level operations like the live
-//! migration task. Each type represents a phase of the initialization process
-//! and has a routine that consumes the current phase and moves to the next
-//! phase. If a higher-level operation fails, it can call a failure handler on
-//! its current phase to unwind the whole operation and drive the VM state
-//! machine to the correct resting state.
+//! ```text
+//!             +-------------------------+
+//!             |                         |
+//!             |  Initial state (no VM)  |
+//!             |                         |
+//!             +-----------+-------------+
+//!                         |
+//!              Receive ensure request
+//!                         |
+//!                         v
+//!                VmEnsureNotStarted
+//!                         |
+//!                         |
+//!               +---------v----------+
+//!           Yes |                    | No
+//!       +-------+ Live migration?    +---------+
+//!       |       |                    |         |
+//!       |       +--------------------+         |
+//!       |                                      |
+//! +-----v------+                               |
+//! |Get params  |                               |
+//! |from source |                               |
+//! +-----+------+                               |
+//!       |                                      |
+//! +-----v------+                     +---------v-----------+
+//! |Initialize  |                     |Initialize components|
+//! |components  |                     |     from params     |
+//! +-----+------+                     +---------+-----------+
+//!       |                                      |
+//!       v                                      v
+//! VmEnsureObjectsCreated            VmEnsureObjectsCreated
+//!       |                                      |
+//!       |                                      |
+//! +-----v------+                               |
+//! |Import state|                               |
+//! |from source |                               |
+//! +-----+------+                               |
+//!       |                                      |
+//!       |                                      |
+//!       |        +------------------+          |
+//!       +-------->Launch VM services<----------+
+//!                +--------+---------+
+//!                         |
+//!                         |
+//!                +--------v---------+
+//!                |Move VM to Active |
+//!                +--------+---------+
+//!                         |
+//!                         |
+//!                         v
+//!                VmEnsureActive<'_>
+//! ```
+//!
+//! When initializing a VM from scratch, the ensure request contains a spec that
+//! determines what components the VM should create, and they are created into
+//! their default initial states. When migrating in, the VM-ensure structs are
+//! handed off to the migration protocol, which fetches a spec from the
+//! migration source, uses its contents to create the VM's components, and
+//! imports the source VM's device state into those components.
+//!
+//! Once all components exist and are initialized, this module sets up "VM
+//! services" (e.g. the serial console and metrics) that connect this VM to
+//! other Oxide APIs and services. It then updates the server's VM state machine
+//! and yields a completed "active" VM that can be passed into a state driver
+//! run loop.
+//!
+//! Separating the initialization steps in this manner hides the gory details of
+//! initializing a VM (and unwinding initialization) from higher-level
+//! procedures like the migration protocol. Each initialize phase has a failure
+//! handler that allows a higher-level driver to unwind the entire ensure
+//! operation and drive the VM state machine to the correct resting state.
+//!
+//! [`state_driver`]: crate::vm::state_driver
 
 use std::sync::Arc;
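The phase-per-struct approach described in the new `ensure.rs` docs is a form of the typestate pattern: each phase is its own type, advancing consumes the current phase, and each phase carries its own failure handler. The sketch below illustrates that idea in isolation and is not propolis code. It reuses the three phase names from the diagram, but `Vm`, `VmState`, the `objects`/`services` fields, and the `create_objects`, `launch_services`, and `fail` methods are hypothetical, and the live-migration branch is not modeled.

```rust
/// Hypothetical resting states of the server-wide VM state machine.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum VmState {
    NoVm,
    Initializing,
    Active,
    RundownComplete,
}

/// Hypothetical stand-in for the server's VM state machine.
pub struct Vm {
    pub state: VmState,
    /// Records teardown actions so callers can see what was unwound.
    pub log: Vec<String>,
}

/// Phase 1: an ensure request arrived; no components exist yet.
pub struct VmEnsureNotStarted<'a> {
    vm: &'a mut Vm,
}

/// Phase 2: components exist, but VM services have not been launched.
pub struct VmEnsureObjectsCreated<'a> {
    vm: &'a mut Vm,
    objects: Vec<String>,
}

/// Phase 3: components and services exist and the VM is active.
pub struct VmEnsureActive<'a> {
    pub vm: &'a mut Vm,
    pub objects: Vec<String>,
    pub services: Vec<String>,
}

impl<'a> VmEnsureNotStarted<'a> {
    pub fn new(vm: &'a mut Vm) -> Self {
        vm.state = VmState::Initializing;
        Self { vm }
    }

    /// Consumes this phase, creates one object per spec entry, and returns
    /// the next phase.
    pub fn create_objects(self, spec: &[&str]) -> VmEnsureObjectsCreated<'a> {
        let objects = spec.iter().map(|s| s.to_string()).collect();
        VmEnsureObjectsCreated { vm: self.vm, objects }
    }

    /// Failure handler: nothing was created, so only the state machine moves.
    pub fn fail(self) {
        self.vm.state = VmState::RundownComplete;
    }
}

impl<'a> VmEnsureObjectsCreated<'a> {
    /// Consumes this phase, launches services, and marks the VM active.
    pub fn launch_services(self) -> VmEnsureActive<'a> {
        self.vm.state = VmState::Active;
        VmEnsureActive {
            vm: self.vm,
            objects: self.objects,
            services: vec!["serial-console".to_string()],
        }
    }

    /// Failure handler: tears down this phase's objects in reverse creation
    /// order, then moves the state machine to its resting state.
    pub fn fail(self) {
        for obj in self.objects.iter().rev() {
            self.vm.log.push(format!("destroyed {obj}"));
        }
        self.vm.state = VmState::RundownComplete;
    }
}
```

Because every transition takes `self` by value, a caller holding a `VmEnsureObjectsCreated` cannot launch services twice or invoke an earlier phase's failure handler; the compiler enforces the ordering in the diagram, and a higher-level procedure only has to call `fail` on whichever phase it currently holds.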

bin/propolis-server/src/lib/vm/mod.rs

+1 -1

@@ -610,7 +610,7 @@ impl Vm {
 
         let vm_for_driver = self.clone();
         guard.driver = Some(tokio::spawn(async move {
-            state_driver::run_state_driver(
+            state_driver::ensure_vm_and_launch_driver(
                 log_for_driver,
                 vm_for_driver,
                 external_publisher,

bin/propolis-server/src/lib/vm/state_driver.rs

+109 -10

@@ -2,7 +2,92 @@
 // License, v. 2.0. If a copy of the MPL was not distributed with this
 // file, You can obtain one at https://mozilla.org/MPL/2.0/.
 
-//! A task to handle requests to change a VM's state or configuration.
+//! Structures and tasks that handle VM state and configuration change requests.
+//!
+//! This module handles the second high-level phase of a VM's lifecycle: once a
+//! VM's components and services exist, it enters an event loop that changes the
+//! VM's state in response to external API requests and signals arriving from
+//! within the VM. See the [`ensure`] module for more information about the
+//! initialization phase.
+//!
+//! This module's main struct is the [`StateDriver`], which holds references to
+//! an active VM's components, the VM's event queues, and the sender side of a
+//! channel that publishes instance state updates. External API requests are
+//! routed to the driver's event queue and handled by the driver task. This
+//! model ensures that only one task handles VM events and updates VM state; the
+//! idea is to minimize the number of different tasks and threads one has to
+//! consider when reasoning about concurrency in the VM state machine.
+//!
+//! On a migration in, the state driver implicitly starts the VM before entering
+//! the main event loop:
+//!
+//! ```text
+//!                   +-----------------------+
+//!                   | VM components created |
+//!                   +-----------+-----------+
+//!                               |
+//!                               |
+//!            Yes +--------------v--------------+ No
+//!              +-+ Initialized via migration?  +-+
+//!              | +-----------------------------+ |
+//!              |                                 |
+//!       +------v--------+                        |
+//!       | Auto-start VM |                        |
+//!       +------+--------+                        |
+//!              |                                 |
+//! +------------v------------+                    |
+//! | Start devices and vCPUs |                    |
+//! +------------+------------+                    |
+//!              |                                 |
+//!              |                                 |
+//!              |    +-----------------------+    |
+//!              +----> Enter main event loop <----+
+//!                   +-----------------------+
+//! ```
+//!
+//! Once in the main event loop, a VM generally remains active until it receives
+//! a signal telling it to do something else:
+//!
+//! ```text
+//! +-----------------+   +-----------------+  error during startup
+//! | Not yet started |   | Not yet started |       +--------+
+//! | (migrating in)  |   | (Creating)      +-------> Failed |
+//! +-------+---------+   +--------+--------+       +--------+
+//!         |                      |
+//!         |                      |  Successful start request
+//!         +-----------+          |
+//!                    +v----------v-----------+  API/chipset request
+//!          +---------+        Running        +------+
+//!          |         +---^-------+--------^--+   +--v--------+
+//!          |             |       |        +------+ Rebooting |
+//!          |             |       |               +-----------+
+//! +--------v------+      |       |
+//! | Migrating out +------+       |  API/chipset request
+//! +--------+------+              |
+//!          |                +----v-----+
+//!          |                | Stopping |
+//!          |                +----+-----+
+//!          |                     |
+//!          |                     |            +-----------------+
+//!          |                +----v-----+      | Destroyed       |
+//!          +----------------> Stopped  +------> (after rundown) |
+//!                           +----------+      +-----------------+
+//! ```
+//!
+//! The state driver's [`InputQueue`] receives events that can push a running VM
+//! out of its steady "running" state. These can come either from the external
+//! API or from events happening in the guest (e.g. a vCPU asserting a pin on
+//! the virtual chipset that should reset or halt the VM). The policy that
+//! determines what API requests can be accepted in which states is implemented
+//! in the [`request_queue`] module.
+//!
+//! The "stopped" and "failed" states are terminal states. When the state driver
+//! reaches one of these states, it exits the event loop, returning its final
+//! state to the wrapper function that launched the driver. The wrapper task is
+//! responsible for running down the VM objects and structures and resetting the
+//! server so that it can start another VM.
+//!
+//! [`ensure`]: crate::vm::ensure
 
 use std::{
     sync::{Arc, Mutex},
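The lifecycle diagram in the new `state_driver.rs` docs can be read as a transition table, and the sketch below encodes it that way. The `State` variants follow the diagram's boxes, but the `Event` names are hypothetical. The diagram does not label the two edges leaving "Migrating out", so the sketch assumes the edge back to Running is taken when an outbound migration does not complete and the edge to Stopped when it does. It also lets a migrated-in VM fail to start, matching the `run` method changed later in this commit, which yields `InstanceState::Failed` when starting such a VM fails. The real acceptance policy in the `request_queue` module is not reproduced here.

```rust
/// Driver-visible states from the lifecycle diagram. `Destroyed` is omitted
/// because it is reached only after rundown, outside the driver loop.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    NotStartedMigratingIn,
    NotStartedCreating,
    Running,
    Rebooting,
    MigratingOut,
    Stopping,
    Stopped,
    Failed,
}

/// Hypothetical events; the real driver's event and request types differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Event {
    StartSucceeded,
    StartFailed,
    RebootRequested,
    RebootFinished,
    StopRequested,
    StopFinished,
    MigrateOutRequested,
    /// Assumed meaning of the "Migrating out -> Running" edge.
    MigrationAborted,
    /// Assumed meaning of the "Migrating out -> Stopped" edge.
    MigrationCompleted,
}

impl State {
    /// "Stopped" and "Failed" end the driver's event loop.
    pub fn is_terminal(self) -> bool {
        matches!(self, State::Stopped | State::Failed)
    }

    /// Returns the next state, or `None` if `event` is not accepted here.
    pub fn next(self, event: Event) -> Option<State> {
        use Event::*;
        use State::*;
        match (self, event) {
            // A migrated-in VM is started automatically; a created VM is
            // started by a start request. Either start can fail.
            (NotStartedMigratingIn | NotStartedCreating, StartSucceeded) => Some(Running),
            (NotStartedMigratingIn | NotStartedCreating, StartFailed) => Some(Failed),
            (Running, RebootRequested) => Some(Rebooting),
            (Rebooting, RebootFinished) => Some(Running),
            (Running, StopRequested) => Some(Stopping),
            (Stopping, StopFinished) => Some(Stopped),
            (Running, MigrateOutRequested) => Some(MigratingOut),
            (MigratingOut, MigrationAborted) => Some(Running),
            (MigratingOut, MigrationCompleted) => Some(Stopped),
            _ => None,
        }
    }
}
```

Any (state, event) pair not listed in the `match` is rejected with `None`, so every legal transition lives in one place. Once `is_terminal` returns true, no further event is accepted, which mirrors the driver leaving its loop and handing rundown to the wrapper that launched it.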
@@ -248,7 +333,10 @@ struct StateDriver {
     migration_src_state: crate::migrate::source::PersistentState,
 }
 
-/// The values returned by a state driver task when it exits.
+/// Contains a state driver's terminal state and the channel it used to publish
+/// state updates to the rest of the server. The driver's owner can use these to
+/// publish the VM's terminal state after running down all of its objects and
+/// services.
 pub(super) struct StateDriverOutput {
     /// The channel this driver used to publish external instance state changes.
     pub state_publisher: StatePublisher,
@@ -258,9 +346,14 @@ pub(super) struct StateDriverOutput {
     pub final_state: InstanceState,
 }
 
-/// Creates a new set of VM objects in response to an `ensure_request` directed
-/// to the supplied `vm`.
-pub(super) async fn run_state_driver(
+/// Given an instance ensure request, processes the request and hands the
+/// resulting activated VM off to a [`StateDriver`] that will drive the main VM
+/// event loop.
+///
+/// Returns the final state driver disposition. Note that this routine does not
+/// return a `Result`; if the VM fails to start, the returned
+/// [`StateDriverOutput`] contains appropriate state for a failed VM.
+pub(super) async fn ensure_vm_and_launch_driver(
     log: slog::Logger,
     vm: Arc<super::Vm>,
     mut state_publisher: StatePublisher,
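The doc comment on `ensure_vm_and_launch_driver` notes that the routine folds initialization failure into its output instead of returning a `Result`, and the `StateDriverOutput` docs say the owner publishes the terminal state only after rundown. The sketch below shows that shape with simplified stand-ins: `InstanceState`, `StatePublisher`, and `StateDriverOutput` are cut down to a few fields, `try_activate`, `ensure_and_run`, and `finish` are hypothetical helpers rather than propolis functions, and everything is synchronous for brevity.

```rust
/// Simplified stand-in for the server's instance state type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InstanceState {
    Running,
    Stopped,
    Failed,
}

/// Hypothetical publisher that records every state it is asked to publish.
#[derive(Debug, Default)]
pub struct StatePublisher {
    pub published: Vec<InstanceState>,
}

/// Simplified: the driver's terminal state plus the publisher it used.
pub struct StateDriverOutput {
    pub state_publisher: StatePublisher,
    pub final_state: InstanceState,
}

/// Hypothetical initialization step that can fail.
fn try_activate(spec_is_valid: bool) -> Result<(), String> {
    if spec_is_valid {
        Ok(())
    } else {
        Err("invalid instance spec".to_string())
    }
}

/// Initializes the VM and, on success, runs the driver (`run_driver`).
/// An initialization error becomes a `Failed` output rather than an `Err`.
pub fn ensure_and_run<F>(
    publisher: StatePublisher,
    spec_is_valid: bool,
    run_driver: F,
) -> StateDriverOutput
where
    F: FnOnce(StatePublisher) -> StateDriverOutput,
{
    match try_activate(spec_is_valid) {
        Ok(()) => run_driver(publisher),
        Err(_reason) => StateDriverOutput {
            state_publisher: publisher,
            final_state: InstanceState::Failed,
        },
    }
}

/// The owner's side: run down VM objects first, then publish the final state.
pub fn finish(output: StateDriverOutput) -> StatePublisher {
    let mut publisher = output.state_publisher;
    // (Rundown of VM objects and services would happen here.)
    publisher.published.push(output.final_state);
    publisher
}
```

Because the success and failure paths both produce a `StateDriverOutput`, the owner has a single code path for rundown and final-state publication.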
@@ -269,7 +362,7 @@ pub(super) async fn run_state_driver(
     ensure_options: super::EnsureOptions,
 ) -> StateDriverOutput {
     let ensure_options = Arc::new(ensure_options);
-    let activated_vm = match create_and_activate_vm(
+    let activated_vm = match ensure_active_vm(
         &log,
         &vm,
         &mut state_publisher,
@@ -318,7 +411,7 @@ pub(super) async fn run_state_driver(
 
 /// Processes the supplied `ensure_request` to create a set of VM objects that
 /// can be moved into a new `StateDriver`.
-async fn create_and_activate_vm<'a>(
+async fn ensure_active_vm<'a>(
     log: &'a slog::Logger,
     vm: &'a Arc<super::Vm>,
     state_publisher: &'a mut StatePublisher,
@@ -371,23 +464,27 @@ async fn create_and_activate_vm<'a>(
 }
 
 impl StateDriver {
+    /// Directs this state driver to enter its main event loop. The driver may
+    /// perform additional tasks (e.g. automatically starting a migration
+    /// target) before it begins processing events from its queues.
     pub(super) async fn run(mut self, migrated_in: bool) -> StateDriverOutput {
         info!(self.log, "state driver launched");
 
         let final_state = if migrated_in {
             if self.start_vm(VmStartReason::MigratedIn).await.is_ok() {
-                self.run_loop().await
+                self.event_loop().await
             } else {
                 InstanceState::Failed
             }
         } else {
-            self.run_loop().await
+            self.event_loop().await
         };
 
         StateDriverOutput { state_publisher: self.external_state, final_state }
     }
 
-    async fn run_loop(&mut self) -> InstanceState {
+    /// Runs the state driver's main event loop.
+    async fn event_loop(&mut self) -> InstanceState {
         info!(self.log, "state driver entered main loop");
         loop {
             let event = self.input_queue.wait_for_next_event().await;
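The `run` method in the hunk above either starts a migrated-in VM before entering the event loop or enters the loop directly, and the module docs stress that only the driver task consumes VM events. The following is a synchronous sketch of that structure, assuming a `std::sync::mpsc` channel in place of the driver's `InputQueue`. `Driver`, `Event`, `new_driver`, and the `start_ok` flag are hypothetical, `run` returns a tuple instead of a `StateDriverOutput`, and the start request that would normally start a non-migrated VM from inside the loop is not modeled.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Simplified terminal states.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InstanceState {
    Stopped,
    Failed,
}

/// Hypothetical events from the external API or the guest.
#[derive(Debug)]
pub enum Event {
    Reboot,
    Stop,
    GuestHalt,
}

/// Hypothetical driver that owns the only receiver for VM events.
pub struct Driver {
    input: Receiver<Event>,
    /// Stand-in for whether starting devices and vCPUs would succeed.
    start_ok: bool,
    handled: Vec<String>,
}

/// Creates a driver plus a sender that producers can clone.
pub fn new_driver(start_ok: bool) -> (Sender<Event>, Driver) {
    let (tx, rx) = channel();
    (tx, Driver { input: rx, start_ok, handled: Vec::new() })
}

impl Driver {
    /// Same control flow as `StateDriver::run`: a migrated-in VM is started
    /// before the loop, and a failed start skips the loop entirely.
    pub fn run(mut self, migrated_in: bool) -> (InstanceState, Vec<String>) {
        let final_state = if migrated_in {
            if self.start_vm().is_ok() {
                self.event_loop()
            } else {
                InstanceState::Failed
            }
        } else {
            self.event_loop()
        };
        (final_state, self.handled)
    }

    fn start_vm(&self) -> Result<(), String> {
        if self.start_ok {
            Ok(())
        } else {
            Err("start failed".to_string())
        }
    }

    /// Handles events one at a time until a terminal state is reached.
    fn event_loop(&mut self) -> InstanceState {
        loop {
            let event = match self.input.recv() {
                Ok(event) => event,
                // Assumption of this sketch: if every sender is gone, stop.
                Err(_) => return InstanceState::Stopped,
            };
            self.handled.push(format!("{event:?}"));
            match event {
                // A reboot returns the VM to the running state; keep looping.
                Event::Reboot => continue,
                Event::Stop | Event::GuestHalt => return InstanceState::Stopped,
            }
        }
    }
}
```

Any number of producers (API handlers, vCPU threads) could hold clones of the `Sender`, but because only `event_loop` calls `recv`, state changes are applied one at a time by a single consumer.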
@@ -415,6 +512,8 @@ impl StateDriver {
         }
     }
 
+    /// Starts the driver's VM by sending start commands to its devices and
+    /// vCPUs.
     async fn start_vm(
         &mut self,
         start_reason: VmStartReason,
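The new `start_vm` doc comment says the driver starts the VM by sending start commands to its devices and vCPUs. A minimal sketch of that step, assuming (this is the sketch's choice, not something the commit states) that devices are started before vCPUs and that the first failure aborts the start without undoing earlier starts. `Part` and its `fail_on_start` flag are hypothetical stand-ins for real device and vCPU handles.

```rust
/// Hypothetical stand-in for a device or vCPU handle that can be started.
pub struct Part {
    pub name: String,
    pub fail_on_start: bool,
}

impl Part {
    pub fn new(name: &str) -> Self {
        Part { name: name.to_string(), fail_on_start: false }
    }

    fn start(&self) -> Result<(), String> {
        if self.fail_on_start {
            Err(format!("{} failed to start", self.name))
        } else {
            Ok(())
        }
    }
}

/// Starts every device and then every vCPU, stopping at the first error.
/// On success, returns the names of the started parts in start order.
pub fn start_vm(devices: &[Part], vcpus: &[Part]) -> Result<Vec<String>, String> {
    let mut started = Vec::new();
    // Assumed ordering: devices first, so no vCPU runs guest code against
    // a device that has not been started yet.
    for part in devices.iter().chain(vcpus.iter()) {
        part.start()?;
        started.push(part.name.clone());
    }
    Ok(started)
}
```

An `Err` from a step like this is what the `is_ok()` check in `run` observes when it turns a failed start of a migrated-in VM into `InstanceState::Failed`.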
