server: improve behavior of starting VMs that are waiting for Crucible activation (#873)

Modify the state driver's VM startup procedure to allow the driver to
process Crucible volume configuration changes while block backends are
being activated. This fixes a livelock that occurs when starting a VM
with a Crucible VCR that points to an unavailable downstairs: the
unavailable downstairs prevents Crucible activation from proceeding;
Nexus sends a corrected VCR that, if applied, would allow the upstairs
to activate; but the state driver never applies the new VCR because it's
blocked trying to activate using the broken VCR.

Since the state driver can now dequeue VM state change requests during
startup, also teach it to abort startup if a stop request is received
while block backends are starting. This is very slightly spicy, because
it can cancel a block backend startup future in a way that was not
possible before, but (a) the only affected backend is the Crucible
backend, and (b) there should be no requests in flight anyway because,
if this case is reached, the VM's vCPUs have not started.
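
The startup behavior described above can be sketched roughly as follows. This is a simplified, synchronous model and not the actual state driver code: the real driver works with async futures and cancels the backend startup future, whereas this sketch polls in a bounded loop. The names `Vcr`, `Request`, `Startup`, `Backend`, and `run_startup` are all hypothetical and exist only for illustration.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for a Crucible volume construction request (VCR).
#[derive(Debug, Clone, PartialEq)]
struct Vcr {
    /// True if every downstairs named by this VCR can be reached.
    downstairs_reachable: bool,
}

/// Requests the driver may dequeue while startup is still in progress.
#[derive(Debug, Clone, PartialEq)]
enum Request {
    ReplaceVcr(Vcr),
    Stop,
}

/// Result of one bounded run of the startup loop.
#[derive(Debug, PartialEq)]
enum Startup {
    Activated,
    Aborted,
    StillWaiting,
}

/// Toy backend: activation succeeds only once the current VCR is usable.
struct Backend {
    vcr: Vcr,
}

impl Backend {
    fn try_activate(&self) -> bool {
        self.vcr.downstairs_reachable
    }
}

/// Interleaves activation attempts with request handling, so that a
/// corrected VCR or a stop request is seen before activation completes.
fn run_startup(
    backend: &mut Backend,
    queue: &mut VecDeque<Request>,
    max_polls: usize,
) -> Startup {
    for _ in 0..max_polls {
        if backend.try_activate() {
            return Startup::Activated;
        }
        match queue.pop_front() {
            // Apply the new VCR, then retry activation on the next pass.
            Some(Request::ReplaceVcr(vcr)) => backend.vcr = vcr,
            // Abandon startup; vCPUs have not run, so no I/O is in flight.
            Some(Request::Stop) => return Startup::Aborted,
            // Nothing queued: keep waiting for activation.
            None => {}
        }
    }
    Startup::StillWaiting
}
```

In this model, a backend holding a broken VCR with a `ReplaceVcr` request queued reaches `Startup::Activated`, the same backend with a queued `Stop` returns `Startup::Aborted`, and with an empty queue it stays in `Startup::StillWaiting`, which corresponds to the blocked state that previously could not be escaped.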

Add PHD coverage of the new behaviors:

- Modify the PHD VCR replacement smoke test to check that start requests
  with a bad set of Crucible targets can be unblocked by replacing a bad
  target with a good target.
- Add a server state machine test that checks that a VM that is blocked
  waiting for Crucible activation can be explicitly stopped.

To assist with this, add a feature to PHD Crucible disks that allows a
test to specify that a disk's generated VCRs should contain an invalid
downstairs IP address. Starting a VM with a disk configured this way
will cause activation to block until the disk's VCR is replaced with a
corrected VCR.
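
As a rough illustration of the disk feature, the sketch below generates a disk's downstairs targets and optionally swaps the first one for an address that should never answer. It is not the PHD API: `DownstairsAddrs`, `UNREACHABLE`, and `vcr_targets` are hypothetical names, and both the loopback addresses for healthy downstairs and the choice of 192.0.2.1 (from the TEST-NET-1 documentation range) as the invalid address are assumptions.

```rust
use std::net::{IpAddr, Ipv4Addr, SocketAddr};

/// Hypothetical per-disk test option controlling generated VCR targets.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DownstairsAddrs {
    AllValid,
    FirstInvalid,
}

/// Assumed unreachable address: 192.0.2.1 is in TEST-NET-1 (RFC 5737).
const UNREACHABLE: IpAddr = IpAddr::V4(Ipv4Addr::new(192, 0, 2, 1));

/// Builds one target per port, assuming healthy downstairs listen on
/// loopback. In `FirstInvalid` mode the first target is unreachable.
fn vcr_targets(ports: &[u16], mode: DownstairsAddrs) -> Vec<SocketAddr> {
    ports
        .iter()
        .enumerate()
        .map(|(i, &port)| {
            let ip = if i == 0 && mode == DownstairsAddrs::FirstInvalid {
                UNREACHABLE
            } else {
                IpAddr::V4(Ipv4Addr::LOCALHOST)
            };
            SocketAddr::new(ip, port)
        })
        .collect()
}
```

A test following this pattern would start the VM with targets generated in `FirstInvalid` mode, observe that startup blocks, and then regenerate the targets in `AllValid` mode to stand in for the corrected VCR.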