-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
server: separate statuses of inbound and outbound migrations #661
Conversation
This should be OK to review as-is, but I'm going to try to cook up the corresponding sled agent change before sending actual review requests here. |
Oops. I think this is just an out-of-date OpenAPI doc--will work on the fix. |
**N.B.** This is a breaking change to the Propolis API that will need a corresponding Omicron change before it can be merged there. Change the way the server reports migration statuses: - Instead of asking clients to identify a specific migration to query, create a `migration-status` endpoint that reports on all migrations currently known to the server. (The server wasn't keeping any history of prior migration attempts anyway, so specifying an ID wasn't buying very much; if the client cares about a specific ID they can compare it to the IDs the server now returns.) - When reporting migration status (either through the migration status endpoint or the instance monitor endpoint), report the ID and status of both the inbound migration (if there was one) and the most recent attempted outbound migration. This should allow us to simplify a few things in sled agent. Today: - Sled agent has to handle mismatched migration IDs carefully to cover cases such as this one: - Propolis was initialized via incoming migration with ID X - Sled agent knows about outgoing migration with ID Y - Propolis reports an instance state change that still bears ID X This sort of case is easier to reason about (or at least to make fewer assumptions about) when the outbound and inbound migrations are properly distinguished. - If Propolis reports a status of "destroyed, migration finished," it's ambiguous whether the VM was migrated into and then stopped or was migrated out of. With these changes it's obvious that a successful migration-out occurred. Additional fixes/improvements: - The `pending_migration_id` field in the VM controller wasn't being used for anything; now it is. - Add a log line to the code path that publishes a new externally-visible instance state. Tests: - cargo test - local PHD run (migration-from-base tests are broken because of the breaking API change) - ran an ad hoc live migration with propolis-cli to test both that the CLI works properly and that the right messages appear in the Propolis logs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excited to get this merged! I just had a couple fairly minor nits.
bin/propolis-cli/src/main.rs
Outdated
let Some(migration) = migration else { | ||
if role == "dst" { | ||
anyhow::bail!( | ||
"{role} instance's migration ID wasn't set" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
considered nitpicky: since this is inside an if role == "dst"
, do we really need to format it here?
crates/propolis-api-types/src/lib.rs
Outdated
/// direction. This means it's possible for both the inbound and outbound | ||
/// migration statuses to be set, e.g. if a Propolis was initialized via | ||
/// migration in and is now the source of a migration out. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be nice if this comment discussed the motivation behind reporting both migrations here, rather than e.g. just an enum MigrationRole
or similar?
c1ad061
to
8e44af6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excited to get this one merged! I had some very unimportant nits, but feel free to ignore them --- I'd like to get this wired through to sled-agent as soon as possible. :)
crates/propolis-api-types/src/lib.rs
Outdated
pub migration_id: Uuid, | ||
pub state: MigrationState, | ||
/// The status of the most recent attempt to initialize the current instance | ||
/// via migration in, or None if the instance has never been a migration |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extremely nitpicky (I don't actually care about this):
/// via migration in, or None if the instance has never been a migration | |
/// via migration in, or `None` if the instance has never been a migration |
crates/propolis-api-types/src/lib.rs
Outdated
/// target. | ||
pub migration_in: Option<InstanceMigrationStatus>, | ||
/// The status of the most recent attempt to migrate out of the current | ||
/// instance, or None if the instance has never been a migration source. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
similarly:
/// instance, or None if the instance has never been a migration source. | |
/// instance, or `None` if the instance has never been a migration source. |
/// | ||
/// # Panics | ||
/// | ||
/// Panics if both `instance_state` and `migration_state` are `None`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a bit of a bummer, I kind of wonder if we should instead have an
enum PublishedState {
Instance(ApiInstanceState),
Migration(PublishedMigrationState),
Both(ApiInstanceState, PublishedMigrationState),
}
or similar? On the other hand, if that's too annoying to work with, I don't think it's a huge deal...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was bothering me too on self-review. I'll have a fix in my next push.
Looks like the OpenAPI specs need to be regenerated: https://github.com/oxidecomputer/propolis/actions/runs/9551503512/job/26325948032?pr=661 |
Once more with feeling! |
N.B. This is a breaking change to the Propolis API that will need a corresponding Omicron change before it can be merged there.
Change the way the server reports migration statuses:
migration-status
endpoint that reports on all migrations currently known to the server. (The server wasn't keeping any history of prior migration attempts anyway, so specifying an ID wasn't buying very much; if the client cares about a specific ID they can compare it to the IDs the server now returns.)This should allow us to simplify a few things in sled agent. Today:
Additional fixes/improvements:
pending_migration_id
field in the VM controller wasn't being used for anything; now it is.Tests:
Fixes #508.