Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

server: separate statuses of inbound and outbound migrations #661

Merged
merged 6 commits into from
Jun 17, 2024

Conversation

gjcolombo
Copy link
Contributor

N.B. This is a breaking change to the Propolis API that will need a corresponding Omicron change before it can be merged there.

Change the way the server reports migration statuses:

  • Instead of asking clients to identify a specific migration to query, create a migration-status endpoint that reports on all migrations currently known to the server. (The server wasn't keeping any history of prior migration attempts anyway, so specifying an ID wasn't buying very much; if the client cares about a specific ID they can compare it to the IDs the server now returns.)
  • When reporting migration status (either through the migration status endpoint or the instance monitor endpoint), report the ID and status of both the inbound migration (if there was one) and the most recent attempted outbound migration.

This should allow us to simplify a few things in sled agent. Today:

  • Sled agent has to handle mismatched migration IDs carefully to cover cases such as this one:
    • Propolis was initialized via incoming migration with ID X
    • Sled agent knows about outgoing migration with ID Y
    • Propolis reports an instance state change that still bears ID X This sort of case is easier to reason about (or at least to make fewer assumptions about) when the outbound and inbound migrations are properly distinguished.
  • If Propolis reports a status of "destroyed, migration finished," it's ambiguous whether the VM was migrated into and then stopped or was migrated out of. With these changes it's obvious that a successful migration-out occurred.

Additional fixes/improvements:

  • The pending_migration_id field in the VM controller wasn't being used for anything; now it is.
  • Add a log line to the code path that publishes a new externally-visible instance state.

Tests:

  • cargo test
  • local PHD run (migration-from-base tests are broken because of the breaking API change)
  • ran an ad hoc live migration with propolis-cli to test both that the CLI works properly and that the right messages appear in the Propolis logs

Fixes #508.

@gjcolombo
Copy link
Contributor Author

This should be OK to review as-is, but I'm going to try to cook up the corresponding sled agent change before sending actual review requests here.

@gjcolombo
Copy link
Contributor Author

This should be OK to review as-is

falcon Failing after 4m — Failure!

Oops. I think this is just an out-of-date OpenAPI doc--will work on the fix.

@hawkw hawkw self-requested a review March 13, 2024 17:27
**N.B.** This is a breaking change to the Propolis API that will need a
corresponding Omicron change before it can be merged there.

Change the way the server reports migration statuses:

- Instead of asking clients to identify a specific migration to query, create
  a `migration-status` endpoint that reports on all migrations currently known
  to the server. (The server wasn't keeping any history of prior migration
  attempts anyway, so specifying an ID wasn't buying very much; if the client
  cares about a specific ID they can compare it to the IDs the server now
  returns.)
- When reporting migration status (either through the migration status endpoint
  or the instance monitor endpoint), report the ID and status of both the
  inbound migration (if there was one) and the most recent attempted outbound
  migration.

This should allow us to simplify a few things in sled agent. Today:

- Sled agent has to handle mismatched migration IDs carefully to cover cases
  such as this one:
  - Propolis was initialized via incoming migration with ID X
  - Sled agent knows about outgoing migration with ID Y
  - Propolis reports an instance state change that still bears ID X
  This sort of case is easier to reason about (or at least to make fewer
  assumptions about) when the outbound and inbound migrations are properly
  distinguished.
- If Propolis reports a status of "destroyed, migration finished," it's
  ambiguous whether the VM was migrated into and then stopped or was migrated
  out of. With these changes it's obvious that a successful migration-out
  occurred.

Additional fixes/improvements:

- The `pending_migration_id` field in the VM controller wasn't being used for
  anything; now it is.
- Add a log line to the code path that publishes a new externally-visible
  instance state.

Tests:

- cargo test
- local PHD run (migration-from-base tests are broken because of the breaking
  API change)
- ran an ad hoc live migration with propolis-cli to test both that the CLI works
  properly and that the right messages appear in the Propolis logs
Copy link
Member

@hawkw hawkw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excited to get this merged! I just had a couple fairly minor nits.

let Some(migration) = migration else {
if role == "dst" {
anyhow::bail!(
"{role} instance's migration ID wasn't set"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

considered nitpicky: since this is inside an if role == "dst", do we really need to format it here?

Comment on lines 94 to 102
/// direction. This means it's possible for both the inbound and outbound
/// migration statuses to be set, e.g. if a Propolis was initialized via
/// migration in and is now the source of a migration out.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be nice if this comment discussed the motivation behind reporting both migrations here, rather than e.g. just an enum MigrationRole or similar?

@gjcolombo gjcolombo force-pushed the gjcolombo/two-migration-statuses branch from c1ad061 to 8e44af6 Compare June 14, 2024 23:37
Copy link
Member

@hawkw hawkw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excited to get this one merged! I had some very unimportant nits, but feel free to ignore them --- I'd like to get this wired through to sled-agent as soon as possible. :)

pub migration_id: Uuid,
pub state: MigrationState,
/// The status of the most recent attempt to initialize the current instance
/// via migration in, or None if the instance has never been a migration
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extremely nitpicky (I don't actually care about this):

Suggested change
/// via migration in, or None if the instance has never been a migration
/// via migration in, or `None` if the instance has never been a migration

/// target.
pub migration_in: Option<InstanceMigrationStatus>,
/// The status of the most recent attempt to migrate out of the current
/// instance, or None if the instance has never been a migration source.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

similarly:

Suggested change
/// instance, or None if the instance has never been a migration source.
/// instance, or `None` if the instance has never been a migration source.

///
/// # Panics
///
/// Panics if both `instance_state` and `migration_state` are `None`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a bit of a bummer, I kind of wonder if we should instead have an

enum PublishedState {
    Instance(ApiInstanceState),
    Migration(PublishedMigrationState),
    Both(ApiInstanceState, PublishedMigrationState),
}

or similar? On the other hand, if that's too annoying to work with, I don't think it's a huge deal...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was bothering me too on self-review. I'll have a fix in my next push.

@hawkw
Copy link
Member

hawkw commented Jun 17, 2024

Looks like the OpenAPI specs need to be regenerated: https://github.com/oxidecomputer/propolis/actions/runs/9551503512/job/26325948032?pr=661

@gjcolombo
Copy link
Contributor Author

Once more with feeling!

@gjcolombo gjcolombo merged commit 50cb28f into master Jun 17, 2024
9 of 10 checks passed
@gjcolombo gjcolombo deleted the gjcolombo/two-migration-statuses branch June 17, 2024 17:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

want migration status endpoints to report source and target migration IDs separately
2 participants