Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New downstairs clone subcommand. #1129

Merged
merged 7 commits into from
Feb 8, 2024
Merged

New downstairs clone subcommand. #1129

merged 7 commits into from
Feb 8, 2024

Conversation

leftwo
Copy link
Contributor

@leftwo leftwo commented Feb 1, 2024

Add support for a new downstairs clone subcommand.

This enables a downstairs to "clone" another downstairs using the
repair endpoint. It requires the source downstairs to be read only
and will destroy whatever data exists on the destination downstairs.

Support for cloning SQLite backend downstairs is supported, and I had
to bring back some of the old repair code that supported the additional
files for that. The resulting clone'd downstairs will still have a SQLite
backend.

Found an fixed a bug where the incorrect file types were returned
during a repair that contained SQLite files.

A bunch more tests were added to cover the clone process, and new
API endpoints were added to the downstairs repair server.

The is the base layer we need to support replacement of read only parents.

For that the workflow up in omicron will be to:

  • Identify a good working downstairs we can clone from.
  • Get the repair port for that good downstairs.
  • Create a new downstairs region on a good disk with the same region size, but
    don't run the downstairs once the region is created.
  • Call the downstairs clone using the region created and the repair port from
    the good downstairs.
  • Wait for the clone to finish.
  • Restart the new downstairs in RO mode.
  • Replace the failed donwstairs with the cloned downstairs.

This enables a downstairs to "clone" another downstairs using the
repair endpoint.  It requires the source downstairs to be read only
and will destroy whatever data exists on the destination downstairs.

Support for cloning SQLite backend downstairs is supported, and I had
to bring back some of the old repair code that supported the additional
files for that.

Found an fixed a bug where the incorrect file types were returned
during a repair that contained SQLite files.

A bunch more tests were added to cover the clone process, and new
API endpoints were added to the downstairs repair server.
@leftwo leftwo requested review from mkeeter and jmpesp February 1, 2024 17:16
all.extend(vec![
extent_file_name(eid as u32, ExtentType::DbShm),
extent_file_name(eid as u32, ExtentType::DbWal),
]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these always guaranteed to be in the same order? Using a BTreeSet<String> or similar would make it more obvious that the comparison is meant to be order-agnostic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We always produce them in the same order, so assuming extent_file_list()'s into_iter() call
will walk the vec provided to it in order, then I suppose it's safe to expect that order never
changes?
We also (and I'm not sure why any longer) call repair_files.sort() over in
downstairs/src/region.rs, so even if the remote side does not produce the files in order, we
sort them on the requesting side.

Now that I think about it, I wonder if that repair_files.sort() in downstairs/src/region.rs was
attempting to work around the bug in the file type translation, where we had mixed db-shm
and db-wal.

@leftwo
Copy link
Contributor Author

leftwo commented Feb 2, 2024

Fixes #1018

@leftwo leftwo mentioned this pull request Feb 2, 2024
@mkeeter
Copy link
Contributor

mkeeter commented Feb 5, 2024

This is almost good to go; my only remaining comment is fixing the remaining instances of "compatable"

Comment on lines +282 to +286
false,
false,
false,
false,
true, // read_only
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it's time for the builder pattern... 🙃

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made a new PR for just that:
#1152

@leftwo leftwo requested a review from jmpesp February 8, 2024 05:56
@leftwo leftwo merged commit 35ef0cf into main Feb 8, 2024
18 checks passed
@leftwo leftwo deleted the alan/downstairs-clone branch February 8, 2024 20:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants