PHD: Use ZFS clones for file-backed disks #640
Conversation
Oops. The phd-run job stores artifacts on a tmpfs mount, which makes the runner upset when it tries to figure out which ZFS dataset they're on. Two threads to pull on here: (1) make PHD fall back to file copies if the artifact can't be cloned, and (2) see if there's a better directory to use for artifacts so that CI runs can take advantage of this feature.
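A minimal sketch of thread (1), with hypothetical helper names (this is not PHD's actual code): probe whether the artifact is on a ZFS dataset at all, and fall back to a plain copy when it isn't.

```rust
use std::path::Path;
use std::process::Command;

/// Hypothetical helper: `zfs list <path>` resolves the dataset containing a
/// path and fails if the path is not on ZFS (e.g. a tmpfs artifact dir).
fn dataset_for(path: &Path) -> Option<String> {
    let out = Command::new("zfs")
        .args(["list", "-H", "-o", "name"])
        .arg(path)
        .output()
        .ok()?;
    out.status
        .success()
        .then(|| String::from_utf8_lossy(&out.stdout).trim().to_string())
}

/// Hypothetical entry point for creating a disk's backing file.
fn make_backing_file(artifact: &Path, dest: &Path) -> std::io::Result<()> {
    match dataset_for(artifact) {
        Some(dataset) => {
            // Snapshot-and-clone path (sketched later in this thread).
            println!("would snapshot and clone dataset {dataset}");
            Ok(())
        }
        // Fallback: a plain (and much slower) file copy, so tests still
        // run when the artifacts aren't on a ZFS dataset.
        None => std::fs::copy(artifact, dest).map(|_| ()),
    }
}
```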
OK, this seems to be working now: the logs from the most recent CI run don't have the warnings/errors about ZFS clones failing, and the run before that shows that the fallback behavior works as intended.
In the lab* target environments, if you want a ZFS pool (reasonable!), you'll need to create one on one of the available SSDs.
I believe that's what's going on here, for example: https://github.com/oxidecomputer/omicron/blob/a5430a6465f457175ca1cf0c0053ffc47f25bf62/.github/buildomat/jobs/deploy.sh#L223
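For illustration, the operation amounts to a `zpool create` across the spare devices. Everything in this sketch (the pool name, the device names, the use of `pfexec`) is an assumption, not what the deploy job literally does:

```rust
use std::process::Command;

/// Sketch: build a scratch pool striped across a set of spare disks before
/// any datasets are laid down. Pool and device names are assumptions.
fn create_scratch_pool(pool: &str, devices: &[&str]) -> std::io::Result<()> {
    // `-f` forces creation even if the disks carry stale labels.
    let status = Command::new("pfexec")
        .arg("zpool")
        .args(["create", "-f", pool])
        .args(devices)
        .status()?;
    if !status.success() {
        return Err(std::io::Error::new(
            std::io::ErrorKind::Other,
            format!("zpool create {pool} failed"),
        ));
    }
    Ok(())
}

fn main() -> std::io::Result<()> {
    // Hypothetical device names for a lab machine's spare SSDs.
    create_scratch_pool("scratch", &["c1t1d0", "c1t2d0"])
}
```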
Oh nice, beat me to it. Thanks for the pointer! (I see now from a perusal of the lab image post-boot script that I'm definitely getting away with a crime here, abetted by the small size of the Alpine image, which just so happens to fit into the […].) I'll move this back to draft status while I work on this a bit more. I think I'll also want to change the framework not to presume anything about the name of the zpool under which it should try to place the clones.
Yeah, I think there are two things to note: […]
I have (hopefully) tightened this up a bit in the newest changes. The phd-run script now creates a pool spanning the runner's SSDs (as in the omicron-deploy job) and uses this pool's default dataset's root directory to store downloaded artifacts. I also tweaked the framework so that, when cloning artifact A, the clone is made a descendant of A's original dataset (the framework was already looking this up to figure out what to snapshot) instead of sticking it under whatever pool happens to be named […]. I ran this locally (where I have a bespoke artifact TOML that specifies some handmade, artisanal, locally-blended OS images, not all of which live in the same directory as the artifact dir I pass to PHD) in addition to the CI run on 2b025bd, and things looked OK (tests pass, nothing obviously leaked).
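A sketch of that naming scheme (all identifiers assumed): because the snapshot and clone names are derived from the dataset that actually contains the artifact, nothing in the framework needs to know which zpool that dataset lives in.

```rust
/// Sketch with assumed names: derive snapshot and clone names from the
/// artifact's own dataset rather than from a hard-coded pool name.
fn clone_names(artifact_dataset: &str, disk_id: &str) -> (String, String) {
    // e.g. ("scratch/phd-artifacts@phd-disk0",
    //       "scratch/phd-artifacts/phd-disk0")
    let snapshot = format!("{artifact_dataset}@phd-{disk_id}");
    let clone = format!("{artifact_dataset}/phd-{disk_id}");
    (snapshot, clone)
}
```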
Overall this is very straightforward and pretty easy to understand! I commented on some small nits.
(force-pushed from 13a27d1 to c655dfb)
(force-pushed from 2b025bd to 59a8337)
When creating a file-backed disk from an artifact, create a ZFS snapshot of the dataset containing the artifact, then create and mount a clone of the snapshot and back the disk from the clone. This is much, much less expensive than copying the underlying file and markedly improves performance for test runs with disks of nontrivial size. (On my local test machine, a full test run with a 2 GiB Debian nocloud image took 688 seconds before this change and 610 seconds after, about a 10% speedup.)

This is unfortunately not likely to move the needle much on CI times because the Alpine test image CI uses is very small. However, this should make it much easier to switch to non-Alpine CI guest(s) without needlessly bloating CI times.

Tested with full runs against Debian 11 and Alpine guests.
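As a sketch of the sequence this describes (names assumed, error handling simplified; not PHD's actual code):

```rust
use std::path::PathBuf;
use std::process::Command;

/// Run a zfs subcommand, failing loudly if it exits nonzero.
fn zfs(args: &[&str]) -> std::io::Result<()> {
    let status = Command::new("zfs").args(args).status()?;
    if status.success() {
        Ok(())
    } else {
        Err(std::io::Error::new(
            std::io::ErrorKind::Other,
            format!("zfs {} failed", args.join(" ")),
        ))
    }
}

/// Sketch (assumed names): snapshot the dataset holding the artifact,
/// clone the snapshot, mount the clone, and return the path of the
/// writable copy-on-write copy of the artifact inside it.
fn clone_backed_disk(
    dataset: &str,  // e.g. "scratch/phd-artifacts"
    artifact: &str, // image file name within the dataset
    disk_id: &str,  // unique per test disk
) -> std::io::Result<PathBuf> {
    let snapshot = format!("{dataset}@phd-{disk_id}");
    let clone = format!("{dataset}/phd-{disk_id}");
    let mountpoint = format!("/{clone}");

    zfs(&["snapshot", &snapshot])?;
    // `-o mountpoint=...` mounts the clone as it is created; no file data
    // is copied, which is why this beats a multi-gigabyte file copy.
    zfs(&[
        "clone",
        "-o",
        &format!("mountpoint={mountpoint}"),
        &snapshot,
        &clone,
    ])?;
    Ok(PathBuf::from(mountpoint).join(artifact))
}
```

Cleanup runs in the opposite order: a clone must be destroyed before its origin snapshot can be (`zfs destroy` on the clone, then on the snapshot).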
This keeps tests from falling over if the user chose an artifact directory that isn't on a ZFS dataset (e.g. everything is in tmpfs). This should fix PHD CI, but I'm also going to try to make another change to allow CI runs to take advantage of the cloning feature.
This is needed to use ZFS clones for file-backed disks.
(force-pushed from 59a8337 to 4154865)
(N.B. Stacked on #634.)