-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Consider not using on_disk_hash
#1161
Comments
mkeeter
added a commit
that referenced
this issue
Jul 31, 2024
Right now, the `ReadResponse` sends back a `Vec<Option<BlockContext>>`, i.e. an `Option<BlockContext>` for each block. The `BlockContext` in turn contains ```rust pub struct BlockContext { pub hash: u64, pub encryption_context: Option<EncryptionContext>, } ``` If `encryption_context` is populated, then sending `hash` to the Upstairs is unnecessary: we are using authenticated encryption, so we know whether the block data + context is valid based on whether it decrypted successfully. This PR removes `hash` from the `ReadResponse` message in favor of a new `enum`: ```rust #[derive(Debug, PartialEq, Copy, Clone, Serialize, Deserialize)] pub enum ReadBlockContext { Empty, Encrypted { ctx: EncryptionContext }, Unencrypted { hash: u64 }, } ``` This does not change the on-disk format or write message format, which both continue to use hashes: - When **sending** data to the Downstairs, the hash (in `BlockContext`) lets us detect corruption in transit. We can't use the encryption context here, because the Downstairs isn't allowed to have encryption keys - When recovering from a bad write, `on_disk_hash` is used to figure out which context slot is valid. `on_disk_hash` is never sent over the network¹ This PR is a step towards [RFD 490 § Metadata reduction](https://rfd.shared.oxide.computer/rfd/490#_metadata_reduction), which proposes to **not** store block hashes for encrypted data on disk. If in the future we don't store block hashes for encrypted data, we would not be able to send them over the network; this PR removes that future hurdle. However, the PR stands alone as a small optimization (39 → 32 bytes per block) that simplifies program behavior (no need to think about what happens if encryption fails but the hash matches, or vice versa). ¹ `on_disk_hash` is also _technically_ superfluous if we already have `hash` (see #1161), but this PR doesn't change it
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Right now, the Downstairs hashes twice:
BlockContext::hash
, it hashes a tuple of[nonce, tag, data]
(or justdata
for unencrypted blocks)DownstairsBlockContext::on_disk_hash
, it hashes onlydata
. This is also written to disk, alongside theBlockContext
(with its ownhash
, discussed above)I think the
on_disk_hash
is unnecessary. The purpose ofon_disk_hash
is to tell us which context data is valid at startup, butBlockContext::hash
can serve exactly the same purpose: whichever context has a matchinghash
is correct.In fact, most of the time we don't actually need a full rehash: we persist the active slots when clearing the
dirty
flag, which happens on every flush.Skipping the computation of
on_disk_hash
would hopefully speed up writes: in a flamegraph of 1M random writes post-#1148 (which removes the dominance of__fdsync
), I see the two calls tocommon_integrity_hash
taking 10% and 6% of runtime respectively:The downside is that recovering from a dirty shutdown would take twice as long, because we have to hash both
[nonce1, tag1, data]
and[nonce2, tag2, data]
. (It's too bad we didn't choose the order[data, nonce, tag]
, which would have let us reuse hasher state and made this extra cost negligible)I think it's still probably worthwhile – dirty shutdowns should be rare, and only the specific dirty extents would pay the extra price.
We could start doing this Right Now, without any on-disk format changes:
BlockContext::hash
instead ofon_disk_hash
when checking for slot validityon_disk_hash
to all zerosSince these changes are confined to the
crucible-downstairs
binary, we can make these two behavior changes atomically in a single release. It is also not a compatibility break with the upstairs, which only cares aboutBlockContext::hash
(theon_disk_hash
is never sent over the network). However, we wouldn't be able to roll back to previouscrucible-downstairs
releases!We could also theoretically remove the
on_disk_hash
, saving 8 bytes from theOnDiskDownstairsBlockContext
(so 16 bytes per block, since we have two slots per block). This is harder to do in place; if we wanted to make that change, we'd probably implement aRawInnerV2
. I'd vote to not do that out of the gate.The text was updated successfully, but these errors were encountered: