
Don't count Done bytes for backpressure #1208

Merged (1 commit), Mar 15, 2024

Conversation

@mkeeter (Contributor) commented on Mar 15, 2024

Stemming from this comment, I investigated why #1192 decreased write speeds (from > 700 MiB/s to roughly 550 MiB/s for 1 MiB random writes).

It turns out there was a problem with our byte-based backpressure implementation: we were counting all un-retired bytes, including bytes in jobs that were Done.

This is nonsensical: the point of backpressure is to normalize write time (as experienced by the Guest) with the actual time that a write takes. Jobs remain in the Done bucket until a flush, but that's unrelated to how long the write itself took (i.e. in our current implementation, changing the flush timeout would change backpressure, which is silly).

In other words, backpressure looks like a sawtooth graph: it drops to a minimal value after each flush, then climbs up as more jobs are issued, completed, and not retired (until the next flush).

#1192 makes this worse because it increases the backpressure gain, so the slope of the sawtooth is steeper. In other words, our backpressure was always unnecessarily high, but this change made the average higher. Here's the aforementioned sawtooth under slow and fast builds:

[Screenshot (2024-03-14): backpressure sawtooth under slow and fast builds]

This PR changes the byte-based backpressure to only track bytes in flight. This matches the existing job-based backpressure, which uses the sum of self.io_state_count.new + self.io_state_count.in_progress (note the lack of io_state_count.done in that summation!).
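As an illustration, here is a minimal sketch of the accounting change, using hypothetical types rather than the actual Crucible `io_state_count` structures:

```rust
// Minimal sketch (hypothetical types, not the actual Crucible code) of the
// byte accounting before and after this PR.
struct IoByteCount {
    new: u64,         // bytes in jobs not yet sent to a Downstairs
    in_progress: u64, // bytes in jobs currently being processed
    done: u64,        // bytes in completed jobs awaiting retirement at the next flush
}

impl IoByteCount {
    /// Before: `done` bytes were included, so backpressure kept climbing
    /// between flushes (the sawtooth above) and depended on the flush timeout.
    fn backpressure_bytes_old(&self) -> u64 {
        self.new + self.in_progress + self.done
    }

    /// After: only bytes in flight count, matching the job-based calculation
    /// of `new + in_progress`.
    fn backpressure_bytes_new(&self) -> u64 {
        self.new + self.in_progress
    }
}
```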

Sure enough, this improves performance back to > 700 MiB/s.

It's not perfect: doing fine-grained sampling of backpressure, I see that it still looks slightly unstable:

[Screenshot (2024-03-15): fine-grained backpressure sampling showing slight instability]

Still, this is a noticeable performance improvement and seems philosophically correct.

@mkeeter force-pushed the fix-byte-backpressure branch from 9ec3c2c to ad94bd2 on March 15, 2024 at 22:16
@leftwo (Contributor) left a comment

So, I think this is fine.

We do have to have some way of capping the total amount of memory an upstairs will require, but trying to use guest backpressure as a way to do that might not be the right approach, especially since the amount of memory we need depends more on how quickly we are flushing ACK'd jobs, and that is opaque to the guest.

Maybe in the future we adjust the flush timeout to accomplish that, but that's not for this PR.

@mkeeter merged commit 952c7d6 into oxidecomputer:main on Mar 15, 2024; 18 checks passed.
@mkeeter deleted the fix-byte-backpressure branch on March 16, 2024 at 01:38.
mkeeter added a commit that referenced this pull request Mar 29, 2024
In #1208, we changed backpressure to stop counting jobs once they had been completed, rather than retired. This was to accurately reflect jobs in flight, rather than "how long has it been since a flush".

However, we messed up the accounting, leading to #1236:

- Once a job was completed (or skipped) by all 3x Downstairs, its contribution to backpressure was subtracted out
- If a job was replayed, its contribution to backpressure **was not** added back
- A replayed job could be completed **again**. If it had previously been completed, the second completion would cause the backpressure counter to underflow, which we detect and panic

This PR adds a `backpressure_bytes: Option<u64>` member to the `DownstairsIO` struct to make this accounting more foolproof. This member tracks whether a job currently counts toward backpressure, meaning we can make completion and requeueing idempotent.

I also add a fail-safe check at job retirement, which prints an error message if we messed something up.
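For illustration, here is a minimal sketch of the idempotent accounting described above, assuming a much-simplified `DownstairsIO` and a standalone counter (hypothetical shapes; the real Crucible types differ):

```rust
// Minimal sketch (hypothetical, much-simplified types) of the idempotent
// accounting: each job remembers whether its bytes are currently counted.
struct DownstairsIO {
    /// `Some(n)` while this job contributes `n` bytes to backpressure
    backpressure_bytes: Option<u64>,
    write_bytes: u64,
}

struct BackpressureCounter {
    bytes: u64,
}

impl BackpressureCounter {
    /// Called when a job is queued or requeued; safe to call repeatedly.
    fn add(&mut self, job: &mut DownstairsIO) {
        if job.backpressure_bytes.is_none() {
            job.backpressure_bytes = Some(job.write_bytes);
            self.bytes += job.write_bytes;
        }
    }

    /// Called when a job is completed or skipped; also safe to call
    /// repeatedly, so a replayed-then-recompleted job cannot underflow
    /// the counter.
    fn remove(&mut self, job: &mut DownstairsIO) {
        if let Some(n) = job.backpressure_bytes.take() {
            self.bytes -= n;
        }
    }
}
```

Checking the `Option` on insertion and taking it on removal is what makes both operations idempotent in this sketch.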