-
Notifications
You must be signed in to change notification settings - Fork 19
Commit 147ae59
authored
Move
The downstairs currently has the following tasks (labelled in `snake_case`):
```
┌──────────┐ ┌─────────────────────────────────────┐
│FramedRead│ │ FramedWrite │
└────┬─────┘ └─▲─────▲──────────────────▲──────────┘
│ │ │ │
│ ping│ │invalid │
│ ┌────────┘ │frame │responses
│ │ │errors │
│ │ │ │
┌────▼──┴─┐ message ┌┴──────┐ job ┌┴────────┐
│resp_loop├──────────►│pf_task├─────────►│ dw_task │
└─────────┘ channel └──┬────┘ channel └▲────────┘
│ │
add│work new│
per-connection │ work│
========================= │ ============== │ ===============
shared state ┌──▼────────────────┴────────────┐
│ Downstairs │
└────────────────────────────────┘
```
Sending response data to the `FramedWrite` requires serializing it (using
`bincode`) and pushing it into the socket.
During read-heavy operations, this is expensive! In one flamegraph, I see
`__writev` and `memcpy` together taking about 28% of runtime.
(see PR for flamegraph)
`FramedWrite::send` is currently being called from `dw_task` (from
`Downstairs::do_work`), which means that we're blocking on this serialization +
socket write. This PR adds a new task specifically for performing the
`FramedWrite` work, i.e.
```
┌──────────┐ ┌───────────┐
│FramedRead│ │FramedWrite│
└────┬─────┘ └─────▲─────┘
│ │
│ ┌────────────────┴────────────────────┐
│ │ framed_write_task │
│ └─▲─────▲──────────────────▲──────────┘
│ │ │ │
│ ping│ │invalid │
│ ┌────────┘ │frame │responses
│ │ │errors │
│ │ │ │
┌────▼──┴─┐ message ┌┴──────┐ job ┌┴────────┐
│resp_loop├──────────►│pf_task├─────────►│ dw_task │
└─────────┘ channel └──┬────┘ channel └▲────────┘
│ │
add│work new│work
per-connection │ │
========================= │ ============== │ ===============
shared state ┌──▼────────────────┴────────────┐
│ Downstairs │
└────────────────────────────────┘
```
`dw_task` now passes `Message` objects into the `framed_write_task`, and **keeps
going**. Serialization is a standalone operation that can happen elsewhere in
the worker thread pool, without holding the `Downstairs` lock.
With these changes, I see a **20% speedup for 1M random reads and a 56% speedup
for 4M random reads**.
(I'd also be interested in a #1087 -style fix later on, to remove the `memcpy`
entirely, but this seems worthwhile on its own)FramedWrite
work to a separate task (#1145)1 parent 18acb09 commit 147ae59Copy full SHA for 147ae59
1 file changed
+152
-110
lines changed
0 commit comments