Skip to content

Commit 147ae59

Browse files
authored
Move FramedWrite work to a separate task (#1145)
The downstairs currently has the following tasks (labelled in `snake_case`): ``` ┌──────────┐ ┌─────────────────────────────────────┐ │FramedRead│ │ FramedWrite │ └────┬─────┘ └─▲─────▲──────────────────▲──────────┘ │ │ │ │ │ ping│ │invalid │ │ ┌────────┘ │frame │responses │ │ │errors │ │ │ │ │ ┌────▼──┴─┐ message ┌┴──────┐ job ┌┴────────┐ │resp_loop├──────────►│pf_task├─────────►│ dw_task │ └─────────┘ channel └──┬────┘ channel └▲────────┘ │ │ add│work new│ per-connection │ work│ ========================= │ ============== │ =============== shared state ┌──▼────────────────┴────────────┐ │ Downstairs │ └────────────────────────────────┘ ``` Sending response data to the `FramedWrite` requires serializing it (using `bincode`) and pushing it into the socket. During read-heavy operations, this is expensive! In one flamegraph, I see `__writev` and `memcpy` together taking about 28% of runtime. (see PR for flamegraph) `FramedWrite::send` is currently being called from `dw_task` (from `Downstairs::do_work`), which means that we're blocking on this serialization + socket write. This PR adds a new task specifically for performing the `FramedWrite` work, i.e. ``` ┌──────────┐ ┌───────────┐ │FramedRead│ │FramedWrite│ └────┬─────┘ └─────▲─────┘ │ │ │ ┌────────────────┴────────────────────┐ │ │ framed_write_task │ │ └─▲─────▲──────────────────▲──────────┘ │ │ │ │ │ ping│ │invalid │ │ ┌────────┘ │frame │responses │ │ │errors │ │ │ │ │ ┌────▼──┴─┐ message ┌┴──────┐ job ┌┴────────┐ │resp_loop├──────────►│pf_task├─────────►│ dw_task │ └─────────┘ channel └──┬────┘ channel └▲────────┘ │ │ add│work new│work per-connection │ │ ========================= │ ============== │ =============== shared state ┌──▼────────────────┴────────────┐ │ Downstairs │ └────────────────────────────────┘ ``` `dw_task` now passes `Message` objects into the `framed_write_task`, and **keeps going**. Serialization is a standalone operation that can happen elsewhere in the worker thread pool, without holding the `Downstairs` lock. With these changes, I see a **20% speedup for 1M random reads and a 56% speedup for 4M random reads**. (I'd also be interested in a #1087 -style fix later on, to remove the `memcpy` entirely, but this seems worthwhile on its own)
1 parent 18acb09 commit 147ae59

File tree

1 file changed

+152
-110
lines changed

1 file changed

+152
-110
lines changed

0 commit comments

Comments
 (0)