-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add explicit API to terminate pool cleanly #45
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This was referenced Oct 15, 2024
hawkw
approved these changes
Oct 15, 2024
smklein
added a commit
to oxidecomputer/omicron
that referenced
this pull request
Oct 15, 2024
I'd like to integrate oxidecomputer/qorb#45 into qorb, and then omicron, but I want to be able to control that revision explicitly. Before I do that, pin the revision of qorb we're using so that "merging something into qorb" doesn't break omicron.
hawkw
reviewed
Oct 16, 2024
hawkw
added a commit
that referenced
this pull request
Oct 16, 2024
Currently, the `SetWorker::run` function will terminate all per-slot tasks in the set when termination is explicitly signaled. However, it does *not* currently do this when the `SetWorker` shuts down because there are no more clients, so we don't clean up the per-slot tasks in that case. I noticed this [while reviewing][1] PR #45, but left the review only after it merged. This commit fixes that by moving the termination code out of the loop, and making both the "explicit termination channel fired" and the "no more clients" cases `break` out of the loop into the termination code, instead of `return`ing. [1]: #45 (comment)
This was referenced Oct 16, 2024
Merged
smklein
added a commit
to oxidecomputer/omicron
that referenced
this pull request
Oct 18, 2024
Fixes #6505 , integrates usage of the new qorb APIs to terminate pools cleanly: oxidecomputer/qorb#45 # Background - [https://github.com/oxidecomputer/qorb](https://github.com/oxidecomputer/qorb) is a connection pooling crate, which spins up tokio task that attempt to connect to backends. - When `qorb` was integrated into Omicron in #5876, I used it to connect to our database backend (CockroachDB). This included usage in tests, even with a "single backend host" (one, test-only CRDB server) -- I wanted to ensure that we used the same pool and connector logic in tests and prod. # What Went Wrong As identified in #6505 , we saw some tests failing during **termination**. The specific cause of the failure was a panic from [async-bb8-diesel](https://github.com/oxidecomputer/async-bb8-diesel), where we attempted to spawn tasks with a terminating tokio executor. This issue stems from async-bb8-diesel's usage of `tokio::task::spawn_blocking`, where the returned `JoinHandle` is immediately awaited and **unwrapped**, with an expectation that "we should always be able to spawn a blocking task". There was a separate discussion about "whether or not async-bb8-diesel should be unwrapping in this case" (see: oxidecomputer/async-bb8-diesel#77), but instead, I chose to focus on the question: ## Why are we trying to send requests to async-bb8-diesel while the tokio runtime is exiting? The answer to this question lies in qorb's internals -- qorb itself spawns many tokio tasks to handle ongoing work, including monitoring DNS resolution, checking on backend health, and making connections before they are requested. One of these qorb tasks calling `ping_async`, which checks connection health, used the `async-bb8-diesel` interface that ultimately panicked. Within qorb most of these tokio tasks have a drop implementation of the form: ```rust struct MyQorbStructure { handle: tokio::task::JoinHandle<()>, } impl Drop for MyQorbStructure { fn drop(&mut self) { self.handle.abort(); } } ``` Tragically, without async drop support in Rust, this is the best we can do implicitly -- signal that the background tasks should stop ASAP -- but that may not be soon enough! Calling `.abort()` on the `JoinHandle` does not terminate the task immediately, it simply signals that it should shut down at the next yield point. As a result, we can still see the flake observed in #6505: - A qorb pool is initialized with background tasks - One of the qorb worker tasks is about to make a call to check on the health of a connection to a backend - The test finishes, and returns. The tokio runtime begins terminating - We call `drop` on the `qorb` pool, which signals the background task should abort - The aforementioned qorb worker task makes the call to `ping_async`, which calls `spawn_blocking`. This fails, because the tokio runtime is terminating, and returns a [JoinError::Cancelled](https://buildomat.eng.oxide.computer/wg/0/details/01J9YQVN7X5EQNXFSEY6XJBH8B/zfviqPz9RoJp3bY4TafbyqXTwbhqdr7w4oupqBtVARR00CXF/01J9YQWAXY36WM0R2VG27QMFRK#S6049). - `async-bb8-diesel` unwraps this `JoinError`, and the test panics. # How do we mitigate this? That's where this PR comes in. Using the new qorb APIs, we don't rely on the synchronous drop methods -- we explicitly call `.terminate().await` functions which do the following: - Use `tokio::sync::oneshot`s as signals to `tokio::tasks` that they should exit - `.await` the `JoinHandle` for those tasks before returning Doing this work explicitly as a part of cleanup ensures that there are not any background tasks attempting to do new work while the tokio runtime is terminating.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Part of oxidecomputer/omicron#6505
Adds an API to cleanly terminate a qorb Pool, and guarantee that no background tasks are still executing beyond the termination point.