Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Active -> Offline -> Faulted tests #1257

Merged
merged 5 commits into from
Apr 12, 2024

Conversation

mkeeter
Copy link
Contributor

@mkeeter mkeeter commented Apr 11, 2024

This is (hopefully) the final prep PR before addressing #1252.

There was an untested path of Active -(timeout)-> Offline -(too many jobs)-> Faulted; this PR adds tests for both the job and byte-based fault conditions. In addition, it confirms that live-repair works after all fault-inducing tests, moving that logic to a new async fn run_live_repair(mut harness: TestHarness).

Making these tests pass required fixing a (probably innocuous) race condition:

  • Downstairs is marked as offline, begins to reconnect (with a 10s pause)
  • Too many jobs accumulate, and we try to stop the downstairs (to restart it again)
  • ClientIoTask::run_inner ignores the stop message during the initial 10s pause
  • After the pause completes, the client IO task connects then immediately disconnects, because it finally looks at the stop message

I think this was only an issue in unit tests, where we manage reconnections by hand; in the real system, the second reconnection should work fine. Anyhow, I fixed it by adding the stop condition to the initial client sleep and connection calls, so that it can interrupt the client at any point.

@mkeeter mkeeter requested review from jmpesp and leftwo April 11, 2024 16:03
Copy link
Contributor

@leftwo leftwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought I had tested this path, but I believe I did so before I then
later changed the timeout before reconnect to 10 seconds!

@mkeeter mkeeter merged commit 59bd303 into oxidecomputer:main Apr 12, 2024
18 checks passed
@mkeeter mkeeter deleted the dummy-offline-tests branch April 12, 2024 15:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants