Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PHD: improve Windows reliability #651

Merged
merged 7 commits into from
Mar 1, 2024
Merged

Conversation

gjcolombo
Copy link
Contributor

Tweak a handful of things to make Windows tests more (though not perfectly) reliable:

  • Explicitly sleep between the time the first C:\Windows\System32> prompt is displayed and the time the next command is issued. This appears to reduce the chance that the first command's keystrokes will be eaten.
  • On Windows Server 2022, use the mode command to set the console session's row/column count explicitly before launching Cygwin. This avoids a problem where Cygwin thought it was on a 30-column terminal and was wrapping lines unexpectedly.
  • Shorten the Cygwin prompt from Administrator@PHD-WINDOWS:$ to just $ . The test doesn't need the user/host name cue and this reduces the likelihood of a line wrapping unexpectedly on WS2016/2019.
  • Handle the "cursor right" control code in the raw serial console buffer, since WS2022 uses it sometimes.
  • Add a tracing affordance to the Vt80x24 buffer that prints the current screen contents line-by-line, ignoring blank lines at the end of the output. This makes the terminal contents much easier to read during debugging.
  • Bump the repeated character debounce up again to try to cut down on 2016/2019 flakiness. In the long run I think it'll make sense to experiment with totally different approaches to this problem. For example, I wonder if things would work better if the serial task waited for individual typed characters to be echoed before sending a subsequent character, but I think doing that will require the serial task and the buffer trait to be restructured somewhat.

Tested with WS2022, WS2019, and Debian 11 guests.

@gjcolombo gjcolombo requested a review from hawkw February 26, 2024 18:26
@gjcolombo
Copy link
Contributor Author

I pushed ffc0178 to resolve the Clippy issue, but GH doesn't seem to be picking it up into the PR 🙃

If this doesn't resolve itself soon I'll do an amend/force-push to try to get things moving.

CommandSequenceEntry::wait_for("C:\\Windows\\system32>"),
CommandSequenceEntry::write_str("cls\r"),
CommandSequenceEntry::Sleep(Duration::from_secs(5)),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

five seconds is kind of a longish time to add to every test in the test suite...are we sure we need to wait this long?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a facility to check for command line responsiveness more dynamically in 68607a8. The start sequence will now try to enter a command and wait for a short while for it to be echoed; if that fails, it will retry (with backoff) until the desired output shows up (or the overarching boot sequence times out).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, I'll take a look!

@@ -148,3 +152,23 @@ fn make_absolute_cursor_position(col: usize, row: usize) -> Change {
y: Position::Absolute(row),
}
}

fn trace_buffer_contents(contents: &str) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good call!

Copy link
Contributor Author

@gjcolombo gjcolombo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review, @hawkw!

CommandSequenceEntry::wait_for("C:\\Windows\\system32>"),
CommandSequenceEntry::write_str("cls\r"),
CommandSequenceEntry::Sleep(Duration::from_secs(5)),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a facility to check for command line responsiveness more dynamically in 68607a8. The start sequence will now try to enter a command and wait for a short while for it to be echoed; if that fails, it will retry (with backoff) until the desired output shows up (or the overarching boot sequence times out).

/// Attempt to establish consistent echoing of characters to the guest
/// serial console by typing `send` and then waiting for `expect` to appear
/// within the supplied `timeout`.
EstablishConsistentEcho {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

neat!

is there any particular reason we allow using arbitrary strings for this operation? could we just always use a single character, or is there a reason we might want to do this with multiple strings?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The string's arbitrary. It could be just a single character, though if the caller picks something that might already be in the back buffer at the time it requests this operation, it will want to issue a ClearBuffer command first to avoid spurious matching. (The way I look at this operation is that it gives guests a way to tell the framework to repeat a pair of WriteStr/WaitFor commands until the WaitFor is satisfied; the flexibility in specifying the sent/expected strings is accidental, not essential, and is mostly there to provide an analogy to the Write/Wait commands.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yeah, the need to ensure that the string isn't already in the buffer makes sense. 👍

Tighten a few bolts to increase the reliability of Windows tests:

- In the Windows boot sequence, wait 5 seconds after the initial command prompt
  appears before entering a newline and continuing with the boot sequence. This
  helps to mitigate an apparent race condition in which characters typed too
  quickly into a brand new command line session get eaten. (Previously PHD
  attempted to mitigate this with a `cls` command, but this isn't reliable
  enough.)
- Explicitly run `mode con` to set the console width and height before launching
  Cygwin. Without this, Cygwin on Server 2022 believes its terminal is 30
  columns wide, which causes it to wrap commands in unexpected places that break
  echo detection.
- Shorten the command prompt from "Administrator@PHD-WINDOWS:$ " to "$ " to
  reduce command wraparound for Server 2016/2019 guests.
- Teach the raw buffer backend to honor "move cursor to the right" control
  commands. On Server 2022, Cygwin sometimes issues these instead of printing
  spaces when continuing a multiline command onto a new line, so this is needed
  to make echoing work reliably. Also add some tracing to this backend to help
  debug these kinds of problems.
@gjcolombo gjcolombo force-pushed the gjcolombo/phd/win-reliability branch from 5eb8cd0 to 30e0ce3 Compare February 29, 2024 19:40
@gjcolombo
Copy link
Contributor Author

I suspect the migrate-from-base tests barfed on the last run because this branch was previously based on a commit from before #639 landed. Let's see how the force-pushed branch fares.

@gjcolombo
Copy link
Contributor Author

That's more like it.

@gjcolombo gjcolombo merged commit 02d173d into master Mar 1, 2024
10 checks passed
@gjcolombo gjcolombo deleted the gjcolombo/phd/win-reliability branch March 1, 2024 02:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants