Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stop depending on avrt.dll statically on Windows #27

Merged
merged 2 commits into from
Mar 13, 2024

Conversation

yjugl
Copy link
Collaborator

@yjugl yjugl commented Mar 12, 2024

  • Load avrt.dll dynamically with LoadLibraryW

  • Fail with an AudioThreadPriorityError

See also https://bugzilla.mozilla.org/show_bug.cgi?id=1884214

Note: This adds a dependency to once_cell on Windows.

Note: If the functions are always called from the same thread we could replace once_cell::sync::OnceCell with once_cell::unsync::OnceCell. I took the thread-safe option because I don't know if it is the case or not.

This is a follow-up of #26 which I closed over (legitimate) calling convention concerns. I have checked that the tests now pass in 32-bit x86 as well (they did not before).

@yjugl
Copy link
Collaborator Author

yjugl commented Mar 12, 2024

Before merging this we should probably review all the other ways Firefox code calls into avrt.dll libraries directly, or this might just move the crash to these other locations in our code. This search shows that at least media/libcubeb/src/cubeb_wasapi.cpp, libwebrtc/modules/audio_device/win/core_audio_utility_win.h, third_party/rust/cubeb-sys/libcubeb/src/cubeb_wasapi.cpp could be problematic and need a rewrite too. I'll try to get a complete list extracted from a binary build of xul.dll.

@padenot
Copy link
Contributor

padenot commented Mar 12, 2024

Before merging this we should probably review all the other ways Firefox code calls into avrt.dll libraries directly, or this might just move the crash to these other locations in our code. This search shows that at least media/libcubeb/src/cubeb_wasapi.cpp [...]

This is the only call site that is compiled. Fixing this doesn't need to wait on fixing that other call-site, but the CI should be green.

@yjugl
Copy link
Collaborator Author

yjugl commented Mar 12, 2024

This is the only call site that is compiled. Fixing this doesn't need to wait on fixing that other call-site, but the CI should be green.

I was mentioning this because I saw we have some crashes going through xul!audioipc2_client::context::promote_and_register_thread which is also a Rust project, so I wanted to confirm why and was wondering if there was some merging of code to do; but in fact it seems all our Rust code is already ultimately relying on audio_thread_priority everywhere, which is good.

About the failure, the error is a bit unexpected. Have jobs always been trustworthy for Windows? (i.e. tests never failed with the previous code) Did you ever get a similar error before? Can you run again to see if it's intermittent? Can you let both nightly and stable jobs run to completion? Thank you.

@padenot
Copy link
Contributor

padenot commented Mar 12, 2024

I was mentioning this because I saw we have some crashes going through xul!audioipc2_client::context::promote_and_register_thread which is also a Rust project, so I wanted to confirm why and was wondering if there was some merging of code to do; but in fact it seems all our Rust code is already ultimately relying on audio_thread_priority everywhere, which is good.

It's simply calling this crate indeed. libcubeb is however calling this directly and needs to be fixed.

@yjugl yjugl force-pushed the failproof-avrt-load branch 2 times, most recently from 7252f2f to 94ea17f Compare March 12, 2024 16:18
yjugl added 2 commits March 13, 2024 12:10
* Load `avrt.dll` dynamically with `LoadLibraryW`
* Fail with an `AudioThreadPriorityError`
* Ensure thread-safety with a warmup call

See also https://bugzilla.mozilla.org/show_bug.cgi?id=1884214
@yjugl yjugl force-pushed the failproof-avrt-load branch from 94ea17f to 36cd677 Compare March 13, 2024 15:08
@yjugl
Copy link
Collaborator Author

yjugl commented Mar 13, 2024

I have fixed the tests by playing with the CI. The failures seem to indicate the following although it is mostly speculation from me: if parallel calls to AvSetMmThreadCharacteristicsA (or AvSetMmThreadCharacteristicsW) occur while the MMCSS service is inactive, then only one of these calls will succeed whereas the other calls will fail with error code ERROR_PATH_NOT_FOUND (3). Presumably the call that succeeds sees that the service is inactive, starts it, and waits for its full initialization; whereas the other calls would see that the service is running and then fail to acquire some resource because the service is still initializing and hasn't made every resource available yet.

I triggered this behavior by adding new tests in the Windows part of the code, whereas we had only a single test running on Windows before. The preexisting code already had the issue, but it wasn't showing because of the single test. cargo test runs tests in parallel by default, which presumably led to this behavior because MMCSS would be inactive in the machine state in which tests start on the CI (still speculation).

I have added a new test that explicitly tests the API in a multithreaded setting to make sure that we cover this explicitely. Then to solve the actual issue, I have added a warmup call to AvSetMmThreadCharacteristicsW at library load. That has fixed the issue at crate level, however, we could still run into the same issue at Firefox level if cubeb or webrtc make their own calls to AvSetMmThreadCharacteristicsW without going through audio_thread_priority. If these calls would unfortunately happen at the same time as audio_thread_priority's warmup call, cubeb and webrtc could mess with our warmup call (and our subsequent calls), or we could mess with their calls.

Note that on a non-CI machine, it may be fine to assume that the service is already running, in which case this wouldn't cause a problem for actual users. On my own machine MMCSS is on Auto start, which suggests we should be fine as long as my speculations from above are correct. Even if I set it on Demand start, it is started by the time I reach my desktop.

@padenot
Copy link
Contributor

padenot commented Mar 13, 2024

Thanks for all the research!

@padenot padenot merged commit 9d5da95 into mozilla:master Mar 13, 2024
16 of 24 checks passed
@yjugl yjugl deleted the failproof-avrt-load branch March 14, 2024 15:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants