Lock free updates for floating point metrics - Throughput increased by up to 50% #2016
Related to #1740
The AtomicTracker implementation for f64 uses a Mutex for all operations. This can lead to high contention when multiple threads concurrently update the same tracker, because each thread must acquire the lock before making its update. The Rust standard library has no atomic f64 type, but these operations can be performed in a thread-safe manner using an atomic u64. The approach comes from a member of the Rust language team: rust-lang/rust#72353 (comment)
The idea is to use the bit representation of an f64 value (f64::to_bits()), which is a u64. Rust already provides AtomicU64, which we can use to back our "atomic floating point" number.
Machine information
OS: Ubuntu 22.04.3 LTS (5.15.146.1-microsoft-standard-WSL2)
Hardware: AMD EPYC 7763 64-Core Processor - 2.44 GHz, 16 vCPUs, RAM: 64.0 GB
Benchmarks
Benchmark performance improves by about 5%. This most likely comes from no longer paying the cost of acquiring and releasing a lock in the hot path.
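For readers who want to see what a lock-free hot path can look like, here is a minimal sketch of an f64 accumulator backed by AtomicU64. It is an illustration under stated assumptions, not the code from this PR: the names AtomicF64, fetch_add and sum_from_threads are hypothetical, and the memory orderings are one reasonable choice rather than the ones the PR necessarily uses.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical lock-free f64 cell (illustrative name, not taken from the PR).
/// The f64 is stored as its raw bit pattern inside an AtomicU64.
pub struct AtomicF64 {
    bits: AtomicU64,
}

impl AtomicF64 {
    pub fn new(value: f64) -> Self {
        Self { bits: AtomicU64::new(value.to_bits()) }
    }

    /// Reads the current value by reinterpreting the stored bits as an f64.
    pub fn load(&self) -> f64 {
        f64::from_bits(self.bits.load(Ordering::Relaxed))
    }

    /// Adds `delta` using a compare-and-swap loop and returns the previous value.
    /// If another thread changed the value in between, the loop retries with
    /// the freshly observed bits instead of blocking on a lock.
    pub fn fetch_add(&self, delta: f64) -> f64 {
        let mut current = self.bits.load(Ordering::Relaxed);
        loop {
            let new = (f64::from_bits(current) + delta).to_bits();
            match self.bits.compare_exchange_weak(
                current,
                new,
                Ordering::AcqRel,
                Ordering::Relaxed,
            ) {
                Ok(previous) => return f64::from_bits(previous),
                Err(observed) => current = observed,
            }
        }
    }
}

/// Hypothetical helper: `threads` threads each add `delta` to one shared
/// cell `per_thread` times, mimicking many writers hitting the same tracker.
pub fn sum_from_threads(threads: usize, per_thread: usize, delta: f64) -> f64 {
    let total = Arc::new(AtomicF64::new(0.0));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let total = Arc::clone(&total);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    total.fetch_add(delta);
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    total.load()
}
```

Calling sum_from_threads(4, 1000, 0.5) from a main function returns 2000.0, since every intermediate sum is a multiple of 0.5 and exactly representable. With deltas that are not exactly representable, the result can vary slightly with the order in which threads win the compare-and-swap, as with any concurrent floating point sum.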
Stress Test (with high contention)
In this scenario all threads concurrently update the same tracker, that is, they emit measurements for the same set of attributes. This is where we see the highest perf benefit: an improvement of nearly 50%.
Stress Test (with low contention)
In this scenario threads concurrently update random trackers, that is, they emit measurements for a random set of attributes. The stress test used 16 threads and 1000 unique combinations of attributes, so the probability of two or more threads trying to update the same tracker is quite low. As expected, there is only a minor improvement here, and it is arguably within the deviation range of the stress test results.
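As a rough sanity check on the low-contention claim, the chance of a collision can be estimated with a birthday-problem calculation. The sketch below is an illustration only: collision_probability is a hypothetical helper, and it assumes every thread is mid-update at the same instant and picks its tracker uniformly and independently, which overstates real contention because threads spend only part of their time inside an update.

```rust
/// Probability that at least two of `threads` independent, uniformly random
/// picks among `trackers` slots land on the same slot (birthday problem).
/// Hypothetical helper for illustration; not part of the PR or the stress test.
pub fn collision_probability(threads: u32, trackers: u32) -> f64 {
    // With more threads than trackers, a collision is certain.
    if threads > trackers {
        return 1.0;
    }
    // Probability that all picks are distinct: the i-th pick must avoid
    // the i slots already taken by earlier picks.
    let all_distinct: f64 = (0..threads)
        .map(|i| 1.0 - f64::from(i) / f64::from(trackers))
        .product();
    1.0 - all_distinct
}
```

Under these assumptions, collision_probability(16, 1000) comes out at roughly 11% for any pair among the 16 threads colliding at a given instant, while the chance that one particular thread shares its tracker with another is only about 1.5%. This is consistent with the small gain measured in this scenario.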
Merge requirement checklist
CHANGELOG.md files updated for non-trivial, user-facing changes