Lock free updates for floating point metrics - Throughput increased by up to 50% #2016

utpilla · 2024-08-13T06:28:50Z

Related to #1740

AtomicTracker implementation for f64 is using a Mutex for all the operations. This could lead to high contention when multiple threads concurrently update the same tracker as each of them would try to acquire a lock before making the update. Rust std library doesn't have support for atomic f64 but we can effectively do these operations in a thread-safe manner using atomic u64. This is coming from the one of the team members of the Rust language: rust-lang/rust#72353 (comment)

The idea is to use the memory representation (f64::to_bits()) of an f64 value which is in turn a u64 number. Rust already provides support for AtomicU64 which we can use to back our "Atomic floating point" number.

Machine information

OS: Ubuntu 22.04.3 LTS (5.15.146.1-microsoft-standard-WSL2) Hardware: AMD EPYC 7763 64-Core Processor - 2.44 GHz, 16vCPUs, RAM: 64.0 GB

Benchmarks

There is a 5% increase in benchmark performance. This should be coming from avoiding the cost to acquire and release a lock in the hot path.

Counter_Add_With_Three_Random_Attributes	Average time
main branch	191.48 ns
PR	181.95 ns

Stress Test (with high contention)

This is the scenario where all the threads concurrently update the same tracker meaning they emit measurement for the same set of attributes. This is where we see the highest perf benefit! An improvement of nearly 50%.

16 threads updating the same tracker	Throughput
main branch	6.2 M/sec
PR	9.4 M/sec

Stress Test (with low contention)

This is the scenario where threads concurrently update random trackers meaning they emit measurement for a random set of attributes. In the stress test, there were 16 threads and 1000 unique combinations of attributes. This means that probability of two or more threads trying to update the same tracker is quite low. There is only a minor improvement here as expected. This is also arguably within the deviation range of the stress test results.

16 threads updating random trackers	Throughput
main branch	11.2 M/sec
PR	11.5 M/sec

Merge requirement checklist

CONTRIBUTING guidelines followed
Unit tests added/updated (if applicable)
Appropriate CHANGELOG.md files updated for non-trivial, user-facing changes
Changes in public API reviewed (if applicable)

codecov · 2024-08-13T06:36:16Z

Codecov Report

Attention: Patch coverage is 96.10390% with 3 lines in your changes missing coverage. Please review.

Project coverage is 75.3%. Comparing base (b24608e) to head (feb0c1f).

Files	Patch %	Lines
opentelemetry-sdk/src/metrics/internal/mod.rs	86.3%	3 Missing ⚠️

Additional details and impacted files

@@           Coverage Diff           @@
##            main   #2016     +/-   ##
=======================================
+ Coverage   75.2%   75.3%   +0.1%     
=======================================
  Files        122     122             
  Lines      20873   20938     +65     
=======================================
+ Hits       15703   15782     +79     
+ Misses      5170    5156     -14

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

cijothomas · 2024-08-13T15:42:34Z

Vaguely recalling that, we did try this in .NET too, but ended up with a diff. approach (that won't work in Rust without relying on unsafe!)
https://github.com/open-telemetry/opentelemetry-dotnet/blob/main/src/OpenTelemetry/Internal/InterlockedHelper.cs#L11

utpilla · 2024-08-13T15:55:47Z

Vaguely recalling that, we did try this in .NET too, but ended up with a diff. approach (that won't work in Rust without relying on unsafe!) https://github.com/open-telemetry/opentelemetry-dotnet/blob/main/src/OpenTelemetry/Internal/InterlockedHelper.cs#L11

Yes. The main difference being that .NET offers a compare and exchange API for double/float64 type values while Rust does not.

Use atomic operations for float

047d263

utpilla requested a review from a team August 13, 2024 06:28

Merge branch 'main' into utpilla/Use-atomic-opereations-for-float

feb0c1f

cijothomas approved these changes Aug 13, 2024

View reviewed changes

cijothomas merged commit 17b99a1 into open-telemetry:main Aug 13, 2024
25 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Lock free updates for floating point metrics - Throughput increased by up to 50% #2016

Lock free updates for floating point metrics - Throughput increased by up to 50% #2016

utpilla commented Aug 13, 2024 •

edited

Loading

codecov bot commented Aug 13, 2024 •

edited

Loading

cijothomas commented Aug 13, 2024

utpilla commented Aug 13, 2024

Lock free updates for floating point metrics - Throughput increased by up to 50% #2016

Lock free updates for floating point metrics - Throughput increased by up to 50% #2016

Conversation

utpilla commented Aug 13, 2024 • edited Loading

Benchmarks

Stress Test (with high contention)

Stress Test (with low contention)

Merge requirement checklist

codecov bot commented Aug 13, 2024 • edited Loading

Codecov Report

cijothomas commented Aug 13, 2024

utpilla commented Aug 13, 2024

utpilla commented Aug 13, 2024 •

edited

Loading

codecov bot commented Aug 13, 2024 •

edited

Loading