Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lock free updates for floating point metrics - Throughput increased by up to 50% #2016

Conversation

utpilla
Copy link
Contributor

@utpilla utpilla commented Aug 13, 2024

Related to #1740

AtomicTracker implementation for f64 is using a Mutex for all the operations. This could lead to high contention when multiple threads concurrently update the same tracker as each of them would try to acquire a lock before making the update. Rust std library doesn't have support for atomic f64 but we can effectively do these operations in a thread-safe manner using atomic u64. This is coming from the one of the team members of the Rust language: rust-lang/rust#72353 (comment)

The idea is to use the memory representation (f64::to_bits()) of an f64 value which is in turn a u64 number. Rust already provides support for AtomicU64 which we can use to back our "Atomic floating point" number.

Machine information OS: Ubuntu 22.04.3 LTS (5.15.146.1-microsoft-standard-WSL2) Hardware: AMD EPYC 7763 64-Core Processor - 2.44 GHz, 16vCPUs, RAM: 64.0 GB

Benchmarks

There is a 5% increase in benchmark performance. This should be coming from avoiding the cost to acquire and release a lock in the hot path.

Counter_Add_With_Three_Random_Attributes Average time
main branch 191.48 ns
PR 181.95 ns

Stress Test (with high contention)

This is the scenario where all the threads concurrently update the same tracker meaning they emit measurement for the same set of attributes. This is where we see the highest perf benefit! An improvement of nearly 50%.

16 threads updating the same tracker Throughput
main branch 6.2 M/sec
PR 9.4 M/sec

Stress Test (with low contention)

This is the scenario where threads concurrently update random trackers meaning they emit measurement for a random set of attributes. In the stress test, there were 16 threads and 1000 unique combinations of attributes. This means that probability of two or more threads trying to update the same tracker is quite low. There is only a minor improvement here as expected. This is also arguably within the deviation range of the stress test results.

16 threads updating random trackers Throughput
main branch 11.2 M/sec
PR 11.5 M/sec

Merge requirement checklist

  • CONTRIBUTING guidelines followed
  • Unit tests added/updated (if applicable)
  • Appropriate CHANGELOG.md files updated for non-trivial, user-facing changes
  • Changes in public API reviewed (if applicable)

@utpilla utpilla requested a review from a team August 13, 2024 06:28
Copy link

codecov bot commented Aug 13, 2024

Codecov Report

Attention: Patch coverage is 96.10390% with 3 lines in your changes missing coverage. Please review.

Project coverage is 75.3%. Comparing base (b24608e) to head (feb0c1f).

Files Patch % Lines
opentelemetry-sdk/src/metrics/internal/mod.rs 86.3% 3 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##            main   #2016     +/-   ##
=======================================
+ Coverage   75.2%   75.3%   +0.1%     
=======================================
  Files        122     122             
  Lines      20873   20938     +65     
=======================================
+ Hits       15703   15782     +79     
+ Misses      5170    5156     -14     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@cijothomas
Copy link
Member

Vaguely recalling that, we did try this in .NET too, but ended up with a diff. approach (that won't work in Rust without relying on unsafe!)
https://github.com/open-telemetry/opentelemetry-dotnet/blob/main/src/OpenTelemetry/Internal/InterlockedHelper.cs#L11

@utpilla
Copy link
Contributor Author

utpilla commented Aug 13, 2024

Vaguely recalling that, we did try this in .NET too, but ended up with a diff. approach (that won't work in Rust without relying on unsafe!) https://github.com/open-telemetry/opentelemetry-dotnet/blob/main/src/OpenTelemetry/Internal/InterlockedHelper.cs#L11

Yes. The main difference being that .NET offers a compare and exchange API for double/float64 type values while Rust does not.

@cijothomas cijothomas merged commit 17b99a1 into open-telemetry:main Aug 13, 2024
25 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants