opentelemetry::global::shutdown_tracer_provider() hangs indefinitely, and other troubles #536
Comments
Thanks for the feedback. We try to adapt the builder pattern for every configuration we encounter, but there are many configurations, so it can be confusing. To your questions: I think what you are looking for is […]. As for the […], one thing that may be worth trying is disabling the default features in […].
|
Thanks @TommyCpp -- So setting […]. For the hang, I'm using tokio with the multi-thread scheduler. I tried zipkin with and without […]. |
@Sushisource you can also use the […]. |
Same issue here. In my case it is a combination of `#[actix_web::main]` and

```rust
.with_client(reqwest::Client::new())
.install_batch(opentelemetry::runtime::Tokio);
```
|
Could you provide an example if possible? I ran some examples and it seems to be working; wondering if I missed something here. Thanks in advance. |
@TommyCpp This repros it consistently for me: https://gist.github.com/Sushisource/0c04fcc0fecef386a46cd6dd25c7a296 |
@Sushisource Thanks for putting together a demo. The tokio test macro by default runs on a single-threaded executor. You can either enable the multi-thread runtime in the test macro or use the current-thread batch executor. The gist works for me when I tried the above approaches. Let me know if those do the trick for you 😬 |
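For anyone landing here from a hanging test, a minimal sketch of that first option (the pipeline setup is elided and the test name is just illustrative; the current-thread batch executor is the other route mentioned below):

```rust
// Assumes the hang comes from the tokio test macro's default current-thread
// runtime: the multi_thread flavor gives the batch span processor worker
// threads to run on while shutdown blocks and waits for the flush.
#[tokio::test(flavor = "multi_thread", worker_threads = 2)]
async fn exports_spans_without_hanging() {
    // ... install a batch pipeline and emit some spans here ...

    // On the default single-threaded test runtime this call can block forever.
    opentelemetry::global::shutdown_tracer_provider();
}
```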
@simon-an Maybe try […]. |
@TommyCpp Ah, sure, this makes sense and fixes it. It would be really great if the examples (READMEs like the one here) mentioned that using a multithreaded runtime, or setting the current-thread batch executor, is necessary to avoid blocking - I think a lot of people are going to try using the library via tests like this when they first try it out, and (obviously, lol) this can be a confusing thing to run into. Thanks for the help! Feel free to close this issue, really appreciate it. |
@TommyCpp Don't I have the multithreaded runtime when I have a toml like this?

```toml
actix-http = { version = "3.0.0-beta.6", default-features = false, features = ["compress"] }
actix-web = { version = "4.0.0-beta.6", default-features = false, features = ["compress"] }
actix-web-opentelemetry = { version = "0.11.0-beta.3", features = ["metrics"] }
tokio = { version = "1.5", features = ["rt-multi-thread", "macros", "time"] }
opentelemetry = { version = "0.13", features = ["trace", "metrics", "rt-tokio"] }
opentelemetry-zipkin = { version = "0.11", features = ["reqwest-client"], default-features = false }
opentelemetry-application-insights = { version = "0.14", features = ["reqwest-client"], default-features = false }
```

You are right. It works when I use […] instead of […].
|
I believe the […]. |
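A sketch of the distinction being discussed, under the assumption (mine, from this thread) that actix-rt drives current-thread tokio runtimes for the main System and for each worker, so enabling the rt-multi-thread Cargo feature alone is not enough:

```rust
// #[actix_web::main] is assumed to block on a current-thread runtime, leaving
// a batch exporter installed with runtime::Tokio no free worker to run on.
// Tokio's own macro requests the multi-thread scheduler instead.
#[tokio::main(flavor = "multi_thread")]
async fn main() -> std::io::Result<()> {
    // An actix-web HttpServer can typically still be started from here, e.g.:
    // HttpServer::new(|| App::new()).bind(("127.0.0.1", 8080))?.run().await?;

    // With worker threads available, this no longer deadlocks.
    opentelemetry::global::shutdown_tracer_provider();
    Ok(())
}
```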
@TommyCpp I found a solution and updated the example project.

```rust
Handle::current().spawn_blocking(move || {
    opentelemetry::global::shutdown_tracer_provider();
}).await.unwrap();
```

Thanks for pointing out that there might be a current-thread scheduler in a multithreaded runtime. That was not clear to me, but it makes sense in the context of a webserver, where thread::spawn should never be executed on another worker's thread.
|
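A restatement of that workaround with my understanding of why it avoids the hang (an assumption about the internals, not something verified against the crate):

```rust
use tokio::runtime::Handle;

// shutdown_tracer_provider() blocks the calling thread while the batch span
// processor flushes. Running it via spawn_blocking moves that wait onto
// tokio's blocking-thread pool, so the async worker thread that drives the
// processor's task stays free to make progress.
async fn shutdown_telemetry() {
    Handle::current()
        .spawn_blocking(opentelemetry::global::shutdown_tracer_provider)
        .await
        .expect("telemetry shutdown task panicked");
}
```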
I see a similar bug in Apollo router, running on the tokio multi-threaded runtime, in the main code, not in tests. Reproduction info is here. The exporter is created here. The fix proposed by @simon-an works, but I took some time to dig a bit more into it. It happens when we try to reload the configuration and create a new subscriber. We need to replace the tracer provider because we might switch to a different exporter (I feel there's another way to do that without replacing the tracer provider, but I couldn't find it).

opentelemetry-rust/opentelemetry/src/sdk/trace/span_processor.rs, lines 266 to 275 in a81c6c6

That function is called because the […]. Through generous println applications, I see this timeline: […]

I suspect a bad interaction between the tokio runtime and the futures executor. I tried replacing it with a call to tokio's […]. This is not a blocking bug for us; as I said, the proposed fix works. But it would be nice to get a good fix inside the opentelemetry crate, and I hope I gave enough context to help in fixing it 🙂
|
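To make the suspected interaction concrete, here is a small standalone program (my reconstruction of the pattern described above, not the crate's actual code; it needs the futures crate) that hangs the same way:

```rust
use tokio::sync::oneshot;

// Deliberately hangs: futures::executor::block_on parks the only thread of a
// current-thread tokio runtime, so the spawned task that should send the
// "flushed" acknowledgement is never polled and rx.await never completes.
#[tokio::main(flavor = "current_thread")]
async fn main() {
    let (tx, rx) = oneshot::channel::<()>();

    tokio::spawn(async move {
        // Stand-in for the batch span processor acknowledging shutdown.
        let _ = tx.send(());
    });

    futures::executor::block_on(async {
        let _ = rx.await; // waits forever
    });
}
```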
The test wasn't using the multi_thread executor, so eventually it would trip over the bug in opentelemetry which we've hit in the past, where the runtime would hang for not fully understood reasons. See: open-telemetry/opentelemetry-rust#536. Using the multi_thread executor is the recommended work-around. I've also removed the redundant thread spawn from the telemetry Drop implementation, since I don't think it's required, as well as the spawn_blocking() call. I've run this test in a tight loop for a couple of hours (on my M1 laptop) with no issues. fixes: #914
Can we find out when someone will take the time to fix this issue, helped by what Geal said? Just so we know when we will be able to remove our workaround 🙏 |
🆙 🙏 Like Geal said: […]

Because it's currently painful to implement a wrapper around the opentelemetry crate, especially since we don't have […], I hope #1514 will help solve this issue once and for all 🙏.
|
Hello,
I've been trying to get trace exporting going to either Jaeger or Zipkin, and with both I'm running into issues actually getting much exported.

Following the readme for either, they mention calling

```rust
global::shut_down_provider(); // sending remaining spans
```

on exit (the READMEs seem to be outdated here). Calling what looks like the current equivalent (in the title) just locks up entirely. I'm using opentelemetry with the rt-tokio feature, and both the Jaeger and Zipkin crates, with or without their tokio features, exhibit this behavior. When I don't call this and instead do something silly like just sleep for a while at the end of a test, I'll get output like: […]
Somehow I had Jaeger exporting working OK before (never with

```rust
opentelemetry::global::set_text_map_propagator(opentelemetry_jaeger::Propagator::new());
```

or the shutdown call), but when trying to run a much larger scale load test I was never able to collect anything because of the packet size limits. The error helpfully suggests I try setting a smaller batch size, but I was totally unable to determine how to do that after reviewing the docs for a while.

Thanks for the hard work on this crate -- my honest feedback is that the API is pretty challenging to figure out. Taking something like the above batch size problem, I can find https://docs.rs/opentelemetry/0.13.0/opentelemetry/sdk/trace/struct.BatchSpanProcessorBuilder.html#method.with_max_export_batch_size, which looks like it might be what I want, but I can't actually find any way to use it in combination with the Jaeger exporter.
Thanks again! Let me know if I can provide more info or feedback.
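On the batch-size question above, the linked builder does expose with_max_export_batch_size; the missing piece is attaching a manually built processor to a tracer provider. A sketch under the assumption that the 0.13-era BatchSpanProcessor::builder takes an exporter plus a runtime handle (the exporter is left generic, since building one outside the Jaeger/Zipkin pipeline helpers is exactly the part the docs don't spell out):

```rust
use opentelemetry::sdk::export::trace::SpanExporter;
use opentelemetry::sdk::trace::{BatchSpanProcessor, TracerProvider};

// Generic over the exporter so this sketch doesn't assume a particular
// backend; the builder signatures are my reading of the 0.13-era API.
fn provider_with_small_batches<E>(exporter: E) -> TracerProvider
where
    E: SpanExporter + Send + Sync + 'static,
{
    let processor = BatchSpanProcessor::builder(exporter, opentelemetry::runtime::Tokio)
        .with_max_export_batch_size(64) // keep batches well below the packet limit
        .build();

    TracerProvider::builder()
        .with_span_processor(processor)
        .build()
}
```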