Global Log handler cleanup - Logs SDK #2184

Merged
21 changes: 13 additions & 8 deletions opentelemetry-sdk/src/logs/log_emitter.rs
@@ -1,9 +1,8 @@
 use super::{BatchLogProcessor, LogProcessor, LogRecord, SimpleLogProcessor, TraceContext};
 use crate::{export::logs::LogExporter, runtime::RuntimeChannel, Resource};
-use opentelemetry::otel_warn;
 use opentelemetry::{
-    global,
     logs::{LogError, LogResult},
+    otel_error, otel_warn,
     trace::TraceContextExt,
     Context, InstrumentationLibrary,
 };
@@ -114,9 +113,9 @@
         let mut errs = vec![];
         for processor in &self.inner.processors {
             if let Err(err) = processor.shutdown() {
-                otel_warn!(
-                    name: "logger_provider_shutdown_error",
-                    error = format!("{:?}", err)
+                otel_error!(
+                    name: "LoggerProvider.Shutdown.Error",
+                    error = err
                 );
                 errs.push(err);
             }

Member: Same as https://github.com/open-telemetry/opentelemetry-rust/pull/2184/files#r1793832380, this feels redundant, given Result will carry this info already.

Member Author: Changed this to debug now.
@@ -128,10 +127,13 @@
             Err(LogError::Other(format!("{errs:?}").into()))
         }
     } else {
+        let error = LogError::Other("logger provider already shut down".into());
+
         otel_warn!(
-            name: "logger_provider_already_shutdown"
+            name: "LoggerProvider.Shutdown.AlreadyShutdown",
+            error = error
         );
-        Err(LogError::Other("logger provider already shut down".into()))
+        Err(error)
     }
 }

Member: This looks redundant, as the returned Result already contains this information. If we need to keep it, then maybe keep it at debug/info level.

Member Author (lalitb, Oct 9, 2024): So if the error is returned as a Result to the user, do we decide not to log it even if it could be actionable?

Member: Not to log, or log at a very low severity. Do you see any reason why this should be logged, given Result already carries this information? "Internal logging" is more for things that are otherwise hard to figure out, right?

Member Author: I prefer logging at a lower severity (debug?) for internal debugging. When we request logs from the customer at the debug level, they should include everything necessary for troubleshooting, even if the customer disregards the shutdown result.

Member: That is a good point! Agree with debug! here. It won't cause noise in normal operations, but when we ask users to report bugs/issues, we can ask them to collect all logs, including debug, so we'll have enough to investigate.

Member Author: Changed this to debug now.
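The pattern the thread converges on — surface the "already shut down" condition through the returned Result and keep any internal log at debug severity — can be sketched roughly as below. The types and the AtomicBool flag are illustrative stand-ins, not the SDK's actual implementation:

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Hypothetical simplified provider; the real LoggerProvider and LogError differ.
struct LoggerProvider {
    is_shutdown: AtomicBool,
}

impl LoggerProvider {
    fn shutdown(&self) -> Result<(), String> {
        if self.is_shutdown.swap(true, Ordering::SeqCst) {
            // Already shut down: the condition travels back via the Result;
            // internally this would be at most a debug-level log, not a warn.
            return Err("logger provider already shut down".to_string());
        }
        Ok(())
    }
}

fn main() {
    let provider = LoggerProvider { is_shutdown: AtomicBool::new(false) };
    assert!(provider.shutdown().is_ok());  // first shutdown succeeds
    assert!(provider.shutdown().is_err()); // second surfaces the error via Result
}
```

The caller gets the full error either way, so the internal log adds diagnostics only when debug collection is enabled.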
@@ -146,7 +148,10 @@
     fn drop(&mut self) {
         for processor in &mut self.processors {
             if let Err(err) = processor.shutdown() {
-                global::handle_error(err);
+                otel_warn!(
+                    name: "LoggerProvider.Drop.AlreadyShutdown",
+                    error = err
+                );
             }
         }
     }

Member: I think we should do more changes to this part - this is the automatically invoked shutdown upon drop, right? I believe we should first check if the user has already called shutdown, and if yes, return without any logs (maybe a debug-level log?). If the user has not called shutdown, then we need to call shutdown ourselves. Even here - should we log for each processor, or just log once saying shutdown failed? The detailed failure reason should be part of the individual processor's own log.

Member Author: Modified it accordingly, and added a debug log for the already-shutdown case. Processor::shutdown() returns the error, and we are now logging each at debug (instead of warn).

Member: Not sure how we assume that the error is due to "AlreadyShutdown".

Member Author: Yes, good catch - I have changed it to LoggerProvider.Drop.ShutdownError. Have kept it as a warning, as the error is during shutdown of LoggerProvider, and should be less destructive (I think).
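The Drop behavior the reviewer proposes — skip processor shutdown entirely when the user already called shutdown(), so drop stays silent on the normal path — could look roughly like this. The names (ProviderInner, shutdown_calls) are hypothetical, and the counter only stands in for "processors were shut down":

```rust
use std::sync::{
    atomic::{AtomicBool, AtomicU32, Ordering},
    Arc,
};

// Illustrative inner state; not the SDK's actual type.
struct ProviderInner {
    is_shutdown: AtomicBool,
    shutdown_calls: Arc<AtomicU32>,
}

impl Drop for ProviderInner {
    fn drop(&mut self) {
        // Only run processor shutdown if the user never called shutdown();
        // otherwise stay silent (at most a single debug-level log).
        if !self.is_shutdown.swap(true, Ordering::SeqCst) {
            self.shutdown_calls.fetch_add(1, Ordering::SeqCst);
        }
    }
}

fn main() {
    let calls = Arc::new(AtomicU32::new(0));

    // User never called shutdown: drop performs it once.
    drop(ProviderInner {
        is_shutdown: AtomicBool::new(false),
        shutdown_calls: calls.clone(),
    });
    assert_eq!(calls.load(Ordering::SeqCst), 1);

    // Shutdown already done: drop is a no-op.
    drop(ProviderInner {
        is_shutdown: AtomicBool::new(true),
        shutdown_calls: calls.clone(),
    });
    assert_eq!(calls.load(Ordering::SeqCst), 1);
}
```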
61 changes: 18 additions & 43 deletions opentelemetry-sdk/src/logs/log_processor.rs
@@ -100,7 +100,8 @@
         // noop after shutdown
         if self.is_shutdown.load(std::sync::atomic::Ordering::Relaxed) {
             otel_warn!(
-                name: "simple_log_processor_emit_after_shutdown"
+                name: "SimpleLogProcessor.Export.AfterShutdown",
+                error = LogError::Other("Attempted to export a log after processor shutdown".into())
             );
             return;
         }

Member: Is there any advantage of wrapping this into LogError?

Member Author: I find it useful - it adds convenience on top of std::error::Error, by declaring the various error types as an enum and mapping each to an actual error string. We don't need to add the error string while returning the error.
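The convenience the author describes — an error enum that maps variants to messages once, so call sites return a variant instead of re-writing the string — can be sketched as below. This mirrors the idea behind LogError, not its actual definition:

```rust
use std::fmt;

// Hypothetical enum for illustration; the real LogError has different variants.
#[derive(Debug)]
enum SketchLogError {
    AlreadyShutdown,
    ExportTimedOut(u64), // timeout in milliseconds
    Other(String),
}

impl fmt::Display for SketchLogError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Each variant carries its message here, once, instead of at every call site.
        match self {
            SketchLogError::AlreadyShutdown => write!(f, "processor already shut down"),
            SketchLogError::ExportTimedOut(ms) => write!(f, "export timed out after {ms}ms"),
            SketchLogError::Other(msg) => write!(f, "{msg}"),
        }
    }
}

impl std::error::Error for SketchLogError {}

fn main() {
    let err = SketchLogError::ExportTimedOut(5000);
    assert_eq!(err.to_string(), "export timed out after 5000ms");
}
```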
@@ -115,10 +116,9 @@
         });
         if let Err(err) = result {
             otel_error!(
-                name: "simple_log_processor_emit_error",
-                error = format!("{:?}", err)
+                name: "SimpleLogProcessor.Export.Error",
+                error = err
             );
-            global::handle_error(err);
         }
     }

Member: It's not easy to tell if this is useful or not. The failure could be due to:

1. Mutex poisoning
2. Exporter facing issues like network etc.
3. Futures-Executor can fail itself too.

I don't have a good solution in mind, so probably okay to just replace the current global error handler with otel_error! for now until we can find a better alternative.

Member Author: Have modified it to differentiate between MutexPoisoning and Exporter failure - I think we can further extend it in the future.
@@ -133,9 +133,6 @@
             exporter.shutdown();
             Ok(())
         } else {
-            otel_error!(
-                name: "simple_log_processor_shutdown_error"
-            );
             Err(LogError::Other(
                 "simple logprocessor mutex poison during shutdown".into(),
             ))
@@ -172,8 +169,8 @@

         if let Err(err) = result {
             otel_error!(
-                name: "batch_log_processor_emit_error",
-                error = format!("{:?}", err)
+                name: "BatchLogProcessor.Export.Error",
+                error = err
             );
             global::handle_error(LogError::Other(err.into()));
         }

Member: Will this be triggered when the channel is full? If yes, we need to rethink this, as it can spam the log output.

Member Author: Yes, this error only triggers when the channel is full or closed. We need to add some throttling or logic to prevent flooding - have added a TODO for now, as we need a common strategy for such flooding.

Member: Agree we need a common strategy, but let's remove the error log from here. It'll flood as-is when the buffer is full.

Member Author: Should we remove it altogether, or make it otel_debug for now - with a comment to change it to otel_error once throttling is ready?

Member: Either is fine with me, though I slightly prefer removing it altogether, as I don't know if we can ship a throttling solution for the next release.
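One minimal throttling shape for the flooding problem discussed above — log the first channel-full error, then suppress repeats until the condition clears — could look like this. It is illustrative only; the SDK has no such helper yet (the PR left a TODO):

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Hypothetical one-shot latch; a real solution might use a counter or a timer.
static EXPORT_ERROR_LOGGED: AtomicBool = AtomicBool::new(false);

/// Returns true only for the first failure since the last reset,
/// so a persistently full channel produces one log line, not thousands.
fn should_log_export_error() -> bool {
    !EXPORT_ERROR_LOGGED.swap(true, Ordering::Relaxed)
}

/// Re-arm logging once the channel has drained successfully.
fn reset_export_error() {
    EXPORT_ERROR_LOGGED.store(false, Ordering::Relaxed);
}

fn main() {
    assert!(should_log_export_error());  // first failure: log it
    assert!(!should_log_export_error()); // repeats: suppressed
    reset_export_error();
    assert!(should_log_export_error()); // recovered, then failed again: log
}
```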
@@ -243,10 +240,9 @@

         if let Err(err) = result {
             otel_error!(
-                name: "batch_log_processor_export_error",
-                error = format!("{:?}", err)
+                name: "BatchLogProcessor.Export.Error",
+                error = err
             );
-            global::handle_error(err);
         }
     }
 }
@@ -261,24 +257,12 @@
             .await;

         if let Some(channel) = res_channel {
-            if let Err(result) = channel.send(result) {
-                global::handle_error(LogError::from(format!(
-                    "failed to send flush result: {:?}",
-                    result
-                )));
+            if let Err(send_error) = channel.send(result) {
                 otel_error!(
-                    name: "batch_log_processor_flush_error",
-                    error = format!("{:?}", result),
-                    message = "Failed to send flush result"
+                    name: "BatchLogProcessor.Flush.SendResultError",
+                    error = format!("{:?}", send_error),
                 );
             }
-        } else if let Err(err) = result {
-            otel_error!(
-                name: "batch_log_processor_flush_error",
-                error = format!("{:?}", err),
-                message = "Flush failed"
-            );
-            global::handle_error(err);
         }
     }
     // Stream has terminated or processor is shutdown, return to finish execution.
@@ -294,15 +278,12 @@
                     exporter.shutdown();

                     if let Err(result) = ch.send(result) {
-                        otel_error!(
-                            name: "batch_log_processor_shutdown_error",
-                            error = format!("{:?}", result),
-                            message = "Failed to send shutdown result"
-                        );
-                        global::handle_error(LogError::from(format!(
-                            "failed to send batch processor shutdown result: {:?}",
-                            result
-                        )));
+                        if let Err(err) = result {
+                            otel_error!(
+                                name: "BatchLogProcessor.Shutdown.SendResultError",
+                                error = err
+                            );
+                        }
                     }

break;
@@ -357,13 +338,7 @@
     pin_mut!(timeout);
     match future::select(export, timeout).await {
         Either::Left((export_res, _)) => export_res,
-        Either::Right((_, _)) => {
-            otel_error!(
-                name: "export_with_timeout_timeout",
-                timeout_duration = time_out.as_millis()
-            );
-            ExportResult::Err(LogError::ExportTimedOut(time_out))
-        }
+        Either::Right((_, _)) => ExportResult::Err(LogError::ExportTimedOut(time_out)),
     }
 }
31 changes: 29 additions & 2 deletions opentelemetry/src/global/internal_logging.rs
@@ -54,7 +54,13 @@ macro_rules! otel_warn {
     (name: $name:expr, $($key:ident = $value:expr),+ $(,)?) => {
         #[cfg(feature = "internal-logs")]
         {
-            tracing::warn!(name: $name, target: env!("CARGO_PKG_NAME"), $($key = $value),+, "");
+            tracing::warn!(name: $name,
+                target: env!("CARGO_PKG_NAME"),
+                $($key = {
+                    $crate::format_value!($value)
+                }),+,
+                ""
+            )
         }
     };
 }
@@ -108,7 +114,28 @@ macro_rules! otel_error {
     (name: $name:expr, $($key:ident = $value:expr),+ $(,)?) => {
         #[cfg(feature = "internal-logs")]
         {
-            tracing::error!(name: $name, target: env!("CARGO_PKG_NAME"), $($key = $value),+, "");
+            tracing::error!(name: $name,
+                target: env!("CARGO_PKG_NAME"),
+                $($key = {
+                    $crate::format_value!($value)
+                }),+,
+                ""
+            )
         }
     };
 }

/// Helper macro to format a value using Debug if available, falling back to Display
#[macro_export]
macro_rules! format_value {
($value:expr) => {{
// Try Debug first
let debug_result = std::fmt::format(format_args!("{:?}", $value));
if debug_result.starts_with('<') || debug_result.contains("::") {
// Contains module path or starts with generic angle brackets
format!("{}", $value)
} else {
debug_result
}
}};
}
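The heuristic inside format_value! can be shown as a plain function: keep the Debug rendering unless it looks like a type or module path, in which case fall back to Display. The function name and the PathLike type below are illustrative, not part of the crate:

```rust
use std::fmt::{self, Debug, Display};

// Same decision rule as the format_value! macro, extracted for clarity.
fn format_value_heuristic<T: Debug + Display>(value: &T) -> String {
    let debug_result = format!("{:?}", value);
    if debug_result.starts_with('<') || debug_result.contains("::") {
        // Looks like a module path or generic type name: prefer Display.
        format!("{}", value)
    } else {
        debug_result
    }
}

// A value whose Debug output resembles a module path.
struct PathLike;

impl Debug for PathLike {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "my_crate::PathLike")
    }
}

impl Display for PathLike {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "path-like value")
    }
}

fn main() {
    // Plain values keep their Debug form (note the quotes Debug adds).
    assert_eq!(format_value_heuristic(&"boom"), "\"boom\"");
    assert_eq!(format_value_heuristic(&42), "42");
    // Debug output containing "::" triggers the Display fallback.
    assert_eq!(format_value_heuristic(&PathLike), "path-like value");
}
```

Note the heuristic is string-based, so a Debug message that legitimately contains "::" would also be redirected to Display.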