Use dedicated thread for Metrics PeriodicReader #2142
I'm worried about support for situations where threads may be problematic, either because they are not supported or because of functions that are unsafe to call in the presence of threads (non-reentrant function calls). For example, what about …? Would it be possible to make the thread optional and instead offer it as another runtime to choose from? Or to expose a lower-level interface based on channels and let the user choose their own concurrency solution?
Yes, that is a very valid point, and it is already mentioned in the description: we expect there will be scenarios where OTel spawning background threads is not feasible, and OTel will offer a way to "bring your own runtime".
@sandersaares Could you take a look?
These tests should help with testing #2142
Basic premise seems workable. I suspect my lack of OTel familiarity prevents me from making more useful comments here.
…metry#2147) These tests should help with testing open-telemetry#2142
Co-authored-by: Zhongyang Wu <[email protected]> Co-authored-by: Lalit Kumar Bhasin <[email protected]>
…open-telemetry#2152) Co-authored-by: Cijo Thomas <[email protected]>
…wn (open-telemetry#2156) Co-authored-by: Lalit Kumar Bhasin <[email protected]>
Co-authored-by: Lalit Kumar Bhasin <[email protected]>
@@ -61,7 +61,7 @@ fn init_tracer_provider() -> Result<sdktrace::TracerProvider, TraceError> {

 fn init_metrics() -> Result<opentelemetry_sdk::metrics::SdkMeterProvider, MetricsError> {
     opentelemetry_otlp::new_pipeline()
-        .metrics(opentelemetry_sdk::runtime::Tokio)
+        .metrics()
Have you verified that this still works? I believe reqwest relies on Tokio. Now that the outgoing HTTP call through reqwest is being made on a background thread without Tokio (or any runtime), it might be problematic. Looking to ensure that we don't hit this issue in particular:
there is no reactor running, must be called from the context of Tokio runtime
Excellent point!
Based on offline discussion, yes, this appears to cause issues when using both reqwest and hyper. These libraries seem to have a strong requirement that they run inside a tokio runtime; they cannot work otherwise.
tonic seems to work fine without issues.
Will check if there are ways to work around the HTTP library limitations.
Will check if there are ways to work around the http library limitations.
I did a quick test with Simple Log Processor + OTLP exporter with reqwest::blocking::Client. This works without needing a tokio runtime, so it should also work with a background thread for batch. Enforcing this for the background thread could be one option.
Will check if there are ways to work around the http library limitations.
I did a quick test with Simple Log Processor + OTLP exporter with reqwest::blocking::Client - this works without need of tokio runtime. This should also work with background thread for batch in that case. Enforcing this for background thread could be one option.
Yes. But there is no blocking version for libraries like tonic, which effectively limits exporting to HTTP with reqwest::blocking::Client.
Will need to redesign after comparing all feasible options.
tonic seem to work fine without issues.
This may be incorrect, as it may have worked during shutdown only. Apologies for the confusion!
Yes, tonic is tricky, and there is no alternative either.
@lalitb I have another correction to make about tonic, based on more testing:
The changes in this PR work fine with tonic, as long as the main function of the app is a tokio one. Yes, the tonic::export call is made from our background thread, and it still works.
If the main function is not a tokio main, then the app panics when the meter provider is built. It looks like we attempt to create a gRPC channel at build(), and that fails due to the lack of a tokio runtime.
The changes in this PR work fine with tonic, as long as the main function of the app is a tokio one. Yes, the tonic::export call is made from our background thread, and it still works.
Interesting. I had assumed the spawned thread (std::thread::spawn) doesn't run in the tokio runtime and so would not have access to it. Thanks for confirming.
Trying to summarize the scenarios:
These work:
tokio::main -> background-thread -> gRPC (tonic), HTTP (hyper, reqwest, reqwest-blocking)
tokio::main -> simple-exporter -> gRPC (tonic), HTTP (hyper, reqwest, reqwest-blocking) (assuming we do filtering to avoid an infinite loop).
tokio::main -> simple-exporter -> HTTP (reqwest-blocking)
main -> simple-exporter -> HTTP (reqwest-blocking)
And these don't work:
main -> background-thread -> gRPC (tonic), HTTP (hyper, reqwest, reqwest-blocking)
tokio::main(current_thread) -> simple-exporter -> gRPC (tonic), HTTP (hyper, reqwest, reqwest-blocking) # hangs
How about this?
tokio::main(current_thread) -> background-thread -> gRPC (tonic), HTTP (hyper, reqwest, reqwest-blocking)
        self.is_shutdown
            .store(true, std::sync::atomic::Ordering::Relaxed);
        if response {
            Ok(())
Join the background thread before considering it a successful shutdown?
Joining the background thread could block the user thread if the background thread is somehow blocked (for example, in the export call), in which case the timeout parameter would be ignored.
Could we try to join it if that can be done within the specified timeout, and give up joining once the timeout is reached?
@ThomsonTan Such a situation will not arise in the current implementation. If response_rx got the message back, the background thread is exiting and is not doing anything else.
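The shutdown handshake discussed in this thread can be sketched with the standard library alone. This is a minimal illustration, not the code in this PR: `ShutdownRequest`, `ShutdownError`, `spawn_worker`, and `shutdown` are hypothetical names, and only `response_tx`/`response_rx` echo the identifiers mentioned above. The caller waits for the acknowledgement with a timeout and joins the thread only after the acknowledgement has arrived, so a worker stuck in an export cannot block the caller past the timeout.

```rust
use std::sync::mpsc::{self, RecvTimeoutError, Sender};
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical shutdown request carrying the channel used for the acknowledgement.
struct ShutdownRequest {
    response_tx: Sender<bool>,
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum ShutdownError {
    Timeout,
    ExportFailed,
    WorkerGone,
}

/// Spawns a worker that waits for a shutdown request, runs `final_export`,
/// reports the result on `response_tx`, and then returns immediately.
fn spawn_worker<F>(final_export: F) -> (Sender<ShutdownRequest>, JoinHandle<()>)
where
    F: FnOnce() -> bool + Send + 'static,
{
    let (tx, rx) = mpsc::channel::<ShutdownRequest>();
    let handle = thread::spawn(move || {
        if let Ok(request) = rx.recv() {
            let ok = final_export();
            // The caller may have given up already; ignore a failed send.
            let _ = request.response_tx.send(ok);
            // Nothing runs after the acknowledgement: the thread exits here.
        }
    });
    (tx, handle)
}

/// Requests shutdown and waits at most `timeout` for the acknowledgement.
fn shutdown(
    tx: &Sender<ShutdownRequest>,
    handle: JoinHandle<()>,
    timeout: Duration,
) -> Result<(), ShutdownError> {
    let (response_tx, response_rx) = mpsc::channel();
    tx.send(ShutdownRequest { response_tx })
        .map_err(|_| ShutdownError::WorkerGone)?;
    match response_rx.recv_timeout(timeout) {
        Ok(ok) => {
            // Acknowledged: the worker is about to return, so this join is short.
            let _ = handle.join();
            if ok { Ok(()) } else { Err(ShutdownError::ExportFailed) }
        }
        // No acknowledgement in time: drop the handle (detaching the thread)
        // rather than blocking the caller on join.
        Err(RecvTimeoutError::Timeout) => Err(ShutdownError::Timeout),
        // The worker dropped `response_tx` without answering (e.g. it panicked).
        Err(RecvTimeoutError::Disconnected) => Err(ShutdownError::WorkerGone),
    }
}
```

In this sketch, joining after the acknowledgement costs little because the worker has no work left, while the timeout path never joins at all.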
PeriodicReader::shutdown() deadlocks on current thread runtime. #2056
Given the use of a dedicated thread, PeriodicReader no longer depends on user code not blocking the runtime's threads, which is what caused issues like this one.
(The ability for users to bring their own async runtime can be offered as an opt-in feature. How exactly is TBD. Most likely we need to do it post 1.0.)
Note: The key change is in periodicreader.rs only. The rest are cascading effects of removing the runtime requirement and passing a timeout to exporters. The unrelated changes to examples are to be removed before the PR is merged; they are added to help anyone run locally and observe the logs themselves.
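For readers who want a picture of the dedicated-thread approach without opening periodicreader.rs, here is a rough, standard-library-only sketch. It is an assumption-laden illustration rather than the SDK's implementation: `PeriodicWorker`, `Message`, and the `export` closure are hypothetical stand-ins. The thread sleeps in `recv_timeout(interval)`, treating a timeout as the periodic tick and a received message as a flush or shutdown request.

```rust
use std::sync::mpsc::{self, RecvTimeoutError, Sender};
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Control messages sent from user threads to the background thread (hypothetical).
enum Message {
    /// Export now and acknowledge on the enclosed channel.
    Flush(Sender<()>),
    /// Export one last time, acknowledge, then exit the thread.
    Shutdown(Sender<()>),
}

/// Illustrative stand-in for a reader that owns its own thread.
struct PeriodicWorker {
    tx: Sender<Message>,
    handle: Option<JoinHandle<()>>,
}

impl PeriodicWorker {
    /// Spawns the thread; `export` runs on every tick, flush, and shutdown.
    fn start<F>(interval: Duration, export: F) -> Self
    where
        F: Fn() + Send + 'static,
    {
        let (tx, rx) = mpsc::channel::<Message>();
        let handle = thread::spawn(move || loop {
            // Note: each received message restarts the wait, so a flush
            // delays the next tick. Good enough for a sketch.
            match rx.recv_timeout(interval) {
                Err(RecvTimeoutError::Timeout) => export(),
                Ok(Message::Flush(ack)) => {
                    export();
                    let _ = ack.send(());
                }
                Ok(Message::Shutdown(ack)) => {
                    export();
                    let _ = ack.send(());
                    break;
                }
                // All senders dropped: nobody can reach us anymore.
                Err(RecvTimeoutError::Disconnected) => break,
            }
        });
        Self { tx, handle: Some(handle) }
    }

    /// Returns true if the flush was acknowledged within `timeout`.
    fn force_flush(&self, timeout: Duration) -> bool {
        let (ack_tx, ack_rx) = mpsc::channel();
        self.tx.send(Message::Flush(ack_tx)).is_ok() && ack_rx.recv_timeout(timeout).is_ok()
    }

    /// Returns true if shutdown was acknowledged within `timeout`.
    fn shutdown(&mut self, timeout: Duration) -> bool {
        let (ack_tx, ack_rx) = mpsc::channel();
        let acked = self.tx.send(Message::Shutdown(ack_tx)).is_ok()
            && ack_rx.recv_timeout(timeout).is_ok();
        if acked {
            // The thread exits right after acknowledging, so join is brief.
            if let Some(handle) = self.handle.take() {
                let _ = handle.join();
            }
        }
        acked
    }
}
```

Because the loop runs on a plain OS thread, nothing here depends on an async runtime being present; whether the exporter called from that thread needs one is the separate question discussed in the review comments.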