Adding tracing_opentelemetry::layer to tracing_subscriber::registry breaks span exporting to Jaeger (#159)
Comments
Try commenting out the …

Having the same issue. Commenting out the …
I've lost days to this. I'm a newb to Rust and OpenTelemetry and assumed it was my fault when I tweaked the opentelemetry-otlp/examples/basic-otlp example to use …
I did some experiments; it's just the latest version that's borked. If I use slightly older versions (and the corresponding versions of their dependencies), then the … I experimented with the 3 most recent releases: v0.26.0, v0.25.0, and v0.24.0. For v0.26.0, Jaeger displayed no traces; … so I'll be using v0.24.0 and its dependencies for now. For anyone curious, these are: …
I did a Wireshark capture of traffic to port 4317. In the failed run there is only one HTTP/2 connection, from the MetricsService, while in successful runs there are two HTTP/2 connections: one from the MetricsService and another from the TraceService. Not sure if that helps. If I weren't such a Rust newb, I'd dig further into the actual code.
This is the tracing-opentelemetry commit that upgraded to opentelemetry 0.25: … None of the other changes in tracing-opentelemetry 0.26 seem to be potential regressions for this. So I'll suggest that your issue isn't caused by any problem in tracing-opentelemetry, but is instead in your own code or, more likely, in the interaction between your code and the opentelemetry crates.
In my last post, I was just checking out this repo and running its opentelemetry-otlp.rs example. In the latest 0.26.0 release (which mostly uses the 0.26 releases of …), … If very little changed in this crate, my best guess is that the break is due to some change in the opentelemetry crates.
Yes, that is what I was suggesting. Going to close this issue based on that conclusion; feel free to reopen if you have reason to disagree.
I believe that if we think this library is broken, even if it's broken by one of its dependencies, we should still track it. People can find this issue, and it's still an open problem. It also seems that the examples in this repository don't work; that sounds like an issue that should be solved, as the examples are probably the most common thing people use when trying to set this up.
I've found that setting the registry subscriber as the global default (the `set_global_default` call below) …

```rust
use std::error::Error;

use opentelemetry::trace::TracerProvider as _;
use opentelemetry::{global as otel_global, KeyValue};
use opentelemetry_otlp::WithExportConfig as _;
use opentelemetry_sdk::{runtime, trace::Config as TraceConfig, Resource};
use opentelemetry_semantic_conventions::resource::SERVICE_NAME;
use tracing::{info, info_span};
use tracing_subscriber::prelude::*;

#[tokio::main]
async fn main() -> Result<(), Box<dyn Error + Send + Sync + 'static>> {
    let exporter = opentelemetry_otlp::new_exporter()
        .tonic()
        .with_endpoint("http://localhost:4317");
    let tracer_provider = opentelemetry_otlp::new_pipeline()
        .tracing()
        .with_exporter(exporter)
        .with_trace_config(
            TraceConfig::default()
                .with_resource(Resource::new([KeyValue::new(SERVICE_NAME, "playground")])),
        )
        .install_simple()
        /* .install_batch(runtime::Tokio) */
        ?;
    let tracer = tracer_provider.tracer("movement_tracing");
    let telemetry_layer = tracing_opentelemetry::layer().with_tracer(tracer);
    let subscriber = tracing_subscriber::registry().with(telemetry_layer);
    tracing::subscriber::set_global_default(subscriber)?;
    // tracing::subscriber::with_default(subscriber, ||
    {
        let _span = info_span!("test_span").entered();
        info!("test event");
    }
    // );
    otel_global::shutdown_tracer_provider();
    Ok(())
}
```

If the … Commenting the line with …
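For comparison, here is a minimal sketch (my own illustration, reusing `telemetry_layer` and the other names from the example above) of the scoped alternative hinted at by the commented-out `with_default` call: `tracing::subscriber::set_default` installs the subscriber only for the current thread, and only until the returned guard is dropped.

```rust
// Sketch only: reuses `telemetry_layer` and `otel_global` from the example
// above. The registry is the active subscriber just for this thread and
// scope, and the guard is dropped before the provider is shut down.
{
    let subscriber = tracing_subscriber::registry().with(telemetry_layer);
    let _guard = tracing::subscriber::set_default(subscriber);
    let _span = info_span!("test_span").entered();
    info!("test event");
} // `_guard` (and the span) dropped here
otel_global::shutdown_tracer_provider();
```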
With the "simple" span processor, the deadlock is traced to let result = self
.exporter
.lock()
.map_err(|_| TraceError::Other("SimpleSpanProcessor mutex poison".into()))
.and_then(|mut exporter| futures_executor::block_on(exporter.export(vec![span]))); If the tonic exporter enters and leaves a tracing span anywhere in the implementation of the Regardless of the choice of the processor, the fundamental problem is that we have a recursion due to the globally set tracing subscriber. Perhaps tracing-opentelemetry should temporarily replace the global tracing subscriber while exporting to prevent the recursion. |
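As an illustration only (this is not an existing tracing-opentelemetry API), a thread-local flavor of that idea could look like the sketch below; `do_export` is a hypothetical stand-in for whatever drives the exporter, and `with_default` only shields the current thread, not spans closed elsewhere.

```rust
use tracing::subscriber::{self, NoSubscriber};

// Hypothetical sketch: run the export closure with a no-op subscriber as the
// thread-local default, so spans created inside the export path are discarded
// instead of re-entering the layer that triggered the export.
fn export_without_reentry<T>(do_export: impl FnOnce() -> T) -> T {
    // Note: `with_default` only affects the current thread; spans closed on
    // other threads still reach the global subscriber.
    subscriber::with_default(NoSubscriber::default(), do_export)
}
```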
On every export? It feels like the performance cost of that would be prohibitive.
Yes, and there will also be issues with other threads closing spans concurrently.
Filtering works with the simple processor:

```rust
let telemetry_layer = tracing_opentelemetry::layer()
    .with_tracer(tracer)
    .with_filter(filter::Targets::new().with_target("playground_telemetry", Level::INFO));
let subscriber = tracing_subscriber::registry().with(telemetry_layer);
tracing::subscriber::set_global_default(subscriber)?;
{
    let _span = info_span!(target: "playground_telemetry", "test_span").entered();
    info!(target: "playground_telemetry", "test event");
}
```

The batch processor, however, is broken by …
I think it would be good to open an issue against the opentelemetry-rust project describing this issue from first principles -- AIUI, it's still mostly an opentelemetry-rust issue rather than a tracing-opentelemetry issue?
I think it's not: from the opentelemetry-rust POV, any component down their exporter stack might be using tracing, and they can't be expected to prevent their dependencies from doing so. But if the global tracing subscriber calls back into the opentelemetry pipeline, we always have the potential for lock reentrancy, infinite recursion, or a trace queue growing until it exhausts a resource limit.
The issue with the batch processor is different. It looks like it needs the subscriber to be dropped to finish exporting properly. This example emits a span 1000 times to cause some export data to be buffered:

```rust
let tracer = tracer_provider.tracer("tracing-otel-subscriber");
let telemetry_layer = tracing_opentelemetry::layer()
    .with_tracer(tracer)
    .with_filter(filter::Targets::new().with_target("example_telemetry", Level::INFO));
let fmt_layer = tracing_subscriber::fmt::layer()
    .with_thread_names(true)
    .without_time();
let subscriber = tracing_subscriber::registry()
    .with(telemetry_layer)
    .with(fmt_layer);
tracing::subscriber::set_global_default(subscriber)?;
for _ in 0..1000 {
    let _span = info_span!(target: "example_telemetry", "test_span").entered();
    info!(target: "example_telemetry", "test event");
}
```

This is the output I get from the fmt layer after the 1000 test events at the end of …
I'm new to OpenTelemetry and have been stuck on this issue for 2 days. I was trying to run the demo in … Then I downgraded tracing-opentelemetry and a few opentelemetry-* libraries to the versions provided above, and it's working. I haven't dug into the corresponding crates; it would be great if someone could check whether anything is wrong between these versions. I also tried downgrading the opentelemetry-* libraries while keeping tracing-opentelemetry as-is, but the types didn't match, as the tracing library apparently requires higher versions of opentelemetry-*. So I was not able to work out which one is the root cause.

Reproducible demo: https://github.com/h-a-n-a/issue_opentelemetry_jaeger
I have found a partial workaround in calling … The question remains as to why the batch worker does not flush automatically.
The batch exporter does flush on its own upon reaching certain conditions, as long as the runtime is running, so there is no need to do anything periodically.
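For reference, those conditions are configurable on the batch processor; below is a hedged sketch using `opentelemetry_sdk::trace::BatchConfigBuilder` (the values shown are roughly the documented defaults and are only illustrative). The worker exports when the scheduled delay elapses or when a full batch accumulates, provided the runtime driving it stays alive.

```rust
use std::time::Duration;
use opentelemetry_sdk::trace::BatchConfigBuilder;

// Illustrative only: flush at least every scheduled delay, or as soon as a
// full batch is collected; spans beyond the queue size are dropped.
let batch_config = BatchConfigBuilder::default()
    .with_scheduled_delay(Duration::from_secs(5))
    .with_max_export_batch_size(512)
    .with_max_queue_size(2048)
    .build();
```

The resulting config is passed to the batch span processor when the pipeline is built; the exact builder method for that differs between opentelemetry-otlp versions.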
@mzabaluev …

```rust
tokio::task::spawn_blocking(move || {
    let _ = provider.shutdown();
})
.await
.unwrap();
```

ORIGINAL:

```rust
use opentelemetry::{
    global::{self, shutdown_tracer_provider},
    trace::TracerProvider,
    KeyValue,
};
use opentelemetry_sdk::{runtime, Resource};
use tracing::subscriber::{set_default, with_default};
use tracing_opentelemetry::OpenTelemetryLayer;
use tracing_subscriber::{layer::SubscriberExt, util::SubscriberInitExt};

#[tracing::instrument]
fn foo() {
    tracing::info!("info");
    tracing::error!("error");
    bar();
}

#[tracing::instrument]
fn bar() {
    tracing::info!("info");
    tracing::error!("error");
}

#[tokio::test]
async fn subscriber_init() {
    let provider = opentelemetry_otlp::new_pipeline()
        .tracing()
        .with_exporter(opentelemetry_otlp::new_exporter().tonic())
        .with_trace_config(opentelemetry_sdk::trace::Config::default().with_resource(
            Resource::new(vec![KeyValue::new("service.name", "issue-03")]),
        ))
        .install_batch(runtime::Tokio)
        .unwrap();
    global::set_tracer_provider(provider.clone());
    let trace = provider.tracer("issue-03");

    tracing_subscriber::registry()
        .with(tracing_subscriber::fmt::layer())
        .with(OpenTelemetryLayer::new(trace))
        .init();

    foo();

    let _ = provider.shutdown();
    shutdown_tracer_provider();
}
```

My dependencies:

```toml
[dependencies]
opentelemetry = "0.26.0"
opentelemetry-otlp = "0.26.0"
opentelemetry_sdk = { version = "0.26.0", features = ["rt-tokio"] }
tokio = { version = "1.32.0", features = ["macros", "rt-multi-thread"] }
tracing = "0.1"
tracing-opentelemetry = "0.27.0"
tracing-subscriber = { version = "0.3" }
```
The shutdown of the span processor calls futures_executor::block_on, which is probably not a good thing to do in async code running on the Tokio runtime.
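For what it's worth, here is a hedged sketch of another way to keep that blocking call off an async worker thread, assuming the multi-thread runtime flavor (for example `#[tokio::test(flavor = "multi_thread")]`), since `block_in_place` panics on a current-thread runtime; `provider` is the SDK tracer provider from the test above.

```rust
// Sketch only: move this worker into blocking mode so the blocking shutdown
// (and its internal block_on) does not stall the runtime while the batch
// worker task finishes exporting.
tokio::task::block_in_place(|| {
    let _ = provider.shutdown();
});
opentelemetry::global::shutdown_tracer_provider();
```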
Bug Report
Version
Platform
Description
When I add a tracing_opentelemetry::layer() to a tracing_subscriber::registry(), my existing exports to Jaeger break.

To replicate, first run the following to start an instance of Jaeger:

…

Then run the following code with the below Cargo.toml:

…
Running the above as is will export the traces created by the OTel crates to Jaeger. Uncommenting the below lines will cause that exporting to fail:
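(The commented-out lines themselves are not shown here; reconstructed from the issue title and the examples earlier in this thread, they presumably look something like the following, which is not the author's exact code.)

```rust
// Hypothetical reconstruction: adding the OpenTelemetry layer to a registry
// and installing it is the step that breaks the existing OTLP export.
// `tracer` comes from the already-working pipeline, and
// `tracing_subscriber::prelude::*` is assumed to be imported.
let telemetry_layer = tracing_opentelemetry::layer().with_tracer(tracer);
tracing_subscriber::registry().with(telemetry_layer).init();
```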
Instead, I would have expected these lines to export the spans created by the tracing crate to Jaeger alongside the spans created by the OTel crates. I initially reported this to the OTel crates, but it seems like it is likely coming from tracing_opentelemetry.