Show in-process child spans in zpages #782
Comments
Are you using zpages? One feature of the tracez page (although it's pretty rudimentary at the moment) is that you can see some basic info for all spans in a particular latency bucket, even those that were not sampled.
Didn't know that. Will have a look. Thanks :)
@mwuertinger what kinds of partial information would be useful? I can think of a way we could pretty easily retain all child spans of a parent span that had high latency and/or errors. Using this, we could display the parent span representing the HTTP/gRPC request and then all the child spans representing e.g. database calls made to service that request. It would only contain spans from the same process, not the full distributed trace. Is this what you had in mind?
I just wonder how useful sampling based on worst-case scenarios is. Wouldn't you lose track of what normal looks like?
@Ramonza From what I understand, this is the best we can do at the moment. I think it would help in certain situations, but I also have to agree with @vaijab that this would distort the overall picture of service health and might lead to the wrong conclusions. It's probably best to leave decisions like that to the individual teams. I don't think there is public information available, but I heard some time ago that Google's internal tracing system makes much smarter sampling decisions. Does anybody know more about that?
Yes, doing this would distort the overall picture because the sampling would no longer be uniform. We can mitigate that by annotating traces as "sampled uniformly" versus "sampled because something interesting happened." The advantage of being able to get traces of slow operations is being able to debug why they were slow.
I agree that it's important to know whether something was sampled uniformly or not. This argument also applies to cases where tracing is explicitly requested from the client. Perhaps we should add an attribute that indicates the sampling policy and weight? We do already have (in tracez) a way to see spans in each latency bucket. I think adding in-process child spans there might make that a lot more useful. They wouldn't be stored anywhere, so they wouldn't affect how representative the stored sample is.
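One way to picture a "sampling policy and weight" annotation is the sketch below. It assumes each stored trace carries a policy label and, for uniformly sampled traces, a weight equal to the inverse of the sampling probability. The `sampledTrace` type, the policy strings, and `estimateRequests` are made up for illustration and do not correspond to an existing OpenCensus attribute.

```go
package main

import "fmt"

// Hypothetical policy labels for how a trace came to be sampled.
const (
	policyUniform     = "uniform"     // picked by a probability sampler
	policyInteresting = "interesting" // kept because it was slow or failed
)

// sampledTrace is a hypothetical stored trace with its sampling annotation.
type sampledTrace struct {
	Name   string
	Policy string
	Weight float64 // requests represented by this trace: 1/probability for uniform sampling
}

// estimateRequests estimates the total number of requests from the stored
// sample. Only uniformly sampled traces are counted, so traces kept because
// something interesting happened do not skew the estimate.
func estimateRequests(traces []sampledTrace) float64 {
	total := 0.0
	for _, t := range traces {
		if t.Policy == policyUniform {
			total += t.Weight
		}
	}
	return total
}

func main() {
	// Assume a uniform sampling probability of 1%, so each uniform trace
	// stands for roughly 100 requests.
	traces := []sampledTrace{
		{Name: "GET /users", Policy: policyUniform, Weight: 100},
		{Name: "GET /users", Policy: policyUniform, Weight: 100},
		{Name: "GET /orders", Policy: policyInteresting},
	}
	fmt.Printf("estimated requests: %.0f\n", estimateRequests(traces))
}
```

With the annotation in place, analyses of overall service health can filter on the uniform policy, while debugging of slow requests can use every stored trace.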
@Ramonza Adding in-process child spans to tracez sounds like an excellent idea.
Repurposing this issue.
I'm using OpenCensus in a Go HTTP server with the `trace.ProbabilitySampler`. As far as I understand, the sampling decision is made before request processing starts, so it is currently impossible to influence the decision based on properties of the request outcome (e.g. latency, status code). As I was told on Gitter, this is by design, because a tracing decision is usually made in the outermost service and then passed on to all the downstream services.
However, it would be helpful if one could influence the tracing decision within an application even after the request has started, in order to collect at least partial information about slow requests.
Is there anything planned in that regard?