Show in-process child spans in zpages #782

mwuertinger · 2018-06-06T18:36:29Z

I'm using OpenCensus in a Go HTTP server with the trace.ProbabilitySampler. As far as I understand, the sampling decision is made before request processing starts, so it is currently impossible to influence the decision based on properties of the request outcome (e.g. latency, status code). As I was told on Gitter, this is currently by design, as a tracing decision is usually made in the outermost service and then passed on to all the downstream services.

However, it would be helpful if one could influence the tracing decision within an application even after the request has started, in order to collect at least partial information about slow requests.

Is there anything planned in that regard?

semistrict · 2018-06-11T18:46:31Z

Are you using z-pages? One of the features of the tracez page (although it's pretty rudimentary atm) is that you can see some basic info for all spans in a particular latency bucket (even those that were not sampled).

mwuertinger · 2018-06-13T07:19:33Z

Didn't know that. Will have a look. Thanks :)

semistrict · 2018-07-03T22:58:15Z

@mwuertinger what kinds of partial information would be useful?

I can think of a way we could pretty easily retain all child spans of a parent span that had high latency and/or errors. Using this, we could display the parent span representing the HTTP/gRPC request and then all the child spans representing e.g. database calls to service that request. It would only contain spans from the same process, not the full distributed trace. Is this what you had in mind?

vaijab · 2018-07-04T08:26:34Z

I just wonder how useful sampling based on worst-case scenarios really is. Wouldn't you lose track of what normal looks like?

mwuertinger · 2018-07-05T08:33:34Z

@Ramonza From what I understand, this is the best we can do at the moment. I think it would help in certain situations, but I also have to agree with @vaijab that this would distort the overall picture of service health and might lead to the wrong conclusions. It's probably best to leave decisions like that to the individual teams.

I don't think there is public information available, but I heard some time ago that Google's internal tracing system makes much smarter sampling decisions. Does anybody know more about that?

g-easy · 2018-07-06T05:25:29Z

Yes, doing this would distort the overall picture because the sampling would no longer be uniform. We can mitigate that by annotating traces as "sampled uniformly" versus "sampled because something interesting happened."

The advantage of being able to get traces of slow operations is being able to debug why they were slow.

semistrict · 2018-07-06T17:06:11Z

I agree that it's important to know whether something was sampled uniformly or not. This argument also applies to cases where tracing is explicitly requested from the client. Perhaps we should add an attribute that indicates the sampling policy & weight?
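As a rough sketch of what such an attribute could enable: if every sampled span records its policy and a weight, aggregate estimates can be corrected for non-uniform sampling. The `SampledSpan` type and its fields are hypothetical, and interpreting the weight as the inverse of the sampling probability is an assumption, not something defined in this thread or in OpenCensus.

```go
package main

import "fmt"

// SampledSpan is a hypothetical record of a sampled span.
type SampledSpan struct {
	Policy    string  // assumed values: "uniform" or "latency"
	Weight    float64 // assumed to be 1 / probability of being sampled
	LatencyMs float64
}

// NaiveMeanLatency ignores how each span was sampled.
func NaiveMeanLatency(spans []SampledSpan) float64 {
	if len(spans) == 0 {
		return 0
	}
	var sum float64
	for _, s := range spans {
		sum += s.LatencyMs
	}
	return sum / float64(len(spans))
}

// WeightedMeanLatency lets each span stand in for Weight requests,
// so over-sampled slow requests do not dominate the estimate.
func WeightedMeanLatency(spans []SampledSpan) float64 {
	var sum, total float64
	for _, s := range spans {
		sum += s.Weight * s.LatencyMs
		total += s.Weight
	}
	if total == 0 {
		return 0
	}
	return sum / total
}

func main() {
	spans := []SampledSpan{
		{Policy: "uniform", Weight: 100, LatencyMs: 20}, // sampled at 1%
		{Policy: "uniform", Weight: 100, LatencyMs: 30}, // sampled at 1%
		{Policy: "latency", Weight: 1, LatencyMs: 900},  // always sampled when slow
	}
	fmt.Printf("naive mean:    %.1f ms\n", NaiveMeanLatency(spans))
	fmt.Printf("weighted mean: %.1f ms\n", WeightedMeanLatency(spans))
}
```

With the made-up numbers above, the naive mean is pulled far toward the single slow request, while the weighted mean stays close to the latency of the uniformly sampled requests.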

We do already have (in tracez) a way to see spans in each latency bucket. I think adding in-process child spans there might make that a lot more useful. They wouldn't be stored anywhere so wouldn't affect how representative the stored sample is.

mwuertinger · 2018-07-10T08:48:58Z

@Ramonza Adding in-process child spans to tracez sounds like an excellent idea.

semistrict · 2018-07-11T18:07:14Z

repurposing this issue

semistrict added the trace label Jun 6, 2018

semistrict added the enhancement and zpages labels Jul 3, 2018

semistrict changed the title ~~Sampling decision based on request latency~~ Show in-process child spans in zpages Jul 11, 2018

rghetia added the P2 label May 6, 2019
