Startup optimization: Custom DatadogLoggerSink for internal logging / Configuration refactoring #4360

Open
Wants to merge 32 commits into base: master

Conversation


@tonyredondo tonyredondo commented Jul 5, 2023

Summary of changes

This PR contains the following changes to improve startup time in the tracer and to reduce the impact of the internal logger on the calling thread.

  1. A new DatadogSharedFileSink: based on the existing SharedFileSink, this sink removes the mutex acquisition for each LogItem written by using a custom MutexStream that only acquires the mutex when there is something to write to the underlying file stream. Combined with a buffered StreamWriter, this means the mutex is only acquired when the buffer is full or a flush is forced.
  2. A new DatadogDeferredSink: this sink defers the write of each LogItem by enqueuing it and moving the write to the thread pool, removing any impact of the logger on the calling thread. This proved to bring a faster startup time (removing almost 30 ms from the startup process).
  3. New IConfigurationSource and ITelemeteredConfigurationSource method overloads for filtering items in the GetDictionary methods, avoiding the Dictionary -> Where -> ToDictionary pattern (see the sketch after this list).
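
To make the filtering overloads in item 3 concrete, here is a minimal sketch of the idea, with illustrative names only (this is not the PR's actual API shape): the selector is applied while the raw value is being parsed, so a single dictionary is allocated instead of parsing everything and then running Where/ToDictionary over the result.

using System;
using System.Collections.Generic;
using System.Linq;

internal static class FilteredDictionarySketch
{
    // Old shape: parse the full dictionary, then filter with LINQ
    // (two dictionaries plus iterator/delegate allocations).
    public static Dictionary<string, string> ParseThenFilter(string raw, Func<KeyValuePair<string, string>, bool> selector)
        => Parse(raw).Where(selector).ToDictionary(kvp => kvp.Key, kvp => kvp.Value);

    // New shape: apply the selector while parsing, allocating a single dictionary.
    public static Dictionary<string, string> ParseWithFilter(string raw, Func<string, string, bool> selector)
    {
        var result = new Dictionary<string, string>();
        foreach (var pair in raw.Split(','))
        {
            var parts = pair.Split(new[] { ':' }, 2);
            if (parts.Length == 2 && selector(parts[0], parts[1]))
            {
                result[parts[0]] = parts[1];
            }
        }

        return result;
    }

    private static Dictionary<string, string> Parse(string raw)
        => ParseWithFilter(raw, (key, value) => true);
}

For example, a value such as "a:1,b:2,internal_x:3" can be filtered with ParseWithFilter(raw, (k, v) => !k.StartsWith("internal_")) without ever materializing the unfiltered dictionary.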

Reason for change

Reduce performance overhead of the internal logger.

This can be seen on both Execution Benchmark tests and Throughput tests:

(two screenshots in the original PR description: execution-time benchmark and throughput comparisons)

Test coverage

The existing tests must pass to confirm we don't change the current behavior.

@tonyredondo tonyredondo self-assigned this Jul 5, 2023
@github-actions github-actions bot added the area:tracer The core tracer library (Datadog.Trace, does not include OpenTracing, native code, or integrations) label Jul 5, 2023
@tonyredondo tonyredondo changed the title Tony/logger shared mutexstream Custom DatadogLoggerSink for internal logging Jul 6, 2023
@tonyredondo tonyredondo changed the title Custom DatadogLoggerSink for internal logging Startup optimization: Custom DatadogLoggerSink for internal logging / Configuration refactoring Jul 6, 2023
@datadog-ddstaging

datadog-ddstaging bot commented Jul 12, 2023

Datadog Report

Branch report: tony/logger-shared-mutexstream
Commit report: 2d2ff6a

❄️ dd-trace-dotnet: 0 Failed, 1 New Flaky, 291189 Passed, 1091 Skipped, 30m 24.87s Wall Time

New Flaky Tests (1)

  • UtcNow_GivesLegitTime - Datadog.Trace.Tests.TraceContextTests - Last Failure

    Expand for error
     Assert.True() Failure
     Expected: True
     Actual:   False
    

@andrewlock
Member

Execution-Time Benchmarks Report ⏱️

Execution-time results for samples comparing the following branches/commits:

Execution-time benchmarks measure the whole time it takes to execute a program, and are intended to measure one-off costs. Cases where the execution-time results for the PR are worse than the latest master results are shown in red. The following thresholds were used for comparing the execution times:

  • Welch's t-test at a 5% significance level
  • Only results indicating a difference greater than 5% and 5 ms are considered.

Note that these results are based on a single point-in-time result for each branch. For full results, see the dashboard.

Graphs show the p99 interval based on the mean and StdDev of the test run, as well as the mean value of the run (shown as a diamond below the graph).

gantt
    title Execution time (ms) FakeDbCommand (.NET Framework 4.6.2) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (4360) - mean (2,991ms)  : 2932, 3050
     .   : milestone, 2991,
    master - mean (3,006ms)  : 2921, 3091
     .   : milestone, 3006,

    section CallTarget+Inlining+NGEN
    This PR (4360) - mean (3,760ms)  : 3670, 3849
     .   : milestone, 3760,
    master - mean (3,841ms)  : 3779, 3902
     .   : milestone, 3841,

gantt
    title Execution time (ms) FakeDbCommand (.NET Core 3.1) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (4360) - mean (3,119ms)  : 3030, 3208
     .   : milestone, 3119,
    master - mean (3,116ms)  : 3031, 3200
     .   : milestone, 3116,

    section CallTarget+Inlining+NGEN
    This PR (4360) - mean (3,570ms)  : 3467, 3673
     .   : milestone, 3570,
    master - mean (3,625ms)  : 3559, 3691
     .   : milestone, 3625,

gantt
    title Execution time (ms) FakeDbCommand (.NET 6) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (4360) - mean (3,099ms)  : 2997, 3202
     .   : milestone, 3099,
    master - mean (3,116ms)  : 3034, 3198
     .   : milestone, 3116,

    section CallTarget+Inlining+NGEN
    This PR (4360) - mean (3,537ms)  : 3463, 3611
     .   : milestone, 3537,
    master - mean (3,597ms)  : 3541, 3653
     .   : milestone, 3597,

gantt
    title Execution time (ms) HttpMessageHandler (.NET Framework 4.6.2) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (4360) - mean (189ms)  : 186, 193
     .   : milestone, 189,
    master - mean (190ms)  : 186, 195
     .   : milestone, 190,

    section CallTarget+Inlining+NGEN
    This PR (4360) - mean (1,088ms)  : 1057, 1119
     .   : milestone, 1088,
    master - mean (1,129ms)  : 1103, 1155
     .   : milestone, 1129,

gantt
    title Execution time (ms) HttpMessageHandler (.NET Core 3.1) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (4360) - mean (372ms)  : 363, 380
     .   : milestone, 372,
    master - mean (372ms)  : 365, 379
     .   : milestone, 372,

    section CallTarget+Inlining+NGEN
    This PR (4360) - mean (1,149ms)  : 1123, 1175
     .   : milestone, 1149,
    master - mean (1,197ms)  : 1168, 1225
     .   : milestone, 1197,

gantt
    title Execution time (ms) HttpMessageHandler (.NET 6) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (4360) - mean (358ms)  : 354, 363
     .   : milestone, 358,
    master - mean (360ms)  : 353, 366
     .   : milestone, 360,

    section CallTarget+Inlining+NGEN
    This PR (4360) - mean (1,099ms)  : 1077, 1120
     .   : milestone, 1099,
    master - mean (1,145ms)  : 1128, 1163
     .   : milestone, 1145,


@andrewlock
Member

Benchmarks Report 🐌

Benchmarks for #4360 compared to master:

  • 4 benchmarks are faster, with geometric mean 1.162
  • 2 benchmarks are slower, with geometric mean 1.150
  • All benchmarks have the same allocations

The following thresholds were used for comparing the benchmark speeds:

  • Mann–Whitney U test at a 5% significance level
  • Only results indicating a difference greater than 10% and 0.3 ns are considered.

Allocation changes below 0.5% are ignored.

Benchmark details

Benchmarks.Trace.AgentWriterBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master WriteAndFlushEnrichedTraces net6.0 474μs 154ns 575ns 0 0 0 2.62 KB
master WriteAndFlushEnrichedTraces netcoreapp3.1 647μs 296ns 1.11μs 0 0 0 2.63 KB
master WriteAndFlushEnrichedTraces net472 794μs 409ns 1.59μs 0.396 0 0 3.22 KB
#4360 WriteAndFlushEnrichedTraces net6.0 471μs 104ns 387ns 0 0 0 2.63 KB
#4360 WriteAndFlushEnrichedTraces netcoreapp3.1 658μs 377ns 1.46μs 0 0 0 2.63 KB
#4360 WriteAndFlushEnrichedTraces net472 799μs 386ns 1.49μs 0.398 0 0 3.22 KB
Benchmarks.Trace.AppSecBodyBenchmark - Faster 🎉 Same allocations ✔️

Faster 🎉 in #4360

Benchmark base/diff Base Median (ns) Diff Median (ns) Modality
Benchmarks.Trace.AppSecBodyBenchmark.ObjectExtractorSimpleBody‑net472 1.241 187.47 151.04
Benchmarks.Trace.AppSecBodyBenchmark.ObjectExtractorSimpleBody‑net6.0 1.139 132.88 116.63

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master AllCycleSimpleBody net6.0 38.9μs 23.3ns 90.4ns 0.0195 0 0 1.65 KB
master AllCycleSimpleBody netcoreapp3.1 41.2μs 144ns 559ns 0.0207 0 0 1.63 KB
master AllCycleSimpleBody net472 42.3μs 126ns 487ns 0.249 0 0 1.69 KB
master AllCycleMoreComplexBody net6.0 222μs 81.9ns 317ns 0.111 0 0 9.22 KB
master AllCycleMoreComplexBody netcoreapp3.1 234μs 99.6ns 345ns 0.117 0 0 9.12 KB
master AllCycleMoreComplexBody net472 240μs 79.6ns 308ns 1.43 0 0 9.28 KB
master ObjectExtractorSimpleBody net6.0 133ns 0.0862ns 0.322ns 0.00396 0 0 280 B
master ObjectExtractorSimpleBody netcoreapp3.1 167ns 0.0591ns 0.229ns 0.00363 0 0 272 B
master ObjectExtractorSimpleBody net472 188ns 0.194ns 0.75ns 0.0446 0 0 281 B
master ObjectExtractorMoreComplexBody net6.0 3μs 1.52ns 5.91ns 0.054 0 0 3.88 KB
master ObjectExtractorMoreComplexBody netcoreapp3.1 4.12μs 2.33ns 9.01ns 0.0516 0 0 3.78 KB
master ObjectExtractorMoreComplexBody net472 4.09μs 1.69ns 6.33ns 0.617 0.00612 0 3.89 KB
#4360 AllCycleSimpleBody net6.0 39.1μs 8.07ns 29.1ns 0.0198 0 0 1.65 KB
#4360 AllCycleSimpleBody netcoreapp3.1 41.5μs 73.4ns 275ns 0.0206 0 0 1.63 KB
#4360 AllCycleSimpleBody net472 42.3μs 17.6ns 65.7ns 0.254 0 0 1.69 KB
#4360 AllCycleMoreComplexBody net6.0 223μs 175ns 678ns 0.11 0 0 9.22 KB
#4360 AllCycleMoreComplexBody netcoreapp3.1 230μs 139ns 521ns 0.114 0 0 9.12 KB
#4360 AllCycleMoreComplexBody net472 242μs 55.5ns 208ns 1.46 0 0 9.28 KB
#4360 ObjectExtractorSimpleBody net6.0 117ns 0.062ns 0.232ns 0.00403 0 0 280 B
#4360 ObjectExtractorSimpleBody netcoreapp3.1 170ns 0.0622ns 0.233ns 0.00376 0 0 272 B
#4360 ObjectExtractorSimpleBody net472 151ns 0.119ns 0.447ns 0.0446 0 0 281 B
#4360 ObjectExtractorMoreComplexBody net6.0 2.98μs 0.967ns 3.62ns 0.0566 0 0 3.88 KB
#4360 ObjectExtractorMoreComplexBody netcoreapp3.1 4.12μs 2.04ns 7.62ns 0.0514 0 0 3.78 KB
#4360 ObjectExtractorMoreComplexBody net472 4.1μs 2.91ns 11.3ns 0.617 0.00613 0 3.89 KB
Benchmarks.Trace.AspNetCoreBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master SendRequest net6.0 170μs 74.8ns 290ns 0.255 0 0 18.03 KB
master SendRequest netcoreapp3.1 187μs 288ns 1.12μs 0.186 0 0 20.09 KB
master SendRequest net472 0ns 0ns 0ns 0 0 0 0 b
#4360 SendRequest net6.0 169μs 104ns 388ns 0.169 0 0 18.03 KB
#4360 SendRequest netcoreapp3.1 189μs 249ns 966ns 0.189 0 0 20.09 KB
#4360 SendRequest net472 0.000401ns 0.000162ns 0.000583ns 0 0 0 0 b
Benchmarks.Trace.CIVisibilityProtocolWriterBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master WriteAndFlushEnrichedTraces net6.0 514μs 2.52μs 10.4μs 0.492 0 0 41.95 KB
master WriteAndFlushEnrichedTraces netcoreapp3.1 621μs 1.88μs 7.27μs 0.316 0 0 42.02 KB
master WriteAndFlushEnrichedTraces net472 767μs 2.43μs 9.4μs 8.38 2.67 0.381 53.25 KB
#4360 WriteAndFlushEnrichedTraces net6.0 532μs 1.84μs 7.12μs 0.512 0 0 41.86 KB
#4360 WriteAndFlushEnrichedTraces netcoreapp3.1 625μs 756ns 2.93μs 0.306 0 0 41.92 KB
#4360 WriteAndFlushEnrichedTraces net472 791μs 2.1μs 8.13μs 8.2 2.34 0.391 53.22 KB
Benchmarks.Trace.ConfigurationBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
#4360 CreateImmutableTracerSettings net6.0 1.84ms 823ns 3.08μs 0 0 0 37.66 KB
#4360 CreateImmutableTracerSettings netcoreapp3.1 1.82ms 1.05μs 3.92μs 0 0 0 38.64 KB
#4360 CreateImmutableTracerSettings net472 2.45ms 1.49μs 5.36μs 43.3 1.2 0 275.84 KB
Benchmarks.Trace.DbCommandBenchmark - Faster 🎉 Same allocations ✔️

Faster 🎉 in #4360

Benchmark base/diff Base Median (ns) Diff Median (ns) Modality
Benchmarks.Trace.DbCommandBenchmark.ExecuteNonQuery‑netcoreapp3.1 1.119 1,247.49 1,114.80

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master ExecuteNonQuery net6.0 841ns 0.228ns 0.822ns 0.0101 0 0 720 B
master ExecuteNonQuery netcoreapp3.1 1.25μs 0.336ns 1.26ns 0.00944 0 0 720 B
master ExecuteNonQuery net472 1.35μs 0.491ns 1.9ns 0.108 0 0 682 B
#4360 ExecuteNonQuery net6.0 881ns 0.304ns 1.18ns 0.0107 0 0 720 B
#4360 ExecuteNonQuery netcoreapp3.1 1.11μs 0.284ns 1.06ns 0.00945 0 0 720 B
#4360 ExecuteNonQuery net472 1.45μs 0.279ns 1.01ns 0.108 0 0 682 B
Benchmarks.Trace.ElasticsearchBenchmark - Slower ⚠️ Same allocations ✔️

Slower ⚠️ in #4360

Benchmark diff/base Base Median (ns) Diff Median (ns) Modality
Benchmarks.Trace.ElasticsearchBenchmark.CallElasticsearchAsync‑net6.0 1.155 987.28 1,140.33

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master CallElasticsearch net6.0 1.08μs 0.2ns 0.721ns 0.013 0 0 944 B
master CallElasticsearch netcoreapp3.1 1.24μs 0.855ns 3.2ns 0.0125 0 0 944 B
master CallElasticsearch net472 2.1μs 0.71ns 2.75ns 0.151 0.00104 0 955 B
master CallElasticsearchAsync net6.0 987ns 0.216ns 0.748ns 0.0129 0 0 920 B
master CallElasticsearchAsync netcoreapp3.1 1.31μs 0.543ns 2.1ns 0.0132 0 0 992 B
master CallElasticsearchAsync net472 2.11μs 0.533ns 2.06ns 0.161 0.00105 0 1.01 KB
#4360 CallElasticsearch net6.0 987ns 0.35ns 1.31ns 0.0139 0 0 944 B
#4360 CallElasticsearch netcoreapp3.1 1.21μs 0.438ns 1.58ns 0.0127 0 0 944 B
#4360 CallElasticsearch net472 2.1μs 0.76ns 2.94ns 0.151 0.00105 0 955 B
#4360 CallElasticsearchAsync net6.0 1.14μs 0.532ns 2.06ns 0.0135 0 0 920 B
#4360 CallElasticsearchAsync netcoreapp3.1 1.32μs 0.355ns 1.33ns 0.0131 0 0 992 B
#4360 CallElasticsearchAsync net472 2.14μs 0.366ns 1.37ns 0.16 0 0 1.01 KB
Benchmarks.Trace.GraphQLBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master ExecuteAsync net6.0 1.11μs 0.279ns 1.04ns 0.0125 0 0 912 B
master ExecuteAsync netcoreapp3.1 1.38μs 0.448ns 1.68ns 0.0124 0 0 912 B
master ExecuteAsync net472 1.57μs 0.376ns 1.46ns 0.139 0.000771 0 875 B
#4360 ExecuteAsync net6.0 1.16μs 0.466ns 1.75ns 0.0134 0 0 912 B
#4360 ExecuteAsync netcoreapp3.1 1.29μs 0.807ns 3.02ns 0.0118 0 0 912 B
#4360 ExecuteAsync net472 1.59μs 0.722ns 2.8ns 0.139 0 0 875 B
Benchmarks.Trace.HttpClientBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master SendAsync net6.0 3.48μs 1.9ns 7.1ns 0.026 0 0 1.9 KB
master SendAsync netcoreapp3.1 4.26μs 1.3ns 4.67ns 0.0322 0 0 2.43 KB
master SendAsync net472 6.91μs 1.77ns 6.86ns 0.477 0 0 3 KB
#4360 SendAsync net6.0 3.52μs 2.4ns 8.99ns 0.0283 0 0 1.9 KB
#4360 SendAsync netcoreapp3.1 4.29μs 2.61ns 9.75ns 0.0321 0 0 2.43 KB
#4360 SendAsync net472 6.93μs 1.72ns 6.65ns 0.474 0 0 3 KB
Benchmarks.Trace.ILoggerBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master EnrichedLog net6.0 1.23μs 0.746ns 2.79ns 0.0229 0 0 1.62 KB
master EnrichedLog netcoreapp3.1 1.96μs 1.73ns 6.71ns 0.0214 0 0 1.62 KB
master EnrichedLog net472 2.29μs 1.69ns 6.31ns 0.244 0 0 1.54 KB
#4360 EnrichedLog net6.0 1.26μs 0.647ns 2.42ns 0.0239 0 0 1.62 KB
#4360 EnrichedLog netcoreapp3.1 1.96μs 0.743ns 2.78ns 0.0216 0 0 1.62 KB
#4360 EnrichedLog net472 2.28μs 1.03ns 4ns 0.244 0 0 1.54 KB
Benchmarks.Trace.Log4netBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master EnrichedLog net6.0 113μs 91.7ns 355ns 0.0564 0 0 4.21 KB
master EnrichedLog netcoreapp3.1 116μs 167ns 648ns 0 0 0 4.21 KB
master EnrichedLog net472 148μs 173ns 670ns 0.662 0.221 0 4.38 KB
#4360 EnrichedLog net6.0 113μs 128ns 498ns 0.0569 0 0 4.21 KB
#4360 EnrichedLog netcoreapp3.1 117μs 255ns 986ns 0.0575 0 0 4.21 KB
#4360 EnrichedLog net472 151μs 336ns 1.3μs 0.674 0.225 0 4.38 KB
Benchmarks.Trace.NLogBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master EnrichedLog net6.0 2.86μs 0.604ns 2.18ns 0.0301 0 0 2.18 KB
master EnrichedLog netcoreapp3.1 3.83μs 1.23ns 4.61ns 0.029 0 0 2.18 KB
master EnrichedLog net472 4.52μs 1.14ns 4.43ns 0.315 0 0 1.99 KB
#4360 EnrichedLog net6.0 2.82μs 0.986ns 3.69ns 0.0325 0 0 2.18 KB
#4360 EnrichedLog netcoreapp3.1 3.81μs 1.26ns 4.55ns 0.0285 0 0 2.18 KB
#4360 EnrichedLog net472 4.45μs 0.729ns 2.63ns 0.315 0 0 1.99 KB
Benchmarks.Trace.RedisBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master SendReceive net6.0 1.12μs 0.536ns 2.08ns 0.0155 0 0 1.11 KB
master SendReceive netcoreapp3.1 1.39μs 0.55ns 2.06ns 0.0147 0 0 1.11 KB
master SendReceive net472 1.83μs 0.807ns 3.02ns 0.177 0 0 1.12 KB
#4360 SendReceive net6.0 1.11μs 1.81ns 6.99ns 0.0163 0 0 1.11 KB
#4360 SendReceive netcoreapp3.1 1.47μs 0.93ns 3.6ns 0.0147 0 0 1.11 KB
#4360 SendReceive net472 1.72μs 1.44ns 5.39ns 0.176 0 0 1.12 KB
Benchmarks.Trace.SerilogBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master EnrichedLog net6.0 2.6μs 1.26ns 4.89ns 0.0209 0 0 1.53 KB
master EnrichedLog netcoreapp3.1 3.54μs 1.11ns 4.14ns 0.0213 0 0 1.58 KB
master EnrichedLog net472 4.23μs 21ns 91.5ns 0.31 0 0 1.96 KB
#4360 EnrichedLog net6.0 2.54μs 0.479ns 1.73ns 0.0229 0 0 1.53 KB
#4360 EnrichedLog netcoreapp3.1 3.48μs 1.65ns 6.38ns 0.021 0 0 1.58 KB
#4360 EnrichedLog net472 4.02μs 1.46ns 5.65ns 0.309 0 0 1.96 KB
Benchmarks.Trace.SpanBenchmark - Slower ⚠️ Same allocations ✔️

Slower ⚠️ in #4360

Benchmark diff/base Base Median (ns) Diff Median (ns) Modality
Benchmarks.Trace.SpanBenchmark.StartFinishSpan‑net472 1.146 554.36 635.09

Faster 🎉 in #4360

Benchmark base/diff Base Median (ns) Diff Median (ns) Modality
Benchmarks.Trace.SpanBenchmark.StartFinishSpan‑netcoreapp3.1 1.152 584.22 507.33

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master StartFinishSpan net6.0 386ns 0.129ns 0.465ns 0.0075 0 0 536 B
master StartFinishSpan netcoreapp3.1 583ns 0.49ns 1.83ns 0.00731 0 0 536 B
master StartFinishSpan net472 554ns 0.0567ns 0.196ns 0.0852 0 0 538 B
master StartFinishScope net6.0 512ns 0.178ns 0.666ns 0.00915 0 0 656 B
master StartFinishScope netcoreapp3.1 749ns 0.204ns 0.764ns 0.00861 0 0 656 B
master StartFinishScope net472 806ns 0.264ns 0.987ns 0.098 0 0 618 B
#4360 StartFinishSpan net6.0 408ns 0.122ns 0.455ns 0.00793 0 0 536 B
#4360 StartFinishSpan netcoreapp3.1 507ns 0.445ns 1.72ns 0.00728 0 0 536 B
#4360 StartFinishSpan net472 635ns 0.217ns 0.839ns 0.0853 0 0 538 B
#4360 StartFinishScope net6.0 482ns 0.121ns 0.452ns 0.0095 0 0 656 B
#4360 StartFinishScope netcoreapp3.1 699ns 0.135ns 0.506ns 0.00886 0 0 656 B
#4360 StartFinishScope net472 854ns 0.2ns 0.776ns 0.098 0 0 618 B
Benchmarks.Trace.TraceAnnotationsBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master RunOnMethodBegin net6.0 674ns 0.47ns 1.63ns 0.00918 0 0 656 B
master RunOnMethodBegin netcoreapp3.1 756ns 0.31ns 1.12ns 0.00879 0 0 656 B
master RunOnMethodBegin net472 935ns 0.211ns 0.817ns 0.0979 0 0 618 B
#4360 RunOnMethodBegin net6.0 650ns 0.223ns 0.863ns 0.00949 0 0 656 B
#4360 RunOnMethodBegin netcoreapp3.1 778ns 0.677ns 2.44ns 0.00898 0 0 656 B
#4360 RunOnMethodBegin net472 919ns 1.39ns 5.39ns 0.098 0 0 618 B

@andrewlock
Member

Throughput/Crank Report ⚡

Throughput results for AspNetCoreSimpleController comparing the following branches/commits:

Cases where the throughput results for the PR are worse than the latest master results (a drop of 5% or greater) are shown in red.

Note that these results are based on a single point-in-time result for each branch. For full results, see one of the many, many dashboards!

gantt
    title Throughput Linux x64 (Total requests) 
    dateFormat  X
    axisFormat %s
    section Baseline
    This PR (4360) (11.040M)   : 0, 11040083
    master (11.039M)   : 0, 11038648
    benchmarks/2.33.0 (10.950M)   : 0, 10949608
    benchmarks/2.9.0 (11.003M)   : 0, 11003334

    section Automatic
    This PR (4360) (7.763M)   : 0, 7762643
    master (7.701M)   : 0, 7701120
    benchmarks/2.33.0 (7.621M)   : 0, 7620707
    benchmarks/2.9.0 (7.928M)   : 0, 7927595

    section Trace stats
    master (7.716M)   : 0, 7716275
    benchmarks/2.33.0 (7.602M)   : 0, 7602038

    section Manual
    This PR (4360) (9.901M)   : 0, 9900621
    master (9.985M)   : 0, 9984799
    benchmarks/2.33.0 (9.770M)   : 0, 9770235

    section Manual + Automatic
    This PR (4360) (7.487M)   : 0, 7486799
    master (7.390M)   : 0, 7389573
    benchmarks/2.33.0 (7.189M)   : 0, 7188658

    section Version Conflict
    master (6.712M)   : 0, 6712363
    benchmarks/2.33.0 (6.553M)   : 0, 6553316

gantt
    title Throughput Linux arm64 (Total requests) 
    dateFormat  X
    axisFormat %s
    section Baseline
    This PR (4360) (9.555M)   : 0, 9555401
    master (9.765M)   : 0, 9764849
    benchmarks/2.33.0 (9.426M)   : 0, 9425603
    benchmarks/2.9.0 (9.605M)   : 0, 9604953

    section Automatic
    This PR (4360) (6.755M)   : 0, 6755381
    master (6.735M)   : 0, 6734547
    benchmarks/2.33.0 (6.803M)   : 0, 6802932

    section Trace stats
    master (6.702M)   : 0, 6702121
    benchmarks/2.33.0 (6.810M)   : 0, 6810447

    section Manual
    This PR (4360) (8.615M)   : 0, 8615209
    master (8.572M)   : 0, 8572415
    benchmarks/2.33.0 (8.435M)   : 0, 8434569

    section Manual + Automatic
    This PR (4360) (6.330M)   : 0, 6330111
    master (6.616M)   : 0, 6616222
    benchmarks/2.33.0 (6.465M)   : 0, 6465149

    section Version Conflict
    master (5.860M)   : 0, 5860102
    benchmarks/2.33.0 (5.807M)   : 0, 5806690

gantt
    title Throughput Windows x64 (Total requests) 
    dateFormat  X
    axisFormat %s
    section Baseline
    This PR (4360) (9.327M)   : 0, 9327482
    master (9.976M)   : 0, 9976017
    benchmarks/2.33.0 (9.426M)   : 0, 9426080
    benchmarks/2.9.0 (8.999M)   : 0, 8999461

    section Automatic
    This PR (4360) (6.894M)   : 0, 6894120
    master (7.092M)   : 0, 7091991
    benchmarks/2.33.0 (6.469M)   : 0, 6468939
    benchmarks/2.9.0 (6.913M)   : 0, 6912666

    section Trace stats
    master (7.239M)   : 0, 7238581
    benchmarks/2.33.0 (6.830M)   : 0, 6830314

    section Manual
    This PR (4360) (8.610M)   : 0, 8610012
    master (8.535M)   : 0, 8534813
    benchmarks/2.33.0 (8.108M)   : 0, 8108026

    section Manual + Automatic
    This PR (4360) (6.699M)   : 0, 6699359
    master (6.990M)   : 0, 6990093
    benchmarks/2.33.0 (6.856M)   : 0, 6856379

    section Version Conflict
    master (6.339M)   : 0, 6339396
    benchmarks/2.33.0 (5.893M)   : 0, 5893016

gantt
    title Throughput Linux x64 (ASM) (Total requests) 
    dateFormat  X
    axisFormat %s
    section Baseline
    This PR (4360) (7.676M)   : 0, 7676412
    master (7.739M)   : 0, 7738834
    benchmarks/2.33.0 (7.765M)   : 0, 7764607
    benchmarks/2.9.0 (8.089M)   : 0, 8088812

    section No attack
    This PR (4360) (2.182M)   : 0, 2182373
    master (2.207M)   : 0, 2207260
    benchmarks/2.33.0 (2.209M)   : 0, 2208725
    benchmarks/2.9.0 (3.319M)   : 0, 3319024

    section Attack
    This PR (4360) (1.917M)   : 0, 1916755
    master (1.933M)   : 0, 1932642
    benchmarks/2.33.0 (1.911M)   : 0, 1910909
    benchmarks/2.9.0 (2.631M)   : 0, 2630747

    section Blocking
    This PR (4360) (3.827M)   : 0, 3827172
    master (3.845M)   : 0, 3844841
    benchmarks/2.33.0 (3.861M)   : 0, 3860900


@tonyredondo tonyredondo marked this pull request as ready for review July 12, 2023 12:28
@tonyredondo tonyredondo requested review from a team as code owners July 12, 2023 12:28
Member

@andrewlock andrewlock left a comment


tl;dr:

  • We can't change IConfigurationSource, so we need to remove those APIs. Plus, we only seem to actually use the function delegate once, in a path that is only ever called once; I'm not at all convinced the extra complexity it introduces is worth it.
  • The IntegrationId updates look good, but I think we can simplify a lot, given that the only cases where the non-optimal paths are called are when the customer is doing something funky (and incorrect) in code.
  • The Serilog changes look like they could be high impact, and I like them in principle, but they also seem relatively high risk:
    • There are no changes in the throughput tests, which seems surprising, unless we don't write many logs by default (which I believe is the case). In which case, this seems like it may be risky (completely replacing the log infrastructure) without much of an expected benefit? (If there are no logs, there's no benefit!)
    • Are we sure we can/should move log serialization off the main loop? Is there prior art from Serilog/other sinks for this, just to be sure that it's actually safe to do (I know it's not safe in some other logging frameworks)? Thinking about async-local values, shared scopes, etc.

Sorry for all the nay-saying, I do like it in theory! 😄 (well, except for the function pointers 😝)

@@ -106,6 +106,71 @@ private static string GetToStringFast(StringBuilder sb, in EnumToGenerate enumTo
return sb.ToString();
}

private static string GetKeys(StringBuilder sb, in EnumToGenerate enumToGenerate)
Member


Suggestion for a slight clarification

Suggested change
private static string GetKeys(StringBuilder sb, in EnumToGenerate enumToGenerate)
private static string GetConfigurationKeys(StringBuilder sb, in EnumToGenerate enumToGenerate)

@@ -106,6 +106,71 @@ private static string GetToStringFast(StringBuilder sb, in EnumToGenerate enumTo
return sb.ToString();
}

private static string GetKeys(StringBuilder sb, in EnumToGenerate enumToGenerate)
{
if (enumToGenerate.FullyQualifiedName != "Datadog.Trace.Configuration.IntegrationId")
Member


I have to commend your ingenuity, but also, jamming this into the generic "enum extensions" source generator makes my left eye twitch 😛

var tracer = Tracer.Instance;
if (tracer is null)
{
Log.Debug("Tracer.Instance is null after InitializeNoNativeParts was invoked");
Member


Why don't we want to log this anymore?

Member Author

@tonyredondo tonyredondo Aug 3, 2023


Because we never actually log this: prior to this code we are already calling Tracer.Instance.XXXX, so it would crash earlier 🤦🏻

Comment on lines +79 to +87
for (var i = 0; i < _sources.Count; i++)
{
if (_sources[i].GetString(key, NullConfigurationTelemetry.Instance, validator: null, recordValue: true) is { } result)
{
return result.Result;
}
}

return default;
Member


It's not worth changing these [PublicApi] methods in the configuration sources IMO. They're legacy implementations that we will never invoke, and they only remain there because they're public APIs.

}

/// <inheritdoc />
ConfigurationResult<string>? ITelemeteredConfigurationSource.GetString(string key, IConfigurationTelemetry telemetry, Func<string, bool>? validator, bool recordValue)
=> _sources.Select(source => source.GetString(key, telemetry, validator, recordValue)).FirstOrDefault(value => value != null);
{
for (var i = 0; i < _sources.Count; i++)
Member


These ITelemeteredConfigurationSource.GetString() rewrites are fine as we use them


RecordChanges();

void RecordChanges()
Member


Can't you use the GetKeys() approach here? IMO we could just use that and remove the string.Format/string.Create approach entirely (as we're guaranteed to only be working with existing integration ids here).

Comment on lines +43 to +47
#if !NETCOREAPP
ThreadPool.UnsafeQueueUserWorkItem(_waitCallback, null);
#else
ThreadPool.UnsafeQueueUserWorkItem(this, true);
#endif
Member


This stuff all makes me nervous, as I've basically not used it for 10 years, so I don't know the edge cases and issues, and it feels pretty high risk to be replacing a "core" component like this.

It's also not entirely clear to me that it's safe to move the LogEvent to a background thread like this, rather than rendering it immediately. Is this something Serilog does internally? If not, why not? Have you checked if any of that has changed in Serilog 3.0?

So all I can really say is "I hope you've done this right" 😬

Member Author

@tonyredondo tonyredondo Aug 3, 2023


It's also not entirely clear to me that it's safe to move the LogEvent to a background thread like this, rather than rendering it immediately. Is this something Serilog does internally? If not, why not? Have you checked if any of that has changed in Serilog 3.0?

All this is done not because of the rendering, but because of the OpenFile handles of the sink.

I wanted to defer all log handling when doing a cold start because basically we write a bunch of logs at startup and then we don't write any after the app is running normally.

In fact, the whole PR is just an attempt to reduce startup time by reducing time spent in configuration load and log writes. I guess only CI Visibility and Serverless will benefit from this.

This is why this shouldn't affect the Throughput tests, only the Execution-time benchmarks, which is in fact what we see: #4360 (comment)

I'm open for ideas for the R&D week 😉
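
For readers following the thread, here is a minimal sketch of the enqueue-and-drain pattern being discussed (names and structure are illustrative assumptions, not the PR's actual DatadogDeferredSink; the public Serilog package is used here for brevity, while the tracer vendors its own copy): Emit only enqueues the LogEvent and schedules a drain on the thread pool, using the IThreadPoolWorkItem overload on .NET Core 3.0+ to avoid a per-call delegate allocation, and the classic WaitCallback overload elsewhere.

using System.Collections.Concurrent;
using System.Threading;
using Serilog.Core;
using Serilog.Events;

// Illustrative sketch only: the calling thread just enqueues the event; writing happens
// on the thread pool. Ordering and concurrent drains are deliberately not handled here.
internal sealed class DeferredSinkSketch : ILogEventSink
#if NETCOREAPP
    , IThreadPoolWorkItem
#endif
{
    private readonly ConcurrentQueue<LogEvent> _queue = new ConcurrentQueue<LogEvent>();
    private readonly ILogEventSink _inner;
#if !NETCOREAPP
    private readonly WaitCallback _waitCallback;
#endif

    public DeferredSinkSketch(ILogEventSink inner)
    {
        _inner = inner;
#if !NETCOREAPP
        _waitCallback = _ => Execute();
#endif
    }

    public void Emit(LogEvent logEvent)
    {
        _queue.Enqueue(logEvent);
#if NETCOREAPP
        // .NET Core 3.0+: queue the work item object itself; no delegate allocation per call.
        ThreadPool.UnsafeQueueUserWorkItem(this, preferLocal: true);
#else
        // Older runtimes: fall back to the cached WaitCallback overload.
        ThreadPool.UnsafeQueueUserWorkItem(_waitCallback, null);
#endif
    }

    public void Execute()
    {
        // Drain whatever is queued and hand it to the real sink, off the caller thread.
        while (_queue.TryDequeue(out var logEvent))
        {
            _inner.Emit(logEvent);
        }
    }
}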

// This product includes software developed at Datadog (https://www.datadoghq.com/). Copyright 2017 Datadog, Inc.
// </copyright>

using System;
Member


#nullable enable maybe? I'm guessing most of the Serilog code doesn't have that... Also, if we're going to update to Serilog 3.x soon, be aware we may need/want to update this soon.

@@ -0,0 +1,249 @@
// <copyright file="DatadogRollingFileSink.cs" company="Datadog">
Member


This file is basically identical to RollingFileSink, right? It just creates a DatadogDeferredSink instead of a File/Shared sink? Maybe we should add a comment about that?

/// Datadog Shared File Sink is based on the SharedFileSink code, but instead of locking the mutex for every log item,
/// the mutex is acquired on each flush, reducing the impact of the logger while keeping the mutex logic.
/// </summary>
internal sealed class DatadogSharedFileSink : IFileSink, IDisposable
Member


ok, now I'm confused... so many sinks... 😂
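
To make the "mutex only on flush" idea above concrete, here is a rough sketch of the concept (illustrative only; this is not the PR's MutexStream/DatadogSharedFileSink code, which also has to handle rolling, encoding, and failure cases): log lines are appended to an in-memory buffer, and the cross-process Mutex is acquired only when that buffer is flushed to the shared file.

using System;
using System.IO;
using System.Text;
using System.Threading;

// Illustrative sketch: the shared (cross-process) mutex is only taken when the buffer
// is flushed to the shared file, instead of once per log item as in SharedFileSink.
internal sealed class MutexOnFlushWriterSketch : IDisposable
{
    private readonly Mutex _mutex;
    private readonly FileStream _file;
    private readonly MemoryStream _buffer = new MemoryStream();
    private readonly StreamWriter _writer;

    public MutexOnFlushWriterSketch(string path, string mutexName)
    {
        _mutex = new Mutex(initiallyOwned: false, mutexName);
        _file = new FileStream(path, FileMode.Append, FileAccess.Write, FileShare.ReadWrite);
        _writer = new StreamWriter(_buffer, new UTF8Encoding(encoderShouldEmitUTF8Identifier: false));
    }

    // Cheap path: no mutex, just append to the in-memory buffer.
    public void WriteLine(string line) => _writer.WriteLine(line);

    // Rare path: take the mutex, copy the buffer to the shared file, reset the buffer.
    public void Flush()
    {
        _writer.Flush();
        if (_buffer.Length == 0)
        {
            return;
        }

        _mutex.WaitOne();
        try
        {
            _buffer.Position = 0;
            _buffer.CopyTo(_file);
            _file.Flush();
            _buffer.SetLength(0);
            _buffer.Position = 0;
        }
        finally
        {
            _mutex.ReleaseMutex();
        }
    }

    public void Dispose()
    {
        Flush();
        _writer.Dispose(); // also disposes the underlying MemoryStream
        _file.Dispose();
        _mutex.Dispose();
    }
}

Combined with the buffered StreamWriter described in the PR summary, the mutex cost is then only paid when the buffer fills up or a flush is forced.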
