Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Measure metrics latency #2323

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

fraillt
Copy link
Contributor

@fraillt fraillt commented Nov 22, 2024

Tests how much time it takes to call .add(value, attribs) under various conditions.
It measures several things: average time, median time (p50), p95 and p99.
The way it works is pretty simple:

  • call .add(value, attribs) on several threads in the loop and measure each call
  • in main thread, keep collecting data (delta collect) with some delay between calls.
    Here's the results on my machine on main branch.
See results.
*** updates, using 4 threads ***
no attribs
	iter 26.43M
	avg 103.0ns
	p50 57.00ns
	p95 177.0ns
	p99 182.0ns
1 attrib
	iter 8.267M
	avg 422.0ns
	p50 362.0ns
	p95 743.0ns
	p99 1.017μs
9 attribs
	iter 4.919M
	avg 718.0ns
	p50 615.0ns
	p95 854.0ns
	p99 1.026μs
*** inserts, using 4 threads ***
1 attrib
	iter 435.1K
	avg 8.931μs
	p50 958.0ns
	p95 64.16μs
	p99 113.8μs
10 attribs
	iter 119.0K
	avg 33.03μs
	p50 4.959μs
	p95 133.9μs
	p99 205.0μs
*** mix mostly updates (100 attribute-sets), using 4 threads ***
10 attribs
	iter 379.1K
	avg 10.36μs
	p50 1.729μs
	p95 67.73μs
	p99 125.9μs
*** updates, using 36 threads ***
no attribs
	iter 40.32M
	avg 682.0ns
	p50 215.0ns
	p95 228.0ns
	p99 243.0ns
1 attrib
	iter 11.72M
	avg 2.991μs
	p50 797.0ns
	p95 1.539μs
	p99 2.264μs
9 attribs
	iter 8.819M
	avg 3.952μs
	p50 933.0ns
	p95 1.442μs
	p99 1.797μs
*** inserts, using 36 threads ***
1 attrib
	iter 279.6K
	avg 128.2μs
	p50 2.757μs
	p95 792.6μs
	p99 1.596ms
10 attribs
	iter 80.01K
	avg 448.5μs
	p50 130.5μs
	p95 1.858ms
	p99 3.044ms
*** mix mostly updates (100 attribute-sets), using 36 threads ***
10 attribs
	iter 1.249M
	avg 28.57μs
	p50 945.0ns
	p95 13.10μs
	p99 871.9μs

Merge requirement checklist

  • CONTRIBUTING guidelines followed
  • Unit tests added/updated (if applicable)
  • Appropriate CHANGELOG.md files updated for non-trivial, user-facing changes
  • Changes in public API reviewed (if applicable)

@fraillt fraillt requested a review from a team as a code owner November 22, 2024 10:15
@fraillt fraillt mentioned this pull request Nov 22, 2024
4 tasks
Copy link

codecov bot commented Nov 22, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 79.5%. Comparing base (e0159ad) to head (8898ac5).

Additional details and impacted files
@@          Coverage Diff          @@
##            main   #2323   +/-   ##
=====================================
  Coverage   79.5%   79.5%           
=====================================
  Files        123     123           
  Lines      21492   21492           
=====================================
  Hits       17094   17094           
  Misses      4398    4398           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

// collect threads in short intervals
let mut total_count = 0;
while threads.iter().any(|t| !t.is_finished()) {
// collect agressively so we could measure inserts properly,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trying to understand what this is simulating? Metric collection interval is by default 30 sec or 60 sec in the spec. What is the scenario for collecting every 500 microsec?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically idea, is to choose collection interval so it wouldn't affect numbers (p99) but still allow to test updates and inserts (for insert I need to clean it regularly, but not too frequent that updates would become inserts).
I updated code with explanation why this number... it could be different, but it should be good enough so it would work in all cases without affecting p99.
( Sorry for rebase,-force push, first commit was a bit rushed... I wasn't happy with, now it's better).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I am not following the scenario being tested - are we expecting that users will run collect() every few microseconds? I don't think that is something we need to consider for performance optimizations.

Also, I opened #2328 to track revisiting the aggressive cleaning up of HashMap.

}));
}
let mut total_count = 0;
while start.elapsed() < Duration::from_secs(1) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the goal is to measure the impact of Collect to overall throughput, lets add the existing PeriodicReader with a dummy exporter to the existing stress test. The PeriodicReader interval should be the defaults (60 secs) - or we can shorten to a low number like 5 sec, but definitely not micro-secs!

(I don't expect adding Reader to have any measurable impact to stress test.)

Copy link
Contributor Author

@fraillt fraillt Nov 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The goal is opposite,- collect should be invisible in the statistics, but keep in mind, that this number is almost arbitrary... and I should probably have better implementation, but for the starters it already shows meaningful numbers :)
So with 300micro-sec... it will clear around 1800-2500times per second (time to collect also adds up).
For p99, I care about 99% of measurements, and can ignore the rest. Which means, that for p99 it will have no effect on updates, if I measure more than 2500*99 =~250K/s, for my case I'm always over several millions, so collecting doesn't have any affect on statistics.
For inserts situation is different... I need to avoid hitting overflow (2k), which means that at most I can reach around 2M (which might be the case with better hardware...), so I basically hope that I clear frequently enough, to not hit 2k (overflow), but not too frequent, so that it will greatly affect p99 (but it probably affects in some scenarios... so this is not ideal).

Probably I should implement some sort of dynamic/adaptive interval, so that it wouldn't clean to frequently (so it wouldn't affect more than 1% of measurements for p99, but not too rarely to hit overflow (2k).
In other words, if I collect after less than 100, then it will affect p99, but if I collect after 2k, then for inserts it might reached overflow.

I could also implement different strategies for each case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you elaborate on the scenario being covered? Without understanding the actual scenario, its hard to provide useful feedback.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I update revision a bit. Generally, the purpose of this PR is to simply look how much time it takes for individual calls to measure things under various conditions, that's it :)
There's something to learn too :) For example, I have learned, that even simple update (no-attributes) under high load (lots of threads) can take as little as 60ns, or as much as 80ms, it's more than 1000000 time difference!
I still cannot explain why some times it takes so long... I tried to do "warmup" but, at least on my machine, this is consistently reproducible.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @fraillt for the great work, as always!
It is very important to ensure that the update time is predictable and never causes thread to get blocked etc. which will defeat the whole point of relying on OpenTelemetry to report metrics (if OTel itself can cause a thread to lose its CPU slice and mess up the duration!!). I don't think we'll fix this for 1.0 Metrics release, so I plan to call this out in docs. My current guess is.- some form of Bound instruments will solve this problem.

Let me re-check this PR to see if we want to keep this for measuring this scenario. The current behavior of aggressively draining the hash map is working against us, as that will force some updates to need write lock, effectively causing other threads to potentially get blocked...Something #2328 can address..

@fraillt fraillt force-pushed the measure_metrics_latency branch 2 times, most recently from 3b7fbe8 to 7ae7647 Compare November 28, 2024 13:49
KeyValue::new("url.scheme", "not_found"),
KeyValue::new("error.type", 404),
KeyValue::new("http.response.status_code", 404),
KeyValue::new("http.route", "testing/metrics/latency"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one interesting thing is - the route may not be readily available as a `static str always, forcing users to String allocate to report this!
#2089 (comment) is trying to fix that scenario, hopefully we can make it happen for 1.0 stable of Metrics!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants