-
Notifications
You must be signed in to change notification settings - Fork 262
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimise Prometheus targets handling #2474
Conversation
a575d60
to
82ce9f8
Compare
dddfa4e
to
b39c1f9
Compare
b912766
to
2a0575b
Compare
package relabel | ||
|
||
// LabelBuilder is an interface that can be used to change labels with relabel logic. | ||
type LabelBuilder interface { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This allows us to use our own label builder with prometheus relabel logic. We may be able to contribute this upstream as it doesn't seem harmful.
@@ -214,6 +216,7 @@ func attemptLoadingAlloyConfig(t *testing.T, bb []byte) { | |||
clusterService, | |||
labelstore.New(nil, prometheus.DefaultRegisterer), | |||
remotecfgService, | |||
livedebugging.New(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed some errors in log related to unknown service, so I added this. I think this test is attempting to start the config and find any config syntax related issues before the run inevitably fails.
comparedAndEqual = result{compared: true, isEqual: true} | ||
) | ||
|
||
func deepCustomEqual(v1, v2 reflect.Value) result { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adapted from reflect.DeepEqual.
# Conflicts: # CHANGELOG.md
2788b05
to
872d35b
Compare
Benchmark_Targets_TypicalPipeline-8 34 33537419 ns/op 6022333 B/op 100543 allocs/op | ||
Benchmark_Targets_TypicalPipeline-8 34 33687083 ns/op 6109172 B/op 100543 allocs/op | ||
*/ | ||
func Benchmark_Targets_TypicalPipeline(b *testing.B) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Compared to current main
we see:
benchstat ./01_typical_pipeline_targets_as_maps.txt ./08_typical_pipeline_obj_pools.txt
goos: darwin
goarch: arm64
pkg: github.com/grafana/alloy/internal/component/discovery
cpu: Apple M2
│ ./01_typical_pipeline_targets_as_maps.txt │ ./08_typical_pipeline_obj_pools.txt │
│ sec/op │ sec/op vs base │
_Targets_TypicalPipeline-8 38.63m ± 1% 33.61m ± 6% -12.98% (p=0.000 n=10+6)
│ ./01_typical_pipeline_targets_as_maps.txt │ ./08_typical_pipeline_obj_pools.txt │
│ B/op │ B/op vs base │
_Targets_TypicalPipeline-8 61.953Mi ± 0% 5.743Mi ± 1% -90.73% (p=0.000 n=10+6)
│ ./01_typical_pipeline_targets_as_maps.txt │ ./08_typical_pipeline_obj_pools.txt │
│ allocs/op │ allocs/op vs base │
_Targets_TypicalPipeline-8 241.8k ± 0% 100.5k ± 0% -58.42% (p=0.000 n=10+6)
relabelled, keep := relabel.Process(lset, relabelConfigs...) | ||
var ( | ||
relabelled discovery.Target | ||
builder = discovery.NewTargetBuilderFrom(t) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This builder should/could be reused to avoid extra allocations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure it should be since the content of the builder is mutated on each call to ProcessBuilder, if we can make it safe then seems reasonable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the current profiles show good performance gains here already, I would like to leave this optimisation for future PRs after we have more data. I would like to consider an option of using an object pool for the couple of maps that the builder uses underneath as an alternative and compare the two options.
mustGet := func(label string) string { | ||
val, ok := t.Get(label) | ||
if !ok { // should never happen as Target is immutable | ||
panic("label concurrently modified - this is a bug - please report an issue") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should not panic but instead return an error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Target labels are immutable and the only way that labels passed in order
are no longer there is someone introducing a bug. I've removed the panic to not have crashes if this ever happened, but instead of returning an error, in order not to overcomplicate the API, I'm hashing an empty string - it will still provide consistent hashing behaviour and is a better choice than complicating things with error handling IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some questions but overall very solid! Fantastic gains.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed the comments, PTAL
mustGet := func(label string) string { | ||
val, ok := t.Get(label) | ||
if !ok { // should never happen as Target is immutable | ||
panic("label concurrently modified - this is a bug - please report an issue") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Target labels are immutable and the only way that labels passed in order
are no longer there is someone introducing a bug. I've removed the panic to not have crashes if this ever happened, but instead of returning an error, in order not to overcomplicate the API, I'm hashing an empty string - it will still provide consistent hashing behaviour and is a better choice than complicating things with error handling IMO.
relabelled, keep := relabel.Process(lset, relabelConfigs...) | ||
var ( | ||
relabelled discovery.Target | ||
builder = discovery.NewTargetBuilderFrom(t) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the current profiles show good performance gains here already, I would like to leave this optimisation for future PRs after we have more data. I would like to consider an option of using an object pool for the couple of maps that the builder uses underneath as an alternative and compare the two options.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, great work. My only remaining ask is to document the places we cant change the code due to backwards compatibility with prometheus.
PR Description
High level overview: This changes Targets to a capsule instead of a Go map in a way that is transparent to the users, but allows us to optimise the allocation and memory footprint of the targets handling in Prometheus pipelines.
NOTE: this PR appears large, however:
Which issue(s) this PR fixes
Fixes #1999
Notes to the Reviewer
Changes in more detail to facilitate the review:
group
and target's own labels inown
fields. This allows to avoid a lot of allocation.own
andgroup
fields, we need to use a dedicatedEquals
function. Implementing it is easy, but we also need to make sure that it is used by Alloy runtime (see below).equality
package which will attempt to compare using ourCustomEquality
interface where possible, but if that fails, it will use the previous reflect.DeepEquals. This package is now used throughout the runtime to avoid unnecessary updates etc.ConvertibleIntoCapsule
- we know it can be converted into an object (which is a map), so we do that and we print it as such.group
are passed on a group level. This reconstruction of target groups is done indiscovery.ComponentTargetsToPromTargetGroups
. We group them using thegroup
labelset hash, but we also handle hash conflicts in case they are detected.Performance:
Deployed to a large cluster between 10:30 and 13:00, we observed the following impact:
Memory and CPU down, we can also see smaller variation in memory usage which is good for HPA.
data:image/s3,"s3://crabby-images/92554/925545032aced119992e26954a9024f779976a22" alt="image"
Allocation profile before:
data:image/s3,"s3://crabby-images/11f21/11f211755b1f73f94e1c0e54fdf6f83c4e93cd03" alt="image"
Allocation profile after:
data:image/s3,"s3://crabby-images/ffa78/ffa782088941cc80da84b5d4893e33bcdf97791e" alt="image"
Looking at the diff, the reduction is, as expected, mostly in these two functions:
data:image/s3,"s3://crabby-images/fd81b/fd81bff5b05f556add817dd4234452ace8bacc21" alt="image"
We can see the overall memory usage went down and back up as the changed was rolled out and then rolled back:
data:image/s3,"s3://crabby-images/1623f/1623f93252b52a75f2e31bb9066da8b47c30c81f" alt="image"
PR Checklist