fix: handle delayed resets by forcefully recording 0s for unreceived events #144
This PR works around the problem of using counters whose values do not reset at the same time.
By forcefully resetting every other possible status for a given repo/label set to 0, we reset all of them at the same time, which avoids the delayed resets that cause huge spikes.
The current value for jobs in a given status is
job{status=OBSERVED_STATUS} - job{status=NEXT_STATUS}
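To make the arithmetic concrete, here is a minimal sketch of the subtraction and of the delayed-reset problem. It is not the collector's actual code. The `store` dict is a simplified stand-in for the last sample Prometheus holds per series, and the status list, function names, and the `force_zero` flag are all hypothetical. It also assumes the restarted collector's `in_progress` counter starts again at 1, which is what the 1000-job spike in the example below implies.

```python
# Hypothetical, simplified model; none of these names come from the PR itself.
STATUSES = ["queued", "in_progress", "completed"]  # assumed set of statuses


def current_jobs(store, observed, next_status):
    """job{status=observed} - job{status=next_status}, read from the last samples."""
    return store.get(observed, 0) - store.get(next_status, 0)


def report(store, counters, observed, force_zero):
    """Simulate the collector handling one event for status `observed`."""
    counters[observed] = counters.get(observed, 0) + 1
    store[observed] = counters[observed]
    if force_zero:
        # The idea of this PR: also record 0 for every status that has not
        # received an event yet, so all series reset at the same time.
        for status in STATUSES:
            if status not in counters:
                counters[status] = 0
                store[status] = 0


def simulate(force_zero):
    # T0: last samples known before the restart.
    store = {"queued": 1001, "in_progress": 1000}
    # The collector restarts: its in-memory counters are lost.
    counters = {}
    # T1: an in_progress event arrives after the restart.
    report(store, counters, "in_progress", force_zero)
    return current_jobs(store, "queued", "in_progress")


before = simulate(force_zero=False)  # 1001 - 1 = 1000 (stale queued sample)
after = simulate(force_zero=True)    # 0 - 1 = -1 (both series reset together)
```

Under these assumptions the difference stays close to zero instead of jumping to 1000. How the small negative value should be treated downstream is not covered by this sketch.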
before
T0:
(actual jobs in queued state: 1001 - 1000 = 1)
T1 (after the collector restarts and an in_progress event is received):
At this point, Prometheus will have the current state internally:
so until another queued event is received and the reset data point is reported, we would erroneously have 1000 currently queued jobs.
after
T0:
(actual jobs in queued state: 1001 - 1000 = 1)
T1 (after the collector restarts and an in_progress event is received):
At this point, Prometheus will have the current state internally: