fix: handle delayed resets by forcefully recording 0s for unreceived events #144

Elfo404 · 2024-09-20T09:41:25Z

This PR should work around the issues caused by counters whose values do not reset at the same time.
By forcefully resetting all other possible statuses for a given repo/label set to 0, we reset all of them at the same time, avoiding the delayed resets that cause huge spikes.

The current number of jobs in a given status is jobs{status=OBSERVED_STATUS} - jobs{status=NEXT_STATUS}.
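To make the formula concrete, here is a minimal Go sketch. It is not the receiver's code: the `counts` type and the `currentInStatus` function are hypothetical names, and it assumes the cumulative counter value for each status is available in a plain map.

```go
package main

import "fmt"

// counts holds the cumulative number of job events received per status,
// keyed by the value of the "status" label (a hypothetical representation).
type counts map[string]int64

// currentInStatus returns how many jobs are currently in status `observed`,
// following jobs{status=OBSERVED_STATUS} - jobs{status=NEXT_STATUS}:
// every job that reached `next` must first have passed through `observed`.
func currentInStatus(c counts, observed, next string) int64 {
	return c[observed] - c[next]
}

func main() {
	// Values from the T0 snapshot in the description.
	t0 := counts{"queued": 1001, "in_progress": 1000}
	fmt.Println(currentInStatus(t0, "queued", "in_progress"))
}
```

With the T0 values this prints 1, the single job still waiting in the queue.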

before

T0:

jobs{status=queued} 1001
jobs{status=in_progress} 1000

(the actual number of queued jobs is 1001 - 1000 = 1)

T1 (after the collector restarts and an in_progress event is received):

jobs{status=in_progress} 1

At this point, Prometheus holds the following state internally:

jobs{status=queued} 1001
jobs{status=in_progress} 1

So until another queued event is received and the reset data point is reported, we would erroneously show 1000 currently queued jobs.
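The spike can be reproduced with a small Go model. This sketch assumes, for illustration only, that the backend keeps the last value it received for each series until a newer data point for that same series arrives; `lastSeen` and `ingest` are hypothetical names, not Prometheus or collector APIs.

```go
package main

import "fmt"

// lastSeen models the last value a backend retains per series:
// a series keeps its old value until a new data point for it arrives.
type lastSeen map[string]int64

// ingest overwrites only the series present in the exported batch;
// series missing from the batch keep whatever value they had before.
func (s lastSeen) ingest(batch map[string]int64) {
	for status, v := range batch {
		s[status] = v
	}
}

func main() {
	backend := lastSeen{}

	// T0: both series are exported before the restart.
	backend.ingest(map[string]int64{"queued": 1001, "in_progress": 1000})

	// T1: the collector restarted, its counters start again from zero,
	// and only an in_progress event has been received since.
	backend.ingest(map[string]int64{"in_progress": 1})

	// queued still holds the stale 1001, so the derived value spikes.
	fmt.Println(backend["queued"] - backend["in_progress"])
}
```

This prints 1000: the queued series is never refreshed, so its pre-restart value is paired with the post-restart in_progress value.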

after

T0:

jobs{status=queued} 1001
jobs{status=in_progress} 1000

(the actual number of queued jobs is 1001 - 1000 = 1)

T1 (after the collector restarts and an in_progress event is received):

jobs{status=queued} 0 # Set by this change, which forces a reset for this label as well
jobs{status=in_progress} 1

At this point, Prometheus holds the following state internally:

jobs{status=queued} 0
jobs{status=in_progress} 1
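The idea behind the fix can be sketched in Go as follows. This is a minimal illustration, not the receiver's actual implementation. It assumes a hypothetical fixed list of statuses, models the collector's counters as a plain map keyed by a hypothetical repo identifier, and assumes the "recording 0" is done by adding 0 to the other counters. Recording an event adds 1 to the observed status and 0 to every other status, so on a freshly started collector every series for that repo is created at 0 and exported together with the observed one.

```go
package main

import "fmt"

// statuses is a hypothetical list of every status the receiver can report.
var statuses = []string{"queued", "in_progress", "completed"}

// recorder stands in for the collector's in-memory counters, keyed by a
// hypothetical repo identifier and then by status. It starts empty, as it
// would right after a restart.
type recorder struct {
	counters map[string]map[string]int64
}

func newRecorder() *recorder {
	return &recorder{counters: map[string]map[string]int64{}}
}

// recordEvent adds 1 to the observed status and 0 to every other status
// for the same repo, so all series exist (at 0 if new) and reset together.
func (r *recorder) recordEvent(repo, observed string) {
	if r.counters[repo] == nil {
		r.counters[repo] = map[string]int64{}
	}
	for _, s := range statuses {
		var delta int64
		if s == observed {
			delta = 1
		}
		r.counters[repo][s] += delta // creates the series at 0 if absent
	}
}

// export returns a copy of every series currently held for a repo.
func (r *recorder) export(repo string) map[string]int64 {
	out := map[string]int64{}
	for s, v := range r.counters[repo] {
		out[s] = v
	}
	return out
}

func main() {
	// T1: fresh collector, and the first event received is in_progress.
	r := newRecorder()
	r.recordEvent("example/repo", "in_progress")

	batch := r.export("example/repo")
	fmt.Println(batch["queued"], batch["in_progress"], len(batch))
}
```

This prints `0 1 3`: the queued series is exported as 0 alongside in_progress, replacing the stale 1001 instead of leaving it behind. Adding 0 rather than overwriting with 0 means series that already exist keep their accumulated value; only series not yet seen since the restart start out at 0.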

fix: handle delayed resets by forcefully recording 0s for unreceived events

1e792dd

Elfo404 requested a review from a team September 20, 2024 09:41

Elfo404 added the receiver/github label Sep 20, 2024

dsotirakis approved these changes Sep 23, 2024


Elfo404 merged commit 0d34954 into main Sep 23, 2024
5 checks passed

Elfo404 deleted the gio/fix/delayed-resets branch September 23, 2024 11:21

Elfo404 mentioned this pull request Sep 30, 2024

chore: add event to metrics tests #147

Merged
