Behaviour when complete service outage #380

Closed
mihaidraghici98 opened this issue Sep 6, 2022 · 1 comment

mihaidraghici98 commented Sep 6, 2022

Hello!

Does Sloth have a mechanism to account for total service outages?

For example, in a K8s deployment the pods might not exist at all, or might be excluded from the load balancer because their livenessProbe or readinessProbe is failing, so the availability for the duration of the incident is 0%.

I tried to include such a mechanism in the sli.events queries, but the 0% availability propagates to all windows and distorts the SLI burn rate (e.g. the 30d window also reports 0%, which implies a huge burn rate).
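To make the arithmetic behind this concern concrete, here is a minimal sketch in Python. It assumes a time-based error ratio in which a total outage counts as 100% errors only for the seconds it lasted. The helper names, the one-hour outage, and the 99.9% objective are illustrative assumptions and not part of Sloth.

```python
def outage_error_ratio(outage_seconds: float, window_seconds: float) -> float:
    """Share of the window spent fully down (hypothetical helper)."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    # An outage longer than the window can at most fill the window.
    return min(max(outage_seconds, 0.0), window_seconds) / window_seconds


def burn_rate(error_ratio: float, objective_percent: float) -> float:
    """Error ratio divided by the error budget implied by the objective."""
    budget = 1.0 - objective_percent / 100.0
    if budget <= 0:
        raise ValueError("objective must be below 100%")
    return error_ratio / budget


HOUR = 3600
DAY = 24 * HOUR

# Assumed scenario: a one-hour total outage and a 99.9% objective.
for label, window in (("5m", 300), ("1h", HOUR), ("30d", 30 * DAY)):
    ratio = outage_error_ratio(HOUR, window)
    print(f"{label}: error ratio {ratio:.4%}, burn rate {burn_rate(ratio, 99.9):.2f}")
```

Under these assumptions the short windows see an error ratio of 1 while the outage fills them, whereas the 30d window only sees the outage's share of 30 days (one hour out of 720). If a query instead reports 0% availability for every window, the outage is probably not being weighted by its duration.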

Thanks!

slok (Owner) commented Oct 27, 2022

Hi @mihaidraghici98!

Sloth doesn't support that :/

I would suggest creating another SLO based on the Prometheus up metric. You could also try the Prometheus target SLI plugin.
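As a rough illustration of what an up-based SLI measures, the Python sketch below computes an error ratio from scrape results. In Prometheus, up is 1 when a scrape succeeds and 0 when it fails. Since targets that disappear entirely produce no samples at all, the sketch takes an expected sample count and treats missing samples as down. That handling and the function name are assumptions made for this example, not documented behaviour of Sloth or of the plugin.

```python
from typing import Optional, Sequence


def up_error_ratio(up_samples: Sequence[int],
                   expected_samples: Optional[int] = None) -> float:
    """Fraction of expected scrapes that were not healthy (hypothetical helper).

    up_samples: observed values of the up metric (1 = scrape ok, 0 = failed).
    expected_samples: how many samples should exist in the window; samples
        that are missing (e.g. pods that do not exist) are counted as down.
        Defaults to the number of observed samples.
    """
    expected = len(up_samples) if expected_samples is None else expected_samples
    if expected <= 0:
        raise ValueError("expected_samples must be positive")
    healthy = sum(1 for sample in up_samples if sample == 1)
    # Never count more healthy samples than were expected.
    return 1.0 - min(healthy, expected) / expected


# One failed scrape out of four observed.
print(up_error_ratio([1, 1, 0, 1]))
# Two healthy scrapes seen, four expected: the missing two count as down.
print(up_error_ratio([1, 1], expected_samples=4))
```

Keeping this as a separate SLO, as suggested above, leaves the request-based SLI untouched while still tracking periods when no target is reachable.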

Best,

slok closed this as completed Oct 27, 2022