Deprecate Kubernetes on-call rotations
saad-ali committed Oct 23, 2017
1 parent 55a3365 commit fc975ed
Showing 5 changed files with 60 additions and 195 deletions.
56 changes: 51 additions & 5 deletions contributors/devel/issues.md
@@ -40,11 +40,57 @@ and this document will cover the basic ones.

Sometimes users ask for support requests in issues; these are usually requests
from people who need help configuring some aspect of Kubernetes. These should be
directed to our [support structures](https://github.com/kubernetes/community/blob/master/contributors/devel/on-call-user-support.md) and then closed. Also, if the issue is clearly abandoned or in
the wrong place, it should be closed. Keep in mind that only issue reporter,
assignees and component organization members can close issue. If you do not
have such privilege, just comment your findings. Otherwise, first `/assign`
issue to yourself and then `/close`.
directed to our support structures (see below) and then closed. Also, if the issue
is clearly abandoned or in the wrong place, it should be closed. Keep in mind that
only the issue reporter, assignees, and component organization members can close an
issue. If you do not have that privilege, just comment with your findings. Otherwise,
first `/assign` the issue to yourself and then `/close` it.
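
The closing rule above can be sketched as a small decision function. This is only an illustration, not part of the Kubernetes tooling: the `Issue` class, its fields, and `triage_close_actions` are hypothetical names, and the sketch assumes that a bare `/assign` comment assigns the issue to the commenter.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Issue:
    """Hypothetical, minimal view of an issue for triage purposes."""
    reporter: str
    assignees: set[str] = field(default_factory=set)
    org_members: set[str] = field(default_factory=set)


def triage_close_actions(issue: Issue, user: str, findings: str) -> list[str]:
    """Return the comments `user` would post on an issue that should be closed.

    Only the reporter, assignees, and component organization members can
    close an issue; anyone else just comments with their findings.
    """
    can_close = (
        user == issue.reporter
        or user in issue.assignees
        or user in issue.org_members
    )
    if not can_close:
        return [findings]
    # Assumption: "/assign" with no argument assigns the commenter.
    return ["/assign", "/close"]
```

For example, an organization member gets the two bot commands back, while an outside contributor gets only their own findings to post as a comment.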

### Support Structures

Support requests should be directed to the following:

* [User documentation](https://kubernetes.io/docs/) and
[troubleshooting guide](https://kubernetes.io/docs/tasks/debug-application-cluster/troubleshooting/)

* [Stack Overflow](http://stackoverflow.com/questions/tagged/kubernetes) and
[ServerFault](http://serverfault.com/questions/tagged/google-kubernetes)

* [Slack](https://kubernetes.slack.com) ([registration](http://slack.k8s.io))
* Check out the [Slack Archive](http://kubernetes.slackarchive.io/) first.

* [Email/Groups](https://groups.google.com/forum/#!forum/kubernetes-users)

### User support response example

If you see support questions on [email protected] or issues asking for
support try to redirect them to Stack Overflow. Example response:

```code
Please re-post your question to [Stack Overflow]
(http://stackoverflow.com/questions/tagged/kubernetes).
We are trying to consolidate the channels to which questions for help/support
are posted so that we can improve our efficiency in responding to your requests,
and to make it easier for you to find answers to frequently asked questions and
how to address common use cases.
We regularly see messages posted in multiple forums, with the full response
thread only in one place or, worse, spread across multiple forums. Also, the
large volume of support issues on github is making it difficult for us to use
issues to identify real bugs.
Members of the Kubernetes community use Stack Overflow to field support
requests. Before posting a new question, please search Stack Overflow for answers
to similar questions, and also familiarize yourself with:
* [user documentation](http://kubernetes.io/docs/)
* [troubleshooting guide](https://kubernetes.io/docs/tasks/debug-application-cluster/troubleshooting/)
Again, thanks for using Kubernetes.
The Kubernetes Team
```

## Find the right SIG(s)
Components are divided among [Special Interest Groups (SIGs)](https://github.com/kubernetes/community/blob/master/sig-list.md). Find a proper SIG for the ownership of the issue using the bot:
50 changes: 0 additions & 50 deletions contributors/devel/on-call-build-cop.md

This file was deleted.

43 changes: 0 additions & 43 deletions contributors/devel/on-call-rotations.md

This file was deleted.

83 changes: 0 additions & 83 deletions contributors/devel/on-call-user-support.md

This file was deleted.

23 changes: 9 additions & 14 deletions contributors/devel/release/testing.md
@@ -81,10 +81,11 @@ When a test is failing, it must be quickly escalated to the correct owner. Test
are left to fail for days or weeks become toxic and create noise in the system health
metrics.

The [build cop] is expected to ensure that the release blocking tests remain
Each SIG is expected to ensure that the release blocking tests that belong to the SIG remain
perpetually healthy by monitoring the test grid and escalating failures.

On test failures, the build cop will follow the [sig escalation](#sig-test-escalation) path.
Failing tests that are not being addressed can be escalated by following the
[sig escalation](#sig-test-escalation) path.

*Tests without a responsive owner should be assigned a new owner or disabled.*

@@ -132,14 +133,11 @@ urgent than persistent failures, but still expected to have a root cause investi

## Broken test workflow

SIGs are expected to proactively monitor and maintain their tests. The build cop will also
monitor the health of the entire project, but is intended as backup who will escalate
failures to the owning SIGs.
SIGs are expected to proactively monitor and maintain their tests.

- File an issue for the broken test so it can be referenced and discovered
- Set the following labels: `priority/failing-test`, `sig/*`
- Assign the issue to whoever is working on it
- Mention the current build cop (TODO: publish this somewhere)
- Root cause analysis of the test failure is performed by the owner
- **Note**: The owning SIG for a test can reassign ownership of a resolution to another SIG only after getting
approval from that SIG
@@ -152,13 +150,11 @@ failures to the owning SIGs.

## SIG test escalation

The build cop should monitor the overall test health of the project, and ensure ownership for any given
test does not fall through the cracks. When the build cop observer a test failure, they should first
search to see if an issue has been filed already, and if not (optionally file an issue and) escalate to the SIG
escalation point. If the escalation point is unresponsive within a day, the build cop should escalate to the SIG
googlegroup and/or slack channel, mentioning the SIG leads. If escalation through the SIG googlegroup,
slack channel and SIG leads is unsuccessful, the build cop should escalate to SIG release through the
googlegroup and slack - mentioning the SIG leads.
As a Kubernetes developer, if you observe a test failure, first search to see if an issue has already been filed,
and if not, (optionally file an issue and) escalate to the SIG escalation point.
If the escalation point is unresponsive within a day, escalate to the SIG googlegroup and/or slack channel,
mentioning the SIG leads. If escalation through the SIG googlegroup, slack channel and SIG leads is unsuccessful,
escalate to SIG release through the googlegroup and slack, mentioning the SIG leads.
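
The escalation path described above can be read as a three-step ladder. The following is a rough sketch of that reading, not an official tool: the constant names and `escalation_target` are hypothetical, and the sketch assumes the caller already knows that no response has arrived so far.

```python
from datetime import timedelta

# The three rungs of the ladder, in order (labels are illustrative).
ESCALATION_POINT = "SIG escalation point"
SIG_CHANNELS = "SIG googlegroup and/or slack channel, mentioning the SIG leads"
SIG_RELEASE = "SIG release googlegroup and slack, mentioning the SIG leads"

# The text gives one day as the response window for the escalation point.
RESPONSE_WINDOW = timedelta(days=1)


def escalation_target(waited: timedelta, sig_channels_failed: bool) -> str:
    """Pick where to escalate an unaddressed test failure next.

    waited: time since the SIG escalation point was first contacted,
        with no response received so far.
    sig_channels_failed: True once escalation through the SIG googlegroup,
        slack channel, and SIG leads has been unsuccessful.
    """
    if sig_channels_failed:
        return SIG_RELEASE
    if waited >= RESPONSE_WINDOW:
        return SIG_CHANNELS
    return ESCALATION_POINT
```

The text does not say how long to wait before treating the SIG channels as unsuccessful, so the sketch leaves that judgment to the caller through the `sig_channels_failed` flag.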

The SIG escalation points should be bootstrapped from the [community sig list].

@@ -172,7 +168,6 @@ The SIG escalation points should be bootstrapped from the [community sig list].
[community sig list]: https://github.com/kubernetes/community/blob/master/sig-list.md
[triage tool]: https://storage.googleapis.com/k8s-gubernator/triage/index.html
[test grid]: https://k8s-testgrid.appspot.com/
[build cop]: https://github.com/kubernetes/community/blob/master/contributors/devel/on-call-build-cop.md
[release-master-blocking]: https://k8s-testgrid.appspot.com/release-master-blocking#Summary
[1.7-master-upgrade]: https://k8s-testgrid.appspot.com/1.7-master-upgrade#Summary
[1.6-master-upgrade]: https://k8s-testgrid.appspot.com/1.6-master-upgrade#Summary