Skip to content

Commit 2c26c4e

Browse files
bobcatfishgoogle-prow-robot
authored andcommittedMay 30, 2018
Update condition type in spec and do not assert against reason (knative#903)
* Some spec docs were using `RolloutComplete` for the condition type, however as errors.md documents and as discussed in knative#351, the type should actually be `Ready` * The conformance tests were asserting against the condition `Reason` but as per knative#351 and errors.md, values in this field are for the user to consume and should be transpart to client applications. * errors.md didn't make the latter as clear as mattmoor@'s comment in knative#351 so I copied some of his comment into the doc Also reorganized some of errors.md to make it easier to identify, by field, what requirements are.
1 parent 4210e45 commit 2c26c4e

File tree

4 files changed

+117
-99
lines changed

4 files changed

+117
-99
lines changed
 

‎docs/spec/errors.md

+108-82
Original file line numberDiff line numberDiff line change
@@ -1,88 +1,111 @@
11
# Error Conditions and Reporting
22

3-
Elafros uses the standard Kubernetes API pattern for reporting
4-
configuration errors and current state of the system by writing the
3+
Elafros uses [the standard Kubernetes API
4+
pattern](https://github.com/kubernetes/community/blob/master/contributors/devel/api-conventions.md#typical-status-properties)
5+
for reporting configuration errors and current state of the system by writing the
56
report in the `status` section. There are two mechanisms commonly used
6-
in status:
7+
in `status`:
78

8-
* conditions represent true/false statements about the current state
9-
of the resource.
9+
* **Conditions** represent true/false statements about the current
10+
state of the resource.
1011

11-
* other fields may provide status on the most recently retrieved state
12+
* **Other fields** may provide status on the most recently retrieved state
1213
of the system as it relates to the resource (example: number of
1314
replicas or traffic assignments).
1415

1516
Both of these mechanisms often include additional data from the
1617
controller such as `observedGeneration` (to determine whether the
1718
controller has seen the latest updates to the spec).
1819

20+
## Conditions
21+
1922
Conditions provide an easy mechanism for client user interfaces to
2023
indicate the current state of resources to a user. Elafros resources
21-
should follow these patterns:
22-
23-
1. Each resource should define a small number of success conditions as
24-
Types. This should bias towards fewer than 5 high-level progress
25-
categories which are separate and meaningful for customers. For a
26-
Revision, these might be `BuildSucceeded`, `ResourcesAvailable` and
27-
`ContainerHealthy`.
28-
2. Where it makes sense, resources should define a top-level "happy
29-
state" condition type which indicates that the resource is set up
30-
correctly and ready to serve. For long-running resources, this
31-
condition type should be `Ready`. For objects which run to completion,
32-
the condition type should be `Succeeded`.
33-
3. Each condition's status should be one of:
34-
* `Unknown` when the controller is actively working to achieve the
35-
condition.
36-
* `False` when the reconciliation has failed. This should be a terminal
37-
failure state until user action occurs.
38-
* `True ` when the reconciliation has succeeded. Once all transition
39-
conditions have succeeded, the "happy state" condition should be set
40-
to `True`.
41-
42-
Type names should be chosen such that these interpretations are clear:
43-
44-
> `BuildSucceeded` works because `True` = success and `False` = failure.
45-
46-
> `BuildCompleted` does not, because `False` could mean "in-progress".
47-
48-
Conditions may also be omitted entirely if reconciliation has been
49-
skipped. When all conditions have succeeded, the "happy state"
50-
should clear other conditions for output legibility. Until the
51-
"happy state" is set, conditions should be persisted for the
52-
benefit of UI tools representing progress on the outcome.
53-
54-
4. Conditions with a status of `False` will also supply additional details
55-
about the failure in the "Reason" and "Message" sections -- both of
56-
these should be considered to have unlimited cardinality, unlike
57-
Type. If a resource has a "happy state" type, it will surface the
58-
`Reason` and `Message` from the first failing sub Condition.
24+
should follow [the k8s API conventions for
25+
`condition`](https://github.com/kubernetes/community/blob/master/contributors/devel/api-conventions.md#typical-status-properties)
26+
and the patterns described in this section.
27+
28+
### Elafros condition `type`
29+
30+
Each resource should define a small number of success conditions as
31+
`type`s. This should bias towards fewer than **5** high-level progress
32+
categories which are separate and meaningful for customers. For a
33+
Revision, these might be `BuildSucceeded`, `ResourcesAvailable` and
34+
`ContainerHealthy`.
35+
36+
Where it makes sense, resources should define a top-level "happy
37+
state" condition `type` which indicates that the resource is set up
38+
correctly and ready to serve.
39+
40+
* For long-running resources, this condition `type` should be
41+
`Ready`.
42+
* For objects which run to completion, the condition `type` should
43+
be `Succeeded`.
44+
45+
### Elafros condition `status`
46+
47+
Each condition's `status` should be one of:
48+
49+
* `Unknown` when the controller is actively working to achieve the
50+
condition.
51+
* `False` when the reconciliation has failed. This should be a terminal
52+
failure state until user action occurs.
53+
* `True` when the reconciliation has succeeded. Once all transition
54+
conditions have succeeded, the "happy state" condition should be set
55+
to `True`.
56+
57+
Type names should be chosen such that these interpretations are clear:
58+
59+
* `BuildSucceeded` works because `True` = success and `False` = failure.
60+
* `BuildCompleted` does not, because `False` could mean "in-progress".
61+
62+
Conditions may also be omitted entirely if reconciliation has been
63+
skipped. When all conditions have succeeded, the "happy state"
64+
should clear other conditions for output legibility. Until the
65+
"happy state" is set, conditions should be persisted for the
66+
benefit of UI tools representing progress on the outcome.
67+
68+
Conditions with a status of `False` will also supply additional details
69+
about the failure in [the "Reason" and "Message" sections](#condition-reason-and-message).
70+
71+
### Elafros condition `reason` and `message`
72+
73+
The fields `reason` and `message` should be considered to have unlimited
74+
cardinality, unlike [`type`](#condition-type) and [`status`](#condition-status).
75+
If a resource has a "happy state" [`type`](#condition-type), it will surface the
76+
`reason` and `message` from the first failing sub Condition.
77+
78+
The values `reason` takes on (while camelcase words) should be treated as opaque.
79+
Clients shouldn't programmatically act on their values, but bias towards using
80+
`reason` as a terse explanation of the state for end-users, whereas `message`
81+
is the long-form of this.
82+
83+
## Example scenarios
5984

6085
Example user and system error scenarios are included below along with
6186
how the status is presented to CLI and UI tools via the API.
6287

6388
* [Deployment-Related Failures](#deployment-related-failures)
64-
* [Revision failed to become Ready](#revision-failed-to-become-ready)
65-
* [Build failed](#build-failed)
66-
* [Resource exhausted while creating a revision](#resource-exhausted-while-creating-a-revision)
67-
* [Container image not present in repository](#container-image-not-present-in-repository)
68-
* [Container image fails at startup on Revision](#container-image-fails-at-startup-on-revision)
69-
* [Deployment progressing slowly/stuck](#deployment-progressing-slowly-stuck)
89+
* [Revision failed to become Ready](#revision-failed-to-become-ready)
90+
* [Build failed](#build-failed)
91+
* [Resource exhausted while creating a revision](#resource-exhausted-while-creating-a-revision)
92+
* [Container image not present in repository](#container-image-not-present-in-repository)
93+
* [Container image fails at startup on Revision](#container-image-fails-at-startup-on-revision)
94+
* [Deployment progressing slowly/stuck](#deployment-progressing-slowly-stuck)
7095
* [Routing-Related Failures](#routing-related-failures)
71-
* [Traffic not assigned](#traffic-not-assigned)
72-
* [Revision not found by Route](#revision-not-found-by-route)
73-
* [Configuration not found by Route](#configuration-not-found-by-route)
74-
* [Latest Revision of a Configuration deleted](#latest-revision-of-a-configuration-deleted)
75-
* [Traffic shift progressing slowly/stuck](#traffic-shift-progressing-slowly-stuck)
76-
96+
* [Traffic not assigned](#traffic-not-assigned)
97+
* [Revision not found by Route](#revision-not-found-by-route)
98+
* [Configuration not found by Route](#configuration-not-found-by-route)
99+
* [Latest Revision of a Configuration deleted](#latest-revision-of-a-configuration-deleted)
100+
* [Traffic shift progressing slowly/stuck](#traffic-shift-progressing-slowly-stuck)
77101

78-
# Deployment-Related Failures
102+
## Deployment-Related Failures
79103

80104
The following scenarios will generally occur when attempting to deploy
81105
changes to the software stack by updating the Service or Configuration
82106
resources to cause a new Revision to be created.
83107

84-
85-
## Revision failed to become Ready
108+
### Revision failed to become Ready
86109

87110
If the latest Revision fails to become `Ready` for any reason within
88111
some reasonable timeframe, the Configuration and Service should signal
@@ -92,6 +115,7 @@ message from the `Ready` condition on the Revision.
92115
```http
93116
GET /api/elafros.dev/v1alpha1/namespaces/default/configurations/my-service
94117
```
118+
95119
```yaml
96120
...
97121
status:
@@ -107,6 +131,7 @@ status:
107131
```http
108132
GET /api/elafros.dev/v1alpha1/namespaces/default/services/my-service
109133
```
134+
110135
```yaml
111136
...
112137
status:
@@ -121,8 +146,7 @@ status:
121146
meassage: "Build Step XYZ failed with error message: $LASTLOGLINE"
122147
```
123148
124-
125-
## Build failed
149+
### Build failed
126150
127151
If the Build steps failed while creating a Revision, you can examine
128152
the `Failed` condition on the Build or the `BuildSucceeded` condition
@@ -134,6 +158,7 @@ build.
134158
```http
135159
GET /apis/build.dev/v1alpha1/namespaces/default/builds/build-1acub3
136160
```
161+
137162
```yaml
138163
...
139164
status:
@@ -149,6 +174,7 @@ status:
149174
```http
150175
GET /apis/elafros.dev/v1alpha1/namespaces/default/revisions/abc
151176
```
177+
152178
```yaml
153179
...
154180
status:
@@ -163,8 +189,7 @@ status:
163189
message: "Step XYZ failed with error message: $LASTLOGLINE"
164190
```
165191

166-
167-
## Resource exhausted while creating a revision
192+
### Resource exhausted while creating a revision
168193

169194
Since a Revision is only metadata, the Revision will be created, but
170195
will have a condition indicating the underlying failure, possibly
@@ -175,6 +200,7 @@ into the underlying resources in the hosting environment.
175200
```http
176201
GET /apis/elafros.dev/v1alpha1/namespaces/default/revisions/abc
177202
```
203+
178204
```yaml
179205
...
180206
status:
@@ -189,8 +215,7 @@ status:
189215
message: "The controller could not create a deployment named ela-abc-e13ac."
190216
```
191217

192-
193-
## Container image not present in repository
218+
### Container image not present in repository
194219

195220
Revisions might be created while a Build is still creating the
196221
container image or uploading it to the repository. If the build is
@@ -208,6 +233,7 @@ the original docker image is deleted.
208233
```http
209234
GET /apis/elafros.dev/v1alpha1/namespaces/default/revisions/abc
210235
```
236+
211237
```yaml
212238
...
213239
status:
@@ -222,8 +248,7 @@ status:
222248
message: "Unable to fetch image 'gcr.io/...': <literal error>"
223249
```
224250

225-
226-
## Container image fails at startup on Revision
251+
### Container image fails at startup on Revision
227252

228253
Particularly for development cases with interpreted languages like
229254
Node or Python, syntax errors might only be caught at container
@@ -241,6 +266,7 @@ be used to fetch the logs for the failed process.
241266
```http
242267
GET /apis/elafros.dev/v1alpha1/namespaces/default/revisions/abc
243268
```
269+
244270
```yaml
245271
...
246272
status:
@@ -256,8 +282,7 @@ status:
256282
message: "Container failed with: SyntaxError: Unexpected identifier"
257283
```
258284

259-
260-
## Deployment progressing slowly/stuck
285+
### Deployment progressing slowly/stuck
261286

262287
See [the kubernetes documentation for how this is handled for
263288
Deployments](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#failed-deployment). For
@@ -275,6 +300,7 @@ might attempt to make progress even after the deadline.
275300
```http
276301
GET /apis/elafros.dev/v1alpha1/namespaces/default/revisions/abc
277302
```
303+
278304
```yaml
279305
...
280306
status:
@@ -285,16 +311,14 @@ status:
285311
message: "Did not pass readiness checks in 120 seconds."
286312
```
287313

288-
289-
# Routing-Related Failures
314+
## Routing-Related Failures
290315

291316
The following scenarios are most likely to occur when attempting to
292317
roll out a change by shifting traffic to a new Revision. Some of these
293318
conditions can also occur under normal operations due to (for example)
294319
operator error causing live resources to be deleted.
295320

296-
297-
## Traffic not assigned
321+
### Traffic not assigned
298322

299323
If some percentage of traffic cannot be assigned to a live
300324
(materialized or scaled-to-zero) Revision, the Route will report the
@@ -305,6 +329,7 @@ the first Revision is unable to serve:
305329
```http
306330
GET /apis/elafros.dev/v1alpha1/namespaces/default/routes/my-service
307331
```
332+
308333
```yaml
309334
...
310335
status:
@@ -322,6 +347,7 @@ status:
322347
```http
323348
GET /apis/elafros.dev/v1alpha1/namespaces/default/services/my-service
324349
```
350+
325351
```yaml
326352
...
327353
status:
@@ -339,8 +365,7 @@ status:
339365
message: "Container failed with: SyntaxError: Unexpected identifier"
340366
```
341367

342-
343-
## Revision not found by Route
368+
### Revision not found by Route
344369

345370
If a Revision is referenced in a Route's `spec.traffic`, and the Revision
346371
cannot be found, the `AllTrafficAssigned` condition will be marked as False
@@ -350,6 +375,7 @@ Route's `status.traffic`.
350375
```http
351376
GET /apis/elafros.dev/v1alpha1/namespaces/default/routes/my-service
352377
```
378+
353379
```yaml
354380
...
355381
status:
@@ -368,8 +394,7 @@ status:
368394
message: "Revision 'qyzz' referenced in traffic not found"
369395
```
370396

371-
372-
## Configuration not found by Route
397+
### Configuration not found by Route
373398

374399
If a Route references the `latestReadyRevisionName` of a Configuration
375400
and the Configuration cannot be found, the `AllTrafficAssigned` condition
@@ -379,6 +404,7 @@ Revision will be omitted from the Route's `status.traffic`.
379404
```http
380405
GET /apis/elafros.dev/v1alpha1/namespaces/default/routes/my-service
381406
```
407+
382408
```yaml
383409
...
384410
status:
@@ -394,8 +420,7 @@ status:
394420
message: "Configuration 'abc' referenced in traffic not found"
395421
```
396422

397-
398-
## Latest Revision of a Configuration deleted
423+
### Latest Revision of a Configuration deleted
399424

400425
If the most recent Revision is deleted, the Configuration will set
401426
`LatestRevisionReady` to False.
@@ -409,6 +434,7 @@ set the `AllTrafficAssigned` condition to False with reason
409434
```http
410435
GET /apis/elafros.dev/v1alpha1/namespaces/default/configurations/my-service
411436
```
437+
412438
```yaml
413439
...
414440
metadata:
@@ -426,8 +452,7 @@ status:
426452
observedGeneration: 1234
427453
```
428454

429-
430-
## Traffic shift progressing slowly/stuck
455+
### Traffic shift progressing slowly/stuck
431456

432457
Similar to deployment slowness, if the transfer of traffic (either via
433458
gradual or abrupt rollout) takes longer than a certain timeout to
@@ -437,6 +462,7 @@ True, but the reason will be set to `ProgressDeadlineExceeded`.
437462
```http
438463
GET /apis/elafros.dev/v1alpha1/namespaces/default/routes/my-service
439464
```
465+
440466
```yaml
441467
...
442468
status:
@@ -451,4 +477,4 @@ status:
451477
reason: ProgressDeadlineExceeded
452478
# reason is a short status, message provides error details
453479
message: "Unable to update traffic split for more than 120 seconds."
454-
```
480+
```

‎docs/spec/normative_examples.md

+5-5
Original file line numberDiff line numberDiff line change
@@ -217,7 +217,7 @@ status:
217217
- revisionName: def
218218
percent: 25
219219
conditions:
220-
- type: RolloutComplete
220+
- type: Ready
221221
status: False
222222
```
223223

@@ -243,7 +243,7 @@ status:
243243
- revisionName: def
244244
percent: 100
245245
conditions:
246-
- type: RolloutComplete
246+
- type: Ready
247247
status: True
248248
...
249249
```
@@ -475,7 +475,7 @@ status:
475475
percent: 100
476476
477477
conditions:
478-
- type: RolloutComplete
478+
- type: Ready
479479
status: True
480480
481481
observedGeneration: 2145
@@ -713,7 +713,7 @@ status:
713713
name: next # addressable as next.my-service.default.mydomain.com
714714
percent: 0
715715
conditions:
716-
- type: RolloutComplete
716+
- type: Ready
717717
status: True
718718
```
719719

@@ -787,7 +787,7 @@ status:
787787
name: next # addressable as next.my-service.default.mydomain.com
788788
percent: 0
789789
conditions:
790-
- type: RolloutComplete
790+
- type: Ready
791791
status: True
792792
```
793793

‎docs/spec/spec.md

+1-1
Original file line numberDiff line numberDiff line change
@@ -87,7 +87,7 @@ status:
8787
- ...
8888

8989
conditions: # See also the [error conditions documentation](errors.md)
90-
- type: RolloutComplete
90+
- type: Ready
9191
status: True
9292
- type: AllTrafficAssigned
9393
status: True

‎test/states.go

+3-11
Original file line numberDiff line numberDiff line change
@@ -53,18 +53,10 @@ func IsRevisionReady(revisionName string) func(r *v1alpha1.Revision) (bool, erro
5353
if r.Status.Conditions[0].Type != v1alpha1.RevisionConditionType("Ready") {
5454
return true, fmt.Errorf("Expected Revision to have a \"Ready\" status but only had %s", r.Status.Conditions[0].Type)
5555
}
56-
if r.Status.Conditions[0].Status == corev1.ConditionStatus("Unknown") {
57-
if r.Status.Conditions[0].Reason != "Deploying" {
58-
return true, fmt.Errorf("If the Revision isn't ready the reason should be to be \"Deploying\" but was %s", r.Status.Conditions[0].Reason)
59-
}
60-
} else {
61-
if r.Status.Conditions[0].Status != corev1.ConditionStatus("True") {
62-
return true, fmt.Errorf("Expected Revision Status Condition Status to be True or Unknown but was %s", r.Status.Conditions[0].Status)
63-
}
64-
if r.Status.Conditions[0].Reason != "ServiceReady" {
65-
return true, fmt.Errorf("If the Revision is ready the Reason should be \"ServiceReady\" but was %s", r.Status.Conditions[0].Status)
66-
}
56+
if r.Status.Conditions[0].Status == corev1.ConditionTrue {
6757
return true, nil
58+
} else if r.Status.Conditions[0].Status != corev1.ConditionUnknown {
59+
return true, fmt.Errorf("Expected Revision Status Condition Status to be True or Unknown but was %s", r.Status.Conditions[0].Status)
6860
}
6961
}
7062
return false, nil

0 commit comments

Comments
 (0)
Please sign in to comment.