# Error Conditions and Reporting

- Elafros uses the standard Kubernetes API pattern for reporting
- configuration errors and current state of the system by writing the
+ Elafros uses [the standard Kubernetes API
+ pattern](https://github.com/kubernetes/community/blob/master/contributors/devel/api-conventions.md#typical-status-properties)
+ for reporting configuration errors and current state of the system by writing the
report in the `status` section. There are two mechanisms commonly used
- in status:
+ in `status`:

- * conditions represent true/false statements about the current state
- of the resource.
+ * **Conditions** represent true/false statements about the current
+ state of the resource.

- * other fields may provide status on the most recently retrieved state
+ * **Other fields** may provide status on the most recently retrieved state
of the system as it relates to the resource (example: number of
replicas or traffic assignments).

Both of these mechanisms often include additional data from the
controller such as `observedGeneration` (to determine whether the
controller has seen the latest updates to the spec).
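
As a minimal sketch of how a client might use `observedGeneration`, the Python snippet below compares it with `metadata.generation`. The helper name `controller_is_up_to_date` and the placement of the field directly under `status` are assumptions made for illustration; the resource is assumed to have already been parsed from the API response into a plain dict.

```python
def controller_is_up_to_date(resource: dict) -> bool:
    """Return True if the controller has observed the latest spec.

    Assumes (hypothetically) that the parsed resource carries
    metadata.generation and status.observedGeneration as integers.
    """
    generation = resource.get("metadata", {}).get("generation")
    observed = resource.get("status", {}).get("observedGeneration")
    if generation is None or observed is None:
        # Without both values we cannot tell, so treat the status as stale.
        return False
    return observed >= generation


# Illustrative resources; the generation numbers are made up.
stale = {"metadata": {"generation": 1235}, "status": {"observedGeneration": 1234}}
fresh = {"metadata": {"generation": 1234}, "status": {"observedGeneration": 1234}}

print(controller_is_up_to_date(stale))
print(controller_is_up_to_date(fresh))
```

A client that sees a stale status would typically wait and re-read the resource before trusting its conditions.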

+ ## Conditions
+
Conditions provide an easy mechanism for client user interfaces to
indicate the current state of resources to a user. Elafros resources
- should follow these patterns:
-
- 1. Each resource should define a small number of success conditions as
- Types. This should bias towards fewer than 5 high-level progress
- categories which are separate and meaningful for customers. For a
- Revision, these might be `BuildSucceeded`, `ResourcesAvailable` and
- `ContainerHealthy`.
- 2. Where it makes sense, resources should define a top-level "happy
- state" condition type which indicates that the resource is set up
- correctly and ready to serve. For long-running resources, this
- condition type should be `Ready`. For objects which run to completion,
- the condition type should be `Succeeded`.
- 3. Each condition's status should be one of:
- * `Unknown` when the controller is actively working to achieve the
- condition.
- * `False` when the reconciliation has failed. This should be a terminal
- failure state until user action occurs.
- * `True` when the reconciliation has succeeded. Once all transition
- conditions have succeeded, the "happy state" condition should be set
- to `True`.
-
- Type names should be chosen such that these interpretations are clear:
-
- > `BuildSucceeded` works because `True` = success and `False` = failure.
-
- > `BuildCompleted` does not, because `False` could mean "in-progress".
-
- Conditions may also be omitted entirely if reconciliation has been
- skipped. When all conditions have succeeded, the "happy state"
- should clear other conditions for output legibility. Until the
- "happy state" is set, conditions should be persisted for the
- benefit of UI tools representing progress on the outcome.
-
- 4. Conditions with a status of `False` will also supply additional details
- about the failure in the "Reason" and "Message" sections -- both of
- these should be considered to have unlimited cardinality, unlike
- Type. If a resource has a "happy state" type, it will surface the
- `Reason` and `Message` from the first failing sub Condition.
+ should follow [the k8s API conventions for
+ `condition`](https://github.com/kubernetes/community/blob/master/contributors/devel/api-conventions.md#typical-status-properties)
+ and the patterns described in this section.
+
+ ### Elafros condition `type`
+
+ Each resource should define a small number of success conditions as
+ `type`s. This should bias towards fewer than **5** high-level progress
+ categories which are separate and meaningful for customers. For a
+ Revision, these might be `BuildSucceeded`, `ResourcesAvailable` and
+ `ContainerHealthy`.
+
+ Where it makes sense, resources should define a top-level "happy
+ state" condition `type` which indicates that the resource is set up
+ correctly and ready to serve.
+
+ * For long-running resources, this condition `type` should be
+ `Ready`.
+ * For objects which run to completion, the condition `type` should
+ be `Succeeded`.
+
+ ### Elafros condition `status`
+
+ Each condition's `status` should be one of:
+
+ * `Unknown` when the controller is actively working to achieve the
+ condition.
+ * `False` when the reconciliation has failed. This should be a terminal
+ failure state until user action occurs.
+ * `True` when the reconciliation has succeeded. Once all transition
+ conditions have succeeded, the "happy state" condition should be set
+ to `True`.
+
+ Type names should be chosen such that these interpretations are clear:
+
+ * `BuildSucceeded` works because `True` = success and `False` = failure.
+ * `BuildCompleted` does not, because `False` could mean "in-progress".
+
+ Conditions may also be omitted entirely if reconciliation has been
+ skipped. When all conditions have succeeded, the "happy state"
+ should clear other conditions for output legibility. Until the
+ "happy state" is set, conditions should be persisted for the
+ benefit of UI tools representing progress on the outcome.
+
+ Conditions with a status of `False` will also supply additional details
+ about the failure in [the "Reason" and "Message" sections](#elafros-condition-reason-and-message).
+
+ ### Elafros condition `reason` and `message`
+
+ The fields `reason` and `message` should be considered to have unlimited
+ cardinality, unlike [`type`](#elafros-condition-type) and [`status`](#elafros-condition-status).
+ If a resource has a "happy state" [`type`](#elafros-condition-type), it will surface the
+ `reason` and `message` from the first failing sub Condition.
+
+ The values `reason` takes on (although they are CamelCase words) should be treated as opaque.
+ Clients shouldn't programmatically act on their values, but bias towards using
+ `reason` as a terse explanation of the state for end-users, whereas `message`
+ is the long-form of this.
+
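
The rules in the Conditions section can be sketched in Python as follows. This is one possible reading of those rules, not the Elafros controller's code: the names `summarize` and `describe`, the dict layout, and the `ExampleReason` value are all hypothetical, and treating an empty list of sub-conditions as success is an assumption the text does not settle.

```python
from typing import Dict, List

Condition = Dict[str, str]


def summarize(conditions: List[Condition], happy_type: str = "Ready") -> Condition:
    """Derive the top-level "happy state" condition from sub-conditions.

    - any sub-condition False -> False, with reason and message copied
      from the first failing sub-condition
    - otherwise any Unknown   -> Unknown (controller still working)
    - otherwise               -> True (assumption: also when no
      sub-conditions are present)
    """
    subs = [c for c in conditions if c["type"] != happy_type]
    for c in subs:
        if c["status"] == "False":
            return {
                "type": happy_type,
                "status": "False",
                "reason": c.get("reason", ""),
                "message": c.get("message", ""),
            }
    if any(c["status"] == "Unknown" for c in subs):
        return {"type": happy_type, "status": "Unknown"}
    return {"type": happy_type, "status": "True"}


def describe(condition: Condition) -> str:
    """Render a condition for an end user.

    The reason is displayed as a terse label and never branched on,
    since its values are meant to be opaque to clients.
    """
    text = f'{condition["type"]}={condition["status"]}'
    if condition.get("reason"):
        text += f' ({condition["reason"]}: {condition.get("message", "")})'
    return text


# Illustrative Revision conditions; "ExampleReason" is a placeholder.
revision_conditions = [
    {"type": "BuildSucceeded", "status": "True"},
    {
        "type": "ResourcesAvailable",
        "status": "False",
        "reason": "ExampleReason",
        "message": "The controller could not create a deployment named ela-abc-e13ac.",
    },
    {"type": "ContainerHealthy", "status": "Unknown"},
]

ready = summarize(revision_conditions)
print(describe(ready))
```

Checking for `False` before `Unknown` means a terminal failure is reported even while other sub-conditions are still in progress, which matches the "first failing sub Condition" wording.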
+ ## Example scenarios

Example user and system error scenarios are included below along with
how the status is presented to CLI and UI tools via the API.

* [Deployment-Related Failures](#deployment-related-failures)
- * [Revision failed to become Ready](#revision-failed-to-become-ready)
- * [Build failed](#build-failed)
- * [Resource exhausted while creating a revision](#resource-exhausted-while-creating-a-revision)
- * [Container image not present in repository](#container-image-not-present-in-repository)
- * [Container image fails at startup on Revision](#container-image-fails-at-startup-on-revision)
- * [Deployment progressing slowly/stuck](#deployment-progressing-slowly-stuck)
+ * [Revision failed to become Ready](#revision-failed-to-become-ready)
+ * [Build failed](#build-failed)
+ * [Resource exhausted while creating a revision](#resource-exhausted-while-creating-a-revision)
+ * [Container image not present in repository](#container-image-not-present-in-repository)
+ * [Container image fails at startup on Revision](#container-image-fails-at-startup-on-revision)
+ * [Deployment progressing slowly/stuck](#deployment-progressing-slowly-stuck)
* [Routing-Related Failures](#routing-related-failures)
- * [Traffic not assigned](#traffic-not-assigned)
- * [Revision not found by Route](#revision-not-found-by-route)
- * [Configuration not found by Route](#configuration-not-found-by-route)
- * [Latest Revision of a Configuration deleted](#latest-revision-of-a-configuration-deleted)
- * [Traffic shift progressing slowly/stuck](#traffic-shift-progressing-slowly-stuck)
-
+ * [Traffic not assigned](#traffic-not-assigned)
+ * [Revision not found by Route](#revision-not-found-by-route)
+ * [Configuration not found by Route](#configuration-not-found-by-route)
+ * [Latest Revision of a Configuration deleted](#latest-revision-of-a-configuration-deleted)
+ * [Traffic shift progressing slowly/stuck](#traffic-shift-progressing-slowly-stuck)

- # Deployment-Related Failures
+ ## Deployment-Related Failures

The following scenarios will generally occur when attempting to deploy
changes to the software stack by updating the Service or Configuration
resources to cause a new Revision to be created.

-
- ## Revision failed to become Ready
+ ### Revision failed to become Ready

If the latest Revision fails to become `Ready` for any reason within
some reasonable timeframe, the Configuration and Service should signal
@@ -92,6 +115,7 @@ message from the `Ready` condition on the Revision.
```http
GET /apis/elafros.dev/v1alpha1/namespaces/default/configurations/my-service
```
+
```yaml
...
status:
@@ -107,6 +131,7 @@ status:
```http
GET /apis/elafros.dev/v1alpha1/namespaces/default/services/my-service
```
+
```yaml
...
status:
@@ -121,8 +146,7 @@ status:
message: "Build Step XYZ failed with error message: $LASTLOGLINE"
```

-
- ## Build failed
+ ### Build failed

If the Build steps failed while creating a Revision, you can examine
the `Failed` condition on the Build or the `BuildSucceeded` condition
@@ -134,6 +158,7 @@ build.
```http
GET /apis/build.dev/v1alpha1/namespaces/default/builds/build-1acub3
```
+
```yaml
...
status:
@@ -149,6 +174,7 @@ status:
```http
GET /apis/elafros.dev/v1alpha1/namespaces/default/revisions/abc
```
+
```yaml
...
status:
@@ -163,8 +189,7 @@ status:
message: "Step XYZ failed with error message: $LASTLOGLINE"
```

-
- ## Resource exhausted while creating a revision
+ ### Resource exhausted while creating a revision

Since a Revision is only metadata, the Revision will be created, but
will have a condition indicating the underlying failure, possibly
@@ -175,6 +200,7 @@ into the underlying resources in the hosting environment.
```http
GET /apis/elafros.dev/v1alpha1/namespaces/default/revisions/abc
```
+
```yaml
...
status:
@@ -189,8 +215,7 @@ status:
message: "The controller could not create a deployment named ela-abc-e13ac."
```

-
- ## Container image not present in repository
+ ### Container image not present in repository

Revisions might be created while a Build is still creating the
container image or uploading it to the repository. If the build is
@@ -208,6 +233,7 @@ the original docker image is deleted.
```http
GET /apis/elafros.dev/v1alpha1/namespaces/default/revisions/abc
```
+
```yaml
...
status:
@@ -222,8 +248,7 @@ status:
message: "Unable to fetch image 'gcr.io/...': <literal error>"
```

-
- ## Container image fails at startup on Revision
+ ### Container image fails at startup on Revision

Particularly for development cases with interpreted languages like
Node or Python, syntax errors might only be caught at container
@@ -241,6 +266,7 @@ be used to fetch the logs for the failed process.
```http
GET /apis/elafros.dev/v1alpha1/namespaces/default/revisions/abc
```
+
```yaml
...
status:
@@ -256,8 +282,7 @@ status:
message: "Container failed with: SyntaxError: Unexpected identifier"
```

-
- ## Deployment progressing slowly/stuck
+ ### Deployment progressing slowly/stuck

See [the Kubernetes documentation for how this is handled for
Deployments](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#failed-deployment). For
@@ -275,6 +300,7 @@ might attempt to make progress even after the deadline.
```http
GET /apis/elafros.dev/v1alpha1/namespaces/default/revisions/abc
```
+
```yaml
...
status:
@@ -285,16 +311,14 @@ status:
message: "Did not pass readiness checks in 120 seconds."
```

-
- # Routing-Related Failures
+ ## Routing-Related Failures

The following scenarios are most likely to occur when attempting to
roll out a change by shifting traffic to a new Revision. Some of these
conditions can also occur under normal operations due to (for example)
operator error causing live resources to be deleted.

-
- ## Traffic not assigned
+ ### Traffic not assigned

If some percentage of traffic cannot be assigned to a live
(materialized or scaled-to-zero) Revision, the Route will report the
@@ -305,6 +329,7 @@ the first Revision is unable to serve:
```http
GET /apis/elafros.dev/v1alpha1/namespaces/default/routes/my-service
```
+
```yaml
...
status:
@@ -322,6 +347,7 @@ status:
```http
GET /apis/elafros.dev/v1alpha1/namespaces/default/services/my-service
```
+
```yaml
...
status:
@@ -339,8 +365,7 @@ status:
message: "Container failed with: SyntaxError: Unexpected identifier"
```

-
- ## Revision not found by Route
+ ### Revision not found by Route

If a Revision is referenced in a Route's `spec.traffic`, and the Revision
cannot be found, the `AllTrafficAssigned` condition will be marked as False
@@ -350,6 +375,7 @@ Route's `status.traffic`.
```http
GET /apis/elafros.dev/v1alpha1/namespaces/default/routes/my-service
```
+
```yaml
...
status:
@@ -368,8 +394,7 @@ status:
message: "Revision 'qyzz' referenced in traffic not found"
```

-
- ## Configuration not found by Route
+ ### Configuration not found by Route

If a Route references the `latestReadyRevisionName` of a Configuration
and the Configuration cannot be found, the `AllTrafficAssigned` condition
@@ -379,6 +404,7 @@ Revision will be omitted from the Route's `status.traffic`.
```http
GET /apis/elafros.dev/v1alpha1/namespaces/default/routes/my-service
```
+
```yaml
...
status:
@@ -394,8 +420,7 @@ status:
message: "Configuration 'abc' referenced in traffic not found"
```

-
- ## Latest Revision of a Configuration deleted
+ ### Latest Revision of a Configuration deleted

If the most recent Revision is deleted, the Configuration will set
`LatestRevisionReady` to False.
@@ -409,6 +434,7 @@ set the `AllTrafficAssigned` condition to False with reason
```http
GET /apis/elafros.dev/v1alpha1/namespaces/default/configurations/my-service
```
+
```yaml
...
metadata:
@@ -426,8 +452,7 @@ status:
observedGeneration: 1234
```

-
- ## Traffic shift progressing slowly/stuck
+ ### Traffic shift progressing slowly/stuck

Similar to deployment slowness, if the transfer of traffic (either via
gradual or abrupt rollout) takes longer than a certain timeout to
@@ -437,6 +462,7 @@ True, but the reason will be set to `ProgressDeadlineExceeded`.
```http
GET /apis/elafros.dev/v1alpha1/namespaces/default/routes/my-service
```
+
```yaml
...
status:
@@ -451,4 +477,4 @@ status:
reason: ProgressDeadlineExceeded
# reason is a short status, message provides error details
message: "Unable to update traffic split for more than 120 seconds."
- ```
+ ```