Improve and clarify lastHealthyAt and lastFailedAt functionality
Change definitelyHealthy to safeAsLastReplica
Change variable names in TimestampAfterTimestamp function
Set LastHealthyAt every time a replica appears RW in an engine
Add an atomic utility function for replica failed fields
Explain replica health related fields in comments and CRD
TimestampAfterTimestamp returns an error
Fix one more unit test
Further clarify the reason for getSafeAsLastReplicaCount
Signed-off-by: Eric Weber <[email protected]>
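The commit message says that TimestampAfterTimestamp now returns an error. Below is a minimal sketch of that kind of comparison. It assumes the replica timestamps are RFC 3339 strings and that the function compares two such strings. The parameter names and error messages are illustrative and are not taken from the upstream code.

```go
package main

import (
	"fmt"
	"time"
)

// TimestampAfterTimestamp reports whether timestamp `after` is strictly later
// than timestamp `before`. Sketch only: both inputs are assumed to be RFC 3339
// strings. A parse failure is returned to the caller rather than being
// silently reported as "not after".
func TimestampAfterTimestamp(after, before string) (bool, error) {
	afterT, err := time.Parse(time.RFC3339, after)
	if err != nil {
		return false, fmt.Errorf("cannot parse timestamp %q: %w", after, err)
	}
	beforeT, err := time.Parse(time.RFC3339, before)
	if err != nil {
		return false, fmt.Errorf("cannot parse timestamp %q: %w", before, err)
	}
	return afterT.After(beforeT), nil
}

func main() {
	ok, err := TimestampAfterTimestamp("2023-05-02T10:00:00Z", "2023-05-01T10:00:00Z")
	fmt.Println(ok, err) // true <nil>

	// An empty (never-set) field cannot be parsed, so the caller gets an error.
	_, err = TimestampAfterTimestamp("", "2023-05-01T10:00:00Z")
	fmt.Println(err != nil) // true
}
```

Returning the error lets each caller decide how to treat an empty or malformed field. This matters when the result feeds a decision about deleting replicas.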
@@ -710,24 +708,21 @@ func (c *VolumeController) ReconcileEngineReplicaState(v *longhorn.Volume, es ma
 		}
 		if r.Spec.FailedAt == "" {
 			log.Warnf("Replica %v is marked as failed, current state %v, mode %v, engine name %v, active %v", r.Name, r.Status.CurrentState, mode, r.Spec.EngineName, r.Spec.Active)
-			now := c.nowHandler()
-			r.Spec.FailedAt = now
-			r.Spec.LastFailedAt = now
+			setReplicaFailedAt(r, c.nowHandler())
 			e.Spec.LogRequested = true
 			r.Spec.LogRequested = true
 		}
 		r.Spec.DesireState = longhorn.InstanceStateStopped
 	} else if mode == longhorn.ReplicaModeRW {
-		// record once replica became healthy, so if it
-		// failed in the future, we can tell it apart
-		// from replica failed during rebuilding
+		now := c.nowHandler()
 		if r.Spec.HealthyAt == "" {
 			c.backoff.DeleteEntry(r.Name)
-			now := c.nowHandler()
+			// Set HealthyAt to distinguish this replica from one that has never been rebuilt.
 			r.Spec.HealthyAt = now
-			r.Spec.LastHealthyAt = now
 			r.Spec.RebuildRetryCount = 0
 		}
+		// Set LastHealthyAt to record the last time this replica became RW in an engine.
+		r.Spec.LastHealthyAt = now
 		healthyCount++
 	}
 }
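This hunk sends both failure timestamps through setReplicaFailedAt. In the RW branch, it refreshes LastHealthyAt on every pass but sets HealthyAt only once. The sketch below models that bookkeeping. The ReplicaSpec struct and the markReplicaRW helper are hypothetical stand-ins. The diff does not show the body of setReplicaFailedAt, so its handling of an empty timestamp is an assumption based on the CRD descriptions: clearing FailedAt while leaving LastFailedAt untouched.

```go
package main

import (
	"fmt"
	"time"
)

// ReplicaSpec is a hypothetical, simplified stand-in for the health-related
// fields of a replica spec; it is not the real longhorn.ReplicaSpec type.
type ReplicaSpec struct {
	FailedAt      string
	LastFailedAt  string
	HealthyAt     string
	LastHealthyAt string
}

// setReplicaFailedAt keeps the two failure fields in step: a non-empty
// timestamp is written to both. Assumption: clearing (timestamp == "") only
// resets FailedAt, because LastFailedAt is never cleared.
func setReplicaFailedAt(r *ReplicaSpec, timestamp string) {
	r.FailedAt = timestamp
	if timestamp != "" {
		r.LastFailedAt = timestamp
	}
}

// markReplicaRW is a hypothetical helper mirroring the RW branch above:
// HealthyAt is written only if it is empty (first RW after creation or
// rebuild), while LastHealthyAt is refreshed every time.
func markReplicaRW(r *ReplicaSpec, now string) {
	if r.HealthyAt == "" {
		r.HealthyAt = now
	}
	r.LastHealthyAt = now
}

func main() {
	r := &ReplicaSpec{}
	t0 := time.Date(2023, 5, 1, 10, 0, 0, 0, time.UTC)
	ts := func(d time.Duration) string { return t0.Add(d).Format(time.RFC3339) }

	markReplicaRW(r, ts(0))                  // first RW: HealthyAt and LastHealthyAt set
	markReplicaRW(r, ts(time.Minute))        // still RW: only LastHealthyAt moves
	setReplicaFailedAt(r, ts(2*time.Minute)) // failure: FailedAt and LastFailedAt set
	setReplicaFailedAt(r, "")                // e.g. cleared before a rebuild or salvage

	fmt.Printf("%+v\n", *r)
}
```

Grouping the two writes in one helper keeps any call site from setting FailedAt without also updating LastFailedAt.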
@@ -739,9 +734,7 @@ func (c *VolumeController) ReconcileEngineReplicaState(v *longhorn.Volume, es ma
k8s/crds.yaml (+4)
@@ -2640,16 +2640,20 @@ spec:
   evictionRequested:
     type: boolean
   failedAt:
+    description: FailedAt is set when a running replica fails or when a running engine is unable to use a replica for any reason. FailedAt indicates the time the failure occurred. When FailedAt is set, a replica is likely to have useful (though possibly stale) data. A replica with FailedAt set must be rebuilt from a non-failed replica (or it can be used in a salvage if all replicas are failed). FailedAt is cleared before a rebuild or salvage.
     type: string
   hardNodeAffinity:
     type: string
   healthyAt:
+    description: HealthyAt is set the first time a replica becomes read/write in an engine after creation or rebuild. HealthyAt indicates the time the last successful rebuild occurred. When HealthyAt is set, a replica is likely to have useful (though possibly stale) data. HealthyAt is cleared before a rebuild.
     type: string
   image:
     type: string
   lastFailedAt:
+    description: LastFailedAt is always set at the same time as FailedAt. Unlike FailedAt, LastFailedAt is never cleared. LastFailedAt is not a reliable indicator of the state of a replica's data. For example, a replica with LastFailedAt may already be healthy and in use again. However, because it is never cleared, it can be compared to LastHealthyAt to help prevent dangerous replica deletion in some corner cases.
     type: string
   lastHealthyAt:
+    description: LastHealthyAt is set every time a replica becomes read/write in an engine. Unlike HealthyAt, LastHealthyAt is never cleared. LastHealthyAt is not a reliable indicator of the state of a replica's data. For example, a replica with LastHealthyAt set may be in the middle of a rebuild. However, because it is never cleared, it can be compared to LastFailedAt to help prevent dangerous replica deletion in some corner cases.
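These field descriptions suggest how the four timestamps could be combined to decide whether a replica is safe to keep as the only remaining copy. The commit message refers to this as getSafeAsLastReplicaCount. The sketch below is one plausible reading and is not the commit's implementation. The Replica struct, isSafeAsLastReplica, and the after helper (a stand-in for TimestampAfterTimestamp) are all hypothetical, and timestamps are assumed to be RFC 3339.

```go
package main

import (
	"fmt"
	"time"
)

// Replica is a hypothetical, simplified stand-in holding only the four
// health-related fields described in the CRD.
type Replica struct {
	Name          string
	FailedAt      string
	HealthyAt     string
	LastFailedAt  string
	LastHealthyAt string
}

// after reports whether RFC 3339 timestamp a is strictly later than b
// (a stand-in for TimestampAfterTimestamp).
func after(a, b string) (bool, error) {
	at, err := time.Parse(time.RFC3339, a)
	if err != nil {
		return false, err
	}
	bt, err := time.Parse(time.RFC3339, b)
	if err != nil {
		return false, err
	}
	return at.After(bt), nil
}

// isSafeAsLastReplica (hypothetical): the replica must have completed a
// rebuild (HealthyAt set), must not be failed now (FailedAt empty), and must
// have been RW in an engine more recently than it last failed. The last check
// relies on the two never-cleared fields.
func isSafeAsLastReplica(r Replica) (bool, error) {
	if r.HealthyAt == "" || r.FailedAt != "" {
		return false, nil
	}
	if r.LastFailedAt == "" {
		return true, nil // never failed
	}
	if r.LastHealthyAt == "" {
		return false, nil
	}
	return after(r.LastHealthyAt, r.LastFailedAt)
}

// getSafeAsLastReplicaCount counts replicas passing isSafeAsLastReplica and
// propagates timestamp parse errors instead of guessing.
func getSafeAsLastReplicaCount(rs []Replica) (int, error) {
	count := 0
	for _, r := range rs {
		safe, err := isSafeAsLastReplica(r)
		if err != nil {
			return 0, fmt.Errorf("replica %s: %w", r.Name, err)
		}
		if safe {
			count++
		}
	}
	return count, nil
}

func main() {
	rs := []Replica{
		{Name: "healthy", HealthyAt: "2023-05-01T10:00:00Z", LastHealthyAt: "2023-05-01T10:05:00Z"},
		{Name: "stale", HealthyAt: "2023-05-01T10:00:00Z", LastHealthyAt: "2023-05-01T10:00:00Z", LastFailedAt: "2023-05-01T10:03:00Z"},
		{Name: "failed", HealthyAt: "2023-05-01T10:00:00Z", FailedAt: "2023-05-01T10:04:00Z", LastFailedAt: "2023-05-01T10:04:00Z"},
	}
	n, err := getSafeAsLastReplicaCount(rs)
	fmt.Println(n, err) // 1 <nil>
}
```

In this reading, the "stale" replica is excluded even though FailedAt is empty and HealthyAt is set. Its LastFailedAt is later than its LastHealthyAt, which is the kind of corner case the never-cleared fields are described as guarding against.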