
Commit

Fix some more links
cblecker committed Dec 22, 2017
1 parent 9470041 commit f2816c8
Showing 2 changed files with 7 additions and 7 deletions.
2 changes: 1 addition & 1 deletion contributors/guide/README.md
@@ -111,7 +111,7 @@ SIGs also have their own CONTRIBUTING.md files, which may contain extra informat

Like everything else in Kubernetes, a SIG is an open, community, effort. Anybody is welcome to jump into a SIG and begin fixing issues, critiquing design proposals and reviewing code. SIGs have regular [video meetings](https://kubernetes.io/community/) which everyone is welcome to. Each SIG has a kubernetes slack channel that you can join as well.

- There is an entire SIG ([sig-contributor-experience](../../sig-contributor-experience/README.md)) devoted to improving your experience as a contributor.
+ There is an entire SIG ([sig-contributor-experience](/sig-contributor-experience/README.md)) devoted to improving your experience as a contributor.
Contributing to Kubernetes should be easy. If you find a rough edge, let us know! Better yet, help us fix it by joining the SIG; just
show up to one of the [bi-weekly meetings](https://docs.google.com/document/d/1qf-02B7EOrItQgwXFxgqZ5qjW0mtfu5qkYIF1Hl4ZLI/edit).

12 changes: 6 additions & 6 deletions sig-storage/1.3-retrospective/README.md
@@ -6,7 +6,7 @@
**Collaborators:** Saad Ali ([@saad-ali](https://github.com/saad-ali)), Paul Morie ([@pmorie](https://github.com/pmorie)), Tim Hockins ([@thockin](https://github.com/thockin)), Steve Watt ([@wattsteve](https://github.com/wattsteve))

**Links:**
- * [1.3 Schedule Dates](https://git.k8s.io/features/release-1.3/release-1.3.md)
+ * [1.3 Schedule Dates](https://git.k8s.io/sig-release/releases/release-1.3/release-1.3.md)

## Purpose
This document is intended to chronicle the decisions made by the [Storage SIG](/sig-storage/README.md) near the end of the Kubernetes 1.3 release with the storage stack that were not well understood by the wider community. This document should explain those decisions, why the SIG made the exception, detail the impact, and offer lessons learned for the future.
@@ -18,11 +18,11 @@ The PV/PVC controller handles the connection of provisioned storage volumes to a

A characteristic list of issues (as not all of them were well captured in GitHub issues) include:

- 1. Approximately a 5% rate of incidents under controlled conditions where operations related to Claims binding to Persistent Volumes would fail.
+ 1. Approximately a 5% rate of incidents under controlled conditions where operations related to Claims binding to Persistent Volumes would fail.
2. Rapid creation and deletion of pods referencing the same volume could result in attach/detach events being triggered out of order resulting in detaching of volumes in use (resulting in data loss/corruption). The current 1.2 work around was to fail the operation. This led to surprises and failures in launching pods that referenced the same volume.
3. Item #2 created instability in use of multiple pods referencing the same Volume (a supported feature) even when only one pod uses it at a time ([#19953](https://github.com/kubernetes/kubernetes/issues/19953))
4. Hiccups in the operation flow of binding the Claims to Volumes resulted in timeouts of tens of minutes.
- 5. External object bleeding. Much of the logic was centered on a state machine that lived in the kubelet. Other kube components had to be aware of the state machine and other aspects of the binding framework to use Volumes.
+ 5. External object bleeding. Much of the logic was centered on a state machine that lived in the kubelet. Other kube components had to be aware of the state machine and other aspects of the binding framework to use Volumes.
6. Maintenance was difficult as this work was implemented in three different controllers that spread the logic for provisioning, binding, and recycling Volumes.
7. Kubelet failures on the Node could “strand” storage. Requiring users to manually unmount storage.
8. A pod's long running detach routine could impact other pods as the operations run synchronously in the kubelet sync loop.
@@ -52,18 +52,18 @@ At the end of the design summit, the attendees of the summit agreed to pseudo co

Resources were established for the PV/PVC controller rework at the conclusion of the design summit and the existing resources on the attach/detach/mount/unmount work deemed acceptable to complete the other two projects.

- At this point, a group of engineers were assigned to work on the three efforts that comprised the overhaul. The plan was to not only include development work but comprehensive testing with time to have the functionality “soak” weeks before 1.3 shipped. These engineers were composed of a hybrid team of Red Hat and Google. The allocation of work made making all three sub deliverables in 1.3 aggressive but reasonable.
+ At this point, a group of engineers were assigned to work on the three efforts that comprised the overhaul. The plan was to not only include development work but comprehensive testing with time to have the functionality “soak” weeks before 1.3 shipped. These engineers were composed of a hybrid team of Red Hat and Google. The allocation of work made making all three sub deliverables in 1.3 aggressive but reasonable.

Near the end of 1.3 development, on May 13, 2016, approximately one week prior to code freeze, a key engineer for this effort left the project. This disrupted the Kubelet Volume Redesign effort. The PV/PVC controller was complete (PR [#24331](https://github.com/kubernetes/kubernetes/pull/24331)) and committed at this point. However the Attach/Detach Controller was dependent on the Kubelet Volume Redesign and was impacted.

The leads involved with the projects met and the Kubelet Volume Redesign work was handed off from one engineer to another familiar with Storage. The decision to continue this work after the 1.3 code freeze date of May 20 was based on the need to address the outstanding issues in 1.2. Also much of the Attach/Detach Controller work had been committed but was dependent on the Kubelet Volume Redesign effort.

The Kubelet Volume Redesign involved changing fundamental assumptions of data flow and volume operations in kubelet. The high level change introduced a new volume manager in kubelet that handled mount/unmount logic and enabled attach/detach logic to be offloaded to the master (by default, while retaining the ability for kubelet to do attach/detach on its own). The remaining work to complete the effort was the kubelet volume redesign PR ([#26801](https://github.com/kubernetes/kubernetes/pull/26801)). This combined with the attach/detach controller (PR [#25457](https://github.com/kubernetes/kubernetes/pull/25457)) were substantial changes to the stack.

- ## Impact:
+ ## Impact:

1. **Release delay**
- * The large amount of churn so late in the release with little stabilization time resulted in the delay of the release by one week: The Kubernetes 1.3 release [was targeted](https://git.k8s.io/features/release-1.3/release-1.3.md) for June 20 to June 24, 2016. It ended up [going out on July 1, 2016](https://github.com/kubernetes/kubernetes/releases/tag/v1.3.0). This was mostly due to the time to resolve a data corruption issue on ungracefully terminated pods caused by detaching of mounted volumes ([#27691](https://github.com/kubernetes/kubernetes/issues/27691)). A large number of the bugs introduced in the release were fixed in the 1.3.4 release which [was cut on August 1, 2016](https://github.com/kubernetes/kubernetes/releases/tag/v1.3.4).
+ * The large amount of churn so late in the release with little stabilization time resulted in the delay of the release by one week: The Kubernetes 1.3 release [was targeted](https://git.k8s.io/sig-release/releases/release-1.3/release-1.3.md) for June 20 to June 24, 2016. It ended up [going out on July 1, 2016](https://github.com/kubernetes/kubernetes/releases/tag/v1.3.0). This was mostly due to the time to resolve a data corruption issue on ungracefully terminated pods caused by detaching of mounted volumes ([#27691](https://github.com/kubernetes/kubernetes/issues/27691)). A large number of the bugs introduced in the release were fixed in the 1.3.4 release which [was cut on August 1, 2016](https://github.com/kubernetes/kubernetes/releases/tag/v1.3.4).
2. **Instability in 1.3's Storage stack**
* The Kubelet volume redesign shipped in 1.3.0 with several bugs. These were mostly due to unexpected interactions between the new functionality and other Kubernetes components. For example, secrets were handled serially not in parallel, namespace dependencies were not well understood, etc. Most of these issues were quickly identified and addressed but waited for 1.3 patch releases.
* Issues related to this include:
