diff --git a/autoscaling.md b/autoscaling.md index 86a9a819e45..ff50aa97f6b 100644 --- a/autoscaling.md +++ b/autoscaling.md @@ -34,7 +34,7 @@ Documentation for other releases can be found at ## Abstract Auto-scaling is a data-driven feature that allows users to increase or decrease capacity as needed by controlling the -number of pods deployed within the system automatically. +number of pods deployed within the system automatically. ## Motivation @@ -49,7 +49,7 @@ done automatically based on statistical analysis and thresholds. * Scale verb - [1629](https://github.com/GoogleCloudPlatform/kubernetes/issues/1629) * Config conflicts - [Config](https://github.com/GoogleCloudPlatform/kubernetes/blob/c7cb991987193d4ca33544137a5cb7d0292cf7df/docs/config.md#automated-re-configuration-processes) * Rolling updates - [1353](https://github.com/GoogleCloudPlatform/kubernetes/issues/1353) - * Multiple scalable types - [1624](https://github.com/GoogleCloudPlatform/kubernetes/issues/1624) + * Multiple scalable types - [1624](https://github.com/GoogleCloudPlatform/kubernetes/issues/1624) ## Constraints and Assumptions @@ -77,7 +77,7 @@ balanced or situated behind a proxy - the data from those proxies and load balan server traffic for applications. This is the primary, but not sole, source of data for making decisions. Within Kubernetes a [kube proxy](../user-guide/services.md#ips-and-vips) -running on each node directs service requests to the underlying implementation. +running on each node directs service requests to the underlying implementation. While the proxy provides internal inter-pod connections, there will be L3 and L7 proxies and load balancers that manage traffic to backends. OpenShift, for instance, adds a "route" resource for defining external to internal traffic flow. @@ -87,7 +87,7 @@ data source for the number of backends. 
### Scaling based on predictive analysis Scaling may also occur based on predictions of system state like anticipated load, historical data, etc. Hand in hand -with scaling based on traffic, predictive analysis may be used to determine anticipated system load and scale the application automatically. +with scaling based on traffic, predictive analysis may be used to determine anticipated system load and scale the application automatically. ### Scaling based on arbitrary data @@ -113,7 +113,7 @@ use a client/cache implementation to receive watch data from the data aggregator scaling the application. Auto-scalers are created and defined like other resources via REST endpoints and belong to the namespace just as a `ReplicationController` or `Service`. -Since an auto-scaler is a durable object it is best represented as a resource. +Since an auto-scaler is a durable object it is best represented as a resource. ```go //The auto scaler interface @@ -241,7 +241,7 @@ be specified as "when requests per second fall below 25 for 30 seconds scale the ### Data Aggregator This section has intentionally been left empty. I will defer to folks who have more experience gathering and analyzing -time series statistics. +time series statistics. Data aggregation is opaque to the auto-scaler resource. The auto-scaler is configured to use `AutoScaleThresholds` that know how to work with the underlying data in order to know if an application must be scaled up or down. Data aggregation @@ -257,7 +257,7 @@ potentially piggyback on this registry. If multiple scalable targets satisfy the `TargetSelector` criteria the auto-scaler should be configurable as to which target(s) are scaled. To begin with, if multiple targets are found the auto-scaler will scale the largest target up -or down as appropriate. In the future this may be more configurable. +or down as appropriate. In the future this may be more configurable. 
### Interactions with a deployment @@ -266,12 +266,12 @@ there will be multiple replication controllers, with one scaling up and another auto-scaler must be aware of the entire set of capacity that backs a service so it does not fight with the deployer. `AutoScalerSpec.MonitorSelector` is what provides this ability. By using a selector that spans the entire service the auto-scaler can monitor capacity of multiple replication controllers and check that capacity against the `AutoScalerSpec.MaxAutoScaleCount` and -`AutoScalerSpec.MinAutoScaleCount` while still only targeting a specific set of `ReplicationController`s with `TargetSelector`. +`AutoScalerSpec.MinAutoScaleCount` while still only targeting a specific set of `ReplicationController`s with `TargetSelector`. In the course of a deployment it is up to the deployment orchestration to decide how to manage the labels on the replication controllers if it needs to ensure that only specific replication controllers are targeted by the auto-scaler. By default, the auto-scaler will scale the largest replication controller that meets the target label -selector criteria. +selector criteria. During deployment orchestration the auto-scaler may be making decisions to scale its target up or down. In order to prevent the scaler from fighting with a deployment process that is scaling one replication controller up and scaling another one diff --git a/federation.md b/federation.md index 99dbe90400e..1845e9eb610 100644 --- a/federation.md +++ b/federation.md @@ -31,17 +31,17 @@ Documentation for other releases can be found at -# Kubernetes Cluster Federation +# Kubernetes Cluster Federation ## (a.k.a. 
"Ubernetes") ## Requirements Analysis and Product Proposal -## _by Quinton Hoole ([quinton@google.com](mailto:quinton@google.com))_ +## _by Quinton Hoole ([quinton@google.com](mailto:quinton@google.com))_ -_Initial revision: 2015-03-05_ -_Last updated: 2015-03-09_ -This doc: [tinyurl.com/ubernetesv2](http://tinyurl.com/ubernetesv2) +_Initial revision: 2015-03-05_ +_Last updated: 2015-03-09_ +This doc: [tinyurl.com/ubernetesv2](http://tinyurl.com/ubernetesv2) Slides: [tinyurl.com/ubernetes-slides](http://tinyurl.com/ubernetes-slides) ## Introduction @@ -89,11 +89,11 @@ loosely speaking, a cluster can be thought of as running in a single data center, or cloud provider availability zone, a more precise definition is that each cluster provides: -1. a single Kubernetes API entry point, +1. a single Kubernetes API entry point, 1. a consistent, cluster-wide resource naming scheme 1. a scheduling/container placement domain 1. a service network routing domain -1. (in future) an authentication and authorization model. +1. (in future) an authentication and authorization model. 1. .... The above in turn imply the need for a relatively performant, reliable @@ -220,7 +220,7 @@ the multi-cloud provider implementation should just work for a single cloud provider). Propose high-level design catering for both, with initial implementation targeting single cloud provider only. -**Clarifying questions:** +**Clarifying questions:** **How does global external service discovery work?** In the steady state, which external clients connect to which clusters? GeoDNS or similar? What is the tolerable failover latency if a cluster goes @@ -266,8 +266,8 @@ Doing nothing (i.e. forcing users to choose between 1 and 2 on their own) is probably an OK starting point. Kubernetes autoscaling can get us to 3 at some later date. -Up to this point, this use case ("Unavailability Zones") seems materially different from all the others above. 
It does not require dynamic cross-cluster service migration (we assume that the service is already running in more than one cluster when the failure occurs). Nor does it necessarily involve cross-cluster service discovery or location affinity. As a result, I propose that we address this use case somewhat independently of the others (although I strongly suspect that it will become substantially easier once we've solved the others). - +Up to this point, this use case ("Unavailability Zones") seems materially different from all the others above. It does not require dynamic cross-cluster service migration (we assume that the service is already running in more than one cluster when the failure occurs). Nor does it necessarily involve cross-cluster service discovery or location affinity. As a result, I propose that we address this use case somewhat independently of the others (although I strongly suspect that it will become substantially easier once we've solved the others). + All of the above (regarding "Unavailibility Zones") refers primarily to already-running user-facing services, and minimizing the impact on end users of those services becoming unavailable in a given cluster. @@ -322,7 +322,7 @@ location affinity: (other than the source of YouTube videos, which is assumed to be equally remote from all clusters in this example). Each pod can be scheduled independently, in any cluster, and moved at any time. -1. **"Preferentially Coupled"**: Somewhere between Coupled and Decoupled. These applications prefer to have all of their pods located in the same cluster (e.g. for failure correlation, network latency or bandwidth cost reasons), but can tolerate being partitioned for "short" periods of time (for example while migrating the application from one cluster to another). 
Most small to medium sized LAMP stacks with not-very-strict latency goals probably fall into this category (provided that they use sane service discovery and reconnect-on-fail, which they need to do anyway to run effectively, even in a single Kubernetes cluster). +1. **"Preferentially Coupled"**: Somewhere between Coupled and Decoupled. These applications prefer to have all of their pods located in the same cluster (e.g. for failure correlation, network latency or bandwidth cost reasons), but can tolerate being partitioned for "short" periods of time (for example while migrating the application from one cluster to another). Most small to medium sized LAMP stacks with not-very-strict latency goals probably fall into this category (provided that they use sane service discovery and reconnect-on-fail, which they need to do anyway to run effectively, even in a single Kubernetes cluster). And then there's what I'll call _absolute_ location affinity. Some applications are required to run in bounded geographical or network @@ -341,7 +341,7 @@ of our users are in Western Europe, U.S. West Coast" etc). ## Cross-cluster service discovery -I propose having pods use standard discovery methods used by external clients of Kubernetes applications (i.e. DNS). DNS might resolve to a public endpoint in the local or a remote cluster. Other than Strictly Coupled applications, software should be largely oblivious of which of the two occurs. +I propose having pods use standard discovery methods used by external clients of Kubernetes applications (i.e. DNS). DNS might resolve to a public endpoint in the local or a remote cluster. Other than Strictly Coupled applications, software should be largely oblivious of which of the two occurs. _Aside:_ How do we avoid "tromboning" through an external VIP when DNS resolves to a public IP on the local cluster? Strictly speaking this would be an optimization, and probably only matters to high bandwidth, @@ -384,15 +384,15 @@ such events include: 1. 
A change of scheduling policy ("we no longer use cloud provider X"). 1. A change of resource pricing ("cloud provider Y dropped their prices - lets migrate there"). -Strictly Decoupled applications can be trivially moved, in part or in whole, one pod at a time, to one or more clusters. -For Preferentially Decoupled applications, the federation system must first locate a single cluster with sufficient capacity to accommodate the entire application, then reserve that capacity, and incrementally move the application, one (or more) resources at a time, over to the new cluster, within some bounded time period (and possibly within a predefined "maintenance" window). +Strictly Decoupled applications can be trivially moved, in part or in whole, one pod at a time, to one or more clusters. +For Preferentially Coupled applications, the federation system must first locate a single cluster with sufficient capacity to accommodate the entire application, then reserve that capacity, and incrementally move the application, one (or more) resources at a time, over to the new cluster, within some bounded time period (and possibly within a predefined "maintenance" window). Strictly Coupled applications (with the exception of those deemed completely immovable) require the federation system to: 1. start up an entire replica application in the destination cluster 1. copy persistent data to the new application instance 1. switch traffic across -1. tear down the original application instance +1. tear down the original application instance It is proposed that support for automated migration of Strictly Coupled applications be deferred to a later date. @@ -422,11 +422,11 @@ TBD: All very hand-wavey still, but some initial thoughts to get the conversatio ## Ubernetes API -This looks a lot like the existing Kubernetes API but is explicitly multi-cluster. +This looks a lot like the existing Kubernetes API but is explicitly multi-cluster. 
-+ Clusters become first class objects, which can be registered, listed, described, deregistered etc via the API. -+ Compute resources can be explicitly requested in specific clusters, or automatically scheduled to the "best" cluster by Ubernetes (by a pluggable Policy Engine). -+ There is a federated equivalent of a replication controller type, which is multicluster-aware, and delegates to cluster-specific replication controllers as required (e.g. a federated RC for n replicas might simply spawn multiple replication controllers in different clusters to do the hard work). ++ Clusters become first-class objects, which can be registered, listed, described, deregistered, etc. via the API. ++ Compute resources can be explicitly requested in specific clusters, or automatically scheduled to the "best" cluster by Ubernetes (by a pluggable Policy Engine). ++ There is a federated equivalent of a replication controller type, which is multicluster-aware, and delegates to cluster-specific replication controllers as required (e.g. a federated RC for n replicas might simply spawn multiple replication controllers in different clusters to do the hard work). + These federated replication controllers (and in fact all the services comprising the Ubernetes Control Plane) have to run somewhere. For high availability Ubernetes deployments, these