# Kubernetes Cluster Federation

## (a.k.a. "Ubernetes")

## Requirements Analysis and Product Proposal

## _by Quinton Hoole ([[email protected]](mailto:[email protected]))_

_Initial revision: 2015-03-05_
_Last updated: 2015-03-09_
This doc: [tinyurl.com/ubernetesv2](http://tinyurl.com/ubernetesv2)
Slides: [tinyurl.com/ubernetes-slides](http://tinyurl.com/ubernetes-slides)

## Introduction

Loosely speaking, a cluster can be thought of as running in a single
data center, or cloud provider availability zone; a more precise
definition is that each cluster provides:

1. a single Kubernetes API entry point,
1. a consistent, cluster-wide resource naming scheme,
1. a scheduling/container placement domain,
1. a service network routing domain,
1. (in future) an authentication and authorization model,
1. ....
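
As a concrete (purely hypothetical) illustration of the first two properties, a federation control plane might keep a per-cluster record along these lines. The Go type and field names (`Cluster`, `ClusterRegistry`, `APIEndpoint`) are assumptions for illustration, not part of the Kubernetes API or of this proposal.

```go
// Hypothetical sketch only: the type and field names are illustrative,
// not part of the Kubernetes API or of this proposal.
package federation

import "net/url"

// Cluster records what a federation control plane needs to know about
// one member cluster: its single API entry point and its identity
// within the federation-wide naming scheme.
type Cluster struct {
	Name        string   // federation-wide unique name
	APIEndpoint *url.URL // the cluster's single Kubernetes API entry point
	// Authentication/authorization material would be referenced here
	// once the (future) authn/authz model exists; omitted for now.
}

// ClusterRegistry is the minimal registration surface implied by
// treating clusters as addressable, first class units.
type ClusterRegistry interface {
	Register(c Cluster) error
	Deregister(name string) error
	List() ([]Cluster, error)
}
```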

These requirements in turn imply the need for a relatively performant, reliable

the multi-cloud provider implementation should just work for a single
cloud provider). Propose a high-level design catering for both, with
the initial implementation targeting a single cloud provider only.

**Clarifying questions:**

**How does global external service discovery work?** In the steady
state, which external clients connect to which clusters? GeoDNS or
similar? What is the tolerable failover latency if a cluster goes

Doing nothing (i.e. forcing users to choose between 1 and 2 on their
own) is probably an OK starting point. Kubernetes autoscaling can get
us to 3 at some later date.

Up to this point, this use case ("Unavailability Zones") seems materially different from all the others above. It does not require dynamic cross-cluster service migration (we assume that the service is already running in more than one cluster when the failure occurs). Nor does it necessarily involve cross-cluster service discovery or location affinity. As a result, I propose that we address this use case somewhat independently of the others (although I strongly suspect that it will become substantially easier once we've solved the others).

All of the above (regarding "Unavailability Zones") refers primarily
to already-running user-facing services, and to minimizing the impact
on end users when those services become unavailable in a given cluster.

location affinity:

(other than the source of YouTube videos, which is assumed to be
equally remote from all clusters in this example). Each pod can be
scheduled independently, in any cluster, and moved at any time.
1. **"Preferentially Coupled"**: Somewhere between Coupled and Decoupled. These applications prefer to have all of their pods located in the same cluster (e.g. for failure correlation, network latency or bandwidth cost reasons), but can tolerate being partitioned for "short" periods of time (for example while migrating the application from one cluster to another). Most small-to-medium-sized LAMP stacks with not-very-strict latency goals probably fall into this category (provided that they use sane service discovery and reconnect-on-fail, which they need to do anyway to run effectively, even in a single Kubernetes cluster).

And then there's what I'll call _absolute_ location affinity. Some
applications are required to run in bounded geographical or network

of our users are in Western Europe, U.S. West Coast" etc.).

## Cross-cluster service discovery

I propose having pods use the standard discovery methods used by external clients of Kubernetes applications (i.e. DNS). DNS might resolve to a public endpoint in the local cluster or in a remote cluster. Other than for Strictly Coupled applications, software should be largely oblivious to which of the two occurs.
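
To illustrate the reconnect-on-fail behaviour this relies on, here is a sketch, under assumptions, of a client that re-resolves the service's DNS name on every connection attempt, so that it transparently follows a failover from a local cluster endpoint to a remote one. The service name `myservice.example.com`, the port, and the retry policy are invented for the example.

```go
// Sketch only: the service name, port and retry policy are illustrative
// assumptions, not part of the proposal.
package main

import (
	"fmt"
	"net"
	"time"
)

// dialService re-resolves the service name on every attempt, so if DNS
// fails over from the local cluster's public endpoint to a remote
// cluster's, the client follows it without caring which one it got.
func dialService(name, port string) (net.Conn, error) {
	var lastErr error
	for attempt := 0; attempt < 5; attempt++ {
		addrs, err := net.LookupHost(name) // may return local or remote cluster IPs
		if err != nil {
			lastErr = err
		}
		for _, a := range addrs {
			conn, err := net.DialTimeout("tcp", net.JoinHostPort(a, port), 2*time.Second)
			if err == nil {
				return conn, nil
			}
			lastErr = err
		}
		time.Sleep(time.Second << attempt) // simple exponential backoff between attempts
	}
	if lastErr == nil {
		lastErr = fmt.Errorf("no addresses returned for %s", name)
	}
	return nil, fmt.Errorf("dial %s:%s: %w", name, port, lastErr)
}

func main() {
	conn, err := dialService("myservice.example.com", "443")
	if err != nil {
		fmt.Println("giving up:", err)
		return
	}
	defer conn.Close()
	fmt.Println("connected to", conn.RemoteAddr())
}
```

Strictly Coupled applications would need something stronger than this, per the location affinity discussion above.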

_Aside:_ How do we avoid "tromboning" through an external VIP when DNS
resolves to a public IP on the local cluster? Strictly speaking this
would be an optimization, and probably only matters to high bandwidth,

such events include:

1. A change of scheduling policy ("we no longer use cloud provider X").
1. A change of resource pricing ("cloud provider Y dropped their prices - let's migrate there").

Strictly Decoupled applications can be trivially moved, in part or in whole, one pod at a time, to one or more clusters.
For Preferentially Coupled applications, the federation system must first locate a single cluster with sufficient capacity to accommodate the entire application, then reserve that capacity, and incrementally move the application, one (or more) resources at a time, over to the new cluster within some bounded time period (and possibly within a predefined "maintenance" window).
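
The previous paragraph effectively describes a small control loop. Below is a hand-wavey sketch of it in Go, purely for illustration: `Federation`, `TargetCluster`, `Application`, and all of their methods are assumed placeholder interfaces, not existing Kubernetes or Ubernetes APIs.

```go
// Hand-wavey control-plane sketch in Go form; every interface and
// function here is a hypothetical placeholder, not an existing API.
package migration

import (
	"context"
	"fmt"
	"time"
)

// Requirements summarises the capacity the whole application needs.
type Requirements struct{ CPUMillis, MemoryBytes int64 }

// Reservation holds capacity in the destination cluster until released.
type Reservation interface{ Release() }

// TargetCluster is a cluster capable of reserving capacity.
type TargetCluster interface {
	Reserve(ctx context.Context, r Requirements) (Reservation, error)
}

// Federation can search its member clusters for sufficient capacity.
type Federation interface {
	FindClusterWithCapacity(ctx context.Context, r Requirements) (TargetCluster, error)
}

// Application exposes just enough for an incremental move.
type Application interface {
	Name() string
	TotalRequirements() Requirements
	HasResourcesOutside(dst TargetCluster) bool
	MoveNextResource(ctx context.Context, dst TargetCluster) error
}

// migratePreferentiallyCoupled finds one cluster that can hold the whole
// application, reserves that capacity, then moves resources over one at
// a time within a bounded window.
func migratePreferentiallyCoupled(ctx context.Context, f Federation, app Application, window time.Duration) error {
	dst, err := f.FindClusterWithCapacity(ctx, app.TotalRequirements())
	if err != nil {
		return fmt.Errorf("no single cluster can hold %q: %w", app.Name(), err)
	}
	res, err := dst.Reserve(ctx, app.TotalRequirements())
	if err != nil {
		return err
	}
	// Release the hold in all cases; on success the moved resources now
	// occupy the capacity themselves.
	defer res.Release()

	bounded, cancel := context.WithTimeout(ctx, window) // e.g. a maintenance window
	defer cancel()
	for app.HasResourcesOutside(dst) {
		if err := app.MoveNextResource(bounded, dst); err != nil {
			return fmt.Errorf("migration of %q aborted: %w", app.Name(), err)
		}
	}
	return nil
}
```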

Strictly Coupled applications (with the exception of those deemed
completely immovable) require the federation system to (see the sketch
after this list):

1. start up an entire replica application in the destination cluster,
1. copy persistent data to the new application instance,
1. switch traffic across, and
1. tear down the original application instance.
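
A similarly hedged sketch of that four-step sequence; again, the `coupledMigrator` interface and its methods are invented for illustration only.

```go
// Hypothetical sketch of the replicate / copy / switch / tear-down
// sequence; none of these functions exist in any real API.
package migration

import (
	"context"
	"fmt"
)

// coupledMigrator mirrors the numbered steps above, one method per step.
type coupledMigrator interface {
	StartReplicaApplication(ctx context.Context, dstCluster string) error // 1. full replica in destination
	CopyPersistentData(ctx context.Context, dstCluster string) error      // 2. copy persistent data across
	SwitchTraffic(ctx context.Context, dstCluster string) error           // 3. cut traffic over
	TearDownOriginal(ctx context.Context) error                           // 4. remove the original instance
}

// migrateStrictlyCoupled runs the four steps in order and stops at the
// first failure; what rollback should look like is left open, as in the
// proposal itself.
func migrateStrictlyCoupled(ctx context.Context, m coupledMigrator, dst string) error {
	steps := []struct {
		name string
		run  func() error
	}{
		{"start replica application", func() error { return m.StartReplicaApplication(ctx, dst) }},
		{"copy persistent data", func() error { return m.CopyPersistentData(ctx, dst) }},
		{"switch traffic", func() error { return m.SwitchTraffic(ctx, dst) }},
		{"tear down original", func() error { return m.TearDownOriginal(ctx) }},
	}
	for _, s := range steps {
		if err := s.run(); err != nil {
			return fmt.Errorf("strictly coupled migration (%s): %w", s.name, err)
		}
	}
	return nil
}
```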

It is proposed that support for automated migration of Strictly Coupled applications be
deferred to a later date.

TBD: All very hand-wavey still, but some initial thoughts to get the conversation going.

## Ubernetes API

This looks a lot like the existing Kubernetes API, but is explicitly multi-cluster.

+ Clusters become first-class objects, which can be registered, listed, described, deregistered etc. via the API.
+ Compute resources can be explicitly requested in specific clusters, or automatically scheduled to the "best" cluster by Ubernetes (by a pluggable Policy Engine).
+ There is a federated equivalent of a replication controller type, which is multicluster-aware, and delegates to cluster-specific replication controllers as required (e.g. a federated RC for n replicas might simply spawn multiple replication controllers in different clusters to do the hard work).
+ These federated replication controllers (and in fact all the
services comprising the Ubernetes Control Plane) have to run
somewhere. For high availability Ubernetes deployments, these
|