
Commit

Initial version of scalability thresholds
wojtek-t committed Jul 19, 2017
1 parent 937bad8 commit 48542f2
Showing 1 changed file with 100 additions and 0 deletions: sig-scalability/thresholds.md
# Kubernetes Scalability thresholds

## Background

Since the 1.6 release, Kubernetes has officially supported 5000-node clusters.
However, the question is what that actually means. As of early Q3 2017, we are
in the process of defining a set of performance-related SLIs
([Service Level Indicators]) and SLOs ([Service Level Objectives]).

However, no matter what SLIs and SLOs we have, there will always be some users
reporting that their cluster is not meeting the SLOs. In most cases it turns
out that the reason is that we (as developers) have silently assumed something
(e.g. that there will be no more than 10000 services in the cluster) and users
were not aware of that assumption.

This document explicitly summarizes the limits on the number of objects in the
system that we are aware of, and states whether we will try to relax them in
the future.

## Kubernetes thresholds

We start with an explicit definition of the quantities and thresholds we assume
are satisfied in the cluster, followed by explanations for some of them.
Important notes about the numbers:
1. In most cases, exceeding these thresholds doesn’t mean that the cluster
stops working - it just means that its overall performance degrades.
1. **Some thresholds below (e.g. the total number of all objects, or the total
number of pods or namespaces) are given for the largest possible cluster. For
smaller clusters, the limits are proportionally lower** (see the sketch after
this list).
1. The thresholds obviously differ between Kubernetes releases (hopefully each
of them is non-decreasing over time). The numbers we present are for the
current release (Kubernetes 1.7).
1. There are many factors that influence the thresholds, e.g. the etcd version
or the storage data format. For each of those we assume the default from the
release, to avoid providing numbers for a huge number of combinations.
1. The “Head threshold” column represents the status of Kubernetes head. This
column should be snapshotted at every release to produce per-release
thresholds (and a dedicated column for each release should then be added).
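
To illustrate the proportional scaling mentioned in the second note above, here
is a minimal Go sketch (the `scaledThreshold` helper and the 500-node example
are purely illustrative, not part of any Kubernetes API) that scales the
cluster-wide head thresholds from the table below by the actual cluster size:

```go
package main

import "fmt"

// Head thresholds for the largest supported cluster (5000 nodes),
// taken from the table below.
const (
	maxNodes      = 5000
	maxPods       = 150000
	maxNamespaces = 10000
	maxObjects    = 250000
)

// scaledThreshold (hypothetical helper) scales a cluster-wide threshold
// proportionally to the actual number of nodes in the cluster.
func scaledThreshold(threshold, nodes int) int {
	return threshold * nodes / maxNodes
}

func main() {
	nodes := 500 // a cluster at 10% of the maximum supported size
	fmt.Println("pods:", scaledThreshold(maxPods, nodes))             // 15000
	fmt.Println("namespaces:", scaledThreshold(maxNamespaces, nodes)) // 1000
	fmt.Println("objects:", scaledThreshold(maxObjects, nodes))       // 25000
}
```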

| Quantity | Head threshold | 1.8 release | Long term goal |
|-------------------------------------|----------------|-------------|----------------|
| Total number of all objects | 250000 | | 1000000 |
| Number of nodes | 5000 | | 5000 |
| Number of pods | 150000 | | 500000 |
| Number of pods per node<sup>1</sup> | 100 | | 100 |
| Number of pods per core<sup>1</sup> | 10 | | 10 |
| Number of namespaces (ns) | 10000 | | 100000 |
| Number of pods per ns | 15000 | | 50000 |
| Number of services | 10000 | | 100000 |
| Number of all services backends | TBD | | 500000 |
| Number of backends per service | 5000 | | 5000 |
| Number of deployments per ns | 20000 | | 10000 |
| Number of pods per deployment | TBD | | 10000 |
| Number of jobs per ns | TBD | | 1000 |
| Number of daemon sets per ns | TBD | | 100 |
| Number of stateful sets per ns | TBD | | 100 |
| Number of secrets per ns | TBD | | TBD |
| Number of secrets per pod | TBD | | TBD |
| Number of config maps per ns | TBD | | TBD |
| Number of config maps per pod | TBD | | TBD |
| Number of storageclasses | TBD | | TBD |
| Number of roles and rolebindings | TBD | | TBD |

There are also thresholds for other object types, but for those the numbers
also depend on the environment the cluster is running in (bare metal or a
particular cloud provider). These include:

| Quantity | Head threshold | 1.8 release | Long term goal |
|-------------------------------------------|----------------|-------------|----------------|
| Number of ingresses | TBD | | TBD |
| Number of PersistentVolumes | TBD | | TBD |
| Number of PersistentVolumeClaims per ns | TBD | | TBD |
| Number of PersistentVolumeClaims per node | TBD | | TBD |


The rationale for some of those numbers:
1. Total number of objects <br/>
There is a limit on the total number of objects in the system, as it affects,
among other things, etcd and its resource consumption.
1. Number of nodes <br/>
We believe that clusters with more than 5000 nodes are not the best option and
that users should consider splitting them into multiple clusters. However, we
may consider raising the long-term goal at some point in the future.
1. Number of services and endpoints <br/>
Each service port and each service backend has a corresponding entry in
iptables. The number of backends of a given service impacts the size of the
`Endpoints` object, which in turn impacts the size of the data being sent all
over the system (a rough back-of-the-envelope sketch follows this list).
1. Number of objects of a given type per namespace <br/>
This holds for different object types (pods, secrets, deployments, ...). There
are a number of control loops in the system that need to iterate over all
objects in a given namespace as a reaction to some change in state. Having a
large number of objects of a given type in a single namespace makes those loops
expensive and slows down the processing of such state changes.
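
As a rough illustration of the services/endpoints rationale above, here is a
back-of-the-envelope Go sketch. The rules-per-port and rules-per-backend
constants are illustrative assumptions, not measured values; the point is only
that the node-level iptables configuration grows with both the number of
service ports and the total number of backends:

```go
package main

import "fmt"

// Assumed model (for illustration only): kube-proxy in iptables mode
// programs on the order of a few rules per service port and a few rules
// per backend on every node.
const (
	rulesPerServicePort = 2 // illustrative assumption
	rulesPerBackend     = 3 // illustrative assumption
)

// estimatedIptablesRules gives a rough per-node rule count under the
// assumed model above.
func estimatedIptablesRules(servicePorts, totalBackends int) int {
	return servicePorts*rulesPerServicePort + totalBackends*rulesPerBackend
}

func main() {
	// Long-term goals from the first table: 100000 services (assuming a
	// single port each) and 500000 backends in total.
	fmt.Println(estimatedIptablesRules(100000, 500000)) // 1700000 rules per node
}
```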

---
<sup>1</sup> The limit for the number of pods on a given node is in fact the minimum of the “pods per node” threshold and “pods per core” times the number of cores on that node.
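
A small worked example of that formula in Go, using the head thresholds from
the table above:

```go
package main

import "fmt"

// Head thresholds from the table above.
const (
	podsPerNode = 100
	podsPerCore = 10
)

// maxPodsOnNode returns the effective pod limit for a node with the given
// number of cores: the minimum of the per-node limit and the per-core
// limit multiplied by the number of cores.
func maxPodsOnNode(cores int) int {
	if cores*podsPerCore < podsPerNode {
		return cores * podsPerCore
	}
	return podsPerNode
}

func main() {
	fmt.Println(maxPodsOnNode(4))  // 40: limited by pods per core
	fmt.Println(maxPodsOnNode(32)) // 100: limited by pods per node
}
```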

[Service Level Indicators]: https://en.wikipedia.org/wiki/Service_level_indicator
[Service Level Objectives]: https://en.wikipedia.org/wiki/Service_level_objective
