diff --git a/contributors/devel/README.md b/contributors/devel/README.md new file mode 100644 index 00000000000..cf29f3b41ef --- /dev/null +++ b/contributors/devel/README.md @@ -0,0 +1,83 @@ +# Kubernetes Developer Guide + +The developer guide is for anyone wanting to either write code which directly accesses the +Kubernetes API, or to contribute directly to the Kubernetes project. +It assumes some familiarity with concepts in the [User Guide](../user-guide/README.md) and the [Cluster Admin +Guide](../admin/README.md). + + +## The process of developing and contributing code to the Kubernetes project + +* **On Collaborative Development** ([collab.md](collab.md)): Info on pull requests and code reviews. + +* **GitHub Issues** ([issues.md](issues.md)): How incoming issues are reviewed and prioritized. + +* **Pull Request Process** ([pull-requests.md](pull-requests.md)): When and why pull requests are closed. + +* **Kubernetes On-Call Rotations** ([on-call-rotations.md](on-call-rotations.md)): Descriptions of on-call rotations for build and end-user support. + +* **Faster PR reviews** ([faster_reviews.md](faster_reviews.md)): How to get faster PR reviews. + +* **Getting Recent Builds** ([getting-builds.md](getting-builds.md)): How to get recent builds including the latest builds that pass CI. + +* **Automated Tools** ([automation.md](automation.md)): Descriptions of the automation that is running on our github repository. + + +## Setting up your dev environment, coding, and debugging + +* **Development Guide** ([development.md](development.md)): Setting up your development environment. + +* **Hunting flaky tests** ([flaky-tests.md](flaky-tests.md)): We have a goal of 99.9% flake free tests. + Here's how to run your tests many times. + +* **Logging Conventions** ([logging.md](logging.md)): Glog levels. + +* **Profiling Kubernetes** ([profiling.md](profiling.md)): How to plug in go pprof profiler to Kubernetes. + +* **Instrumenting Kubernetes with a new metric** + ([instrumentation.md](instrumentation.md)): How to add a new metrics to the + Kubernetes code base. + +* **Coding Conventions** ([coding-conventions.md](coding-conventions.md)): + Coding style advice for contributors. + +* **Document Conventions** ([how-to-doc.md](how-to-doc.md)) + Document style advice for contributors. + +* **Running a cluster locally** ([running-locally.md](running-locally.md)): + A fast and lightweight local cluster deployment for development. + +## Developing against the Kubernetes API + +* The [REST API documentation](../api-reference/README.md) explains the REST + API exposed by apiserver. + +* **Annotations** ([docs/user-guide/annotations.md](../user-guide/annotations.md)): are for attaching arbitrary non-identifying metadata to objects. + Programs that automate Kubernetes objects may use annotations to store small amounts of their state. + +* **API Conventions** ([api-conventions.md](api-conventions.md)): + Defining the verbs and resources used in the Kubernetes API. + +* **API Client Libraries** ([client-libraries.md](client-libraries.md)): + A list of existing client libraries, both supported and user-contributed. + + +## Writing plugins + +* **Authentication Plugins** ([docs/admin/authentication.md](../admin/authentication.md)): + The current and planned states of authentication tokens. + +* **Authorization Plugins** ([docs/admin/authorization.md](../admin/authorization.md)): + Authorization applies to all HTTP requests on the main apiserver port. 
+ This doc explains the available authorization implementations. + +* **Admission Control Plugins** ([admission_control](../design/admission_control.md)) + + +## Building releases + +See the [kubernetes/release](https://github.com/kubernetes/release) repository for details on creating releases and related tools and helper scripts. + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/README.md?pixel)]() + diff --git a/contributors/devel/adding-an-APIGroup.md b/contributors/devel/adding-an-APIGroup.md new file mode 100644 index 00000000000..5832be23fb1 --- /dev/null +++ b/contributors/devel/adding-an-APIGroup.md @@ -0,0 +1,100 @@ +Adding an API Group +=============== + +This document includes the steps to add an API group. You may also want to take +a look at PR [#16621](https://github.com/kubernetes/kubernetes/pull/16621) and +PR [#13146](https://github.com/kubernetes/kubernetes/pull/13146), which add API +groups. + +Please also read about [API conventions](api-conventions.md) and +[API changes](api_changes.md) before adding an API group. + +### Your core group package: + +We plan on improving the way the types are factored in the future; see +[#16062](https://github.com/kubernetes/kubernetes/pull/16062) for the directions +in which this might evolve. + +1. Create a folder in pkg/apis to hold your group. Create types.go in +pkg/apis/``/ and pkg/apis/``/``/ to define API objects +in your group; + +2. Create pkg/apis/``/{register.go, ``/register.go} to register +this group's API objects to the encoding/decoding scheme (e.g., +[pkg/apis/authentication/register.go](../../pkg/apis/authentication/register.go) and +[pkg/apis/authentication/v1beta1/register.go](../../pkg/apis/authentication/v1beta1/register.go); + +3. Add a pkg/apis/``/install/install.go, which is responsible for adding +the group to the `latest` package, so that other packages can access the group's +meta through `latest.Group`. You probably only need to change the name of group +and version in the [example](../../pkg/apis/authentication/install/install.go)). You +need to import this `install` package in {pkg/master, +pkg/client/unversioned}/import_known_versions.go, if you want to make your group +accessible to other packages in the kube-apiserver binary, binaries that uses +the client package. + +Step 2 and 3 are mechanical, we plan on autogenerate these using the +cmd/libs/go2idl/ tool. + +### Scripts changes and auto-generated code: + +1. Generate conversions and deep-copies: + + 1. Add your "group/" or "group/version" into + cmd/libs/go2idl/conversion-gen/main.go; + 2. Make sure your pkg/apis/``/`` directory has a doc.go file + with the comment `// +k8s:deepcopy-gen=package,register`, to catch the + attention of our generation tools. + 3. Make sure your `pkg/apis//` directory has a doc.go file + with the comment `// +k8s:conversion-gen=`, to catch the + attention of our generation tools. For most APIs the only target you + need is `k8s.io/kubernetes/pkg/apis/` (your internal API). + 3. Make sure your `pkg/apis/` and `pkg/apis//` directories + have a doc.go file with the comment `+groupName=.k8s.io`, to correctly + generate the DNS-suffixed group name. + 5. Run hack/update-all.sh. + +2. Generate files for Ugorji codec: + + 1. Touch types.generated.go in pkg/apis/``{/, ``}; + 2. Run hack/update-codecgen.sh. + +3. Generate protobuf objects: + + 1. Add your group to `cmd/libs/go2idl/go-to-protobuf/protobuf/cmd.go` to + `New()` in the `Packages` field + 2. 
Run hack/update-generated-protobuf.sh + +### Client (optional): + +We are overhauling pkg/client, so this section might be outdated; see +[#15730](https://github.com/kubernetes/kubernetes/pull/15730) for how the client +package might evolve. Currently, to add your group to the client package, you +need to: + +1. Create pkg/client/unversioned/``.go, define a group client interface +and implement the client. You can take pkg/client/unversioned/extensions.go as a +reference. + +2. Add the group client interface to the `Interface` in +pkg/client/unversioned/client.go and add method to fetch the interface. Again, +you can take how we add the Extensions group there as an example. + +3. If you need to support the group in kubectl, you'll also need to modify +pkg/kubectl/cmd/util/factory.go. + +### Make the group/version selectable in unit tests (optional): + +1. Add your group in pkg/api/testapi/testapi.go, then you can access the group +in tests through testapi.``; + +2. Add your "group/version" to `KUBE_TEST_API_VERSIONS` in + hack/make-rules/test.sh and hack/make-rules/test-integration.sh + +TODO: Add a troubleshooting section. + + + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/adding-an-APIGroup.md?pixel)]() + diff --git a/contributors/devel/api-conventions.md b/contributors/devel/api-conventions.md new file mode 100644 index 00000000000..0be45182f6a --- /dev/null +++ b/contributors/devel/api-conventions.md @@ -0,0 +1,1350 @@ +API Conventions +=============== + +Updated: 4/22/2016 + +*This document is oriented at users who want a deeper understanding of the +Kubernetes API structure, and developers wanting to extend the Kubernetes API. +An introduction to using resources with kubectl can be found in [Working with +resources](../user-guide/working-with-resources.md).* + +**Table of Contents** + + + - [Types (Kinds)](#types-kinds) + - [Resources](#resources) + - [Objects](#objects) + - [Metadata](#metadata) + - [Spec and Status](#spec-and-status) + - [Typical status properties](#typical-status-properties) + - [References to related objects](#references-to-related-objects) + - [Lists of named subobjects preferred over maps](#lists-of-named-subobjects-preferred-over-maps) + - [Primitive types](#primitive-types) + - [Constants](#constants) + - [Unions](#unions) + - [Lists and Simple kinds](#lists-and-simple-kinds) + - [Differing Representations](#differing-representations) + - [Verbs on Resources](#verbs-on-resources) + - [PATCH operations](#patch-operations) + - [Strategic Merge Patch](#strategic-merge-patch) + - [List Operations](#list-operations) + - [Map Operations](#map-operations) + - [Idempotency](#idempotency) + - [Optional vs. 
Required](#optional-vs-required) + - [Defaulting](#defaulting) + - [Late Initialization](#late-initialization) + - [Concurrency Control and Consistency](#concurrency-control-and-consistency) + - [Serialization Format](#serialization-format) + - [Units](#units) + - [Selecting Fields](#selecting-fields) + - [Object references](#object-references) + - [HTTP Status codes](#http-status-codes) + - [Success codes](#success-codes) + - [Error codes](#error-codes) + - [Response Status Kind](#response-status-kind) + - [Events](#events) + - [Naming conventions](#naming-conventions) + - [Label, selector, and annotation conventions](#label-selector-and-annotation-conventions) + - [WebSockets and SPDY](#websockets-and-spdy) + - [Validation](#validation) + + + +The conventions of the [Kubernetes API](../api.md) (and related APIs in the +ecosystem) are intended to ease client development and ensure that configuration +mechanisms can be implemented that work across a diverse set of use cases +consistently. + +The general style of the Kubernetes API is RESTful - clients create, update, +delete, or retrieve a description of an object via the standard HTTP verbs +(POST, PUT, DELETE, and GET) - and those APIs preferentially accept and return +JSON. Kubernetes also exposes additional endpoints for non-standard verbs and +allows alternative content types. All of the JSON accepted and returned by the +server has a schema, identified by the "kind" and "apiVersion" fields. Where +relevant HTTP header fields exist, they should mirror the content of JSON +fields, but the information should not be represented only in the HTTP header. + +The following terms are defined: + +* **Kind** the name of a particular object schema (e.g. the "Cat" and "Dog" +kinds would have different attributes and properties) +* **Resource** a representation of a system entity, sent or retrieved as JSON +via HTTP to the server. Resources are exposed via: + * Collections - a list of resources of the same type, which may be queryable + * Elements - an individual resource, addressable via a URL + +Each resource typically accepts and returns data of a single kind. A kind may be +accepted or returned by multiple resources that reflect specific use cases. For +instance, the kind "Pod" is exposed as a "pods" resource that allows end users +to create, update, and delete pods, while a separate "pod status" resource (that +acts on "Pod" kind) allows automated processes to update a subset of the fields +in that resource. + +Resource collections should be all lowercase and plural, whereas kinds are +CamelCase and singular. + + +## Types (Kinds) + +Kinds are grouped into three categories: + +1. **Objects** represent a persistent entity in the system. + + Creating an API object is a record of intent - once created, the system will +work to ensure that resource exists. All API objects have common metadata. + + An object may have multiple resources that clients can use to perform +specific actions that create, update, delete, or get. + + Examples: `Pod`, `ReplicationController`, `Service`, `Namespace`, `Node`. + +2. **Lists** are collections of **resources** of one (usually) or more +(occasionally) kinds. + + The name of a list kind must end with "List". Lists have a limited set of +common metadata. All lists use the required "items" field to contain the array +of objects they return. Any kind that has the "items" field must be a list kind. 
+ + Most objects defined in the system should have an endpoint that returns the +full set of resources, as well as zero or more endpoints that return subsets of +the full list. Some objects may be singletons (the current user, the system +defaults) and may not have lists. + + In addition, all lists that return objects with labels should support label +filtering (see [docs/user-guide/labels.md](../user-guide/labels.md), and most +lists should support filtering by fields. + + Examples: PodLists, ServiceLists, NodeLists + + TODO: Describe field filtering below or in a separate doc. + +3. **Simple** kinds are used for specific actions on objects and for +non-persistent entities. + + Given their limited scope, they have the same set of limited common metadata +as lists. + + For instance, the "Status" kind is returned when errors occur and is not +persisted in the system. + + Many simple resources are "subresources", which are rooted at API paths of +specific resources. When resources wish to expose alternative actions or views +that are closely coupled to a single resource, they should do so using new +sub-resources. Common subresources include: + + * `/binding`: Used to bind a resource representing a user request (e.g., Pod, +PersistentVolumeClaim) to a cluster infrastructure resource (e.g., Node, +PersistentVolume). + * `/status`: Used to write just the status portion of a resource. For +example, the `/pods` endpoint only allows updates to `metadata` and `spec`, +since those reflect end-user intent. An automated process should be able to +modify status for users to see by sending an updated Pod kind to the server to +the "/pods/<name>/status" endpoint - the alternate endpoint allows +different rules to be applied to the update, and access to be appropriately +restricted. + * `/scale`: Used to read and write the count of a resource in a manner that +is independent of the specific resource schema. + + Two additional subresources, `proxy` and `portforward`, provide access to +cluster resources as described in +[docs/user-guide/accessing-the-cluster.md](../user-guide/accessing-the-cluster.md). + +The standard REST verbs (defined below) MUST return singular JSON objects. Some +API endpoints may deviate from the strict REST pattern and return resources that +are not singular JSON objects, such as streams of JSON objects or unstructured +text log data. + +The term "kind" is reserved for these "top-level" API types. The term "type" +should be used for distinguishing sub-categories within objects or subobjects. + +### Resources + +All JSON objects returned by an API MUST have the following fields: + +* kind: a string that identifies the schema this object should have +* apiVersion: a string that identifies the version of the schema the object +should have + +These fields are required for proper decoding of the object. They may be +populated by the server by default from the specified URL path, but the client +likely needs to know the values in order to construct the URL path. + +### Objects + +#### Metadata + +Every object kind MUST have the following metadata in a nested object field +called "metadata": + +* namespace: a namespace is a DNS compatible label that objects are subdivided +into. The default namespace is 'default'. See +[docs/user-guide/namespaces.md](../user-guide/namespaces.md) for more. +* name: a string that uniquely identifies this object within the current +namespace (see [docs/user-guide/identifiers.md](../user-guide/identifiers.md)). 
+This value is used in the path when retrieving an individual object. +* uid: a unique in time and space value (typically an RFC 4122 generated +identifier, see [docs/user-guide/identifiers.md](../user-guide/identifiers.md)) +used to distinguish between objects with the same name that have been deleted +and recreated + +Every object SHOULD have the following metadata in a nested object field called +"metadata": + +* resourceVersion: a string that identifies the internal version of this object +that can be used by clients to determine when objects have changed. This value +MUST be treated as opaque by clients and passed unmodified back to the server. +Clients should not assume that the resource version has meaning across +namespaces, different kinds of resources, or different servers. (See +[concurrency control](#concurrency-control-and-consistency), below, for more +details.) +* generation: a sequence number representing a specific generation of the +desired state. Set by the system and monotonically increasing, per-resource. May +be compared, such as for RAW and WAW consistency. +* creationTimestamp: a string representing an RFC 3339 date of the date and time +an object was created +* deletionTimestamp: a string representing an RFC 3339 date of the date and time +after which this resource will be deleted. This field is set by the server when +a graceful deletion is requested by the user, and is not directly settable by a +client. The resource will be deleted (no longer visible from resource lists, and +not reachable by name) after the time in this field. Once set, this value may +not be unset or be set further into the future, although it may be shortened or +the resource may be deleted prior to this time. +* labels: a map of string keys and values that can be used to organize and +categorize objects (see [docs/user-guide/labels.md](../user-guide/labels.md)) +* annotations: a map of string keys and values that can be used by external +tooling to store and retrieve arbitrary metadata about this object (see +[docs/user-guide/annotations.md](../user-guide/annotations.md)) + +Labels are intended for organizational purposes by end users (select the pods +that match this label query). Annotations enable third-party automation and +tooling to decorate objects with additional metadata for their own use. + +#### Spec and Status + +By convention, the Kubernetes API makes a distinction between the specification +of the desired state of an object (a nested object field called "spec") and the +status of the object at the current time (a nested object field called +"status"). The specification is a complete description of the desired state, +including configuration settings provided by the user, +[default values](#defaulting) expanded by the system, and properties initialized +or otherwise changed after creation by other ecosystem components (e.g., +schedulers, auto-scalers), and is persisted in stable storage with the API +object. If the specification is deleted, the object will be purged from the +system. The status summarizes the current state of the object in the system, and +is usually persisted with the object by an automated processes but may be +generated on the fly. At some cost and perhaps some temporary degradation in +behavior, the status could be reconstructed by observation if it were lost. + +When a new version of an object is POSTed or PUT, the "spec" is updated and +available immediately. Over time the system will work to bring the "status" into +line with the "spec". 
The system will drive toward the most recent "spec" +regardless of previous versions of that stanza. In other words, if a value is +changed from 2 to 5 in one PUT and then back down to 3 in another PUT the system +is not required to 'touch base' at 5 before changing the "status" to 3. In other +words, the system's behavior is *level-based* rather than *edge-based*. This +enables robust behavior in the presence of missed intermediate state changes. + +The Kubernetes API also serves as the foundation for the declarative +configuration schema for the system. In order to facilitate level-based +operation and expression of declarative configuration, fields in the +specification should have declarative rather than imperative names and +semantics -- they represent the desired state, not actions intended to yield the +desired state. + +The PUT and POST verbs on objects MUST ignore the "status" values, to avoid +accidentally overwriting the status in read-modify-write scenarios. A `/status` +subresource MUST be provided to enable system components to update statuses of +resources they manage. + +Otherwise, PUT expects the whole object to be specified. Therefore, if a field +is omitted it is assumed that the client wants to clear that field's value. The +PUT verb does not accept partial updates. Modification of just part of an object +may be achieved by GETting the resource, modifying part of the spec, labels, or +annotations, and then PUTting it back. See +[concurrency control](#concurrency-control-and-consistency), below, regarding +read-modify-write consistency when using this pattern. Some objects may expose +alternative resource representations that allow mutation of the status, or +performing custom actions on the object. + +All objects that represent a physical resource whose state may vary from the +user's desired intent SHOULD have a "spec" and a "status". Objects whose state +cannot vary from the user's desired intent MAY have only "spec", and MAY rename +"spec" to a more appropriate name. + +Objects that contain both spec and status should not contain additional +top-level fields other than the standard metadata fields. + +##### Typical status properties + +**Conditions** represent the latest available observations of an object's +current state. Objects may report multiple conditions, and new types of +conditions may be added in the future. Therefore, conditions are represented +using a list/slice, where all have similar structure. + +The `FooCondition` type for some resource type `Foo` may include a subset of the +following fields, but must contain at least `type` and `status` fields: + +```go + Type FooConditionType `json:"type" description:"type of Foo condition"` + Status ConditionStatus `json:"status" description:"status of the condition, one of True, False, Unknown"` + LastHeartbeatTime unversioned.Time `json:"lastHeartbeatTime,omitempty" description:"last time we got an update on a given condition"` + LastTransitionTime unversioned.Time `json:"lastTransitionTime,omitempty" description:"last time the condition transit from one status to another"` + Reason string `json:"reason,omitempty" description:"one-word CamelCase reason for the condition's last transition"` + Message string `json:"message,omitempty" description:"human-readable message indicating details about last transition"` +``` + +Additional fields may be added in the future. 
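
As a rough sketch (the `FooStatus` wrapper and the helper below are illustrative, not an existing API), a consumer typically scans the slice of conditions for the one type it cares about:

```go
// FooStatus is assumed, for illustration, to carry its conditions as a slice,
// matching the FooCondition fields shown above.
type FooStatus struct {
	Conditions []FooCondition `json:"conditions,omitempty"`
}

// GetFooCondition returns the condition of the requested type, or nil if it is
// not present; by convention an absent condition is treated as Unknown.
func GetFooCondition(status FooStatus, condType FooConditionType) *FooCondition {
	for i := range status.Conditions {
		if status.Conditions[i].Type == condType {
			return &status.Conditions[i]
		}
	}
	return nil
}
```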
+ +Conditions should be added to explicitly convey properties that users and +components care about rather than requiring those properties to be inferred from +other observations. + +Condition status values may be `True`, `False`, or `Unknown`. The absence of a +condition should be interpreted the same as `Unknown`. + +In general, condition values may change back and forth, but some condition +transitions may be monotonic, depending on the resource and condition type. +However, conditions are observations and not, themselves, state machines, nor do +we define comprehensive state machines for objects, nor behaviors associated +with state transitions. The system is level-based rather than edge-triggered, +and should assume an Open World. + +A typical oscillating condition type is `Ready`, which indicates the object was +believed to be fully operational at the time it was last probed. A possible +monotonic condition could be `Succeeded`. A `False` status for `Succeeded` would +imply failure. An object that was still active would not have a `Succeeded` +condition, or its status would be `Unknown`. + +Some resources in the v1 API contain fields called **`phase`**, and associated +`message`, `reason`, and other status fields. The pattern of using `phase` is +deprecated. Newer API types should use conditions instead. Phase was essentially +a state-machine enumeration field, that contradicted +[system-design principles](../design/principles.md#control-logic) and hampered +evolution, since [adding new enum values breaks backward +compatibility](api_changes.md). Rather than encouraging clients to infer +implicit properties from phases, we intend to explicitly expose the conditions +that clients need to monitor. Conditions also have the benefit that it is +possible to create some conditions with uniform meaning across all resource +types, while still exposing others that are unique to specific resource types. +See [#7856](http://issues.k8s.io/7856) for more details and discussion. + +In condition types, and everywhere else they appear in the API, **`Reason`** is +intended to be a one-word, CamelCase representation of the category of cause of +the current status, and **`Message`** is intended to be a human-readable phrase +or sentence, which may contain specific details of the individual occurrence. +`Reason` is intended to be used in concise output, such as one-line +`kubectl get` output, and in summarizing occurrences of causes, whereas +`Message` is intended to be presented to users in detailed status explanations, +such as `kubectl describe` output. + +Historical information status (e.g., last transition time, failure counts) is +only provided with reasonable effort, and is not guaranteed to not be lost. + +Status information that may be large (especially proportional in size to +collections of other resources, such as lists of references to other objects -- +see below) and/or rapidly changing, such as +[resource usage](../design/resources.md#usage-data), should be put into separate +objects, with possibly a reference from the original object. This helps to +ensure that GETs and watch remain reasonably efficient for the majority of +clients, which may not need that data. + +Some resources report the `observedGeneration`, which is the `generation` most +recently observed by the component responsible for acting upon changes to the +desired state of the resource. This can be used, for instance, to ensure that +the reported status reflects the most recent desired status. 
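
For illustration, a client that wants to know whether the controller has caught up with its latest change can compare the two fields, as in this sketch (the `Foo` type and its field layout are assumptions, not an existing resource):

```go
// statusObservedLatestSpec reports whether the controller for this hypothetical
// Foo resource has observed the most recent desired state. The controller is
// assumed to copy metadata.generation into status.observedGeneration once it
// has acted on that generation of the spec.
func statusObservedLatestSpec(foo *Foo) bool {
	return foo.Status.ObservedGeneration >= foo.ObjectMeta.Generation
}
```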
+ +#### References to related objects + +References to loosely coupled sets of objects, such as +[pods](../user-guide/pods.md) overseen by a +[replication controller](../user-guide/replication-controller.md), are usually +best referred to using a [label selector](../user-guide/labels.md). In order to +ensure that GETs of individual objects remain bounded in time and space, these +sets may be queried via separate API queries, but will not be expanded in the +referring object's status. + +References to specific objects, especially specific resource versions and/or +specific fields of those objects, are specified using the `ObjectReference` type +(or other types representing strict subsets of it). Unlike partial URLs, the +ObjectReference type facilitates flexible defaulting of fields from the +referring object or other contextual information. + +References in the status of the referee to the referrer may be permitted, when +the references are one-to-one and do not need to be frequently updated, +particularly in an edge-based manner. + +#### Lists of named subobjects preferred over maps + +Discussed in [#2004](http://issue.k8s.io/2004) and elsewhere. There are no maps +of subobjects in any API objects. Instead, the convention is to use a list of +subobjects containing name fields. + +For example: + +```yaml +ports: + - name: www + containerPort: 80 +``` + +vs. + +```yaml +ports: + www: + containerPort: 80 +``` + +This rule maintains the invariant that all JSON/YAML keys are fields in API +objects. The only exceptions are pure maps in the API (currently, labels, +selectors, annotations, data), as opposed to sets of subobjects. + +#### Primitive types + +* Avoid floating-point values as much as possible, and never use them in spec. +Floating-point values cannot be reliably round-tripped (encoded and re-decoded) +without changing, and have varying precision and representations across +languages and architectures. +* All numbers (e.g., uint32, int64) are converted to float64 by Javascript and +some other languages, so any field which is expected to exceed that either in +magnitude or in precision (specifically integer values > 53 bits) should be +serialized and accepted as strings. +* Do not use unsigned integers, due to inconsistent support across languages and +libraries. Just validate that the integer is non-negative if that's the case. +* Do not use enums. Use aliases for string instead (e.g., `NodeConditionType`). +* Look at similar fields in the API (e.g., ports, durations) and follow the +conventions of existing fields. +* All public integer fields MUST use the Go `(u)int32` or Go `(u)int64` types, +not `(u)int` (which is ambiguous depending on target platform). Internal types +may use `(u)int`. + +#### Constants + +Some fields will have a list of allowed values (enumerations). These values will +be strings, and they will be in CamelCase, with an initial uppercase letter. +Examples: "ClusterFirst", "Pending", "ClientIP". + +#### Unions + +Sometimes, at most one of a set of fields can be set. For example, the +[volumes] field of a PodSpec has 17 different volume type-specific fields, such +as `nfs` and `iscsi`. All fields in the set should be +[Optional](#optional-vs-required). + +Sometimes, when a new type is created, the api designer may anticipate that a +union will be needed in the future, even if only one field is allowed initially. +In this case, be sure to make the field [Optional](#optional-vs-required) +optional. 
In the validation, you may still return an error if the sole field is +unset. Do not set a default value for that field. + +### Lists and Simple kinds + +Every list or simple kind SHOULD have the following metadata in a nested object +field called "metadata": + +* resourceVersion: a string that identifies the common version of the objects +returned by in a list. This value MUST be treated as opaque by clients and +passed unmodified back to the server. A resource version is only valid within a +single namespace on a single kind of resource. + +Every simple kind returned by the server, and any simple kind sent to the server +that must support idempotency or optimistic concurrency should return this +value. Since simple resources are often used as input alternate actions that +modify objects, the resource version of the simple resource should correspond to +the resource version of the object. + + +## Differing Representations + +An API may represent a single entity in different ways for different clients, or +transform an object after certain transitions in the system occur. In these +cases, one request object may have two representations available as different +resources, or different kinds. + +An example is a Service, which represents the intent of the user to group a set +of pods with common behavior on common ports. When Kubernetes detects a pod +matches the service selector, the IP address and port of the pod are added to an +Endpoints resource for that Service. The Endpoints resource exists only if the +Service exists, but exposes only the IPs and ports of the selected pods. The +full service is represented by two distinct resources - under the original +Service resource the user created, as well as in the Endpoints resource. + +As another example, a "pod status" resource may accept a PUT with the "pod" +kind, with different rules about what fields may be changed. + +Future versions of Kubernetes may allow alternative encodings of objects beyond +JSON. + + +## Verbs on Resources + +API resources should use the traditional REST pattern: + +* GET /<resourceNamePlural> - Retrieve a list of type +<resourceName>, e.g. GET /pods returns a list of Pods. +* POST /<resourceNamePlural> - Create a new resource from the JSON object +provided by the client. +* GET /<resourceNamePlural>/<name> - Retrieves a single resource +with the given name, e.g. GET /pods/first returns a Pod named 'first'. Should be +constant time, and the resource should be bounded in size. +* DELETE /<resourceNamePlural>/<name> - Delete the single resource +with the given name. DeleteOptions may specify gracePeriodSeconds, the optional +duration in seconds before the object should be deleted. Individual kinds may +declare fields which provide a default grace period, and different kinds may +have differing kind-wide default grace periods. A user provided grace period +overrides a default grace period, including the zero grace period ("now"). +* PUT /<resourceNamePlural>/<name> - Update or create the resource +with the given name with the JSON object provided by the client. +* PATCH /<resourceNamePlural>/<name> - Selectively modify the +specified fields of the resource. See more information [below](#patch). +* GET /<resourceNamePlural>&watch=true - Receive a stream of JSON +objects corresponding to changes made to any resource of the given kind over +time. 
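
As an illustration of the watch verb above, the following sketch consumes such a stream using only the Go standard library. The unsecured local apiserver address is an assumption for illustration, and real clients should normally use a generated client library; each change arrives on the stream as one JSON object wrapping the affected resource.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// watchEvent is the wrapper delivered for each change on the stream.
type watchEvent struct {
	Type   string          `json:"type"`   // e.g. ADDED, MODIFIED, DELETED
	Object json.RawMessage `json:"object"` // the affected resource, verbatim JSON
}

func main() {
	// Assumed: an apiserver reachable on an unsecured local port.
	resp, err := http.Get("http://localhost:8080/api/v1/pods?watch=true")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// Decode one JSON object per change until the stream ends.
	dec := json.NewDecoder(resp.Body)
	for {
		var ev watchEvent
		if err := dec.Decode(&ev); err != nil {
			break // stream ended or could not be parsed
		}
		fmt.Printf("%s: %d bytes of object\n", ev.Type, len(ev.Object))
	}
}
```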
+ +### PATCH operations + +The API supports three different PATCH operations, determined by their +corresponding Content-Type header: + +* JSON Patch, `Content-Type: application/json-patch+json` + * As defined in [RFC6902](https://tools.ietf.org/html/rfc6902), a JSON Patch is +a sequence of operations that are executed on the resource, e.g. `{"op": "add", +"path": "/a/b/c", "value": [ "foo", "bar" ]}`. For more details on how to use +JSON Patch, see the RFC. +* Merge Patch, `Content-Type: application/merge-patch+json` + * As defined in [RFC7386](https://tools.ietf.org/html/rfc7386), a Merge Patch +is essentially a partial representation of the resource. The submitted JSON is +"merged" with the current resource to create a new one, then the new one is +saved. For more details on how to use Merge Patch, see the RFC. +* Strategic Merge Patch, `Content-Type: application/strategic-merge-patch+json` + * Strategic Merge Patch is a custom implementation of Merge Patch. For a +detailed explanation of how it works and why it needed to be introduced, see +below. + +#### Strategic Merge Patch + +In the standard JSON merge patch, JSON objects are always merged but lists are +always replaced. Often that isn't what we want. Let's say we start with the +following Pod: + +```yaml +spec: + containers: + - name: nginx + image: nginx-1.0 +``` + +...and we POST that to the server (as JSON). Then let's say we want to *add* a +container to this Pod. + +```yaml +PATCH /api/v1/namespaces/default/pods/pod-name +spec: + containers: + - name: log-tailer + image: log-tailer-1.0 +``` + +If we were to use standard Merge Patch, the entire container list would be +replaced with the single log-tailer container. However, our intent is for the +container lists to merge together based on the `name` field. + +To solve this problem, Strategic Merge Patch uses metadata attached to the API +objects to determine what lists should be merged and which ones should not. +Currently the metadata is available as struct tags on the API objects +themselves, but will become available to clients as Swagger annotations in the +future. In the above example, the `patchStrategy` metadata for the `containers` +field would be `merge` and the `patchMergeKey` would be `name`. + +Note: If the patch results in merging two lists of scalars, the scalars are +first deduplicated and then merged. + +Strategic Merge Patch also supports special operations as listed below. + +### List Operations + +To override the container list to be strictly replaced, regardless of the +default: + +```yaml +containers: + - name: nginx + image: nginx-1.0 + - $patch: replace # any further $patch operations nested in this list will be ignored +``` + +To delete an element of a list that should be merged: + +```yaml +containers: + - name: nginx + image: nginx-1.0 + - $patch: delete + name: log-tailer # merge key and value goes here +``` + +### Map Operations + +To indicate that a map should not be merged and instead should be taken literally: + +```yaml +$patch: replace # recursive and applies to all fields of the map it's in +containers: +- name: nginx + image: nginx-1.0 +``` + +To delete a field of a map: + +```yaml +name: nginx +image: nginx-1.0 +labels: + live: null # set the value of the map key to null +``` + + +## Idempotency + +All compatible Kubernetes APIs MUST support "name idempotency" and respond with +an HTTP status code 409 when a request is made to POST an object that has the +same name as an existing object in the system. 
See +[docs/user-guide/identifiers.md](../user-guide/identifiers.md) for details. + +Names generated by the system may be requested using `metadata.generateName`. +GenerateName indicates that the name should be made unique by the server prior +to persisting it. A non-empty value for the field indicates the name will be +made unique (and the name returned to the client will be different than the name +passed). The value of this field will be combined with a unique suffix on the +server if the Name field has not been provided. The provided value must be valid +within the rules for Name, and may be truncated by the length of the suffix +required to make the value unique on the server. If this field is specified, and +Name is not present, the server will NOT return a 409 if the generated name +exists - instead, it will either return 201 Created or 504 with Reason +`ServerTimeout` indicating a unique name could not be found in the time +allotted, and the client should retry (optionally after the time indicated in +the Retry-After header). + +## Optional vs. Required + +Fields must be either optional or required. + +Optional fields have the following properties: + +- They have `omitempty` struct tag in Go. +- They are a pointer type in the Go definition (e.g. `bool *awesomeFlag`) or +have a built-in `nil` value (e.g. maps and slices). +- The API server should allow POSTing and PUTing a resource with this field +unset. + +Required fields have the opposite properties, namely: + +- They do not have an `omitempty` struct tag. +- They are not a pointer type in the Go definition (e.g. `bool otherFlag`). +- The API server should not allow POSTing or PUTing a resource with this field +unset. + +Using the `omitempty` tag causes swagger documentation to reflect that the field +is optional. + +Using a pointer allows distinguishing unset from the zero value for that type. +There are some cases where, in principle, a pointer is not needed for an +optional field since the zero value is forbidden, and thus implies unset. There +are examples of this in the codebase. However: + +- it can be difficult for implementors to anticipate all cases where an empty +value might need to be distinguished from a zero value +- structs are not omitted from encoder output even where omitempty is specified, +which is messy; +- having a pointer consistently imply optional is clearer for users of the Go +language client, and any other clients that use corresponding types + +Therefore, we ask that pointers always be used with optional fields that do not +have a built-in `nil` value. + + +## Defaulting + +Default resource values are API version-specific, and they are applied during +the conversion from API-versioned declarative configuration to internal objects +representing the desired state (`Spec`) of the resource. Subsequent GETs of the +resource will include the default values explicitly. + +Incorporating the default values into the `Spec` ensures that `Spec` depicts the +full desired state so that it is easier for the system to determine how to +achieve the state, and for the user to know what to anticipate. + +API version-specific default values are set by the API server. + +## Late Initialization + +Late initialization is when resource fields are set by a system controller +after an object is created/updated. + +For example, the scheduler sets the `pod.spec.nodeName` field after the pod is +created. 
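
As a sketch of that pattern (the `Foo` type and the helpers are hypothetical, not part of any existing controller), a late-initializer touches only what the user left unset, following the rules listed next:

```go
// lateInitializeFoo is a hypothetical late-initializer. It only sets fields
// that are still unset and only adds new map keys, so explicit user intent is
// never overwritten and later user updates still merge cleanly.
func lateInitializeFoo(foo *Foo) {
	// Setting a previously unset field.
	if foo.Spec.NodeName == "" {
		foo.Spec.NodeName = pickNode(foo) // assumed placement helper
	}
	// Adding a key to a map without touching existing keys.
	if foo.Spec.NodeSelector == nil {
		foo.Spec.NodeSelector = map[string]string{}
	}
	if _, exists := foo.Spec.NodeSelector["example.com/zone"]; !exists {
		foo.Spec.NodeSelector["example.com/zone"] = defaultZone() // assumed default
	}
}
```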
+ +Late-initializers should only make the following types of modifications: + - Setting previously unset fields + - Adding keys to maps + - Adding values to arrays which have mergeable semantics +(`patchStrategy:"merge"` attribute in the type definition). + +These conventions: + 1. allow a user (with sufficient privilege) to override any system-default + behaviors by setting the fields that would otherwise have been defaulted. + 1. enables updates from users to be merged with changes made during late +initialization, using strategic merge patch, as opposed to clobbering the +change. + 1. allow the component which does the late-initialization to use strategic +merge patch, which facilitates composition and concurrency of such components. + +Although the apiserver Admission Control stage acts prior to object creation, +Admission Control plugins should follow the Late Initialization conventions +too, to allow their implementation to be later moved to a 'controller', or to +client libraries. + +## Concurrency Control and Consistency + +Kubernetes leverages the concept of *resource versions* to achieve optimistic +concurrency. All Kubernetes resources have a "resourceVersion" field as part of +their metadata. This resourceVersion is a string that identifies the internal +version of an object that can be used by clients to determine when objects have +changed. When a record is about to be updated, it's version is checked against a +pre-saved value, and if it doesn't match, the update fails with a StatusConflict +(HTTP status code 409). + +The resourceVersion is changed by the server every time an object is modified. +If resourceVersion is included with the PUT operation the system will verify +that there have not been other successful mutations to the resource during a +read/modify/write cycle, by verifying that the current value of resourceVersion +matches the specified value. + +The resourceVersion is currently backed by [etcd's +modifiedIndex](https://coreos.com/docs/distributed-configuration/etcd-api/). +However, it's important to note that the application should *not* rely on the +implementation details of the versioning system maintained by Kubernetes. We may +change the implementation of resourceVersion in the future, such as to change it +to a timestamp or per-object counter. + +The only way for a client to know the expected value of resourceVersion is to +have received it from the server in response to a prior operation, typically a +GET. This value MUST be treated as opaque by clients and passed unmodified back +to the server. Clients should not assume that the resource version has meaning +across namespaces, different kinds of resources, or different servers. +Currently, the value of resourceVersion is set to match etcd's sequencer. You +could think of it as a logical clock the API server can use to order requests. +However, we expect the implementation of resourceVersion to change in the +future, such as in the case we shard the state by kind and/or namespace, or port +to another storage system. + +In the case of a conflict, the correct client action at this point is to GET the +resource again, apply the changes afresh, and try submitting again. This +mechanism can be used to prevent races like the following: + +``` +Client #1 Client #2 +GET Foo GET Foo +Set Foo.Bar = "one" Set Foo.Baz = "two" +PUT Foo PUT Foo +``` + +When these sequences occur in parallel, either the change to Foo.Bar or the +change to Foo.Baz can be lost. 
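
The following sketch illustrates, with a hypothetical typed client, the conflict-aware read/modify/write loop this section describes: each attempt re-reads the object (and therefore its current resourceVersion), reapplies the change, and retries only when the server reports a conflict.

```go
// updateFooWithRetry is a hypothetical illustration of optimistic concurrency:
// GET the current object (which carries its resourceVersion), apply the change,
// and PUT it back; on a 409 Conflict, re-read and try again.
func updateFooWithRetry(c FooClient, name string, mutate func(*Foo)) error {
	for attempt := 0; attempt < 5; attempt++ {
		foo, err := c.Get(name) // carries the current resourceVersion
		if err != nil {
			return err
		}
		mutate(foo)
		if _, err = c.Update(foo); err == nil {
			return nil // server accepted the write; resourceVersion matched
		}
		if !isConflict(err) { // assumed helper: true only for HTTP 409
			return err
		}
		// Another writer won the race; loop and apply the change to the newer object.
	}
	return fmt.Errorf("too many conflicts updating %s", name)
}
```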
+ +On the other hand, when specifying the resourceVersion, one of the PUTs will +fail, since whichever write succeeds changes the resourceVersion for Foo. + +resourceVersion may be used as a precondition for other operations (e.g., GET, +DELETE) in the future, such as for read-after-write consistency in the presence +of caching. + +"Watch" operations specify resourceVersion using a query parameter. It is used +to specify the point at which to begin watching the specified resources. This +may be used to ensure that no mutations are missed between a GET of a resource +(or list of resources) and a subsequent Watch, even if the current version of +the resource is more recent. This is currently the main reason that list +operations (GET on a collection) return resourceVersion. + + +## Serialization Format + +APIs may return alternative representations of any resource in response to an +Accept header or under alternative endpoints, but the default serialization for +input and output of API responses MUST be JSON. + +Protobuf serialization of API objects are currently **EXPERIMENTAL** and will change without notice. + +All dates should be serialized as RFC3339 strings. + +## Units + +Units must either be explicit in the field name (e.g., `timeoutSeconds`), or +must be specified as part of the value (e.g., `resource.Quantity`). Which +approach is preferred is TBD, though currently we use the `fooSeconds` +convention for durations. + + +## Selecting Fields + +Some APIs may need to identify which field in a JSON object is invalid, or to +reference a value to extract from a separate resource. The current +recommendation is to use standard JavaScript syntax for accessing that field, +assuming the JSON object was transformed into a JavaScript object, without the +leading dot, such as `metadata.name`. + +Examples: + +* Find the field "current" in the object "state" in the second item in the array +"fields": `fields[1].state.current` + +## Object references + +Object references should either be called `fooName` if referring to an object of +kind `Foo` by just the name (within the current namespace, if a namespaced +resource), or should be called `fooRef`, and should contain a subset of the +fields of the `ObjectReference` type. + + +TODO: Plugins, extensions, nested kinds, headers + + +## HTTP Status codes + +The server will respond with HTTP status codes that match the HTTP spec. See the +section below for a breakdown of the types of status codes the server will send. + +The following HTTP status codes may be returned by the API. + +#### Success codes + +* `200 StatusOK` + * Indicates that the request completed successfully. +* `201 StatusCreated` + * Indicates that the request to create kind completed successfully. +* `204 StatusNoContent` + * Indicates that the request completed successfully, and the response contains +no body. + * Returned in response to HTTP OPTIONS requests. + +#### Error codes + +* `307 StatusTemporaryRedirect` + * Indicates that the address for the requested resource has changed. + * Suggested client recovery behavior: + * Follow the redirect. + + +* `400 StatusBadRequest` + * Indicates the requested is invalid. + * Suggested client recovery behavior: + * Do not retry. Fix the request. + + +* `401 StatusUnauthorized` + * Indicates that the server can be reached and understood the request, but +refuses to take any further action, because the client must provide +authorization. 
If the client has provided authorization, the server is +indicating the provided authorization is unsuitable or invalid. + * Suggested client recovery behavior: + * If the user has not supplied authorization information, prompt them for +the appropriate credentials. If the user has supplied authorization information, +inform them their credentials were rejected and optionally prompt them again. + + +* `403 StatusForbidden` + * Indicates that the server can be reached and understood the request, but +refuses to take any further action, because it is configured to deny access for +some reason to the requested resource by the client. + * Suggested client recovery behavior: + * Do not retry. Fix the request. + + +* `404 StatusNotFound` + * Indicates that the requested resource does not exist. + * Suggested client recovery behavior: + * Do not retry. Fix the request. + + +* `405 StatusMethodNotAllowed` + * Indicates that the action the client attempted to perform on the resource +was not supported by the code. + * Suggested client recovery behavior: + * Do not retry. Fix the request. + + +* `409 StatusConflict` + * Indicates that either the resource the client attempted to create already +exists or the requested update operation cannot be completed due to a conflict. + * Suggested client recovery behavior: + * * If creating a new resource: + * * Either change the identifier and try again, or GET and compare the +fields in the pre-existing object and issue a PUT/update to modify the existing +object. + * * If updating an existing resource: + * See `Conflict` from the `status` response section below on how to +retrieve more information about the nature of the conflict. + * GET and compare the fields in the pre-existing object, merge changes (if +still valid according to preconditions), and retry with the updated request +(including `ResourceVersion`). + + +* `410 StatusGone` + * Indicates that the item is no longer available at the server and no +forwarding address is known. + * Suggested client recovery behavior: + * Do not retry. Fix the request. + + +* `422 StatusUnprocessableEntity` + * Indicates that the requested create or update operation cannot be completed +due to invalid data provided as part of the request. + * Suggested client recovery behavior: + * Do not retry. Fix the request. + + +* `429 StatusTooManyRequests` + * Indicates that the either the client rate limit has been exceeded or the +server has received more requests then it can process. + * Suggested client recovery behavior: + * Read the `Retry-After` HTTP header from the response, and wait at least +that long before retrying. + + +* `500 StatusInternalServerError` + * Indicates that the server can be reached and understood the request, but +either an unexpected internal error occurred and the outcome of the call is +unknown, or the server cannot complete the action in a reasonable time (this may +be due to temporary server load or a transient communication issue with another +server). + * Suggested client recovery behavior: + * Retry with exponential backoff. + + +* `503 StatusServiceUnavailable` + * Indicates that required service is unavailable. + * Suggested client recovery behavior: + * Retry with exponential backoff. + + +* `504 StatusServerTimeout` + * Indicates that the request could not be completed within the given time. +Clients can get this response ONLY when they specified a timeout param in the +request. 
+ * Suggested client recovery behavior: + * Increase the value of the timeout param and retry with exponential +backoff. + +## Response Status Kind + +Kubernetes will always return the `Status` kind from any API endpoint when an +error occurs. Clients SHOULD handle these types of objects when appropriate. + +A `Status` kind will be returned by the API in two cases: + * When an operation is not successful (i.e. when the server would return a non +2xx HTTP status code). + * When a HTTP `DELETE` call is successful. + +The status object is encoded as JSON and provided as the body of the response. +The status object contains fields for humans and machine consumers of the API to +get more detailed information for the cause of the failure. The information in +the status object supplements, but does not override, the HTTP status code's +meaning. When fields in the status object have the same meaning as generally +defined HTTP headers and that header is returned with the response, the header +should be considered as having higher priority. + +**Example:** + +```console +$ curl -v -k -H "Authorization: Bearer WhCDvq4VPpYhrcfmF6ei7V9qlbqTubUc" https://10.240.122.184:443/api/v1/namespaces/default/pods/grafana + +> GET /api/v1/namespaces/default/pods/grafana HTTP/1.1 +> User-Agent: curl/7.26.0 +> Host: 10.240.122.184 +> Accept: */* +> Authorization: Bearer WhCDvq4VPpYhrcfmF6ei7V9qlbqTubUc +> + +< HTTP/1.1 404 Not Found +< Content-Type: application/json +< Date: Wed, 20 May 2015 18:10:42 GMT +< Content-Length: 232 +< +{ + "kind": "Status", + "apiVersion": "v1", + "metadata": {}, + "status": "Failure", + "message": "pods \"grafana\" not found", + "reason": "NotFound", + "details": { + "name": "grafana", + "kind": "pods" + }, + "code": 404 +} +``` + +`status` field contains one of two possible values: +* `Success` +* `Failure` + +`message` may contain human-readable description of the error + +`reason` may contain a machine-readable, one-word, CamelCase description of why +this operation is in the `Failure` status. If this value is empty there is no +information available. The `reason` clarifies an HTTP status code but does not +override it. + +`details` may contain extended data associated with the reason. Each reason may +define its own extended details. This field is optional and the data returned is +not guaranteed to conform to any schema except that defined by the reason type. + +Possible values for the `reason` and `details` fields: +* `BadRequest` + * Indicates that the request itself was invalid, because the request doesn't +make any sense, for example deleting a read-only object. + * This is different than `status reason` `Invalid` above which indicates that +the API call could possibly succeed, but the data was invalid. + * API calls that return BadRequest can never succeed. + * Http status code: `400 StatusBadRequest` + + +* `Unauthorized` + * Indicates that the server can be reached and understood the request, but +refuses to take any further action without the client providing appropriate +authorization. If the client has provided authorization, this error indicates +the provided credentials are insufficient or invalid. + * Details (optional): + * `kind string` + * The kind attribute of the unauthorized resource (on some operations may +differ from the requested resource). + * `name string` + * The identifier of the unauthorized resource. 
+ * HTTP status code: `401 StatusUnauthorized` + + +* `Forbidden` + * Indicates that the server can be reached and understood the request, but +refuses to take any further action, because it is configured to deny access for +some reason to the requested resource by the client. + * Details (optional): + * `kind string` + * The kind attribute of the forbidden resource (on some operations may +differ from the requested resource). + * `name string` + * The identifier of the forbidden resource. + * HTTP status code: `403 StatusForbidden` + + +* `NotFound` + * Indicates that one or more resources required for this operation could not +be found. + * Details (optional): + * `kind string` + * The kind attribute of the missing resource (on some operations may +differ from the requested resource). + * `name string` + * The identifier of the missing resource. + * HTTP status code: `404 StatusNotFound` + + +* `AlreadyExists` + * Indicates that the resource you are creating already exists. + * Details (optional): + * `kind string` + * The kind attribute of the conflicting resource. + * `name string` + * The identifier of the conflicting resource. + * HTTP status code: `409 StatusConflict` + +* `Conflict` + * Indicates that the requested update operation cannot be completed due to a +conflict. The client may need to alter the request. Each resource may define +custom details that indicate the nature of the conflict. + * HTTP status code: `409 StatusConflict` + + +* `Invalid` + * Indicates that the requested create or update operation cannot be completed +due to invalid data provided as part of the request. + * Details (optional): + * `kind string` + * the kind attribute of the invalid resource + * `name string` + * the identifier of the invalid resource + * `causes` + * One or more `StatusCause` entries indicating the data in the provided +resource that was invalid. The `reason`, `message`, and `field` attributes will +be set. + * HTTP status code: `422 StatusUnprocessableEntity` + + +* `Timeout` + * Indicates that the request could not be completed within the given time. +Clients may receive this response if the server has decided to rate limit the +client, or if the server is overloaded and cannot process the request at this +time. + * Http status code: `429 TooManyRequests` + * The server should set the `Retry-After` HTTP header and return +`retryAfterSeconds` in the details field of the object. A value of `0` is the +default. + + +* `ServerTimeout` + * Indicates that the server can be reached and understood the request, but +cannot complete the action in a reasonable time. This maybe due to temporary +server load or a transient communication issue with another server. + * Details (optional): + * `kind string` + * The kind attribute of the resource being acted on. + * `name string` + * The operation that is being attempted. + * The server should set the `Retry-After` HTTP header and return +`retryAfterSeconds` in the details field of the object. A value of `0` is the +default. + * Http status code: `504 StatusServerTimeout` + + +* `MethodNotAllowed` + * Indicates that the action the client attempted to perform on the resource +was not supported by the code. + * For instance, attempting to delete a resource that can only be created. + * API calls that return MethodNotAllowed can never succeed. + * Http status code: `405 StatusMethodNotAllowed` + + +* `InternalError` + * Indicates that an internal error occurred, it is unexpected and the outcome +of the call is unknown. 
+ * Details (optional): + * `causes` + * The original error. + * Http status code: `500 StatusInternalServerError` `code` may contain the suggested HTTP return code for this status. + + +## Events + +Events are complementary to status information, since they can provide some +historical information about status and occurrences in addition to current or +previous status. Generate events for situations users or administrators should +be alerted about. + +Choose a unique, specific, short, CamelCase reason for each event category. For +example, `FreeDiskSpaceInvalid` is a good event reason because it is likely to +refer to just one situation, but `Started` is not a good reason because it +doesn't sufficiently indicate what started, even when combined with other event +fields. + +`Error creating foo` or `Error creating foo %s` would be appropriate for an +event message, with the latter being preferable, since it is more informational. + +Accumulate repeated events in the client, especially for frequent events, to +reduce data volume, load on the system, and noise exposed to users. + +## Naming conventions + +* Go field names must be CamelCase. JSON field names must be camelCase. Other +than capitalization of the initial letter, the two should almost always match. +No underscores nor dashes in either. +* Field and resource names should be declarative, not imperative (DoSomething, +SomethingDoer, DoneBy, DoneAt). +* Use `Node` where referring to +the node resource in the context of the cluster. Use `Host` where referring to +properties of the individual physical/virtual system, such as `hostname`, +`hostPath`, `hostNetwork`, etc. +* `FooController` is a deprecated kind naming convention. Name the kind after +the thing being controlled instead (e.g., `Job` rather than `JobController`). +* The name of a field that specifies the time at which `something` occurs should +be called `somethingTime`. Do not use `stamp` (e.g., `creationTimestamp`). +* We use the `fooSeconds` convention for durations, as discussed in the [units +subsection](#units). + * `fooPeriodSeconds` is preferred for periodic intervals and other waiting +periods (e.g., over `fooIntervalSeconds`). + * `fooTimeoutSeconds` is preferred for inactivity/unresponsiveness deadlines. + * `fooDeadlineSeconds` is preferred for activity completion deadlines. +* Do not use abbreviations in the API, except where they are extremely commonly +used, such as "id", "args", or "stdin". +* Acronyms should similarly only be used when extremely commonly known. All +letters in the acronym should have the same case, using the appropriate case for +the situation. For example, at the beginning of a field name, the acronym should +be all lowercase, such as "httpGet". Where used as a constant, all letters +should be uppercase, such as "TCP" or "UDP". +* The name of a field referring to another resource of kind `Foo` by name should +be called `fooName`. The name of a field referring to another resource of kind +`Foo` by ObjectReference (or subset thereof) should be called `fooRef`. +* More generally, include the units and/or type in the field name if they could +be ambiguous and they are not specified by the value or value type. + +## Label, selector, and annotation conventions + +Labels are the domain of users. They are intended to facilitate organization and +management of API resources using attributes that are meaningful to users, as +opposed to meaningful to the system. 
Think of them as user-created mp3 or email +inbox labels, as opposed to the directory structure used by a program to store +its data. The former enables the user to apply an arbitrary ontology, whereas +the latter is implementation-centric and inflexible. Users will use labels to +select resources to operate on, display label values in CLI/UI columns, etc. +Users should always retain full power and flexibility over the label schemas +they apply to labels in their namespaces. + +However, we should support conveniences for common cases by default. For +example, what we now do in ReplicationController is automatically set the RC's +selector and labels to the labels in the pod template by default, if they are +not already set. That ensures that the selector will match the template, and +that the RC can be managed using the same labels as the pods it creates. Note +that once we generalize selectors, it won't necessarily be possible to +unambiguously generate labels that match an arbitrary selector. + +If the user wants to apply additional labels to the pods that it doesn't select +upon, such as to facilitate adoption of pods or in the expectation that some +label values will change, they can set the selector to a subset of the pod +labels. Similarly, the RC's labels could be initialized to a subset of the pod +template's labels, or could include additional/different labels. + +For disciplined users managing resources within their own namespaces, it's not +that hard to consistently apply schemas that ensure uniqueness. One just needs +to ensure that at least one value of some label key in common differs compared +to all other comparable resources. We could/should provide a verification tool +to check that. However, development of conventions similar to the examples in +[Labels](../user-guide/labels.md) make uniqueness straightforward. Furthermore, +relatively narrowly used namespaces (e.g., per environment, per application) can +be used to reduce the set of resources that could potentially cause overlap. + +In cases where users could be running misc. examples with inconsistent schemas, +or where tooling or components need to programmatically generate new objects to +be selected, there needs to be a straightforward way to generate unique label +sets. A simple way to ensure uniqueness of the set is to ensure uniqueness of a +single label value, such as by using a resource name, uid, resource hash, or +generation number. + +Problems with uids and hashes, however, include that they have no semantic +meaning to the user, are not memorable nor readily recognizable, and are not +predictable. Lack of predictability obstructs use cases such as creation of a +replication controller from a pod, such as people want to do when exploring the +system, bootstrapping a self-hosted cluster, or deletion and re-creation of a +new RC that adopts the pods of the previous one, such as to rename it. +Generation numbers are more predictable and much clearer, assuming there is a +logical sequence. Fortunately, for deployments that's the case. For jobs, use of +creation timestamps is common internally. Users should always be able to turn +off auto-generation, in order to permit some of the scenarios described above. +Note that auto-generated labels will also become one more field that needs to be +stripped out when cloning a resource, within a namespace, in a new namespace, in +a new cluster, etc., and will need to be ignored around when updating a resource +via patch or read-modify-write sequence. 
+ +Inclusion of a system prefix in a label key is fairly hostile to UX. A prefix is +only necessary in the case that the user cannot choose the label key, in order +to avoid collisions with user-defined labels. However, I firmly believe that the +user should always be allowed to select the label keys to use on their +resources, so it should always be possible to override default label keys. + +Therefore, resources supporting auto-generation of unique labels should have a +`uniqueLabelKey` field, so that the user could specify the key if they wanted +to, but if unspecified, it could be set by default, such as to the resource +type, like job, deployment, or replicationController. The value would need to be +at least spatially unique, and perhaps temporally unique in the case of job. + +Annotations have very different intended usage from labels. We expect them to be +primarily generated and consumed by tooling and system extensions. I'm inclined +to generalize annotations to permit them to directly store arbitrary json. Rigid +names and name prefixes make sense, since they are analogous to API fields. + +In fact, in-development API fields, including those used to represent fields of +newer alpha/beta API versions in the older stable storage version, may be +represented as annotations with the form `something.alpha.kubernetes.io/name` or +`something.beta.kubernetes.io/name` (depending on our confidence in it). For +example `net.alpha.kubernetes.io/policy` might represent an experimental network +policy field. The "name" portion of the annotation should follow the below +conventions for annotations. When an annotation gets promoted to a field, the +name transformation should then be mechanical: `foo-bar` becomes `fooBar`. + +Other advice regarding use of labels, annotations, and other generic map keys by +Kubernetes components and tools: + - Key names should be all lowercase, with words separated by dashes, such as +`desired-replicas` + - Prefix the key with `kubernetes.io/` or `foo.kubernetes.io/`, preferably the +latter if the label/annotation is specific to `foo` + - For instance, prefer `service-account.kubernetes.io/name` over +`kubernetes.io/service-account.name` + - Use annotations to store API extensions that the controller responsible for +the resource doesn't need to know about, experimental fields that aren't +intended to be generally used API fields, etc. Beware that annotations aren't +automatically handled by the API conversion machinery. + + +## WebSockets and SPDY + +Some of the API operations exposed by Kubernetes involve transfer of binary +streams between the client and a container, including attach, exec, portforward, +and logging. The API therefore exposes certain operations over upgradeable HTTP +connections ([described in RFC 2817](https://tools.ietf.org/html/rfc2817)) via +the WebSocket and SPDY protocols. These actions are exposed as subresources with +their associated verbs (exec, log, attach, and portforward) and are requested +via a GET (to support JavaScript in a browser) and POST (semantically accurate). + +There are two primary protocols in use today: + +1. Streamed channels + + When dealing with multiple independent binary streams of data such as the +remote execution of a shell command (writing to STDIN, reading from STDOUT and +STDERR) or forwarding multiple ports the streams can be multiplexed onto a +single TCP connection. 
Kubernetes supports a SPDY based framing protocol that +leverages SPDY channels and a WebSocket framing protocol that multiplexes +multiple channels onto the same stream by prefixing each binary chunk with a +byte indicating its channel. The WebSocket protocol supports an optional +subprotocol that handles base64-encoded bytes from the client and returns +base64-encoded bytes from the server and character based channel prefixes ('0', +'1', '2') for ease of use from JavaScript in a browser. + +2. Streaming response + + The default log output for a channel of streaming data is an HTTP Chunked +Transfer-Encoding, which can return an arbitrary stream of binary data from the +server. Browser-based JavaScript is limited in its ability to access the raw +data from a chunked response, especially when very large amounts of logs are +returned, and in future API calls it may be desirable to transfer large files. +The streaming API endpoints support an optional WebSocket upgrade that provides +a unidirectional channel from the server to the client and chunks data as binary +WebSocket frames. An optional WebSocket subprotocol is exposed that base64 +encodes the stream before returning it to the client. + +Clients should use the SPDY protocols if their clients have native support, or +WebSockets as a fallback. Note that WebSockets is susceptible to Head-of-Line +blocking and so clients must read and process each message sequentially. In +the future, an HTTP/2 implementation will be exposed that deprecates SPDY. + + +## Validation + +API objects are validated upon receipt by the apiserver. Validation errors are +flagged and returned to the caller in a `Failure` status with `reason` set to +`Invalid`. In order to facilitate consistent error messages, we ask that +validation logic adheres to the following guidelines whenever possible (though +exceptional cases will exist). + +* Be as precise as possible. +* Telling users what they CAN do is more useful than telling them what they +CANNOT do. +* When asserting a requirement in the positive, use "must". Examples: "must be +greater than 0", "must match regex '[a-z]+'". Words like "should" imply that +the assertion is optional, and must be avoided. +* When asserting a formatting requirement in the negative, use "must not". +Example: "must not contain '..'". Words like "should not" imply that the +assertion is optional, and must be avoided. +* When asserting a behavioral requirement in the negative, use "may not". +Examples: "may not be specified when otherField is empty", "only `name` may be +specified". +* When referencing a literal string value, indicate the literal in +single-quotes. Example: "must not contain '..'". +* When referencing another field name, indicate the name in back-quotes. +Example: "must be greater than `request`". +* When specifying inequalities, use words rather than symbols. Examples: "must +be less than 256", "must be greater than or equal to 0". Do not use words +like "larger than", "bigger than", "more than", "higher than", etc. +* When specifying numeric ranges, use inclusive ranges when possible. + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/api-conventions.md?pixel)]() + diff --git a/contributors/devel/api_changes.md b/contributors/devel/api_changes.md new file mode 100755 index 00000000000..963deb7ceff --- /dev/null +++ b/contributors/devel/api_changes.md @@ -0,0 +1,732 @@ +*This document is oriented at developers who want to change existing APIs. 
+A set of API conventions, which applies to new APIs and to changes, can be +found at [API Conventions](api-conventions.md). + +**Table of Contents** + + +- [So you want to change the API?](#so-you-want-to-change-the-api) + - [Operational overview](#operational-overview) + - [On compatibility](#on-compatibility) + - [Incompatible API changes](#incompatible-api-changes) + - [Changing versioned APIs](#changing-versioned-apis) + - [Edit types.go](#edit-typesgo) + - [Edit defaults.go](#edit-defaultsgo) + - [Edit conversion.go](#edit-conversiongo) + - [Changing the internal structures](#changing-the-internal-structures) + - [Edit types.go](#edit-typesgo-1) + - [Edit validation.go](#edit-validationgo) + - [Edit version conversions](#edit-version-conversions) + - [Generate protobuf objects](#generate-protobuf-objects) + - [Edit json (un)marshaling code](#edit-json-unmarshaling-code) + - [Making a new API Group](#making-a-new-api-group) + - [Update the fuzzer](#update-the-fuzzer) + - [Update the semantic comparisons](#update-the-semantic-comparisons) + - [Implement your change](#implement-your-change) + - [Write end-to-end tests](#write-end-to-end-tests) + - [Examples and docs](#examples-and-docs) + - [Alpha, Beta, and Stable Versions](#alpha-beta-and-stable-versions) + - [Adding Unstable Features to Stable Versions](#adding-unstable-features-to-stable-versions) + + + +# So you want to change the API? + +Before attempting a change to the API, you should familiarize yourself with a +number of existing API types and with the [API conventions](api-conventions.md). +If creating a new API type/resource, we also recommend that you first send a PR +containing just a proposal for the new API types, and that you initially target +the extensions API (pkg/apis/extensions). + +The Kubernetes API has two major components - the internal structures and +the versioned APIs. The versioned APIs are intended to be stable, while the +internal structures are implemented to best reflect the needs of the Kubernetes +code itself. + +What this means for API changes is that you have to be somewhat thoughtful in +how you approach changes, and that you have to touch a number of pieces to make +a complete change. This document aims to guide you through the process, though +not all API changes will need all of these steps. + +## Operational overview + +It is important to have a high level understanding of the API system used in +Kubernetes in order to navigate the rest of this document. + +As mentioned above, the internal representation of an API object is decoupled +from any one API version. This provides a lot of freedom to evolve the code, +but it requires robust infrastructure to convert between representations. There +are multiple steps in processing an API operation - even something as simple as +a GET involves a great deal of machinery. + +The conversion process is logically a "star" with the internal form at the +center. Every versioned API can be converted to the internal form (and +vice-versa), but versioned APIs do not convert to other versioned APIs directly. +This sounds like a heavy process, but in reality we do not intend to keep more +than a small number of versions alive at once. While all of the Kubernetes code +operates on the internal structures, they are always converted to a versioned +form before being written to storage (disk or etcd) or being sent over a wire. +Clients should consume and operate on the versioned APIs exclusively. 
+ +To demonstrate the general process, here is a (hypothetical) example: + + 1. A user POSTs a `Pod` object to `/api/v7beta1/...` + 2. The JSON is unmarshalled into a `v7beta1.Pod` structure + 3. Default values are applied to the `v7beta1.Pod` + 4. The `v7beta1.Pod` is converted to an `api.Pod` structure + 5. The `api.Pod` is validated, and any errors are returned to the user + 6. The `api.Pod` is converted to a `v6.Pod` (because v6 is the latest stable +version) + 7. The `v6.Pod` is marshalled into JSON and written to etcd + +Now that we have the `Pod` object stored, a user can GET that object in any +supported api version. For example: + + 1. A user GETs the `Pod` from `/api/v5/...` + 2. The JSON is read from etcd and unmarshalled into a `v6.Pod` structure + 3. Default values are applied to the `v6.Pod` + 4. The `v6.Pod` is converted to an `api.Pod` structure + 5. The `api.Pod` is converted to a `v5.Pod` structure + 6. The `v5.Pod` is marshalled into JSON and sent to the user + +The implication of this process is that API changes must be done carefully and +backward-compatibly. + +## On compatibility + +Before talking about how to make API changes, it is worthwhile to clarify what +we mean by API compatibility. Kubernetes considers forwards and backwards +compatibility of its APIs a top priority. + +An API change is considered forward and backward-compatible if it: + + * adds new functionality that is not required for correct behavior (e.g., +does not add a new required field) + * does not change existing semantics, including: + * default values and behavior + * interpretation of existing API types, fields, and values + * which fields are required and which are not + +Put another way: + +1. Any API call (e.g. a structure POSTed to a REST endpoint) that worked before +your change must work the same after your change. +2. Any API call that uses your change must not cause problems (e.g. crash or +degrade behavior) when issued against servers that do not include your change. +3. It must be possible to round-trip your change (convert to different API +versions and back) with no loss of information. +4. Existing clients need not be aware of your change in order for them to +continue to function as they did previously, even when your change is utilized. + +If your change does not meet these criteria, it is not considered strictly +compatible, and may break older clients, or result in newer clients causing +undefined behavior. + +Let's consider some examples. In a hypothetical API (assume we're at version +v6), the `Frobber` struct looks something like this: + +```go +// API v6. +type Frobber struct { + Height int `json:"height"` + Param string `json:"param"` +} +``` + +You want to add a new `Width` field. It is generally safe to add new fields +without changing the API version, so you can simply change it to: + +```go +// Still API v6. +type Frobber struct { + Height int `json:"height"` + Width int `json:"width"` + Param string `json:"param"` +} +``` + +The onus is on you to define a sane default value for `Width` such that rule #1 +above is true - API calls and stored objects that used to work must continue to +work. + +For your next change you want to allow multiple `Param` values. You can not +simply change `Param string` to `Params []string` (without creating a whole new +API version) - that fails rules #1 and #2. You can instead do something like: + +```go +// Still API v6, but kind of clumsy. 
+type Frobber struct { + Height int `json:"height"` + Width int `json:"width"` + Param string `json:"param"` // the first param + ExtraParams []string `json:"extraParams"` // additional params +} +``` + +Now you can satisfy the rules: API calls that provide the old style `Param` +will still work, while servers that don't understand `ExtraParams` can ignore +it. This is somewhat unsatisfying as an API, but it is strictly compatible. + +Part of the reason for versioning APIs and for using internal structs that are +distinct from any one version is to handle growth like this. The internal +representation can be implemented as: + +```go +// Internal, soon to be v7beta1. +type Frobber struct { + Height int + Width int + Params []string +} +``` + +The code that converts to/from versioned APIs can decode this into the somewhat +uglier (but compatible!) structures. Eventually, a new API version, let's call +it v7beta1, will be forked and it can use the clean internal structure. + +We've seen how to satisfy rules #1 and #2. Rule #3 means that you can not +extend one versioned API without also extending the others. For example, an +API call might POST an object in API v7beta1 format, which uses the cleaner +`Params` field, but the API server might store that object in trusty old v6 +form (since v7beta1 is "beta"). When the user reads the object back in the +v7beta1 API it would be unacceptable to have lost all but `Params[0]`. This +means that, even though it is ugly, a compatible change must be made to the v6 +API. + +However, this is very challenging to do correctly. It often requires multiple +representations of the same information in the same API resource, which need to +be kept in sync in the event that either is changed. For example, let's say you +decide to rename a field within the same API version. In this case, you add +units to `height` and `width`. You implement this by adding duplicate fields: + +```go +type Frobber struct { + Height *int `json:"height"` + Width *int `json:"width"` + HeightInInches *int `json:"heightInInches"` + WidthInInches *int `json:"widthInInches"` +} +``` + +You convert all of the fields to pointers in order to distinguish between unset +and set to 0, and then set each corresponding field from the other in the +defaulting pass (e.g., `heightInInches` from `height`, and vice versa), which +runs just prior to conversion. That works fine when the user creates a resource +from a hand-written configuration -- clients can write either field and read +either field, but what about creation or update from the output of GET, or +update via PATCH (see +[In-place updates](../user-guide/managing-deployments.md#in-place-updates-of-resources))? +In this case, the two fields will conflict, because only one field would be +updated in the case of an old client that was only aware of the old field (e.g., +`height`). + +Say the client creates: + +```json +{ + "height": 10, + "width": 5 +} +``` + +and GETs: + +```json +{ + "height": 10, + "heightInInches": 10, + "width": 5, + "widthInInches": 5 +} +``` + +then PUTs back: + +```json +{ + "height": 13, + "heightInInches": 10, + "width": 5, + "widthInInches": 5 +} +``` + +The update should not fail, because it would have worked before `heightInInches` +was added. + +Therefore, when there are duplicate fields, the old field MUST take precedence +over the new, and the new field should be set to match by the server upon write. 
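+
+To make the rule concrete, the defaulting pass for the duplicated fields above
+might look roughly like the following sketch (hypothetical code, not taken from
+the Kubernetes source; it assumes the pointer-based `Frobber` shown above):
+
+```go
+// syncFrobberHeight keeps the duplicated height fields consistent.
+// If only one of the pair is set, the other is filled in from it; if both
+// are set, the old field (`height`) wins, per the rule above.
+func syncFrobberHeight(f *Frobber) {
+	switch {
+	case f.Height == nil && f.HeightInInches != nil:
+		v := *f.HeightInInches
+		f.Height = &v
+	case f.Height != nil && f.HeightInInches == nil:
+		v := *f.Height
+		f.HeightInInches = &v
+	case f.Height != nil && f.HeightInInches != nil:
+		*f.HeightInInches = *f.Height // old field takes precedence
+	}
+}
+```
+
+The same synchronization would apply to `width`/`widthInInches`, and it runs in
+the defaulting step just prior to conversion, as described above.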
+A new client would be aware of the old field as well as the new, and so can +ensure that the old field is either unset or is set consistently with the new +field. However, older clients would be unaware of the new field. Please avoid +introducing duplicate fields due to the complexity they incur in the API. + +A new representation, even in a new API version, that is more expressive than an +old one breaks backward compatibility, since clients that only understood the +old representation would not be aware of the new representation nor its +semantics. Examples of proposals that have run into this challenge include +[generalized label selectors](http://issues.k8s.io/341) and [pod-level security +context](http://prs.k8s.io/12823). + +As another interesting example, enumerated values cause similar challenges. +Adding a new value to an enumerated set is *not* a compatible change. Clients +which assume they know how to handle all possible values of a given field will +not be able to handle the new values. However, removing value from an enumerated +set *can* be a compatible change, if handled properly (treat the removed value +as deprecated but allowed). This is actually a special case of a new +representation, discussed above. + +For [Unions](api-conventions.md#unions), sets of fields where at most one should +be set, it is acceptable to add a new option to the union if the [appropriate +conventions](api-conventions.md#objects) were followed in the original object. +Removing an option requires following the deprecation process. + +## Incompatible API changes + +There are times when this might be OK, but mostly we want changes that meet this +definition. If you think you need to break compatibility, you should talk to the +Kubernetes team first. + +Breaking compatibility of a beta or stable API version, such as v1, is +unacceptable. Compatibility for experimental or alpha APIs is not strictly +required, but breaking compatibility should not be done lightly, as it disrupts +all users of the feature. Experimental APIs may be removed. Alpha and beta API +versions may be deprecated and eventually removed wholesale, as described in the +[versioning document](../design/versioning.md). Document incompatible changes +across API versions under the appropriate +[{v? conversion tips tag in the api.md doc](../api.md). + +If your change is going to be backward incompatible or might be a breaking +change for API consumers, please send an announcement to +`kubernetes-dev@googlegroups.com` before the change gets in. If you are unsure, +ask. Also make sure that the change gets documented in the release notes for the +next release by labeling the PR with the "release-note" github label. + +If you found that your change accidentally broke clients, it should be reverted. + +In short, the expected API evolution is as follows: + +* `extensions/v1alpha1` -> +* `newapigroup/v1alpha1` -> ... -> `newapigroup/v1alphaN` -> +* `newapigroup/v1beta1` -> ... -> `newapigroup/v1betaN` -> +* `newapigroup/v1` -> +* `newapigroup/v2alpha1` -> ... + +While in extensions we have no obligation to move forward with the API at all +and may delete or break it at any time. + +While in alpha we expect to move forward with it, but may break it. + +Once in beta we will preserve forward compatibility, but may introduce new +versions and delete old ones. + +v1 must be backward-compatible for an extended length of time. + +## Changing versioned APIs + +For most changes, you will probably find it easiest to change the versioned +APIs first. 
This forces you to think about how to make your change in a
+compatible way. Rather than doing each step in every version, it's usually
+easier to do each versioned API one at a time, or to do all of one version
+before starting "all the rest".
+
+### Edit types.go
+
+The struct definitions for each API are in `pkg/api/<version>/types.go`. Edit
+those files to reflect the change you want to make. Note that all types and
+non-inline fields in versioned APIs must be preceded by descriptive comments -
+these are used to generate documentation. Comments for types should not contain
+the type name; API documentation is generated from these comments and end-users
+should not be exposed to golang type names.
+
+Optional fields should have the `,omitempty` json tag; fields are interpreted as
+being required otherwise.
+
+### Edit defaults.go
+
+If your change includes new fields for which you will need default values, you
+need to add cases to `pkg/api/<version>/defaults.go`. Of course, since you
+have added code, you have to add a test: `pkg/api/<version>/defaults_test.go`.
+
+Do use pointers to scalars when you need to distinguish between an unset value
+and an automatic zero value. For example,
+`PodSpec.TerminationGracePeriodSeconds` is defined as `*int64` in the Go type
+definition. A zero value means 0 seconds, and a nil value asks the system to
+pick a default.
+
+Don't forget to run the tests!
+
+### Edit conversion.go
+
+Given that you have not yet changed the internal structs, this might feel
+premature, and that's because it is. You don't yet have anything to convert to
+or from. We will revisit this in the "internal" section. If you're doing this
+all in a different order (i.e. you started with the internal structs), then you
+should jump to that topic below. In the very rare case that you are making an
+incompatible change you might or might not want to do this now, but you will
+have to do more later. The files you want are
+`pkg/api/<version>/conversion.go` and `pkg/api/<version>/conversion_test.go`.
+
+Note that the conversion machinery doesn't generically handle conversion of
+values, such as various kinds of field references and API constants. [The client
+library](../../pkg/client/restclient/request.go) has custom conversion code for
+field references. You also need to add a call to
+`api.Scheme.AddFieldLabelConversionFunc` with a mapping function that understands
+supported translations.
+
+## Changing the internal structures
+
+Now it is time to change the internal structs so your versioned changes can be
+used.
+
+### Edit types.go
+
+Similar to the versioned APIs, the definitions for the internal structs are in
+`pkg/api/types.go`. Edit those files to reflect the change you want to make.
+Keep in mind that the internal structs must be able to express *all* of the
+versioned APIs.
+
+## Edit validation.go
+
+Most changes made to the internal structs need some form of input validation.
+Validation is currently done on internal objects in
+`pkg/api/validation/validation.go`. This validation is one of the first
+opportunities we have to make a great user experience - good error messages and
+thorough validation help ensure that users are giving you what you expect and,
+when they don't, that they know why and how to fix it. Think hard about the
+contents of `string` fields, the bounds of `int` fields, and the
+requiredness/optionalness of fields.
+
+Of course, code needs tests - `pkg/api/validation/validation_test.go`.
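+
+For illustration, a validator for the hypothetical `Frobber` type used in the
+compatibility examples above might look roughly like the following sketch (it
+assumes the `field` error helpers used by validation.go; the names and messages
+are illustrative, not actual Kubernetes source):
+
+```go
+// ValidateFrobber checks a Frobber and returns a list of precise, actionable
+// errors, following the message conventions in api-conventions.md
+// (e.g. "must be greater than 0").
+func ValidateFrobber(f *api.Frobber, fldPath *field.Path) field.ErrorList {
+	allErrs := field.ErrorList{}
+	if f.Height <= 0 {
+		allErrs = append(allErrs, field.Invalid(fldPath.Child("height"), f.Height, "must be greater than 0"))
+	}
+	if len(f.Param) == 0 {
+		allErrs = append(allErrs, field.Required(fldPath.Child("param"), ""))
+	}
+	return allErrs
+}
+```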
+
+## Edit version conversions
+
+At this point you have both the versioned API changes and the internal
+structure changes done. If there are any notable differences - field names,
+types, structural change in particular - you must add some logic to convert
+versioned APIs to and from the internal representation. If you see errors from
+the `serialization_test`, it may indicate the need for explicit conversions.
+
+The performance of conversions very heavily influences the performance of the
+apiserver. Thus, we auto-generate conversion functions that are much more
+efficient than the generic ones (which are based on reflection and are therefore
+highly inefficient).
+
+The conversion code resides with each versioned API. There are two files per
+versioned API:
+
+  - `pkg/api/<version>/conversion.go` containing manually written conversion
+functions
+  - `pkg/api/<version>/conversion_generated.go` containing auto-generated
+conversion functions
+  - `pkg/apis/extensions/<version>/conversion.go` containing manually written
+conversion functions
+  - `pkg/apis/extensions/<version>/conversion_generated.go` containing
+auto-generated conversion functions
+
+Since the auto-generated conversion functions call the manually written ones,
+the manually written functions must follow a defined naming convention: a
+function converting type `X` in package `a` to type `Y` in package `b` should
+be named `convert_a_X_To_b_Y`.
+
+Also note that you can (and for efficiency reasons should) use auto-generated
+conversion functions when writing your own conversion functions.
+
+Once all the necessary manually written conversions are added, you need to
+regenerate the auto-generated ones. To regenerate them run:
+
+```sh
+hack/update-codegen.sh
+```
+
+As part of the build, kubernetes will also generate code to handle deep copy of
+your versioned api objects. The deep copy code resides with each versioned API:
+ - `<version>/zz_generated.deepcopy.go` containing auto-generated copy functions
+
+If regeneration is somehow not possible due to compile errors, the easiest
+workaround is to comment out the code causing errors and let the script
+regenerate it. If the auto-generated conversion methods are not used by the
+manually written ones, it's fine to just remove the whole file and let the
+generator create it from scratch.
+
+Unsurprisingly, adding manually written conversions also requires you to add
+tests to `pkg/api/<version>/conversion_test.go`.
+
+
+## Generate protobuf objects
+
+For any core API object, we also need to generate the Protobuf IDL and marshallers.
+That generation is done with:
+
+```sh
+hack/update-generated-protobuf.sh
+```
+
+The vast majority of objects will not need any consideration when converting
+to protobuf, but be aware that if you depend on a Golang type in the standard
+library there may be additional work required, although in practice we typically
+use our own equivalents for JSON serialization. The `pkg/api/serialization_test.go`
+will verify that your protobuf serialization preserves all fields - be sure to
+run it several times to ensure there are no incompletely calculated fields.
+
+## Edit json (un)marshaling code
+
+We auto-generate code for marshaling and unmarshaling the JSON representation of
+API objects in order to improve overall system performance.
+ +The auto-generated code resides with each versioned API: + + - `pkg/api//types.generated.go` + - `pkg/apis/extensions//types.generated.go` + +To regenerate them run: + +```sh +hack/update-codecgen.sh +``` + +## Making a new API Group + +This section is under construction, as we make the tooling completely generic. + +At the moment, you'll have to make a new directory under `pkg/apis/`; copy the +directory structure from `pkg/apis/authentication`. Add the new group/version to all +of the `hack/{verify,update}-generated-{deep-copy,conversions,swagger}.sh` files +in the appropriate places--it should just require adding your new group/version +to a bash array. See [docs on adding an API group](adding-an-APIGroup.md) for +more. + +Adding API groups outside of the `pkg/apis/` directory is not currently +supported, but is clearly desirable. The deep copy & conversion generators need +to work by parsing go files instead of by reflection; then they will be easy to +point at arbitrary directories: see issue [#13775](http://issue.k8s.io/13775). + +## Update the fuzzer + +Part of our testing regimen for APIs is to "fuzz" (fill with random values) API +objects and then convert them to and from the different API versions. This is +a great way of exposing places where you lost information or made bad +assumptions. If you have added any fields which need very careful formatting +(the test does not run validation) or if you have made assumptions such as +"this slice will always have at least 1 element", you may get an error or even +a panic from the `serialization_test`. If so, look at the diff it produces (or +the backtrace in case of a panic) and figure out what you forgot. Encode that +into the fuzzer's custom fuzz functions. Hint: if you added defaults for a +field, that field will need to have a custom fuzz function that ensures that the +field is fuzzed to a non-empty value. + +The fuzzer can be found in `pkg/api/testing/fuzzer.go`. + +## Update the semantic comparisons + +VERY VERY rarely is this needed, but when it hits, it hurts. In some rare cases +we end up with objects (e.g. resource quantities) that have morally equivalent +values with different bitwise representations (e.g. value 10 with a base-2 +formatter is the same as value 0 with a base-10 formatter). The only way Go +knows how to do deep-equality is through field-by-field bitwise comparisons. +This is a problem for us. + +The first thing you should do is try not to do that. If you really can't avoid +this, I'd like to introduce you to our `semantic DeepEqual` routine. It supports +custom overrides for specific types - you can find that in `pkg/api/helpers.go`. + +There's one other time when you might have to touch this: `unexported fields`. +You see, while Go's `reflect` package is allowed to touch `unexported fields`, +us mere mortals are not - this includes `semantic DeepEqual`. Fortunately, most +of our API objects are "dumb structs" all the way down - all fields are exported +(start with a capital letter) and there are no unexported fields. But sometimes +you want to include an object in our API that does have unexported fields +somewhere in it (for example, `time.Time` has unexported fields). If this hits +you, you may have to touch the `semantic DeepEqual` customization functions. + +## Implement your change + +Now you have the API all changed - go implement whatever it is that you're +doing! 
+ +## Write end-to-end tests + +Check out the [E2E docs](e2e-tests.md) for detailed information about how to +write end-to-end tests for your feature. + +## Examples and docs + +At last, your change is done, all unit tests pass, e2e passes, you're done, +right? Actually, no. You just changed the API. If you are touching an existing +facet of the API, you have to try *really* hard to make sure that *all* the +examples and docs are updated. There's no easy way to do this, due in part to +JSON and YAML silently dropping unknown fields. You're clever - you'll figure it +out. Put `grep` or `ack` to good use. + +If you added functionality, you should consider documenting it and/or writing +an example to illustrate your change. + +Make sure you update the swagger and OpenAPI spec by running: + +```sh +hack/update-swagger-spec.sh +hack/update-openapi-spec.sh +``` + +The API spec changes should be in a commit separate from your other changes. + +## Alpha, Beta, and Stable Versions + +New feature development proceeds through a series of stages of increasing +maturity: + +- Development level + - Object Versioning: no convention + - Availability: not committed to main kubernetes repo, and thus not available +in official releases + - Audience: other developers closely collaborating on a feature or +proof-of-concept + - Upgradeability, Reliability, Completeness, and Support: no requirements or +guarantees +- Alpha level + - Object Versioning: API version name contains `alpha` (e.g. `v1alpha1`) + - Availability: committed to main kubernetes repo; appears in an official +release; feature is disabled by default, but may be enabled by flag + - Audience: developers and expert users interested in giving early feedback on +features + - Completeness: some API operations, CLI commands, or UI support may not be +implemented; the API need not have had an *API review* (an intensive and +targeted review of the API, on top of a normal code review) + - Upgradeability: the object schema and semantics may change in a later +software release, without any provision for preserving objects in an existing +cluster; removing the upgradability concern allows developers to make rapid +progress; in particular, API versions can increment faster than the minor +release cadence and the developer need not maintain multiple versions; +developers should still increment the API version when object schema or +semantics change in an [incompatible way](#on-compatibility) + - Cluster Reliability: because the feature is relatively new, and may lack +complete end-to-end tests, enabling the feature via a flag might expose bugs +with destabilize the cluster (e.g. a bug in a control loop might rapidly create +excessive numbers of object, exhausting API storage). + - Support: there is *no commitment* from the project to complete the feature; +the feature may be dropped entirely in a later software release + - Recommended Use Cases: only in short-lived testing clusters, due to +complexity of upgradeability and lack of long-term support and lack of +upgradability. +- Beta level: + - Object Versioning: API version name contains `beta` (e.g. 
`v2beta3`)
+  - Availability: in official Kubernetes releases, and enabled by default
+  - Audience: users interested in providing feedback on features
+  - Completeness: all API operations, CLI commands, and UI support should be
+implemented; end-to-end tests complete; the API has had a thorough API review
+and is thought to be complete, though use during beta may frequently turn up API
+issues not thought of during review
+  - Upgradeability: the object schema and semantics may change in a later
+software release; when this happens, an upgrade path will be documented; in some
+cases, objects will be automatically converted to the new version; in other
+cases, a manual upgrade may be necessary; a manual upgrade may require downtime
+for anything relying on the new feature, and may require manual conversion of
+objects to the new version; when manual conversion is necessary, the project
+will provide documentation on the process (for an example, see [v1 conversion
+tips](../api.md#v1-conversion-tips))
+  - Cluster Reliability: since the feature has e2e tests, enabling the feature
+via a flag should not create new bugs in unrelated features; because the feature
+is new, it may have minor bugs
+  - Support: the project commits to complete the feature, in some form, in a
+subsequent Stable version; typically this will happen within 3 months, but
+sometimes longer; releases should simultaneously support two consecutive
+versions (e.g. `v1beta1` and `v1beta2`; or `v1beta2` and `v1`) for at least one
+minor release cycle (typically 3 months) so that users have enough time to
+upgrade and migrate objects
+  - Recommended Use Cases: in short-lived testing clusters; in production
+clusters as part of a short-lived evaluation of the feature in order to provide
+feedback
+- Stable level:
+  - Object Versioning: API version `vX` where `X` is an integer (e.g. `v1`)
+  - Availability: in official Kubernetes releases, and enabled by default
+  - Audience: all users
+  - Completeness: same as beta
+  - Upgradeability: only [strictly compatible](#on-compatibility) changes
+allowed in subsequent software releases
+  - Cluster Reliability: high
+  - Support: the API version will continue to be present for many subsequent
+software releases
+  - Recommended Use Cases: any
+
+### Adding Unstable Features to Stable Versions
+
+When adding a feature to an object which is already Stable, the new fields and
+new behaviors need to meet the Stable level requirements. If these cannot be
+met, then the new field cannot be added to the object.
+
+For example, consider the following object:
+
+```go
+// API v6.
+type Frobber struct {
+  Height int `json:"height"`
+  Param string `json:"param"`
+}
+```
+
+A developer is considering adding a new `Width` parameter, like this:
+
+```go
+// API v6.
+type Frobber struct {
+  Height int `json:"height"`
+  Width int `json:"width"`
+  Param string `json:"param"`
+}
+```
+
+However, the new feature is not stable enough to be used in a stable version
+(`v6`). Some reasons for this might include:
+
+- the final representation is undecided (e.g. should it be called `Width` or
+`Breadth`?)
+- the implementation is not stable enough for general use (e.g. the `Area()`
+routine sometimes overflows).
+
+The developer cannot add the new field until stability is met. However,
+sometimes stability cannot be met until some users try the new feature, and some
+users are only able or willing to accept a released version of Kubernetes.
+In that case, the developer has a few options, both of which require staging
+work over several releases.
+
+
+A preferred option is to first make a release where the new value (`Width` in
+this example) is specified via an annotation, like this:
+
+```yaml
+kind: frobber
+version: v6
+metadata:
+  name: myfrobber
+  annotations:
+    frobbing.alpha.kubernetes.io/width: 2
+height: 4
+param: "green and blue"
+```
+
+This format allows users to specify the new field, but makes it clear that they
+are using an Alpha feature when they do, since the word `alpha` is in the
+annotation key.
+
+Another option is to introduce a new type with a new `alpha` or `beta` version
+designator, like this:
+
+```go
+// API v6alpha2
+type Frobber struct {
+  Height int `json:"height"`
+  Width int `json:"width"`
+  Param string `json:"param"`
+}
+```
+
+The latter requires that all objects in the same API group as `Frobber` be
+replicated in the new version, `v6alpha2`. It also requires users to switch to a
+new client that uses the other version. Therefore, this is not a preferred
+option.
+
+A related issue is how a cluster manager can roll back from a new version
+with a new feature that is already being used by users. See
+https://github.com/kubernetes/kubernetes/issues/4855.
+
+
+[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/api_changes.md?pixel)]()
+
diff --git a/contributors/devel/automation.md b/contributors/devel/automation.md
new file mode 100644
index 00000000000..3a9f17547f2
--- /dev/null
+++ b/contributors/devel/automation.md
@@ -0,0 +1,116 @@
+# Kubernetes Development Automation
+
+## Overview
+
+Kubernetes uses a variety of automated tools in an attempt to relieve developers
+of repetitive, low brain power work. This document attempts to describe these
+processes.
+
+
+## Submit Queue
+
+In an effort to
+ * reduce load on core developers
+ * maintain e2e stability
+ * load test github's label feature
+
+we have added an automated
+[submit-queue](https://github.com/kubernetes/contrib/blob/master/mungegithub/mungers/submit-queue.go)
+to the
+[github "munger"](https://github.com/kubernetes/contrib/tree/master/mungegithub)
+for kubernetes.
+
+The submit-queue does the following:
+
+```go
+for _, pr := range readyToMergePRs() {
+    if testsAreStable() {
+        if retestPR(pr) == success {
+            mergePR(pr)
+        }
+    }
+}
+```
+
+The status of the submit-queue is [online](http://submit-queue.k8s.io/).
+
+### Ready to merge status
+
+The submit-queue lists what it believes are the merge requirements on the
+[merge requirements tab](http://submit-queue.k8s.io/#/info) of the info page,
+which may be more up to date.
+
+A PR is considered "ready for merging" if it matches the following:
+ * The PR must have the label "cla: yes" or "cla: human-approved"
+ * The PR must be mergeable, i.e. it must not need a rebase
+ * All of the following github statuses must be green
+   * Jenkins GCE Node e2e
+   * Jenkins GCE e2e
+   * Jenkins unit/integration
+ * The PR cannot have any prohibited future milestones (such as a v1.5 milestone during v1.4 code freeze)
+ * The PR must have the "lgtm" label. The "lgtm" label is automatically applied
+   following a review comment consisting of only "LGTM" (case-insensitive)
+ * The PR must not have been updated since the "lgtm" label was applied
+ * The PR must not have the "do-not-merge" label
+
+### Merge process
+
+Merges _only_ occur when the [critical builds](http://submit-queue.k8s.io/#/e2e)
+are passing. We're open to including more builds here, let us know...
+ +Merges are serialized, so only a single PR is merged at a time, to ensure +against races. + +If the PR has the `retest-not-required` label, it is simply merged. If the PR does +not have this label the e2e, unit/integration, and node tests are re-run. If these +tests pass a second time, the PR will be merged as long as the `critical builds` are +green when this PR finishes retesting. + +## Github Munger + +We run [github "mungers"](https://github.com/kubernetes/contrib/tree/master/mungegithub). + +This runs repeatedly over github pulls and issues and runs modular "mungers" +similar to "mungedocs." The mungers include the 'submit-queue' referenced above along +with numerous other functions. See the README in the link above. + +Please feel free to unleash your creativity on this tool, send us new mungers +that you think will help support the Kubernetes development process. + +### Closing stale pull-requests + +Github Munger will close pull-requests that don't have human activity in the +last 90 days. It will warn about this process 60 days before closing the +pull-request, and warn again 30 days later. One way to prevent this from +happening is to add the "keep-open" label on the pull-request. + +Feel free to re-open and maybe add the "keep-open" label if this happens to a +valid pull-request. It may also be a good opportunity to get more attention by +verifying that it is properly assigned and/or mention people that might be +interested. Commenting on the pull-request will also keep it open for another 90 +days. + +## PR builder + +We also run a robotic PR builder that attempts to run tests for each PR. + +Before a PR from an unknown user is run, the PR builder bot (`k8s-bot`) asks to +a message from a contributor that a PR is "ok to test", the contributor replies +with that message. ("please" is optional, but remember to treat your robots with +kindness...) + +## FAQ: + +#### How can I ask my PR to be tested again for Jenkins failures? + +PRs should only need to be manually re-tested if you believe there was a flake +during the original test. All flakes should be filed as an +[issue](https://github.com/kubernetes/kubernetes/issues?q=is%3Aopen+is%3Aissue+label%3Akind%2Fflake). +Once you find or file a flake a contributer (this may be you!) should request +a retest with "@k8s-bot test this issue: #NNNNN", where NNNNN is replaced with +the issue number you found or filed. + +Any pushes of new code to the PR will automatically trigger a new test. No human +interraction is required. + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/automation.md?pixel)]() + diff --git a/contributors/devel/bazel.md b/contributors/devel/bazel.md new file mode 100644 index 00000000000..e6a4e9c5e2e --- /dev/null +++ b/contributors/devel/bazel.md @@ -0,0 +1,44 @@ +# Build with Bazel + +Building with bazel is currently experimental. Automanaged BUILD rules have the +tag "automanaged" and are maintained by +[gazel](https://github.com/mikedanese/gazel). Instructions for installing bazel +can be found [here](https://www.bazel.io/versions/master/docs/install.html). + +To build docker images for the components, run: + +``` +$ bazel build //build-tools/... +``` + +To run many of the unit tests, run: + +``` +$ bazel test //cmd/... //build-tools/... //pkg/... //federation/... //plugin/... 
+``` + +To update automanaged build files, run: + +``` +$ ./hack/update-bazel.sh +``` + +**NOTES**: `update-bazel.sh` only works if check out directory of Kubernetes is "$GOPATH/src/k8s.io/kubernetes". + +To update a single build file, run: + +``` +$ # get gazel +$ go get -u github.com/mikedanese/gazel +$ # .e.g. ./pkg/kubectl/BUILD +$ gazel -root="${YOUR_KUBE_ROOT_PATH}" ./pkg/kubectl +``` + +Updating BUILD file for a package will be required when: +* Files are added to or removed from a package +* Import dependencies change for a package + + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/bazel.md?pixel)]() + diff --git a/contributors/devel/cherry-picks.md b/contributors/devel/cherry-picks.md new file mode 100644 index 00000000000..ad8df62d57a --- /dev/null +++ b/contributors/devel/cherry-picks.md @@ -0,0 +1,64 @@ +# Overview + +This document explains cherry picks are managed on release branches within the +Kubernetes projects. Patches are either applied in batches or individually +depending on the point in the release cycle. + +## Propose a Cherry Pick + +1. Cherrypicks are [managed with labels and milestones] +(pull-requests.md#release-notes) +1. To get a PR merged to the release branch, first ensure the following labels + are on the original **master** branch PR: + * An appropriate milestone (e.g. v1.3) + * The `cherrypick-candidate` label +1. If `release-note-none` is set on the master PR, the cherrypick PR will need + to set the same label to confirm that no release note is needed. +1. `release-note` labeled PRs generate a release note using the PR title by + default OR the release-note block in the PR template if filled in. + * See the [PR template](../../.github/PULL_REQUEST_TEMPLATE.md) for more + details. + * PR titles and body comments are mutable and can be modified at any time + prior to the release to reflect a release note friendly message. + +### How do cherrypick-candidates make it to the release branch? + +1. **BATCHING:** After a branch is first created and before the X.Y.0 release + * Branch owners review the list of `cherrypick-candidate` labeled PRs. + * PRs batched up and merged to the release branch get a `cherrypick-approved` +label and lose the `cherrypick-candidate` label. + * PRs that won't be merged to the release branch, lose the +`cherrypick-candidate` label. + +1. **INDIVIDUAL CHERRYPICKS:** After the first X.Y.0 on a branch + * Run the cherry pick script. This example applies a master branch PR #98765 +to the remote branch `upstream/release-3.14`: +`hack/cherry_pick_pull.sh upstream/release-3.14 98765` + * Your cherrypick PR (targeted to the branch) will immediately get the +`do-not-merge` label. The branch owner will triage PRs targeted to +the branch and label the ones to be merged by applying the `lgtm` +label. + +There is an [issue](https://github.com/kubernetes/kubernetes/issues/23347) open +tracking the tool to automate the batching procedure. + +## Cherry Pick Review + +Cherry pick pull requests are reviewed differently than normal pull requests. In +particular, they may be self-merged by the release branch owner without fanfare, +in the case the release branch owner knows the cherry pick was already +requested - this should not be the norm, but it may happen. + +## Searching for Cherry Picks + +See the [cherrypick queue dashboard](http://cherrypick.k8s.io/#/queue) for +status of PRs labeled as `cherrypick-candidate`. 
+ +[Contributor License Agreements](http://releases.k8s.io/HEAD/CONTRIBUTING.md) is +considered implicit for all code within cherry-pick pull requests, ***unless +there is a large conflict***. + + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/cherry-picks.md?pixel)]() + diff --git a/contributors/devel/cli-roadmap.md b/contributors/devel/cli-roadmap.md new file mode 100644 index 00000000000..cd21da08d5b --- /dev/null +++ b/contributors/devel/cli-roadmap.md @@ -0,0 +1,11 @@ +# Kubernetes CLI/Configuration Roadmap + +See github issues with the following labels: +* [area/app-config-deployment](https://github.com/kubernetes/kubernetes/labels/area/app-config-deployment) +* [component/kubectl](https://github.com/kubernetes/kubernetes/labels/component/kubectl) +* [component/clientlib](https://github.com/kubernetes/kubernetes/labels/component/clientlib) + + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/cli-roadmap.md?pixel)]() + diff --git a/contributors/devel/client-libraries.md b/contributors/devel/client-libraries.md new file mode 100644 index 00000000000..d38f9fd7f28 --- /dev/null +++ b/contributors/devel/client-libraries.md @@ -0,0 +1,27 @@ +## Kubernetes API client libraries + +### Supported + + * [Go](https://github.com/kubernetes/client-go) + +### User Contributed + +*Note: Libraries provided by outside parties are supported by their authors, not +the core Kubernetes team* + + * [Clojure](https://github.com/yanatan16/clj-kubernetes-api) + * [Java (OSGi)](https://bitbucket.org/amdatulabs/amdatu-kubernetes) + * [Java (Fabric8, OSGi)](https://github.com/fabric8io/kubernetes-client) + * [Node.js](https://github.com/tenxcloud/node-kubernetes-client) + * [Node.js](https://github.com/godaddy/kubernetes-client) + * [Perl](https://metacpan.org/pod/Net::Kubernetes) + * [PHP](https://github.com/devstub/kubernetes-api-php-client) + * [PHP](https://github.com/maclof/kubernetes-client) + * [Python](https://github.com/eldarion-gondor/pykube) + * [Ruby](https://github.com/Ch00k/kuber) + * [Ruby](https://github.com/abonas/kubeclient) + * [Scala](https://github.com/doriordan/skuber) + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/client-libraries.md?pixel)]() + diff --git a/contributors/devel/coding-conventions.md b/contributors/devel/coding-conventions.md new file mode 100644 index 00000000000..bcfab41df4e --- /dev/null +++ b/contributors/devel/coding-conventions.md @@ -0,0 +1,147 @@ +# Coding Conventions + +Updated: 5/3/2016 + +**Table of Contents** + + +- [Coding Conventions](#coding-conventions) + - [Code conventions](#code-conventions) + - [Testing conventions](#testing-conventions) + - [Directory and file conventions](#directory-and-file-conventions) + - [Coding advice](#coding-advice) + + + +## Code conventions + + - Bash + + - https://google.github.io/styleguide/shell.xml + + - Ensure that build, release, test, and cluster-management scripts run on +OS X + + - Go + + - Ensure your code passes the [presubmit checks](development.md#hooks) + + - [Go Code Review +Comments](https://github.com/golang/go/wiki/CodeReviewComments) + + - [Effective Go](https://golang.org/doc/effective_go.html) + + - Comment your code. + - [Go's commenting +conventions](http://blog.golang.org/godoc-documenting-go-code) + - If reviewers ask questions about why the code is the way it is, that's a +sign that comments might be helpful. 
+ + + - Command-line flags should use dashes, not underscores + + + - Naming + - Please consider package name when selecting an interface name, and avoid +redundancy. + + - e.g.: `storage.Interface` is better than `storage.StorageInterface`. + + - Do not use uppercase characters, underscores, or dashes in package +names. + - Please consider parent directory name when choosing a package name. + + - so pkg/controllers/autoscaler/foo.go should say `package autoscaler` +not `package autoscalercontroller`. + - Unless there's a good reason, the `package foo` line should match +the name of the directory in which the .go file exists. + - Importers can use a different name if they need to disambiguate. + + - Locks should be called `lock` and should never be embedded (always `lock +sync.Mutex`). When multiple locks are present, give each lock a distinct name +following Go conventions - `stateLock`, `mapLock` etc. + + - [API changes](api_changes.md) + + - [API conventions](api-conventions.md) + + - [Kubectl conventions](kubectl-conventions.md) + + - [Logging conventions](logging.md) + +## Testing conventions + + - All new packages and most new significant functionality must come with unit +tests + + - Table-driven tests are preferred for testing multiple scenarios/inputs; for +example, see [TestNamespaceAuthorization](../../test/integration/auth/auth_test.go) + + - Significant features should come with integration (test/integration) and/or +[end-to-end (test/e2e) tests](e2e-tests.md) + - Including new kubectl commands and major features of existing commands + + - Unit tests must pass on OS X and Windows platforms - if you use Linux +specific features, your test case must either be skipped on windows or compiled +out (skipped is better when running Linux specific commands, compiled out is +required when your code does not compile on Windows). + + - Avoid relying on Docker hub (e.g. pull from Docker hub). Use gcr.io instead. + + - Avoid waiting for a short amount of time (or without waiting) and expect an +asynchronous thing to happen (e.g. wait for 1 seconds and expect a Pod to be +running). Wait and retry instead. + + - See the [testing guide](testing.md) for additional testing advice. + +## Directory and file conventions + + - Avoid package sprawl. Find an appropriate subdirectory for new packages. +(See [#4851](http://issues.k8s.io/4851) for discussion.) + - Libraries with no more appropriate home belong in new package +subdirectories of pkg/util + + - Avoid general utility packages. Packages called "util" are suspect. Instead, +derive a name that describes your desired function. For example, the utility +functions dealing with waiting for operations are in the "wait" package and +include functionality like Poll. So the full name is wait.Poll + + - All filenames should be lowercase + + - Go source files and directories use underscores, not dashes + - Package directories should generally avoid using separators as much as +possible (when packages are multiple words, they usually should be in nested +subdirectories). + + - Document directories and filenames should use dashes rather than underscores + + - Contrived examples that illustrate system features belong in +/docs/user-guide or /docs/admin, depending on whether it is a feature primarily +intended for users that deploy applications or cluster administrators, +respectively. Actual application examples belong in /examples. 
+ - Examples should also illustrate [best practices for configuration and +using the system](../user-guide/config-best-practices.md) + + - Third-party code + + - Go code for normal third-party dependencies is managed using +[Godeps](https://github.com/tools/godep) + + - Other third-party code belongs in `/third_party` + - forked third party Go code goes in `/third_party/forked` + - forked _golang stdlib_ code goes in `/third_party/golang` + + - Third-party code must include licenses + + - This includes modified third-party code and excerpts, as well + +## Coding advice + + - Go + + - [Go landmines](https://gist.github.com/lavalamp/4bd23295a9f32706a48f) + + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/coding-conventions.md?pixel)]() + diff --git a/contributors/devel/collab.md b/contributors/devel/collab.md new file mode 100644 index 00000000000..b4a6281d09e --- /dev/null +++ b/contributors/devel/collab.md @@ -0,0 +1,87 @@ +# On Collaborative Development + +Kubernetes is open source, but many of the people working on it do so as their +day job. In order to avoid forcing people to be "at work" effectively 24/7, we +want to establish some semi-formal protocols around development. Hopefully these +rules make things go more smoothly. If you find that this is not the case, +please complain loudly. + +## Patches welcome + +First and foremost: as a potential contributor, your changes and ideas are +welcome at any hour of the day or night, weekdays, weekends, and holidays. +Please do not ever hesitate to ask a question or send a PR. + +## Code reviews + +All changes must be code reviewed. For non-maintainers this is obvious, since +you can't commit anyway. But even for maintainers, we want all changes to get at +least one review, preferably (for non-trivial changes obligatorily) from someone +who knows the areas the change touches. For non-trivial changes we may want two +reviewers. The primary reviewer will make this decision and nominate a second +reviewer, if needed. Except for trivial changes, PRs should not be committed +until relevant parties (e.g. owners of the subsystem affected by the PR) have +had a reasonable chance to look at PR in their local business hours. + +Most PRs will find reviewers organically. If a maintainer intends to be the +primary reviewer of a PR they should set themselves as the assignee on GitHub +and say so in a reply to the PR. Only the primary reviewer of a change should +actually do the merge, except in rare cases (e.g. they are unavailable in a +reasonable timeframe). + +If a PR has gone 2 work days without an owner emerging, please poke the PR +thread and ask for a reviewer to be assigned. + +Except for rare cases, such as trivial changes (e.g. typos, comments) or +emergencies (e.g. broken builds), maintainers should not merge their own +changes. + +Expect reviewers to request that you avoid [common go style +mistakes](https://github.com/golang/go/wiki/CodeReviewComments) in your PRs. + +## Assigned reviews + +Maintainers can assign reviews to other maintainers, when appropriate. The +assignee becomes the shepherd for that PR and is responsible for merging the PR +once they are satisfied with it or else closing it. The assignee might request +reviews from non-maintainers. + +## Merge hours + +Maintainers will do merges of appropriately reviewed-and-approved changes during +their local "business hours" (typically 7:00 am Monday to 5:00 pm (17:00h) +Friday). 
PRs that arrive over the weekend or on holidays will only be merged if +there is a very good reason for it and if the code review requirements have been +met. Concretely this means that nobody should merge changes immediately before +going to bed for the night. + +There may be discussion an even approvals granted outside of the above hours, +but merges will generally be deferred. + +If a PR is considered complex or controversial, the merge of that PR should be +delayed to give all interested parties in all timezones the opportunity to +provide feedback. Concretely, this means that such PRs should be held for 24 +hours before merging. Of course "complex" and "controversial" are left to the +judgment of the people involved, but we trust that part of being a committer is +the judgment required to evaluate such things honestly, and not be motivated by +your desire (or your cube-mate's desire) to get their code merged. Also see +"Holds" below, any reviewer can issue a "hold" to indicate that the PR is in +fact complicated or complex and deserves further review. + +PRs that are incorrectly judged to be merge-able, may be reverted and subject to +re-review, if subsequent reviewers believe that they in fact are controversial +or complex. + + +## Holds + +Any maintainer or core contributor who wants to review a PR but does not have +time immediately may put a hold on a PR simply by saying so on the PR discussion +and offering an ETA measured in single-digit days at most. Any PR that has a +hold shall not be merged until the person who requested the hold acks the +review, withdraws their hold, or is overruled by a preponderance of maintainers. + + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/collab.md?pixel)]() + diff --git a/contributors/devel/community-expectations.md b/contributors/devel/community-expectations.md new file mode 100644 index 00000000000..ff2487fdeed --- /dev/null +++ b/contributors/devel/community-expectations.md @@ -0,0 +1,87 @@ +## Community Expectations + +Kubernetes is a community project. Consequently, it is wholly dependent on +its community to provide a productive, friendly and collaborative environment. + +The first and foremost goal of the Kubernetes community to develop orchestration +technology that radically simplifies the process of creating reliable +distributed systems. However a second, equally important goal is the creation +of a community that fosters easy, agile development of such orchestration +systems. + +We therefore describe the expectations for +members of the Kubernetes community. This document is intended to be a living one +that evolves as the community evolves via the same PR and code review process +that shapes the rest of the project. It currently covers the expectations +of conduct that govern all members of the community as well as the expectations +around code review that govern all active contributors to Kubernetes. + +### Code of Conduct + +The most important expectation of the Kubernetes community is that all members +abide by the Kubernetes [community code of conduct](../../code-of-conduct.md). +Only by respecting each other can we develop a productive, collaborative +community. + +### Code review + +As a community we believe in the [value of code review for all contributions](collab.md). +Code review increases both the quality and readability of our codebase, which +in turn produces high quality software. 
+ +However, the code review process can also introduce latency for contributors +and additional work for reviewers that can frustrate both parties. + +Consequently, as a community we expect that all active participants in the +community will also be active reviewers. + +We ask that active contributors to the project participate in the code review process +in areas where that contributor has expertise. Active +contributors are considered to be anyone who meets any of the following criteria: + * Sent more than two pull requests (PRs) in the previous one month, or more + than 20 PRs in the previous year. + * Filed more than three issues in the previous month, or more than 30 issues in + the previous 12 months. + * Commented on more than pull requests in the previous month, or + more than 50 pull requests in the previous 12 months. + * Marked any PR as LGTM in the previous month. + * Have *collaborator* permissions in the Kubernetes github project. + +In addition to these community expectations, any community member who wants to +be an active reviewer can also add their name to an *active reviewer* file +(location tbd) which will make them an active reviewer for as long as they +are included in the file. + +#### Expectations of reviewers: Review comments + +Because reviewers are often the first points of contact between new members of +the community and can significantly impact the first impression of the +Kubernetes community, reviewers are especially important in shaping the +Kubernetes community. Reviewers are highly encouraged to review the +[code of conduct](../../code-of-conduct.md) and are strongly encouraged to go above +and beyond the code of conduct to promote a collaborative, respectful +Kubernetes community. + +#### Expectations of reviewers: Review latency + +Reviewers are expected to respond in a timely fashion to PRs that are assigned +to them. Reviewers are expected to respond to an *active* PRs with reasonable +latency, and if reviewers fail to respond, those PRs may be assigned to other +reviewers. + +*Active* PRs are considered those which have a proper CLA (`cla:yes`) label +and do not need rebase to be merged. PRs that do not have a proper CLA, or +require a rebase are not considered active PRs. + +## Thanks + +Many thanks in advance to everyone who contributes their time and effort to +making Kubernetes both a successful system as well as a successful community. +The strength of our software shines in the strengths of each individual +community member. Thanks! + + + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/community-expectations.md?pixel)]() + diff --git a/contributors/devel/container-runtime-interface.md b/contributors/devel/container-runtime-interface.md new file mode 100644 index 00000000000..7ab085f7f36 --- /dev/null +++ b/contributors/devel/container-runtime-interface.md @@ -0,0 +1,127 @@ +# CRI: the Container Runtime Interface + +## What is CRI? + +CRI (_Container Runtime Interface_) consists of a +[protobuf API](../../pkg/kubelet/api/v1alpha1/runtime/api.proto), +specifications/requirements (to-be-added), +and [libraries] (https://github.com/kubernetes/kubernetes/tree/master/pkg/kubelet/server/streaming) +for container runtimes to integrate with kubelet on a node. CRI is currently in Alpha. + +In the future, we plan to add more developer tools such as the CRI validation +tests. + +## Why develop CRI? 
+ +Prior to the existence of CRI, container runtimes (e.g., `docker`, `rkt`) were +integrated with kubelet through implementing an internal, high-level interface +in kubelet. The entrance barrier for runtimes was high because the integration +required understanding the internals of kubelet and contributing to the main +Kubernetes repository. More importantly, this would not scale because every new +addition incurs a significant maintenance overhead in the main kubernetes +repository. + +Kubernetes aims to be extensible. CRI is one small, yet important step to enable +pluggable container runtimes and build a healthier ecosystem. + +## How to use CRI? + +1. Start the image and runtime services on your node. You can have a single + service acting as both image and runtime services. +2. Set the kubelet flags + - Pass the unix socket(s) to which your services listen to kubelet: + `--container-runtime-endpoint` and `--image-service-endpoint`. + - Enable CRI in kubelet by`--experimental-cri=true`. + - Use the "remote" runtime by `--container-runtime=remote`. + +Please see the [Status Update](#status-update) section for known issues for +each release. + +Note that CRI is still in its early stages. We are actively incorporating +feedback from early developers to improve the API. Developers should expect +occasional API breaking changes. + +## Does Kubelet use CRI today? + +No, but we are working on it. + +The first step is to switch kubelet to integrate with Docker via CRI by +default. The current [Docker CRI implementation](https://github.com/kubernetes/kubernetes/blob/release-1.5/pkg/kubelet/dockershim) +already passes most end-to-end tests, and has mandatory PR builders to prevent +regressions. While we are expanding the test coverage gradually, it is +difficult to test on all combinations of OS distributions, platforms, and +plugins. There are also many experimental or even undocumented features relied +upon by some users. We would like to **encourage the community to help test +this Docker-CRI integration and report bugs and/or missing features** to +smooth the transition in the near future. Please file a Github issue and +include @kubernetes/sig-node for any CRI problem. + +### How to test the new Docker CRI integration? + +Start kubelet with the following flags: + - Use the Docker container runtime by `--container-runtime=docker`(the default). + - Enable CRI in kubelet by`--experimental-cri=true`. + +Please also see the [known issues](#docker-cri-1.5-known-issues) before trying +out. + +## Design docs and proposals + +We plan to add CRI specifications/requirements in the near future. For now, +these proposals and design docs are the best sources to understand CRI +besides discussions on Github issues. + + - [Original proposal](https://github.com/kubernetes/kubernetes/blob/release-1.5/docs/proposals/container-runtime-interface-v1.md) + - [Exec/attach/port-forward streaming requests](https://docs.google.com/document/d/1OE_QoInPlVCK9rMAx9aybRmgFiVjHpJCHI9LrfdNM_s/edit?usp=sharing) + - [Container stdout/stderr logs](https://github.com/kubernetes/kubernetes/blob/release-1.5/docs/proposals/kubelet-cri-logging.md) + - Networking: The CRI runtime handles network plugins and the + setup/teardown of the pod sandbox. 
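+
+As a concrete illustration of the flags described in the "How to use CRI"
+section above, a kubelet invocation for a remote runtime might look like the
+following sketch (the socket path is a placeholder and depends on your
+runtime; a single service may serve both endpoints):
+
+```sh
+kubelet --container-runtime=remote \
+  --experimental-cri=true \
+  --container-runtime-endpoint=/var/run/my-runtime.sock \
+  --image-service-endpoint=/var/run/my-runtime.sock
+```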
+ +## Work-In-Progress CRI runtimes + + - [cri-o](https://github.com/kubernetes-incubator/cri-o) + - [rktlet](https://github.com/kubernetes-incubator/rktlet) + - [frakti](https://github.com/kubernetes/frakti) + +## [Status update](#status-update) + +### Kubernetes v1.5 release (CRI v1alpha1) + + - [v1alpha1 version](https://github.com/kubernetes/kubernetes/blob/release-1.5/pkg/kubelet/api/v1alpha1/runtime/api.proto) of CRI is released. + +#### [CRI known issues](#cri-1.5-known-issues): + + - [#27097](https://github.com/kubernetes/kubernetes/issues/27097): Container + metrics are not yet defined in CRI. + - [#36401](https://github.com/kubernetes/kubernetes/issues/36401): The new + container log path/format is not yet supported by the logging pipeline + (e.g., fluentd, GCL). + - CRI may not be compatible with other experimental features (e.g., Seccomp). + - Streaming server needs to be hardened. + - [#36666](https://github.com/kubernetes/kubernetes/issues/36666): + Authentication. + - [#36187](https://github.com/kubernetes/kubernetes/issues/36187): Avoid + including user data in the redirect URL. + +#### [Docker CRI integration known issues](#docker-cri-1.5-known-issues) + + - Docker compatibility: Support only Docker v1.11 and v1.12. + - Network: + - [#35457](https://github.com/kubernetes/kubernetes/issues/35457): Does + not support host ports. + - [#37315](https://github.com/kubernetes/kubernetes/issues/37315): Does + not support bandwidth shaping. + - Exec/attach/port-forward (streaming requests): + - [#35747](https://github.com/kubernetes/kubernetes/issues/35747): Does + not support `nsenter` as the exec handler (`--exec-handler=nsenter`). + - Also see (#cri-1.5-known-issues) for limitations on CRI streaming. + +## Contacts + + - Email: sig-node (kubernetes-sig-node@googlegroups.com) + - Slack: https://kubernetes.slack.com/messages/sig-node + + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/container-runtime-interface.md?pixel)]() + diff --git a/contributors/devel/controllers.md b/contributors/devel/controllers.md new file mode 100644 index 00000000000..daedc236d58 --- /dev/null +++ b/contributors/devel/controllers.md @@ -0,0 +1,186 @@ +# Writing Controllers + +A Kubernetes controller is an active reconciliation process. That is, it watches some object for the world's desired +state, and it watches the world's actual state, too. Then, it sends instructions to try and make the world's current +state be more like the desired state. + +The simplest implementation of this is a loop: + +```go +for { + desired := getDesiredState() + current := getCurrentState() + makeChanges(desired, current) +} +``` + +Watches, etc, are all merely optimizations of this logic. + +## Guidelines + +When you’re writing controllers, there are few guidelines that will help make sure you get the results and performance +you’re looking for. + +1. Operate on one item at a time. If you use a `workqueue.Interface`, you’ll be able to queue changes for a + particular resource and later pop them in multiple “worker” gofuncs with a guarantee that no two gofuncs will + work on the same item at the same time. + + Many controllers must trigger off multiple resources (I need to "check X if Y changes"), but nearly all controllers + can collapse those into a queue of “check this X” based on relationships. For instance, a ReplicaSetController needs + to react to a pod being deleted, but it does that by finding the related ReplicaSets and queuing those. + + +1. 
Random ordering between resources. When controllers queue off multiple types of resources, there is no guarantee + of ordering amongst those resources. + + Distinct watches are updated independently. Even with an objective ordering of “created resourceA/X” and “created + resourceB/Y”, your controller could observe “created resourceB/Y” and “created resourceA/X”. + + +1. Level driven, not edge driven. Just like having a shell script that isn’t running all the time, your controller + may be off for an indeterminate amount of time before running again. + + If an API object appears with a marker value of `true`, you can’t count on having seen it turn from `false` to `true`, + only that you now observe it being `true`. Even an API watch suffers from this problem, so be sure that you’re not + counting on seeing a change unless your controller is also marking the information it last made the decision on in + the object's status. + + +1. Use `SharedInformers`. `SharedInformers` provide hooks to receive notifications of adds, updates, and deletes for + a particular resource. They also provide convenience functions for accessing shared caches and determining when a + cache is primed. + + Use the factory methods down in https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/framework/informers/factory.go + to ensure that you are sharing the same instance of the cache as everyone else. + + This saves us connections against the API server, duplicate serialization costs server-side, duplicate deserialization + costs controller-side, and duplicate caching costs controller-side. + + You may see other mechanisms like reflectors and deltafifos driving controllers. Those were older mechanisms that we + later used to build the `SharedInformers`. You should avoid using them in new controllers + + +1. Never mutate original objects! Caches are shared across controllers, this means that if you mutate your "copy" + (actually a reference or shallow copy) of an object, you’ll mess up other controllers (not just your own). + + The most common point of failure is making a shallow copy, then mutating a map, like `Annotations`. Use + `api.Scheme.Copy` to make a deep copy. + + +1. Wait for your secondary caches. Many controllers have primary and secondary resources. Primary resources are the + resources that you’ll be updating `Status` for. Secondary resources are resources that you’ll be managing + (creating/deleting) or using for lookups. + + Use the `framework.WaitForCacheSync` function to wait for your secondary caches before starting your primary sync + functions. This will make sure that things like a Pod count for a ReplicaSet isn’t working off of known out of date + information that results in thrashing. + + +1. There are other actors in the system. Just because you haven't changed an object doesn't mean that somebody else + hasn't. + + Don't forget that the current state may change at any moment--it's not sufficient to just watch the desired state. + If you use the absence of objects in the desired state to indicate that things in the current state should be deleted, + make sure you don't have a bug in your observation code (e.g., act before your cache has filled). + + +1. Percolate errors to the top level for consistent re-queuing. We have a `workqueue.RateLimitingInterface` to allow + simple requeuing with reasonable backoffs. + + Your main controller func should return an error when requeuing is necessary. When it isn’t, it should use + `utilruntime.HandleError` and return nil instead. 
This makes it very easy for reviewers to inspect error handling
+   cases and to be confident that your controller doesn't accidentally lose things it should retry for.
+
+
+1. Watches and Informers will “sync”. Periodically, they will deliver every matching object in the cluster to your
+   `Update` method. This is good for cases where you may need to take additional action on the object, but sometimes you
+   know there won't be more work to do.
+
+   In cases where you are *certain* that you don't need to requeue items when there are no new changes, you can compare the
+   resource version of the old and new objects. If they are the same, you skip requeuing the work. Be careful when you
+   do this. If you ever skip requeuing your item on failures, you could fail, not requeue, and then never retry that
+   item again.
+
+
+## Rough Structure
+
+Overall, your controller should look something like this:
+
+```go
+type Controller struct {
+	// podLister is a secondary cache of pods, used for object lookups
+	podLister cache.StoreToPodLister
+
+	// podStoreSynced reports whether the pod cache has completed an initial sync
+	podStoreSynced func() bool
+
+	// syncHandler holds the controller's "do stuff" logic for a single key
+	syncHandler func(key string) error
+
+	// queue is where incoming work is placed to de-dup and to allow "easy" rate limited requeues on errors
+	queue workqueue.RateLimitingInterface
+}
+
+func (c *Controller) Run(threadiness int, stopCh chan struct{}) {
+	// don't let panics crash the process
+	defer utilruntime.HandleCrash()
+	// make sure the work queue is shutdown which will trigger workers to end
+	defer c.queue.ShutDown()
+
+	glog.Infof("Starting controller")
+
+	// wait for your secondary caches to fill before starting your work
+	if !framework.WaitForCacheSync(stopCh, c.podStoreSynced) {
+		return
+	}
+
+	// start up your worker threads based on threadiness. Some controllers have multiple kinds of workers
+	for i := 0; i < threadiness; i++ {
+		// runWorker will loop until "something bad" happens. The .Until will then rekick the worker
+		// after one second
+		go wait.Until(c.runWorker, time.Second, stopCh)
+	}
+
+	// wait until we're told to stop
+	<-stopCh
+	glog.Infof("Shutting down controller")
+}
+
+func (c *Controller) runWorker() {
+	// hot loop until we're told to stop. processNextWorkItem will automatically wait until there's work
+	// available, so we don't need to worry about secondary waits
+	for c.processNextWorkItem() {
+	}
+}
+
+// processNextWorkItem deals with one key off the queue. It returns false when it's time to quit.
+func (c *Controller) processNextWorkItem() bool {
+	// pull the next work item from queue. It should be a key we use to lookup something in a cache
+	key, quit := c.queue.Get()
+	if quit {
+		return false
+	}
+	// you always have to indicate to the queue that you've completed a piece of work
+	defer c.queue.Done(key)
+
+	// do your work on the key. This method contains your "do stuff" logic
+	err := c.syncHandler(key.(string))
+	if err == nil {
+		// if you had no error, tell the queue to stop tracking history for your key. This will
+		// reset things like failure counts for per-item rate limiting
+		c.queue.Forget(key)
+		return true
+	}
+
+	// there was a failure so be sure to report it. This method allows for pluggable error handling
+	// which can be used for things like cluster-monitoring
+	utilruntime.HandleError(fmt.Errorf("%v failed with : %v", key, err))
+	// since we failed, we should requeue the item to work on later.
This method will add a backoff + // to avoid hotlooping on particular items (they're probably still not going to work right away) + // and overall controller protection (everything I've done is broken, this controller needs to + // calm down or it can starve other useful work) cases. + c.queue.AddRateLimited(key) + + return true +} + +``` + + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/controllers.md?pixel)]() + diff --git a/contributors/devel/developer-guides/vagrant.md b/contributors/devel/developer-guides/vagrant.md new file mode 100755 index 00000000000..b53b0002c07 --- /dev/null +++ b/contributors/devel/developer-guides/vagrant.md @@ -0,0 +1,432 @@ +## Getting started with Vagrant + +Running Kubernetes with Vagrant is an easy way to run/test/develop on your +local machine in an environment using the same setup procedures when running on +GCE or AWS cloud providers. This provider is not tested on a per PR basis, if +you experience bugs when testing from HEAD, please open an issue. + +### Prerequisites + +1. Install latest version >= 1.8.1 of vagrant from +http://www.vagrantup.com/downloads.html + +2. Install a virtual machine host. Examples: + 1. [Virtual Box](https://www.virtualbox.org/wiki/Downloads) + 2. [VMWare Fusion](https://www.vmware.com/products/fusion/) plus +[Vagrant VMWare Fusion provider](https://www.vagrantup.com/vmware) + 3. [Parallels Desktop](https://www.parallels.com/products/desktop/) +plus +[Vagrant Parallels provider](https://parallels.github.io/vagrant-parallels/) + +3. Get or build a +[binary release](../../../docs/getting-started-guides/binary_release.md) + +### Setup + +Setting up a cluster is as simple as running: + +```shell +export KUBERNETES_PROVIDER=vagrant +curl -sS https://get.k8s.io | bash +``` + +Alternatively, you can download +[Kubernetes release](https://github.com/kubernetes/kubernetes/releases) and +extract the archive. To start your local cluster, open a shell and run: + +```shell +cd kubernetes + +export KUBERNETES_PROVIDER=vagrant +./cluster/kube-up.sh +``` + +The `KUBERNETES_PROVIDER` environment variable tells all of the various cluster +management scripts which variant to use. If you forget to set this, the +assumption is you are running on Google Compute Engine. + +By default, the Vagrant setup will create a single master VM (called +kubernetes-master) and one node (called kubernetes-node-1). Each VM will take 1 +GB, so make sure you have at least 2GB to 4GB of free memory (plus appropriate +free disk space). + +Vagrant will provision each machine in the cluster with all the necessary +components to run Kubernetes. The initial setup can take a few minutes to +complete on each machine. + +If you installed more than one Vagrant provider, Kubernetes will usually pick +the appropriate one. However, you can override which one Kubernetes will use by +setting the +[`VAGRANT_DEFAULT_PROVIDER`](https://docs.vagrantup.com/v2/providers/default.html) +environment variable: + +```shell +export VAGRANT_DEFAULT_PROVIDER=parallels +export KUBERNETES_PROVIDER=vagrant +./cluster/kube-up.sh +``` + +By default, each VM in the cluster is running Fedora. + +To access the master or any node: + +```shell +vagrant ssh master +vagrant ssh node-1 +``` + +If you are running more than one node, you can access the others by: + +```shell +vagrant ssh node-2 +vagrant ssh node-3 +``` + +Each node in the cluster installs the docker daemon and the kubelet. 
+ +The master node instantiates the Kubernetes master components as pods on the +machine. + +To view the service status and/or logs on the kubernetes-master: + +```shell +[vagrant@kubernetes-master ~] $ vagrant ssh master +[vagrant@kubernetes-master ~] $ sudo su + +[root@kubernetes-master ~] $ systemctl status kubelet +[root@kubernetes-master ~] $ journalctl -ru kubelet + +[root@kubernetes-master ~] $ systemctl status docker +[root@kubernetes-master ~] $ journalctl -ru docker + +[root@kubernetes-master ~] $ tail -f /var/log/kube-apiserver.log +[root@kubernetes-master ~] $ tail -f /var/log/kube-controller-manager.log +[root@kubernetes-master ~] $ tail -f /var/log/kube-scheduler.log +``` + +To view the services on any of the nodes: + +```shell +[vagrant@kubernetes-master ~] $ vagrant ssh node-1 +[vagrant@kubernetes-master ~] $ sudo su + +[root@kubernetes-master ~] $ systemctl status kubelet +[root@kubernetes-master ~] $ journalctl -ru kubelet + +[root@kubernetes-master ~] $ systemctl status docker +[root@kubernetes-master ~] $ journalctl -ru docker +``` + +### Interacting with your Kubernetes cluster with Vagrant. + +With your Kubernetes cluster up, you can manage the nodes in your cluster with +the regular Vagrant commands. + +To push updates to new Kubernetes code after making source changes: + +```shell +./cluster/kube-push.sh +``` + +To stop and then restart the cluster: + +```shell +vagrant halt +./cluster/kube-up.sh +``` + +To destroy the cluster: + +```shell +vagrant destroy +``` + +Once your Vagrant machines are up and provisioned, the first thing to do is to +check that you can use the `kubectl.sh` script. + +You may need to build the binaries first, you can do this with `make` + +```shell +$ ./cluster/kubectl.sh get nodes +``` + +### Authenticating with your master + +When using the vagrant provider in Kubernetes, the `cluster/kubectl.sh` script +will cache your credentials in a `~/.kubernetes_vagrant_auth` file so you will +not be prompted for them in the future. + +```shell +cat ~/.kubernetes_vagrant_auth +``` + +```json +{ "User": "vagrant", + "Password": "vagrant", + "CAFile": "/home/k8s_user/.kubernetes.vagrant.ca.crt", + "CertFile": "/home/k8s_user/.kubecfg.vagrant.crt", + "KeyFile": "/home/k8s_user/.kubecfg.vagrant.key" +} +``` + +You should now be set to use the `cluster/kubectl.sh` script. 
For example try to +list the nodes that you have started with: + +```shell +./cluster/kubectl.sh get nodes +``` + +### Running containers + +You can use `cluster/kube-*.sh` commands to interact with your VM machines: + +```shell +$ ./cluster/kubectl.sh get pods +NAME READY STATUS RESTARTS AGE + +$ ./cluster/kubectl.sh get services +NAME CLUSTER_IP EXTERNAL_IP PORT(S) SELECTOR AGE + +$ ./cluster/kubectl.sh get deployments +CONTROLLER CONTAINER(S) IMAGE(S) SELECTOR REPLICAS +``` + +To Start a container running nginx with a Deployment and three replicas: + +```shell +$ ./cluster/kubectl.sh run my-nginx --image=nginx --replicas=3 --port=80 +``` + +When listing the pods, you will see that three containers have been started and +are in Waiting state: + +```shell +$ ./cluster/kubectl.sh get pods +NAME READY STATUS RESTARTS AGE +my-nginx-3800858182-4e6pe 0/1 ContainerCreating 0 3s +my-nginx-3800858182-8ko0s 1/1 Running 0 3s +my-nginx-3800858182-seu3u 0/1 ContainerCreating 0 3s +``` + +When the provisioning is complete: + +```shell +$ ./cluster/kubectl.sh get pods +NAME READY STATUS RESTARTS AGE +my-nginx-3800858182-4e6pe 1/1 Running 0 40s +my-nginx-3800858182-8ko0s 1/1 Running 0 40s +my-nginx-3800858182-seu3u 1/1 Running 0 40s + +$ ./cluster/kubectl.sh get services +NAME CLUSTER_IP EXTERNAL_IP PORT(S) SELECTOR AGE + +$ ./cluster/kubectl.sh get deployments +NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE +my-nginx 3 3 3 3 1m +``` + +We did not start any Services, hence there are none listed. But we see three +replicas displayed properly. Check the +[guestbook](https://github.com/kubernetes/kubernetes/tree/%7B%7Bpage.githubbranch%7D%7D/examples/guestbook) +application to learn how to create a Service. You can already play with scaling +the replicas with: + +```shell +$ ./cluster/kubectl.sh scale deployments my-nginx --replicas=2 +$ ./cluster/kubectl.sh get pods +NAME READY STATUS RESTARTS AGE +my-nginx-3800858182-4e6pe 1/1 Running 0 2m +my-nginx-3800858182-8ko0s 1/1 Running 0 2m +``` + +Congratulations! + +### Testing + +The following will run all of the end-to-end testing scenarios assuming you set +your environment: + +```shell +NUM_NODES=3 go run hack/e2e.go -v --build --up --test --down +``` + +### Troubleshooting + +#### I keep downloading the same (large) box all the time! + +By default the Vagrantfile will download the box from S3. You can change this +(and cache the box locally) by providing a name and an alternate URL when +calling `kube-up.sh` + +```shell +export KUBERNETES_BOX_NAME=choose_your_own_name_for_your_kuber_box +export KUBERNETES_BOX_URL=path_of_your_kuber_box +export KUBERNETES_PROVIDER=vagrant +./cluster/kube-up.sh +``` + +#### I am getting timeouts when trying to curl the master from my host! + +During provision of the cluster, you may see the following message: + +```shell +Validating node-1 +............. +Waiting for each node to be registered with cloud provider +error: couldn't read version from server: Get https://10.245.1.2/api: dial tcp 10.245.1.2:443: i/o timeout +``` + +Some users have reported VPNs may prevent traffic from being routed to the host +machine into the virtual machine network. 
+ +To debug, first verify that the master is binding to the proper IP address: + +``` +$ vagrant ssh master +$ ifconfig | grep eth1 -C 2 +eth1: flags=4163 mtu 1500 inet 10.245.1.2 netmask + 255.255.255.0 broadcast 10.245.1.255 +``` + +Then verify that your host machine has a network connection to a bridge that can +serve that address: + +```shell +$ ifconfig | grep 10.245.1 -C 2 + +vboxnet5: flags=4163 mtu 1500 + inet 10.245.1.1 netmask 255.255.255.0 broadcast 10.245.1.255 + inet6 fe80::800:27ff:fe00:5 prefixlen 64 scopeid 0x20 + ether 0a:00:27:00:00:05 txqueuelen 1000 (Ethernet) +``` + +If you do not see a response on your host machine, you will most likely need to +connect your host to the virtual network created by the virtualization provider. + +If you do see a network, but are still unable to ping the machine, check if your +VPN is blocking the request. + +#### I just created the cluster, but I am getting authorization errors! + +You probably have an incorrect ~/.kubernetes_vagrant_auth file for the cluster +you are attempting to contact. + +```shell +rm ~/.kubernetes_vagrant_auth +``` + +After using kubectl.sh make sure that the correct credentials are set: + +```shell +cat ~/.kubernetes_vagrant_auth +``` + +```json +{ + "User": "vagrant", + "Password": "vagrant" +} +``` + +#### I just created the cluster, but I do not see my container running! + +If this is your first time creating the cluster, the kubelet on each node +schedules a number of docker pull requests to fetch prerequisite images. This +can take some time and as a result may delay your initial pod getting +provisioned. + +#### I have Vagrant up but the nodes won't validate! + +Log on to one of the nodes (`vagrant ssh node-1`) and inspect the salt node +log (`sudo cat /var/log/salt/node`). + +#### I want to change the number of nodes! + +You can control the number of nodes that are instantiated via the environment +variable `NUM_NODES` on your host machine. If you plan to work with replicas, we +strongly encourage you to work with enough nodes to satisfy your largest +intended replica size. If you do not plan to work with replicas, you can save +some system resources by running with a single node. You do this, by setting +`NUM_NODES` to 1 like so: + +```shell +export NUM_NODES=1 +``` + +#### I want my VMs to have more memory! + +You can control the memory allotted to virtual machines with the +`KUBERNETES_MEMORY` environment variable. Just set it to the number of megabytes +you would like the machines to have. For example: + +```shell +export KUBERNETES_MEMORY=2048 +``` + +If you need more granular control, you can set the amount of memory for the +master and nodes independently. For example: + +```shell +export KUBERNETES_MASTER_MEMORY=1536 +export KUBERNETES_NODE_MEMORY=2048 +``` + +#### I want to set proxy settings for my Kubernetes cluster boot strapping! + +If you are behind a proxy, you need to install the Vagrant proxy plugin and set +the proxy settings: + +```shell +vagrant plugin install vagrant-proxyconf +export KUBERNETES_HTTP_PROXY=http://username:password@proxyaddr:proxyport +export KUBERNETES_HTTPS_PROXY=https://username:password@proxyaddr:proxyport +``` + +You can also specify addresses that bypass the proxy, for example: + +```shell +export KUBERNETES_NO_PROXY=127.0.0.1 +``` + +If you are using sudo to make Kubernetes build, use the `-E` flag to pass in the +environment variables. 
For example, if running `make quick-release`, use: + +```shell +sudo -E make quick-release +``` + +#### I have repository access errors during VM provisioning! + +Sometimes VM provisioning may fail with errors that look like this: + +``` +Timeout was reached for https://mirrors.fedoraproject.org/metalink?repo=fedora-23&arch=x86_64 [Connection timed out after 120002 milliseconds] +``` + +You may use a custom Fedora repository URL to fix this: + +```shell +export CUSTOM_FEDORA_REPOSITORY_URL=https://download.fedoraproject.org/pub/fedora/ +``` + +#### I ran vagrant suspend and nothing works! + +`vagrant suspend` seems to mess up the network. It's not supported at this time. + +#### I want vagrant to sync folders via nfs! + +You can ensure that vagrant uses nfs to sync folders with virtual machines by +setting the KUBERNETES_VAGRANT_USE_NFS environment variable to 'true'. nfs is +faster than virtualbox or vmware's 'shared folders' and does not require guest +additions. See the +[vagrant docs](http://docs.vagrantup.com/v2/synced-folders/nfs.html) for details +on configuring nfs on the host. This setting will have no effect on the libvirt +provider, which uses nfs by default. For example: + +```shell +export KUBERNETES_VAGRANT_USE_NFS=true +``` + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/developer-guides/vagrant.md?pixel)]() + diff --git a/contributors/devel/development.md b/contributors/devel/development.md new file mode 100644 index 00000000000..1349e0037c7 --- /dev/null +++ b/contributors/devel/development.md @@ -0,0 +1,251 @@ +# Development Guide + +This document is intended to be the canonical source of truth for things like +supported toolchain versions for building Kubernetes. If you find a +requirement that this doc does not capture, please +[submit an issue](https://github.com/kubernetes/kubernetes/issues) on github. If +you find other docs with references to requirements that are not simply links to +this doc, please [submit an issue](https://github.com/kubernetes/kubernetes/issues). + +This document is intended to be relative to the branch in which it is found. +It is guaranteed that requirements will change over time for the development +branch, but release branches of Kubernetes should not change. + +## Building Kubernetes with Docker + +Official releases are built using Docker containers. To build Kubernetes using +Docker please follow [these instructions] +(http://releases.k8s.io/HEAD/build-tools/README.md). + +## Building Kubernetes on a local OS/shell environment + +Many of the Kubernetes development helper scripts rely on a fairly up-to-date +GNU tools environment, so most recent Linux distros should work just fine +out-of-the-box. Note that Mac OS X ships with somewhat outdated BSD-based tools, +some of which may be incompatible in subtle ways, so we recommend +[replacing those with modern GNU tools] +(https://www.topbug.net/blog/2013/04/14/install-and-use-gnu-command-line-tools-in-mac-os-x/). + +### Go development environment + +Kubernetes is written in the [Go](http://golang.org) programming language. +To build Kubernetes without using Docker containers, you'll need a Go +development environment. Builds for Kubernetes 1.0 - 1.2 require Go version +1.4.2. Builds for Kubernetes 1.3 and higher require Go version 1.6.0. If you +haven't set up a Go development environment, please follow [these +instructions](http://golang.org/doc/code.html) to install the go tools. + +Set up your GOPATH and add a path entry for go binaries to your PATH. 
Typically +added to your ~/.profile: + +```sh +export GOPATH=$HOME/go +export PATH=$PATH:$GOPATH/bin +``` + +### Godep dependency management + +Kubernetes build and test scripts use [godep](https://github.com/tools/godep) to +manage dependencies. + +#### Install godep + +Ensure that [mercurial](http://mercurial.selenic.com/wiki/Download) is +installed on your system. (some of godep's dependencies use the mercurial +source control system). Use `apt-get install mercurial` or `yum install +mercurial` on Linux, or [brew.sh](http://brew.sh) on OS X, or download directly +from mercurial. + +Install godep and go-bindata (may require sudo): + +```sh +go get -u github.com/tools/godep +go get -u github.com/jteeuwen/go-bindata/go-bindata +``` + +Note: +At this time, godep version >= v63 is known to work in the Kubernetes project. + +To check your version of godep: + +```sh +$ godep version +godep v74 (linux/amd64/go1.6.2) +``` + +Developers planning to managing dependencies in the `vendor/` tree may want to +explore alternative environment setups. See +[using godep to manage dependencies](godep.md). + +### Local build using make + +To build Kubernetes using your local Go development environment (generate linux +binaries): + +```sh + make +``` + +You may pass build options and packages to the script as necessary. For example, +to build with optimizations disabled for enabling use of source debug tools: + +```sh + make GOGCFLAGS="-N -l" +``` + +To build binaries for all platforms: + +```sh + make cross +``` + +### How to update the Go version used to test & build k8s + +The kubernetes project tries to stay on the latest version of Go so it can +benefit from the improvements to the language over time and can easily +bump to a minor release version for security updates. + +Since kubernetes is mostly built and tested in containers, there are a few +unique places you need to update the go version. + +- The image for cross compiling in [build-tools/build-image/cross/](../../build-tools/build-image/cross/). The `VERSION` file and `Dockerfile`. +- Update [dockerized-e2e-runner.sh](https://github.com/kubernetes/test-infra/blob/master/jenkins/dockerized-e2e-runner.sh) to run a kubekins-e2e with the desired go version, which requires pushing [e2e-image](https://github.com/kubernetes/test-infra/tree/master/jenkins/e2e-image) and [test-image](https://github.com/kubernetes/test-infra/tree/master/jenkins/test-image) images that are `FROM` the desired go version. +- The docker image being run in [gotest-dockerized.sh](https://github.com/kubernetes/test-infra/tree/master/jenkins/gotest-dockerized.sh). +- The cross tag `KUBE_BUILD_IMAGE_CROSS_TAG` in [build-tools/common.sh](../../build-tools/common.sh) + +## Workflow + +Below, we outline one of the more common git workflows that core developers use. +Other git workflows are also valid. + +### Visual overview + +![Git workflow](git_workflow.png) + +### Fork the main repository + +1. Go to https://github.com/kubernetes/kubernetes +2. Click the "Fork" button (at the top right) + +### Clone your fork + +The commands below require that you have $GOPATH set ([$GOPATH +docs](https://golang.org/doc/code.html#GOPATH)). We highly recommend you put +Kubernetes' code into your GOPATH. Note: the commands below will not work if +there is more than one directory in your `$GOPATH`. 
+ +```sh +mkdir -p $GOPATH/src/k8s.io +cd $GOPATH/src/k8s.io +# Replace "$YOUR_GITHUB_USERNAME" below with your github username +git clone https://github.com/$YOUR_GITHUB_USERNAME/kubernetes.git +cd kubernetes +git remote add upstream 'https://github.com/kubernetes/kubernetes.git' +``` + +### Create a branch and make changes + +```sh +git checkout -b my-feature +# Make your code changes +``` + +### Keeping your development fork in sync + +```sh +git fetch upstream +git rebase upstream/master +``` + +Note: If you have write access to the main repository at +github.com/kubernetes/kubernetes, you should modify your git configuration so +that you can't accidentally push to upstream: + +```sh +git remote set-url --push upstream no_push +``` + +### Committing changes to your fork + +Before committing any changes, please link/copy the pre-commit hook into your +.git directory. This will keep you from accidentally committing non-gofmt'd Go +code. This hook will also do a build and test whether documentation generation +scripts need to be executed. + +The hook requires both Godep and etcd on your `PATH`. + +```sh +cd kubernetes/.git/hooks/ +ln -s ../../hooks/pre-commit . +``` + +Then you can commit your changes and push them to your fork: + +```sh +git commit +git push -f origin my-feature +``` + +### Creating a pull request + +1. Visit https://github.com/$YOUR_GITHUB_USERNAME/kubernetes +2. Click the "Compare & pull request" button next to your "my-feature" branch. +3. Check out the pull request [process](pull-requests.md) for more details + +**Note:** If you have write access, please refrain from using the GitHub UI for creating PRs, because GitHub will create the PR branch inside the main repository rather than inside your fork. + +### Getting a code review + +Once your pull request has been opened it will be assigned to one or more +reviewers. Those reviewers will do a thorough code review, looking for +correctness, bugs, opportunities for improvement, documentation and comments, +and style. + +Very small PRs are easy to review. Very large PRs are very difficult to +review. Github has a built-in code review tool, which is what most people use. +At the assigned reviewer's discretion, a PR may be switched to use +[Reviewable](https://reviewable.k8s.io) instead. Once a PR is switched to +Reviewable, please ONLY send or reply to comments through reviewable. Mixing +code review tools can be very confusing. + +See [Faster Reviews](faster_reviews.md) for some thoughts on how to streamline +the review process. + +### When to retain commits and when to squash + +Upon merge, all git commits should represent meaningful milestones or units of +work. Use commits to add clarity to the development and review process. + +Before merging a PR, squash any "fix review feedback", "typo", and "rebased" +sorts of commits. It is not imperative that every commit in a PR compile and +pass tests independently, but it is worth striving for. For mass automated +fixups (e.g. automated doc formatting), use one or more commits for the +changes to tooling and a final commit to apply the fixup en masse. This makes +reviews much easier. 
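+
+For example, one common way to squash "fix review feedback" and "typo" commits
+before a merge is an interactive rebase against the upstream branch (the
+branch names below match the workflow above; adjust them to your own):
+
+```sh
+git fetch upstream
+# Mark the fixup commits as "squash" or "fixup" in the editor that opens
+git rebase -i upstream/master
+# Force-push the rewritten history to the branch backing your PR
+git push -f origin my-feature
+```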
+ +## Testing + +Three basic commands let you run unit, integration and/or e2e tests: + +```sh +cd kubernetes +make test # Run every unit test +make test WHAT=pkg/util/cache GOFLAGS=-v # Run tests of a package verbosely +make test-integration # Run integration tests, requires etcd +make test-e2e # Run e2e tests +``` + +See the [testing guide](testing.md) and [end-to-end tests](e2e-tests.md) for additional information and scenarios. + +## Regenerating the CLI documentation + +```sh +hack/update-generated-docs.sh +``` + + + + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/development.md?pixel)]() + diff --git a/contributors/devel/e2e-node-tests.md b/contributors/devel/e2e-node-tests.md new file mode 100644 index 00000000000..5e5f5b49f4a --- /dev/null +++ b/contributors/devel/e2e-node-tests.md @@ -0,0 +1,231 @@ +# Node End-To-End tests + +Node e2e tests are component tests meant for testing the Kubelet code on a custom host environment. + +Tests can be run either locally or against a host running on GCE. + +Node e2e tests are run as both pre- and post- submit tests by the Kubernetes project. + +*Note: Linux only. Mac and Windows unsupported.* + +*Note: There is no scheduler running. The e2e tests have to do manual scheduling, e.g. by using `framework.PodClient`.* + +# Running tests + +## Locally + +Why run tests *Locally*? Much faster than running tests Remotely. + +Prerequisites: +- [Install etcd](https://github.com/coreos/etcd/releases) on your PATH + - Verify etcd is installed correctly by running `which etcd` + - Or make etcd binary available and executable at `/tmp/etcd` +- [Install ginkgo](https://github.com/onsi/ginkgo) on your PATH + - Verify ginkgo is installed correctly by running `which ginkgo` + +From the Kubernetes base directory, run: + +```sh +make test-e2e-node +``` + +This will: run the *ginkgo* binary against the subdirectory *test/e2e_node*, which will in turn: +- Ask for sudo access (needed for running some of the processes) +- Build the Kubernetes source code +- Pre-pull docker images used by the tests +- Start a local instance of *etcd* +- Start a local instance of *kube-apiserver* +- Start a local instance of *kubelet* +- Run the test using the locally started processes +- Output the test results to STDOUT +- Stop *kubelet*, *kube-apiserver*, and *etcd* + +## Remotely + +Why Run tests *Remotely*? Tests will be run in a customized pristine environment. Closely mimics what will be done +as pre- and post- submit testing performed by the project. 
+ +Prerequisites: +- [join the googlegroup](https://groups.google.com/forum/#!forum/kubernetes-dev) +`kubernetes-dev@googlegroups.com` + - *This provides read access to the node test images.* +- Setup a [Google Cloud Platform](https://cloud.google.com/) account and project with Google Compute Engine enabled +- Install and setup the [gcloud sdk](https://cloud.google.com/sdk/downloads) + - Verify the sdk is setup correctly by running `gcloud compute instances list` and `gcloud compute images list --project kubernetes-node-e2e-images` + +Run: + +```sh +make test-e2e-node REMOTE=true +``` + +This will: +- Build the Kubernetes source code +- Create a new GCE instance using the default test image + - Instance will be called **test-e2e-node-containervm-v20160321-image** +- Lookup the instance public ip address +- Copy a compressed archive file to the host containing the following binaries: + - ginkgo + - kubelet + - kube-apiserver + - e2e_node.test (this binary contains the actual tests to be run) +- Unzip the archive to a directory under **/tmp/gcloud** +- Run the tests using the `ginkgo` command + - Starts etcd, kube-apiserver, kubelet + - The ginkgo command is used because this supports more features than running the test binary directly +- Output the remote test results to STDOUT +- `scp` the log files back to the local host under /tmp/_artifacts/e2e-node-containervm-v20160321-image +- Stop the processes on the remote host +- **Leave the GCE instance running** + +**Note: Subsequent tests run using the same image will *reuse the existing host* instead of deleting it and +provisioning a new one. To delete the GCE instance after each test see +*[DELETE_INSTANCE](#delete-instance-after-tests-run)*.** + + +# Additional Remote Options + +## Run tests using different images + +This is useful if you want to run tests against a host using a different OS distro or container runtime than +provided by the default image. + +List the available test images using gcloud. + +```sh +make test-e2e-node LIST_IMAGES=true +``` + +This will output a list of the available images for the default image project. + +Then run: + +```sh +make test-e2e-node REMOTE=true IMAGES="" +``` + +## Run tests against a running GCE instance (not an image) + +This is useful if you have an host instance running already and want to run the tests there instead of on a new instance. + +```sh +make test-e2e-node REMOTE=true HOSTS="" +``` + +## Delete instance after tests run + +This is useful if you want recreate the instance for each test run to trigger flakes related to starting the instance. + +```sh +make test-e2e-node REMOTE=true DELETE_INSTANCES=true +``` + +## Keep instance, test binaries, and *processes* around after tests run + +This is useful if you want to manually inspect or debug the kubelet process run as part of the tests. + +```sh +make test-e2e-node REMOTE=true CLEANUP=false +``` + +## Run tests using an image in another project + +This is useful if you want to create your own host image in another project and use it for testing. + +```sh +make test-e2e-node REMOTE=true IMAGE_PROJECT="" IMAGES="" +``` + +Setting up your own host image may require additional steps such as installing etcd or docker. See +[setup_host.sh](../../test/e2e_node/environment/setup_host.sh) for common steps to setup hosts to run node tests. 
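+
+These remote options can be combined. For example, the following sketch runs
+against a custom image in another project and deletes the instance afterwards
+(the image and project names are placeholders):
+
+```sh
+make test-e2e-node REMOTE=true IMAGE_PROJECT="my-image-project" \
+  IMAGES="my-custom-image" DELETE_INSTANCES=true
+```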
+ +## Create instances using a different instance name prefix + +This is useful if you want to create instances using a different name so that you can run multiple copies of the +test in parallel against different instances of the same image. + +```sh +make test-e2e-node REMOTE=true INSTANCE_PREFIX="my-prefix" +``` + +# Additional Test Options for both Remote and Local execution + +## Only run a subset of the tests + +To run tests matching a regex: + +```sh +make test-e2e-node REMOTE=true FOCUS="" +``` + +To run tests NOT matching a regex: + +```sh +make test-e2e-node REMOTE=true SKIP="" +``` + +## Run tests continually until they fail + +This is useful if you are trying to debug a flaky test failure. This will cause ginkgo to continually +run the tests until they fail. **Note: this will only perform test setup once (e.g. creating the instance) and is +less useful for catching flakes related creating the instance from an image.** + +```sh +make test-e2e-node REMOTE=true RUN_UNTIL_FAILURE=true +``` + +## Run tests in parallel + +Running test in parallel can usually shorten the test duration. By default node +e2e test runs with`--nodes=8` (see ginkgo flag +[--nodes](https://onsi.github.io/ginkgo/#parallel-specs)). You can use the +`PARALLELISM` option to change the parallelism. + +```sh +make test-e2e-node PARALLELISM=4 # run test with 4 parallel nodes +make test-e2e-node PARALLELISM=1 # run test sequentially +``` + +## Run tests with kubenet network plugin + +[kubenet](http://kubernetes.io/docs/admin/network-plugins/#kubenet) is +the default network plugin used by kubelet since Kubernetes 1.3. The +plugin requires [CNI](https://github.com/containernetworking/cni) and +[nsenter](http://man7.org/linux/man-pages/man1/nsenter.1.html). + +Currently, kubenet is enabled by default for Remote execution `REMOTE=true`, +but disabled for Local execution. **Note: kubenet is not supported for +local execution currently. This may cause network related test result to be +different for Local and Remote execution. So if you want to run network +related test, Remote execution is recommended.** + +To enable/disable kubenet: + +```sh +make test_e2e_node TEST_ARGS="--disable-kubenet=true" # enable kubenet +make test_e2e_node TEST_ARGS="--disable-kubenet=false" # disable kubenet +``` + +## Additional QoS Cgroups Hierarchy level testing + +For testing with the QoS Cgroup Hierarchy enabled, you can pass --experimental-cgroups-per-qos flag as an argument into Ginkgo using TEST_ARGS + +```sh +make test_e2e_node TEST_ARGS="--experimental-cgroups-per-qos=true" +``` + +# Notes on tests run by the Kubernetes project during pre-, post- submit. + +The node e2e tests are run by the PR builder for each Pull Request and the results published at +the bottom of the comments section. 
To re-run just the node e2e tests from the PR builder add the comment +`@k8s-bot node e2e test this issue: #` and **include a link to the test +failure logs if caused by a flake.** + +The PR builder runs tests against the images listed in [jenkins-pull.properties](../../test/e2e_node/jenkins/jenkins-pull.properties) + +The post submit tests run against the images listed in [jenkins-ci.properties](../../test/e2e_node/jenkins/jenkins-ci.properties) + + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/e2e-node-tests.md?pixel)]() + diff --git a/contributors/devel/e2e-tests.md b/contributors/devel/e2e-tests.md new file mode 100644 index 00000000000..fc8f1995428 --- /dev/null +++ b/contributors/devel/e2e-tests.md @@ -0,0 +1,719 @@ +# End-to-End Testing in Kubernetes + +Updated: 5/3/2016 + +**Table of Contents** + + +- [End-to-End Testing in Kubernetes](#end-to-end-testing-in-kubernetes) + - [Overview](#overview) + - [Building and Running the Tests](#building-and-running-the-tests) + - [Cleaning up](#cleaning-up) + - [Advanced testing](#advanced-testing) + - [Bringing up a cluster for testing](#bringing-up-a-cluster-for-testing) + - [Federation e2e tests](#federation-e2e-tests) + - [Configuring federation e2e tests](#configuring-federation-e2e-tests) + - [Image Push Repository](#image-push-repository) + - [Build](#build) + - [Deploy federation control plane](#deploy-federation-control-plane) + - [Run the Tests](#run-the-tests) + - [Teardown](#teardown) + - [Shortcuts for test developers](#shortcuts-for-test-developers) + - [Debugging clusters](#debugging-clusters) + - [Local clusters](#local-clusters) + - [Testing against local clusters](#testing-against-local-clusters) + - [Version-skewed and upgrade testing](#version-skewed-and-upgrade-testing) + - [Kinds of tests](#kinds-of-tests) + - [Viper configuration and hierarchichal test parameters.](#viper-configuration-and-hierarchichal-test-parameters) + - [Conformance tests](#conformance-tests) + - [Defining Conformance Subset](#defining-conformance-subset) + - [Continuous Integration](#continuous-integration) + - [What is CI?](#what-is-ci) + - [What runs in CI?](#what-runs-in-ci) + - [Non-default tests](#non-default-tests) + - [The PR-builder](#the-pr-builder) + - [Adding a test to CI](#adding-a-test-to-ci) + - [Moving a test out of CI](#moving-a-test-out-of-ci) + - [Performance Evaluation](#performance-evaluation) + - [One More Thing](#one-more-thing) + + + +## Overview + +End-to-end (e2e) tests for Kubernetes provide a mechanism to test end-to-end +behavior of the system, and is the last signal to ensure end user operations +match developer specifications. Although unit and integration tests provide a +good signal, in a distributed system like Kubernetes it is not uncommon that a +minor change may pass all unit and integration tests, but cause unforeseen +changes at the system level. + +The primary objectives of the e2e tests are to ensure a consistent and reliable +behavior of the kubernetes code base, and to catch hard-to-test bugs before +users do, when unit and integration tests are insufficient. + +The e2e tests in kubernetes are built atop of +[Ginkgo](http://onsi.github.io/ginkgo/) and +[Gomega](http://onsi.github.io/gomega/). There are a host of features that this +Behavior-Driven Development (BDD) testing framework provides, and it is +recommended that the developer read the documentation prior to diving into the + tests. 
+ +The purpose of *this* document is to serve as a primer for developers who are +looking to execute or add tests using a local development environment. + +Before writing new tests or making substantive changes to existing tests, you +should also read [Writing Good e2e Tests](writing-good-e2e-tests.md) + +## Building and Running the Tests + +There are a variety of ways to run e2e tests, but we aim to decrease the number +of ways to run e2e tests to a canonical way: `hack/e2e.go`. + +You can run an end-to-end test which will bring up a master and nodes, perform +some tests, and then tear everything down. Make sure you have followed the +getting started steps for your chosen cloud platform (which might involve +changing the `KUBERNETES_PROVIDER` environment variable to something other than +"gce"). + +To build Kubernetes, up a cluster, run tests, and tear everything down, use: + +```sh +go run hack/e2e.go -v --build --up --test --down +``` + +If you'd like to just perform one of these steps, here are some examples: + +```sh +# Build binaries for testing +go run hack/e2e.go -v --build + +# Create a fresh cluster. Deletes a cluster first, if it exists +go run hack/e2e.go -v --up + +# Run all tests +go run hack/e2e.go -v --test + +# Run tests matching the regex "\[Feature:Performance\]" +go run hack/e2e.go -v --test --test_args="--ginkgo.focus=\[Feature:Performance\]" + +# Conversely, exclude tests that match the regex "Pods.*env" +go run hack/e2e.go -v --test --test_args="--ginkgo.skip=Pods.*env" + +# Run tests in parallel, skip any that must be run serially +GINKGO_PARALLEL=y go run hack/e2e.go --v --test --test_args="--ginkgo.skip=\[Serial\]" + +# Run tests in parallel, skip any that must be run serially and keep the test namespace if test failed +GINKGO_PARALLEL=y go run hack/e2e.go --v --test --test_args="--ginkgo.skip=\[Serial\] --delete-namespace-on-failure=false" + +# Flags can be combined, and their actions will take place in this order: +# --build, --up, --test, --down +# +# You can also specify an alternative provider, such as 'aws' +# +# e.g.: +KUBERNETES_PROVIDER=aws go run hack/e2e.go -v --build --up --test --down + +# -ctl can be used to quickly call kubectl against your e2e cluster. Useful for +# cleaning up after a failed test or viewing logs. Use -v to avoid suppressing +# kubectl output. +go run hack/e2e.go -v -ctl='get events' +go run hack/e2e.go -v -ctl='delete pod foobar' +``` + +The tests are built into a single binary which can be run used to deploy a +Kubernetes system or run tests against an already-deployed Kubernetes system. +See `go run hack/e2e.go --help` (or the flag definitions in `hack/e2e.go`) for +more options, such as reusing an existing cluster. + +### Cleaning up + +During a run, pressing `control-C` should result in an orderly shutdown, but if +something goes wrong and you still have some VMs running you can force a cleanup +with this command: + +```sh +go run hack/e2e.go -v --down +``` + +## Advanced testing + +### Bringing up a cluster for testing + +If you want, you may bring up a cluster in some other manner and run tests +against it. To do so, or to do other non-standard test things, you can pass +arguments into Ginkgo using `--test_args` (e.g. see above). For the purposes of +brevity, we will look at a subset of the options, which are listed below: + +``` +--ginkgo.dryRun=false: If set, ginkgo will walk the test hierarchy without +actually running anything. Best paired with -v. 
+
+--ginkgo.failFast=false: If set, ginkgo will stop running a test suite after a
+failure occurs.
+
+--ginkgo.failOnPending=false: If set, ginkgo will mark the test suite as failed
+if any specs are pending.
+
+--ginkgo.focus="": If set, ginkgo will only run specs that match this regular
+expression.
+
+--ginkgo.skip="": If set, ginkgo will only run specs that do not match this
+regular expression.
+
+--ginkgo.trace=false: If set, default reporter prints out the full stack trace
+when a failure occurs.
+
+--ginkgo.v=false: If set, default reporter prints out all specs as they begin.
+
+--host="": The host, or api-server, to connect to.
+
+--kubeconfig="": Path to kubeconfig containing embedded authinfo.
+
+--prom-push-gateway="": The URL to prometheus gateway, so that metrics can be
+pushed during e2es and scraped by prometheus. Typically something like
+127.0.0.1:9091.
+
+--provider="": The name of the Kubernetes provider (gce, gke, local, vagrant,
+etc.)
+
+--repo-root="../../": Root directory of kubernetes repository, for finding test
+files.
+```
+
+Prior to running the tests, you may want to first create a simple auth file in
+your home directory, e.g. `$HOME/.kube/config`, with the following:
+
+```
+{
+  "User": "root",
+  "Password": ""
+}
+```
+
+As mentioned earlier, there are a host of other options that are available, but
+they are left to the developer.
+
+**NOTE:** If you are running tests on a local cluster repeatedly, you may need
+to periodically perform some manual cleanup:
+
+  - `rm -rf /var/run/kubernetes`, clear kube generated credentials; sometimes
+stale permissions can cause problems.
+
+  - `sudo iptables -F`, clear iptables rules left by the kube-proxy.
+
+### Federation e2e tests
+
+By default, `e2e.go` provisions a single Kubernetes cluster, and any `Feature:Federation` ginkgo tests will be skipped.
+
+Federation e2e testing involves bringing up multiple "underlying" Kubernetes clusters,
+and deploying the federation control plane as a Kubernetes application on the underlying clusters.
+
+The federation e2e tests are still managed via `e2e.go`, but require some extra configuration items.
+
+#### Configuring federation e2e tests
+
+The following environment variables will enable federation e2e building, provisioning and testing.
+
+```sh
+$ export FEDERATION=true
+$ export E2E_ZONES="us-central1-a us-central1-b us-central1-f"
+```
+
+A Kubernetes cluster will be provisioned in each zone listed in `E2E_ZONES`. A zone can only appear once in the `E2E_ZONES` list.
+
+#### Image Push Repository
+
+Next, specify the docker repository where your CI images will be pushed.
+
+* **If `KUBERNETES_PROVIDER=gce` or `KUBERNETES_PROVIDER=gke`**:
+
+  If you use the same GCP project to run the e2e tests as the container image repository,
+  the FEDERATION_PUSH_REPO_BASE environment variable defaults to "gcr.io/${DEFAULT_GCP_PROJECT_NAME}",
+  and you can skip ahead to the **Build** section.
+
+  Otherwise, you can simply set your push repo base based on your project name, and the necessary repositories will be
+  auto-created when you first push your container images.
+
+  ```sh
+  $ export FEDERATION_PUSH_REPO_BASE="gcr.io/${GCE_PROJECT_NAME}"
+  ```
+
+  Skip ahead to the **Build** section.
+
+* **For all other providers**:
+
+  You'll be responsible for creating and managing access to the repositories manually.
+ + ```sh + $ export FEDERATION_PUSH_REPO_BASE="quay.io/colin_hom" + ``` + + Given this example, the `federation-apiserver` container image will be pushed to the repository + `quay.io/colin_hom/federation-apiserver`. + + The docker client on the machine running `e2e.go` must have push access for the following pre-existing repositories: + + * `${FEDERATION_PUSH_REPO_BASE}/federation-apiserver` + * `${FEDERATION_PUSH_REPO_BASE}/federation-controller-manager` + + These repositories must allow public read access, as the e2e node docker daemons will not have any credentials. If you're using + GCE/GKE as your provider, the repositories will have read-access by default. + +#### Build + +* Compile the binaries and build container images: + + ```sh + $ KUBE_RELEASE_RUN_TESTS=n KUBE_FASTBUILD=true go run hack/e2e.go -v -build + ``` + +* Push the federation container images + + ```sh + $ build-tools/push-federation-images.sh + ``` + +#### Deploy federation control plane + +The following command will create the underlying Kubernetes clusters in each of `E2E_ZONES`, and then provision the +federation control plane in the cluster occupying the last zone in the `E2E_ZONES` list. + +```sh +$ go run hack/e2e.go -v --up +``` + +#### Run the Tests + +This will run only the `Feature:Federation` e2e tests. You can omit the `ginkgo.focus` argument to run the entire e2e suite. + +```sh +$ go run hack/e2e.go -v --test --test_args="--ginkgo.focus=\[Feature:Federation\]" +``` + +#### Teardown + +```sh +$ go run hack/e2e.go -v --down +``` + +#### Shortcuts for test developers + +* To speed up `e2e.go -up`, provision a single-node kubernetes cluster in a single e2e zone: + + `NUM_NODES=1 E2E_ZONES="us-central1-f"` + + Keep in mind that some tests may require multiple underlying clusters and/or minimum compute resource availability. + +* You can quickly recompile the e2e testing framework via `go install ./test/e2e`. This will not do anything besides + allow you to verify that the go code compiles. + +* If you want to run your e2e testing framework without re-provisioning the e2e setup, you can do so via + `make WHAT=test/e2e/e2e.test` and then re-running the ginkgo tests. + +* If you're hacking around with the federation control plane deployment itself, + you can quickly re-deploy the federation control plane Kubernetes manifests without tearing any resources down. + To re-deploy the federation control plane after running `-up` for the first time: + + ```sh + $ federation/cluster/federation-up.sh + ``` + +### Debugging clusters + +If a cluster fails to initialize, or you'd like to better understand cluster +state to debug a failed e2e test, you can use the `cluster/log-dump.sh` script +to gather logs. + +This script requires that the cluster provider supports ssh. Assuming it does, +running: + +``` +cluster/log-dump.sh +```` + +will ssh to the master and all nodes and download a variety of useful logs to +the provided directory (which should already exist). + +The Google-run Jenkins builds automatically collected these logs for every +build, saving them in the `artifacts` directory uploaded to GCS. + +### Local clusters + +It can be much faster to iterate on a local cluster instead of a cloud-based +one. To start a local cluster, you can run: + +```sh +# The PATH construction is needed because PATH is one of the special-cased +# environment variables not passed by sudo -E +sudo PATH=$PATH hack/local-up-cluster.sh +``` + +This will start a single-node Kubernetes cluster than runs pods using the local +docker daemon. 
Press Control-C to stop the cluster. + +You can generate a valid kubeconfig file by following instructions printed at the +end of aforementioned script. + +#### Testing against local clusters + +In order to run an E2E test against a locally running cluster, point the tests +at a custom host directly: + +```sh +export KUBECONFIG=/path/to/kubeconfig +export KUBE_MASTER_IP="http://127.0.0.1:" +export KUBE_MASTER=local +go run hack/e2e.go -v --test +``` + +To control the tests that are run: + +```sh +go run hack/e2e.go -v --test --test_args="--ginkgo.focus=\"Secrets\"" +``` + +### Version-skewed and upgrade testing + +We run version-skewed tests to check that newer versions of Kubernetes work +similarly enough to older versions. The general strategy is to cover the following cases: + +1. One version of `kubectl` with another version of the cluster and tests (e.g. + that v1.2 and v1.4 `kubectl` doesn't break v1.3 tests running against a v1.3 + cluster). +1. A newer version of the Kubernetes master with older nodes and tests (e.g. + that upgrading a master to v1.3 with nodes at v1.2 still passes v1.2 tests). +1. A newer version of the whole cluster with older tests (e.g. that a cluster + upgraded---master and nodes---to v1.3 still passes v1.2 tests). +1. That an upgraded cluster functions the same as a brand-new cluster of the + same version (e.g. a cluster upgraded to v1.3 passes the same v1.3 tests as + a newly-created v1.3 cluster). + +[hack/e2e-runner.sh](http://releases.k8s.io/HEAD/hack/jenkins/e2e-runner.sh) is +the authoritative source on how to run version-skewed tests, but below is a +quick-and-dirty tutorial. + +```sh +# Assume you have two copies of the Kubernetes repository checked out, at +# ./kubernetes and ./kubernetes_old + +# If using GKE: +export KUBERNETES_PROVIDER=gke +export CLUSTER_API_VERSION=${OLD_VERSION} + +# Deploy a cluster at the old version; see above for more details +cd ./kubernetes_old +go run ./hack/e2e.go -v --up + +# Upgrade the cluster to the new version +# +# If using GKE, add --upgrade-target=${NEW_VERSION} +# +# You can target Feature:MasterUpgrade or Feature:ClusterUpgrade +cd ../kubernetes +go run ./hack/e2e.go -v --test --check_version_skew=false --test_args="--ginkgo.focus=\[Feature:MasterUpgrade\]" + +# Run old tests with new kubectl +cd ../kubernetes_old +go run ./hack/e2e.go -v --test --test_args="--kubectl-path=$(pwd)/../kubernetes/cluster/kubectl.sh" +``` + +If you are just testing version-skew, you may want to just deploy at one +version and then test at another version, instead of going through the whole +upgrade process: + +```sh +# With the same setup as above + +# Deploy a cluster at the new version +cd ./kubernetes +go run ./hack/e2e.go -v --up + +# Run new tests with old kubectl +go run ./hack/e2e.go -v --test --test_args="--kubectl-path=$(pwd)/../kubernetes_old/cluster/kubectl.sh" + +# Run old tests with new kubectl +cd ../kubernetes_old +go run ./hack/e2e.go -v --test --test_args="--kubectl-path=$(pwd)/../kubernetes/cluster/kubectl.sh" +``` + +## Kinds of tests + +We are working on implementing clearer partitioning of our e2e tests to make +running a known set of tests easier (#10548). Tests can be labeled with any of +the following labels, in order of increasing precedence (that is, each label +listed below supersedes the previous ones): + + - If a test has no labels, it is expected to run fast (under five minutes), be +able to be run in parallel, and be consistent. 
+ + - `[Slow]`: If a test takes more than five minutes to run (by itself or in +parallel with many other tests), it is labeled `[Slow]`. This partition allows +us to run almost all of our tests quickly in parallel, without waiting for the +stragglers to finish. + + - `[Serial]`: If a test cannot be run in parallel with other tests (e.g. it +takes too many resources or restarts nodes), it is labeled `[Serial]`, and +should be run in serial as part of a separate suite. + + - `[Disruptive]`: If a test restarts components that might cause other tests +to fail or break the cluster completely, it is labeled `[Disruptive]`. Any +`[Disruptive]` test is also assumed to qualify for the `[Serial]` label, but +need not be labeled as both. These tests are not run against soak clusters to +avoid restarting components. + + - `[Flaky]`: If a test is found to be flaky and we have decided that it's too +hard to fix in the short term (e.g. it's going to take a full engineer-week), it +receives the `[Flaky]` label until it is fixed. The `[Flaky]` label should be +used very sparingly, and should be accompanied with a reference to the issue for +de-flaking the test, because while a test remains labeled `[Flaky]`, it is not +monitored closely in CI. `[Flaky]` tests are by default not run, unless a +`focus` or `skip` argument is explicitly given. + + - `[Feature:.+]`: If a test has non-default requirements to run or targets +some non-core functionality, and thus should not be run as part of the standard +suite, it receives a `[Feature:.+]` label, e.g. `[Feature:Performance]` or +`[Feature:Ingress]`. `[Feature:.+]` tests are not run in our core suites, +instead running in custom suites. If a feature is experimental or alpha and is +not enabled by default due to being incomplete or potentially subject to +breaking changes, it does *not* block the merge-queue, and thus should run in +some separate test suites owned by the feature owner(s) +(see [Continuous Integration](#continuous-integration) below). + +### Viper configuration and hierarchichal test parameters. + +The future of e2e test configuration idioms will be increasingly defined using viper, and decreasingly via flags. + +Flags in general fall apart once tests become sufficiently complicated. So, even if we could use another flag library, it wouldn't be ideal. + +To use viper, rather than flags, to configure your tests: + +- Just add "e2e.json" to the current directory you are in, and define parameters in it... i.e. `"kubeconfig":"/tmp/x"`. + +Note that advanced testing parameters, and hierarchichally defined parameters, are only defined in viper, to see what they are, you can dive into [TestContextType](../../test/e2e/framework/test_context.go). + +In time, it is our intent to add or autogenerate a sample viper configuration that includes all e2e parameters, to ship with kubernetes. + +### Conformance tests + +Finally, `[Conformance]` tests represent a subset of the e2e-tests we expect to +pass on **any** Kubernetes cluster. The `[Conformance]` label does not supersede +any other labels. + +As each new release of Kubernetes providers new functionality, the subset of +tests necessary to demonstrate conformance grows with each release. Conformance +is thus considered versioned, with the same backwards compatibility guarantees +as laid out in [our versioning policy](../design/versioning.md#supported-releases). +Conformance tests for a given version should be run off of the release branch +that corresponds to that version. 
Thus `v1.2` conformance tests would be run +from the head of the `release-1.2` branch. eg: + + - A v1.3 development cluster should pass v1.1, v1.2 conformance tests + + - A v1.2 cluster should pass v1.1, v1.2 conformance tests + + - A v1.1 cluster should pass v1.0, v1.1 conformance tests, and fail v1.2 +conformance tests + +Conformance tests are designed to be run with no cloud provider configured. +Conformance tests can be run against clusters that have not been created with +`hack/e2e.go`, just provide a kubeconfig with the appropriate endpoint and +credentials. + +```sh +# setup for conformance tests +export KUBECONFIG=/path/to/kubeconfig +export KUBERNETES_CONFORMANCE_TEST=y +export KUBERNETES_PROVIDER=skeleton + +# run all conformance tests +go run hack/e2e.go -v --test --test_args="--ginkgo.focus=\[Conformance\]" + +# run all parallel-safe conformance tests in parallel +GINKGO_PARALLEL=y go run hack/e2e.go -v --test --test_args="--ginkgo.focus=\[Conformance\] --ginkgo.skip=\[Serial\]" + +# ... and finish up with remaining tests in serial +go run hack/e2e.go -v --test --test_args="--ginkgo.focus=\[Serial\].*\[Conformance\]" +``` + +### Defining Conformance Subset + +It is impossible to define the entire space of Conformance tests without knowing +the future, so instead, we define the compliment of conformance tests, below +(`Please update this with companion PRs as necessary`): + + - A conformance test cannot test cloud provider specific features (i.e. GCE +monitoring, S3 Bucketing, ...) + + - A conformance test cannot rely on any particular non-standard file system +permissions granted to containers or users (i.e. sharing writable host /tmp with +a container) + + - A conformance test cannot rely on any binaries that are not required for the +linux kernel or for a kubelet to run (i.e. git) + + - A conformance test cannot test a feature which obviously cannot be supported +on a broad range of platforms (i.e. testing of multiple disk mounts, GPUs, high +density) + +## Continuous Integration + +A quick overview of how we run e2e CI on Kubernetes. + +### What is CI? + +We run a battery of `e2e` tests against `HEAD` of the master branch on a +continuous basis, and block merges via the [submit +queue](http://submit-queue.k8s.io/) on a subset of those tests if they fail (the +subset is defined in the [munger config] +(https://github.com/kubernetes/contrib/blob/master/mungegithub/mungers/submit-queue.go) +via the `jenkins-jobs` flag; note we also block on `kubernetes-build` and +`kubernetes-test-go` jobs for build and unit and integration tests). + +CI results can be found at [ci-test.k8s.io](http://ci-test.k8s.io), e.g. +[ci-test.k8s.io/kubernetes-e2e-gce/10594](http://ci-test.k8s.io/kubernetes-e2e-gce/10594). + +### What runs in CI? + +We run all default tests (those that aren't marked `[Flaky]` or `[Feature:.+]`) +against GCE and GKE. To minimize the time from regression-to-green-run, we +partition tests across different jobs: + + - `kubernetes-e2e-` runs all non-`[Slow]`, non-`[Serial]`, +non-`[Disruptive]`, non-`[Flaky]`, non-`[Feature:.+]` tests in parallel. + + - `kubernetes-e2e--slow` runs all `[Slow]`, non-`[Serial]`, +non-`[Disruptive]`, non-`[Flaky]`, non-`[Feature:.+]` tests in parallel. + + - `kubernetes-e2e--serial` runs all `[Serial]` and `[Disruptive]`, +non-`[Flaky]`, non-`[Feature:.+]` tests in serial. + +We also run non-default tests if the tests exercise general-availability ("GA") +features that require a special environment to run in, e.g. 
+`kubernetes-e2e-gce-scalability` and `kubernetes-kubemark-gce`, which test for +Kubernetes performance. + +#### Non-default tests + +Many `[Feature:.+]` tests we don't run in CI. These tests are for features that +are experimental (often in the `experimental` API), and aren't enabled by +default. + +### The PR-builder + +We also run a battery of tests against every PR before we merge it. These tests +are equivalent to `kubernetes-gce`: it runs all non-`[Slow]`, non-`[Serial]`, +non-`[Disruptive]`, non-`[Flaky]`, non-`[Feature:.+]` tests in parallel. These +tests are considered "smoke tests" to give a decent signal that the PR doesn't +break most functionality. Results for your PR can be found at +[pr-test.k8s.io](http://pr-test.k8s.io), e.g. +[pr-test.k8s.io/20354](http://pr-test.k8s.io/20354) for #20354. + +### Adding a test to CI + +As mentioned above, prior to adding a new test, it is a good idea to perform a +`-ginkgo.dryRun=true` on the system, in order to see if a behavior is already +being tested, or to determine if it may be possible to augment an existing set +of tests for a specific use case. + +If a behavior does not currently have coverage and a developer wishes to add a +new e2e test, navigate to the ./test/e2e directory and create a new test using +the existing suite as a guide. + +TODO(#20357): Create a self-documented example which has been disabled, but can +be copied to create new tests and outlines the capabilities and libraries used. + +When writing a test, consult #kinds_of_tests above to determine how your test +should be marked, (e.g. `[Slow]`, `[Serial]`; remember, by default we assume a +test can run in parallel with other tests!). + +When first adding a test it should *not* go straight into CI, because failures +block ordinary development. A test should only be added to CI after is has been +running in some non-CI suite long enough to establish a track record showing +that the test does not fail when run against *working* software. Note also that +tests running in CI are generally running on a well-loaded cluster, so must +contend for resources; see above about [kinds of tests](#kinds_of_tests). + +Generally, a feature starts as `experimental`, and will be run in some suite +owned by the team developing the feature. If a feature is in beta or GA, it +*should* block the merge-queue. In moving from experimental to beta or GA, tests +that are expected to pass by default should simply remove the `[Feature:.+]` +label, and will be incorporated into our core suites. If tests are not expected +to pass by default, (e.g. they require a special environment such as added +quota,) they should remain with the `[Feature:.+]` label, and the suites that +run them should be incorporated into the +[munger config](https://github.com/kubernetes/contrib/blob/master/mungegithub/mungers/submit-queue.go) +via the `jenkins-jobs` flag. + +Occasionally, we'll want to add tests to better exercise features that are +already GA. These tests also shouldn't go straight to CI. They should begin by +being marked as `[Flaky]` to be run outside of CI, and once a track-record for +them is established, they may be promoted out of `[Flaky]`. + +### Moving a test out of CI + +If we have determined that a test is known-flaky and cannot be fixed in the +short-term, we may move it out of CI indefinitely. This move should be used +sparingly, as it effectively means that we have no coverage of that test. 
When a +test is demoted, it should be marked `[Flaky]` with a comment accompanying the +label with a reference to an issue opened to fix the test. + +## Performance Evaluation + +Another benefit of the e2e tests is the ability to create reproducible loads on +the system, which can then be used to determine the responsiveness, or analyze +other characteristics of the system. For example, the density tests load the +system to 30,50,100 pods per/node and measures the different characteristics of +the system, such as throughput, api-latency, etc. + +For a good overview of how we analyze performance data, please read the +following [post](http://blog.kubernetes.io/2015/09/kubernetes-performance-measurements-and.html) + +For developers who are interested in doing their own performance analysis, we +recommend setting up [prometheus](http://prometheus.io/) for data collection, +and using [promdash](http://prometheus.io/docs/visualization/promdash/) to +visualize the data. There also exists the option of pushing your own metrics in +from the tests using a +[prom-push-gateway](http://prometheus.io/docs/instrumenting/pushing/). +Containers for all of these components can be found +[here](https://hub.docker.com/u/prom/). + +For more accurate measurements, you may wish to set up prometheus external to +kubernetes in an environment where it can access the major system components +(api-server, controller-manager, scheduler). This is especially useful when +attempting to gather metrics in a load-balanced api-server environment, because +all api-servers can be analyzed independently as well as collectively. On +startup, configuration file is passed to prometheus that specifies the endpoints +that prometheus will scrape, as well as the sampling interval. + +``` +#prometheus.conf +job: { + name: "kubernetes" + scrape_interval: "1s" + target_group: { + # apiserver(s) + target: "http://localhost:8080/metrics" + # scheduler + target: "http://localhost:10251/metrics" + # controller-manager + target: "http://localhost:10252/metrics" + } +} +``` + +Once prometheus is scraping the kubernetes endpoints, that data can then be +plotted using promdash, and alerts can be created against the assortment of +metrics that kubernetes provides. + +## One More Thing + +You should also know the [testing conventions](coding-conventions.md#testing-conventions). + +**HAPPY TESTING!** + + + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/e2e-tests.md?pixel)]() + diff --git a/contributors/devel/faster_reviews.md b/contributors/devel/faster_reviews.md new file mode 100644 index 00000000000..85568d3f840 --- /dev/null +++ b/contributors/devel/faster_reviews.md @@ -0,0 +1,218 @@ +# How to get faster PR reviews + +Most of what is written here is not at all specific to Kubernetes, but it bears +being written down in the hope that it will occasionally remind people of "best +practices" around code reviews. + +You've just had a brilliant idea on how to make Kubernetes better. Let's call +that idea "Feature-X". Feature-X is not even that complicated. You have a pretty +good idea of how to implement it. You jump in and implement it, fixing a bunch +of stuff along the way. You send your PR - this is awesome! And it sits. And +sits. A week goes by and nobody reviews it. Finally someone offers a few +comments, which you fix up and wait for more review. And you wait. Another +week or two goes by. This is horrible. + +What went wrong? 
One particular problem that comes up frequently is this - your +PR is too big to review. You've touched 39 files and have 8657 insertions. When +your would-be reviewers pull up the diffs they run away - this PR is going to +take 4 hours to review and they don't have 4 hours right now. They'll get to it +later, just as soon as they have more free time (ha!). + +Let's talk about how to avoid this. + +## 0. Familiarize yourself with project conventions + +* [Development guide](development.md) +* [Coding conventions](coding-conventions.md) +* [API conventions](api-conventions.md) +* [Kubectl conventions](kubectl-conventions.md) + +## 1. Don't build a cathedral in one PR + +Are you sure Feature-X is something the Kubernetes team wants or will accept, or +that it is implemented to fit with other changes in flight? Are you willing to +bet a few days or weeks of work on it? If you have any doubt at all about the +usefulness of your feature or the design - make a proposal doc (in +docs/proposals; for example [the QoS proposal](http://prs.k8s.io/11713)) or a +sketch PR (e.g., just the API or Go interface) or both. Write or code up just +enough to express the idea and the design and why you made those choices, then +get feedback on this. Be clear about what type of feedback you are asking for. +Now, if we ask you to change a bunch of facets of the design, you won't have to +re-write it all. + +## 2. Smaller diffs are exponentially better + +Small PRs get reviewed faster and are more likely to be correct than big ones. +Let's face it - attention wanes over time. If your PR takes 60 minutes to +review, I almost guarantee that the reviewer's eye for detail is not as keen in +the last 30 minutes as it was in the first. This leads to multiple rounds of +review when one might have sufficed. In some cases the review is delayed in its +entirety by the need for a large contiguous block of time to sit and read your +code. + +Whenever possible, break up your PRs into multiple commits. Making a series of +discrete commits is a powerful way to express the evolution of an idea or the +different ideas that make up a single feature. There's a balance to be struck, +obviously. If your commits are too small they become more cumbersome to deal +with. Strive to group logically distinct ideas into separate commits. + +For example, if you found that Feature-X needed some "prefactoring" to fit in, +make a commit that JUST does that prefactoring. Then make a new commit for +Feature-X. Don't lump unrelated things together just because you didn't think +about prefactoring. If you need to, fork a new branch, do the prefactoring +there and send a PR for that. If you can explain why you are doing seemingly +no-op work ("it makes the Feature-X change easier, I promise") we'll probably be +OK with it. + +Obviously, a PR with 25 commits is still very cumbersome to review, so use +common sense. + +## 3. Multiple small PRs are often better than multiple commits + +If you can extract whole ideas from your PR and send those as PRs of their own, +you can avoid the painful problem of continually rebasing. Kubernetes is a +fast-moving codebase - lock in your changes ASAP, and make merges be someone +else's problem. + +Obviously, we want every PR to be useful on its own, so you'll have to use +common sense in deciding what can be a PR vs. what should be a commit in a larger +PR. Rule of thumb - if this commit or set of commits is directly related to +Feature-X and nothing else, it should probably be part of the Feature-X PR. 
If +you can plausibly imagine someone finding value in this commit outside of +Feature-X, try it as a PR. + +Don't worry about flooding us with PRs. We'd rather have 100 small, obvious PRs +than 10 unreviewable monoliths. + +## 4. Don't rename, reformat, comment, etc in the same PR + +Often, as you are implementing Feature-X, you find things that are just wrong. +Bad comments, poorly named functions, bad structure, weak type-safety. You +should absolutely fix those things (or at least file issues, please) - but not +in this PR. See the above points - break unrelated changes out into different +PRs or commits. Otherwise your diff will have WAY too many changes, and your +reviewer won't see the forest because of all the trees. + +## 5. Comments matter + +Read up on GoDoc - follow those general rules. If you're writing code and you +think there is any possible chance that someone might not understand why you did +something (or that you won't remember what you yourself did), comment it. If +you think there's something pretty obvious that we could follow up on, add a +TODO. Many code-review comments are about this exact issue. + +## 5. Tests are almost always required + +Nothing is more frustrating than doing a review, only to find that the tests are +inadequate or even entirely absent. Very few PRs can touch code and NOT touch +tests. If you don't know how to test Feature-X - ask! We'll be happy to help +you design things for easy testing or to suggest appropriate test cases. + +## 6. Look for opportunities to generify + +If you find yourself writing something that touches a lot of modules, think hard +about the dependencies you are introducing between packages. Can some of what +you're doing be made more generic and moved up and out of the Feature-X package? +Do you need to use a function or type from an otherwise unrelated package? If +so, promote! We have places specifically for hosting more generic code. + +Likewise if Feature-X is similar in form to Feature-W which was checked in last +month and it happens to exactly duplicate some tricky stuff from Feature-W, +consider prefactoring core logic out and using it in both Feature-W and +Feature-X. But do that in a different commit or PR, please. + +## 7. Fix feedback in a new commit + +Your reviewer has finally sent you some feedback on Feature-X. You make a bunch +of changes and ... what? You could patch those into your commits with git +"squash" or "fixup" logic. But that makes your changes hard to verify. Unless +your whole PR is pretty trivial, you should instead put your fixups into a new +commit and re-push. Your reviewer can then look at that commit on its own - so +much faster to review than starting over. + +We might still ask you to clean up your commits at the very end, for the sake +of a more readable history, but don't do this until asked, typically at the +point where the PR would otherwise be tagged LGTM. + +General squashing guidelines: + +* Sausage => squash + + When there are several commits to fix bugs in the original commit(s), address +reviewer feedback, etc. Really we only want to see the end state and commit +message for the whole PR. + +* Layers => don't squash + + When there are independent changes layered upon each other to achieve a single +goal. For instance, writing a code munger could be one commit, applying it could +be another, and adding a precommit check could be a third. 
One could argue they +should be separate PRs, but there's really no way to test/review the munger +without seeing it applied, and there needs to be a precommit check to ensure the +munged output doesn't immediately get out of date. + +A commit, as much as possible, should be a single logical change. Each commit +should always have a good title line (<70 characters) and include an additional +description paragraph describing in more detail the change intended. Do not link +pull requests by `#` in a commit description, because GitHub creates lots of +spam. Instead, reference other PRs via the PR your commit is in. + +## 8. KISS, YAGNI, MVP, etc + +Sometimes we need to remind each other of core tenets of software design - Keep +It Simple, You Aren't Gonna Need It, Minimum Viable Product, and so on. Adding +features "because we might need it later" is antithetical to software that +ships. Add the things you need NOW and (ideally) leave room for things you +might need later - but don't implement them now. + +## 9. Push back + +We understand that it is hard to imagine, but sometimes we make mistakes. It's +OK to push back on changes requested during a review. If you have a good reason +for doing something a certain way, you are absolutely allowed to debate the +merits of a requested change. You might be overruled, but you might also +prevail. We're mostly pretty reasonable people. Mostly. + +## 10. I'm still getting stalled - help?! + +So, you've done all that and you still aren't getting any PR love? Here's some +things you can do that might help kick a stalled process along: + + * Make sure that your PR has an assigned reviewer (assignee in GitHub). If +this is not the case, reply to the PR comment stream asking for one to be +assigned. + + * Ping the assignee (@username) on the PR comment stream asking for an +estimate of when they can get to it. + + * Ping the assignee by email (many of us have email addresses that are well +published or are the same as our GitHub handle @google.com or @redhat.com). + + * Ping the [team](https://github.com/orgs/kubernetes/teams) (via @team-name) +that works in the area you're submitting code. + +If you think you have fixed all the issues in a round of review, and you haven't +heard back, you should ping the reviewer (assignee) on the comment stream with a +"please take another look" (PTAL) or similar comment indicating you are done and +you think it is ready for re-review. In fact, this is probably a good habit for +all PRs. + +One phenomenon of open-source projects (where anyone can comment on any issue) +is the dog-pile - your PR gets so many comments from so many people it becomes +hard to follow. In this situation you can ask the primary reviewer (assignee) +whether they want you to fork a new PR to clear out all the comments. Remember: +you don't HAVE to fix every issue raised by every person who feels like +commenting, but you should at least answer reasonable comments with an +explanation. + +## Final: Use common sense + +Obviously, none of these points are hard rules. There is no document that can +take the place of common sense and good taste. Use your best judgment, but put +a bit of thought into how your work can be made easier to review. If you do +these things your PRs will flow much more easily. 
+ + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/faster_reviews.md?pixel)]() + diff --git a/contributors/devel/flaky-tests.md b/contributors/devel/flaky-tests.md new file mode 100644 index 00000000000..9656bd5f624 --- /dev/null +++ b/contributors/devel/flaky-tests.md @@ -0,0 +1,194 @@ +# Flaky tests + +Any test that fails occasionally is "flaky". Since our merges only proceed when +all tests are green, and we have a number of different CI systems running the +tests in various combinations, even a small percentage of flakes results in a +lot of pain for people waiting for their PRs to merge. + +Therefore, it's very important that we write tests defensively. Situations that +"almost never happen" happen with some regularity when run thousands of times in +resource-constrained environments. Since flakes can often be quite hard to +reproduce while still being common enough to block merges occasionally, it's +additionally important that the test logs be useful for narrowing down exactly +what caused the failure. + +Note that flakes can occur in unit tests, integration tests, or end-to-end +tests, but probably occur most commonly in end-to-end tests. + +## Filing issues for flaky tests + +Because flakes may be rare, it's very important that all relevant logs be +discoverable from the issue. + +1. Search for the test name. If you find an open issue and you're 90% sure the + flake is exactly the same, add a comment instead of making a new issue. +2. If you make a new issue, you should title it with the test name, prefixed by + "e2e/unit/integration flake:" (whichever is appropriate) +3. Reference any old issues you found in step one. Also, make a comment in the + old issue referencing your new issue, because people monitoring only their + email do not see the backlinks github adds. Alternatively, tag the person or + people who most recently worked on it. +4. Paste, in block quotes, the entire log of the individual failing test, not + just the failure line. +5. Link to durable storage with the rest of the logs. This means (for all the + tests that Google runs) the GCS link is mandatory! The Jenkins test result + link is nice but strictly optional: not only does it expire more quickly, + it's not accessible to non-Googlers. + +## Finding filed flaky test cases + +Find flaky tests issues on GitHub under the [kind/flake issue label][flake]. +There are significant numbers of flaky tests reported on a regular basis and P2 +flakes are under-investigated. Fixing flakes is a quick way to gain expertise +and community goodwill. + +[flake]: https://github.com/kubernetes/kubernetes/issues?q=is%3Aopen+is%3Aissue+label%3Akind%2Fflake + +## Expectations when a flaky test is assigned to you + +Note that we won't randomly assign these issues to you unless you've opted in or +you're part of a group that has opted in. We are more than happy to accept help +from anyone in fixing these, but due to the severity of the problem when merges +are blocked, we need reasonably quick turn-around time on test flakes. Therefore +we have the following guidelines: + +1. If a flaky test is assigned to you, it's more important than anything else + you're doing unless you can get a special dispensation (in which case it will + be reassigned). If you have too many flaky tests assigned to you, or you + have such a dispensation, then it's *still* your responsibility to find new + owners (this may just mean giving stuff back to the relevant Team or SIG Lead). +2. 
You should make a reasonable effort to reproduce it. Somewhere between an + hour and half a day of concentrated effort is "reasonable". It is perfectly + reasonable to ask for help! +3. If you can reproduce it (or it's obvious from the logs what happened), you + should then be able to fix it, or in the case where someone is clearly more + qualified to fix it, reassign it with very clear instructions. +4. PRs that fix or help debug flakes may have the P0 priority set to get them + through the merge queue as fast as possible. +5. Once you have made a change that you believe fixes a flake, it is conservative + to keep the issue for the flake open and see if it manifests again after the + change is merged. +6. If you can't reproduce a flake: __don't just close it!__ Every time a flake comes + back, at least 2 hours of merge time is wasted. So we need to make monotonic + progress towards narrowing it down every time a flake occurs. If you can't + figure it out from the logs, add log messages that would have help you figure + it out. If you make changes to make a flake more reproducible, please link + your pull request to the flake you're working on. +7. If a flake has been open, could not be reproduced, and has not manifested in + 3 months, it is reasonable to close the flake issue with a note saying + why. + +# Reproducing unit test flakes + +Try the [stress command](https://godoc.org/golang.org/x/tools/cmd/stress). + +Just + +``` +$ go install golang.org/x/tools/cmd/stress +``` + +Then build your test binary + +``` +$ go test -c -race +``` + +Then run it under stress + +``` +$ stress ./package.test -test.run=FlakyTest +``` + +It runs the command and writes output to `/tmp/gostress-*` files when it fails. +It periodically reports with run counts. Be careful with tests that use the +`net/http/httptest` package; they could exhaust the available ports on your +system! + +# Hunting flaky unit tests in Kubernetes + +Sometimes unit tests are flaky. This means that due to (usually) race +conditions, they will occasionally fail, even though most of the time they pass. + +We have a goal of 99.9% flake free tests. This means that there is only one +flake in one thousand runs of a test. + +Running a test 1000 times on your own machine can be tedious and time consuming. +Fortunately, there is a better way to achieve this using Kubernetes. + +_Note: these instructions are mildly hacky for now, as we get run once semantics +and logging they will get better_ + +There is a testing image `brendanburns/flake` up on the docker hub. We will use +this image to test our fix. + +Create a replication controller with the following config: + +```yaml +apiVersion: v1 +kind: ReplicationController +metadata: + name: flakecontroller +spec: + replicas: 24 + template: + metadata: + labels: + name: flake + spec: + containers: + - name: flake + image: brendanburns/flake + env: + - name: TEST_PACKAGE + value: pkg/tools + - name: REPO_SPEC + value: https://github.com/kubernetes/kubernetes +``` + +Note that we omit the labels and the selector fields of the replication +controller, because they will be populated from the labels field of the pod +template by default. + +```sh +kubectl create -f ./controller.yaml +``` + +This will spin up 24 instances of the test. They will run to completion, then +exit, and the kubelet will restart them, accumulating more and more runs of the +test. + +You can examine the recent runs of the test by calling `docker ps -a` and +looking for tasks that exited with non-zero exit codes. 
Unfortunately, `docker ps -a` only keeps around the exit status of the last 15-20 containers with the
+same image, so you have to check them frequently.
+
+You can use this script to automate checking for failures, assuming your cluster
+is running on GCE and has four nodes:
+
+```sh
+echo "" > output.txt
+for i in {1..4}; do
+  echo "Checking kubernetes-node-${i}"
+  echo "kubernetes-node-${i}:" >> output.txt
+  gcloud compute ssh "kubernetes-node-${i}" --command="sudo docker ps -a" >> output.txt
+done
+grep "Exited ([^0])" output.txt
+```
+
+Eventually you will have sufficient runs for your purposes. At that point you
+can delete the replication controller by running:
+
+```sh
+kubectl delete replicationcontroller flakecontroller
+```
+
+If you do a final check for flakes with `docker ps -a`, ignore tasks that
+exited -1, since that's what happens when you stop the replication controller.
+
+Happy flake hunting!
+
+
+
+[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/flaky-tests.md?pixel)]()
+
diff --git a/contributors/devel/generating-clientset.md b/contributors/devel/generating-clientset.md
new file mode 100644
index 00000000000..cbb6141ccaf
--- /dev/null
+++ b/contributors/devel/generating-clientset.md
@@ -0,0 +1,41 @@
+# Generation and release cycle of clientset
+
+Client-gen is an automatic tool that generates [clientset](../../docs/proposals/client-package-structure.md#high-level-client-sets) based on API types. This doc introduces the use of client-gen, and the release cycle of the generated clientsets.
+
+## Using client-gen
+
+The workflow includes three steps:
+
+1. Marking API types with tags: in `pkg/apis/${GROUP}/${VERSION}/types.go`, mark the types (e.g., Pods) that you want to generate clients for with the `// +genclient=true` tag. If the resource associated with the type is not namespace scoped (e.g., PersistentVolume), you need to append the `nonNamespaced=true` tag as well.
+
+2.
+  - a. If you are developing in the k8s.io/kubernetes repository, you just need to run hack/update-codegen.sh.
+
+  - b. If you are running client-gen outside of k8s.io/kubernetes, you need to use the command line argument `--input` to specify the groups and versions of the APIs you want to generate clients for; client-gen will then look into `pkg/apis/${GROUP}/${VERSION}/types.go` and generate clients for the types you have marked with the `genclient` tags. For example, to generate a clientset named "my_release" including clients for api/v1 objects and extensions/v1beta1 objects, you need to run:
+
+```
+$ client-gen --input="api/v1,extensions/v1beta1" --clientset-name="my_release"
+```
+
+3. ***Adding expansion methods***: client-gen only generates the common methods, such as CRUD. You can manually add additional methods through the expansion interface. For example, this [file](../../pkg/client/clientset_generated/release_1_5/typed/core/v1/pod_expansion.go) adds additional methods to Pod's client. As a convention, we put the expansion interface and its methods in file ${TYPE}_expansion.go. In most cases, you don't want to remove existing expansion files. So to make life easier, instead of creating a new clientset from scratch, ***you can copy and rename an existing clientset (so that all the expansion files are copied)***, and then run client-gen.
+
+## Output of client-gen
+
+- clientset: the clientset will be generated at `pkg/client/clientset_generated/` by default, and you can change the path via the `--clientset-path` command line argument.
+ +- Individual typed clients and client for group: They will be generated at `pkg/client/clientset_generated/${clientset_name}/typed/generated/${GROUP}/${VERSION}/` + +## Released clientsets + +If you are contributing code to k8s.io/kubernetes, try to use the release_X_Y clientset in this [directory](../../pkg/client/clientset_generated/). + +If you need a stable Go client to build your own project, please refer to the [client-go repository](https://github.com/kubernetes/client-go). + +We are migrating k8s.io/kubernetes to use client-go as well, see issue [#35159](https://github.com/kubernetes/kubernetes/issues/35159). + + [![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/generating-clientset.md?pixel)]() + + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/generating-clientset.md?pixel)]() + diff --git a/contributors/devel/getting-builds.md b/contributors/devel/getting-builds.md new file mode 100644 index 00000000000..86563390cec --- /dev/null +++ b/contributors/devel/getting-builds.md @@ -0,0 +1,52 @@ +# Getting Kubernetes Builds + +You can use [hack/get-build.sh](http://releases.k8s.io/HEAD/hack/get-build.sh) +to get a build or to use as a reference on how to get the most recent builds +with curl. With `get-build.sh` you can grab the most recent stable build, the +most recent release candidate, or the most recent build to pass our ci and gce +e2e tests (essentially a nightly build). + +Run `./hack/get-build.sh -h` for its usage. + +To get a build at a specific version (v1.1.1) use: + +```console +./hack/get-build.sh v1.1.1 +``` + +To get the latest stable release: + +```console +./hack/get-build.sh release/stable +``` + +Use the "-v" option to print the version number of a build without retrieving +it. For example, the following prints the version number for the latest ci +build: + +```console +./hack/get-build.sh -v ci/latest +``` + +You can also use the gsutil tool to explore the Google Cloud Storage release +buckets. Here are some examples: + +```sh +gsutil cat gs://kubernetes-release-dev/ci/latest.txt # output the latest ci version number +gsutil cat gs://kubernetes-release-dev/ci/latest-green.txt # output the latest ci version number that passed gce e2e +gsutil ls gs://kubernetes-release-dev/ci/v0.20.0-29-g29a55cc/ # list the contents of a ci release +gsutil ls gs://kubernetes-release/release # list all official releases and rcs +``` + +## Install `gsutil` + +Example installation: + +```console +$ curl -sSL https://storage.googleapis.com/pub/gsutil.tar.gz | sudo tar -xz -C /usr/local/src +$ sudo ln -s /usr/local/src/gsutil/gsutil /usr/bin/gsutil +``` + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/getting-builds.md?pixel)]() + diff --git a/contributors/devel/git_workflow.png b/contributors/devel/git_workflow.png new file mode 100644 index 00000000000..80a66248fb8 Binary files /dev/null and b/contributors/devel/git_workflow.png differ diff --git a/contributors/devel/go-code.md b/contributors/devel/go-code.md new file mode 100644 index 00000000000..2af055f4d3a --- /dev/null +++ b/contributors/devel/go-code.md @@ -0,0 +1,32 @@ +# Kubernetes Go Tools and Tips + +Kubernetes is one of the largest open source Go projects, so good tooling a solid understanding of +Go is critical to Kubernetes development. This document provides a collection of resources, tools +and tips that our developers have found useful. 
+ +## Recommended Reading + +- [Kubernetes Go development environment](development.md#go-development-environment) +- [The Go Spec](https://golang.org/ref/spec) - The Go Programming Language + Specification. +- [Go Tour](https://tour.golang.org/welcome/2) - Official Go tutorial. +- [Effective Go](https://golang.org/doc/effective_go.html) - A good collection of Go advice. +- [Kubernetes Code conventions](coding-conventions.md) - Style guide for Kubernetes code. +- [Three Go Landmines](https://gist.github.com/lavalamp/4bd23295a9f32706a48f) - Surprising behavior in the Go language. These have caused real bugs! + +## Recommended Tools + +- [godep](https://github.com/tools/godep) - Used for Kubernetes dependency management. See also [Kubernetes godep and dependency management](development.md#godep-and-dependency-management) +- [Go Version Manager](https://github.com/moovweb/gvm) - A handy tool for managing Go versions. +- [godepq](https://github.com/google/godepq) - A tool for analyzing go import trees. + +## Go Tips + +- [Godoc bookmarklet](https://gist.github.com/timstclair/c891fb8aeb24d663026371d91dcdb3fc) - navigate from a github page to the corresponding godoc page. +- Consider making a separate Go tree for each project, which can make overlapping dependency management much easier. Remember to set the `$GOPATH` correctly! Consider [scripting](https://gist.github.com/timstclair/17ca792a20e0d83b06dddef7d77b1ea0) this. +- Emacs users - setup [go-mode](https://github.com/dominikh/go-mode.el) + + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/go-code.md?pixel)]() + diff --git a/contributors/devel/godep.md b/contributors/devel/godep.md new file mode 100644 index 00000000000..ddd6c5b1484 --- /dev/null +++ b/contributors/devel/godep.md @@ -0,0 +1,123 @@ +# Using godep to manage dependencies + +This document is intended to show a way for managing `vendor/` tree dependencies +in Kubernetes. If you are not planning on managing `vendor` dependencies go here +[Godep dependency management](development.md#godep-dependency-management). + +## Alternate GOPATH for installing and using godep + +There are many ways to build and host Go binaries. Here is one way to get +utilities like `godep` installed: + +Create a new GOPATH just for your go tools and install godep: + +```sh +export GOPATH=$HOME/go-tools +mkdir -p $GOPATH +go get -u github.com/tools/godep +``` + +Add this $GOPATH/bin to your path. Typically you'd add this to your ~/.profile: + +```sh +export GOPATH=$HOME/go-tools +export PATH=$PATH:$GOPATH/bin +``` + +## Using godep + +Here's a quick walkthrough of one way to use godeps to add or update a +Kubernetes dependency into `vendor/`. For more details, please see the +instructions in [godep's documentation](https://github.com/tools/godep). + +1) Devote a directory to this endeavor: + +_Devoting a separate directory is not strictly required, but it is helpful to +separate dependency updates from other changes._ + +```sh +export KPATH=$HOME/code/kubernetes +mkdir -p $KPATH/src/k8s.io +cd $KPATH/src/k8s.io +git clone https://github.com/$YOUR_GITHUB_USERNAME/kubernetes.git # assumes your fork is 'kubernetes' +# Or copy your existing local repo here. IMPORTANT: making a symlink doesn't work. +``` + +2) Set up your GOPATH. + +```sh +# This will *not* let your local builds see packages that exist elsewhere on your system. +export GOPATH=$KPATH +``` + +3) Populate your new GOPATH. 
+ +```sh +cd $KPATH/src/k8s.io/kubernetes +godep restore +``` + +4) Next, you can either add a new dependency or update an existing one. + +To add a new dependency is simple (if a bit slow): + +```sh +cd $KPATH/src/k8s.io/kubernetes +DEP=example.com/path/to/dependency +godep get $DEP/... +# Now change code in Kubernetes to use the dependency. +./hack/godep-save.sh +``` + +To update an existing dependency is a bit more complicated. Godep has an +`update` command, but none of us can figure out how to actually make it work. +Instead, this procedure seems to work reliably: + +```sh +cd $KPATH/src/k8s.io/kubernetes +DEP=example.com/path/to/dependency +# NB: For the next step, $DEP is assumed be the repo root. If it is actually a +# subdir of the repo, use the repo root here. This is required to keep godep +# from getting angry because `godep restore` left the tree in a "detached head" +# state. +rm -rf $KPATH/src/$DEP # repo root +godep get $DEP/... +# Change code in Kubernetes, if necessary. +rm -rf Godeps +rm -rf vendor +./hack/godep-save.sh +git checkout -- $(git status -s | grep "^ D" | awk '{print $2}' | grep ^Godeps) +``` + +_If `go get -u path/to/dependency` fails with compilation errors, instead try +`go get -d -u path/to/dependency` to fetch the dependencies without compiling +them. This is unusual, but has been observed._ + +After all of this is done, `git status` should show you what files have been +modified and added/removed. Make sure to `git add` and `git rm` them. It is +commonly advised to make one `git commit` which includes just the dependency +update and Godeps files, and another `git commit` that includes changes to +Kubernetes code to use the new/updated dependency. These commits can go into a +single pull request. + +5) Before sending your PR, it's a good idea to sanity check that your +Godeps.json file and the contents of `vendor/ `are ok by running `hack/verify-godeps.sh` + +_If `hack/verify-godeps.sh` fails after a `godep update`, it is possible that a +transitive dependency was added or removed but not updated by godeps. It then +may be necessary to perform a `hack/godep-save.sh` to pick up the transitive +dependency changes._ + +It is sometimes expedient to manually fix the /Godeps/Godeps.json file to +minimize the changes. However without great care this can lead to failures +with `hack/verify-godeps.sh`. This must pass for every PR. + +6) If you updated the Godeps, please also update `Godeps/LICENSES` by running +`hack/update-godep-licenses.sh`. 
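+
+Putting steps 4-6 together, the tail end of the workflow might look something
+like this (a sketch only; the dependency path and commit messages are
+illustrative, and one of several reasonable orderings):
+
+```sh
+# Sanity-check Godeps.json and vendor/, then refresh the license file.
+./hack/verify-godeps.sh
+./hack/update-godep-licenses.sh
+
+# Commit the dependency update separately from the Kubernetes code changes.
+git add Godeps/ vendor/
+git commit -m "Bump example.com/path/to/dependency"
+git add pkg/   # or wherever your code changes live
+git commit -m "Use updated example.com/path/to/dependency"
+```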
+ + + + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/godep.md?pixel)]() + diff --git a/contributors/devel/gubernator-images/filterpage.png b/contributors/devel/gubernator-images/filterpage.png new file mode 100644 index 00000000000..2d08bd8e1b6 Binary files /dev/null and b/contributors/devel/gubernator-images/filterpage.png differ diff --git a/contributors/devel/gubernator-images/filterpage1.png b/contributors/devel/gubernator-images/filterpage1.png new file mode 100644 index 00000000000..838cb0fa707 Binary files /dev/null and b/contributors/devel/gubernator-images/filterpage1.png differ diff --git a/contributors/devel/gubernator-images/filterpage2.png b/contributors/devel/gubernator-images/filterpage2.png new file mode 100644 index 00000000000..63da782e1c0 Binary files /dev/null and b/contributors/devel/gubernator-images/filterpage2.png differ diff --git a/contributors/devel/gubernator-images/filterpage3.png b/contributors/devel/gubernator-images/filterpage3.png new file mode 100644 index 00000000000..33066d78662 Binary files /dev/null and b/contributors/devel/gubernator-images/filterpage3.png differ diff --git a/contributors/devel/gubernator-images/skipping1.png b/contributors/devel/gubernator-images/skipping1.png new file mode 100644 index 00000000000..a5dea440e63 Binary files /dev/null and b/contributors/devel/gubernator-images/skipping1.png differ diff --git a/contributors/devel/gubernator-images/skipping2.png b/contributors/devel/gubernator-images/skipping2.png new file mode 100644 index 00000000000..b133347e422 Binary files /dev/null and b/contributors/devel/gubernator-images/skipping2.png differ diff --git a/contributors/devel/gubernator-images/testfailures.png b/contributors/devel/gubernator-images/testfailures.png new file mode 100644 index 00000000000..1b331248b82 Binary files /dev/null and b/contributors/devel/gubernator-images/testfailures.png differ diff --git a/contributors/devel/gubernator.md b/contributors/devel/gubernator.md new file mode 100644 index 00000000000..3fd2e445c85 --- /dev/null +++ b/contributors/devel/gubernator.md @@ -0,0 +1,142 @@ +# Gubernator + +*This document is oriented at developers who want to use Gubernator to debug while developing for Kubernetes.* + + + +- [Gubernator](#gubernator) + - [What is Gubernator?](#what-is-gubernator) + - [Gubernator Features](#gubernator-features) + - [Test Failures list](#test-failures-list) + - [Log Filtering](#log-filtering) + - [Gubernator for Local Tests](#gubernator-for-local-tests) + - [Future Work](#future-work) + + + +## What is Gubernator? + +[Gubernator](https://k8s-gubernator.appspot.com/) is a webpage for viewing and filtering Kubernetes +test results. + +Gubernator simplifies the debugging proccess and makes it easier to track down failures by automating many +steps commonly taken in searching through logs, and by offering tools to filter through logs to find relevant lines. +Gubernator automates the steps of finding the failed tests, displaying relevant logs, and determining the +failed pods and the corresponing pod UID, namespace, and container ID. +It also allows for filtering of the log files to display relevant lines based on selected keywords, and +allows for multiple logs to be woven together by timestamp. + +Gubernator runs on Google App Engine and fetches logs stored on Google Cloud Storage. + +## Gubernator Features + +### Test Failures list + +Issues made by k8s-merge-robot will post a link to a page listing the failed tests. 
+Each failed test comes with the corresponding error log from a junit file and a link +to filter logs for that test. + +Based on the message logged in the junit file, the pod name may be displayed. + +![alt text](gubernator-images/testfailures.png) + +[Test Failures List Example](https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/kubernetes-e2e-gke/11721) + +### Log Filtering + +The log filtering page comes with checkboxes and textboxes to aid in filtering. Filtered keywords will be bolded +and lines including keywords will be highlighted. Up to four lines around the line of interest will also be displayed. + +![alt text](gubernator-images/filterpage.png) + +If less than 100 lines are skipped, the "... skipping xx lines ..." message can be clicked to expand and show +the hidden lines. + +Before expansion: +![alt text](gubernator-images/skipping1.png) +After expansion: +![alt text](gubernator-images/skipping2.png) + +If the pod name was displayed in the Test Failures list, it will automatically be included in the filters. +If it is not found in the error message, it can be manually entered into the textbox. Once a pod name +is entered, the Pod UID, Namespace, and ContainerID may be automatically filled in as well. These can be +altered as well. To apply the filter, check off the options corresponding to the filter. + +![alt text](gubernator-images/filterpage1.png) + +To add a filter, type the term to be filtered into the textbox labeled "Add filter:" and press enter. +Additional filters will be displayed as checkboxes under the textbox. + +![alt text](gubernator-images/filterpage3.png) + +To choose which logs to view check off the checkboxes corresponding to the logs of interest. If multiple logs are +included, the "Weave by timestamp" option can weave the selected logs together based on the timestamp in each line. + +![alt text](gubernator-images/filterpage2.png) + +[Log Filtering Example 1](https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/kubelet-gce-e2e-ci/5535/nodelog?pod=pod-configmaps-b5b876cb-3e1e-11e6-8956-42010af0001d&junit=junit_03.xml&wrap=on&logfiles=%2Fkubernetes-jenkins%2Flogs%2Fkubelet-gce-e2e-ci%2F5535%2Fartifacts%2Ftmp-node-e2e-7a5a3b40-e2e-node-coreos-stable20160622-image%2Fkube-apiserver.log&logfiles=%2Fkubernetes-jenkins%2Flogs%2Fkubelet-gce-e2e-ci%2F5535%2Fartifacts%2Ftmp-node-e2e-7a5a3b40-e2e-node-coreos-stable20160622-image%2Fkubelet.log&UID=on&poduid=b5b8a59e-3e1e-11e6-b358-42010af0001d&ns=e2e-tests-configmap-oi12h&cID=tmp-node-e2e-7a5a3b40-e2e-node-coreos-stable20160622-image) + +[Log Filtering Example 2](https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/kubernetes-e2e-gke/11721/nodelog?pod=client-containers-a53f813c-503e-11e6-88dd-0242ac110003&junit=junit_19.xml&wrap=on) + + +### Gubernator for Local Tests + +*Currently Gubernator can only be used with remote node e2e tests.* + +**NOTE: Using Gubernator with local tests will publically upload your test logs to Google Cloud Storage** + +To use Gubernator to view logs from local test runs, set the GUBERNATOR tag to true. +A URL link to view the test results will be printed to the console. +Please note that running with the Gubernator tag will bypass the user confirmation for uploading to GCS. + +```console + +$ make test-e2e-node REMOTE=true GUBERNATOR=true +... 
+================================================================ +Running gubernator.sh + +Gubernator linked below: +k8s-gubernator.appspot.com/build/yourusername-g8r-logs/logs/e2e-node/timestamp +``` + +The gubernator.sh script can be run after running a remote node e2e test for the same effect. + +```console +$ ./test/e2e_node/gubernator.sh +Do you want to run gubernator.sh and upload logs publicly to GCS? [y/n]y +... +Gubernator linked below: +k8s-gubernator.appspot.com/build/yourusername-g8r-logs/logs/e2e-node/timestamp +``` + +## Future Work + +Gubernator provides a framework for debugging failures and introduces useful features. +There is still a lot of room for more features and growth to make the debugging process more efficient. + +How to contribute (see https://github.com/kubernetes/test-infra/blob/master/gubernator/README.md) + +* Extend GUBERNATOR flag to all local tests + +* More accurate identification of pod name, container ID, etc. + * Change content of logged strings for failures to include more information + * Better regex in Gubernator + +* Automate discovery of more keywords + * Volume Name + * Disk Name + * Pod IP + +* Clickable API objects in the displayed lines in order to add them as filters + +* Construct story of pod's lifetime + * Have concise view of what a pod went through from when pod was started to failure + +* Improve UI + * Have separate folders of logs in rows instead of in one long column + * Improve interface for adding additional features (maybe instead of textbox and checkbox, have chips) + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/gubernator.md?pixel)]() + diff --git a/contributors/devel/how-to-doc.md b/contributors/devel/how-to-doc.md new file mode 100644 index 00000000000..891969d7885 --- /dev/null +++ b/contributors/devel/how-to-doc.md @@ -0,0 +1,205 @@ +# Document Conventions + +Updated: 11/3/2015 + +*This document is oriented at users and developers who want to write documents +for Kubernetes.* + +**Table of Contents** + + +- [Document Conventions](#document-conventions) + - [General Concepts](#general-concepts) + - [How to Get a Table of Contents](#how-to-get-a-table-of-contents) + - [How to Write Links](#how-to-write-links) + - [How to Include an Example](#how-to-include-an-example) + - [Misc.](#misc) + - [Code formatting](#code-formatting) + - [Syntax Highlighting](#syntax-highlighting) + - [Headings](#headings) + - [What Are Mungers?](#what-are-mungers) + - [Auto-added Mungers](#auto-added-mungers) + - [Generate Analytics](#generate-analytics) +- [Generated documentation](#generated-documentation) + + + +## General Concepts + +Each document needs to be munged to ensure its format is correct, links are +valid, etc. To munge a document, simply run `hack/update-munge-docs.sh`. We +verify that all documents have been munged using `hack/verify-munge-docs.sh`. +The scripts for munging documents are called mungers, see the +[mungers section](#what-are-mungers) below if you're curious about how mungers +are implemented or if you want to write one. + +## How to Get a Table of Contents + +Instead of writing table of contents by hand, insert the following code in your +md file: + +``` + + +``` + +After running `hack/update-munge-docs.sh`, you'll see a table of contents +generated for you, layered based on the headings. + +## How to Write Links + +It's important to follow the rules when writing links. It helps us correctly +versionize documents for each release. 
+ +Use inline links instead of urls at all times. When you add internal links to +`docs/` or `examples/`, use relative links; otherwise, use +`http://releases.k8s.io/HEAD/`. For example, avoid using: + +``` +[GCE](https://github.com/kubernetes/kubernetes/blob/master/docs/getting-started-guides/gce.md) # note that it's under docs/ +[Kubernetes package](../../pkg/) # note that it's under pkg/ +http://kubernetes.io/ # external link +``` + +Instead, use: + +``` +[GCE](../getting-started-guides/gce.md) # note that it's under docs/ +[Kubernetes package](http://releases.k8s.io/HEAD/pkg/) # note that it's under pkg/ +[Kubernetes](http://kubernetes.io/) # external link +``` + +The above example generates the following links: +[GCE](../getting-started-guides/gce.md), +[Kubernetes package](http://releases.k8s.io/HEAD/pkg/), and +[Kubernetes](http://kubernetes.io/). + +## How to Include an Example + +While writing examples, you may want to show the content of certain example +files (e.g. [pod.yaml](../../test/fixtures/doc-yaml/user-guide/pod.yaml)). In this case, insert the +following code in the md file: + +``` + + +``` + +Note that you should replace `path/to/file` with the relative path to the +example file. Then `hack/update-munge-docs.sh` will generate a code block with +the content of the specified file, and a link to download it. This way, you save +the time to do the copy-and-paste; what's better, the content won't become +out-of-date every time you update the example file. + +For example, the following: + +``` + + +``` + +generates the following after `hack/update-munge-docs.sh`: + + + +```yaml +apiVersion: v1 +kind: Pod +metadata: + name: nginx + labels: + app: nginx +spec: + containers: + - name: nginx + image: nginx + ports: + - containerPort: 80 +``` + +[Download example](../../test/fixtures/doc-yaml/user-guide/pod.yaml?raw=true) + + +## Misc. + +### Code formatting + +Wrap a span of code with single backticks (`` ` ``). To format multiple lines of +code as its own code block, use triple backticks (```` ``` ````). + +### Syntax Highlighting + +Adding syntax highlighting to code blocks improves readability. To do so, in +your fenced block, add an optional language identifier. Some useful identifier +includes `yaml`, `console` (for console output), and `sh` (for shell quote +format). Note that in a console output, put `$ ` at the beginning of each +command and put nothing at the beginning of the output. Here's an example of +console code block: + +``` +```console + +$ kubectl create -f test/fixtures/doc-yaml/user-guide/pod.yaml +pod "foo" created + +```  +``` + +which renders as: + +```console +$ kubectl create -f test/fixtures/doc-yaml/user-guide/pod.yaml +pod "foo" created +``` + +### Headings + +Add a single `#` before the document title to create a title heading, and add +`##` to the next level of section title, and so on. Note that the number of `#` +will determine the size of the heading. + +## What Are Mungers? + +Mungers are like gofmt for md docs which we use to format documents. To use it, +simply place + +``` + + +``` + +in your md files. Note that xxxx is the placeholder for a specific munger. +Appropriate content will be generated and inserted between two brackets after +you run `hack/update-munge-docs.sh`. See +[munger document](http://releases.k8s.io/HEAD/cmd/mungedocs/) for more details. + +## Auto-added Mungers + +After running `hack/update-munge-docs.sh`, you may see some code / mungers in +your md file that are auto-added. You don't have to add them manually. 
It's +recommended to just read this section as a reference instead of messing up with +the following mungers. + +### Generate Analytics + +ANALYTICS munger inserts a Google Anaylytics link for this page. + +``` + + +``` + +# Generated documentation + +Some documents can be generated automatically. Run `hack/generate-docs.sh` to +populate your repository with these generated documents, and a list of the files +it generates is placed in `.generated_docs`. To reduce merge conflicts, we do +not want to check these documents in; however, to make the link checker in the +munger happy, we check in a placeholder. `hack/update-generated-docs.sh` puts a +placeholder in the location where each generated document would go, and +`hack/verify-generated-docs.sh` verifies that the placeholder is in place. + + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/how-to-doc.md?pixel)]() + diff --git a/contributors/devel/instrumentation.md b/contributors/devel/instrumentation.md new file mode 100644 index 00000000000..b73221a942b --- /dev/null +++ b/contributors/devel/instrumentation.md @@ -0,0 +1,52 @@ +## Instrumenting Kubernetes with a new metric + +The following is a step-by-step guide for adding a new metric to the Kubernetes +code base. + +We use the Prometheus monitoring system's golang client library for +instrumenting our code. Once you've picked out a file that you want to add a +metric to, you should: + +1. Import "github.com/prometheus/client_golang/prometheus". + +2. Create a top-level var to define the metric. For this, you have to: + + 1. Pick the type of metric. Use a Gauge for things you want to set to a +particular value, a Counter for things you want to increment, or a Histogram or +Summary for histograms/distributions of values (typically for latency). +Histograms are better if you're going to aggregate the values across jobs, while +summaries are better if you just want the job to give you a useful summary of +the values. + 2. Give the metric a name and description. + 3. Pick whether you want to distinguish different categories of things using +labels on the metric. If so, add "Vec" to the name of the type of metric you +want and add a slice of the label names to the definition. + + https://github.com/kubernetes/kubernetes/blob/cd3299307d44665564e1a5c77d0daa0286603ff5/pkg/apiserver/apiserver.go#L53 + https://github.com/kubernetes/kubernetes/blob/cd3299307d44665564e1a5c77d0daa0286603ff5/pkg/kubelet/metrics/metrics.go#L31 + +3. Register the metric so that prometheus will know to export it. + + https://github.com/kubernetes/kubernetes/blob/cd3299307d44665564e1a5c77d0daa0286603ff5/pkg/kubelet/metrics/metrics.go#L74 + https://github.com/kubernetes/kubernetes/blob/cd3299307d44665564e1a5c77d0daa0286603ff5/pkg/apiserver/apiserver.go#L78 + +4. 
Use the metric by calling the appropriate method for your metric type (Set, +Inc/Add, or Observe, respectively for Gauge, Counter, or Histogram/Summary), +first calling WithLabelValues if your metric has any labels + + https://github.com/kubernetes/kubernetes/blob/3ce7fe8310ff081dbbd3d95490193e1d5250d2c9/pkg/kubelet/kubelet.go#L1384 + https://github.com/kubernetes/kubernetes/blob/cd3299307d44665564e1a5c77d0daa0286603ff5/pkg/apiserver/apiserver.go#L87 + + +These are the metric type definitions if you're curious to learn about them or +need more information: + +https://github.com/prometheus/client_golang/blob/master/prometheus/gauge.go +https://github.com/prometheus/client_golang/blob/master/prometheus/counter.go +https://github.com/prometheus/client_golang/blob/master/prometheus/histogram.go +https://github.com/prometheus/client_golang/blob/master/prometheus/summary.go + + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/instrumentation.md?pixel)]() + diff --git a/contributors/devel/issues.md b/contributors/devel/issues.md new file mode 100644 index 00000000000..fe9e94d9753 --- /dev/null +++ b/contributors/devel/issues.md @@ -0,0 +1,59 @@ +## GitHub Issues for the Kubernetes Project + +A quick overview of how we will review and prioritize incoming issues at +https://github.com/kubernetes/kubernetes/issues + +### Priorities + +We use GitHub issue labels for prioritization. The absence of a priority label +means the bug has not been reviewed and prioritized yet. + +We try to apply these priority labels consistently across the entire project, +but if you notice an issue that you believe to be incorrectly prioritized, +please do let us know and we will evaluate your counter-proposal. + +- **priority/P0**: Must be actively worked on as someone's top priority right +now. Stuff is burning. If it's not being actively worked on, someone is expected +to drop what they're doing immediately to work on it. Team leaders are +responsible for making sure that all P0's in their area are being actively +worked on. Examples include user-visible bugs in core features, broken builds or +tests and critical security issues. + +- **priority/P1**: Must be staffed and worked on either currently, or very soon, +ideally in time for the next release. + +- **priority/P2**: There appears to be general agreement that this would be good +to have, but we may not have anyone available to work on it right now or in the +immediate future. Community contributions would be most welcome in the mean time +(although it might take a while to get them reviewed if reviewers are fully +occupied with higher priority issues, for example immediately before a release). + +- **priority/P3**: Possibly useful, but not yet enough support to actually get +it done. These are mostly place-holders for potentially good ideas, so that they +don't get completely forgotten, and can be referenced/deduped every time they +come up. + +### Milestones + +We additionally use milestones, based on minor version, for determining if a bug +should be fixed for the next release. These milestones will be especially +scrutinized as we get to the weeks just before a release. We can release a new +version of Kubernetes once they are empty. We will have two milestones per minor +release. + +- **vX.Y**: The list of bugs that will be merged for that milestone once ready. + +- **vX.Y-candidate**: The list of bug that we might merge for that milestone. 
A +bug shouldn't be in this milestone for more than a day or two towards the end of +a milestone. It should be triaged either into vX.Y, or moved out of the release +milestones. + +The above priority scheme still applies. P0 and P1 issues are work we feel must +get done before release. P2 and P3 issues are work we would merge into the +release if it gets done, but we wouldn't block the release on it. A few days +before release, we will probably move all P2 and P3 bugs out of that milestone +in bulk. + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/issues.md?pixel)]() + diff --git a/contributors/devel/kubectl-conventions.md b/contributors/devel/kubectl-conventions.md new file mode 100644 index 00000000000..1e94b3ba53b --- /dev/null +++ b/contributors/devel/kubectl-conventions.md @@ -0,0 +1,411 @@ +# Kubectl Conventions + +Updated: 8/27/2015 + +**Table of Contents** + + +- [Kubectl Conventions](#kubectl-conventions) + - [Principles](#principles) + - [Command conventions](#command-conventions) + - [Create commands](#create-commands) + - [Rules for extending special resource alias - "all"](#rules-for-extending-special-resource-alias---all) + - [Flag conventions](#flag-conventions) + - [Output conventions](#output-conventions) + - [Documentation conventions](#documentation-conventions) + - [Command implementation conventions](#command-implementation-conventions) + - [Generators](#generators) + + + +## Principles + +* Strive for consistency across commands + +* Explicit should always override implicit + + * Environment variables should override default values + + * Command-line flags should override default values and environment variables + + * `--namespace` should also override the value specified in a specified +resource + +## Command conventions + +* Command names are all lowercase, and hyphenated if multiple words. + +* kubectl VERB NOUNs for commands that apply to multiple resource types. + +* Command itself should not have built-in aliases. + +* NOUNs may be specified as `TYPE name1 name2` or `TYPE/name1 TYPE/name2` or +`TYPE1,TYPE2,TYPE3/name1`; TYPE is omitted when only a single type is expected. + +* Resource types are all lowercase, with no hyphens; both singular and plural +forms are accepted. + +* NOUNs may also be specified by one or more file arguments: `-f file1 -f file2 +...` + +* Resource types may have 2- or 3-letter aliases. + +* Business logic should be decoupled from the command framework, so that it can +be reused independently of kubectl, cobra, etc. + * Ideally, commonly needed functionality would be implemented server-side in +order to avoid problems typical of "fat" clients and to make it readily +available to non-Go clients. + +* Commands that generate resources, such as `run` or `expose`, should obey +specific conventions, see [generators](#generators). + +* A command group (e.g., `kubectl config`) may be used to group related +non-standard commands, such as custom generators, mutations, and computations. + + +### Create commands + +`kubectl create ` commands fill the gap between "I want to try +Kubernetes, but I don't know or care what gets created" (`kubectl run`) and "I +want to create exactly this" (author yaml and run `kubectl create -f`). They +provide an easy way to create a valid object without having to know the vagaries +of particular kinds, nested fields, and object key typos that are ignored by the +yaml/json parser. 
Because editing an already created object is easier than +authoring one from scratch, these commands only need to have enough parameters +to create a valid object and set common immutable fields. It should default as +much as is reasonably possible. Once that valid object is created, it can be +further manipulated using `kubectl edit` or the eventual `kubectl set` commands. + +`kubectl create ` commands help in cases where you need +to perform non-trivial configuration generation/transformation tailored for a +common use case. `kubectl create secret` is a good example, there's a `generic` +flavor with keys mapping to files, then there's a `docker-registry` flavor that +is tailored for creating an image pull secret, and there's a `tls` flavor for +creating tls secrets. You create these as separate commands to get distinct +flags and separate help that is tailored for the particular usage. + + +### Rules for extending special resource alias - "all" + +Here are the rules to add a new resource to the `kubectl get all` output. + +* No cluster scoped resources + +* No namespace admin level resources (limits, quota, policy, authorization +rules) + +* No resources that are potentially unrecoverable (secrets and pvc) + +* Resources that are considered "similar" to #3 should be grouped +the same (configmaps) + + +## Flag conventions + +* Flags are all lowercase, with words separated by hyphens + +* Flag names and single-character aliases should have the same meaning across +all commands + +* Flag descriptions should start with an uppercase letter and not have a +period at the end of a sentence + +* Command-line flags corresponding to API fields should accept API enums +exactly (e.g., `--restart=Always`) + +* Do not reuse flags for different semantic purposes, and do not use different +flag names for the same semantic purpose -- grep for `"Flags()"` before adding a +new flag + +* Use short flags sparingly, only for the most frequently used options, prefer +lowercase over uppercase for the most common cases, try to stick to well known +conventions for UNIX commands and/or Docker, where they exist, and update this +list when adding new short flags + + * `-f`: Resource file + * also used for `--follow` in `logs`, but should be deprecated in favor of `-F` + * `-n`: Namespace scope + * `-l`: Label selector + * also used for `--labels` in `expose`, but should be deprecated + * `-L`: Label columns + * `-c`: Container + * also used for `--client` in `version`, but should be deprecated + * `-i`: Attach stdin + * `-t`: Allocate TTY + * `-w`: Watch (currently also used for `--www` in `proxy`, but should be deprecated) + * `-p`: Previous + * also used for `--pod` in `exec`, but deprecated + * also used for `--patch` in `patch`, but should be deprecated + * also used for `--port` in `proxy`, but should be deprecated + * `-P`: Static file prefix in `proxy`, but should be deprecated + * `-r`: Replicas + * `-u`: Unix socket + * `-v`: Verbose logging level + + +* `--dry-run`: Don't modify the live state; simulate the mutation and display +the output. All mutations should support it. + +* `--local`: Don't contact the server; just do local read, transformation, +generation, etc., and display the output + +* `--output-version=...`: Convert the output to a different API group/version + +* `--short`: Output a compact summary of normal output; the format is subject +to change and is optimizied for reading not parsing. 
+ +* `--validate`: Validate the resource schema + +## Output conventions + +* By default, output is intended for humans rather than programs + * However, affordances are made for simple parsing of `get` output + +* Only errors should be directed to stderr + +* `get` commands should output one row per resource, and one resource per row + + * Column titles and values should not contain spaces in order to facilitate +commands that break lines into fields: cut, awk, etc. Instead, use `-` as the +word separator. + + * By default, `get` output should fit within about 80 columns + + * Eventually we could perhaps auto-detect width + * `-o wide` may be used to display additional columns + + + * The first column should be the resource name, titled `NAME` (may change this +to an abbreviation of resource type) + + * NAMESPACE should be displayed as the first column when --all-namespaces is +specified + + * The last default column should be time since creation, titled `AGE` + + * `-Lkey` should append a column containing the value of label with key `key`, +with `` if not present + + * json, yaml, Go template, and jsonpath template formats should be supported +and encouraged for subsequent processing + + * Users should use --api-version or --output-version to ensure the output +uses the version they expect + + +* `describe` commands may output on multiple lines and may include information +from related resources, such as events. Describe should add additional +information from related resources that a normal user may need to know - if a +user would always run "describe resource1" and the immediately want to run a +"get type2" or "describe resource2", consider including that info. Examples, +persistent volume claims for pods that reference claims, events for most +resources, nodes and the pods scheduled on them. When fetching related +resources, a targeted field selector should be used in favor of client side +filtering of related resources. + +* For fields that can be explicitly unset (booleans, integers, structs), the +output should say ``. Likewise, for arrays `` should be used; for +external IP, `` should be used; for load balancer, `` should be +used. Lastly `` should be used where unrecognized field type was +specified. + +* Mutations should output TYPE/name verbed by default, where TYPE is singular; +`-o name` may be used to just display TYPE/name, which may be used to specify +resources in other commands + +## Documentation conventions + +* Commands are documented using Cobra; docs are then auto-generated by +`hack/update-generated-docs.sh`. + + * Use should contain a short usage string for the most common use case(s), not +an exhaustive specification + + * Short should contain a one-line explanation of what the command does + * Short descriptions should start with an uppercase case letter and not + have a period at the end of a sentence + * Short descriptions should (if possible) start with a first person + (singular present tense) verb + + * Long may contain multiple lines, including additional information about +input, output, commonly used flags, etc. 
+ * Long descriptions should use proper grammar, start with an uppercase + letter and have a period at the end of a sentence + + + * Example should contain examples + * Start commands with `$` + * A comment should precede each example command, and should begin with `#` + + +* Use "FILENAME" for filenames + +* Use "TYPE" for the particular flavor of resource type accepted by kubectl, +rather than "RESOURCE" or "KIND" + +* Use "NAME" for resource names + +## Command implementation conventions + +For every command there should be a `NewCmd` function that creates +the command and returns a pointer to a `cobra.Command`, which can later be added +to other parent commands to compose the structure tree. There should also be a +`Config` struct with a variable to every flag and argument declared +by the command (and any other variable required for the command to run). This +makes tests and mocking easier. The struct ideally exposes three methods: + +* `Complete`: Completes the struct fields with values that may or may not be +directly provided by the user, for example, by flags pointers, by the `args` +slice, by using the Factory, etc. + +* `Validate`: performs validation on the struct fields and returns appropriate +errors. + +* `Run`: runs the actual logic of the command, taking as assumption +that the struct is complete with all required values to run, and they are valid. + +Sample command skeleton: + +```go +// MineRecommendedName is the recommended command name for kubectl mine. +const MineRecommendedName = "mine" + +// Long command description and examples. +var ( + mineLong = templates.LongDesc(` + mine which is described here + with lots of details.`) + + mineExample = templates.Examples(` + # Run my command's first action + kubectl mine first_action + + # Run my command's second action on latest stuff + kubectl mine second_action --flag`) +) + +// MineConfig contains all the options for running the mine cli command. +type MineConfig struct { + mineLatest bool +} + +// NewCmdMine implements the kubectl mine command. +func NewCmdMine(parent, name string, f *cmdutil.Factory, out io.Writer) *cobra.Command { + opts := &MineConfig{} + + cmd := &cobra.Command{ + Use: fmt.Sprintf("%s [--latest]", name), + Short: "Run my command", + Long: mineLong, + Example: fmt.Sprintf(mineExample, parent+" "+name), + Run: func(cmd *cobra.Command, args []string) { + if err := opts.Complete(f, cmd, args, out); err != nil { + cmdutil.CheckErr(err) + } + if err := opts.Validate(); err != nil { + cmdutil.CheckErr(cmdutil.UsageError(cmd, err.Error())) + } + if err := opts.RunMine(); err != nil { + cmdutil.CheckErr(err) + } + }, + } + + cmd.Flags().BoolVar(&options.mineLatest, "latest", false, "Use latest stuff") + return cmd +} + +// Complete completes all the required options for mine. +func (o *MineConfig) Complete(f *cmdutil.Factory, cmd *cobra.Command, args []string, out io.Writer) error { + return nil +} + +// Validate validates all the required options for mine. +func (o MineConfig) Validate() error { + return nil +} + +// RunMine implements all the necessary functionality for mine. +func (o MineConfig) RunMine() error { + return nil +} +``` + +The `Run` method should contain the business logic of the command +and as noted in [command conventions](#command-conventions), ideally that logic +should exist server-side so any client could take advantage of it. Notice that +this is not a mandatory structure and not every command is implemented this way, +but this is a nice convention so try to be compliant with it. 
As an example, +have a look at how [kubectl logs](../../pkg/kubectl/cmd/logs.go) is implemented. + +## Generators + +Generators are kubectl commands that generate resources based on a set of inputs +(other resources, flags, or a combination of both). + +The point of generators is: + +* to enable users using kubectl in a scripted fashion to pin to a particular +behavior which may change in the future. Explicit use of a generator will always +guarantee that the expected behavior stays the same. + +* to enable potential expansion of the generated resources for scenarios other +than just creation, similar to how -f is supported for most general-purpose +commands. + +Generator commands shoud obey to the following conventions: + +* A `--generator` flag should be defined. Users then can choose between +different generators, if the command supports them (for example, `kubectl run` +currently supports generators for pods, jobs, replication controllers, and +deployments), or between different versions of a generator so that users +depending on a specific behavior may pin to that version (for example, `kubectl +expose` currently supports two different versions of a service generator). + +* Generation should be decoupled from creation. A generator should implement the +`kubectl.StructuredGenerator` interface and have no dependencies on cobra or the +Factory. See, for example, how the first version of the namespace generator is +defined: + +```go +// NamespaceGeneratorV1 supports stable generation of a namespace +type NamespaceGeneratorV1 struct { + // Name of namespace + Name string +} + +// Ensure it supports the generator pattern that uses parameters specified during construction +var _ StructuredGenerator = &NamespaceGeneratorV1{} + +// StructuredGenerate outputs a namespace object using the configured fields +func (g *NamespaceGeneratorV1) StructuredGenerate() (runtime.Object, error) { + if err := g.validate(); err != nil { + return nil, err + } + namespace := &api.Namespace{} + namespace.Name = g.Name + return namespace, nil +} + +// validate validates required fields are set to support structured generation +func (g *NamespaceGeneratorV1) validate() error { + if len(g.Name) == 0 { + return fmt.Errorf("name must be specified") + } + return nil +} +``` + +The generator struct (`NamespaceGeneratorV1`) holds the necessary fields for +namespace generation. It also satisfies the `kubectl.StructuredGenerator` +interface by implementing the `StructuredGenerate() (runtime.Object, error)` +method which configures the generated namespace that callers of the generator +(`kubectl create namespace` in our case) need to create. + +* `--dry-run` should output the resource that would be created, without +creating it. + + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/kubectl-conventions.md?pixel)]() + diff --git a/contributors/devel/kubemark-guide.md b/contributors/devel/kubemark-guide.md new file mode 100755 index 00000000000..e914226d946 --- /dev/null +++ b/contributors/devel/kubemark-guide.md @@ -0,0 +1,212 @@ +# Kubemark User Guide + +## Introduction + +Kubemark is a performance testing tool which allows users to run experiments on +simulated clusters. The primary use case is scalability testing, as simulated +clusters can be much bigger than the real ones. The objective is to expose +problems with the master components (API server, controller manager or +scheduler) that appear only on bigger clusters (e.g. small memory leaks). 
+
+This document serves as a primer to understand what Kubemark is, what it is not,
+and how to use it.
+
+## Architecture
+
+At a very high level, a Kubemark cluster consists of two parts: real master
+components and a set of “Hollow” Nodes. The prefix “Hollow” means an
+implementation/instantiation of a component with all “moving” parts mocked out.
+The best example is HollowKubelet, which pretends to be an ordinary Kubelet, but
+does not start anything, nor mount any volumes - it just claims that it does.
+More detailed design and implementation details are at the end of this document.
+
+Currently master components run on a dedicated machine (or machines), and
+HollowNodes run on an ‘external’ Kubernetes cluster. Compared to running the
+master components on the external cluster, this design has the slight advantage
+of completely isolating master resources from everything else.
+
+## Requirements
+
+To run Kubemark you need a Kubernetes cluster (called the `external cluster`)
+for running all your HollowNodes and a dedicated machine for the master.
+The master machine has to be directly routable from the HollowNodes. You also
+need access to a Docker repository.
+
+Currently the scripts are written to be easily usable on GCE, but it should be
+relatively straightforward to port them to different providers or bare metal.
+
+## Common use cases and helper scripts
+
+The common workflow for Kubemark is:
+- starting a Kubemark cluster (on GCE)
+- running e2e tests on the Kubemark cluster
+- monitoring test execution and debugging problems
+- turning down the Kubemark cluster
+
+The descriptions below include comments helpful for anyone who wants to port
+Kubemark to a different provider.
+
+### Starting a Kubemark cluster
+
+To start a Kubemark cluster on GCE you need to create an external Kubernetes
+cluster (it can be GCE, GKE or anything else) yourself, make sure that kubeconfig
+points to it by default, build a Kubernetes release (e.g. by running
+`make quick-release`) and run the `test/kubemark/start-kubemark.sh` script.
+This script will create a VM for the master components and Pods for the
+HollowNodes, and do all the setup necessary to let them talk to each other. It
+will use the configuration stored in `cluster/kubemark/config-default.sh` - you
+can tweak it however you want, but note that some features may not be
+implemented yet, as the implementation of Hollow components/mocks will probably
+lag behind the ‘real’ one. For performance tests the interesting variables are
+`NUM_NODES` and `MASTER_SIZE`. After the start-kubemark script finishes you’ll
+have a ready Kubemark cluster; a kubeconfig file for talking to the Kubemark
+cluster is stored in `test/kubemark/kubeconfig.kubemark`.
+
+Currently we're running each HollowNode with a limit of 0.05 of a CPU core and
+~60MB of memory, which, taking into account the default cluster addons and
+fluentD running on the 'external' cluster, allows running ~17.5 HollowNodes per
+core.
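+
+As a quick reference, the flow described above might look like this on GCE (a
+sketch; the scripts and paths are the ones mentioned in this guide, and the
+final `kubectl` call is just one way to confirm the cluster is up):
+
+```
+# Build a release and start the Kubemark cluster. This assumes your default
+# kubeconfig points at the 'external' cluster and that you have already tweaked
+# cluster/kubemark/config-default.sh (NUM_NODES, MASTER_SIZE, ...) as needed.
+make quick-release
+./test/kubemark/start-kubemark.sh
+
+# Talk to the Kubemark cluster using the generated kubeconfig.
+kubectl --kubeconfig=test/kubemark/kubeconfig.kubemark get nodes
+```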
+ +#### Behind the scene details: + +Start-kubemark script does quite a lot of things: + +- Creates a master machine called hollow-cluster-master and PD for it (*uses +gcloud, should be easy to do outside of GCE*) + +- Creates a firewall rule which opens port 443\* on the master machine (*uses +gcloud, should be easy to do outside of GCE*) + +- Builds a Docker image for HollowNode from the current repository and pushes it +to the Docker repository (*GCR for us, using scripts from +`cluster/gce/util.sh` - it may get tricky outside of GCE*) + +- Generates certificates and kubeconfig files, writes a kubeconfig locally to +`test/kubemark/kubeconfig.kubemark` and creates a Secret which stores kubeconfig for +HollowKubelet/HollowProxy use (*used gcloud to transfer files to Master, should +be easy to do outside of GCE*). + +- Creates a ReplicationController for HollowNodes and starts them up. (*will +work exactly the same everywhere as long as MASTER_IP will be populated +correctly, but you’ll need to update docker image address if you’re not using +GCR and default image name*) + +- Waits until all HollowNodes are in the Running phase (*will work exactly the +same everywhere*) + +\* Port 443 is a secured port on the master machine which is used for all +external communication with the API server. In the last sentence *external* +means all traffic coming from other machines, including all the Nodes, not only +from outside of the cluster. Currently local components, i.e. ControllerManager +and Scheduler talk with API server using insecure port 8080. + +### Running e2e tests on Kubemark cluster + +To run standard e2e test on your Kubemark cluster created in the previous step +you execute `test/kubemark/run-e2e-tests.sh` script. It will configure ginkgo to +use Kubemark cluster instead of something else and start an e2e test. This +script should not need any changes to work on other cloud providers. + +By default (if nothing will be passed to it) the script will run a Density '30 +test. If you want to run a different e2e test you just need to provide flags you want to be +passed to `hack/ginkgo-e2e.sh` script, e.g. `--ginkgo.focus="Load"` to run the +Load test. + +By default, at the end of each test, it will delete namespaces and everything +under it (e.g. events, replication controllers) on Kubemark master, which takes +a lot of time. Such work aren't needed in most cases: if you delete your +Kubemark cluster after running `run-e2e-tests.sh`; you don't care about +namespace deletion performance, specifically related to etcd; etc. There is a +flag that enables you to avoid namespace deletion: `--delete-namespace=false`. +Adding the flag should let you see in logs: `Found DeleteNamespace=false, +skipping namespace deletion!` + +### Monitoring test execution and debugging problems + +Run-e2e-tests prints the same output on Kubemark as on ordinary e2e cluster, but +if you need to dig deeper you need to learn how to debug HollowNodes and how +Master machine (currently) differs from the ordinary one. + +If you need to debug master machine you can do similar things as you do on your +ordinary master. The difference between Kubemark setup and ordinary setup is +that in Kubemark etcd is run as a plain docker container, and all master +components are run as normal processes. There’s no Kubelet overseeing them. Logs +are stored in exactly the same place, i.e. `/var/logs/` directory. Because +binaries are not supervised by anything they won't be restarted in the case of a +crash. 
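+
+For example, a quick way to check on the master components from the master
+machine might be the following (a sketch; the exact log directory and file
+names depend on your setup):
+
+```
+sudo docker ps                        # etcd runs as a plain docker container here
+ls /var/log/kube*.log                 # master component logs
+tail -f /var/log/kube-apiserver.log   # follow the API server log
+```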
+ +To help you with debugging from inside the cluster startup script puts a +`~/configure-kubectl.sh` script on the master. It downloads `gcloud` and +`kubectl` tool and configures kubectl to work on unsecured master port (useful +if there are problems with security). After the script is run you can use +kubectl command from the master machine to play with the cluster. + +Debugging HollowNodes is a bit more tricky, as if you experience a problem on +one of them you need to learn which hollow-node pod corresponds to a given +HollowNode known by the Master. During self-registeration HollowNodes provide +their cluster IPs as Names, which means that if you need to find a HollowNode +named `10.2.4.5` you just need to find a Pod in external cluster with this +cluster IP. There’s a helper script +`test/kubemark/get-real-pod-for-hollow-node.sh` that does this for you. + +When you have a Pod name you can use `kubectl logs` on external cluster to get +logs, or use a `kubectl describe pod` call to find an external Node on which +this particular HollowNode is running so you can ssh to it. + +E.g. you want to see the logs of HollowKubelet on which pod `my-pod` is running. +To do so you can execute: + +``` +$ kubectl kubernetes/test/kubemark/kubeconfig.kubemark describe pod my-pod +``` + +Which outputs pod description and among it a line: + +``` +Node: 1.2.3.4/1.2.3.4 +``` + +To learn the `hollow-node` pod corresponding to node `1.2.3.4` you use +aforementioned script: + +``` +$ kubernetes/test/kubemark/get-real-pod-for-hollow-node.sh 1.2.3.4 +``` + +which will output the line: + +``` +hollow-node-1234 +``` + +Now you just use ordinary kubectl command to get the logs: + +``` +kubectl --namespace=kubemark logs hollow-node-1234 +``` + +All those things should work exactly the same on all cloud providers. + +### Turning down Kubemark cluster + +On GCE you just need to execute `test/kubemark/stop-kubemark.sh` script, which +will delete HollowNode ReplicationController and all the resources for you. On +other providers you’ll need to delete all this stuff by yourself. + +## Some current implementation details + +Kubemark master uses exactly the same binaries as ordinary Kubernetes does. This +means that it will never be out of date. On the other hand HollowNodes use +existing fake for Kubelet (called SimpleKubelet), which mocks its runtime +manager with `pkg/kubelet/dockertools/fake_manager.go`, where most logic sits. +Because there’s no easy way of mocking other managers (e.g. VolumeManager), they +are not supported in Kubemark (e.g. we can’t schedule Pods with volumes in them +yet). + +As the time passes more fakes will probably be plugged into HollowNodes, but +it’s crucial to make it as simple as possible to allow running a big number of +Hollows on a single core. + + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/kubemark-guide.md?pixel)]() + diff --git a/contributors/devel/local-cluster/docker.md b/contributors/devel/local-cluster/docker.md new file mode 100644 index 00000000000..78768f80687 --- /dev/null +++ b/contributors/devel/local-cluster/docker.md @@ -0,0 +1,269 @@ +**Stop. This guide has been superseded by [Minikube](https://github.com/kubernetes/minikube) which is the recommended method of running Kubernetes on your local machine.** + + +The following instructions show you how to set up a simple, single node Kubernetes cluster using Docker. 
+ +Here's a diagram of what the final result will look like: + +![Kubernetes Single Node on Docker](k8s-singlenode-docker.png) + +## Prerequisites + +**Note: These steps have not been tested with the [Docker For Mac or Docker For Windows beta programs](https://blog.docker.com/2016/03/docker-for-mac-windows-beta/).** + +1. You need to have Docker version >= "1.10" installed on the machine. +2. Enable mount propagation. Hyperkube is running in a container which has to mount volumes for other containers, for example in case of persistent storage. The required steps depend on the init system. + + + In case of **systemd**, change MountFlags in the Docker unit file to shared. + + ```shell + DOCKER_CONF=$(systemctl cat docker | head -1 | awk '{print $2}') + sed -i.bak 's/^\(MountFlags=\).*/\1shared/' $DOCKER_CONF + systemctl daemon-reload + systemctl restart docker + ``` + + **Otherwise**, manually set the mount point used by Hyperkube to be shared: + + ```shell + mkdir -p /var/lib/kubelet + mount --bind /var/lib/kubelet /var/lib/kubelet + mount --make-shared /var/lib/kubelet + ``` + + +### Run it + +1. Decide which Kubernetes version to use. Set the `${K8S_VERSION}` variable to a version of Kubernetes >= "v1.2.0". + + + If you'd like to use the current **stable** version of Kubernetes, run the following: + + ```sh + export K8S_VERSION=$(curl -sS https://storage.googleapis.com/kubernetes-release/release/stable.txt) + ``` + + and for the **latest** available version (including unstable releases): + + ```sh + export K8S_VERSION=$(curl -sS https://storage.googleapis.com/kubernetes-release/release/latest.txt) + ``` + +2. Start Hyperkube + + ```shell + export ARCH=amd64 + docker run -d \ + --volume=/sys:/sys:rw \ + --volume=/var/lib/docker/:/var/lib/docker:rw \ + --volume=/var/lib/kubelet/:/var/lib/kubelet:rw,shared \ + --volume=/var/run:/var/run:rw \ + --net=host \ + --pid=host \ + --privileged \ + --name=kubelet \ + gcr.io/google_containers/hyperkube-${ARCH}:${K8S_VERSION} \ + /hyperkube kubelet \ + --hostname-override=127.0.0.1 \ + --api-servers=http://localhost:8080 \ + --config=/etc/kubernetes/manifests \ + --cluster-dns=10.0.0.10 \ + --cluster-domain=cluster.local \ + --allow-privileged --v=2 + ``` + + > Note that `--cluster-dns` and `--cluster-domain` is used to deploy dns, feel free to discard them if dns is not needed. + + > If you would like to mount an external device as a volume, add `--volume=/dev:/dev` to the command above. It may however, cause some problems described in [#18230](https://github.com/kubernetes/kubernetes/issues/18230) + + > Architectures other than `amd64` are experimental and sometimes unstable, but feel free to try them out! Valid values: `arm`, `arm64` and `ppc64le`. ARM is available with Kubernetes version `v1.3.0-alpha.2` and higher. ARM 64-bit and PowerPC 64 little-endian are available with `v1.3.0-alpha.3` and higher. Track progress on multi-arch support [here](https://github.com/kubernetes/kubernetes/issues/17981) + + > If you are behind a proxy, you need to pass the proxy setup to curl in the containers to pull the certificates. Create a .curlrc under /root folder (because the containers are running as root) with the following line: + + ``` + proxy = : + ``` + + This actually runs the kubelet, which in turn runs a [pod](http://kubernetes.io/docs/user-guide/pods/) that contains the other master components. + + ** **SECURITY WARNING** ** services exposed via Kubernetes using Hyperkube are available on the host node's public network interface / IP address. 
Because of this, this guide is not suitable for any host node/server that is directly internet accessible. Refer to [#21735](https://github.com/kubernetes/kubernetes/issues/21735) for additional info. + +### Download `kubectl` + +At this point you should have a running Kubernetes cluster. You can test it out +by downloading the kubectl binary for `${K8S_VERSION}` (in this example: `{{page.version}}.0`). + + +Downloads: + + - `linux/amd64`: http://storage.googleapis.com/kubernetes-release/release/{{page.version}}.0/bin/linux/amd64/kubectl + - `linux/386`: http://storage.googleapis.com/kubernetes-release/release/{{page.version}}.0/bin/linux/386/kubectl + - `linux/arm`: http://storage.googleapis.com/kubernetes-release/release/{{page.version}}.0/bin/linux/arm/kubectl + - `linux/arm64`: http://storage.googleapis.com/kubernetes-release/release/{{page.version}}.0/bin/linux/arm64/kubectl + - `linux/ppc64le`: http://storage.googleapis.com/kubernetes-release/release/{{page.version}}.0/bin/linux/ppc64le/kubectl + - `OS X/amd64`: http://storage.googleapis.com/kubernetes-release/release/{{page.version}}.0/bin/darwin/amd64/kubectl + - `OS X/386`: http://storage.googleapis.com/kubernetes-release/release/{{page.version}}.0/bin/darwin/386/kubectl + - `windows/amd64`: http://storage.googleapis.com/kubernetes-release/release/{{page.version}}.0/bin/windows/amd64/kubectl.exe + - `windows/386`: http://storage.googleapis.com/kubernetes-release/release/{{page.version}}.0/bin/windows/386/kubectl.exe + +The generic download path is: + +``` +http://storage.googleapis.com/kubernetes-release/release/${K8S_VERSION}/bin/${GOOS}/${GOARCH}/${K8S_BINARY} +``` + +An example install with `linux/amd64`: + +``` +curl -sSL "https://storage.googleapis.com/kubernetes-release/release/{{page.version}}.0/bin/linux/amd64/kubectl" > /usr/bin/kubectl +chmod +x /usr/bin/kubectl +``` + +On OS X, to make the API server accessible locally, setup a ssh tunnel. + +```shell +docker-machine ssh `docker-machine active` -N -L 8080:localhost:8080 +``` + +Setting up a ssh tunnel is applicable to remote docker hosts as well. + +(Optional) Create kubernetes cluster configuration: + +```shell +kubectl config set-cluster test-doc --server=http://localhost:8080 +kubectl config set-context test-doc --cluster=test-doc +kubectl config use-context test-doc +``` + +### Test it out + +List the nodes in your cluster by running: + +```shell +kubectl get nodes +``` + +This should print: + +```shell +NAME STATUS AGE +127.0.0.1 Ready 1h +``` + +### Run an application + +```shell +kubectl run nginx --image=nginx --port=80 +``` + +Now run `docker ps` you should see nginx running. You may need to wait a few minutes for the image to get pulled. + +### Expose it as a service + +```shell +kubectl expose deployment nginx --port=80 +``` + +Run the following command to obtain the cluster local IP of this service we just created: + +```shell{% raw %} +ip=$(kubectl get svc nginx --template={{.spec.clusterIP}}) +echo $ip +{% endraw %}``` + +Hit the webserver with this IP: + +```shell{% raw %} + +curl $ip +{% endraw %}``` + +On OS X, since docker is running inside a VM, run the following command instead: + +```shell +docker-machine ssh `docker-machine active` curl $ip +``` + +## Deploy a DNS + +Read [documentation for manually deploying a DNS](http://kubernetes.io/docs/getting-started-guides/docker-multinode/#deploy-dns-manually-for-v12x) for instructions. + +### Turning down your cluster + +1. 
Delete the nginx service and deployment: + +If you plan on re-creating your nginx deployment and service you will need to clean it up. + +```shell +kubectl delete service,deployments nginx +``` + +2. Delete all the containers including the kubelet: + +```shell +docker rm -f kubelet +docker rm -f `docker ps | grep k8s | awk '{print $1}'` +``` + +3. Cleanup the filesystem: + +On OS X, first ssh into the docker VM: + +```shell +docker-machine ssh `docker-machine active` +``` + +```shell +grep /var/lib/kubelet /proc/mounts | awk '{print $2}' | sudo xargs -n1 umount +sudo rm -rf /var/lib/kubelet +``` + +### Troubleshooting + +#### Node is in `NotReady` state + +If you see your node as `NotReady` it's possible that your OS does not have memcg enabled. + +1. Your kernel should support memory accounting. Ensure that the +following configs are turned on in your linux kernel: + +```shell +CONFIG_RESOURCE_COUNTERS=y +CONFIG_MEMCG=y +``` + +2. Enable the memory accounting in the kernel, at boot, as command line +parameters as follows: + +```shell +GRUB_CMDLINE_LINUX="cgroup_enable=memory=1" +``` + +NOTE: The above is specifically for GRUB2. +You can check the command line parameters passed to your kernel by looking at the +output of /proc/cmdline: + +```shell +$ cat /proc/cmdline +BOOT_IMAGE=/boot/vmlinuz-3.18.4-aufs root=/dev/sda5 ro cgroup_enable=memory=1 +``` + +## Support Level + + +IaaS Provider | Config. Mgmt | OS | Networking | Conforms | Support Level +-------------------- | ------------ | ------ | ---------- | ---------| ---------------------------- +Docker Single Node | custom | N/A | local | | Project ([@brendandburns](https://github.com/brendandburns)) + + + +## Further reading + +Please see the [Kubernetes docs](http://kubernetes.io/docs) for more details on administering +and using a Kubernetes cluster. + + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/local-cluster/docker.md?pixel)]() + diff --git a/contributors/devel/local-cluster/k8s-singlenode-docker.png b/contributors/devel/local-cluster/k8s-singlenode-docker.png new file mode 100644 index 00000000000..5ebf812682d Binary files /dev/null and b/contributors/devel/local-cluster/k8s-singlenode-docker.png differ diff --git a/contributors/devel/local-cluster/local.md b/contributors/devel/local-cluster/local.md new file mode 100644 index 00000000000..60bd5a8f08c --- /dev/null +++ b/contributors/devel/local-cluster/local.md @@ -0,0 +1,125 @@ +**Stop. This guide has been superseded by [Minikube](https://github.com/kubernetes/minikube) which is the recommended method of running Kubernetes on your local machine.** + +### Requirements + +#### Linux + +Not running Linux? Consider running Linux in a local virtual machine with [vagrant](https://www.vagrantup.com/), or on a cloud provider like Google Compute Engine + +#### Docker + +At least [Docker](https://docs.docker.com/installation/#installation) +1.8.3+. Ensure the Docker daemon is running and can be contacted (try `docker +ps`). Some of the Kubernetes components need to run as root, which normally +works fine with docker. + +#### etcd + +You need an [etcd](https://github.com/coreos/etcd/releases) in your path, please make sure it is installed and in your ``$PATH``. + +#### go + +You need [go](https://golang.org/doc/install) at least 1.4+ in your path, please make sure it is installed and in your ``$PATH``. + +### Starting the cluster + +First, you need to [download Kubernetes](http://kubernetes.io/docs/getting-started-guides/binary_release/). 
Then open a separate tab of your terminal +and run the following (since one needs sudo access to start/stop Kubernetes daemons, it is easier to run the entire script as root): + +```shell +cd kubernetes +hack/local-up-cluster.sh +``` + +This will build and start a lightweight local cluster, consisting of a master +and a single node. Type Control-C to shut it down. + +You can use the cluster/kubectl.sh script to interact with the local cluster. hack/local-up-cluster.sh will +print the commands to run to point kubectl at the local cluster. + + +### Running a container + +Your cluster is running, and you want to start running containers! + +You can now use any of the cluster/kubectl.sh commands to interact with your local setup. + +```shell +export KUBERNETES_PROVIDER=local +cluster/kubectl.sh get pods +cluster/kubectl.sh get services +cluster/kubectl.sh get deployments +cluster/kubectl.sh run my-nginx --image=nginx --replicas=2 --port=80 + +## begin wait for provision to complete, you can monitor the docker pull by opening a new terminal + sudo docker images + ## you should see it pulling the nginx image, once the above command returns it + sudo docker ps + ## you should see your container running! + exit +## end wait + +## create a service for nginx, which serves on port 80 +cluster/kubectl.sh expose deployment my-nginx --port=80 --name=my-nginx + +## introspect Kubernetes! +cluster/kubectl.sh get pods +cluster/kubectl.sh get services +cluster/kubectl.sh get deployments + +## Test the nginx service with the IP/port from "get services" command +curl http://10.X.X.X:80/ +``` + +### Running a user defined pod + +Note the difference between a [container](http://kubernetes.io/docs/user-guide/containers/) +and a [pod](http://kubernetes.io/docs/user-guide/pods/). Since you only asked for the former, Kubernetes will create a wrapper pod for you. +However you cannot view the nginx start page on localhost. To verify that nginx is running you need to run `curl` within the docker container (try `docker exec`). + +You can control the specifications of a pod via a user defined manifest, and reach nginx through your browser on the port specified therein: + +```shell +cluster/kubectl.sh create -f test/fixtures/doc-yaml/user-guide/pod.yaml +``` + +Congratulations! + +### FAQs + +#### I cannot reach service IPs on the network. + +Some firewall software that uses iptables may not interact well with +kubernetes. If you have trouble around networking, try disabling any +firewall or other iptables-using systems, first. Also, you can check +if SELinux is blocking anything by running a command such as `journalctl --since yesterday | grep avc`. + +By default the IP range for service cluster IPs is 10.0.*.* - depending on your +docker installation, this may conflict with IPs for containers. If you find +containers running with IPs in this range, edit hack/local-cluster-up.sh and +change the service-cluster-ip-range flag to something else. + +#### I changed Kubernetes code, how do I run it? + +```shell +cd kubernetes +hack/build-go.sh +hack/local-up-cluster.sh +``` + +#### kubectl claims to start a container but `get pods` and `docker ps` don't show it. + +One or more of the Kubernetes daemons might've crashed. Tail the [logs](http://kubernetes.io/docs/admin/cluster-troubleshooting/#looking-at-logs) of each in /tmp. + +```shell +$ ls /tmp/kube*.log +$ tail -f /tmp/kube-apiserver.log +``` + +#### The pods fail to connect to the services by host names + +The local-up-cluster.sh script doesn't start a DNS service. 
Similar situation can be found [here](http://issue.k8s.io/6667). You can start a manually. + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/local-cluster/local.md?pixel)]() + diff --git a/contributors/devel/local-cluster/vagrant.md b/contributors/devel/local-cluster/vagrant.md new file mode 100644 index 00000000000..0f0fe91c0c5 --- /dev/null +++ b/contributors/devel/local-cluster/vagrant.md @@ -0,0 +1,397 @@ +Running Kubernetes with Vagrant (and VirtualBox) is an easy way to run/test/develop on your local machine (Linux, Mac OS X). + +### Prerequisites + +1. Install latest version >= 1.7.4 of [Vagrant](http://www.vagrantup.com/downloads.html) +2. Install one of: + 1. The latest version of [Virtual Box](https://www.virtualbox.org/wiki/Downloads) + 2. [VMWare Fusion](https://www.vmware.com/products/fusion/) version 5 or greater as well as the appropriate [Vagrant VMWare Fusion provider](https://www.vagrantup.com/vmware) + 3. [VMWare Workstation](https://www.vmware.com/products/workstation/) version 9 or greater as well as the [Vagrant VMWare Workstation provider](https://www.vagrantup.com/vmware) + 4. [Parallels Desktop](https://www.parallels.com/products/desktop/) version 9 or greater as well as the [Vagrant Parallels provider](https://parallels.github.io/vagrant-parallels/) + 5. libvirt with KVM and enable support of hardware virtualisation. [Vagrant-libvirt](https://github.com/pradels/vagrant-libvirt). For fedora provided official rpm, and possible to use `yum install vagrant-libvirt` + +### Setup + +Setting up a cluster is as simple as running: + +```sh +export KUBERNETES_PROVIDER=vagrant +curl -sS https://get.k8s.io | bash +``` + +Alternatively, you can download [Kubernetes release](https://github.com/kubernetes/kubernetes/releases) and extract the archive. To start your local cluster, open a shell and run: + +```sh +cd kubernetes + +export KUBERNETES_PROVIDER=vagrant +./cluster/kube-up.sh +``` + +The `KUBERNETES_PROVIDER` environment variable tells all of the various cluster management scripts which variant to use. If you forget to set this, the assumption is you are running on Google Compute Engine. + +By default, the Vagrant setup will create a single master VM (called kubernetes-master) and one node (called kubernetes-node-1). Each VM will take 1 GB, so make sure you have at least 2GB to 4GB of free memory (plus appropriate free disk space). + +If you'd like more than one node, set the `NUM_NODES` environment variable to the number you want: + +```sh +export NUM_NODES=3 +``` + +Vagrant will provision each machine in the cluster with all the necessary components to run Kubernetes. The initial setup can take a few minutes to complete on each machine. + +If you installed more than one Vagrant provider, Kubernetes will usually pick the appropriate one. However, you can override which one Kubernetes will use by setting the [`VAGRANT_DEFAULT_PROVIDER`](https://docs.vagrantup.com/v2/providers/default.html) environment variable: + +```sh +export VAGRANT_DEFAULT_PROVIDER=parallels +export KUBERNETES_PROVIDER=vagrant +./cluster/kube-up.sh +``` + +By default, each VM in the cluster is running Fedora. + +To access the master or any node: + +```sh +vagrant ssh master +vagrant ssh node-1 +``` + +If you are running more than one node, you can access the others by: + +```sh +vagrant ssh node-2 +vagrant ssh node-3 +``` + +Each node in the cluster installs the docker daemon and the kubelet. 
+ +The master node instantiates the Kubernetes master components as pods on the machine. + +To view the service status and/or logs on the kubernetes-master: + +```console +[vagrant@kubernetes-master ~] $ vagrant ssh master +[vagrant@kubernetes-master ~] $ sudo su + +[root@kubernetes-master ~] $ systemctl status kubelet +[root@kubernetes-master ~] $ journalctl -ru kubelet + +[root@kubernetes-master ~] $ systemctl status docker +[root@kubernetes-master ~] $ journalctl -ru docker + +[root@kubernetes-master ~] $ tail -f /var/log/kube-apiserver.log +[root@kubernetes-master ~] $ tail -f /var/log/kube-controller-manager.log +[root@kubernetes-master ~] $ tail -f /var/log/kube-scheduler.log +``` + +To view the services on any of the nodes: + +```console +[vagrant@kubernetes-master ~] $ vagrant ssh node-1 +[vagrant@kubernetes-master ~] $ sudo su + +[root@kubernetes-master ~] $ systemctl status kubelet +[root@kubernetes-master ~] $ journalctl -ru kubelet + +[root@kubernetes-master ~] $ systemctl status docker +[root@kubernetes-master ~] $ journalctl -ru docker +``` + +### Interacting with your Kubernetes cluster with Vagrant. + +With your Kubernetes cluster up, you can manage the nodes in your cluster with the regular Vagrant commands. + +To push updates to new Kubernetes code after making source changes: + +```sh +./cluster/kube-push.sh +``` + +To stop and then restart the cluster: + +```sh +vagrant halt +./cluster/kube-up.sh +``` + +To destroy the cluster: + +```sh +vagrant destroy +``` + +Once your Vagrant machines are up and provisioned, the first thing to do is to check that you can use the `kubectl.sh` script. + +You may need to build the binaries first, you can do this with `make` + +```console +$ ./cluster/kubectl.sh get nodes + +NAME LABELS +10.245.1.4 +10.245.1.5 +10.245.1.3 +``` + +### Authenticating with your master + +When using the vagrant provider in Kubernetes, the `cluster/kubectl.sh` script will cache your credentials in a `~/.kubernetes_vagrant_auth` file so you will not be prompted for them in the future. + +```sh +cat ~/.kubernetes_vagrant_auth +``` + +```json +{ "User": "vagrant", + "Password": "vagrant", + "CAFile": "/home/k8s_user/.kubernetes.vagrant.ca.crt", + "CertFile": "/home/k8s_user/.kubecfg.vagrant.crt", + "KeyFile": "/home/k8s_user/.kubecfg.vagrant.key" +} +``` + +You should now be set to use the `cluster/kubectl.sh` script. For example try to list the nodes that you have started with: + +```sh +./cluster/kubectl.sh get nodes +``` + +### Running containers + +Your cluster is running, you can list the nodes in your cluster: + +```sh +$ ./cluster/kubectl.sh get nodes + +NAME LABELS +10.245.2.4 +10.245.2.3 +10.245.2.2 +``` + +Now start running some containers! + +You can now use any of the `cluster/kube-*.sh` commands to interact with your VM machines. +Before starting a container there will be no pods, services and replication controllers. 
+ +```sh +$ ./cluster/kubectl.sh get pods +NAME READY STATUS RESTARTS AGE + +$ ./cluster/kubectl.sh get services +NAME CLUSTER_IP EXTERNAL_IP PORT(S) SELECTOR AGE + +$ ./cluster/kubectl.sh get replicationcontrollers +CONTROLLER CONTAINER(S) IMAGE(S) SELECTOR REPLICAS +``` + +Start a container running nginx with a replication controller and three replicas + +```sh +$ ./cluster/kubectl.sh run my-nginx --image=nginx --replicas=3 --port=80 +``` + +When listing the pods, you will see that three containers have been started and are in Waiting state: + +```sh +$ ./cluster/kubectl.sh get pods +NAME READY STATUS RESTARTS AGE +my-nginx-5kq0g 0/1 Pending 0 10s +my-nginx-gr3hh 0/1 Pending 0 10s +my-nginx-xql4j 0/1 Pending 0 10s +``` + +You need to wait for the provisioning to complete, you can monitor the nodes by doing: + +```sh +$ vagrant ssh node-1 -c 'sudo docker images' +kubernetes-node-1: + REPOSITORY TAG IMAGE ID CREATED VIRTUAL SIZE + 96864a7d2df3 26 hours ago 204.4 MB + google/cadvisor latest e0575e677c50 13 days ago 12.64 MB + kubernetes/pause latest 6c4579af347b 8 weeks ago 239.8 kB +``` + +Once the docker image for nginx has been downloaded, the container will start and you can list it: + +```sh +$ vagrant ssh node-1 -c 'sudo docker ps' +kubernetes-node-1: + CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES + dbe79bf6e25b nginx:latest "nginx" 21 seconds ago Up 19 seconds k8s--mynginx.8c5b8a3a--7813c8bd_-_3ffe_-_11e4_-_9036_-_0800279696e1.etcd--7813c8bd_-_3ffe_-_11e4_-_9036_-_0800279696e1--fcfa837f + fa0e29c94501 kubernetes/pause:latest "/pause" 8 minutes ago Up 8 minutes 0.0.0.0:8080->80/tcp k8s--net.a90e7ce4--7813c8bd_-_3ffe_-_11e4_-_9036_-_0800279696e1.etcd--7813c8bd_-_3ffe_-_11e4_-_9036_-_0800279696e1--baf5b21b + aa2ee3ed844a google/cadvisor:latest "/usr/bin/cadvisor" 38 minutes ago Up 38 minutes k8s--cadvisor.9e90d182--cadvisor_-_agent.file--4626b3a2 + 65a3a926f357 kubernetes/pause:latest "/pause" 39 minutes ago Up 39 minutes 0.0.0.0:4194->8080/tcp k8s--net.c5ba7f0e--cadvisor_-_agent.file--342fd561 +``` + +Going back to listing the pods, services and replicationcontrollers, you now have: + +```sh +$ ./cluster/kubectl.sh get pods +NAME READY STATUS RESTARTS AGE +my-nginx-5kq0g 1/1 Running 0 1m +my-nginx-gr3hh 1/1 Running 0 1m +my-nginx-xql4j 1/1 Running 0 1m + +$ ./cluster/kubectl.sh get services +NAME CLUSTER_IP EXTERNAL_IP PORT(S) SELECTOR AGE + +$ ./cluster/kubectl.sh get replicationcontrollers +CONTROLLER CONTAINER(S) IMAGE(S) SELECTOR REPLICAS AGE +my-nginx my-nginx nginx run=my-nginx 3 1m +``` + +We did not start any services, hence there are none listed. But we see three replicas displayed properly. + +Learn about [running your first containers](http://kubernetes.io/docs/user-guide/simple-nginx/) application to learn how to create a service. + +You can already play with scaling the replicas with: + +```sh +$ ./cluster/kubectl.sh scale rc my-nginx --replicas=2 +$ ./cluster/kubectl.sh get pods +NAME READY STATUS RESTARTS AGE +my-nginx-5kq0g 1/1 Running 0 2m +my-nginx-gr3hh 1/1 Running 0 2m +``` + +Congratulations! + +## Troubleshooting + +#### I keep downloading the same (large) box all the time! + +By default the Vagrantfile will download the box from S3. 
You can change this (and cache the box locally) by providing a name and an alternate URL when calling `kube-up.sh` + +```sh +export KUBERNETES_BOX_NAME=choose_your_own_name_for_your_kuber_box +export KUBERNETES_BOX_URL=path_of_your_kuber_box +export KUBERNETES_PROVIDER=vagrant +./cluster/kube-up.sh +``` + +#### I am getting timeouts when trying to curl the master from my host! + +During provision of the cluster, you may see the following message: + +```sh +Validating node-1 +............. +Waiting for each node to be registered with cloud provider +error: couldn't read version from server: Get https://10.245.1.2/api: dial tcp 10.245.1.2:443: i/o timeout +``` + +Some users have reported VPNs may prevent traffic from being routed to the host machine into the virtual machine network. + +To debug, first verify that the master is binding to the proper IP address: + +```sh +$ vagrant ssh master +$ ifconfig | grep eth1 -C 2 +eth1: flags=4163 mtu 1500 inet 10.245.1.2 netmask + 255.255.255.0 broadcast 10.245.1.255 +``` + +Then verify that your host machine has a network connection to a bridge that can serve that address: + +```sh +$ ifconfig | grep 10.245.1 -C 2 + +vboxnet5: flags=4163 mtu 1500 + inet 10.245.1.1 netmask 255.255.255.0 broadcast 10.245.1.255 + inet6 fe80::800:27ff:fe00:5 prefixlen 64 scopeid 0x20 + ether 0a:00:27:00:00:05 txqueuelen 1000 (Ethernet) +``` + +If you do not see a response on your host machine, you will most likely need to connect your host to the virtual network created by the virtualization provider. + +If you do see a network, but are still unable to ping the machine, check if your VPN is blocking the request. + +#### I just created the cluster, but I am getting authorization errors! + +You probably have an incorrect ~/.kubernetes_vagrant_auth file for the cluster you are attempting to contact. + +```sh +rm ~/.kubernetes_vagrant_auth +``` + +After using kubectl.sh make sure that the correct credentials are set: + +```sh +cat ~/.kubernetes_vagrant_auth +``` + +```json +{ + "User": "vagrant", + "Password": "vagrant" +} +``` + +#### I just created the cluster, but I do not see my container running! + +If this is your first time creating the cluster, the kubelet on each node schedules a number of docker pull requests to fetch prerequisite images. This can take some time and as a result may delay your initial pod getting provisioned. + +#### I have brought Vagrant up but the nodes cannot validate! + +Log on to one of the nodes (`vagrant ssh node-1`) and inspect the salt minion log (`sudo cat /var/log/salt/minion`). + +#### I want to change the number of nodes! + +You can control the number of nodes that are instantiated via the environment variable `NUM_NODES` on your host machine. If you plan to work with replicas, we strongly encourage you to work with enough nodes to satisfy your largest intended replica size. If you do not plan to work with replicas, you can save some system resources by running with a single node. You do this, by setting `NUM_NODES` to 1 like so: + +```sh +export NUM_NODES=1 +``` + +#### I want my VMs to have more memory! + +You can control the memory allotted to virtual machines with the `KUBERNETES_MEMORY` environment variable. +Just set it to the number of megabytes you would like the machines to have. For example: + +```sh +export KUBERNETES_MEMORY=2048 +``` + +If you need more granular control, you can set the amount of memory for the master and nodes independently. 
For example:
+
+```sh
+export KUBERNETES_MASTER_MEMORY=1536
+export KUBERNETES_NODE_MEMORY=2048
+```
+
+#### I want to set proxy settings for bootstrapping my Kubernetes cluster!
+
+If you are behind a proxy, you need to install the Vagrant proxy plugin and set the proxy settings by running:
+
+```sh
+vagrant plugin install vagrant-proxyconf
+export VAGRANT_HTTP_PROXY=http://username:password@proxyaddr:proxyport
+export VAGRANT_HTTPS_PROXY=https://username:password@proxyaddr:proxyport
+```
+
+Optionally, you can specify addresses not to proxy, for example:
+
+```sh
+export VAGRANT_NO_PROXY=127.0.0.1
+```
+
+If you are using sudo to build Kubernetes (for example, `make quick-release`), you need to run `sudo -E make quick-release` to pass the environment variables through.
+
+#### I ran vagrant suspend and nothing works!
+
+`vagrant suspend` seems to mess up the network. This is not supported at this time.
+
+#### I want vagrant to sync folders via nfs!
+
+You can ensure that vagrant uses nfs to sync folders with virtual machines by setting the KUBERNETES_VAGRANT_USE_NFS environment variable to 'true'. nfs is faster than virtualbox or vmware's 'shared folders' and does not require guest additions. See the [vagrant docs](http://docs.vagrantup.com/v2/synced-folders/nfs.html) for details on configuring nfs on the host. This setting will have no effect on the libvirt provider, which uses nfs by default. For example:
+
+```sh
+export KUBERNETES_VAGRANT_USE_NFS=true
+```
+
+
+
+[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/local-cluster/vagrant.md?pixel)]()
+
diff --git a/contributors/devel/logging.md b/contributors/devel/logging.md
new file mode 100644
index 00000000000..1241ee7fe74
--- /dev/null
+++ b/contributors/devel/logging.md
@@ -0,0 +1,36 @@
+## Logging Conventions
+
+The following are the conventions for which glog levels to use.
+[glog](http://godoc.org/github.com/golang/glog) is globally preferred to
+[log](http://golang.org/pkg/log/) for better runtime control.
+
+* glog.Errorf() - Always an error
+
+* glog.Warningf() - Something unexpected, but probably not an error
+
+* glog.Infof() has multiple levels:
+  * glog.V(0) - Generally useful for this to ALWAYS be visible to an operator
+    * Programmer errors
+    * Logging extra info about a panic
+    * CLI argument handling
+  * glog.V(1) - A reasonable default log level if you don't want verbosity.
+    * Information about config (listening on X, watching Y)
+    * Errors that repeat frequently that relate to conditions that can be corrected (pod detected as unhealthy)
+  * glog.V(2) - Useful steady state information about the service and important log messages that may correlate to significant changes in the system. This is the recommended default log level for most systems.
+    * Logging HTTP requests and their exit code
+    * System state changing (killing pod)
+    * Controller state change events (starting pods)
+    * Scheduler log messages
+  * glog.V(3) - Extended information about changes
+    * More info about system state changes
+  * glog.V(4) - Debug level verbosity (for now)
+    * Logging in particularly thorny parts of code where you may want to come back later and check it
+
+As per the comments, the practical default level is V(2). Developers and QE
+environments may wish to run at V(3) or V(4). If you wish to change the log
+level, you can pass in `-v=X` where X is the desired maximum level to log.
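+
+For example, to run with more verbose logging you might do something like the following. This is only a sketch: `--v` is the shared glog flag on the Kubernetes binaries, and the assumption that `hack/local-up-cluster.sh` forwards a `LOG_LEVEL` environment variable to each component should be checked against the script you are actually running.
+
+```sh
+# Run a single component at V(3); messages guarded by V(4) and above are suppressed.
+kubelet --v=3
+
+# Assumption: hack/local-up-cluster.sh reads LOG_LEVEL and passes it as -v to each component.
+LOG_LEVEL=4 hack/local-up-cluster.sh
+```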
+ + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/logging.md?pixel)]() + diff --git a/contributors/devel/mesos-style.md b/contributors/devel/mesos-style.md new file mode 100644 index 00000000000..81554ce80b9 --- /dev/null +++ b/contributors/devel/mesos-style.md @@ -0,0 +1,218 @@ +# Building Mesos/Omega-style frameworks on Kubernetes + +## Introduction + +We have observed two different cluster management architectures, which can be +categorized as "Borg-style" and "Mesos/Omega-style." In the remainder of this +document, we will abbreviate the latter as "Mesos-style." Although out-of-the +box Kubernetes uses a Borg-style architecture, it can also be configured in a +Mesos-style architecture, and in fact can support both styles at the same time. +This document describes the two approaches and describes how to deploy a +Mesos-style architecture on Kubernetes. + +As an aside, the converse is also true: one can deploy a Borg/Kubernetes-style +architecture on Mesos. + +This document is NOT intended to provide a comprehensive comparison of Borg and +Mesos. For example, we omit discussion of the tradeoffs between scheduling with +full knowledge of cluster state vs. scheduling using the "offer" model. That +issue is discussed in some detail in the Omega paper. +(See [references](#references) below.) + + +## What is a Borg-style architecture? + +A Borg-style architecture is characterized by: + +* a single logical API endpoint for clients, where some amount of processing is +done on requests, such as admission control and applying defaults + +* generic (non-application-specific) collection abstractions described +declaratively, + +* generic controllers/state machines that manage the lifecycle of the collection +abstractions and the containers spawned from them + +* a generic scheduler + +For example, Borg's primary collection abstraction is a Job, and every +application that runs on Borg--whether it's a user-facing service like the GMail +front-end, a batch job like a MapReduce, or an infrastructure service like +GFS--must represent itself as a Job. Borg has corresponding state machine logic +for managing Jobs and their instances, and a scheduler that's responsible for +assigning the instances to machines. + +The flow of a request in Borg is: + +1. Client submits a collection object to the Borgmaster API endpoint + +1. Admission control, quota, applying defaults, etc. run on the collection + +1. If the collection is admitted, it is persisted, and the collection state +machine creates the underlying instances + +1. The scheduler assigns a hostname to the instance, and tells the Borglet to +start the instance's container(s) + +1. Borglet starts the container(s) + +1. The instance state machine manages the instances and the collection state +machine manages the collection during their lifetimes + +Out-of-the-box Kubernetes has *workload-specific* abstractions (ReplicaSet, Job, +DaemonSet, etc.) and corresponding controllers, and in the future may have +[workload-specific schedulers](../../docs/proposals/multiple-schedulers.md), +e.g. different schedulers for long-running services vs. short-running batch. But +these abstractions, controllers, and schedulers are not *application-specific*. + +The usual request flow in Kubernetes is very similar, namely + +1. Client submits a collection object (e.g. ReplicaSet, Job, ...) to the API +server + +1. Admission control, quota, applying defaults, etc. run on the collection + +1. 
If the collection is admitted, it is persisted, and the corresponding +collection controller creates the underlying pods + +1. Admission control, quota, applying defaults, etc. runs on each pod; if there +are multiple schedulers, one of the admission controllers will write the +scheduler name as an annotation based on a policy + +1. If a pod is admitted, it is persisted + +1. The appropriate scheduler assigns a nodeName to the instance, which triggers +the Kubelet to start the pod's container(s) + +1. Kubelet starts the container(s) + +1. The controller corresponding to the collection manages the pod and the +collection during their lifetime + +In the Borg model, application-level scheduling and cluster-level scheduling are +handled by separate components. For example, a MapReduce master might request +Borg to create a job with a certain number of instances with a particular +resource shape, where each instance corresponds to a MapReduce worker; the +MapReduce master would then schedule individual units of work onto those +workers. + +## What is a Mesos-style architecture? + +Mesos is fundamentally designed to support multiple application-specific +"frameworks." A framework is composed of a "framework scheduler" and a +"framework executor." We will abbreviate "framework scheduler" as "framework" +since "scheduler" means something very different in Kubernetes (something that +just assigns pods to nodes). + +Unlike Borg and Kubernetes, where there is a single logical endpoint that +receives all API requests (the Borgmaster and API server, respectively), in +Mesos every framework is a separate API endpoint. Mesos does not have any +standard set of collection abstractions, controllers/state machines, or +schedulers; the logic for all of these things is contained in each +[application-specific framework](http://mesos.apache.org/documentation/latest/frameworks/) +individually. (Note that the notion of application-specific does sometimes blur +into the realm of workload-specific, for example +[Chronos](https://github.com/mesos/chronos) is a generic framework for batch +jobs. However, regardless of what set of Mesos frameworks you are using, the key +properties remain: each framework is its own API endpoint with its own +client-facing and internal abstractions, state machines, and scheduler). + +A Mesos framework can integrate application-level scheduling and cluster-level +scheduling into a single component. + +Note: Although Mesos frameworks expose their own API endpoints to clients, they +consume a common infrastructure via a common API endpoint for controlling tasks +(launching, detecting failure, etc.) and learning about available cluster +resources. More details +[here](http://mesos.apache.org/documentation/latest/scheduler-http-api/). + +## Building a Mesos-style framework on Kubernetes + +Implementing the Mesos model on Kubernetes boils down to enabling +application-specific collection abstractions, controllers/state machines, and +scheduling. There are just three steps: + +* Use API plugins to create API resources for your new application-specific +collection abstraction(s) + +* Implement controllers for the new abstractions (and for managing the lifecycle +of the pods the controllers generate) + +* Implement a scheduler with the application-specific scheduling logic + +Note that the last two can be combined: a Kubernetes controller can do the +scheduling for the pods it creates, by writing node name to the pods when it +creates them. 
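+
+To make this concrete, a pod created with `spec.nodeName` already filled in bypasses the default scheduler and is started directly by the kubelet on that node. A minimal sketch of what such a controller effectively does (the pod and node names below are made up for illustration):
+
+```sh
+# Sketch: an application-specific controller that does its own scheduling would
+# create pods with spec.nodeName pre-set, so no separate scheduler is involved.
+cat <<EOF | kubectl create -f -
+apiVersion: v1
+kind: Pod
+metadata:
+  name: my-framework-worker-1
+spec:
+  nodeName: kubernetes-node-1   # node chosen by the controller's own placement logic
+  containers:
+  - name: worker
+    image: nginx
+EOF
+```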
+ +Once you've done this, you end up with an architecture that is extremely similar +to the Mesos-style--the Kubernetes controller is effectively a Mesos framework. +The remaining differences are: + +* In Kubernetes, all API operations go through a single logical endpoint, the +API server (we say logical because the API server can be replicated). In +contrast, in Mesos, API operations go to a particular framework. However, the +Kubernetes API plugin model makes this difference fairly small. + +* In Kubernetes, application-specific admission control, quota, defaulting, etc. +rules can be implemented in the API server rather than in the controller. Of +course you can choose to make these operations be no-ops for your +application-specific collection abstractions, and handle them in your controller. + +* On the node level, Mesos allows application-specific executors, whereas +Kubernetes only has executors for Docker and rkt containers. + +The end-to-end flow is: + +1. Client submits an application-specific collection object to the API server + +2. The API server plugin for that collection object forwards the request to the +API server that handles that collection type + +3. Admission control, quota, applying defaults, etc. runs on the collection +object + +4. If the collection is admitted, it is persisted + +5. The collection controller sees the collection object and in response creates +the underlying pods and chooses which nodes they will run on by setting node +name + +6. Kubelet sees the pods with node name set and starts the container(s) + +7. The collection controller manages the pods and the collection during their +lifetimes + +*Note: if the controller and scheduler are separated, then step 5 breaks +down into multiple steps:* + +(5a) collection controller creates pods with empty node name. + +(5b) API server admission control, quota, defaulting, etc. runs on the +pods; one of the admission controller steps writes the scheduler name as an +annotation on each pods (see pull request `#18262` for more details). + +(5c) The corresponding application-specific scheduler chooses a node and +writes node name, which triggers the Kubelet to start the pod's container(s). + +As a final note, the Kubernetes model allows multiple levels of iterative +refinement of runtime abstractions, as long as the lowest level is the pod. For +example, clients of application Foo might create a `FooSet` which is picked up +by the FooController which in turn creates `BatchFooSet` and `ServiceFooSet` +objects, which are picked up by the BatchFoo controller and ServiceFoo +controller respectively, which in turn create pods. In between each of these +steps there is an opportunity for object-specific admission control, quota, and +defaulting to run in the API server, though these can instead be handled by the +controllers. + +## References + +Mesos is described [here](https://www.usenix.org/legacy/event/nsdi11/tech/full_papers/Hindman_new.pdf). +Omega is described [here](http://research.google.com/pubs/pub41684.html). +Borg is described [here](http://research.google.com/pubs/pub43438.html). 
+ + + + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/mesos-style.md?pixel)]() + diff --git a/contributors/devel/node-performance-testing.md b/contributors/devel/node-performance-testing.md new file mode 100644 index 00000000000..d6bb657fc98 --- /dev/null +++ b/contributors/devel/node-performance-testing.md @@ -0,0 +1,127 @@ +# Measuring Node Performance + +This document outlines the issues and pitfalls of measuring Node performance, as +well as the tools available. + +## Cluster Set-up + +There are lots of factors which can affect node performance numbers, so care +must be taken in setting up the cluster to make the intended measurements. In +addition to taking the following steps into consideration, it is important to +document precisely which setup was used. For example, performance can vary +wildly from commit-to-commit, so it is very important to **document which commit +or version** of Kubernetes was used, which Docker version was used, etc. + +### Addon pods + +Be aware of which addon pods are running on which nodes. By default Kubernetes +runs 8 addon pods, plus another 2 per node (`fluentd-elasticsearch` and +`kube-proxy`) in the `kube-system` namespace. The addon pods can be disabled for +more consistent results, but doing so can also have performance implications. + +For example, Heapster polls each node regularly to collect stats data. Disabling +Heapster will hide the performance cost of serving those stats in the Kubelet. + +#### Disabling Add-ons + +Disabling addons is simple. Just ssh into the Kubernetes master and move the +addon from `/etc/kubernetes/addons/` to a backup location. More details +[here](../../cluster/addons/). + +### Which / how many pods? + +Performance will vary a lot between a node with 0 pods and a node with 100 pods. +In many cases you'll want to make measurements with several different amounts of +pods. On a single node cluster scaling a replication controller makes this easy, +just make sure the system reaches a steady-state before starting the +measurement. E.g. `kubectl scale replicationcontroller pause --replicas=100` + +In most cases pause pods will yield the most consistent measurements since the +system will not be affected by pod load. However, in some special cases +Kubernetes has been tuned to optimize pods that are not doing anything, such as +the cAdvisor housekeeping (stats gathering). In these cases, performing a very +light task (such as a simple network ping) can make a difference. + +Finally, you should also consider which features yours pods should be using. For +example, if you want to measure performance with probing, you should obviously +use pods with liveness or readiness probes configured. Likewise for volumes, +number of containers, etc. + +### Other Tips + +**Number of nodes** - On the one hand, it can be easier to manage logs, pods, +environment etc. with a single node to worry about. On the other hand, having +multiple nodes will let you gather more data in parallel for more robust +sampling. + +## E2E Performance Test + +There is an end-to-end test for collecting overall resource usage of node +components: [kubelet_perf.go](../../test/e2e/kubelet_perf.go). To +run the test, simply make sure you have an e2e cluster running (`go run +hack/e2e.go -up`) and [set up](#cluster-set-up) correctly. + +Run the test with `go run hack/e2e.go -v -test +--test_args="--ginkgo.focus=resource\susage\stracking"`. 
You may also wish to +customise the number of pods or other parameters of the test (remember to rerun +`make WHAT=test/e2e/e2e.test` after you do). + +## Profiling + +Kubelet installs the [go pprof handlers] +(https://golang.org/pkg/net/http/pprof/), which can be queried for CPU profiles: + +```console +$ kubectl proxy & +Starting to serve on 127.0.0.1:8001 +$ curl -G "http://localhost:8001/api/v1/proxy/nodes/${NODE}:10250/debug/pprof/profile?seconds=${DURATION_SECONDS}" > $OUTPUT +$ KUBELET_BIN=_output/dockerized/bin/linux/amd64/kubelet +$ go tool pprof -web $KUBELET_BIN $OUTPUT +``` + +`pprof` can also provide heap usage, from the `/debug/pprof/heap` endpoint +(e.g. `http://localhost:8001/api/v1/proxy/nodes/${NODE}:10250/debug/pprof/heap`). + +More information on go profiling can be found +[here](http://blog.golang.org/profiling-go-programs). + +## Benchmarks + +Before jumping through all the hoops to measure a live Kubernetes node in a real +cluster, it is worth considering whether the data you need can be gathered +through a Benchmark test. Go provides a really simple benchmarking mechanism, +just add a unit test of the form: + +```go +// In foo_test.go +func BenchmarkFoo(b *testing.B) { + b.StopTimer() + setupFoo() // Perform any global setup + b.StartTimer() + for i := 0; i < b.N; i++ { + foo() // Functionality to measure + } +} +``` + +Then: + +```console +$ go test -bench=. -benchtime=${SECONDS}s foo_test.go +``` + +More details on benchmarking [here](https://golang.org/pkg/testing/). + +## TODO + +- (taotao) Measuring docker performance +- Expand cluster set-up section +- (vishh) Measuring disk usage +- (yujuhong) Measuring memory usage +- Add section on monitoring kubelet metrics (e.g. with prometheus) + + + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/node-performance-testing.md?pixel)]() + diff --git a/contributors/devel/on-call-build-cop.md b/contributors/devel/on-call-build-cop.md new file mode 100644 index 00000000000..15c71e5da74 --- /dev/null +++ b/contributors/devel/on-call-build-cop.md @@ -0,0 +1,151 @@ +## Kubernetes "Github and Build-cop" Rotation + +### Preqrequisites + +* Ensure you have [write access to http://github.com/kubernetes/kubernetes](https://github.com/orgs/kubernetes/teams/kubernetes-maintainers) + * Test your admin access by e.g. adding a label to an issue. + +### Traffic sources and responsibilities + +* GitHub Kubernetes [issues](https://github.com/kubernetes/kubernetes/issues) +and [pulls](https://github.com/kubernetes/kubernetes/pulls): Your job is to be +the first responder to all new issues and PRs. If you are not equipped to do +this (which is fine!), it is your job to seek guidance! + + * Support issues should be closed and redirected to Stackoverflow (see example +response below). 
+ + * All incoming issues should be tagged with a team label +(team/{api,ux,control-plane,node,cluster,csi,redhat,mesosphere,gke,release-infra,test-infra,none}); +for issues that overlap teams, you can use multiple team labels + + * There is a related concept of "Github teams" which allow you to @ mention +a set of people; feel free to @ mention a Github team if you wish, but this is +not a substitute for adding a team/* label, which is required + + * [Google teams](https://github.com/orgs/kubernetes/teams?utf8=%E2%9C%93&query=goog-) + * [Redhat teams](https://github.com/orgs/kubernetes/teams?utf8=%E2%9C%93&query=rh-) + * [SIGs](https://github.com/orgs/kubernetes/teams?utf8=%E2%9C%93&query=sig-) + + * If the issue is reporting broken builds, broken e2e tests, or other +obvious P0 issues, label the issue with priority/P0 and assign it to someone. +This is the only situation in which you should add a priority/* label + * non-P0 issues do not need a reviewer assigned initially + + * Assign any issues related to Vagrant to @derekwaynecarr (and @mention him +in the issue) + + * All incoming PRs should be assigned a reviewer. + + * unless it is a WIP (Work in Progress), RFC (Request for Comments), or design proposal. + * An auto-assigner [should do this for you] (https://github.com/kubernetes/kubernetes/pull/12365/files) + * When in doubt, choose a TL or team maintainer of the most relevant team; they can delegate + + * Keep in mind that you can @ mention people in an issue/PR to bring it to +their attention without assigning it to them. You can also @ mention github +teams, such as @kubernetes/goog-ux or @kubernetes/kubectl + + * If you need help triaging an issue or PR, consult with (or assign it to) +@brendandburns, @thockin, @bgrant0607, @quinton-hoole, @davidopp, @dchen1107, +@lavalamp (all U.S. Pacific Time) or @fgrzadkowski (Central European Time). + + * At the beginning of your shift, please add team/* labels to any issues that +have fallen through the cracks and don't have one. Likewise, be fair to the next +person in rotation: try to ensure that every issue that gets filed while you are +on duty is handled. The Github query to find issues with no team/* label is: +[here](https://github.com/kubernetes/kubernetes/issues?utf8=%E2%9C%93&q=is%3Aopen+is%3Aissue+-label%3Ateam%2Fcontrol-plane+-label%3Ateam%2Fmesosphere+-label%3Ateam%2Fredhat+-label%3Ateam%2Frelease-infra+-label%3Ateam%2Fnone+-label%3Ateam%2Fnode+-label%3Ateam%2Fcluster+-label%3Ateam%2Fux+-label%3Ateam%2Fapi+-label%3Ateam%2Ftest-infra+-label%3Ateam%2Fgke+-label%3A"team%2FCSI-API+Machinery+SIG"+-label%3Ateam%2Fhuawei+-label%3Ateam%2Fsig-aws). + +Example response for support issues: + +```code +Please re-post your question to [stackoverflow] +(http://stackoverflow.com/questions/tagged/kubernetes). + +We are trying to consolidate the channels to which questions for help/support +are posted so that we can improve our efficiency in responding to your requests, +and to make it easier for you to find answers to frequently asked questions and +how to address common use cases. + +We regularly see messages posted in multiple forums, with the full response +thread only in one place or, worse, spread across multiple forums. Also, the +large volume of support issues on github is making it difficult for us to use +issues to identify real bugs. + +The Kubernetes team scans stackoverflow on a regular basis, and will try to +ensure your questions don't go unanswered. 
+ +Before posting a new question, please search stackoverflow for answers to +similar questions, and also familiarize yourself with: + + * [user guide](http://kubernetes.io/docs/user-guide/) + * [troubleshooting guide](http://kubernetes.io/docs/admin/cluster-troubleshooting/) + +Again, thanks for using Kubernetes. + +The Kubernetes Team +``` + +### Build-copping + +* The [merge-bot submit queue](http://submit-queue.k8s.io/) +([source](https://github.com/kubernetes/contrib/tree/master/mungegithub/mungers/submit-queue.go)) +should auto-merge all eligible PRs for you once they've passed all the relevant +checks mentioned below and all [critical e2e tests] +(https://goto.google.com/k8s-test/view/Critical%20Builds/) are passing. If the +merge-bot been disabled for some reason, or tests are failing, you might need to +do some manual merging to get things back on track. + +* Once a day or so, look at the [flaky test builds] +(https://goto.google.com/k8s-test/view/Flaky/); if they are timing out, clusters +are failing to start, or tests are consistently failing (instead of just +flaking), file an issue to get things back on track. + +* Jobs that are not in [critical e2e tests](https://goto.google.com/k8s-test/view/Critical%20Builds/) +or [flaky test builds](https://goto.google.com/k8s-test/view/Flaky/) are not +your responsibility to monitor. The `Test owner:` in the job description will be +automatically emailed if the job is failing. + +* If you are oncall, ensure that PRs confirming to the following +pre-requisites are being merged at a reasonable rate: + + * [Have been LGTMd](https://github.com/kubernetes/kubernetes/labels/lgtm) + * Pass Travis and Jenkins per-PR tests. + * Author has signed CLA if applicable. + + +* Although the shift schedule shows you as being scheduled Monday to Monday, + working on the weekend is neither expected nor encouraged. Enjoy your time + off. + +* When the build is broken, roll back the PRs responsible ASAP + +* When E2E tests are unstable, a "merge freeze" may be instituted. During a +merge freeze: + + * Oncall should slowly merge LGTMd changes throughout the day while monitoring +E2E to ensure stability. + + * Ideally the E2E run should be green, but some tests are flaky and can fail +randomly (not as a result of a particular change). + * If a large number of tests fail, or tests that normally pass fail, that +is an indication that one or more of the PR(s) in that build might be +problematic (and should be reverted). + * Use the Test Results Analyzer to see individual test history over time. + + +* Flake mitigation + + * Tests that flake (fail a small percentage of the time) need an issue filed +against them. Please read [this](flaky-tests.md#filing-issues-for-flaky-tests); +the build cop is expected to file issues for any flaky tests they encounter. + + * It's reasonable to manually merge PRs that fix a flake or otherwise mitigate it. + +### Contact information + +[@k8s-oncall](https://github.com/k8s-oncall) will reach the current person on +call. + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/on-call-build-cop.md?pixel)]() + diff --git a/contributors/devel/on-call-rotations.md b/contributors/devel/on-call-rotations.md new file mode 100644 index 00000000000..a6535e82024 --- /dev/null +++ b/contributors/devel/on-call-rotations.md @@ -0,0 +1,43 @@ +## Kubernetes On-Call Rotations + +### Kubernetes "first responder" rotations + +Kubernetes has generated a lot of public traffic: email, pull-requests, bugs, +etc. 
So much traffic that it's becoming impossible to keep up with it all! This +is a fantastic problem to have. In order to be sure that SOMEONE, but not +EVERYONE on the team is paying attention to public traffic, we have instituted +two "first responder" rotations, listed below. Please read this page before +proceeding to the pages linked below, which are specific to each rotation. + +Please also read our [notes on OSS collaboration](collab.md), particularly the +bits about hours. Specifically, each rotation is expected to be active primarily +during work hours, less so off hours. + +During regular workday work hours of your shift, your primary responsibility is +to monitor the traffic sources specific to your rotation. You can check traffic +in the evenings if you feel so inclined, but it is not expected to be as highly +focused as work hours. For weekends, you should check traffic very occasionally +(e.g. once or twice a day). Again, it is not expected to be as highly focused as +workdays. It is assumed that over time, everyone will get weekday and weekend +shifts, so the workload will balance out. + +If you can not serve your shift, and you know this ahead of time, it is your +responsibility to find someone to cover and to change the rotation. If you have +an emergency, your responsibilities fall on the primary of the other rotation, +who acts as your secondary. If you need help to cover all of the tasks, partners +with oncall rotations (e.g., +[Redhat](https://github.com/orgs/kubernetes/teams/rh-oncall)). + +If you are not on duty you DO NOT need to do these things. You are free to focus +on "real work". + +Note that Kubernetes will occasionally enter code slush/freeze, prior to +milestones. When it does, there might be changes in the instructions (assigning +milestones, for instance). + +* [Github and Build Cop Rotation](on-call-build-cop.md) +* [User Support Rotation](on-call-user-support.md) + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/on-call-rotations.md?pixel)]() + diff --git a/contributors/devel/on-call-user-support.md b/contributors/devel/on-call-user-support.md new file mode 100644 index 00000000000..a111c6fe6da --- /dev/null +++ b/contributors/devel/on-call-user-support.md @@ -0,0 +1,89 @@ +## Kubernetes "User Support" Rotation + +### Traffic sources and responsibilities + +* [StackOverflow](http://stackoverflow.com/questions/tagged/kubernetes) and +[ServerFault](http://serverfault.com/questions/tagged/google-kubernetes): +Respond to any thread that has no responses and is more than 6 hours old (over +time we will lengthen this timeout to allow community responses). If you are not +equipped to respond, it is your job to redirect to someone who can. + + * [Query for unanswered Kubernetes StackOverflow questions](http://stackoverflow.com/search?q=%5Bkubernetes%5D+answers%3A0) + * [Query for unanswered Kubernetes ServerFault questions](http://serverfault.com/questions/tagged/google-kubernetes?sort=unanswered&pageSize=15) + * Direct poorly formulated questions to [stackoverflow's tips about how to ask](http://stackoverflow.com/help/how-to-ask) + * Direct off-topic questions to [stackoverflow's policy](http://stackoverflow.com/help/on-topic) + +* [Slack](https://kubernetes.slack.com) ([registration](http://slack.k8s.io)): +Your job is to be on Slack, watching for questions and answering or redirecting +as needed. Also check out the [Slack Archive](http://kubernetes.slackarchive.io/). 
+ +* [Email/Groups](https://groups.google.com/forum/#!forum/google-containers): +Respond to any thread that has no responses and is more than 6 hours old (over +time we will lengthen this timeout to allow community responses). If you are not +equipped to respond, it is your job to redirect to someone who can. + +* [Legacy] [IRC](irc://irc.freenode.net/#google-containers) +(irc.freenode.net #google-containers): watch IRC for questions and try to +redirect users to Slack. Also check out the +[IRC logs](https://botbot.me/freenode/google-containers/). + +In general, try to direct support questions to: + +1. Documentation, such as the [user guide](../user-guide/README.md) and +[troubleshooting guide](http://kubernetes.io/docs/troubleshooting/) + +2. Stackoverflow + +If you see questions on a forum other than Stackoverflow, try to redirect them +to Stackoverflow. Example response: + +```code +Please re-post your question to [stackoverflow] +(http://stackoverflow.com/questions/tagged/kubernetes). + +We are trying to consolidate the channels to which questions for help/support +are posted so that we can improve our efficiency in responding to your requests, +and to make it easier for you to find answers to frequently asked questions and +how to address common use cases. + +We regularly see messages posted in multiple forums, with the full response +thread only in one place or, worse, spread across multiple forums. Also, the +large volume of support issues on github is making it difficult for us to use +issues to identify real bugs. + +The Kubernetes team scans stackoverflow on a regular basis, and will try to +ensure your questions don't go unanswered. + +Before posting a new question, please search stackoverflow for answers to +similar questions, and also familiarize yourself with: + + * [user guide](http://kubernetes.io/docs/user-guide/) + * [troubleshooting guide](http://kubernetes.io/docs/troubleshooting/) + +Again, thanks for using Kubernetes. + +The Kubernetes Team +``` + +If you answer a question (in any of the above forums) that you think might be +useful for someone else in the future, *please add it to one of the FAQs in the +wiki*: + +* [User FAQ](https://github.com/kubernetes/kubernetes/wiki/User-FAQ) +* [Developer FAQ](https://github.com/kubernetes/kubernetes/wiki/Developer-FAQ) +* [Debugging FAQ](https://github.com/kubernetes/kubernetes/wiki/Debugging-FAQ). + +Getting it into the FAQ is more important than polish. Please indicate the date +it was added, so people can judge the likelihood that it is out-of-date (and +please correct any FAQ entries that you see contain out-of-date information). + +### Contact information + +[@k8s-support-oncall](https://github.com/k8s-support-oncall) will reach the +current person on call. + + + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/on-call-user-support.md?pixel)]() + diff --git a/contributors/devel/owners.md b/contributors/devel/owners.md new file mode 100644 index 00000000000..217585ce614 --- /dev/null +++ b/contributors/devel/owners.md @@ -0,0 +1,100 @@ +# Owners files + +_Note_: This is a design for a feature that is not yet implemented. See the [contrib PR](https://github.com/kubernetes/contrib/issues/1389) for the current progress. + +## Overview + +We want to establish owners for different parts of the code in the Kubernetes codebase. These owners +will serve as the approvers for code to be submitted to these parts of the repository. 
Notably, owners
+are not necessarily expected to do the first code review for all commits to these areas, but they are
+required to approve changes before they can be merged.
+
+**Note** The Kubernetes project has a hiatus on adding new approvers to OWNERS files. At this time we are [adding more reviewers](https://github.com/kubernetes/kubernetes/pulls?utf8=%E2%9C%93&q=is%3Apr%20%22Curating%20owners%3A%22%20) to take the load off of the current set of approvers, and once we have had a chance to flush this out for a release, we will begin adding new approvers again. Adding new approvers is planned for after the Kubernetes 1.6.0 release.
+
+## High Level flow
+
+### Step One: A PR is submitted
+
+After a PR is submitted, the automated kubernetes PR robot will append a message to the PR indicating the owners
+that are required for the PR to be submitted.
+
+Subsequently, a user can also request the approval message from the robot by writing:
+
+```
+@k8s-bot approvers
+```
+
+into a comment.
+
+In either case, the automation replies with an annotation that indicates
+the owners required to approve. The annotation is a comment that is applied to the PR.
+This comment will say:
+
+```
+Approval is required from OWNER-A OR OWNER-B, AND OWNER-C OR OWNER-D, AND ...
+```
+
+The set of required owners is drawn from the OWNERS files in the repository (see below). For each file
+there should be multiple different OWNERS; these owners are listed in the `OR` clause(s). Because
+it is possible that a PR may cover different directories, with disjoint sets of OWNERS, a PR may require
+approval from more than one person; this is where the `AND` clauses come from.
+
+Each `OWNER` should be the github user id of the owner _without_ a leading `@` symbol, to prevent the owner
+from being cc'd into the PR by email.
+
+### Step Two: A PR is LGTM'd
+
+Once a PR is reviewed and LGTM'd it is eligible for submission. However, for it to be submitted,
+an owner for each of the files changed in the PR has to 'approve' the PR. A user is an owner for a
+file if they are included in the OWNERS hierarchy (see below) for that file.
+
+Owner approval comes in two forms:
+
+ * An owner adds a comment to the PR saying "I approve" or "approved"
+ * An owner is the original author of the PR
+
+In the case of a comment-based approval, the same rules as for the 'lgtm' label apply. If the PR is
+changed by pushing new commits to the PR, the previous approval is invalidated, and the owner(s) must
+approve again. Because of this, it is recommended that PR authors squash their PRs prior to getting approval
+from owners.
+
+### Step Three: A PR is merged
+
+Once a PR is LGTM'd and all required owners have approved, it is eligible for merge. The merge bot takes care of
+the actual merging.
+
+## Design details
+
+We need to build new features into the existing github munger in order to accomplish this. Additionally,
+we need to add owners files to the repository.
+
+### Approval Munger
+
+We need to add a munger that adds comments to PRs indicating whose approval they require. This munger will
+look for PRs that do not have approvers already present in the comments, or where approvers have been
+requested, and add an appropriate comment to the PR.
+
+
+### Status Munger
+
+GitHub has a [status api](https://developer.github.com/v3/repos/statuses/); we will add a status munger that pushes an approval status onto each PR. This status will only be set to success once the relevant
+approvers have approved the PR.
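+
+For illustration only, pushing such a status amounts to a POST against the statuses endpoint for the PR's head commit. The sketch below uses `curl`; the real munger would do this from Go inside mungegithub, and the `context` string here is a made-up placeholder:
+
+```sh
+# Hypothetical: mark the PR's head commit as approved once every required owner has approved.
+curl -s -X POST \
+  -H "Authorization: token ${GITHUB_TOKEN}" \
+  -d '{"state": "success", "context": "code-review/approvals", "description": "all required owners have approved"}' \
+  "https://api.github.com/repos/kubernetes/kubernetes/statuses/${HEAD_SHA}"
+```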
+ +### Requiring approval status + +Github has the ability to [require status checks prior to merging](https://help.github.com/articles/enabling-required-status-checks/) + +Once we have the status check munger described above implemented, we will add this required status check +to our main branch as well as any release branches. + +### Adding owners files + +In each directory in the repository we may add an OWNERS file. This file will contain the github OWNERS +for that directory. OWNERSHIP is hierarchical, so if a directory does not container an OWNERS file, its +parent's OWNERS file is used instead. There will be a top-level OWNERS file to back-stop the system. + +Obviously changing the OWNERS file requires OWNERS permission. + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/owners.md?pixel)]() + diff --git a/contributors/devel/pr_workflow.dia b/contributors/devel/pr_workflow.dia new file mode 100644 index 00000000000..753a284b4a4 Binary files /dev/null and b/contributors/devel/pr_workflow.dia differ diff --git a/contributors/devel/pr_workflow.png b/contributors/devel/pr_workflow.png new file mode 100644 index 00000000000..0e2bd5d6eda Binary files /dev/null and b/contributors/devel/pr_workflow.png differ diff --git a/contributors/devel/profiling.md b/contributors/devel/profiling.md new file mode 100644 index 00000000000..f50537f12bd --- /dev/null +++ b/contributors/devel/profiling.md @@ -0,0 +1,46 @@ +# Profiling Kubernetes + +This document explain how to plug in profiler and how to profile Kubernetes services. + +## Profiling library + +Go comes with inbuilt 'net/http/pprof' profiling library and profiling web service. The way service works is binding debug/pprof/ subtree on a running webserver to the profiler. Reading from subpages of debug/pprof returns pprof-formatted profiles of the running binary. The output can be processed offline by the tool of choice, or used as an input to handy 'go tool pprof', which can graphically represent the result. + +## Adding profiling to services to APIserver. + +TL;DR: Add lines: + +```go +m.mux.HandleFunc("/debug/pprof/", pprof.Index) +m.mux.HandleFunc("/debug/pprof/profile", pprof.Profile) +m.mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol) +``` + +to the init(c *Config) method in 'pkg/master/master.go' and import 'net/http/pprof' package. + +In most use cases to use profiler service it's enough to do 'import _ net/http/pprof', which automatically registers a handler in the default http.Server. Slight inconvenience is that APIserver uses default server for intra-cluster communication, so plugging profiler to it is not really useful. In 'pkg/kubelet/server/server.go' more servers are created and started as separate goroutines. The one that is usually serving external traffic is secureServer. The handler for this traffic is defined in 'pkg/master/master.go' and stored in Handler variable. It is created from HTTP multiplexer, so the only thing that needs to be done is adding profiler handler functions to this multiplexer. This is exactly what lines after TL;DR do. + +## Connecting to the profiler + +Even when running profiler I found not really straightforward to use 'go tool pprof' with it. The problem is that at least for dev purposes certificates generated for APIserver are not signed by anyone trusted and because secureServer serves only secure traffic it isn't straightforward to connect to the service. 
The best workaround I found is to create an ssh tunnel from the open unsecured port on kubernetes_master to some external server, and use that server as a proxy. To save everyone looking for the correct ssh flags, it is done by running:

```sh
ssh kubernetes_master -L<local_port>:localhost:8080
```

or an analogous one for your cloud provider. Afterwards you can e.g. run

```sh
go tool pprof http://localhost:<local_port>/debug/pprof/profile
```

to get a 30-second CPU profile.

## Contention profiling

To enable contention profiling you need to add the line `rt.SetBlockProfileRate(1)` in addition to the `m.mux.HandleFunc(...)` lines added before (`rt` stands for `runtime` in `master.go`). This enables the 'debug/pprof/block' subpage, which can be used as an input to `go tool pprof`.



[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/profiling.md?pixel)]()

diff --git a/contributors/devel/pull-requests.md b/contributors/devel/pull-requests.md new file mode 100644 index 00000000000..888d7320a1f --- /dev/null +++ b/contributors/devel/pull-requests.md @@ -0,0 +1,105 @@

- [Pull Request Process](#pull-request-process)
- [Life of a Pull Request](#life-of-a-pull-request)
  - [Before sending a pull request](#before-sending-a-pull-request)
  - [Release Notes](#release-notes)
    - [Reviewing pre-release notes](#reviewing-pre-release-notes)
  - [Visual overview](#visual-overview)
- [Other notes](#other-notes)
- [Automation](#automation)

# Pull Request Process

An overview of how pull requests are managed for Kubernetes. This document assumes the reader has already followed the [development guide](development.md) to set up their environment.

# Life of a Pull Request

Except in the last few weeks of a milestone, when we need to reduce churn and stabilize, we aim to always be accepting pull requests.

Merging of PRs is managed either manually by the [on call](on-call-rotations.md) or automatically by the [github "munger"](https://github.com/kubernetes/contrib/tree/master/mungegithub) submit-queue plugin.

There are several requirements for the submit-queue to work:
* The author must have signed the CLA ("cla: yes" label added to the PR)
* No changes can have been made since the last lgtm label was applied
* k8s-bot must have reported that the GCE E2E build and test steps passed (Jenkins unit/integration, Jenkins e2e)

Additionally, for infrequent or new contributors, we require the on call to apply the "ok-to-merge" label manually. This is gated by the [whitelist](https://github.com/kubernetes/contrib/blob/master/mungegithub/whitelist.txt).

## Before sending a pull request

The following will save time for both you and your reviewer:

* Enable [pre-commit hooks](development.md#committing-changes-to-your-fork) and verify they pass.
* Verify `make verify` passes.
* Verify `make test` passes.
* Verify `make test-integration` passes.

## Release Notes

This section applies only to pull requests on the master branch. For cherry-pick PRs, see the [Cherrypick instructions](cherry-picks.md).

1. All pull requests are initiated with a `release-note-label-needed` label.
1. For a PR to be ready to merge, the `release-note-label-needed` label must be removed and one of the other `release-note-*` labels must be added.
1. `release-note-none` is a valid option if the PR does not need to be mentioned at release time.
1. `release-note` labeled PRs generate a release note using the PR title by default OR the release-note block in the PR template if filled in.
+ * See the [PR template](../../.github/PULL_REQUEST_TEMPLATE.md) for more + details. + * PR titles and body comments are mutable and can be modified at any time + prior to the release to reflect a release note friendly message. + +The only exception to these rules is when a PR is not a cherry-pick and is +targeted directly to the non-master branch. In this case, a `release-note-*` +label is required for that non-master PR. + +### Reviewing pre-release notes + +At any time, you can see what the release notes will look like on any branch. +(NOTE: This only works on Linux for now) + +``` +$ git pull https://github.com/kubernetes/release +$ RELNOTES=$PWD/release/relnotes +$ cd /to/your/kubernetes/repo +$ $RELNOTES -man # for details on how to use the tool +# Show release notes from the last release on a branch to HEAD +$ $RELNOTES --branch=master +``` + +## Visual overview + +![PR workflow](pr_workflow.png) + +# Other notes + +Pull requests that are purely support questions will be closed and +redirected to [stackoverflow](http://stackoverflow.com/questions/tagged/kubernetes). +We do this to consolidate help/support questions into a single channel, +improve efficiency in responding to requests and make FAQs easier +to find. + +Pull requests older than 2 weeks will be closed. Exceptions can be made +for PRs that have active review comments, or that are awaiting other dependent PRs. +Closed pull requests are easy to recreate, and little work is lost by closing a pull +request that subsequently needs to be reopened. We want to limit the total number of PRs in flight to: +* Maintain a clean project +* Remove old PRs that would be difficult to rebase as the underlying code has changed over time +* Encourage code velocity + + +# Automation + +We use a variety of automation to manage pull requests. This automation is described in detail +[elsewhere.](automation.md) + + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/pull-requests.md?pixel)]() + diff --git a/contributors/devel/running-locally.md b/contributors/devel/running-locally.md new file mode 100644 index 00000000000..327d685e8e2 --- /dev/null +++ b/contributors/devel/running-locally.md @@ -0,0 +1,170 @@ +Getting started locally +----------------------- + +**Table of Contents** + +- [Requirements](#requirements) + - [Linux](#linux) + - [Docker](#docker) + - [etcd](#etcd) + - [go](#go) + - [OpenSSL](#openssl) +- [Clone the repository](#clone-the-repository) +- [Starting the cluster](#starting-the-cluster) +- [Running a container](#running-a-container) +- [Running a user defined pod](#running-a-user-defined-pod) +- [Troubleshooting](#troubleshooting) + - [I cannot reach service IPs on the network.](#i-cannot-reach-service-ips-on-the-network) + - [I cannot create a replication controller with replica size greater than 1! What gives?](#i-cannot-create-a-replication-controller-with-replica-size-greater-than-1--what-gives) + - [I changed Kubernetes code, how do I run it?](#i-changed-kubernetes-code-how-do-i-run-it) + - [kubectl claims to start a container but `get pods` and `docker ps` don't show it.](#kubectl-claims-to-start-a-container-but-get-pods-and-docker-ps-dont-show-it) + - [The pods fail to connect to the services by host names](#the-pods-fail-to-connect-to-the-services-by-host-names) + +### Requirements + +#### Linux + +Not running Linux? 
Consider running [Minikube](http://kubernetes.io/docs/getting-started-guides/minikube/), or on a cloud provider like [Google Compute Engine](../getting-started-guides/gce.md). + +#### Docker + +At least [Docker](https://docs.docker.com/installation/#installation) +1.3+. Ensure the Docker daemon is running and can be contacted (try `docker +ps`). Some of the Kubernetes components need to run as root, which normally +works fine with docker. + +#### etcd + +You need an [etcd](https://github.com/coreos/etcd/releases) in your path, please make sure it is installed and in your ``$PATH``. + +#### go + +You need [go](https://golang.org/doc/install) in your path (see [here](development.md#go-versions) for supported versions), please make sure it is installed and in your ``$PATH``. + +#### OpenSSL + +You need [OpenSSL](https://www.openssl.org/) installed. If you do not have the `openssl` command available, you may see the following error in `/tmp/kube-apiserver.log`: + +``` +server.go:333] Invalid Authentication Config: open /tmp/kube-serviceaccount.key: no such file or directory +``` + +### Clone the repository + +In order to run kubernetes you must have the kubernetes code on the local machine. Cloning this repository is sufficient. + +```$ git clone --depth=1 https://github.com/kubernetes/kubernetes.git``` + +The `--depth=1` parameter is optional and will ensure a smaller download. + +### Starting the cluster + +In a separate tab of your terminal, run the following (since one needs sudo access to start/stop Kubernetes daemons, it is easier to run the entire script as root): + +```sh +cd kubernetes +hack/local-up-cluster.sh +``` + +This will build and start a lightweight local cluster, consisting of a master +and a single node. Type Control-C to shut it down. + +If you've already compiled the Kubernetes components, then you can avoid rebuilding them with this script by using the `-O` flag. + +```sh +./hack/local-up-cluster.sh -O +``` + +You can use the cluster/kubectl.sh script to interact with the local cluster. hack/local-up-cluster.sh will +print the commands to run to point kubectl at the local cluster. + + +### Running a container + +Your cluster is running, and you want to start running containers! + +You can now use any of the cluster/kubectl.sh commands to interact with your local setup. + +```sh +cluster/kubectl.sh get pods +cluster/kubectl.sh get services +cluster/kubectl.sh get replicationcontrollers +cluster/kubectl.sh run my-nginx --image=nginx --replicas=2 --port=80 + + +## begin wait for provision to complete, you can monitor the docker pull by opening a new terminal + sudo docker images + ## you should see it pulling the nginx image, once the above command returns it + sudo docker ps + ## you should see your container running! + exit +## end wait + +## introspect Kubernetes! +cluster/kubectl.sh get pods +cluster/kubectl.sh get services +cluster/kubectl.sh get replicationcontrollers +``` + + +### Running a user defined pod + +Note the difference between a [container](../user-guide/containers.md) +and a [pod](../user-guide/pods.md). Since you only asked for the former, Kubernetes will create a wrapper pod for you. +However you cannot view the nginx start page on localhost. To verify that nginx is running you need to run `curl` within the docker container (try `docker exec`). 
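For example (the container ID lookup is illustrative, and this assumes `curl` is available inside the image):

```sh
# Find one of the nginx containers that Kubernetes started for the pod, then curl from inside it.
CONTAINER_ID=$(sudo docker ps | grep my-nginx | head -n 1 | awk '{print $1}')
sudo docker exec "$CONTAINER_ID" curl -s http://localhost:80
```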
+ +You can control the specifications of a pod via a user defined manifest, and reach nginx through your browser on the port specified therein: + +```sh +cluster/kubectl.sh create -f test/fixtures/doc-yaml/user-guide/pod.yaml +``` + +Congratulations! + +### Troubleshooting + +#### I cannot reach service IPs on the network. + +Some firewall software that uses iptables may not interact well with +kubernetes. If you have trouble around networking, try disabling any +firewall or other iptables-using systems, first. Also, you can check +if SELinux is blocking anything by running a command such as `journalctl --since yesterday | grep avc`. + +By default the IP range for service cluster IPs is 10.0.*.* - depending on your +docker installation, this may conflict with IPs for containers. If you find +containers running with IPs in this range, edit hack/local-cluster-up.sh and +change the service-cluster-ip-range flag to something else. + +#### I cannot create a replication controller with replica size greater than 1! What gives? + +You are running a single node setup. This has the limitation of only supporting a single replica of a given pod. If you are interested in running with larger replica sizes, we encourage you to try the local vagrant setup or one of the cloud providers. + +#### I changed Kubernetes code, how do I run it? + +```sh +cd kubernetes +make +hack/local-up-cluster.sh +``` + +#### kubectl claims to start a container but `get pods` and `docker ps` don't show it. + +One or more of the Kubernetes daemons might've crashed. Tail the logs of each in /tmp. + +#### The pods fail to connect to the services by host names + +To start the DNS service, you need to set the following variables: + +```sh +KUBE_ENABLE_CLUSTER_DNS=true +KUBE_DNS_SERVER_IP="10.0.0.10" +KUBE_DNS_DOMAIN="cluster.local" +KUBE_DNS_REPLICAS=1 +``` + +To know more on DNS service you can look [here](http://issue.k8s.io/6667). Related documents can be found [here](../../build-tools/kube-dns/#how-do-i-configure-it) + + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/running-locally.md?pixel)]() + diff --git a/contributors/devel/scheduler.md b/contributors/devel/scheduler.md new file mode 100755 index 00000000000..b1cfea7ab31 --- /dev/null +++ b/contributors/devel/scheduler.md @@ -0,0 +1,72 @@ +# The Kubernetes Scheduler + +The Kubernetes scheduler runs as a process alongside the other master +components such as the API server. Its interface to the API server is to watch +for Pods with an empty PodSpec.NodeName, and for each Pod, it posts a Binding +indicating where the Pod should be scheduled. + +## The scheduling process + +``` + +-------+ + +---------------+ node 1| + | +-------+ + | + +----> | Apply pred. filters + | | + | | +-------+ + | +----+---------->+node 2 | + | | +--+----+ + | watch | | + | | | +------+ + | +---------------------->+node 3| ++--+---------------+ | +--+---+ +| Pods in apiserver| | | ++------------------+ | | + | | + | | + +------------V------v--------+ + | Priority function | + +-------------+--------------+ + | + | node 1: p=2 + | node 2: p=5 + v + select max{node priority} = node 2 + +``` + +The Scheduler tries to find a node for each Pod, one at a time. +- First it applies a set of "predicates" to filter out inappropriate nodes. 
For example, if the PodSpec specifies resource requests, then the scheduler will filter out nodes that don't have at least that much resources available (computed as the capacity of the node minus the sum of the resource requests of the containers that are already running on the node). +- Second, it applies a set of "priority functions" +that rank the nodes that weren't filtered out by the predicate check. For example, it tries to spread Pods across nodes and zones while at the same time favoring the least (theoretically) loaded nodes (where "load" - in theory - is measured as the sum of the resource requests of the containers running on the node, divided by the node's capacity). +- Finally, the node with the highest priority is chosen (or, if there are multiple such nodes, then one of them is chosen at random). The code for this main scheduling loop is in the function `Schedule()` in [plugin/pkg/scheduler/generic_scheduler.go](http://releases.k8s.io/HEAD/plugin/pkg/scheduler/generic_scheduler.go) + +## Scheduler extensibility + +The scheduler is extensible: the cluster administrator can choose which of the pre-defined +scheduling policies to apply, and can add new ones. + +### Policies (Predicates and Priorities) + +The built-in predicates and priorities are +defined in [plugin/pkg/scheduler/algorithm/predicates/predicates.go](http://releases.k8s.io/HEAD/plugin/pkg/scheduler/algorithm/predicates/predicates.go) and +[plugin/pkg/scheduler/algorithm/priorities/priorities.go](http://releases.k8s.io/HEAD/plugin/pkg/scheduler/algorithm/priorities/priorities.go), respectively. + +### Modifying policies + +The policies that are applied when scheduling can be chosen in one of two ways. Normally, +the policies used are selected by the functions `defaultPredicates()` and `defaultPriorities()` in +[plugin/pkg/scheduler/algorithmprovider/defaults/defaults.go](http://releases.k8s.io/HEAD/plugin/pkg/scheduler/algorithmprovider/defaults/defaults.go). +However, the choice of policies can be overridden by passing the command-line flag `--policy-config-file` to the scheduler, pointing to a JSON file specifying which scheduling policies to use. See [examples/scheduler-policy-config.json](../../examples/scheduler-policy-config.json) for an example +config file. (Note that the config file format is versioned; the API is defined in [plugin/pkg/scheduler/api](http://releases.k8s.io/HEAD/plugin/pkg/scheduler/api/)). +Thus to add a new scheduling policy, you should modify predicates.go or priorities.go, and either register the policy in `defaultPredicates()` or `defaultPriorities()`, or use a policy config file. + +## Exploring the code + +If you want to get a global picture of how the scheduler works, you can start in +[plugin/cmd/kube-scheduler/app/server.go](http://releases.k8s.io/HEAD/plugin/cmd/kube-scheduler/app/server.go) + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/scheduler.md?pixel)]() + diff --git a/contributors/devel/scheduler_algorithm.md b/contributors/devel/scheduler_algorithm.md new file mode 100755 index 00000000000..28c6c2bc3fc --- /dev/null +++ b/contributors/devel/scheduler_algorithm.md @@ -0,0 +1,44 @@ +# Scheduler Algorithm in Kubernetes + +For each unscheduled Pod, the Kubernetes scheduler tries to find a node across the cluster according to a set of rules. A general introduction to the Kubernetes scheduler can be found at [scheduler.md](scheduler.md). In this document, the algorithm of how to select a node for the Pod is explained. 
There are two steps before a destination node of a Pod is chosen. The first step is filtering all the nodes and the second is ranking the remaining nodes to find a best fit for the Pod. + +## Filtering the nodes + +The purpose of filtering the nodes is to filter out the nodes that do not meet certain requirements of the Pod. For example, if the free resource on a node (measured by the capacity minus the sum of the resource requests of all the Pods that already run on the node) is less than the Pod's required resource, the node should not be considered in the ranking phase so it is filtered out. Currently, there are several "predicates" implementing different filtering policies, including: + +- `NoDiskConflict`: Evaluate if a pod can fit due to the volumes it requests, and those that are already mounted. Currently supported volumes are: AWS EBS, GCE PD, and Ceph RBD. Only Persistent Volume Claims for those supported types are checked. Persistent Volumes added directly to pods are not evaluated and are not constrained by this policy. +- `NoVolumeZoneConflict`: Evaluate if the volumes a pod requests are available on the node, given the Zone restrictions. +- `PodFitsResources`: Check if the free resource (CPU and Memory) meets the requirement of the Pod. The free resource is measured by the capacity minus the sum of requests of all Pods on the node. To learn more about the resource QoS in Kubernetes, please check [QoS proposal](../design/resource-qos.md). +- `PodFitsHostPorts`: Check if any HostPort required by the Pod is already occupied on the node. +- `HostName`: Filter out all nodes except the one specified in the PodSpec's NodeName field. +- `MatchNodeSelector`: Check if the labels of the node match the labels specified in the Pod's `nodeSelector` field and, as of Kubernetes v1.2, also match the `scheduler.alpha.kubernetes.io/affinity` pod annotation if present. See [here](../user-guide/node-selection/) for more details on both. +- `MaxEBSVolumeCount`: Ensure that the number of attached ElasticBlockStore volumes does not exceed a maximum value (by default, 39, since Amazon recommends a maximum of 40 with one of those 40 reserved for the root volume -- see [Amazon's documentation](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/volume_limits.html#linux-specific-volume-limits)). The maximum value can be controlled by setting the `KUBE_MAX_PD_VOLS` environment variable. +- `MaxGCEPDVolumeCount`: Ensure that the number of attached GCE PersistentDisk volumes does not exceed a maximum value (by default, 16, which is the maximum GCE allows -- see [GCE's documentation](https://cloud.google.com/compute/docs/disks/persistent-disks#limits_for_predefined_machine_types)). The maximum value can be controlled by setting the `KUBE_MAX_PD_VOLS` environment variable. +- `CheckNodeMemoryPressure`: Check if a pod can be scheduled on a node reporting memory pressure condition. Currently, no ``BestEffort`` should be placed on a node under memory pressure as it gets automatically evicted by kubelet. +- `CheckNodeDiskPressure`: Check if a pod can be scheduled on a node reporting disk pressure condition. Currently, no pods should be placed on a node under disk pressure as it gets automatically evicted by kubelet. + +The details of the above predicates can be found in [plugin/pkg/scheduler/algorithm/predicates/predicates.go](http://releases.k8s.io/HEAD/plugin/pkg/scheduler/algorithm/predicates/predicates.go). All predicates mentioned above can be used in combination to perform a sophisticated filtering policy. 
Kubernetes uses some, but not all, of these predicates by default. You can see which ones are used by default in [plugin/pkg/scheduler/algorithmprovider/defaults/defaults.go](http://releases.k8s.io/HEAD/plugin/pkg/scheduler/algorithmprovider/defaults/defaults.go).

## Ranking the nodes

The filtered nodes are considered suitable to host the Pod, and often more than one node remains. Kubernetes prioritizes the remaining nodes to find the "best" one for the Pod. The prioritization is performed by a set of priority functions. For each remaining node, a priority function gives a score on a scale from 0 to 10, with 10 representing "most preferred" and 0 "least preferred". Each priority function is weighted by a positive number, and the final score of each node is calculated by adding up all the weighted scores. For example, suppose there are two priority functions, `priorityFunc1` and `priorityFunc2`, with weighting factors `weight1` and `weight2` respectively; then the final score of some NodeA is:

    finalScoreNodeA = (weight1 * priorityFunc1) + (weight2 * priorityFunc2)

After the scores of all nodes are calculated, the node with the highest score is chosen as the host of the Pod. If more than one node has the highest score, one of them is chosen at random.

Currently, the Kubernetes scheduler provides several practical priority functions, including:

- `LeastRequestedPriority`: The node is prioritized based on the fraction of the node that would be free if the new Pod were scheduled onto the node. (In other words, (capacity - sum of requests of all Pods already on the node - request of Pod that is being scheduled) / capacity). CPU and memory are equally weighted. The node with the highest free fraction is the most preferred. Note that this priority function has the effect of spreading Pods across the nodes with respect to resource consumption.
- `BalancedResourceAllocation`: This priority function tries to put the Pod on a node such that the CPU and Memory utilization rate is balanced after the Pod is deployed.
- `SelectorSpreadPriority`: Spread Pods by minimizing the number of Pods belonging to the same service, replication controller, or replica set on the same node. If zone information is present on the nodes, the priority will be adjusted so that pods are spread across zones and nodes.
- `CalculateAntiAffinityPriority`: Spread Pods by minimizing the number of Pods belonging to the same service on nodes with the same value for a particular label.
- `ImageLocalityPriority`: Nodes are prioritized based on the locality of images requested by a pod. Nodes with a larger total size of already-installed packages required by the pod are preferred over nodes with few or no required packages already installed.
- `NodeAffinityPriority`: (Kubernetes v1.2) Implements `preferredDuringSchedulingIgnoredDuringExecution` node affinity; see [here](../user-guide/node-selection/) for more details.

The details of the above priority functions can be found in [plugin/pkg/scheduler/algorithm/priorities](http://releases.k8s.io/HEAD/plugin/pkg/scheduler/algorithm/priorities/). Kubernetes uses some, but not all, of these priority functions by default. You can see which ones are used by default in [plugin/pkg/scheduler/algorithmprovider/defaults/defaults.go](http://releases.k8s.io/HEAD/plugin/pkg/scheduler/algorithmprovider/defaults/defaults.go).
Similar as predicates, you can combine the above priority functions and assign weight factors (positive number) to them as you want (check [scheduler.md](scheduler.md) for how to customize). + + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/scheduler_algorithm.md?pixel)]() + diff --git a/contributors/devel/testing.md b/contributors/devel/testing.md new file mode 100644 index 00000000000..45848f3bd55 --- /dev/null +++ b/contributors/devel/testing.md @@ -0,0 +1,230 @@ +# Testing guide + +Updated: 5/21/2016 + +**Table of Contents** + + +- [Testing guide](#testing-guide) + - [Unit tests](#unit-tests) + - [Run all unit tests](#run-all-unit-tests) + - [Set go flags during unit tests](#set-go-flags-during-unit-tests) + - [Run unit tests from certain packages](#run-unit-tests-from-certain-packages) + - [Run specific unit test cases in a package](#run-specific-unit-test-cases-in-a-package) + - [Stress running unit tests](#stress-running-unit-tests) + - [Unit test coverage](#unit-test-coverage) + - [Benchmark unit tests](#benchmark-unit-tests) + - [Integration tests](#integration-tests) + - [Install etcd dependency](#install-etcd-dependency) + - [Etcd test data](#etcd-test-data) + - [Run integration tests](#run-integration-tests) + - [Run a specific integration test](#run-a-specific-integration-test) + - [End-to-End tests](#end-to-end-tests) + + + +This assumes you already read the [development guide](development.md) to +install go, godeps, and configure your git client. All command examples are +relative to the `kubernetes` root directory. + +Before sending pull requests you should at least make sure your changes have +passed both unit and integration tests. + +Kubernetes only merges pull requests when unit, integration, and e2e tests are +passing, so it is often a good idea to make sure the e2e tests work as well. + +## Unit tests + +* Unit tests should be fully hermetic + - Only access resources in the test binary. +* All packages and any significant files require unit tests. +* The preferred method of testing multiple scenarios or input is + [table driven testing](https://github.com/golang/go/wiki/TableDrivenTests) + - Example: [TestNamespaceAuthorization](../../test/integration/auth/auth_test.go) +* Unit tests must pass on OS X and Windows platforms. + - Tests using linux-specific features must be skipped or compiled out. + - Skipped is better, compiled out is required when it won't compile. +* Concurrent unit test runs must pass. +* See [coding conventions](coding-conventions.md). + +### Run all unit tests + +`make test` is the entrypoint for running the unit tests that ensures that +`GOPATH` is set up correctly. If you have `GOPATH` set up correctly, you can +also just use `go test` directly. + +```sh +cd kubernetes +make test # Run all unit tests. +``` + +### Set go flags during unit tests + +You can set [go flags](https://golang.org/cmd/go/) by setting the +`KUBE_GOFLAGS` environment variable. 
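For example, to pass `-v` to `go test` and get verbose output from the unit tests:

```sh
make test KUBE_GOFLAGS="-v"
```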
+ +### Run unit tests from certain packages + +`make test` accepts packages as arguments; the `k8s.io/kubernetes` prefix is +added automatically to these: + +```sh +make test WHAT=pkg/api # run tests for pkg/api +``` + +To run multiple targets you need quotes: + +```sh +make test WHAT="pkg/api pkg/kubelet" # run tests for pkg/api and pkg/kubelet +``` + +In a shell, it's often handy to use brace expansion: + +```sh +make test WHAT=pkg/{api,kubelet} # run tests for pkg/api and pkg/kubelet +``` + +### Run specific unit test cases in a package + +You can set the test args using the `KUBE_TEST_ARGS` environment variable. +You can use this to pass the `-run` argument to `go test`, which accepts a +regular expression for the name of the test that should be run. + +```sh +# Runs TestValidatePod in pkg/api/validation with the verbose flag set +make test WHAT=pkg/api/validation KUBE_GOFLAGS="-v" KUBE_TEST_ARGS='-run ^TestValidatePod$' + +# Runs tests that match the regex ValidatePod|ValidateConfigMap in pkg/api/validation +make test WHAT=pkg/api/validation KUBE_GOFLAGS="-v" KUBE_TEST_ARGS="-run ValidatePod\|ValidateConfigMap$" +``` + +For other supported test flags, see the [golang +documentation](https://golang.org/cmd/go/#hdr-Description_of_testing_flags). + +### Stress running unit tests + +Running the same tests repeatedly is one way to root out flakes. +You can do this efficiently. + +```sh +# Have 2 workers run all tests 5 times each (10 total iterations). +make test PARALLEL=2 ITERATION=5 +``` + +For more advanced ideas please see [flaky-tests.md](flaky-tests.md). + +### Unit test coverage + +Currently, collecting coverage is only supported for the Go unit tests. + +To run all unit tests and generate an HTML coverage report, run the following: + +```sh +make test KUBE_COVER=y +``` + +At the end of the run, an HTML report will be generated with the path +printed to stdout. + +To run tests and collect coverage in only one package, pass its relative path +under the `kubernetes` directory as an argument, for example: + +```sh +make test WHAT=pkg/kubectl KUBE_COVER=y +``` + +Multiple arguments can be passed, in which case the coverage results will be +combined for all tests run. + +### Benchmark unit tests + +To run benchmark tests, you'll typically use something like: + +```sh +go test ./pkg/apiserver -benchmem -run=XXX -bench=BenchmarkWatch +``` + +This will do the following: + +1. `-run=XXX` is a regular expression filter on the name of test cases to run +2. `-bench=BenchmarkWatch` will run test methods with BenchmarkWatch in the name + * See `grep -nr BenchmarkWatch .` for examples +3. `-benchmem` enables memory allocation stats + +See `go help test` and `go help testflag` for additional info. + +## Integration tests + +* Integration tests should only access other resources on the local machine + - Most commonly etcd or a service listening on localhost. +* All significant features require integration tests. + - This includes kubectl commands +* The preferred method of testing multiple scenarios or inputs +is [table driven testing](https://github.com/golang/go/wiki/TableDrivenTests) + - Example: [TestNamespaceAuthorization](../../test/integration/auth/auth_test.go) +* Each test should create its own master, httpserver and config. + - Example: [TestPodUpdateActiveDeadlineSeconds](../../test/integration/pods/pods_test.go) +* See [coding conventions](coding-conventions.md). 
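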
### Install etcd dependency

Kubernetes integration tests require your `PATH` to include an [etcd](https://github.com/coreos/etcd/releases) installation. Kubernetes includes a script to help install etcd on your machine.

```sh
# Install etcd and add to PATH

# Option a) install inside kubernetes root
hack/install-etcd.sh # Installs in ./third_party/etcd
echo export PATH="\$PATH:$(pwd)/third_party/etcd" >> ~/.profile # Add to PATH

# Option b) install manually
grep -E "image.*etcd" cluster/saltbase/etcd/etcd.manifest # Find version
# Install that version using yum/apt-get/etc
echo export PATH="\$PATH:<path/to/etcd>" >> ~/.profile # Add to PATH
```

### Etcd test data

Many tests start an etcd server internally, storing test data in the operating system's temporary directory.

If you see test failures because the temporary directory does not have sufficient space, or is on a volume with unpredictable write latency, you can override the test data directory for those internal etcd instances with the `TEST_ETCD_DIR` environment variable.

### Run integration tests

The integration tests are run using `make test-integration`. The Kubernetes integration tests are written using the normal golang testing package but expect to have a running etcd instance to connect to. The `test-integration.sh` script wraps `make test` and sets up an etcd instance for the integration tests to use.

```sh
make test-integration  # Run all integration tests.
```

This script runs the golang tests in package [`test/integration`](../../test/integration/).

### Run a specific integration test

You can also use the `KUBE_TEST_ARGS` environment variable with the `hack/test-integration.sh` script to run a specific integration test case:

```sh
# Run integration test TestPodUpdateActiveDeadlineSeconds with the verbose flag set.
make test-integration KUBE_GOFLAGS="-v" KUBE_TEST_ARGS="-run ^TestPodUpdateActiveDeadlineSeconds$"
```

If you set `KUBE_TEST_ARGS`, the test case will be run with only the `v1` API version and the watch cache test is skipped.

## End-to-End tests

Please refer to [End-to-End Testing in Kubernetes](e2e-tests.md).


[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/testing.md?pixel)]()

diff --git a/contributors/devel/update-release-docs.md b/contributors/devel/update-release-docs.md new file mode 100644 index 00000000000..1e0988db303 --- /dev/null +++ b/contributors/devel/update-release-docs.md @@ -0,0 +1,115 @@

# Table of Contents

- [Table of Contents](#table-of-contents)
- [Overview](#overview)
- [Adding a new docs collection for a release](#adding-a-new-docs-collection-for-a-release)
- [Updating docs in an existing collection](#updating-docs-in-an-existing-collection)
  - [Updating docs on HEAD](#updating-docs-on-head)
  - [Updating docs in release branch](#updating-docs-in-release-branch)
  - [Updating docs in gh-pages branch](#updating-docs-in-gh-pages-branch)

# Overview

This document explains how to update the Kubernetes release docs hosted at http://kubernetes.io/docs/.

http://kubernetes.io is served using the [gh-pages branch](https://github.com/kubernetes/kubernetes/tree/gh-pages) of the kubernetes repo on GitHub. Updating docs in that branch will update http://kubernetes.io (one way to get a local checkout of that branch is shown below).

There are 2 scenarios which require updating docs:
* Adding a new docs collection for a release.
* Updating docs in an existing collection.
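For example (the target directory name here is arbitrary):

```sh
# Check out only the gh-pages branch into a separate working directory.
git clone --branch gh-pages https://github.com/kubernetes/kubernetes.git kubernetes-gh-pages
cd kubernetes-gh-pages
```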
+ +# Adding a new docs collection for a release + +Whenever a new release series (`release-X.Y`) is cut from `master`, we push the +corresponding set of docs to `http://kubernetes.io/vX.Y/docs`. The steps are as follows: + +* Create a `_vX.Y` folder in `gh-pages` branch. +* Add `vX.Y` as a valid collection in [_config.yml](https://github.com/kubernetes/kubernetes/blob/gh-pages/_config.yml) +* Create a new `_includes/nav_vX.Y.html` file with the navigation menu. This can + be a copy of `_includes/nav_vX.Y-1.html` with links to new docs added and links + to deleted docs removed. Update [_layouts/docwithnav.html] + (https://github.com/kubernetes/kubernetes/blob/gh-pages/_layouts/docwithnav.html) + to include this new navigation html file. Example PR: [#16143](https://github.com/kubernetes/kubernetes/pull/16143). +* [Pull docs from release branch](#updating-docs-in-gh-pages-branch) in `_vX.Y` + folder. + +Once these changes have been submitted, you should be able to reach the docs at +`http://kubernetes.io/vX.Y/docs/` where you can test them. + +To make `X.Y` the default version of docs: + +* Update [_config.yml](https://github.com/kubernetes/kubernetes/blob/gh-pages/_config.yml) + and [/kubernetes/kubernetes/blob/gh-pages/_docs/index.md](https://github.com/kubernetes/kubernetes/blob/gh-pages/_docs/index.md) + to point to the new version. Example PR: [#16416](https://github.com/kubernetes/kubernetes/pull/16416). +* Update [_includes/docversionselector.html](https://github.com/kubernetes/kubernetes/blob/gh-pages/_includes/docversionselector.html) + to make `vX.Y` the default version. +* Add "Disallow: /vX.Y-1/" to existing [robots.txt](https://github.com/kubernetes/kubernetes/blob/gh-pages/robots.txt) + file to hide old content from web crawlers and focus SEO on new docs. Example PR: + [#16388](https://github.com/kubernetes/kubernetes/pull/16388). +* Regenerate [sitemaps.xml](https://github.com/kubernetes/kubernetes/blob/gh-pages/sitemap.xml) + so that it now contains `vX.Y` links. Sitemap can be regenerated using + https://www.xml-sitemaps.com. Example PR: [#17126](https://github.com/kubernetes/kubernetes/pull/17126). +* Resubmit the updated sitemaps file to [Google + webmasters](https://www.google.com/webmasters/tools/sitemap-list?siteUrl=http://kubernetes.io/) for google to index the new links. +* Update [_layouts/docwithnav.html] (https://github.com/kubernetes/kubernetes/blob/gh-pages/_layouts/docwithnav.html) + to include [_includes/archivedocnotice.html](https://github.com/kubernetes/kubernetes/blob/gh-pages/_includes/archivedocnotice.html) + for `vX.Y-1` docs which need to be archived. +* Ping @thockin to update docs.k8s.io to redirect to `http://kubernetes.io/vX.Y/`. [#18788](https://github.com/kubernetes/kubernetes/issues/18788). + +http://kubernetes.io/docs/ should now be redirecting to `http://kubernetes.io/vX.Y/`. + +# Updating docs in an existing collection + +The high level steps to update docs in an existing collection are: + +1. Update docs on `HEAD` (master branch) +2. Cherryick the change in relevant release branch. +3. Update docs on `gh-pages`. + +## Updating docs on HEAD + +[Development guide](development.md) provides general instructions on how to contribute to kubernetes github repo. +[Docs how to guide](how-to-doc.md) provides conventions to follow while writing docs. + +## Updating docs in release branch + +Once docs have been updated in the master branch, the changes need to be +cherrypicked in the latest release branch. 
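At the git level this amounts to something like the following (the branch name and commit are illustrative; the guide linked below describes the project's actual process and tooling):

```sh
git fetch upstream
git checkout -b my-docs-cherry-pick upstream/release-1.1
git cherry-pick -x <sha-of-the-docs-commit-on-master>
```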
+[Cherrypick guide](cherry-picks.md) has more details on how to cherrypick your change. + +## Updating docs in gh-pages branch + +Once release branch has all the relevant changes, we can pull in the latest docs +in `gh-pages` branch. +Run the following 2 commands in `gh-pages` branch to update docs for release `X.Y`: + +``` +_tools/import_docs vX.Y _vX.Y release-X.Y release-X.Y +``` + +For ex: to pull in docs for release 1.1, run: + +``` +_tools/import_docs v1.1 _v1.1 release-1.1 release-1.1 +``` + +Apart from copying over the docs, `_tools/release_docs` also does some post processing +(like updating the links to docs to point to http://kubernetes.io/docs/ instead of pointing to github repo). +Note that we always pull in the docs from release branch and not from master (pulling docs +from master requires some extra processing like versionizing the links and removing unversioned warnings). + +We delete all existing docs before pulling in new ones to ensure that deleted +docs go away. + +If the change added or deleted a doc, then update the corresponding `_includes/nav_vX.Y.html` file as well. + + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/update-release-docs.md?pixel)]() + diff --git a/contributors/devel/updating-docs-for-feature-changes.md b/contributors/devel/updating-docs-for-feature-changes.md new file mode 100644 index 00000000000..309b809dca0 --- /dev/null +++ b/contributors/devel/updating-docs-for-feature-changes.md @@ -0,0 +1,76 @@ +# How to update docs for new kubernetes features + +This document describes things to consider when updating Kubernetes docs for new features or changes to existing features (including removing features). + +## Who should read this doc? + +Anyone making user facing changes to kubernetes. This is especially important for Api changes or anything impacting the getting started experience. + +## What docs changes are needed when adding or updating a feature in kubernetes? + +### When making Api changes + +*e.g. adding Deployments* +* Always make sure docs for downstream effects are updated *(StatefulSet -> PVC, Deployment -> ReplicationController)* +* Add or update the corresponding *[Glossary](http://kubernetes.io/docs/reference/)* item +* Verify the guides / walkthroughs do not require any changes: + * **If your change will be recommended over the approaches shown in these guides, then they must be updated to reflect your change** + * [Hello Node](http://kubernetes.io/docs/hellonode/) + * [K8s101](http://kubernetes.io/docs/user-guide/walkthrough/) + * [K8S201](http://kubernetes.io/docs/user-guide/walkthrough/k8s201/) + * [Guest-book](https://github.com/kubernetes/kubernetes/tree/release-1.2/examples/guestbook) + * [Thorough-walkthrough](http://kubernetes.io/docs/user-guide/) +* Verify the [landing page examples](http://kubernetes.io/docs/samples/) do not require any changes (those under "Recently updated samples") + * **If your change will be recommended over the approaches shown in the "Updated" examples, then they must be updated to reflect your change** + * If you are aware that your change will be recommended over the approaches shown in non-"Updated" examples, create an Issue +* Verify the collection of docs under the "Guides" section do not require updates (may need to use grep for this until are docs are more organized) + +### When making Tools changes + +*e.g. 
updating kube-dash or kubectl* +* If changing kubectl, verify the guides / walkthroughs do not require any changes: + * **If your change will be recommended over the approaches shown in these guides, then they must be updated to reflect your change** + * [Hello Node](http://kubernetes.io/docs/hellonode/) + * [K8s101](http://kubernetes.io/docs/user-guide/walkthrough/) + * [K8S201](http://kubernetes.io/docs/user-guide/walkthrough/k8s201/) + * [Guest-book](https://github.com/kubernetes/kubernetes/tree/release-1.2/examples/guestbook) + * [Thorough-walkthrough](http://kubernetes.io/docs/user-guide/) +* If updating an existing tool + * Search for any docs about the tool and update them +* If adding a new tool for end users + * Add a new page under [Guides](http://kubernetes.io/docs/) +* **If removing a tool (kube-ui), make sure documentation that references it is updated appropriately!** + +### When making cluster setup changes + +*e.g. adding Multi-AZ support* +* Update the relevant [Administering Clusters](http://kubernetes.io/docs/) pages + +### When making Kubernetes binary changes + +*e.g. adding a flag, changing Pod GC behavior, etc* +* Add or update a page under [Configuring Kubernetes](http://kubernetes.io/docs/) + +## Where do the docs live? + +1. Most external user facing docs live in the [kubernetes/docs](https://github.com/kubernetes/kubernetes.github.io) repo + * Also see the *[general instructions](http://kubernetes.io/editdocs/)* for making changes to the docs website +2. Internal design and development docs live in the [kubernetes/kubernetes](https://github.com/kubernetes/kubernetes) repo + +## Who should help review docs changes? + +* cc *@kubernetes/docs* +* Changes to [kubernetes/docs](https://github.com/kubernetes/kubernetes.github.io) repo must have both a Technical Review and a Docs Review + +## Tips for writing new docs + +* Try to keep new docs small and focused +* Document pre-requisites (if they exist) +* Document what concepts will be covered in the document +* Include screen shots or pictures in documents for GUIs +* *TODO once we have a standard widget set we are happy with* - include diagrams to help describe complex ideas (not required yet) + + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/updating-docs-for-feature-changes.md?pixel)]() + diff --git a/contributors/devel/writing-a-getting-started-guide.md b/contributors/devel/writing-a-getting-started-guide.md new file mode 100644 index 00000000000..b1d65d60b9b --- /dev/null +++ b/contributors/devel/writing-a-getting-started-guide.md @@ -0,0 +1,101 @@ +# Writing a Getting Started Guide + +This page gives some advice for anyone planning to write or update a Getting Started Guide for Kubernetes. +It also gives some guidelines which reviewers should follow when reviewing a pull request for a +guide. + +A Getting Started Guide is instructions on how to create a Kubernetes cluster on top of a particular +type(s) of infrastructure. Infrastructure includes: the IaaS provider for VMs; +the node OS; inter-node networking; and node Configuration Management system. +A guide refers to scripts, Configuration Management files, and/or binary assets such as RPMs. We call +the combination of all these things needed to run on a particular type of infrastructure a +**distro**. + +[The Matrix](../../docs/getting-started-guides/README.md) lists the distros. If there is already a guide +which is similar to the one you have planned, consider improving that one. 
Distros fall into two categories:
 - **versioned distros** are tested to work with a particular binary release of Kubernetes. These come in a wide variety, reflecting a wide range of ideas and preferences in how to run a cluster.
 - **development distros** are tested to work with the latest Kubernetes source code. But there are relatively few of these, and the bar is much higher for creating one. They must support fully automated cluster creation, deletion, and upgrade.

There are different guidelines for each.

## Versioned Distro Guidelines

These guidelines say *what* to do. See the Rationale section for *why*.
 - Send us a PR.
 - Put the instructions in `docs/getting-started-guides/...`. Scripts go there too. This helps devs easily search for uses of flags by guides.
 - We may ask that you host binary assets or large amounts of code in our `contrib` directory or on your own repo.
 - Add or update a row in [The Matrix](../../docs/getting-started-guides/README.md).
 - Clearly state the binary version of Kubernetes that you tested in your guide doc.
 - Set up a cluster and run the [conformance tests](e2e-tests.md#conformance-tests) against it, and report the results in your PR.
 - Versioned distros should typically not modify or add code in `cluster/`. That is just scripts for developer distros.
 - When a new major or minor release of Kubernetes comes out, we may also release a new conformance test, and require a new conformance test run to earn a conformance checkmark.

If you have a cluster partially working, but doing all the above steps seems like too much work, we still want to hear from you. We suggest you write a blog post or a Gist, and we will link to it on our wiki page. Just file an issue or chat with us on [Slack](http://slack.kubernetes.io) and one of the committers will link to it from the wiki.

## Development Distro Guidelines

These guidelines say *what* to do. See the Rationale section for *why*.
 - The main reason to add a new development distro is to support a new IaaS provider (VM and network management). This means implementing a new `pkg/cloudprovider/providers/$IAAS_NAME`.
 - Development distros should use Saltstack for Configuration Management.
 - Development distros need to support automated cluster creation, deletion, upgrading, etc. This means writing scripts in `cluster/$IAAS_NAME`.
 - All commits to the tip of this repo must not break any of the development distros.
 - The author of a change is responsible for making the changes necessary on all the cloud providers if the change affects any of them, and for reverting the change if it breaks any of the CIs.
 - A development distro needs to have an organization which owns it. This organization needs to:
   - Set up and maintain Continuous Integration that runs e2e frequently (multiple times per day) against the distro at head, and that notifies all devs of breakage.
   - Be reasonably available for questions and assist with refactoring and feature additions that affect code for their IaaS.

## Rationale

 - We want people to create Kubernetes clusters with whatever IaaS, Node OS, configuration management tools, and so on, which they are familiar with. The guidelines for **versioned distros** are designed for flexibility.
 - We want developers to be able to work without understanding all the permutations of IaaS, NodeOS, and configuration management. The guidelines for **developer distros** are designed for consistency.
+ - We want users to have a uniform experience with Kubernetes whenever they follow instructions anywhere + in our Github repository. So, we ask that versioned distros pass a **conformance test** to make sure + really work. + - We want to **limit the number of development distros** for several reasons. Developers should + only have to change a limited number of places to add a new feature. Also, since we will + gate commits on passing CI for all distros, and since end-to-end tests are typically somewhat + flaky, it would be highly likely for there to be false positives and CI backlogs with many CI pipelines. + - We do not require versioned distros to do **CI** for several reasons. It is a steep + learning curve to understand our automated testing scripts. And it is considerable effort + to fully automate setup and teardown of a cluster, which is needed for CI. And, not everyone + has the time and money to run CI. We do not want to + discourage people from writing and sharing guides because of this. + - Versioned distro authors are free to run their own CI and let us know if there is breakage, but we + will not include them as commit hooks -- there cannot be so many commit checks that it is impossible + to pass them all. + - We prefer a single Configuration Management tool for development distros. If there were more + than one, the core developers would have to learn multiple tools and update config in multiple + places. **Saltstack** happens to be the one we picked when we started the project. We + welcome versioned distros that use any tool; there are already examples of + CoreOS Fleet, Ansible, and others. + - You can still run code from head or your own branch + if you use another Configuration Management tool -- you just have to do some manual steps + during testing and deployment. + + + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/writing-a-getting-started-guide.md?pixel)]() + diff --git a/contributors/devel/writing-good-e2e-tests.md b/contributors/devel/writing-good-e2e-tests.md new file mode 100644 index 00000000000..ab13aff2353 --- /dev/null +++ b/contributors/devel/writing-good-e2e-tests.md @@ -0,0 +1,235 @@ +# Writing good e2e tests for Kubernetes # + +## Patterns and Anti-Patterns ## + +### Goals of e2e tests ### + +Beyond the obvious goal of providing end-to-end system test coverage, +there are a few less obvious goals that you should bear in mind when +designing, writing and debugging your end-to-end tests. In +particular, "flaky" tests, which pass most of the time but fail +intermittently for difficult-to-diagnose reasons are extremely costly +in terms of blurring our regression signals and slowing down our +automated merge queue. Up-front time and effort designing your test +to be reliable is very well spent. Bear in mind that we have hundreds +of tests, each running in dozens of different environments, and if any +test in any test environment fails, we have to assume that we +potentially have some sort of regression. So if a significant number +of tests fail even only 1% of the time, basic statistics dictates that +we will almost never have a "green" regression indicator. Stated +another way, writing a test that is only 99% reliable is just about +useless in the harsh reality of a CI environment. In fact it's worse +than useless, because not only does it not provide a reliable +regression indicator, but it also costs a lot of subsequent debugging +time, and delayed merges. 
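To make that concrete: if a suite has 400 tests and each one passes 99% of the time independently, the whole suite comes up green only about 0.99^400 ≈ 2% of the time (the exact numbers are illustrative, but the effect is very real at our scale).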
+ +#### Debuggability #### + +If your test fails, it should provide as detailed as possible reasons +for the failure in it's output. "Timeout" is not a useful error +message. "Timed out after 60 seconds waiting for pod xxx to enter +running state, still in pending state" is much more useful to someone +trying to figure out why your test failed and what to do about it. +Specifically, +[assertion](https://onsi.github.io/gomega/#making-assertions) code +like the following generates rather useless errors: + +``` +Expect(err).NotTo(HaveOccurred()) +``` + +Rather +[annotate](https://onsi.github.io/gomega/#annotating-assertions) your assertion with something like this: + +``` +Expect(err).NotTo(HaveOccurred(), "Failed to create %d foobars, only created %d", foobarsReqd, foobarsCreated) +``` + +On the other hand, overly verbose logging, particularly of non-error conditions, can make +it unnecessarily difficult to figure out whether a test failed and if +so why? So don't log lots of irrelevant stuff either. + +#### Ability to run in non-dedicated test clusters #### + +To reduce end-to-end delay and improve resource utilization when +running e2e tests, we try, where possible, to run large numbers of +tests in parallel against the same test cluster. This means that: + +1. you should avoid making any assumption (implicit or explicit) that +your test is the only thing running against the cluster. For example, +making the assumption that your test can run a pod on every node in a +cluster is not a safe assumption, as some other tests, running at the +same time as yours, might have saturated one or more nodes in the +cluster. Similarly, running a pod in the system namespace, and +assuming that that will increase the count of pods in the system +namespace by one is not safe, as some other test might be creating or +deleting pods in the system namespace at the same time as your test. +If you do legitimately need to write a test like that, make sure to +label it ["\[Serial\]"](e2e-tests.md#kinds_of_tests) so that it's easy +to identify, and not run in parallel with any other tests. +1. You should avoid doing things to the cluster that make it difficult +for other tests to reliably do what they're trying to do, at the same +time. For example, rebooting nodes, disconnecting network interfaces, +or upgrading cluster software as part of your test is likely to +violate the assumptions that other tests might have made about a +reasonably stable cluster environment. If you need to write such +tests, please label them as +["\[Disruptive\]"](e2e-tests.md#kinds_of_tests) so that it's easy to +identify them, and not run them in parallel with other tests. +1. You should avoid making assumptions about the Kubernetes API that +are not part of the API specification, as your tests will break as +soon as these assumptions become invalid. For example, relying on +specific Events, Event reasons or Event messages will make your tests +very brittle. + +#### Speed of execution #### + +We have hundreds of e2e tests, some of which we run in serial, one +after the other, in some cases. If each test takes just a few minutes +to run, that very quickly adds up to many, many hours of total +execution time. We try to keep such total execution time down to a +few tens of minutes at most. Therefore, try (very hard) to keep the +execution time of your individual tests below 2 minutes, ideally +shorter than that. Concretely, adding inappropriately long 'sleep' +statements or other gratuitous waits to tests is a killer. 
If under +normal circumstances your pod enters the running state within 10 +seconds, and 99.9% of the time within 30 seconds, it would be +gratuitous to wait 5 minutes for this to happen. Rather just fail +after 30 seconds, with a clear error message as to why your test +failed ("e.g. Pod x failed to become ready after 30 seconds, it +usually takes 10 seconds"). If you do have a truly legitimate reason +for waiting longer than that, or writing a test which takes longer +than 2 minutes to run, comment very clearly in the code why this is +necessary, and label the test as +["\[Slow\]"](e2e-tests.md#kinds_of_tests), so that it's easy to +identify and avoid in test runs that are required to complete +timeously (for example those that are run against every code +submission before it is allowed to be merged). +Note that completing within, say, 2 minutes only when the test +passes is not generally good enough. Your test should also fail in a +reasonable time. We have seen tests that, for example, wait up to 10 +minutes for each of several pods to become ready. Under good +conditions these tests might pass within a few seconds, but if the +pods never become ready (e.g. due to a system regression) they take a +very long time to fail and typically cause the entire test run to time +out, so that no results are produced. Again, this is a lot less +useful than a test that fails reliably within a minute or two when the +system is not working correctly. + +#### Resilience to relatively rare, temporary infrastructure glitches or delays #### + +Remember that your test will be run many thousands of +times, at different times of day and night, probably on different +cloud providers, under different load conditions. And often the +underlying state of these systems is stored in eventually consistent +data stores. So, for example, if a resource creation request is +theoretically asynchronous, even if you observe it to be practically +synchronous most of the time, write your test to assume that it's +asynchronous (e.g. make the "create" call, and poll or watch the +resource until it's in the correct state before proceeding). +Similarly, don't assume that API endpoints are 100% available. +They're not. Under high load conditions, API calls might temporarily +fail or time-out. In such cases it's appropriate to back off and retry +a few times before failing your test completely (in which case make +the error message very clear about what happened, e.g. "Retried +http://... 3 times - all failed with xxx". Use the standard +retry mechanisms provided in the libraries detailed below. + +### Some concrete tools at your disposal ### + +Obviously most of the above goals apply to many tests, not just yours. +So we've developed a set of reusable test infrastructure, libraries +and best practises to help you to do the right thing, or at least do +the same thing as other tests, so that if that turns out to be the +wrong thing, it can be fixed in one place, not hundreds, to be the +right thing. + +Here are a few pointers: + ++ [E2e Framework](../../test/e2e/framework/framework.go): + Familiarise yourself with this test framework and how to use it. + Amongst others, it automatically creates uniquely named namespaces + within which your tests can run to avoid name clashes, and reliably + automates cleaning up the mess after your test has completed (it + just deletes everything in the namespace). This helps to ensure + that tests do not leak resources. 
Note that deleting a namespace + (and by implication everything in it) is currently an expensive + operation. So the fewer resources you create, the less cleaning up + the framework needs to do, and the faster your test (and other + tests running concurrently with yours) will complete. Your tests + should always use this framework. Trying other home-grown + approaches to avoiding name clashes and resource leaks has proven + to be a very bad idea. ++ [E2e utils library](../../test/e2e/framework/util.go): + This handy library provides tons of reusable code for a host of + commonly needed test functionality, including waiting for resources + to enter specified states, safely and consistently retrying failed + operations, usefully reporting errors, and much more. Make sure + that you're familiar with what's available there, and use it. + Likewise, if you come across a generally useful mechanism that's + not yet implemented there, add it so that others can benefit from + your brilliance. In particular pay attention to the variety of + timeout and retry related constants at the top of that file. Always + try to reuse these constants rather than try to dream up your own + values. Even if the values there are not precisely what you would + like to use (timeout periods, retry counts etc), the benefit of + having them be consistent and centrally configurable across our + entire test suite typically outweighs your personal preferences. ++ **Follow the examples of stable, well-written tests:** Some of our + existing end-to-end tests are better written and more reliable than + others. A few examples of well-written tests include: + [Replication Controllers](../../test/e2e/rc.go), + [Services](../../test/e2e/service.go), + [Reboot](../../test/e2e/reboot.go). ++ [Ginkgo Test Framework](https://github.com/onsi/ginkgo): This is the + test library and runner upon which our e2e tests are built. Before + you write or refactor a test, read the docs and make sure that you + understand how it works. In particular be aware that every test is + uniquely identified and described (e.g. in test reports) by the + concatenation of it's `Describe` clause and nested `It` clauses. + So for example `Describe("Pods",...).... It(""should be scheduled + with cpu and memory limits")` produces a sane test identifier and + descriptor `Pods should be scheduled with cpu and memory limits`, + which makes it clear what's being tested, and hence what's not + working if it fails. Other good examples include: + +``` + CAdvisor should be healthy on every node +``` + +and + +``` + Daemon set should run and stop complex daemon +``` + + On the contrary +(these are real examples), the following are less good test +descriptors: + +``` + KubeProxy should test kube-proxy +``` + +and + +``` +Nodes [Disruptive] Network when a node becomes unreachable +[replication controller] recreates pods scheduled on the +unreachable node AND allows scheduling of pods on a node after +it rejoins the cluster +``` + +An improvement might be + +``` +Unreachable nodes are evacuated and then repopulated upon rejoining [Disruptive] +``` + +Note that opening issues for specific better tooling is welcome, and +code implementing that tooling is even more welcome :-). + + + +[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/devel/writing-good-e2e-tests.md?pixel)]() +