Ray Kubernetes Operator

The Ray Operator is a Kubernetes operator that automates the provisioning, management, autoscaling and operation of Ray clusters deployed on Kubernetes.

The main features of the Ray Operator include:

  • cluster management by users through a custom resource definition (CRD)
  • heterogeneous pods in one Ray cluster, with specific affinity, tolerations and other predefined settings
  • monitoring via Prometheus
  • high availability for the operator itself: if the leader crashes, a new leader is elected

File structure:

ray/deploy/ray-operator
├── api/v1alpha1  // Package v1alpha1 contains API Schema definitions for the ray v1alpha1 API group
│   ├── groupversion_info.go      // common metadata about the group-version
│   ├── raycluster_types.go       // RayCluster field definitions; users should focus here
│   └── zz_generated.deepcopy.go  // autogenerated DeepCopy methods implementing the runtime.Object interface, which marks all root types as Kinds
│
├── config  // Kustomize YAML needed to launch the controller on a cluster: CustomResourceDefinitions, RBAC configuration and WebhookConfigurations
│   ├── certmanager
│   │   ├── certificate.yaml  // a self-signed issuer CR and a certificate CR
│   │   ├── kustomization.yaml
│   │   └── kustomizeconfig.yaml
│   │
│   ├── crd
│   │   ├── bases
│   │   │   └── ray.io_rayclusters.yaml  // RayCluster CRD YAML file
│   │   ├── patches
│   │   │   ├── cainjection_in_rayclusters.yaml  // adds a directive for cert-manager to inject the CA into the CRD
│   │   │   └── webhook_in_rayclusters.yaml      // enables the conversion webhook for the CRD
│   │   ├── kustomization.yaml
│   │   └── kustomizeconfig.yaml
│   │
│   ├── default  // a Kustomize base for launching the controller in a standard configuration
│   │   ├── kustomization.yaml
│   │   ├── manager_auth_proxy_patch.yaml  // injects a sidecar HTTP proxy for the controller manager that performs RBAC authorization against the Kubernetes API using SubjectAccessReviews
│   │   ├── manager_webhook_patch.yaml     // webhook YAML file
│   │   └── webhookcainjection_patch.yaml  // adds the CA-injection annotation to the admission webhook configuration
│   │
│   ├── manager  // launches the controller as pods in the cluster
│   │   ├── kustomization.yaml
│   │   └── manager.yaml  // creates the controller Deployment; users should focus here
│   │
│   ├── prometheus
│   │   ├── kustomization.yaml
│   │   └── monitor.yaml  // Prometheus Monitor Service; users should focus here
│   │
│   ├── rbac  // permissions required to run the controller under its own service account
│   │   ├── auth_proxy_role.yaml
│   │   ├── auth_proxy_role_binding.yaml
│   │   ├── auth_proxy_service.yaml
│   │   ├── kustomization.yaml
│   │   ├── leader_election_role.yaml  // permissions for leader election
│   │   ├── leader_election_role_binding.yaml
│   │   └── role_binding.yaml
│   │
│   ├── samples  // sample RayCluster YAML; users should focus here
│   │   ├── ray_v1_raycluster.complete.yaml
│   │   ├── ray_v1_raycluster.heterogeneous.yaml
│   │   └── ray_v1_raycluster.mini.yaml
│   │
│   └── webhook
│       ├── kustomization.yaml
│       ├── kustomizeconfig.yaml
│       ├── manifests.yaml
│       └── service.yaml  // webhook-service
│
├── controller
│   ├── common
│   │   ├── constant.go
│   │   ├── meta.go
│   │   ├── pod.go
│   │   └── service.go
│   └── raycluster_controller.go
│
├── main.go
└── Makefile

RayCluster sample CR

Three sample RayCluster CRs are provided to introduce the Ray Operator:

| Sample | Description |
| --- | --- |
| ray_v1_raycluster.mini.yaml | Two pods: one head and one worker. The minimal configuration needed to start a Ray cluster; intended for local testing. |
| ray_v1_raycluster.heterogeneous.yaml | Three pods: one head and two workers with different specifications. Uses different quotas (CPU/memory) from the mini version; intended for local testing. |
| ray_v1_raycluster.complete.yaml | A complete CR for customized requirements, showing how to set customized properties. Sets more properties than the heterogeneous version; intended for production. |
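The pod counts in the table follow one rule: a single head pod plus the replicas of every worker group. The model below expresses that rule. It is hypothetical and simplified: `ClusterSpec`, `WorkerGroup` and `PodSpec` are not the types in raycluster_types.go, and the CPU/memory values are illustrative only.

```go
package main

import "fmt"

// PodSpec is a hypothetical, simplified resource request; it is not the
// type defined in api/v1alpha1/raycluster_types.go.
type PodSpec struct {
	CPU    string
	Memory string
}

// WorkerGroup is a hypothetical group of identical worker pods.
type WorkerGroup struct {
	Name     string
	Replicas int
	Spec     PodSpec
}

// ClusterSpec is a hypothetical model: one head pod plus any number of
// worker groups, each of which may use a different PodSpec.
type ClusterSpec struct {
	Head    PodSpec
	Workers []WorkerGroup
}

// TotalPods counts the head pod plus every worker replica.
func (c ClusterSpec) TotalPods() int {
	total := 1 // the head pod
	for _, g := range c.Workers {
		total += g.Replicas
	}
	return total
}

func main() {
	// Illustrative values only, not copied from the sample files.
	mini := ClusterSpec{
		Head:    PodSpec{CPU: "1", Memory: "1Gi"},
		Workers: []WorkerGroup{{Name: "workers", Replicas: 1, Spec: PodSpec{CPU: "1", Memory: "1Gi"}}},
	}
	heterogeneous := ClusterSpec{
		Head: PodSpec{CPU: "1", Memory: "1Gi"},
		Workers: []WorkerGroup{
			{Name: "small", Replicas: 1, Spec: PodSpec{CPU: "1", Memory: "1Gi"}},
			{Name: "large", Replicas: 1, Spec: PodSpec{CPU: "2", Memory: "2Gi"}},
		},
	}
	fmt.Println(mini.TotalPods(), heterogeneous.TotalPods()) // 2 3
}
```

Splitting workers into groups with their own specs is what makes a cluster heterogeneous, rather than simply having more replicas.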

RayCluster CRD

Refer to api/v1alpha1/raycluster_types.go for the code details.

For the generated CRD itself, see config/crd/bases/ray.io_rayclusters.yaml.

Software requirement

Note that some of this software depends on other components, so make sure each tool meets the minimum version below.

| Software | Version | Memo |
| --- | --- | --- |
| kustomize | v3.1.0+ | download |
| kubectl | v1.11.3+ | download |
| Kubernetes cluster | access to a Kubernetes v1.11.3+ cluster | Minikube for local testing |
| go | v1.13+ | download |
| docker | 17.03+ | download |
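You can check the minimum versions in the table by comparing version components numerically. The sketch below shows one way to do that. `parseVersion` and `atLeast` are hypothetical helpers. The installed versions are hard-coded examples; a real check would read them from each tool's version output.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion turns strings such as "v3.1.0", "go1.13" or "17.03" into
// numeric components. Suffixes after '-' or '+' are ignored.
func parseVersion(s string) ([]int, error) {
	s = strings.TrimPrefix(strings.TrimPrefix(s, "go"), "v")
	if i := strings.IndexAny(s, "-+"); i >= 0 {
		s = s[:i]
	}
	parts := strings.Split(s, ".")
	nums := make([]int, len(parts))
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return nil, fmt.Errorf("bad version %q: %w", s, err)
		}
		nums[i] = n
	}
	return nums, nil
}

// atLeast reports whether have >= want, treating missing components as 0.
func atLeast(have, want string) (bool, error) {
	h, err := parseVersion(have)
	if err != nil {
		return false, err
	}
	w, err := parseVersion(want)
	if err != nil {
		return false, err
	}
	for i := 0; i < len(h) || i < len(w); i++ {
		var a, b int
		if i < len(h) {
			a = h[i]
		}
		if i < len(w) {
			b = w[i]
		}
		if a != b {
			return a > b, nil
		}
	}
	return true, nil
}

func main() {
	// Minimums come from the table; installed versions are hypothetical examples.
	checks := []struct{ tool, installed, minimum string }{
		{"kustomize", "v3.8.7", "v3.1.0"},
		{"kubectl", "v1.18.2", "v1.11.3"},
		{"go", "go1.15.6", "v1.13"},
		{"docker", "19.03.13", "17.03"},
	}
	for _, c := range checks {
		ok, err := atLeast(c.installed, c.minimum)
		if err != nil {
			fmt.Println(c.tool, "error:", err)
			continue
		}
		fmt.Printf("%s %s >= %s: %v\n", c.tool, c.installed, c.minimum, ok)
	}
}
```

The comparison is numeric rather than a string comparison, so "1.9" correctly ranks below "1.13".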

You also need a kubeconfig file in ~/.kube/config so that you can access the Kubernetes cluster.

Get started

The following steps walk through submitting a RayCluster:

Install CRDs into a cluster

kustomize build config/crd | kubectl apply -f -

Build manager docker image

See the Makefile for more commands and details.

make docker-build

Push the manager Docker image to a Docker registry

See the Makefile for more commands and details.

make docker-push

Deploy the controller to the Kubernetes cluster configured in ~/.kube/config

  • In this version, the controller runs in the ray-operator-system namespace, which may not be acceptable in production.
  • We will add more detailed RBAC files to control the namespace used in production; the controller will run in that namespace so that its permissions are restricted to it.
  • We will also provide a more detailed guide for running the operator in a controlled way.

kustomize build config/default | kubectl apply -f -

Submit RayCluster to Kubernetes

kubectl create -f config/samples/ray_v1_raycluster.mini.yaml -n ray-operator-system

Apply RayCluster to Kubernetes

kubectl apply -f config/samples/ray_v1_raycluster.mini.yaml -n ray-operator-system

Delete RayCluster from Kubernetes

kubectl delete -f config/samples/ray_v1_raycluster.mini.yaml -n ray-operator-system
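The three commands differ in how they treat an existing object. `kubectl create` fails if the RayCluster already exists. `kubectl apply` creates it or updates it in place. `kubectl delete` removes the objects defined in the file. The sketch below scripts the idempotent path. `kubectlArgs` and `run` are hypothetical helpers. The resource name `rayclusters` is an assumption inferred from the CRD file name ray.io_rayclusters.yaml.

```go
package main

import (
	"fmt"
	"os/exec"
)

// kubectlArgs builds `kubectl <verb> -f <file> -n <namespace>` arguments for
// the three lifecycle verbs used above.
func kubectlArgs(verb, file, namespace string) ([]string, error) {
	switch verb {
	case "create", "apply", "delete":
		return []string{verb, "-f", file, "-n", namespace}, nil
	default:
		return nil, fmt.Errorf("unsupported verb %q", verb)
	}
}

// run executes kubectl with args and prints its combined output.
func run(args ...string) {
	out, err := exec.Command("kubectl", args...).CombinedOutput()
	fmt.Print(string(out))
	if err != nil {
		fmt.Println("kubectl", args, "failed:", err)
	}
}

func main() {
	const sample = "config/samples/ray_v1_raycluster.mini.yaml"
	const namespace = "ray-operator-system"

	// apply can be rerun safely; a second `kubectl create` would fail with AlreadyExists.
	args, err := kubectlArgs("apply", sample, namespace)
	if err != nil {
		fmt.Println(err)
		return
	}
	run(args...)

	// Assumption: the CRD's plural resource name is "rayclusters".
	run("get", "rayclusters", "-n", namespace)
}
```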

Build with Bazel

bazel run //:gazelle
bazel build //:ray-operator