This repository contains my OKD deployment scripts. It is designed to deploy an OKD cluster with three control plane nodes and nine workers across three identical hypervisors. A Ceph cluster will be deployed as part of installation, orchestrated and managed by Rook. The Ceph web dashboard will be deployed and available as part of the Ceph deployment. In addition, MetalLB will be deployed into the cluster to provide LoadBalancer
type services. The internal container registry will be configured to use storage from the Ceph cluster.
This configuration is what I have deployed as my homelab and, with the aid of these scripts, can be cleanly torn down and re-deployed with minimum hassle. In addition, the cluster has full high availability and can be upgraded in-place, which means that clusters can be arbitrarily long-lived.
I'll try to come back and update this repo whenever I re-deploy a cluster, though at this point OKD setup is super stable so there shouldn't be much flux in this repository.
This documentation assumes that you are familiar with the OKD deployment process. Specifically, these scripts deploy OKD onto bare metal user-provisioned infrastructure. This is more commonly known as a baremetal UPI deployment in OKD parlance, and "VMs we make ourselves" by the rest of humanity.
If you are not familiar with OKD baremetal UPI deployments, please read and understand the documentation linked below.
This repo makes heavy use of Terraform to provision the VMs that comprise the cluster. The defaults provided here are almost certainly not going to be applicable to your individual environment, and so you will have to edit some Terraform HCL in order to successfully deploy a cluster. Therefore, familiarity with Terraform and what it does, and HCL as a configuration language, is assumed. If you haven't used Terraform very much before, here are some links to relevant documentation.
This repo makes use of the Libvirt provider for Terraform to orchestrate the deployment. This provider must be installed manually. Please follow the installation instructions provided by the README in the linked repository to install the provider prior to deployment.
OKD requires the administrator to set up some moderately sophisticated networking configuration for successful operation. Specifically, it requires DHCP reservations and various types of DNS records (SRV, PTR, TXT, and A records) to be set up prior to cluster deployment. Anything more capable than your ISP's router should provide the ability to configure DHCP reservations and DNS records, but if you don't have the ability to create these, you will have to figure this out prior to deployment.
The exact networking configuration required is documented as part of the bare metal installation documentation. More specific requirements for this setup in particular will be covered later in this guide.
OKD requires an external load balancer to handle ingress and API load balancing amongst nodes. While the documentation specifies two separate load balancers, you can combine both into one configuration and deploy a single load balancer to handle both workloads. Load balancer deployment is out of scope for this repo, as there are many ways to tackle this bit of infrastructure. Personally, I've had excellent success with HAProxy.
It is worh repeating: THIS REPOSITORY DOES NOT SET UP AN EXTERNAL LOAD BALANCER. IT MUST BE PRESENT AND CONFIGURED PRIOR TO DEPLOYMENT.
This repo is designed to deploy onto three identical hypervisors that are set up just so. These are the requirements:
- At least 12 cores of CPU (I use Ryzen 5 3600s)
- At least 64 GiB of RAM (the more the merrier, really)
- Libvirt and QEMU installed
- A Libvirt LVM storage pool with at least 500 GiB of SSD-backed storage named
ssd_pool
- At least 3 block devices 4 TiB or larger. These will be used as storage for container workloads by the cluster.
You will also need to ensure that you can log into the hypervisors from the machine you intend to run the deployment from with passwordless SSH.
Once the networking, load balancer, and hardware are all set up, there is some final one-time configuration setup that must be done prior to deployment. Once you have performed this one-time configuration, you can commit it to source control or otherwise save it, as it will not need updating unless you change your underlying infrastructure.
- Copy
install-config.yaml.example
toinstall-config.yaml
- Update
baseDomain
to be your cluster's intended subdomain. - update
sshKey
to be an SSH public key. The cluster will add this public key to each node'sauthorized_hosts
, and is the only way to log into the nodes for debugging purposes. - You should take a look at the rest of the configuration and update it to match your environment if necessary. Documentation on
install-config.yaml
is available here.
- Update
- Copy
storage/rook-dashboard.yaml.example
tostorage/rook-dashboard.yaml
.- Update
spec.host
with your cluster's subdomain where indicated.
- Update
- Update the
host
default value interraform/bootstrap/variables.tf
to point to one of your hypervisors. It does not matter which one. - Update the
host
default value interraform/hv1/variables.tf
to point to your first hypervisor. - Update the
host
default value interraform/hv2/variables.tf
to point to your second hypervisor. - Update the
host
default value interraform/hv3/variables.tf
to point to your third hypervisor. - Update the
data_disks
default array interraform/hv1/variables.tf
to contain the three block devices you wish to use for cluster storage on your first hypervisor. - Update the
data_disks
default array interraform/hv2/variables.tf
to contain the three block devices you wish to use for cluster storage on your second hypervisor. - Update the
data_disks
default array interraform/hv3/variables.tf
to contain the three block devices you wish to use for cluster storage on your third hypervisor. - Edit
deploy.sh
and update theCLUSTER_SUBDOMAIN
,HYPERVISOR_1
,HYPERVISOR_2
, andHYPERVISOR_3
variables to be the cluster subdomain and hostnames of your three hypervisors, respectively.
The Terraform root modules for each hypervisor, as well as the module for the bootstrap, come with hard-coded MAC addresses for each VM it will host. These MAC addresses must have corresponding DHCP reservations, A records, and PTR records, otherwise the cluster will fail to deploy. You may use the MAC addresses provided or change them to fit your environment. The only hard requirement is that they must match.
The deployment expects the following hostnames and hypervisor placements. Note that all DNS records should include the FQDN of the cluster's subdomain. As an example, if the configured cluster subdomain is cluster.okd.example.com
, then the bootstrap VM's full hostname for the PTR and A records should be bootstrap.cluster.okd.example.com
.
Bootstrap: bootstrap
Masters:
master0
on first hypervisormaster1
on second hypervisormaster2
on third hypervisor
Workers:
worker0
,worker1
, andworker2
on first hypervisorworker3
,worker4
, andworker5
on second hypervisorworker6
,worker7
, andworker8
on third hypervisor
Once you have all of the above set up, it is finally time to deploy the cluster. To deploy the latest stable OKD release with the latest stable Rook and Fedora CoreOS, simply run:
./deploy.sh
If you wish to specify an OKD release, that is done via the OPENSHIFT_INSTALL_RELEASE
environment variable:
OPENSHIFT_INSTALL_RELEASE="4.7.0-0.okd-2021-03-21-094146" ./deploy.sh
If you wish to specify a Rook release, that is done via the ROOK_TAG
environment variable:
ROOK_TAG="v1.5.9" ./deploy.sh
IF you wish to specify a specific Fedora CoreOS version to deploy as the initial pivot, that is done via the COREOS_VERSION
environment variable:
COREOS_VERSION="33.20210301.3.1" ./deploy.sh
Of course, any combination of these three environment variables may be set and the script will respect them. Any of the three that are unset will default to the latest stable release of that component.
To completely tear down the cluster, run the following from the same directory you initially ran ./deploy.sh
from:
./undeploy.sh
If you have to undeploy from a different copy of this repository than the one you initially deployed from, you must create the .coreos_version
stamp file with the CoreOS version used when installing:
echo "33.20210301.3.1" > .coreos_version
./undeploy.sh