This document contains tips, workflows, and more for developing within this repository.
Starting with Fleet v0.3.7 and Rancher v2.6.1, scenarios where Fleet is managing Fleet (i.e. Rancher managing Rancher) will result in two `fleet-agent` deployments running on every managed Fleet cluster. The agents communicate with two different `fleet-controller` deployments.
Local Fleet Cluster Managed Fleet Cluster Downstream Cluster
┌───────────────────────────────┐ ┌────────────────────────────────────┐ ┌────────────────────────────────────┐
│ │ │ │ │ │
│ ┌────cattle-fleet-system────┐ │ │ ┌──────cattle-fleet-system───────┐ │ │ │
│ │ │ │ │ │ │ │ │ │
│ │ ┌─────────────────────┐ │ │ │ │ ┌──────────────────────────┐ │ │ │ │
│ │ │ fleet-controller ◄──┼─┼────┼─┼──► fleet-agent (downstream) │ │ │ │ │
│ │ └─────────────────────┘ │ │ │ │ └──────────────────────────┘ │ │ │ │
│ │ │ │ │ │ │ │ │ │
│ └───────────────────────────┘ │ │ │ │ │ │ │
│ │ │ │ │ │ │ │
│ ┌─cattle-fleet-local-system─┐ │ │ │ │ │ │ ┌──────cattle-fleet-system───────┐ │
│ │ │ │ │ │ │ │ │ │ │ │
│ │ ┌─────────────────────┐ │ │ │ │ ┌──────────────────────────┐ │ │ │ │ ┌──────────────────────────┐ │ │
│ │ │ fleet-agent (local) │ │ │ │ │ │ fleet-controller ◄──┼─┼────┼─┼──► fleet-agent (downstream) │ │ │
│ │ └─────────────────────┘ │ │ │ │ └──────────────────────────┘ │ │ │ │ └──────────────────────────┘ │ │
│ │ │ │ │ │ │ │ │ │ │ │
│ └───────────────────────────┘ │ │ └────────────────────────────────┘ │ │ └────────────────────────────────┘ │
│ │ │ │ │ │
└───────────────────────────────┘ │ ┌───cattle-fleet-local-system────┐ │ └────────────────────────────────────┘
│ │ │ │
│ │ ┌──────────────────────────┐ │ │
│ │ │ fleet-agent (local) │ │ │
│ │ └──────────────────────────┘ │ │
│ │ │ │
│ └────────────────────────────────┘ │
│ │
└────────────────────────────────────┘
Fleet has been a required component of Rancher since Rancher v2.6. Fleet clusters are tied directly to native Rancher object types as follows:
┌───────────────────────────────────┐ == ┌────────────────────────────────────┐ == ┌──────────────────────────────────┐
│ clusters.fleet.cattle.io/v1alpha1 ├──────┤ clusters.provisioning.cattle.io/v1 ├──────┤ clusters.management.cattle.io/v3 │
└────────────────┬──────────────────┘ └───────────────────┬────────────────┘ └──────────────────────────────────┘
│ │
└──────────────────────┬──────────────────────┘
│
┌──────────────────▼──────────────────────┐ == ┌───────────────┐
│ fleetworkspaces.management.cattle.io/v3 ├──────┤ namespaces/v1 │
└─────────────────────────────────────────┘ └───────────────┘
All steps in this guide assume your current working directory is the root of the repository. Moreover, this guide was written for Unix-like developer environments, so you may need to modify some steps if you are using a non-Unix-like developer environment (e.g. Windows).
We need to use a registry to store `fleet-agent` developer builds. Using a personal DockerHub repository is usually a suitable choice. The full repository name must be `<your-choice>/fleet-agent`.
Now, we need to export an environment variable with your repository name as the value. This will be used when building, pushing, and deploying your agent.
Note: the value for this variable should not include `/fleet-agent`. For example, if your full DockerHub repository name is `foobar/fleet-agent`, the value used below should be `foobar`.
Export the new `AGENT_REPO` variable and use the aforementioned value.
export AGENT_REPO=<your-OCI-image-repository>
We need a local cluster to work with. For this guide, we will use k3d.
k3d cluster create <NAME>
If you have changed Go code, you may need to run code generation.
go generate
First, we need to run Rancher locally. You can use the Rancher Wiki for information on how to do so.
If you are unsure about which method you would like to use for tunneling to localhost, we recommend ngrok or tunnelware.
Now, let's build and push your `fleet-agent` (a `linux-amd64` image by default), if applicable.
(
go fmt ./...
REPO=$AGENT_REPO make agent-dev
docker push $AGENT_REPO/fleet-agent:dev
)
In the Rancher Dashboard, navigate to the `fleet-controller` ConfigMap. This is likely located in a `cattle-fleet-*` or `fleet-*` namespace.
Replace the existing agent-related fields with the following information:

- your agent image and tag
- the image pull policy set to `Always` (for iterative development)
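For example, a minimal sketch of inspecting and editing the ConfigMap, assuming the `cattle-fleet-system` namespace and that your Fleet version exposes `agentImage` and `agentImagePullPolicy` keys in the config data (key names may differ across versions):

kubectl -n cattle-fleet-system get configmap fleet-controller -o yaml
kubectl -n cattle-fleet-system edit configmap fleet-controller
# In the editor, set the agent-related fields to something resembling:
#   agentImage: "<AGENT_REPO>/fleet-agent:dev"
#   agentImagePullPolicy: "Always"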
Once the ConfigMap has been updated, edit the `fleet-controller` Deployment and scale down its `replicas` to `0`.
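For example, assuming the Deployment lives in the `cattle-fleet-system` namespace:

kubectl -n cattle-fleet-system scale deployment fleet-controller --replicas=0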
With that change, we can now run the controller locally.
(
go fmt ./...
go run cmd/fleetcontroller/main.go
)
Optional: you can test Rancher's `FleetWorkspaces` feature by moving Fleet clusters to another workspace in the "Continuous Delivery" section of the Rancher UI. You can create your own workspace using the API or the UI. Ensure that clusters are in an "Active" state after migration.
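For example, a hedged sketch of creating a workspace through the API (the `FleetWorkspace` schema may vary across Rancher versions, and `my-custom-workspace` is a placeholder name):

kubectl apply -f - <<EOF
apiVersion: management.cattle.io/v3
kind: FleetWorkspace
metadata:
  name: my-custom-workspace
EOF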
The controller should be running in your terminal window/pane! You can now create GitRepo custom resource objects and test Fleet locally.
- Update and tag rancher/build-tekton
- Update the `rancher/tekton-utils` tag in the `GitJob` helm chart in rancher/gitjob
- Update and tag rancher/gitjob
- Copy the `GitJob` helm chart to `./charts/fleet/charts/gitjob` in rancher/fleet
- Generate the `GitJob` CRD into a file in rancher/fleet (see the sketch after this list): `go run $GITJOB_REPO/pkg/crdgen/main.go > $FLEET_REPO/charts/fleet-crd/templates/gitjobs-crds.yaml`
- Update and tag rancher/fleet (usually as a release candidate) to use those components in a released version of Fleet
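For example, a sketch of the CRD generation step, where `$GITJOB_REPO` and `$FLEET_REPO` are hypothetical paths to your local clones:

# Adjust these paths to wherever your clones live:
export GITJOB_REPO=$HOME/src/rancher/gitjob
export FLEET_REPO=$HOME/src/rancher/fleet
go run $GITJOB_REPO/pkg/crdgen/main.go > $FLEET_REPO/charts/fleet-crd/templates/gitjobs-crds.yaml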
Fleet agents run natively on both Windows and Linux downstream Kubernetes nodes. While Fleet is currently unsupported for local Windows clusters, we need to test its usage on downstream Windows clusters.
Once your Windows agent is built (following similar directions from Linux agent development above), start Rancher on a Kubernetes cluster and perform the following steps:
- Create a downstream RKE1 or RKE2 Windows cluster
- Wait for `fleet-agent` to be deployed downstream (native on Linux and Windows, but it will likely default to Linux)
- Change the `fleet-agent` Deployment image name and tag to your custom agent image name and tag (see the sketch after this list)
- Change the `fleet-agent` Deployment `imagePullPolicy` to `Always`
- Delete the existing `fleet-agent` pod and wait for the new one to reach a `Running` state (note: ensure there aren't any non-transient error logs)
- Create the multi-cluster/windows-helm GitRepo CR in the local cluster
- Observe "Active" or "Running" or "Completed" states for `gitrepos.fleet.cattle.io` and other resources (e.g. pods deployed from the chart)
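For example, a sketch of the Deployment changes, assuming the `cattle-fleet-system` namespace, a container named `fleet-agent`, and an `app=fleet-agent` pod label (all assumptions that may differ in your cluster):

# Point the Deployment at your custom agent image and tag:
kubectl -n cattle-fleet-system set image deployment/fleet-agent fleet-agent=$AGENT_REPO/fleet-agent:dev
# Force pulls on every restart for iterative development:
kubectl -n cattle-fleet-system patch deployment fleet-agent --type=json \
  -p='[{"op":"replace","path":"/spec/template/spec/containers/0/imagePullPolicy","value":"Always"}]'
# Delete the old pod so a new one is created with the updated spec:
kubectl -n cattle-fleet-system delete pod -l app=fleet-agent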
Testing Windows images locally can be done on a Windows host. First, enter `powershell` and install Git. You can use Chocolatey to install the package.
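For example, with Chocolatey already installed:

choco install git -y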
Next, clone the repository, change into it, and check out your branch.
& 'C:\Program Files\Git\bin\git.exe' clone https://github.com/<user>/<fleet-fork>.git
cd <fleet-fork>
& 'C:\Program Files\Git\bin\git.exe' checkout --track origin/<dev-branch>
Finally, you can build the image with an uploaded binary from a tagged release. This is useful when testing the `fleet-agent` on a Windows cluster.
docker build -t test -f package\Dockerfile-windows.agent --build-arg SERVERCORE_VERSION=<windows-version> --build-arg RELEASES=releases.rancher.com --build-arg VERSION=<fleet-tag> .
If you need to test Fleet during Rancher startup, you may want to use a custom fork. The following steps can help you do so:
- Fork rancher/fleet and commit your changes to a branch of your choice
- Change image names and tags for the Helm chart(s) for your custom development image(s) accordingly
- Publish images corresponding to the names and tags changed in the Helm chart(s)
- Tag your fork with a SemVer-compliant tag that's "greater" than the Fleet chart tag in your chosen version of Rancher (note: the exact tag name is not that important, but we want it to be "greater" just in case there's a collision for the "latest" version)
- Fork rancher/charts and update branch `dev-v2.x` with your changes to the `fleet`, `fleet-crd`, and `fleet-agent` packages
- You'll need to change the chart URL to your charts' `tgz` location (this may need to be self-hosted)
- Finally, commit those changes, execute `make charts`, and commit the generated changes in a second commit
- Fork rancher/rancher and change the charts URL to point to your fork
- Start Rancher locally (instructions: Rancher Wiki) and your fork's chart should be deployed
Standalone Fleet is not officially tested by Rancher QA, but we should not break standalone Fleet either. If you would like to test standalone Fleet, you can do the following: build and push your `fleet-agent` (a `linux-amd64` image by default), install your Fleet charts, and then replace the controller deployment with your local controller build.
(
go fmt ./...
REPO=$AGENT_REPO make agent-dev
docker push $AGENT_REPO/fleet-agent:dev
for i in cattle-fleet-system fleet-default fleet-local; do kubectl create namespace $i; done
helm install -n cattle-fleet-system fleet-crd ./charts/fleet-crd
helm install -n cattle-fleet-system fleet --set agentImage.repository=$AGENT_REPO/fleet-agent --set agentImage.imagePullPolicy=Always ./charts/fleet
kubectl delete deployment -n cattle-fleet-system fleet-controller
go run cmd/fleetcontroller/main.go
)
Alternatively, if the agent's code is unchanged, you can use the latest agent instead. We'll use the latest Git tag for this, and assume it is available on DockerHub.
(
go fmt ./...
for i in cattle-fleet-system fleet-default fleet-local; do kubectl create namespace $i; done
helm install -n cattle-fleet-system fleet-crd ./charts/fleet-crd
helm install -n cattle-fleet-system fleet --set agentImage.tag=$(git tag --sort=taggerdate | tail -1) ./charts/fleet
kubectl delete deployment -n cattle-fleet-system fleet-controller
go run cmd/fleetcontroller/main.go
)
Fleet differs from Rancher in one major design philosophy: nearly all "business logic" happens in the local cluster rather than in downstream clusters via agents.
The good news here is that the `fleet-controller` will tell us nearly all that we need to know via pod logs, network traffic, and resource usage. That being said, downstream `fleet-agent` deployments can perform Kubernetes API requests back to the local cluster, which means that we have to monitor traffic inbound to the local cluster from our agents as well as the outbound traffic we'd come to expect from the local `fleet-controller`.
While network traffic is a major point of consideration, we also have to consider whether our performance issues are compute-based, memory-based, or network-based. For example: you may encounter a pod with high compute usage, but that could be caused by heightened network traffic received from the truly malfunctioning pod.
Since the Fleet components' controller framework of choice is Wrangler, we can share caches and avoid unnecessary API requests.
Moreover, we can customize enqueue logic to decrease load on the cluster and its components.
For example: if a `BundleDeployment` encounters a failure and meets certain criteria such that it'll never become active, we should move the object to a permanent error state that requires manual triage. While reconciling state and automatically attempting to reach the desired state is, well, desired, we should find opportunities to eliminate loops, scheduling logic, and frequent re-enqueuing so that we decrease CPU and network load. Solving the example scenario may even result in manual triage for the `BundleDeployment`, which could be a good trade-off for the user!
To examine Fleet network load, we can use Istio pod injection to monitor network traffic and observe it with Kiali. If Istio is installed via the Rancher UI, you can perform pod injection with a checkbox per pod. To learn more, please refer to the Istio documentation for Rancher.
This section contains information on releasing Fleet. Please note: it may be sparse since it is only intended for maintainers.
- Ensure that all modules are at their desired versions in `go.mod`
- Ensure that all nested and external images are at their desired versions (check `charts/` as well; you can run the following ripgrep command at the root of the repository to see all images used: `rg "repository:" -A 1 | rg "tag:" -B 1`)
- Run `go mod tidy` and `go generate` and ensure that `git status` reports a clean working tree
- Determine the tag for the next intended release (must be valid SemVer prepended with `v`)
- Checkout the release branch (e.g., `release-0.4`) or create it based off of the latest `master` branch. The branch name should be the first 2 parts of the semantic version with `release-` prepended.
- Use `git tag` and append the tag from the Pre-Release section with `-rcX`, where `X` is an unsigned integer that starts with `1` (if `-rcX` already exists, increment `X` by one). See the sketch after this list.
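For example, a sketch of cutting the first release candidate for a hypothetical v0.4.0 release:

# Create the release branch from the latest master if it does not exist yet:
git checkout -b release-0.4 origin/master
# Tag the first release candidate and push both branch and tag:
git tag v0.4.0-rc1
git push origin release-0.4 v0.4.0-rc1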
- Open a draft release on the GitHub releases page
- Send draft link to maintainers with view permissions to ensure that contents are valid
- Create the GitHub release and create a new tag on the appropriate release branch while doing so (using the tag from the Pre-Release section)
- Pull Fleet images from DockerHub to ensure manifests work as expected (see the sketch after this list)
- Open a PR in rancher/charts that ensures every Fleet-related chart is using the new RC (the branches and number of PRs are dependent on Rancher)
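For example, a sketch of pulling the released images, using the hypothetical v0.4.0-rc1 tag:

docker pull rancher/fleet:v0.4.0-rc1
docker pull rancher/fleet-agent:v0.4.0-rc1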
With releases happening on release branches, there are times when a bug fix needs to be handled on the `master` branch and pulled into a release that happens through a release branch. All bug fixes should first happen on the `master` branch. If a bug fix needs to be brought into a release, such as during the release candidate phase, it should be cherry-picked from the `master` branch to the release branch via a pull request. The pull request should be prefixed with the major and minor version for the release (e.g., `[0.4]`) to illustrate that it's for a release branch.
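For example, a minimal sketch, assuming a hypothetical fix commit `abc1234` on `master` destined for the `release-0.4` branch:

git checkout release-0.4
git checkout -b backport-fix-0.4
git cherry-pick abc1234
git push origin backport-fix-0.4
# Then open a PR against release-0.4 titled e.g. "[0.4] Fix ..."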
- ASCII charts created with asciiflow