From 45873bc24115032d745d4747b0d8509e613b4caa Mon Sep 17 00:00:00 2001 From: Sascha Grunert Date: Tue, 13 Feb 2024 11:52:11 +0100 Subject: [PATCH] Add blog post about CRI-O's seccomp OCI artifact feature This blog post wraps the efforts around a future CRI-O feature implemented in PR: https://github.com/cri-o/cri-o/pull/7719 Signed-off-by: Sascha Grunert Co-authored-by: Tim Bannister Signed-off-by: Sascha Grunert --- .../2024-03-07-cri-o-seccomp-oci-artifacts.md | 476 ++++++++++++++++++ 1 file changed, 476 insertions(+) create mode 100644 content/en/blog/_posts/2024-03-07-cri-o-seccomp-oci-artifacts.md diff --git a/content/en/blog/_posts/2024-03-07-cri-o-seccomp-oci-artifacts.md b/content/en/blog/_posts/2024-03-07-cri-o-seccomp-oci-artifacts.md new file mode 100644 index 0000000000000..c6e3dea6ec8e4 --- /dev/null +++ b/content/en/blog/_posts/2024-03-07-cri-o-seccomp-oci-artifacts.md @@ -0,0 +1,476 @@ +--- +layout: blog +title: "CRI-O: Applying seccomp profiles from OCI registries" +date: 2024-03-07 +slug: cri-o-seccomp-oci-artifacts +--- + +**Author:** Sascha Grunert + +Seccomp stands for secure computing mode and has been a feature of the Linux +kernel since version 2.6.12. It can be used to sandbox the privileges of a +process, restricting the calls it is able to make from userspace into the +kernel. Kubernetes lets you automatically apply seccomp profiles loaded onto a +node to your Pods and containers. + +But distributing those seccomp profiles is a major challenge in Kubernetes, +because the JSON files have to be available on all nodes where a workload can +possibly run. Projects like the [Security Profiles +Operator](https://sigs.k8s.io/security-profiles-operator) solve that problem by +running as a daemon within the cluster, which makes me wonder which part of that +distribution could be done by the [container +runtime](/docs/setup/production-environment/container-runtimes). + +Runtimes usually apply the profiles from a local path, for example: + +```yaml +apiVersion: v1 +kind: Pod +metadata: + name: pod +spec: + containers: + - name: container + image: nginx:1.25.3 + securityContext: + seccompProfile: + type: Localhost + localhostProfile: nginx-1.25.3.json +``` + +The profile `nginx-1.25.3.json` has to be available in the root directory of the +kubelet, appended by the `seccomp` directory. This means the default location +for the profile on-disk would be `/var/lib/kubelet/seccomp/nginx-1.25.3.json`. +If the profile is not available, then runtimes will fail on container creation +like this: + +```shell +kubectl get pods +``` + +```console +NAME READY STATUS RESTARTS AGE +pod 0/1 CreateContainerError 0 38s +``` + +```shell +kubectl describe pod/pod | tail +``` + +```console +Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s + node.kubernetes.io/unreachable:NoExecute op=Exists for 300s +Events: + Type Reason Age From Message + ---- ------ ---- ---- ------- + Normal Scheduled 117s default-scheduler Successfully assigned default/pod to 127.0.0.1 + Normal Pulling 117s kubelet Pulling image "nginx:1.25.3" + Normal Pulled 111s kubelet Successfully pulled image "nginx:1.25.3" in 5.948s (5.948s including waiting) + Warning Failed 7s (x10 over 111s) kubelet Error: setup seccomp: unable to load local profile "/var/lib/kubelet/seccomp/nginx-1.25.3.json": open /var/lib/kubelet/seccomp/nginx-1.25.3.json: no such file or directory + Normal Pulled 7s (x9 over 111s) kubelet Container image "nginx:1.25.3" already present on machine +``` + +The major obstacle of having to manually distribute the `Localhost` profiles +will lead many end-users to fall back to `RuntimeDefault` or even running their +workloads as `Unconfined` (with disabled seccomp). + +## CRI-O to the rescue + +The Kubernetes container runtime [CRI-O](https://github.com/cri-o/cri-o) +provides various features using custom annotations. The v1.30 release +[adds](https://github.com/cri-o/cri-o/pull/7719) support for a new set of +annotations called `seccomp-profile.kubernetes.cri-o.io/POD` and +`seccomp-profile.kubernetes.cri-o.io/`. Those annotations allow you +to specify: + +- a seccomp profile for a specific container, when used as: + `seccomp-profile.kubernetes.cri-o.io/` (example: + `seccomp-profile.kubernetes.cri-o.io/webserver: +'registry.example/example/webserver:v1'`) +- a seccomp profile for every container within a pod, when used without the + container name suffix but the reserved name `POD`: + `seccomp-profile.kubernetes.cri-o.io/POD` +- a seccomp profile for a whole container image, if the image itself contains + the annotation `seccomp-profile.kubernetes.cri-o.io/POD` or + `seccomp-profile.kubernetes.cri-o.io/`. + +CRI-O will only respect the annotation if the runtime is configured to allow it, +as well as for workloads running as `Unconfined`. All other workloads will still +use the value from the `securityContext` with a higher priority. + +The annotations alone will not help much with the distribution of the profiles, +but the way they can be referenced will! For example, you can now specify +seccomp profiles like regular container images by using OCI artifacts: + +```yaml +apiVersion: v1 +kind: Pod +metadata: + name: pod + annotations: + seccomp-profile.kubernetes.cri-o.io/POD: quay.io/crio/seccomp:v2 +spec: … +``` + +The image `quay.io/crio/seccomp:v2` contains a `seccomp.json` file, which +contains the actual profile content. Tools like [ORAS](https://oras.land) or +[Skopeo](https://github.com/containers/skopeo) can be used to inspect the +contents of the image: + +```shell +oras pull quay.io/crio/seccomp:v2 +``` + +```console +Downloading 92d8ebfa89aa seccomp.json +Downloaded 92d8ebfa89aa seccomp.json +Pulled [registry] quay.io/crio/seccomp:v2 +Digest: sha256:f0205dac8a24394d9ddf4e48c7ac201ca7dcfea4c554f7ca27777a7f8c43ec1b +``` + +```shell +jq . seccomp.json | head +``` + +```yaml +{ + "defaultAction": "SCMP_ACT_ERRNO", + "defaultErrnoRet": 38, + "defaultErrno": "ENOSYS", + "archMap": [ + { + "architecture": "SCMP_ARCH_X86_64", + "subArchitectures": [ + "SCMP_ARCH_X86", + "SCMP_ARCH_X32" +``` + +```shell +# Inspect the plain manifest of the image +skopeo inspect --raw docker://quay.io/crio/seccomp:v2 | jq . +``` + +```yaml +{ + "schemaVersion": 2, + "mediaType": "application/vnd.oci.image.manifest.v1+json", + "config": + { + "mediaType": "application/vnd.cncf.seccomp-profile.config.v1+json", + "digest": "sha256:ca3d163bab055381827226140568f3bef7eaac187cebd76878e0b63e9e442356", + "size": 3, + }, + "layers": + [ + { + "mediaType": "application/vnd.oci.image.layer.v1.tar", + "digest": "sha256:92d8ebfa89aa6dd752c6443c27e412df1b568d62b4af129494d7364802b2d476", + "size": 18853, + "annotations": { "org.opencontainers.image.title": "seccomp.json" }, + }, + ], + "annotations": { "org.opencontainers.image.created": "2024-02-26T09:03:30Z" }, +} +``` + +The image manifest contains a reference to a specific required config media type +(`application/vnd.cncf.seccomp-profile.config.v1+json`) and a single layer +(`application/vnd.oci.image.layer.v1.tar`) pointing to the `seccomp.json` file. +But now, let's give that new feature a try! + +### Using the annotation for a specific container or whole pod + +CRI-O needs to be configured adequately before it can utilize the annotation. To +do this, add the annotation to the `allowed_annotations` array for the runtime. +This can be done by using a drop-in configuration +`/etc/crio/crio.conf.d/10-crun.conf` like this: + +```toml +[crio.runtime] +default_runtime = "crun" + +[crio.runtime.runtimes.crun] +allowed_annotations = [ + "seccomp-profile.kubernetes.cri-o.io", +] +``` + +Now, let's run CRI-O from the latest `main` commit. This can be done by either +building it from source, using the [static binary bundles](https://github.com/cri-o/packaging?tab=readme-ov-file#using-the-static-binary-bundles-directly) +or [the prerelease packages](https://github.com/cri-o/packaging?tab=readme-ov-file#usage). + +To demonstrate this, I ran the `crio` binary from my command line using a single +node Kubernetes cluster via [`local-up-cluster.sh`](https://github.com/cri-o/cri-o?tab=readme-ov-file#running-kubernetes-with-cri-o). +Now that the cluster is up and running, let's try a pod without the annotation +running as seccomp `Unconfined`: + +```shell +cat pod.yaml +``` + +```yaml +apiVersion: v1 +kind: Pod +metadata: + name: pod +spec: + containers: + - name: container + image: nginx:1.25.3 + securityContext: + seccompProfile: + type: Unconfined +``` + +```shell +kubectl apply -f pod.yaml +``` + +The workload is up and running: + +```shell +kubectl get pods +``` + +```console +NAME READY STATUS RESTARTS AGE +pod 1/1 Running 0 15s +``` + +And no seccomp profile got applied if I inspect the container using +[`crictl`](https://sigs.k8s.io/cri-tools): + +```shell +export CONTAINER_ID=$(sudo crictl ps --name container -q) +sudo crictl inspect $CONTAINER_ID | jq .info.runtimeSpec.linux.seccomp +``` + +```console +null +``` + +Now, let's modify the pod to apply the profile `quay.io/crio/seccomp:v2` to the +container: + +```yaml +apiVersion: v1 +kind: Pod +metadata: + name: pod + annotations: + seccomp-profile.kubernetes.cri-o.io/container: quay.io/crio/seccomp:v2 +spec: + containers: + - name: container + image: nginx:1.25.3 +``` + +I have to delete and recreate the Pod, because only recreation will apply a new +seccomp profile: + +```shell +kubectl delete pod/pod +``` + +```console +pod "pod" deleted +``` + +```shell +kubectl apply -f pod.yaml +``` + +```console +pod/pod created +``` + +The CRI-O logs will now indicate that the runtime pulled the artifact: + +```console +WARN[…] Allowed annotations are specified for workload [seccomp-profile.kubernetes.cri-o.io] +INFO[…] Found container specific seccomp profile annotation: seccomp-profile.kubernetes.cri-o.io/container=quay.io/crio/seccomp:v2 id=26ddcbe6-6efe-414a-88fd-b1ca91979e93 name=/runtime.v1.RuntimeService/CreateContainer +INFO[…] Pulling OCI artifact from ref: quay.io/crio/seccomp:v2 id=26ddcbe6-6efe-414a-88fd-b1ca91979e93 name=/runtime.v1.RuntimeService/CreateContainer +INFO[…] Retrieved OCI artifact seccomp profile of len: 18853 id=26ddcbe6-6efe-414a-88fd-b1ca91979e93 name=/runtime.v1.RuntimeService/CreateContainer +``` + +And the container is finally using the profile: + +```shell +export CONTAINER_ID=$(sudo crictl ps --name container -q) +sudo crictl inspect $CONTAINER_ID | jq .info.runtimeSpec.linux.seccomp | head +``` + +```yaml +{ + "defaultAction": "SCMP_ACT_ERRNO", + "defaultErrnoRet": 38, + "architectures": [ + "SCMP_ARCH_X86_64", + "SCMP_ARCH_X86", + "SCMP_ARCH_X32" + ], + "syscalls": [ + { +``` + +The same would work for every container in the pod, if users replace the +`/container` suffix with the reserved name `/POD`, for example: + +```yaml +apiVersion: v1 +kind: Pod +metadata: + name: pod + annotations: + seccomp-profile.kubernetes.cri-o.io/POD: quay.io/crio/seccomp:v2 +spec: + containers: + - name: container + image: nginx:1.25.3 +``` + +### Using the annotation for a container image + +While specifying seccomp profiles as OCI artifacts on certain workloads is a +cool feature, the majority of end users would like to link seccomp profiles to +published container images. This can be done by using a container image +annotation; instead of being applied to a Kubernetes Pod, the annotation is some +metadata applied at the container image itself. For example, +[Podman](https://podman.io) can be used to add the image annotation directly +during image build: + +```shell +podman build \ + --annotation seccomp-profile.kubernetes.cri-o.io=quay.io/crio/seccomp:v2 \ + -t quay.io/crio/nginx-seccomp:v2 . +``` + +The pushed image then contains the annotation: + +```shell +skopeo inspect --raw docker://quay.io/crio/nginx-seccomp:v2 | + jq '.annotations."seccomp-profile.kubernetes.cri-o.io"' +``` + +```console +"quay.io/crio/seccomp:v2" +``` + +If I now use that image in an CRI-O test pod definition: + +```yaml +apiVersion: v1 +kind: Pod +metadata: + name: pod + # no Pod annotations set +spec: + containers: + - name: container + image: quay.io/crio/nginx-seccomp:v2 +``` + +Then the CRI-O logs will indicate that the image annotation got evaluated and +the profile got applied: + +```shell +kubectl delete pod/pod +``` + +```console +pod "pod" deleted +``` + +```shell +kubectl apply -f pod.yaml +``` + +```console +pod/pod created +``` + +```console +INFO[…] Found image specific seccomp profile annotation: seccomp-profile.kubernetes.cri-o.io=quay.io/crio/seccomp:v2 id=c1f22c59-e30e-4046-931d-a0c0fdc2c8b7 name=/runtime.v1.RuntimeService/CreateContainer +INFO[…] Pulling OCI artifact from ref: quay.io/crio/seccomp:v2 id=c1f22c59-e30e-4046-931d-a0c0fdc2c8b7 name=/runtime.v1.RuntimeService/CreateContainer +INFO[…] Retrieved OCI artifact seccomp profile of len: 18853 id=c1f22c59-e30e-4046-931d-a0c0fdc2c8b7 name=/runtime.v1.RuntimeService/CreateContainer +INFO[…] Created container 116a316cd9a11fe861dd04c43b94f45046d1ff37e2ed05a4e4194fcaab29ee63: default/pod/container id=c1f22c59-e30e-4046-931d-a0c0fdc2c8b7 name=/runtime.v1.RuntimeService/CreateContainer +``` + +```shell +export CONTAINER_ID=$(sudo crictl ps --name container -q) +sudo crictl inspect $CONTAINER_ID | jq .info.runtimeSpec.linux.seccomp | head +``` + +```yaml +{ + "defaultAction": "SCMP_ACT_ERRNO", + "defaultErrnoRet": 38, + "architectures": [ + "SCMP_ARCH_X86_64", + "SCMP_ARCH_X86", + "SCMP_ARCH_X32" + ], + "syscalls": [ + { +``` + +For container images, the annotation `seccomp-profile.kubernetes.cri-o.io` will +be treated in the same way as `seccomp-profile.kubernetes.cri-o.io/POD` and +applies to the whole pod. In addition to that, the whole feature also works when +using the container specific annotation on an image, for example if a container +is named `container1`: + +```shell +skopeo inspect --raw docker://quay.io/crio/nginx-seccomp:v2-container | + jq '.annotations."seccomp-profile.kubernetes.cri-o.io/container1"' +``` + +```console +"quay.io/crio/seccomp:v2" +``` + +The cool thing about this whole feature is that users can now create seccomp +profiles for specific container images and store them side by side in the same +registry. Linking the images to the profiles provides a great flexibility to +maintain them over the whole application's life cycle. + +### Pushing profiles using ORAS + +The actual creation of the OCI object that contains a seccomp profile requires a +bit more work when using ORAS. I have the hope that tools like Podman will +simplify the overall process in the future. Right now, the container registry +needs to be [OCI compatible](https://oras.land/docs/compatible_oci_registries/#registries-supporting-oci-artifacts), +which is also the case for [Quay.io](https://quay.io). CRI-O expects the seccomp +profile object to have a container image media type +(`application/vnd.cncf.seccomp-profile.config.v1+json`), while ORAS uses +`application/vnd.oci.empty.v1+json` per default. To achieve all of that, the +following commands can be executed: + +```shell +echo "{}" > config.json +oras push \ + --config config.json:application/vnd.cncf.seccomp-profile.config.v1+json \ + quay.io/crio/seccomp:v2 seccomp.json +``` + +The resulting image contains the `mediaType` that CRI-O expects. ORAS pushes a +single layer `seccomp.json` to the registry. The name of the profile does not +matter much. CRI-O will pick the first layer and check if that can act as a +seccomp profile. + +## Future work + +CRI-O internally manages the OCI artifacts like regular files. This provides the +benefit of moving them around, removing them if not used any more or having any +other data available than seccomp profiles. This enables future enhancements in +CRI-O on top of OCI artifacts, but also allows thinking about stacking seccomp +profiles as part of having multiple layers in an OCI artifact. The limitation +that it only works for `Unconfined` workloads for v1.30.x releases is something +different CRI-O would like to address in the future. Simplifying the overall +user experience by not compromising security seems to be the key for a +successful future of seccomp in container workloads. + +The CRI-O maintainers will be happy to listen to any feedback or suggestions on +the new feature! Thank you for reading this blog post, feel free to reach out +to the maintainers via the Kubernetes [Slack channel #crio](https://kubernetes.slack.com/messages/CAZH62UR1) +or create an issue in the [GitHub repository](https://github.com/cri-o/cri-o).