diff --git a/README.md b/README.md index 31b5286..5a6cc20 100644 --- a/README.md +++ b/README.md @@ -103,6 +103,7 @@ For further documentation, you may want to check these other documents: * [K8s](./docs/k8s.md) - documentation about configuring a single-node Kubernetes cluster. * [Kata](./docs/kata.md) - instructions to build our custom Kata fork and `initrd` images. * [Knative](./docs/knative.md) - documentation about Knative, our serverless runtime of choice. +* [Local Registry](./docs/registry.md) - configuring a local registry to store OCI images. * [OVMF](./docs/ovmf.md) - notes on building OVMF and CoCo's OVMF boot process. * [SEV](./docs/sev.md) - speicifc documentation to get the project working with AMD SEV machines. * [Troubleshooting](./docs/troubleshooting.md) - tips to debug when things go sideways. diff --git a/conf-files/knative_controller_custom_certs.yaml.j2 b/conf-files/knative_controller_custom_certs.yaml.j2 new file mode 100644 index 0000000..d8e02e8 --- /dev/null +++ b/conf-files/knative_controller_custom_certs.yaml.j2 @@ -0,0 +1,17 @@ +spec: + template: + spec: + containers: + - name: controller + volumeMounts: + - name: custom-certs + mountPath: {{ path_to_certs }} + env: + - name: SSL_CERT_DIR + value: {{ path_to_certs }} + dnsPolicy: ClusterFirstWithHostNet + hostNetwork: true + volumes: + - name: custom-certs + secret: + secretName: {{ secret_name }} diff --git a/docs/registry.md b/docs/registry.md new file mode 100644 index 0000000..4736f2b --- /dev/null +++ b/docs/registry.md @@ -0,0 +1,34 @@ +# Using a Local Registry + +In order to use a local image registry, we need to configure `docker`, +`containerd`, `Kata`, and `Knative` to trust our home-baked registry. In +addition, Kata does not seem to be able to use HTTP registries inside the +guest, so we need to take the extra step of configuring HTTPS certificates +for our registry too. + +To this end, we first create a self-signed certificate, and give it the +subject alternative name (SAN) of our home-made registry. We must also add +an entry to our DNS records mapping this registry name to our local IP +(one that is reachable from within the guest). + +Second, we need to update both the docker and the containerd configuration +to include our certificate for this registry. + +Third, we need to include both the updated `/etc/hosts` file with the DNS +entry, as well as the certificate, inside the agent's `initrd`. + +Finally, we need to configure Knative to accept self-signed certificates. To +do so, we update the `controller` deployment by applying a [patch]( ./conf-files/knative_controller_custom_certs.yaml.j2). 
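For reference, the certificate and DNS set-up described above boils down to something like the following sketch. The registry hostname (`registry.coco-csg.com`) and the `domain.key`/`domain.crt` file names are the defaults used by the tasks in this change; `<node-ip>` stands for the node IP reachable from inside the guest:

```bash
# Self-signed certificate whose subject alternative name matches the registry
openssl req -newkey rsa:4096 -nodes -sha256 \
    -keyout domain.key \
    -addext "subjectAltName = DNS:registry.coco-csg.com" \
    -x509 -days 365 -out domain.crt

# DNS entry so that the registry name resolves to an IP reachable from the guest
echo "<node-ip> registry.coco-csg.com" | sudo tee -a /etc/hosts
```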
+ +All of this is automated when we start the local registry with the provided task: + +```bash +inv registry.start +``` + +and is undone when we stop it: + +```bash +inv registry.stop +``` diff --git a/eval/README.md b/eval/README.md index 3a7b6b0..05144ef 100644 --- a/eval/README.md +++ b/eval/README.md @@ -33,12 +33,9 @@ Signing and encryption is an interactive process, hence why we do it once, in advance of the evaluation: ```bash -# First encrypt (and sign) the image -inv skopeo.encrypt-container-image "ghcr.io/csegarragonz/coco-helloworld-py:unencrypted" --sign - -# Then sign the unencrypted images used -inv cosign.sign-container-image "ghcr.io/csegarragonz/coco-helloworld-py:unencrypted" -inv cosign.sign-container-image "ghcr.io/csegarragonz/coco-knative-sidecar:unencrypted" +# Enter an empty passphrase or type 'y' when prompted (it will happen +# many times) +inv eval.images.upload ``` Now you are ready to run one of the experiments: @@ -47,6 +44,7 @@ Now you are ready to run one of the experiments: * [Memory Size](#memory-size) - impact on initial VM memory size on start-up time. * [VM Start-Up](#vm-start-up) - breakdown of the cVM start-up costs * [Image Pull Costs](#image-pull) - breakdown of the costs associated to pulling an image on the guest. +* [Throughput Detail](#throughput-detail) - breakdown of the costs associated with starting many services concurrently. ### Start-Up Costs @@ -190,6 +188,12 @@ run: inv kata.replace-agent ``` +In addition, we want to configure the right debug logging settings: + +```bash +inv kata.set-log-level debug containerd.set-log-level debug ovmf.set-log-level info +``` + After that, you may run the experiment with: ```bash @@ -215,6 +219,45 @@ You can see the plot below: ![plot](./plots/image-pull/image_pull.png) +### Throughput Detail + +In this experiment, we pick one of the baselines in the [instantiation throughput]( #instantiation-throughput) experiment, and try to analyze why the start-up +latency increases linearly with the number of concurrent requests. + +To do so, we pick one of the data points in the aforementioned plot. In +particular, we pick the most secure baseline (`coco-fw-sig-enc`), and the +highest concurrency level (`16`), and record the timestamps of the basic VM +creation events (as reported in the [start-up costs](#start-up-costs) plot). + +Given the number of concurrent services, we want to use a more succinct +logging configuration: + +```bash +inv kata.set-log-level info containerd.set-log-level info +``` + +> Note that, given the volume of services we spin up, getting the logs from +> `containerd` is unreliable, as `journalctl` will drop lines. Thus, for this +> experiment we use the slightly less precise Kubernetes events' timestamps. + +You may run the experiment with: + +```bash +inv eval.xput-detail.run +``` + +and you may plot the results using: + +```bash +inv eval.xput-detail.plot +``` + +which generates a plot in [`./plots/xput-detail/xput_detail.png`]( ./plots/xput-detail/xput_detail.png). 
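For reference on where the per-service timestamps in this experiment come from: the `eval.xput-detail.run` task reads each pod's Kubernetes status conditions. You can inspect them manually for a given pod with something along these lines (`<pod-name>` is a placeholder):

```bash
# Dump the pod conditions (PodScheduled, Initialized, ContainersReady, Ready)
# together with their lastTransitionTime timestamps
kubectl get pod <pod-name> -o jsonpath='{..status.conditions}'
```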
You can also see the plot below: + +![plot](./plots/xput-detail/xput_detail.png) + ## Benchmarks TODO diff --git a/eval/apps/xput-detail/service.yaml.j2 b/eval/apps/xput-detail/service.yaml.j2 new file mode 100644 index 0000000..b36332c --- /dev/null +++ b/eval/apps/xput-detail/service.yaml.j2 @@ -0,0 +1,26 @@ +apiVersion: serving.knative.dev/v1 +kind: Service +metadata: + name: helloworld-knative-{{ service_num }} + annotations: + "features.knative.dev/podspec-runtimeclassname": "enabled" +spec: + template: + metadata: + labels: + apps.coco-serverless/name: helloworld-py + io.katacontainers.config.pre_attestation.enabled: "false" + spec: + {% if runtime_class is defined %} + runtimeClassName: {{ runtime_class }} + # coco-knative: need to run user container as root + securityContext: + runAsUser: 1000 + {% endif %} + containers: + - image: {{ image_repo }}/{{ image_name }}:{{ image_tag }} + ports: + - containerPort: 8080 + env: + - name: TARGET + value: "World" diff --git a/eval/plots/xput-detail/xput_detail.pdf b/eval/plots/xput-detail/xput_detail.pdf new file mode 100644 index 0000000..8081a01 Binary files /dev/null and b/eval/plots/xput-detail/xput_detail.pdf differ diff --git a/eval/plots/xput-detail/xput_detail.png b/eval/plots/xput-detail/xput_detail.png new file mode 100644 index 0000000..027ae58 Binary files /dev/null and b/eval/plots/xput-detail/xput_detail.png differ diff --git a/eval/results/image-pull/coco-fw-sig-enc_cold.csv b/eval/results/image-pull/coco-fw-sig-enc_cold.csv index 0563cd1..e47098e 100644 --- a/eval/results/image-pull/coco-fw-sig-enc_cold.csv +++ b/eval/results/image-pull/coco-fw-sig-enc_cold.csv @@ -1,25 +1,25 @@ Run,ImageName,Event,TimeStampMs -0,sidecar,StartGCImagePull,1698704327.368399 -0,sidecar,StartPullManifest,1698704327.376768 -0,sidecar,EndPullManifest,1698704327.990064 -0,sidecar,StartSignatureValidation,1698704327.990064 -0,sidecar,EndSignatureValidation,1698704329.296337 -0,sidecar,StartPullLayers,1698704329.296337 -0,sidecar,StartPullSingleLayer,1698704329.296337 -0,sidecar,EndPullSingleLayer,1698704330.0468361 -0,sidecar,StartHandleSingleLayer,1698704330.0468361 -0,sidecar,EndPullLayers,1698704330.819208 -0,sidecar,EndHandleSingleLayer,1698704330.819208 -0,sidecar,EndGCImagePull,1698704330.823149 -0,app,StartGCImagePull,1698704320.396366 -0,app,StartPullManifest,1698704320.411405 -0,app,EndPullManifest,1698704321.025102 -0,app,StartSignatureValidation,1698704321.025102 -0,app,EndSignatureValidation,1698704322.248573 -0,app,StartPullLayers,1698704322.248573 -0,app,StartPullSingleLayer,1698704322.248573 -0,app,EndPullSingleLayer,1698704322.8777323 -0,app,StartHandleSingleLayer,1698704322.8777323 -0,app,EndPullLayers,1698704327.303738 -0,app,EndHandleSingleLayer,1698704327.303738 -0,app,EndGCImagePull,1698704327.307644 +0,sidecar,StartGCImagePull,1698763808.3409 +0,sidecar,StartPullManifest,1698763808.354979 +0,sidecar,EndPullManifest,1698763809.100515 +0,sidecar,StartSignatureValidation,1698763809.100515 +0,sidecar,EndSignatureValidation,1698763810.754743 +0,sidecar,StartPullLayers,1698763810.754743 +0,sidecar,StartPullSingleLayer,1698763810.754743 +0,sidecar,EndPullSingleLayer,1698763812.3743262 +0,sidecar,StartHandleSingleLayer,1698763812.3743262 +0,sidecar,EndPullLayers,1698763812.793883 +0,sidecar,EndHandleSingleLayer,1698763812.793883 +0,sidecar,EndGCImagePull,1698763812.797953 +0,app,StartGCImagePull,1698763800.572303 +0,app,StartPullManifest,1698763800.590206 +0,app,EndPullManifest,1698763801.359332 
+0,app,StartSignatureValidation,1698763801.359332 +0,app,EndSignatureValidation,1698763802.833511 +0,app,StartPullLayers,1698763802.833511 +0,app,StartPullSingleLayer,1698763802.833511 +0,app,EndPullSingleLayer,1698763804.1814477 +0,app,StartHandleSingleLayer,1698763804.1814477 +0,app,EndPullLayers,1698763808.241138 +0,app,EndHandleSingleLayer,1698763808.241138 +0,app,EndGCImagePull,1698763808.244155 diff --git a/eval/results/xput-detail/coco-fw-sig-enc_1.csv b/eval/results/xput-detail/coco-fw-sig-enc_1.csv new file mode 100644 index 0000000..8df3770 --- /dev/null +++ b/eval/results/xput-detail/coco-fw-sig-enc_1.csv @@ -0,0 +1,6 @@ +ServiceId,Event,TimeStampSecs +0,Initialized,1698784625.0 +0,PodScheduled,1698784625.0 +0,SandboxReady,1698784631.003848 +0,Ready,1698784642.0 +0,ContainersReady,1698784642.0 diff --git a/eval/results/xput-detail/coco-fw-sig-enc_16.csv b/eval/results/xput-detail/coco-fw-sig-enc_16.csv new file mode 100644 index 0000000..7fb4b86 --- /dev/null +++ b/eval/results/xput-detail/coco-fw-sig-enc_16.csv @@ -0,0 +1,81 @@ +ServiceId,Event,TimeStampSecs +0,Initialized,1698932745.0 +0,PodScheduled,1698932745.0 +0,SandboxReady,1698932754.265692 +0,Ready,1698932868.0 +0,ContainersReady,1698932868.0 +8,Initialized,1698932745.0 +8,PodScheduled,1698932745.0 +8,SandboxReady,1698932757.634508 +8,Ready,1698932872.0 +8,ContainersReady,1698932872.0 +9,Initialized,1698932746.0 +9,PodScheduled,1698932746.0 +9,SandboxReady,1698932757.652055 +9,Ready,1698932876.0 +9,ContainersReady,1698932876.0 +11,Initialized,1698932746.0 +11,PodScheduled,1698932746.0 +11,SandboxReady,1698932757.703852 +11,Ready,1698932880.0 +11,ContainersReady,1698932880.0 +10,Initialized,1698932746.0 +10,PodScheduled,1698932746.0 +10,SandboxReady,1698932757.744147 +10,Ready,1698932883.0 +10,ContainersReady,1698932883.0 +1,Initialized,1698932745.0 +1,PodScheduled,1698932745.0 +1,SandboxReady,1698932757.834407 +1,Ready,1698932888.0 +1,ContainersReady,1698932888.0 +2,Initialized,1698932748.0 +2,PodScheduled,1698932748.0 +2,SandboxReady,1698932760.166568 +2,Ready,1698932891.0 +2,ContainersReady,1698932891.0 +13,Initialized,1698932747.0 +13,PodScheduled,1698932747.0 +13,SandboxReady,1698932760.195484 +13,Ready,1698932894.0 +13,ContainersReady,1698932894.0 +12,Initialized,1698932747.0 +12,PodScheduled,1698932747.0 +12,SandboxReady,1698932760.214206 +12,Ready,1698932899.0 +12,ContainersReady,1698932899.0 +14,Initialized,1698932747.0 +14,PodScheduled,1698932747.0 +14,SandboxReady,1698932760.228735 +14,Ready,1698932902.0 +14,ContainersReady,1698932902.0 +15,Initialized,1698932748.0 +15,PodScheduled,1698932748.0 +15,SandboxReady,1698932760.229141 +15,Ready,1698932906.0 +15,ContainersReady,1698932906.0 +3,Initialized,1698932749.0 +3,PodScheduled,1698932749.0 +3,SandboxReady,1698932760.366882 +3,Ready,1698932909.0 +3,ContainersReady,1698932909.0 +5,Initialized,1698932750.0 +5,PodScheduled,1698932750.0 +5,SandboxReady,1698932760.57967 +5,Ready,1698932912.0 +5,ContainersReady,1698932912.0 +4,Initialized,1698932749.0 +4,PodScheduled,1698932749.0 +4,SandboxReady,1698932760.580918 +4,Ready,1698932915.0 +4,ContainersReady,1698932915.0 +6,Initialized,1698932750.0 +6,PodScheduled,1698932750.0 +6,SandboxReady,1698932760.605338 +6,Ready,1698932919.0 +6,ContainersReady,1698932919.0 +7,Initialized,1698932750.0 +7,PodScheduled,1698932750.0 +7,SandboxReady,1698932760.695114 +7,Ready,1698932923.0 +7,ContainersReady,1698932923.0 diff --git a/eval/results/xput-detail/coco-fw-sig-enc_2.csv b/eval/results/xput-detail/coco-fw-sig-enc_2.csv new file 
mode 100644 index 0000000..b29f300 --- /dev/null +++ b/eval/results/xput-detail/coco-fw-sig-enc_2.csv @@ -0,0 +1 @@ +Run,ServiceId,Event,TimeStampSecs diff --git a/eval/results/xput-detail/coco-fw-sig-enc_4.csv b/eval/results/xput-detail/coco-fw-sig-enc_4.csv new file mode 100644 index 0000000..8647b8f --- /dev/null +++ b/eval/results/xput-detail/coco-fw-sig-enc_4.csv @@ -0,0 +1,21 @@ +ServiceId,Event,TimeStampSecs +0,Initialized,1698785143.0 +0,PodScheduled,1698785143.0 +0,SandboxReady,1698785151.358817 +0,Ready,1698785182.0 +0,ContainersReady,1698785182.0 +2,Initialized,1698785144.0 +2,PodScheduled,1698785144.0 +2,SandboxReady,1698785151.607405 +2,Ready,1698785185.0 +2,ContainersReady,1698785185.0 +1,Initialized,1698785143.0 +1,PodScheduled,1698785143.0 +1,SandboxReady,1698785151.639349 +1,Ready,1698785189.0 +1,ContainersReady,1698785189.0 +3,Initialized,1698785144.0 +3,PodScheduled,1698785144.0 +3,SandboxReady,1698785151.931775 +3,Ready,1698785192.0 +3,ContainersReady,1698785192.0 diff --git a/eval/results/xput-detail/coco-fw-sig-enc_8.csv b/eval/results/xput-detail/coco-fw-sig-enc_8.csv new file mode 100644 index 0000000..4f85b15 --- /dev/null +++ b/eval/results/xput-detail/coco-fw-sig-enc_8.csv @@ -0,0 +1,41 @@ +Run,ServiceId,Event,TimeStampSecs +1,Initialized,1698774180.0 +1,PodScheduled,1698774180.0 +1,SandboxReady,1698774187.615061 +1,Ready,1698774247.0 +1,ContainersReady,1698774247.0 +0,Initialized,1698774180.0 +0,PodScheduled,1698774180.0 +0,SandboxReady,1698774187.648987 +0,Ready,1698774251.0 +0,ContainersReady,1698774251.0 +2,Initialized,1698774180.0 +2,PodScheduled,1698774180.0 +2,SandboxReady,1698774189.995768 +2,Ready,1698774255.0 +2,ContainersReady,1698774255.0 +4,Initialized,1698774181.0 +4,PodScheduled,1698774181.0 +4,SandboxReady,1698774190.651021 +4,Ready,1698774259.0 +4,ContainersReady,1698774259.0 +3,Initialized,1698774181.0 +3,PodScheduled,1698774181.0 +3,SandboxReady,1698774190.669562 +3,Ready,1698774263.0 +3,ContainersReady,1698774263.0 +5,Initialized,1698774182.0 +5,PodScheduled,1698774182.0 +5,SandboxReady,1698774190.725512 +5,Ready,1698774267.0 +5,ContainersReady,1698774267.0 +6,Initialized,1698774182.0 +6,PodScheduled,1698774182.0 +6,SandboxReady,1698774190.931352 +6,Ready,1698774271.0 +6,ContainersReady,1698774271.0 +7,Initialized,1698774182.0 +7,PodScheduled,1698774182.0 +7,SandboxReady,1698774191.112247 +7,Ready,1698774275.0 +7,ContainersReady,1698774275.0 diff --git a/eval/results/xput-detail/ghcr.io_coco-fw-sig-enc_16.csv b/eval/results/xput-detail/ghcr.io_coco-fw-sig-enc_16.csv new file mode 100644 index 0000000..5c0fd83 --- /dev/null +++ b/eval/results/xput-detail/ghcr.io_coco-fw-sig-enc_16.csv @@ -0,0 +1,81 @@ +ServiceId,Event,TimeStampSecs +1,Initialized,1699289349.0 +1,PodScheduled,1699289349.0 +1,SandboxReady,1699289356.375248 +1,Ready,1699289436.0 +1,ContainersReady,1699289436.0 +0,Initialized,1699289349.0 +0,PodScheduled,1699289349.0 +0,SandboxReady,1699289358.698973 +0,Ready,1699289472.0 +0,ContainersReady,1699289472.0 +11,Initialized,1699289350.0 +11,PodScheduled,1699289350.0 +11,SandboxReady,1699289361.99565 +11,Ready,1699289475.0 +11,ContainersReady,1699289475.0 +10,Initialized,1699289350.0 +10,PodScheduled,1699289350.0 +10,SandboxReady,1699289362.014218 +10,Ready,1699289478.0 +10,ContainersReady,1699289478.0 +9,Initialized,1699289350.0 +9,PodScheduled,1699289350.0 +9,SandboxReady,1699289362.015035 +9,Ready,1699289481.0 +9,ContainersReady,1699289481.0 +8,Initialized,1699289350.0 +8,PodScheduled,1699289350.0 +8,SandboxReady,1699289362.158476 
+8,Ready,1699289486.0 +8,ContainersReady,1699289486.0 +13,Initialized,1699289351.0 +13,PodScheduled,1699289351.0 +13,SandboxReady,1699289362.330482 +13,Ready,1699289489.0 +13,ContainersReady,1699289489.0 +3,Initialized,1699289353.0 +3,PodScheduled,1699289353.0 +3,SandboxReady,1699289362.331177 +3,Ready,1699289492.0 +3,ContainersReady,1699289492.0 +12,Initialized,1699289352.0 +12,PodScheduled,1699289352.0 +12,SandboxReady,1699289362.463558 +12,Ready,1699289496.0 +12,ContainersReady,1699289496.0 +14,Initialized,1699289352.0 +14,PodScheduled,1699289352.0 +14,SandboxReady,1699289362.463994 +14,Ready,1699289499.0 +14,ContainersReady,1699289499.0 +15,Initialized,1699289353.0 +15,PodScheduled,1699289353.0 +15,SandboxReady,1699289362.528699 +15,Ready,1699289502.0 +15,ContainersReady,1699289502.0 +2,Initialized,1699289353.0 +2,PodScheduled,1699289353.0 +2,SandboxReady,1699289364.094684 +2,Ready,1699289506.0 +2,ContainersReady,1699289506.0 +7,Initialized,1699289354.0 +7,PodScheduled,1699289354.0 +7,SandboxReady,1699289364.88316 +7,Ready,1699289509.0 +7,ContainersReady,1699289509.0 +5,Initialized,1699289354.0 +5,PodScheduled,1699289354.0 +5,SandboxReady,1699289364.916036 +5,Ready,1699289512.0 +5,ContainersReady,1699289512.0 +4,Initialized,1699289354.0 +4,PodScheduled,1699289354.0 +4,SandboxReady,1699289365.051569 +4,Ready,1699289515.0 +4,ContainersReady,1699289515.0 +6,Initialized,1699289354.0 +6,PodScheduled,1699289354.0 +6,SandboxReady,1699289365.334982 +6,Ready,1699289518.0 +6,ContainersReady,1699289518.0 diff --git a/eval/results/xput-detail/registry.coco-csg.com_coco-fw-sig-enc_16.csv b/eval/results/xput-detail/registry.coco-csg.com_coco-fw-sig-enc_16.csv new file mode 100644 index 0000000..e698538 --- /dev/null +++ b/eval/results/xput-detail/registry.coco-csg.com_coco-fw-sig-enc_16.csv @@ -0,0 +1,81 @@ +ServiceId,Event,TimeStampSecs +0,Initialized,1699045110.0 +0,PodScheduled,1699045110.0 +0,SandboxReady,1699045118.412773 +0,Ready,1699045121.0 +0,ContainersReady,1699045121.0 +1,Initialized,1699045110.0 +1,PodScheduled,1699045110.0 +1,SandboxReady,1699045119.591819 +1,Ready,1699045124.0 +1,ContainersReady,1699045124.0 +8,Initialized,1699045111.0 +8,PodScheduled,1699045111.0 +8,SandboxReady,1699045120.764813 +8,Ready,1699045124.0 +8,ContainersReady,1699045124.0 +9,Initialized,1699045111.0 +9,PodScheduled,1699045111.0 +9,SandboxReady,1699045120.811926 +9,Ready,1699045125.0 +9,ContainersReady,1699045125.0 +6,Initialized,1699045115.0 +6,PodScheduled,1699045115.0 +6,SandboxReady,1699045125.614825 +6,Ready,1699045142.0 +6,ContainersReady,1699045142.0 +10,Initialized,1699045112.0 +10,PodScheduled,1699045112.0 +10,SandboxReady,1699045125.354928 +10,Ready,1699045142.0 +10,ContainersReady,1699045142.0 +11,Initialized,1699045112.0 +11,PodScheduled,1699045112.0 +11,SandboxReady,1699045125.298873 +11,Ready,1699045140.0 +11,ContainersReady,1699045140.0 +12,Initialized,1699045113.0 +12,PodScheduled,1699045113.0 +12,SandboxReady,1699045125.321036 +12,Ready,1699045140.0 +12,ContainersReady,1699045140.0 +13,Initialized,1699045113.0 +13,PodScheduled,1699045113.0 +13,SandboxReady,1699045125.38421 +13,Ready,1699045143.0 +13,ContainersReady,1699045143.0 +14,Initialized,1699045113.0 +14,PodScheduled,1699045113.0 +14,SandboxReady,1699045125.636437 +14,Ready,1699045142.0 +14,ContainersReady,1699045142.0 +15,Initialized,1699045113.0 +15,PodScheduled,1699045113.0 +15,SandboxReady,1699045125.835458 +15,Ready,1699045143.0 +15,ContainersReady,1699045143.0 +2,Initialized,1699045114.0 +2,PodScheduled,1699045114.0 
+2,SandboxReady,1699045125.835458 +2,Ready,1699045144.0 +2,ContainersReady,1699045144.0 +3,Initialized,1699045114.0 +3,PodScheduled,1699045114.0 +3,SandboxReady,1699045125.835458 +3,Ready,1699045144.0 +3,ContainersReady,1699045144.0 +4,Initialized,1699045114.0 +4,PodScheduled,1699045114.0 +4,SandboxReady,1699045125.75888 +4,Ready,1699045143.0 +4,ContainersReady,1699045143.0 +5,Initialized,1699045115.0 +5,PodScheduled,1699045115.0 +5,SandboxReady,1699045125.835458 +5,Ready,1699045145.0 +5,ContainersReady,1699045145.0 +7,Initialized,1699045115.0 +7,PodScheduled,1699045115.0 +7,SandboxReady,1699045125.835458 +7,Ready,1699045143.0 +7,ContainersReady,1699045143.0 diff --git a/tasks/__init__.py b/tasks/__init__.py index 1bedad6..c5de02b 100644 --- a/tasks/__init__.py +++ b/tasks/__init__.py @@ -14,6 +14,7 @@ from . import operator from . import ovmf from . import qemu +from . import registry from . import sev from . import skopeo @@ -34,6 +35,7 @@ operator, ovmf, qemu, + registry, sev, skopeo, ) diff --git a/tasks/eval/__init__.py b/tasks/eval/__init__.py index 5117926..370b861 100644 --- a/tasks/eval/__init__.py +++ b/tasks/eval/__init__.py @@ -1,15 +1,19 @@ from invoke import Collection from . import image_pull +from . import images from . import mem_size from . import startup from . import vm_detail from . import xput +from . import xput_detail ns = Collection( image_pull, + images, mem_size, startup, vm_detail, xput, + xput_detail, ) diff --git a/tasks/eval/images.py b/tasks/eval/images.py new file mode 100644 index 0000000..2522fbb --- /dev/null +++ b/tasks/eval/images.py @@ -0,0 +1,18 @@ +from invoke import task +from tasks.eval.util.images import ( + ALL_CTR_REGISTRIES, + copy_images_to_registry, +) + + +@task +def upload(ctx, origin_repo="ghcr.io"): + """ + Upload all necessary docker images to run all the tests + + This method uploads images, signatures, and encrypted images for all + different services required in the evaluation, in all the different + container registries that we will use. 
+ """ + for repo in ALL_CTR_REGISTRIES: + copy_images_to_registry(origin_repo, repo) diff --git a/tasks/eval/util/images.py b/tasks/eval/util/images.py new file mode 100644 index 0000000..9da9a74 --- /dev/null +++ b/tasks/eval/util/images.py @@ -0,0 +1,47 @@ +from os.path import join +from subprocess import run +from tasks.util.cosign import sign_container_image +from tasks.util.env import LOCAL_REGISTRY_URL +from tasks.util.skopeo import encrypt_container_image + +ALL_USED_IMAGES = { + "csegarragonz/coco-knative-sidecar": {"unencrypted"}, + "csegarragonz/coco-helloworld-py": {"unencrypted", "encrypted"}, +} + +ALL_CTR_REGISTRIES = ["ghcr.io", LOCAL_REGISTRY_URL] + + +def copy_images_to_registry(src_repo, dst_repo): + """ + Copy all images from one repo to another + """ + for image in ALL_USED_IMAGES: + for tag in ALL_USED_IMAGES[image]: + if tag == "unencrypted": + src_path = "{}:{}".format(join(src_repo, image), tag) + dst_path = "{}:{}".format(join(dst_repo, image), tag) + + # Push regular images + run("docker pull {}".format(src_path), shell=True, check=True) + run( + "docker tag {} {}".format(src_path, dst_path), + shell=True, + check=True, + ) + run("docker push {}".format(dst_path), shell=True, check=True) + # Tolerate rmi failing, as images should not be there to start off with + run( + "docker rmi {} {}".format(src_path, dst_path), + shell=True, + capture_output=True, + ) + + # Push signature for the image too + sign_container_image(dst_path) + elif tag == "encrypted": + # Note that it is the skopeo method that makes the tag encrypted + dst_path = "{}:unencrypted".format(join(dst_repo, image)) + encrypt_container_image(dst_path, sign=True) + else: + raise RuntimeError("Unrecognised image tag: {}".format(tag)) diff --git a/tasks/eval/util/pod.py b/tasks/eval/util/pod.py index 4f85f4d..703abb2 100644 --- a/tasks/eval/util/pod.py +++ b/tasks/eval/util/pod.py @@ -44,12 +44,14 @@ def wait_for_pod_ready_and_get_ts(pod_name): return get_pod_ready_ts(pod_name) -def get_sandbox_id_from_pod_name(pod_name): +def get_sandbox_id_from_pod_name(pod_name, timeout_mins=1): """ Get the sandbox ID from a pod name """ # The sandbox ID is in the ending pair of the RunPodSandbox event - event_json = get_event_from_containerd_logs("RunPodSandbox", pod_name, 2)[-1] + event_json = get_event_from_containerd_logs( + "RunPodSandbox", pod_name, 1, timeout_mins=timeout_mins + )[0] sbox_id = re_search( r'returns sandbox id \\"([a-zA-Z0-9]*)\\"', event_json["MESSAGE"] ).groups(1)[0] diff --git a/tasks/eval/util/setup.py b/tasks/eval/util/setup.py index f707374..2411344 100644 --- a/tasks/eval/util/setup.py +++ b/tasks/eval/util/setup.py @@ -4,7 +4,7 @@ from tasks.util.kbs import clear_kbs_db, provision_launch_digest -def setup_baseline(baseline, used_images): +def setup_baseline(baseline, used_images, image_repo=EXPERIMENT_IMAGE_REPO): """ Configure the system for a specific baseline @@ -30,7 +30,7 @@ def setup_baseline(baseline, used_images): # Configure signature policy (check image signature or not). 
We must do # this at the very end as it relies on: (i) the KBS DB being clear, and # (ii) the configuration file populated by the previous methods - images_to_sign = [join(EXPERIMENT_IMAGE_REPO, image) for image in used_images] + images_to_sign = [join(image_repo, image) for image in used_images] provision_launch_digest( images_to_sign, signature_policy=baseline_traits["signature_policy"], diff --git a/tasks/eval/xput_detail.py b/tasks/eval/xput_detail.py new file mode 100644 index 0000000..5c529a3 --- /dev/null +++ b/tasks/eval/xput_detail.py @@ -0,0 +1,360 @@ +from datetime import datetime +from invoke import task +from json import loads as json_loads +from matplotlib.patches import Patch +from matplotlib.pyplot import subplots +from os import makedirs +from os.path import exists, join +from pandas import read_csv +from tasks.eval.util.clean import cleanup_after_run +from tasks.eval.util.csv import init_csv_file, write_csv_line +from tasks.eval.util.env import ( + APPS_DIR, + BASELINES, + EXPERIMENT_IMAGE_REPO, + EVAL_TEMPLATED_DIR, + INTER_RUN_SLEEP_SECS, + PLOTS_DIR, + RESULTS_DIR, +) +from tasks.eval.util.setup import setup_baseline +from tasks.util.containerd import get_ts_for_containerd_event +from tasks.util.env import LOCAL_REGISTRY_URL +from tasks.util.k8s import template_k8s_file +from tasks.util.knative import replace_sidecar +from tasks.util.kubeadm import get_pod_names_in_ns, run_kubectl_command +from time import sleep + + +def do_run(result_file, baseline, image_repo, num_run, num_par_inst): + service_files = [ + "apps_xput-detail_{}_{}_service_{}.yaml".format(image_repo, baseline, i) + for i in range(num_par_inst) + ] + for service_file in service_files: + # Capture output to avoid verbose Knative logging + run_kubectl_command( + "apply -f {}".format(join(EVAL_TEMPLATED_DIR, service_file)), + capture_output=True, + ) + + # Get all pod names + pods = get_pod_names_in_ns("default") + while len(pods) != num_par_inst: + sleep(1) + pods = get_pod_names_in_ns("default") + + # Once we have all pod names, wait for all of them to be ready. 
We poll the + # pods in round-robin fashion, but we report the "Ready" timestamp as + # logged in Kubernetes, so it doesn't matter that much if we take a while + # to notice that we are done + ready_pods = {pod: False for pod in pods} + pods_ready_ts = {pod: None for pod in pods} + is_done = all(list(ready_pods.values())) + while not is_done: + + def is_pod_done(pod_name): + kube_cmd = "get pod {} -o jsonpath='{{..status.conditions}}'".format( + pod_name + ) + conditions = run_kubectl_command(kube_cmd, capture_output=True) + cond_json = json_loads(conditions) + return all([cond["status"] == "True" for cond in cond_json]) + + def get_pod_ready_ts(pod_name): + kube_cmd = "get pod {} -o jsonpath='{{..status.conditions}}'".format( + pod_name + ) + conditions = run_kubectl_command(kube_cmd, capture_output=True) + cond_json = json_loads(conditions) + for cond in cond_json: + if cond["type"] == "Ready": + return ( + datetime.fromisoformat( + cond["lastTransitionTime"][:-1] + ).timestamp(), + ) + + # Once all pods are ready, query for the relevant events for each pod + def get_events_for_pod(pod_id, pod_name): + events_ts = [] + + kube_cmd = "get pod {} -o jsonpath='{{..status.conditions}}'".format( + pod_name + ) + conditions = run_kubectl_command(kube_cmd, capture_output=True) + cond_json = json_loads(conditions) + + assert all( + [cond["status"] == "True" for cond in cond_json] + ), "Pod {} is not ready".format(pod_name) + + for cond in cond_json: + events_ts.append( + ( + cond["type"], + datetime.fromisoformat( + cond["lastTransitionTime"][:-1] + ).timestamp(), + ) + ) + + # Also get one event from containerd that indicates that the + # sandbox is ready + timeout_mins = 5 + vm_ready_ts = get_ts_for_containerd_event( + "RunPodSandbox", + pod_name, + timeout_mins=timeout_mins, + ) + events_ts.append(("SandboxReady", vm_ready_ts)) + + # Sort the events by timestamp and write them to a file + events_ts = sorted(events_ts, key=lambda x: x[1]) + for event in events_ts: + write_csv_line(result_file, pod_id, event[0], event[1]) + + for pod_id, pod_name in enumerate(pods): + # Skip finished pods + if ready_pods[pod_name]: + continue + + if is_pod_done(pod_name): + ready_pods[pod_name] = True + pods_ready_ts[pod_name] = get_pod_ready_ts(pod_name) + + # As soon as one pod is ready, we process the events from it + # to avoid containerd trimming the logs + print("Getting events for pod {}".format(pod_name)) + get_events_for_pod(pod_id, pod_name) + + is_done = all(list(ready_pods.values())) + sleep(1) + + # Remove the pods when we are done + for service_file in service_files: + run_kubectl_command( + "delete -f {}".format(join(EVAL_TEMPLATED_DIR, service_file)), + capture_output=True, + ) + for pod in pods: + run_kubectl_command("delete pod {}".format(pod), capture_output=True) + + +@task +def run(ctx, repo=None): + """ + Measure the costs associated with starting a fixed number of concurrent + services + """ + baselines_to_run = ["coco-fw-sig-enc"] + image_repos = [EXPERIMENT_IMAGE_REPO, LOCAL_REGISTRY_URL] + num_parallel_instances = [16] + num_runs = 1 + + if repo is not None: + if repo in image_repos: + image_repos = [repo] + else: + raise RuntimeError("Unrecognised image repository: {}".format(repo)) + + results_dir = join(RESULTS_DIR, "xput-detail") + if not exists(results_dir): + makedirs(results_dir) + + if not exists(EVAL_TEMPLATED_DIR): + makedirs(EVAL_TEMPLATED_DIR) + + service_template_file = join(APPS_DIR, "xput-detail", "service.yaml.j2") + image_name = "csegarragonz/coco-helloworld-py" + 
used_images = ["csegarragonz/coco-knative-sidecar", image_name] + + for image_repo in image_repos: + replace_sidecar(image_repo=image_repo, quiet=True) + + for bline in baselines_to_run: + baseline_traits = BASELINES[bline] + + # Template as many service files as parallel instances + for i in range(max(num_parallel_instances)): + service_file = join( + EVAL_TEMPLATED_DIR, + "apps_xput-detail_{}_{}_service_{}.yaml".format( + image_repo, bline, i + ), + ) + template_vars = { + "image_repo": image_repo, + "image_name": image_name, + "image_tag": baseline_traits["image_tag"], + "service_num": i, + } + if len(baseline_traits["runtime_class"]) > 0: + template_vars["runtime_class"] = baseline_traits["runtime_class"] + template_k8s_file(service_template_file, service_file, template_vars) + + # Second, run any baseline-specific set-up + setup_baseline(bline, used_images, image_repo) + + for num_par in num_parallel_instances: + # Prepare the result file + result_file = join( + results_dir, "{}_{}_{}.csv".format(image_repo, bline, num_par) + ) + init_csv_file(result_file, "ServiceId,Event,TimeStampSecs") + + for nr in range(num_runs): + print( + "Executing baseline {} ({} par srv, {}) run {}/{}...".format( + bline, num_par, image_repo, nr + 1, num_runs + ) + ) + do_run(result_file, bline, image_repo, nr, num_par) + sleep(INTER_RUN_SLEEP_SECS) + cleanup_after_run(bline, used_images) + + +@task +def plot(ctx): + """ + Plot the costs associated with starting a fixed number of concurrent + services + """ + results_dir = join(RESULTS_DIR, "xput-detail") + plots_dir = join(PLOTS_DIR, "xput-detail") + baseline = "coco-fw-sig-enc" + num_par_instances = 16 + image_repos = [EXPERIMENT_IMAGE_REPO, LOCAL_REGISTRY_URL] + + results_file = join(results_dir, "{}_{}.csv".format(baseline, num_par_instances)) + + # Collect results + results_dict = {} + for image_repo in image_repos: + results_file = join( + results_dir, "{}_{}_{}.csv".format(image_repo, baseline, num_par_instances) + ) + results_dict[image_repo] = {} + results = read_csv(results_file) + service_ids = set(results["ServiceId"].to_list()) + for service_id in service_ids: + results_dict[image_repo][service_id] = {} + service_results = results[results.ServiceId == service_id] + groupped = service_results.groupby("Event", as_index=False) + events = list(groupped.groups.keys()) + for event in events: + results_dict[image_repo][service_id][event] = { + "mean": service_results[service_results.Event == event][ + "TimeStampSecs" + ].mean(), + "sem": service_results[service_results.Event == event][ + "TimeStampSecs" + ].sem(), + } + + ordered_events = { + "schedule + make-pod-sandbox": ("PodScheduled", "SandboxReady"), + "pull-images + start-containrs": ("SandboxReady", "ContainersReady"), + } + color_for_event = { + "schedule + make-pod-sandbox": "blue", + "pull-images + start-containrs": "yellow", + } + pattern_for_repo = {EXPERIMENT_IMAGE_REPO: "x", LOCAL_REGISTRY_URL: "|"} + name_for_repo = {EXPERIMENT_IMAGE_REPO: "ghcr", LOCAL_REGISTRY_URL: "local"} + + assert list(color_for_event.keys()) == list(ordered_events.keys()) + assert list(pattern_for_repo.keys()) == list(name_for_repo.keys()) + + # -------------------------- + # Time-series of the different services instantiation + # -------------------------- + + fig, ax = subplots() + + bar_height = 0.5 + + for ind, repo in enumerate(image_repos): + # Y coordinate of the bar + ys = [] + # Width of each bar + widths = [] + # x-axis offset of each bar + xs = [] + # labels = [] + colors = [] + + x_origin = min( + 
[results_dict[repo][s_id]["PodScheduled"]["mean"] for s_id in service_ids] + ) + + service_ids = sorted( + service_ids, + key=lambda x: results_dict[repo][x]["ContainersReady"]["mean"] + - results_dict[repo][x]["PodScheduled"]["mean"], + ) + + for num, service_id in enumerate(service_ids): + for event in ordered_events: + start_ev = ordered_events[event][0] + end_ev = ordered_events[event][1] + x_left = results_dict[repo][service_id][start_ev]["mean"] + x_right = results_dict[repo][service_id][end_ev]["mean"] + widths.append(x_right - x_left) + xs.append(x_left - x_origin) + ys.append(num * (bar_height * 2) + bar_height * ind) + colors.append(color_for_event[event]) + + ax.barh( + ys, + widths, + height=bar_height, + left=xs, + align="edge", + edgecolor="black", + color=colors, + hatch=pattern_for_repo[repo], + alpha=1 - 0.7 * ind, + ) + + # Misc + ax.set_xlabel("Time [s]") + ax.set_ylim(bottom=0, top=(len(service_ids)) * (bar_height * 2)) + ax.set_ylabel("Knative Service Id") + yticks = [i * (bar_height * 2) for i in range(len(service_ids) + 1)] + yticks_minor = [(i + 0.5) * (bar_height * 2) for i in range(len(service_ids))] + ytick_labels = ["S{}".format(i) for i in range(len(service_ids))] + ax.set_yticks(yticks) + ax.set_yticks(yticks_minor, minor=True) + ax.set_yticklabels(ytick_labels, minor=True) + ax.set_yticklabels([]) + title_str = "Breakdown of the time spent starting 16 services in parallel\n" + title_str += "(baseline: {})\n".format( + baseline, + ) + ax.set_title(title_str) + + # Manually craft the legend + legend_handles = [] + for event in ordered_events: + legend_handles.append( + Patch( + facecolor=color_for_event[event], + edgecolor="black", + label=event, + ) + ) + for ind, repo in enumerate(image_repos): + legend_handles.append( + Patch( + hatch=pattern_for_repo[repo], + facecolor="white", + edgecolor="black", + label="Image registry: {}".format(name_for_repo[repo]), + ) + ) + ax.legend(handles=legend_handles, bbox_to_anchor=(1.05, 1.05)) + + for plot_format in ["pdf", "png"]: + plot_file = join(plots_dir, "xput_detail.{}".format(plot_format)) + fig.savefig(plot_file, format=plot_format, bbox_inches="tight") diff --git a/tasks/kata.py b/tasks/kata.py index 32379e7..1a72655 100644 --- a/tasks/kata.py +++ b/tasks/kata.py @@ -1,12 +1,19 @@ from invoke import task -from os import makedirs -from os.path import dirname, join +from os.path import join from subprocess import run -from tasks.util.env import KATA_CONFIG_DIR, KATA_IMG_DIR, KATA_RUNTIMES, PROJ_ROOT -from tasks.util.toml import remove_entry_from_toml, update_toml +from tasks.util.env import ( + COCO_ROOT, + KATA_CONFIG_DIR, + KATA_RUNTIMES, +) +from tasks.util.kata import ( + KATA_AGENT_SOURCE_DIR, + KATA_SOURCE_DIR, + replace_agent as do_replace_agent, +) +from tasks.util.toml import update_toml -KATA_SOURCE_DIR = join(PROJ_ROOT, "..", "kata-containers") -KATA_AGENT_SOURCE_DIR = join(KATA_SOURCE_DIR, "src", "agent") +KATA_SHIM_SOURCE_DIR = join(KATA_SOURCE_DIR, "src", "runtime") @task @@ -44,7 +51,7 @@ def set_log_level(ctx, log_level): @task -def replace_agent(ctx, agent_source_dir=KATA_AGENT_SOURCE_DIR): +def replace_agent(ctx, agent_source_dir=KATA_AGENT_SOURCE_DIR, extra_files=None): """ Replace the kata-agent with a custom-built one @@ -56,62 +63,37 @@ def replace_agent(ctx, agent_source_dir=KATA_AGENT_SOURCE_DIR): 3. Replace the init process by the new kata agent 4. Re-build the initrd 5. 
Update the kata config to point to the new initrd + + By using the extra_flags optional argument, you can pass a dictionary of + host_path: guest_path pairs of files you want to be included in the initrd. """ - # Use a hardcoded path, as we want to always start from a _clean_ initrd - initrd_path = join(KATA_IMG_DIR, "kata-containers-initrd-sev.img") - - # Make empty temporary dir to expand the initrd filesystem - workdir = "/tmp/qemu-sev-initrd" - run("sudo rm -rf {}".format(workdir), shell=True, check=True) - makedirs(workdir) - - # sudo unpack the initrd filesystem - zcat_cmd = "sudo bash -c 'zcat {} | cpio -idmv'".format(initrd_path) - run(zcat_cmd, shell=True, check=True, cwd=workdir) - - # Copy our newly built kata-agent into `/usr/bin/kata-agent` as this is the - # path expected by the kata initrd_builder.sh script - agent_host_path = join( - KATA_AGENT_SOURCE_DIR, - "target", - "x86_64-unknown-linux-musl", - "release", - "kata-agent", - ) - agent_initrd_path = join(workdir, "usr/bin/kata-agent") - cp_cmd = "sudo cp {} {}".format(agent_host_path, agent_initrd_path) - run(cp_cmd, shell=True, check=True) - - # We also need to manually copy the agent to /sbin/init (note that - # /init is a symlink to /sbin/init) - alt_agent_initrd_path = join(workdir, "sbin", "init") - run("sudo rm {}".format(alt_agent_initrd_path), shell=True, check=True) - cp_cmd = "sudo cp {} {}".format(agent_host_path, alt_agent_initrd_path) - run(cp_cmd, shell=True, check=True) - - # Pack the initrd again - initrd_builder_path = join( - KATA_SOURCE_DIR, "tools", "osbuilder", "initrd-builder", "initrd_builder.sh" - ) - new_initrd_path = join(dirname(initrd_path), "kata-containers-initrd-sev-csg.img") - work_env = {"AGENT_INIT": "yes"} - initrd_pack_cmd = "env && sudo {} -o {} {}".format( - initrd_builder_path, - new_initrd_path, - workdir, + do_replace_agent(agent_source_dir=agent_source_dir, extra_files=extra_files) + + +@task +def replace_shim(ctx, shim_source_dir=KATA_SHIM_SOURCE_DIR, revert=False): + """ + Replace the containerd-kata-shim with a custom one + + To replace the agent, we just need to change the soft-link from the right + shim to our re-built one + """ + # First, copy the binary from the source tree + src_shim_binary = join(shim_source_dir, "containerd-shim-kata-v2") + dst_shim_binary = join(COCO_ROOT, "bin", "containerd-shim-kata-v2-csg") + run( + "sudo cp {} {}".format(src_shim_binary, dst_shim_binary), shell=True, check=True ) - run(initrd_pack_cmd, shell=True, check=True, env=work_env) - # Lastly, update the Kata config to point to the new initrd - for runtime in KATA_RUNTIMES: - conf_file_path = join(KATA_CONFIG_DIR, "configuration-{}.toml".format(runtime)) - updated_toml_str = """ - [hypervisor.qemu] - initrd = "{new_initrd_path}" - """.format( - new_initrd_path=new_initrd_path - ) - update_toml(conf_file_path, updated_toml_str) + # Second, soft-link the SEV runtime to the right shim binary + if revert: + dst_shim_binary = join(COCO_ROOT, "bin", "containerd-shim-kata-v2") - if runtime == "qemu": - remove_entry_from_toml(conf_file_path, "hypervisor.qemu.image") + # This path is hardcoded in the containerd config/operator + sev_shim_binary = "/usr/local/bin/containerd-shim-kata-qemu-sev-v2" + + run( + "sudo ln -sf {} {}".format(dst_shim_binary, sev_shim_binary), + shell=True, + check=True, + ) diff --git a/tasks/kbs.py b/tasks/kbs.py index db1a13f..518a02f 100644 --- a/tasks/kbs.py +++ b/tasks/kbs.py @@ -50,11 +50,11 @@ def stop(ctx): @task -def clear_db(ctx): +def clear_db(ctx, 
skip_secrets=False): """ Clear the contents of the KBS DB """ - clear_kbs_db() + clear_kbs_db(skip_secrets=skip_secrets) @task @@ -84,7 +84,11 @@ def provision_launch_digest(ctx, signature_policy=SIGNATURE_POLICY_NONE, clean=F # policy to be included in the signature policy images_to_sign = [ "docker.io/csegarragonz/coco-helloworld-py", - "docker.io/csegarragonz/coco-knatve-sidecar", + "docker.io/csegarragonz/coco-knative-sidecar", + "ghcr.io/csegarragonz/coco-helloworld-py", + "ghcr.io/csegarragonz/coco-knative-sidecar", + "registry.coco-csg.com/csegarragonz/coco-helloworld-py", + "registry.coco-csg.com/csegarragonz/coco-knative-sidecar", ] do_provision_launch_digest( diff --git a/tasks/knative.py b/tasks/knative.py index 17366c8..624bf8d 100644 --- a/tasks/knative.py +++ b/tasks/knative.py @@ -1,9 +1,10 @@ from invoke import task -from os import makedirs -from os.path import exists, join -from subprocess import run -from tasks.util.env import CONF_FILES_DIR, TEMPLATED_FILES_DIR -from tasks.util.k8s import template_k8s_file +from os.path import join +from tasks.util.env import CONF_FILES_DIR +from tasks.util.knative import ( + configure_self_signed_certs as do_configure_self_signed_certs, + replace_sidecar as do_replace_sidecar, +) from tasks.util.kubeadm import run_kubectl_command, wait_for_pods_in_ns from time import sleep @@ -20,12 +21,6 @@ KOURIER_BASE_URL = "https://github.com/knative/net-kourier/releases/download" KOURIER_BASE_URL += "/knative-v{}".format(KNATIVE_VERSION) -# Knative Serving Side-Car Tag -KNATIVE_SIDECAR_IMAGE_TAG = "gcr.io/knative-releases/knative.dev/serving/cmd/" -KNATIVE_SIDECAR_IMAGE_TAG += ( - "queue@sha256:987f53e3ead58627e3022c8ccbb199ed71b965f10c59485bab8015ecf18b44af" -) - def install_kourier(): kube_cmd = "apply -f {}".format(join(KOURIER_BASE_URL, "kourier.yaml")) @@ -147,7 +142,7 @@ def install(ctx): wait_for_pods_in_ns(KNATIVE_NAMESPACE, label="app=default-domain") # Replace the sidecar to use an image we control - replace_sidecar(ctx) + do_replace_sidecar() print("Succesfully deployed Knative! The external IP is: {}".format(actual_ip)) @@ -181,7 +176,7 @@ def uninstall(ctx): @task -def replace_sidecar(ctx, reset_default=False): +def replace_sidecar(ctx, reset_default=False, image_repo="ghcr.io"): """ Replace Knative's side-car image with an image we control @@ -190,58 +185,12 @@ def replace_sidecar(ctx, reset_default=False): default side-car image. Instead, we re-tag the corresponding image, and update Knative's deployment ConfigMap to use our image. 
""" - k8s_filename = "knative_replace_sidecar.yaml" - - if reset_default: - in_k8s_file = join(CONF_FILES_DIR, "{}.j2".format(k8s_filename)) - out_k8s_file = join(TEMPLATED_FILES_DIR, k8s_filename) - template_k8s_file( - in_k8s_file, - out_k8s_file, - {"knative_sidecar_image_url": KNATIVE_SIDECAR_IMAGE_TAG}, - ) - run_kubectl_command("apply -f {}".format(out_k8s_file)) - return - - # Pull the right Knative Serving side-car image tag - docker_cmd = "docker pull {}".format(KNATIVE_SIDECAR_IMAGE_TAG) - run(docker_cmd, shell=True, check=True) - - # Re-tag it, and push it to our controlled registry - image_repo = "ghcr.io" - image_name = "csegarragonz/coco-knative-sidecar" - image_tag = "unencrypted" - new_image_url = "{}/{}:{}".format(image_repo, image_name, image_tag) - docker_cmd = "docker tag {} {}".format(KNATIVE_SIDECAR_IMAGE_TAG, new_image_url) - run(docker_cmd, shell=True, check=True) - - docker_cmd = "docker push {}".format(new_image_url) - run(docker_cmd, shell=True, check=True) - - # Get the digest for the recently pulled image, and use it to update - # Knative's deployment configmap - docker_cmd = 'docker images {} --digests --format "{{{{.Digest}}}}"'.format( - join(image_repo, image_name), - ) - image_digest = ( - run(docker_cmd, shell=True, capture_output=True).stdout.decode("utf-8").strip() - ) - assert len(image_digest) > 0 + do_replace_sidecar(reset_default, image_repo) - if not exists(TEMPLATED_FILES_DIR): - makedirs(TEMPLATED_FILES_DIR) - in_k8s_file = join(CONF_FILES_DIR, "{}.j2".format(k8s_filename)) - out_k8s_file = join(TEMPLATED_FILES_DIR, k8s_filename) - new_image_url_digest = "{}/{}@{}".format(image_repo, image_name, image_digest) - template_k8s_file( - in_k8s_file, out_k8s_file, {"knative_sidecar_image_url": new_image_url_digest} - ) - run_kubectl_command("apply -f {}".format(out_k8s_file)) - - # Finally, make sure to remove all pulled container images to avoid - # unintended caching issues with CoCo - docker_cmd = "docker rmi {}".format(KNATIVE_SIDECAR_IMAGE_TAG) - run(docker_cmd, shell=True, check=True) - docker_cmd = "docker rmi {}".format(new_image_url) - run(docker_cmd, shell=True, check=True) +@task +def configure_self_signed_certs(ctx, path_to_certs_dir): + """ + Configure Knative to like our self-signed certificates + """ + do_configure_self_signed_certs(path_to_certs_dir) diff --git a/tasks/registry.py b/tasks/registry.py new file mode 100644 index 0000000..ab27a17 --- /dev/null +++ b/tasks/registry.py @@ -0,0 +1,197 @@ +from invoke import task +from os import makedirs +from os.path import exists, join +from subprocess import run +from tasks.util.docker import is_ctr_running +from tasks.util.env import K8S_CONFIG_DIR, LOCAL_REGISTRY_URL +from tasks.util.env import get_node_url +from tasks.util.kata import replace_agent +from tasks.util.knative import configure_self_signed_certs +from tasks.util.kubeadm import run_kubectl_command +from tasks.util.toml import update_toml + +HOST_CERT_DIR = join(K8S_CONFIG_DIR, "local-registry") +GUEST_CERT_DIR = "/certs" +REGISTRY_KEY_FILE = "domain.key" +HOST_KEY_PATH = join(HOST_CERT_DIR, REGISTRY_KEY_FILE) +REGISTRY_CERT_FILE = "domain.crt" +HOST_CERT_PATH = join(HOST_CERT_DIR, REGISTRY_CERT_FILE) +REGISTRY_CTR_NAME = "csg-coco-registry" + +REGISTRY_IMAGE_TAG = "registry:2.7" + +K8S_SECRET_NAME = "csg-coco-registry-customca" + + +@task +def start(ctx): + """ + Configure a local container registry reachable from CoCo guests in K8s + """ + this_ip = get_node_url() + + # ---------- + # DNS Config + # ---------- + + # Add DNS 
entry (careful to be able to sudo-edit the file) + dns_file = "/etc/hosts" + dns_contents = ( + run("sudo cat {}".format(dns_file), shell=True, capture_output=True) + .stdout.decode("utf-8") + .strip() + .split("\n") + ) + + # Only write the DNS entry if it is not there yet + dns_line = "{} {}".format(this_ip, LOCAL_REGISTRY_URL) + must_write = not any([dns_line in line for line in dns_contents]) + + if must_write: + actual_dns_line = "\n# CSG: DNS entry for local registry\n{}".format(dns_line) + write_cmd = "sudo sh -c \"echo '{}' >> {}\"".format(actual_dns_line, dns_file) + run(write_cmd, shell=True, check=True) + + # If creating a new registry, also update the local SSL certificates + system_cert_path = "/usr/share/ca-certificates/coco_csg_registry.crt" + run( + "sudo cp {} {}".format(HOST_CERT_PATH, system_cert_path), + shell=True, + check=True, + ) + run("sudo dpkg-reconfigure ca-certificates") + + # ---------- + # Docker Registry Config + # ---------- + + # Create certificates for registry + if not exists(HOST_CERT_DIR): + makedirs(HOST_CERT_DIR) + + openssl_cmd = [ + "openssl req", + "-newkey rsa:4096", + "-nodes -sha256", + "-keyout {}".format(HOST_KEY_PATH), + '-addext "subjectAltName = DNS:{}"'.format(LOCAL_REGISTRY_URL), + "-x509 -days 365", + "-out {}".format(HOST_CERT_PATH), + ] + openssl_cmd = " ".join(openssl_cmd) + if not exists(HOST_CERT_PATH): + run(openssl_cmd, shell=True, check=True) + + # Start self-hosted local registry with HTTPS + docker_cmd = [ + "docker run -d", + "--restart=always", + "--name {}".format(REGISTRY_CTR_NAME), + "-v {}:{}".format(HOST_CERT_DIR, GUEST_CERT_DIR), + "-e REGISTRY_HTTP_ADDR=0.0.0.0:443", + "-e REGISTRY_HTTP_TLS_CERTIFICATE={}".format( + join(GUEST_CERT_DIR, REGISTRY_CERT_FILE) + ), + "-e REGISTRY_HTTP_TLS_KEY={}".format(join(GUEST_CERT_DIR, REGISTRY_KEY_FILE)), + "-p 443:443", + REGISTRY_IMAGE_TAG, + ] + docker_cmd = " ".join(docker_cmd) + if not is_ctr_running(REGISTRY_CTR_NAME): + out = run(docker_cmd, shell=True, capture_output=True) + assert out.returncode == 0, "Failed starting docker container: {}".format( + out.stderr + ) + else: + print("WARNING: skipping starting container as it is already running...") + + # Configure docker to be able to push to this registry + docker_certs_dir = join("/etc/docker/certs.d", LOCAL_REGISTRY_URL) + if not exists(docker_certs_dir): + run("sudo mkdir -p {}".format(docker_certs_dir), shell=True, check=True) + + docker_ca_cert_file = join(docker_certs_dir, "ca.crt") + cp_cmd = "sudo cp {} {}".format(HOST_CERT_PATH, docker_ca_cert_file) + run(cp_cmd, shell=True, check=True) + + # ---------- + # containerd config + # ---------- + + containerd_base_certs_dir = "/etc/containerd/certs.d" + updated_toml_str = """ + [plugins."io.containerd.grpc.v1.cri".registry] + config_path = "{containerd_base_certs_dir}" + """.format( + containerd_base_certs_dir=containerd_base_certs_dir + ) + update_toml("/etc/containerd/config.toml", updated_toml_str) + + # Add the correspnding configuration to containerd + containerd_certs_dir = join(containerd_base_certs_dir, LOCAL_REGISTRY_URL) + if not exists(containerd_certs_dir): + run("sudo mkdir -p {}".format(containerd_certs_dir), shell=True, check=True) + + containerd_certs_file = """ +server = "https://{registry_url}" + +[host."https://{registry_url}"] + skip_verify = true + """.format( + registry_url=LOCAL_REGISTRY_URL + ) + run( + "sudo sh -c \"echo '{}' > {}\"".format( + containerd_certs_file, join(containerd_certs_dir, "hosts.toml") + ), + shell=True, + check=True, + ) + + 
# Restart containerd to pick up the changes + run("sudo service containerd restart", shell=True, check=True) + + # ---------- + # Kata config + # ---------- + + # Populate the right DNS config and certificate files in the agent + extra_files = { + dns_file: {"path": "/etc/hosts", "mode": "w"}, + HOST_CERT_PATH: {"path": "/etc/ssl/certs/ca-certificates.crt", "mode": "a"}, + } + replace_agent(extra_files=extra_files) + + # ---------- + # Knative config + # ---------- + + # First, create a k8s secret with the credentials + kube_cmd = ( + "-n knative-serving create secret generic {} --from-file=ca.crt={}".format( + K8S_SECRET_NAME, HOST_CERT_PATH + ) + ) + run_kubectl_command(kube_cmd) + + # Second, patch the controller deployment + configure_self_signed_certs(HOST_CERT_PATH, K8S_SECRET_NAME) + + +@task +def stop(ctx): + """ + Remove the container registry in the k8s cluster + + We follow the steps in start in reverse order, paying particular attention + to the steps that are not idempotent (e.g. creating a k8s secret). + """ + # For Knative, we only need to delete the secret, as the other bit is a + # patch to the controller deployment that can be applied again + kube_cmd = "-n knative-serving delete secret {}".format(K8S_SECRET_NAME) + run_kubectl_command(kube_cmd) + + # For Kata and containerd, all configuration is reversible, so we only + # need to stop the registry container + docker_cmd = "docker rm -f {}".format(REGISTRY_CTR_NAME) + run(docker_cmd, shell=True, check=True) diff --git a/tasks/skopeo.py b/tasks/skopeo.py index 5771c9d..01e3275 100644 --- a/tasks/skopeo.py +++ b/tasks/skopeo.py @@ -1,48 +1,5 @@ -from base64 import b64encode from invoke import task -from json import loads as json_loads -from os.path import exists, join -from subprocess import run -from tasks.util.cosign import sign_container_image -from tasks.util.env import CONF_FILES_DIR, K8S_CONFIG_DIR -from tasks.util.guest_components import ( - start_coco_keyprovider, - stop_coco_keyprovider, -) -from tasks.util.kbs import create_kbs_secret - -SKOPEO_VERSION = "1.13.0" -SKOPEO_IMAGE = "quay.io/skopeo/stable:v{}".format(SKOPEO_VERSION) -SKOPEO_ENCRYPTION_KEY = join(K8S_CONFIG_DIR, "image_enc.key") -AA_CTR_ENCRYPTION_KEY = "/tmp/image_enc.key" - - -def run_skopeo_cmd(cmd, capture_stdout=False): - ocicrypt_conf_host = join(CONF_FILES_DIR, "ocicrypt.conf") - ocicrypt_conf_guest = "/ocicrypt.conf" - skopeo_cmd = [ - "docker run --rm", - "--net host", - "-e OCICRYPT_KEYPROVIDER_CONFIG={}".format(ocicrypt_conf_guest), - "-v {}:{}".format(ocicrypt_conf_host, ocicrypt_conf_guest), - "-v ~/.docker/config.json:/config.json", - SKOPEO_IMAGE, - cmd, - ] - skopeo_cmd = " ".join(skopeo_cmd) - if capture_stdout: - return ( - run(skopeo_cmd, shell=True, capture_output=True) - .stdout.decode("utf-8") - .strip() - ) - else: - run(skopeo_cmd, shell=True, check=True) - - -def create_encryption_key(): - cmd = "head -c32 < /dev/random > {}".format(SKOPEO_ENCRYPTION_KEY) - run(cmd, shell=True, check=True) +from tasks.util.skopeo import encrypt_container_image as do_encrypt_container_image @task @@ -50,56 +7,6 @@ def encrypt_container_image(ctx, image_tag, sign=False): """ Encrypt an OCI container image using Skopeo - The image tag must be provided in the format: docker.io//:tag + The image tag must be provided in the format: //: """ - encryption_key_resource_id = "default/image-encryption-key/1" - if not exists(SKOPEO_ENCRYPTION_KEY): - create_encryption_key() - - # We use CoCo's keyprovider server (that implements the ocicrypt 
protocol) - # to encrypt the OCI image. To that extent, we need to mount the encryption - # key somewhere that the attestation agent (in the keyprovider) can find - # it - start_coco_keyprovider(SKOPEO_ENCRYPTION_KEY, AA_CTR_ENCRYPTION_KEY) - - encrypted_image_tag = image_tag.split(":")[0] + ":encrypted" - skopeo_cmd = [ - "copy --insecure-policy", - "--authfile /config.json", - "--encryption-key", - "provider:attestation-agent:keyid=kbs:///{}::keypath={}".format( - encryption_key_resource_id, AA_CTR_ENCRYPTION_KEY - ), - "docker://{}".format(image_tag), - "docker://{}".format(encrypted_image_tag), - ] - skopeo_cmd = " ".join(skopeo_cmd) - run_skopeo_cmd(skopeo_cmd) - - # Stop the keyprovider when we are done encrypting layers - stop_coco_keyprovider() - - # Sanity check that the image is actually encrypted - inspect_jsonstr = run_skopeo_cmd( - "inspect --authfile /config.json docker://{}".format(encrypted_image_tag), - capture_stdout=True, - ) - inspect_json = json_loads(inspect_jsonstr) - layers = [ - layer["MIMEType"].endswith("tar+gzip+encrypted") - for layer in inspect_json["LayersData"] - ] - if not all(layers): - print("Some layers in image {} are not encrypted!".format(encrypted_image_tag)) - stop_coco_keyprovider() - raise RuntimeError("Image encryption failed!") - - # Create a secret in KBS with the encryption key. Skopeo needs it as raw - # bytes, whereas KBS wants it base64 encoded, so we do the conversion first - with open(SKOPEO_ENCRYPTION_KEY, "rb") as fh: - key_b64 = b64encode(fh.read()).decode() - - create_kbs_secret(encryption_key_resource_id, key_b64) - - if sign: - sign_container_image(encrypted_image_tag) + do_encrypt_container_image(image_tag, sign=sign) diff --git a/tasks/util/coco.py b/tasks/util/coco.py index 83eb960..710c1ed 100644 --- a/tasks/util/coco.py +++ b/tasks/util/coco.py @@ -1,5 +1,5 @@ from os.path import join -from tasks.util.env import KATA_CONFIG_DIR, KBS_PORT, get_kbs_url +from tasks.util.env import KATA_CONFIG_DIR, KBS_PORT, get_node_url from tasks.util.toml import read_value_from_toml, update_toml @@ -30,7 +30,7 @@ def guest_attestation(mode="off"): [hypervisor.qemu] guest_pre_attestation_kbs_uri = "{kbs_url}:{kbs_port}" """.format( - kbs_url=get_kbs_url(), kbs_port=KBS_PORT + kbs_url=get_node_url(), kbs_port=KBS_PORT ) update_toml(conf_file_path, updated_toml_str) diff --git a/tasks/util/containerd.py b/tasks/util/containerd.py index 475f874..a089f78 100644 --- a/tasks/util/containerd.py +++ b/tasks/util/containerd.py @@ -3,18 +3,27 @@ from time import sleep -def get_journalctl_containerd_logs(): - journalctl_cmd = 'sudo journalctl -xeu containerd --since "1 min ago" -o json' - out = ( - run(journalctl_cmd, shell=True, capture_output=True) - .stdout.decode("utf-8") - .strip() - .split("\n") - ) - return out +def get_journalctl_containerd_logs(timeout_mins=1): + """ + Get the journalctl logs for containerd + + We dump them to a temporary file to prevent the Popen output from being + clipped (or at least remove the chance of it being so) + """ + tmp_file = "/tmp/journalctl.log" + journalctl_cmd = "sudo journalctl -xeu containerd --no-tail " + journalctl_cmd += '--since "{} min ago" -o json > {}'.format(timeout_mins, tmp_file) + run(journalctl_cmd, shell=True, check=True) + + with open(tmp_file, "r") as fh: + lines = fh.readlines() + return lines -def get_event_from_containerd_logs(event_name, event_id, num_events): + +def get_event_from_containerd_logs( + event_name, event_id, num_events, extra_event_id=None, timeout_mins=1 +): """ Get the last 
`num_events` events in containerd logs that correspond to the
     `event_name` for sandbox/pod/container id `event_id`
@@ -26,7 +35,7 @@ def get_event_from_containerd_logs(event_name, event_id, num_events):
     backoff_secs = 3
     for i in range(num_repeats):
         try:
-            out = get_journalctl_containerd_logs()
+            out = get_journalctl_containerd_logs(timeout_mins)
 
             event_json = []
             for o in out:
@@ -45,7 +54,11 @@ def get_event_from_containerd_logs(event_name, event_id, num_events):
                         event_name in o_json["MESSAGE"]
                         and event_id in o_json["MESSAGE"]
                     ):
-                        event_json.append(o_json)
+                        if (
+                            extra_event_id is None
+                            or extra_event_id in o_json["MESSAGE"]
+                        ):
+                            event_json.append(o_json)
                 except TypeError as e:
                     print(o_json)
                     print(e)
@@ -70,11 +83,19 @@ def get_event_from_containerd_logs(event_name, event_id, num_events):
             continue
 
 
-def get_ts_for_containerd_event(event_name, event_id, lower_bound=None):
+def get_ts_for_containerd_event(
+    event_name,
+    event_id,
+    lower_bound=None,
+    extra_event_id=None,
+    timeout_mins=1,
+):
     """
     Get the journalctl timestamp for one event in the containerd logs
     """
-    event_json = get_event_from_containerd_logs(event_name, event_id, 1)[0]
+    event_json = get_event_from_containerd_logs(
+        event_name, event_id, 1, extra_event_id=extra_event_id, timeout_mins=timeout_mins
+    )[0]
     ts = int(event_json["__REALTIME_TIMESTAMP"]) / 1e6
 
     if lower_bound is not None:
@@ -87,12 +108,24 @@ def get_ts_for_containerd_event(event_name, event_id, lower_bound=None):
     return ts
 
 
-def get_start_end_ts_for_containerd_event(event_name, event_id, lower_bound=None):
+def get_start_end_ts_for_containerd_event(
+    event_name,
+    event_id,
+    lower_bound=None,
+    extra_event_id=None,
+    timeout_mins=1,
+):
     """
     Get the start and end timestamps (in epoch floating seconds) for a given
     event from the containerd journalctl logs
     """
-    event_json = get_event_from_containerd_logs(event_name, event_id, 2)
+    event_json = get_event_from_containerd_logs(
+        event_name,
+        event_id,
+        2,
+        extra_event_id=extra_event_id,
+        timeout_mins=timeout_mins,
+    )
 
     start_ts = int(event_json[-2]["__REALTIME_TIMESTAMP"]) / 1e6
     end_ts = int(event_json[-1]["__REALTIME_TIMESTAMP"]) / 1e6
diff --git a/tasks/util/docker.py b/tasks/util/docker.py
new file mode 100644
index 0000000..4cdfad6
--- /dev/null
+++ b/tasks/util/docker.py
@@ -0,0 +1,15 @@
+from subprocess import run
+
+
+def is_ctr_running(ctr_name):
+    """
+    Work out whether a container is running or not
+    """
+    docker_cmd = ["docker container inspect", "-f '{{.State.Running}}'", ctr_name]
+    docker_cmd = " ".join(docker_cmd)
+    out = run(docker_cmd, shell=True, capture_output=True)
+    if out.returncode == 0:
+        value = out.stdout.decode("utf-8").strip()
+        return value == "true"
+
+    return False
diff --git a/tasks/util/env.py b/tasks/util/env.py
index 6b705b9..249647a 100644
--- a/tasks/util/env.py
+++ b/tasks/util/env.py
@@ -22,6 +22,10 @@ CRI_RUNTIME_SOCKET = "unix:///run/containerd/containerd.sock"
 
 FLANNEL_INSTALL_DIR = join(GLOBAL_INSTALL_DIR, "flannel")
 
+# Image Registry config
+
+LOCAL_REGISTRY_URL = "registry.coco-csg.com"
+
 # MicroK8s config
 
 UK8S_KUBECONFIG_FILE = join(K8S_CONFIG_DIR, "uk8s_kubeconfig")
@@ -50,9 +54,11 @@ KBS_PORT = 44444
 
 
-def get_kbs_url():
+def get_node_url():
     """
-    Get the external KBS IP that can be reached from both host and guest
+    Get the external node IP that can be reached from both host and guest
+
+    This IP is used both for the KBS and for deploying a local docker registry.
If the KBS is deployed using docker compose with host networking and the port is forwarded to the host (i.e. KBS is bound to :${KBS_PORT}, then diff --git a/tasks/util/kata.py b/tasks/util/kata.py new file mode 100644 index 0000000..684a38c --- /dev/null +++ b/tasks/util/kata.py @@ -0,0 +1,116 @@ +from os import makedirs +from os.path import dirname, exists, join +from subprocess import run +from tasks.util.env import KATA_CONFIG_DIR, KATA_IMG_DIR, KATA_RUNTIMES, PROJ_ROOT +from tasks.util.toml import remove_entry_from_toml, update_toml + +KATA_SOURCE_DIR = join(PROJ_ROOT, "..", "kata-containers") +KATA_AGENT_SOURCE_DIR = join(KATA_SOURCE_DIR, "src", "agent") + + +def replace_agent(agent_source_dir=KATA_AGENT_SOURCE_DIR, extra_files=None): + """ + Replace the kata-agent with a custom-built one + + Replacing the kata-agent is a bit fiddly, as the kata-agent binary lives + inside the initrd guest image that we load to the VM. The replacement + includes the following steps: + 1. Find the initrd file - should be pointed in the Kata config file + 2. Unpack the initrd + 3. Replace the init process by the new kata agent + 4. Re-build the initrd + 5. Update the kata config to point to the new initrd + + By using the extra_flags optional argument, you can pass a dictionary of + host_path: guest_path pairs of files you want to be included in the initrd. + """ + + # Use a hardcoded path, as we want to always start from a _clean_ initrd + initrd_path = join(KATA_IMG_DIR, "kata-containers-initrd-sev.img") + + # Make empty temporary dir to expand the initrd filesystem + workdir = "/tmp/qemu-sev-initrd" + run("sudo rm -rf {}".format(workdir), shell=True, check=True) + makedirs(workdir) + + # sudo unpack the initrd filesystem + zcat_cmd = "sudo bash -c 'zcat {} | cpio -idmv'".format(initrd_path) + out = run(zcat_cmd, shell=True, capture_output=True, cwd=workdir) + assert out.returncode == 0, "Error unpacking initrd: {}".format(out.stderr) + + # Copy our newly built kata-agent into `/usr/bin/kata-agent` as this is the + # path expected by the kata initrd_builder.sh script + agent_host_path = join( + KATA_AGENT_SOURCE_DIR, + "target", + "x86_64-unknown-linux-musl", + "release", + "kata-agent", + ) + agent_initrd_path = join(workdir, "usr/bin/kata-agent") + cp_cmd = "sudo cp {} {}".format(agent_host_path, agent_initrd_path) + run(cp_cmd, shell=True, check=True) + + # We also need to manually copy the agent to /sbin/init (note that + # /init is a symlink to /sbin/init) + alt_agent_initrd_path = join(workdir, "sbin", "init") + run("sudo rm {}".format(alt_agent_initrd_path), shell=True, check=True) + cp_cmd = "sudo cp {} {}".format(agent_host_path, alt_agent_initrd_path) + run(cp_cmd, shell=True, check=True) + + # Include any extra files that the caller may have provided + if extra_files is not None: + for host_path in extra_files: + # Trim any absolute paths expressed as "guest" paths to be able to + # append the rootfs + rel_guest_path = extra_files[host_path]["path"] + if rel_guest_path.startswith("/"): + rel_guest_path = rel_guest_path[1:] + + guest_path = join(workdir, rel_guest_path) + if not exists(dirname(guest_path)): + run( + "sudo mkdir -p {}".format(dirname(guest_path)), + shell=True, + check=True, + ) + + if exists(guest_path) and extra_files[host_path]["mode"] == "a": + run( + 'sudo sh -c "cat {} >> {}"'.format(host_path, guest_path), + shell=True, + check=True, + ) + else: + run( + "sudo cp {} {}".format(host_path, guest_path), + shell=True, + check=True, + ) + + # Pack the initrd again + 
initrd_builder_path = join( + KATA_SOURCE_DIR, "tools", "osbuilder", "initrd-builder", "initrd_builder.sh" + ) + new_initrd_path = join(dirname(initrd_path), "kata-containers-initrd-sev-csg.img") + work_env = {"AGENT_INIT": "yes"} + initrd_pack_cmd = "env && sudo {} -o {} {}".format( + initrd_builder_path, + new_initrd_path, + workdir, + ) + run(initrd_pack_cmd, shell=True, check=True, env=work_env) + + # Lastly, update the Kata config to point to the new initrd + for runtime in KATA_RUNTIMES: + conf_file_path = join(KATA_CONFIG_DIR, "configuration-{}.toml".format(runtime)) + updated_toml_str = """ + [hypervisor.qemu] + initrd = "{new_initrd_path}" + """.format( + new_initrd_path=new_initrd_path + ) + update_toml(conf_file_path, updated_toml_str) + + if runtime == "qemu": + remove_entry_from_toml(conf_file_path, "hypervisor.qemu.image") diff --git a/tasks/util/knative.py b/tasks/util/knative.py new file mode 100644 index 0000000..85059ac --- /dev/null +++ b/tasks/util/knative.py @@ -0,0 +1,97 @@ +from os import makedirs +from os.path import exists, join +from subprocess import run +from tasks.util.env import CONF_FILES_DIR, TEMPLATED_FILES_DIR +from tasks.util.k8s import template_k8s_file +from tasks.util.kubeadm import run_kubectl_command + +# Knative Serving Side-Car Tag +KNATIVE_SIDECAR_IMAGE_TAG = "gcr.io/knative-releases/knative.dev/serving/cmd/" +KNATIVE_SIDECAR_IMAGE_TAG += ( + "queue@sha256:987f53e3ead58627e3022c8ccbb199ed71b965f10c59485bab8015ecf18b44af" +) + + +def replace_sidecar(reset_default=False, image_repo="ghcr.io", quiet=False): + def do_run(cmd, quiet): + if quiet: + out = run(cmd, shell=True, capture_output=True) + assert out.returncode == 0, "Error running cmd: {} (error: {})".format( + cmd, out.stderr + ) + else: + out = run(cmd, shell=True, check=True) + + k8s_filename = "knative_replace_sidecar.yaml" + + if reset_default: + in_k8s_file = join(CONF_FILES_DIR, "{}.j2".format(k8s_filename)) + out_k8s_file = join(TEMPLATED_FILES_DIR, k8s_filename) + template_k8s_file( + in_k8s_file, + out_k8s_file, + {"knative_sidecar_image_url": KNATIVE_SIDECAR_IMAGE_TAG}, + ) + run_kubectl_command("apply -f {}".format(out_k8s_file)) + return + + # Pull the right Knative Serving side-car image tag + docker_cmd = "docker pull {}".format(KNATIVE_SIDECAR_IMAGE_TAG) + do_run(docker_cmd, quiet) + + # Re-tag it, and push it to our controlled registry + image_name = "csegarragonz/coco-knative-sidecar" + image_tag = "unencrypted" + new_image_url = "{}/{}:{}".format(image_repo, image_name, image_tag) + docker_cmd = "docker tag {} {}".format(KNATIVE_SIDECAR_IMAGE_TAG, new_image_url) + do_run(docker_cmd, quiet) + + docker_cmd = "docker push {}".format(new_image_url) + do_run(docker_cmd, quiet) + + # Get the digest for the recently pulled image, and use it to update + # Knative's deployment configmap + docker_cmd = 'docker images {} --digests --format "{{{{.Digest}}}}"'.format( + join(image_repo, image_name), + ) + image_digest = ( + run(docker_cmd, shell=True, capture_output=True).stdout.decode("utf-8").strip() + ) + assert len(image_digest) > 0 + + if not exists(TEMPLATED_FILES_DIR): + makedirs(TEMPLATED_FILES_DIR) + + in_k8s_file = join(CONF_FILES_DIR, "{}.j2".format(k8s_filename)) + out_k8s_file = join(TEMPLATED_FILES_DIR, k8s_filename) + new_image_url_digest = "{}/{}@{}".format(image_repo, image_name, image_digest) + template_k8s_file( + in_k8s_file, out_k8s_file, {"knative_sidecar_image_url": new_image_url_digest} + ) + run_kubectl_command("apply -f {}".format(out_k8s_file)) + + # Finally, 
make sure to remove all pulled container images to avoid + # unintended caching issues with CoCo + docker_cmd = "docker rmi {}".format(KNATIVE_SIDECAR_IMAGE_TAG) + do_run(docker_cmd, quiet) + docker_cmd = "docker rmi {}".format(new_image_url) + do_run(docker_cmd, quiet) + + +def configure_self_signed_certs(path_to_certs_dir, secret_name): + """ + Configure Knative to like our self-signed certificates + """ + k8s_filename = "knative_controller_custom_certs.yaml" + in_k8s_file = join(CONF_FILES_DIR, "{}.j2".format(k8s_filename)) + out_k8s_file = join(TEMPLATED_FILES_DIR, k8s_filename) + template_k8s_file( + in_k8s_file, + out_k8s_file, + {"path_to_certs": path_to_certs_dir, "secret_name": secret_name}, + ) + run_kubectl_command( + "-n knative-serving patch deployment controller --patch-file {}".format( + out_k8s_file + ) + ) diff --git a/tasks/util/pid.py b/tasks/util/pid.py new file mode 100644 index 0000000..aa55fd3 --- /dev/null +++ b/tasks/util/pid.py @@ -0,0 +1,9 @@ +from psutil import process_iter + + +def get_pid(string): + for proc in process_iter(): + if string in proc.name(): + return proc.pid + + return None diff --git a/tasks/util/qemu.py b/tasks/util/qemu.py index 7885145..8b48255 100644 --- a/tasks/util/qemu.py +++ b/tasks/util/qemu.py @@ -1,21 +1,13 @@ -from psutil import process_iter +from tasks.util.pid import get_pid from time import sleep -def do_get_pid(string): - for proc in process_iter(): - if string in proc.name(): - return proc.pid - - return None - - def get_qemu_pid(poll_period): """ Get the PID for the QEMU command """ while True: - pid = do_get_pid("qemu-system-x86_64") + pid = get_pid("qemu-system-x86_64") if pid is not None: return pid diff --git a/tasks/util/sev.py b/tasks/util/sev.py index 59904a2..b3892c6 100644 --- a/tasks/util/sev.py +++ b/tasks/util/sev.py @@ -6,7 +6,7 @@ from sevsnpmeasure.vmm_types import VMMType from sevsnpmeasure.vcpu_types import cpu_sig as sev_snp_cpu_sig from subprocess import run -from tasks.util.env import KATA_CONFIG_DIR, KBS_PORT, get_kbs_url +from tasks.util.env import KATA_CONFIG_DIR, KBS_PORT, get_node_url from tasks.util.toml import read_value_from_toml @@ -34,7 +34,7 @@ def get_kernel_append(): "console=hvc1", "debug" if agent_log else "quiet", "panic=1 nr_cpus=1 selinux=0", - "agent.aa_kbc_params=online_sev_kbc::{}:{}".format(get_kbs_url(), KBS_PORT), + "agent.aa_kbc_params=online_sev_kbc::{}:{}".format(get_node_url(), KBS_PORT), "scsi_mod.scan=none", "agent.log=debug" if agent_log else "", "agent.debug_console agent.debug_console_vport=1026" if debug_console else "", diff --git a/tasks/util/skopeo.py b/tasks/util/skopeo.py new file mode 100644 index 0000000..8df9c0d --- /dev/null +++ b/tasks/util/skopeo.py @@ -0,0 +1,113 @@ +from base64 import b64encode +from json import loads as json_loads +from os.path import exists, join +from pymysql.err import IntegrityError +from subprocess import run +from tasks.util.cosign import sign_container_image +from tasks.util.env import CONF_FILES_DIR, K8S_CONFIG_DIR +from tasks.util.guest_components import ( + start_coco_keyprovider, + stop_coco_keyprovider, +) +from tasks.util.kbs import create_kbs_secret + +SKOPEO_VERSION = "1.13.0" +SKOPEO_IMAGE = "quay.io/skopeo/stable:v{}".format(SKOPEO_VERSION) +SKOPEO_ENCRYPTION_KEY = join(K8S_CONFIG_DIR, "image_enc.key") +AA_CTR_ENCRYPTION_KEY = "/tmp/image_enc.key" + + +def run_skopeo_cmd(cmd, capture_stdout=False): + ocicrypt_conf_host = join(CONF_FILES_DIR, "ocicrypt.conf") + ocicrypt_conf_guest = "/ocicrypt.conf" + skopeo_cmd = [ + "docker 
run --rm", + "--net host", + "-e OCICRYPT_KEYPROVIDER_CONFIG={}".format(ocicrypt_conf_guest), + "-v {}:{}".format(ocicrypt_conf_host, ocicrypt_conf_guest), + "-v ~/.docker/config.json:/config.json", + "-v {}:/certs/domain.crt".format( + join(K8S_CONFIG_DIR, "local-registry", "domain.crt") + ), + SKOPEO_IMAGE, + cmd, + ] + skopeo_cmd = " ".join(skopeo_cmd) + if capture_stdout: + return ( + run(skopeo_cmd, shell=True, capture_output=True) + .stdout.decode("utf-8") + .strip() + ) + else: + run(skopeo_cmd, shell=True, check=True) + + +def create_encryption_key(): + cmd = "head -c32 < /dev/random > {}".format(SKOPEO_ENCRYPTION_KEY) + run(cmd, shell=True, check=True) + + +def encrypt_container_image(image_tag, sign=False): + encryption_key_resource_id = "default/image-encryption-key/1" + if not exists(SKOPEO_ENCRYPTION_KEY): + create_encryption_key() + + # We use CoCo's keyprovider server (that implements the ocicrypt protocol) + # to encrypt the OCI image. To that extent, we need to mount the encryption + # key somewhere that the attestation agent (in the keyprovider) can find + # it + start_coco_keyprovider(SKOPEO_ENCRYPTION_KEY, AA_CTR_ENCRYPTION_KEY) + + encrypted_image_tag = image_tag.split(":")[0] + ":encrypted" + skopeo_cmd = [ + "copy --insecure-policy", + "--authfile /config.json", + "--dest-cert-dir=/certs", + "--src-cert-dir=/certs", + "--encryption-key", + "provider:attestation-agent:keyid=kbs:///{}::keypath={}".format( + encryption_key_resource_id, AA_CTR_ENCRYPTION_KEY + ), + "docker://{}".format(image_tag), + "docker://{}".format(encrypted_image_tag), + ] + skopeo_cmd = " ".join(skopeo_cmd) + run_skopeo_cmd(skopeo_cmd) + + # Stop the keyprovider when we are done encrypting layers + stop_coco_keyprovider() + + # Sanity check that the image is actually encrypted + inspect_jsonstr = run_skopeo_cmd( + "inspect --cert-dir /certs --authfile /config.json docker://{}".format( + encrypted_image_tag + ), + capture_stdout=True, + ) + inspect_json = json_loads(inspect_jsonstr) + layers = [ + layer["MIMEType"].endswith("tar+gzip+encrypted") + for layer in inspect_json["LayersData"] + ] + if not all(layers): + print("Some layers in image {} are not encrypted!".format(encrypted_image_tag)) + stop_coco_keyprovider() + raise RuntimeError("Image encryption failed!") + + # Create a secret in KBS with the encryption key. Skopeo needs it as raw + # bytes, whereas KBS wants it base64 encoded, so we do the conversion first + with open(SKOPEO_ENCRYPTION_KEY, "rb") as fh: + key_b64 = b64encode(fh.read()).decode() + + # When we are encrypting multiple container images, it may happen that the + # encryption key is already there. Thus it is safe to ignore this exception + # here + try: + create_kbs_secret(encryption_key_resource_id, key_b64) + except IntegrityError: + print("WARNING: error creating KBS secret...") + pass + + if sign: + sign_container_image(encrypted_image_tag)