Switch to 'buildkit' to build Airflow images (apache#20664)
BuildKit is a much more modern docker build mechanism that supports
multi-architecture builds, which makes it suitable for our future ARM
support. It also has a nicer UI, much more sophisticated caching
mechanisms, and better support for multi-segment builds.

BuildKit was promoted to official status quite a while ago and it is
rather stable now. We can also now install the BuildKit plugin for docker,
which adds the capability of building and managing cache using dedicated
builders (previously the BuildKit cache was managed using rather
complex external tools).
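
As a rough illustration of what a "dedicated builder" means here, the sketch below prints the `docker buildx` commands that would create and select one. The builder name `airflow_cache` is a hypothetical placeholder, not a name taken from this commit, and the script only prints the commands rather than running them.

```shell
#!/bin/sh
# Hypothetical builder name used only for this illustration.
BUILDER_NAME="${BUILDER_NAME:-airflow_cache}"

# Print the command that creates a dedicated BuildKit builder.
# The docker-container driver runs BuildKit in its own container.
builder_create_command() {
    echo "docker buildx create --name ${BUILDER_NAME} --driver docker-container"
}

# Print the command that makes the builder the default for later builds.
builder_use_command() {
    echo "docker buildx use ${BUILDER_NAME}"
}

# Dry run: show the commands instead of executing them.
builder_create_command
builder_use_command
```

Piping the output to `sh` would execute the commands on a machine where the buildx plugin is installed.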

This gives us an opportunity to vastly
simplify our build scripts, because BuildKit has a much more robust caching
mechanism than the old docker build (which forced us to pull images
before using them as cache).

We had a lot of complexity involved in efficient caching,
but with BuildKit all of that can be vastly simplified and we can
get rid of:

  * keeping base python images in our registry
  * keeping build segments for prod image in our registry
  * keeping manifest images in our registry
  * deciding when to pull or pull&build image (not needed now, we can
    always build image with --cache-from and buildkit will pull cached
    layers as needed)
  * building the image when performing pre-commit (instead, we simply
    encourage users to rebuild the image via the breeze command)
  * pulling the images before building
  * separate 'build' cache kept in our registry (not needed any more
    as buildkit allows keeping cache for all segments of a multi-segmented
    build in a single cache)
  * the manual spinner (the nice animated tty UI of buildkit eliminates
    the need for it)
  * and a number of other complexities.
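
The caching points in the list above can be sketched as two `docker buildx build` invocations: one that builds using a registry cache without pulling first, and one that refreshes that cache. This is a minimal sketch, not the commands used by the Airflow scripts; the image and cache references are hypothetical placeholders, and the script only prints the commands.

```shell
#!/bin/sh
# Hypothetical references, for illustration only.
IMAGE="${IMAGE:-ghcr.io/example/airflow-ci:latest}"
CACHE_REF="${CACHE_REF:-ghcr.io/example/airflow-ci:cache}"

# Print a build command that reuses cached layers straight from the registry.
# No separate "docker pull" step is needed before the build.
build_with_remote_cache() {
    echo "docker buildx build" \
        "--cache-from type=registry,ref=${CACHE_REF}" \
        "--progress tty" \
        "--tag ${IMAGE} ."
}

# Print a command that refreshes the registry cache. mode=max also exports
# layers of intermediate stages, so a single cache can cover every segment
# of a multi-segment build.
push_remote_cache() {
    echo "docker buildx build" \
        "--cache-to type=registry,ref=${CACHE_REF},mode=max" \
        "--tag ${IMAGE} ."
}

# Dry run: show the commands instead of executing them.
build_with_remote_cache
push_remote_cache
```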

Depends on apache#20238
potiuk authored Jan 18, 2022
1 parent 730db3f commit ad28f69
Showing 42 changed files with 387 additions and 1,007 deletions.
5 changes: 0 additions & 5 deletions .github/workflows/build-images.yml
@@ -29,7 +29,6 @@ permissions:
env:
MOUNT_SELECTED_LOCAL_SOURCES: "false"
FORCE_ANSWER_TO_QUESTIONS: "yes"
FORCE_PULL_IMAGES: "false"
CHECK_IMAGE_FOR_REBUILD: "true"
SKIP_CHECK_REMOTE_IMAGE: "true"
DB_RESET: "true"
@@ -179,8 +178,6 @@ jobs:
PYTHON_MAJOR_MINOR_VERSION: ${{ matrix.python-version }}
UPGRADE_TO_NEWER_DEPENDENCIES: ${{ needs.build-info.outputs.upgradeToNewerDependencies }}
DOCKER_CACHE: ${{ needs.build-info.outputs.cacheDirective }}
CHECK_IF_BASE_PYTHON_IMAGE_UPDATED: >
${{ github.event_name == 'pull_request_target' && 'false' || 'true' }}
outputs: ${{toJSON(needs.build-info.outputs) }}
steps:
- uses: actions/checkout@v2
@@ -256,8 +253,6 @@ jobs:
PYTHON_MAJOR_MINOR_VERSION: ${{ matrix.python-version }}
UPGRADE_TO_NEWER_DEPENDENCIES: ${{ needs.build-info.outputs.upgradeToNewerDependencies }}
DOCKER_CACHE: ${{ needs.build-info.outputs.cacheDirective }}
CHECK_IF_BASE_PYTHON_IMAGE_UPDATED: >
${{ github.event_name == 'pull_request_target' && 'false' || 'true' }}
VERSION_SUFFIX_FOR_PYPI: ".dev0"
INSTALL_PROVIDERS_FROM_SOURCES: >
${{ needs.build-info.outputs.defaultBranch == 'main' && 'true' || 'false' }}
24 changes: 9 additions & 15 deletions .github/workflows/ci.yml
@@ -30,7 +30,6 @@ permissions:
env:
MOUNT_SELECTED_LOCAL_SOURCES: "false"
FORCE_ANSWER_TO_QUESTIONS: "yes"
FORCE_PULL_IMAGES: "false"
CHECK_IMAGE_FOR_REBUILD: "true"
SKIP_CHECK_REMOTE_IMAGE: "true"
DB_RESET: "true"
@@ -1380,12 +1379,11 @@ ${{ hashFiles('.pre-commit-config.yaml') }}"
branch: ${{ steps.constraints-branch.outputs.branch }}
directory: "repo"

# Push images to GitHub Registry in Apache repository, if all tests are successful and build
# is executed as result of direct push to "main" or one of the "test" branches
# It actually rebuilds all images using just-pushed constraints if they changed
# It will also check if a new python image was released and will pull the latest one if needed
# Same as build-images.yaml
push-images-to-github-registry:
# Push BuildX cache to GitHub Registry in Apache repository, if all tests are successful and build
# is executed as result of direct push to "main" or one of the "vX-Y-test" branches
# It rebuilds all images using just-pushed constraints using buildx and pushes them to registry
# It will automatically check if a new python image was released and will pull the latest one if needed
push-buildx-cache-to-github-registry:
permissions:
packages: write
timeout-minutes: 40
@@ -1396,7 +1394,9 @@ ${{ hashFiles('.pre-commit-config.yaml') }}"
- constraints
- docs
# Only run it for direct pushes and scheduled builds
if: github.event_name == 'push' || github.event_name == 'schedule'
if: >
(github.event_name == 'push' || github.event_name == 'schedule')
&& github.repository == 'apache/airflow'
strategy:
matrix:
python-version: ${{ fromJson(needs.build-info.outputs.pythonVersions) }}
@@ -1410,11 +1410,9 @@ ${{ hashFiles('.pre-commit-config.yaml') }}"
# a new python image, we will rebuild it from scratch (same as during the "build-images.ci")
GITHUB_REGISTRY_PULL_IMAGE_TAG: "latest"
GITHUB_REGISTRY_PUSH_IMAGE_TAG: "latest"
PUSH_PYTHON_BASE_IMAGE: "true"
FORCE_PULL_IMAGES: "true"
CHECK_IF_BASE_PYTHON_IMAGE_UPDATED: "true"
GITHUB_REGISTRY_WAIT_FOR_IMAGE: "false"
UPGRADE_TO_NEWER_DEPENDENCIES: "false"
PREPARE_BUILDX_CACHE: "true"
steps:
- name: "Checkout ${{ github.ref }} ( ${{ github.sha }} )"
uses: actions/checkout@v2
@@ -1435,7 +1433,3 @@ ${{ hashFiles('.pre-commit-config.yaml') }}"
run: ./scripts/ci/images/ci_prepare_prod_image_on_ci.sh
env:
VERSION_SUFFIX_FOR_PYPI: ".dev0"
- name: "Push CI image ${{ env.PYTHON_MAJOR_MINOR_VERSION }}:latest"
run: ./scripts/ci/images/ci_push_ci_images.sh
- name: "Push PROD images ${{ env.PYTHON_MAJOR_MINOR_VERSION }}:latest"
run: ./scripts/ci/images/ci_push_production_images.sh
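
The new `if:` condition on the `push-buildx-cache-to-github-registry` job only lets the job run for push or scheduled events in the `apache/airflow` repository, so forks do not attempt to push cache. A minimal shell sketch of the same decision follows; the function name is hypothetical, while `GITHUB_EVENT_NAME` and `GITHUB_REPOSITORY` are the default environment variables that GitHub Actions sets for a run.

```shell
#!/bin/sh
# Return success only for push/schedule events in the apache/airflow repository.
# $1: event name, $2: repository in "owner/name" form.
should_push_cache() {
    case "$1" in
        push|schedule)
            [ "$2" = "apache/airflow" ]
            ;;
        *)
            return 1
            ;;
    esac
}

# Example: evaluate the condition for the current workflow run, if any.
if should_push_cache "${GITHUB_EVENT_NAME:-}" "${GITHUB_REPOSITORY:-}"; then
    echo "cache would be pushed"
else
    echo "cache push skipped"
fi
```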
163 changes: 70 additions & 93 deletions BREEZE.rst
@@ -1146,10 +1146,10 @@ This is the current syntax for `./breeze <./breeze>`_:
shell [Default] Enters interactive shell in the container
build-docs Builds documentation in the container
build-image Builds CI or Production docker image
prepare-build-cache Prepares CI or Production build cache
cleanup-image Cleans up the container image created
exec Execs into running breeze container in new terminal
generate-constraints Generates pinned constraint files
push-image Pushes images to registry
initialize-local-virtualenv Initializes local virtualenv
prepare-airflow-packages Prepares airflow packages
setup-autocomplete Sets up autocomplete for breeze
@@ -1254,10 +1254,7 @@ This is the current syntax for `./breeze <./breeze>`_:
'--build-cache-local' or '-build-cache-pulled', or '--build-cache-none'
Choosing whether to force pull images or force build the image:
'--force-build-image', '--force-pull-image'
Checking if the base python image has been updated:
'--check-if-base-python-image-updated'
'--force-build-image'
You can also pass '--production-image' flag to build production image rather than CI image.
@@ -1300,17 +1297,6 @@ This is the current syntax for `./breeze <./breeze>`_:
automatically for the first time or when changes are detected in
package-related files, but you can force it using this flag.
-P, --force-pull-images
Forces pulling of images from GitHub Container Registry before building to populate cache.
The images are pulled by default only for the first time you run the
environment, later the locally build images are used as cache.
--check-if-base-python-image-updated
Checks if Python base image from DockerHub has been updated vs the current python base
image we store in GitHub Container Registry. Python images are updated regularly with
security fixes, this switch will check if a new one has been released and will pull and
prepare a new base python based on the latest one.
--cleanup-docker-context-files
Removes whl and tar.gz files created in docker-context-files before running the command.
In case there are some files there it unnecessarily increases the context size and
@@ -1458,6 +1444,74 @@ This is the current syntax for `./breeze <./breeze>`_:
####################################################################################################
Detailed usage for command: prepare-build-cache
breeze prepare-build-cache [FLAGS]
Prepares build cache (CI or production) without entering the container. You can pass
additional options to this command, such as:
Choosing python version:
'--python'
You can also pass '--production-image' flag to build production image rather than CI image.
For GitHub repository, the '--github-repository' can be used to choose repository
to pull/push images. Cleanup docker context files and pull cache are forced. This command
requires buildx to be installed.
Flags:
-p, --python PYTHON_MAJOR_MINOR_VERSION
Python version used for the image. This is always major/minor version.
One of:
3.7 3.8 3.9
-a, --install-airflow-version INSTALL_AIRFLOW_VERSION
Uses different version of Airflow when building PROD image.
2.0.2 2.0.1 2.0.0 wheel sdist
-t, --install-airflow-reference INSTALL_AIRFLOW_REFERENCE
Installs Airflow directly from reference in GitHub when building PROD image.
This can be a GitHub branch like main or v2-2-test, or a tag like 2.2.0rc1.
--installation-method INSTALLATION_METHOD
Method of installing Airflow in PROD image - either from the sources ('.')
or from package 'apache-airflow' to install from PyPI.
Default in Breeze is to install from sources. One of:
. apache-airflow
--upgrade-to-newer-dependencies
Upgrades PIP packages to latest versions available without looking at the constraints.
-I, --production-image
Use production image for entering the environment and builds (not for tests).
-g, --github-repository GITHUB_REPOSITORY
GitHub repository used to pull, push images.
Default: apache/airflow.
-v, --verbose
Show verbose information about executed docker, kind, kubectl, helm commands. Useful for
debugging - when you run breeze with --verbose flags you will be able to see the commands
executed under the hood and copy&paste them to your terminal to debug them more easily.
Note that you can further increase verbosity and see all the commands executed by breeze
by running 'export VERBOSE_COMMANDS="true"' before running breeze.
--dry-run-docker
Only show docker commands to execute instead of actually executing them. The docker
commands are printed in yellow color.
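
Based on the flags documented in the usage text above, preparing the cache for every supported Python version might look like the sketch below. It only prints the `./breeze` invocations, and each one carries `--dry-run-docker` so that breeze itself would show the docker commands instead of executing them; the helper function name is hypothetical.

```shell
#!/bin/sh
# Print one prepare-build-cache invocation per supported Python version.
# $1 (optional): extra flag such as --production-image for the PROD cache.
prepare_cache_commands() {
    for version in 3.7 3.8 3.9; do
        echo "./breeze prepare-build-cache --python ${version}${1:+ $1} --dry-run-docker"
    done
}

# CI image cache, then production image cache.
prepare_cache_commands
prepare_cache_commands --production-image
```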
####################################################################################################
Detailed usage for command: cleanup-image
@@ -1559,61 +1613,6 @@ This is the current syntax for `./breeze <./breeze>`_:
####################################################################################################
Detailed usage for command: push-image
breeze push_image [FLAGS]
Pushes images to GitHub registry.
You can add --github-repository to push to a different repository/organisation.
You can add --github-image-id <COMMIT_SHA> in case you want to push image with specific
SHA tag.
You can also add --production-image flag to switch to production image (default is CI one)
Examples:
'breeze push-image' or
'breeze push-image --production-image' - to push production image or
'breeze push-image \
--github-repository user/airflow' - to push to your user's fork
'breeze push-image \
--github-image-id 9a621eaa394c0a0a336f8e1b31b35eff4e4ee86e' - to push with COMMIT_SHA
Flags:
-g, --github-repository GITHUB_REPOSITORY
GitHub repository used to pull, push images.
Default: apache/airflow.
-s, --github-image-id COMMIT_SHA
<COMMIT_SHA> of the image. Images in GitHub registry are stored with those
to be able to easily find the image for particular CI runs. Once you know the
<COMMIT_SHA>, you can specify it in github-image-id flag and Breeze will
automatically pull and use that image so that you can easily reproduce a problem
that occurred in CI.
Default: latest.
-v, --verbose
Show verbose information about executed docker, kind, kubectl, helm commands. Useful for
debugging - when you run breeze with --verbose flags you will be able to see the commands
executed under the hood and copy&paste them to your terminal to debug them more easily.
Note that you can further increase verbosity and see all the commands executed by breeze
by running 'export VERBOSE_COMMANDS="true"' before running breeze.
--dry-run-docker
Only show docker commands to execute instead of actually executing them. The docker
commands are printed in yellow color.
####################################################################################################
Detailed usage for command: initialize-local-virtualenv
@@ -1903,17 +1902,6 @@ This is the current syntax for `./breeze <./breeze>`_:
automatically for the first time or when changes are detected in
package-related files, but you can force it using this flag.
-P, --force-pull-images
Forces pulling of images from GitHub Container Registry before building to populate cache.
The images are pulled by default only for the first time you run the
environment, later the locally build images are used as cache.
--check-if-base-python-image-updated
Checks if Python base image from DockerHub has been updated vs the current python base
image we store in GitHub Container Registry. Python images are updated regularly with
security fixes, this switch will check if a new one has been released and will pull and
prepare a new base python based on the latest one.
--cleanup-docker-context-files
Removes whl and tar.gz files created in docker-context-files before running the command.
In case there are some files there it unnecessarily increases the context size and
@@ -2498,17 +2486,6 @@ This is the current syntax for `./breeze <./breeze>`_:
automatically for the first time or when changes are detected in
package-related files, but you can force it using this flag.
-P, --force-pull-images
Forces pulling of images from GitHub Container Registry before building to populate cache.
The images are pulled by default only for the first time you run the
environment, later the locally build images are used as cache.
--check-if-base-python-image-updated
Checks if Python base image from DockerHub has been updated vs the current python base
image we store in GitHub Container Registry. Python images are updated regularly with
security fixes, this switch will check if a new one has been released and will pull and
prepare a new base python based on the latest one.
--cleanup-docker-context-files
Removes whl and tar.gz files created in docker-context-files before running the command.
In case there are some files there it unnecessarily increases the context size and
16 changes: 0 additions & 16 deletions CI.rst
@@ -149,22 +149,6 @@ You can use those variables when you try to reproduce the build locally.
+-----------------------------------------+-------------+--------------+------------+-------------------------------------------------+
| Force variables |
+-----------------------------------------+-------------+--------------+------------+-------------------------------------------------+
| ``FORCE_PULL_IMAGES`` | true | true | true | Determines if images are force-pulled, |
| | | | | no matter if they are already present |
| | | | | locally. This includes not only the |
| | | | | CI/PROD images but also the Python base |
| | | | | images. Note that if Python base images |
| | | | | change, also the CI and PROD images |
| | | | | need to be fully rebuild unless they were |
| | | | | already built with that base Python |
| | | | | image. This is false for local development |
| | | | | to avoid often pulling and rebuilding |
| | | | | the image. It is true for CI workflow in |
| | | | | case waiting from images is enabled |
| | | | | as the images needs to be force-pulled from |
| | | | | GitHub Registry, but it is set to |
| | | | | false when waiting for images is disabled. |
+-----------------------------------------+-------------+--------------+------------+-------------------------------------------------+
| ``FORCE_BUILD_IMAGES`` | false | false | false | Forces building images. This is generally not |
| | | | | very useful in CI as in CI environment image |
| | | | | is built or pulled only once, so there is no |
30 changes: 2 additions & 28 deletions Dockerfile
@@ -33,6 +33,8 @@
# all the build essentials. This makes the image
# much smaller.
#
# Use the same builder frontend version for everyone
# syntax=docker/dockerfile:1.3
ARG AIRFLOW_VERSION="2.2.2"
ARG AIRFLOW_EXTRAS="amazon,async,celery,cncf.kubernetes,dask,docker,elasticsearch,ftp,google,google_auth,grpc,hashicorp,http,ldap,microsoft.azure,mysql,odbc,pandas,postgres,redis,sendgrid,sftp,slack,ssh,statsd,virtualenv"
ARG ADDITIONAL_AIRFLOW_EXTRAS=""
@@ -327,34 +329,6 @@ RUN if [[ -f /docker-context-files/requirements.txt ]]; then \
pip install --no-cache-dir --user -r /docker-context-files/requirements.txt; \
fi

ARG BUILD_ID
ARG COMMIT_SHA
ARG AIRFLOW_IMAGE_REPOSITORY
ARG AIRFLOW_IMAGE_DATE_CREATED

ENV BUILD_ID=${BUILD_ID} COMMIT_SHA=${COMMIT_SHA}

LABEL org.apache.airflow.distro="debian" \
org.apache.airflow.distro.version="buster" \
org.apache.airflow.module="airflow" \
org.apache.airflow.component="airflow" \
org.apache.airflow.image="airflow-build-image" \
org.apache.airflow.version="${AIRFLOW_VERSION}" \
org.apache.airflow.build-image.build-id=${BUILD_ID} \
org.apache.airflow.build-image.commit-sha=${COMMIT_SHA} \
org.opencontainers.image.source=${AIRFLOW_IMAGE_REPOSITORY} \
org.opencontainers.image.created=${AIRFLOW_IMAGE_DATE_CREATED} \
org.opencontainers.image.authors="dev@airflow.apache.org" \
org.opencontainers.image.url="https://airflow.apache.org" \
org.opencontainers.image.documentation="https://airflow.apache.org/docs/docker-stack/index.html" \
org.opencontainers.image.version="${AIRFLOW_VERSION}" \
org.opencontainers.image.revision="${COMMIT_SHA}" \
org.opencontainers.image.vendor="Apache Software Foundation" \
org.opencontainers.image.licenses="Apache-2.0" \
org.opencontainers.image.ref.name="airflow-build-image" \
org.opencontainers.image.title="Build Image Segment for Production Airflow Image" \
org.opencontainers.image.description="Reference build-time dependencies image for production-ready Apache Airflow image"

##############################################################################################
# This is the actual Airflow image - much smaller than the build one. We copy
# installed Airflow and all it's dependencies from the build image to make it smaller.
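
One detail worth checking about the `# syntax=docker/dockerfile:1.3` line added in the first Dockerfile hunk: Docker documents that parser directives are only honored at the very top of a Dockerfile, before any other comment, blank line, or instruction. The hunk appears to place the directive around line 36, after earlier comment lines, in which case it would be read as an ordinary comment. The sketch below is a simplified check that only looks at the first line of a file (it ignores the case where another parser directive precedes `# syntax=`); the function name is hypothetical.

```shell
#!/bin/sh
# Return success if the first line of the given Dockerfile is a syntax directive.
# $1: path to the Dockerfile.
has_leading_syntax_directive() {
    first_line="$(head -n 1 "$1")"
    case "${first_line}" in
        "# syntax="*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example: report on a Dockerfile in the current directory, if present.
if [ -f Dockerfile ]; then
    if has_leading_syntax_directive Dockerfile; then
        echo "syntax directive is on the first line"
    else
        echo "syntax directive is missing from the first line"
    fi
fi
```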