[AIRFLOW-5830] Get rid of slim image. #6494

Merged
merged 1 commit into from
Nov 5, 2019
[AIRFLOW-5830] Get rid of slim image
The slim image gave only a very small gain when executing the tests in
CI. The image was significantly smaller, but for local development and
testing you needed both the full CI and the slim CI images.

This made the scripts and Docker image needlessly complex. Especially
with the production image coming, it turned out to be a premature
optimisation. While it sped up some static check jobs in Travis
(slightly, by 10-20 seconds), it increased the time developers needed
to get a working environment and to keep it updated every time that was
needed (by minutes).

Also, having two separately managed images made it rather complex to
join some of the Travis CI jobs (there is a follow-up change that gets
rid of the Checklicence image).

With this change, both static checks and tests are executed using a
single image. That also opens the door to further simplification of the
scripts and an easier implementation of the production image.
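
As a rough illustration of what "a single image" means in practice, the sketch below derives the CI image name from the tag pattern described in the BREEZE.rst change in this PR (``<BRANCH>-python<PYTHON_VERSION>-ci``). The helper name `ci_image_tag` and the two variable names are hypothetical and are not part of the Airflow scripts; they only show that static checks and tests now resolve to the same image.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not part of the Airflow scripts): builds the CI image
# name following the <BRANCH>-python<PYTHON_VERSION>-ci tag pattern.
ci_image_tag() {
    local branch="$1"
    local python_version="$2"
    echo "apache/airflow:${branch}-python${python_version}-ci"
}

# Assumption for illustration: both kinds of jobs ask for the same image,
# so only one image has to be pulled, built, and kept up to date.
STATIC_CHECK_IMAGE="$(ci_image_tag master 3.6)"
TEST_IMAGE="$(ci_image_tag master 3.6)"

echo "static checks: ${STATIC_CHECK_IMAGE}"
echo "tests:         ${TEST_IMAGE}"
```

Before this change, the static check jobs would have resolved to a separate ``-ci-slim`` tag, which is the second image developers had to maintain locally.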
potiuk committed Nov 4, 2019
commit 39f13ea8b0b3ee59739ec50e23d423b93d25eab6
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -192,7 +192,7 @@ repos:
require_serial: true
- id: build
name: Check if image build is needed
entry: ./scripts/ci/local_ci_build.sh
entry: ./scripts/ci/pre_commit_ci_build.sh
language: system
always_run: true
pass_filenames: false
8 changes: 3 additions & 5 deletions BREEZE.rst
@@ -79,11 +79,9 @@ images maintained on the Docker Hub in the ``apache/airflow`` repository.

There are three images that we are currently managing:

* **Slim CI** image that is used for static code checks (size of ~500MB). Its tag follows the pattern
of ``<BRANCH>-python<PYTHON_VERSION>-ci-slim`` (for example, ``apache/airflow:master-python3.6-ci-slim``).
The image is built using the `<Dockerfile>`_ Dockerfile.
* **Full CI image*** that is used for testing. It contains a lot more test-related installed software
(size of ~1GB). Its tag follows the pattern of ``<BRANCH>-python<PYTHON_VERSION>-ci``
* **CI image*** that is used for testing od both Unit tests and static check tests.
It contains a lot test-related packages (size of ~1GB).
Its tag follows the pattern of ``<BRANCH>-python<PYTHON_VERSION>-ci``
(for example, ``apache/airflow:master-python3.6-ci``). The image is built using the
`<Dockerfile>`_ Dockerfile.
* **Checklicense image** that is used during license check with the Apache RAT tool. It does not
189 changes: 78 additions & 111 deletions Dockerfile
@@ -15,16 +15,8 @@
#
# WARNING: THIS DOCKERFILE IS NOT INTENDED FOR PRODUCTION USE OR DEPLOYMENT.
#
# Base image for the whole Docker file
ARG APT_DEPS_IMAGE="airflow-apt-deps-ci-slim"
ARG PYTHON_BASE_IMAGE="python:3.6-slim-stretch"
############################################################################################################
# This is the slim image with APT dependencies needed by Airflow. It is based on a python slim image
# Parameters:
# PYTHON_BASE_IMAGE - base python image (python:x.y-slim-stretch)
############################################################################################################
FROM ${PYTHON_BASE_IMAGE} as airflow-apt-deps-ci-slim

FROM ${PYTHON_BASE_IMAGE} as main

SHELL ["/bin/bash", "-o", "pipefail", "-e", "-u", "-x", "-c"]

@@ -121,116 +113,95 @@ RUN adduser airflow \
&& echo "airflow ALL=(ALL) NOPASSWD: ALL" > /etc/sudoers.d/airflow \
&& chmod 0440 /etc/sudoers.d/airflow

############################################################################################################
# This is an image with all APT dependencies needed by CI. It is built on top of the airlfow APT image
# Parameters:
# airflow-apt-deps - this is the base image for CI deps image.
############################################################################################################
FROM airflow-apt-deps-ci-slim as airflow-apt-deps-ci
ENV JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/

SHELL ["/bin/bash", "-o", "pipefail", "-e", "-u", "-x", "-c"]
# Note missing man directories on debian-stretch
# https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=863199
RUN mkdir -pv /usr/share/man/man1 \
&& mkdir -pv /usr/share/man/man7 \
&& apt-get update \
&& apt-get install --no-install-recommends -y \
gnupg \
apt-transport-https \
ca-certificates \
software-properties-common \
krb5-user \
ldap-utils \
less \
lsb-release \
net-tools \
openjdk-8-jdk \
openssh-client \
openssh-server \
postgresql-client \
python-selinux \
sqlite3 \
tmux \
unzip \
vim \
&& apt-get autoremove -yqq --purge \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*

ENV JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/
ENV HADOOP_DISTRO="cdh" HADOOP_MAJOR="5" HADOOP_DISTRO_VERSION="5.11.0" HADOOP_VERSION="2.6.0" \
HADOOP_HOME="/tmp/hadoop-cdh"
ENV HIVE_VERSION="1.1.0" HIVE_HOME="/tmp/hive"
ENV HADOOP_URL="https://archive.cloudera.com/${HADOOP_DISTRO}${HADOOP_MAJOR}/${HADOOP_DISTRO}/${HADOOP_MAJOR}/"
ENV MINICLUSTER_BASE="https://github.com/bolkedebruin/minicluster/releases/download/" \
MINICLUSTER_HOME="/tmp/minicluster" \
MINICLUSTER_VER="1.1"

ARG APT_DEPS_IMAGE="airflow-apt-deps-ci-slim"
ENV APT_DEPS_IMAGE=${APT_DEPS_IMAGE}
ARG KUBERNETES_VERSION="v1.15.0"
ENV KUBERNETES_VERSION=${KUBERNETES_VERSION}
ARG KIND_VERSION="v0.5.0"
ENV KIND_VERSION=${KIND_VERSION}
RUN mkdir -pv "${HADOOP_HOME}" \
&& mkdir -pv "${HIVE_HOME}" \
&& mkdir -pv "${MINICLUSTER_HOME}" \
&& mkdir -pv "/user/hive/warehouse" \
&& chmod -R 777 "${HIVE_HOME}" \
&& chmod -R 777 "/user/"

RUN echo "${APT_DEPS_IMAGE}"

# Note the ifs below might be removed if Buildkit will become usable. It should skip building this
# image automatically if it is not used. For now we still go through all layers below but they are empty
RUN if [[ "${APT_DEPS_IMAGE}" == "airflow-apt-deps-ci" ]]; then \
# Note missing man directories on debian-stretch
# https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=863199
mkdir -pv /usr/share/man/man1 \
&& mkdir -pv /usr/share/man/man7 \
&& apt-get update \
&& apt-get install --no-install-recommends -y \
gnupg \
apt-transport-https \
ca-certificates \
software-properties-common \
krb5-user \
ldap-utils \
less \
lsb-release \
net-tools \
openjdk-8-jdk \
openssh-client \
openssh-server \
postgresql-client \
python-selinux \
sqlite3 \
tmux \
unzip \
vim \
&& apt-get autoremove -yqq --purge \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/* \
;\
fi
ENV HADOOP_DOWNLOAD_URL="${HADOOP_URL}hadoop-${HADOOP_VERSION}-${HADOOP_DISTRO}${HADOOP_DISTRO_VERSION}.tar.gz" \
HADOOP_TMP_FILE="/tmp/hadoop.tar.gz"

# TODO: We should think about removing those and moving them into docker-compose dependencies.
COPY scripts/ci/docker_build/ci_build_install_deps.sh /tmp/ci_build_install_deps.sh
RUN curl -sL "${HADOOP_DOWNLOAD_URL}" >"${HADOOP_TMP_FILE}" \
&& tar xzf "${HADOOP_TMP_FILE}" --absolute-names --strip-components 1 -C "${HADOOP_HOME}" \
&& rm "${HADOOP_TMP_FILE}"

# Kubernetes dependencies
RUN \
if [[ "${APT_DEPS_IMAGE}" == "airflow-apt-deps-ci" ]]; then \
curl -fsSL https://download.docker.com/linux/debian/gpg | apt-key add - \
&& add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/debian stretch stable" \
&& apt-get update \
&& apt-get -y install --no-install-recommends docker-ce \
&& apt-get autoremove -yqq --purge \
&& apt-get clean && rm -rf /var/lib/apt/lists/* \
;\
fi
ENV HIVE_URL="${HADOOP_URL}hive-${HIVE_VERSION}-${HADOOP_DISTRO}${HADOOP_DISTRO_VERSION}.tar.gz" \
HIVE_TMP_FILE="/tmp/hive.tar.gz"

RUN \
if [[ "${APT_DEPS_IMAGE}" == "airflow-apt-deps-ci" ]]; then \
curl -Lo kubectl \
"https://storage.googleapis.com/kubernetes-release/release/${KUBERNETES_VERSION}/bin/linux/amd64/kubectl" \
&& chmod +x kubectl \
&& mv kubectl /usr/local/bin/kubectl \
;\
fi
RUN curl -sL "${HIVE_URL}" >"${HIVE_TMP_FILE}" \
&& tar xzf "${HIVE_TMP_FILE}" --strip-components 1 -C "${HIVE_HOME}" \
&& rm "${HIVE_TMP_FILE}"

RUN \
if [[ "${APT_DEPS_IMAGE}" == "airflow-apt-deps-ci" ]]; then \
curl -Lo kind \
"https://github.com/kubernetes-sigs/kind/releases/download/${KIND_VERSION}/kind-linux-amd64" \
&& chmod +x kind \
&& mv kind /usr/local/bin/kind \
;\
fi

ENV HADOOP_DISTRO=cdh \
HADOOP_MAJOR=5 \
HADOOP_DISTRO_VERSION=5.11.0 \
HADOOP_VERSION=2.6.0 \
HIVE_VERSION=1.1.0
ENV HADOOP_URL=https://archive.cloudera.com/${HADOOP_DISTRO}${HADOOP_MAJOR}/${HADOOP_DISTRO}/${HADOOP_MAJOR}/
ENV HADOOP_HOME=/tmp/hadoop-cdh HIVE_HOME=/tmp/hive

RUN if [[ "${APT_DEPS_IMAGE}" == "airflow-apt-deps-ci" ]]; then /tmp/ci_build_install_deps.sh; fi
ENV MINICLUSTER_URL="${MINICLUSTER_BASE}${MINICLUSTER_VER}/minicluster-${MINICLUSTER_VER}-SNAPSHOT-bin.zip" \
MINICLUSTER_TMP_FILE="/tmp/minicluster.zip"

RUN curl -sL "${MINICLUSTER_URL}" > "${MINICLUSTER_TMP_FILE}" \
&& unzip "${MINICLUSTER_TMP_FILE}" -d "/tmp" \
&& rm "${MINICLUSTER_TMP_FILE}"

ENV PATH "${PATH}:/tmp/hive/bin"

############################################################################################################
# This is the target image - it installs PIP and NPM dependencies including efficient caching
# mechanisms - it might be used to build the bare airflow build or CI build
# Parameters:
# APT_DEPS_IMAGE - image with APT dependencies. It might either be base deps image with airflow
# dependencies or CI deps image that contains also CI-required dependencies
############################################################################################################
FROM ${APT_DEPS_IMAGE} as main
RUN curl -fsSL https://download.docker.com/linux/debian/gpg | apt-key add - \
&& add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/debian stretch stable" \
&& apt-get update \
&& apt-get -y install --no-install-recommends docker-ce \
&& apt-get autoremove -yqq --purge \
&& apt-get clean && rm -rf /var/lib/apt/lists/*

SHELL ["/bin/bash", "-o", "pipefail", "-e", "-u", "-x", "-c"]
ARG KUBECTL_VERSION="v1.15.0"
ENV KUBECTL_VERSION=${KUBECTL_VERSION}
ARG KIND_VERSION="v0.5.0"
ENV KIND_VERSION=${KIND_VERSION}

RUN echo "Airflow version: ${AIRFLOW_VERSION}"
RUN curl -Lo kubectl \
"https://storage.googleapis.com/kubernetes-release/release/${KUBECTL_VERSION}/bin/linux/amd64/kubectl" \
&& chmod +x kubectl \
&& mv kubectl /usr/local/bin/kubectl

RUN curl -Lo kind \
"https://github.com/kubernetes-sigs/kind/releases/download/${KIND_VERSION}/kind-linux-amd64" \
&& chmod +x kind \
&& mv kind /usr/local/bin/kind

ARG AIRFLOW_USER=airflow
ENV AIRFLOW_USER=${AIRFLOW_USER}
@@ -366,15 +337,11 @@ RUN if [[ -n "${ADDITIONAL_PYTHON_DEPS}" ]]; then \

COPY --chown=airflow:airflow ./scripts/docker/entrypoint.sh /entrypoint.sh

ARG APT_DEPS_IMAGE="airflow-apt-deps-ci-slim"
ENV APT_DEPS_IMAGE=${APT_DEPS_IMAGE}

COPY --chown=airflow:airflow .bash_completion run-tests-complete run-tests ${HOME}/
COPY --chown=airflow:airflow .bash_completion.d/run-tests-complete \
${HOME}/.bash_completion.d/run-tests-complete

RUN if [[ "${APT_DEPS_IMAGE}" == "airflow-apt-deps-ci" ]]; then \
${AIRFLOW_SOURCES}/scripts/ci/docker_build/ci_build_extract_tests.sh; fi
RUN ${AIRFLOW_SOURCES}/scripts/ci/docker_build/ci_build_extract_tests.sh

USER ${AIRFLOW_USER}

8 changes: 2 additions & 6 deletions breeze
@@ -105,10 +105,7 @@ export AIRFLOW_CONTAINER_PUSH_IMAGES=${AIRFLOW_CONTAINER_PUSH_IMAGES:="false"}
# For local builds we fix file permissions only for setup-related files
export AIRFLOW_FIX_PERMISSIONS=${AIRFLOW_FIX_PERMISSIONS:="setup"}

# Skip building slim image locally - we only need full CI image
export AIRFLOW_CONTAINER_SKIP_CI_SLIM_IMAGE="true"

# Skip building full CI image locally - we only need slim image
# Skip building full CI image locally
export AIRFLOW_CONTAINER_SKIP_CI_IMAGE="false"

# Branch name of the base image used (usually master or v1-10-test or v1-10-stable)
@@ -682,7 +679,7 @@ echo
CMDNAME="$(basename -- "$0")"

# Cleans up the answer that was given last time, whether to force/
cleanup_last_force_answer
forget_last_answer

export ENV="${ENV:=$(read_from_file ENV)}"
export BACKEND="${BACKEND:=$(read_from_file BACKEND)}"
@@ -935,7 +932,6 @@ prepare_command_file "${BUILD_CACHE_DIR}/${LAST_DC_TEST_FILE}" "${DC_RUN_COMMAND
prepare_command_file "${BUILD_CACHE_DIR}/${LAST_DC_FILE}" '"' "false"

rebuild_ci_image_if_needed
rebuild_ci_slim_image_if_needed
rebuild_checklicence_image_if_needed

export AIRFLOW_CONTAINER_DOCKER_IMAGE=\