Implements generation of separate constraints for core and providers (apache#14227)

There are two types of constraints now:

* default constraints that contain all dependencies of Airflow,
  all the provider packages released at the time of the release
  of that version, as well as all transitive dependencies. Following
  those constraints, you can be sure Airflow's installation is
  repeatable

* no-providers constraints - containing only the dependencies needed
  for core Airflow installation. This allows you to install/upgrade
  Airflow without also forcing the providers to be installed at
  the versions pinned for a specific version of Airflow.

This allows for flexible management of Airflow and Provider
packages separately. Documentation about it has been added.

Also the provider 'extras' for apache airflow do not keep direct
dependencies to the packages needed by the provider. Those
dependencies are now transitive only - so 'provider' extras only
depend on 'apache-airflow-provider-EXTRA' package and all
the dependencies are transitive. This will help in the future
to avoid conflicts when installing newer providers using extras.
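
As a rough illustration of the two constraint types described above, the
sketch below builds the raw GitHub URL of a constraint file. The helper
name constraints_url is hypothetical (it is not part of Airflow); the URL
layout follows the AIRFLOW_CONSTRAINTS_LOCATION pattern used in the
Dockerfile changes of this commit, and the "constraints" and
"constraints-no-providers" prefixes are taken from the documentation
added here.

```shell
# Sketch only: constraints_url is a hypothetical helper, not an Airflow command.
constraints_url() {
    local reference="$1"       # e.g. constraints-master
    local prefix="$2"          # "constraints" or "constraints-no-providers"
    local python_version="$3"  # major/minor version, e.g. 3.6
    # Same default as the CONSTRAINTS_GITHUB_REPOSITORY build arg
    local repository="${CONSTRAINTS_GITHUB_REPOSITORY:-apache/airflow}"
    echo "https://raw.githubusercontent.com/${repository}/${reference}/${prefix}-${python_version}.txt"
}

# Default constraints: core, released providers and transitive dependencies
constraints_url constraints-master constraints 3.6
# No-providers constraints: core Airflow dependencies only
constraints_url constraints-master constraints-no-providers 3.6

# Possible usage (not executed here):
#   pip install . --upgrade \
#     --constraint "$(constraints_url constraints-master constraints-no-providers 3.6)"
```

The URL is printed rather than passed straight to pip so it can be checked
before an installation is attempted.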
potiuk authored Feb 21, 2021
1 parent a7e4266 commit d524cec
Showing 53 changed files with 867 additions and 391 deletions.
5 changes: 5 additions & 0 deletions .github/workflows/build-images-workflow-run.yml
@@ -34,6 +34,11 @@ env:
GITHUB_REGISTRY: "docker.pkg.github.com"
GITHUB_REPOSITORY: ${{ github.repository }}
GITHUB_USERNAME: ${{ github.actor }}
# You can override CONSTRAINTS_GITHUB_REPOSITORY by setting secret in your repo but by default the
# Airflow one is going to be used
CONSTRAINTS_GITHUB_REPOSITORY: >-
${{ secrets.CONSTRAINTS_GITHUB_REPOSITORY != '' &&
secrets.CONSTRAINTS_GITHUB_REPOSITORY || github.repository }}
# This token is WRITE one - workflow_run type of events always have the WRITE token
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
# This token should not be empty in workflow_run type of event.
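
The `A != '' && A || B` expression used for CONSTRAINTS_GITHUB_REPOSITORY above picks the secret when it is set and falls back to the current repository otherwise. A minimal shell analogue of that fallback, assuming a hypothetical helper name and example repository names that do not appear in the workflow, could look like this:

```shell
# Sketch only: shell analogue of the workflow expression
#   secrets.CONSTRAINTS_GITHUB_REPOSITORY != '' &&
#   secrets.CONSTRAINTS_GITHUB_REPOSITORY || github.repository
# resolve_constraints_repository is a hypothetical name used for illustration.
resolve_constraints_repository() {
    local secret_value="$1"        # value of the secret, possibly empty
    local current_repository="$2"  # what github.repository would hold
    # ${var:-default} yields the default when var is unset or empty
    echo "${secret_value:-${current_repository}}"
}

# No secret configured: the repository running the workflow is used
resolve_constraints_repository "" "apache/airflow"
# Secret configured: it overrides the repository running the workflow
resolve_constraints_repository "myorg/airflow-constraints" "myorg/airflow"
```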
21 changes: 19 additions & 2 deletions .github/workflows/ci.yml
@@ -40,6 +40,11 @@ env:
GITHUB_REGISTRY: "docker.pkg.github.com"
GITHUB_REPOSITORY: ${{ github.repository }}
GITHUB_USERNAME: ${{ github.actor }}
# You can override CONSTRAINTS_GITHUB_REPOSITORY by setting secret in your repo but by default the
# Airflow one is going to be used
CONSTRAINTS_GITHUB_REPOSITORY: >-
${{ secrets.CONSTRAINTS_GITHUB_REPOSITORY != '' &&
secrets.CONSTRAINTS_GITHUB_REPOSITORY || github.repository }}
# In builds from forks, this token is read-only. For scheduler/direct push it is WRITE one
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
# In builds from forks, this token is empty, and this is good because such builds do not even try
@@ -1136,6 +1141,7 @@ jobs:
- ci-images
env:
PYTHON_MAJOR_MINOR_VERSION: ${{ matrix.python-version }}
# Only run it for direct pushes
if: >
github.ref == 'refs/heads/master' || github.ref == 'refs/heads/v1-10-test' ||
github.ref == 'refs/heads/v2-0-test'
@@ -1153,13 +1159,23 @@ jobs:
if: "!contains(needs.build-info.outputs.runsOn, 'self-hosted')"
- name: "Prepare CI image ${{env.PYTHON_MAJOR_MINOR_VERSION}}:${{ github.sha }}"
run: ./scripts/ci/images/ci_prepare_ci_image_on_ci.sh
- name: "Generate constraints"
- name: "Generate constraints with PyPI providers"
run: ./scripts/ci/constraints/ci_generate_constraints.sh
env:
GENERATE_CONSTRAINTS_MODE: "pypi-providers"
- name: "Generate constraints with source providers"
run: ./scripts/ci/constraints/ci_generate_constraints.sh
env:
GENERATE_CONSTRAINTS_MODE: "source-providers"
- name: "Generate constraints without providers"
run: ./scripts/ci/constraints/ci_generate_constraints.sh
env:
GENERATE_CONSTRAINTS_MODE: "no-providers"
- name: "Upload constraint artifacts"
uses: actions/upload-artifact@v2
with:
name: 'constraints-${{matrix.python-version}}'
path: './files/constraints-${{matrix.python-version}}/constraints-${{matrix.python-version}}.txt'
path: './files/constraints-${{matrix.python-version}}/constraints-*${{matrix.python-version}}.txt'
retention-days: 7

constraints-push:
@@ -1177,6 +1193,7 @@ jobs:
- tests-mysql
- tests-postgres
- tests-kubernetes
# Only run it for direct pushes
if: >
github.ref == 'refs/heads/master' || github.ref == 'refs/heads/v1-10-test' ||
github.ref == 'refs/heads/v2-0-test'
90 changes: 64 additions & 26 deletions BREEZE.rst
@@ -328,7 +328,7 @@ Managing CI environment:
* Stop running interactive environment with ``breeze stop`` command
* Restart running interactive environment with ``breeze restart`` command
* Run test specified with ``breeze tests`` command
* Generate constraints with ``breeze generate-constraints`` command
* Generate constraints with ``breeze generate-constraints``
* Execute arbitrary command in the test environment with ``breeze shell`` command
* Execute arbitrary docker-compose command with ``breeze docker-compose`` command
* Push docker images with ``breeze push-image`` command (require committer's rights to push images)
@@ -785,9 +785,24 @@ Generating constraints
----------------------

Whenever setup.py gets modified, the CI master job will re-generate constraint files. Those constraint
files are stored in separated orphan branches: ``constraints-master``, ``constraints-2-0`` and ``constraints-1-10``.
They are stored separately for each python version. Those are
constraint files as described in detail in the
files are stored in separated orphan branches: ``constraints-master``, ``constraints-2-0``
and ``constraints-1-10``. They are stored separately for each python version and there are separate
constraints for:

* 'constraints' - those are constraints generated by matching the current airflow version from sources
and providers that are installed from PyPI. Those are constraints used by the users who want to
install airflow with pip

* "constraints-source-providers" - those are constraints generated by using providers installed from
current sources. While adding new providers their dependencies might change, so this set of providers
is the current set of the constraints for airflow and providers from the current master sources.
Those providers are used by CI system to keep "stable" set of constraints.

* "constraints-no-providers" - those are constraints generated from only Apache Airflow, without any
providers. If you want to manage airflow separately and then add providers individually, you can
use those.

Those are constraint files as described in detail in the
`<CONTRIBUTING.rst#pinned-constraint-files>`_ contributing documentation.

In case someone modifies setup.py, the ``CRON`` scheduled CI build automatically upgrades and
@@ -796,20 +811,19 @@ pushes changed to the constraint files, however you can also perform test run of

.. code-block:: bash
./breeze generate-constraints --python 3.6
.. code-block:: bash
./breeze generate-constraints --python 3.7
for python_version in 3.6 3.7 3.8
do
./breeze generate-constraints --generate-constraints-mode source-providers --python ${python_version}
./breeze generate-constraints --generate-constraints-mode pypi-providers --python ${python_version}
./breeze generate-constraints --generate-constraints-mode no-providers --python ${python_version}
done
.. code-block:: bash
./breeze generate-constraints --python 3.8
This bumps the constraint files to latest versions and stores hash of setup.py. The generated constraint
and setup.py hash files are stored in the ``files`` folder and while generating the constraints diff
of changes vs the previous constraint files is printed.


Using local virtualenv environment in Your Host IDE
---------------------------------------------------

@@ -1264,8 +1278,13 @@ This is the current syntax for `./breeze <./breeze>`_:
2.7 3.5 3.6 3.7 3.8
-a, --install-airflow-version INSTALL_AIRFLOW_VERSION
If specified, installs Airflow directly from PIP released version. This happens at
image building time in production image and at container entering time for CI image. One of:
In CI image, installs Airflow (in entrypoint) from PIP released version or using
the installation method specified (sdist, wheel, none).
In PROD image the installation of selected method or version happens during image building.
For PROD image, the 'none' option is not valid.
One of:
2.0.0 1.10.14 1.10.12 1.10.11 1.10.10 1.10.9 none wheel sdist
@@ -1280,8 +1299,9 @@ This is the current syntax for `./breeze <./breeze>`_:
This can be a GitHub branch like master or v1-10-test, or a tag like 2.0.0a1.
--installation-method INSTALLATION_METHOD
Method of installing airflow - either from the sources ('.') or from package
'apache-airflow' to install from PyPI. Default in Breeze is to install from sources. One of:
Method of installing airflow for production image - either from the sources ('.')
or from package 'apache-airflow' to install from PyPI.
Default in Breeze is to install from sources. One of:
. apache-airflow
@@ -1539,16 +1559,28 @@ This is the current syntax for `./breeze <./breeze>`_:
breeze generate-constraints [FLAGS]
Generates pinned constraint files from setup.py. Those files are generated in files folder
- separate files for different python version. Those constraint files when pushed to orphan
constraints-master, constraints-2-0 and constraints-1-10 branches are used to generate
repeatable CI builds as well as run repeatable production image builds. You can use those
Generates pinned constraint files with all extras from setup.py. Those files are generated in
files folder - separate files for different python version. Those constraint files when
pushed to orphan constraints-master, constraints-2-0 and constraints-1-10 branches are used
to generate repeatable CI builds as well as run repeatable production image builds and
upgrades when you want to include installing or updating some of the released providers
released at the time particular airflow version was released. You can use those
constraints to predictably install released Airflow versions. This is mainly used to test
the constraint generation - constraints are pushed to the orphan branches by a
successful scheduled CRON job in CI automatically.
the constraint generation or manually fix them - constraints are pushed to the orphan
branches by a successful scheduled CRON job in CI automatically, but sometimes manual fix
might be needed.
Flags:
--generate-constraints-mode GENERATE_CONSTRAINTS_MODE
Mode of generating constraints - determines whether providers are installed when generating
constraints and which version of them (either the ones from sources are used or the ones
from PyPI).
One of:
source-providers pypi-providers no-providers
-p, --python PYTHON_MAJOR_MINOR_VERSION
Python version used for the image. This is always major/minor version.
@@ -2442,8 +2474,13 @@ This is the current syntax for `./breeze <./breeze>`_:
Choose different Airflow version to install or run
-a, --install-airflow-version INSTALL_AIRFLOW_VERSION
If specified, installs Airflow directly from PIP released version. This happens at
image building time in production image and at container entering time for CI image. One of:
In CI image, installs Airflow (in entrypoint) from PIP released version or using
the installation method specified (sdist, wheel, none).
In PROD image the installation of selected method or version happens during image building.
For PROD image, the 'none' option is not valid.
One of:
2.0.0 1.10.14 1.10.12 1.10.11 1.10.10 1.10.9 none wheel sdist
@@ -2458,8 +2495,9 @@ This is the current syntax for `./breeze <./breeze>`_:
This can be a GitHub branch like master or v1-10-test, or a tag like 2.0.0a1.
--installation-method INSTALLATION_METHOD
Method of installing airflow - either from the sources ('.') or from package
'apache-airflow' to install from PyPI. Default in Breeze is to install from sources. One of:
Method of installing airflow for production image - either from the sources ('.')
or from package 'apache-airflow' to install from PyPI.
Default in Breeze is to install from sources. One of:
. apache-airflow
72 changes: 64 additions & 8 deletions CONTRIBUTING.rst
@@ -824,11 +824,26 @@ install in case a direct or transitive dependency is released that breaks the in
when installing ``apache-airflow``, you might need to provide additional constraints (for
example ``pip install apache-airflow==1.10.2 Werkzeug<1.0.0``)

However we now have ``constraints-<PYTHON_MAJOR_MINOR_VERSION>.txt`` files generated
automatically and committed to orphan ``constraints-master``, ``constraints-2-0` and ``constraints-1-10`` branches based on
the set of all latest working and tested dependency versions. Those
``constraints-<PYTHON_MAJOR_MINOR_VERSION>.txt`` files can be used as
constraints file when installing Apache Airflow - either from the sources:
There are several sets of constraints we keep:

* 'constraints' - those are constraints generated by matching the current airflow version from sources
and providers that are installed from PyPI. Those are constraints used by the users who want to
install airflow with pip, they are named ``constraints-<PYTHON_MAJOR_MINOR_VERSION>.txt``.

* "constraints-source-providers" - those are constraints generated by using providers installed from
current sources. While adding new providers their dependencies might change, so this set of providers
is the current set of the constraints for airflow and providers from the current master sources.
Those providers are used by CI system to keep "stable" set of constraints. They are named
``constraints-source-providers-<PYTHON_MAJOR_MINOR_VERSION>.txt``

* "constraints-no-providers" - those are constraints generated from only Apache Airflow, without any
providers. If you want to manage airflow separately and then add providers individually, you can
use those. Those constraints are named ``constraints-no-providers-<PYTHON_MAJOR_MINOR_VERSION>.txt``.

We also have constraints with "source-providers" but they are used i

The first ones can be used as constraints file when installing Apache Airflow in a repeatable way.
It can be done from the sources:

.. code-block:: bash
@@ -864,9 +879,50 @@ fixed valid constraints 1.10.12 can be used by using ``constraints-1.10.12`` tag
There are different set of fixed constraint files for different python major/minor versions and you should
use the right file for the right python version.

The ``constraints-<PYTHON_MAJOR_MINOR_VERSION>.txt`` will be automatically regenerated by CI cron job
every time after the ``setup.py`` is updated and pushed if the tests are successful. There are separate
jobs for each python version.
If you want to update just airflow dependencies, without paying attention to providers, you can do it using
-no-providers constraint files as well.

.. code-block:: bash
pip install . --upgrade \
--constraint "https://raw.githubusercontent.com/apache/airflow/constraints-master/constraints-no-providers-3.6.txt"
The ``constraints-<PYTHON_MAJOR_MINOR_VERSION>.txt`` and ``constraints-no-providers-<PYTHON_MAJOR_MINOR_VERSION>.txt``
will be automatically regenerated by CI job every time after the ``setup.py`` is updated and pushed
if the tests are successful.

Manually generating constraint files
------------------------------------

The constraint files are generated automatically by the CI job. Sometimes, however, they need to be
regenerated manually (committers only), for example when the master build did not succeed for quite
some time. This can be done by running this:

.. code-block:: bash
for python_version in 3.6 3.7 3.8
do
./breeze generate-constraints --generate-constraints-mode source-providers --python ${python_version} --build-cache-local
./breeze generate-constraints --generate-constraints-mode pypi-providers --python ${python_version} --build-cache-local
./breeze generate-constraints --generate-constraints-mode no-providers --python ${python_version} --build-cache-local
done
AIRFLOW_SOURCES=$(pwd)
The constraints will be generated in ``files/constraints-PYTHON_VERSION/constraints-*.txt`` files. You need to
check out the right 'constraints-' branch in a separate repository and then you can copy, commit and push the
generated files:
.. code-block:: bash
cd <AIRFLOW_WITH_CONSTRAINT_MASTER_DIRECTORY>
git pull
cp ${AIRFLOW_SOURCES}/files/constraints-*/constraints*.txt .
git diff
git add .
git commit -m "Your commit message here" --no-verify
git push
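
Before copying the generated files, it may help to confirm that every expected file is present. The sketch below only lists the expected paths; the helper name is hypothetical, and the file names are an assumption derived from the ``constraints-``, ``constraints-source-providers-`` and ``constraints-no-providers-`` naming described above together with the ``files/constraints-PYTHON_VERSION/`` output folder.

```shell
# Sketch only: expected_constraint_files is a hypothetical helper that prints
# the paths the generation loop is assumed to produce, three per Python version.
expected_constraint_files() {
    local python_version prefix
    for python_version in "$@"; do
        for prefix in constraints constraints-source-providers constraints-no-providers; do
            echo "files/constraints-${python_version}/${prefix}-${python_version}.txt"
        done
    done
}

# One path per line for the Python versions used in the loop above
expected_constraint_files 3.6 3.7 3.8
```

The output can be fed to a loop that checks each path with ``test -f`` from the Airflow sources directory before running the copy step.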
Documentation
=============
13 changes: 10 additions & 3 deletions Dockerfile
@@ -161,8 +161,13 @@ ARG AIRFLOW_EXTRAS
ARG ADDITIONAL_AIRFLOW_EXTRAS=""
ENV AIRFLOW_EXTRAS=${AIRFLOW_EXTRAS}${ADDITIONAL_AIRFLOW_EXTRAS:+,}${ADDITIONAL_AIRFLOW_EXTRAS}

# Allows to override constraints source
ARG CONSTRAINTS_GITHUB_REPOSITORY="apache/airflow"
ENV CONSTRAINTS_GITHUB_REPOSITORY=${CONSTRAINTS_GITHUB_REPOSITORY}

ARG AIRFLOW_CONSTRAINTS_REFERENCE="constraints-master"
ARG AIRFLOW_CONSTRAINTS_LOCATION="https://raw.githubusercontent.com/apache/airflow/${AIRFLOW_CONSTRAINTS_REFERENCE}/constraints-${PYTHON_MAJOR_MINOR_VERSION}.txt"
ARG AIRFLOW_CONSTRAINTS="constraints"
ARG AIRFLOW_CONSTRAINTS_LOCATION="https://raw.githubusercontent.com/${CONSTRAINTS_GITHUB_REPOSITORY}/${AIRFLOW_CONSTRAINTS_REFERENCE}/${AIRFLOW_CONSTRAINTS}-${PYTHON_MAJOR_MINOR_VERSION}.txt"
ENV AIRFLOW_CONSTRAINTS_LOCATION=${AIRFLOW_CONSTRAINTS_LOCATION}

ENV PATH=${PATH}:/root/.local/bin
@@ -264,9 +269,11 @@ ENV INSTALL_FROM_PYPI=${INSTALL_FROM_PYPI}

# Those are additional constraints that are needed for some extras but we do not want to
# Force them on the main Airflow package.
# * urllib3 - required to keep boto3 happy
# * chardet<4 - required to keep snowflake happy
ARG EAGER_UPGRADE_ADDITIONAL_REQUIREMENTS="urllib3<1.26 chardet<4"
# * urllib3 - required to keep boto3 happy
# * pytz<2021.0: required by snowflake provider
# * pyjwt<2.0.0: flask-jwt-extended requires it
ARG EAGER_UPGRADE_ADDITIONAL_REQUIREMENTS="chardet<4 urllib3<1.26 pytz<2021.0 pyjwt<2.0.0"

WORKDIR /opt/airflow

16 changes: 12 additions & 4 deletions Dockerfile.ci
@@ -237,8 +237,13 @@ ENV AIRFLOW_EXTRAS=${AIRFLOW_EXTRAS}${ADDITIONAL_AIRFLOW_EXTRAS:+,}${ADDITIONAL_

RUN echo "Installing with extras: ${AIRFLOW_EXTRAS}."

# Allows to override constraints source
ARG CONSTRAINTS_GITHUB_REPOSITORY="apache/airflow"
ENV CONSTRAINTS_GITHUB_REPOSITORY=${CONSTRAINTS_GITHUB_REPOSITORY}

ARG AIRFLOW_CONSTRAINTS_REFERENCE="constraints-master"
ARG AIRFLOW_CONSTRAINTS_LOCATION="https://raw.githubusercontent.com/apache/airflow/${AIRFLOW_CONSTRAINTS_REFERENCE}/constraints-${PYTHON_MAJOR_MINOR_VERSION}.txt"
ARG AIRFLOW_CONSTRAINTS="constraints"
ARG AIRFLOW_CONSTRAINTS_LOCATION="https://raw.githubusercontent.com/${CONSTRAINTS_GITHUB_REPOSITORY}/${AIRFLOW_CONSTRAINTS_REFERENCE}/${AIRFLOW_CONSTRAINTS}-${PYTHON_MAJOR_MINOR_VERSION}.txt"
ENV AIRFLOW_CONSTRAINTS_LOCATION=${AIRFLOW_CONSTRAINTS_LOCATION}

# By changing the CI build epoch we can force reinstalling Airflow from the current master
@@ -332,12 +337,15 @@ COPY setup.cfg ${AIRFLOW_SOURCES}/setup.cfg
COPY airflow/__init__.py ${AIRFLOW_SOURCES}/airflow/__init__.py

# Those are additional constraints that are needed for some extras but we do not want to
# Force them on the main Airflow package. Those limitations are:
# force them on the main Airflow package. Those limitations are:
# * chardet<4: required by snowflake provider
# * lazy-object-proxy<1.5.0: required by astroid
# * pyOpenSSL: Imposed by snowflake provider https://github.com/snowflakedb/snowflake-connector-python/blob/v2.3.6/setup.py#L201
# * pytz<2021.0: required by snowflake provider
# * pyOpenSSL: required by snowflake provider https://github.com/snowflakedb/snowflake-connector-python/blob/v2.3.6/setup.py#L201
# * urllib3<1.26: Required to keep boto3 happy
ARG EAGER_UPGRADE_ADDITIONAL_REQUIREMENTS="chardet<4 lazy-object-proxy<1.5.0 pyOpenSSL<20.0.0 urllib3<1.26"
# * pyjwt<2.0.0: flask-jwt-extended requires it
ARG EAGER_UPGRADE_ADDITIONAL_REQUIREMENTS="chardet<4 lazy-object-proxy<1.5.0 pyOpenSSL<20.0.0 pytz<2021.0 urllib3<1.26 pyjwt<2.0.0"
ENV EAGER_UPGRADE_ADDITIONAL_REQUIREMENTS=${EAGER_UPGRADE_ADDITIONAL_REQUIREMENTS}

ARG CONTINUE_ON_PIP_CHECK_FAILURE="false"
