Table of Contents
- Provider packages
- What the provider packages are
- Generated release notes
- Testing and debugging provider preparation
- Testing provider packages
The Provider packages are separate packages (one package per provider) that implement integrations with external services for Airflow in the form of installable Python packages.
The Release Manager prepares packages separately from the main Airflow Release, using breeze commands and accompanying scripts. This document provides an overview of the command line tools needed to prepare the packages.
The first thing the release manager has to do is change the version of the provider to the target version. Each provider has a provider.yaml file that, among other things, stores information about provider versions. When you attempt to release a provider, you should update that information based on the changes for the provider and its CHANGELOG.rst. It might be that CHANGELOG.rst already contains the right target version. This will be especially true if some changes in the provider add new features (then the minor version is increased) or when the changes introduce a backwards-incompatible, breaking change in the provider (then the major version is incremented). Committers, when approving and merging changes to the providers, should make sure that CHANGELOG.rst is updated whenever anything other than a bugfix is added.
If there are no new features or breaking changes, the release manager should simply increase the patch-level version for the provider.
The new version should be first on the list.
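The versioning rule above can be sketched in a few lines of Python. This is only an illustration: bump_version and the change labels "bugfix", "feature" and "breaking" are hypothetical names, not part of the Airflow tooling, and in practice the version is edited by hand in provider.yaml.

```python
def bump_version(version: str, change: str) -> str:
    """Return the next provider version for a given kind of change.

    Hypothetical helper: the labels "bugfix", "feature" and "breaking"
    are assumptions made for this sketch.
    """
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "breaking":
        # Backwards-incompatible change: increment the major version.
        return f"{major + 1}.0.0"
    if change == "feature":
        # New feature: increment the minor version.
        return f"{major}.{minor + 1}.0"
    if change == "bugfix":
        # Neither features nor breaking changes: increment the patch level.
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"Unknown change type: {change!r}")


print(bump_version("1.0.2", "bugfix"))    # 1.0.3
print(bump_version("1.0.2", "feature"))   # 1.1.0
print(bump_version("1.0.2", "breaking"))  # 2.0.0
```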
Each of the provider packages contains release notes in the form of the CHANGELOG.rst file that is automatically generated from the history of the changes and the code of the provider. They are stored in the documentation directory. The README.md file generated during package preparation is not stored anywhere in the repository; it does, however, contain a link to the generated changelog.
Note! For Backport providers (until April 2021) the changelog was embedded and stored in airflow/providers/<PROVIDER>/README_BACKPORT_PACKAGES.md. Those files will be updated only until April 2021 and will be removed afterwards.
The README.md file contains the following information:
- summary of requirements for each backport package
- list of dependencies (including extras to install them) when the package depends on other provider packages
- link to the detailed README.rst - generated documentation for the packages.
The index.rst stored in the docs/apache-airflow-providers-<PROVIDER> folder contains:
- Contents (this is manually maintained there)
- the general package information (same for all packages with the name change)
- summary of requirements for each backport package
- list of dependencies (including extras to install them) when the package depends on other provider packages
- Content of the high-level CHANGELOG.rst file that is stored in the provider folder next to the provider.yaml file.
- Detailed list of changes generated automatically for all versions of the provider
When you want to prepare release notes for a package, you need to run:
./breeze prepare-provider-documentation <PACKAGE_ID> ...
- <PACKAGE_ID> is usually a directory in the airflow/providers folder (for example google), but in several cases it might be one level deeper, separated with "." (for example apache.hive).
The index.rst is updated automatically in the docs/apache-airflow-providers-<provider> folder.
You can run the script with multiple package names if you want to prepare several packages at the same time.
As soon as you are satisfied with the release notes generated you can commit generated changes/new files to the repository.
You build the packages in the breeze environment, so you do not have to worry about setting up a common environment.
Note that the readme release notes have to be generated first, so that the package preparation script reads the provider.yaml.
- The provider package ids PACKAGE_ID are subdirectories in the providers directory. Sometimes they are one level deeper (the apache/hive folder, for example), in which case PACKAGE_ID uses "." to separate the folders (for example, Apache Hive's PACKAGE_ID is apache.hive). You can see the list of all available providers by running:
./breeze prepare-provider-packages -- --help
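The mapping from a provider folder to its PACKAGE_ID described above can be sketched as follows. The function name package_id_from_path is hypothetical, and the airflow/providers root is taken from the text above; this is an illustration of the naming rule, not the code breeze uses.

```python
from pathlib import PurePosixPath


def package_id_from_path(provider_dir: str, providers_root: str = "airflow/providers") -> str:
    """Turn a provider folder into a PACKAGE_ID (hypothetical helper).

    Folders nested one level deeper are joined with "." instead of "/".
    """
    relative = PurePosixPath(provider_dir).relative_to(providers_root)
    return ".".join(relative.parts)


print(package_id_from_path("airflow/providers/google"))       # google
print(package_id_from_path("airflow/providers/apache/hive"))  # apache.hive
```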
The examples below show how you can build selected packages, but you can also build all packages by omitting the package ids altogether.
By default, you build both packages, but you can use --package-format wheel to generate only the wheel package, or --package-format sdist to generate only the sdist package.
- To build the release candidate packages for SVN Apache upload run the following command:
./breeze prepare-provider-packages --version-suffix-for-svn=rc1 [PACKAGE_ID] ...
for example:
./breeze prepare-provider-packages --version-suffix-for-svn=rc1 http ...
- To build the release candidate packages for PyPI upload run the following command:
./breeze prepare-provider-packages --version-suffix-for-pypi=rc1 [PACKAGE_ID] ...
for example:
./breeze prepare-provider-packages --version-suffix-for-pypi=rc1 http ...
- To build the final release packages run the following command:
./breeze prepare-provider-packages [--package-format PACKAGE_FORMAT] [PACKAGE_ID] ...
Where PACKAGE_FORMAT might be one of: wheel, sdist, both (wheel is the default format)
for example:
./breeze prepare-provider-packages http ...
- For each package, this creates a wheel package and a source distribution package in your dist folder with names following the patterns:
apache_airflow_providers_<PROVIDER>_YYYY.[M]M.[D]D[suffix]-py3-none-any.whl
apache-airflow-providers-<PROVIDER>-YYYY.[M]M.[D]D[suffix].tar.gz
Note! Even though we always use the two-digit month and day when generating the readme files, the version in PyPI does not contain the leading 0s in the version name, so the generated artifacts also do not contain the leading 0s.
- You can install the .whl packages with pip install <PACKAGE_FILE>
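The note about leading zeros in the YYYY.[M]M.[D]D version can be illustrated with a small sketch. The function names readme_version and artifact_version are hypothetical; the sketch only shows the difference between the two-digit form used in the readme files and the form without leading zeros that appears in artifact names.

```python
from datetime import date


def readme_version(release_date: date) -> str:
    """Two-digit month and day, as used in the generated readme files."""
    return release_date.strftime("%Y.%m.%d")


def artifact_version(release_date: date, suffix: str = "") -> str:
    """Version as it appears in artifact names: no leading zeros.

    Integer fields are formatted directly, so "02" becomes "2".
    """
    return f"{release_date.year}.{release_date.month}.{release_date.day}{suffix}"


release = date(2021, 2, 5)
print(readme_version(release))           # 2021.02.05
print(artifact_version(release, "rc1"))  # 2021.2.5rc1
```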
The provider preparation is done using the Breeze development environment and CI image. This way we have a common environment for package preparation, and we can easily verify that provider packages are OK and can be installed for released versions of Airflow (including version 2.0.0).
The same scripts and environment are run in our CI workflow: the packages are prepared, installed and tested using the same CI image. The tests are performed via the Production image, also in the CI workflow. Our production images are built using Airflow and Provider packages prepared on the CI so that they are as close as possible to what users will be using when they install from PyPI. Our scripts prepare wheel and sdist packages for both Airflow and provider packages and install them while building the images. This is very helpful when testing new providers that do not yet have a PyPI package released, and it also allows checking that provider authors did not make breaking changes.
All classes from all providers must be importable, otherwise our CI will fail. Also, verification of the image is performed, where expected providers should be installed (for the production image) and providers should be discoverable, and pip check with all the dependencies has to succeed.
You might occasionally want to modify the preparation scripts for providers. They are all present in the dev/provider_packages folder. The Breeze commands above perform the sequence of those steps automatically, but you can manually run the scripts as follows to debug them:
The commands are best executed in the Breeze environment, as it has all the dependencies installed. The examples below describe that. However, for development you might run them in your local development environment, as it makes debugging easier. Just make sure you install your development environment with the 'devel_all' extra (make sure to use the right python version).
Note that it is best to use INSTALL_PROVIDERS_FROM_SOURCES set to true, to make sure that any newly added providers are not installed as packages (in case they are not yet available in PyPI).
INSTALL_PROVIDERS_FROM_SOURCES="true" pip install -e ".[devel_all]" \
--constraint https://raw.githubusercontent.com/apache/airflow/constraints-master/constraints-3.6.txt
Note that you might need to add some extra dependencies to your system to install "devel_all", as many dependencies are needed to make a clean install. The Breeze environment has all the dependencies installed, in case you have problems setting up your local virtualenv.
You can also use breeze to prepare your virtualenv (it will print extra information if some dependencies are missing or installation fails, and it will also reset your SQLite test db in the ${HOME}/airflow directory):
./breeze initialize-local-virtualenv
You can find a description of all the commands and more information about the "prepare" tool by running it with --help:
./dev/provider_packages/prepare_provider_packages.py --help
You can see, for example, the list of all provider packages:
./dev/provider_packages/prepare_provider_packages.py list-providers-packages
The script verifies that all providers' classes can be imported.
- Enter Breeze environment (optionally if you have no local virtualenv):
./breeze
All the rest is in-container in case you use Breeze, but can be in your local virtualenv if you have it installed with the devel_all extra.
- Install remaining dependencies. This is needed until we manage to bring in apache.beam without conflicting dependencies (requires fixing the Snowflake and Azure providers). This step is optional if you already installed the environment with the devel_all extra:
pip install -e ".[devel_all]"
- Run import check:
./dev/import_all_classes.py --path airflow/providers
It checks if all classes from provider packages can be imported.
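To make the idea of the import check concrete, here is a minimal sketch of such a check written against the standard library. It is not the actual dev/import_all_classes.py script: the function name import_all_modules is hypothetical, and the sketch takes a package name rather than the --path argument used above.

```python
import importlib
import pkgutil
from typing import List, Tuple


def import_all_modules(package_name: str) -> Tuple[List[str], List[str]]:
    """Import every module below a package and report failures.

    Hypothetical sketch: returns (imported module names, failure messages).
    """
    package = importlib.import_module(package_name)
    imported: List[str] = [package_name]
    failed: List[str] = []
    for module_info in pkgutil.walk_packages(
        package.__path__,
        prefix=f"{package_name}.",
        onerror=lambda name: None,  # failures are recorded in the loop below
    ):
        try:
            importlib.import_module(module_info.name)
            imported.append(module_info.name)
        except Exception as error:
            failed.append(f"{module_info.name}: {error!r}")
    return imported, failed


# Demonstrated on a standard library package; for providers you would pass
# "airflow.providers" in an environment where Airflow is installed.
imported, failed = import_all_modules("json")
print(f"Imported {len(imported)} modules, {len(failed)} failures")
```

A CI job built on this idea would exit with a non-zero status whenever the list of failures is not empty.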
The script verifies that all providers' classes are correctly named.
- Enter Breeze environment (optionally if you have no local virtualenv):
./breeze
All the rest is in-container in case you use Breeze, but can be in your local virtualenv if you have it installed with the devel_all extra.
- Install remaining dependencies. This is needed until we manage to bring in apache.beam without conflicting dependencies (requires fixing the Snowflake and Azure providers). This step is optional if you already installed the environment with the devel_all extra:
pip install -e ".[devel_all]"
- Run class name verification:
./dev/provider_packages/prepare_provider_packages.py verify-provider-classes
It checks if all provider Operators/Hooks etc. are correctly named.
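A naming check of this kind can be sketched as follows. This is not the verify-provider-classes implementation: the suffix table and the function find_misnamed_classes are assumptions made for illustration, based on the convention that classes in operators, hooks and sensors modules end with Operator, Hook and Sensor.

```python
import inspect
from types import ModuleType
from typing import Dict, List

# Assumed convention for this sketch: module folder name -> class name suffix.
EXPECTED_SUFFIXES: Dict[str, str] = {
    "operators": "Operator",
    "hooks": "Hook",
    "sensors": "Sensor",
}


def find_misnamed_classes(module: ModuleType) -> List[str]:
    """Return public classes defined in the module that lack the expected suffix."""
    parts = module.__name__.split(".")
    suffix = next((EXPECTED_SUFFIXES[part] for part in parts if part in EXPECTED_SUFFIXES), None)
    if suffix is None:
        return []  # no naming rule applies to this module
    return sorted(
        name
        for name, obj in inspect.getmembers(module, inspect.isclass)
        if obj.__module__ == module.__name__  # skip classes imported from elsewhere
        and not name.startswith("_")
        and not name.endswith(suffix)
    )


# Demonstration on a synthetic module, so the sketch runs without Airflow.
demo = ModuleType("demo.operators.http")
for class_name in ("SimpleHttpOperator", "HttpThing"):
    setattr(demo, class_name, type(class_name, (), {"__module__": demo.__name__}))
print(find_misnamed_classes(demo))  # ['HttpThing']
```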
The script updates documentation of the provider packages. Note that it uses airflow git and pulls the latest version of tags available in Airflow, so you need to enter Breeze with the --mount-all-local-sources flag.
- Enter Breeze environment (optionally if you have no local virtualenv):
./breeze --mount-all-local-sources
(all the rest is in-container)
- Install remaining dependencies. This is needed until we manage to bring in apache.beam without conflicting dependencies (requires fixing the Snowflake and Azure providers). This step is optional if you have no local virtualenv.
pip install -e ".[devel_all]"
- Run update documentation (version suffix might be empty):
./dev/provider_packages/prepare_provider_packages.py --version-suffix <SUFFIX> \
update-package-documentation <PACKAGE>
This script will fetch the latest version of airflow from Airflow's repo (it will automatically add the apache-https-for-providers remote and pull airflow (read only) from there). There is no need to set up any credentials for it.
In case the version being prepared is already tagged in the repo, documentation preparation returns immediately and prints a warning.
This script prepares the actual packages.
- Enter Breeze environment:
./breeze
(all the rest is in-container)
- Copy Provider Packages sources
This step copies provider package sources (cleaning them up beforehand) to the provider_packages folder so that the packages can be built from there. This was necessary for Backport Providers (described in their own readme), as we also performed a refactor of the code. When we remove Backport Packages in April 2021, we will likely be able to simplify the process using existing setuptools features.
./dev/provider_packages/copy_provider_package_sources.py
Now you can run package generation step-by-step, separately building one package at a time.
The breeze commands are more convenient if you want to build several packages at the same time, but for testing and debugging these are the commands executed next:
- Cleanup the artifact directories:
This is needed because setuptools does not clean those files, and generating packages one by one without cleanup might include artifacts from a previous package in the new one.
rm -rf -- *.egg-info build/
- Generate setup.py/setup.cfg/MANIFEST.in/provider_info.py/README files for:
- alpha/beta packages (specify a1,a2,.../b1,b2... suffix)
- release candidates (specify rc1,rc2,... suffix)
- official package (to be released in PyPI as official package)
The version suffix specified here will be appended to the version retrieved from provider.yaml. Note that this command will fail if the tag denoted by the version + suffix already exists. This means that the version was not updated since the last time it was generated. In the CI we always add the 'dev' suffix, and we never create a TAG for it, so in the CI the setup.py is generated and should never fail.
./dev/provider_packages/prepare_provider_packages.py --version-suffix "<SUFFIX>" \
generate-setup-files <PACKAGE>
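The suffix and tag rule described for this step can be sketched as below. The tag layout in provider_tag and both function names are assumptions made for illustration; the real script reads the tags from git, while this sketch takes them as a plain list.

```python
from typing import Iterable


def provider_tag(package_id: str, full_version: str) -> str:
    # Hypothetical tag layout, used only for this illustration.
    return f"providers-{package_id.replace('.', '-')}/{full_version}"


def resolve_version(package_id: str, base_version: str, suffix: str, existing_tags: Iterable[str]) -> str:
    """Append the suffix to the provider.yaml version and fail if it is already tagged."""
    full_version = f"{base_version}{suffix}"
    tag = provider_tag(package_id, full_version)
    if tag in set(existing_tags):
        raise RuntimeError(f"Tag {tag} already exists: the version was not updated since the last release")
    return full_version


tags = ["providers-http/1.0.0rc1"]
print(resolve_version("http", "1.0.0", "rc2", tags))  # 1.0.0rc2
print(resolve_version("http", "1.0.0", "dev", tags))  # 1.0.0dev
```

Because no tag is ever created for the dev suffix, the second call mirrors the CI case that should never fail.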
The script prepares the package after sources have been copied and setup files generated. Note that it uses airflow git and pulls the latest version of tags available in Airflow, so you need to enter Breeze with the --mount-all-local-sources flag.
- Enter Breeze environment (optionally if you have no local virtualenv):
./breeze --mount-all-local-sources
(all the rest is in-container)
- Install remaining dependencies. This is needed until we manage to bring in apache.beam without conflicting dependencies (requires fixing the Snowflake and Azure providers). This step is optional if you have no local virtualenv.
pip install -e ".[devel_all]"
- Build the packages (version suffix might be empty):
./dev/provider_packages/prepare_provider_packages.py --version-suffix <SUFFIX> \
build-provider-packages <PACKAGE>
In case the version being prepared is already tagged in the repo, the preparation returns immediately and prints an error. You can prepare the packages regardless, even if the tag exists, by specifying --version-suffix (for example --version-suffix dev).
By default, you prepare both packages, but you can add the --package-format argument and specify wheel or sdist to build only one of them.
The provider package import checks and tests execute within the "CI" environment of Airflow, the same image that is used by Breeze. They do, however, require special mounts (no Airflow sources mounted) and the ability to install all extras and packages in order to test whether all classes can be imported. It is rather simple but requires a semi-automated process:
- Prepare regular packages
./breeze prepare-provider-packages
This prepares all provider packages in the "dist" folder
- Prepare airflow package from sources
./breeze prepare-airflow-packages
This prepares airflow package in the "dist" folder
- Enter the container:
export INSTALL_AIRFLOW_VERSION="wheel"
unset BACKPORT_PACKAGES
./dev/provider_packages/enter_breeze_provider_package_tests.sh
(the rest of it is in the container)
- [IN CONTAINER] Install apache-beam.
pip install apache-beam[gcp]
- [IN CONTAINER] Install the provider packages from /dist
pip install --no-deps /dist/apache_airflow_providers_*.whl
Note! --no-deps is used because we are using the Airflow version installed from the wheel package.
- [IN CONTAINER] Check the installation folder for providers:
python3 <<EOF 2>/dev/null
import airflow.providers
# Print every folder that makes up the airflow.providers namespace package
for p in airflow.providers.__path__:
    print(p)
EOF
- [IN CONTAINER] Check if all the providers can be imported
python3 /opt/airflow/dev/import_all_classes.py --path <PATH_REPORTED_IN_THE_PREVIOUS_STEP>