diff --git a/BREEZE.rst b/BREEZE.rst index 30bc49c7f3b06..89fa2db8f45ad 100644 --- a/BREEZE.rst +++ b/BREEZE.rst @@ -21,8 +21,8 @@ .. contents:: :local: -About Airflow Breeze -==================== +Airflow Breeze CI Environment +============================= Airflow Breeze is an easy-to-use integration test environment managed via `Docker Compose `_. @@ -74,16 +74,12 @@ Docker Compose Docker Images Used by Breeze ---------------------------- -For all development tasks, related integration tests and static code checks, we use Docker -images maintained on the Docker Hub in the ``apache/airflow`` repository. - -There are three images that we are currently managing: - -* **CI image*** that is used for testing od both Unit tests and static check tests. - It contains a lot test-related packages (size of ~1GB). - Its tag follows the pattern of ``-python-ci`` - (for example, ``apache/airflow:master-python3.6-ci``). The image is built using the - ``_ Dockerfile. +For all development tasks, related integration tests and static code checks, we use the +**CI image** maintained on the Docker Hub in the ``apache/airflow`` repository. +This Docker image contains a lot test-related packages (size of ~1GB). +Its tag follows the pattern of ``-python-ci`` +(for example, ``apache/airflow:master-python3.6-ci``). The image is built using the +``_ Dockerfile. Before you run tests, enter the environment or run local static checks, the necessary local images should be pulled and built from Docker Hub. This happens automatically for the test environment but you need to @@ -93,7 +89,7 @@ The static checks will fail and inform what to do if the image is not yet built. Building the image first time pulls a pre-built version of images from the Docker Hub, which may take some time. But for subsequent source code changes, no wait time is expected. -However, changes to sensitive files like setup.py or Dockerfile will trigger a rebuild +However, changes to sensitive files like ``setup.py`` or ``Dockerfile`` will trigger a rebuild that may take more time though it is highly optimized to only rebuild what is needed. In most cases, rebuilding an image requires network connectivity (for example, to download new @@ -106,28 +102,29 @@ See `Troubleshooting section <#troubleshooting>`_ for steps you can make to clea Getopt and gstat ---------------- +* For Linux, run ``apt install util-linux coreutils`` or an equivalent if your system is not Debian-based. * For macOS, install GNU ``getopt`` and ``gstat`` utilities to get Airflow Breeze running. Run ``brew install gnu-getopt coreutils`` and then follow instructions to link the gnu-getopt version to become the first on the PATH. Make sure to re-login after you make the suggested changes. - If you use bash, run this command and re-login: +**Examples:** + +If you use bash, run this command and re-login: .. code-block:: bash echo 'export PATH="/usr/local/opt/gnu-getopt/bin:$PATH"' >> ~/.bash_profile . ~/.bash_profile -.. - If you use zsh, run this command and re-login: +If you use zsh, run this command and re-login: .. code-block:: bash echo 'export PATH="/usr/local/opt/gnu-getopt/bin:$PATH"' >> ~/.zprofile . ~/.zprofile -* For Linux, run ``apt install util-linux coreutils`` or an equivalent if your system is not Debian-based. Memory ------ @@ -158,7 +155,6 @@ from your ``logs`` directory in the Airflow sources, so all logs created in the visible in the host as well. Every time you enter the container, the ``logs`` directory is cleaned so that logs do not accumulate. 
- Using the Airflow Breeze Environment ===================================== @@ -187,9 +183,9 @@ You enter the Breeze integration test environment by running the ``./breeze`` sc the ``--help`` option to see the list of available flags. See `Airflow Breeze flags <#airflow-breeze-flags>`_ for details. - .. code-block:: bash +.. code-block:: bash - ./breeze + ./breeze First time you run Breeze, it pulls and builds a local version of Docker images. It pulls the latest Airflow CI images from `Airflow DockerHub `_ @@ -200,7 +196,7 @@ Once you enter the environment, you are dropped into bash shell of the Airflow c run tests immediately. You can `set up autocomplete <#setting-up-autocomplete>`_ for commands and add the -checked-out Airflow repository to your PATH to run Breeze without the ./ and from any directory. +checked-out Airflow repository to your PATH to run Breeze without the ``./`` and from any directory. Stopping Breeze --------------- @@ -208,9 +204,9 @@ Stopping Breeze After starting up, the environment runs in the background and takes precious memory. You can always stop it via: - .. code-block:: bash +.. code-block:: bash - ./breeze --stop-environment + ./breeze --stop-environment Choosing a Breeze Environment ----------------------------- @@ -222,7 +218,7 @@ environments as we have in matrix builds in Travis CI. For example, you can choose to run Python 3.6 tests with MySQL as backend and in the Docker environment as follows: - .. code-block:: bash +.. code-block:: bash ./breeze --python 3.6 --backend mysql --env docker @@ -243,13 +239,13 @@ environment instead. The following environments are available: - * The ``docker`` environment (default): starts all dependencies required by a full integration test suite - (Postgres, Mysql, Celery, etc). This option is resource intensive so do not forget to - [stop environment](#stopping-the-environment) when you are finished. This option is also RAM intensive - and can slow down your machine. - * The ``kubernetes`` environment: Runs Airflow tests within a Kubernetes cluster. - * The ``bare`` environment: runs Airflow in the Docker without any external dependencies. - It only works for independent tests. You can only run it with the sqlite backend. +* The ``docker`` environment (default): starts all dependencies required by a full integration test suite + (Postgres, Mysql, Celery, etc). This option is resource intensive so do not forget to + [stop environment](#stopping-the-environment) when you are finished. This option is also RAM intensive + and can slow down your machine. +* The ``kubernetes`` environment: Runs Airflow tests within a Kubernetes cluster. +* The ``bare`` environment: runs Airflow in the Docker without any external dependencies. + It only works for independent tests. You can only run it with the sqlite backend. Cleaning Up the Environment @@ -313,7 +309,7 @@ Running Arbitrary Commands in the Breeze Environment To run other commands/executables inside the Breeze Docker-based environment, use the ``-x``, ``--execute-command`` flag. To add arguments, specify them -together with the command surrounded with either ``"`` or ``'``, or pass them after -- as extra arguments. +together with the command surrounded with either ``"`` or ``'``, or pass them after ``--`` as extra arguments. .. code-block:: bash @@ -329,7 +325,7 @@ Running Docker Compose Commands To run Docker Compose commands (such as ``help``, ``pull``, etc), use the ``-d``, ``--docker-compose`` flag. To add extra arguments, specify them -after -- as extra arguments. 
+after ``--`` as extra arguments. .. code-block:: bash @@ -430,7 +426,7 @@ values of parameters that you can use. You can set up the autocomplete option automatically by running: - .. code-block:: bash +.. code-block:: bash ./breeze --setup-autocomplete @@ -485,344 +481,52 @@ Often errors during documentation generation come from the docstrings of auto-ap During the docs building auto-api generated files are stored in the ``docs/_api`` folder. This helps you easily identify the location the problems with documentation originated from. -Testing and Debugging in Breeze -=============================== - -Debugging with ipdb -------------------- - -You can debug any code you run in the container using ``ipdb`` debugger if you prefer console debugging. -It is as easy as copy&pasting this line into your code: - -.. code-block:: python - - import ipdb; ipdb.set_trace() - -Once you hit the line, you will be dropped into an interactive ``ipdb`` debugger where you have colors -and autocompletion to guide your debugging. This works from the console where you started your program. -Note that in case of ``nosetest`` you need to provide the ``--nocapture`` flag to avoid nosetests -capturing the stdout of your process. - -Running Unit Tests in Airflow Breeze ------------------------------------- - -Once you enter Airflow Breeze environment, you can simply use -``run-tests`` at will. Note that if you want to pass extra parameters to ``nose``, -you should do it after '--'. - -For example, to execute the "core" unit tests, run the following: +Using Your Host IDE +=================== -.. code-block:: bash - - run-tests tests.core:TestCore -- -s --logging-level=DEBUG - -For a single test method, run: - -.. code-block:: bash - - run-tests tests.core:TestCore.test_check_operators -- -s --logging-level=DEBUG - -The tests run ``airflow db reset`` and ``airflow db init`` the first time you -launch them in a running container, so you can count on the database being initialized. - -All subsequent test executions within the same container will run without database -initialization. - -You can also optionally add the ``--with-db-init`` flag if you want to re-initialize -the database. - -.. code-block:: bash - - run-tests --with-db-init tests.core:TestCore.test_check_operators -- -s --logging-level=DEBUG - -Running Tests for a Specified Target ------------------------------------- - -If you wish to only run tests and not to drop into shell, you can do this by providing the --t, --test-target flag. You can add extra nosetest flags after -- in the command line. - -.. code-block:: bash +You can set up your host IDE (for example, IntelliJ's PyCharm/Idea) to work with Breeze +and benefit from all the features provided by your IDE, such as local and remote debugging, +autocompletion, documentation support, etc. - ./breeze --test-target tests/hooks/test_druid_hook.py -- --logging-level=DEBUG +To use your host IDE with Breeze: -You can run the whole test suite with a special '.' test target: +1. Create a local virtual environment as follows: -.. code-block:: bash + ``mkvirtualenv --python=python`` - ./breeze --test-target . + You can use any of the following wrappers to create and manage your virtual environemnts: + `pyenv `_, `pyenv-virtualenv `_, + or `virtualenvwrapper `_. -You can also specify individual tests or a group of tests: + Ideally, you should have virtualenvs for all Python versions supported by Airflow (3.5, 3.6, 3.7) + and switch between them with the ``workon`` command. -.. code-block:: bash +2. 
Use the ``workon`` command to enter the Breeze environment. - ./breeze --test-target tests.core:TestCore - -Running Static Code Checks --------------------------- +3. Initialize the created local virtualenv: -We have a number of static code checks that are run in Travis CI but you can also run them locally -in the Docker environment. All these tests run in Python 3.6 environment. + ``./breeze --initialize-local-virtualenv`` -The first time you run the checks, it may take some time to rebuild the Docker images. But all the -subsequent runs will be much faster since the build phase will just check whether your code has changed -and rebuild as needed. - -The static code checks launched in the Breeze Docker-based environment do not need a special environment -preparation and provide the same results as the similar tests launched in Travis CI. - -You run the checks via ``-S``, ``--static-check`` flags or ``-F``, ``--static-check-all-files``. -The former ones run appropriate checks only for files changed and staged locally, the latter ones run checks -on all files. - -Note that it may take a lot of time to run checks for all files with pylint on macOS due to a slow -filesystem for macOS Docker. As a workaround, you can add their arguments after ``--`` as extra arguments. -You cannot pass the ``--files`` flag if you select the ``--static-check-all-files`` option. - -You can see the list of available static checks either via ``--help`` flag or by using the autocomplete -option. Note that the ``all`` static check runs all configured static checks. Also since pylint tests take -a lot of time, you can run a special ``all-but-pylint`` check that skips pylint checks. - -Run the ``mypy`` check for the currently staged changes: - -.. code-block:: bash - - ./breeze --static-check mypy - -Run the ``mypy`` check for all files: - -.. code-block:: bash - - ./breeze --static-check-all-files mypy - -Run the ``flake8`` check for the ``tests.core.py`` file with verbose output: - -.. code-block:: bash - - ./breeze --static-check flake8 -- --files tests/core.py --verbose - -Run the ``flake8`` check for the ``tests.core`` package with verbose output: - -.. code-block:: bash - - ./breeze --static-check mypy -- --files tests/hooks/test_druid_hook.py - -Run all tests for the currently staged files: - -.. code-block:: bash - - ./breeze --static-check all - -Run all tests for all files: - -.. code-block:: bash - - ./breeze --static-check-all-files all - -Run all tests but pylint for all files: - -.. code-block:: bash - - ./breeze --static-check-all-files all-but-pylint - -Run pylint checks for all changed files: - -.. code-block:: bash - - ./breeze --static-check pylint - -Run pylint checks for selected files: - -.. code-block:: bash +4. Select the virtualenv you created as the project's default virtualenv in your IDE. - ./breeze --static-check pylint -- --files airflow/configuration.py +Note that you can also use the local virtualenv for Airflow development without Breeze. +This is a lightweight solution that has its own limitations. +More details on using the local virtualenv are avaiable in the `LOCAL_VIRTUALENV.rst `_. -Run pylint checks for all files: - -.. code-block:: bash - - ./breeze --static-check-all-files pylint - - -The ``license`` check is run via a separate script and a separate Docker image containing the -Apache RAT verification tool that checks for Apache-compatibility of licenses within the codebase. -It does not take pre-commit parameters as extra arguments. - -.. 
code-block:: bash - - ./breeze --static-check-all-files licenses - -Running Static Code Checks from the Host ----------------------------------------- - -You can trigger the static checks from the host environment, without entering the Docker container. To do -this, run the following scripts (the same is done in Travis CI): - -* ``_ - checks the licenses. -* ``_ - checks that documentation can be built without warnings. -* ``_ - runs Flake8 source code style enforcement tool. -* ``_ - runs lint checker for the Dockerfile. -* ``_ - runs a check for mypy type annotation consistency. -* ``_ - runs pylint static code checker for main files. -* '``_ - runs pylint static code checker for tests. - -The scripts may ask you to rebuild the images, if needed. - -You can force rebuilding the images by deleting the [.build](./build) directory. This directory keeps cached -information about the images already built and you can safely delete it if you want to start from scratch. - -After documentation is built, the HTML results are available in the [docs/_build/html](docs/_build/html) -folder. This folder is mounted from the host so you can access those files on your host as well. - -Running Static Code Checks in the Docker ------------------------------------------- - -If you are already in the Breeze Docker environment (by running the ``./breeze`` command), -you can also run the same static checks from the container: - -* Mypy: ``./scripts/ci/in_container/run_mypy.sh airflow tests`` -* Pylint for main files: ``./scripts/ci/in_container/run_pylint_main.sh`` -* Pylint for test files: ``./scripts/ci/in_container/run_pylint_tests.sh`` -* Flake8: ``./scripts/ci/in_container/run_flake8.sh`` -* License check: ``./scripts/ci/in_container/run_check_licence.sh`` -* Documentation: ``./scripts/ci/in_container/run_docs_build.sh`` - -Running Static Code Analysis for Selected Files ------------------------------------------------ - -In all static check scripts, both in the container and host versions, you can also pass a module/file path as -parameters of the scripts to only check selected modules or files. For example: - -In the Docker container: - -.. code-block:: - - ./scripts/ci/in_container/run_pylint.sh ./airflow/example_dags/ - -or - -.. code-block:: - - ./scripts/ci/in_container/run_pylint.sh ./airflow/example_dags/test_utils.py - -On the host: - -.. code-block:: - - ./scripts/ci/ci_pylint.sh ./airflow/example_dags/ - - -.. code-block:: - - ./scripts/ci/ci_pylint.sh ./airflow/example_dags/test_utils.py - -Running Test Suites via Scripts --------------------------------------------- - -To run all tests with default settings (Python 3.6, Sqlite backend, "docker" environment), enter: - -.. code-block:: - - ./scripts/ci/local_ci_run_airflow_testing.sh - - -To select Python 3.6 version, Postgres backend, and a "docker" environment, specify: - -.. code-block:: - - PYTHON_VERSION=3.6 BACKEND=postgres ENV=docker ./scripts/ci/local_ci_run_airflow_testing.sh - -To run Kubernetes tests, enter: - -.. code-block:: - - KUBERNETES_VERSION==v1.13.0 KUBERNETES_MODE=persistent_mode BACKEND=postgres ENV=kubernetes \ - ./scripts/ci/local_ci_run_airflow_testing.sh - -* PYTHON_VERSION is one of 3.6/3.7 -* BACKEND is one of postgres/sqlite/mysql -* ENV is one of docker/kubernetes/bare -* KUBERNETES_VERSION is required for Kubernetes tests. Currently, it is KUBERNETES_VERSION=v1.13.0. -* KUBERNETES_MODE is a mode of kubernetes: either persistent_mode or git_mode. 
- -Using Your Host IDE with Breeze +Running static checks in Breeze =============================== -Configuring local virtualenv ----------------------------- - -To use your host IDE (for example, IntelliJ's PyCharm/Idea), you need to set up virtual environments. -Ideally, you should have virtualenvs for all Python versions supported by Airflow (3.6, 3.7). -You can create a virtualenv using ``virtualenvwrapper``. This allows you to easily switch between -virtualenvs using the ``workon`` command and manage your virtual environments more easily. - -Typically creating the environment can be done by: +The Breeze environment is also used to run some of the static checks as described in +`STATIC_CODE_CHECKS.rst `_. -.. code-block:: bash - - mkvirtualenv --python=python - -After the virtualenv is created, you need to initialize it. Simply enter the environment by -using ``workon`` and, once you are in it, run: - -.. code-block:: bash - ./breeze --initialize-local-virtualenv +Running Tests in Breeze +======================= -Once initialization is done, select the virtualenv you initialized as a default project -virtualenv in your IDE. +As soon as you enter the Breeze environment, you can run Airflow unit tests via the ``run-tests`` command. -Running Unit Tests via IDE --------------------------- - -When setup is done, you can use the usual **Run Test** option of the IDE, have all the -autocomplete and documentation support from IDE as well as you can debug and click-through -the sources of Airflow, which is very helpful during development. Usually you can also run most -of the unit tests (those that do not have dependencies) directly from the IDE: - -Running unit tests from IDE is as simple as: - -.. image:: images/running_unittests.png - :align: center - :alt: Running unit tests - -Some of the core tests use dags defined in ``tests/dags`` folder. Those tests should have -``AIRFLOW__CORE__UNIT_TEST_MODE`` set to True. You can set it up in your test configuration: - -.. image:: images/airflow_unit_test_mode.png - :align: center - :alt: Airflow Unit test mode - - -You cannot run all the tests this way but only unit tests that do not require external dependencies -such as Postgres/MySQL/Hadoop/etc. You should use the -`run-tests <#running-tests-in-airflow-breeze>`_ command for these tests. You can -still use your IDE to debug those tests as explained in the next section. - -Debugging Airflow Breeze Tests in IDE -------------------------------------- - -When you run example DAGs, even if you run them using unit tests within IDE, they are run in a separate -container. This makes it a little harder to use with IDE built-in debuggers. -Fortunately, IntelliJ/PyCharm provides an effective remote debugging feature (but only in paid versions). -See additional details on -`remote debugging `_. - -You can set up your remote debugging session as follows: - -.. image:: images/setup_remote_debugging.png - :align: center - :alt: Setup remote debugging - -Note that on macOS, you have to use a real IP address of your host rather than default -localhost because on macOS the container runs in a virtual machine with a different IP address. - -Make sure to configure source code mapping in the remote debugging configuration to map -your local sources to the ``/opt/airflow`` location of the sources within the container: - -.. image:: images/source_code_mapping_ide.png - :align: center - :alt: Source code mapping +For supported CI test suites, types of unit tests, and other tests, see `TESTING.rst `_. 
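+For example, to run the "core" unit tests and pass extra ``nose`` arguments after ``--``:
+
+.. code-block:: bash
+
+    run-tests tests.core:TestCore -- -s --logging-level=DEBUG
+
+To run a single test method:
+
+.. code-block:: bash
+
+    run-tests tests.core:TestCore.test_check_operators -- -s --logging-level=DEBUG
+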
Breeze Command-Line Interface Reference ======================================= diff --git a/CONTRIBUTING.rst b/CONTRIBUTING.rst index 83d9fe9da8d1a..c754b8ad98799 100644 --- a/CONTRIBUTING.rst +++ b/CONTRIBUTING.rst @@ -27,7 +27,7 @@ Report Bugs ----------- Report bugs through `Apache -Jira `__. +JIRA `__. Please report relevant information and preferably code that exhibits the problem. @@ -108,7 +108,7 @@ these guidelines: The airflow repo uses `Travis CI `__ to run the tests and `codecov `__ to track coverage. You can set up both for free on your fork (see - `Travis CI Testing Framework <#travis-ci-testing-framework>`__ section below). + `Travis CI Testing Framework `__ usage guidelines). It will help you make sure you do not break the build with your PR and that you help increase coverage. @@ -130,21 +130,20 @@ these guidelines: - Add an `Apache License `__ header to all new files. - If you have `pre-commit hooks <#pre-commit-hooks>`_ enabled, they automatically add + If you have `pre-commit hooks `__ enabled, they automatically add license headers during commit. - If your pull request adds functionality, make sure to update the docs as part of the same PR. Doc string is often sufficient. Make sure to follow the Sphinx compatible standards. -- Make sure the pull request works for Python 3.5, 3.6 and 3.7. +- Make sure your code fulfils all the + `static code checks `__ we have in our code. The easiest way + to make sure of that is to use `pre-commit hooks `__ - Run tests locally before opening PR. - As Airflow grows as a project, we try to enforce a more consistent style and - follow the Python community guidelines. We currently enforce most of - `PEP8 `__ and a few other linting - rules described in `Running static code checks `__ section. +- Make sure the pull request works for Python 3.6 and 3.7. - Adhere to guidelines for commit messages described in this `article `__. This makes the lives of those who come after you a lot easier. @@ -222,8 +221,6 @@ Limitations: Breeze container-based solution provides a reproducible environment that is consistent with other developers. -Possible extensions: - - You are **STRONGLY** encouraged to also install and use `pre-commit hooks <#pre-commit-hooks>`_ for your local virtualenv development environment. Pre-commit hooks can speed up your development cycle a lot. @@ -267,350 +264,34 @@ Limitations: They are optimized for repeatability of tests, maintainability and speed of building rather than production performance. The production images are not yet officially published. -Pylint Checks -============= - -We are in the process of fixing code flagged with pylint checks for the whole Airflow project. -This is a huge task so we implemented an incremental approach for the process. -Currently most of the code is excluded from pylint checks via scripts/ci/pylint_todo.txt. -We have an open JIRA issue AIRFLOW-4364 which has a number of sub-tasks for each of -the modules that should be made compatible. Fixing problems identified with pylint is one of -straightforward and easy tasks to do (but time-consuming), so if you are a first-time -contributor to Airflow, you can choose one of the sub-tasks as your first issue to fix. - -To fix a pylint issue, do the following: - -1. Remove module/modules from the - `scripts/ci/pylint_todo.txt `__. - -2. Run `scripts/ci/ci_pylint_main.sh `__ and -`scripts/ci/ci_pylint_tests.sh `__. - -3. Fix all the issues reported by pylint. - -4. Re-run `scripts/ci/ci_pylint_main.sh `__ and -`scripts/ci/ci_pylint_tests.sh `__. - -5. 
If you see "success", submit a PR following - `Pull Request guidelines <#pull-request-guidelines>`__. -  - -These are guidelines for fixing errors reported by pylint: - -- Fix the errors rather than disable pylint checks. Often you can easily - refactor the code (IntelliJ/PyCharm might be helpful when extracting methods - in complex code or moving methods around). - -- If disabling a particular problem, make sure to disable only that error by - using the symbolic name of the error as reported by pylint. - -.. code-block:: python - - import airflow.* # pylint: disable=wildcard-import - - -- If there is a single line where you need to disable a particular error, - consider adding a comment to the line that causes the problem. For example: - -.. code-block:: python - - def MakeSummary(pcoll, metric_fn, metric_keys): # pylint: disable=invalid-name - - -- For multiple lines/block of code, to disable an error, you can surround the - block with ``pylint: disable/pylint: enable`` comment lines. For example: - -.. code-block:: python - - # pylint: disable=too-few-public-methods - class LoginForm(Form): - """Form for the user""" - username = StringField('Username', [InputRequired()]) - password = PasswordField('Password', [InputRequired()]) - # pylint: enable=too-few-public-methods - - -Pre-commit Hooks -================ - -Pre-commit hooks help speed up your local development cycle, either in the local virtualenv or Breeze, -and place less burden on the CI infrastructure. Consider installing the pre-commit -hooks as a necessary prerequisite. - -The pre-commit hooks only check the files you are currently working on and make -them fast. Yet, these checks use exactly the same environment as the CI tests -use. So, you can be sure your modifications will also work for CI if they pass -pre-commit hooks. - -We have integrated the fantastic `pre-commit `__ framework -in our development workflow. To install and use it, you need Python 3.6 locally. - -It is the best to use pre-commit hooks when you have your local virtualenv for -Airflow activated since then pre-commit hooks and other dependencies are -automatically installed. You can also install the pre-commit hooks manually -using ``pip install``. - -The pre-commit hooks require the Docker Engine to be configured as the static -checks are executed in the Docker environment. You should build the images -locally before installing pre-commit checks as described in `BREEZE.rst `__. -In case you do not have your local images built, the -pre-commit hooks fail and provide instructions on what needs to be done. - -Prerequisites for Pre-commit Hooks ----------------------------------- - -The pre-commit hooks use several external linters that need to be installed before pre-commit is run. - -Each of the checks installs its own environment, so you do not need to install those, but there are some -checks that require locally installed binaries. On Linux, you typically install -them with ``sudo apt install``, on macOS - with ``brew install``. - -The current list of prerequisites: - -- ``xmllint``: - on Linux, install via ``sudo apt install xmllint``; - on macOS, install via ``brew install xmllint`` - -Enabling Pre-commit Hooks -------------------------- - -To turn on pre-commit checks for ``commit`` operations in git, enter: - -.. code-block:: bash - - pre-commit install - - -To install the checks also for ``pre-push`` operations, enter: - -.. code-block:: bash - - pre-commit install -t pre-push - - -For details on advanced usage of the install method, use: - -.. 
code-block:: bash - - pre-commit install --help - - -Using Docker Images for Pre-commit Hooks ----------------------------------------- - -Before running the pre-commit hooks, you must first build the Docker images as -described in `BREEZE.rst `__. - -Sometimes your image is outdated and needs to be rebuilt because some -dependencies have been changed. In such case the Docker-based pre-commit will -inform you that you should rebuild the image. - -Supported Pre-commit Hooks --------------------------- - -In Airflow, we have the following checks (The checks with stare in Breeze require `BREEZE.rst `__ -image built locally): - -=================================== ================================================================ ============ -**Hooks** **Description** **Breeze** -=================================== ================================================================ ============ -``base-operator`` Checks that BaseOperator is imported properly ------------------------------------ ---------------------------------------------------------------- ------------ -``build`` Builds image for check-apache-licence, mypy, pylint, flake8. * ------------------------------------ ---------------------------------------------------------------- ------------ -``check-apache-license`` Checks compatibility with Apache License requirements. * ------------------------------------ ---------------------------------------------------------------- ------------ -``check-executables-have-shebangs`` Checks that executables have shebang. ------------------------------------ ---------------------------------------------------------------- ------------ -``check-hooks-apply`` Checks which hooks are applicable to the repository. ------------------------------------ ---------------------------------------------------------------- ------------ -``check-merge-conflict`` Checks if a merge conflict is committed. ------------------------------------ ---------------------------------------------------------------- ------------ -``check-xml`` Checks XML files with xmllint. ------------------------------------ ---------------------------------------------------------------- ------------ -``consistent-pylint`` Consistent usage of pylint enable/disable with space. ------------------------------------ ---------------------------------------------------------------- ------------ -``debug-statements`` Detects accidenatally committed debug statements. ------------------------------------ ---------------------------------------------------------------- ------------ -``detect-private-key`` Detects if private key is added to the repository. ------------------------------------ ---------------------------------------------------------------- ------------ -``doctoc`` Refreshes the table of contents for md files. ------------------------------------ ---------------------------------------------------------------- ------------ -``end-of-file-fixer`` Makes sure that there is an empty line at the end. ------------------------------------ ---------------------------------------------------------------- ------------ -``flake8`` Runs flake8. * ------------------------------------ ---------------------------------------------------------------- ------------ -``forbid-tabs`` Fails if tabs are used in the project. ------------------------------------ ---------------------------------------------------------------- ------------ -``insert-license`` Adds licenses for most file types. 
------------------------------------ ---------------------------------------------------------------- ------------ -``isort`` Sorts imports in python files. ------------------------------------ ---------------------------------------------------------------- ------------ -``lint-dockerfile`` Lints a dockerfile. ------------------------------------ ---------------------------------------------------------------- ------------ -``mixed-line-ending`` Detects if mixed line ending is used (\r vs. \r\n). ------------------------------------ ---------------------------------------------------------------- ------------ -``mypy`` Runs mypy. * ------------------------------------ ---------------------------------------------------------------- ------------ -``pydevd`` Check for accidentally commited pydevd statements. ------------------------------------ ---------------------------------------------------------------- ------------ -``pylint`` Runs pylint for main code. * ------------------------------------ ---------------------------------------------------------------- ------------ -``pylint-tests`` Runs pylint for tests. * ------------------------------------ ---------------------------------------------------------------- ------------ -``python-no-log-warn`` Checks if there are no deprecate log warn. ------------------------------------ ---------------------------------------------------------------- ------------ -``rst-backticks`` Checks if RST files use double backticks for code. ------------------------------------ ---------------------------------------------------------------- ------------ -``setup-order`` Checks for an order of dependencies in setup.py ------------------------------------ ---------------------------------------------------------------- ------------ -``shellcheck`` Checks shell files with shellcheck. ------------------------------------ ---------------------------------------------------------------- ------------ -``update-breeze-file`` Update output of breeze command in BREEZE.rst. ------------------------------------ ---------------------------------------------------------------- ------------ -``yamllint`` Checks yaml files with yamllint. -=================================== ================================================================ ============ - - -Using Pre-commit Hooks ----------------------- - -After installation, pre-commit hooks are run automatically when you commit the -code. But you can run pre-commit hooks manually as needed. - -- Run all checks on your staged files by using: - -.. code-block:: bash - - pre-commit run - - -- Run only mypy check on your staged files by using: - -.. code-block:: bash - - pre-commit run mypy - - -- Run only mypy checks on all files by using: - -.. code-block:: bash - - pre-commit run mypy --all-files - - -- Run all checks on all files by using: - -.. code-block:: bash - - pre-commit run --all-files - - -- Skip one or more of the checks by specifying a comma-separated list of - checks to skip in the SKIP variable: - -.. code-block:: bash - - SKIP=pylint,mypy pre-commit run --all-files - - -You can always skip running the tests by providing ``--no-verify`` flag to the -``git commit`` command. - -To check other usage types of the pre-commit framework, see `Pre-commit website `__. - -Importing Airflow core objects -============================== - -When you implement core features or DAGs you might need to import some of the core objects or modules. 
-Since Apache Airflow can be used both as application (by internal classes) and as library (by DAGs), there are -different ways those core objects and packages are imported. - -Airflow imports some of the core objects directly to 'airflow' package so that they can be used from there. - -Those criteria were assumed for choosing what import path to use: - -* If you work on a core feature inside Apache Airflow, you should import the objects directly from the - package where the object is defined - this minimises the risk of cyclic imports. -* If you import the objects from any of 'providers' classes, you should import the objects from - 'airflow' or 'airflow.models', It is very important for back-porting operators/hooks/sensors - to Airflow 1.10.* (AIP-21) -* If you import objects from within a DAG you write, you should import them from 'airflow' or - 'airflow.models' package where stable location of such import is important. - -Those checks enforced for the most important and repeated objects via pre-commit hooks as described below. - -BaseOperator ------------- - -The BaseOperator should be imported: -* as ``from airflow.models import BaseOperator`` in external DAG/operator -* as ``from airflow.models.baseoperator import BaseOperator`` in Airflow core to avoid cyclic imports - - -Travis CI Testing Framework -=========================== - -Airflow test suite is based on Travis CI framework as running all of the tests -locally requires significant setup. You can set up Travis CI in your fork of -Airflow by following the `Travis -CI Getting Started -guide `__. +Static code checks +================== +We check our code quality via static code checks. See +`STATIC_CODE_CHECKS.rst`_ for details. -There are two different options available for running Travis CI, and they are -set up on GitHub as separate components: +Your code must pass all the static code checks in Travis CI in order to be eligible for Code Review. +The easiest way to make sure your code is good before pushing is to use pre-commit checks locally +as described in the static code checks documentation. -- **Travis CI GitHub App** (new version) -- **Travis CI GitHub Services** (legacy version) +Test Infrastructure +=================== -Using Travis CI GitHub App (new version) ----------------------------------------- - -- Once `installed `__, - configure the Travis CI GitHub App at - `Configure Travis CI `__. - -- Set repository access to either "All repositories" for convenience, or "Only - select repositories" and choose ``USERNAME/airflow`` in the drop-down menu. - -- Access Travis CI for your fork at ``__. - -Using Travis CI GitHub Services (legacy version) ------------------------------------------------- - -**NOTE:** The apache/airflow project is still using the legacy version. +We support the following types of tests: -Travis CI GitHub Services version uses an Authorized OAuth App. - -1. Once installed, configure the Travis CI Authorized OAuth App at - `Travis CI OAuth APP `__. - -2. If you are a GitHub admin, click the **Grant** button next to your - organization; otherwise, click the **Request** button. For the Travis CI - Authorized OAuth App, you may have to grant access to the forked - ``ORGANIZATION/airflow`` repo even though it is public. - -3. Access Travis CI for your fork at - ``_. - -Creating New Projects in Travis CI ----------------------------------- +* **Unit tests** are Python ``nose`` tests launched with ``run-tests``. + Unit tests are available both in the `Breeze environment `_ + and `local virtualenv `_. 
-If you need to create a new project in Travis CI, use travis-ci.com for both -private repos and open source. +* **Integration tests** are available in the Breeze development environment + that is also used for Airflow Travis CI tests. Integration test are special tests that require + additional services running, such as Postgres,Mysql, Kerberos, etc. These tests are not yet + clearly marked as integration tests but soon they will be clearly separated by the ``pytest`` annotations. -The travis-ci.org site for open source projects is now legacy and you should not use it. +* **System tests** are automatic tests that use external systems like + Google Cloud Platform. These tests are intended for an end-to-end DAG execution. -.. - There is a second Authorized OAuth App available called "Travis CI for Open Source" used - for the legacy travis-ci.org service Don't use it for new projects. - -More information: - -- `Open Source on travis-ci.com `__. -- `Legacy GitHub Services to GitHub Apps Migration Guide `__. -- `Migrating Multiple Repositories to GitHub Apps Guide `__. +For details on running different types of Airflow tests, see `TESTING.rst `_. Metadata Database Updates ============================== @@ -664,17 +345,14 @@ To install it on macOS: 3. Set up your ``.bashrc`` file and then ``source ~/.bashrc`` to reflect the change. - + You can also follow `general npm installation + instructions `__. For example: .. code-block:: bash export PATH="$HOME/.npm-packages/bin:$PATH" - - You can also follow _`general npm installation - instructions `__. - 4. Install third party libraries defined in ``package.json`` by running the following commands within the ``airflow/www/`` directory: @@ -732,20 +410,145 @@ commands: # Check JS code in .js and .html files, report any errors/warnings and fix them if possible npm run lint:fix -Resources & links -================= -- `Airflow’s official documentation `__ +Contribution Workflow Example +============================== + +Typically, you start your first contribution by reviewing open tickets +at `Apache JIRA `__. + +For example, you want to have the following sample ticket assigned to you: +`AIRFLOW-5934: Add extra CC: to the emails sent by Aiflow `_. + +In general, your contribution includes the following stages: + +.. image:: images/workflow.png + :align: center + :alt: Contribution Workflow + +1. Make your own `fork `__ of + the Apache Airflow `main repository `__. + +2. Create a `local virtualenv `_, + initialize the `Breeze environment `__, and + install `pre-commit framework `__. + If you want to add more changes in the future, set up your own `Travis CI + fork `__. + +3. Join `devlist `__ + and set up a `Slack account `__. + +4. Make the change and create a `Pull Request from your fork `__. + +5. Ping @ #development slack, comment @people. Be annoying. Be considerate. + +Step 1: Fork the Apache Repo +---------------------------- +From the `apache/airflow `_ repo, +`create a fork `_: + +.. image:: images/fork.png + :align: center + :alt: Creating a fork + + +Step 2: Configure Your Environment +---------------------------------- +Configure the Docker-based Breeze development environment and run tests. + +You can use the default Breeze configuration as follows: + +1. Install the latest versions of the Docker Community Edition + and Docker Compose and add them to the PATH. + +2. Enter Breeze: ``./breeze`` + + Breeze starts with downloading the Airflow CI image from + the Docker Hub and installing all required dependencies. + +3. 
Enter the Docker environment and mount your local sources + to make them immediately visible in the environment. + +4. Create a local virtualenv, for example: + +.. code-block:: bash + + mkvirtualenv myenv --python=python3.6 + +5. Initialize the created environment: + +.. code-block:: bash + + ./breeze --initialize-local-virtualenv + +6. Open your IDE (for example, PyCharm) and select the virtualenv you created + as the project's default virtualenv in your IDE. + +Step 3: Connect with People +--------------------------- + +For effective collaboration, make sure to join the following Airflow groups: - Mailing lists: - Developer’s mailing list ``_ + (quite substantial traffic on this list) - All commits mailing list: ``_ + (very high traffic on this list) - Airflow users mailing list: ``_ + (reasonably small traffic on this list) -- `Issues on Apache’s Jira `__ +- `Issues on Apache’s JIRA `__ - `Slack (chat) `__ +Step 4: Prepare PR +------------------ + +1. Update the local sources to address the JIRA ticket. + + For example, to address this example JIRA ticket, do the following: + + * Read about `email configuration in Airflow `__. + + * Find the class you should modify. For the example ticket, this is `email.py `__. + + * Find the test class where you should add tests. For the example ticket, this is `test_email.py `__. + + * Modify the class and add necessary code and unit tests. + + * Run the unit tests from the `IDE `__ or `local virtualenv `__ as you see fit. + + * Run the tests in `Breeze `__. + + * Run and fix all the `static checks `__. If you have + `pre-commits installed `__, + this step is automatically run while you are committing your code. If not, you can do it manually + via ``git add`` and then ``pre-commit run``. + +2. Rebase your fork, squash commits, and resolve all conflicts. + +3. Re-run static code checks again. + +4. Create a pull request with the following title for the sample ticket: + ``[AIRFLOW-5934] Added extra CC: field to the Airflow emails.`` + +Make sure to follow other PR guidelines described in `this document <#pull-request-guidelines>`_. + + +Step 5: Pass PR Review +---------------------- + +.. image:: images/review.png + :align: center + :alt: PR Review + +Note that committers will use **Squash and Merge** instead of **Rebase and Merge** +when merging PRs and your commit will be squashed to single commit. + +Resources & Links +================= +- `Airflow’s official documentation `__ + - `More resources and links to Airflow related content on the Wiki `__ diff --git a/LOCAL_VIRTUALENV.rst b/LOCAL_VIRTUALENV.rst index 37a5dd8ff4f91..aec62243cb6e2 100644 --- a/LOCAL_VIRTUALENV.rst +++ b/LOCAL_VIRTUALENV.rst @@ -19,15 +19,17 @@ .. contents:: :local: Local Virtual Environment (virtualenv) -============================================ +====================================== -Use the local virtualenv development option in the combination with the _`Breeze `_ -development environment. This option helps you benefit from the infrastructure provided -by your IDE (for example, IntelliJ's PyCharm/Idea) and work in the enviroment where all necessary dependencies and tests are -available and set up within Docker images. +Use the local virtualenv development option in the combination with the `Breeze +`_ development environment. This option helps +you benefit from the infrastructure provided +by your IDE (for example, IntelliJ PyCharm/IntelliJ Idea) and work in the +environment where all necessary dependencies and tests are available and set up +within Docker images. 
-But you can also use the local virtualenv as a standalone development option of -you develop Airflow functionality that does not incur large external dependencies and +But you can also use the local virtualenv as a standalone development option if you +develop Airflow functionality that does not incur large external dependencies and CI test coverage. These are examples of the development options available with the local virtualenv in your IDE: @@ -46,7 +48,7 @@ Prerequisites Required Software Packages -------------------------- -Use system-level package managers like yum, apt-get for Linux, or +Use system-level package managers like yum, apt-get for Linux, or Homebrew for macOS to install required software packages: * Python (3.5 or 3.6) @@ -60,7 +62,7 @@ Extra Packages -------------- You can also install extra packages (like ``[gcp]``, etc) via -``pip install -e [EXTRA1,EXTRA2 ...]``. However, some of them may +``pip install -e [EXTRA1,EXTRA2 ...]``. However, some of them may have additional install and setup requirements for your local system. For example, if you have a trouble installing the mysql client on macOS and get @@ -76,8 +78,8 @@ you should set LIBRARY\_PATH before running ``pip install``: export LIBRARY_PATH=$LIBRARY_PATH:/usr/local/opt/openssl/lib/ -You are STRONGLY encouraged to also install and use `pre-commit hooks `_ -for your local virtualenv development environment. Pre-commit hooks can speed up your +You are STRONGLY encouraged to also install and use `pre-commit hooks `_ +for your local virtualenv development environment. Pre-commit hooks can speed up your development cycle a lot. The full list of extras is available in ``_. @@ -85,11 +87,11 @@ The full list of extras is available in ``_. Creating a Local virtualenv =========================== -To use your IDE for Airflow development and testing, you need to configure a virtual +To use your IDE for Airflow development and testing, you need to configure a virtual environment. Ideally you should set up virtualenv for all Python versions that Airflow -supports (3.5, 3.6). +supports (3.5, 3.6). -Consider using one of the following utilities to create virtual environments and easily +Consider using one of the following utilities to create virtual environments and easily switch between them with the ``workon`` command: - `pyenv `_ @@ -112,7 +114,7 @@ To create and initialize the local virtualenv: 4. Select the virtualenv you created as the project's default virtualenv in your IDE. -Note that if you have the Breeze development environment installed, the ``breeze`` +Note that if you have the Breeze development environment installed, the ``breeze`` script can automate initializing the created virtualenv (steps 2 and 3). Simply enter the Breeze environment by using ``workon`` and, once you are in it, run: @@ -120,141 +122,20 @@ Simply enter the Breeze environment by using ``workon`` and, once you are in it, ./breeze --initialize-local-virtualenv -Debugging and Running Tests -=========================== - -When you set up the local virtualenv, you can use the usual **Run Test** option of the IDE, have all the -autocomplete and documentation support from IDE as well as you can debug and click-through -the sources of Airflow, which is very helpful during development. - -Local and Remote Debugging --------------------------- - -One of the great benefits of using the local virtualenv is an option to run -local debugging in your IDE graphical interface. You can also use ``ipdb`` -if you prefer _`console debugging `_. 
- -When you run example DAGs, even if you run them using unit tests within IDE, they are run in a separate -container. This makes it a little harder to use with IDE built-in debuggers. -Fortunately, IntelliJ/PyCharm provides an effective remote debugging feature (but only in paid versions). -See additional details on -`remote debugging `_. - -You can set up your remote debugging session as follows: - -.. image:: images/setup_remote_debugging.png - :align: center - :alt: Setup remote debugging - -Note that on macOS, you have to use a real IP address of your host rather than default -localhost because on macOS the container runs in a virtual machine with a different IP address. - -Make sure to configure source code mapping in the remote debugging configuration to map -your local sources to the ``/opt/airflow`` location of the sources within the container: - -.. image:: images/source_code_mapping_ide.png - :align: center - :alt: Source code mapping - -Running Unit Tests via IDE --------------------------- - -Usually you can run most of the unit tests (those that do not have dependencies such as -Postgres/MySQL/Hadoop/etc.) directly from the IDE: - -.. image:: images/running_unittests.png - :align: center - :alt: Running unit tests - -Some of the core tests use dags defined in ``tests/dags`` folder. Those tests should have -``AIRFLOW__CORE__UNIT_TEST_MODE`` set to True. You can set it up in your test configuration: - -.. image:: images/airflow_unit_test_mode.png - :align: center - :alt: Airflow Unit test mode - -Running Tests via Script ------------------------- - -You can also use the ``run-tests`` script that provides a Python -testing framework with more than 300 tests including integration, unit, and -system tests. - -The script is in the path in the Breeze environment but you need to prepend -it with ``./`` when running in the local virtualenv: ``./run-tests``. - -This script has several flags that can be useful for your testing. - -.. code:: text - - Usage: run-tests [FLAGS] [TESTS_TO_RUN] -- - - Runs tests specified (or all tests if no tests are specified). - - Flags: - - -h, --help - Shows this help message. - - -i, --with-db-init - Forces database initialization before tests. - - -s, --nocapture - Doesn't capture stdout when running the tests. This is useful if you are - debugging with ipdb and want to drop into the console with it - by adding this line to source code: - - import ipdb; ipdb.set_trace() - - -v, --verbose - Provides verbose output showing coloured output of tests being run and summary - of the tests (in a manner similar to the tests run in the CI environment). - -You can pass extra parameters to ``nose``, by adding ``nose`` arguments after -``--``. For example, to just execute the "core" unit tests and add ipdb -set\_trace method, you can run the following command: - -.. code:: bash - - ./run-tests tests.core:TestCore --nocapture --verbose - -or a single test method without colors or debug logs: - -.. code:: bash - - ./run-tests tests.core:TestCore.test_check_operators - -Note that the first time it runs, the ``./run_tests`` script -performs a database initialization. If you run further tests without -leaving the environment, the database will not be initialized. But you -can always force the database initialization with the ``--with-db-init`` -(``-i``) switch. The script will inform you what you can do when it is -run. - -In general, the ``run-tests`` script can be used to run unit, integration and system tests. 
Currently, when you run tests not supported in the local virtualenv, the script may either fail or provide an error message. - -Running Unit Tests from the IDE ------------------------------------ - -Once you created the local virtualenv and selected it as the default project's environment, -running unit tests from the IDE is as simple as: - -.. figure:: images/run_unittests.png - :alt: Run unittests +Running Tests +------------- -Running Integration Tests -------------------------- +Running tests is described in `TESTING.rst `_. While most of the tests are typical unit tests that do not -require external components, there are a number of integration and -system tests. You can technically use local +require external components, there are a number of Integration tests. You can technically use local virtualenv to run those tests, but it requires to set up a number of external components (databases/queues/kubernetes and the like). So, it is -much easier to use the `Breeze development environment `_ -for integration and system tests. +much easier to use the `Breeze `__ development environment +for Integration tests. -Note: Soon we will separate the integration and system tests out +Note: Soon we will separate the integration and system tests out via pytest so that you can clearly know which tests are unit tests and can be run in the local virtualenv and which should be run using Breeze. diff --git a/STATIC_CODE_CHECKS.rst b/STATIC_CODE_CHECKS.rst new file mode 100644 index 0000000000000..e22439c4c0f23 --- /dev/null +++ b/STATIC_CODE_CHECKS.rst @@ -0,0 +1,423 @@ + .. Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + + .. http://www.apache.org/licenses/LICENSE-2.0 + + .. Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. + +.. contents:: :local: + +Static Code Checks +================== + +The static code checks in Airflow are used to verify that the code meets certain quality standards. +All the static code checks can be run through pre-commit hooks. + +Some of the static checks in pre-commits require Breeze Docker images to be installed locally. +The pre-commit hooks perform all the necessary installation when you run them +for the first time. See the table below to identify which pre-commit checks require the Breeze Docker images. + +Sometimes your image is outdated and needs to be rebuilt because some dependencies have been changed. +In such cases, the Docker-based pre-commit will inform you that you should rebuild the image. + +You can also run some static code checks via `Breeze `_ environment +using available bash scripts. + +Pre-commit Hooks +---------------- + +Pre-commit hooks help speed up your local development cycle and place less burden on the CI infrastructure. +Consider installing the pre-commit hooks as a necessary prerequisite. 
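+
+A minimal quick start, assuming you install the framework manually with ``pip`` (the individual
+steps are described in more detail in the sections below):
+
+.. code-block:: bash
+
+    # Install the pre-commit framework and enable the hooks for "git commit"
+    pip install pre-commit
+    pre-commit install
+
+    # Run the configured checks against selected files only
+    pre-commit run --files airflow/configuration.py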
+ + +This table lists pre-commit hooks used by Airflow and indicates which hooks +require Breeze Docker images to be installed locally: + +=================================== ================================================================ ============ +**Hooks** **Description** **Breeze** +=================================== ================================================================ ============ +``base-operator`` Checks that BaseOperator is imported properly +----------------------------------- ---------------------------------------------------------------- ------------ +``build`` Builds image for check-apache-licence, mypy, pylint, flake8. * +----------------------------------- ---------------------------------------------------------------- ------------ +``check-apache-license`` Checks compatibility with Apache License requirements. * +----------------------------------- ---------------------------------------------------------------- ------------ +``check-executables-have-shebangs`` Checks that executables have shebang. +----------------------------------- ---------------------------------------------------------------- ------------ +``check-hooks-apply`` Checks which hooks are applicable to the repository. +----------------------------------- ---------------------------------------------------------------- ------------ +``check-merge-conflict`` Checks if a merge conflict is committed. +----------------------------------- ---------------------------------------------------------------- ------------ +``check-xml`` Checks XML files with xmllint. +----------------------------------- ---------------------------------------------------------------- ------------ +``consistent-pylint`` Consistent usage of pylint enable/disable with space. +----------------------------------- ---------------------------------------------------------------- ------------ +``debug-statements`` Detects accidenatally committed debug statements. +----------------------------------- ---------------------------------------------------------------- ------------ +``detect-private-key`` Detects if private key is added to the repository. +----------------------------------- ---------------------------------------------------------------- ------------ +``doctoc`` Refreshes the table of contents for md files. +----------------------------------- ---------------------------------------------------------------- ------------ +``end-of-file-fixer`` Makes sure that there is an empty line at the end. +----------------------------------- ---------------------------------------------------------------- ------------ +``flake8`` Runs flake8. * +----------------------------------- ---------------------------------------------------------------- ------------ +``forbid-tabs`` Fails if tabs are used in the project. +----------------------------------- ---------------------------------------------------------------- ------------ +``insert-license`` Adds licenses for most file types. +----------------------------------- ---------------------------------------------------------------- ------------ +``isort`` Sorts imports in python files. +----------------------------------- ---------------------------------------------------------------- ------------ +``lint-dockerfile`` Lints a dockerfile. +----------------------------------- ---------------------------------------------------------------- ------------ +``mixed-line-ending`` Detects if mixed line ending is used (\r vs. \r\n). 
+----------------------------------- ---------------------------------------------------------------- ------------
+``mypy``                            Runs mypy.                                                             *
+----------------------------------- ---------------------------------------------------------------- ------------
+``pydevd``                          Checks for accidentally committed pydevd statements.
+----------------------------------- ---------------------------------------------------------------- ------------
+``pylint``                          Runs pylint for main code.                                             *
+----------------------------------- ---------------------------------------------------------------- ------------
+``pylint-tests``                    Runs pylint for tests.                                                 *
+----------------------------------- ---------------------------------------------------------------- ------------
+``python-no-log-warn``              Checks that the deprecated ``log.warn`` is not used.
+----------------------------------- ---------------------------------------------------------------- ------------
+``rst-backticks``                   Checks if RST files use double backticks for code.
+----------------------------------- ---------------------------------------------------------------- ------------
+``setup-order``                     Checks the order of dependencies in setup.py.
+----------------------------------- ---------------------------------------------------------------- ------------
+``shellcheck``                      Checks shell files with shellcheck.
+----------------------------------- ---------------------------------------------------------------- ------------
+``update-breeze-file``              Updates the output of the breeze command in BREEZE.rst.
+----------------------------------- ---------------------------------------------------------------- ------------
+``yamllint``                        Checks yaml files with yamllint.
+=================================== ================================================================ ============
+
+The pre-commit hooks only check the files you are currently working on, which
+makes them fast. Yet, these checks use exactly the same environment as the CI tests
+use. So, you can be sure your modifications will also work for CI if they pass
+pre-commit hooks.
+
+We have integrated the fantastic `pre-commit `__ framework
+into our development workflow. To install and use it, you need Python 3.6 locally.
+
+It is best to use pre-commit hooks when you have your local virtualenv for
+Airflow activated since then pre-commit hooks and other dependencies are
+automatically installed. You can also install the pre-commit hooks manually
+using ``pip install``.
+
+The pre-commit hooks require the Docker Engine to be configured as the static
+checks are executed in the Docker environment. You should build the images
+locally before installing pre-commit checks as described in `BREEZE.rst `__.
+In case you do not have your local images built, the
+pre-commit hooks fail and provide instructions on what needs to be done.
+
+Prerequisites for Pre-commit Hooks
+..................................
+
+The pre-commit hooks use several external linters that need to be installed before pre-commit is run.
+
+Each of the checks installs its own environment, so you do not need to install those, but there are some
+checks that require locally installed binaries. On Linux, you typically install
+them with ``sudo apt install``, and on macOS with ``brew install``.
+
+The current list of prerequisites is limited to ``xmllint``:
+
+- on Linux, install via ``sudo apt install xmllint``;
+
+- on macOS, install via ``brew install xmllint``.
+
+Enabling Pre-commit Hooks
+.........................
+
+To turn on pre-commit checks for ``commit`` operations in git, enter:
+
+.. code-block:: bash
+
+    pre-commit install
+
+
+To install the checks also for ``pre-push`` operations, enter:
+
+.. code-block:: bash
+
+    pre-commit install -t pre-push
+
+
+For details on advanced usage of the install method, use:
+
+.. code-block:: bash
+
+    pre-commit install --help
+
+
+Using Pre-commit Hooks
+......................
+
+After installation, pre-commit hooks are run automatically when you commit the
+code. But you can run pre-commit hooks manually as needed.
+
+- Run all checks on your staged files by using:
+
+.. code-block:: bash
+
+    pre-commit run
+
+
+- Run only the mypy check on your staged files by using:
+
+.. code-block:: bash
+
+    pre-commit run mypy
+
+
+- Run only the mypy check on all files by using:
+
+.. code-block:: bash
+
+    pre-commit run mypy --all-files
+
+
+- Run all checks on all files by using:
+
+.. code-block:: bash
+
+    pre-commit run --all-files
+
+
+- Skip one or more of the checks by specifying a comma-separated list of
+  checks to skip in the SKIP variable:
+
+.. code-block:: bash
+
+    SKIP=pylint,mypy pre-commit run --all-files
+
+
+You can always skip running the checks by providing the ``--no-verify`` flag to the
+``git commit`` command.
+
+To check other usage types of the pre-commit framework, see `Pre-commit website `__.
+
+Pylint Static Code Checks
+-------------------------
+
+We are in the process of fixing the code flagged with pylint checks for the whole Airflow project.
+This is a huge task, so we implemented an incremental approach for the process.
+Currently, most of the code is excluded from pylint checks via scripts/ci/pylint_todo.txt.
+We have an open JIRA issue AIRFLOW-4364 which has a number of sub-tasks for each of
+the modules that should be made compatible. Fixing problems identified with pylint is a
+straightforward (though time-consuming) task, so if you are a first-time
+contributor to Airflow, you can choose one of the sub-tasks as your first issue to fix.
+
+To fix a pylint issue, do the following:
+
+1. Remove module/modules from the
+   `scripts/ci/pylint_todo.txt `__.
+
+2. Run `scripts/ci/ci_pylint_main.sh `__ and
+   `scripts/ci/ci_pylint_tests.sh `__.
+
+3. Fix all the issues reported by pylint.
+
+4. Re-run `scripts/ci/ci_pylint_main.sh `__ and
+   `scripts/ci/ci_pylint_tests.sh `__.
+
+5. If you see "success", submit a PR following
+   `Pull Request guidelines <#pull-request-guidelines>`__.
+
+
+These are guidelines for fixing errors reported by pylint:
+
+- Fix the errors rather than disable pylint checks. Often you can easily
+  refactor the code (IntelliJ/PyCharm might be helpful when extracting methods
+  in complex code or moving methods around).
+
+- If disabling a particular problem, make sure to disable only that error by
+  using the symbolic name of the error as reported by pylint.
+
+.. code-block:: python
+
+    from airflow import *  # pylint: disable=wildcard-import
+
+
+- If there is a single line where you need to disable a particular error,
+  consider adding a comment to the line that causes the problem. For example:
+
+.. code-block:: python
+
+    def MakeSummary(pcoll, metric_fn, metric_keys):  # pylint: disable=invalid-name
+
+
+- For multiple lines or a block of code, to disable an error, you can surround the
+  block with ``pylint:disable/pylint:enable`` comment lines. For example:
+
+.. code-block:: python
+
+    # pylint: disable=too-few-public-methods
+    class LoginForm(Form):
+        """Form for the user"""
+        username = StringField('Username', [InputRequired()])
+        password = PasswordField('Password', [InputRequired()])
+    # pylint: enable=too-few-public-methods
+
+
+Running Static Code Checks via Breeze
+-------------------------------------
+
+The static code checks can be launched using the Breeze environment.
+
+You run the static code checks via the ``-S``, ``--static-check`` flags or the ``-F``,
+``--static-check-all-files`` flags. The former runs the selected checks only on the files
+changed and staged locally, while the latter runs them on all files.
+
+Note that it may take a lot of time to run checks for all files with pylint on macOS due to a slow
+filesystem for macOS Docker. As a workaround, you can limit the checks to specific files by passing
+their arguments (for example, ``--files``) after ``--`` as extra arguments.
+You cannot pass the ``--files`` flag if you select the ``--static-check-all-files`` option.
+
+You can see the list of available static checks either via the ``--help`` flag or by using the autocomplete
+option. Note that the ``all`` static check runs all configured static checks. Also, since pylint checks take
+a lot of time, you can run a special ``all-but-pylint`` check that skips pylint checks.
+
+Run the ``mypy`` check for the currently staged changes:
+
+.. code-block:: bash
+
+     ./breeze --static-check mypy
+
+Run the ``mypy`` check for all files:
+
+.. code-block:: bash
+
+     ./breeze --static-check-all-files mypy
+
+Run the ``flake8`` check for the ``tests/core.py`` file with verbose output:
+
+.. code-block:: bash
+
+     ./breeze --static-check flake8 -- --files tests/core.py --verbose
+
+Run the ``mypy`` check for the ``tests/hooks/test_druid_hook.py`` file:
+
+.. code-block:: bash
+
+     ./breeze --static-check mypy -- --files tests/hooks/test_druid_hook.py
+
+Run all checks for the currently staged files:
+
+.. code-block:: bash
+
+     ./breeze --static-check all
+
+Run all checks for all files:
+
+.. code-block:: bash
+
+     ./breeze --static-check-all-files all
+
+Run all checks except pylint for all files:
+
+.. code-block:: bash
+
+     ./breeze --static-check-all-files all-but-pylint
+
+Run pylint checks for all changed files:
+
+.. code-block:: bash
+
+     ./breeze --static-check pylint
+
+Run pylint checks for selected files:
+
+.. code-block:: bash
+
+     ./breeze --static-check pylint -- --files airflow/configuration.py
+
+
+Run pylint checks for all files:
+
+.. code-block:: bash
+
+     ./breeze --static-check-all-files pylint
+
+
+The ``licenses`` check is run via a separate script and a separate Docker image containing the
+Apache RAT verification tool that checks for Apache-compatibility of licenses within the codebase.
+It does not take pre-commit parameters as extra arguments.
+
+.. code-block:: bash
+
+     ./breeze --static-check-all-files licenses
+
+Running Static Code Checks via Scripts from the Host
+....................................................
+
+You can trigger the static checks from the host environment, without entering the Docker container. To do
+this, run the following scripts (the same is done in Travis CI); see the example after the list:
+
+* ``_ - checks the licenses.
+* ``_ - checks that documentation can be built without warnings.
+* ``_ - runs the Flake8 source code style enforcement tool.
+* ``_ - runs the lint checker for the Dockerfile.
+* ``_ - runs a check for mypy type annotation consistency.
+* ``_ - runs the pylint static code checker for main files.
+* ``_ - runs the pylint static code checker for tests.
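+
+For example, to run both pylint host scripts in one go before pushing a change (a minimal sketch
+using the ``ci_pylint_main.sh`` and ``ci_pylint_tests.sh`` scripts referenced in the Pylint
+section above; adjust the paths if your checkout differs):
+
+.. code-block:: bash
+
+    # Run pylint for the main code and then for the tests; stop at the first failure
+    ./scripts/ci/ci_pylint_main.sh && ./scripts/ci/ci_pylint_tests.sh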
+
+The scripts may ask you to rebuild the images, if needed.
+
+You can force rebuilding the images by deleting the ``.build`` directory. This directory keeps cached
+information about the images already built and you can safely delete it if you want to start from scratch.
+
+After documentation is built, the HTML results are available in the ``docs/_build/html``
+folder. This folder is mounted from the host so you can access those files on your host as well.
+
+Running Static Code Checks in the Docker Container
+..................................................
+
+If you are already in the Breeze Docker environment (by running the ``./breeze`` command),
+you can also run the same static checks via the ``run_*`` scripts:
+
+* Mypy: ``./scripts/ci/in_container/run_mypy.sh airflow tests``
+* Pylint for main files: ``./scripts/ci/in_container/run_pylint_main.sh``
+* Pylint for test files: ``./scripts/ci/in_container/run_pylint_tests.sh``
+* Flake8: ``./scripts/ci/in_container/run_flake8.sh``
+* License check: ``./scripts/ci/in_container/run_check_licence.sh``
+* Documentation: ``./scripts/ci/in_container/run_docs_build.sh``
+
+Running Static Code Checks for Selected Files
+.............................................
+
+In all static check scripts, both in the container and host versions, you can also pass a module or file
+path as a parameter to check only selected modules or files. For example:
+
+In the Docker container:
+
+.. code-block:: bash
+
+     ./scripts/ci/in_container/run_pylint.sh ./airflow/example_dags/
+
+or
+
+.. code-block:: bash
+
+     ./scripts/ci/in_container/run_pylint.sh ./airflow/example_dags/test_utils.py
+
+On the host:
+
+.. code-block:: bash
+
+     ./scripts/ci/ci_pylint.sh ./airflow/example_dags/
+
+.. code-block:: bash
+
+     ./scripts/ci/ci_pylint.sh ./airflow/example_dags/test_utils.py
diff --git a/TESTING.rst b/TESTING.rst
new file mode 100644
index 0000000000000..e8ce2a3f90fae
--- /dev/null
+++ b/TESTING.rst
@@ -0,0 +1,336 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+ ..   http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+.. contents:: :local:
+
+Airflow Test Infrastructure
+===========================
+
+* **Unit tests** are Python ``nose`` tests launched with ``run-tests``.
+  Unit tests are available both in the `Breeze environment `__
+  and the local virtualenv.
+
+* **Integration tests** are available in the Breeze development environment
+  that is also used for Airflow Travis CI tests. Integration tests are special tests that require
+  additional services running, such as Postgres/Mysql/Kerberos, etc. Those tests are not yet
+  clearly marked as integration tests but soon they will be clearly separated by pytest annotations.
+
+* **System tests** are automatic tests that use external systems like
+  Google Cloud Platform. These tests are intended for an end-to-end DAG execution.
+  Note that automated execution of these tests is still
+  `work in progress `_.
+
+This document describes how to run Python tests. Before the tests are run, we also use
+`static code checks `__ which allow us to catch typical errors in the code
+before the tests are executed.
+
+Airflow Unit Tests
+==================
+
+All tests for Apache Airflow are run via the ``run-tests`` utility that
+provides a Python testing framework with more than 300
+tests including Unit, Integration and System tests.
+``run-tests`` can be launched as
+a command in the Breeze environment, as a script, or via the IDE interface.
+
+Running Unit Tests from IDE
+---------------------------
+
+To run unit tests from the IDE, create the `local virtualenv `_,
+select it as the default project's environment, and run unit tests as follows:
+
+.. image:: images/running_unittests.png
+    :align: center
+    :alt: Running unit tests
+
+Some of the core tests use dags defined in the ``tests/dags`` folder. Those tests should have
+``AIRFLOW__CORE__UNIT_TEST_MODE`` set to True. You can set it up in your test configuration:
+
+.. image:: images/airflow_unit_test_mode.png
+    :align: center
+    :alt: Airflow Unit test mode
+
+Note that you can run the unit tests in the standalone local virtualenv
+(with no Breeze installed) if they do not have dependencies such as
+Postgres/MySQL/Hadoop/etc.
+
+Running Unit Tests from Local virtualenv
+----------------------------------------
+
+You can use the ``run-tests`` script outside the Breeze Docker container,
+directly from your local virtualenv.
+
+In the Breeze environment, the ``run-tests`` script is on the path.
+To run it from the local virtualenv, you need to prepend
+it with ``./`` as follows: ``./run-tests``.
+
+This script has several flags that can be useful for your testing.
+
+.. code:: text
+
+    Usage: run-tests [FLAGS] [TESTS_TO_RUN] --
+
+      Runs tests specified (or all tests if no tests are specified).
+
+      Flags:
+
+      -h, --help
+              Shows this help message.
+
+      -i, --with-db-init
+              Forces database initialization before tests.
+
+      -s, --nocapture
+              Doesn't capture stdout when running the tests. This is useful if you are
+              debugging with ipdb and want to drop into the console with it
+              by adding this line to source code:
+
+                  import ipdb; ipdb.set_trace()
+
+      -v, --verbose
+              Provides verbose output showing coloured output of tests being run and summary
+              of the tests (in a manner similar to the tests run in the CI environment).
+
+You can pass extra parameters to ``nose`` by adding ``nose`` arguments after
+``--``. For example, to just execute the "core" unit tests and add the ipdb
+``set_trace`` method, you can run the following command:
+
+.. code:: bash
+
+   ./run-tests tests.core:TestCore --nocapture --verbose
+
+To run a single test method without colors or debug logs, specify:
+
+.. code:: bash
+
+   ./run-tests tests.core:TestCore.test_check_operators
+
+Note that the first time the ``./run-tests`` script runs, it
+performs a database initialization. If you run further tests without
+leaving the environment, the database will not be initialized. But you
+can always force the database initialization with the ``--with-db-init``
+(``-i``) switch (see the example below). The script will inform you what you can do when it is
+run.
+
+**Note:** We do not provide a clear distinction between tests
+(Unit/Integration/System tests), but we are working on it.
+Currently, when you run tests not supported in the local virtualenv,
+the script may either fail or provide an error message.
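+
+For instance, combining the flags described above, a minimal sketch of forcing a fresh database
+initialization while keeping stdout visible (the test target is just one of the examples used in
+this document):
+
+.. code:: bash
+
+   # Re-initialize the database and run a single test method, without capturing stdout
+   ./run-tests --with-db-init --nocapture tests.core:TestCore.test_check_operators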
+
+Running Unit Tests inside Breeze
+--------------------------------
+
+To run unit tests from Breeze:
+
+1. Enter the Airflow Breeze environment.
+
+2. Use ``run-tests``.
+   To pass extra parameters to ``nose``, precede them with ``--``.
+
+The tests run ``airflow db reset`` and ``airflow db init`` the first time you
+launch them in a running container, so you can count on the database being initialized.
+All subsequent test executions within the same container will run without database
+initialization.
+You can also optionally add the ``--with-db-init`` flag if you want to re-initialize
+the database.
+
+.. code-block:: bash
+
+     run-tests --with-db-init tests.core:TestCore.test_check_operators -- -s --logging-level=DEBUG
+
+**Examples**:
+
+* Execute the "core" unit tests and pass extra parameters to ``nose``:
+
+  ``run-tests tests.core:TestCore -- -s --logging-level=DEBUG``
+
+* Execute a single test method:
+
+  ``run-tests tests.core:TestCore.test_check_operators -- -s --logging-level=DEBUG``
+
+
+Running Tests for a Specified Target Using Breeze from the Host
+---------------------------------------------------------------
+
+If you wish to only run tests and not drop into the shell, you can do this by providing the
+``-t``, ``--test-target`` flag. You can add extra nosetest flags after ``--`` in the command line.
+
+.. code-block:: bash
+
+     ./breeze --test-target tests/hooks/test_druid_hook.py -- --logging-level=DEBUG
+
+You can run the whole test suite with the special ``.`` test target:
+
+.. code-block:: bash
+
+    ./breeze --test-target .
+
+You can also specify individual tests or a group of tests:
+
+.. code-block:: bash
+
+    ./breeze --test-target tests.core:TestCore
+
+Running Full Test Suite via Scripts from the Host
+-------------------------------------------------
+
+To run all tests with default settings (Python 3.6, Sqlite backend, "docker" environment), enter:
+
+.. code-block:: bash
+
+     ./scripts/ci/local_ci_run_airflow_testing.sh
+
+
+To select the Python 3.6 version, the Postgres backend, and a ``docker`` environment, specify:
+
+.. code-block:: bash
+
+     PYTHON_VERSION=3.6 BACKEND=postgres ENV=docker ./scripts/ci/local_ci_run_airflow_testing.sh
+
+To run Kubernetes tests, enter:
+
+.. code-block:: bash
+
+     KUBERNETES_VERSION=v1.13.5 KUBERNETES_MODE=persistent_mode BACKEND=postgres ENV=kubernetes \
+       ./scripts/ci/local_ci_run_airflow_testing.sh
+
+* PYTHON_VERSION is one of 3.6/3.7
+* BACKEND is one of postgres/sqlite/mysql
+* ENV is one of docker/kubernetes/bare
+* KUBERNETES_VERSION is required for Kubernetes tests. Currently, it is KUBERNETES_VERSION=v1.13.0.
+* KUBERNETES_MODE is a mode of kubernetes: either persistent_mode or git_mode.
+
+
+Airflow Integration Tests
+=========================
+
+Airflow integration tests cannot be run in the local virtualenv. They can only be run in the Breeze
+environment locally and in Travis CI.
+
+When you are in the Breeze environment, you can execute both Unit and Integration tests.
+
+Travis CI Testing Framework
+---------------------------
+
+The Airflow test suite is based on the Travis CI framework, as running all of the tests
+locally requires significant setup. You can set up Travis CI in your fork of
+Airflow by following the
+`Travis CI Getting Started guide `__.
+
+Consider using the Travis CI framework if you submit multiple pull requests
+and want to speed up your builds.
+
+There are two different options available for running Travis CI, and they are
+set up on GitHub as separate components:
+
+- **Travis CI GitHub App** (new version)
+- **Travis CI GitHub Services** (legacy version)
+
+Travis CI GitHub App (new version)
+..................................
+
+1. Once `installed `__,
+   configure the Travis CI GitHub App at
+   `Configure Travis CI `__.
+
+2. Set repository access to either "All repositories" for convenience, or "Only
+   select repositories" and choose ``USERNAME/airflow`` in the drop-down menu.
+
+3. Access Travis CI for your fork at ``__.
+
+Travis CI GitHub Services (legacy version)
+..........................................
+
+**NOTE:** The apache/airflow project is still using the legacy version.
+
+The Travis CI GitHub Services version uses an Authorized OAuth App.
+
+1. Once installed, configure the Travis CI Authorized OAuth App at
+   `Travis CI OAuth APP `__.
+
+2. If you are a GitHub admin, click the **Grant** button next to your
+   organization; otherwise, click the **Request** button. For the Travis CI
+   Authorized OAuth App, you may have to grant access to the forked
+   ``ORGANIZATION/airflow`` repo even though it is public.
+
+3. Access Travis CI for your fork at
+   ``_.
+
+Creating New Projects in Travis CI
+..................................
+
+If you need to create a new project in Travis CI, use travis-ci.com for both
+private repos and open source.
+
+The travis-ci.org site for open source projects is now legacy and you should not use it.
+
+..
+   There is a second Authorized OAuth App available called **Travis CI for Open Source** used
+   for the legacy travis-ci.org service. Don't use it for new projects!
+
+More information:
+
+- `Open Source on travis-ci.com `__.
+- `Legacy GitHub Services to GitHub Apps Migration Guide `__.
+- `Migrating Multiple Repositories to GitHub Apps Guide `__.
+
+Airflow System Tests
+====================
+
+The System tests for Airflow are not yet fully implemented. They are a work in progress under
+`AIP-4 Support for System Tests for external systems `__.
+These tests need to communicate with external services/systems that are available
+if you have appropriate credentials configured for your tests.
+The tests derive from the ``tests.system_test_class.SystemTests`` class.
+
+The system tests execute a specified
+example DAG file that runs the DAG end-to-end.
+
+An example of such a system test is
+``airflow.tests.providers.google.operators.test_natural_language_system.CloudNaturalLanguageExampleDagsTest``.
+
+For now, you can execute the system tests and follow the messages printed to get them running.
+More information on running the tests will be available soon.
+
+
+Local and Remote Debugging
+==========================
+
+One of the great benefits of using the local virtualenv and Breeze is the option to run
+local debugging in your IDE's graphical interface. You can also use ``ipdb``
+if you prefer `console debugging <#breeze-debugging-with-ipdb>`__.
+
+When you run example DAGs, even if you run them using unit tests within the IDE, they are run in a separate
+container. This makes it a little harder to use your IDE's built-in debuggers.
+Fortunately, IntelliJ/PyCharm provides an effective remote debugging feature (but only in paid versions).
+See additional details on
+`remote debugging `_.
+
+You can set up your remote debugging session as follows:
+
+.. image:: images/setup_remote_debugging.png
+    :align: center
+    :alt: Setup remote debugging
+
+Note that on macOS, you have to use the real IP address of your host rather than the default
+localhost because on macOS the container runs in a virtual machine with a different IP address.
+
+Make sure to configure source code mapping in the remote debugging configuration to map
+your local sources to the ``/opt/airflow`` location of the sources within the container:
+
+.. image:: images/source_code_mapping_ide.png
+    :align: center
+    :alt: Source code mapping
diff --git a/images/fork.png b/images/fork.png
new file mode 100644
index 0000000000000..708f3e63d6267
Binary files /dev/null and b/images/fork.png differ
diff --git a/images/review.png b/images/review.png
new file mode 100644
index 0000000000000..59c8a7f872ecd
Binary files /dev/null and b/images/review.png differ
diff --git a/images/workflow.png b/images/workflow.png
new file mode 100644
index 0000000000000..12f2d96648d6e
Binary files /dev/null and b/images/workflow.png differ