In this example, you'll be introduced to the core concepts of Airflow: DAGs, operators, and tasks. You’ll write your very first DAG and learn how workflows are defined and executed in Airflow using pre-defined datasets and sources.
The easiest way to get up and running is with Docker. To start Airflow, run:
```bash
$ ./docker/up.sh
```
Tip: Use the `--pull` flag to pull a tagged image.
Airflow listens on port `8080`. To view the Airflow UI and verify it's running, open http://localhost:8080. Then, browse to http://localhost:3000 to begin exploring DAG metadata via the Marquez UI.
Airflow is a platform to programmatically author, schedule, and monitor workflows. Below we define some general concepts, but we encourage you to review Airflow's concepts doc.
A `DAG` (or Directed Acyclic Graph) is a collection of `Tasks`. It represents an entire workflow by describing the relationships and dependencies between tasks.
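In code, a DAG is just a Python object you instantiate with an ID and scheduling metadata. Here's a minimal sketch (the `dag_id` and schedule are illustrative, not part of this tutorial):

```python
from datetime import datetime

from airflow import DAG

# A minimal DAG: an ID plus scheduling metadata. Tasks added to this
# DAG become the nodes of the graph; dependencies become the edges.
dag = DAG(
    dag_id="example_dag",           # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",     # named `schedule` in newer Airflow releases
    catchup=False,
)
```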
A `Task` represents a unit of work within a workflow. Each task is an implementation of an `Operator`: for example, a `PostgresOperator` to execute a SQL query, or a `PythonOperator` to execute a Python function.
While `DAGs` describe how to run a workflow, `Operators` determine what actually gets done by a task.
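To make that concrete, the sketch below (not this tutorial's DAG) wires a `PostgresOperator` and a `PythonOperator` together; the `dag_id`, table, and `example_db` connection are assumptions for illustration:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.postgres.operators.postgres import PostgresOperator


def _greet():
    print("Hello from a PythonOperator task!")


with DAG(
    dag_id="operator_example",          # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # What gets done: run a SQL statement against Postgres.
    create_table = PostgresOperator(
        task_id="create_table",
        postgres_conn_id="example_db",  # hypothetical connection ID
        sql="CREATE TABLE IF NOT EXISTS greetings (msg TEXT)",
    )

    # What gets done: call a Python function.
    greet = PythonOperator(
        task_id="greet",
        python_callable=_greet,
    )

    # How to run it: create_table must succeed before greet runs.
    create_table >> greet
```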
To start, you'll see a list of DAGs. The Airflow scheduler searches for DAGs under `dags/` and periodically rescans the directory for new workflows. To the left of each DAG, the toggle has been switched to `ON`, meaning the DAG is ready to run. By default, DAGs are `OFF` (an option that users can easily configure).
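If you'd rather have a particular DAG start in the `ON` state, one option in recent Airflow releases is the `is_paused_upon_creation` parameter, which overrides the global `dags_are_paused_at_creation` setting in `airflow.cfg` for that DAG alone. A sketch, with a hypothetical `dag_id`:

```python
from datetime import datetime

from airflow import DAG

# Per-DAG override of the global [core] dags_are_paused_at_creation
# setting: this DAG starts toggled ON instead of paused.
dag = DAG(
    dag_id="unpaused_example",          # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    is_paused_upon_creation=False,
)
```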
We encourage you to explore and become familiar with the UI. For a quick overview of how to edit variables, visualize your DAG's execution, and even view the underlying code, see Airflow's UI / screenshots docs.
Next, we'll jump into an example DAG. We'll also cover the basics of defining tasks and combining those tasks into DAGs.