Add Missing Statsd Metrics in Docs (apache#13708)
kaxil authored Jan 16, 2021
1 parent 2abfe1e commit 74b2cd7
Showing 1 changed file with 41 additions and 26 deletions.
67 changes: 41 additions & 26 deletions docs/apache-airflow/logging-monitoring/metrics.rst
@@ -77,32 +77,40 @@ See :doc:`../modules_management` for details on how Python and Airflow manage mo
 Counters
 --------

-======================================= ================================================================
-Name                                    Description
-======================================= ================================================================
-``<job_name>_start``                    Number of started ``<job_name>`` job, ex. ``SchedulerJob``, ``LocalTaskJob``
-``<job_name>_end``                      Number of ended ``<job_name>`` job, ex. ``SchedulerJob``, ``LocalTaskJob``
-``operator_failures_<operator_name>``   Operator ``<operator_name>`` failures
-``operator_successes_<operator_name>``  Operator ``<operator_name>`` successes
-``ti_failures``                         Overall task instances failures
-``ti_successes``                        Overall task instances successes
-``zombies_killed``                      Zombie tasks killed
-``scheduler_heartbeat``                 Scheduler heartbeats
-``dag_processing.processes``            Number of currently running DAG parsing processes
-``scheduler.tasks.killed_externally``   Number of tasks killed externally
-``scheduler.tasks.running``             Number of tasks running in executor
-``scheduler.tasks.starving``            Number of tasks that cannot be scheduled because of no open slot in pool
-``scheduler.orphaned_tasks.cleared``    Number of Orphaned tasks cleared by the Scheduler
-``scheduler.orphaned_tasks.adopted``    Number of Orphaned tasks adopted by the Scheduler
-``scheduler.critical_section_busy``     Count of times a scheduler process tried to get a lock on the critical
-                                        section (needed to send tasks to the executor) and found it locked by
-                                        another process.
-``sla_email_notification_failure``      Number of failed SLA miss email notification attempts
-``ti.start.<dagid>.<taskid>``           Number of started task in a given dag. Similar to <job_name>_start but for task
-``ti.finish.<dagid>.<taskid>.<state>``  Number of completed task in a given dag. Similar to <job_name>_end but for task
-``dag.callback_exceptions``             Number of exceptions raised from DAG callbacks. When this happens, it means DAG callback is not working.
-``celery.task_timeout_error``           Number of ``AirflowTaskTimeout`` errors raised when publishing Task to Celery Broker.
-======================================= ================================================================
+=========================================== ================================================================
+Name                                        Description
+=========================================== ================================================================
+``<job_name>_start``                        Number of started ``<job_name>`` job, ex. ``SchedulerJob``, ``LocalTaskJob``
+``<job_name>_end``                          Number of ended ``<job_name>`` job, ex. ``SchedulerJob``, ``LocalTaskJob``
+``<job_name>_heartbeat_failure``            Number of failed Heartbeats for a ``<job_name>`` job, ex. ``SchedulerJob``,
+                                            ``LocalTaskJob``
+``operator_failures_<operator_name>``       Operator ``<operator_name>`` failures
+``operator_successes_<operator_name>``      Operator ``<operator_name>`` successes
+``ti_failures``                             Overall task instances failures
+``ti_successes``                            Overall task instances successes
+``previously_succeeded``                    Number of previously succeeded task instances
+``zombies_killed``                          Zombie tasks killed
+``scheduler_heartbeat``                     Scheduler heartbeats
+``dag_processing.processes``                Number of currently running DAG parsing processes
+``dag_processing.manager_stalls``           Number of stalled ``DagFileProcessorManager``
+``dag_file_refresh_error``                  Number of failures loading any DAG files
+``scheduler.tasks.killed_externally``       Number of tasks killed externally
+``scheduler.orphaned_tasks.cleared``        Number of Orphaned tasks cleared by the Scheduler
+``scheduler.orphaned_tasks.adopted``        Number of Orphaned tasks adopted by the Scheduler
+``scheduler.critical_section_busy``         Count of times a scheduler process tried to get a lock on the critical
+                                            section (needed to send tasks to the executor) and found it locked by
+                                            another process.
+``sla_email_notification_failure``          Number of failed SLA miss email notification attempts
+``ti.start.<dagid>.<taskid>``               Number of started task in a given dag. Similar to <job_name>_start but for task
+``ti.finish.<dagid>.<taskid>.<state>``      Number of completed task in a given dag. Similar to <job_name>_end but for task
+``dag.callback_exceptions``                 Number of exceptions raised from DAG callbacks. When this happens, it
+                                            means DAG callback is not working.
+``celery.task_timeout_error``               Number of ``AirflowTaskTimeout`` errors raised when publishing Task to Celery Broker.
+``task_removed_from_dag.<dagid>``           Number of tasks removed for a given dag (i.e. task no longer exists in DAG)
+``task_restored_to_dag.<dagid>``            Number of tasks restored for a given dag (i.e. task instance which was
+                                            previously in REMOVED state in the DB is added to DAG file)
+``task_instance_created-<operator_name>``   Number of tasks instances created for a given Operator
+=========================================== ================================================================

 Gauges
 ------
@@ -116,6 +124,12 @@ Name Description
 ``dag_processing.last_runtime.<dag_file>``          Seconds spent processing ``<dag_file>`` (in most recent iteration)
 ``dag_processing.last_run.seconds_ago.<dag_file>``  Seconds since ``<dag_file>`` was last processed
 ``dag_processing.processor_timeouts``               Number of file processors that have been killed due to taking too long
+``scheduler.tasks.without_dagrun``                  Number of tasks without DagRuns or with DagRuns not in Running state
+``scheduler.tasks.running``                         Number of tasks running in executor
+``scheduler.tasks.starving``                        Number of tasks that cannot be scheduled because of no open slot in pool
+``scheduler.tasks.executable``                      Number of tasks that are ready for execution (set to queued)
+                                                    with respect to pool limits, dag concurrency, executor state,
+                                                    and priority.
 ``executor.open_slots``                             Number of open slots on executor
 ``executor.queued_tasks``                           Number of queued tasks on executor
 ``executor.running_tasks``                          Number of running tasks on executor
@@ -146,4 +160,5 @@ Name Description
 ``scheduler.critical_section_duration``             Milliseconds spent in the critical section of scheduler loop --
                                                     only a single scheduler can enter this loop at a time
 ``dagrun.<dag_id>.first_task_scheduling_delay``     Milliseconds elapsed between first task start_date and dagrun expected start
+``collect_db_dags``                                 Milliseconds taken for fetching all Serialized Dags from DB
 =================================================== ========================================================================
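
For readers unfamiliar with how the templated names in these tables turn into concrete metrics, the following is a minimal sketch, not Airflow's own implementation. It fills the ``<...>`` placeholders of a metric name and renders the result in the StatsD line protocol, where counters, gauges and timers carry the type suffixes ``c``, ``g`` and ``ms``. The helper functions, the DAG and task ids, the sample values, and the ``airflow`` prefix are all assumptions made for illustration; the prefix in a real deployment depends on its StatsD configuration.

```python
import re

# Type suffixes from the StatsD line protocol for the three metric kinds
# documented in metrics.rst (counters, gauges, timers).
STATSD_TYPE_SUFFIX = {"counter": "c", "gauge": "g", "timer": "ms"}


def expand_metric_name(template, **parts):
    """Fill each <placeholder> in a metric-name template (hypothetical helper)."""

    def substitute(match):
        key = match.group(1)
        if key not in parts:
            raise KeyError(f"no value supplied for <{key}>")
        return parts[key]

    return re.sub(r"<(\w+)>", substitute, template)


def format_statsd_line(name, value, kind, prefix="airflow"):
    """Render one metric as <prefix>.<name>:<value>|<type> (prefix is assumed)."""
    return f"{prefix}.{name}:{value}|{STATSD_TYPE_SUFFIX[kind]}"


# Hypothetical DAG id, task id and values, chosen only for the example.
counter_name = expand_metric_name(
    "ti.finish.<dagid>.<taskid>.<state>",
    dagid="example_dag",
    taskid="extract",
    state="success",
)
lines = [
    format_statsd_line(counter_name, 1, "counter"),
    format_statsd_line("executor.open_slots", 32, "gauge"),
    format_statsd_line("collect_db_dags", 12.5, "timer"),
]
print("\n".join(lines))
```

Because every ``<dagid>`` and ``<taskid>`` combination produces its own metric name, counters such as ``ti.start.<dagid>.<taskid>`` can generate a large number of distinct series on installations with many DAGs.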
