The Data Foundry

Built by Data with Pranjal


Airflow Incident Lab

Operate Airflow, not just DAG syntax.

Inspect DAG code, scheduler evidence, and task logs. Diagnose the failure, write the operational fix, and compare your answer with an interview-ready response.


Beginner · Scheduler · 18 min · Free

The Three-Hour Scheduling Delay

Business context

The finance reporting DAG must begin at 02:00 so the warehouse is ready before business users arrive.

Production problem

The scheduler is healthy, but the DAG consistently starts between 04:30 and 05:00. The team keeps adding workers without proving where the delay occurs.
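Before adding workers, it helps to prove which phase the time is spent in. The sketch below splits the gap between the 02:00 due time and the first task start into consecutive phases. All timestamps are hypothetical and were invented for illustration. The field names mirror Airflow's `DagRun.start_date`, `TaskInstance.queued_dttm` and `TaskInstance.start_date`, but fetching them (UI, REST API or metadata DB) is left out. In Airflow 2.x the scheduler generally checks pool slots before it moves a task to queued. A saturated pool therefore tends to appear as a long gap before `queued_dttm`, while a worker shortage appears after it.

```python
from datetime import datetime, timedelta

# Hypothetical timestamps for one late run, invented to illustrate the arithmetic.
# In a real incident, read them from the Airflow UI or metadata DB.
RUN = {
    "due": datetime(2024, 1, 15, 2, 0),                  # when the 02:00 cron is due
    "run_started": datetime(2024, 1, 15, 2, 4),          # DagRun.start_date
    "first_task_queued": datetime(2024, 1, 15, 4, 38),   # TaskInstance.queued_dttm
    "first_task_started": datetime(2024, 1, 15, 4, 40),  # TaskInstance.start_date
}


def delay_breakdown(r: dict[str, datetime]) -> dict[str, timedelta]:
    """Split the end-to-end start delay into consecutive, non-overlapping phases."""
    return {
        # Parse delay, or an earlier run still active under max_active_runs=1.
        "due -> run started": r["run_started"] - r["due"],
        # Dependency waits and pool-slot waits typically land here.
        "run started -> first task queued": r["first_task_queued"] - r["run_started"],
        # A shortage of executor/worker capacity lands here.
        "queued -> started": r["first_task_started"] - r["first_task_queued"],
    }


def dominant_phase(r: dict[str, datetime]) -> str:
    """Name of the phase that accounts for the most delay."""
    phases = delay_breakdown(r)
    return max(phases, key=lambda name: phases[name])


if __name__ == "__main__":
    for name, gap in delay_breakdown(RUN).items():
        print(f"{name:35s} {gap}")
    print("look first at:", dominant_phase(RUN))
```

With these invented numbers, almost all of the delay falls between run start and first queueing. Adding workers would not shorten that phase.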

System map

The Three-Hour Scheduling Delay production path

Follow a scheduled task from DAG parsing to the downstream system.

1. DAG definition: declares when runs are due and how many may be active at once (schedule, max_active_runs). The scheduler uses these settings to create runs and queue tasks.

Scheduler evidence

[scheduler]
dag_processing.total_parse_time = 142s
queued_tasks = 318
pool.etl_pool.used_slots = 40/40

DAG config:
schedule="0 2 * * *"
max_active_runs=1
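The evidence above can be turned into a rough triage. The sketch hard-codes the numbers from the scheduler output. The 60-second parse threshold, the worker slot counts and the 20-minute average task runtime are hypothetical values that do not appear in the evidence. The key modeling choice is that parallelism is capped by the smaller of pool slots and worker slots. A pool at 40/40 therefore bounds throughput no matter how many workers are added.

```python
from math import ceil

# Evidence from the scheduler output above.
TOTAL_PARSE_TIME_S = 142
QUEUED_TASKS = 318
POOL_USED, POOL_SLOTS = (int(part) for part in "40/40".split("/"))

# Hypothetical threshold above which parsing is worth tuning (not an Airflow default).
SLOW_PARSE_S = 60


def triage(parse_s: float, queued: int, used: int, slots: int) -> list[str]:
    """Return the hypotheses the evidence supports, strongest first (a heuristic)."""
    findings = []
    if used >= slots and queued > 0:
        findings.append("capacity backlog: pool saturated while tasks wait")
    if parse_s > SLOW_PARSE_S:
        # A slow parse loop plausibly costs minutes per cycle, not hours.
        findings.append("parse delay: worth tuning, but likely minutes, not hours")
    return findings


def backlog_drain_minutes(queued: int, pool_slots: int, worker_slots: int,
                          avg_task_minutes: float) -> float:
    """Rough time to clear the queue, assuming equal-length waves of tasks.

    Parallelism is capped by whichever is smaller: pool slots or worker slots.
    """
    parallelism = min(pool_slots, worker_slots)
    return ceil(queued / parallelism) * avg_task_minutes


if __name__ == "__main__":
    print(triage(TOTAL_PARSE_TIME_S, QUEUED_TASKS, POOL_USED, POOL_SLOTS))
    # Hypothetical 20-minute average task runtime; compare two worker counts.
    for workers in (64, 128):
        print(workers, backlog_drain_minutes(QUEUED_TASKS, POOL_SLOTS, workers, 20))
```

Under the assumed 20-minute runtime, 318 queued tasks need 8 waves of 40, about 160 minutes, with either 64 or 128 worker slots. Only a larger pool, or a smaller queue ahead of the finance tasks, changes that figure. A larger pool is safe only if the systems the pool protects can absorb the extra load.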

Your task

Diagnose first, then write the production response.

Identify whether this is parse delay, capacity backlog, dependency waiting, or task runtime. Propose a safe investigation and fix.
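One fix that avoids adding workers is to let the finance tasks win the contended etl_pool slots. You can do this by raising their `priority_weight` or by moving them to a dedicated pool. The sketch below is a deliberately simplified model, not Airflow's exact scheduling algorithm. It assumes free slots are filled strictly by descending priority weight, that ties keep queue order, and that every task takes the same time. Real scheduling also depends on `weight_rule`, run ordering and concurrency limits. The task IDs `finance_load` and `other_N` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueuedTask:
    task_id: str
    priority_weight: int


def start_wave(tasks: list[QueuedTask], slots: int, target: str) -> int:
    """0-based wave in which `target` starts under the simplified model.

    Each wave runs `slots` equal-length tasks, picked by descending
    priority_weight, with ties broken by position in the queue.
    """
    order = sorted(range(len(tasks)), key=lambda i: (-tasks[i].priority_weight, i))
    for position, i in enumerate(order):
        if tasks[i].task_id == target:
            return position // slots
    raise ValueError(f"{target!r} is not in the queue")


if __name__ == "__main__":
    # Hypothetical backlog: 317 other tasks queued ahead of the finance task.
    backlog = [QueuedTask(f"other_{n}", priority_weight=1) for n in range(317)]
    for weight in (1, 10):
        tasks = backlog + [QueuedTask("finance_load", priority_weight=weight)]
        print(weight, start_wave(tasks, slots=40, target="finance_load"))
```

In this model, raising the weight moves the finance task from the eighth wave (index 7) to the first. The catch is that reordering only shifts the wait onto the other DAGs sharing etl_pool. Check their deadlines before rolling it out, and prefer a dedicated pool when those DAGs cannot absorb the delay.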

Most likely diagnosis or next action
