Stripping Apache Airflow down to its core: The Native Trinity

Moe Bayat · June 5, 2026 · 5 min read

When you look at Apache Airflow's Docker Compose stack, it looks like a massive wall of config: six separate containers handling the scheduler, web UI, workers, triggerer, and databases.

But what happens when you strip away Docker entirely?
Looking past container boundaries to see how Airflow manages its lifecycle natively reveals that underneath the abstraction, the engine boils down to an elegant architectural trinity: The Memory, The Brain, and The Muscle.

Here is how these three native pillars divide and conquer your workflow processing:

1. The Memory: The Metadata Database
Natively, this is just a local SQLite file (airflow.db) or a local Postgres service. Airflow's own processes are stateless: they keep no long-term memory of their own. This database is the single source of truth, storing task statuses, DAG structures, and execution history. Without it, the system has instant amnesia.
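To make the "amnesia" point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table layout and the helper names (open_metadata_db, set_state, get_state) are assumptions made for illustration; Airflow's real schema is far richer. The only point is that state survives because it lives in the file, not in the process.

```python
import os
import sqlite3
import tempfile


def open_metadata_db(path):
    """Open (or create) a toy metadata DB. Hypothetical schema, not Airflow's."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS task_instance ("
        " dag_id TEXT, task_id TEXT, state TEXT,"
        " PRIMARY KEY (dag_id, task_id))"
    )
    return conn


def set_state(conn, dag_id, task_id, state):
    """Record the latest state for one task, overwriting any previous row."""
    conn.execute(
        "INSERT OR REPLACE INTO task_instance VALUES (?, ?, ?)",
        (dag_id, task_id, state),
    )
    conn.commit()


def get_state(conn, dag_id, task_id):
    """Return the stored state, or None if the DB has never heard of the task."""
    row = conn.execute(
        "SELECT state FROM task_instance WHERE dag_id = ? AND task_id = ?",
        (dag_id, task_id),
    ).fetchone()
    return row[0] if row else None


with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, "toy_airflow.db")

    # "Process 1" records a result, then goes away.
    conn = open_metadata_db(db_path)
    set_state(conn, "daily_sales", "extract", "success")
    conn.close()

    # "Process 2" starts with empty memory and recovers everything from the file.
    conn = open_metadata_db(db_path)
    recovered = get_state(conn, "daily_sales", "extract")
    unknown = get_state(conn, "daily_sales", "load")
    conn.close()

print(recovered, unknown)
```

The second connection knows nothing except what the file tells it, which is the sense in which the database is the system's only memory.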

2. The Brain: The Scheduler
Running airflow scheduler starts a long-lived native Python process. It sits in a loop scanning your ~/airflow/dags folder, checking the clock, and writing an execution "ticket" to the DB when a task is ready. Crucial insight: the scheduler never does the heavy data lifting itself. It just manages the timeline.
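The loop described above can be sketched as a toy. This is not Airflow's scheduler code: the ticket table, the scheduler_tick function, and the schedule dictionary are all hypothetical names, and the clock is passed in as an argument so the behavior is deterministic.

```python
import sqlite3
from datetime import datetime, timedelta


def make_db():
    """In-memory stand-in for the metadata DB (hypothetical one-table schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ticket (task_id TEXT PRIMARY KEY, state TEXT)")
    return conn


def scheduler_tick(conn, schedule, now):
    """One pass of the loop: queue every due task that has no ticket yet."""
    queued = []
    for task_id, due_at in schedule.items():
        has_ticket = conn.execute(
            "SELECT 1 FROM ticket WHERE task_id = ?", (task_id,)
        ).fetchone()
        if due_at <= now and not has_ticket:
            # The scheduler only writes a row; it does not run the task.
            conn.execute("INSERT INTO ticket VALUES (?, 'queued')", (task_id,))
            queued.append(task_id)
    conn.commit()
    return queued


start = datetime(2026, 6, 5, 9, 0)
schedule = {"extract": start, "load": start + timedelta(minutes=30)}

conn = make_db()
first = scheduler_tick(conn, schedule, now=start)
second = scheduler_tick(conn, schedule, now=start + timedelta(minutes=30))
third = scheduler_tick(conn, schedule, now=start + timedelta(hours=1))

print(first, second, third)
```

Notice that scheduler_tick never touches task code. Its whole job is comparing the clock to the schedule and writing rows.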

3. The Muscle: The Executor / Worker
Natively on a laptop, there isn't a permanent worker container running. Instead, with a local executor, the executor runs inside the scheduler process, picks up each ticket, and acts as an OS-level supervisor: it launches a brand-new, temporary child process right on your operating system to execute that specific task's Python code.
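Here is a rough sketch of that supervisor pattern, using subprocess to start a fresh interpreter per task rather than Airflow's own executor machinery. The run_task_in_child helper and the task snippets are assumptions for illustration only.

```python
import subprocess
import sys


def run_task_in_child(task_source):
    """Run one task in a fresh, temporary Python process and report the outcome."""
    proc = subprocess.run(
        [sys.executable, "-c", task_source],
        capture_output=True,
        text=True,
    )
    # The supervisor only looks at the exit code; the work happened elsewhere.
    state = "success" if proc.returncode == 0 else "failed"
    return {
        "state": state,
        "returncode": proc.returncode,
        "stdout": proc.stdout.strip(),
    }


ok = run_task_in_child("print(sum(range(10)))")
bad = run_task_in_child("raise SystemExit(3)")

print(ok)
print(bad)
```

The parent stays small and only records a state. Whatever memory the task consumed is returned to the OS the moment the child exits.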

The Local RAM Realization
When running natively, your tasks run directly on your host machine's Python environment. They share the exact same uninsulated pool of hardware memory as your IDE, web browser, and OS background apps.
If you write a task that pulls a massive dataset into memory all at once (like loading a 50GB CSV into a Pandas dataframe), that single temporary child process will drive your system RAM straight up until the OS forcefully steps in and kills it (the dreaded out-of-memory kill, typically reported as exit code 137, which is 128 plus signal 9, SIGKILL).
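Where the number 137 comes from can be shown without actually exhausting memory. In this sketch, which assumes a POSIX system (Linux or macOS), the child sends itself SIGKILL as a stand-in for the kernel's OOM killer. Nothing here is Airflow-specific.

```python
import signal
import subprocess
import sys

# Child that kills itself with SIGKILL, imitating what an OOM kill looks like
# from the parent's point of view (POSIX only).
CHILD = "import os, signal; os.kill(os.getpid(), signal.SIGKILL)"

proc = subprocess.run([sys.executable, "-c", CHILD])

# On POSIX, subprocess reports death-by-signal as the negative signal number.
killed_by = -proc.returncode

# Shells and container runtimes report the same event as 128 + signal number.
shell_style_code = 128 + killed_by

print(killed_by, shell_style_code)
```

The child never gets a chance to clean up or log anything, which is why an OOM-killed task usually shows up as a bare failure with no traceback.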

The Takeaway
Whether you run a 6-service Docker stack or a native setup, the core rule of Airflow remains identical: Keep your orchestrator lean and stateless.
Instead of forcing local worker processes to act like heavy data cargo trucks, use them as lightweight dispatchers that fire off SDK commands to external distributed data warehouses (like BigQuery or Snowflake). Let the cloud handle the heavy compute state, keep your local system RAM flat, and your workflow engine stays bulletproof.
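The difference between a "cargo truck" task and a "dispatcher" task can be illustrated with a stand-in warehouse. This sketch uses an in-memory SQLite database purely as a placeholder for BigQuery or Snowflake; the sales table and the function names are hypothetical, and a real task would go through the warehouse's own client library.

```python
import sqlite3


def make_fake_warehouse(n_rows=10_000):
    """Stand-in for an external warehouse (assumption: SQLite used as a placeholder)."""
    wh = sqlite3.connect(":memory:")
    wh.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    wh.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        ((("north", "south")[i % 2], i % 10) for i in range(n_rows)),
    )
    wh.commit()
    return wh


def heavy_task(wh):
    """Anti-pattern: pull every row into the worker, then aggregate in Python."""
    rows = wh.execute("SELECT region, amount FROM sales").fetchall()
    totals = {}
    for region, amount in rows:
        totals[region] = totals.get(region, 0) + amount
    return totals, len(rows)


def lean_task(wh):
    """Dispatcher: send the aggregation to the engine, receive only the summary."""
    rows = wh.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region"
    ).fetchall()
    return dict(rows), len(rows)


wh = make_fake_warehouse()
heavy_totals, heavy_rows = heavy_task(wh)
lean_totals, lean_rows = lean_task(wh)

print(heavy_rows, lean_rows, heavy_totals == lean_totals)
```

Both tasks produce the same totals, but the lean one only ever holds one row per region in the worker's memory, no matter how large the table grows.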
