[dagster-airlift] Federation tutorial overview and setup
dpeng817 committed Nov 14, 2024
1 parent ece661e commit 4c45c85
Showing 6 changed files with 112 additions and 3 deletions.
7 changes: 6 additions & 1 deletion docs/content/_navigation.json
@@ -945,7 +945,12 @@
{
"title": "Airflow Federation Tutorial",
"path": "/integrations/airlift/federation-tutorial/overview",
"children": []
"children": [
{
"title": "Part 1: Setup upstream and downstream Airflow instances",
"path": "/integrations/airlift/federation-tutorial/setup"
}
]
},
{
"title": "Reference",
30 changes: 29 additions & 1 deletion docs/content/integrations/airlift/federation-tutorial/overview.mdx
@@ -1 +1,29 @@
Will be filled out in a future PR.
# Airflow Federation Tutorial

This tutorial demonstrates using `dagster-airlift` to observe DAGs from multiple Airflow instances, and federate execution between them using Dagster as a centralized control plane.

Using `dagster-airlift`, we can:

- Observe Airflow DAGs and their execution history
- Directly trigger Airflow DAGs from Dagster
- Set up federated execution _across_ Airflow instances

All of this can be done with no changes to Airflow code.

## Overview

This tutorial follows an imaginary data platform team facing the following scenario:

- An Airflow instance `warehouse`, run by another team, that is responsible for loading data into a data warehouse.
- An Airflow instance `metrics`, run by the data platform team, that deploys all the metrics constructed by data scientists on top of the data warehouse.

Two DAGs have been causing the team particular pain: `warehouse.load_customers` and `metrics.customer_metrics`. The `warehouse.load_customers` DAG loads customer data into the data warehouse, and the `metrics.customer_metrics` DAG computes metrics on top of that data. There is a cross-instance dependency between these two DAGs, but it is neither observable nor controllable; ideally, the data platform team would only rebuild `metrics.customer_metrics` when `warehouse.load_customers` has produced new data.

In this guide, we'll use `dagster-airlift` to observe the `warehouse` and `metrics` Airflow instances, and set up federated execution, controlled by Dagster, that triggers the `metrics.customer_metrics` DAG only when the `warehouse.load_customers` DAG has new data. This process won't require any changes to the Airflow code.
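
To make this concrete, here's a rough sketch of what the observation step can look like with the `dagster-airlift` core API. The instance names, URLs, and credentials below are assumptions based on the local setup described in the setup step, and the full federation logic is built up later in the tutorial.

```python
from dagster import Definitions
from dagster_airlift.core import (
    AirflowBasicAuthBackend,
    AirflowInstance,
    load_airflow_dag_asset_specs,
)

# One handle per Airflow instance; URLs and credentials match the local tutorial setup.
warehouse_airflow_instance = AirflowInstance(
    name="warehouse",
    auth_backend=AirflowBasicAuthBackend(
        webserver_url="http://localhost:8081",
        username="admin",
        password="admin",
    ),
)

metrics_airflow_instance = AirflowInstance(
    name="metrics",
    auth_backend=AirflowBasicAuthBackend(
        webserver_url="http://localhost:8082",
        username="admin",
        password="admin",
    ),
)

# Every DAG in each instance becomes an observable asset spec in Dagster.
defs = Definitions(
    assets=[
        *load_airflow_dag_asset_specs(airflow_instance=warehouse_airflow_instance),
        *load_airflow_dag_asset_specs(airflow_instance=metrics_airflow_instance),
    ],
)
```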

## Pages

<ArticleList>
<ArticleListItem
title="Setup"
href="/integrations/airlift/federation-tutorial/setup"
></ArticleListItem>
</ArticleList>
76 changes: 76 additions & 0 deletions docs/content/integrations/airlift/federation-tutorial/setup.mdx
@@ -0,0 +1,76 @@
# Airflow Federation Tutorial: Setup

In this step, we'll:

- Install the example code
- Set up a local environment
- Ensure we can run Airflow locally

## Installation & Project Structure

First, clone the tutorial example repo locally, and enter the repo directory.

```bash
git clone git@github.com:dagster-io/airlift-federation-tutorial.git
cd airlift-federation-tutorial
```

Next, we'll create a fresh virtual environment using `uv`.

```bash
pip install uv
uv venv
source .venv/bin/activate
```

## Running Airflow locally

The tutorial example involves running two local Airflow instances. This can be done by running the following commands from the root of the `airlift-federation-tutorial` directory.

First, install the required Python packages:

```bash
make airflow_install
```
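
Under the hood, per the Makefile included in this commit, this target boils down to roughly the following (run from the repo root):

```bash
pip install uv && \
  uv pip install 'dagster-airlift[tutorial]' && \
  uv pip install -e .   # editable install of the tutorial's example code
```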

Next, scaffold the two Airflow instances we'll be using for this tutorial:

```bash
make airflow_setup
```

Finally, let's run the two Airflow instances with the required environment variables set.

In one shell, run:

```bash
make upstream_airflow_run
```

In a separate shell, run:

```bash
make downstream_airflow_run
```
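
The commit doesn't show the bodies of these targets, but given the two ports used below, they presumably point each instance at its own `AIRFLOW_HOME` and webserver port. Conceptually, each target does something along these lines (paths and values here are purely illustrative assumptions):

```bash
# Purely illustrative -- the real commands are defined in the tutorial Makefile.
# Each instance gets its own AIRFLOW_HOME and webserver port.
AIRFLOW_HOME="$(pwd)/.airflow_upstream" \
AIRFLOW__WEBSERVER__WEB_SERVER_PORT=8081 \
  airflow standalone
```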

This starts two Airflow instances, each with its own web UI. You should now be able to access the upstream Airflow UI at `http://localhost:8081`; the default username and password are both `admin`.

You should be able to see the `load_customers` DAG in the Airflow UI.

<Image
alt="load_customers DAG"
src="/images/integrations/airlift/load_customers.png"
width={1484}
height={300}
/>

Similarly, you should be able to access the downstream Airflow UI at `http://localhost:8082`, with the default username and password set to `admin`.

You should be able to see the `customer_metrics` DAG in the Airflow UI.

<Image
alt="customer_metrics DAG"
src="/images/integrations/airlift/customer_metrics.png"
width={1484}
height={300}
/>
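
If you prefer to verify the instances from the command line, a quick check against the Airflow REST API should list each instance's DAGs. This assumes the stable REST API with basic auth is enabled on both local instances, which is what `dagster-airlift`'s basic-auth backend relies on:

```bash
# Assumes the stable REST API with basic auth is enabled on both instances.
curl -u admin:admin http://localhost:8081/api/v1/dags   # upstream: expect load_customers
curl -u admin:admin http://localhost:8082/api/v1/dags   # downstream: expect customer_metrics
```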
Two binary image files added (not rendered in this view): presumably the `load_customers.png` and `customer_metrics.png` screenshots referenced by the setup page above.
2 changes: 1 addition & 1 deletion examples/airlift-federation-tutorial/Makefile
@@ -26,7 +26,7 @@ help:
### TUTORIAL COMMANDS ###
airflow_install:
pip install uv && \
uv pip install dagster-airlift[in-airflow,dbt,tutorial] && \
uv pip install dagster-airlift[tutorial] && \
uv pip install -e $(MAKEFILE_DIR)

airflow_setup: wipe
