diff --git a/docs/content/_navigation.json b/docs/content/_navigation.json
index 10d19759761bb..f761f044b3e9d 100644
--- a/docs/content/_navigation.json
+++ b/docs/content/_navigation.json
@@ -945,7 +945,12 @@
         {
           "title": "Airflow Federation Tutorial",
           "path": "/integrations/airlift/federation-tutorial/overview",
-          "children": []
+          "children": [
+            {
+              "title": "Part 1: Set up upstream and downstream Airflow instances",
+              "path": "/integrations/airlift/federation-tutorial/setup"
+            }
+          ]
         },
         {
           "title": "Reference",
diff --git a/docs/content/integrations/airlift/federation-tutorial/overview.mdx b/docs/content/integrations/airlift/federation-tutorial/overview.mdx
index 47971eef50a92..4c6080022eea4 100644
--- a/docs/content/integrations/airlift/federation-tutorial/overview.mdx
+++ b/docs/content/integrations/airlift/federation-tutorial/overview.mdx
@@ -1 +1,29 @@
-Will be filled out in a future PR.
+# Airflow Federation Tutorial
+
+This tutorial demonstrates using `dagster-airlift` to observe DAGs from multiple Airflow instances, and to federate execution between them using Dagster as a centralized control plane.
+
+Using `dagster-airlift`, we can:
+
+- Observe Airflow DAGs and their execution history
+- Directly trigger Airflow DAGs from Dagster
+- Set up federated execution _across_ Airflow instances
+
+All of this can be done with no changes to Airflow code.
+
+## Overview
+
+This tutorial follows an imaginary data platform team facing the following scenario:
+
+- An Airflow instance `warehouse`, run by another team, that is responsible for loading data into a data warehouse.
+- An Airflow instance `metrics`, run by the data platform team, that deploys all the metrics constructed by data scientists on top of the data warehouse.
+
+Two DAGs have been causing the team a lot of pain lately: `warehouse.load_customers` and `metrics.customer_metrics`. The `warehouse.load_customers` DAG is responsible for loading customer data into the data warehouse, and the `metrics.customer_metrics` DAG is responsible for computing metrics on top of that customer data. There is a cross-instance dependency between these two DAGs, but it is neither observable nor controllable. Ideally, the data platform team would like to rebuild the `metrics.customer_metrics` DAG _only_ when the `warehouse.load_customers` DAG has new data. In this guide, we'll use `dagster-airlift` to observe the `warehouse` and `metrics` Airflow instances, and set up federated execution, controlled by Dagster, that triggers the `metrics.customer_metrics` DAG only when the `warehouse.load_customers` DAG has new data. This process won't require any changes to the Airflow code.
+
+## Pages
+
+
+
diff --git a/docs/content/integrations/airlift/federation-tutorial/setup.mdx b/docs/content/integrations/airlift/federation-tutorial/setup.mdx
new file mode 100644
index 0000000000000..29bbc87380c78
--- /dev/null
+++ b/docs/content/integrations/airlift/federation-tutorial/setup.mdx
@@ -0,0 +1,76 @@
+# Airflow Federation Tutorial: Setup
+
+In this step, we'll:
+
+- Install the example code
+- Set up a local environment
+- Ensure we can run Airflow locally
+
+## Installation & Project Structure
+
+First, clone the tutorial example repo locally and enter the repo directory:
+
+```bash
+git clone git@github.com:dagster-io/airlift-federation-tutorial.git
+cd airlift-federation-tutorial
+```
+
+Next, we'll create a fresh virtual environment using `uv`.
+
+```bash
+pip install uv
+uv venv
+source .venv/bin/activate
+```
+
+## Running Airflow locally
+
+The tutorial example involves running two local Airflow instances. This can be done by running the following commands from the root of the `airlift-federation-tutorial` directory.
+
+First, install the required Python packages:
+
+```bash
+make airflow_install
+```
+
+Next, scaffold the two Airflow instances we'll be using for this tutorial:
+
+```bash
+make airflow_setup
+```
+
+Finally, let's run the two Airflow instances with environment variables set.
+
+In one shell, run:
+
+```bash
+make upstream_airflow_run
+```
+
+In a separate shell, run:
+
+```bash
+make downstream_airflow_run
+```
+
+This will start two Airflow web UIs, one for each Airflow instance. You should now be able to access the upstream Airflow UI at `http://localhost:8081`, with the default username and password both set to `admin`.
+
+You should be able to see the `load_customers` DAG in the Airflow UI.
+
+
+
+Similarly, you should be able to access the downstream Airflow UI at `http://localhost:8082`, with the default username and password both set to `admin`.
+
+You should be able to see the `customer_metrics` DAG in the Airflow UI.
+
+
diff --git a/docs/next/public/images/integrations/airlift/customer_metrics.png b/docs/next/public/images/integrations/airlift/customer_metrics.png
new file mode 100644
index 0000000000000..359dc28d09959
Binary files /dev/null and b/docs/next/public/images/integrations/airlift/customer_metrics.png differ
diff --git a/docs/next/public/images/integrations/airlift/load_customers.png b/docs/next/public/images/integrations/airlift/load_customers.png
new file mode 100644
index 0000000000000..3b66525bcb24d
Binary files /dev/null and b/docs/next/public/images/integrations/airlift/load_customers.png differ
diff --git a/examples/airlift-federation-tutorial/Makefile b/examples/airlift-federation-tutorial/Makefile
index a75c1001dab7c..a87409fafd207 100644
--- a/examples/airlift-federation-tutorial/Makefile
+++ b/examples/airlift-federation-tutorial/Makefile
@@ -26,7 +26,7 @@ help:
 ### TUTORIAL COMMANDS ###
 airflow_install:
 	pip install uv && \
-	uv pip install dagster-airlift[in-airflow,dbt,tutorial] && \
+	uv pip install dagster-airlift[tutorial] && \
 	uv pip install -e $(MAKEFILE_DIR)
 
 airflow_setup: wipe
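
The setup page above only gets the two local Airflow instances running; the federation workflow itself is covered in later parts of the tutorial. As a rough preview of where this is headed, here is a minimal, hypothetical sketch (not part of the diff above or the tutorial repo) of pointing `dagster-airlift` at the two instances so Dagster can observe them. It assumes the `warehouse`/`metrics` instance names from the overview, the local ports and `admin` credentials from the setup page, and the `dagster_airlift.core` APIs shown (`AirflowInstance`, `AirflowBasicAuthBackend`, `build_defs_from_airflow_instance`).

```python
# Hypothetical sketch, not the tutorial's actual code: observe both local
# Airflow instances from Dagster. Instance names, ports, and credentials are
# assumptions taken from the tutorial text above.
from dagster import Definitions
from dagster_airlift.core import (
    AirflowBasicAuthBackend,
    AirflowInstance,
    build_defs_from_airflow_instance,
)

upstream_airflow = AirflowInstance(
    name="warehouse",
    auth_backend=AirflowBasicAuthBackend(
        webserver_url="http://localhost:8081",
        username="admin",
        password="admin",
    ),
)

downstream_airflow = AirflowInstance(
    name="metrics",
    auth_backend=AirflowBasicAuthBackend(
        webserver_url="http://localhost:8082",
        username="admin",
        password="admin",
    ),
)

# Each call represents that instance's DAGs as assets in Dagster and adds a
# sensor that polls the instance for completed runs.
defs = Definitions.merge(
    build_defs_from_airflow_instance(airflow_instance=upstream_airflow),
    build_defs_from_airflow_instance(airflow_instance=downstream_airflow),
)
```

The actual tutorial code may structure this differently (for example, using more granular observation APIs); the sketch is only meant to show the general shape of connecting Dagster to each instance's webserver.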