This repo contains the setup needed to run Airflow in Docker, mount your local dags repo into the containers that need it, and let you develop dags quickly.
This repo expects:
- to be a git submodule living in the root of your dags repo.
- dag files to live in a directory at the top level of that repo.
- a `local.env` file containing `DAG_DIR=some_name`, where `some_name` is the name of the directory that holds your dag files.
In the root of your dags repository, run `git submodule add git@github.com:tulibraries/airflow-docker-dev-setup docker`, then commit the changes to `.gitmodules` and the new `docker` directory that has just been created.
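For example, the commit step might look like this (the commit message is just a placeholder):

```sh
# Stage the submodule metadata and the new docker/ directory, then commit.
git add .gitmodules docker
git commit -m "Add airflow-docker-dev-setup as a submodule in docker/"
```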
Next, in the root of your dags repository, create a file called `local.env`. Add the line `DAG_DIR=my_dag_dir`, replacing `my_dag_dir` with the actual name of the directory in the root of the repo that contains your dag files. If your dags use any Python import statements, this directory name needs to match the top-level package in those imports.
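For illustration, a hypothetical dags repo using `DAG_DIR=my_dag_dir` might be laid out like this (all directory and file names below are placeholders):

```sh
# my-dags-repo/
# ├── docker/            <- this repo, added as a git submodule
# ├── local.env          <- contains DAG_DIR=my_dag_dir
# └── my_dag_dir/        <- your dag files; imports use this as the top-level
#     ├── __init__.py       package, e.g. "from my_dag_dir.helpers import ..."
#     ├── helpers.py
#     └── example_dag.py
cat local.env
# DAG_DIR=my_dag_dir
```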
Finally, ensure that you have set the following environment variables: `TUPSFTP_PASSWORD`, `WORKER_SSH_KEY_PATH`, `TUP_ACCOUNT_NAME`, `TUP_SSH_KEY_PATH`, and `TUP_SFTP_ACCOUNT_NAME`. For example:
export TUPSFTP_PASSWORD="REINDEER FLOTILLA"
export WORKER_SSH_KEY_PATH="/home/flynn/.ssh/id_rsa"
export TUP_ACCOUNT_NAME="flynn_the_deployer"
export TUP_SSH_KEY_PATH="/usr/local/airflow/.ssh/flynn_the_deployer"
export TUP_SFTP_ACCOUNT_NAME="flynnsplace"
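Before running `make up`, you can quickly confirm the variables are exported in your current shell (just a convenience check, not part of the setup itself):

```sh
# List only the required variables that are currently set.
env | grep -E '^(TUPSFTP_PASSWORD|WORKER_SSH_KEY_PATH|TUP_ACCOUNT_NAME|TUP_SSH_KEY_PATH|TUP_SFTP_ACCOUNT_NAME)='
```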
To use the docker setup, `cd` into the `docker` directory. This contains the `docker-compose.yml` and some other docker configurations, a `docker-requirement.txt` for PyPI packages you want installed on the containers, and a Makefile defining some useful commands.
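If you need an extra PyPI package inside the containers, one approach (a sketch; the package name is a placeholder, and whether the stack must be fully recreated depends on how the images are built) is:

```sh
# Append a pinned package to the requirements file used by the containers,
# then recreate the stack so it gets installed.
echo "some-package==1.2.3" >> docker-requirement.txt
make down && make up
```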
$ make up
This spins up an Airflow stack using Postgres for the metadata database; Celery, Redis, and Flower for job management; CeleryExecutor, Scheduler, Webserver, and Worker Airflow services; and mounts the local dags directory as the Airflow stack's DAGs directory. That DAGs directory has cob_datapipeline and manifold_airflow_dags cloned into it if those subdirectories do not already exist. This will also create some known Variables and Connections, based on the `variables.json` file found in the dag repository (the task copies this into `data/local-dev-variables.json` if that file doesn't exist, then loads variables into Airflow from there).
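As an optional sanity check (a sketch using the standard Airflow CLI), you can open a shell in the webserver container and list the Variables that were loaded:

```sh
# Open a shell in the webserver container (see the make targets below),
# then list the Variables Airflow knows about.
make tty-webserver
airflow variables list
```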
Airflow 2 requires users to log in, so an initial user is created as part of this workflow. The username is `test-user` and the password is `password`.
Give this up to a minute to start up. You can check the Airflow webserver's health-check state by running:
$ docker-compose -p infra ps
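You can also hit the webserver's health endpoint directly, assuming the webserver is published on localhost:8080 (the port mapping is an assumption; adjust it to match `docker-compose.yml`):

```sh
# Returns JSON with the metadatabase and scheduler health status.
curl -s http://localhost:8080/health
```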
If you change something in the docker setup, e.g. an Airflow worker build step, you may want to restart the docker instances (this restarts them; it does not destroy and rebuild):
$ make reload
$ make stop
This will stop, but not delete, the Airflow docker stack, making it easy to restart if you want to continue using these instances.
$ make down

This will stop and remove the Airflow docker stack.
Run shell in Airflow Worker instance:
$ make tty-worker
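Once inside the worker shell, a typical use is to run a single task with the Airflow CLI (the dag and task ids below are placeholders):

```sh
# Run one task instance without the scheduler, for quick debugging.
airflow tasks test <dag_id> <task_id> 2024-01-01
```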
Run shell in Airflow Webserver instance:
$ make tty-webserver
Run shell in Airflow Scheduler instance:
$ make tty-scheduler
Run shell as root in Airflow Worker instance:
$ make tty-root-worker
Run shell as root in Airflow Webserver instance:
$ make tty-root-webserver
Run shell as root in Airflow Scheduler instance:
$ make tty-root-scheduler