Skip to content

Commit 40b52ba

Browse files
Julian de RuiterJulian de RuiterBasPH
authored
Update for Airflow 2.0 (#34)
* Example for Airflow 2. * Update chapter 3 for Airflow 2. * Update chapter 5. * Updated chapter 8 for Airflow 2. * Fix black. * Fix docker + k8s examples for alpha 2. * Update chapters 11, 16 and 17. * Update all Docker images to Airflow 2 beta * Rename user to anonymous admin. * Rename several dags. * Update to beta 3. * Update providers for newest beta. * Remove backport ref. * Example for Airflow 2. * Update chapter 3 for Airflow 2. * Update chapter 5. * Updated chapter 8 for Airflow 2. * Fix black. * Fix docker + k8s examples for alpha 2. * Update chapters 11, 16 and 17. * Update all Docker images to Airflow 2 beta * Rename user to anonymous admin. * Rename several dags. * Update to beta 3. * Update providers for newest beta. * Remove backport ref. * Update chapter 6 to Airflow 2 * Update all chapter 6 code to Airflow 2 * Update all chapter 2 DAGs to Airflow 2 * Update CH4 for Airflow 2 * Update CH12 for Airflow 2 * Updated LDAP example to Airflow 2 * Update CH13 RBAC example to Airflow 2 * Update CH13 secretsbackend example for Airflow 2 * Update CH14 for Airflow 2 * Make links in CH14 readme * Update for Airflow 2 * Update CH2 for Airflow 2 * Update for Airflow 2 * Add back DAG with name download_rocket_launches to follow book examples * Add task groups. * Update chapter 3. * Add task groups. * Small updates for chapter 5. * Small updates to chapter 8. * Small updates for ch10. * Remove provide_context. * Update docker image version. * Update bash imports. * Rename remaining old chapters to new numbers. * Move chapters to top-level. * Merge chapter 10 folders. * Fix dummy references. * Fix readme refs. * Update CH6 for Airflow 2 * Update CH4 for Airflow 2 * Update CH2 for Airflow 2 * CH12 Airflow 2 updates * CH13 updates for Airflow 2 * CH12 update for Airfow 2.0 * Fix CH18 docker-compose file * Update CH2 using new Launch Library API v2 * Restructure ch10, update providers. * Align CH6 DAG names * Fix ch10 container names. * CH7 Update for Airflow 2, fixing bugs along the way * Update CH9 * Update CH13 for Airflow 2 * DAG serialization settings are obsolete in Airflow 2 * Update CH18 for Airflow 2 * Fix flake8 issues. Co-authored-by: Julian de Ruiter <[email protected]> Co-authored-by: Bas Harenslak <[email protected]>
1 parent d2073a7 commit 40b52ba

File tree

274 files changed

+2997
-1292
lines changed

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

274 files changed

+2997
-1292
lines changed

Diff for: .flake8

+1-1
Original file line numberDiff line numberDiff line change
@@ -2,4 +2,4 @@
22
max-line-length = 88
33
ignore = E203,E501
44
per-file-ignores =
5-
chapters/chapter18/dags/gcp.py:W503
5+
chapter18/dags/gcp.py:W503

Diff for: Makefile

-54
This file was deleted.

Diff for: README.md

+10-11
Original file line numberDiff line numberDiff line change
@@ -7,20 +7,19 @@ Code accompanying the Manning book [Data Pipelines with Apache Airflow](https://
77
Overall, this repository is structured as follows:
88

99
```
10-
├── CHANGELOG.md # Changelog detailing updates to the code.
11-
├── LICENSE
12-
├── Makefile # Helper commands.
13-
├── README.md # This readme.
14-
├── chapters # Code examples for each of the Chapters.
15-
├── docker # Supporting Docker image (containing Airflow).
16-
└── requirements.txt
10+
├── chapter01 # Code examples for Chapter 1.
11+
├── chapter02 # Code examples for Chapter 2.
12+
├── ...
13+
├── .pre-commit-config.yaml # Pre-commit config for the CI.
14+
├── CHANGELOG.md # Changelog detailing updates to the code.
15+
├── LICENSE # Code license.
16+
├── README.md # This readme.
17+
└── requirements.txt # CI dependencies.
1718
```
1819

19-
The most interesting parts are probably the *docker* directory and the *chapter* directories under *chapters*.
20+
The *chapterXX* directories contain the code examples for each specific Chapter.
2021

21-
The *docker* directory contains a custom Airflow image that will be used through out the book.
22-
23-
The *chapter* directories contain the code examples for each specific Chapter. Code for each Chapter is generally structured something like follows:
22+
Code for each Chapter is generally structured something like follows:
2423

2524
```
2625
├── dags # Airflow DAG examples (+ other code).
File renamed without changes.

Diff for: chapters/chapter01/dags/01_umbrella_predictions.py renamed to chapter01/dags/01_umbrella.py

+2-2
Original file line numberDiff line numberDiff line change
@@ -2,10 +2,10 @@
22

33
import airflow.utils.dates
44
from airflow import DAG
5-
from airflow.operators.dummy_operator import DummyOperator
5+
from airflow.operators.dummy import DummyOperator
66

77
dag = DAG(
8-
dag_id="01_umbrella_predictions",
8+
dag_id="01_umbrella",
99
description="Umbrella example with DummyOperators.",
1010
start_date=airflow.utils.dates.days_ago(5),
1111
schedule_interval="@daily",

Diff for: chapters/chapter11/docker-compose.yml renamed to chapter01/docker-compose.yml

+3-2
Original file line numberDiff line numberDiff line change
@@ -9,7 +9,7 @@ x-environment: &airflow_environment
99
- AIRFLOW__CORE__STORE_SERIALIZED_DAGS=True
1010
- AIRFLOW__WEBSERVER__EXPOSE_CONFIG=True
1111
- AIRFLOW__WEBSERVER__RBAC=False
12-
x-airflow-image: &airflow_image apache/airflow:1.10.12-python3.8
12+
x-airflow-image: &airflow_image apache/airflow:2.0.0-python3.8
1313
# ====================================== /AIRFLOW ENVIRONMENT VARIABLES ======================================
1414
services:
1515
postgres:
@@ -25,7 +25,8 @@ services:
2525
depends_on:
2626
- postgres
2727
environment: *airflow_environment
28-
command: upgradedb
28+
entrypoint: /bin/bash
29+
command: -c 'airflow db upgrade && sleep 5 && airflow users create --username admin --password admin --firstname Anonymous --lastname Admin --role Admin --email [email protected]'
2930
webserver:
3031
image: *airflow_image
3132
restart: always
File renamed without changes.

Diff for: chapter02/dags/download_rocket_launches.py

+57
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,57 @@
1+
import json
2+
import pathlib
3+
4+
import airflow.utils.dates
5+
import requests
6+
import requests.exceptions as requests_exceptions
7+
from airflow import DAG
8+
from airflow.operators.bash import BashOperator
9+
from airflow.operators.python import PythonOperator
10+
11+
dag = DAG(
12+
dag_id="download_rocket_launches",
13+
description="Download rocket pictures of recently launched rockets.",
14+
start_date=airflow.utils.dates.days_ago(14),
15+
schedule_interval="@daily",
16+
)
17+
18+
download_launches = BashOperator(
19+
task_id="download_launches",
20+
bash_command="curl -o /tmp/launches.json -L 'https://ll.thespacedevs.com/2.0.0/launch/upcoming'", # noqa: E501
21+
dag=dag,
22+
)
23+
24+
25+
def _get_pictures():
26+
# Ensure directory exists
27+
pathlib.Path("/tmp/images").mkdir(parents=True, exist_ok=True)
28+
29+
# Download all pictures in launches.json
30+
with open("/tmp/launches.json") as f:
31+
launches = json.load(f)
32+
image_urls = [launch["image"] for launch in launches["results"]]
33+
for image_url in image_urls:
34+
try:
35+
response = requests.get(image_url)
36+
image_filename = image_url.split("/")[-1]
37+
target_file = f"/tmp/images/{image_filename}"
38+
with open(target_file, "wb") as f:
39+
f.write(response.content)
40+
print(f"Downloaded {image_url} to {target_file}")
41+
except requests_exceptions.MissingSchema:
42+
print(f"{image_url} appears to be an invalid URL.")
43+
except requests_exceptions.ConnectionError:
44+
print(f"Could not connect to {image_url}.")
45+
46+
47+
get_pictures = PythonOperator(
48+
task_id="get_pictures", python_callable=_get_pictures, dag=dag
49+
)
50+
51+
notify = BashOperator(
52+
task_id="notify",
53+
bash_command='echo "There are now $(ls /tmp/images/ | wc -l) images."',
54+
dag=dag,
55+
)
56+
57+
download_launches >> get_pictures >> notify

Diff for: chapters/chapter02/dags/listing_2_10.py renamed to chapter02/dags/listing_2_10.py

+4-4
Original file line numberDiff line numberDiff line change
@@ -5,8 +5,8 @@
55
import requests
66
import requests.exceptions as requests_exceptions
77
from airflow import DAG
8-
from airflow.operators.bash_operator import BashOperator
9-
from airflow.operators.python_operator import PythonOperator
8+
from airflow.operators.bash import BashOperator
9+
from airflow.operators.python import PythonOperator
1010

1111
dag = DAG(
1212
dag_id="listing_2_10",
@@ -16,7 +16,7 @@
1616

1717
download_launches = BashOperator(
1818
task_id="download_launches",
19-
bash_command="curl -o /tmp/launches.json 'https://launchlibrary.net/1.4/launch?next=5&mode=verbose'", # noqa: E501
19+
bash_command="curl -o /tmp/launches.json -L 'https://ll.thespacedevs.com/2.0.0/launch/upcoming'", # noqa: E501
2020
dag=dag,
2121
)
2222

@@ -28,7 +28,7 @@ def _get_pictures():
2828
# Download all pictures in launches.json
2929
with open("/tmp/launches.json") as f:
3030
launches = json.load(f)
31-
image_urls = [launch["rocket"]["imageURL"] for launch in launches["launches"]]
31+
image_urls = [launch["image"] for launch in launches["results"]]
3232
for image_url in image_urls:
3333
try:
3434
response = requests.get(image_url)

Diff for: chapters/chapter02/dags/listing_2_2.py renamed to chapter02/dags/listing_2_2.py

+5-5
Original file line numberDiff line numberDiff line change
@@ -5,18 +5,18 @@
55
import requests
66
import requests.exceptions as requests_exceptions
77
from airflow import DAG
8-
from airflow.operators.bash_operator import BashOperator
9-
from airflow.operators.python_operator import PythonOperator
8+
from airflow.operators.bash import BashOperator
9+
from airflow.operators.python import PythonOperator
1010

1111
dag = DAG(
12-
dag_id="listing_2_2",
12+
dag_id="listing_2_02",
1313
start_date=airflow.utils.dates.days_ago(14),
1414
schedule_interval=None,
1515
)
1616

1717
download_launches = BashOperator(
1818
task_id="download_launches",
19-
bash_command="curl -o /tmp/launches.json 'https://launchlibrary.net/1.4/launch?next=5&mode=verbose'", # noqa: E501
19+
bash_command="curl -o /tmp/launches.json -L 'https://ll.thespacedevs.com/2.0.0/launch/upcoming'", # noqa: E501
2020
dag=dag,
2121
)
2222

@@ -28,7 +28,7 @@ def _get_pictures():
2828
# Download all pictures in launches.json
2929
with open("/tmp/launches.json") as f:
3030
launches = json.load(f)
31-
image_urls = [launch["rocket"]["imageURL"] for launch in launches["launches"]]
31+
image_urls = [launch["image"] for launch in launches["results"]]
3232
for image_url in image_urls:
3333
try:
3434
response = requests.get(image_url)

Diff for: chapters/chapter02/dags/listing_2_3.py renamed to chapter02/dags/listing_2_3.py

+1-1
Original file line numberDiff line numberDiff line change
@@ -2,7 +2,7 @@
22
from airflow import DAG
33

44
dag = DAG(
5-
dag_id="listing_2_3",
5+
dag_id="listing_2_03",
66
start_date=airflow.utils.dates.days_ago(14),
77
schedule_interval=None,
88
)
Original file line numberDiff line numberDiff line change
@@ -1,15 +1,15 @@
11
import airflow
22
from airflow import DAG
3-
from airflow.operators.bash_operator import BashOperator
3+
from airflow.operators.bash import BashOperator
44

55
dag = DAG(
6-
dag_id="listing_2_4",
6+
dag_id="listing_2_04",
77
start_date=airflow.utils.dates.days_ago(14),
88
schedule_interval=None,
99
)
1010

1111
download_launches = BashOperator(
1212
task_id="download_launches",
13-
bash_command="curl -o /tmp/launches.json 'https://launchlibrary.net/1.4/launch?next=5&mode=verbose'", # noqa: E501
13+
bash_command="curl -o /tmp/launches.json -L 'https://ll.thespacedevs.com/2.0.0/launch/upcoming'", # noqa: E501
1414
dag=dag,
1515
)

Diff for: chapters/chapter02/dags/listing_2_6.py renamed to chapter02/dags/listing_2_6.py

+5-5
Original file line numberDiff line numberDiff line change
@@ -5,18 +5,18 @@
55
import requests
66
import requests.exceptions as requests_exceptions
77
from airflow import DAG
8-
from airflow.operators.bash_operator import BashOperator
9-
from airflow.operators.python_operator import PythonOperator
8+
from airflow.operators.bash import BashOperator
9+
from airflow.operators.python import PythonOperator
1010

1111
dag = DAG(
12-
dag_id="listing_2_6",
12+
dag_id="listing_2_06",
1313
start_date=airflow.utils.dates.days_ago(14),
1414
schedule_interval=None,
1515
)
1616

1717
download_launches = BashOperator(
1818
task_id="download_launches",
19-
bash_command="curl -o /tmp/launches.json 'https://launchlibrary.net/1.4/launch?next=5&mode=verbose'", # noqa: E501
19+
bash_command="curl -o /tmp/launches.json -L 'https://ll.thespacedevs.com/2.0.0/launch/upcoming'", # noqa: E501
2020
dag=dag,
2121
)
2222

@@ -28,7 +28,7 @@ def _get_pictures():
2828
# Download all pictures in launches.json
2929
with open("/tmp/launches.json") as f:
3030
launches = json.load(f)
31-
image_urls = [launch["rocket"]["imageURL"] for launch in launches["launches"]]
31+
image_urls = [launch["image"] for launch in launches["results"]]
3232
for image_url in image_urls:
3333
try:
3434
response = requests.get(image_url)

Diff for: chapters/chapter02/docker-compose.yml renamed to chapter02/docker-compose.yml

+6-3
Original file line numberDiff line numberDiff line change
@@ -9,9 +9,8 @@ x-environment: &airflow_environment
99
- AIRFLOW__CORE__STORE_DAG_CODE=True
1010
- AIRFLOW__CORE__STORE_SERIALIZED_DAGS=True
1111
- AIRFLOW__WEBSERVER__EXPOSE_CONFIG=True
12-
- AIRFLOW__WEBSERVER__RBAC=False
1312

14-
x-airflow-image: &airflow_image apache/airflow:1.10.12-python3.8
13+
x-airflow-image: &airflow_image apache/airflow:2.0.0-python3.8
1514
# ====================================== /AIRFLOW ENVIRONMENT VARIABLES ======================================
1615
services:
1716
postgres:
@@ -22,12 +21,15 @@ services:
2221
- POSTGRES_DB=airflow
2322
ports:
2423
- "5432:5432"
24+
2525
init:
2626
image: *airflow_image
2727
depends_on:
2828
- postgres
2929
environment: *airflow_environment
30-
command: upgradedb
30+
entrypoint: /bin/bash
31+
command: -c 'airflow db init && airflow users create --username admin --password admin --firstname Anonymous --lastname Admin --role Admin --email [email protected]'
32+
3133
webserver:
3234
image: *airflow_image
3335
restart: always
@@ -39,6 +41,7 @@ services:
3941
- logs:/opt/airflow/logs
4042
environment: *airflow_environment
4143
command: webserver
44+
4245
scheduler:
4346
image: *airflow_image
4447
restart: always

0 commit comments

Comments
 (0)