Commit 3ca7b1f

Merge pull request #35 from nlp-with-transformers/add-figures
Add figures
2 parents: ae5b7c1 + 4c0aab9

File tree

88 files changed: +566 -0 lines changed

Note: this is a large commit, so some content is hidden by default; only 5 of the 88 changed files are shown below.

01_introduction.ipynb (+77 lines)

@@ -31,27 +31,83 @@
     "# Hello Transformers"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<img alt=\"transformer-timeline\" caption=\"The transformers timeline\" src=\"images/chapter01_timeline.png\" id=\"transformer-timeline\"/>"
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
     "## The Encoder-Decoder Framework"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<img alt=\"rnn\" caption=\"Unrolling an RNN in time.\" src=\"images/chapter01_rnn.png\" id=\"rnn\"/>"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<img alt=\"enc-dec\" caption=\"Encoder-decoder architecture with a pair of RNNs. In general, there are many more recurrent layers than those shown.\" src=\"images/chapter01_enc-dec.png\" id=\"enc-dec\"/>"
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
     "## Attention Mechanisms"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<img alt=\"enc-dec-attn\" caption=\"Encoder-decoder architecture with an attention mechanism for a pair of RNNs.\" src=\"images/chapter01_enc-dec-attn.png\" id=\"enc-dec-attn\"/>"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<img alt=\"attention-alignment\" width=\"500\" caption=\"RNN encoder-decoder alignment of words in English and the generated translation in French (courtesy of Dzmitry Bahdanau).\" src=\"images/chapter02_attention-alignment.png\" id=\"attention-alignment\"/>"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<img alt=\"transformer-self-attn\" caption=\"Encoder-decoder architecture of the original Transformer.\" src=\"images/chapter01_self-attention.png\" id=\"transformer-self-attn\"/>"
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
     "## Transfer Learning in NLP"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<img alt=\"transfer-learning\" caption=\"Comparison of traditional supervised learning (left) and transfer learning (right).\" src=\"images/chapter01_transfer-learning.png\" id=\"transfer-learning\"/>"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<img alt=\"ulmfit\" width=\"500\" caption=\"The ULMFiT process (courtesy of Jeremy Howard).\" src=\"images/chapter01_ulmfit.png\" id=\"ulmfit\"/>"
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {},
@@ -497,13 +553,34 @@
     "## The Hugging Face Ecosystem"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<img alt=\"ecosystem\" width=\"500\" caption=\"An overview of the Hugging Face ecosystem of libraries and the Hub.\" src=\"images/chapter01_hf-ecosystem.png\" id=\"ecosystem\"/>"
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
     "### The Hugging Face Hub"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<img alt=\"hub-overview\" width=\"1000\" caption=\"The models page of the Hugging Face Hub, showing filters on the left and a list of models on the right.\" src=\"images/chapter01_hub-overview.png\" id=\"hub-overview\"/>"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<img alt=\"hub-model-card\" width=\"1000\" caption=\"An example model card from the Hugging Face Hub. The inference widget is shown on the right, where you can interact with the model.\" src=\"images/chapter01_hub-model-card.png\" id=\"hub-model-card\"/>"
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {},

03_transformer-anatomy.ipynb (+56 lines)

@@ -48,20 +48,41 @@
     "## The Transformer Architecture"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<img alt=\"transformer-encoder-decoder\" caption=\"Encoder-decoder architecture of the transformer, with the encoder shown in the upper half of the figure and the decoder in the lower half\" src=\"images/chapter03_transformer-encoder-decoder.png\" id=\"transformer-encoder-decoder\"/>"
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
     "## The Encoder"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<img alt=\"encoder-zoom\" caption=\"Zooming into the encoder layer\" src=\"images/chapter03_encoder-zoom.png\" id=\"encoder-zoom\"/>"
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
     "### Self-Attention"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<img alt=\"Contextualized embeddings\" caption=\"Diagram showing how self-attention updates raw token embeddings (upper) into contextualized embeddings (lower) to create representations that incorporate information from the whole sequence\" src=\"images/chapter03_contextualized-embedding.png\" id=\"contextualized-embeddings\"/>"
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {},

@@ -181,6 +202,13 @@
     "### End sidebar"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<img alt=\"Operations in scaled dot-product attention\" height=\"125\" caption=\"Operations in scaled dot-product attention\" src=\"images/chapter03_attention-ops.png\" id=\"attention-ops\"/>"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,

@@ -351,6 +379,13 @@
     "#### Multi-headed attention"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<img alt=\"Multi-head attention\" height=\"125\" caption=\"Multi-head attention\" src=\"images/chapter03_multihead-attention.png\" id=\"multihead-attention\"/>"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,

@@ -557,6 +592,13 @@
     "### Adding Layer Normalization"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<img alt=\"Transformer layer normalization\" height=\"500\" caption=\"Different arrangements of layer normalization in a transformer encoder layer\" src=\"images/chapter03_layer-norm.png\" id=\"layer-norm\"/>"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,

@@ -757,6 +799,13 @@
     "## The Decoder"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<img alt=\"Transformer decoder zoom\" caption=\"Zooming into the transformer decoder layer\" src=\"images/chapter03_decoder-zoom.png\" id=\"decoder-zoom\"/>"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,

@@ -851,6 +900,13 @@
     "### The Transformer Tree of Life"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<img alt=\"Transformer family tree\" caption=\"An overview of some of the most prominent transformer architectures\" src=\"images/chapter03_transformers-compact.png\" id=\"family-tree\"/>"
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {},

04_multilingual-ner.ipynb (+35 lines)

@@ -960,6 +960,13 @@
     "### The Tokenizer Pipeline"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<img alt=\"Tokenizer pipeline\" caption=\"The steps in the tokenization pipeline\" src=\"images/chapter04_tokenizer-pipeline.png\" id=\"tokenizer-pipeline\"/>"
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {},

@@ -1001,6 +1008,20 @@
     "## Transformers for Named Entity Recognition"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<img alt=\"Architecture of a transformer encoder for classification.\" caption=\"Fine-tuning an encoder-based transformer for sequence classification\" src=\"images/chapter04_clf-architecture.png\" id=\"clf-arch\"/>"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<img alt=\"Architecture of a transformer encoder for named entity recognition. The wide linear layer shows that the same linear layer is applied to all hidden states.\" caption=\"Fine-tuning an encoder-based transformer for named entity recognition\" src=\"images/chapter04_ner-architecture.png\" id=\"ner-arch\"/>"
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {},

@@ -1015,6 +1036,13 @@
     "### Bodies and Heads"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<img alt=\"bert-body-head\" caption=\"The `BertModel` class only contains the body of the model, while the `BertFor&lt;Task&gt;` classes combine the body with a dedicated head for a given task\" src=\"images/chapter04_bert-body-head.png\" id=\"bert-body-head\"/>"
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {},

@@ -4096,6 +4124,13 @@
     "## Interacting with Model Widgets"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<img alt=\"A Hub widget\" caption=\"Example of a widget on the Hugging Face Hub\" src=\"images/chapter04_ner-widget.png\" id=\"ner-widget\"/>"
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {},

05_text-generation.ipynb (+28 lines)

@@ -41,13 +41,34 @@
     "# Text Generation"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<img alt=\"LM Meta Learning\" width=\"800\" caption=\"During pretraining, language models are exposed to sequences of tasks that can be adapted during inference (courtesy of Tom B. Brown)\" src=\"images/chapter05_lm-meta-learning.png\" id=\"lm-meta-learning\"/>"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<img alt=\"Meena\" width=\"300\" caption=\"Meena on the left telling a corny joke to a human on the right (courtesy of Daniel Adiwardana and Thang Luong)\" src=\"images/chapter05_meena.png\" id=\"meena\"/>"
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
     "## The Challenge with Generating Coherent Text"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<img alt=\"Text generation\" width=\"700\" caption=\"Generating text from an input sequence by adding a new word to the input at each step\" src=\"images/chapter05_text-generation.png\" id=\"text-generation\"/>"
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {},

@@ -320,6 +341,13 @@
     "## Beam Search Decoding"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<img alt=\"Beam search\" width=\"700\" caption=\"Beam search with two beams—the most probable sequences at each timestep are highlighted in blue\" src=\"images/chapter05_beam-search.png\" id=\"beam-search\"/>"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,

06_summarization.ipynb (+14 lines)

@@ -269,6 +269,13 @@
     "### T5"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<img alt=\"T5\" width=\"700\" caption=\"Diagram of T5's text-to-text framework (courtesy of Colin Raffel); besides translation and summarization, the CoLA (linguistic acceptability) and STSB (semantic similarity) tasks are shown\" src=\"images/chapter08_t5.png\" id=\"T5\"/>"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,

@@ -307,6 +314,13 @@
     "### PEGASUS"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<img alt=\"pegasus\" width=\"700\" caption=\"Diagram of PEGASUS architecture (courtesy of Jingqing Zhang et al.)\" src=\"images/chapter08_pegasus.png\" id=\"pegasus\"/>"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,
