Add figures #35
Merged: 10 commits, merged Mar 23, 2022
77 changes: 77 additions & 0 deletions 01_introduction.ipynb
@@ -31,27 +31,83 @@
"# Hello Transformers"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img alt=\"transformer-timeline\" caption=\"The transformers timeline\" src=\"images/chapter01_timeline.png\" id=\"transformer-timeline\"/>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## The Encoder-Decoder Framework"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img alt=\"rnn\" caption=\"Unrolling an RNN in time.\" src=\"images/chapter01_rnn.png\" id=\"rnn\"/>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img alt=\"enc-dec\" caption=\"Encoder-decoder architecture with a pair of RNNs. In general, there are many more recurrent layers than those shown.\" src=\"images/chapter01_enc-dec.png\" id=\"enc-dec\"/>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Attention Mechanisms"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img alt=\"enc-dec-attn\" caption=\"Encoder-decoder architecture with an attention mechanism for a pair of RNNs.\" src=\"images/chapter01_enc-dec-attn.png\" id=\"enc-dec-attn\"/> "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img alt=\"attention-alignment\" width=\"500\" caption=\"RNN encoder-decoder alignment of words in English and the generated translation in French (courtesy of Dzmitry Bahdanau).\" src=\"images/chapter02_attention-alignment.png\" id=\"attention-alignment\"/> "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img alt=\"transformer-self-attn\" caption=\"Encoder-decoder architecture of the original Transformer.\" src=\"images/chapter01_self-attention.png\" id=\"transformer-self-attn\"/> "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Transfer Learning in NLP"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img alt=\"transfer-learning\" caption=\"Comparison of traditional supervised learning (left) and transfer learning (right).\" src=\"images/chapter01_transfer-learning.png\" id=\"transfer-learning\"/> "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img alt=\"ulmfit\" width=\"500\" caption=\"The ULMFiT process (courtesy of Jeremy Howard).\" src=\"images/chapter01_ulmfit.png\" id=\"ulmfit\"/>"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -497,13 +553,34 @@
"## The Hugging Face Ecosystem"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img alt=\"ecosystem\" width=\"500\" caption=\"An overview of the Hugging Face ecosystem of libraries and the Hub.\" src=\"images/chapter01_hf-ecosystem.png\" id=\"ecosystem\"/>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### The Hugging Face Hub"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img alt=\"hub-overview\" width=\"1000\" caption=\"The models page of the Hugging Face Hub, showing filters on the left and a list of models on the right.\" src=\"images/chapter01_hub-overview.png\" id=\"hub-overview\"/> "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
    "<img alt=\"hub-model-card\" width=\"1000\" caption=\"An example model card from the Hugging Face Hub. The inference widget is shown on the right, where you can interact with the model.\" src=\"images/chapter01_hub-model-card.png\" id=\"hub-model-card\"/> "
]
},
{
"cell_type": "markdown",
"metadata": {},
56 changes: 56 additions & 0 deletions 03_transformer-anatomy.ipynb
@@ -48,20 +48,41 @@
"## The Transformer Architecture"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img alt=\"transformer-encoder-decoder\" caption=\"Encoder-decoder architecture of the transformer, with the encoder shown in the upper half of the figure and the decoder in the lower half\" src=\"images/chapter03_transformer-encoder-decoder.png\" id=\"transformer-encoder-decoder\"/>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## The Encoder"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img alt=\"encoder-zoom\" caption=\"Zooming into the encoder layer\" src=\"images/chapter03_encoder-zoom.png\" id=\"encoder-zoom\"/>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Self-Attention"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img alt=\"Contextualized embeddings\" caption=\"Diagram showing how self-attention updates raw token embeddings (upper) into contextualized embeddings (lower) to create representations that incorporate information from the whole sequence\" src=\"images/chapter03_contextualized-embedding.png\" id=\"contextualized-embeddings\"/>"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -181,6 +202,13 @@
"### End sidebar"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img alt=\"Operations in scaled dot-product attention\" height=\"125\" caption=\"Operations in scaled dot-product attention\" src=\"images/chapter03_attention-ops.png\" id=\"attention-ops\"/>"
]
},
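The figure added above names the operations in scaled dot-product attention. A minimal NumPy sketch of those operations (illustrative names, not the notebook's own code, which uses PyTorch):

```python
import numpy as np

def softmax(x, axis=-1):
    # subtract the row max for numerical stability
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(query, key, value):
    # similarity scores between every pair of tokens, scaled by sqrt(d_k)
    scores = query @ key.T / np.sqrt(key.shape[-1])
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ value, weights
```

The scaling by the square root of the key dimension keeps the dot products from growing too large and saturating the softmax.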
{
"cell_type": "code",
"execution_count": null,
@@ -351,6 +379,13 @@
"#### Multi-headed attention"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img alt=\"Multi-head attention\" height=\"125\" caption=\"Multi-head attention\" src=\"images/chapter03_multihead-attention.png\" id=\"multihead-attention\"/>"
]
},
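Multi-head attention, pictured in the figure above, runs several attention heads in parallel on lower-dimensional projections and concatenates the results. A rough NumPy sketch (the weight-matrix names are illustrative):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads):
    seq_len, d_model = x.shape
    d_head = d_model // num_heads

    def split_heads(t):
        # (seq_len, d_model) -> (num_heads, seq_len, d_head)
        return t.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split_heads(x @ w_q), split_heads(x @ w_k), split_heads(x @ w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # per-head scores
    weights = softmax(scores, axis=-1)
    # concatenate the heads and project back to d_model
    out = (weights @ v).transpose(1, 0, 2).reshape(seq_len, d_model)
    return out @ w_o
```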
{
"cell_type": "code",
"execution_count": null,
@@ -557,6 +592,13 @@
"### Adding Layer Normalization"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img alt=\"Transformer layer normalization\" height=\"500\" caption=\"Different arrangements of layer normalization in a transformer encoder layer\" src=\"images/chapter03_layer-norm.png\" id=\"layer-norm\"/>"
]
},
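The figure above contrasts where layer normalization can sit relative to the residual connection. The two arrangements can be sketched like this (plain NumPy, `sublayer` standing in for attention or the feed-forward network):

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # normalize each token's features to zero mean and unit variance
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def post_layer_norm(x, sublayer):
    # arrangement of the original Transformer paper: normalize after the residual
    return layer_norm(x + sublayer(x))

def pre_layer_norm(x, sublayer):
    # common modern arrangement: normalize inside the residual branch
    return x + sublayer(layer_norm(x))
```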
{
"cell_type": "code",
"execution_count": null,
@@ -757,6 +799,13 @@
"## The Decoder"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img alt=\"Transformer decoder zoom\" caption=\"Zooming into the transformer decoder layer\" src=\"images/chapter03_decoder-zoom.png\" id=\"decoder-zoom\"/> "
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -851,6 +900,13 @@
"### The Transformer Tree of Life"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img alt=\"Transformer family tree\" caption=\"An overview of some of the most prominent transformer architectures\" src=\"images/chapter03_transformers-compact.png\" id=\"family-tree\"/>"
]
},
{
"cell_type": "markdown",
"metadata": {},
35 changes: 35 additions & 0 deletions 04_multilingual-ner.ipynb
@@ -960,6 +960,13 @@
"### The Tokenizer Pipeline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
    "<img alt=\"Tokenizer pipeline\" caption=\"The steps in the tokenization pipeline\" src=\"images/chapter04_tokenizer-pipeline.png\" id=\"tokenizer-pipeline\"/>"
]
},
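The pipeline in the figure above (normalization, pretokenization, the tokenizer model, postprocessing) can be sketched with a toy WordPiece-style greedy longest-match split; the vocabulary and helper names here are purely illustrative:

```python
def normalize(text):
    return text.lower().strip()

def pretokenize(text):
    return text.split()

def wordpiece(word, vocab):
    # greedy longest-match-first subword split (toy WordPiece-style model)
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        while end > start:
            piece = word[start:end] if start == 0 else "##" + word[start:end]
            if piece in vocab:
                pieces.append(piece)
                break
            end -= 1
        else:
            return ["[UNK]"]  # no matching subword found
        start = end
    return pieces

def tokenize(text, vocab):
    tokens = []
    for word in pretokenize(normalize(text)):
        tokens.extend(wordpiece(word, vocab))
    # postprocessing adds the model's special tokens
    return ["[CLS]"] + tokens + ["[SEP]"]
```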
{
"cell_type": "markdown",
"metadata": {},
@@ -1001,6 +1008,20 @@
"## Transformers for Named Entity Recognition"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img alt=\"Architecture of a transformer encoder for classification.\" caption=\"Fine-tuning an encoder-based transformer for sequence classification\" src=\"images/chapter04_clf-architecture.png\" id=\"clf-arch\"/>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img alt=\"Architecture of a transformer encoder for named entity recognition. The wide linear layer shows that the same linear layer is applied to all hidden states.\" caption=\"Fine-tuning an encoder-based transformer for named entity recognition\" src=\"images/chapter04_ner-architecture.png\" id=\"ner-arch\"/>"
]
},
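The "wide linear layer" in the NER figure above means one shared weight matrix is applied to every token's hidden state. A small NumPy illustration (dimensions chosen for the example, not taken from the notebook):

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, hidden_dim, num_tags = 6, 16, 9  # e.g. 9 IOB2 tags (O, B-PER, I-PER, ...)

hidden_states = rng.normal(size=(seq_len, hidden_dim))  # encoder output, one row per token
w = rng.normal(size=(hidden_dim, num_tags))             # ONE weight matrix ...
b = np.zeros(num_tags)

logits = hidden_states @ w + b           # ... applied to every position at once
predicted_tags = logits.argmax(axis=-1)  # one tag id per token
```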
{
"cell_type": "markdown",
"metadata": {},
@@ -1015,6 +1036,13 @@
"### Bodies and Heads"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img alt=\"bert-body-head\" caption=\"The `BertModel` class only contains the body of the model, while the `BertFor&lt;Task&gt;` classes combine the body with a dedicated head for a given task\" src=\"images/chapter04_bert-body-head.png\" id=\"bert-body-head\"/>"
]
},
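The body/head split in the figure above — one task-agnostic body, swappable task heads — can be sketched as follows; `body` is a random stand-in for the role `BertModel` plays, and the head functions are hypothetical illustrations:

```python
import numpy as np

rng = np.random.default_rng(42)
hidden_dim, num_labels = 16, 3

def body(token_ids):
    # stand-in for the task-agnostic body: one hidden state per input token
    # (a real body would run embeddings plus a stack of encoder layers)
    return rng.normal(size=(len(token_ids), hidden_dim))

def sequence_classification_head(hidden_states, w, b):
    # pools a single hidden state (here the first token, like BERT's [CLS])
    # and maps it to class logits
    return hidden_states[0] @ w + b

def token_classification_head(hidden_states, w, b):
    # maps every hidden state to per-token label logits
    return hidden_states @ w + b
```

The same body output feeds either head, which is why fine-tuning a new task only requires training a small head on top.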
{
"cell_type": "markdown",
"metadata": {},
@@ -4096,6 +4124,13 @@
"## Interacting with Model Widgets"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img alt=\"A Hub widget\" caption=\"Example of a widget on the Hugging Face Hub\" src=\"images/chapter04_ner-widget.png\" id=\"ner-widget\"/> "
]
},
{
"cell_type": "markdown",
"metadata": {},
28 changes: 28 additions & 0 deletions 05_text-generation.ipynb
@@ -41,13 +41,34 @@
"# Text Generation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img alt=\"LM Meta Learning\" width=\"800\" caption=\"During pretraining, language models are exposed to sequences of tasks that can be adapted during inference (courtesy of Tom B. Brown)\" src=\"images/chapter05_lm-meta-learning.png\" id=\"lm-meta-learning\"/>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img alt=\"Meena\" width=\"300\" caption=\"Meena on the left telling a corny joke to a human on the right (courtesy of Daniel Adiwardana and Thang Luong)\" src=\"images/chapter05_meena.png\" id=\"meena\"/>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## The Challenge with Generating Coherent Text"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img alt=\"Text generation\" width=\"700\" caption=\"Generating text from an input sequence by adding a new word to the input at each step\" src=\"images/chapter05_text-generation.png\" id=\"text-generation\"/> "
]
},
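The figure above shows the basic generation loop: append one new token to the input at each step. A greedy version of that loop, sketched with an abstract `next_logits` callable (illustrative, not the notebook's code):

```python
import numpy as np

def generate_greedy(next_logits, token_ids, max_new_tokens, eos_id=None):
    # repeatedly append the most probable next token to the input sequence
    ids = list(token_ids)
    for _ in range(max_new_tokens):
        next_id = int(np.argmax(next_logits(ids)))
        ids.append(next_id)
        if next_id == eos_id:
            break
    return ids
```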
{
"cell_type": "markdown",
"metadata": {},
@@ -320,6 +341,13 @@
"## Beam Search Decoding"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img alt=\"Beam search\" width=\"700\" caption=\"Beam search with two beams—the most probable sequences at each timestep are highlighted in blue\" src=\"images/chapter05_beam-search.png\" id=\"beam-search\"/> "
]
},
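Beam search, as pictured above with two beams, tracks the most probable sequences rather than committing to a single token each step. A compact sketch (no length penalty or early stopping, unlike a production implementation):

```python
import numpy as np

def log_softmax(logits):
    logits = logits - logits.max()
    return logits - np.log(np.exp(logits).sum())

def beam_search(next_logits, token_ids, num_beams, max_new_tokens):
    # each beam is a (cumulative log-probability, token sequence) pair
    beams = [(0.0, list(token_ids))]
    for _ in range(max_new_tokens):
        candidates = []
        for score, ids in beams:
            log_probs = log_softmax(next_logits(ids))
            # extend each beam with its num_beams most probable next tokens
            for tok in np.argsort(log_probs)[-num_beams:]:
                candidates.append((score + log_probs[tok], ids + [int(tok)]))
        # keep only the num_beams best sequences overall
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:num_beams]
    return beams[0][1]
```

Because scores are summed log-probabilities, beam search can recover a sequence whose first token was not the single most probable one.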
{
"cell_type": "code",
"execution_count": null,
14 changes: 14 additions & 0 deletions 06_summarization.ipynb
@@ -269,6 +269,13 @@
"### T5"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img alt=\"T5\" width=\"700\" caption=\"Diagram of T5's text-to-text framework (courtesy of Colin Raffel); besides translation and summarization, the CoLA (linguistic acceptability) and STSB (semantic similarity) tasks are shown\" src=\"images/chapter08_t5.png\" id=\"T5\"/>"
]
},
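T5's text-to-text framing, shown in the figure above, amounts to prepending a task prefix so every task becomes string-in, string-out. A tiny sketch of that idea (the prefix strings here mirror the figure's examples and are illustrative):

```python
def to_text_to_text(task, text):
    # cast a task as text-to-text by prepending a task prefix
    prefixes = {
        "summarize": "summarize: ",
        "translate": "translate English to German: ",
        "cola": "cola sentence: ",
    }
    return prefixes[task] + text
```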
{
"cell_type": "code",
"execution_count": null,
@@ -307,6 +314,13 @@
"### PEGASUS"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<img alt=\"pegasus\" width=\"700\" caption=\"Diagram of PEGASUS architecture (courtesy of Jingqing Zhang et al.)\" src=\"images/chapter08_pegasus.png\" id=\"pegasus\"/>"
]
},
{
"cell_type": "code",
"execution_count": null,