This repository was archived by the owner on Jul 7, 2023. It is now read-only.

Commit 0d464ff

Merge pull request #576 from rsepassi/push
v1.5.0
2 parents eaefc32 + 88a3c9b commit 0d464ff

203 files changed

+847
-568
lines changed

README.md

+144-93
@@ -12,73 +12,50 @@ welcome](https://img.shields.io/badge/contributions-welcome-brightgreen.svg)](CO
1212

1313
[Tensor2Tensor](https://github.com/tensorflow/tensor2tensor), or
1414
[T2T](https://github.com/tensorflow/tensor2tensor) for short, is a library
15-
of deep learning models and datasets. It has binaries to train the models and
16-
to download and prepare the data for you. T2T is modular and extensible and can
17-
be used in [notebooks](https://goo.gl/wkHexj) for prototyping your own models
18-
or running existing ones on your data. It is actively used and maintained by
19-
researchers and engineers within
20-
the [Google Brain team](https://research.google.com/teams/brain/) and was used
21-
to develop state-of-the-art models for translation (see
22-
[Attention Is All You Need](https://arxiv.org/abs/1706.03762)), summarization,
23-
image generation and other tasks. You can read
24-
more about T2T in the [Google Research Blog post introducing
25-
it](https://research.googleblog.com/2017/06/accelerating-deep-learning-research.html).
26-
27-
We're eager to collaborate with you on extending T2T, so please feel
28-
free to [open an issue on
29-
GitHub](https://github.com/tensorflow/tensor2tensor/issues) or
30-
send along a pull request to add your dataset or model.
31-
See [our contribution
32-
doc](CONTRIBUTING.md) for details and our [open
33-
issues](https://github.com/tensorflow/tensor2tensor/issues).
34-
You can chat with us and other users on
35-
[Gitter](https://gitter.im/tensor2tensor/Lobby) and please join our
36-
[Google Group](https://groups.google.com/forum/#!forum/tensor2tensor) to keep up
37-
with T2T announcements.
15+
of deep learning models and datasets designed to [accelerate deep learning
16+
research](https://research.googleblog.com/2017/06/accelerating-deep-learning-research.html) and make it more accessible.
17+
18+
T2T is actively used and maintained by researchers and engineers within the
19+
[Google Brain team](https://research.google.com/teams/brain/) and a community
20+
of users. We're eager to collaborate with you too, so feel free to
21+
[open an issue on GitHub](https://github.com/tensorflow/tensor2tensor/issues)
22+
or send along a pull request (see [our contribution doc](CONTRIBUTING.md)).
23+
You can chat with us on
24+
[Gitter](https://gitter.im/tensor2tensor/Lobby) and join the
25+
[T2T Google Group](https://groups.google.com/forum/#!forum/tensor2tensor).
3826

3927
### Quick Start
4028

4129
[This iPython notebook](https://goo.gl/wkHexj) explains T2T and runs in your
4230
browser using a free VM from Google, no installation needed.
43-
44-
Alternatively, here is a one-command version that installs T2T, downloads data,
45-
trains an English-German translation model, and evaluates it:
31+
Alternatively, here is a one-command version that installs T2T, downloads MNIST,
32+
trains a model and evaluates it:
4633

4734
```
4835
pip install tensor2tensor && t2t-trainer \
4936
--generate_data \
5037
--data_dir=~/t2t_data \
51-
--problems=translate_ende_wmt32k \
52-
--model=transformer \
53-
--hparams_set=transformer_base_single_gpu \
54-
--output_dir=~/t2t_train/base
55-
```
56-
57-
You can decode from the model interactively:
58-
59-
```
60-
t2t-decoder \
61-
--data_dir=~/t2t_data \
62-
--problems=translate_ende_wmt32k \
63-
--model=transformer \
64-
--hparams_set=transformer_base_single_gpu \
65-
--output_dir=~/t2t_train/base \
66-
--decode_interactive
38+
--output_dir=~/t2t_train/mnist \
39+
--problems=image_mnist \
40+
--model=shake_shake \
41+
--hparams_set=shake_shake_quick \
42+
--train_steps=1000 \
43+
--eval_steps=100
6744
```
6845

69-
See the [Walkthrough](#walkthrough) below for more details on each step
70-
and [Suggested Models](#suggested-models) for well performing models
71-
on common tasks.
72-
7346
### Contents
7447

75-
* [Walkthrough](#walkthrough)
76-
* [Suggested Models](#suggested-models)
77-
* [Translation](#translation)
78-
* [Summarization](#summarization)
48+
* [Suggested Datasets and Models](#suggested-datasets-and-models)
7949
* [Image Classification](#image-classification)
80-
* [Installation](#installation)
81-
* [Features](#features)
50+
* [Language Modeling](#language-modeling)
51+
* [Sentiment Analysis](#sentiment-analysis)
52+
* [Speech Recognition](#speech-recognition)
53+
* [Summarization](#summarization)
54+
* [Translation](#translation)
55+
* [Basics](#basics)
56+
* [Walkthrough](#walkthrough)
57+
* [Installation](#installation)
58+
* [Features](#features)
8259
* [T2T Overview](#t2t-overview)
8360
* [Datasets](#datasets)
8461
* [Problems and Modalities](#problems-and-modalities)
@@ -87,10 +64,102 @@ on common tasks.
8764
* [Trainer](#trainer)
8865
* [Adding your own components](#adding-your-own-components)
8966
* [Adding a dataset](#adding-a-dataset)
67+
* [Papers](#papers)
68+
69+
## Suggested Datasets and Models
9070

91-
---
71+
Below we list a number of tasks that can be solved with T2T when
72+
you train the appropriate model on the appropriate problem.
73+
We give the problem and model below and we suggest a setting of
74+
hyperparameters that we know works well in our setup. We usually
75+
run either on Cloud TPUs or on 8-GPU machines; you might need
76+
to modify the hyperparameters if you run on a different setup.
9277

93-
## Walkthrough
78+
### Image Classification
79+
80+
For image classification, we have a number of standard data-sets:
81+
* ImageNet (a large data-set): `--problems=image_imagenet`, or one
82+
of the re-scaled versions (`image_imagenet224`, `image_imagenet64`,
83+
`image_imagenet32`)
84+
* CIFAR-10: `--problems=image_cifar10` (or
85+
`--problems=image_cifar10_plain` to turn off data augmentation)
86+
* CIFAR-100: `--problems=image_cifar100`
87+
* MNIST: `--problems=image_mnist`
88+
89+
For ImageNet, we suggest to use the ResNet or Xception, i.e.,
90+
use `--model=resnet --hparams_set=resnet_50` or
91+
`--model=xception --hparams_set=xception_base`.
92+
Resnet should get to above 76% top-1 accuracy on ImageNet.
93+
94+
For CIFAR and MNIST, we suggest to try the shake-shake model:
95+
`--model=shake_shake --hparams_set=shakeshake_big`.
96+
This setting trained for `--train_steps=700000` should yield
97+
close to 97% accuracy on CIFAR-10.
98+
99+
### Language Modeling
100+
101+
For language modeling, we have these data-sets in T2T:
102+
* PTB (a small data-set): `--problems=languagemodel_ptb10k` for
103+
word-level modeling and `--problems=languagemodel_ptb_characters`
104+
for character-level modeling.
105+
* LM1B (a billion-word corpus): `--problems=languagemodel_lm1b32k` for
106+
subword-level modeling and `--problems=languagemodel_lm1b_characters`
107+
for character-level modeling.
108+
109+
We suggest to start with `--model=transformer` on this task and use
110+
`--hparams_set=transformer_small` for PTB and
111+
`--hparams_set=transformer_base` for LM1B.
112+
113+
### Sentiment Analysis
114+
115+
For the task of recognizing the sentiment of a sentence, use
116+
* the IMDB data-set: `--problems=sentiment_imdb`
117+
118+
We suggest to use `--model=transformer_encoder` here and since it is
119+
a small data-set, try `--hparams_set=transformer_tiny` and train for
120+
few steps (e.g., `--train_steps=2000`).
121+
122+
### Speech Recognition
123+
124+
For speech-to-text, we have these data-sets in T2T:
125+
* Librispeech (English speech to text): `--problems=librispeech` for
126+
the whole set and `--problems=librispeech_clean` for a smaller
127+
but nicely filtered part.
128+
129+
### Summarization
130+
131+
For summarizing longer text into shorter one we have these data-sets:
132+
* CNN/DailyMail articles summarized into a few sentences:
133+
`--problems=summarize_cnn_dailymail32k`
134+
135+
We suggest to use `--model=transformer` and
136+
`--hparams_set=transformer_prepend` for this task.
137+
This yields good ROUGE scores.
138+
139+
### Translation
140+
141+
There are a number of translation data-sets in T2T:
142+
* English-German: `--problems=translate_ende_wmt32k`
143+
* English-French: `--problems=translate_enfr_wmt32k`
144+
* English-Czech: `--problems=translate_encs_wmt32k`
145+
* English-Chinese: `--problems=translate_enzh_wmt32k`
146+
147+
You can get translations in the other direction by appending `_rev` to
148+
the problem name, e.g., for German-English use
149+
`--problems=translate_ende_wmt32k_rev`.
150+
151+
For all translation problems, we suggest to try the Transformer model:
152+
`--model=transformer`. At first it is best to try the base setting,
153+
`--hparams_set=transformer_base`. When trained on 8 GPUs for 300K steps
154+
this should reach a BLEU score of about 28 on the English-German data-set,
155+
which is close to state-of-the-art. If training on a single GPU, try the
156+
`--hparams_set=transformer_base_single_gpu` setting. For very good results
157+
or larger data-sets (e.g., for English-French), try the big model
158+
with `--hparams_set=transformer_big`.
159+
160+
## Basics
161+
162+
### Walkthrough
94163

95164
Here's a walkthrough training a good English-to-German translation
96165
model using the Transformer model from [*Attention Is All You
@@ -156,36 +225,8 @@ cat translation.en
156225
t2t-bleu --translation=translation.en --reference=ref-translation.de
157226
```
158227

159-
---
160-
161-
## Suggested Models
162-
163-
Here are some combinations of models, hparams and problems that we found
164-
work well, so we suggest to use them if you're interested in that problem.
165-
166-
### Translation
167-
168-
For translation, esp. English-German and English-French, we suggest to use
169-
the Transformer model in base or big configurations, i.e.
170-
for `--problems=translate_ende_wmt32k` use `--model=transformer` and
171-
`--hparams_set=transformer_base`. When trained on 8 GPUs for 300K steps
172-
this should reach a BLEU score of about 28.
228+
### Installation
173229

174-
### Summarization
175-
176-
For summarization suggest to use the Transformer model in prepend mode, i.e.
177-
for `--problems=summarize_cnn_dailymail32k` use `--model=transformer` and
178-
`--hparams_set=transformer_prepend`.
179-
180-
### Image Classification
181-
182-
For image classification suggest to use the ResNet or Xception, i.e.
183-
for `--problems=image_imagenet` use `--model=resnet50` and
184-
`--hparams_set=resnet_base` or `--model=xception` and
185-
`--hparams_set=xception_base`.
186-
187-
188-
## Installation
189230

190231
```
191232
# Assumes tensorflow or tensorflow-gpu installed
@@ -214,9 +255,7 @@ Library usage:
214255
python -c "from tensor2tensor.models.transformer import Transformer"
215256
```
216257

217-
---
218-
219-
## Features
258+
### Features
220259

221260
* Many state of the art and baseline models are built-in and new models can be
222261
added easily (open an issue or pull request!).
@@ -229,11 +268,10 @@ python -c "from tensor2tensor.models.transformer import Transformer"
229268
specification.
230269
* Support for multi-GPU machines and synchronous (1 master, many workers) and
231270
asynchronous (independent workers synchronizing through a parameter server)
232-
[distributed training](https://github.com/tensorflow/tensor2tensor/tree/master/docs/distributed_training.md).
271+
[distributed training](https://tensorflow.github.io/tensor2tensor/distributed_training.html).
233272
* Easily swap amongst datasets and models by command-line flag with the data
234273
generation script `t2t-datagen` and the training script `t2t-trainer`.
235-
236-
---
274+
* Train on [Google Cloud ML](https://tensorflow.github.io/tensor2tensor/cloud_mlengine.html) and [Cloud TPUs](https://tensorflow.github.io/tensor2tensor/cloud_tpu.html).
237275

238276
## T2T overview
239277

@@ -289,9 +327,7 @@ inference. Users can easily switch between problems, models, and hyperparameter
289327
sets by using the `--model`, `--problems`, and `--hparams_set` flags. Specific
290328
hyperparameters can be overridden with the `--hparams` flag. `--schedule` and
291329
related flags control local and distributed training/evaluation
292-
([distributed training documentation](https://github.com/tensorflow/tensor2tensor/tree/master/docs/distributed_training.md)).
293-
294-
---
330+
([distributed training documentation](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/g3doc/distributed_training.md)).
295331

296332
## Adding your own components
297333

@@ -317,6 +353,21 @@ for an example.
317353
Also see the [data generators
318354
README](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/data_generators/README.md).
319355

320-
---
356+
## Papers
357+
358+
Tensor2Tensor was used to develop a number of state-of-the-art models
359+
and deep learning methods. Here we list some papers that were based on T2T
360+
from the start and benefited from its features and architecture in ways
361+
described in the [Google Research Blog post introducing
362+
T2T](https://research.googleblog.com/2017/06/accelerating-deep-learning-research.html).
363+
364+
* [Attention Is All You Need](https://arxiv.org/abs/1706.03762)
365+
* [Depthwise Separable Convolutions for Neural Machine
366+
Translation](https://arxiv.org/abs/1706.03059)
367+
* [One Model To Learn Them All](https://arxiv.org/abs/1706.05137)
368+
* [Discrete Autoencoders for Sequence Models](https://arxiv.org/abs/1801.09797)
369+
* [Generating Wikipedia by Summarizing Long
370+
Sequences](https://arxiv.org/abs/1801.10198)
371+
* [Image Transformer](https://openreview.net/forum?id=r16Vyf-0-)
321372

322373
*Note: This is not an official Google product.*
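
As a reading aid for the Translation section added to the README in this diff, here is a minimal Python sketch of how the suggested flags fit together, including the `_rev` suffix for the reverse direction. The helper names `reverse_problem` and `trainer_command` and the output directory `~/t2t_train/deen` are hypothetical and not part of T2T; the sketch only builds and prints a command line from flags that appear in this diff, and does not run the trainer.

```python
def reverse_problem(problem: str) -> str:
    """Return the problem name for the opposite translation direction.

    Follows the README convention of appending `_rev`; a name that is
    already reversed is returned unchanged.
    """
    return problem if problem.endswith("_rev") else problem + "_rev"


def trainer_command(problem, model, hparams_set, data_dir, output_dir):
    """Assemble a t2t-trainer argument list (hypothetical helper)."""
    return [
        "t2t-trainer",
        "--generate_data",
        f"--data_dir={data_dir}",
        f"--output_dir={output_dir}",
        f"--problems={problem}",
        f"--model={model}",
        f"--hparams_set={hparams_set}",
    ]


# German-English on a single GPU, using the settings the README suggests.
cmd = trainer_command(
    reverse_problem("translate_ende_wmt32k"),
    "transformer",
    "transformer_base_single_gpu",
    "~/t2t_data",
    "~/t2t_train/deen",  # hypothetical output directory
)
print(" ".join(cmd))
```

Building the argument list first makes it easy to swap the problem, model, or hparams set without retyping the whole command.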

docs/cloud_tpu.md

+44-10
@@ -1,17 +1,30 @@
11
# Running on Cloud TPUs
22

3-
Tensor2Tensor supports running on Google Cloud Platforms TPUs, chips specialized
4-
for ML training.
3+
Tensor2Tensor supports running on Google Cloud Platforms TPUs, chips
4+
specialized for ML training. See the official tutorial for [running Transformer
5+
on Cloud TPUs](https://cloud.google.com/tpu/docs/tutorials/transformer) or
6+
read on for more T2T models on TPUs.
57

6-
Models and hparams that are known to work on TPU:
7-
* `transformer` with `transformer_tpu`
8-
* `transformer_encoder` with `transformer_tpu`
9-
* `transformer_decoder` with `transformer_tpu`
10-
* `resnet50` with `resnet_base`
11-
* `revnet104` with `revnet_base`
8+
## Models and hparams for TPU:
129

13-
TPU access is currently limited, but access will expand soon, so get excited for
14-
your future ML supercomputers in the cloud.
10+
Transformer:
11+
* `transformer` with `transformer_tpu` (or `transformer_packed_tpu`,
12+
`transformer_tiny_tpu`, `transformer_big_tpu`)
13+
* `transformer_encoder` with `transformer_tpu` (and the above ones)
14+
15+
You can run the Transformer model on a number of problems,
16+
from translation through language modeling to sentiment analysis.
17+
See the official tutorial for [running Transformer
18+
on Cloud TPUs](https://cloud.google.com/tpu/docs/tutorials/transformer)
19+
for some examples and try out your own problems.
20+
21+
Residual networks:
22+
* `resnet` with `resnet_50` (or `resnet_18` or `resnet_34`)
23+
* `revnet` with `revnet_104` (or `revnet_38_cifar`)
24+
* `shake_shake` with `shakeshake_tpu` (or `shakeshake_small`)
25+
26+
We run residual networks on MNIST, CIFAR and ImageNet, but they should
27+
work on any image classification data-set.
1528

1629
## Tutorial: Transformer En-De translation on TPU
1730

@@ -75,3 +88,24 @@ switch between hardware at will.
7588
* `--cloud_tpu_name`: The name of the TPU instance to use or create. If you want
7689
to launch multiple jobs on TPU, provide different names here for each one.
7790
Each TPU instance can only be training one model at a time.
91+
92+
## Other T2T models on TPU
93+
94+
To run other models on TPU, proceed exactly as in the tutorial above,
95+
just with different model, problem and hparams_set (and directories).
96+
For example, to train a shake-shake model on CIFAR you can run this command.
97+
```
98+
t2t-trainer \
99+
--model=shake_shake \
100+
--hparams_set=shakeshake_tpu \
101+
--problems=image_cifar10 \
102+
--train_steps=180000 \
103+
--eval_steps=9 \
104+
--local_eval_frequency=100 \
105+
--data_dir=$DATA_DIR \
106+
--output_dir=$OUT_DIR \
107+
--cloud_tpu \
108+
--cloud_delete_on_done
109+
```
110+
Note that `eval_steps` should not be too high so as not to run out
111+
of evaluation data.
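
The note about `eval_steps` can be turned into a rough upper bound. The sketch below assumes that each evaluation step consumes one full batch of examples and that evaluation runs over the 10,000-image CIFAR-10 test split; the helper `max_eval_steps` is hypothetical, and the batch size of 1024 is purely illustrative rather than taken from the `shakeshake_tpu` hparams set.

```python
def max_eval_steps(num_eval_examples: int, batch_size: int) -> int:
    """Largest number of full evaluation batches the data can supply.

    Assumes one eval step reads exactly `batch_size` examples.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return num_eval_examples // batch_size


# 10,000 evaluation images with an illustrative batch size of 1024.
print(max_eval_steps(10_000, 1024))
```

If the actual batch size or evaluation split differs, the bound changes accordingly, so treat this as a sanity check rather than a rule.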
