@@ -12,73 +12,50 @@ welcome](https://img.shields.io/badge/contributions-welcome-brightgreen.svg)](CO
[Tensor2Tensor](https://github.com/tensorflow/tensor2tensor), or
[T2T](https://github.com/tensorflow/tensor2tensor) for short, is a library
- of deep learning models and datasets. It has binaries to train the models and
- to download and prepare the data for you. T2T is modular and extensible and can
- be used in [notebooks](https://goo.gl/wkHexj) for prototyping your own models
- or running existing ones on your data. It is actively used and maintained by
- researchers and engineers within
- the [Google Brain team](https://research.google.com/teams/brain/) and was used
- to develop state-of-the-art models for translation (see
- [Attention Is All You Need](https://arxiv.org/abs/1706.03762)), summarization,
- image generation and other tasks. You can read
- more about T2T in the [Google Research Blog post introducing
- it](https://research.googleblog.com/2017/06/accelerating-deep-learning-research.html).
-
- We're eager to collaborate with you on extending T2T, so please feel
- free to [open an issue on
- GitHub](https://github.com/tensorflow/tensor2tensor/issues) or
- send along a pull request to add your dataset or model.
- See [our contribution
- doc](CONTRIBUTING.md) for details and our [open
- issues](https://github.com/tensorflow/tensor2tensor/issues).
- You can chat with us and other users on
- [Gitter](https://gitter.im/tensor2tensor/Lobby) and please join our
- [Google Group](https://groups.google.com/forum/#!forum/tensor2tensor) to keep up
- with T2T announcements.
+ of deep learning models and datasets designed to [accelerate deep learning
+ research](https://research.googleblog.com/2017/06/accelerating-deep-learning-research.html) and make it more accessible.
+
+ T2T is actively used and maintained by researchers and engineers within the
+ [Google Brain team](https://research.google.com/teams/brain/) and a community
+ of users. We're eager to collaborate with you too, so feel free to
+ [open an issue on GitHub](https://github.com/tensorflow/tensor2tensor/issues)
+ or send along a pull request (see [our contribution doc](CONTRIBUTING.md)).
+ You can chat with us on
+ [Gitter](https://gitter.im/tensor2tensor/Lobby) and join the
+ [T2T Google Group](https://groups.google.com/forum/#!forum/tensor2tensor).
### Quick Start

[This iPython notebook](https://goo.gl/wkHexj) explains T2T and runs in your
browser using a free VM from Google, no installation needed.
-
- Alternatively, here is a one-command version that installs T2T, downloads data,
- trains an English-German translation model, and evaluates it:
+ Alternatively, here is a one-command version that installs T2T, downloads MNIST,
+ trains a model and evaluates it:

```
pip install tensor2tensor && t2t-trainer \
--generate_data \
--data_dir=~/t2t_data \
- --problems=translate_ende_wmt32k \
- --model=transformer \
- --hparams_set=transformer_base_single_gpu \
- --output_dir=~/t2t_train/base
- ```
-
- You can decode from the model interactively:
-
- ```
- t2t-decoder \
- --data_dir=~/t2t_data \
- --problems=translate_ende_wmt32k \
- --model=transformer \
- --hparams_set=transformer_base_single_gpu \
- --output_dir=~/t2t_train/base \
- --decode_interactive
+ --output_dir=~/t2t_train/mnist \
+ --problems=image_mnist \
+ --model=shake_shake \
+ --hparams_set=shake_shake_quick \
+ --train_steps=1000 \
+ --eval_steps=100
```
- See the [Walkthrough](#walkthrough) below for more details on each step
- and [Suggested Models](#suggested-models) for well performing models
- on common tasks.
-
### Contents

- * [Walkthrough](#walkthrough)
- * [Suggested Models](#suggested-models)
- * [Translation](#translation)
- * [Summarization](#summarization)
+ * [Suggested Datasets and Models](#suggested-datasets-and-models)
* [Image Classification](#image-classification)
- * [Installation](#installation)
- * [Features](#features)
+ * [Language Modeling](#language-modeling)
+ * [Sentiment Analysis](#sentiment-analysis)
+ * [Speech Recognition](#speech-recognition)
+ * [Summarization](#summarization)
+ * [Translation](#translation)
+ * [Basics](#basics)
+ * [Walkthrough](#walkthrough)
+ * [Installation](#installation)
+ * [Features](#features)
* [T2T Overview](#t2t-overview)
* [Datasets](#datasets)
* [Problems and Modalities](#problems-and-modalities)
@@ -87,10 +64,102 @@ on common tasks.
* [Trainer](#trainer)
* [Adding your own components](#adding-your-own-components)
* [Adding a dataset](#adding-a-dataset)
+ * [Papers](#papers)
+
+ ## Suggested Datasets and Models

- ---
+ Below we list a number of tasks that can be solved with T2T when
+ you train the appropriate model on the appropriate problem.
+ We give the problem and model below and we suggest a setting of
+ hyperparameters that we know works well in our setup. We usually
+ run either on Cloud TPUs or on 8-GPU machines; you might need
+ to modify the hyperparameters if you run on a different setup.

- ## Walkthrough
+ ### Image Classification
+
+ For image classification, we have a number of standard data-sets:
+ * ImageNet (a large data-set): `--problems=image_imagenet`, or one
+ of the re-scaled versions (`image_imagenet224`, `image_imagenet64`,
+ `image_imagenet32`)
+ * CIFAR-10: `--problems=image_cifar10` (or
+ `--problems=image_cifar10_plain` to turn off data augmentation)
+ * CIFAR-100: `--problems=image_cifar100`
+ * MNIST: `--problems=image_mnist`
+
+ For ImageNet, we suggest using ResNet or Xception, i.e.,
+ use `--model=resnet --hparams_set=resnet_50` or
+ `--model=xception --hparams_set=xception_base`.
+ ResNet should get to above 76% top-1 accuracy on ImageNet.
+
+ For CIFAR and MNIST, we suggest trying the shake-shake model:
+ `--model=shake_shake --hparams_set=shakeshake_big`.
+ This setting trained for `--train_steps=700000` should yield
+ close to 97% accuracy on CIFAR-10.
+
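As a rough sketch, the CIFAR-10 suggestion above could be assembled into a single `t2t-trainer` call. The directory paths and variable names are assumptions made for illustration; only the flag values come from the text. The command is printed rather than executed so it can be inspected first.

```shell
# Hypothetical locations; adjust to your setup.
DATA_DIR="$HOME/t2t_data"
TRAIN_DIR="$HOME/t2t_train/cifar10"

# Flag values taken from the suggestions above.
PROBLEM=image_cifar10
MODEL=shake_shake
HPARAMS=shakeshake_big
TRAIN_STEPS=700000

# Assemble the command as a string and print it (dry run, nothing is trained).
CMD="t2t-trainer --generate_data --data_dir=$DATA_DIR --output_dir=$TRAIN_DIR --problems=$PROBLEM --model=$MODEL --hparams_set=$HPARAMS --train_steps=$TRAIN_STEPS"
echo "$CMD"
```

Once T2T is installed, running the printed line should start training. On hardware other than the 8-GPU or Cloud TPU setups mentioned above, the hyperparameters may need adjusting.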
+ ### Language Modeling
+
+ For language modeling, we have these data-sets in T2T:
+ * PTB (a small data-set): `--problems=languagemodel_ptb10k` for
+ word-level modeling and `--problems=languagemodel_ptb_characters`
+ for character-level modeling.
+ * LM1B (a billion-word corpus): `--problems=languagemodel_lm1b32k` for
+ subword-level modeling and `--problems=languagemodel_lm1b_characters`
+ for character-level modeling.
+
+ We suggest starting with `--model=transformer` on this task and using
+ `--hparams_set=transformer_small` for PTB and
+ `--hparams_set=transformer_base` for LM1B.
+
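The pairing of data-set and hyperparameter set described above can be written down as a small lookup. This is only an illustrative sketch: `lm_hparams_set` is a hypothetical helper, not part of T2T, and the printed command leaves out other flags such as `--data_dir` and `--output_dir`.

```shell
# Hypothetical helper: map a language-modeling problem name to the
# hyperparameter set suggested above (small for PTB, base for LM1B).
lm_hparams_set() {
  case "$1" in
    languagemodel_ptb*)  echo "transformer_small" ;;
    languagemodel_lm1b*) echo "transformer_base" ;;
    *) echo "unknown problem: $1" >&2; return 1 ;;
  esac
}

PROBLEM=languagemodel_ptb10k
HPARAMS="$(lm_hparams_set "$PROBLEM")"

# Print the core flags only (dry run).
echo "t2t-trainer --problems=$PROBLEM --model=transformer --hparams_set=$HPARAMS"
```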
+ ### Sentiment Analysis
+
+ For the task of recognizing the sentiment of a sentence, use
+ * the IMDB data-set: `--problems=sentiment_imdb`
+
+ We suggest using `--model=transformer_encoder` here and, since it is
+ a small data-set, trying `--hparams_set=transformer_tiny` and training for
+ only a few steps (e.g., `--train_steps=2000`).
+
+ ### Speech Recognition
+
+ For speech-to-text, we have these data-sets in T2T:
+ * Librispeech (English speech to text): `--problems=librispeech` for
+ the whole set and `--problems=librispeech_clean` for a smaller
+ but nicely filtered part.
+
+ ### Summarization
+
+ For summarizing longer text into a shorter one, we have these data-sets:
+ * CNN/DailyMail articles summarized into a few sentences:
+ `--problems=summarize_cnn_dailymail32k`
+
+ We suggest using `--model=transformer` and
+ `--hparams_set=transformer_prepend` for this task.
+ This yields good ROUGE scores.
+
+ ### Translation
+
+ There are a number of translation data-sets in T2T:
+ * English-German: `--problems=translate_ende_wmt32k`
+ * English-French: `--problems=translate_enfr_wmt32k`
+ * English-Czech: `--problems=translate_encs_wmt32k`
+ * English-Chinese: `--problems=translate_enzh_wmt32k`
+
+ You can get translations in the other direction by appending `_rev` to
+ the problem name, e.g., for German-English use
+ `--problems=translate_ende_wmt32k_rev`.
+
+ For all translation problems, we suggest trying the Transformer model:
+ `--model=transformer`. At first it is best to try the base setting,
+ `--hparams_set=transformer_base`. When trained on 8 GPUs for 300K steps
+ this should reach a BLEU score of about 28 on the English-German data-set,
+ which is close to state-of-the-art. If training on a single GPU, try the
+ `--hparams_set=transformer_base_single_gpu` setting. For very good results
+ or larger data-sets (e.g., for English-French), try the big model
+ with `--hparams_set=transformer_big`.
+
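The naming rule and the single-GPU advice above can be combined into a short sketch. The variables `PAIR`, `REVERSE` and `NUM_GPUS` are hypothetical inputs. Using `transformer_base` for any multi-GPU machine is an assumption that generalizes the 8-GPU case in the text.

```shell
# Hypothetical inputs for this sketch.
PAIR=ende      # language-pair code as it appears in the problem names above
REVERSE=1      # 1 = translate into English (e.g., German-English)
NUM_GPUS=1     # number of GPUs available for training

# Build the problem name; "_rev" selects the other direction.
PROBLEM="translate_${PAIR}_wmt32k"
if [ "$REVERSE" -eq 1 ]; then
  PROBLEM="${PROBLEM}_rev"
fi

# Assumption: base setting on multi-GPU machines, single-GPU variant otherwise.
if [ "$NUM_GPUS" -gt 1 ]; then
  HPARAMS=transformer_base
else
  HPARAMS=transformer_base_single_gpu
fi

# Print the resulting flags (dry run).
echo "--problems=$PROBLEM --model=transformer --hparams_set=$HPARAMS"
```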
+ ## Basics
+
+ ### Walkthrough

Here's a walkthrough training a good English-to-German translation
model using the Transformer model from [*Attention Is All You
@@ -156,36 +225,8 @@ cat translation.en
t2t-bleu --translation=translation.en --reference=ref-translation.de
```

- ---
-
- ## Suggested Models
-
- Here are some combinations of models, hparams and problems that we found
- work well, so we suggest to use them if you're interested in that problem.
-
- ### Translation
-
- For translation, esp. English-German and English-French, we suggest to use
- the Transformer model in base or big configurations, i.e.
- for `--problems=translate_ende_wmt32k` use `--model=transformer` and
- `--hparams_set=transformer_base`. When trained on 8 GPUs for 300K steps
- this should reach a BLEU score of about 28.
+ ### Installation

- ### Summarization
-
- For summarization suggest to use the Transformer model in prepend mode, i.e.
- for `--problems=summarize_cnn_dailymail32k` use `--model=transformer` and
- `--hparams_set=transformer_prepend`.
-
- ### Image Classification
-
- For image classification suggest to use the ResNet or Xception, i.e.
- for `--problems=image_imagenet` use `--model=resnet50` and
- `--hparams_set=resnet_base` or `--model=xception` and
- `--hparams_set=xception_base`.
-
-
- ## Installation

```
# Assumes tensorflow or tensorflow-gpu installed
@@ -214,9 +255,7 @@ Library usage:
python -c "from tensor2tensor.models.transformer import Transformer"
```

- ---
-
- ## Features
+ ### Features

* Many state of the art and baseline models are built-in and new models can be
added easily (open an issue or pull request!).
@@ -229,11 +268,10 @@ python -c "from tensor2tensor.models.transformer import Transformer"
specification.
* Support for multi-GPU machines and synchronous (1 master, many workers) and
asynchronous (independent workers synchronizing through a parameter server)
- [distributed training](https://github.com/tensorflow/tensor2tensor/tree/master/docs/distributed_training.md).
+ [distributed training](https://tensorflow.github.io/tensor2tensor/distributed_training.html).
* Easily swap amongst datasets and models by command-line flag with the data
generation script `t2t-datagen` and the training script `t2t-trainer`.
-
- ---
+ * Train on [Google Cloud ML](https://tensorflow.github.io/tensor2tensor/cloud_mlengine.html) and [Cloud TPUs](https://tensorflow.github.io/tensor2tensor/cloud_tpu.html).

## T2T overview
@@ -289,9 +327,7 @@ inference. Users can easily switch between problems, models, and hyperparameter
sets by using the `--model`, `--problems`, and `--hparams_set` flags. Specific
hyperparameters can be overridden with the `--hparams` flag. `--schedule` and
related flags control local and distributed training/evaluation
- ([distributed training documentation](https://github.com/tensorflow/tensor2tensor/tree/master/docs/distributed_training.md)).
-
- ---
+ ([distributed training documentation](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/g3doc/distributed_training.md)).

## Adding your own components
@@ -317,6 +353,21 @@ for an example.
Also see the [data generators
README](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/data_generators/README.md).

- ---
+ ## Papers
+
+ Tensor2Tensor was used to develop a number of state-of-the-art models
+ and deep learning methods. Here we list some papers that were based on T2T
+ from the start and benefited from its features and architecture in ways
+ described in the [Google Research Blog post introducing
+ T2T](https://research.googleblog.com/2017/06/accelerating-deep-learning-research.html).
+
+ * [Attention Is All You Need](https://arxiv.org/abs/1706.03762)
+ * [Depthwise Separable Convolutions for Neural Machine
+ Translation](https://arxiv.org/abs/1706.03059)
+ * [One Model To Learn Them All](https://arxiv.org/abs/1706.05137)
+ * [Discrete Autoencoders for Sequence Models](https://arxiv.org/abs/1801.09797)
+ * [Generating Wikipedia by Summarizing Long
+ Sequences](https://arxiv.org/abs/1801.10198)
+ * [Image Transformer](https://openreview.net/forum?id=r16Vyf-0-)

*Note: This is not an official Google product.*