This repository was archived by the owner on Jul 7, 2023. It is now read-only.

Commit a1d7ed7

Merge pull request #506 from rsepassi/push
v1.4.2
2 parents d9cba5c + c7f24da commit a1d7ed7

92 files changed: +11664 -1966 lines changed


.travis.yml

+3 -2
@@ -14,10 +14,11 @@ env:
  - T2T_DATA_DIR=/tmp/t2t-data
  - T2T_TRAIN_DIR=/tmp/t2t-train
  script:
- - pytest --ignore=tensor2tensor/utils/registry_test.py --ignore=tensor2tensor/problems_test.py --ignore=tensor2tensor/tpu/tpu_trainer_lib_test.py --ignore=tensor2tensor/data_generators/algorithmic_math_test.py
+ - pytest --ignore=tensor2tensor/utils/registry_test.py --ignore=tensor2tensor/problems_test.py --ignore=tensor2tensor/utils/trainer_lib_test.py --ignore=tensor2tensor/data_generators/algorithmic_math_test.py
  - pytest tensor2tensor/utils/registry_test.py
- - pytest tensor2tensor/tpu/tpu_trainer_lib_test.py
+ - pytest tensor2tensor/utils/trainer_lib_test.py
  - t2t-datagen 2>&1 | grep translate && echo passed
+ - t2t-trainer --registry_help --t2t_usr_dir=./tensor2tensor/test_data/example_usr_dir 2>&1 | grep my_very_own_hparams && echo passed
  - python -c "from tensor2tensor.models import transformer; print(transformer.Transformer.__name__)"
  - t2t-trainer --registry_help
  - mkdir $T2T_DATA_DIR

README.md

+2 -30
@@ -296,36 +296,8 @@ specifying the `--t2t_usr_dir` flag in `t2t-trainer`.
  You can do so for models, hyperparameter sets, modalities, and problems. Please
  do submit a pull request if your component might be useful to others.

- Here's an example with a new hyperparameter set:
-
- ```python
- # In ~/usr/t2t_usr/my_registrations.py
-
- from tensor2tensor.models import transformer
- from tensor2tensor.utils import registry
-
- @registry.register_hparams
- def transformer_my_very_own_hparams_set():
-   hparams = transformer.transformer_base()
-   hparams.hidden_size = 1024
-   ...
- ```
-
- ```python
- # In ~/usr/t2t_usr/__init__.py
- from . import my_registrations
- ```
-
- ```
- t2t-trainer --t2t_usr_dir=~/usr/t2t_usr --registry_help
- ```
-
- You'll see under the registered HParams your
- `transformer_my_very_own_hparams_set`, which you can directly use on the command
- line with the `--hparams_set` flag.
-
- `t2t-datagen` also supports the `--t2t_usr_dir` flag for `Problem`
- registrations.
+ See the [`example_usr_dir`](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/test_data/example_usr_dir)
+ for an example user directory.

  ## Adding a dataset

docs/cloud_tpu.md

+8 -7
@@ -5,8 +5,10 @@ for ML training.

  Models and hparams that are known to work on TPU:
  * `transformer` with `transformer_tpu`
- * `xception` with `xception_base`
+ * `transformer_encoder` with `transformer_tpu`
+ * `transformer_decoder` with `transformer_tpu`
  * `resnet50` with `resnet_base`
+ * `revnet104` with `revnet_base`

  To run on TPUs, you need to be part of the alpha program; if you're not, these
  commands won't work for you currently, but access will expand soon, so get
@@ -34,16 +36,15 @@ gcloud compute instances create $USER-vm \
  Launch the TPU instance; the Python program will connect to this to train on the
  TPU device.
  ```
+ gcloud alpha compute tpus list
+ # Make an IP with structure 10.240.X.2 that’s unique in the list
  TPU_IP=10.240.0.2
  gcloud alpha compute tpus create \
  $USER-tpu \
  --range=${TPU_IP/%2/0}/29 \
  --version=nightly
  ```

- To see all TPU instances running: `gcloud alpha compute tpus list`. The
- `TPU_IP` should be unique amongst the list and follow the format `10.240.i.2`.
-
  SSH in with port forwarding for TensorBoard
  ```
  gcloud compute ssh $USER-vm -- -L 6006:localhost:6006
@@ -52,7 +53,7 @@ gcloud compute ssh $USER-vm -- -L 6006:localhost:6006
  Now that you're on the cloud instance, install T2T:
  ```
  pip install tensor2tensor --user
- # If your python bin dir isn't already in your path
+ # Add the python bin dir to your path
  export PATH=$HOME/.local/bin:$PATH
  ```

@@ -67,9 +68,9 @@ t2t-datagen --problem=translate_ende_wmt8k --data_dir=$DATA_DIR
  Setup some vars used below. `TPU_IP` and `DATA_DIR` should be the same as what
  was used above. Note that the `DATA_DIR` and `OUT_DIR` must be GCS buckets.
  ```
- TPU_IP=<IP of TPU machine>
+ TPU_IP=10.240.0.2
  DATA_DIR=$GCS_BUCKET/t2t/data/
- OUT_DIR=$GCS_BUCKET/t2t/training/
+ OUT_DIR=$GCS_BUCKET/t2t/training/transformer_ende_1
  TPU_MASTER=grpc://$TPU_IP:8470
  ```
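Reviewer note: the `--range=${TPU_IP/%2/0}/29` argument in this file uses a shell substitution that replaces a trailing `2` in `TPU_IP` with `0` and then appends `/29`. The sketch below mirrors that in Python and checks that the chosen IP falls inside the resulting block. `tpu_range` is a hypothetical helper written for this note and is not part of tensor2tensor or gcloud.

```python
import ipaddress


def tpu_range(tpu_ip):
    """Mimic ${TPU_IP/%2/0}/29: swap a trailing "2" for "0", then add "/29"."""
    base = tpu_ip[:-1] + "0" if tpu_ip.endswith("2") else tpu_ip
    return base + "/29"


cidr = tpu_range("10.240.0.2")
network = ipaddress.ip_network(cidr)
print(cidr)                                           # 10.240.0.0/29
print(ipaddress.ip_address("10.240.0.2") in network)  # True
```

A `/29` block holds eight addresses, so `10.240.X.0/29` covers `10.240.X.0` through `10.240.X.7`, which includes the `10.240.X.2` address the docs ask for.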

docs/new_problem.md

+13 -6
@@ -264,16 +264,22 @@ t2t-datagen \
  ```

  Where:
- * `PROBLEM` is the name of the class that was registered with `@registry.register_problem()`, but converted from `CamelCase` to `snake_case`.
- * `PATH_TO_YOUR_PROBLEM_DIR` is a path to the directory of your python problem file.
+ * `PROBLEM` is the name of the class that was registered with
+ `@registry.register_problem()`, but converted from `CamelCase` to
+ `snake_case`.
+ * `PATH_TO_YOUR_PROBLEM_DIR` is a path to the directory of your python problem
+ file.

- If you plan to contribute to the tensor2tensor repository, you can install the local cloned version in developer mode with `pip install -e .` from the tensor2tensor directory. You can also add your new problem file to [`all_problems.py`](https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/data_generators/all_problems.py).
+ If you plan to contribute to the tensor2tensor repository, you can install the
+ local cloned version in developer mode with `pip install -e .` from the
+ tensor2tensor directory. You can also add your new problem file to
+ [`all_problems.py`](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/data_generators/all_problems.py).

  # Run the problem
- Now that we've gotten our problem set up, let's train a model and generate definitions.
+ Now that we've gotten our problem set up, let's train a model and generate
+ definitions.

  To train, specify the problem name, the model, and hparams:
-
  ```bash
  PROBLEM=word2def
  MODEL=transformer
@@ -282,6 +288,7 @@ HPARAMS=word2def_hparams

  The rest of the steps are as given in the [walkthrough](walkthrough.md).

- What if we wanted to train a model to generate words given definitions? In T2T, we can change the problem name to be `PROBLEM=word2def_rev`.
+ What if we wanted to train a model to generate words given definitions? In T2T,
+ we can change the problem name to be `PROBLEM=word2def_rev`.

  All done. Let us know what definitions your model generated.
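
Reviewer note: the `PROBLEM` bullet in this file says the registered class name is converted from `CamelCase` to `snake_case`. A common way to do that conversion is sketched below. `camel_to_snake` is a hypothetical helper written for this note, and it may not match the exact rules tensor2tensor's registry applies.

```python
import re


def camel_to_snake(name):
    """Convert a CamelCase identifier to snake_case."""
    # Split before a capitalized word that follows any character.
    partial = re.sub(r"(.)([A-Z][a-z]+)", r"\1_\2", name)
    # Split between a lowercase letter or digit and a following capital.
    return re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", partial).lower()


print(camel_to_snake("Word2def"))            # word2def
print(camel_to_snake("TranslateEndeWmt8k"))  # translate_ende_wmt8k
```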

docs/overview.md

+3 -3
@@ -14,7 +14,7 @@ to training, evaluation, and decoding.

  Some key files and their functions:

- * [`tpu_trainer.py`](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/tpu/tpu_trainer.py) and [`tpu_trainer_lib.py`](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/tpu/tpu_trainer_lib.py):
+ * [`t2t_trainer.py`](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/bin/t2t_trainer.py) and [`trainer_lib.py`](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/utils/trainer_lib.py):
  Main entrypoint for training and evaluation. Constructs and runs all the
  main components of the system (the `Problem`, the `HParams`, the
  `Estimator`, the `Experiment`, the `input_fn`s and `model_fn`).
@@ -134,7 +134,7 @@ The default implementations of `bottom`, `top`, and `loss` depend on the

  The actual training loop and related services (checkpointing, summaries,
  continuous evaluation, etc.) are all handled by `Estimator` and `Experiment`
- objects. `tpu_trainer.py` is the main entrypoint and uses `tpu_trainer_lib.py`
+ objects. `t2t_trainer.py` is the main entrypoint and uses `trainer_lib.py`
  to construct the various components.

  ## Decoding
@@ -144,7 +144,7 @@ to construct the various components.

  ## System Overview for Train/Eval

- See `tpu_trainer.py`.
+ See `t2t_trainer.py` and `trainer_lib.py`.

  * Create HParams
  * Create `RunConfig`, including `Parallelism` object (i.e. `data_parallelism`)

docs/walkthrough.md

+2 -30
@@ -296,36 +296,8 @@ specifying the `--t2t_usr_dir` flag in `t2t-trainer`.
  You can do so for models, hyperparameter sets, modalities, and problems. Please
  do submit a pull request if your component might be useful to others.

- Here's an example with a new hyperparameter set:
-
- ```python
- # In ~/usr/t2t_usr/my_registrations.py
-
- from tensor2tensor.models import transformer
- from tensor2tensor.utils import registry
-
- @registry.register_hparams
- def transformer_my_very_own_hparams_set():
-   hparams = transformer.transformer_base()
-   hparams.hidden_size = 1024
-   ...
- ```
-
- ```python
- # In ~/usr/t2t_usr/__init__.py
- from . import my_registrations
- ```
-
- ```
- t2t-trainer --t2t_usr_dir=~/usr/t2t_usr --registry_help
- ```
-
- You'll see under the registered HParams your
- `transformer_my_very_own_hparams_set`, which you can directly use on the command
- line with the `--hparams_set` flag.
-
- `t2t-datagen` also supports the `--t2t_usr_dir` flag for `Problem`
- registrations.
+ See the [`example_usr_dir`](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/test_data/example_usr_dir)
+ for an example user directory.

  ## Adding a dataset

setup.py

+12 -3
@@ -5,7 +5,7 @@

  setup(
  name='tensor2tensor',
- version='1.4.1',
+ version='1.4.2',
  description='Tensor2Tensor',
  author='Google Inc.',
  author_email='[email protected]',
@@ -23,10 +23,19 @@
  'tensor2tensor/bin/t2t-datagen',
  'tensor2tensor/bin/t2t-decoder',
  'tensor2tensor/bin/t2t-make-tf-configs',
+ 'tensor2tensor/bin/t2t-exporter',
+ 'tensor2tensor/bin/t2t-query-server',
+ 'tensor2tensor/bin/t2t-insights-server',
+ 'tensor2tensor/bin/t2t-avg-all',
+ 'tensor2tensor/bin/t2t-bleu',
+ 'tensor2tensor/bin/t2t-translate-all',
  ],
  install_requires=[
  'bz2file',
+ 'flask',
  'future',
+ 'gevent',
+ 'gunicorn',
  'gym',
  'numpy',
  'requests',
@@ -35,8 +44,8 @@
  'six',
  ],
  extras_require={
- 'tensorflow': ['tensorflow>=1.4.0'],
- 'tensorflow_gpu': ['tensorflow-gpu>=1.4.0'],
+ 'tensorflow': ['tensorflow>=1.4.1'],
+ 'tensorflow_gpu': ['tensorflow-gpu>=1.4.1'],
  'tests': ['pytest', 'h5py', 'mock'],
  },
  classifiers=[

tensor2tensor/bin/t2t-avg-all

+4 -94
@@ -1,105 +1,15 @@
  #!/usr/bin/env python
- # coding=utf-8
- # Copyright 2017 The Tensor2Tensor Authors.
- #
- # Licensed under the Apache License, Version 2.0 (the "License");
- # you may not use this file except in compliance with the License.
- # You may obtain a copy of the License at
- #
- # http://www.apache.org/licenses/LICENSE-2.0
- #
- # Unless required by applicable law or agreed to in writing, software
- # distributed under the License is distributed on an "AS IS" BASIS,
- # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- # See the License for the specific language governing permissions and
- # limitations under the License.
-
- """Script to continously average last N checkpoints in a given directory."""
+ """t2t-avg-all."""
  from __future__ import absolute_import
  from __future__ import division
  from __future__ import print_function

- import os
- import logging
-
- # Dependency imports
+ from tensor2tensor.bin import t2t_avg_all

- import numpy as np
- import six
- from six.moves import zip # pylint: disable=redefined-builtin
- from collections import deque
- import shutil
  import tensorflow as tf
- from tensor2tensor.utils import bleu_hook
-
- flags = tf.flags
- FLAGS = flags.FLAGS
-
- flags.DEFINE_string("model_dir", "", "Directory to load model checkpoints from.")
- flags.DEFINE_string("output_dir", "avg/", "Directory to output the averaged checkpoints to.")
- flags.DEFINE_integer("n", 8, "How many checkpoints should be averaged?")
- flags.DEFINE_integer("min_steps", 0, "Ignore checkpoints with less steps.")
- flags.DEFINE_integer("wait_minutes", 0, "Wait upto N minutes for a new checkpoint.")
-
-
- def main(_):
-   tf.logging._handler.setFormatter(logging.Formatter("%(asctime)s:" + logging.BASIC_FORMAT, None))
-   tf.logging.set_verbosity(tf.logging.INFO)
-
-   model_dir = os.path.expanduser(FLAGS.model_dir)
-   output_dir = os.path.expanduser(FLAGS.output_dir)
-   out_base_file = os.path.join(output_dir, 'model.ckpt')
-
-   # Copy flags.txt with the original time, so t2t-bleu can report correct relative time.
-   os.makedirs(FLAGS.output_dir, exist_ok=True)
-   if not os.path.exists(os.path.join(output_dir, 'flags.txt')):
-     shutil.copy2(os.path.join(model_dir, 'flags.txt'), os.path.join(output_dir, 'flags.txt'))
-
-   models_processed = 0
-   queue = deque()
-   for model in bleu_hook.stepfiles_iterator(model_dir, FLAGS.wait_minutes, FLAGS.min_steps):
-     if models_processed == 0:
-       var_list = tf.contrib.framework.list_variables(model.filename)
-       avg_values = {}
-       for (name, shape) in var_list:
-         if not name.startswith("global_step"):
-           avg_values[name] = np.zeros(shape)
-     models_processed += 1
-
-     tf.logging.info("Loading [%d]: %s" % (models_processed, model.filename))
-     reader = tf.contrib.framework.load_checkpoint(model.filename)
-     for name in avg_values:
-       avg_values[name] += reader.get_tensor(name) / FLAGS.n
-     queue.append(model)
-     if len(queue) < FLAGS.n:
-       continue
-
-     out_file = "%s-%d" % (out_base_file, model.steps)
-     tf_vars = []
-     tf.logging.info("Averaging %s" % (out_file))
-     for (name, value) in six.iteritems(avg_values):
-       tf_vars.append(tf.get_variable(name, shape=value.shape)) # TODO , dtype=var_dtypes[name]
-     placeholders = [tf.placeholder(v.dtype, shape=v.shape) for v in tf_vars]
-     assign_ops = [tf.assign(v, p) for (v, p) in zip(tf_vars, placeholders)]
-
-     global_step = tf.Variable(model.steps, name="global_step", trainable=False, dtype=tf.int64)
-     saver = tf.train.Saver(tf.global_variables())
-
-     tf.logging.info("Running session for %s" % (out_file))
-     with tf.Session() as sess:
-       sess.run(tf.global_variables_initializer())
-       for p, assign_op, (name, value) in zip(placeholders, assign_ops, six.iteritems(avg_values)):
-         sess.run(assign_op, {p: value})
-       tf.logging.info("Storing to %s" % out_file)
-       saver.save(sess, out_base_file, global_step=global_step)
-     os.utime(out_file + '.index', (model.mtime, model.mtime))
-
-     tf.reset_default_graph()
-     first_model = queue.popleft()

-     reader = tf.contrib.framework.load_checkpoint(first_model.filename)
-     for name in avg_values:
-       avg_values[name] -= reader.get_tensor(name) / FLAGS.n
+ def main(argv):
+   t2t_avg_all.main(argv)


  if __name__ == "__main__":
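
Reviewer note: the script body removed above keeps a running average over the last `n` checkpoints by adding `value / n` when a checkpoint enters the window and subtracting `value / n` when the oldest one leaves. The sketch below shows that bookkeeping on plain floats, assuming each checkpoint is a dict of variable name to value. `sliding_averages` is a hypothetical helper written for this note, with no TensorFlow or checkpoint I/O involved.

```python
from collections import deque


def sliding_averages(checkpoints, n):
    """Yield the mean of the last `n` checkpoints once the window is full."""
    window = deque()
    running = {}
    for ckpt in checkpoints:
        # Add the newest checkpoint's contribution to the running mean.
        for name, value in ckpt.items():
            running[name] = running.get(name, 0.0) + value / n
        window.append(ckpt)
        if len(window) < n:
            continue
        yield dict(running)
        # Drop the oldest checkpoint's contribution before the next step.
        oldest = window.popleft()
        for name, value in oldest.items():
            running[name] -= value / n


ckpts = [{"w": 1.0}, {"w": 3.0}, {"w": 5.0}]
print(list(sliding_averages(ckpts, n=2)))  # [{'w': 2.0}, {'w': 4.0}]
```

Updating the sum incrementally means each new checkpoint costs one addition and one subtraction per variable, rather than re-reading all `n` checkpoints in the window.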
