v2.7.0
What's new
Added 🎉
- Added support to evaluate multiple datasets and produce corresponding output files in the `evaluate` command.
- Moved the PyTorch learning rate scheduler wrappers to their own file, `pytorch_lr_schedulers.py`, so that they will have their own documentation page.
- Added a module `allennlp.nn.parallel` with a new base class, `DdpAccelerator`, which generalizes PyTorch's `DistributedDataParallel` wrapper to support other implementations. Two implementations of this class are provided. The default is `TorchDdpAccelerator` (registered as "torch"), which is just a thin wrapper around `DistributedDataParallel`. The other is `FairScaleFsdpAccelerator`, which wraps FairScale's `FullyShardedDataParallel`. You can specify the `DdpAccelerator` in the "distributed" section of a configuration file under the key "ddp_accelerator" (see the first sketch after this list).
- Added a module `allennlp.nn.checkpoint` with a new base class, `CheckpointWrapper`, for implementations of activation/gradient checkpointing. Two implementations are provided. The default implementation is `TorchCheckpointWrapper` (registered as "torch"), which exposes PyTorch's checkpoint functionality (see the second sketch after this list). The other is `FairScaleCheckpointWrapper`, which exposes the more flexible checkpointing functionality from FairScale.
- The `Model` base class now takes a `ddp_accelerator` parameter (an instance of `DdpAccelerator`), which will be available as `self.ddp_accelerator` during distributed training. This is useful when, for example, instantiating submodules in your model's `__init__()` method by wrapping them with `self.ddp_accelerator.wrap_module()`. See `allennlp.modules.transformer.t5` for an example, and the third sketch after this list.
- Added `ScaledDotProductMatrixAttention`, and converted the transformer toolkit to use it.
- Added tests to ensure that all `Attention` and `MatrixAttention` implementations are interchangeable.
- Added a `from_pretrained_transformer_and_instances` constructor to `Vocabulary`.
- `TransformerTextField` now supports `__len__`.
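To make the "ddp_accelerator" key concrete, here is a minimal sketch of the relevant part of a training config, written as a Python dict. The device list is illustrative, and "fairscale_fsdp" is assumed (not confirmed by these notes) to be the registered name of `FairScaleFsdpAccelerator`:

```python
# Sketch of the "distributed" section of an AllenNLP config, as a Python
# dict. "torch" (the default, TorchDdpAccelerator) could be used instead of
# "fairscale_fsdp", which is assumed here to be the registered name of
# FairScaleFsdpAccelerator.
config = {
    "distributed": {
        "cuda_devices": [0, 1],  # illustrative
        "ddp_accelerator": {
            "type": "fairscale_fsdp",
        },
    },
    # ... rest of the config (dataset_reader, model, trainer, etc.)
}
```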
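For background on the checkpointing bullet, here is a small sketch of the underlying PyTorch functionality that `TorchCheckpointWrapper` exposes. It calls `torch.utils.checkpoint` directly rather than going through the AllenNLP wrapper API, and the module and sizes are illustrative:

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class CheckpointedBlock(nn.Module):
    """Illustrative block whose inner activations are recomputed on backward."""

    def __init__(self) -> None:
        super().__init__()
        self.inner = nn.Sequential(nn.Linear(16, 16), nn.ReLU(), nn.Linear(16, 16))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Activations inside `self.inner` are not kept for the backward pass;
        # they are recomputed, trading compute for memory.
        return checkpoint(self.inner, x)


block = CheckpointedBlock()
out = block(torch.randn(4, 16, requires_grad=True))
out.sum().backward()
```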
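And a minimal sketch of the `self.ddp_accelerator.wrap_module()` pattern from the `Model` bullet. The registered name, submodule, and sizes are illustrative, and we assume `wrap_module()` returns the wrapped module as described above:

```python
import torch.nn as nn
from allennlp.data import Vocabulary
from allennlp.models import Model


@Model.register("my_wrapped_model")  # hypothetical name, for illustration
class MyWrappedModel(Model):
    def __init__(self, vocab: Vocabulary, **kwargs) -> None:
        super().__init__(vocab, **kwargs)
        feedforward = nn.Linear(16, 16)  # illustrative submodule
        if self.ddp_accelerator is not None:
            # Let the configured DdpAccelerator (e.g. FairScale FSDP) shard
            # or synchronize this submodule during distributed training.
            feedforward = self.ddp_accelerator.wrap_module(feedforward)
        self.feedforward = feedforward
```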
Fixed ✅
- Fixed a bug in `ConditionalRandomField`: `transitions` and `tag_sequence` tensors were not initialized on the desired device, causing high CPU usage (see issue #2884, "Why CRF lead a high cost on CPU?").
- Fixed a misspelled parameter name: `contructor_extras` in `Lazy()` is now correctly called `constructor_extras`.
- Fixed broken links in the `allennlp.nn.initializers` docs.
- Fixed a bug in `BeamSearch` where `last_backpointers` was not being passed to any `Constraint`s.
- `TransformerTextField` can now take tensors of shape `(1, n)`, like the tensors produced from a HuggingFace tokenizer.
- The `tqdm` lock is now set inside `MultiProcessDataLoading` when new workers are spawned, to avoid contention when writing output.
- `ConfigurationError` is now pickleable.
- Multitask models now support `TextFieldTensor` in heads, not just in the backbone.
- Fixed the signature of `ScaledDotProductAttention` to match the other `Attention` classes.
- `allennlp` commands will now catch `SIGTERM` signals and handle them similarly to `SIGINT` (keyboard interrupt); see the sketch after this list.
- The `MultiProcessDataLoader` will properly shut down its workers when a `SIGTERM` is received.
- Fixed the way names are applied to Tango `Step` instances.
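To illustrate the general technique behind the two `SIGTERM` entries above (a generic sketch, not AllenNLP's exact implementation): re-raising `SIGTERM` as a `KeyboardInterrupt` lets the existing `SIGINT` cleanup paths handle both signals.

```python
import signal


def _sigterm_handler(signum, frame):
    # Treat SIGTERM like SIGINT: raising KeyboardInterrupt reuses the same
    # cleanup paths (e.g. shutting down data-loader workers).
    raise KeyboardInterrupt


signal.signal(signal.SIGTERM, _sigterm_handler)
```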
Changed ⚠️
- The `grad_norm` parameter of `GradientDescentTrainer` is now a `Union[float, bool]`, with a default value of `False`. `False` means gradients are not rescaled and the gradient norm is never even calculated. `True` means the gradients are still not rescaled but the gradient norm is calculated and passed on to callbacks. A `float` value means gradients are rescaled (see the sketch after this list).
- `TensorCache` now supports more concurrent readers and writers.
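To make the three `grad_norm` settings concrete, here is a sketch of the "trainer" section of a config as a Python dict; only the `grad_norm` semantics come from the note above, and the other keys are illustrative:

```python
# Sketch of the "trainer" section of a config, as a Python dict.
trainer_config = {
    "trainer": {
        "type": "gradient_descent",  # illustrative
        "grad_norm": 5.0,  # float: gradients are rescaled
        # "grad_norm": True   -> no rescaling; norm computed for callbacks
        # "grad_norm": False  -> default; no rescaling, norm never computed
    },
}
```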
Commits
48af9d3 Multiple datasets and output files support for the evaluate command (#5340)
60213cd Tiny tango tweaks (#5383)
2895021 improve signal handling and worker cleanup (#5378)
b41cb3e Fix distributed loss (#5381)
6355f07 Fix Checkpointer cleaner regex on Windows (#5361)
27da04c Dataset remix (#5372)
75af38e Create Vocabulary from both pretrained transformers and instances (#5368)
5dc80a6 Adds a dataset that can be read and written lazily (#5344)
01e8a35 Improved Documentation For Learning Rate Schedulers (#5365)
8370cfa skip loading t5-base in CI (#5371)
13de38d Log batch metrics (#5362)
1f5c6e5 Use our own base images to build allennlp Docker images (#5366)
bffdbfd Bugfix: initializing all tensors and parameters of the ConditionalRandomField model on the proper device (#5335)
d45a2da Make sure that all attention works the same (#5360)
c1edaef Update google-cloud-storage requirement (#5357)
524244b Update wandb requirement from <0.12.0,>=0.10.0 to >=0.10.0,<0.13.0 (#5356)
90bf33b small fixes for tango (#5350)
2e11a15 tick version for nightly releases
311f110 Tango (#5162)
1df2e51 Bump fairscale from 0.3.8 to 0.3.9 (#5337)
b72bbfc fix constraint bug in beam search, clean up tests (#5328)
ec3e294 Create CITATION.cff (#5336)
8714aa0 This is a desperate attempt to make TensorCache a little more stable (#5334)
fd429b2 Update transformers requirement from <4.9,>=4.1 to >=4.1,<4.10 (#5326)
1b5ef3a Update spacy requirement from <3.1,>=2.1.0 to >=2.1.0,<3.2 (#5305)
1f20513 TextFieldTensor in multitask models (#5331)
76f2487 set tqdm lock when new workers are spawned (#5330)
67add9d Fix ConfigurationError deserialization (#5319)
42d8529 allow TransformerTextField to take input directly from HF tokenizer (#5329)
64043ac Bump black from 21.6b0 to 21.7b0 (#5320)
3275055 Update mkdocs-material requirement from <7.2.0,>=5.5.0 to >=5.5.0,<7.3.0 (#5327)
5b1da90 Update links in initializers documentation (#5317)
ca656fc FairScale integration (#5242)
This discussion was created from the release v2.7.0.