PyTorchLightning/pytorch-lightning: TPU support & profiling
Creators
- William Falcon1
- Jirka Borovec2
- Nic Eggert3
- Vadim Bereznyuk
- Ir1dXD4
- Adrian Wälchli
- Jeremy Jordan
- Sebastian Præsius
- Tullie Murrell5
- Ethan Harris6
- Shreyas Bapat7
- Hendrik Schröter
- Akshay Kulkarni8
- Verena Haunschmid
- Dmitry Lipin
- Alok Singh
- Thomas J Fan
- Nicki Skafte
- Hadrien Mary9
- Cristobal Eyzaguirre10
- cinjon
- Anton Bakhtin
- Z ZH
- Yongrae Jo11
- Peter Izsak12
- Oscar A. Rangel13
- Jeffrey Ling
- Harsh Sharma
- Elliot Waite
- Ayberk Aydın
- 1. Facebook AI Research
- 2. CTU in Prague
- 3. Target
- 4. Peking University, @24OI
- 5. @facebookresearch
- 6. University of Southampton
- 7. Indian Institute of Technology Mandi
- 8. IvLabs, VNIT
- 9. McGill University
- 10. Pontificia Universidad Católica
- 11. Voithru
- 12. Intel AI
- 13. Biometrica
Description
Overview
This is the first joint release with pytorch-bearer, here we come...
This release extends the training features by adding Tensor Processing Unit (TPU) support, see docs. It brings in the flexibility of pytorch-bearer through extended support for user-defined callbacks, see docs. To make development easier, we have added a tool for profiling training runs, see docs.
We have also added automatic sampler setup: depending on DDP or TPU, lightning configures the sampler correctly (the user needs to do nothing). We have extended support for multiple loggers to be passed to Trainer as an iterable (e.g. list, tuple, etc.), see docs, and added support for step-based learning rate scheduling.
Finally, we fixed many of the reported issues, as you can see in the detailed changelog below.
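As a quick taste of the headline features, here is a minimal sketch that exercises the new TPU, profiler, and multi-logger options together. It assumes the 0.7.0-era `Trainer` arguments (`num_tpu_cores`, `profiler=True`, and an iterable of loggers); `TinyModel` is a hypothetical module written only for this example.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl
from pytorch_lightning.loggers import TensorBoardLogger


class TinyModel(pl.LightningModule):
    """Hypothetical minimal module, used only to exercise the new Trainer flags."""

    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 1)

    def forward(self, x):
        return self.layer(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = torch.nn.functional.mse_loss(self(x), y)
        return {'loss': loss}

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

    def train_dataloader(self):
        x, y = torch.randn(256, 32), torch.randn(256, 1)
        return DataLoader(TensorDataset(x, y), batch_size=32)


trainer = pl.Trainer(
    num_tpu_cores=8,  # new: train on 8 TPU cores (#868); omit on CPU/GPU machines
    profiler=True,    # new: report time spent in each training-loop phase (#782)
    logger=[TensorBoardLogger('logs/', name='tpu_run')],  # new: any iterable of loggers (#903)
)
trainer.fit(TinyModel())
```

On a machine without TPUs, drop `num_tpu_cores` and the rest of the sketch works unchanged.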
Detail changes
Added
- Added automatic sampler setup. Depending on DDP or TPU, lightning configures the sampler correctly (the user needs to do nothing) (#926)
- Added `reload_dataloaders_every_epoch=False` flag for trainer. Some users require reloading data every epoch (#926)
- Added `progress_bar_refresh_rate=50` flag for trainer to throttle the refresh rate in notebooks (#926)
- Updated governance docs
- Added a check to ensure that the metric used for early stopping exists before training commences (#542)
- Added `optimizer_idx` argument to the `backward` hook (#733)
- Added `entity` argument to `WandbLogger` to be passed to `wandb.init` (#783)
- Added a tool for profiling training runs (#782)
- Improved flexibility for naming of TensorBoard logs: `version` can now be set to a `str` to save directly to that directory, and `name=''` prevents the experiment-name directory (#804)
- Added option to specify a `step` key when logging metrics (#808)
- Added `train_dataloader`, `val_dataloader` and `test_dataloader` arguments to `Trainer.fit()` for alternative data parsing (#759)
- Added Tensor Processing Unit (TPU) support (#868)
- Added semantic segmentation example (#751, #876, #881)
- Split callbacks into multiple files (#849)
- Added support for user-defined callbacks (#889, #950); see the sketch after this list
- Added support for multiple loggers to be passed to `Trainer` as an iterable (e.g. list, tuple, etc.) (#903)
- Added support for step-based learning rate scheduling (#941); see the sketch after this list
- Added support for logging hparams as a dict (#1029)
- Checkpoint and early stopping now work without a validation step (#1041)
- Added graceful training cleanup after a keyboard interrupt (#856, #1019)
- Added type hints for function arguments (#912)
- Added a default `argparser` for `Trainer` (#952, #1023)
- Added TPU gradient clipping (#963)
- Added max/min number of steps in `Trainer` (#728)
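Two of the additions above benefit from a sketch: user-defined callbacks (#889, #950) and step-based learning rate scheduling (#941). The snippet below assumes the callback hook signature `(trainer, pl_module)` and the scheduler-dict format with an `'interval'` key used around this release; `PrintingCallback` and `MyModel` are hypothetical names.

```python
import torch
import pytorch_lightning as pl
from pytorch_lightning.callbacks import Callback


class PrintingCallback(Callback):
    """A user-defined callback: react to trainer events without touching the model."""

    def on_train_start(self, trainer, pl_module):
        print('training has started')

    def on_train_end(self, trainer, pl_module):
        print('training is done')


class MyModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 1)

    def configure_optimizers(self):
        optimizer = torch.optim.SGD(self.parameters(), lr=0.1)
        scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1000)
        # 'interval': 'step' steps the scheduler after every optimizer step
        # instead of once per epoch (#941)
        return [optimizer], [{'scheduler': scheduler, 'interval': 'step'}]


trainer = pl.Trainer(callbacks=[PrintingCallback()])
```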
Changed
- Changed default TQDM to use `tqdm.auto` for prettier outputs in IPython notebooks (#752)
- Changed `pytorch_lightning.logging` to `pytorch_lightning.loggers` (#767)
- Moved the default `tqdm_dict` definition from `Trainer` to `LightningModule`, so it can be overridden by the user (#749)
- Moved functionality of `LightningModule.load_from_metrics` into `LightningModule.load_from_checkpoint` (#995)
- Changed checkpoint path parameter from `filepath` to `dirpath` (#1016)
- Froze the model's `hparams` as a `Namespace` property (#1029)
- Dropped `logging` config in package init (#1015)
- Renamed model steps (#1051); see the sketch after this list:
  - `training_end` >> `training_epoch_end`
  - `validation_end` >> `validation_epoch_end`
  - `test_end` >> `test_epoch_end`
- Refactored dataloading; now supports infinite dataloaders (#955)
- Create a single file in `TensorBoardLogger` (#777)
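The renamed hooks keep their old role of aggregating per-step outputs; only the names change. A minimal sketch of the new spelling, assuming each `validation_step` returns a dict with a `val_loss` entry:

```python
import torch
import pytorch_lightning as pl


class MyModel(pl.LightningModule):
    # Before 0.7.0 this hook was named `validation_end`; the logic is unchanged.
    def validation_epoch_end(self, outputs):
        # `outputs` collects the dicts returned by each validation_step call
        avg_loss = torch.stack([x['val_loss'] for x in outputs]).mean()
        return {'val_loss': avg_loss}
```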
Deprecated
- Deprecated `pytorch_lightning.logging` (#767)
- Deprecated `LightningModule.load_from_metrics` in favour of `LightningModule.load_from_checkpoint` (#995, #1079); see the sketch after this list
- Deprecated the `@data_loader` decorator (#926)
- Deprecated the model steps `training_end`, `validation_end` and `test_end` (#1051, #1056)
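Migrating off `load_from_metrics` is typically a one-line change. A hedged sketch, assuming the model's hparams were saved into the checkpoint; `MyModel` and the file paths are illustrative:

```python
# Before (deprecated): weights plus a separate tags CSV
# model = MyModel.load_from_metrics(weights_path='example.ckpt', tags_csv='meta_tags.csv')

# From 0.7.0: the checkpoint alone is enough
model = MyModel.load_from_checkpoint('example.ckpt')
```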
Removed
- Removed dependency on `pandas` (#736)
- Removed dependency on `torchvision` (#797)
- Removed dependency on `scikit-learn` (#801)
Fixed
- Fixed a bug where early stopping `on_end_epoch` would be called inconsistently when `check_val_every_n_epoch == 0` (#743)
- Fixed a bug where the model checkpointer didn't write to the same directory as the logger (#771)
- Fixed a bug where the `TensorBoardLogger` class would create an additional empty log file during fitting (#777)
- Fixed a bug where `global_step` was advanced incorrectly when using `accumulate_grad_batches > 1` (#832); see the sketch after this list
- Fixed a bug when calling `self.logger.experiment` with multiple loggers (#1009)
- Fixed a bug when calling `logger.append_tags` on a `NeptuneLogger` with a single tag (#1009)
- Fixed sending back data from `.spawn` by saving and loading the trained model in/out of the process (#1017)
- Fixed port collision on DDP (#1010)
- Fixed/tested pass overrides (#918)
- Fixed comet logger to log after train (#892)
- Removed deprecated args to the learning rate step function (#890)
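To illustrate the `global_step` fix (#832): with gradient accumulation enabled, the step counter now advances once per optimizer step rather than once per batch. A minimal sketch of the relevant flag:

```python
import pytorch_lightning as pl

# Gradients from 4 consecutive batches are accumulated before each
# optimizer step; after #832, `global_step` (and any step-keyed logging)
# advances once per optimizer step, not once per batch.
trainer = pl.Trainer(accumulate_grad_batches=4)
```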
@jeremyjordan, @yukw777, @bobkemp, @AljoSt, @Borda, @neggert, @calclavia, @awaelchli, @airglow, @akshaykvnit, @AntixK, @xeTaiz, @djbyrne, @Calysto, @ethanwharris, @theevann, @fdelrio89, @onkyo14taro, @hadim, @hanbyul-kim, @kuynzereb, @luiscape, @MattPainter01, @peteriz, @shoarora, @SkafteNicki, @smallzzy, @srush, @baeseongsu, @tullie, @williamFalcon, @xssChauhan
If we forgot someone because their commit email did not match their GitHub account, let us know :]
Files (10.1 MB)
Name | Size
---|---
PyTorchLightning/pytorch-lightning-0.7.0.zip (md5:ad97e64a9ee71f9ef76e24fbe1a7f023) | 10.1 MB
Additional details
Related works
- Is supplement to
- https://github.com/PyTorchLightning/pytorch-lightning/tree/0.7.0 (URL)