
Published March 10, 2020 | Version 0.7.0

PyTorchLightning/pytorch-lightning: TPU support & profiling

  • 1. Facebook AI Research
  • 2. CTU in Prague
  • 3. Target
  • 4. Peking University, @24OI
  • 5. @facebookresearch
  • 6. University of Southampton
  • 7. Indian Institute of Technology Mandi
  • 8. IvLabs, VNIT
  • 9. McGill University
  • 10. Pontificia Universidad Católica
  • 11. Voithru
  • 12. Intel AI
  • 13. Biometrica

Description

Overview

This is the first joint release with pytorch-bearer; here we come...

This release extends the training features by adding Tensor Processing Unit (TPU) support, see docs. It brings in the flexibility of pytorch-bearer through extended support for user-defined callbacks, see docs. To make development easier, we have added a profiling tool for training runs, see docs. We have also added automatic sampler setup: depending on whether DDP or TPU is used, Lightning configures the sampler correctly (the user needs to do nothing). In addition, we have extended support for multiple loggers, which can be passed to the Trainer as an iterable (e.g. list, tuple, etc.), see docs, and added support for step-based learning rate scheduling. A minimal sketch of how these headline features fit together is shown below.
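
The sketch is illustrative only: MyLightningModule and MyCallback are placeholders, and argument names such as num_tpu_cores, profiler and callbacks follow the 0.7.0-era API as we understand it, so please check them against the linked docs.

```python
import pytorch_lightning as pl
from pytorch_lightning.loggers import TensorBoardLogger, WandbLogger

model = MyLightningModule()  # placeholder: your own LightningModule subclass

trainer = pl.Trainer(
    num_tpu_cores=8,                    # TPU support (argument name assumed; see docs)
    profiler=True,                      # profile the individual steps of a training run
    logger=[TensorBoardLogger("logs"),  # multiple loggers passed as an iterable
            WandbLogger()],
    callbacks=[MyCallback()],           # placeholder user-defined callback (sketch further below)
)
# The DDP/TPU sampler is configured automatically; no user action is needed.
trainer.fit(model)
```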

Finally, we fixed many reported issues, as you can see in the detailed changelog below.

Detailed changes

Added
  • Added automatic sampler setup. Depending on DDP or TPU, lightning configures the sampler correctly (user needs to do nothing) (#926)
  • Added reload_dataloaders_every_epoch=False flag for Trainer, for users who need to reload data every epoch (#926); see the Trainer-flag sketch after this list
  • Added progress_bar_refresh_rate=50 flag for Trainer to throttle the progress-bar refresh rate in notebooks (#926)
  • Updated governance docs
  • Added a check to ensure that the metric used for early stopping exists before training commences (#542)
  • Added optimizer_idx argument to backward hook (#733)
  • Added entity argument to WandbLogger to be passed to wandb.init (#783)
  • Added a tool for profiling training runs (#782)
  • Improved flexibility for naming TensorBoard logs: version can now be set to a str to save directly to that directory, and name='' prevents the experiment-name directory (#804); see the logger sketch after this list
  • Added option to specify step key when logging metrics (#808)
  • Added train_dataloader, val_dataloader and test_dataloader arguments to Trainer.fit(), for alternative data parsing (#759); see the fit() sketch after this list
  • Added Tensor Processing Unit (TPU) support (#868)
  • Added semantic segmentation example (#751, #876, #881)
  • Split callbacks in multiple files (#849)
  • Added support for user-defined callbacks (#889, #950); see the callback sketch after this list
  • Added support for multiple loggers to be passed to Trainer as an iterable (e.g. list, tuple, etc.) (#903)
  • Added support for step-based learning rate scheduling (#941); see the scheduler sketch after this list
  • Added support for logging hparams as dict (#1029)
  • Checkpointing and early stopping now work without a validation step (#1041)
  • Support graceful training cleanup after Keyboard Interrupt (#856, #1019)
  • Added type hints for function arguments (#912)
  • Added default argparser for Trainer (#952, #1023)
  • Added TPU gradient clipping (#963)
  • Added max/min number of steps in Trainer (#728)
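
The new Trainer flags above can be combined roughly like this (a sketch only; the values are arbitrary examples and the max_steps/min_steps names are inferred from the changelog wording):

```python
from pytorch_lightning import Trainer

trainer = Trainer(
    reload_dataloaders_every_epoch=True,  # re-create the dataloaders every epoch (#926)
    progress_bar_refresh_rate=50,         # throttle progress-bar updates, e.g. in notebooks (#926)
    max_steps=10_000,                     # train for at most this many steps (#728)
    min_steps=100,                        # train for at least this many steps (#728)
)
```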
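
The logger changes from #783, #804 and #903 fit together roughly as follows (a sketch; "my-team" and the version string are placeholders):

```python
from pytorch_lightning import Trainer
from pytorch_lightning.loggers import TensorBoardLogger, WandbLogger

# version may now be a str, so logs go straight to <save_dir>/<version>;
# name='' suppresses the experiment-name directory (#804)
tb_logger = TensorBoardLogger(save_dir="logs", name="", version="run_2020_03_10")

# entity is forwarded to wandb.init (#783); "my-team" is a placeholder
wandb_logger = WandbLogger(entity="my-team")

trainer = Trainer(logger=[tb_logger, wandb_logger])  # iterable of loggers (#903)
```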
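
Passing dataloaders directly to fit() (#759) looks roughly like this (a sketch; model, train_dataset and val_dataset are placeholders, and the argument names follow the changelog item, so double-check the docs for the exact spelling):

```python
from torch.utils.data import DataLoader
from pytorch_lightning import Trainer

train_loader = DataLoader(train_dataset, batch_size=32)  # placeholder dataset
val_loader = DataLoader(val_dataset, batch_size=32)      # placeholder dataset

trainer = Trainer()
trainer.fit(model, train_dataloader=train_loader, val_dataloader=val_loader)
```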
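
A user-defined callback (#889, #950) can be sketched as below; the hook names and the (trainer, pl_module) signatures follow the later stable callback API and may differ slightly in this release, so treat them as assumptions and consult the docs:

```python
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import Callback

class PrintingCallback(Callback):
    """Hypothetical callback that prints when training starts and finishes."""

    def on_train_start(self, trainer, pl_module):
        print("Training is starting")

    def on_train_end(self, trainer, pl_module):
        print("Training has finished")

trainer = Trainer(callbacks=[PrintingCallback()])
```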
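
Step-based learning rate scheduling (#941) is configured from configure_optimizers; the dict keys below ('scheduler', 'interval') are assumed from the feature description, so verify them against the docs:

```python
import torch

# Inside a LightningModule:
def configure_optimizers(self):
    optimizer = torch.optim.Adam(self.parameters(), lr=1e-3)
    # step the scheduler after every optimizer step instead of every epoch
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1000, gamma=0.5)
    return [optimizer], [{"scheduler": scheduler, "interval": "step"}]
```
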
Changed
  • Changed default TQDM to use tqdm.auto for prettier outputs in IPython notebooks (#752)
  • Changed pytorch_lightning.logging to pytorch_lightning.loggers (#767)
  • Moved the default tqdm_dict definition from Trainer to LightningModule, so it can be overridden by the user (#749)
  • Moved functionality of LightningModule.load_from_metrics into LightningModule.load_from_checkpoint (#995)
  • Changed Checkpoint path parameter from filepath to dirpath (#1016)
  • Froze models' hparams as a Namespace property (#1029)
  • Dropped logging config in package init (#1015)
  • Renamed model steps (#1051); see the hook sketch after this list
    • training_end >> training_epoch_end
    • validation_end >> validation_epoch_end
    • test_end >> test_epoch_end
  • Refactored data loading to support infinite dataloaders (#955)
  • Created a single file in TensorBoardLogger (#777)
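
The renamed epoch-end hooks (#1051) look like this inside a LightningModule; the outputs argument and the aggregation shown are illustrative assumptions only:

```python
import torch
import pytorch_lightning as pl

class LitModel(pl.LightningModule):

    def training_epoch_end(self, outputs):    # was: training_end
        # assumed: outputs is the list of dicts returned by training_step over the epoch
        avg_loss = torch.stack([x["loss"] for x in outputs]).mean()
        return {"log": {"train_loss_epoch": avg_loss}}

    def validation_epoch_end(self, outputs):  # was: validation_end
        ...

    def test_epoch_end(self, outputs):        # was: test_end
        ...
```
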
Deprecated
  • Deprecated pytorch_lightning.logging (#767)
  • Deprecated LightningModule.load_from_metrics in favour of LightningModule.load_from_checkpoint (#995, #1079)
  • Deprecated @data_loader decorator (#926)
  • Deprecated model steps training_end, validation_end and test_end (#1051, #1056)
Removed
  • Removed dependency on pandas (#736)
  • Removed dependency on torchvision (#797)
  • Removed dependency on scikit-learn (#801)
Fixed
  • Fixed a bug where early stopping on_end_epoch would be called inconsistently when check_val_every_n_epoch == 0 (#743)
  • Fixed a bug where the model checkpointer didn't write to the same directory as the logger (#771)
  • Fixed a bug where the TensorBoardLogger class would create an additional empty log file during fitting (#777)
  • Fixed a bug where global_step was advanced incorrectly when using accumulate_grad_batches > 1 (#832)
  • Fixed a bug when calling self.logger.experiment with multiple loggers (#1009)
  • Fixed a bug when calling logger.append_tags on a NeptuneLogger with a single tag (#1009)
  • Fixed sending back data from .spawn by saving and loading the trained model in/out of the process (#1017)
  • Fixed port collision on DDP (#1010)
  • Fixed/tested pass overrides (#918)
  • Fixed comet logger to log after train (#892)
  • Removed deprecated args to the learning rate step function (#890)
Contributors

@jeremyjordan, @yukw777, @bobkemp, @AljoSt, @Borda, @neggert, @calclavia, @awaelchli, @airglow, @akshaykvnit, @AntixK, @xeTaiz, @djbyrne, @Calysto, @ethanwharris, @theevann, @fdelrio89, @onkyo14taro, @hadim, @hanbyul-kim, @kuynzereb, @luiscape, @MattPainter01, @peteriz, @shoarora, @SkafteNicki, @smallzzy, @srush, @baeseongsu, @tullie, @williamFalcon, @xssChauhan

If we forgot someone because their commit email does not match their GitHub account, let us know :]

Files

PyTorchLightning/pytorch-lightning-0.7.0.zip (10.1 MB)
md5:ad97e64a9ee71f9ef76e24fbe1a7f023
