Slideflow: A Unified Deep Learning Pipeline for Digital Histology
- 1. University of Chicago Medical Center
- 2. University of Chicago
Description
Slideflow is a computational pathology Python package that provides a unified API for building and testing deep learning models for histopathology, supporting both Tensorflow/Keras and PyTorch.
Slideflow includes tools for whole-slide image processing and segmentation, customizable deep learning model training with dozens of supported architectures, explainability tools including heatmaps and mosaic maps, analysis of activations from model layers, uncertainty quantification, and more. A variety of fast, optimized whole-slide image processing tools are included: background filtering, blur/artifact detection, digital stain normalization, and efficient storage in `*.tfrecords` format. Model training is easy and highly configurable, with support for dozens of model architectures (from `tf.keras.applications` or `torchvision.models`) and an easy drop-in API for training custom architectures. For entirely custom training loops, Slideflow can be used as an image processing backend, serving an optimized `tf.data.Dataset` or `torch.utils.data.DataLoader` that can read and process `*.tfrecords` images and perform real-time stain normalization, as sketched below.
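For example, a custom PyTorch training loop might draw tiles directly from a project's dataset. This is a minimal sketch, assuming an existing project with extracted tiles; the project path, tile geometry, and the 'tumor_type' annotation column are hypothetical placeholders, and exact argument names may vary by version.

```python
import slideflow as sf

# Hypothetical project path; tile_px/tile_um values are placeholders.
P = sf.Project('/path/to/project')
dataset = P.dataset(tile_px=299, tile_um=302)

# Tile-level labels from a hypothetical 'tumor_type' annotation column.
labels, unique = dataset.labels('tumor_type')

# Serve tiles from *.tfrecords as a torch DataLoader for a custom loop.
dataloader = dataset.torch(labels=labels, batch_size=64)

for images, targets in dataloader:
    ...  # custom training step
```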
Version 1.1.0 - Major Features and Improvements:
Uncertainty quantification (UQ)
- Models can now estimate uncertainty via dropout, by setting `uq=True` for a `ModelParams` object (see the sketch after this list).
- Uncertainty is saved at the tile level and patient level in predictions files.
- Heatmaps generated with UQ models will display uncertainty by default.
- `DatasetFeatures` objects built with UQ models now store tile-level uncertainty in `DatasetFeatures.uncertainty`.
- `SlideMap` includes a new function to visualize uncertainty on UMAPs, `SlideMap.label_by_uncertainty()`.
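A minimal sketch of the new UQ workflow, assuming the same hypothetical project setup; the outcome column 'HPV_status' and the other hyperparameter values are placeholders:

```python
import slideflow as sf

P = sf.Project('/path/to/project')

# Dropout must be nonzero for dropout-based uncertainty estimation.
hp = sf.ModelParams(
    tile_px=299,
    tile_um=302,
    model='xception',
    dropout=0.1,
    uq=True  # enables uncertainty estimation via dropout
)

# 'HPV_status' is a hypothetical outcome column in the annotations file.
P.train('HPV_status', params=hp)
```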
Updated normalizers
- New `reinhard_fast` algorithm
  - A new normalizer strategy designed for computational efficiency. The `reinhard_fast` strategy is based on the standard `reinhard` normalizer with the brightness normalization step removed, resulting in up to 10-fold speed improvements.
- New option to fit normalizers to a dataset
  - Previously, normalizers could only be fit to a single image, either manually specified by the user via `normalizer_source` or defaulting to an internal image stored at `norm/norm_tile.jpg`.
  - By setting `normalizer_source='dataset'`, a normalizer can now be fit to the entire training dataset, rather than an arbitrary single image (see the sketch after this list).
- Vectorized normalizers
  - The `reinhard` and `reinhard_fast` normalizers have been optimized with vectorized implementations, improving normalization speed to >12,000 img/sec.
- Normalizer logging
  - To improve normalizer consistency when applying models to new datasets, normalizer fit parameters are now logged in the model parameters file, `params.json`, via the key `norm_fit`.
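A minimal sketch of configuring the new normalizer options through model hyperparameters; the tile geometry values are placeholders:

```python
import slideflow as sf

# Use the new reinhard_fast strategy, and fit it to the entire training
# dataset rather than a single reference image.
hp = sf.ModelParams(
    tile_px=299,
    tile_um=302,
    normalizer='reinhard_fast',
    normalizer_source='dataset'
)
```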
Blur augmentation
- Random Gaussian blur augmentation can now be performed during training, in both the Tensorflow and PyTorch backends.
- Enable blur augmentation by adding 'b' to the `augment` parameter for a set of `ModelParams` (see the sketch after this list).
- Blur augmentation is now used by default.
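For example, blur can be combined with other augmentations in the `augment` string; the flags 'x', 'y', 'r', and 'j' shown here are assumed to correspond to the existing flip, rotation, and JPEG-compression augmentations:

```python
# 'b' enables random Gaussian blur alongside the other augmentations.
hp = sf.ModelParams(tile_px=299, tile_um=302, augment='xyrjb')
```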
Model updates
- New architectures
  - EfficientNet architectures are now supported, including `EfficientNetV2B0`, `EfficientNetV2B1`, `EfficientNetV2B2`, `EfficientNetV2B3`, `EfficientNetV2S`, `EfficientNetV2M`, and `EfficientNetV2L`.
- Hidden layer improvements
  - Hidden layers now include batch normalization after ReLU activation.
- Dropout
  - Dropout layers are now enabled for the PyTorch backend.
- Training improvements
  - Adds a new `save_model` argument to `P.train()`, allowing models to be discarded rather than saved after training is finished.
  - Validation splits and IDs are now logged during training.
  - `mixed_precision` is now an argument for `P.train()` instead of a project-wide setting.
- New regularization
  - Adds new `l1`, `l1_dense`, `l2`, and `l2_dense` parameters in `ModelParams` to enable more granular control over regularization (see the sketch after this list).
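Putting the training updates together, a sketch under the same hypothetical project setup; the regularization strengths and outcome label are placeholders, and which layers the `l2` and `l2_dense` variants target is an assumption:

```python
hp = sf.ModelParams(
    tile_px=299,
    tile_um=302,
    model='EfficientNetV2B0',  # one of the newly supported architectures
    l2=1e-4,        # assumed: regularization for the core model layers
    l2_dense=1e-4   # assumed: regularization for the hidden dense layers
)

P.train(
    'HPV_status',          # hypothetical outcome label
    params=hp,
    save_model=False,      # new: skip saving the model after training
    mixed_precision=True   # now a per-call argument, not project-wide
)
```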
Bug Fixes and Other Changes
- Predictions are now generated using a Pandas DataFrame backend.
- Revamped errors, moved to `slideflow.errors`.
- The active backend (Tensorflow or PyTorch) is now logged in a model's `params.json`.
- Improvements to Neptune logging.
- Fixed a bug with PDF generation during tile extraction.
- Removed several unused `verbose` arguments from functions; control logging verbosity via the `SF_LOGGING_LEVEL` environment variable instead.
- Minor documentation updates.
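For instance, verbosity can be set before importing the package; the accepted values are assumed to follow Python's numeric logging levels:

```python
import os

# Assumed to map to Python logging levels (e.g., 10 = DEBUG, 20 = INFO).
os.environ['SF_LOGGING_LEVEL'] = '20'

import slideflow as sf
```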
Files
| Name | Size | Checksum |
|---|---|---|
| slideflow-1.1.0.zip | 8.9 MB | md5:89a811ce0cf5b2f8d1a0a665110c1ed0 |
Additional details
Related works
- Is supplement to: https://github.com/jamesdolezal/slideflow/tree/1.1.0 (URL)