Slideflow: A Unified Deep Learning Pipeline for Digital Histology
- 1. University of Chicago Medical Center
- 2. University of Chicago
Description
Slideflow is a computational pathology Python package that provides a unified API for building and testing deep learning models for histopathology, supporting both Tensorflow/Keras and PyTorch.
Slideflow includes tools for whole-slide image processing and segmentation, customizable deep learning model training with dozens of supported architectures, explainability tools including heatmaps and mosaic maps, analysis of activations from model layers, uncertainty quantification, and more. A variety of fast, optimized whole-slide image processing tools are included: background filtering, blur/artifact detection, digital stain normalization, and efficient storage in `*.tfrecords` format. Model training is easy and highly configurable, with support for dozens of model architectures (from `tf.keras.applications` or `torchvision.models`) and an easy drop-in API for training custom architectures. For entirely custom training loops, Slideflow can be used as an image processing backend, serving an optimized `tf.data.Dataset` or `torch.utils.data.DataLoader` that can read and process `*.tfrecords` images and perform real-time stain normalization, as sketched below.
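For example, a custom PyTorch training loop might draw tiles directly from a project's dataset. This is a minimal sketch, assuming an existing project with extracted tiles; the project path, tile geometry, and the 'tumor_type' annotation column are hypothetical placeholders, and exact argument names may vary by version.

```python
import slideflow as sf

# Hypothetical project path; tile_px/tile_um values are placeholders.
P = sf.Project('/path/to/project')
dataset = P.dataset(tile_px=299, tile_um=302)

# Tile-level labels from a hypothetical 'tumor_type' annotation column.
labels, unique = dataset.labels('tumor_type')

# Serve tiles from *.tfrecords as a torch DataLoader for a custom loop.
dataloader = dataset.torch(labels=labels, batch_size=64)

for images, targets in dataloader:
    ...  # custom training step
```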
Version 1.1.0 - Major Features and Improvements:
Uncertainty quantification (UQ)
- Models can now estimate uncertainty via dropout, by setting `uq=True` for a `ModelParams` object (see the sketch after this list).
- Uncertainty is saved at the tile level and patient level in predictions files.
- Heatmaps generated with UQ models will display uncertainty by default.
- `DatasetFeatures` objects built with UQ models now store tile-level uncertainty in `DatasetFeatures.uncertainty`.
- `SlideMap` includes a new function to visualize uncertainty on UMAPs, `SlideMap.label_by_uncertainty()`.
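A minimal sketch of the new UQ workflow, assuming the same hypothetical project setup; the outcome column 'HPV_status' and the other hyperparameter values are placeholders:

```python
import slideflow as sf

P = sf.Project('/path/to/project')

# Dropout must be nonzero for dropout-based uncertainty estimation.
hp = sf.ModelParams(
    tile_px=299,
    tile_um=302,
    model='xception',
    dropout=0.1,
    uq=True  # enables uncertainty estimation via dropout
)

# 'HPV_status' is a hypothetical outcome column in the annotations file.
P.train('HPV_status', params=hp)
```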
Updated normalizers
- New `reinhard_fast` algorithm
  - A new normalizer strategy designed for computational efficiency. The `reinhard_fast` strategy is based on the standard `reinhard` normalizer with the brightness normalization step removed, resulting in up to 10-fold speed improvements.
- New option to fit normalizers to a dataset
  - Previously, normalizers could only be fit to a single image, either manually specified by the user via `normalizer_source` or defaulting to an internal image stored at `norm/norm_tile.jpg`.
  - By setting `normalizer_source='dataset'`, a normalizer can now be fit to the entire training dataset, rather than an arbitrary single image (see the sketch after this list).
- Vectorized normalizers
  - The `reinhard` and `reinhard_fast` normalizers have been optimized with vectorized implementations, improving normalization speed to >12,000 img/sec.
- Normalizer logging
  - To improve normalizer consistency when applying models to new datasets, normalizer fit parameters are now logged in the model parameters file, `params.json`, via the key `norm_fit`.
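A minimal sketch of configuring the new normalizer options through model hyperparameters; the tile geometry values are placeholders:

```python
import slideflow as sf

# Use the new reinhard_fast strategy, and fit it to the entire training
# dataset rather than a single reference image.
hp = sf.ModelParams(
    tile_px=299,
    tile_um=302,
    normalizer='reinhard_fast',
    normalizer_source='dataset'
)
```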
Blur augmentation
- Random Gaussian blur augmentation can now be performed during training, in both the Tensorflow and PyTorch backends.
- Enable blur augmentation by adding 'b' to the `augment` parameter for a set of `ModelParams` (see the sketch after this list).
- Blur augmentation is now used by default.
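For example, blur can be combined with other augmentations in the `augment` string; the flags 'x', 'y', 'r', and 'j' shown here are assumed to correspond to the existing flip, rotation, and JPEG-compression augmentations:

```python
# 'b' enables random Gaussian blur alongside the other augmentations.
hp = sf.ModelParams(tile_px=299, tile_um=302, augment='xyrjb')
```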
Model updates
- New architectures
  - EfficientNet architectures are now supported, including `EfficientNetV2B0`, `EfficientNetV2B1`, `EfficientNetV2B2`, `EfficientNetV2B3`, `EfficientNetV2S`, `EfficientNetV2M`, and `EfficientNetV2L`.
- Hidden layer improvements
  - Hidden layers now include batch normalization after ReLU activation.
- Dropout
  - Dropout layers are now enabled for the PyTorch backend.
- Training improvements
  - Adds a new `save_model` argument to `P.train()`, allowing models to be discarded rather than saved after training is finished.
  - Validation splits and IDs are now logged during training.
  - `mixed_precision` is now an argument for `P.train()` instead of a project-wide setting.
- New regularization
  - Adds new `l1`, `l1_dense`, `l2`, and `l2_dense` parameters in `ModelParams` to enable more granular control over regularization (see the sketch after this list).
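Putting the training updates together, a sketch under the same hypothetical project setup; the regularization strengths and outcome label are placeholders, and which layers the `l2` and `l2_dense` variants target is an assumption:

```python
hp = sf.ModelParams(
    tile_px=299,
    tile_um=302,
    model='EfficientNetV2B0',  # one of the newly supported architectures
    l2=1e-4,        # assumed: regularization for the core model layers
    l2_dense=1e-4   # assumed: regularization for the hidden dense layers
)

P.train(
    'HPV_status',          # hypothetical outcome label
    params=hp,
    save_model=False,      # new: skip saving the model after training
    mixed_precision=True   # now a per-call argument, not project-wide
)
```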
Bug Fixes and Other Changes
- Predictions are now generated using a Pandas DataFrame backend.
- Revamped errors, moved to `slideflow.errors`.
- The active backend (Tensorflow or PyTorch) is now logged in a model's `params.json`.
- Improvements to Neptune logging.
- Fixed a bug with PDF generation during tile extraction.
- Removed several unused `verbose` arguments from functions; control logging verbosity via the `SF_LOGGING_LEVEL` environment variable instead.
- Minor documentation updates.
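For instance, verbosity can be set before importing the package; the accepted values are assumed to follow Python's numeric logging levels:

```python
import os

# Assumed to map to Python logging levels (e.g., 10 = DEBUG, 20 = INFO).
os.environ['SF_LOGGING_LEVEL'] = '20'

import slideflow as sf
```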
Files
| Name | Size | Checksum |
|---|---|---|
| slideflow-1.1.0.zip | 8.9 MB | md5:89a811ce0cf5b2f8d1a0a665110c1ed0 |
Additional details
Related works
- Is supplement to: https://github.com/jamesdolezal/slideflow/tree/1.1.0 (URL)