Training¶
Prepare hyperparameters¶
The first step of model training is configuring model parameters and training hyperparameters. There are two methods for doing so. If you intend to train a model using a single combination of hyperparameters, use the ModelParams class:
import slideflow as sf

hp = sf.ModelParams(
    epochs=[1, 5],
    model='xception',
    learning_rate=0.0001,
    batch_size=8,
    ...
)
Alternatively, if you intend to perform a sweep across multiple hyperparameter combinations, use the Project.create_hp_sweep() function to automatically save a sweep to a JSON file. For example, the following would set up a sweep file with two combinations: the first with a learning rate of 0.001, and the second with a learning rate of 0.0001:
P.create_hp_sweep(
    epochs=[5],
    toplayer_epochs=0,
    model=['xception'],
    pooling=['avg'],
    loss='sparse_categorical_crossentropy',
    learning_rate=[0.001, 0.0001],
    batch_size=64,
    hidden_layers=[1],
    optimizer='Adam',
    augment='xyrj'
)
Available hyperparameters include:
augment - Image augmentations to perform, including flipping/rotating and random JPEG compression. Please see slideflow.model.ModelParams for more details.
batch_size - Batch size for training.
dropout - Adds dropout layers after each fully-connected layer.
early_stop - Stop training early if validation loss/accuracy is not improving.
early_stop_patience - Number of epochs to wait before allowing early stopping.
early_stop_method - Metric to use for early stopping. Options include ‘loss’, ‘accuracy’, or ‘manual’.
epochs - Number of epochs to spend training the full model.
include_top - Include the default, preconfigured, fully-connected top layers of the specified model.
hidden_layers - Number of fully-connected final hidden layers before softmax prediction.
hidden_layer_width - Width of hidden layers.
l1 - Adds L1 regularization to all convolutional layers with this weight.
l1_dense - Adds L1 regularization to all fully-connected Dense layers with this weight.
l2 - Adds L2 regularization to all convolutional layers with this weight.
l2_dense - Adds L2 regularization to all fully-connected Dense layers with this weight.
learning_rate - Learning rate for training.
learning_rate_decay - Learning rate decay during training.
learning_rate_decay_steps - Number of steps after which to decay the learning rate.
loss - Loss function; please see the Keras loss documentation for all options.
manual_early_stop_epoch - Epoch at which to manually trigger early stopping.
manual_early_stop_batch - Batch at which to manually trigger early stopping.
model - Model architecture; please see Keras application documentation for all options.
normalizer - Normalization method to use on images.
normalizer_source - Optional path to normalization image to use as the source.
optimizer - Training optimizer; please see the Keras optimizer documentation for all options.
pooling - Pooling strategy to use before final fully-connected layers; either ‘max’, ‘avg’, or ‘none’.
tile_px - Size of extracted tiles in pixels.
tile_um - Size of extracted tiles in microns.
toplayer_epochs - Number of epochs to spend training just the final layer, with all convolutional layers “locked” (sometimes used for transfer learning).
trainable_layers - Number of layers available for training; all other layers will be frozen. If 0, all layers are trained.
training_balance - Training input balancing strategy; please see A Note on Input Balancing for more details.
uq - Enable uncertainty quantification (UQ) during inference. Requires dropout to be non-zero.
validation_balance - Validation input balancing strategy; please see A Note on Input Balancing for more details.
If you are using a continuous variable as an outcome measure, be sure to use a linear loss function. Linear loss functions can be viewed in slideflow.model.ModelParams.LinearLossDict, and all available loss functions are in slideflow.model.ModelParams.AllLossDict.
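For instance, a minimal sketch of parameters for a continuous outcome, using ‘mean_squared_error’ as an example of a linear loss (confirm its presence in LinearLossDict for your version; tile sizes below are illustrative):
hp = sf.ModelParams(
    tile_px=299,
    tile_um=302,
    loss='mean_squared_error',  # example linear loss for a continuous outcome
    ...
)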
Begin training¶
Once your hyperparameter settings have been chosen, you may begin training using the train function. Documentation of the function is given below:
- slideflow.Project.train(self, outcomes, params, exp_label=None, filters=None, filter_blank=None, input_header=None, min_tiles=0, max_tiles=0, splits='splits.json', balance_headers=None, mixed_precision=True, **training_kwargs)
Train model(s) using a given set of parameters, outcomes, and inputs.
- Parameters
outcomes (str or list(str)) – Outcome label annotation header(s).
params (slideflow.model.ModelParams, list, dict, or str) – Model parameters for training. May provide one ModelParams, a list, or a dict mapping model names to params. If multiple params are provided, will train a model for each. If a JSON file is provided, will interpret it as a hyperparameter sweep. See examples below for use.
exp_label (str, optional) – Experiment label to add to model names.
filters (dict, optional) – Filters to use when selecting tfrecords. Defaults to None.
filter_blank (list, optional) – Exclude slides blank in these cols. Defaults to None.
input_header (list, optional) – List of annotation column headers to use as additional slide-level model input. Defaults to None.
min_tiles (int) – Minimum number of tiles a slide must have to include in training. Defaults to 0.
max_tiles (int) – Only use up to this many tiles from each slide for training. Defaults to 0 (include all tiles).
splits (str, optional) – Filename of JSON file in which to log train/val splits. Looks for filename in project root directory. Defaults to “splits.json”.
balance_headers (str or list(str)) – Annotation header(s) specifying labels on which to perform mini-batch balancing. If performing category-level balancing and this is set to None, will default to balancing on outcomes. Defaults to None.
mixed_precision (bool, optional) – Enable mixed precision. Defaults to True.
- Keyword Arguments
val_strategy (str) – Validation dataset selection strategy. Options include bootstrap, k-fold, k-fold-manual, k-fold-preserved-site, fixed, and none. Defaults to ‘k-fold’.
val_k_fold (int) – Total number of K if using K-fold validation. Defaults to 3.
val_k (int) – Iteration of K-fold to train, starting at 1. Defaults to None (training all k-folds).
val_k_fold_header (str) – Annotations file header column for manually specifying k-fold or for preserved-site cross validation. Only used if validation strategy is ‘k-fold-manual’ or ‘k-fold-preserved-site’. Defaults to None for k-fold-manual and ‘site’ for k-fold-preserved-site.
val_fraction (float) – Fraction of dataset to use for validation testing, if strategy is ‘fixed’.
val_source (str) – Dataset source to use for validation. Defaults to None (same as training).
val_annotations (str) – Path to annotations file for validation dataset. Defaults to None (same as training).
val_filters (dict) – Filters to use for validation dataset. Defaults to None (same as training).
checkpoint (str, optional) – Path to cp.ckpt from which to load weights. Defaults to None.
pretrain (str, optional) – Either ‘imagenet’ or path to Tensorflow model from which to load weights. Defaults to ‘imagenet’.
multi_gpu (bool) – Train using multiple GPUs when available. Defaults to False.
resume_training (str, optional) – Path to Tensorflow model to continue training. Defaults to None.
starting_epoch (int) – Start training at the specified epoch. Defaults to 0.
steps_per_epoch_override (int) – If provided, will manually set the number of steps in an epoch. Default epoch length is the number of total tiles.
save_predictions (bool) – Save predictions with each validation. Defaults to False.
save_model (bool, optional) – Save models when evaluating at specified epochs. Defaults to True.
validate_on_batch (int) – Perform validation every N batches. Defaults to 0 (only at epoch end).
validation_batch_size (int) – Validation dataset batch size. Defaults to 32.
use_tensorboard (bool) – Add tensorboard callback for realtime training monitoring. Defaults to False.
validation_steps (int) – Number of steps of validation to perform each time doing a mid-epoch validation check. Defaults to 200.
- Returns
Dict with model names mapped to train_acc, val_loss, and val_acc
- Examples
Method 1 (hyperparameter sweep from a configuration file):
>>> import slideflow.model
>>> P.train('outcome', params='sweep.json', ...)
Method 2 (manually specified hyperparameters):
>>> from slideflow.model import ModelParams
>>> hp = ModelParams(...)
>>> P.train('outcome', params=hp, ...)
Method 3 (list of hyperparameters):
>>> from slideflow.model import ModelParams
>>> hp = [ModelParams(...), ModelParams(...)]
>>> P.train('outcome', params=hp, ...)
Method 4 (dict of hyperparameters):
>>> from slideflow.model import ModelParams
>>> hp = {'HP0': ModelParams(...), 'HP1': ModelParams(...)}
>>> P.train('outcome', params=hp, ...)
If you used the ModelParams class to configure a single combination of parameters, pass this object via the params argument. If you configured a hyperparameter sweep, set this argument to the name of your hyperparameter sweep file (saved by default to ‘sweep.json’).
Your outcome variable(s) are specified with the outcomes argument. You may filter slides for training using the filters argument, as previously described.
For example, to train using only slides labeled as “train” in the “dataset” column, with the outcome variable defined by the column “category”, use the following syntax:
P.train(
    outcomes="category",
    filters={"dataset": ["train"]},
    params='sweep.json'
)
If you would like to use a different validation plan than the default, pass the relevant keyword arguments to the training function.
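For example, a sketch of training with five-fold cross-validation and periodic mid-epoch validation checks (the values below are illustrative):
P.train(
    outcomes="category",
    params='sweep.json',
    val_strategy='k-fold',   # use k-fold cross-validation
    val_k_fold=5,            # total number of folds
    validate_on_batch=512    # validate every 512 batches
)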
Once training has finished, performance metrics, including accuracy and loss, can be found in the results_log.csv file in the project directory. Additional data, including ROCs and scatter plots, are saved in the model directories.
At each designated epoch, models are saved in their own folders. Each model directory will include a copy of its hyperparameters in a params.json file, and a copy of its training/validation slide manifest in slide.log.
Multiple outcomes¶
Slideflow supports both categorical and continuous outcomes, as well as training to single or multiple outcomes at once. To use multiple outcomes simultaneously, simply pass multiple annotation headers to the outcomes argument.
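For example, the following sketch trains to two categorical outcomes at once, reusing an hp object configured as above (the ‘grade’ column name is hypothetical):
P.train(
    outcomes=["category", "grade"],  # one model trained to both outcomes
    params=hp
)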
Multiple input variables¶
In addition to training using image data, clinical data can also be provided as model input by passing annotation column headers to the argument input_header. This input is merged at the post-convolutional layer, prior to any configured hidden layers.
If desired, models can also be trained with clinical input data alone, without images, by using the hyperparameter argument drop_images=True.
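As a sketch, the following trains with a hypothetical ‘age’ annotation column as additional model input, and shows how drop_images might be set for an image-free model:
# "age" is a hypothetical annotation column used as additional input.
P.train(
    outcomes="category",
    input_header="age",
    params=hp
)

# Clinical input alone, without images:
hp_no_images = sf.ModelParams(..., drop_images=True)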
Cox Proportional Hazards (CPH) models¶
Models can also be trained to a time-to-event outcome using CPH and negative log likelihood loss. For CPH models, use ‘negative_log_likelihood’ loss and set outcomes equal to the annotation column indicating event time. Specify the event type (0 or 1) by passing the event type annotation column to the argument input_header. If you are using multiple clinical inputs, the first header passed to input_header must be event type. CPH models are not compatible with multiple outcomes.
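A sketch of a CPH configuration, assuming hypothetical annotation columns ‘time_to_event’ (event time) and ‘event’ (event type, 0 or 1):
hp = sf.ModelParams(
    loss='negative_log_likelihood',  # required loss for CPH models
    ...
)
P.train(
    outcomes='time_to_event',   # column indicating event time
    input_header='event',       # first input header must be event type
    params=hp
)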
Note
CPH models are currently unavailable with the PyTorch backend. PyTorch support for CPH outcomes is in development.
Distributed training across GPUs¶
If multiple GPUs are available, training can be distributed by passing the argument multi_gpu=True. If provided, slideflow will use all available (and visible) GPUs for training.
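For example (reusing an hp object configured as above):
P.train(
    outcomes="category",
    params=hp,
    multi_gpu=True  # distribute training across all visible GPUs
)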
Monitoring performance¶
During training, progress can be monitored using Tensorflow’s bundled Tensorboard package by passing the argument use_tensorboard=True. This functionality is disabled by default due to a recent bug in Tensorflow. To monitor training with Tensorboard, execute:
$ tensorboard --logdir=/path/to/model/directory
… and open http://localhost:6006 in your web browser.