slideflow.model

This module provides the ModelParams class to organize model and training parameters/hyperparameters and assist with model building, as well as the Trainer class that executes model training and evaluation. LinearTrainer and CPHTrainer are extensions of this class, supporting linear and Cox Proportional Hazards outcomes, respectively. The function trainer_from_hp() can choose and return the correct model instance based on the provided hyperparameters.

Note

To support both TensorFlow and PyTorch backends, the slideflow.model module imports either slideflow.model.tensorflow or slideflow.model.torch according to the currently active backend, indicated by the environment variable SF_BACKEND.
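As a rough illustration of the backend switch described in this note, the sketch below maps the value of SF_BACKEND to a submodule name. The accepted values ("tensorflow" and "torch"), the fallback default, and the helper name resolve_backend_module are assumptions made for this example; it is not slideflow's actual import logic.

```python
import os

# Assumed mapping from SF_BACKEND values to the submodules named in the note.
BACKEND_MODULES = {
    "tensorflow": "slideflow.model.tensorflow",
    "torch": "slideflow.model.torch",
}


def resolve_backend_module(environ=None, default="tensorflow"):
    """Return the submodule name matching the active backend (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    # Assumption: fall back to a default when SF_BACKEND is unset.
    backend = environ.get("SF_BACKEND", default)
    if backend not in BACKEND_MODULES:
        raise ValueError(f"Unrecognized backend: {backend!r}")
    return BACKEND_MODULES[backend]


print(resolve_backend_module({"SF_BACKEND": "torch"}))
```

In practice the variable would be set before importing slideflow, for example with os.environ["SF_BACKEND"] = "torch".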

Configuring and training models

slideflow.model.ModelParams will build models according to a set of model parameters and a given set of outcome labels. To change the core image convolutional model to another architecture, set the model parameter to the custom model class.

import CustomModel
from slideflow.model import ModelParams

mp = ModelParams(model=CustomModel, ...)

Working with layer activations

slideflow.model.Features creates an interface to efficiently generate features/layer activations and logits from either a batch of images (returning a batch of activations/logits) or a whole-slide image (returning a grid of activations/logits).

slideflow.model.DatasetFeatures calculates features and logits for an entire dataset, storing the resulting arrays in a dictionary mapping slide names to the generated activations. This buffer of whole-dataset activations can then be used by functions that analyze whole-dataset activations, including slideflow.SlideMap and slideflow.mosaic.Mosaic.

ModelParams

class slideflow.model.ModelParams(*args, **kwargs)

Build a set of hyperparameters.

__init__(*args, **kwargs)

Collection of hyperparameters used for model building and training

Parameters
  • tile_px (int, optional) – Tile width in pixels. Defaults to 299.

  • tile_um (int, optional) – Tile width in microns. Defaults to 302.

  • epochs (int, optional) – Number of epochs to train the full model. Defaults to 3.

  • toplayer_epochs (int, optional) – Number of epochs to only train the fully-connected layers. Defaults to 0.

  • model (str, optional) – Base model architecture name. Defaults to ‘xception’.

  • pooling (str, optional) – Post-convolution pooling. ‘max’, ‘avg’, or ‘none’. Defaults to ‘max’.

  • loss (str, optional) – Loss function. Defaults to ‘sparse_categorical_crossentropy’.

  • learning_rate (float, optional) – Learning rate. Defaults to 0.0001.

  • learning_rate_decay (int, optional) – Learning rate decay rate. Defaults to 0.

  • learning_rate_decay_steps (int, optional) – Learning rate decay steps. Defaults to 100000.

  • batch_size (int, optional) – Batch size. Defaults to 16.

  • hidden_layers (int, optional) – Number of fully-connected hidden layers after core model. Defaults to 0.

  • hidden_layer_width (int, optional) – Width of fully-connected hidden layers. Defaults to 500.

  • optimizer (str, optional) – Name of optimizer. Defaults to ‘Adam’.

  • early_stop (bool, optional) – Use early stopping. Defaults to False.

  • early_stop_patience (int, optional) – Patience for early stopping, in epochs. Defaults to 0.

  • early_stop_method (str, optional) – Metric to monitor for early stopping. Defaults to ‘loss’.

  • manual_early_stop_epoch (int, optional) – Manually override early stopping to occur at this epoch. Defaults to None.

  • manual_early_stop_batch (int, optional) – Manually override early stopping to occur at this batch. Defaults to None.

  • training_balance ([type], optional) – Type of batch-level balancing to use during training. Options include ‘tile’, ‘category’, ‘patient’, ‘slide’, and None. Defaults to ‘category’ if a categorical loss is provided, and ‘patient’ if a linear loss is provided.

  • validation_balance ([type], optional) – Type of batch-level balancing to use during validation. Options include ‘tile’, ‘category’, ‘patient’, ‘slide’, and None. Defaults to ‘none’.

  • trainable_layers (int, optional) – Number of layers which are trainable. If 0, trains all layers. Defaults to 0.

  • l1 (int, optional) – L1 regularization weight. Defaults to 0.

  • l2 (int, optional) – L2 regularization weight. Defaults to 0.

  • l1_dense (int, optional) – L1 regularization weight for Dense layers. Defaults to the value of l1.

  • l2_dense (int, optional) – L2 regularization weight for Dense layers. Defaults to the value of l2.

  • dropout (int, optional) – Post-convolution dropout rate. Defaults to 0.

  • uq (bool, optional) – Use uncertainty quantification with dropout. Requires dropout > 0. Defaults to False.

  • augment (str) – Image augmentations to perform. String containing characters designating augmentations. ‘x’ indicates random x-flipping, ‘y’ y-flipping, ‘r’ rotating, and ‘j’ JPEG compression/decompression at random quality levels. Passing either ‘xyrj’ or True will use all augmentations.

  • normalizer (str, optional) – Normalization strategy to use on image tiles. Defaults to None.

  • normalizer_source (str, optional) – Path to normalizer source image. Defaults to None. If None while a normalizer is in use, an internal default tile is used for normalization. The internal default tile can be found at slideflow.slide.norm_tile.jpg.

  • include_top (bool, optional) – Include post-convolution fully-connected layers from the core model. Defaults to True. include_top=False is not currently compatible with the PyTorch backend.

  • drop_images (bool, optional) – Drop images, using only other slide-level features as input. Defaults to False.
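The augment string described in the parameter list above can be read as a set of single-character flags. The following minimal sketch expands such a value into the augmentations it enables. The helper name parse_augment and the treatment of False, None, or an empty string as "no augmentation" are assumptions for illustration only.

```python
# Flags documented for the augment parameter.
AUGMENT_FLAGS = {
    "x": "random x-flip",
    "y": "random y-flip",
    "r": "random rotation",
    "j": "random-quality JPEG compression",
}


def parse_augment(augment):
    """Expand an augment value into the list of enabled augmentations."""
    if augment is True:
        augment = "xyrj"  # True is shorthand for all augmentations
    elif not augment:
        return []  # assumption: False, None, or '' disables augmentation
    unknown = set(augment) - set(AUGMENT_FLAGS)
    if unknown:
        raise ValueError(f"Unknown augmentation flags: {sorted(unknown)}")
    # Report flags in a fixed order, regardless of the order in the string.
    return [AUGMENT_FLAGS[flag] for flag in "xyrj" if flag in augment]


print(parse_augment("xy"))
```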

build_model(labels=None, num_classes=None, **kwargs)

Auto-detects model type (categorical, linear, CPH) from parameters and builds, using pretraining (imagenet) or the base layers of a supplied model.

Parameters
  • labels (dict, optional) – Dict mapping slide names to outcomes. Used to detect number of outcome categories.

  • num_classes (int or dict, optional) – Either int (single categorical outcome, indicating number of classes) or dict (dict mapping categorical outcome names to number of unique categories in each outcome). Must supply either num_classes or labels (the number of classes can be detected from labels).

  • num_slide_features (int, optional) – Number of slide-level features separate from image input. Defaults to 0.

  • activation (str, optional) – Type of final layer activation to use. Defaults to ‘softmax’ (categorical models) or ‘linear’ (linear or CPH models).

  • pretrain (str, optional) – Either ‘imagenet’ or path to model to use as pretraining. Defaults to ‘imagenet’.

  • checkpoint (str, optional) – Path to checkpoint from which to resume model training. Defaults to None.
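When only labels is supplied, the number of outcome categories has to be inferred from it. A minimal sketch of that inference for a single categorical outcome is shown below; the helper name count_classes and the slide names are hypothetical, and the library may handle multiple outcomes differently.

```python
def count_classes(labels):
    """Count unique outcome categories in a {slide name: outcome} dict."""
    return len(set(labels.values()))


# Hypothetical labels for four slides across three categories.
labels = {"slide_a": 0, "slide_b": 1, "slide_c": 1, "slide_d": 2}
num_classes = count_classes(labels)
print(num_classes)
```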

get_opt()

Returns optimizer with appropriate learning rate.

model_type()

Returns either ‘linear’, ‘categorical’, or ‘cph’ depending on the loss type.

validate()

Check that hyperparameter combinations are valid.
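Two constraints are stated in the parameter descriptions above: uq requires dropout > 0, and include_top=False is not compatible with the PyTorch backend. The sketch below checks only those two rules. The function name validate_params, its backend argument, and the choice to return a list of messages are assumptions; the real validate() may check more and may report problems differently.

```python
def validate_params(dropout=0.0, uq=False, include_top=True, backend="tensorflow"):
    """Return a list of problems; an empty list means the combination is valid."""
    problems = []
    # Uncertainty quantification relies on dropout being active.
    if uq and not dropout > 0:
        problems.append("uq=True requires dropout > 0")
    # Documented limitation of the PyTorch backend.
    if not include_top and backend == "torch":
        problems.append("include_top=False is not compatible with the PyTorch backend")
    return problems


print(validate_params(dropout=0.0, uq=True))
```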

Trainer

class slideflow.model.Trainer(hp, outdir, labels, patients, slide_input=None, name=None, manifest=None, feature_sizes=None, feature_names=None, outcome_names=None, mixed_precision=True, config=None, use_neptune=False, neptune_api=None, neptune_workspace=None)

Base trainer class containing functionality for model building, input processing, training, and evaluation.

This base class requires categorical outcome(s). Additional outcome types are supported by slideflow.model.LinearTrainer and slideflow.model.CPHTrainer.

Slide-level (e.g. clinical) features can be used as additional model input by providing slide labels in the slide annotations dictionary, under the key ‘input’.

__init__(hp, outdir, labels, patients, slide_input=None, name=None, manifest=None, feature_sizes=None, feature_names=None, outcome_names=None, mixed_precision=True, config=None, use_neptune=False, neptune_api=None, neptune_workspace=None)

Sets base configuration, preparing model inputs and outputs.

Parameters
  • hp (slideflow.model.ModelParams) – ModelParams object.

  • outdir (str) – Location where event logs and checkpoints will be written.

  • labels (dict) – Dict mapping slide names to outcome labels (int or float format).

  • patients (dict) – Dict mapping slide names to patient ID, as some patients may have multiple slides. If not provided, assumes 1:1 mapping between slide names and patients.

  • slide_input (dict) – Dict mapping slide names to additional slide-level input, concatenated after post-conv.

  • name (str, optional) – Optional name describing the model, used for model saving. Defaults to None.

  • manifest (dict, optional) – Manifest dictionary mapping TFRecords to number of tiles. Defaults to None.

  • model_type (str, optional) – Type of model outcome, ‘categorical’ or ‘linear’. Defaults to ‘categorical’.

  • feature_sizes (list, optional) – List of sizes of input features. Required if providing additional input features as input to the model.

  • feature_names (list, optional) – List of names for input features. Used when permuting feature importance.

  • outcome_names (list, optional) – Name of each outcome. Defaults to “Outcome {X}” for each outcome.

  • mixed_precision (bool, optional) – Use FP16 mixed precision (rather than FP32). Defaults to True.

  • config (dict, optional) – Training configuration dictionary, used for logging. Defaults to None.

  • use_neptune (bool, optional) – Use Neptune API logging. Defaults to False.

  • neptune_api (str, optional) – Neptune API token, used for logging. Defaults to None.

  • neptune_workspace (str, optional) – Neptune workspace, used for logging. Defaults to None.

evaluate(dataset, batch_size=None, permutation_importance=False, histogram=False, save_predictions=False, norm_fit=None, uq='auto')

Evaluate model, saving metrics and predictions.

Parameters
  • dataset (slideflow.dataset.Dataset) – Dataset containing TFRecords to evaluate.

  • checkpoint (list, optional) – Path to cp.ckpt checkpoint to load. Defaults to None.

  • batch_size (int, optional) – Evaluation batch size. Defaults to the same as training (per self.hp).

  • permutation_importance (bool, optional) – Run permutation feature importance to define relative benefit of histology and each clinical slide-level feature input, if provided.

  • histogram (bool, optional) – Save histogram of tile predictions. Poorly optimized, uses seaborn, may drastically increase evaluation time. Defaults to False.

  • save_predictions (bool, optional) – Save tile, slide, and patient-level predictions to CSV. Defaults to False.

Returns

Dictionary of evaluation metrics.

predict(dataset, batch_size=None, norm_fit=None, format='csv')

Perform inference on a model, saving tile-level predictions.

Parameters
  • dataset (slideflow.dataset.Dataset) – Dataset containing TFRecords to evaluate.

  • batch_size (int, optional) – Evaluation batch size. Defaults to the same as training (per self.hp)

  • format (str, optional) – Format in which to save predictions. Either ‘csv’ or ‘feather’. Defaults to ‘csv’.

Returns

pandas.DataFrame of tile-level predictions.

train(train_dts, val_dts, log_frequency=100, validate_on_batch=0, validation_batch_size=None, validation_steps=200, starting_epoch=0, ema_observations=20, ema_smoothing=2, use_tensorboard=True, steps_per_epoch_override=None, save_predictions=False, save_model=True, resume_training=None, pretrain='imagenet', checkpoint=None, multi_gpu=False, norm_fit=None, skip_val_without_es=True)

Builds and trains a model from hyperparameters.

Parameters
  • train_dts (slideflow.dataset.Dataset) – Dataset containing TFRecords for training.

  • val_dts (slideflow.dataset.Dataset) – Dataset containing TFRecords for validation.

  • log_frequency (int, optional) – How often to update Tensorboard logs, in batches. Defaults to 100.

  • validate_on_batch (int, optional) – Validation will also be performed every N batches. Defaults to 0.

  • validation_batch_size (int, optional) – Validation batch size. Defaults to same as training (per self.hp).

  • validation_steps (int, optional) – Number of batches to use for each instance of validation. Defaults to 200.

  • starting_epoch (int, optional) – Starts training at the specified epoch. Defaults to 0.

  • ema_observations (int, optional) – Number of observations over which to perform exponential moving average smoothing. Defaults to 20.

  • ema_smoothing (int, optional) – Exponential average smoothing value. Defaults to 2.

  • use_tensorboard (bool, optional) – Enable tensorboard callbacks. Defaults to True.

  • steps_per_epoch_override (int, optional) – Manually set the number of steps per epoch. Defaults to None.

  • save_predictions (bool, optional) – Save tile, slide, and patient-level predictions at each evaluation. Defaults to False.

  • save_model (bool, optional) – Save models when evaluating at specified epochs. Defaults to True.

  • resume_training (str, optional) – Path to Tensorflow model to continue training. Defaults to None.

  • pretrain (str, optional) – Either ‘imagenet’ or path to Tensorflow model from which to load weights. Defaults to ‘imagenet’.

  • checkpoint (str, optional) – Path to cp.ckpt from which to load weights. Defaults to None.

Returns

Nested results dictionary containing metrics for each evaluated epoch.
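The ema_observations and ema_smoothing parameters of train() describe an exponential moving average. One common formulation, shown here as a sketch, uses the weight alpha = smoothing / (1 + observations) for each new value. Whether slideflow uses exactly this formula, and which metric it smooths, is an assumption of this example.

```python
def exponential_moving_average(values, observations=20, smoothing=2):
    """Smooth a sequence with weight alpha = smoothing / (1 + observations)."""
    alpha = smoothing / (1 + observations)
    smoothed = []
    for value in values:
        if not smoothed:
            # Seed the average with the first observation.
            smoothed.append(float(value))
        else:
            smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical validation losses recorded during training.
print(exponential_moving_average([0.9, 0.7, 0.8, 0.6], observations=3, smoothing=2))
```

With the defaults (20 observations, smoothing of 2), each new value contributes a weight of 2/21, so the smoothed curve reacts slowly to single noisy validation checks.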

LinearTrainer

class slideflow.model.LinearTrainer(*args, **kwargs)

Extends the base slideflow.model.Trainer class to add support for linear outcomes. Requires that all outcomes be linear, with appropriate linear loss function. Uses R-squared as the evaluation metric, rather than AUROC.
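For reference, R-squared (the coefficient of determination) compares the squared error of the predictions against the squared error of always predicting the mean. The sketch below uses the standard definition in plain Python; it is not taken from slideflow's implementation, and the sample values are made up.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    # Undefined when all true values are identical (ss_tot == 0).
    return 1 - ss_res / ss_tot


print(r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # perfect predictions
print(r_squared([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))  # always predicting the mean
```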

__init__(*args, **kwargs)

Sets base configuration, preparing model inputs and outputs.

Parameters
  • hp (slideflow.model.ModelParams) – ModelParams object.

  • outdir (str) – Location where event logs and checkpoints will be written.

  • labels (dict) – Dict mapping slide names to outcome labels (int or float format).

  • patients (dict) – Dict mapping slide names to patient ID, as some patients may have multiple slides. If not provided, assumes 1:1 mapping between slide names and patients.

  • slide_input (dict) – Dict mapping slide names to additional slide-level input, concatenated after post-conv.

  • name (str, optional) – Optional name describing the model, used for model saving. Defaults to None.

  • manifest (dict, optional) – Manifest dictionary mapping TFRecords to number of tiles. Defaults to None.

  • model_type (str, optional) – Type of model outcome, ‘categorical’ or ‘linear’. Defaults to ‘categorical’.

  • feature_sizes (list, optional) – List of sizes of input features. Required if providing additional input features as input to the model.

  • feature_names (list, optional) – List of names for input features. Used when permuting feature importance.

  • outcome_names (list, optional) – Name of each outcome. Defaults to “Outcome {X}” for each outcome.

  • mixed_precision (bool, optional) – Use FP16 mixed precision (rather than FP32). Defaults to True.

  • config (dict, optional) – Training configuration dictionary, used for logging. Defaults to None.

  • use_neptune (bool, optional) – Use Neptune API logging. Defaults to False.

  • neptune_api (str, optional) – Neptune API token, used for logging. Defaults to None.

  • neptune_workspace (str, optional) – Neptune workspace, used for logging. Defaults to None.

evaluate(dataset, batch_size=None, permutation_importance=False, histogram=False, save_predictions=False, norm_fit=None, uq='auto')

Evaluate model, saving metrics and predictions.

Parameters
  • dataset (slideflow.dataset.Dataset) – Dataset containing TFRecords to evaluate.

  • checkpoint (list, optional) – Path to cp.ckpt checkpoint to load. Defaults to None.

  • batch_size (int, optional) – Evaluation batch size. Defaults to the same as training (per self.hp).

  • permutation_importance (bool, optional) – Run permutation feature importance to define relative benefit of histology and each clinical slide-level feature input, if provided.

  • histogram (bool, optional) – Save histogram of tile predictions. Poorly optimized, uses seaborn, may drastically increase evaluation time. Defaults to False.

  • save_predictions (bool, optional) – Save tile, slide, and patient-level predictions to CSV. Defaults to False.

Returns

Dictionary of evaluation metrics.

predict(dataset, batch_size=None, norm_fit=None, format='csv')

Perform inference on a model, saving tile-level predictions.

Parameters
  • dataset (slideflow.dataset.Dataset) – Dataset containing TFRecords to evaluate.

  • batch_size (int, optional) – Evaluation batch size. Defaults to the same as training (per self.hp)

  • format (str, optional) – Format in which to save predictions. Either ‘csv’ or ‘feather’. Defaults to ‘csv’.

Returns

pandas.DataFrame of tile-level predictions.

train(train_dts, val_dts, log_frequency=100, validate_on_batch=0, validation_batch_size=None, validation_steps=200, starting_epoch=0, ema_observations=20, ema_smoothing=2, use_tensorboard=True, steps_per_epoch_override=None, save_predictions=False, save_model=True, resume_training=None, pretrain='imagenet', checkpoint=None, multi_gpu=False, norm_fit=None, skip_val_without_es=True)

Builds and trains a model from hyperparameters.

Parameters
  • train_dts (slideflow.dataset.Dataset) – Dataset containing TFRecords for training.

  • val_dts (slideflow.dataset.Dataset) – Dataset containing TFRecords for validation.

  • log_frequency (int, optional) – How often to update Tensorboard logs, in batches. Defaults to 100.

  • validate_on_batch (int, optional) – Validation will also be performed every N batches. Defaults to 0.

  • validation_batch_size (int, optional) – Validation batch size. Defaults to same as training (per self.hp).

  • validation_steps (int, optional) – Number of batches to use for each instance of validation. Defaults to 200.

  • starting_epoch (int, optional) – Starts training at the specified epoch. Defaults to 0.

  • ema_observations (int, optional) – Number of observations over which to perform exponential moving average smoothing. Defaults to 20.

  • ema_smoothing (int, optional) – Exponential average smoothing value. Defaults to 2.

  • use_tensorboard (bool, optional) – Enable tensorboard callbacks. Defaults to True.

  • steps_per_epoch_override (int, optional) – Manually set the number of steps per epoch. Defaults to None.

  • save_predictions (bool, optional) – Save tile, slide, and patient-level predictions at each evaluation. Defaults to False.

  • save_model (bool, optional) – Save models when evaluating at specified epochs. Defaults to True.

  • resume_training (str, optional) – Path to Tensorflow model to continue training. Defaults to None.

  • pretrain (str, optional) – Either ‘imagenet’ or path to Tensorflow model from which to load weights. Defaults to ‘imagenet’.

  • checkpoint (str, optional) – Path to cp.ckpt from which to load weights. Defaults to None.

Returns

Nested results dictionary containing metrics for each evaluated epoch.

CPHTrainer

class slideflow.model.CPHTrainer(*args, **kwargs)

Cox Proportional Hazards model. Requires that the user provide event data as the first input feature, and time to outcome as the linear outcome. Uses concordance index as the evaluation metric.
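The concordance index measures how often the model ranks pairs of subjects correctly. The sketch below follows the usual Harrell-style definition: a pair is comparable when the subject with the shorter time had an observed event, and it is concordant when that subject also has the higher predicted risk. Treating the model output as a risk score (higher means earlier event) is an assumption of this example, and the code is not slideflow's implementation.

```python
def concordance_index(times, events, risks):
    """Harrell-style concordance index over all comparable pairs."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Comparable: subject i had an observed event before time j.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half
    return concordant / comparable


# Hypothetical follow-up times, event indicators, and predicted risks.
print(concordance_index([1, 2, 3], [1, 1, 1], [0.9, 0.5, 0.1]))
```

A value of 1.0 indicates perfect ranking, 0.5 is what constant or random predictions achieve, and 0.0 indicates a fully inverted ranking.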

__init__(*args, **kwargs)

Sets base configuration, preparing model inputs and outputs.

Parameters
  • hp (slideflow.model.ModelParams) – ModelParams object.

  • outdir (str) – Location where event logs and checkpoints will be written.

  • labels (dict) – Dict mapping slide names to outcome labels (int or float format).

  • patients (dict) – Dict mapping slide names to patient ID, as some patients may have multiple slides. If not provided, assumes 1:1 mapping between slide names and patients.

  • slide_input (dict) – Dict mapping slide names to additional slide-level input, concatenated after post-conv.

  • name (str, optional) – Optional name describing the model, used for model saving. Defaults to None.

  • manifest (dict, optional) – Manifest dictionary mapping TFRecords to number of tiles. Defaults to None.

  • model_type (str, optional) – Type of model outcome, ‘categorical’ or ‘linear’. Defaults to ‘categorical’.

  • feature_sizes (list, optional) – List of sizes of input features. Required if providing additional input features as input to the model.

  • feature_names (list, optional) – List of names for input features. Used when permuting feature importance.

  • outcome_names (list, optional) – Name of each outcome. Defaults to “Outcome {X}” for each outcome.

  • mixed_precision (bool, optional) – Use FP16 mixed precision (rather than FP32). Defaults to True.

  • config (dict, optional) – Training configuration dictionary, used for logging. Defaults to None.

  • use_neptune (bool, optional) – Use Neptune API logging. Defaults to False.

  • neptune_api (str, optional) – Neptune API token, used for logging. Defaults to None.

  • neptune_workspace (str, optional) – Neptune workspace, used for logging. Defaults to None.

evaluate(dataset, batch_size=None, permutation_importance=False, histogram=False, save_predictions=False, norm_fit=None, uq='auto')

Evaluate model, saving metrics and predictions.

Parameters
  • dataset (slideflow.dataset.Dataset) – Dataset containing TFRecords to evaluate.

  • checkpoint (list, optional) – Path to cp.ckpt checkpoint to load. Defaults to None.

  • batch_size (int, optional) – Evaluation batch size. Defaults to the same as training (per self.hp).

  • permutation_importance (bool, optional) – Run permutation feature importance to define relative benefit of histology and each clinical slide-level feature input, if provided.

  • histogram (bool, optional) – Save histogram of tile predictions. Poorly optimized, uses seaborn, may drastically increase evaluation time. Defaults to False.

  • save_predictions (bool, optional) – Save tile, slide, and patient-level predictions to CSV. Defaults to False.

Returns

Dictionary of evaluation metrics.

predict(dataset, batch_size=None, norm_fit=None, format='csv')

Perform inference on a model, saving tile-level predictions.

Parameters
  • dataset (slideflow.dataset.Dataset) – Dataset containing TFRecords to evaluate.

  • batch_size (int, optional) – Evaluation batch size. Defaults to the same as training (per self.hp)

  • format (str, optional) – Format in which to save predictions. Either ‘csv’ or ‘feather’. Defaults to ‘csv’.

Returns

pandas.DataFrame of tile-level predictions.

train(train_dts, val_dts, log_frequency=100, validate_on_batch=0, validation_batch_size=None, validation_steps=200, starting_epoch=0, ema_observations=20, ema_smoothing=2, use_tensorboard=True, steps_per_epoch_override=None, save_predictions=False, save_model=True, resume_training=None, pretrain='imagenet', checkpoint=None, multi_gpu=False, norm_fit=None, skip_val_without_es=True)

Builds and trains a model from hyperparameters.

Parameters
  • train_dts (slideflow.dataset.Dataset) – Dataset containing TFRecords for training.

  • val_dts (slideflow.dataset.Dataset) – Dataset containing TFRecords for validation.

  • log_frequency (int, optional) – How often to update Tensorboard logs, in batches. Defaults to 100.

  • validate_on_batch (int, optional) – Validation will also be performed every N batches. Defaults to 0.

  • validation_batch_size (int, optional) – Validation batch size. Defaults to same as training (per self.hp).

  • validation_steps (int, optional) – Number of batches to use for each instance of validation. Defaults to 200.

  • starting_epoch (int, optional) – Starts training at the specified epoch. Defaults to 0.

  • ema_observations (int, optional) – Number of observations over which to perform exponential moving average smoothing. Defaults to 20.

  • ema_smoothing (int, optional) – Exponential average smoothing value. Defaults to 2.

  • use_tensorboard (bool, optional) – Enable tensorboard callbacks. Defaults to True.

  • steps_per_epoch_override (int, optional) – Manually set the number of steps per epoch. Defaults to None.

  • save_predictions (bool, optional) – Save tile, slide, and patient-level predictions at each evaluation. Defaults to False.

  • save_model (bool, optional) – Save models when evaluating at specified epochs. Defaults to True.

  • resume_training (str, optional) – Path to Tensorflow model to continue training. Defaults to None.

  • pretrain (str, optional) – Either ‘imagenet’ or path to Tensorflow model from which to load weights. Defaults to ‘imagenet’.

  • checkpoint (str, optional) – Path to cp.ckpt from which to load weights. Defaults to None.

Returns

Nested results dictionary containing metrics for each evaluated epoch.

trainer_from_hp

slideflow.model.trainer_from_hp(hp, **kwargs)

From the given slideflow.model.ModelParams object, returns the appropriate trainer instance (slideflow.model.Trainer, slideflow.model.LinearTrainer, or slideflow.model.CPHTrainer).

Parameters

hp (slideflow.model.ModelParams) – ModelParams object.

Keyword Arguments
  • outdir (str) – Target location for event logs and checkpoints.

  • annotations (dict) – Nested dict, mapping slide names to a dict with patient name (key ‘patient’), outcome labels (key ‘outcome_label’), and any additional slide-level inputs (key ‘input’).

  • name (str, optional) – Optional name describing the model, used for model saving. Defaults to None.

  • manifest (dict, optional) – Manifest dictionary mapping TFRecords to number of tiles. Defaults to None.

  • model_type (str, optional) – Type of model outcome, ‘categorical’ or ‘linear’. Defaults to ‘categorical’.

  • feature_sizes (list, optional) – List of sizes of input features. Required if providing additional input features as model input.

  • feature_names (list, optional) – List of names for input features. Used when permuting feature importance.

  • outcome_names (list, optional) – Name of each outcome. Defaults to “Outcome {X}” for each outcome.

  • mixed_precision (bool, optional) – Use FP16 mixed precision (rather than FP32). Defaults to True.
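To make the nested annotations format concrete, the sketch below builds a small annotations dict with the documented keys ('patient', 'outcome_label', 'input') and splits it into the per-slide dicts a trainer works with. It also shows a lookup from the model type strings returned by ModelParams.model_type() to the trainer class names in this module. The helper split_annotations, the lookup table, and all slide and patient names are hypothetical.

```python
# Assumed dispatch table from model type to trainer class name.
TRAINER_FOR_MODEL_TYPE = {
    "categorical": "Trainer",
    "linear": "LinearTrainer",
    "cph": "CPHTrainer",
}


def split_annotations(annotations):
    """Split nested annotations into labels, patients, and slide-level input."""
    labels = {slide: ann["outcome_label"] for slide, ann in annotations.items()}
    patients = {slide: ann["patient"] for slide, ann in annotations.items()}
    # The 'input' key is only present when slide-level features are used.
    slide_input = {
        slide: ann["input"] for slide, ann in annotations.items() if "input" in ann
    }
    return labels, patients, slide_input


annotations = {
    "slide_1": {"patient": "patient_A", "outcome_label": 0, "input": [0.5, 1.0]},
    "slide_2": {"patient": "patient_A", "outcome_label": 0, "input": [0.2, 0.0]},
    "slide_3": {"patient": "patient_B", "outcome_label": 1, "input": [0.9, 1.0]},
}
labels, patients, slide_input = split_annotations(annotations)
print(labels, patients, TRAINER_FOR_MODEL_TYPE["categorical"])
```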

Features

class slideflow.model.Features(path, layers='postconv', include_logits=False)

Interface for obtaining logits and features from intermediate layer activations from Slideflow models.

Use by calling on either a batch of images (returning outputs for a single batch), or by calling on a slideflow.WSI object, which will generate an array of spatially-mapped activations matching the slide.

Examples

Calling on batch of images:

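from slideflow.model import Features

interface = Features('/model/path', layers='postconv')
# train_data: any iterable yielding batches of already-processed images
for image_batch in train_data:
    # Return shape: (batch_size, num_features)
    batch_features = interface(image_batch)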

Calling on a slide:

import slideflow as sf
from slideflow.model import Features

slide = sf.slide.WSI(...)
interface = Features('/model/path', layers='postconv')
# Return shape: (slide.grid.shape[0], slide.grid.shape[1], num_features)
activations_grid = interface(slide)

Note

When this interface is called on a batch of images, no image processing or stain normalization will be performed, as it is assumed that normalization will occur during data loader image processing. When the interface is called on a slideflow.WSI, the normalization strategy will be read from the model configuration file, and normalization will be performed on image tiles extracted from the WSI. If this interface was created from an existing model and there is no model configuration file to read, a slideflow.norm.StainNormalizer object may be passed during initialization via the argument wsi_normalizer.

__init__(path, layers='postconv', include_logits=False)

Creates a features interface from a saved slideflow model which outputs feature activations at the designated layers.

Intermediate layers are returned in the order of layers. Logits are returned last.

Parameters
  • path (str) – Path to saved Slideflow model.

  • layers (list(str), optional) – Layers from which to generate activations. The post-convolution activation layer is accessed via ‘postconv’. Defaults to ‘postconv’.

  • include_logits (bool, optional) – Include logits in output. Will be returned last. Defaults to False.

classmethod from_model(model, layers='postconv', include_logits=False, wsi_normalizer=None)

Creates a features interface from a loaded slideflow model which outputs feature activations at the designated layers.

Intermediate layers are returned in the order of layers. Logits are returned last.

Parameters
  • model (tensorflow.keras.models.Model) – Loaded model.

  • layers (list(str), optional) – Layers from which to generate activations. The post-convolution activation layer is accessed via ‘postconv’. Defaults to ‘postconv’.

  • include_logits (bool, optional) – Include logits in output. Will be returned last. Defaults to False.

  • wsi_normalizer (slideflow.norm.StainNormalizer) – Stain normalizer to use on whole-slide images. Is not used on individual tile datasets via __call__. Defaults to None.

DatasetFeatures

class slideflow.model.DatasetFeatures(model, dataset, annotations=None, cache=None, manifest=None, **kwargs)

Loads annotations, saved layer activations / features, and prepares output saving directories. Will also read/write processed features to a PKL cache file to save time in future iterations.

Note

Storing logits along with layer features is optional, to offer the user reduced memory footprint. For example, saving logits for a 10,000 slide dataset with 1000 categorical outcomes would require:

4 bytes/float32-logit * 1000 logits/tile * 3000 tiles/slide * 10000 slides = 120 GB (~112 GiB)
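The estimate can be reproduced with a few lines of arithmetic. The figure of 3000 tiles per slide is the illustrative value used in the calculation above, not a property of any particular dataset.

```python
bytes_per_logit = 4        # float32
logits_per_tile = 1000     # one logit per categorical outcome
tiles_per_slide = 3000     # illustrative average
num_slides = 10_000

total_bytes = bytes_per_logit * logits_per_tile * tiles_per_slide * num_slides
total_gib = total_bytes / 2**30

print(f"{total_bytes / 1e9:.0f} GB = {total_gib:.1f} GiB")  # 120 GB = 111.8 GiB
```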

__init__(model, dataset, annotations=None, cache=None, manifest=None, **kwargs)

Calculates features / layer activations from model, storing them in the internal parameters self.activations, self.logits, and self.locations: dictionaries mapping slides to arrays of activations, logits, and locations for each slide’s constituent tiles.

Parameters
  • model (str) – Path to model from which to calculate activations.

  • dataset (slideflow.Dataset) – Dataset from which to generate activations.

  • annotations (dict, optional) – Dict mapping slide names to outcome categories.

  • cache (str, optional) – File for PKL cache.

  • manifest (dict, optional) – Dict mapping tfrecords to number of tiles contained. Used for progress bars.

Keyword Arguments
  • layers (str) – Model layer(s) from which to calculate activations. Defaults to ‘postconv’.

  • batch_size (int) – Batch size for activations calculations. Defaults to 32.

  • include_logits (bool) – Calculate and store logits. Defaults to True.

activations_by_category(idx)

For each outcome category, calculates activations of a given feature across all tiles in the category. Requires annotations to have been provided.

Parameters

idx (int) – Index of activations layer to return, stratified by outcome category.

Returns

Dict mapping categories to feature activations for all tiles in the category.

Return type

dict

box_plots(features, outdir)

Generates plots comparing node activations at slide- and tile-level.

Parameters
  • features (list(int)) – List of feature indices for which to generate box plots.

  • outdir (str) – Path to directory in which to save box plots.

export_to_csv(filename, level='tile', method='mean', slides=None)

Exports calculated activations to csv.

Parameters
  • filename (str) – Path to CSV file for export.

  • level (str) – ‘tile’ or ‘slide’. Indicates whether tile or slide-level activations are saved. Defaults to ‘tile’.

  • method (str) – Method of summarizing slide-level results. Either ‘mean’ or ‘median’. Defaults to ‘mean’.

  • slides (list(str)) – Slides to export. If None, exports all slides. Defaults to None.
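When level is 'slide', the per-tile activations of each slide are collapsed into a single vector using the chosen method. A minimal sketch of that reduction, using only the standard library, is given below; the helper name summarize_slide and the sample values are hypothetical, and the actual export code may be organized differently.

```python
from statistics import mean, median


def summarize_slide(tile_activations, method="mean"):
    """Collapse per-tile feature vectors into one slide-level vector."""
    reducers = {"mean": mean, "median": median}
    if method not in reducers:
        raise ValueError(f"method must be 'mean' or 'median', got {method!r}")
    reduce = reducers[method]
    # zip(*rows) walks the data feature by feature across all tiles.
    return [reduce(column) for column in zip(*tile_activations)]


# Three tiles, two features each (made-up values).
tiles = [[1.0, 10.0], [2.0, 20.0], [6.0, 30.0]]
print(summarize_slide(tiles, "mean"), summarize_slide(tiles, "median"))
```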

export_to_torch(outdir, slides=None)

Export activations in torch format to .pt files in the given directory.

Used for training CLAM models.

Parameters

outdir (str) – Path to directory in which to save .pt files.

logits_mean()

Calculates the mean logits vector across all tiles in each slide.

Returns

Dictionary mapping slides to the mean logits array for all tiles in each slide.

Return type

dict

logits_percent(prediction_filter=None)

Returns dictionary mapping slides to a vector of length num_logits with the percent of tiles in each slide predicted to be each outcome.

Parameters

prediction_filter – (optional) List of int. If provided, will restrict predictions to only these categories, with the final prediction based on the highest logit among these categories.

Returns

Dictionary mapping slides to an array of percentages for each logit, of length num_logits.

Return type

dict
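The calculation behind logits_percent() can be sketched for a single slide as follows: each tile is assigned the category with the highest logit (optionally restricted by prediction_filter), and the share of tiles assigned to each category is reported. This standalone function is an illustration only. It returns fractions between 0 and 1, which is an assumption about how "percent" is expressed.

```python
def logits_percent(tile_logits, prediction_filter=None):
    """Fraction of tiles in one slide predicted as each outcome category."""
    num_logits = len(tile_logits[0])
    # Restrict the argmax to the filtered categories, if a filter is given.
    if prediction_filter is None:
        allowed = list(range(num_logits))
    else:
        allowed = list(prediction_filter)
    predictions = [max(allowed, key=lambda c: logits[c]) for logits in tile_logits]
    return [predictions.count(c) / len(predictions) for c in range(num_logits)]


# Four tiles with three logits each (made-up values).
tile_logits = [
    [0.1, 0.7, 0.2],
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.2, 0.7],
]
print(logits_percent(tile_logits))
print(logits_percent(tile_logits, prediction_filter=[0, 2]))
```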

logits_predict(prediction_filter=None)

Returns slide-level predictions, assuming the model is predicting a categorical outcome, by generating a prediction for each individual tile, and making a slide-level prediction by finding the most frequently predicted outcome among its constituent tiles.

Parameters

prediction_filter – (optional) List of int. If provided, will restrict predictions to only these categories, with the final prediction based on the highest logit among these categories.

Returns

Dictionary mapping slide names to slide-level predictions.

Return type

dict
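The majority-vote rule described for logits_predict() can be sketched as follows, using a plain dict of per-slide tile logits in place of a DatasetFeatures object. The function predict_slide and the sample data are hypothetical, and how the library breaks ties between equally frequent categories is not specified here.

```python
from collections import Counter


def predict_slide(tile_logits, prediction_filter=None):
    """Most frequently predicted category among a slide's tiles."""
    num_logits = len(tile_logits[0])
    if prediction_filter is None:
        allowed = list(range(num_logits))
    else:
        allowed = list(prediction_filter)
    # One vote per tile: the allowed category with the highest logit.
    votes = Counter(max(allowed, key=lambda c: logits[c]) for logits in tile_logits)
    return votes.most_common(1)[0][0]


slide_logits = {
    "slide_1": [[0.1, 0.7, 0.2], [0.6, 0.3, 0.1], [0.2, 0.5, 0.3]],
    "slide_2": [[0.1, 0.2, 0.7], [0.3, 0.1, 0.6]],
}
predictions = {slide: predict_slide(logits) for slide, logits in slide_logits.items()}
print(predictions)
```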

map_to_predictions(x=0, y=0)

Returns coordinates and metadata for tile-level predictions for all tiles, which can be used to create a SlideMap.

Parameters
  • x (int, optional) – Outcome category id for which predictions will be mapped to the X-axis. Defaults to 0.

  • y (int, optional) – Outcome category id for which predictions will be mapped to the Y-axis. Defaults to 0.

Returns

  • list – List of x-axis coordinates (preds for the category ‘x’).

  • list – List of y-axis coordinates (preds for the category ‘y’).

  • list – List of dict containing tile-level metadata (for SlideMap).

Return type

list

merge(df)

Merges with another DatasetFeatures.

Parameters

df (slideflow.model.DatasetFeatures) – Target DatasetFeatures to merge with.

Returns

None

remove_slide(slide)

Removes slide from internally cached activations.

save_example_tiles(features, outdir, slides=None, tiles_per_feature=100)

For a set of activation features, saves image tiles named according to their corresponding activations.

Duplicate image tiles will be saved for each feature, organized into subfolders named according to feature.

Parameters
  • features (list(int)) – Features to evaluate.

  • outdir (str) – Path to folder in which to save example tiles.

  • slides (list, optional) – List of slide names. If provided, will only include tiles from these slides. Defaults to None.

  • tiles_per_feature (int, optional) – Number of tiles to include as examples for each feature. Defaults to 100. Will evenly sample this many tiles across the activation gradient.
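Sampling "evenly across the activation gradient" can be pictured as sorting tiles by their activation for a feature and then taking evenly spaced positions along that ordering. The sketch below shows one way to do this; the helper sample_across_gradient and its rounding scheme are assumptions, not the library's exact procedure.

```python
def sample_across_gradient(activations, n):
    """Pick up to n tile indices evenly spaced along the sorted activations."""
    # Tile indices ordered from lowest to highest activation.
    order = sorted(range(len(activations)), key=lambda i: activations[i])
    if n >= len(order):
        return order
    if n == 1:
        return [order[len(order) // 2]]
    # Evenly spaced positions from the first to the last sorted tile.
    step = (len(order) - 1) / (n - 1)
    return [order[round(k * step)] for k in range(n)]


# Activations of one feature for five tiles (made-up values).
activations = [0.9, 0.1, 0.5, 0.3, 0.7]
print(sample_across_gradient(activations, 3))
```

The selected indices cover the lowest, middle, and highest activations, so the saved examples span the full range of the feature rather than clustering at one end.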

stats(outdir=None, method='mean', threshold=0.5)

Calculates activation averages across categories, as well as tile-level and patient-level statistics, using ANOVA, exporting to CSV if desired.

Parameters
  • outdir (str, optional) – Path to directory in which CSV file will be saved. Defaults to None.

  • method (str, optional) – Indicates method of aggregating tile-level data into slide-level data. Either ‘mean’ (default) or ‘threshold’. If mean, slide-level feature data is calculated by averaging feature activations across all tiles. If threshold, slide-level feature data is calculated by counting the number of tiles with feature activations > threshold and dividing by the total number of tiles. Defaults to ‘mean’.

  • threshold (float, optional) – Threshold if using ‘threshold’ method.

Returns

  • dict – Dict mapping slides to dict of slide-level features.

  • dict – Dict mapping features to tile-level statistics (‘p’, ‘f’).

  • dict – Dict mapping features to slide-level statistics (‘p’, ‘f’).

Return type

dict
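The two aggregation methods accepted by stats() differ in how tile-level activations become one slide-level value: 'mean' averages the activations, while 'threshold' reports the fraction of tiles whose activation exceeds the threshold. A minimal sketch for a single feature on a single slide follows; the function name slide_level_feature and the sample values are hypothetical.

```python
def slide_level_feature(tile_values, method="mean", threshold=0.5):
    """Aggregate one feature's tile-level activations into a slide-level value."""
    if method == "mean":
        return sum(tile_values) / len(tile_values)
    if method == "threshold":
        # Fraction of tiles with activation strictly above the threshold.
        return sum(value > threshold for value in tile_values) / len(tile_values)
    raise ValueError(f"method must be 'mean' or 'threshold', got {method!r}")


tile_values = [0.25, 0.75, 1.0, 0.5]
print(slide_level_feature(tile_values, "mean"))
print(slide_level_feature(tile_values, "threshold", threshold=0.5))
```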