slideflow.model¶
This module provides the ModelParams class to organize model and training
parameters/hyperparameters and assist with model building, as well as the Trainer class that
executes model training and evaluation. LinearTrainer and CPHTrainer
are extensions of this class, supporting linear and Cox Proportional Hazards outcomes, respectively. The function
trainer_from_hp() chooses and returns the correct trainer instance based on the provided
hyperparameters.
Note
In order to support both Tensorflow and PyTorch backends, the slideflow.model module will import either
slideflow.model.tensorflow or slideflow.model.torch according to the currently active backend,
indicated by the environment variable SF_BACKEND.
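As a rough illustration of the backend switch described in the note, the sketch below resolves a submodule path from SF_BACKEND. The helper name resolve_backend_module, the accepted values 'tensorflow' and 'torch', and the fallback default are assumptions made for this example and are not taken from the Slideflow source.

```python
import os

# Assumed mapping from SF_BACKEND values to the submodules named in the note above.
_BACKEND_MODULES = {
    "tensorflow": "slideflow.model.tensorflow",
    "torch": "slideflow.model.torch",
}

def resolve_backend_module(default="tensorflow"):
    """Return the dotted path of the backend-specific submodule (hypothetical helper)."""
    # The default used when SF_BACKEND is unset is an assumption of this sketch.
    backend = os.environ.get("SF_BACKEND", default).lower()
    if backend not in _BACKEND_MODULES:
        raise ValueError(f"Unknown SF_BACKEND: {backend!r}")
    return _BACKEND_MODULES[backend]

os.environ["SF_BACKEND"] = "torch"
print(resolve_backend_module())  # slideflow.model.torch
```

Because the backend is chosen at import time, the variable would normally be set before slideflow is imported.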
Configuring and training models¶
slideflow.model.ModelParams will build models according to a set of model parameters and a given set of
outcome labels. To change the core image convolutional model to another architecture, set the model parameter
to the custom model class.
import CustomModel
from slideflow.model import ModelParams
mp = ModelParams(model=CustomModel, ...)
Working with layer activations¶
slideflow.model.Features creates an interface to efficiently generate features/layer activations and logits
from either a batch of images (returning a batch of activations/logits) or a whole-slide image (returning a grid of
activations/logits).
slideflow.model.DatasetFeatures calculates features and logits for an entire dataset, storing
result arrays into a dictionary mapping slide names to the generated activations. This buffer of whole-dataset
activations can then be used for functions requiring analysis of whole-dataset activations, including
slideflow.SlideMap and slideflow.mosaic.Mosaic.
ModelParams¶
- class slideflow.model.ModelParams(*args, **kwargs)¶
Build a set of hyperparameters.
- __init__(*args, **kwargs)¶
Collection of hyperparameters used for model building and training
- Parameters
tile_px (int, optional) – Tile width in pixels. Defaults to 299.
tile_um (int, optional) – Tile width in microns. Defaults to 302.
epochs (int, optional) – Number of epochs to train the full model. Defaults to 3.
toplayer_epochs (int, optional) – Number of epochs to only train the fully-connected layers. Defaults to 0.
model (str, optional) – Base model architecture name. Defaults to ‘xception’.
pooling (str, optional) – Post-convolution pooling. ‘max’, ‘avg’, or ‘none’. Defaults to ‘max’.
loss (str, optional) – Loss function. Defaults to ‘sparse_categorical_crossentropy’.
learning_rate (float, optional) – Learning rate. Defaults to 0.0001.
learning_rate_decay (int, optional) – Learning rate decay rate. Defaults to 0.
learning_rate_decay_steps (int, optional) – Learning rate decay steps. Defaults to 100000.
batch_size (int, optional) – Batch size. Defaults to 16.
hidden_layers (int, optional) – Number of fully-connected hidden layers after core model. Defaults to 0.
hidden_layer_width (int, optional) – Width of fully-connected hidden layers. Defaults to 500.
optimizer (str, optional) – Name of optimizer. Defaults to ‘Adam’.
early_stop (bool, optional) – Use early stopping. Defaults to False.
early_stop_patience (int, optional) – Patience for early stopping, in epochs. Defaults to 0.
early_stop_method (str, optional) – Metric to monitor for early stopping. Defaults to ‘loss’.
manual_early_stop_epoch (int, optional) – Manually override early stopping to occur at this epoch/batch. Defaults to None.
manual_early_stop_batch (int, optional) – Manually override early stopping to occur at this epoch/batch. Defaults to None.
training_balance ([type], optional) – Type of batch-level balancing to use during training. Options include ‘tile’, ‘category’, ‘patient’, ‘slide’, and None. Defaults to ‘category’ if a categorical loss is provided, and ‘patient’ if a linear loss is provided.
validation_balance ([type], optional) – Type of batch-level balancing to use during validation. Options include ‘tile’, ‘category’, ‘patient’, ‘slide’, and None. Defaults to ‘none’.
trainable_layers (int, optional) – Number of layers which are trainable. If 0, trains all layers. Defaults to 0.
l1 (int, optional) – L1 regularization weight. Defaults to 0.
l2 (int, optional) – L2 regularization weight. Defaults to 0.
l1_dense (int, optional) – L1 regularization weight for Dense layers. Defaults to the value of l1.
l2_dense (int, optional) – L2 regularization weight for Dense layers. Defaults to the value of l2.
dropout (int, optional) – Post-convolution dropout rate. Defaults to 0.
uq (bool, optional) – Use uncertainty quantification with dropout. Requires dropout > 0. Defaults to False.
augment (str) – Image augmentations to perform. String containing characters designating augmentations. ‘x’ indicates random x-flipping, ‘y’ y-flipping, ‘r’ rotating, and ‘j’ JPEG compression/decompression at random quality levels. Passing either ‘xyrj’ or True will use all augmentations.
normalizer (str, optional) – Normalization strategy to use on image tiles. Defaults to None.
normalizer_source (str, optional) – Path to normalizer source image. Defaults to None. If None but using a normalizer, will use an internal tile for normalization. Internal default tile can be found at slideflow.slide.norm_tile.jpg
include_top (bool, optional) – Include post-convolution fully-connected layers from the core model. Defaults to True. include_top=False is not currently compatible with the PyTorch backend.
drop_images (bool, optional) – Drop images, using only other slide-level features as input. Defaults to False.
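The augment parameter packs several switches into one string. The sketch below shows one way such a string could be expanded into per-augmentation flags. The function parse_augment and the flag names are hypothetical and only mirror the documented characters ('x', 'y', 'r', 'j'); this is not how ModelParams necessarily handles the value internally.

```python
def parse_augment(augment):
    """Expand an `augment` value into per-augmentation flags (illustrative only)."""
    if augment is True:
        # Per the parameter description, True is equivalent to 'xyrj'.
        augment = "xyrj"
    elif not augment:
        # Treating None/False/'' as "no augmentation" is an assumption of this sketch.
        augment = ""
    unknown = set(augment) - set("xyrj")
    if unknown:
        raise ValueError(f"Unrecognized augmentation characters: {sorted(unknown)}")
    return {
        "flip_x": "x" in augment,   # random x-flipping
        "flip_y": "y" in augment,   # random y-flipping
        "rotate": "r" in augment,   # random rotation
        "jpeg": "j" in augment,     # JPEG compression at random quality
    }

print(parse_augment("xr"))
# {'flip_x': True, 'flip_y': False, 'rotate': True, 'jpeg': False}
```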
- build_model(labels=None, num_classes=None, **kwargs)¶
Auto-detects model type (categorical, linear, CPH) from parameters and builds, using pretraining (imagenet) or the base layers of a supplied model.
- Parameters
labels (dict, optional) – Dict mapping slide names to outcomes. Used to detect number of outcome categories.
num_classes (int or dict, optional) – Either int (single categorical outcome, indicating number of classes) or dict (dict mapping categorical outcome names to number of unique categories in each outcome). Must supply either num_classes or labels (the number of classes can be detected from labels).
num_slide_features (int, optional) – Number of slide-level features separate from image input. Defaults to 0.
activation (str, optional) – Type of final layer activation to use. Defaults to ‘softmax’ (categorical models) or ‘linear’ (linear or CPH models).
pretrain (str, optional) – Either ‘imagenet’ or path to model to use as pretraining. Defaults to ‘imagenet’.
checkpoint (str, optional) – Path to checkpoint from which to resume model training. Defaults to None.
- get_opt()¶
Returns optimizer with appropriate learning rate.
- model_type()¶
Returns either ‘linear’, ‘categorical’, or ‘cph’ depending on the loss type.
- validate()¶
Check that hyperparameter combinations are valid.
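To make the idea of validating hyperparameter combinations concrete, here is a minimal sketch that checks only two constraints stated in the parameter list above (uq requires dropout > 0, and pooling must be 'max', 'avg', or 'none'). The class HyperparameterSketch is a hypothetical stand-in, and raising ValueError is an assumption; the real validate() may check more and may report problems differently.

```python
from dataclasses import dataclass

@dataclass
class HyperparameterSketch:
    """Hypothetical stand-in for a few ModelParams fields; not the real class."""
    dropout: float = 0.0
    uq: bool = False
    pooling: str = "max"

    def validate(self):
        """Raise ValueError if a documented constraint is violated."""
        # Documented: uncertainty quantification requires dropout > 0.
        if self.uq and not self.dropout > 0:
            raise ValueError("uq=True requires dropout > 0")
        # Documented options for post-convolution pooling.
        if self.pooling not in ("max", "avg", "none"):
            raise ValueError(f"Unknown pooling: {self.pooling!r}")
        return True

print(HyperparameterSketch(dropout=0.2, uq=True).validate())  # True
```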
Trainer¶
- class slideflow.model.Trainer(hp, outdir, labels, patients, slide_input=None, name=None, manifest=None, feature_sizes=None, feature_names=None, outcome_names=None, mixed_precision=True, config=None, use_neptune=False, neptune_api=None, neptune_workspace=None)¶
Base trainer class containing functionality for model building, input processing, training, and evaluation.
This base class requires categorical outcome(s). Additional outcome types are supported by
slideflow.model.LinearTrainer and slideflow.model.CPHTrainer.
Slide-level (e.g. clinical) features can be used as additional model input by providing slide labels in the slide annotations dictionary, under the key ‘input’.
- __init__(hp, outdir, labels, patients, slide_input=None, name=None, manifest=None, feature_sizes=None, feature_names=None, outcome_names=None, mixed_precision=True, config=None, use_neptune=False, neptune_api=None, neptune_workspace=None)¶
Sets base configuration, preparing model inputs and outputs.
- Parameters
hp (slideflow.model.ModelParams) – ModelParams object.
outdir (str) – Location where event logs and checkpoints will be written.
labels (dict) – Dict mapping slide names to outcome labels (int or float format).
patients (dict) – Dict mapping slide names to patient ID, as some patients may have multiple slides. If not provided, assumes 1:1 mapping between slide names and patients.
slide_input (dict) – Dict mapping slide names to additional slide-level input, concatenated after post-conv.
name (str, optional) – Optional name describing the model, used for model saving. Defaults to None.
manifest (dict, optional) – Manifest dictionary mapping TFRecords to number of tiles. Defaults to None.
model_type (str, optional) – Type of model outcome, ‘categorical’ or ‘linear’. Defaults to ‘categorical’.
feature_sizes (list, optional) – List of sizes of input features. Required if providing additional input features as input to the model.
feature_names (list, optional) – List of names for input features. Used when permuting feature importance.
outcome_names (list, optional) – Name of each outcome. Defaults to “Outcome {X}” for each outcome.
mixed_precision (bool, optional) – Use FP16 mixed precision (rather than FP32). Defaults to True.
config (dict, optional) – Training configuration dictionary, used for logging. Defaults to None.
use_neptune (bool, optional) – Use Neptune API logging. Defaults to False.
neptune_api (str, optional) – Neptune API token, used for logging. Defaults to None.
neptune_workspace (str, optional) – Neptune workspace, used for logging. Defaults to None.
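The labels and patients arguments are both plain dictionaries keyed by slide name. The sketch below uses made-up slide and patient names to show their shape, together with the 1:1 fallback described for patients. The helper default_patients is hypothetical and only illustrates that fallback.

```python
# Hypothetical slide names and outcomes, for illustration only.
labels = {"slide_A1": 0, "slide_A2": 0, "slide_B1": 1}

# Two slides from the same patient share one patient ID.
patients = {
    "slide_A1": "patient_A",
    "slide_A2": "patient_A",
    "slide_B1": "patient_B",
}

def default_patients(labels):
    """1:1 fallback: treat every slide as its own patient (illustrative helper)."""
    return {slide: slide for slide in labels}

print(default_patients(labels))
# {'slide_A1': 'slide_A1', 'slide_A2': 'slide_A2', 'slide_B1': 'slide_B1'}
```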
- evaluate(dataset, batch_size=None, permutation_importance=False, histogram=False, save_predictions=False, norm_fit=None, uq='auto')¶
Evaluate model, saving metrics and predictions.
- Parameters
dataset (slideflow.dataset.Dataset) – Dataset containing TFRecords to evaluate.
checkpoint (list, optional) – Path to cp.ckpt checkpoint to load. Defaults to None.
batch_size (int, optional) – Evaluation batch size. Defaults to the same as training (per self.hp)
permutation_importance (bool, optional) – Run permutation feature importance to define relative benefit of histology and each clinical slide-level feature input, if provided.
histogram (bool, optional) – Save histogram of tile predictions. Poorly optimized, uses seaborn, may drastically increase evaluation time. Defaults to False.
save_predictions (bool, optional) – Save tile, slide, and patient-level predictions to CSV. Defaults to False.
- Returns
Dictionary of evaluation metrics.
- predict(dataset, batch_size=None, norm_fit=None, format='csv')¶
Perform inference on a model, saving tile-level predictions.
- Parameters
dataset (slideflow.dataset.Dataset) – Dataset containing TFRecords to evaluate.
batch_size (int, optional) – Evaluation batch size. Defaults to the same as training (per self.hp)
format (str, optional) – Format in which to save predictions. Either ‘csv’ or ‘feather’. Defaults to ‘csv’.
- Returns
pandas.DataFrame of tile-level predictions.
- train(train_dts, val_dts, log_frequency=100, validate_on_batch=0, validation_batch_size=None, validation_steps=200, starting_epoch=0, ema_observations=20, ema_smoothing=2, use_tensorboard=True, steps_per_epoch_override=None, save_predictions=False, save_model=True, resume_training=None, pretrain='imagenet', checkpoint=None, multi_gpu=False, norm_fit=None, skip_val_without_es=True)¶
Builds and trains a model from hyperparameters.
- Parameters
train_dts (slideflow.dataset.Dataset) – Dataset containing TFRecords for training.
val_dts (slideflow.dataset.Dataset) – Dataset containing TFRecords for validation.
log_frequency (int, optional) – How frequently to update Tensorboard logs, in batches. Defaults to 100.
validate_on_batch (int, optional) – Validation will also be performed every N batches. Defaults to 0.
validation_batch_size (int, optional) – Validation batch size. Defaults to same as training (per self.hp).
validation_steps (int, optional) – Number of batches to use for each instance of validation. Defaults to 200.
starting_epoch (int, optional) – Starts training at the specified epoch. Defaults to 0.
ema_observations (int, optional) – Number of observations over which to perform exponential moving average smoothing. Defaults to 20.
ema_smoothing (int, optional) – Exponential average smoothing value. Defaults to 2.
use_tensorboard (bool, optional) – Enable tensorboard callbacks. Defaults to True.
steps_per_epoch_override (int, optional) – Manually set the number of steps per epoch. Defaults to None.
save_predictions (bool, optional) – Save tile, slide, and patient-level predictions at each evaluation. Defaults to False.
save_model (bool, optional) – Save models when evaluating at specified epochs. Defaults to True.
resume_training (str, optional) – Path to Tensorflow model to continue training. Defaults to None.
pretrain (str, optional) – Either ‘imagenet’ or path to Tensorflow model from which to load weights. Defaults to ‘imagenet’.
checkpoint (str, optional) – Path to cp.ckpt from which to load weights. Defaults to None.
- Returns
Nested results dictionary containing metrics for each evaluated epoch.
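The ema_observations and ema_smoothing arguments suggest an exponential moving average over a validation metric. The sketch below uses one common formulation, with weight alpha = smoothing / (1 + observations). Whether Trainer.train applies exactly this formula, and how it seeds the average, is an assumption here; the function names are hypothetical.

```python
def ema_update(value, previous_ema, observations=20, smoothing=2):
    """One step of an exponential moving average (common formulation, assumed)."""
    alpha = smoothing / (1 + observations)
    return value * alpha + previous_ema * (1 - alpha)

def smooth(values, observations=20, smoothing=2):
    """Smooth a non-empty sequence, seeding with its first value (a choice of this sketch)."""
    ema = values[0]
    smoothed = [ema]
    for value in values[1:]:
        ema = ema_update(value, ema, observations, smoothing)
        smoothed.append(ema)
    return smoothed

# Hypothetical validation losses: the spike at index 2 is damped in the smoothed series.
losses = [1.0, 0.9, 1.5, 0.8, 0.7]
print([round(v, 3) for v in smooth(losses)])
```

With the defaults (20 observations, smoothing of 2), each new value contributes 2/21 of the updated average, so a single noisy validation step moves the smoothed metric only slightly.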
LinearTrainer¶
- class slideflow.model.LinearTrainer(*args, **kwargs)¶
Extends the base slideflow.model.Trainer class to add support for linear outcomes. Requires that all outcomes be linear, with appropriate linear loss function. Uses R-squared as the evaluation metric, rather than AUROC.
- __init__(*args, **kwargs)¶
Sets base configuration, preparing model inputs and outputs.
- Parameters
hp (slideflow.model.ModelParams) – ModelParams object.
outdir (str) – Location where event logs and checkpoints will be written.
labels (dict) – Dict mapping slide names to outcome labels (int or float format).
patients (dict) – Dict mapping slide names to patient ID, as some patients may have multiple slides. If not provided, assumes 1:1 mapping between slide names and patients.
slide_input (dict) – Dict mapping slide names to additional slide-level input, concatenated after post-conv.
name (str, optional) – Optional name describing the model, used for model saving. Defaults to None.
manifest (dict, optional) – Manifest dictionary mapping TFRecords to number of tiles. Defaults to None.
model_type (str, optional) – Type of model outcome, ‘categorical’ or ‘linear’. Defaults to ‘categorical’.
feature_sizes (list, optional) – List of sizes of input features. Required if providing additional input features as input to the model.
feature_names (list, optional) – List of names for input features. Used when permuting feature importance.
outcome_names (list, optional) – Name of each outcome. Defaults to “Outcome {X}” for each outcome.
mixed_precision (bool, optional) – Use FP16 mixed precision (rather than FP32). Defaults to True.
config (dict, optional) – Training configuration dictionary, used for logging. Defaults to None.
use_neptune (bool, optional) – Use Neptune API logging. Defaults to False.
neptune_api (str, optional) – Neptune API token, used for logging. Defaults to None.
neptune_workspace (str, optional) – Neptune workspace, used for logging. Defaults to None.
- evaluate(dataset, batch_size=None, permutation_importance=False, histogram=False, save_predictions=False, norm_fit=None, uq='auto')¶
Evaluate model, saving metrics and predictions.
- Parameters
dataset (slideflow.dataset.Dataset) – Dataset containing TFRecords to evaluate.
checkpoint (list, optional) – Path to cp.ckpt checkpoint to load. Defaults to None.
batch_size (int, optional) – Evaluation batch size. Defaults to the same as training (per self.hp)
permutation_importance (bool, optional) – Run permutation feature importance to define relative benefit of histology and each clinical slide-level feature input, if provided.
histogram (bool, optional) – Save histogram of tile predictions. Poorly optimized, uses seaborn, may drastically increase evaluation time. Defaults to False.
save_predictions (bool, optional) – Save tile, slide, and patient-level predictions to CSV. Defaults to False.
- Returns
Dictionary of evaluation metrics.
- predict(dataset, batch_size=None, norm_fit=None, format='csv')¶
Perform inference on a model, saving tile-level predictions.
- Parameters
dataset (slideflow.dataset.Dataset) – Dataset containing TFRecords to evaluate.
batch_size (int, optional) – Evaluation batch size. Defaults to the same as training (per self.hp)
format (str, optional) – Format in which to save predictions. Either ‘csv’ or ‘feather’. Defaults to ‘csv’.
- Returns
pandas.DataFrame of tile-level predictions.
- train(train_dts, val_dts, log_frequency=100, validate_on_batch=0, validation_batch_size=None, validation_steps=200, starting_epoch=0, ema_observations=20, ema_smoothing=2, use_tensorboard=True, steps_per_epoch_override=None, save_predictions=False, save_model=True, resume_training=None, pretrain='imagenet', checkpoint=None, multi_gpu=False, norm_fit=None, skip_val_without_es=True)¶
Builds and trains a model from hyperparameters.
- Parameters
train_dts (slideflow.dataset.Dataset) – Dataset containing TFRecords for training.
val_dts (slideflow.dataset.Dataset) – Dataset containing TFRecords for validation.
log_frequency (int, optional) – How frequently to update Tensorboard logs, in batches. Defaults to 100.
validate_on_batch (int, optional) – Validation will also be performed every N batches. Defaults to 0.
validation_batch_size (int, optional) – Validation batch size. Defaults to same as training (per self.hp).
validation_steps (int, optional) – Number of batches to use for each instance of validation. Defaults to 200.
starting_epoch (int, optional) – Starts training at the specified epoch. Defaults to 0.
ema_observations (int, optional) – Number of observations over which to perform exponential moving average smoothing. Defaults to 20.
ema_smoothing (int, optional) – Exponential average smoothing value. Defaults to 2.
use_tensorboard (bool, optional) – Enable tensorboard callbacks. Defaults to True.
steps_per_epoch_override (int, optional) – Manually set the number of steps per epoch. Defaults to None.
save_predictions (bool, optional) – Save tile, slide, and patient-level predictions at each evaluation. Defaults to False.
save_model (bool, optional) – Save models when evaluating at specified epochs. Defaults to True.
resume_training (str, optional) – Path to Tensorflow model to continue training. Defaults to None.
pretrain (str, optional) – Either ‘imagenet’ or path to Tensorflow model from which to load weights. Defaults to ‘imagenet’.
checkpoint (str, optional) – Path to cp.ckpt from which to load weights. Defaults to None.
- Returns
Nested results dictionary containing metrics for each evaluated epoch.
CPHTrainer¶
- class slideflow.model.CPHTrainer(*args, **kwargs)¶
Cox Proportional Hazards model. Requires that the user provide event data as the first input feature, and time to outcome as the linear outcome. Uses concordance index as the evaluation metric.
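For readers unfamiliar with the concordance index, the sketch below computes a simple pairwise version (in the spirit of Harrell's C) from survival times, event indicators, and predicted risks. It is a generic illustration rather than the implementation used by CPHTrainer; the function name and the handling of ties are choices made for this example.

```python
from itertools import combinations

def concordance_index(times, events, risks):
    """Fraction of comparable pairs in which the higher-risk subject fails first."""
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # tied times are skipped in this simplified sketch
        first, second = (i, j) if times[i] < times[j] else (j, i)
        if not events[first]:
            continue  # earlier subject was censored, so the pair is not comparable
        comparable += 1
        if risks[first] > risks[second]:
            concordant += 1.0
        elif risks[first] == risks[second]:
            concordant += 0.5  # tied risks count as half-concordant
    return concordant / comparable if comparable else float("nan")

# Hypothetical data: event=1 means the outcome was observed, 0 means censored.
times = [5, 10, 15]
events = [1, 1, 0]
print(concordance_index(times, events, [0.9, 0.5, 0.1]))  # 1.0
print(concordance_index(times, events, [0.1, 0.5, 0.9]))  # 0.0
```

A value of 0.5 corresponds to random ordering, and 1.0 means predicted risk ranks every comparable pair correctly.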
- __init__(*args, **kwargs)¶
Sets base configuration, preparing model inputs and outputs.
- Parameters
hp (slideflow.model.ModelParams) – ModelParams object.
outdir (str) – Location where event logs and checkpoints will be written.
labels (dict) – Dict mapping slide names to outcome labels (int or float format).
patients (dict) – Dict mapping slide names to patient ID, as some patients may have multiple slides. If not provided, assumes 1:1 mapping between slide names and patients.
slide_input (dict) – Dict mapping slide names to additional slide-level input, concatenated after post-conv.
name (str, optional) – Optional name describing the model, used for model saving. Defaults to None.
manifest (dict, optional) – Manifest dictionary mapping TFRecords to number of tiles. Defaults to None.
model_type (str, optional) – Type of model outcome, ‘categorical’ or ‘linear’. Defaults to ‘categorical’.
feature_sizes (list, optional) – List of sizes of input features. Required if providing additional input features as input to the model.
feature_names (list, optional) – List of names for input features. Used when permuting feature importance.
outcome_names (list, optional) – Name of each outcome. Defaults to “Outcome {X}” for each outcome.
mixed_precision (bool, optional) – Use FP16 mixed precision (rather than FP32). Defaults to True.
config (dict, optional) – Training configuration dictionary, used for logging. Defaults to None.
use_neptune (bool, optional) – Use Neptune API logging. Defaults to False.
neptune_api (str, optional) – Neptune API token, used for logging. Defaults to None.
neptune_workspace (str, optional) – Neptune workspace, used for logging. Defaults to None.
- evaluate(dataset, batch_size=None, permutation_importance=False, histogram=False, save_predictions=False, norm_fit=None, uq='auto')¶
Evaluate model, saving metrics and predictions.
- Parameters
dataset (slideflow.dataset.Dataset) – Dataset containing TFRecords to evaluate.
checkpoint (list, optional) – Path to cp.ckpt checkpoint to load. Defaults to None.
batch_size (int, optional) – Evaluation batch size. Defaults to the same as training (per self.hp)
permutation_importance (bool, optional) – Run permutation feature importance to define relative benefit of histology and each clinical slide-level feature input, if provided.
histogram (bool, optional) – Save histogram of tile predictions. Poorly optimized, uses seaborn, may drastically increase evaluation time. Defaults to False.
save_predictions (bool, optional) – Save tile, slide, and patient-level predictions to CSV. Defaults to False.
- Returns
Dictionary of evaluation metrics.
- predict(dataset, batch_size=None, norm_fit=None, format='csv')¶
Perform inference on a model, saving tile-level predictions.
- Parameters
dataset (slideflow.dataset.Dataset) – Dataset containing TFRecords to evaluate.
batch_size (int, optional) – Evaluation batch size. Defaults to the same as training (per self.hp)
format (str, optional) – Format in which to save predictions. Either ‘csv’ or ‘feather’. Defaults to ‘csv’.
- Returns
pandas.DataFrame of tile-level predictions.
- train(train_dts, val_dts, log_frequency=100, validate_on_batch=0, validation_batch_size=None, validation_steps=200, starting_epoch=0, ema_observations=20, ema_smoothing=2, use_tensorboard=True, steps_per_epoch_override=None, save_predictions=False, save_model=True, resume_training=None, pretrain='imagenet', checkpoint=None, multi_gpu=False, norm_fit=None, skip_val_without_es=True)¶
Builds and trains a model from hyperparameters.
- Parameters
train_dts (slideflow.dataset.Dataset) – Dataset containing TFRecords for training.
val_dts (slideflow.dataset.Dataset) – Dataset containing TFRecords for validation.
log_frequency (int, optional) – How frequently to update Tensorboard logs, in batches. Defaults to 100.
validate_on_batch (int, optional) – Validation will also be performed every N batches. Defaults to 0.
validation_batch_size (int, optional) – Validation batch size. Defaults to same as training (per self.hp).
validation_steps (int, optional) – Number of batches to use for each instance of validation. Defaults to 200.
starting_epoch (int, optional) – Starts training at the specified epoch. Defaults to 0.
ema_observations (int, optional) – Number of observations over which to perform exponential moving average smoothing. Defaults to 20.
ema_smoothing (int, optional) – Exponential average smoothing value. Defaults to 2.
use_tensorboard (bool, optional) – Enable tensorboard callbacks. Defaults to True.
steps_per_epoch_override (int, optional) – Manually set the number of steps per epoch. Defaults to None.
save_predictions (bool, optional) – Save tile, slide, and patient-level predictions at each evaluation. Defaults to False.
save_model (bool, optional) – Save models when evaluating at specified epochs. Defaults to True.
resume_training (str, optional) – Path to Tensorflow model to continue training. Defaults to None.
pretrain (str, optional) – Either ‘imagenet’ or path to Tensorflow model from which to load weights. Defaults to ‘imagenet’.
checkpoint (str, optional) – Path to cp.ckpt from which to load weights. Defaults to None.
- Returns
Nested results dictionary containing metrics for each evaluated epoch.
trainer_from_hp¶
- slideflow.model.trainer_from_hp(hp, **kwargs)¶
From the given slideflow.model.ModelParams object, returns the appropriate trainer instance (slideflow.model.Trainer, slideflow.model.LinearTrainer, or slideflow.model.CPHTrainer).
- Parameters
hp (slideflow.model.ModelParams) – ModelParams object.
- Keyword Arguments
outdir (str) – Target location for event logs and checkpoints.
annotations (dict) – Nested dict, mapping slide names to a dict with patient name (key ‘patient’), outcome labels (key ‘outcome_label’), and any additional slide-level inputs (key ‘input’).
name (str, optional) – Optional name describing the model, used for model saving. Defaults to None.
manifest (dict, optional) – Manifest dictionary mapping TFRecords to number of tiles. Defaults to None.
model_type (str, optional) – Type of model outcome, ‘categorical’ or ‘linear’. Defaults to ‘categorical’.
feature_sizes (list, optional) – List of sizes of input features. Required if providing additional input features as model input.
feature_names (list, optional) – List of names for input features. Used when permuting feature importance.
outcome_names (list, optional) – Name of each outcome. Defaults to “Outcome {X}” for each outcome.
mixed_precision (bool, optional) – Use FP16 mixed precision (rather than FP32). Defaults to True.
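The nested annotations dictionary and the choice of trainer can be pictured with the sketch below. The three classes are empty stand-ins so the example runs without Slideflow installed, the slide and patient names are made up, and select_trainer_class is a hypothetical helper. The mapping from model type ('categorical', 'linear', 'cph') to trainer follows the class descriptions on this page, not the library source.

```python
# Stand-in classes so the sketch runs without Slideflow; not the real trainers.
class Trainer: ...
class LinearTrainer(Trainer): ...
class CPHTrainer(Trainer): ...

# Model types as returned by ModelParams.model_type(), mapped to a trainer class.
_TRAINERS = {
    "categorical": Trainer,
    "linear": LinearTrainer,
    "cph": CPHTrainer,
}

def select_trainer_class(model_type):
    """Pick a trainer class for a model type (hypothetical helper)."""
    try:
        return _TRAINERS[model_type]
    except KeyError:
        raise ValueError(f"Unknown model type: {model_type!r}") from None

# Hypothetical annotations, using the documented keys for each slide.
annotations = {
    "slide_A1": {"patient": "patient_A", "outcome_label": 0, "input": [0.5, 1.0]},
    "slide_B1": {"patient": "patient_B", "outcome_label": 1, "input": [0.1, 0.0]},
}

print(select_trainer_class("cph").__name__)  # CPHTrainer
```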
Features¶
- class slideflow.model.Features(path, layers='postconv', include_logits=False)¶
Interface for obtaining logits and features from intermediate layer activations from Slideflow models.
Use by calling on either a batch of images (returning outputs for a single batch), or by calling on a slideflow.WSI object, which will generate an array of spatially-mapped activations matching the slide.
- Examples
Calling on batch of images:
interface = Features('/model/path', layers='postconv')
for image_batch in train_data:
    # Return shape: (batch_size, num_features)
    batch_features = interface(image_batch)
Calling on a slide:
slide = sf.slide.WSI(...)
interface = Features('/model/path', layers='postconv')
# Return shape: (slide.grid.shape[0], slide.grid.shape[1], num_features):
activations_grid = interface(slide)
Note
When this interface is called on a batch of images, no image processing or stain normalization will be performed, as it is assumed that normalization will occur during data loader image processing. When the interface is called on a slideflow.WSI, the normalization strategy will be read from the model configuration file, and normalization will be performed on image tiles extracted from the WSI. If this interface was created from an existing model and there is no model configuration file to read, a slideflow.norm.StainNormalizer object may be passed during initialization via the argument wsi_normalizer.
- __init__(path, layers='postconv', include_logits=False)¶
Creates a features interface from a saved slideflow model which outputs feature activations at the designated layers.
Intermediate layers are returned in the order of layers. Logits are returned last.
- Parameters
path (str) – Path to saved Slideflow model.
layers (list(str), optional) – Layers from which to generate activations. The post-convolution activation layer is accessed via ‘postconv’. Defaults to ‘postconv’.
include_logits (bool, optional) – Include logits in output. Will be returned last. Defaults to False.
- classmethod from_model(model, layers='postconv', include_logits=False, wsi_normalizer=None)¶
Creates a features interface from a loaded slideflow model which outputs feature activations at the designated layers.
Intermediate layers are returned in the order of layers. Logits are returned last.
- Parameters
model (tensorflow.keras.models.Model) – Loaded model.
layers (list(str), optional) – Layers from which to generate activations. The post-convolution activation layer is accessed via ‘postconv’. Defaults to ‘postconv’.
include_logits (bool, optional) – Include logits in output. Will be returned last. Defaults to False.
wsi_normalizer (slideflow.norm.StainNormalizer) – Stain normalizer to use on whole-slide images. Is not used on individual tile datasets via __call__. Defaults to None.
DatasetFeatures¶
- class slideflow.model.DatasetFeatures(model, dataset, annotations=None, cache=None, manifest=None, **kwargs)¶
Loads annotations, saved layer activations / features, and prepares output saving directories. Will also read/write processed features to a PKL cache file to save time in future iterations.
Note
Storing logits along with layer features is optional, to offer the user a reduced memory footprint. For example, saving logits for a 10,000 slide dataset with 1000 categorical outcomes would require:
4 bytes/float32-logit * 1000 logits/tile * 3000 tiles/slide * 10000 slides = 120 GB (~112 GiB)
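The estimate above can be checked with a few lines of arithmetic. The product comes to 120 GB in decimal units, and the figure of roughly 112 corresponds to binary gigabytes (GiB). The variable names are illustrative, and the 3000 tiles per slide is the example value from the note.

```python
bytes_per_logit = 4        # float32
logits_per_tile = 1000     # one logit per categorical outcome
tiles_per_slide = 3000     # example value from the note
num_slides = 10_000

total_bytes = bytes_per_logit * logits_per_tile * tiles_per_slide * num_slides

print(f"{total_bytes / 1e9:.0f} GB")      # 120 GB
print(f"{total_bytes / 2**30:.1f} GiB")   # 111.8 GiB
```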
- __init__(model, dataset, annotations=None, cache=None, manifest=None, **kwargs)¶
Calculates features / layer activations from model, storing to internal parameters self.activations, self.logits, and self.locations: dictionaries mapping slides to arrays of activations, logits, and locations for each slide’s constituent tiles.
- Parameters
model (str) – Path to model from which to calculate activations.
dataset (slideflow.Dataset) – Dataset from which to generate activations.
annotations (dict, optional) – Dict mapping slide names to outcome categories.
cache (str, optional) – File for PKL cache.
manifest (dict, optional) – Dict mapping tfrecords to number of tiles contained. Used for progress bars.
- Keyword Arguments
- activations_by_category(idx)¶
For each outcome category, calculates activations of a given feature across all tiles in the category. Requires annotations to have been provided.
- box_plots(features, outdir)¶
Generates plots comparing node activations at slide- and tile-level.
- export_to_csv(filename, level='tile', method='mean', slides=None)¶
Exports calculated activations to csv.
- Parameters
filename (str) – Path to CSV file for export.
level (str) – ‘tile’ or ‘slide’. Indicates whether tile or slide-level activations are saved. Defaults to ‘tile’.
method (str) – Method of summarizing slide-level results. Either ‘mean’ or ‘median’. Defaults to ‘mean’.
slides (list(str)) – Slides to export. If None, exports all slides. Defaults to None.
- export_to_torch(outdir, slides=None)¶
Export activations in torch format to .pt files in the given directory.
Used for training CLAM models.
- Parameters
outdir (str) – Path to directory in which to save .pt files.
- logits_mean()¶
Calculates the mean logits vector across all tiles in each slide.
- Returns
Dictionary mapping slides to the mean logits array for all tiles in each slide.
- Return type
dict
- logits_percent(prediction_filter=None)¶
Returns dictionary mapping slides to a vector of length num_logits with the percent of tiles in each slide predicted to be each outcome.
- Parameters
prediction_filter – (optional) List of int. If provided, will restrict predictions to only these categories, with final prediction being based on highest logit among these categories.
- Returns
Dictionary mapping slides to an array of percentages for each logit, of length num_logits.
- Return type
dict
- logits_predict(prediction_filter=None)¶
Returns slide-level predictions, assuming the model is predicting a categorical outcome, by generating a prediction for each individual tile, and making a slide-level prediction by finding the most frequently predicted outcome among its constituent tiles.
- Parameters
prediction_filter – (optional) List of int. If provided, will restrict predictions to only these categories, with final prediction based on highest logit among these categories.
- Returns
Dictionary mapping slide names to slide-level predictions.
- Return type
dict
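The tile-to-slide aggregation described for logits_percent() and logits_predict() can be sketched in plain Python as follows. The functions here operate on one slide's list of tile logits and are hypothetical re-implementations for illustration. Fractions are returned rather than percentages multiplied by 100, and ties in the vote go to the category seen first; both are choices of this sketch rather than documented behavior.

```python
from collections import Counter

def tile_predictions(tile_logits, prediction_filter=None):
    """Index of the highest logit for each tile, optionally restricted to some categories."""
    num_logits = len(tile_logits[0])
    allowed = list(range(num_logits)) if prediction_filter is None else list(prediction_filter)
    return [max(allowed, key=lambda c: logits[c]) for logits in tile_logits]

def logits_percent(tile_logits, prediction_filter=None):
    """Fraction of tiles predicted as each category (vector of length num_logits)."""
    preds = tile_predictions(tile_logits, prediction_filter)
    counts = Counter(preds)
    return [counts[c] / len(preds) for c in range(len(tile_logits[0]))]

def logits_predict(tile_logits, prediction_filter=None):
    """Slide-level prediction: the most frequently predicted category among tiles."""
    preds = tile_predictions(tile_logits, prediction_filter)
    return Counter(preds).most_common(1)[0][0]

# Hypothetical logits for one slide with four tiles and three categories.
slide_logits = {
    "slide_A1": [[0.1, 0.7, 0.2], [0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.1, 0.2, 0.7]],
}

print({s: logits_percent(t) for s, t in slide_logits.items()})
# {'slide_A1': [0.25, 0.5, 0.25]}
print({s: logits_predict(t) for s, t in slide_logits.items()})
# {'slide_A1': 1}
print({s: logits_predict(t, prediction_filter=[0, 2]) for s, t in slide_logits.items()})
# {'slide_A1': 2}
```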
- map_to_predictions(x=0, y=0)¶
Returns coordinates and metadata for tile-level predictions for all tiles, which can be used to create a SlideMap.
- Parameters
- Returns
list: List of x-axis coordinates (preds for the category ‘x’).
list: List of y-axis coordinates (preds for the category ‘y’).
list: List of dict containing tile-level metadata (for SlideMap).
- Return type
- merge(df)¶
Merges with another DatasetFeatures.
- Parameters
df (slideflow.model.DatasetFeatures) – Target DatasetFeatures to merge with.
- Returns
None
- remove_slide(slide)¶
Removes slide from internally cached activations.
- save_example_tiles(features, outdir, slides=None, tiles_per_feature=100)¶
For a set of activation features, saves image tiles named according to their corresponding activations.
Duplicate image tiles will be saved for each feature, organized into subfolders named according to feature.
- Parameters
outdir (str) – Path to folder in which to save example tiles.
slides (list, optional) – List of slide names. If provided, will only include tiles from these slides. Defaults to None.
tiles_per_feature (int, optional) – Number of tiles to include as examples for each feature. Defaults to 100. Will evenly sample this many tiles across the activation gradient.
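Sampling tiles "evenly across the activation gradient" could be done along the lines of the sketch below, which sorts tiles by activation and picks evenly spaced positions in that ordering. The function sample_across_gradient is hypothetical; the actual selection logic in save_example_tiles() may differ.

```python
def sample_across_gradient(activations, n):
    """Return indices of up to `n` tiles evenly spaced along the sorted activations."""
    # Tile indices ordered from lowest to highest activation.
    order = sorted(range(len(activations)), key=lambda i: activations[i])
    if n >= len(order):
        return order
    if n == 1:
        return [order[len(order) // 2]]  # middle of the gradient
    step = (len(order) - 1) / (n - 1)
    return [order[round(k * step)] for k in range(n)]

# Hypothetical activations of one feature for five tiles.
activations = [0.9, 0.1, 0.5, 0.3, 0.7]
print(sample_across_gradient(activations, 3))  # [1, 2, 0]
```

In this example the three selected tiles have the lowest, median, and highest activation, so the saved examples span the full range of the feature.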
- stats(outdir=None, method='mean', threshold=0.5)¶
Calculates activation averages across categories, as well as tile-level and patient-level statistics, using ANOVA, exporting to CSV if desired.
- Parameters
outdir (str, optional) – Path to directory in which CSV file will be saved. Defaults to None.
method (str, optional) – Indicates method of aggregating tile-level data into slide-level data. Either ‘mean’ or ‘threshold’. If ‘mean’, slide-level feature data is calculated by averaging feature activations across all tiles. If ‘threshold’, slide-level feature data is calculated by counting the number of tiles with feature activations > threshold and dividing by the total number of tiles. Defaults to ‘mean’.
threshold (float, optional) – Threshold if using ‘threshold’ method.
- Returns
dict: Dict mapping slides to dict of slide-level features.
dict: Dict mapping features to tile-level statistics (‘p’, ‘f’).
dict: Dict mapping features to slide-level statistics (‘p’, ‘f’).
- Return type
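The two aggregation methods accepted by stats() reduce a slide's tile-level activations for one feature to a single number. The sketch below illustrates both on made-up values. The function slide_level_feature is a hypothetical helper, and the ANOVA step itself is not shown.

```python
from statistics import mean

def slide_level_feature(tile_activations, method="mean", threshold=0.5):
    """Aggregate one feature's tile-level activations into a slide-level value."""
    if method == "mean":
        # Average activation across all tiles in the slide.
        return mean(tile_activations)
    if method == "threshold":
        # Fraction of tiles whose activation exceeds the threshold.
        above = sum(a > threshold for a in tile_activations)
        return above / len(tile_activations)
    raise ValueError(f"Unknown method: {method!r}")

# Hypothetical activations of one feature across four tiles of a slide.
tiles = [0.2, 0.6, 0.8, 0.9]
print(slide_level_feature(tiles, method="mean"))
print(slide_level_feature(tiles, method="threshold", threshold=0.5))  # 0.75
```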