
Features / layer activations

Once a model has been fully trained and evaluated, you may use the model to generate features from layer activations to gain better insight into the kinds of image features the model has learned.

Working with Layer Features

Two classes are available for working with features (intermediate layer activations) calculated from a model. The slideflow.model.Features class generates features at the tile or slide level, and the slideflow.model.DatasetFeatures class generates features for an entire dataset.

DatasetFeatures

The easiest way to get started with intermediate layer activations is the slideflow.model.DatasetFeatures class, which is used to calculate and examine activations across an entire dataset. Instantiating the class triggers the calculation and caching of layer activations, which can then be exported, viewed (as a mosaic map), or analyzed with various statistical methods. The project function slideflow.Project.generate_features() creates and returns an instance of this class.

features = P.generate_features('/path/to/trained_model')

Alternatively, you can create an instance of this class directly:

from slideflow.model import DatasetFeatures

dataset = P.dataset(299, 302)
labels, unique_outcomes = dataset.labels('HPV')

features = DatasetFeatures(
  model='/path/to/trained_model',
  dataset=dataset,
  annotations=labels
)

Tile-level feature activations for each slide can be accessed directly from slideflow.model.DatasetFeatures.activations, a dict mapping slide names to numpy arrays of shape (num_tiles, num_features). Logits are stored in slideflow.model.DatasetFeatures.logits, a dict mapping slide names to numpy arrays of shape (num_tiles, num_logits). Tile-level location data (the coordinates at which tiles were extracted from their respective source slides) is stored in slideflow.model.DatasetFeatures.locations, a dict mapping slide names to numpy arrays of shape (num_tiles, 2), with columns (x, y).
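
As a rough illustration of these data structures, the sketch below builds stand-in dictionaries with the same layout from random NumPy arrays and stacks them into dataset-wide arrays. The slide names, tile counts, and feature count are made up for the example; with a real DatasetFeatures instance you would read the activations and locations attributes instead.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in data with the documented layout; slide names and sizes are hypothetical.
num_features = 16
tiles_per_slide = {'slide_a': 5, 'slide_b': 3}
activations = {s: rng.normal(size=(n, num_features)) for s, n in tiles_per_slide.items()}
locations = {s: rng.integers(0, 10_000, size=(n, 2)) for s, n in tiles_per_slide.items()}

# Stack tile-level arrays from every slide into dataset-wide arrays,
# keeping track of which slide each row came from.
slide_names = list(activations)
all_activations = np.concatenate([activations[s] for s in slide_names])
all_locations = np.concatenate([locations[s] for s in slide_names])
tile_slides = np.concatenate([np.full(len(activations[s]), s) for s in slide_names])

print(all_activations.shape)  # (8, 16)
```

Keeping a parallel array of slide names (tile_slides here) makes it possible to trace any row of the stacked array back to its source slide.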

To return the average logits value for each slide (averaged across constituent tiles), use slideflow.model.DatasetFeatures.logits_mean(). Similarly, slideflow.model.DatasetFeatures.logits_predict() can be used to generate final slide-level logit predictions.
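
The averaging step can be sketched with plain NumPy on a stand-in logits dict. This is only an illustration of the idea, not the library's implementation: the slide names and values are invented, and the final argmax step is an assumption about one possible way to turn mean logits into a slide-level call.

```python
import numpy as np

# Stand-in for DatasetFeatures.logits: slide name -> (num_tiles, num_logits) array.
logits = {
    'slide_a': np.array([[0.9, 0.1], [0.7, 0.3], [0.8, 0.2]]),
    'slide_b': np.array([[0.2, 0.8], [0.4, 0.6]]),
}

# Average across the tile axis to get one logit vector per slide.
mean_logits = {slide: arr.mean(axis=0) for slide, arr in logits.items()}

# One possible slide-level call (an assumption): the category with the highest mean logit.
slide_prediction = {slide: int(np.argmax(m)) for slide, m in mean_logits.items()}
```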

Features across categories can be statistically compared using slideflow.model.DatasetFeatures.stats(), which will calculate and save statistics to a specified directory.

features.stats('/outdir', method='mean')
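
To give a sense of what a mean-based comparison across categories involves, here is a minimal sketch that assumes method='mean' refers to comparing slide-level feature averages. It uses synthetic activations and hypothetical slide names and labels, and it does not reproduce the statistics that stats() actually saves.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: tile-level activations and one categorical label per slide.
activations = {
    'slide_a': rng.normal(loc=0.0, size=(50, 4)),
    'slide_b': rng.normal(loc=0.0, size=(40, 4)),
    'slide_c': rng.normal(loc=1.0, size=(60, 4)),
    'slide_d': rng.normal(loc=1.0, size=(30, 4)),
}
labels = {'slide_a': 'negative', 'slide_b': 'negative',
          'slide_c': 'positive', 'slide_d': 'positive'}

# Reduce each slide to a single feature vector by averaging across its tiles.
slide_means = {s: a.mean(axis=0) for s, a in activations.items()}

# Group the slide-level vectors by category and average each feature per category.
category_means = {}
for category in sorted(set(labels.values())):
    rows = np.stack([slide_means[s] for s in slide_means if labels[s] == category])
    category_means[category] = rows.mean(axis=0)

# Per-feature difference in mean activation between the two categories.
difference = category_means['positive'] - category_means['negative']
```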

To compare layer features across outcome categories and find features which differ significantly across categories, use the slideflow.model.DatasetFeatures.box_plots() function. For example, to generate boxplots for the first 100 features:

features.box_plots(range(100), '/outdir')
_images/boxplot_example.png

Many other functions are available, as described in the documentation, slideflow.model.DatasetFeatures.

Features

The slideflow.model.Features class can be used to generate layer activations / features for a single batch of images. For example, to calculate features for a batch of images while looping through a dataset:

from slideflow.model import Features

# Pass the model path and layer name(s), matching the WSI example further down
features = Features('/model/path', layers='postconv')
for img_batch in dataset:
    postconv_features = features(img_batch)

You can return features from any combination of intermediate layers by passing layer name(s) to the layers argument. The interface can also return logits if you pass include_logits=True.

To calculate layer features across an entire slide, the same interface can be called on a slideflow.WSI object, generating a grid of activations of size (slide.grid.shape[0], slide.grid.shape[1], num_features):

from slideflow import WSI
from slideflow.model import Features

slide = WSI(...)
interface = Features('/model/path', layers='postconv')
feature_grid = interface(slide)
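
The layout of such a grid can be illustrated with NumPy alone. The sketch below places per-tile feature vectors at their (row, column) positions in a small grid. All inputs are hypothetical, and marking empty positions with NaN is an assumption made for this example; how the library represents grid positions without a tile is not covered here.

```python
import numpy as np

# Hypothetical inputs: a 3 x 4 tile grid in which only some positions hold a tile.
grid_shape = (3, 4)
num_features = 5
tile_grid_idx = np.array([[0, 0], [0, 1], [1, 2], [2, 3]])  # (row, column) of each tile
tile_features = np.arange(
    len(tile_grid_idx) * num_features, dtype=float
).reshape(-1, num_features)

# Start from an empty grid (NaN marks positions without a tile), then place
# each tile's feature vector at its grid position.
feature_grid = np.full((*grid_shape, num_features), np.nan)
feature_grid[tile_grid_idx[:, 0], tile_grid_idx[:, 1]] = tile_features

# A single feature can now be viewed as a 2D map over the slide.
feature_0_map = feature_grid[:, :, 0]
```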

Mosaic maps

To visualize the distribution of features across a dataset, a mosaic map can be created from a slideflow.model.DatasetFeatures instance. Mosaic maps are generated by taking features (layer activations) from a dataset, performing dimensionality reduction (UMAP) on the activations (via slideflow.SlideMap), and overlaying tile images onto the UMAP (via slideflow.Mosaic). By default, the post-convolutional (‘postconv’) layer is used when calculating features, but any combination of other layers can also be used. The Project class has a function that runs these steps automatically; the resulting mosaic can then be saved as an image.

# P is a slideflow.Project instance
features = P.generate_features('/path/to/trained_model')
mosaic = P.generate_mosaic(features)
mosaic.save('mosaic.png')
slideflow.Project.generate_mosaic(self, df, dataset=None, filters=None, filter_blank=None, outcomes=None, map_slide=None, show_prediction=None, restrict_pred=None, predict_on_axes=None, max_tiles=0, umap_cache=None, use_float=False, low_memory=False, **kwargs)
Generates a mosaic map by overlaying images onto mapped tiles.

Image tiles are extracted from the provided set of TFRecords, and predictions + features from layer activations are calculated using the specified model. Tiles are mapped either with UMAP of layer activations (default behavior), or by using outcome predictions for two categories, mapped to X- and Y-axis (via predict_on_axes).

Parameters
  • df (slideflow.model.DatasetFeatures) – DatasetFeatures instance containing the layer activations to map.

  • dataset (slideflow.dataset.Dataset, optional) – Dataset from which to generate mosaic. If not supplied, will generate mosaic for all tfrecords at the tile_px/tile_um matching the supplied model, optionally using filters/filter_blank.

  • filters (dict, optional) – Filters dict to use when selecting tfrecords. Defaults to None.

  • filter_blank (list, optional) – Slides blank in these columns will be excluded. Defaults to None.

  • outcomes (list, optional) – Column name in annotations file from which to read category labels.

  • map_slide (str, optional) – None (default), ‘centroid’ or ‘average’. If provided, will map slides using slide-level calculations, either mapping centroid tiles if ‘centroid’, or calculating node averages across tiles in a slide and mapping slide-level node averages, if ‘average’.

  • show_prediction (int or str, optional) – May be either int or str, corresponding to label category. Predictions for this category will be displayed on the exported UMAP plot.

  • restrict_pred (list, optional) – List of int, if provided, restrict predictions to these categories. Final tile-level prediction is made by choosing category with highest logit.

  • predict_on_axes (list, optional) – (int, int). Each int corresponds to a label category ID. If provided, predictions are generated for these two label categories; tiles are then mapped using these predictions as (x, y) coordinates, and the mosaic is generated from this map. This replaces the default UMAP.

  • max_tiles (int, optional) – Limits tiles taken from each slide. Defaults to 0.

  • umap_cache (str, optional) – Path to PKL file in which to save/cache UMAP coordinates. Defaults to None.

  • use_float (bool, optional) – Interpret labels as continuous instead of categorical. Defaults to False.

  • low_memory (bool, optional) – Limit memory during UMAP calculations. Defaults to False.

Keyword Arguments
  • resolution (str) – Mosaic map resolution. Low, medium, or high.

  • num_tiles_x (int) – Specifies the size of the mosaic map grid.

  • expanded (bool) – Controls tile assignment on grid spaces. If False, tile assignment is strict. If True, allows displaying nearby tiles if a grid is empty. Defaults to False.

  • leniency (float) – UMAP leniency. Defaults to 1.5.

Returns

Mosaic object.

Return type

slideflow.mosaic.Mosaic

_images/mosaic_example.png
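
The idea of laying tiles out on a grid over a 2D map can be sketched as follows. This is a conceptual illustration only, not Slideflow's implementation: the coordinates are random stand-ins for UMAP output, the grid is assumed to be square, and each non-empty grid cell simply keeps the tile closest to its center.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for 2D UMAP coordinates of 500 tiles (random, for illustration only).
coords = rng.uniform(0, 1, size=(500, 2))
num_tiles_x = 10  # number of grid cells along each axis (square grid assumed)

# Normalize coordinates to [0, 1) and find the grid cell containing each tile.
span = coords.max(axis=0) - coords.min(axis=0)
normalized = (coords - coords.min(axis=0)) / (span + 1e-9)
cells = np.floor(normalized * num_tiles_x).astype(int)
cells = np.clip(cells, 0, num_tiles_x - 1)

# For every non-empty cell, keep the index of the tile closest to the cell center.
centers = (cells + 0.5) / num_tiles_x
dist = np.linalg.norm(normalized - centers, axis=1)
mosaic_grid = -np.ones((num_tiles_x, num_tiles_x), dtype=int)  # -1 marks an empty cell
best = np.full((num_tiles_x, num_tiles_x), np.inf)
for i, (cx, cy) in enumerate(cells):
    if dist[i] < best[cy, cx]:
        best[cy, cx] = dist[i]
        mosaic_grid[cy, cx] = i
```

Each entry of mosaic_grid is the index of the tile whose image would be drawn in that cell. A larger num_tiles_x gives a finer grid that displays more, smaller tile images.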

To plot the underlying UMAP without overlaid images, the slideflow.SlideMap used to create the mosaic map can be accessed via slideflow.Mosaic.slide_map. You can then use the slideflow.SlideMap.save() function to save the plot:

mosaic = project.generate_mosaic(...)
mosaic.slide_map.save('umap.png')

Tiles on the plot can be labeled using slide labels from the project annotations file, using the function slideflow.SlideMap.label_by_slide(). For example, the following will label the slide map according to the categorical outcome “HPV_status” in the project annotations file:

# Get slide labels
dataset = project.dataset(tile_px=299, tile_um=302)
labels, unique_labels = dataset.labels('HPV_status')

# Create the mosaic map and access the underlying SlideMap
mosaic = project.generate_mosaic(...)

# Label the slide map with our outcome
mosaic.slide_map.label_by_slide(labels)

# Save
mosaic.slide_map.save('umap_labeled.png')

By default, all tiles in a dataset (which may be hundreds of thousands or millions of images) will be mapped onto the mosaic map. Instead of mapping every tile within a slide, you can map a single tile per slide with the argument map_slide='centroid'. This finds the tile nearest to the centroid of each slide and displays only that tile:

# Create the mosaic map, mapping only the centroid tile of each slide
mosaic = project.generate_mosaic(..., map_slide='centroid')
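
Centroid tile selection can be sketched as follows. The example assumes the centroid is the mean feature vector of a slide's tiles and that "nearest" means smallest Euclidean distance in feature space. Both are assumptions made for illustration, and the activations are random stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for DatasetFeatures.activations; slide names and sizes are hypothetical.
activations = {
    'slide_a': rng.normal(size=(20, 6)),
    'slide_b': rng.normal(size=(35, 6)),
}

centroid_tile = {}
for slide, acts in activations.items():
    centroid = acts.mean(axis=0)                         # mean feature vector of the slide
    distances = np.linalg.norm(acts - centroid, axis=1)  # distance of each tile to it
    centroid_tile[slide] = int(np.argmin(distances))     # index of the nearest tile
```

Mapping one representative tile per slide keeps the plot readable for large datasets, at the cost of hiding within-slide heterogeneity.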

There are many additional arguments that can be provided to the slideflow.Project.generate_mosaic() function to customize the mosaic and UMAP plots, and many additional functions that can be applied to slideflow.Mosaic and slideflow.SlideMap. For example, it may be interesting to view a UMAP of tiles with an added third dimension, such as the activation value of a particular penultimate layer node. With this kind of plot, one can visualize how the activation of a particular node varies across the UMAP. To make such a plot, use the save_3d_plot function of the SlideMap:

mosaic = project.generate_mosaic(...)
mosaic.slide_map.save_3d_plot('3d_plot.png', feature=497)
_images/3d_umap.png