slideflow.Project

This class provides a high-level interface that simplifies execution of pipeline functions. Nearly all pipeline tasks can be accomplished with the methods in this class, although directly interacting with the various objects in this package will enable more granular control.

class slideflow.Project(root, use_neptune=False, **project_kwargs)

Assists with project organization and execution of pipeline functions.

Standard instantiation with __init__ assumes a project already exists at a given directory, or that configuration will be supplied via kwargs. Alternatively, a project may be instantiated using from_prompt(), which interactively guides users through configuration.

Interactive instantiation:

>>> import slideflow as sf
>>> P = sf.Project.from_prompt('/project/path')
What is the project name?

Manual configuration:

>>> import slideflow as sf
>>> P = sf.Project('/project/path', name=..., ...)

__init__(root, use_neptune=False, **project_kwargs)

Initializes a project at the specified project folder, creating a new project using the specified kwargs if one does not already exist. Will create a blank annotations file with slide names if one does not exist.

Parameters

root (str) – Path to project directory.

Keyword Arguments
  • name (str) – Project name. Defaults to ‘MyProject’.

  • annotations (str) – Path to annotations CSV file. Defaults to ‘./annotations.csv’.

  • dataset_config (str) – Path to dataset configuration JSON file. Defaults to ‘./datasets.json’.

  • sources (list(str)) – List of dataset sources to include in project. Defaults to ‘source1’.

  • models_dir (str) – Path to directory in which to save models. Defaults to ‘./models’.

  • eval_dir (str) – Path to directory in which to save evaluations. Defaults to ‘./eval’.

Raises

slideflow.errors.ProjectError – if project folder does not exist, or the folder exists but kwargs are provided.

add_source(name, slides, roi, tiles, tfrecords, path=None)

Adds a dataset source to the dataset configuration file.

Parameters
  • name (str) – Dataset source name.

  • slides (str) – Path to directory containing slides.

  • roi (str) – Path to directory containing CSV ROIs.

  • tiles (str) – Path to directory for storing extracted tiles.

  • tfrecords (str) – Path to directory for storing TFRecords of tiles.

  • path (str, optional) – Path to dataset configuration file. Defaults to None. If not provided, uses project default.

property annotations

Path to annotations file.

associate_slide_names()

Automatically associate patients with slides in the annotations.

create_blank_annotations(filename=None)

Creates an empty annotations file.

Parameters

filename (str) – Annotations file destination. If not provided, will use project default.

create_hp_sweep(filename='sweep.json', label=None, **kwargs)

Prepares a hyperparameter sweep, saving it to a JSON file (“sweep.json” by default).

Parameters
  • label (str, optional) – Label to use when naming models in sweep. Defaults to None.

  • filename (str, optional) – Filename for hyperparameter sweep. Overwrites existing files. Saves in project root directory. Defaults to “sweep.json”.
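
A hyperparameter sweep can be thought of as every combination of the candidate values supplied for each hyperparameter. The sketch below illustrates that expansion in plain Python and does not call slideflow. It assumes that **kwargs carries one list of candidate values per hyperparameter; the helper name expand_sweep, the model naming scheme, and the example hyperparameter names are hypothetical, and the layout of the file actually written by create_hp_sweep() is not described here.

```python
import itertools
import json


def expand_sweep(label=None, **candidates):
    """Expand lists of candidate values into one dict per combination.

    Hypothetical helper for illustration only; not part of slideflow.
    """
    names = sorted(candidates)
    combos = itertools.product(*(candidates[n] for n in names))
    sweep = []
    for i, values in enumerate(combos):
        # Assumed naming scheme: optional label followed by a running index.
        model_name = f"{label}-HP{i}" if label else f"HP{i}"
        sweep.append({model_name: dict(zip(names, values))})
    return sweep


sweep = expand_sweep(label="demo", learning_rate=[1e-3, 1e-4], batch_size=[32, 64])
print(json.dumps(sweep, indent=2))
```

Two candidate values for each of two hyperparameters yield four configurations, so sweep size grows multiplicatively with each added hyperparameter.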

dataset(tile_px=None, tile_um=None, verification='both', **kwargs)

Returns slideflow.Dataset object using project settings.

Parameters
  • tile_px (int) – Tile size in pixels

  • tile_um (int) – Tile size in microns

Keyword Arguments
  • filters (dict, optional) – Filters for selecting tfrecords. Defaults to None.

  • filter_blank (list, optional) – Exclude slides blank in these cols. Defaults to None.

  • min_tiles (int, optional) – Min tiles a slide must have. Defaults to 0.

  • config (str, optional) – Path to dataset configuration JSON file. Defaults to project default.

  • sources (str, list(str), optional) – Dataset sources to use from configuration. Defaults to project default.

  • verification (str, optional) – ‘tfrecords’, ‘slides’, or ‘both’. If ‘slides’, verify all annotations are mapped to slides. If ‘tfrecords’, check that TFRecords exist and update manifest. Defaults to ‘both’.
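
The filters and filter_blank arguments recur throughout this class. The sketch below shows one plausible reading of their semantics in plain Python: filters maps an annotation column header to the accepted value(s), and filter_blank lists columns that must not be empty. The helper select_slides and the column names ('slide', 'site', 'grade') are hypothetical, and the actual selection logic inside slideflow may differ.

```python
def select_slides(rows, filters=None, filter_blank=None):
    """Return slide names from annotation rows that pass the filters.

    Hypothetical helper for illustration only; not part of slideflow.
    """
    filters = filters or {}
    filter_blank = filter_blank or []
    selected = []
    for row in rows:
        keep = True
        # Keep a row only if every filtered column holds an accepted value.
        for column, accepted in filters.items():
            if not isinstance(accepted, (list, tuple, set)):
                accepted = [accepted]
            if row.get(column) not in accepted:
                keep = False
                break
        # Drop rows that are blank in any of the filter_blank columns.
        if keep and any(not row.get(column) for column in filter_blank):
            keep = False
        if keep:
            selected.append(row["slide"])
    return selected


annotations = [
    {"slide": "slide1", "site": "A", "grade": "high"},
    {"slide": "slide2", "site": "B", "grade": ""},
    {"slide": "slide3", "site": "A", "grade": ""},
]
print(select_slides(annotations, filters={"site": ["A"]}, filter_blank=["grade"]))
```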

property dataset_config

Path to dataset configuration JSON file.

property eval_dir

Path to evaluation directory.

evaluate(model, outcomes, dataset=None, filters=None, checkpoint=None, eval_k_fold=None, splits='splits.json', max_tiles=0, min_tiles=0, input_header=None, mixed_precision=True, **kwargs)

Evaluates a saved model on a given set of tfrecords.

Parameters
  • model (str) – Path to model to evaluate.

  • outcomes (str) – Str or list of str. Annotation column header specifying the outcome label(s).

  • dataset (slideflow.dataset.Dataset, optional) – Dataset from which to generate activations. If not supplied, will calculate activations for all project tfrecords at the tile_px/tile_um matching the supplied model, optionally using provided filters and filter_blank.

  • filters (dict, optional) – Filters dict to use when selecting tfrecords. Defaults to None. See get_dataset() documentation for more information on filtering.

  • checkpoint (str, optional) – Path to cp.ckpt file, if evaluating a saved checkpoint. Defaults to None.

  • eval_k_fold (int, optional) – K-fold iteration number to evaluate. Defaults to None. If None, will evaluate all tfrecords irrespective of K-fold.

  • splits (str, optional) – Filename of JSON file in which to log train/val splits. Looks for filename in project root directory. Defaults to “splits.json”.

  • max_tiles (int, optional) – Maximum number of tiles from each slide to evaluate. Defaults to 0. If zero, will include all tiles.

  • min_tiles (int, optional) – Minimum number of tiles a slide must have to be included in evaluation. Defaults to 0.

  • input_header (str, optional) – Annotation column header to use as additional input. Defaults to None.

  • mixed_precision (bool, optional) – Enable mixed precision. Defaults to True.

Keyword Arguments
  • save_predictions (bool or str, optional) – Either True, False, or any combination of ‘tile’, ‘patient’, or ‘slide’, as string or list of strings. Save tile-level, patient-level, and/or slide-level predictions. If True, will save all.

  • histogram (bool, optional) – Create tile-level histograms for each class. Defaults to False.

  • permutation_importance (bool, optional) – Calculate the permutation feature importance. Determine relative importance when using multiple model inputs. Only available for Tensorflow backend. Defaults to False.

Returns

Dictionary of keras training results, nested by epoch.

Return type

Dict

evaluate_clam(exp_name, pt_files, outcomes, tile_px, tile_um, k=0, eval_tag=None, filters=None, filter_blank=None, attention_heatmaps=True)

Evaluate CLAM model on saved activations & export attention heatmaps.

Parameters
  • exp_name (str) – Name of experiment to evaluate (subfolder in clam/)

  • pt_files (str) – Path to pt_files containing tile-level features.

  • outcomes (str or list) – Annotation column that specifies labels.

  • tile_px (int) – Tile width in pixels.

  • tile_um (int) – Tile width in microns.

  • k (int, optional) – K-fold / split iteration to evaluate. Evaluates the model saved as s_{k}_checkpoint.pt. Defaults to 0.

  • eval_tag (str, optional) – Unique identifier for this evaluation. Defaults to None.

  • filters (dict, optional) – Filters to use when selecting tfrecords. Defaults to None.

  • filter_blank (list, optional) – Exclude slides blank in these cols. Defaults to None.

  • attention_heatmaps (bool, optional) – Save attention heatmaps of validation dataset. Defaults to True.

Returns

None

extract_tiles(tile_px, tile_um, filters=None, filter_blank=None, **kwargs)

Extracts tiles from slides. Preferred use is calling slideflow.dataset.Dataset.extract_tiles() on a slideflow.dataset.Dataset directly.

Parameters
  • save_tiles (bool, optional) – Save tile images in loose format. Defaults to False.

  • save_tfrecords (bool, optional) – Save tile images as TFRecords. Defaults to True.

  • source (str, optional) – Process slides only from this source. Defaults to None (all slides in project).

  • stride_div (int, optional) – Stride divisor. Defaults to 1. A stride_div of 1 will extract non-overlapping tiles. A stride_div of 2 will extract overlapping tiles with a stride equal to 50% of the tile width.

  • enable_downsample (bool, optional) – Enable downsampling when reading slides. Defaults to True. This may result in corrupted image tiles if downsampled slide layers are corrupted or incomplete. Recommend manual confirmation of tile integrity.

  • roi_method (str, optional) – Either ‘inside’, ‘outside’ or ‘ignore’. Indicates whether tiles are extracted inside or outside ROIs, or if ROIs are ignored entirely. Defaults to ‘inside’.

  • skip_missing_roi (bool, optional) – Skip slides that are missing ROIs. Defaults to False.

  • skip_extracted (bool, optional) – Skip already extracted slides. Defaults to True.

  • tma (bool, optional) – Reads slides as Tumor Micro-Arrays (TMAs), detecting and extracting tumor cores. Defaults to False.

  • randomize_origin (bool, optional) – Randomize pixel starting position during extraction. Defaults to False.

  • buffer (str, optional) – Copy slides here before extraction. Improves processing speed if using an SSD/ramdisk buffer. Defaults to None.

  • num_workers (int, optional) – Extract tiles from this many slides simultaneously. Defaults to 1.

  • q_size (int, optional) – Queue size for buffer. Defaults to 4.

  • qc (str, optional) – ‘otsu’, ‘blur’, ‘both’, or None. Perform blur detection quality control - discarding tiles with detected out-of-focus regions or artifact - and/or otsu’s method. Defaults to None.

  • report (bool, optional) – Save a PDF report of tile extraction. Defaults to True.

Keyword Arguments
  • normalizer (str, optional) – Normalization strategy. Defaults to None.

  • normalizer_source (str, optional) – Path to normalizer source image. Defaults to None (use internal image at slide.norm_tile.jpg).

  • whitespace_fraction (float, optional) – Range 0-1. Defaults to 1. Discard tiles with this fraction of whitespace. If 1, will not perform whitespace filtering.

  • whitespace_threshold (int, optional) – Range 0-255. Threshold above which a pixel (RGB average) is considered whitespace. Defaults to 230.

  • grayspace_fraction (float, optional) – Range 0-1. Discard tiles with this fraction of grayspace. If 1, will not perform grayspace filtering. Defaults to 0.6.

  • grayspace_threshold (float, optional) – Range 0-1. Pixels in HSV format with saturation below this are considered grayspace. Defaults to 0.05.

  • img_format (str, optional) – ‘png’ or ‘jpg’. Defaults to ‘jpg’. Image format to use in tfrecords. PNG (lossless) for fidelity, JPG (lossy) for efficiency.

  • full_core (bool, optional) – Only used if extracting from TMA. Save entire TMA core as image. Otherwise, will extract sub-images from each core at the tile micron size. Defaults to False.

  • shuffle (bool, optional) – Shuffle tiles before tfrecords storage. Defaults to True.

  • num_threads (int, optional) – Threads for each tile extractor. Defaults to 4.

  • qc_blur_radius (int, optional) – Blur radius for out-of-focus area detection. Used if qc is ‘blur’ or ‘both’. Defaults to 3.

  • qc_blur_threshold (float, optional) – Blur threshold for detecting out-of-focus areas. Used if qc is ‘blur’ or ‘both’. Defaults to 0.1.

  • qc_filter_threshold (float, optional) – Float between 0-1. Tiles with more than this proportion of blur will be discarded. Used if qc is ‘blur’ or ‘both’. Defaults to 0.6.

  • qc_mpp (float, optional) – Microns-per-pixel indicating image magnification level at which quality control is performed. Defaults to mpp=4 (effective magnification 2.5 X)

  • dry_run (bool, optional) – Determine tiles that would be extracted, but do not export any images. Defaults to None.
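
The relationship between stride_div and tile overlap described above can be sketched along a single axis as follows. This is a minimal illustration, assuming the stride is the tile width divided by stride_div and that coordinates are in pixels; tile_origins is a hypothetical helper, not a slideflow function.

```python
def tile_origins(slide_width, tile_px, stride_div=1):
    """Return the x-coordinates of tile origins along one axis.

    Hypothetical helper for illustration only; not part of slideflow.
    """
    if stride_div < 1:
        raise ValueError("stride_div must be >= 1")
    # Assumed relationship: stride is the tile width divided by stride_div.
    stride = tile_px // stride_div
    return list(range(0, slide_width - tile_px + 1, stride))


print(tile_origins(1000, 250, stride_div=1))  # adjacent, non-overlapping tiles
print(tile_origins(1000, 250, stride_div=2))  # each tile overlaps its neighbor by 50%
```

With two axes, a stride_div of 2 produces roughly four times as many tiles as a stride_div of 1, which is worth keeping in mind for storage and extraction time.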

classmethod from_prompt(root, **kwargs)

Initializes project by creating project folder, prompting user for project settings, and saving to “settings.json” in project directory.

Parameters

root (str) – Path to project directory.

generate_features(model, dataset=None, filters=None, filter_blank=None, min_tiles=0, max_tiles=0, outcomes=None, torch_export=None, **kwargs)

Calculate layer features / activations.

Parameters
  • model (str) – Path to model

  • dataset (slideflow.dataset.Dataset, optional) – Dataset from which to generate activations. If not supplied, calculate activations for all tfrecords compatible with the model, optionally using provided filters and filter_blank.

  • filters (dict, optional) – Filters to use when selecting tfrecords. Defaults to None.

  • filter_blank (list, optional) – Slides blank in these columns will be excluded. Defaults to None.

  • min_tiles (int, optional) – Only include slides with this minimum number of tiles. Defaults to 0.

  • max_tiles (int, optional) – Only include maximum of this many tiles per slide. Defaults to 0 (all tiles).

  • outcomes (list, optional) – Column header(s) in annotations file. Used for category-level comparisons. Defaults to None.

  • torch_export (str, optional) – Path. Export activations to torch-compatible file at this location. Defaults to None.

Keyword Arguments
  • layers (list(str)) – Layers from which to generate activations. Defaults to ‘postconv’.

  • export (str) – Path to CSV file. Save activations in CSV format. Defaults to None.

  • cache (str) – Path to PKL file. Cache activations at this location. Defaults to None.

  • include_logits (bool) – Generate and store logit predictions along with layer activations. Defaults to True.

  • batch_size (int) – Batch size to use when calculating activations. Defaults to 32.

Return type

slideflow.model.DatasetFeatures

generate_features_for_clam(model, outdir='auto', layers='postconv', max_tiles=0, min_tiles=16, filters=None, filter_blank=None, force_regenerate=False)

Generate tile-level features for slides for use with CLAM.

Parameters
  • model (str) – Path to model from which to generate activations. May provide either this or “pt_files”

  • outdir (str, optional) – Save exported activations in .pt format. Defaults to ‘auto’ (project directory).

  • layers (list, optional) – Model layer(s) from which to generate activations. Defaults to ‘postconv’.

  • max_tiles (int, optional) – Maximum tiles to take per slide. Defaults to 0.

  • min_tiles (int, optional) – Minimum tiles per slide. Skip slides not meeting this threshold. Defaults to 16.

  • filters (dict, optional) – Filters to use when selecting tfrecords. Defaults to None.

  • filter_blank (list, optional) – Slides blank in these columns will be excluded. Defaults to None.

  • force_regenerate (bool, optional) – Forcibly regenerate activations for all slides even if .pt file exists. Defaults to False.

Returns

Path to directory containing exported .pt files

generate_heatmaps(model, filters=None, filter_blank=None, outdir=None, resolution='low', batch_size=32, roi_method='inside', buffer=None, num_threads=None, skip_completed=False, **kwargs)

Creates predictive heatmap overlays on a set of slides.

Parameters
  • model (str) – Path to Tensorflow model.

  • filters (dict, optional) – Filters to use when selecting tfrecords. Defaults to None.

  • filter_blank (list, optional) – Exclude slides blank in these cols. Defaults to None.

  • outdir (path, optional) – Directory in which to save heatmap images.

  • resolution (str, optional) – Heatmap resolution. Defaults to ‘low’. “low” uses a stride equal to tile width. “medium” uses a stride equal to 1/2 tile width. “high” uses a stride equal to 1/4 tile width.

  • batch_size (int, optional) – Batch size during heatmap calculation. Defaults to 32.

  • roi_method (str, optional) – ‘inside’, ‘outside’, or ‘none’. Determines where heatmap should be made with respect to ROI. Defaults to ‘inside’.

  • buffer (str, optional) – Path to which slides are copied prior to heatmap generation. Defaults to None.

  • num_threads (int, optional) – Number of threads for tile extraction. Defaults to CPU core count.

  • skip_completed (bool, optional) – Skip heatmaps for slides that already have heatmaps in target directory.

Keyword Arguments
  • show_roi (bool) – Show ROI on heatmaps.

  • interpolation (str) – Interpolation strategy for predictions. Defaults to None. Includes all matplotlib imshow interpolation options.

  • logit_cmap – Function or a dict used to create heatmap colormap. If None (default), separate heatmaps are generated for each category, with color representing category prediction. Each image tile will generate a list of preds of length O, where O is the number of label categories. If logit_cmap is a function, then the logit predictions will be passed to it, and the function is expected to return [R, G, B] values. If logit_cmap is a dictionary, it should map ‘r’, ‘g’, and ‘b’ to label indices; the predictions for these label categories will be mapped to the corresponding colors. Thus, the resulting color will only reflect predictions of up to three labels. Example (this would map predictions for label 0 to red, 3 to green, etc): {‘r’: 0, ‘g’: 3, ‘b’: 1 }

  • vmin (float) – Minimum value to display on heatmap. Defaults to 0.

  • vcenter (float) – Center value for color display on heatmap. Defaults to 0.5.

  • vmax (float) – Maximum value to display on heatmap. Defaults to 1.
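
The two accepted forms of logit_cmap can be illustrated as follows. This is a sketch under stated assumptions: cmap_function is one arbitrary, hypothetical color scheme, and apply_cmap_dict is a hypothetical helper showing how a dictionary of the documented shape could be interpreted; neither is slideflow code, and the library's internal handling may differ.

```python
def cmap_function(preds):
    """Function form: map a list of per-category predictions to [R, G, B].

    Hypothetical scheme: category 0 drives red, category 1 drives blue.
    """
    return [preds[0], 0.0, preds[1]]


# Dict form, as in the documented example: label 0 -> red, 3 -> green, 1 -> blue.
cmap_dict = {'r': 0, 'g': 3, 'b': 1}


def apply_cmap_dict(preds, mapping):
    """Pick the prediction for the label index assigned to each color channel.

    Hypothetical helper for illustration only; not part of slideflow.
    """
    return [preds[mapping[channel]] for channel in ('r', 'g', 'b')]


# Predictions for one tile across four label categories (O = 4).
tile_preds = [0.7, 0.1, 0.05, 0.15]
print(cmap_function(tile_preds))
print(apply_cmap_dict(tile_preds, cmap_dict))
```

The function form allows arbitrary blending of categories, while the dictionary form is limited to three label categories, one per color channel.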

generate_mosaic(df, dataset=None, filters=None, filter_blank=None, outcomes=None, map_slide=None, show_prediction=None, restrict_pred=None, predict_on_axes=None, max_tiles=0, umap_cache=None, use_float=False, low_memory=False, **kwargs)

Generates a mosaic map by overlaying images onto mapped tiles.

Image tiles are extracted from the provided set of TFRecords, and predictions + features from layer activations are calculated using the specified model. Tiles are mapped either with UMAP of layer activations (default behavior), or by using outcome predictions for two categories, mapped to X- and Y-axis (via predict_on_axes).

Parameters
  • df (slideflow.model.DatasetFeatures) – Dataset.

  • dataset (slideflow.dataset.Dataset, optional) – Dataset from which to generate mosaic. If not supplied, will generate mosaic for all tfrecords at the tile_px/tile_um matching the supplied model, optionally using filters/filter_blank.

  • filters (dict, optional) – Filters dict to use when selecting tfrecords. Defaults to None.

  • filter_blank (list, optional) – Slides blank in these columns will be excluded. Defaults to None.

  • outcomes (list, optional) – Column name in annotations file from which to read category labels.

  • map_slide (str, optional) – None (default), ‘centroid’ or ‘average’. If provided, will map slides using slide-level calculations, either mapping centroid tiles if ‘centroid’, or calculating node averages across tiles in a slide and mapping slide-level node averages, if ‘average’.

  • show_prediction (int or str, optional) – May be either int or str, corresponding to label category. Predictions for this category will be displayed on the exported UMAP plot.

  • restrict_pred (list, optional) – List of int, if provided, restrict predictions to these categories. Final tile-level prediction is made by choosing category with highest logit.

  • predict_on_axes (list, optional) – (int, int). Each int corresponds to a label category ID. If provided, predictions are generated for these two label categories; tiles are then mapped with these predictions with the pattern (x, y) and the mosaic is generated from this map. This replaces the default UMAP.

  • max_tiles (int, optional) – Limits tiles taken from each slide. Defaults to 0.

  • umap_cache (str, optional) – Path to PKL file in which to save/cache UMAP coordinates. Defaults to None.

  • use_float (bool, optional) – Interpret labels as continuous instead of categorical. Defaults to False.

  • low_memory (bool, optional) – Limit memory during UMAP calculations. Defaults to False.

Keyword Arguments
  • resolution (str) – Mosaic map resolution. Low, medium, or high.

  • num_tiles_x (int) – Specifies the size of the mosaic map grid.

  • expanded (bool) – Controls tile assignment on grid spaces. If False, tile assignment is strict. If True, allows displaying nearby tiles if a grid is empty. Defaults to False.

  • leniency (float) – UMAP leniency. Defaults to 1.5.

Returns

Mosaic object.

Return type

slideflow.mosaic.Mosaic

generate_mosaic_from_annotations(header_x, header_y, dataset, model=None, outcomes=None, max_tiles=100, use_optimal_tile=False, cache=None, batch_size=32, **kwargs)

Generates mosaic map by overlaying images onto a set of mapped tiles.

Slides are mapped with slide-level annotations, x-axis determined from header_x, y-axis from header_y. If use_optimal_tile is False and no model is provided, the first image tile in each TFRecord will be displayed. If use_optimal_tile is True, layer activations for all tiles in each slide are calculated using the provided model, and the tile nearest to centroid is used.

Parameters
  • header_x (str) – Annotations file header with X-axis coords.

  • header_y (str) – Annotations file header with Y-axis coords.

  • dataset (slideflow.dataset.Dataset) – Dataset object.

  • model (str, optional) – Path to Tensorflow model to use when generating layer activations. Defaults to None. If not provided, mosaic will not be calculated or saved. If provided, saved in project mosaic directory.

  • outcomes (list(str)) – Column name(s) in annotations file from which to read category labels.

  • filters (dict, optional) – Filters to use when selecting tfrecords. Defaults to None.

  • max_tiles (int, optional) – Limits the number of tiles taken from each slide. Defaults to 100.

  • use_optimal_tile (bool, optional) – Use model to calculate layer activations for all tiles in each slide, and choose the tile nearest the centroid of each slide for display. Defaults to False.

  • cache (str, optional) – Path to PKL file to cache node activations. Defaults to None.

  • batch_size (int, optional) – Batch size for model. Defaults to 32.

Keyword Arguments
  • resolution (str) – Resolution of the mosaic. Low, medium, or high.

  • num_tiles_x (int) – Specifies the size of the mosaic map grid.

  • expanded (bool) – Controls tile assignment on grid spaces. If False, tile assignment is strict. If True, allows displaying nearby tiles if a grid is empty. Defaults to False.

  • leniency (float) – UMAP leniency. Defaults to 1.5.
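
The "tile nearest to centroid" selection used when use_optimal_tile is True can be sketched in plain Python. This is an illustration only: it assumes the centroid is the arithmetic mean of the tile activation vectors and that nearness is measured by Euclidean distance, neither of which is stated above; nearest_to_centroid is a hypothetical helper.

```python
import math


def nearest_to_centroid(activations):
    """Return the index of the activation vector closest to the mean vector.

    Hypothetical helper for illustration only; not part of slideflow.
    """
    n = len(activations)
    dims = len(activations[0])
    # Assumed centroid: per-dimension arithmetic mean across all tiles.
    centroid = [sum(vec[d] for vec in activations) / n for d in range(dims)]
    # Assumed metric: Euclidean distance to the centroid.
    distances = [math.dist(vec, centroid) for vec in activations]
    return distances.index(min(distances))


# Two-dimensional activations for four tiles from one slide.
tiles = [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0], [1.5, 2.0]]
print(nearest_to_centroid(tiles))
```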

generate_tfrecord_heatmap(tfrecord, tile_px, tile_um, tile_dict, outdir=None)

Creates a tfrecord-based WSI heatmap using a dictionary of tile values for heatmap display, saving to project root directory.

Parameters
  • tfrecord (str) – Path to tfrecord

  • tile_dict (dict) – Dictionary mapping tfrecord indices to a tile-level value for display in heatmap format

  • tile_px (int) – Tile width in pixels

  • tile_um (int) – Tile width in microns

Returns

Dictionary mapping slide names to dict of statistics (mean, median, above_0, and above_1)

generate_thumbnails(size=512, dataset=None, filters=None, filter_blank=None, roi=False, enable_downsample=True)

Generates square slide thumbnails with black borders of fixed size, and saves to project folder.

Parameters
  • size (int, optional) – Width/height of thumbnail in pixels. Defaults to 512.

  • dataset (slideflow.dataset.Dataset, optional) – Dataset from which to generate thumbnails. If not supplied, will generate thumbnails for all slides in the project, optionally using provided filters and filter_blank.

  • filters (dict, optional) – Filters to use when selecting tfrecords. Defaults to None.

  • filter_blank (list, optional) – Exclude slides blank in these cols. Defaults to None.

  • roi (bool, optional) – Include ROI in the thumbnail images. Defaults to False.

  • enable_downsample (bool, optional) – If True and a thumbnail is not embedded in the slide file, downsampling is permitted to accelerate thumbnail calculation.

load_project(path)

Loads a saved and pre-configured project from the specified path.

property models_dir

Path to models directory.

property name

Descriptive project name.

property neptune_api

Neptune API token.

property neptune_workspace

Neptune workspace name.

predict(model, dataset=None, filters=None, checkpoint=None, eval_k_fold=None, splits='splits.json', max_tiles=0, min_tiles=0, batch_size=32, input_header=None, format='csv', mixed_precision=True, **kwargs)

Generates predictions from a saved model on a given set of tfrecords.

Parameters
  • model (str) – Path to model to evaluate.

  • outcomes (str) – Str or list of str. Annotation header specifying outcome label(s).

  • dataset (slideflow.dataset.Dataset, optional) – Dataset from which to generate activations. If not supplied, will calculate activations for all project tfrecords at the tile_px/tile_um matching the model, optionally using provided filters and filter_blank.

  • filters (dict, optional) – Filters to use when selecting tfrecords. Defaults to None.

  • checkpoint (str, optional) – Path to cp.ckpt file, if evaluating a saved checkpoint. Defaults to None.

  • eval_k_fold (int, optional) – K-fold iteration number to evaluate. If None, will evaluate all tfrecords irrespective of K-fold. Defaults to None.

  • splits (str, optional) – Filename of JSON file in which to log training/validation splits. Looks for filename in project root directory. Defaults to “splits.json”.

  • max_tiles (int, optional) – Maximum number of tiles from each slide to evaluate. If zero, will include all tiles. Defaults to 0.

  • min_tiles (int, optional) – Min tiles a slide must have to be included in evaluation. Defaults to 0.

  • input_header (str, optional) – Annotation column header to use as additional input. Defaults to None.

  • format (str, optional) – Format in which to save predictions. Either ‘csv’ or ‘feather’. Defaults to ‘csv’.

  • mixed_precision (bool, optional) – Enable mixed precision. Defaults to True.

Returns

pandas.DataFrame of tile-level predictions.

predict_wsi(model, outdir, dataset=None, filters=None, filter_blank=None, stride_div=1, enable_downsample=True, roi_method='inside', skip_missing_roi=False, source=None, randomize_origin=False, **kwargs)

Using a given model, generates a map of tile-level predictions for a whole-slide image (WSI), dumping prediction arrays into pkl files for later use.

Parameters
  • model (str) – Path to model from which to generate predictions.

  • outdir (str) – Directory for saving WSI predictions in .pkl format.

  • dataset (slideflow.dataset.Dataset, optional) – Dataset from which to generate activations. If not supplied, will calculate activations for all tfrecords at the tile_px/tile_um matching the supplied model.

  • filters (dict, optional) – Filters to use when selecting tfrecords. Defaults to None.

  • filter_blank (list, optional) – Exclude slides blank in these cols. Defaults to None.

  • stride_div (int, optional) – Stride divisor for extracting tiles. A stride_div of 1 will extract non-overlapping tiles. A stride_div of 2 will extract overlapping tiles, with a stride equal to 50% of the tile width. Defaults to 1.

  • enable_downsample (bool, optional) – Enable downsampling for slides. This may result in corrupted image tiles if downsampled slide layers are corrupted or incomplete. Defaults to True.

  • roi_method (str, optional) – Either ‘inside’, ‘outside’ or ‘ignore’. Indicates whether tiles are extracted inside or outside ROIs or if ROIs are ignored entirely. Defaults to ‘inside’.

  • skip_missing_roi (bool, optional) – Skip slides missing ROIs. Defaults to False.

  • source (list, optional) – Name(s) of dataset sources from which to get slides. If None, will use all.

  • randomize_origin (bool, optional) – Randomize pixel starting position during extraction. Defaults to False.

Keyword Arguments
  • whitespace_fraction (float, optional) – Range 0-1. Defaults to 1. Discard tiles with this fraction of whitespace. If 1, will not perform whitespace filtering.

  • whitespace_threshold (int, optional) – Range 0-255. Defaults to 230. Threshold above which a pixel (RGB average) is whitespace.

  • grayspace_fraction (float, optional) – Range 0-1. Defaults to 0.6. Discard tiles with this fraction of grayspace. If 1, will not perform grayspace filtering.

  • grayspace_threshold (float, optional) – Range 0-1. Defaults to 0.05. Pixels in HSV format with saturation below this are grayspace.
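
The whitespace and grayspace criteria above can be sketched with the standard library alone. This is an illustration under stated assumptions, not slideflow's implementation: pixels are given as (R, G, B) tuples in the 0-255 range, a tile is discarded when a fraction reaches its limit (whether the real comparison is inclusive is an assumption), and all three function names are hypothetical.

```python
import colorsys


def whitespace_fraction(pixels, threshold=230):
    """Fraction of pixels whose RGB average exceeds the threshold (0-255)."""
    white = sum(1 for r, g, b in pixels if (r + g + b) / 3 > threshold)
    return white / len(pixels)


def grayspace_fraction(pixels, threshold=0.05):
    """Fraction of pixels whose HSV saturation is below the threshold (0-1)."""
    gray = 0
    for r, g, b in pixels:
        _, saturation, _ = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        if saturation < threshold:
            gray += 1
    return gray / len(pixels)


def keep_tile(pixels, max_white=1.0, max_gray=0.6):
    """Return False if the tile should be discarded; a limit of 1 disables a check.

    Assumption: a tile is discarded when a fraction is >= its limit.
    """
    if max_white < 1 and whitespace_fraction(pixels) >= max_white:
        return False
    if max_gray < 1 and grayspace_fraction(pixels) >= max_gray:
        return False
    return True


# A toy four-pixel "tile": two near-white pixels, one stained pixel, one mid-gray pixel.
tile = [(255, 255, 255), (250, 250, 250), (200, 80, 160), (128, 128, 128)]
print(whitespace_fraction(tile), grayspace_fraction(tile), keep_tile(tile))
```

Note that near-white pixels also have low saturation, so grayspace filtering alone removes much of the background even when whitespace filtering is disabled.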

save()

Saves current project configuration as “settings.json”.

property sources

Returns list of dataset sources active in this project.

train(outcomes, params, exp_label=None, filters=None, filter_blank=None, input_header=None, min_tiles=0, max_tiles=0, splits='splits.json', balance_headers=None, mixed_precision=True, **training_kwargs)

Train model(s) using a given set of parameters, outcomes, and inputs.

Parameters
  • outcomes (str or list(str)) – Outcome label annotation header(s).

  • params (slideflow.model.ModelParams, list, dict, or str) – Model parameters for training. May provide one ModelParams, a list, or dict mapping model names to params. If multiple params are provided, will train models for each. If JSON file is provided, will interpret as a hyperparameter sweep. See examples below for use.

  • exp_label (str, optional) – Experiment label to add to model names.

  • filters (dict, optional) – Filters to use when selecting tfrecords. Defaults to None.

  • filter_blank (list, optional) – Exclude slides blank in these cols. Defaults to None.

  • input_header (list, optional) – List of annotation column headers to use as additional slide-level model input. Defaults to None.

  • min_tiles (int) – Minimum number of tiles a slide must have to include in training. Defaults to 0.

  • max_tiles (int) – Only use up to this many tiles from each slide for training. Defaults to 0 (include all tiles).

  • splits (str, optional) – Filename of JSON file in which to log train/val splits. Looks for filename in project root directory. Defaults to “splits.json”.

  • balance_headers (str or list(str)) – Annotation header(s) specifying labels on which to perform mini-batch balancing. If performing category-level balancing and this is set to None, will default to balancing on outcomes. Defaults to None.

  • mixed_precision (bool, optional) – Enable mixed precision. Defaults to True.

Keyword Arguments
  • val_strategy (str) – Validation dataset selection strategy. Options include bootstrap, k-fold, k-fold-manual, k-fold-preserved-site, fixed, and none. Defaults to ‘k-fold’.

  • val_k_fold (int) – Total number of K if using K-fold validation. Defaults to 3.

  • val_k (int) – Iteration of K-fold to train, starting at 1. Defaults to None (training all k-folds).

  • val_k_fold_header (str) – Annotations file header column for manually specifying k-fold or for preserved-site cross validation. Only used if validation strategy is ‘k-fold-manual’ or ‘k-fold-preserved-site’. Defaults to None for k-fold-manual and ‘site’ for k-fold-preserved-site.

  • val_fraction (float) – Fraction of dataset to use for validation testing, if strategy is ‘fixed’.

  • val_source (str) – Dataset source to use for validation. Defaults to None (same as training).

  • val_annotations (str) – Path to annotations file for validation dataset. Defaults to None (same as training).

  • val_filters (dict) – Filters to use for validation dataset. Defaults to None (same as training).

  • checkpoint (str, optional) – Path to cp.ckpt from which to load weights. Defaults to None.

  • pretrain (str, optional) – Either ‘imagenet’ or path to Tensorflow model from which to load weights. Defaults to ‘imagenet’.

  • multi_gpu (bool) – Train using multiple GPUs when available. Defaults to False.

  • resume_training (str, optional) – Path to Tensorflow model to continue training. Defaults to None.

  • starting_epoch (int) – Start training at the specified epoch. Defaults to 0.

  • steps_per_epoch_override (int) – If provided, will manually set the number of steps in an epoch. Default epoch length is the number of total tiles.

  • save_predictions (bool) – Save predictions with each validation. Defaults to False.

  • save_model (bool, optional) – Save models when evaluating at specified epochs. Defaults to True.

  • validate_on_batch (int) – Perform validation every N batches. Defaults to 0 (only at epoch end).

  • validation_batch_size (int) – Validation dataset batch size. Defaults to 32.

  • use_tensorboard (bool) – Add tensorboard callback for realtime training monitoring. Defaults to False.

  • validation_steps (int) – Number of steps of validation to perform each time doing a mid-epoch validation check. Defaults to 200.

Returns

Dict with model names mapped to train_acc, val_loss, and val_acc
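
The interaction of val_k_fold and val_k (which starts at 1) can be sketched as a simple partition of slides. This is a conceptual illustration only: the round-robin assignment below is an assumption chosen for brevity, k_fold_split is a hypothetical helper, and the actual splitting performed during training (including the manual and preserved-site strategies) is not described here.

```python
def k_fold_split(slides, k_fold=3, k=1):
    """Return (train, val) slide lists for iteration k (1-indexed) of k_fold.

    Hypothetical helper for illustration only; not part of slideflow.
    """
    if not 1 <= k <= k_fold:
        raise ValueError("k must be between 1 and k_fold")
    # Assumed assignment: slides are dealt to folds round-robin.
    folds = [slides[i::k_fold] for i in range(k_fold)]
    # Fold k (1-indexed) is held out for validation; the rest are used for training.
    val = folds[k - 1]
    train = [s for i, fold in enumerate(folds) if i != k - 1 for s in fold]
    return train, val


slides = [f"slide{i}" for i in range(1, 7)]
for k in (1, 2, 3):
    train, val = k_fold_split(slides, k_fold=3, k=k)
    print(k, val)
```

Each slide appears in the validation set of exactly one iteration, so leaving val_k as None (training all k-folds) validates on every slide once.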

Examples

Method 1 (hyperparameter sweep from a configuration file):

>>> import slideflow.model
>>> P.train('outcome', params='sweep.json', ...)

Method 2 (manually specified hyperparameters):

>>> from slideflow.model import ModelParams
>>> hp = ModelParams(...)
>>> P.train('outcome', params=hp, ...)

Method 3 (list of hyperparameters):

>>> from slideflow.model import ModelParams
>>> hp = [ModelParams(...), ModelParams(...)]
>>> P.train('outcome', params=hp, ...)

Method 4 (dict of hyperparameters):

>>> from slideflow.model import ModelParams
>>> hp = {'HP0': ModelParams(...), 'HP1': ModelParams(...)}
>>> P.train('outcome', params=hp, ...)

train_clam(exp_name, pt_files, outcomes, dataset, train_slides='auto', val_slides='auto', splits='splits.json', clam_args=None, attention_heatmaps=True)

Train a CLAM model from layer activations exported with slideflow.Project.generate_features_for_clam().

Parameters
  • exp_name (str) – Name of experiment. Makes clam/{exp_name} folder.

  • pt_files (str) – Path to pt_files containing tile-level features.

  • outcomes (str) – Annotation column which specifies the outcome.

  • dataset (slideflow.dataset.Dataset) – Dataset object from which to generate activations.

  • train_slides (str, optional) – List of slide names for training. If ‘auto’ (default), will auto-generate training/val split.

  • val_slides (str, optional) – List of slides for validation. If ‘auto’ (default), will auto-generate training/val split.

  • splits (str, optional) – Filename of JSON file in which to log training/val splits. Looks for filename in project root directory. Defaults to “splits.json”.

  • clam_args (optional) – Namespace with clam arguments, as provided by slideflow.clam.get_args().

  • attention_heatmaps (bool, optional) – Save attention heatmaps of validation dataset.

Returns

None

Examples

Train with basic settings:

>>> dataset = P.dataset(tile_px=299, tile_um=302)
>>> P.generate_features_for_clam('/model', outdir='/pt_files')
>>> P.train_clam('NAME', '/pt_files', 'category1', dataset)

Specify a specific layer from which to generate activations:

>>> P.generate_features_for_clam(..., layers=['postconv'])

Manually configure CLAM, with 5-fold validation and SVM bag loss:

>>> import slideflow.clam as clam
>>> clam_args = clam.get_args(k=5, bag_loss='svm')
>>> P.generate_features_for_clam(...)
>>> P.train_clam(..., clam_args=clam_args)