slideflow.slide

This module contains classes to load slides and extract tiles. For optimal performance, tile extraction should generally not be performed by instantiating these classes directly, but by calling either slideflow.Project.extract_tiles() or slideflow.Dataset.extract_tiles(), which include performance optimizations and additional functionality.

WSI

class slideflow.slide.WSI(path, tile_px, tile_um, stride_div=1, enable_downsample=True, roi_dir=None, rois=None, roi_method='inside', skip_missing_roi=False, randomize_origin=False, pb=None, pb_counter=None, counter_lock=None, silent=False)

Loads a slide and its annotated region of interest (ROI).

__init__(path, tile_px, tile_um, stride_div=1, enable_downsample=True, roi_dir=None, rois=None, roi_method='inside', skip_missing_roi=False, randomize_origin=False, pb=None, pb_counter=None, counter_lock=None, silent=False)

Loads slide and ROI(s).

Parameters
  • path (str) – Path to slide.

  • tile_px (int) – Size of tiles to extract, in pixels.

  • tile_um (int) – Size of tiles to extract, in microns.

  • stride_div (int, optional) – Stride divisor for tile extraction (1 = no tile overlap; 2 = 50% overlap, etc). Defaults to 1.

  • enable_downsample (bool, optional) – Allow use of downsampled intermediate layers in the slide image pyramid, which greatly improves tile extraction speed. May result in artifacts for slides with incompletely generated intermediate pyramid layers. Defaults to True.

  • roi_dir (str, optional) – Directory in which to search for ROI CSV files. Defaults to None.

  • rois (list(str)) – Alternatively, a list of ROI paths can be explicitly provided. Defaults to None.

  • roi_method (str) – Either ‘inside’, ‘outside’, or ‘ignore’. Determines how ROIs are used to extract tiles. Defaults to ‘inside’.

  • skip_missing_roi (bool, optional) – Skip slides that are missing a ROI file. Defaults to False.

  • randomize_origin (bool, optional) – Offset the starting grid by a random amount. Defaults to False.

  • pb (slideflow.util.ProgressBar, optional) – Multiprocessing-capable ProgressBar instance; will update progress bar during tile extraction if provided. Defaults to None.

  • pb_counter (obj) – Multiprocessing counter (a multiprocessing Value, from ProgressBar) used to follow tile extraction progress. Defaults to None.

  • counter_lock (obj) – Lock object for updating pb_counter, if provided. Defaults to None.

  • silent (bool, optional) – Suppresses warnings about slide skipping if ROIs are missing. Defaults to False.
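
To illustrate how tile_px, tile_um, and stride_div relate, here is a minimal sketch in plain Python. It assumes that the stride equals the tile width divided by stride_div (consistent with "2 = 50% overlap" above) and that tiles sit on a regular grid starting at the top-left corner. The helper names effective_mpp and grid_origins are hypothetical and are not part of slideflow.

```python
def effective_mpp(tile_px, tile_um):
    """Microns-per-pixel implied by a tile of tile_um microns drawn at tile_px pixels."""
    return tile_um / tile_px


def grid_origins(slide_width_um, slide_height_um, tile_um, stride_div=1):
    """Top-left tile origins (in microns) on a regular grid.

    Assumes stride = tile_um / stride_div, so stride_div=2 gives 50% overlap.
    Only tiles that fit entirely inside the slide are kept.
    """
    stride = tile_um / stride_div
    origins = []
    y = 0.0
    while y + tile_um <= slide_height_um:
        x = 0.0
        while x + tile_um <= slide_width_um:
            origins.append((x, y))
            x += stride
        y += stride
    return origins


no_overlap = grid_origins(1000, 1000, tile_um=250, stride_div=1)
half_overlap = grid_origins(1000, 1000, tile_um=250, stride_div=2)
print(len(no_overlap), len(half_overlap))  # 16 49
```

On this toy 1000 x 1000 micron slide, halving the stride roughly triples the number of tiles, which is worth keeping in mind when choosing stride_div for large datasets.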

build_generator(shuffle=True, whitespace_fraction=None, whitespace_threshold=None, grayspace_fraction=None, grayspace_threshold=None, normalizer=None, normalizer_source=None, include_loc=True, num_threads=None, show_progress=False, img_format='numpy', full_core=None, yolo=False, draw_roi=False, pool=None, dry_run=False)

Builds tile generator to extract tiles from this slide.

Parameters
  • shuffle (bool) – Shuffle images during extraction.

  • whitespace_fraction (float, optional) – Range 0-1. Defaults to 1. Discard tiles with this fraction of whitespace. If 1, will not perform whitespace filtering.

  • whitespace_threshold (int, optional) – Range 0-255. Defaults to 230. Threshold above which a pixel (RGB average) is whitespace.

  • grayspace_fraction (float, optional) – Range 0-1. Defaults to 0.6. Discard tiles with this fraction of grayspace. If 1, will not perform grayspace filtering.

  • grayspace_threshold (float, optional) – Range 0-1. Defaults to 0.05. Pixels in HSV format with saturation below this threshold are considered grayspace.

  • normalizer (str, optional) – Normalization strategy to use on image tiles. Defaults to None.

  • normalizer_source (str, optional) – Path to normalizer source image. If None, will use slideflow.slide.norm_tile.jpg. Defaults to None.

  • include_loc (bool, optional) – Return (x,y) origin coordinates for each tile along with tile images.

  • show_progress (bool, optional) – Show a progress bar.

  • img_format (str, optional) – Image format. Either ‘numpy’, ‘jpg’, or ‘png’. Defaults to ‘numpy’.

  • yolo (bool, optional) – Include yolo-formatted tile-level ROI annotations in the return dictionary, under the key ‘yolo’. Defaults to False.

  • draw_roi (bool, optional) – Draws ROIs onto extracted tiles. Defaults to False.

  • dry_run (bool, optional) – Determine tiles that would be extracted, but do not export any images. Defaults to False.

Returns

dict, with keys ‘image’ (image data), ‘yolo’ (optional yolo-formatted annotations, (x_center, y_center, width, height)) and ‘grid’ ((x, y) slide or grid coordinates)
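
The whitespace and grayspace parameters above can be read as two per-tile measurements. The following is a rough sketch of that logic using only the Python standard library, not slideflow's actual implementation. It assumes a tile is discarded when its measured fraction is greater than or equal to the setting; the function names are hypothetical.

```python
import colorsys


def whitespace_fraction(pixels, threshold=230):
    """Fraction of pixels whose RGB average exceeds `threshold` (0-255)."""
    white = sum(1 for (r, g, b) in pixels if (r + g + b) / 3 > threshold)
    return white / len(pixels)


def grayspace_fraction(pixels, threshold=0.05):
    """Fraction of pixels whose HSV saturation is below `threshold` (0-1)."""
    gray = 0
    for r, g, b in pixels:
        _, saturation, _ = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        if saturation < threshold:
            gray += 1
    return gray / len(pixels)


def keep_tile(pixels, ws_fraction=1.0, ws_threshold=230,
              gs_fraction=0.6, gs_threshold=0.05):
    """Return True if the tile passes both filters.

    A fraction setting of 1 disables the corresponding filter.
    Assumption: a tile is discarded when its fraction is >= the setting.
    """
    if ws_fraction < 1 and whitespace_fraction(pixels, ws_threshold) >= ws_fraction:
        return False
    if gs_fraction < 1 and grayspace_fraction(pixels, gs_threshold) >= gs_fraction:
        return False
    return True


# A toy 4-pixel "tile": two stained-tissue pixels, one white, one gray.
tile = [(180, 60, 140), (200, 80, 160), (250, 250, 250), (128, 128, 128)]
print(whitespace_fraction(tile), grayspace_fraction(tile), keep_tile(tile))
# 0.25 0.5 True
```

Note that a pure white pixel has zero saturation, so it counts toward the grayspace fraction as well as the whitespace fraction.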

extract_tiles(tfrecord_dir=None, tiles_dir=None, img_format='jpg', report=True, **kwargs)

Extracts tiles from slide using the build_generator() method, saving tiles into a TFRecord file or as loose JPG tiles in a directory.

Parameters
  • tfrecord_dir (str) – If provided, saves tiles into a TFRecord file (named according to slide name) here.

  • tiles_dir (str) – If provided, saves loose images into a subdirectory (per slide name) here.

  • img_format (str) – ‘png’ or ‘jpg’. Format of images for internal storage in tfrecords. PNG (lossless) format recommended for fidelity, JPG (lossy) for efficiency. Defaults to ‘jpg’.

Keyword Arguments
  • whitespace_fraction (float, optional) – Range 0-1. Defaults to 1. Discard tiles with this fraction of whitespace. If 1, will not perform whitespace filtering.

  • whitespace_threshold (int, optional) – Range 0-255. Defaults to 230. Threshold above which a pixel (RGB average) is whitespace.

  • grayspace_fraction (float, optional) – Range 0-1. Defaults to 0.6. Discard tiles with this fraction of grayspace. If 1, will not perform grayspace filtering.

  • grayspace_threshold (float, optional) – Range 0-1. Defaults to 0.05. Pixels in HSV format with saturation below this threshold are considered grayspace.

  • normalizer (str, optional) – Normalization for image tiles. Defaults to None.

  • normalizer_source (str, optional) – Path to normalizer source image. If None, will use slideflow.slide.norm_tile.jpg. Defaults to None.

  • full_core (bool, optional) – Extract an entire detected core, rather than subdividing into image tiles. Defaults to False.

  • shuffle (bool) – Shuffle images during extraction.

  • num_threads (int) – Number of threads to allocate to workers.

  • yolo (bool, optional) – Export yolo-formatted tile-level ROI annotations (.txt) in the tile directory. Requires that tiles_dir is set. Defaults to False.

  • draw_roi (bool, optional) – Draws ROIs onto extracted tiles. Defaults to False.

  • dry_run (bool, optional) – Determine tiles that would be extracted, but do not export any images. Defaults to False.
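
A possible call sequence for a single slide, using only the signatures documented on this page. This is an unverified sketch: the tile_px and tile_um values are placeholders chosen for illustration, the wrapper function extract_one_slide is hypothetical, and the keyword arguments are presumed to be forwarded to build_generator(). slideflow is imported inside the function so that the function can be defined without the package installed.

```python
def extract_one_slide(slide_path, tiles_dir):
    """Extract loose PNG tiles from a single slide (sketch; see assumptions above)."""
    # Imported here so that defining this function does not require slideflow.
    from slideflow.slide import WSI

    # 299 px tiles covering 302 um; values chosen only for illustration.
    wsi = WSI(slide_path, tile_px=299, tile_um=302, roi_method='ignore')
    if not wsi.loaded_correctly():
        return False

    # Filtering options are passed as keyword arguments.
    wsi.extract_tiles(
        tiles_dir=tiles_dir,
        img_format='png',
        whitespace_fraction=0.9,
        grayspace_fraction=0.6,
    )
    return True
```

For whole datasets, the module introduction recommends slideflow.Project.extract_tiles() or slideflow.Dataset.extract_tiles() over calling these classes directly.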

load_csv_roi(path)

Loads CSV ROI from a given path.

load_json_roi(path, scale=10)

Loads ROI from a JSON file.

loaded_correctly()

Checks if slide loaded correctly.

Returns

bool

preview(rois=True, **kwargs)

Performs a dry run of tile extraction without saving any images, returning a PIL image of the slide thumbnail annotated with a grid of tiles that were marked for extraction.

Parameters

rois (bool, optional) – Draw ROI annotation(s) onto the image. Defaults to True.

Keyword Arguments
  • whitespace_fraction (float, optional) – Range 0-1. Defaults to 1. Discard tiles with this fraction of whitespace. If 1, will not perform whitespace filtering.

  • whitespace_threshold (int, optional) – Range 0-255. Defaults to 230. Threshold above which a pixel (RGB average) is considered whitespace.

  • grayspace_fraction (float, optional) – Range 0-1. Defaults to 0.6. Discard tiles with this fraction of grayspace. If 1, will not perform grayspace filtering.

  • grayspace_threshold (float, optional) – Range 0-1. Defaults to 0.05. Pixels in HSV format with saturation below this threshold are considered grayspace.

  • full_core (bool, optional) – Extract an entire detected core, rather than subdividing into image tiles. Defaults to False.

  • num_threads (int) – Number of threads to allocate to workers.

  • yolo (bool, optional) – Export yolo-formatted tile-level ROI annotations (.txt) in the tile directory. Requires that tiles_dir is set. Defaults to False.

qc(method, blur_radius=3, blur_threshold=0.02, filter_threshold=0.6, blur_mpp=4)

Applies quality control to a slide, performing filtering based on a whole-slide image thumbnail.

‘blur’ method filters out blurry or out-of-focus slide sections. ‘otsu’ method filters out background based on automatic saturation thresholding in the HSV colorspace. ‘both’ applies both methods of filtering.

Parameters
  • method (str) – Quality control method, ‘blur’, ‘otsu’, or ‘both’.

  • blur_radius (int, optional) – Blur radius.

  • blur_threshold (float, optional) – Blur threshold.

  • filter_threshold (float, optional) – Fraction of a tile (range 0-1) detected as background that will trigger a tile to be discarded. Defaults to 0.6.

  • blur_mpp (float, optional) – Size of WSI thumbnail on which to perform blur QC, in microns-per-pixel. Defaults to 4 (equivalent magnification = 2.5 X).
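
As a small worked example of the numbers above: the stated equivalence of 4 microns-per-pixel to 2.5X magnification matches the common 10 / mpp rule of thumb, and filter_threshold can be read as a cutoff on the share of flagged pixels in a tile. The helpers below are hypothetical, and the use of >= in the comparison is an assumption the source does not specify.

```python
def equivalent_magnification(mpp):
    """Approximate objective magnification for a given microns-per-pixel.

    Assumes the 10 / mpp rule of thumb, which reproduces the
    "4 mpp = 2.5X" equivalence stated above.
    """
    return 10 / mpp


def discard_tile(background_mask, filter_threshold=0.6):
    """True if the share of flagged pixels reaches filter_threshold.

    background_mask: iterable of booleans, True where QC flagged a pixel.
    Assumption: the comparison is >=.
    """
    flags = list(background_mask)
    return sum(flags) / len(flags) >= filter_threshold


print(equivalent_magnification(4))               # 2.5
print(discard_tile([True, True, True, False]))   # True (0.75 >= 0.6)
print(discard_tile([True, False, False, False])) # False (0.25 < 0.6)
```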

square_thumb(width=512)

Returns a square thumbnail of the slide, with black bar borders.

Parameters

width (int) – Width/height of thumbnail in pixels.

Returns

PIL image
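
The geometry of a square thumbnail with black bars can be sketched as follows. This assumes the slide is scaled so that its longer side equals width and is then centered on the square canvas; the source does not state how the image is positioned, and the helper name letterbox is hypothetical.

```python
def letterbox(slide_w, slide_h, width=512):
    """Return (scaled_w, scaled_h, x_offset, y_offset) for a square canvas.

    The slide is scaled so its longer side equals `width`, then centered;
    the remaining area would be filled with black.
    """
    scale = width / max(slide_w, slide_h)
    scaled_w = round(slide_w * scale)
    scaled_h = round(slide_h * scale)
    x_offset = (width - scaled_w) // 2
    y_offset = (width - scaled_h) // 2
    return scaled_w, scaled_h, x_offset, y_offset


print(letterbox(40000, 20000))  # (512, 256, 0, 128)
```

A landscape slide therefore gets bars above and below, and a portrait slide gets bars on the left and right.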

thumb(mpp=None, width=None, coords=None, rois=False, linewidth=2, color='black')

Returns PIL Image of thumbnail with ROI overlay.

Parameters
  • mpp (float, optional) – Microns-per-pixel, used to determine thumbnail size.

  • width (int, optional) – Goal thumbnail width (alternative to mpp).

  • coords (list(int), optional) – List of tile extraction coordinates to show as rectangles on the thumbnail, in [(x_center, y_center), …] format. Defaults to None.

  • rois (bool, optional) – Draw ROIs onto thumbnail. Defaults to False.

  • linewidth (int, optional) – Width of ROI line. Defaults to 2.

  • color (str, optional) – Color of ROI. Defaults to black.

Returns

PIL image
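
To show what drawing coords as rectangles involves, here is a minimal sketch that maps tile centers to bounding boxes on a scaled-down thumbnail. It assumes coords are given in full-resolution slide pixels and that the tile edge length in slide pixels is known; both the helper coords_to_rects and its parameters are hypothetical.

```python
def coords_to_rects(coords, tile_width, scale):
    """Map tile centers to (left, top, right, bottom) boxes on a thumbnail.

    coords: [(x_center, y_center), ...] in slide pixels (assumed).
    tile_width: tile edge length in slide pixels.
    scale: thumbnail pixels per slide pixel (e.g. thumb_width / slide_width).
    """
    half = tile_width / 2
    rects = []
    for x_center, y_center in coords:
        rects.append((
            (x_center - half) * scale,
            (y_center - half) * scale,
            (x_center + half) * scale,
            (y_center + half) * scale,
        ))
    return rects


rects = coords_to_rects([(1000, 2000), (3000, 2000)], tile_width=500, scale=0.01)
print(rects)
```

Each resulting box could then be passed to a drawing routine such as a rectangle call on a PIL ImageDraw object.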

TMA

class slideflow.slide.TMA(path, tile_px, tile_um, stride_div=1, annotations_dir=None, enable_downsample=True, report_dir=None, pb=None, pb_id=0)

Loads a TMA-formatted slide and detects tissue cores.

__init__(path, tile_px, tile_um, stride_div=1, annotations_dir=None, enable_downsample=True, report_dir=None, pb=None, pb_id=0)

Initializer.

Parameters
  • path (str) – Path to slide.

  • tile_px (int) – Size of tiles to extract, in pixels.

  • tile_um (int) – Size of tiles to extract, in microns.

  • stride_div (int, optional) – Stride divisor for tile extraction (1 = no tile overlap; 2 = 50% overlap, etc). Defaults to 1.

  • enable_downsample (bool, optional) – Allow use of downsampled layers in the slide image pyramid, which greatly improves tile extraction speed. Defaults to True.

  • pb (sf.util.ProgressBar, optional) – ProgressBar; will update progress bar during tile extraction if provided. Defaults to None.

  • pb_id (int, optional) – ID of bar in ProgressBar. Defaults to 0.

build_generator(shuffle=True, whitespace_fraction=None, whitespace_threshold=None, grayspace_fraction=None, grayspace_threshold=None, normalizer=None, normalizer_source=None, include_loc=True, num_threads=None, pool=None, img_format='numpy', full_core=False, show_progress=False)

Builds tile generator to extract tiles across the slide.

Parameters
  • shuffle (bool) – Shuffle images during extraction.

  • whitespace_fraction (float, optional) – Range 0-1. Defaults to 1. Discard tiles with this fraction of whitespace. If 1, will not perform whitespace filtering.

  • whitespace_threshold (int, optional) – Range 0-255. Defaults to 230. Threshold above which a pixel (RGB average) is whitespace.

  • grayspace_fraction (float, optional) – Range 0-1. Defaults to 0.6. Discard tiles with this fraction of grayspace. If 1, will not perform grayspace filtering.

  • grayspace_threshold (float, optional) – Range 0-1. Defaults to 0.05. Pixels in HSV format with saturation below this threshold are considered grayspace.

  • normalizer (str, optional) – Normalization to use on image tiles. Defaults to None.

  • normalizer_source (str, optional) – Path to normalizer source image. If None, will use slideflow.slide.norm_tile.jpg. Defaults to None.

  • include_loc (bool, optional) – Include location information in returned dictionary. Defaults to True.

  • num_threads (int, optional) – Number of threads for pool. Unused if pool is specified.

  • pool (multiprocessing.Pool, optional) – Multiprocessing pool to use. By using a shared pool, a slide no longer needs to spin up its own new pool for tile extraction, decreasing tile extraction time for large datasets. Defaults to None (create a new pool, using num_threads).

  • img_format (str, optional) – ‘png’, ‘jpg’, or ‘numpy’. Format images should be returned in.

  • full_core (bool, optional) – Extract an entire detected core, rather than subdividing into image tiles. Defaults to False.

  • show_progress (bool, optional) – Show a progress bar for extraction.
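
The benefit of the pool parameter described above comes from creating workers once and reusing them for every slide. The sketch below illustrates that pattern only. It uses multiprocessing.pool.ThreadPool as a stand-in for multiprocessing.Pool to stay self-contained, and process_slide is a hypothetical placeholder for per-slide extraction, not a slideflow function.

```python
from multiprocessing.pool import ThreadPool


def process_slide(slide_name, tile_ids, pool):
    """Pretend to extract tiles for one slide using a caller-supplied pool."""
    # pool.map distributes the per-tile work across the shared workers.
    return slide_name, pool.map(lambda tile_id: tile_id * 2, tile_ids)


slides = {'slide_a': [1, 2, 3], 'slide_b': [4, 5]}

# One pool is created up front and reused for every slide,
# instead of starting a new pool inside each call.
with ThreadPool(processes=4) as shared_pool:
    results = dict(process_slide(name, ids, shared_pool)
                   for name, ids in slides.items())

print(results)  # {'slide_a': [2, 4, 6], 'slide_b': [8, 10]}
```

With a real process pool, the saving comes from avoiding repeated worker start-up, which adds up when many slides are processed in sequence.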

extract_tiles(tfrecord_dir=None, tiles_dir=None, img_format='jpg', report=True, **kwargs)

Extracts tiles from slide using the build_generator() method, saving tiles into a TFRecord file or as loose JPG tiles in a directory.

Parameters
  • tfrecord_dir (str) – If provided, saves tiles into a TFRecord file (named according to slide name) here.

  • tiles_dir (str) – If provided, saves loose images into a subdirectory (per slide name) here.

  • img_format (str) – ‘png’ or ‘jpg’. Format of images for internal storage in tfrecords. PNG (lossless) format recommended for fidelity, JPG (lossy) for efficiency. Defaults to ‘jpg’.

Keyword Arguments
  • whitespace_fraction (float, optional) – Range 0-1. Defaults to 1. Discard tiles with this fraction of whitespace. If 1, will not perform whitespace filtering.

  • whitespace_threshold (int, optional) – Range 0-255. Defaults to 230. Threshold above which a pixel (RGB average) is whitespace.

  • grayspace_fraction (float, optional) – Range 0-1. Defaults to 0.6. Discard tiles with this fraction of grayspace. If 1, will not perform grayspace filtering.

  • grayspace_threshold (float, optional) – Range 0-1. Defaults to 0.05. Pixels in HSV format with saturation below this threshold are considered grayspace.

  • normalizer (str, optional) – Normalization for image tiles. Defaults to None.

  • normalizer_source (str, optional) – Path to normalizer source image. If None, will use slideflow.slide.norm_tile.jpg. Defaults to None.

  • full_core (bool, optional) – Extract an entire detected core, rather than subdividing into image tiles. Defaults to False.

  • shuffle (bool) – Shuffle images during extraction.

  • num_threads (int) – Number of threads to allocate to workers.

  • yolo (bool, optional) – Export yolo-formatted tile-level ROI annotations (.txt) in the tile directory. Requires that tiles_dir is set. Defaults to False.

  • draw_roi (bool, optional) – Draws ROIs onto extracted tiles. Defaults to False.

  • dry_run (bool, optional) – Determine tiles that would be extracted, but do not export any images. Defaults to None.

loaded_correctly()

Checks if slide loaded correctly.

Returns

bool

preview(rois=True, **kwargs)

Performs a dry run of tile extraction without saving any images, returning a PIL image of the slide thumbnail annotated with a grid of tiles that were marked for extraction.

Parameters

rois (bool, optional) – Draw ROI annotation(s) onto the image. Defaults to True.

Keyword Arguments
  • whitespace_fraction (float, optional) – Range 0-1. Defaults to 1. Discard tiles with this fraction of whitespace. If 1, will not perform whitespace filtering.

  • whitespace_threshold (int, optional) – Range 0-255. Defaults to 230. Threshold above which a pixel (RGB average) is considered whitespace.

  • grayspace_fraction (float, optional) – Range 0-1. Defaults to 0.6. Discard tiles with this fraction of grayspace. If 1, will not perform grayspace filtering.

  • grayspace_threshold (float, optional) – Range 0-1. Defaults to 0.05. Pixels in HSV format with saturation below this threshold are considered grayspace.

  • full_core (bool, optional) – Extract an entire detected core, rather than subdividing into image tiles. Defaults to False.

  • num_threads (int) – Number of threads to allocate to workers.

  • yolo (bool, optional) – Export yolo-formatted tile-level ROI annotations (.txt) in the tile directory. Requires that tiles_dir is set. Defaults to False.

qc(method, blur_radius=3, blur_threshold=0.02, filter_threshold=0.6, blur_mpp=4)

Applies quality control to a slide, performing filtering based on a whole-slide image thumbnail.

‘blur’ method filters out blurry or out-of-focus slide sections. ‘otsu’ method filters out background based on automatic saturation thresholding in the HSV colorspace. ‘both’ applies both methods of filtering.

Parameters
  • method (str) – Quality control method, ‘blur’, ‘otsu’, or ‘both’.

  • blur_radius (int, optional) – Blur radius.

  • blur_threshold (float, optional) – Blur threshold.

  • filter_threshold (float, optional) – Fraction of a tile (range 0-1) detected as background that will trigger a tile to be discarded. Defaults to 0.6.

  • blur_mpp (float, optional) – Size of WSI thumbnail on which to perform blur QC, in microns-per-pixel. Defaults to 4 (equivalent magnification = 2.5 X).

square_thumb(width=512)

Returns a square thumbnail of the slide, with black bar borders.

Parameters

width (int) – Width/height of thumbnail in pixels.

Returns

PIL image

thumb(mpp=None, width=None, coords=None, rois=None)

Returns PIL thumbnail of the slide.

Parameters
  • mpp (float, optional) – Microns-per-pixel, used to determine thumbnail size.

  • width (int, optional) – Alternatively, goal thumbnail width may be supplied.

  • coords (list(int), optional) – List of tile extraction coordinates to show as rectangles on the thumbnail, in [(x_center, y_center), …] format. Defaults to None.

Returns

PIL image