
slideflow.io.tensorflow

This module contains functions for processing TFRecords, including detecting the contents and image format of saved TFRecords, extracting tiles from TFRecords, splitting and merging TFRecords, and a variety of other manipulations.

The most important component of this module, however, is the slideflow.io.tensorflow.interleave() function, which interleaves a set of TFRecords into a tf.data.Dataset object that can be used for training. This interleaving can include patient- or category-level balancing of the returned batches (see A Note on Input Balancing).

Note

The TFRecord reading and interleaving implemented in this module is only compatible with TensorFlow models. The slideflow.io.torch module includes an optimized, PyTorch-specific TFRecord reader based on a modified version of the tfrecord reader/writer at: https://github.com/vahidk/tfrecord.

slideflow.io.tensorflow.checkpoint_to_tf_model(models_dir, model_name)

Converts a checkpoint file into a saved model.

slideflow.io.tensorflow.detect_tfrecord_format(path)

Loads a tfrecord at the specified path, and detects the feature description and image type.

Returns

A tuple of the feature description dictionary (dict) and the stored image type (str), either ‘png’ or ‘jpg’.

Return type

tuple(dict, str)
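As a rough illustration of how the stored image type could be told apart, PNG and JPEG data can be recognized from their leading bytes. The helper below, detect_image_type, is hypothetical and not part of slideflow; it assumes the image_raw feature holds PNG- or JPEG-encoded bytes, and slideflow's own detection may work differently.

```python
def detect_image_type(image_raw: bytes) -> str:
    """Guess whether encoded image bytes are PNG or JPEG (hypothetical helper)."""
    # Every PNG file begins with this fixed 8-byte signature.
    if image_raw.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    # JPEG data begins with the SOI marker (FF D8) followed by another marker (FF).
    if image_raw.startswith(b"\xff\xd8\xff"):
        return "jpg"
    raise ValueError("Unrecognized image format")
```

Bytes that start with the JPEG marker sequence FF D8 FF return 'jpg'; anything that is neither PNG nor JPEG raises a ValueError.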

slideflow.io.tensorflow.get_tfrecord_parser(tfrecord_path, features_to_return=None, to_numpy=False, decode_images=True, img_size=None, error_if_invalid=True)

Returns a tfrecord parsing function based on the specified parameters.

Parameters
  • tfrecord_path (str) – Path to tfrecord to parse.

  • features_to_return (list or dict, optional) – Designates format for how features should be returned from parser. If a list of feature names is provided, the parsing function will return tfrecord features as a list in the order provided. If a dictionary of labels (keys) mapping to feature names (values) is provided, features will be returned from the parser as a dictionary matching the same format. If None, will return all features as a list.

  • to_numpy (bool, optional) – Convert records from tensors to NumPy arrays. Defaults to False.

  • decode_images (bool, optional) – Decode image strings into arrays. Defaults to True.

  • standardize (bool, optional) – Standardize images into the range (0,1). Defaults to False.

  • img_size (int, optional) – Width of images in pixels. If provided, will call tf.set_shape(…). Defaults to None.

  • normalizer (slideflow.norm.StainNormalizer) – Stain normalizer to use on images. Defaults to None.

  • augment (str or bool) – Image augmentations to perform. A string of characters designating augmentations: ‘x’ indicates random x-flipping, ‘y’ random y-flipping, ‘r’ rotating, ‘j’ JPEG compression/decompression at random quality levels, and ‘b’ random Gaussian blur. Passing either ‘xyrjb’ or True will use all augmentations.

  • error_if_invalid (bool, optional) – Raise an error if a tfrecord cannot be read. Defaults to True.
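The three layouts accepted by features_to_return can be sketched in plain Python, applied to a record that has already been parsed into a dict. format_features is a hypothetical helper for illustration only; in particular, the order used when features_to_return is None is assumed here to follow the record's own key order, which may not match slideflow.

```python
from typing import Any, Dict, List, Optional, Union


def format_features(
    record: Dict[str, Any],
    features_to_return: Optional[Union[List[str], Dict[str, str]]] = None,
) -> Union[List[Any], Dict[str, Any]]:
    """Return parsed features in the layout requested by features_to_return."""
    if features_to_return is None:
        # No layout requested: return every feature as a list.
        return list(record.values())
    if isinstance(features_to_return, dict):
        # Dict layout: keys are output labels, values name the feature to fetch.
        return {label: record[name] for label, name in features_to_return.items()}
    # List layout: return the named features in the order given.
    return [record[name] for name in features_to_return]
```

For a record {"slide": "s1", "image_raw": b"img"}, passing ["image_raw", "slide"] gives [b"img", "s1"], while passing {"image": "image_raw"} gives {"image": b"img"}.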

slideflow.io.tensorflow.interleave(tfrecords, img_size, batch_size, prob_weights=None, clip=None, labels=None, incl_slidenames=False, incl_loc=False, infinite=True, augment=False, standardize=True, normalizer=None, num_shards=None, shard_idx=None, num_parallel_reads=4, deterministic=False, drop_last=False)

Generates an interleaved dataset from a collection of TFRecord files, sampling from the files randomly according to balancing weights, if provided. Balancing requires a manifest. Assumes TFRecord files are named by slide.

Parameters
  • tfrecords (list(str)) – List of paths to TFRecord files.

  • img_size (int) – Image width in pixels.

  • batch_size (int) – Batch size.

  • prob_weights (dict, optional) – Dict mapping tfrecords to probability of including in batch. Defaults to None.

  • clip (dict, optional) – Dict mapping tfrecords to number of tiles to take per tfrecord. Defaults to None.

  • labels (dict or callable, optional) – If a dict, must map slide names to outcome labels. If a function, it must accept an image (tensor) and a slide name (str), and return a dict {‘image_raw’: image (tensor)} and a label (int or float). If not provided, all labels will be None.

  • incl_slidenames (bool, optional) – Include slide names as the third returned variable. Defaults to False.

  • incl_loc (bool, optional) – Include loc_x and loc_y as additional returned variables. Defaults to False.

  • infinite (bool, optional) – Create an infinite dataset. WARNING: If infinite is False and balancing is used, some tiles will be skipped. Defaults to True.

  • augment (str or bool) – Image augmentations to perform. A string of characters designating augmentations: ‘x’ indicates random x-flipping, ‘y’ random y-flipping, ‘r’ rotating, ‘j’ JPEG compression/decompression at random quality levels, and ‘b’ random Gaussian blur. Passing either ‘xyrjb’ or True will use all augmentations.

  • standardize (bool, optional) – Standardize images to (0,1). Defaults to True.

  • normalizer (slideflow.norm.StainNormalizer, optional) – Normalizer to use on images. Defaults to None.

  • num_shards (int, optional) – Number of shards into which the TFRecord datasets are split; used for multiprocessing datasets. Defaults to None.

  • shard_idx (int, optional) – Index of the tfrecord shard to use. Defaults to None.

  • num_parallel_reads (int, optional) – Number of parallel reads for each TFRecordDataset. Defaults to 4.

  • deterministic (bool, optional) – Controls the order in which parallel transformations (those using num_parallel_calls) produce elements. If False, elements may be yielded out of order, trading determinism for performance. Defaults to False.

  • drop_last (bool, optional) – Drop the last non-full batch. Defaults to False.
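The balancing performed by interleave() can be pictured as repeatedly choosing a TFRecord at random, in proportion to prob_weights, and taking its next tile. The pure-Python sketch below is an illustration only: interleave_sketch is a hypothetical name, it works on in-memory lists rather than TFRecord files, and it is not slideflow's implementation (which builds on tf.data). It shows how prob_weights, clip, infinite, batch_size and drop_last could interact.

```python
import random
from typing import Dict, Iterator, List, Optional, Sequence


def interleave_sketch(
    records: Dict[str, Sequence],
    batch_size: int,
    prob_weights: Optional[Dict[str, float]] = None,
    clip: Optional[Dict[str, int]] = None,
    infinite: bool = True,
    drop_last: bool = False,
    seed: Optional[int] = None,
) -> Iterator[List]:
    """Yield batches by sampling tiles from per-slide sources (illustration only)."""
    rng = random.Random(seed)

    # clip: cap the number of tiles taken from each source.
    pools = {}
    for name, items in records.items():
        items = list(items)
        if clip is not None and name in clip:
            items = items[: clip[name]]
        if items:  # skip empty sources so infinite mode cannot spin forever
            pools[name] = items

    names = list(pools)
    # prob_weights: relative chance of drawing from each source; uniform if omitted.
    weights = [prob_weights.get(n, 0.0) if prob_weights else 1.0 for n in names]

    def stream(items: List) -> Iterator:
        # Shuffle each pass; in infinite mode, repeat the source indefinitely.
        while True:
            order = items[:]
            rng.shuffle(order)
            yield from order
            if not infinite:
                return

    iterators = {n: stream(pools[n]) for n in names}
    batch: List = []
    while names:
        name = rng.choices(names, weights=weights)[0]
        try:
            batch.append(next(iterators[name]))
        except StopIteration:
            # Finite mode: stop as soon as a sampled source runs out, so tiles
            # still left in other sources are skipped.
            break
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch and not drop_last:
        yield batch
```

Stopping at the first exhausted source is one simple way to reproduce the behavior described in the infinite warning above: with balancing and a finite dataset, tiles remaining in the other sources are never returned.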

slideflow.io.tensorflow.join_tfrecord(input_folder, output_file, assign_slide=None)

Randomly samples from the TFRecords in the input folder with shuffling, and combines them into a single TFRecord file.

slideflow.io.tensorflow.merge_split_tfrecords(source, destination)

Merges TFRecords with the same name in subfolders within the given source folder, as may be the case when using split TFRecords for tile-level validation.
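One way to picture merge_split_tfrecords() is to group same-named files across the subfolders of source and concatenate each group. The sketch below assumes the files use the .tfrecords extension and sit one level below source; group_split_tfrecords and merge_groups are hypothetical names, not slideflow functions. Plain concatenation works because each record in a TFRecord file is self-delimiting, so joining two valid files yields a valid file.

```python
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def group_split_tfrecords(source: str) -> Dict[str, List[Path]]:
    """Group *.tfrecords files found in the subfolders of `source` by file name."""
    groups: Dict[str, List[Path]] = defaultdict(list)
    for subfolder in sorted(p for p in Path(source).iterdir() if p.is_dir()):
        for tfr in sorted(subfolder.glob("*.tfrecords")):
            # Files with the same name (e.g. slide1.tfrecords) belong to one slide.
            groups[tfr.name].append(tfr)
    return dict(groups)


def merge_groups(groups: Dict[str, List[Path]], destination: str) -> None:
    """Concatenate each group into a single file in `destination`."""
    out_dir = Path(destination)
    out_dir.mkdir(parents=True, exist_ok=True)
    for name, parts in groups.items():
        with open(out_dir / name, "wb") as out:
            for part in parts:
                out.write(part.read_bytes())
```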

slideflow.io.tensorflow.multi_image_example(slide, image_dict)

Returns a TensorFlow example for storage with multiple images.

slideflow.io.tensorflow.parser_from_labels(labels)

Returns a label parsing function used for parsing slides into single or multi-outcome labels.
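A label parser of this kind might look like the following sketch. label_parser_sketch is a hypothetical helper, not slideflow's implementation; it assumes labels is a dict mapping slide names either to a single label or to a sequence of labels (one per outcome), and it returns multi-outcome labels as a tuple.

```python
from typing import Callable, Dict, Sequence, Tuple, Union

Label = Union[int, float]


def label_parser_sketch(
    labels: Dict[str, Union[Label, Sequence[Label]]]
) -> Callable[[str], Union[Label, Tuple[Label, ...]]]:
    """Build a slide -> label function for single or multiple outcomes."""
    first = next(iter(labels.values()), None)
    if isinstance(first, (list, tuple)):
        # Multi-outcome: each slide maps to a sequence of labels.
        def parse(slide: str):
            return tuple(labels[slide])
    else:
        # Single outcome: each slide maps to one label.
        def parse(slide: str):
            return labels[slide]
    return parse
```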

slideflow.io.tensorflow.print_tfrecord(target)

Prints the slide names (and locations, if present) for records in the given tfrecord file.

slideflow.io.tensorflow.process_image(record, *args, standardize=False, augment=False, size=None)

Applies augmentations and/or standardization to an image Tensor.
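The augment arguments used throughout this module combine the single-character codes described under get_tfrecord_parser() and interleave(). The sketch below, with the hypothetical name parse_augment, only translates such an argument into a set of codes; it does not apply any image transforms, and its rejection of unknown codes is an assumption rather than documented slideflow behavior.

```python
from typing import FrozenSet, Union

# Augmentation codes as listed in this documentation.
AUGMENTATIONS = {
    "x": "random x-flip",
    "y": "random y-flip",
    "r": "rotation",
    "j": "JPEG compression/decompression at a random quality",
    "b": "random Gaussian blur",
}


def parse_augment(augment: Union[str, bool, None]) -> FrozenSet[str]:
    """Translate an `augment` argument into the set of augmentation codes."""
    if augment is True:
        return frozenset(AUGMENTATIONS)  # True means every augmentation
    if not augment:
        return frozenset()  # False, None or "" disables augmentation
    unknown = set(augment) - set(AUGMENTATIONS)
    if unknown:
        raise ValueError(f"Unknown augmentation codes: {sorted(unknown)}")
    return frozenset(augment)
```

Under this sketch, parse_augment("xyrjb") and parse_augment(True) select the same five augmentations, matching the description above.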

slideflow.io.tensorflow.serialized_record(slide, image_raw, loc_x=0, loc_y=0)

Returns a serialized example for TFRecord storage, ready to be written by a TFRecordWriter.
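A TFRecord writer wraps each serialized example in a small frame defined by the TFRecord file format: a little-endian uint64 length, a masked CRC-32C of that length, the payload, and a masked CRC-32C of the payload. The standard-library sketch below shows that framing; frame_record and read_frames are hypothetical names, not slideflow or TensorFlow APIs, and the payload is treated as opaque bytes (the Example protobuf itself is not built here). The pure-Python CRC is slow and meant only to make the format concrete.

```python
import struct
from typing import Iterator, List


def _crc32c_table() -> List[int]:
    # Lookup table for CRC-32C (Castagnoli), reflected polynomial 0x82F63B78.
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _crc32c_table()


def crc32c(data: bytes) -> int:
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def masked_crc(data: bytes) -> int:
    # TFRecord stores a rotated CRC plus a constant rather than the raw CRC.
    crc = crc32c(data)
    return (((crc >> 15) | (crc << 17)) + 0xA282EAD8) & 0xFFFFFFFF


def frame_record(payload: bytes) -> bytes:
    """Wrap one serialized example in TFRecord framing."""
    header = struct.pack("<Q", len(payload))
    return (header + struct.pack("<I", masked_crc(header))
            + payload + struct.pack("<I", masked_crc(payload)))


def read_frames(buf: bytes) -> Iterator[bytes]:
    """Yield payloads from TFRecord bytes, verifying both checksums."""
    pos = 0
    while pos < len(buf):
        header = buf[pos:pos + 8]
        (length,) = struct.unpack("<Q", header)
        (header_crc,) = struct.unpack("<I", buf[pos + 8:pos + 12])
        if header_crc != masked_crc(header):
            raise ValueError("corrupt length header")
        payload = buf[pos + 12:pos + 12 + length]
        (data_crc,) = struct.unpack("<I", buf[pos + 12 + length:pos + 16 + length])
        if data_crc != masked_crc(payload):
            raise ValueError("corrupt record payload")
        yield payload
        pos += 16 + length
```

Each frame adds 16 bytes of overhead to its payload, and because the checksums cover both the length and the data, a corrupted byte is detected when the file is read back.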

slideflow.io.tensorflow.shuffle_tfrecord(target)

Shuffles records in a TFRecord, saving the original to a .old file.
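Because every record in a TFRecord file is framed with its own length and checksums, shuffling can be done by reordering whole frames without decoding any example. The standard-library sketch below is hypothetical (shuffle_tfrecord_sketch and split_frames are not slideflow functions), and it assumes the backup is named by appending .old to the original path.

```python
import os
import random
import struct
from typing import List, Optional


def split_frames(buf: bytes) -> List[bytes]:
    """Split raw TFRecord bytes into whole frames (length, CRCs and payload)."""
    frames, pos = [], 0
    while pos < len(buf):
        (length,) = struct.unpack("<Q", buf[pos:pos + 8])
        # length (8) + length CRC (4) + payload + payload CRC (4)
        end = pos + 12 + length + 4
        if end > len(buf):
            raise ValueError("truncated TFRecord file")
        frames.append(buf[pos:end])
        pos = end
    return frames


def shuffle_tfrecord_sketch(target: str, seed: Optional[int] = None) -> None:
    """Shuffle records, keeping the original file as `target + '.old'`."""
    with open(target, "rb") as f:
        frames = split_frames(f.read())
    # Each frame carries its own CRCs, so reordering whole frames keeps them valid.
    random.Random(seed).shuffle(frames)
    os.replace(target, target + ".old")
    with open(target, "wb") as f:
        f.write(b"".join(frames))
```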

slideflow.io.tensorflow.shuffle_tfrecords_by_dir(directory)

For each TFRecord in a directory, shuffles records in the TFRecord, saving the original to a .old file.

slideflow.io.tensorflow.split_tfrecord(tfrecord_file, output_folder)

Splits records from a single tfrecord file into individual tfrecord files by slide.

slideflow.io.tensorflow.tfrecord_example(slide, image_raw, loc_x=0, loc_y=0)

Returns a TensorFlow example for TFRecord storage.

slideflow.io.tensorflow.transform_tfrecord(origin, target, assign_slide=None, hue_shift=None, resize=None, silent=False)

Transforms images in a single TFRecord. Can perform hue shifting, resizing, or re-assigning of the slide label.

slideflow.io.tensorflow.update_tfrecord(tfrecord_file, assign_slide=None)

Updates a single tfrecord from an old format to a new format.

slideflow.io.tensorflow.update_tfrecord_dir(directory, old_feature_description={'image_raw': FixedLenFeature(shape=[], dtype=tf.string, default_value=None), 'loc_x': FixedLenFeature(shape=[], dtype=tf.int64, default_value=None), 'loc_y': FixedLenFeature(shape=[], dtype=tf.int64, default_value=None), 'slide': FixedLenFeature(shape=[], dtype=tf.string, default_value=None)}, slide='slide', assign_slide=None, image_raw='image_raw')

Updates tfrecords in a directory from an old format to a new format.