Features / layer activations
============================

Once a model has been fully trained and evaluated, you may use the model to generate features from layer activations to gain better insight into the kinds of image features the model has learned.

Working with Layer Features
***************************

To work with features (intermediate layer activations) calculated from a model, use the :class:`slideflow.model.Features` class to generate features at the tile or slide level, or the :class:`slideflow.model.DatasetFeatures` class to generate features for an entire dataset.

DatasetFeatures
---------------

The easiest way to get started with intermediate layer activations is the :class:`slideflow.model.DatasetFeatures` class, which is used to calculate and examine activations across an entire dataset. Instantiating the class supervises the calculation and caching of layer activations, which can then be exported, viewed (as a mosaic map), or analyzed with various statistical methods. The project function :func:`slideflow.Project.generate_features` creates and returns an instance of this class.

.. code-block:: python

    features = P.generate_features('/path/to/trained_model')

Alternatively, you can create an instance of this class directly:

.. code-block:: python

    from slideflow.model import DatasetFeatures

    dataset = P.dataset(299, 302)
    labels, unique_outcomes = dataset.labels('HPV')

    features = DatasetFeatures(
        model='/path/to/trained_model',
        dataset=dataset,
        annotations=labels
    )

Tile-level feature activations for each slide can be accessed directly from ``slideflow.model.DatasetFeatures.activations``, a dict mapping slide names to numpy arrays of shape ``(num_tiles, num_features)``. Logits are stored in ``slideflow.model.DatasetFeatures.logits``, a dict mapping slide names to numpy arrays of shape ``(num_tiles, num_logits)``. Tile-level location data (the coordinates at which each tile was extracted from its source slide) is stored in ``slideflow.model.DatasetFeatures.locations``, a dict mapping slide names to numpy arrays of shape ``(num_tiles, 2)`` (``x``, ``y``).
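
As a rough illustration of how these dictionaries are laid out, the sketch below builds synthetic stand-ins with the same shapes and shows one way to combine them. The slide names, array sizes, and variable names are hypothetical; with a real ``DatasetFeatures`` instance you would read the corresponding attributes instead of generating random data.

.. code-block:: python

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical stand-ins for the activations and locations dicts:
    # each maps a slide name to one row per tile.
    activations = {
        'slide_a': rng.normal(size=(5, 8)),   # (num_tiles, num_features)
        'slide_b': rng.normal(size=(3, 8)),
    }
    locations = {
        'slide_a': rng.integers(0, 10000, size=(5, 2)),  # (num_tiles, 2) as (x, y)
        'slide_b': rng.integers(0, 10000, size=(3, 2)),
    }

    # Stack all tiles into a single matrix, remembering each tile's slide.
    slides = sorted(activations)
    all_features = np.concatenate([activations[s] for s in slides])
    tile_slides = np.repeat(slides, [len(activations[s]) for s in slides])

    # Find where the tile with the highest value of feature 0 was taken from.
    idx = int(np.argmax(activations['slide_a'][:, 0]))
    x, y = locations['slide_a'][idx]

Because the rows of the activation and location arrays are aligned for each slide, a tile index found in one array can be used to look up the same tile in the other.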

To return the average logits value for each slide (averaged across constituent tiles), use :func:`slideflow.model.DatasetFeatures.logits_mean`. Similarly, :func:`slideflow.model.DatasetFeatures.logits_predict` can be used to generate final slide-level logit predictions.
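
The averaging step can be sketched with plain numpy on a synthetic logits dictionary. This is only an illustration of averaging across tiles: the values are made up, and the final ``argmax`` rule is an assumption about one reasonable way to reach a slide-level call, not a description of what ``logits_predict`` does internally.

.. code-block:: python

    import numpy as np

    # Synthetic stand-in for the logits dict: slide name -> (num_tiles, num_logits)
    logits = {
        'slide_a': np.array([[0.9, 0.1], [0.7, 0.3], [0.8, 0.2]]),
        'slide_b': np.array([[0.2, 0.8], [0.4, 0.6]]),
    }

    # Average each slide's logits across its tiles (axis 0).
    mean_logits = {slide: arr.mean(axis=0) for slide, arr in logits.items()}

    # Assumed decision rule for illustration: pick the category with the
    # highest mean logit.
    predictions = {slide: int(np.argmax(m)) for slide, m in mean_logits.items()}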

Features across categories can be statistically compared using :func:`slideflow.model.DatasetFeatures.stats`, which will calculate and save statistics to a specified directory.

.. code-block:: python

    features.stats('/outdir', method='mean')

To compare layer features across outcome categories and find features which differ significantly across categories, use the :func:`slideflow.model.DatasetFeatures.box_plots` function. For example, to generate boxplots for the first 100 features:

.. code-block:: python

    features.box_plots(range(100), '/outdir')

.. image:: boxplot_example.png
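
To make the idea of comparing features across outcome categories concrete, here is a minimal numpy sketch on synthetic data. It reduces each slide to the mean of its tiles, groups slides by label, and ranks features by the absolute difference between category means. The slide names, labels, and ranking rule are all hypothetical; the statistics computed by ``stats`` and ``box_plots`` may differ.

.. code-block:: python

    import numpy as np

    rng = np.random.default_rng(0)
    num_features = 4

    # Synthetic tile-level activations; feature 2 is shifted upward
    # in slides labeled 'positive'.
    labels = {'s1': 'negative', 's2': 'negative', 's3': 'positive', 's4': 'positive'}
    activations = {}
    for slide, label in labels.items():
        tiles = rng.normal(size=(50, num_features))
        if label == 'positive':
            tiles[:, 2] += 5.0
        activations[slide] = tiles

    # Reduce each slide to a single vector by averaging its tiles.
    slide_means = {s: a.mean(axis=0) for s, a in activations.items()}

    # Group the slide-level vectors by outcome category.
    groups = {}
    for slide, vec in slide_means.items():
        groups.setdefault(labels[slide], []).append(vec)
    groups = {k: np.stack(v) for k, v in groups.items()}

    # Rank features by the absolute difference between category means.
    diff = np.abs(groups['positive'].mean(axis=0) - groups['negative'].mean(axis=0))
    ranked = np.argsort(diff)[::-1]

In this toy example the shifted feature ranks first. A real analysis would normally use a proper statistical test rather than a raw difference in means.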

Many other functions are available; see the documentation for :class:`slideflow.model.DatasetFeatures`.

Features
--------

The :class:`slideflow.model.Features` class can be used to generate layer activations / features for a single batch of images. For example, to calculate features for a batch of images while looping through a dataset:

.. code-block:: python

    from slideflow.model import Features

    features = Features('/model/path', layers='postconv')
    for img_batch in dataset:
        postconv_features = features(img_batch)

You can choose to return features from any combination of intermediate layers by passing layer name(s) to the argument ``layers``. The interface can also return logits by passing ``include_logits=True``.

To calculate layer features across an entire slide, the same interface can be called on a :class:`slideflow.WSI` object, generating a grid of activations of size ``(slide.grid.shape[0], slide.grid.shape[1], num_features)``:

.. code-block:: python

    from slideflow import WSI
    from slideflow.model import Features

    slide = WSI(...)
    interface = Features('/model/path', layers='postconv')
    feature_grid = interface(slide)
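
Once you have a grid of this shape, ordinary numpy indexing is enough to inspect it. The sketch below uses a synthetic array in place of a real ``feature_grid``; the grid size, feature count, and feature index are arbitrary, and any masking of empty or filtered grid positions is not modeled here.

.. code-block:: python

    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic stand-in for a feature grid of shape
    # (grid_rows, grid_cols, num_features).
    grid_shape = (6, 4)
    num_features = 8
    feature_grid = rng.normal(size=(*grid_shape, num_features))

    # A 2D map of a single feature across the slide.
    feature_map = feature_grid[:, :, 3]

    # Flatten the grid into a tile-level matrix, one row per grid position.
    tile_features = feature_grid.reshape(-1, num_features)

    # Grid position with the highest activation of feature 3.
    row, col = np.unravel_index(np.argmax(feature_map), feature_map.shape)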


Mosaic maps
***********

To visualize the distribution of features across a dataset, a mosaic map can be created from a :class:`slideflow.model.DatasetFeatures` instance. Mosaic maps are generated by using features (layer activations) from a dataset, performing dimensionality reduction (UMAP) on the activations (via :class:`slideflow.SlideMap`), and overlaying tile images onto the UMAP (via :class:`slideflow.Mosaic`). By default, the post-convolutional ('postconv') layer is used when calculating features, but any combination of other layers can also be used. The ``Project`` class has a function which can supervise these steps automatically and save the final figure to the project directory.

.. code-block:: python

    features = project.generate_features('/path/to/trained_model')
    mosaic = project.generate_mosaic(features)
    mosaic.save('mosaic.png')

.. autofunction:: slideflow.Project.generate_mosaic
   :noindex:

.. image:: mosaic_example.png
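
The steps behind a mosaic map can be sketched conceptually with numpy alone. The example below is not Slideflow's implementation: PCA (via SVD) is swapped in for UMAP to keep the sketch dependency-free, the features are synthetic, and the rule of keeping the tile nearest to each grid cell's center is an assumption made for illustration.

.. code-block:: python

    import numpy as np

    rng = np.random.default_rng(0)
    features = rng.normal(size=(200, 16))  # synthetic (num_tiles, num_features)

    # Step 1: reduce to 2D. PCA via SVD is only a simple stand-in for UMAP.
    centered = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    coords = centered @ vt[:2].T  # (num_tiles, 2)

    # Step 2: scale coordinates to the unit square and assign grid cells.
    num_bins = 10
    span = coords.max(axis=0) - coords.min(axis=0)
    unit = (coords - coords.min(axis=0)) / span
    cells = np.minimum((unit * num_bins).astype(int), num_bins - 1)

    # Step 3: for each occupied cell, keep the tile closest to the cell center.
    centers = (cells + 0.5) / num_bins
    dist = np.linalg.norm(unit - centers, axis=1)
    mosaic_tiles = {}
    for idx in np.argsort(dist):
        key = (int(cells[idx, 0]), int(cells[idx, 1]))
        mosaic_tiles.setdefault(key, int(idx))

Each entry of ``mosaic_tiles`` maps a grid cell to the index of the tile whose image would be drawn there. Empty cells are simply absent from the dictionary.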

To plot the underlying UMAP without overlaid images, the :class:`slideflow.SlideMap` used to create the mosaic map can be accessed via ``slideflow.Mosaic.slide_map``. You can then use the :func:`slideflow.SlideMap.save` function to save the plot:

.. code-block:: python

    mosaic = project.generate_mosaic(...)
    mosaic.slide_map.save('umap.png')

Tiles on the plot can be labeled using slide labels from the project annotations file, using the function :func:`slideflow.SlideMap.label_by_slide`. For example, the following will label the slide map according to the categorical outcome "HPV_status" in the project annotations file:

.. code-block:: python

    # Get slide labels
    dataset = project.dataset(tile_px=299, tile_um=302)
    labels, unique_labels = dataset.labels('HPV_status')

    # Create the mosaic map and access the underlying SlideMap
    mosaic = project.generate_mosaic(...)

    # Label the slide map with our outcome
    mosaic.slide_map.label_by_slide(labels)

    # Save
    mosaic.slide_map.save('umap_labeled.png')

By default, all tiles in a dataset (which may be hundreds of thousands or millions of images) will be mapped onto the mosaic map. Instead of mapping every tile within a slide, you can map a single tile per slide with the argument ``map_slide='centroid'``. This will calculate the tile nearest to the centroid for each slide and display only this tile:

.. code-block:: python

    # Create the mosaic map using only the centroid tile of each slide
    mosaic = project.generate_mosaic(..., map_slide='centroid')
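
The centroid selection can be illustrated with a short numpy sketch. It assumes the centroid is the mean feature vector of a slide's tiles and that distance is Euclidean; both are assumptions for illustration, and the helper ``nearest_to_centroid`` is hypothetical rather than part of Slideflow.

.. code-block:: python

    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic stand-in for tile-level activations, keyed by slide name.
    activations = {
        'slide_a': rng.normal(size=(40, 8)),
        'slide_b': rng.normal(size=(25, 8)),
    }

    def nearest_to_centroid(tiles):
        """Return the index of the tile closest to the mean feature vector."""
        centroid = tiles.mean(axis=0)
        return int(np.argmin(np.linalg.norm(tiles - centroid, axis=1)))

    # One representative tile index per slide.
    centroid_tiles = {s: nearest_to_centroid(a) for s, a in activations.items()}

Mapping one tile per slide this way gives a plot with as many points as slides, which is much lighter than mapping every tile.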

There are many additional arguments that can be provided to the :meth:`slideflow.Project.generate_mosaic()` function to customize the mosaic and UMAP plots, and many additional functions that can be applied to :class:`slideflow.Mosaic` and :class:`slideflow.SlideMap`. For example, it may be interesting to view a UMAP of tiles with an added third dimension, such as the activation value of a particular penultimate layer node. With this kind of plot, one can visualize how the activation of a particular node varies across the UMAP. To make such a plot, use the ``save_3d_plot`` function of the ``SlideMap``:

.. code-block:: python

    mosaic = project.generate_mosaic(...)
    mosaic.slide_map.save_3d_plot('3d_plot.png', feature=497)

.. image:: 3d_umap.png
