Embed Image Annotations

Annotator for embedding a set of frames using a neural network.

Given a convolutional neural network trained on a supervised learning task, the activations of the penultimate layer (or some other internal layer) provide a useful embedding that can be used similarly to word vectors. This module returns an embedding over a (possibly strict) subset of the frames in an input. The module can also be used when the embedding corresponds to a concrete supervised task.
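As a sketch of the idea (not part of the dvt API), the hidden activations of even a tiny feed-forward network can serve as an embedding: the final layer solves the supervised task, while the layer before it yields a dense vector representation of each input.

```python
import numpy as np

# Illustrative only: a tiny two-layer network with random weights.
# In dvt the model is a trained keras CNN; plain numpy stands in here.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(12, 4))   # input dim 12 -> penultimate dim 4
W2 = rng.normal(size=(4, 3))    # penultimate dim 4 -> 3 output classes

def forward(x):
    hidden = np.maximum(x @ W1, 0)   # penultimate-layer activations
    logits = hidden @ W2             # task-specific predictions
    return hidden, logits

frames = rng.normal(size=(5, 12))    # 5 flattened "frames"
embedding, _ = forward(frames)       # keep the internal layer, not the logits
print(embedding.shape)               # (5, 4): one embedding row per frame
```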

class dvt.annotate.embed.EmbedAnnotator(**kwargs)[source]

Bases: dvt.abstract.FrameAnnotator

Annotator for embedding frames into an ambient space.

The annotator will return a numpy array, with one row per processed frame. Control how frequently the annotator runs by setting the freq attribute to a number higher than 1. Note that the frequency should evenly divide the batch size.
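For example, with a batch of eight frames and a frequency of 2, every other frame is embedded. A minimal sketch of the frame-selection logic (the variable names are illustrative, not dvt internals):

```python
import numpy as np

batch_size = 8
freq = 2
assert batch_size % freq == 0  # the frequency should evenly divide the batch size

# Indices of the frames within the batch that will be embedded.
selected = np.arange(0, batch_size, freq)
print(selected)  # [0 2 4 6]: four embedding rows for an eight-frame batch
```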

embedding

Object to perform the embedding.

Type:EmbedFrameKeras
freq

How often to perform the embedding. For example, setting the frequency to 2 will embed every other frame in the batch.

Type:int
frames

An optional list of frames to process. This should be a list of integers or a 1D numpy array of integers. If set to something other than None, the freq input is ignored.

Type:array of ints
name

A description of the annotator. Used as a key in the output data.

Type:str
annotate(batch)[source]

Annotate the batch of frames with the embedding annotator.

Parameters:batch (FrameBatch) – A batch of images to annotate.
Returns:A list of dictionaries containing the video name, frame, and a numpy array of the embedding.
name = 'embed'
class dvt.annotate.embed.EmbedFrameKeras(model, preprocess_input=None, outlayer=None)[source]

Bases: object

A generic class for applying an embedding to frames.

Applies a keras model to a batch of frames. The input of the model is assumed to be an image with three channels. The class automatically handles resizing the images to the required input shape.

model

A keras model to apply to the frames.

preprocess_input

An optional function to preprocess the images. Set to None (the default) to not apply any preprocessing.

outlayer

Name of the output layer. Set to None (the default) to use the final layer predictions as the embedding.

embed(img)[source]

Embed a batch of images.

Parameters:img – A four dimensional numpy array to embed using the keras model.
Returns:A numpy array whose first dimension matches the first dimension of the input array.
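The shape contract of embed can be sketched with a stand-in embedding (a per-channel mean over pixels in place of a real keras model): the input is a four-dimensional array of shape (n_frames, height, width, 3), and the output is two-dimensional with the same leading dimension.

```python
import numpy as np

def embed_stub(img):
    # Stand-in for EmbedFrameKeras.embed: average each channel over all
    # pixels, yielding one 3-vector per frame.  A real keras model would
    # produce a higher-dimensional embedding, but the same shape contract
    # holds: one output row per input frame.
    assert img.ndim == 4 and img.shape[-1] == 3
    return img.mean(axis=(1, 2))

batch = np.zeros((16, 224, 224, 3))  # 16 frames of 224x224 RGB
out = embed_stub(batch)
print(out.shape)  # (16, 3): first dimension matches the input batch
```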
class dvt.annotate.embed.EmbedFrameKerasResNet50[source]

Bases: dvt.annotate.embed.EmbedFrameKeras

Example embedding using ResNet50.

Provides an example of how to use an embedding annotator and gives easy access to one of the most popular models for computing image similarity metrics in an embedding space. See the (very minimal) source code for how to extend this class to other pre-built keras models.

model

The ResNet-50 model, tuned to produce the penultimate layer as an output.

preprocess_input

Default preprocessing function for an image provided as an array in RGB format.