Embed Image Annotations

Annotator to embed a set of frames using a neural network.

Given a convolutional neural network trained on a supervised learning task, embedding frames into the penultimate layer (or some other internal layer) gives a useful representation that can be used similarly to word vectors. This module returns an embedding over (possibly a subset of) the frames in an input. The module can also be used when the embedding corresponds to a concrete supervised task.
class dvt.annotate.embed.EmbedAnnotator(**kwargs)

    Bases: dvt.abstract.FrameAnnotator

    Annotator for embedding frames into an ambient space.

    The annotator returns a numpy array with one row per processed frame. Control how frequently the annotator runs by setting the freq attribute to a number higher than 1. Note that freq should evenly divide the batch size.
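As a minimal standalone sketch (not the dvt implementation), the interaction between the freq and frames settings described here can be illustrated as follows; select_frames is a hypothetical helper written for this example:

```python
import numpy as np

# Illustrative sketch only: which frame indices within a batch get embedded.
# Setting freq=2 embeds every other frame; an explicit frame list overrides freq.
def select_frames(batch_size, freq=1, frames=None):
    """Return the indices of frames within a batch that will be embedded."""
    if frames is not None:
        # The frames attribute, when not None, takes precedence over freq.
        return np.asarray(frames, dtype=int)
    # freq should evenly divide the batch size.
    assert batch_size % freq == 0, "freq should evenly divide the batch size"
    return np.arange(0, batch_size, freq)

# freq=2 on a batch of 8 frames embeds frames 0, 2, 4, 6
print(select_frames(8, freq=2))                   # -> [0 2 4 6]
# an explicit frame list ignores freq
print(select_frames(8, freq=4, frames=[1, 5]))    # -> [1 5]
```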
    embedding
        Object that performs the embedding.
        Type: EmbedFrameKeras

    freq
        How often to perform the embedding. For example, setting freq to 2 embeds every other frame in the batch.
        Type: int

    frames
        An optional list of frames to process, given as a list of integers or a 1D numpy array of integers. If set to anything other than None, the freq input is ignored.
        Type: array of ints

    name
        A description of the annotator. Used as a key in the output data.
        Type: str
    annotate(batch)
        Annotate the batch of frames with the embedding annotator.

        Parameters: batch (FrameBatch) – A batch of images to annotate.
        Returns: A list of dictionaries containing the video name, frame, and a numpy array of the embedding.

    name = 'embed'
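The return structure described for annotate can be mocked as below; the dictionary key names and the 2048-dimensional embedding size are illustrative assumptions for this sketch, not dvt's exact schema:

```python
import numpy as np

# Hypothetical mock of the annotate() return value: a list of dictionaries,
# one per processed frame. Key names ("video", "frame", "embed") and the
# embedding size (2048) are assumptions for illustration only.
def mock_annotate_output(video_name, frame_ids, dim=2048):
    rng = np.random.default_rng(0)
    return [
        {"video": video_name, "frame": fid, "embed": rng.normal(size=dim)}
        for fid in frame_ids
    ]

out = mock_annotate_output("clip.mp4", [0, 16, 32])
print(len(out))                 # -> 3
print(out[0]["embed"].shape)    # -> (2048,)
```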
class dvt.annotate.embed.EmbedFrameKeras(model, preprocess_input=None, outlayer=None)

    Bases: object

    A generic class for applying an embedding to frames.

    Applies a keras model to a batch of frames. The input of the model is assumed to be an image with three channels. The class automatically handles resizing the images to the required input shape.
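The resizing step mentioned above can be sketched in plain numpy; this nearest-neighbour version assumes a 224 x 224 input shape and is an illustration, not the library's actual code:

```python
import numpy as np

# Illustrative nearest-neighbour resize of a batch of H x W x 3 frames to a
# model's expected input shape (assumed 224 x 224 here). Not dvt's code.
def resize_batch(frames, target=(224, 224)):
    n, h, w, _ = frames.shape
    # Map each target pixel back to its nearest source pixel.
    rows = np.arange(target[0]) * h // target[0]
    cols = np.arange(target[1]) * w // target[1]
    return frames[:, rows[:, None], cols, :]

batch = np.zeros((4, 480, 640, 3), dtype=np.uint8)
print(resize_batch(batch).shape)   # -> (4, 224, 224, 3)
```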
    model
        A keras model to apply to the frames.

    preprocess_input
        An optional function to preprocess the images. Set to None (the default) to apply no preprocessing.

    outlayer
        Name of the output layer. Set to None (the default) to use the final layer predictions as the embedding.
class dvt.annotate.embed.EmbedFrameKerasResNet50

    Bases: dvt.annotate.embed.EmbedFrameKeras

    Example embedding using ResNet50.

    Provides an example of how to use an embedding annotator and gives easy access to one of the most popular models for computing image similarity metrics in an embedding space. See the (very minimal) source code for how to extend this class to other pre-built keras models.
    model
        The ResNet-50 model, tuned to produce the penultimate layer as an output.

    preprocess_input
        Default preprocessing function for an image provided as an array in RGB format.