Face Annotations

Annotators to detect and identify faces.

Identifying individuals in an image generally requires two distinct steps: first, detecting bounding boxes for faces in the image; second, identifying the faces themselves. Currently the most common approach to the second step is to project each detected face into a high-dimensional space designed so that different images of the same person lie close together while images of different people lie farther apart. This module is built around this paradigm and allows custom detectors and embeddings to be plugged into the model.
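The nearest-in-embedding-space idea behind the second step can be sketched with synthetic vectors. Everything below (the distance function, the 2048-dimensional vectors, the noise level) is illustrative and not part of this module's API:

```python
import numpy as np

def cosine_distance(a, b):
    # 0 means identical direction; values near 1 mean unrelated vectors.
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Hypothetical 2048-dimensional embeddings: two images of person A, one of person B.
rng = np.random.default_rng(0)
person_a = rng.normal(size=2048)
person_a_again = person_a + rng.normal(scale=0.05, size=2048)  # same face, slight noise
person_b = rng.normal(size=2048)

# A well-trained embedding places images of the same person close together
# and images of different people far apart.
assert cosine_distance(person_a, person_a_again) < cosine_distance(person_a, person_b)
```

In practice the raw vectors come from a trained network such as VGGFace2, and identification reduces to nearest-neighbor search in this space.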

class dvt.annotate.face.FaceAnnotator(**kwargs)[source]

Bases: dvt.abstract.FrameAnnotator

Annotator for detecting faces and embedding them as a face vector.

The annotator will return a list with one DictList item for every frame with a detected face. If an embedding is supplied, the DictList items will contain a numpy array with the face embeddings.

detector

An object with a method called detect that takes an image and returns a set of detected faces. Can be set to None (default) as a pass-through option for testing.

embedding

An object with a method embed that takes an image along with a set of bounding boxes and returns embeddings of the faces as a numpy array. Set to None (default) to run only the face detector.

freq

How often to perform the embedding. For example, setting the frequency to 2 will embed every other frame in the batch.

Type:int
frames

An optional list of frames to process. This should be a list of integers or a 1D numpy array of integers. If set to something other than None, the freq input is ignored.

Type:array of ints
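One plausible reading of how freq and frames interact when selecting which frames of a batch to process can be sketched as follows. The function name and logic are illustrative, not the library's own code:

```python
def frames_to_process(batch_frame_numbers, freq=1, frames=None):
    """Sketch of the selection rule described above (not the library's code).

    If an explicit frames list is given it takes precedence and freq is
    ignored; otherwise every freq-th frame number is kept."""
    if frames is not None:
        wanted = set(frames)
        return [f for f in batch_frame_numbers if f in wanted]
    return [f for f in batch_frame_numbers if f % freq == 0]

# freq=2 keeps every other frame; an explicit list overrides freq entirely.
print(frames_to_process(range(8), freq=2))                 # [0, 2, 4, 6]
print(frames_to_process(range(8), freq=2, frames=[1, 5]))  # [1, 5]
```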
name

A description of the annotator. Used as a key in the output data.

Type:str
annotate(batch)[source]

Annotate the batch of frames with the face annotator.

Parameters:batch (FrameBatch) – A batch of images to annotate.
Returns:A list of dictionaries containing the video name, frame number, and bounding box coordinates (top, bottom, left, and right). If an embedding is included, the result will also contain a numpy array of the embedding for each face.
name = 'face'
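Because detector is duck-typed, a custom one only needs a detect method returning dictionaries with the bounding-box keys listed above. The toy detector below, and the way a per-face output record is assembled from it, are purely illustrative:

```python
import numpy as np

class EveryFrameDetector:
    """Toy detector satisfying the duck-typed interface the annotator
    expects: a detect method taking an image and returning a list of
    dicts with bounding-box keys. Illustrative only."""

    def detect(self, img):
        h, w = img.shape[:2]
        # Pretend the whole frame is a single detected face.
        return [{"top": 0, "bottom": h, "left": 0, "right": w, "confidence": 1.0}]

detector = EveryFrameDetector()
frame = np.zeros((48, 64, 3), dtype=np.uint8)
faces = detector.detect(frame)

# The annotator's output rows pair each face with its video name and frame
# number, in the shape described by the Returns section above.
record = {"video": "example.mp4", "frame": 0, **faces[0]}
print(sorted(record))
```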
class dvt.annotate.face.FaceDetectMtcnn(cutoff=0)[source]

Bases: object

Detect faces using the Multi-task Cascaded CNN model.

cutoff

A cutoff value for which faces to include in the final output. Set to zero (default) to include all faces.

Type:float
detect(img)[source]

Detect faces in an image.

Parameters:img (numpy array) – A single image stored as a three-dimensional numpy array.
Returns:A list of dictionaries where each dictionary represents a detected face. Keys include the bounding box (top, left, bottom, right) as well as a confidence score.
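The role of cutoff can be sketched as a simple confidence filter over the detect output; the function and sample data below are illustrative, not the model's internals:

```python
def apply_cutoff(detections, cutoff=0.0):
    # Keep only faces whose confidence meets the cutoff; a cutoff of zero
    # keeps everything, since confidence scores are non-negative.
    return [d for d in detections if d["confidence"] >= cutoff]

# Hypothetical detect() output: two faces, one low-confidence.
detections = [
    {"top": 10, "left": 5, "bottom": 50, "right": 40, "confidence": 0.98},
    {"top": 60, "left": 70, "bottom": 90, "right": 95, "confidence": 0.41},
]
print(len(apply_cutoff(detections)))       # 2: zero cutoff keeps every face
print(len(apply_cutoff(detections, 0.9)))  # 1: only the high-confidence face
```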
class dvt.annotate.face.FaceEmbedVgg2[source]

Bases: object

Embed faces using the VGGFace2 model.

A face embedding with state-of-the-art results, particularly suitable when there are small or non-forward-facing examples in the dataset.

embed(img, faces)[source]

Embed detected faces in an image.

Parameters:
  • img (numpy array) – A single image stored as a three-dimensional numpy array.
  • faces (dict) – Location of detected faces in the image.
Returns:

A numpy array with one row for each input face and 2048 columns.
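The shape contract of embed can be sketched with a stand-in that returns random values instead of real VGGFace2 features; the mock below exists only to show the expected one-row-per-face, 2048-column output:

```python
import numpy as np

def mock_embed(img, faces, dim=2048):
    """Stand-in with the same shape contract as the embedding described
    above: one dim-dimensional row per detected face. The values are
    random noise, not real VGGFace2 features."""
    rng = np.random.default_rng(0)
    return rng.normal(size=(len(faces), dim))

# Two hypothetical face bounding boxes in a small dummy image.
faces = [{"top": 0, "bottom": 24, "left": 0, "right": 24},
         {"top": 5, "bottom": 29, "left": 30, "right": 54}]
img = np.zeros((48, 64, 3), dtype=np.uint8)

emb = mock_embed(img, faces)
print(emb.shape)  # (2, 2048)
```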