Conference paper Open Access
Werner Bailer; Stefanie Wechtitsch
Compact descriptors for video segments have been proposed to enable efficient instance search in video. They exploit the temporal redundancy within a video segment to reduce descriptor size, and the segment-based representation also allows more efficient matching. In this paper, we analyze the performance of different visual features when lossless and lossy compression are applied to the set of descriptors of one video segment. We consider both hand-crafted features and deep features, i.e., visual features obtained from a trained deep convolutional neural network. We also propose optimizations to the extraction and matching procedures. Both the compression methods and the optimizations are evaluated experimentally on a large video data set.
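To make the idea of exploiting temporal redundancy within a segment concrete, the following is a minimal sketch of one possible lossless scheme, not the method evaluated in the paper. It assumes fixed-length binary descriptors, stores the first descriptor of a segment unchanged, and stores each later descriptor as the XOR difference to its predecessor before applying general-purpose compression. The function names (`encode_segment`, `decode_segment`, `make_toy_segment`), the choice of XOR residuals and `zlib`, and the synthetic data are all illustrative assumptions.

```python
import random
import zlib


def encode_segment(descriptors):
    """Losslessly pack equal-length binary descriptors of one segment."""
    if not descriptors:
        raise ValueError("segment must contain at least one descriptor")
    length = len(descriptors[0])
    if any(len(d) != length for d in descriptors):
        raise ValueError("all descriptors must have the same length")
    # Keep the first descriptor as-is; store every later one as the XOR
    # difference to its predecessor. Similar consecutive frames give
    # residuals that are mostly zero bytes.
    residuals = [descriptors[0]]
    for prev, cur in zip(descriptors, descriptors[1:]):
        residuals.append(bytes(a ^ b for a, b in zip(prev, cur)))
    return length, zlib.compress(b"".join(residuals), 9)


def decode_segment(length, payload):
    """Invert encode_segment and return the original descriptors."""
    raw = zlib.decompress(payload)
    chunks = [raw[i:i + length] for i in range(0, len(raw), length)]
    descriptors = [chunks[0]]
    for residual in chunks[1:]:
        prev = descriptors[-1]
        descriptors.append(bytes(a ^ b for a, b in zip(prev, residual)))
    return descriptors


def make_toy_segment(num_frames=50, length=32, flips=3, seed=0):
    """Hypothetical data: each frame differs from the last by a few bits."""
    rng = random.Random(seed)
    current = bytearray(rng.getrandbits(8) for _ in range(length))
    frames = [bytes(current)]
    for _ in range(num_frames - 1):
        for _ in range(flips):
            bit = rng.randrange(length * 8)
            current[bit // 8] ^= 1 << (bit % 8)
        frames.append(bytes(current))
    return frames
```

Calling `encode_segment(make_toy_segment())` and passing the result to `decode_segment` recovers the input exactly, which is the property that makes the scheme lossless. A lossy variant would additionally discard or quantize part of the residuals, trading matching accuracy for size.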