Published September 14, 2023 | Version v1
Preprint Open

Temporal Normalization in Attentive Key-frame Extraction for Deep Neural Video Summarization

  • Aristotle University of Thessaloniki

Description

Attention-based neural architectures have consistently demonstrated superior performance over Long Short-Term Memory (LSTM) Deep Neural Networks (DNNs) in tasks such as key-frame extraction for video summarization. However, existing approaches mostly rely on rather shallow Transformer DNNs. This paper revisits the issue of model depth and proposes DATS: a deep attentive architecture for supervised video summarization that meaningfully exploits skip connections. Additionally, a novel per-layer temporal normalization algorithm is proposed that yields improved test accuracy. Finally, the model’s noisy output is rectified in an innovative post-processing step. Experiments conducted on two common, publicly available benchmark datasets showcase performance superior to competing state-of-the-art video summarization methods, both supervised and unsupervised.

Files

TEMPORAL NORMALIZATION IN ATTENTIVE KEY-FRAME EXTRACTION FOR DEEPNEURAL VIDEO SUMMARIZATION.pdf

Additional details

Funding

AI4Media – A European Excellence Centre for Media, Society and Democracy (grant no. 951911)
European Commission