Temporal Normalization in Attentive Key-frame Extraction for Deep Neural Video Summarization
Description
Attention-based neural architectures have consistently demonstrated superior performance over Long Short-Term Memory (LSTM) Deep Neural Networks (DNNs) in tasks such as key-frame extraction for video summarization. However, existing approaches mostly rely on relatively shallow Transformer DNNs. This paper revisits the issue of model depth and proposes DATS: a deep attentive architecture for supervised video summarization that meaningfully exploits skip connections. Additionally, a novel per-layer temporal normalization algorithm is proposed that yields improved test accuracy. Finally, the model's noisy output is rectified in an innovative post-processing step. Experiments conducted on two common, publicly available benchmark datasets demonstrate performance superior to competing state-of-the-art video summarization methods, both supervised and unsupervised.
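The paper's exact per-layer temporal normalization algorithm is detailed in the PDF itself; as a rough illustration of the general idea, the sketch below normalizes each feature channel of a frame-feature sequence across the temporal axis. All names, shapes, and the normalization form here are assumptions for illustration, not the DATS implementation.

```python
import numpy as np

def temporal_normalize(x, eps=1e-6):
    """Normalize each feature channel across the temporal (frame) axis.

    x : array of shape (T, D) -- T video frames, D-dim feature per frame.
    Generic sketch only; NOT the exact DATS per-layer algorithm.
    """
    mean = x.mean(axis=0, keepdims=True)  # per-channel mean over time
    std = x.std(axis=0, keepdims=True)    # per-channel std over time
    return (x - mean) / (std + eps)

# Hypothetical usage: 120 frames, 256-dim features per frame.
frames = np.random.randn(120, 256)
normed = temporal_normalize(frames)
```

In a deep attentive summarizer, such a step would be applied between layers so that each layer sees frame features on a comparable temporal scale.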
Files (186.5 kB)

Name | Size | Download
---|---|---
TEMPORAL NORMALIZATION IN ATTENTIVE KEY-FRAME EXTRACTION FOR DEEPNEURAL VIDEO SUMMARIZATION.pdf (md5:648f2b0a4bc477b10a5a9065f2d882a8) | 186.5 kB | Preview / Download