Combining Global and Local Attention with Positional Encoding for Video Summarization

Apostolidis Evlampios; Balaouras Georgios; Mezaris Vasileios; Patras Ioannis

doi:10.5281/zenodo.6683785

Published January 10, 2022 | Version v1

Preprint Open


1. CERTH-ITI & Queen Mary University of London, Thessaloniki, Greece
2. CERTH-ITI, Thessaloniki, Greece
3. Queen Mary University of London, London, UK

This paper presents a new method for supervised video summarization. To overcome the drawbacks of existing RNN-based summarization architectures, which relate to the modeling of long-range frame dependencies and the ability to parallelize the training process, the developed model relies on self-attention mechanisms to estimate the importance of video frames. Contrary to previous attention-based summarization approaches, which model frame dependencies by observing the entire frame sequence, our method combines global and local multi-head attention mechanisms to model these dependencies at different levels of granularity. Moreover, the utilized attention mechanisms integrate a component that encodes the temporal position of video frames, which is of major importance when producing a video summary. Experiments on two datasets (SumMe and TVSum) demonstrate the effectiveness of the proposed model compared to existing attention-based methods, and its competitiveness against other state-of-the-art supervised summarization approaches. An ablation study that focuses on our main proposed components, namely the use of global and local multi-head attention mechanisms in combination with an absolute positional encoding component, shows their relative contributions to the overall summarization performance.
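The combination described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the sinusoidal form of the absolute positional encoding, adding that encoding to the frame features before attention, the split into equal non-overlapping segments, the fusion by addition, and the random (untrained) projection weights are all assumptions made for illustration, and every function name is hypothetical.

```python
import numpy as np

def sinusoidal_positions(n_frames, dim):
    """Absolute sinusoidal positional encoding (assumed form); dim must be even."""
    pos = np.arange(n_frames)[:, None]
    idx = np.arange(0, dim, 2)[None, :]
    angles = pos / np.power(10000.0, idx / dim)
    pe = np.zeros((n_frames, dim))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, n_heads, rng):
    """Scaled dot-product self-attention with random, untrained projections."""
    n, d = x.shape
    assert d % n_heads == 0, "feature size must be divisible by the number of heads"
    dh = d // n_heads
    wq, wk, wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    # Project, then split the feature axis into heads: (heads, frames, dh)
    q = (x @ wq).reshape(n, n_heads, dh).transpose(1, 0, 2)
    k = (x @ wk).reshape(n, n_heads, dh).transpose(1, 0, 2)
    v = (x @ wv).reshape(n, n_heads, dh).transpose(1, 0, 2)
    weights = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(dh))  # (heads, frames, frames)
    out = weights @ v                                          # (heads, frames, dh)
    return out.transpose(1, 0, 2).reshape(n, d)                # concatenate the heads

def global_local_features(frames, n_segments=4, n_heads=2, seed=0):
    """Fuse global attention (whole sequence) with local attention (per segment)."""
    rng = np.random.default_rng(seed)
    n, d = frames.shape
    # Global view: every frame attends to every other frame in the video.
    global_out = multi_head_self_attention(frames + sinusoidal_positions(n, d), n_heads, rng)
    # Local view: frames attend only within their own segment.
    local_out = np.zeros_like(global_out)
    for idx in np.array_split(np.arange(n), n_segments):
        if len(idx) == 0:
            continue  # more segments than frames
        segment = frames[idx] + sinusoidal_positions(len(idx), d)
        local_out[idx] = multi_head_self_attention(segment, n_heads, rng)
    return global_out + local_out  # fusion by addition is an assumption

frames = np.random.default_rng(1).standard_normal((10, 8))  # 10 frames, 8-dim features
fused = global_local_features(frames, n_segments=3, n_heads=2)
print(fused.shape)
```

In a trained model the fused features would feed a learned regressor that outputs one importance score per frame; that step is omitted here because the abstract does not describe it.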

Files (512.7 kB)

Name	Size
combining global.pdf (md5:5a2ff2d091c164d203ad3affa0331dbf)	512.7 kB

Additional details

Is cited by: Conference paper: 21570354 (PMID)

European Commission
MIRROR - Migration-Related Risks caused by misconceptions of Opportunities and Requirement 832921

