CA-SUM pretrained models

Apostolidis, Evlampios; Balaouras, Georgios; Mezaris, Vasileios; Patras, Ioannis

doi:10.5281/zenodo.6562992

Published May 19, 2022 | Version v1

Dataset Open

1. CERTH-ITI
2. QMUL

This dataset contains pretrained models of the CA-SUM network architecture for video summarization, presented in our work “Summarizing Videos using Concentrated Attention and Considering the Uniqueness and Diversity of the Video Frames”, in Proc. ACM ICMR 2022.

Method overview:

In our ICMR 2022 paper we describe a new method for unsupervised video summarization. Existing unsupervised video summarization approaches have limitations that relate to the unstable training of Generator-Discriminator architectures, the use of RNNs for modeling long-range frame dependencies, and the limited ability to parallelize the training process of RNN-based network architectures. To overcome them, the developed method relies solely on a self-attention mechanism to estimate the importance of video frames. Instead of simply modeling the frames' dependencies based on global attention, our method integrates a concentrated attention mechanism that focuses on non-overlapping blocks in the main diagonal of the attention matrix, and enriches the existing information by extracting and exploiting knowledge about the uniqueness and diversity of the associated video frames. In this way, our method makes better estimates of the significance of different parts of the video and drastically reduces the number of learnable parameters. Experimental evaluations on two benchmark datasets (SumMe and TVSum) show the competitiveness of the proposed method against other state-of-the-art unsupervised summarization approaches, and demonstrate its ability to produce video summaries that are very close to human preferences. An ablation study of the introduced components, namely the use of concentrated attention in combination with attention-based estimates of the frames' uniqueness and diversity, shows their relative contributions to the overall summarization performance.
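To make the idea of "non-overlapping blocks in the main diagonal of the attention matrix" concrete, here is a minimal sketch in plain Python. It only illustrates which entries of a frame-to-frame attention matrix such a block structure would keep; it is not the authors' implementation. The names `block_diagonal_mask`, `apply_mask`, and `block_size` are hypothetical, and the handling of a shorter final block is an assumption. The uniqueness and diversity estimates are not modeled here.

```python
def block_diagonal_mask(num_frames, block_size):
    """Build a num_frames x num_frames 0/1 mask that keeps only
    non-overlapping square blocks along the main diagonal.

    block_size is a hypothetical parameter chosen for illustration.
    """
    if num_frames < 0 or block_size <= 0:
        raise ValueError("num_frames must be >= 0 and block_size must be > 0")
    mask = [[0] * num_frames for _ in range(num_frames)]
    for start in range(0, num_frames, block_size):
        # Assumption: when num_frames is not a multiple of block_size,
        # the last block is simply smaller.
        end = min(start + block_size, num_frames)
        for i in range(start, end):
            for j in range(start, end):
                mask[i][j] = 1
    return mask


def apply_mask(attention, mask):
    """Zero out every attention entry that falls outside the kept blocks."""
    return [
        [value * keep for value, keep in zip(att_row, mask_row)]
        for att_row, mask_row in zip(attention, mask)
    ]
```

With 5 frames and a block size of 2, the mask keeps two 2x2 blocks and one 1x1 block on the diagonal, so each frame only retains attention values to frames in its own block.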

File format:

The “pretrained_models.zip” file provided on this Zenodo page (listed under Files as icmr22_zenodo_pretrained_models.zip) contains a set of pretrained models of the CA-SUM network architecture. After downloading and unpacking this file, the created “pretrained_models” folder contains two sub-directories, one for each of the benchmarking datasets used in our experimental evaluations (SumMe and TVSum). Within each of these sub-directories we provide the pretrained model (.pt file) for each data-split (split0-split4). The name of each .pt file indicates the training epoch and the value of the length regularization factor of the selected pretrained model.
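As a quick way to inspect the unpacked archive, the following sketch groups the .pt files by top-level sub-directory. It assumes only what is stated above: a “pretrained_models” folder whose sub-directories hold .pt files. The function name `list_pretrained_models` is hypothetical, the exact sub-directory names and nesting are not assumed (the search is recursive), and the file names are not parsed because their exact pattern is not given here.

```python
from pathlib import Path


def list_pretrained_models(root):
    """Map each top-level sub-directory of `root` to the sorted list of
    .pt files found anywhere beneath it (paths relative to that sub-directory).
    """
    root = Path(root)
    if not root.is_dir():
        raise FileNotFoundError(f"not a directory: {root}")
    models = {}
    for dataset_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        # rglob is used because the exact nesting below each dataset
        # folder is an unknown here.
        models[dataset_dir.name] = sorted(
            p.relative_to(dataset_dir) for p in dataset_dir.rglob("*.pt")
        )
    return models


# Example usage (the path is where you unpacked the archive):
# for dataset, files in list_pretrained_models("pretrained_models").items():
#     print(dataset, len(files), "model file(s)")
```

If the archive matches the description, the result should contain two keys (one per dataset) with five model files each.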

The models were trained in full-batch mode (i.e., the batch size is equal to the number of training samples) and were automatically selected after the end of the training process, based on a methodology that relies on transductive inference (described in Section 4.2 of [1]). Finally, the data-splits we used for performing inference with the provided pretrained models, and the source code that can be used for training your own models of the proposed CA-SUM network architecture, can be found at: https://github.com/e-apostolidis/CA-SUM.

License and Citation:

These resources are provided for academic, non-commercial use only. If you find these resources useful in your work, please cite the following publication where they are introduced:

[1] E. Apostolidis, G. Balaouras, V. Mezaris, and I. Patras, “Summarizing Videos using Concentrated Attention and Considering the Uniqueness and Diversity of the Video Frames”, Proc. of the 2022 Int. Conf. on Multimedia Retrieval (ICMR ’22), June 2022, Newark, NJ, USA. https://doi.org/10.1145/3512527.3531404. Software available at: https://github.com/e-apostolidis/CA-SUM

Files

Files (193.9 MB)

Name	Size	MD5
icmr22_zenodo_pretrained_models.zip	193.9 MB	38c1a26a639f39843d4447246c666991
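After downloading, the archive can be checked against the MD5 checksum listed above. The sketch below uses only Python's standard `hashlib` module; the function names `md5_of_file` and `archive_is_intact` are hypothetical helpers, and the local file path is whatever you saved the download as.

```python
import hashlib

# MD5 checksum as listed on this page for icmr22_zenodo_pretrained_models.zip
EXPECTED_MD5 = "38c1a26a639f39843d4447246c666991"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks so the
    whole archive does not need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_is_intact(path, expected=EXPECTED_MD5):
    """Return True when the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()


# Example usage:
# print(archive_is_intact("icmr22_zenodo_pretrained_models.zip"))
```

A mismatch usually means the download was truncated or corrupted and should be repeated.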

Additional details

Funding:

European Commission: MIRROR - Migration-Related Risks caused by misconceptions of Opportunities and Requirement (grant 832921)
European Commission: AI4Media - A European Excellence Centre for Media, Society and Democracy (grant 951911)

Statistic	All versions	This version
Views	933	929
Downloads	372	372
Data volume	84.3 GB	84.3 GB
