Masked Feature Modelling for the unsupervised pre-training of a Graph Attention Network block for bottom-up video event recognition

Daskalakis, Dimitrios; Gkalelis, Nikolaos; Mezaris, Vasileios

doi:10.1109/ISM59092.2023.00047

Published December 31, 2023 | Version v1

Conference paper | Open access


Affiliation: CERTH-ITI

In this paper, we introduce Masked Feature Modelling (MFM), a novel approach for the unsupervised pre-training of a Graph Attention Network (GAT) block. MFM uses a pretrained Visual Tokenizer to reconstruct the masked features of objects within a video, leveraging the MiniKinetics dataset for pre-training. We then incorporate the pre-trained GAT block into ViGAT, a state-of-the-art bottom-up supervised video event recognition architecture, to give the model a better starting point and improve its overall accuracy. Experimental evaluations on the YLI-MED dataset demonstrate the effectiveness of MFM in improving event recognition performance.
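The abstract does not specify the tokenizer, the masking ratio, or the GAT configuration, so the following NumPy snippet is only a rough sketch of the general masked-feature-modelling idea, applied to the object features of a single frame. Everything in it is an assumption made for illustration: the "visual tokenizer" is a hypothetical frozen nearest-codebook lookup (not the tokenizer used in the paper), the GAT block is a single standard graph attention layer over a fully connected object graph, the mask token is a zero vector, and the 50% masking ratio, shapes, and helper names are invented. A cross-entropy loss is computed only on the masked objects, whose targets are the tokens of their original (unmasked) features.

```python
import numpy as np


def visual_tokenizer(feats, codebook):
    """Hypothetical frozen tokenizer: index of the nearest codebook vector."""
    # Squared Euclidean distance between every feature and every code.
    d = ((feats[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return d.argmin(axis=1)


def attention_weights(z, a, slope=0.2):
    """GAT-style attention alpha_ij = softmax_j(LeakyReLU(a^T [z_i || z_j]))."""
    f = z.shape[1]
    # a^T [z_i || z_j] splits into a source term and a target term.
    e = (z @ a[:f])[:, None] + (z @ a[f:])[None, :]
    e = np.where(e > 0, e, slope * e)            # LeakyReLU
    e = np.exp(e - e.max(axis=1, keepdims=True))  # numerically stable softmax
    return e / e.sum(axis=1, keepdims=True)


def gat_layer(h, W, a):
    """Single-head graph attention over a fully connected graph of objects."""
    z = h @ W                                    # (N, F) projected features
    out = attention_weights(z, a) @ z            # attention-weighted aggregation
    return np.where(out > 0, out, np.expm1(np.minimum(out, 0.0)))  # ELU


def mfm_loss(obj_feats, mask_ratio, codebook, W, a, W_out, mask_token, rng):
    """Cross-entropy of predicted tokens, evaluated on masked objects only."""
    n = obj_feats.shape[0]
    targets = visual_tokenizer(obj_feats, codebook)  # tokens of the unmasked input
    n_mask = max(1, int(round(mask_ratio * n)))
    masked = rng.choice(n, size=n_mask, replace=False)
    h = obj_feats.copy()
    h[masked] = mask_token                       # hide the selected objects
    logits = gat_layer(h, W, a) @ W_out          # (N, K) token logits
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[masked, targets[masked]].mean()


# Toy usage with random data: N objects, D-dim features, F hidden units, K codes.
rng = np.random.default_rng(0)
N, D, F, K = 10, 16, 8, 32
obj_feats = rng.normal(size=(N, D))
codebook = rng.normal(size=(K, D))
W = rng.normal(scale=0.1, size=(D, F))
a = rng.normal(scale=0.1, size=2 * F)
W_out = rng.normal(scale=0.1, size=(F, K))
mask_token = np.zeros(D)

loss = mfm_loss(obj_feats, 0.5, codebook, W, a, W_out, mask_token, rng)
print(f"MFM loss on masked objects: {loss:.3f}")
```

In an actual pre-training run this loss would be minimised over many videos, and only the learned GAT weights (here `W` and `a`) would then be carried over to initialise the corresponding block of the downstream supervised model; the prediction head `W_out` exists only for the pretext task.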

Files

203_ism2023_preprint.pdf (678.7 kB; md5:75713dd0c320c517a59531f3e2ef3f33)

Additional details

Funding: European Commission
CRiTERIA - Comprehensive data-driven Risk and Threat Assessment Methods for the Early and Reliable Identification, Validation and Analysis of migration-related risks (101021866)


DOI: 10.1109/ISM59092.2023.00047

Resource type: Conference paper

Publisher: Zenodo

Conference: Proc. 25th IEEE Int. Symp. on Multimedia (ISM 2023)

License: Creative Commons Attribution 4.0 International

The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited.

Technical metadata

Created: August 1, 2024
Modified: August 1, 2024
