Published December 31, 2023 | Version v1
Conference paper | Open

Masked Feature Modelling for the unsupervised pre-training of a Graph Attention Network block for bottom-up video event recognition

Description

In this paper, we introduce Masked Feature Modelling (MFM), a novel approach for the unsupervised pre-training of a Graph Attention Network (GAT) block. MFM uses a pre-trained Visual Tokenizer to reconstruct masked features of objects within a video and is trained on the MiniKinetics dataset. We then incorporate the pre-trained GAT block into ViGAT, a state-of-the-art bottom-up supervised video event recognition architecture, to improve the model’s starting point and overall accuracy. Experimental evaluations on the YLI-MED dataset demonstrate the effectiveness of MFM in improving event recognition performance.
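
The masking-and-reconstruction idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a BEiT-style setup in which the tokenizer supplies one discrete target id per object, and it substitutes a weight-free dot-product attention pass for the GAT block. All names (`mask_objects`, `attention_block`, `token_logits`, `masked_token_loss`, `codebook`) and all sizes are hypothetical, and the features are random stand-ins for real object features.

```python
import math
import random


def mask_objects(features, mask_ratio, mask_vector, rng):
    """Replace a random subset of object feature vectors with mask_vector."""
    n = len(features)
    k = max(1, round(mask_ratio * n))
    masked_idx = sorted(rng.sample(range(n), k))
    masked = [list(f) for f in features]
    for i in masked_idx:
        masked[i] = list(mask_vector)
    return masked, masked_idx


def attention_block(x):
    """Stand-in for the GAT block (assumption): one dot-product
    self-attention pass over a fully connected object graph, no weights."""
    d = len(x[0])
    out = []
    for q in x:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        m = max(scores)
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        out.append([sum(w[j] / z * x[j][c] for j in range(len(x)))
                    for c in range(d)])
    return out


def token_logits(x, codebook):
    """Score every object against every (hypothetical) codebook entry."""
    return [[sum(a * b for a, b in zip(row, code)) for code in codebook]
            for row in x]


def masked_token_loss(logits, target_ids, masked_idx):
    """Mean cross-entropy, computed over the masked positions only."""
    total = 0.0
    for i in masked_idx:
        row = logits[i]
        m = max(row)
        log_z = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_z - row[target_ids[i]]
    return total / len(masked_idx)


rng = random.Random(0)
num_objects, dim, vocab = 6, 4, 5  # illustrative sizes, not from the paper
features = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_objects)]
codebook = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(vocab)]
# Assumed tokenizer output: one discrete target id per object.
target_ids = [rng.randrange(vocab) for _ in range(num_objects)]

masked, masked_idx = mask_objects(features, 0.5, [0.0] * dim, rng)
logits = token_logits(attention_block(masked), codebook)
loss = masked_token_loss(logits, target_ids, masked_idx)
```

In an actual pre-training loop the attention block would carry learnable parameters and `loss` would be minimised by gradient descent; the sketch only shows how the loss is restricted to masked objects so that the block must infer them from their unmasked neighbours.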

Files

203_ism2023_preprint.pdf

Files (678.7 kB)

Name: 203_ism2023_preprint.pdf
md5: 75713dd0c320c517a59531f3e2ef3f33
Size: 678.7 kB

Additional details

Funding

European Commission
CRiTERIA – Comprehensive data-driven Risk and Threat Assessment Methods for the Early and Reliable Identification, Validation and Analysis of migration-related risks (Grant No. 101021866)