There is a newer version of the record available.

Published July 12, 2022 | Version v1
Project deliverable Open

MARVEL - D3.1: Multimodal and privacy-aware audio-visual intelligence – initial version

  • 1. AU


This document describes the initial version of the methodologies proposed by MARVEL partners towards the realisation of the Audio, Visual and Multimodal AI Subsystem of the MARVEL architecture. These include methods for Sound Event Detection, Sound Event Localisation and Detection, Automated Audio Captioning, Visual Anomaly Detection, Visual Crowd Counting, Audio-Visual Crowd Counting, as well as methodologies for improving the training and efficiency of AI models under supervised, unsupervised, and cross-modal contrastive learning settings. The effectiveness of these methods is compared against recent baselines, towards achieving the AI methodology-related objectives of the MARVEL project.




Additional details


MARVEL – Multimodal Extreme Scale Data Analytics for Smart Cities Environments (grant no. 957337)
European Commission