Published July 14, 2023 | Version v1
Project deliverable Open

MARVEL D3.5 - Multimodal and privacy-aware audio-visual intelligence – final version

  • 1. AU


This document describes methodologies proposed by MARVEL partners during the second reporting period of the project towards the realisation of the Audio, Visual and Multimodal AI Subsystem of the MARVEL architecture. These methodologies complement the methodologies proposed by MARVEL partners during the first reporting period, and include methods for Automated Audio Captioning, Visual Crowd Counting, Visual Anomaly Detection, Audio-Visual Anomaly Detection, Audio-Visual Event Detection, privacy-preserving Audio-Visual Emotion Recognition, as well as methodologies for improving the training of dense regression models for efficient inference on standard and Gigapixel images, and on heavily compressed images. The effectiveness of these methods is compared against recent baselines, towards achieving the AI methodology-related objectives of the MARVEL project.




Additional details


MARVEL – Multimodal Extreme Scale Data Analytics for Smart Cities Environments (grant agreement No. 957337)
European Commission