MARVEL D3.5 - Multimodal and privacy-aware audio-visual intelligence – final version
Description
This document describes methodologies proposed by MARVEL partners during the second reporting period of the project towards the realisation of the Audio, Visual and Multimodal AI Subsystem of the MARVEL architecture. These methodologies complement the methodologies proposed by MARVEL partners during the first reporting period, and include methods for Automated Audio Captioning, Visual Crowd Counting, Visual Anomaly Detection, Audio-Visual Anomaly Detection, Audio-Visual Event Detection, privacy-preserving Audio-Visual Emotion Recognition, as well as methodologies for improving the training of dense regression models for efficient inference on standard and Gigapixel images, and on heavily compressed images. The effectiveness of these methods is compared against recent baselines, towards achieving the AI methodology-related objectives of the MARVEL project.
Files
Name | Size |
---|---|
MARVEL-d3.5.pdf (md5:61ca50b0b549fbd11b3446352e9ffdc3) | 25.9 MB |