MARVEL - D3.1: Multimodal and privacy-aware audio-visual intelligence – initial version
Description
This document describes the initial version of the methodologies proposed by MARVEL partners towards the realisation of the Audio, Visual and Multimodal AI Subsystem of the MARVEL architecture. These include methods for Sound Event Detection, Sound Event Localisation and Detection, Automated Audio Captioning, Visual Anomaly Detection, Visual Crowd Counting, Audio-Visual Crowd Counting, as well as methodologies for improving the training and efficiency of AI models under supervised, unsupervised, and cross-modal contrastive learning settings. The effectiveness of these methods is compared against recent baselines, towards achieving the AI methodology-related objectives of the MARVEL project.
Files
Name | Size | Checksum |
---|---|---|
MARVEL-d3.1.pdf | 9.1 MB | md5:d8880d78a7afc4f30cee300324f8bfbb |