Published July 14, 2023 | Version v1
Project deliverable Open

MARVEL D3.5 - Multimodal and privacy-aware audio-visual intelligence – final version

  • 1. AU

Description

This document describes methodologies proposed by MARVEL partners during the second reporting period of the project towards the realisation of the Audio, Visual and Multimodal AI Subsystem of the MARVEL architecture. These methodologies complement the methodologies proposed by MARVEL partners during the first reporting period, and include methods for Automated Audio Captioning, Visual Crowd Counting, Visual Anomaly Detection, Audio-Visual Anomaly Detection, Audio-Visual Event Detection, privacy-preserving Audio-Visual Emotion Recognition, as well as methodologies for improving the training of dense regression models for efficient inference on standard and Gigapixel images, and on heavily compressed images. The effectiveness of these methods is compared against recent baselines, towards achieving the AI methodology-related objectives of the MARVEL project.

Files

MARVEL-d3.5.pdf (25.9 MB)
md5:61ca50b0b549fbd11b3446352e9ffdc3

Additional details

Funding

MARVEL – Multimodal Extreme Scale Data Analytics for Smart Cities Environments (grant 957337)
European Commission