MARVEL D3.5 - Multimodal and privacy-aware audio-visual intelligence – final version

Alexandros Iosifidis

doi:10.5281/zenodo.8147164

Published July 14, 2023 | Version v1

Project deliverable Open

Alexandros Iosifidis¹

1. AU

This document describes methodologies proposed by MARVEL partners during the second reporting period of the project towards the realisation of the Audio, Visual and Multimodal AI Subsystem of the MARVEL architecture. These methodologies complement the methodologies proposed by MARVEL partners during the first reporting period, and include methods for Automated Audio Captioning, Visual Crowd Counting, Visual Anomaly Detection, Audio-Visual Anomaly Detection, Audio-Visual Event Detection, privacy-preserving Audio-Visual Emotion Recognition, as well as methodologies for improving the training of dense regression models for efficient inference on standard and Gigapixel images, and on heavily compressed images. The effectiveness of these methods is compared against recent baselines, towards achieving the AI methodology-related objectives of the MARVEL project.

Files

Files (25.9 MB)

Name	Size
MARVEL-d3.5.pdf md5:61ca50b0b549fbd11b3446352e9ffdc3	25.9 MB

Additional details

European Commission
MARVEL - Multimodal Extreme Scale Data Analytics for Smart Cities Environments 957337
