Published October 21, 2019 | Version v1
Conference paper · Open Access

Multimodal Fusion of Appearance Features, Optical Flow and Accelerometer Data for Speech Detection

Description

In this paper we examine the task of automatically detecting speech without microphones, using an overhead camera and wearable accelerometers. For this purpose, we propose extracting hand-crafted appearance and optical-flow features from the video modality, and time-domain features from the accelerometer data. We evaluate the performance of the individual modalities on a large dataset of over 25 hours of standing conversation between multiple individuals. Finally, we show that applying a multimodal late-fusion technique leads to a performance boost in most cases.
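Late fusion of this kind typically combines the per-modality classifier outputs at the score level. As a minimal sketch (the paper's actual fusion rule, classifiers, and weights are not specified in this abstract; the function name, inputs, and equal weights below are illustrative assumptions), a weighted average of speech posteriors from the three streams could look like:

```python
import numpy as np

def late_fusion(prob_appearance, prob_flow, prob_accel,
                weights=(1 / 3, 1 / 3, 1 / 3), threshold=0.5):
    """Score-level late fusion of three modality classifiers.

    Each input is an array of shape (n_frames,) holding that modality's
    posterior probability that the subject is speaking. The weights and
    threshold are hypothetical placeholders, not values from the paper.
    Returns the binary speech decision and the fused score per frame.
    """
    probs = np.stack([prob_appearance, prob_flow, prob_accel])  # (3, n_frames)
    w = np.asarray(weights, dtype=float)[:, None]               # (3, 1)
    fused = (w * probs).sum(axis=0) / w.sum()                   # weighted mean
    return fused >= threshold, fused
```

For example, frame-level scores of 0.9 (appearance), 0.2 (optical flow), and 0.7 (accelerometer) fuse to 0.6 under equal weights, yielding a "speaking" decision; the fused score can also smooth over frames where a single modality fails (e.g., visual occlusion).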

Files

giannakeris2019noaudio.pdf (459.8 kB)
md5:8b918e5baaa6f73dcdb1a386ece9eb21

Additional details

Funding

SUITCEYES – Smart, User-friendly, Interactive, Tactual, Cognition-Enhancer that Yields Extended Sensosphere - Appropriating sensor technologies, machine learning, gamification and smart haptic interfaces 780814
European Commission