Published October 21, 2019 | Version v1
Conference paper · Open Access

Multimodal Fusion of Appearance Features, Optical Flow and Accelerometer Data for Speech Detection


In this paper we examine the task of automatic speech detection without microphones, using an overhead camera and wearable accelerometers. For this purpose, we propose extracting hand-crafted appearance and optical flow features from the video modality, and time-domain features from the accelerometer data. We evaluate the performance of the separate modalities on a large dataset of over 25 hours of standing conversations among multiple individuals. Finally, we show that applying a multimodal late fusion technique leads to a performance boost in most cases.
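The late fusion described above can be illustrated with a minimal sketch. This assumes score-level fusion by weighted averaging of per-modality speech probabilities, which is one common form of late fusion; the function name, weights, and example scores are hypothetical and not taken from the paper.

```python
import numpy as np

def late_fusion(scores, weights=None):
    """Fuse per-modality speech probabilities with a weighted average.

    scores  : one probability per modality, e.g. [appearance, flow, accel]
    weights : optional per-modality weights; defaults to a uniform average
    """
    scores = np.asarray(scores, dtype=float)
    if weights is None:
        weights = np.ones(len(scores))
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalize so weights sum to 1
    return float(np.dot(weights, scores))

# Hypothetical per-frame probabilities from the three modalities:
fused = late_fusion([0.8, 0.6, 0.9])
is_speaking = fused >= 0.5  # threshold the fused score
```

In practice the weights would be tuned on a validation set so that more reliable modalities contribute more to the final decision.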




Additional details


Funding: European Commission — SUITCEYES (Smart, User-friendly, Interactive, Tactual, Cognition-Enhancer that Yields Extended Sensosphere: Appropriating sensor technologies, machine learning, gamification and smart haptic interfaces), grant agreement 780814.