Published October 21, 2025 | Version v4
Dataset Open

CAVEMOVE: An acoustic database for the study of voice-enabled technologies inside moving vehicles

Description

We provide a collection of multichannel audio recordings obtained inside four different cars. The recording process involves (i) recordings of acoustic impulse responses, which are acquired under static conditions and provide the means for modeling the speech and car-audio components, and (ii) recordings of acoustic noise over a wide range of both static and in-motion conditions. Data is recorded with two different microphone configurations: (i) a compact microphone array and (ii) a distributed microphone setup. A Python API and a Matlab API, both freely available from the CAVEMOVE GitHub repository, make it easy to use the open-access audio recordings to synthesize mixtures of speech and acoustic noise under different driving conditions. This way, the user can synthesize the microphone signals required for research on voice-enabled technologies inside moving vehicles.
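As a rough illustration of how an impulse response models the speech component, the sketch below convolves a mono dry speech signal with a multichannel impulse response using plain numpy. The function name `spatialize` and the array layout (samples along the first axis, channels along the second) are assumptions made for this example; they are not taken from the CAVEMOVE APIs, which provide their own functions for this step.

```python
import numpy as np


def spatialize(dry_speech, impulse_responses):
    """Convolve mono dry speech with a multichannel impulse response.

    dry_speech:        1-D array, shape (n_samples,)
    impulse_responses: 2-D array, shape (ir_length, n_channels)
    Returns an array of shape (n_samples + ir_length - 1, n_channels).

    Hypothetical helper for illustration only, not part of the CAVEMOVE API.
    """
    dry_speech = np.asarray(dry_speech, dtype=np.float64)
    impulse_responses = np.asarray(impulse_responses, dtype=np.float64)
    if dry_speech.ndim != 1:
        raise ValueError("dry_speech must be a 1-D (mono) signal")
    if impulse_responses.ndim != 2:
        raise ValueError("impulse_responses must have shape (ir_length, n_channels)")

    # One full linear convolution per microphone channel.
    channels = [
        np.convolve(dry_speech, impulse_responses[:, ch])
        for ch in range(impulse_responses.shape[1])
    ]
    return np.stack(channels, axis=1)
```

The same operation applies to the car-audio component, with the user's own audio signal in place of the dry speech. For long signals, an FFT-based convolution would be faster than `np.convolve`.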

All audio recordings and impulse responses are provided at 16 kHz sampling rate and are in the form of 8-channel .wav files.
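A quick way to confirm that a downloaded file matches the stated format is to read its header. The sketch below uses Python's standard `wave` module; the helper name `check_wav_format` is an assumption for this example. Note that the `wave` module only reads PCM-encoded files, and the sample encoding of the dataset is not stated here, so a different reader may be needed if the files use another encoding.

```python
import wave

EXPECTED_RATE_HZ = 16000  # sampling rate stated in the dataset description
EXPECTED_CHANNELS = 8     # channel count stated in the dataset description


def check_wav_format(path):
    """Return (sample_rate, n_channels), raising ValueError on a mismatch.

    Hypothetical helper for illustration only, not part of the CAVEMOVE API.
    """
    with wave.open(path, "rb") as wav_file:
        rate = wav_file.getframerate()
        channels = wav_file.getnchannels()
    if rate != EXPECTED_RATE_HZ or channels != EXPECTED_CHANNELS:
        raise ValueError(
            f"{path}: expected {EXPECTED_RATE_HZ} Hz / {EXPECTED_CHANNELS} channels, "
            f"got {rate} Hz / {channels} channels"
        )
    return rate, channels
```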

Some basic principles followed in the CAVEMOVE APIs are the following:
- We provide functions for retrieving the speech and noise components as separate entities (e.g. numpy arrays). Users must then add the speech and noise components to derive a mixture.
- Noise recordings are derived as a function of the driving conditions, specifically the speed (in km/h) and the window aperture (3 or 4 different window conditions are considered in each vehicle).
- Apart from the basic noise components, we also provide means for adding ventilation/air-conditioning noise as well as interference from the built-in car audio system (e.g. radio, CD player).
- To produce the speech components, users must provide their own dry speech recordings.
- To produce the car-audio components, users must provide their own audio signals.
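Since the speech and noise components are returned separately, the final mixing step is left to the user. The sketch below shows one possible way to do it, assuming both components are numpy arrays of shape (n_samples, n_channels). The function name `mix_at_snr` is hypothetical, and setting a target signal-to-noise ratio is a choice made for this example: the description only states that the components must be added. Here the speech is rescaled and the noise is left as recorded, so that the noise level of the driving condition is preserved.

```python
import numpy as np


def mix_at_snr(speech, noise, snr_db):
    """Add speech and noise after scaling the speech to a target SNR in dB.

    speech, noise: arrays of shape (n_samples, n_channels)
    Both are trimmed to the shorter length. Power is averaged over all
    samples and channels, so inter-channel level differences are kept.

    Hypothetical helper for illustration only, not part of the CAVEMOVE API.
    """
    speech = np.asarray(speech, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)
    n = min(len(speech), len(noise))
    speech, noise = speech[:n], noise[:n]

    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    if speech_power == 0 or noise_power == 0:
        raise ValueError("speech and noise must both be non-silent")

    # Gain that makes (gain^2 * speech_power) / noise_power equal 10^(snr_db/10).
    gain = np.sqrt(noise_power * 10 ** (snr_db / 10) / speech_power)
    return gain * speech + noise
```

If no particular SNR is needed, the mixture is simply `speech[:n] + noise[:n]` after trimming both arrays to a common length `n`.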

The Documentation .pdf file that we provide along with the audio recordings lists all the conditions that were recorded or measured inside the cars (the same documentation can also be found in the CAVEMOVE GitHub repository). This information is important for correct use of the Python API, since asking for a condition that was not recorded can produce an error. Note that the list of recorded driving conditions for a given car name and microphone configuration can also be retrieved with auxiliary functions included in the APIs.
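One way to guard against requesting an unrecorded condition is to check the request against the list of recorded conditions first. The sketch below illustrates the idea only: the table `RECORDED_CONDITIONS`, its keys and values, and the function `require_recorded` are all made up for this example. In practice the list would come from the Documentation .pdf or from the auxiliary API functions, whose names and signatures are not given here.

```python
# Placeholder table, NOT the real recorded conditions. In practice, fill it
# from the Documentation .pdf or from the auxiliary functions of the API.
RECORDED_CONDITIONS = {
    ("car_a", "array"): {(0, "closed"), (50, "closed"), (50, "open")},
}


def require_recorded(car, mic_config, speed_kmh, window):
    """Raise a descriptive error if the requested condition is not listed.

    Hypothetical helper for illustration only, not part of the CAVEMOVE API.
    """
    available = RECORDED_CONDITIONS.get((car, mic_config))
    if available is None:
        raise KeyError(f"no conditions listed for car={car!r}, mic_config={mic_config!r}")
    if (speed_kmh, window) not in available:
        raise ValueError(
            f"condition (speed={speed_kmh} km/h, window={window!r}) was not recorded; "
            f"available: {sorted(available)}"
        )
    return speed_kmh, window
```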

For any questions regarding the CAVEMOVE dataset or API, feel free to send an email to
Andreas Symiakakis at andrysmi@ics.forth.gr
or
Nikos Stefanakis at nstefana@ics.forth.gr

The CAVEMOVE project is funded by the Institute of Computer Science of the Foundation for Research and Technology-Hellas (FORTH).

Files

cavemove_dataset.zip

Files (7.6 GB)

md5:dec31a69f02a304a2b55b6b4db665c5e (7.6 GB)
md5:a76e68b81c8ea7c042b89c111f157040 (14.9 MB)

Additional details

Software

Repository URL
https://github.com/SPL-FORTH-ICS/CAVEMOVE
Programming language
Python