DEAR Dataset
Contributors
Contact person:
Description
The DEAR benchmark is generated by adding speech signals to background sound scenes, which gives full control over the acoustic properties of the final mixture. The background recordings were selected from the HOA-SSR dataset sound scene library (Force Technology, Denmark), a curated collection of 150 audiovisual scenes captured with specialized equipment and designed for comprehensive evaluations in audio product development. In particular, we use the 4th-order ambisonics audio, which was recorded with an Eigenmike em32 and encoded in 25-channel AmbiX format at 48 kHz with a bit depth of 24 bits. The scene categories were selected to capture typical everyday situations. The speech signals are proprietary anechoic monologues recorded with lavalier microphones. They span different vocal effort levels, which were elicited by playing pink noise through headphones at different levels. The anechoic speech signals are then convolved with a set of impulse responses and added to the background scenes to produce sound mixtures with different combinations of speakers, positions, reverberation, and SNRs. Throughout the process, care was taken to keep the generated sound scenes consistent overall.
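The mixing procedure described above can be illustrated with a minimal sketch. It is not the DEAR generation code: the function names `ambisonic_channels` and `mix_at_snr` are hypothetical, the signals are treated as mono for brevity, and the SNR is assumed to be the ratio of mean speech power to mean background power over the whole clip. The first function only checks that a 4th-order ambisonic signal has (4 + 1)² = 25 channels.

```python
import numpy as np


def ambisonic_channels(order: int) -> int:
    """Channel count of a full-sphere ambisonic signal of the given order."""
    return (order + 1) ** 2


def mix_at_snr(speech, impulse_response, background, snr_db):
    """Reverberate anechoic speech and add it to a background at a target SNR.

    All inputs are 1-D arrays at the same sampling rate (mono simplification,
    an assumption of this sketch). Returns the mixture and the scaled speech.
    """
    # Apply the room: convolve the anechoic speech with the impulse response.
    reverberant = np.convolve(speech, impulse_response)

    # Match the background length by trimming or zero-padding.
    reverberant = reverberant[: len(background)]
    reverberant = np.pad(reverberant, (0, len(background) - len(reverberant)))

    # Scale the speech so that speech power / background power hits the target.
    speech_power = np.mean(reverberant ** 2)
    background_power = np.mean(background ** 2)
    if speech_power == 0:
        raise ValueError("speech signal is silent; SNR is undefined")
    gain = np.sqrt(background_power / speech_power * 10 ** (snr_db / 10))

    scaled_speech = gain * reverberant
    return background + scaled_speech, scaled_speech


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fs = 48_000                                   # sampling rate from the description
    speech = rng.standard_normal(fs)              # stand-in for an anechoic monologue
    ir = np.exp(-np.arange(2400) / 400.0)         # toy decaying impulse response
    background = 0.1 * rng.standard_normal(fs)    # stand-in for one scene channel

    mixture, scaled = mix_at_snr(speech, ir, background, snr_db=5.0)
    achieved = 10 * np.log10(np.mean(scaled ** 2) / np.mean(background ** 2))
    print(ambisonic_channels(4), round(achieved, 2))
```

Keeping the scaled speech as a separate return value makes it possible to store the clean reference alongside the mixture. A multichannel version would need spatial impulse responses matching the ambisonic format of the backgrounds.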
Files
Additional details
Software
- Repository URL: https://dear-dataset.github.io
- Development Status: Active