DEAR Dataset
Contributors
Contact person:
Description
The DEAR benchmark is generated by adding speech signals to background sound scenes to ensure full control over the acoustic properties of the final mixture. The background recordings were selected from the HOA-SSR dataset sound scene library (Force Technology, Denmark),1 which is a curated collection of 150 audiovisual scenes captured using specialized equipment, designed for comprehensive evaluations in audio product development. In particular, we use the 4th order ambisonics audio, which was recorded using an Eigenmikeem32 and encoded in 25-channel AmbiX format at 48 kHz with a bit depth of 24. The category selection has the purpose of capturing typical everyday situations. The speech signals are proprietary anechoic monologues recorded with Lavalier microphones. They span different vocal effort levels, which are elicited by playing pink noise through headphones at different levels. The anechoic speech signals are then convolved with a set of impulse responses to produce sound mixtures with different combinations of speakers, positions, reverberation, and SNRs. Throughout the process, attention was paid to avoid violations of the overall consistency of the generated sound scenes.
Acknowledgments
We convey the acknowledgment to FORCE Technology, Bang & Olufsen, Demant, GN Store Nord, Sonova, WSA, and industrial partners who created the 360 audio-visual datasets under the HOA-SSR joint project, and to XRHub Team for the great help in dealing with technicalities and field recording.
Notes
Files
Additional details
Funding
- Innosuisse – Swiss Innovation Agency
- DARLING 100.327 IP-ICT
Software
- Repository URL
- https://dear-dataset.github.io
- Development Status
- Active