There is a newer version of the record available.

Published January 15, 2025 | Version v1
Dataset Restricted

DEAR Dataset

  • 1. Sonova (Switzerland)
  • 2. Lucerne University of Applied Sciences and Arts
  • 3. University of Basel

Description

The DEAR benchmark is generated by adding speech signals to background sound scenes, which gives full control over the acoustic properties of the final mixture. The background recordings were selected from the HOA-SSR dataset sound scene library (Force Technology, Denmark), a curated collection of 150 audiovisual scenes captured using specialized equipment and designed for comprehensive evaluations in audio product development. In particular, we use the fourth-order Ambisonics audio, which was recorded with an Eigenmike em32 and encoded in 25-channel AmbiX format at 48 kHz with a bit depth of 24. The scene categories were selected to capture typical everyday situations. The speech signals are proprietary anechoic monologues recorded with lavalier microphones. They span different vocal effort levels, which were elicited by playing pink noise to the talkers through headphones at different levels. The anechoic speech signals are then convolved with a set of impulse responses to produce sound mixtures with different combinations of speakers, positions, reverberation, and SNRs. Throughout the process, care was taken to preserve the overall consistency of the generated sound scenes.
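
As an illustration of the mixing procedure described above, the following Python sketch convolves a mono anechoic speech signal with a multichannel impulse response and adds it to a background scene at a target SNR. It is not the DEAR generation code: the function names are hypothetical, and the SNR is assumed here to be the ratio of mean power taken over all channels, which may differ from the definition used for the dataset. The first helper only restates the standard relation between Ambisonics order N and channel count, (N + 1)^2, which gives the 25 channels mentioned for fourth order.

```python
import numpy as np


def ambix_channel_count(order: int) -> int:
    """Number of channels in a full-sphere Ambisonics signal of the given order."""
    return (order + 1) ** 2


def spatialize(speech: np.ndarray, impulse_response: np.ndarray) -> np.ndarray:
    """Convolve mono speech (n,) with an impulse response (taps, channels).

    Returns an array of shape (n + taps - 1, channels).
    """
    channels = impulse_response.shape[1]
    return np.stack(
        [np.convolve(speech, impulse_response[:, c]) for c in range(channels)],
        axis=1,
    )


def mix_at_snr(speech: np.ndarray, background: np.ndarray, snr_db: float):
    """Scale the speech so that it sits snr_db above the background, then add.

    Assumption: SNR is computed from mean power over all samples and channels.
    Both inputs are truncated to the shorter of the two lengths.
    Returns the mixture and the gain that was applied to the speech.
    """
    n = min(len(speech), len(background))
    speech, background = speech[:n], background[:n]
    speech_power = np.mean(speech ** 2)
    background_power = np.mean(background ** 2)
    gain = np.sqrt(background_power / speech_power * 10 ** (snr_db / 10))
    return gain * speech + background, gain
```

With a speech array, an impulse response of shape (taps, 25), and a 25-channel background array, `mix_at_snr(spatialize(speech, ir), background, 5.0)` would return a 25-channel mixture at 5 dB SNR under the assumption stated above.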

Files

Restricted

The record is publicly accessible, but files are restricted to users with access.

Additional details

Software

Repository URL
https://dear-dataset.github.io
Development Status
Active