Published May 15, 2024 | Version 1.0.0
Dataset | Open Access

Hearing Anything Anywhere - the DIFFRIR dataset

  1. Stanford University
  2. Sony (Japan)
  3. University of Maryland, College Park

Description

Overview

This is the DIFFRIR dataset, released as part of "Hearing Anything Anywhere" (CVPR, 2024).

It contains monaural and binaural Room Impulse Responses (RIRs) and music recordings collected in four different rooms:

  1. A Classroom
  2. An acoustically Dampened Room
  3. A Hallway
  4. A Complex Room (shaped like a pentagonal prism with several irregular surfaces of varying materials and pillars)

In each room, we measure RIRs using both monaural and binaural microphones at several hundred precisely measured locations, and we record music playback from these locations as well.

We also record several alternate configurations in the Dampened Room, the Hallway, and the Complex Room, in which we rotate or translate the speaker used to measure the RIRs, or insert one or more whiteboard panels at different locations in the room. Each of these configurations likewise contains monaural and binaural RIR and music measurements.

Organization

There are 14 .zip files in this dataset, containing data from the 14 room configurations across the four rooms. Zip files with the suffix "Base" contain data from the base configuration of each room, while hallwayPanel1.zip, hallwayPanel2.zip, and hallwayPanel3.zip contain data from the three panel configurations in the Hallway.

Files

All data is stored as .npy files. All audio data is stored as numpy float64 arrays with a sampling rate of 48,000 Hz.
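As a minimal sketch of working with this format (the path below is hypothetical, assuming an extracted archive), an array can be loaded and inspected with NumPy; memory-mapping avoids reading a multi-gigabyte file into RAM at once:

    import numpy as np

    FS = 48000  # sampling rate in Hz, as stated above

    # Memory-map the monaural RIRs rather than loading them fully into RAM.
    rirs = np.load("classroomBase/RIRs.npy", mmap_mode="r")  # (N_mono, 671884), float64

    print(rirs.shape, rirs.dtype)
    print("RIR duration: %.2f s" % (rirs.shape[-1] / FS))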

All audio files are aligned so that the start of each recording coincides with the moment the speaker begins playing the music or the hypothetical impulse.

In the descriptions below, we specify the contents and shape of each file. N_mono is the number of monaural data points and N_binaural is the number of binaural data points recorded in the given room configuration.

Each .zip file contains the following files for the monaural data points:

  1. RIRs.npy - (N_mono, 671884) - monaural room impulse responses measured for the room's configuration.
  2. music.npy - (N_mono, N_songs, 623884) - monaural music recordings, measured at the same locations as RIRs.npy. N_songs is either 1 or 5.
  3. xyzs.npy - (N_mono, 3) - the xyz microphone locations, in meters, at which RIRs.npy and music.npy were recorded.
  4. music_dls.npy - (N_mono, N_songs, 624000) - music source signals for each of the monaural music data points. Each source is measured by recording a loopback signal, and is aligned such that convolving a source from music_dls.npy with the corresponding RIR in RIRs.npy approximates the corresponding music recording in music.npy (see the sketch after this list).
  5. mic_numbers.npy - (N_mono,) - an array of integers identifying the microphone used for each recording in RIRs.npy and music.npy.
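Because the sources and recordings share a common start time, this alignment can be sanity-checked by convolving a source with its RIR and comparing against the recording. A sketch under the same hypothetical-path assumption (the indices i and j are arbitrary):

    import numpy as np
    from scipy.signal import fftconvolve

    rirs = np.load("classroomBase/RIRs.npy", mmap_mode="r")            # (N_mono, 671884)
    music = np.load("classroomBase/music.npy", mmap_mode="r")          # (N_mono, N_songs, 623884)
    music_dls = np.load("classroomBase/music_dls.npy", mmap_mode="r")  # (N_mono, N_songs, 624000)

    i, j = 0, 0  # data point and song indices (arbitrary)
    estimate = fftconvolve(music_dls[i, j], rirs[i])  # full convolution
    estimate = estimate[: music.shape[-1]]            # trim to the recording length
    mse = np.mean((estimate - music[i, j]) ** 2)
    print("MSE between convolution estimate and recording:", mse)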

Each .zip file also contains the following files for the binaural data points:

  1. bin_RIRs.npy - (N_binaural, 2, 671884) - binaural room impulse responses measured for the room's configuration.
  2. bin_music.npy - (N_binaural, N_songs, 2, 623884) - binaural music recordings, measured at the locations given in bin_xyzs.npy. N_songs is 5.
  3. bin_xyzs.npy - (N_binaural, 3) - the xyz locations, in meters, at which each row of bin_RIRs.npy and bin_music.npy was recorded. Each location is the center of the binaural microphone. To get the location of the left microphone, add 5.5 cm to the x coordinate; to get the location of the right microphone, subtract 5.5 cm from the x coordinate (the binaural microphone faced the -y direction in all rooms and configurations; see the sketch after this list).
  4. bin_music_dls.npy - (N_binaural, N_songs, 624000) - music source signals for each binaural data point, measured via direct-line loopback.
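For example, the left and right capsule positions can be recovered from the stored centers as follows (the path is again hypothetical):

    import numpy as np

    bin_xyzs = np.load("classroomBase/bin_xyzs.npy")  # (N_binaural, 3), centers in meters

    offset = np.array([0.055, 0.0, 0.0])  # 5.5 cm along x
    left_xyzs = bin_xyzs + offset    # left capsule: +5.5 cm in x
    right_xyzs = bin_xyzs - offset   # right capsule: -5.5 cm in x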

Room Geometry and Speaker Locations

The geometric measurements of the surfaces in each room, including the speaker locations and the locations of all surfaces, are provided in the rooms/ folder of the GitHub repository (linked below under Software).

Microphone Calibrations

All monaural microphone recordings were performed using EMM6 measurement microphones. The recordings have already been adjusted to account for differences in microphone sensitivity, according to each microphone's sensitivity at 1000 Hz.

If you would like to perform more fine-grained frequency calibration, the EMM6 calibration files for each room are included in mic_calibrations.zip. This archive contains four subfolders, one per room. Each subfolder holds the microphone calibration .txt files, named {mic_id}_{mic_serial_number}.txt, where {mic_id} corresponds to the microphone ID number given in mic_numbers.npy for each monaural data point. For more information on microphone calibration files, refer to this website.
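As a sketch, the calibration file for each monaural recording can be looked up from mic_numbers.npy. The subfolder name "classroom" below is an assumption, since the per-room folder names are not listed here:

    import numpy as np
    from pathlib import Path

    mic_numbers = np.load("classroomBase/mic_numbers.npy")  # (N_mono,) integer mic IDs

    # Calibration files are named {mic_id}_{mic_serial_number}.txt; build a
    # lookup from mic ID to calibration file. The folder name is an assumption.
    cal_dir = Path("mic_calibrations/classroom")
    cal_files = {int(p.stem.split("_")[0]): p for p in cal_dir.glob("*.txt")}

    for i, mic_id in enumerate(mic_numbers[:5]):
        print(i, cal_files[int(mic_id)])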

Files

The dataset comprises 15 downloadable files, 50.2 GB in total: the 14 room-configuration .zip archives (e.g., classroomBase.zip) and mic_calibrations.zip.

Additional details

Related works

Cites
Data paper: 10.48550/arXiv.2311.03517 (DOI)
Data paper: 10.48550/arXiv.1612.01840 (DOI)

Funding

  • CCRI: ENS: Activity-Centric Interactive Environments for Embodied AI (award 2120095), National Science Foundation
  • Collaborative Research: RI: Medium: Learning Compositional Implicit Representations for 3D Scene Understanding (award 2211258), National Science Foundation

Software

Repository URL
https://github.com/maswang32/hearinganythinganywhere
Programming language
Python
Development Status
Active