Hearing Anything Anywhere - the DIFFRIR dataset
Description
Overview
This is the DIFFRIR dataset, released as part of "Hearing Anything Anywhere" (CVPR, 2024).
It contains monoaural and binaural Room Impulse Responses (RIRs) and music recordings from four different rooms:
- A Classroom
- An acoustically Dampened Room
- A Hallway
- A Complex Room (shaped like a pentagonal prism with several irregular surfaces of varying materials and pillars)
In each room, we measure RIRs using both monoaural and binaural microphones at several hundred precisely measured locations. We also record music at these locations.
We also record several additional configurations in the Dampened Room, the Hallway, and the Complex Room, in which we rotate or translate the speaker used to measure the RIR, or insert one or more whiteboard panels at different locations in the room. Each of these configurations also contains measurements of monoaural and binaural RIRs and music.
Organization
There are 14 .zip files in this dataset, containing data from the 14 room configurations across the four rooms. Zip files with the suffix "Base" (for example, classroomBase.zip) contain data from the base configuration, and hallwayPanel1.zip, hallwayPanel2.zip, and hallwayPanel3.zip contain data from each of the three panel configurations in the hallway.
Files
All data is stored as .npy files. All audio data is stored as NumPy float64 arrays with a sampling rate of 48,000 Hz.
All audio files are aligned such that the time the recording starts is equal to the time that the speaker begins playing the music or hypothetical impulse.
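As a minimal sketch of working with these arrays, the snippet below memory-maps one array and converts a sample count to seconds at the stated 48,000 Hz rate. The helper names `load_audio` and `duration_seconds` are illustrative and are not part of the dataset's own code; memory-mapping is a suggestion for handling the multi-gigabyte files, not a requirement.

```python
import numpy as np

SAMPLE_RATE = 48_000  # Hz, as stated in the dataset description


def load_audio(path):
    """Memory-map a .npy array so a large file is not read into RAM at once."""
    return np.load(path, mmap_mode="r")


def duration_seconds(audio, sample_rate=SAMPLE_RATE):
    """Length in seconds of the last (time) axis of an audio array."""
    return audio.shape[-1] / sample_rate
```

For example, after extracting a zip file, `duration_seconds(load_audio("RIRs.npy"))` gives the length of each RIR in seconds.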
In the descriptions below, we specify the contents and shape of each file. N_mono (written N_mono_data in some shapes) is the number of monoaural data points, and N_binaural (written N_binaural_data) is the number of binaural data points recorded in the room configuration.
Each .zip file contains 5 files of monoaural data:
- RIRs.npy - (N_mono_data, 671884) - monoaural room impulse responses measured for the room's configuration.
- music.npy - (N_mono_data, N_songs, 623884) - monoaural music recordings, measured at the same locations as RIRs.npy. N_songs is either 1 or 5.
- xyzs.npy - (N_mono_data, 3) - the xyz microphone locations in meters at which RIRs.npy and music.npy were recorded.
- music_dls.npy - (N_mono_data, N_songs, 624000) - music source files for each of the monoaural music data points. The source is measured by recording a loopback signal, and is aligned such that convolving a source from music_dls.npy with the corresponding RIR in RIRs.npy estimates the corresponding music recording in music.npy.
- mic_numbers.npy - (N_mono,) - an array of integers identifying the microphone of each recording in RIRs.npy and music.npy.
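The relationship between music_dls.npy, RIRs.npy, and music.npy described above can be sketched as an FFT-based linear convolution. This is an illustrative sketch, not the authors' code: the function name `render_recording` is hypothetical, and truncating the result to the length of the measured recording (the `out_len` argument) is an assumption about how one would compare the estimate with music.npy.

```python
import numpy as np


def render_recording(source, rir, out_len=None):
    """Estimate a recording by linearly convolving a source with an RIR."""
    n_full = len(source) + len(rir) - 1
    # Zero-pad to the next power of two so the circular FFT product
    # equals the linear convolution on the first n_full samples.
    n_fft = 1 << (n_full - 1).bit_length()
    spectrum = np.fft.rfft(source, n_fft) * np.fft.rfft(rir, n_fft)
    estimate = np.fft.irfft(spectrum, n_fft)[:n_full]
    return estimate if out_len is None else estimate[:out_len]
```

With the arrays loaded as `rirs`, `dls`, and `music`, a call such as `render_recording(dls[i, s], rirs[i], out_len=music.shape[-1])` would produce an estimate to compare against `music[i, s]`.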
Each .zip file also contains 4 files of binaural data:
- bin_RIRs.npy - (N_binaural_data, 2, 671884) - binaural room impulse responses measured for the room's configuration.
- bin_music.npy - (N_binaural_data, N_songs, 2, 623884) - binaural music recordings, measured at the same locations as bin_xyzs.npy. N_songs is 5.
- bin_xyzs.npy - (N_binaural_data, 3) - the xyz locations in meters at which each row in bin_RIRs.npy and bin_music.npy was recorded. The location is the center of the binaural microphone. To get the location of the left microphone, add 5.5 cm to the x location, and to get the location of the right microphone, subtract 5.5 cm from the x location (the binaural microphone was facing in the -y direction in all rooms/configurations).
- bin_music_dls.npy - (N_binaural_data, N_songs, 624000) - music sources for each binaural data point, measured via direct line loopback.
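The left and right microphone positions follow directly from the 5.5 cm offset described for bin_xyzs.npy. A minimal sketch, where the function name `ear_positions` is illustrative and not part of the dataset's code:

```python
import numpy as np

EAR_OFFSET_M = 0.055  # 5.5 cm along x, from the bin_xyzs.npy description


def ear_positions(bin_xyzs):
    """Return (left, right) microphone positions from binaural center positions."""
    centers = np.asarray(bin_xyzs, dtype=float)
    offset = np.array([EAR_OFFSET_M, 0.0, 0.0])
    # Left microphone is at +x, right microphone at -x (microphone faces -y).
    return centers + offset, centers - offset
```

Both returned arrays have the same shape as the input, (N_binaural_data, 3).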
Room Geometry, Speaker Location
The geometric measurements of each room are provided in the rooms/ folder of the GitHub repository, including the speaker locations and the locations of all surfaces.
Microphone Calibrations
All monoaural microphone recordings were performed using EMM6 measurement microphones. The recordings have already been adjusted to account for differences in microphone sensitivity, using each microphone's sensitivity at 1000 Hz.
If you would like to perform more fine-grained microphone frequency calibration, the EMM6 calibration files for each dataset are included in mic_calibrations.zip. This folder contains four subfolders, one for each room. Each subfolder contains the microphone calibration files as .txt files named {mic_id}_{mic_serial_number}.txt. {mic_id} corresponds to the microphone ID number provided in mic_numbers.npy for each monoaural data point. For more information on microphone calibration files, refer to this website.
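To look up the calibration file for a given recording, one could index a room's subfolder by the {mic_id} prefix of each file name. The sketch below relies only on the {mic_id}_{mic_serial_number}.txt naming pattern; it assumes {mic_id} parses as an integer (mic_numbers.npy holds integers), and it does not read the file contents, whose format is not described here. The function name `index_calibration_files` is hypothetical.

```python
from pathlib import Path


def index_calibration_files(room_dir):
    """Map mic_id -> calibration file path for one room's subfolder."""
    index = {}
    for path in sorted(Path(room_dir).glob("*.txt")):
        # File names follow {mic_id}_{mic_serial_number}.txt
        mic_id, _, _serial = path.stem.partition("_")
        index[int(mic_id)] = path  # assumption: mic_id is an integer
    return index
```

The calibration file for the i-th monoaural data point would then be `index[int(mic_numbers[i])]`.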
Files (50.2 GB)
MD5 | Size |
---|---|
8d21c72585257c33764c884a2f7def10 | 5.4 GB |
d1358ed4d39ad498dd8a82381796de03 | 7.5 GB |
66a5a4d11472a8773419ba1e6475656d | 2.4 GB |
f1d98d2b33afe39f9473900f9e3151a3 | 2.4 GB |
9ed2a5231c0d33ea1db4e2a328d74103 | 5.0 GB |
7ecccdaee8da4686ad1291abc483ff04 | 4.5 GB |
e3d2f51ecdc2968cab353bc482c4462c | 2.1 GB |
fea5a177ed218e081bbb4940259c4cf1 | 2.0 GB |
99ee819dad68708be87180b2f2ef8eb3 | 11.6 GB |
fd507542085f2006cc36ba81dbc456d9 | 1.4 GB |
a9c462f65e4b5b63ae1fb260890dd8ae | 1.4 GB |
98d910aeb9a66ba139946861ed0d7db2 | 1.4 GB |
696e27ac14cc80a6606de9e173c392fa | 1.5 GB |
0d6424cd7389515a7fcde9ed6017eb5f | 1.5 GB |
b7b55d48571a5c1e83cd6dc455695488 | 64.5 kB |
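The MD5 checksums listed above can be used to verify a download. A minimal sketch using Python's standard library, where the function name `md5_of_file` is illustrative; the file is read in chunks because several of the archives are multiple gigabytes:

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Compare the returned string with the checksum listed for the file you downloaded; a mismatch suggests an incomplete or corrupted download.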
Additional details
Related works
- Cites
- Data paper: 10.48550/arXiv.2311.03517 (DOI)
- Data paper: 10.48550/arXiv.1612.01840 (DOI)
Funding
- U.S. National Science Foundation
- CCRI: ENS: Activity-Centric Interactive Environments for Embodied AI 2120095
- U.S. National Science Foundation
- Collaborative Research: RI: Medium: Learning Compositional Implicit Representations for 3D Scene Understanding 2211258
Software
- Repository URL
- https://github.com/maswang32/hearinganythinganywhere
- Programming language
- Python
- Development Status
- Active