Published May 15, 2024 | Version 1.0.0
Dataset | Open Access

Hearing Anything Anywhere - the DIFFRIR dataset

  1. Stanford University
  2. Sony (Japan)
  3. University of Maryland, College Park

Description

Overview

This is the DIFFRIR dataset, released as part of "Hearing Anything Anywhere" (CVPR, 2024).

It contains monaural and binaural Room Impulse Responses (RIRs) and music recordings collected in four different rooms:

  1. A Classroom
  2. An acoustically Dampened Room
  3. A Hallway
  4. A Complex Room (shaped like a pentagonal prism with several irregular surfaces of varying materials and pillars)

In each room, we measure RIRs using both monaural and binaural microphones at several hundred precisely measured locations, and we record music playback from these locations as well.

We also record several alternate configurations in the Dampened Room, the Hallway, and the Complex Room, in which we rotate or translate the speaker used to measure the RIRs, or insert one or more whiteboard panels at different locations in the room. Each of these configurations likewise contains monaural and binaural RIR and music measurements.

Organization

There are 14 .zip files in this dataset, containing data from the 14 room configurations across the four rooms. Zip files with the suffix "Base" contain data from the base configuration of each room, while hallwayPanel1.zip, hallwayPanel2.zip, and hallwayPanel3.zip contain data from the three panel configurations in the Hallway.

Files

All data is stored as .npy files. All audio data is stored as numpy float64 arrays with a sampling rate of 48,000 Hz.
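As a minimal sketch of working with this format (the path below is hypothetical, assuming an extracted archive), an array can be loaded and inspected with NumPy; memory-mapping avoids reading a multi-gigabyte file into RAM at once:

    import numpy as np

    FS = 48000  # sampling rate in Hz, as stated above

    # Memory-map the monaural RIRs rather than loading them fully into RAM.
    rirs = np.load("classroomBase/RIRs.npy", mmap_mode="r")  # (N_mono, 671884), float64

    print(rirs.shape, rirs.dtype)
    print("RIR duration: %.2f s" % (rirs.shape[-1] / FS))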

All audio files are aligned so that the start of each recording coincides with the moment the speaker begins playing the music or the hypothetical impulse.

In the descriptions below, we specify the contents and shape of each file. N_mono is the number of monaural data points and N_binaural is the number of binaural data points recorded in the given room configuration.

Each .zip file contains the following files for the monaural data points:

  1. RIRs.npy - (N_mono, 671884) - monaural room impulse responses measured for the room's configuration.
  2. music.npy - (N_mono, N_songs, 623884) - monaural music recordings, measured at the same locations as RIRs.npy. N_songs is either 1 or 5.
  3. xyzs.npy - (N_mono, 3) - the xyz microphone locations, in meters, at which RIRs.npy and music.npy were recorded.
  4. music_dls.npy - (N_mono, N_songs, 624000) - music source signals for each of the monaural music data points. Each source is measured by recording a loopback signal, and is aligned such that convolving a source from music_dls.npy with the corresponding RIR in RIRs.npy approximates the corresponding music recording in music.npy (see the sketch after this list).
  5. mic_numbers.npy - (N_mono,) - an array of integers identifying the microphone used for each recording in RIRs.npy and music.npy.
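Because the sources and recordings share a common start time, this alignment can be sanity-checked by convolving a source with its RIR and comparing against the recording. A sketch under the same hypothetical-path assumption (the indices i and j are arbitrary):

    import numpy as np
    from scipy.signal import fftconvolve

    rirs = np.load("classroomBase/RIRs.npy", mmap_mode="r")            # (N_mono, 671884)
    music = np.load("classroomBase/music.npy", mmap_mode="r")          # (N_mono, N_songs, 623884)
    music_dls = np.load("classroomBase/music_dls.npy", mmap_mode="r")  # (N_mono, N_songs, 624000)

    i, j = 0, 0  # data point and song indices (arbitrary)
    estimate = fftconvolve(music_dls[i, j], rirs[i])  # full convolution
    estimate = estimate[: music.shape[-1]]            # trim to the recording length
    mse = np.mean((estimate - music[i, j]) ** 2)
    print("MSE between convolution estimate and recording:", mse)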

Each .zip file also contains the following files for the binaural data points:

  1. bin_RIRs.npy - (N_binaural, 2, 671884) - binaural room impulse responses measured for the room's configuration.
  2. bin_music.npy - (N_binaural, N_songs, 2, 623884) - binaural music recordings, measured at the locations given in bin_xyzs.npy. N_songs is 5.
  3. bin_xyzs.npy - (N_binaural, 3) - the xyz locations, in meters, at which each row of bin_RIRs.npy and bin_music.npy was recorded. Each location is the center of the binaural microphone. To get the location of the left microphone, add 5.5 cm to the x coordinate; to get the location of the right microphone, subtract 5.5 cm from the x coordinate (the binaural microphone faced the -y direction in all rooms and configurations; see the sketch after this list).
  4. bin_music_dls.npy - (N_binaural, N_songs, 624000) - music source signals for each binaural data point, measured via direct-line loopback.
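For example, the left and right capsule positions can be recovered from the stored centers as follows (the path is again hypothetical):

    import numpy as np

    bin_xyzs = np.load("classroomBase/bin_xyzs.npy")  # (N_binaural, 3), centers in meters

    offset = np.array([0.055, 0.0, 0.0])  # 5.5 cm along x
    left_xyzs = bin_xyzs + offset    # left capsule: +5.5 cm in x
    right_xyzs = bin_xyzs - offset   # right capsule: -5.5 cm in x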

Room Geometry and Speaker Locations

The geometric measurements of the surfaces in each room, including the speaker locations and the locations of all surfaces, are provided in the rooms/ folder of the GitHub repository (linked below under Software).

Microphone Calibrations

All monaural microphone recordings were performed using EMM6 measurement microphones. The recordings have already been adjusted to account for differences in microphone sensitivity, according to each microphone's sensitivity at 1000 Hz.

If you would like to perform more fine-grained frequency calibration, the EMM6 calibration files for each room are included in mic_calibrations.zip. This archive contains four subfolders, one per room. Each subfolder holds the microphone calibration .txt files, named {mic_id}_{mic_serial_number}.txt, where {mic_id} corresponds to the microphone ID number given in mic_numbers.npy for each monaural data point. For more information on microphone calibration files, refer to this website.
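As a sketch, the calibration file for each monaural recording can be looked up from mic_numbers.npy. The subfolder name "classroom" below is an assumption, since the per-room folder names are not listed here:

    import numpy as np
    from pathlib import Path

    mic_numbers = np.load("classroomBase/mic_numbers.npy")  # (N_mono,) integer mic IDs

    # Calibration files are named {mic_id}_{mic_serial_number}.txt; build a
    # lookup from mic ID to calibration file. The folder name is an assumption.
    cal_dir = Path("mic_calibrations/classroom")
    cal_files = {int(p.stem.split("_")[0]): p for p in cal_dir.glob("*.txt")}

    for i, mic_id in enumerate(mic_numbers[:5]):
        print(i, cal_files[int(mic_id)])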

Files

The dataset comprises 15 downloadable files, 50.2 GB in total: the 14 room-configuration .zip archives (e.g., classroomBase.zip) and mic_calibrations.zip.

Additional details

Related works

Cites
Data paper: 10.48550/arXiv.2311.03517 (DOI)
Data paper: 10.48550/arXiv.1612.01840 (DOI)

Funding

  • CCRI: ENS: Activity-Centric Interactive Environments for Embodied AI (award 2120095), National Science Foundation
  • Collaborative Research: RI: Medium: Learning Compositional Implicit Representations for 3D Scene Understanding (award 2211258), National Science Foundation

Software

Repository URL
https://github.com/maswang32/hearinganythinganywhere
Programming language
Python
Development Status
Active