Published May 15, 2024 | Version 1.0.0
Dataset Open

Hearing Anything Anywhere - the DIFFRIR dataset

  • 1. Stanford University
  • 2. Sony (Japan)
  • 3. University of Maryland, College Park



This is the DIFFRIR dataset, released as part of "Hearing Anything Anywhere" (CVPR, 2024).

It contains monoaural and binaural Room Impulse Responses (RIRs) and music recordings in four different rooms:

  1. A Classroom
  2. An acoustically Dampened Room
  3. A Hallway
  4. A Complex Room (shaped like a pentagonal prism with several irregular surfaces of varying materials and pillars)

In each room, we measure RIRs using both monoaural and binaural microphones at several hundred precisely measured locations. We also record music played from these locations.

We also record several configurations in the Dampened Room, the Hallway, and the Complex Room, where we rotate or translate the speaker used to measure the RIRs, or insert one or more whiteboard panels at different locations in the room. Each of these configurations also contains measurements of monoaural and binaural RIRs and music.


There are 14 .zip files in this dataset, containing data from the 14 room configurations across the four rooms. Zip files with the suffix "base" contain data from each room's base configuration; the remaining zip files contain data from the other configurations, such as the three panel configurations in the Hallway.


All data is stored as .npy files. All audio data is stored as numpy float64 arrays with a sampling rate of 48,000 Hz.

All audio files are aligned such that the time the recording starts is equal to the time that the speaker begins playing the music or hypothetical impulse.
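The .npy arrays can be inspected directly with NumPy, and a recording can be normalized and written out as a WAV file for a quick listen. A minimal sketch, with a placeholder impulse standing in for a real row of RIRs.npy and an illustrative output file name:

```python
import numpy as np
from scipy.io import wavfile

SR = 48_000  # dataset sampling rate (Hz)

# Synthetic stand-in for one row of RIRs.npy; with the real data you
# would use: rir = np.load("RIRs.npy")[0]
rir = np.zeros(671_884)
rir[0] = 1.0  # a unit impulse as a placeholder RIR

# Normalize to [-1, 1] and write a 16-bit PCM WAV for listening.
peak = np.max(np.abs(rir))
pcm = (rir / peak * 32767).astype(np.int16)
wavfile.write("rir_example.wav", SR, pcm)
```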


In the descriptions below, we specify the contents and shape of each file. N_mono is the number of monoaural data points, and N_binaural is the number of binaural data points recorded in the room configuration.

Each .zip file contains the following files for the monoaural data points:

  1. RIRs.npy - (N_mono, 671884) - monoaural room impulse responses measured for the room's configuration.
  2. music.npy - (N_mono, N_songs, 623884) - monoaural music recordings, measured at the same locations as RIRs.npy. N_songs is either 1 or 5.
  3. xyzs.npy - (N_mono, 3) - the xyz microphone locations in meters at which RIRs.npy and music.npy were recorded.
  4. music_dls.npy - (N_mono, N_songs, 624000) - music source files for each of the monoaural music data points. Each source is measured by recording a loopback signal, and is aligned such that convolving a source from music_dls.npy with the corresponding RIR in RIRs.npy estimates the corresponding music recording in music.npy.
  5. mic_numbers.npy - (N_mono,) - an array of integers identifying the microphone used for each recording in RIRs.npy and music.npy.
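The alignment between music_dls.npy and RIRs.npy means each music recording can be approximated by convolution. A minimal sketch using scipy.signal.fftconvolve, with short synthetic signals standing in for the real arrays:

```python
import numpy as np
from scipy.signal import fftconvolve

# Synthetic stand-ins shaped like one monoaural data point; with the
# real data you would load e.g.:
#   rirs  = np.load("RIRs.npy")       # (N_mono, 671884)
#   dls   = np.load("music_dls.npy")  # (N_mono, N_songs, 624000)
#   music = np.load("music.npy")      # (N_mono, N_songs, 623884)
rng = np.random.default_rng(0)
rir = rng.standard_normal(256)      # one RIR (shortened for the sketch)
source = rng.standard_normal(1024)  # one loopback source signal

# Convolving the source with the RIR estimates the music recording at
# that microphone location; truncate to the recording length.
estimate = fftconvolve(source, rir)[: len(source)]
```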

Each .zip file also contains files for each binaural data point:

  1. bin_RIRs.npy - (N_binaural, 2, 671884) - binaural room impulse responses measured for the room's configuration.
  2. bin_music.npy - (N_binaural, N_songs, 2, 623884) - binaural music recordings, measured at the locations given in bin_xyzs.npy. N_songs is 5.
  3. bin_xyzs.npy - (N_binaural, 3) - the xyz locations in meters at which each row in bin_RIRs.npy and bin_music.npy was recorded. The location is the center of the binaural microphone. To get the location of the left microphone, add 5.5 cm to the x location, and to get the location of the right microphone, subtract 5.5 cm from the x location (the binaural microphone was facing in the -y direction in all rooms/configurations).
  4. bin_music_dls.npy - (N_binaural, N_songs, 624000) - music sources for each binaural data point, measured via direct line loopback.
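Given the fixed 5.5 cm offsets and the -y facing direction, per-capsule positions can be recovered from bin_xyzs.npy. A sketch with synthetic coordinates standing in for the real file:

```python
import numpy as np

OFFSET = 0.055  # 5.5 cm offset of each capsule from the center (meters)

# Synthetic stand-in for bin_xyzs.npy, shape (N_binaural, 3); with the
# real data: bin_xyzs = np.load("bin_xyzs.npy")
bin_xyzs = np.array([[1.0, 2.0, 1.5],
                     [0.5, 3.0, 1.5]])

# The binaural microphone faces -y, so the left capsule sits at +x and
# the right capsule at -x relative to the center position.
left = bin_xyzs + np.array([OFFSET, 0.0, 0.0])
right = bin_xyzs - np.array([OFFSET, 0.0, 0.0])
```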

Room Geometry, Speaker Location

The geometric measurements of each room, including speaker locations and the locations of all surfaces, are provided in the rooms/ folder of the GitHub repository.

Microphone Calibrations

All monoaural microphone recordings were performed using EMM6 measurement microphones. The recordings have already been adjusted to account for differences in microphone sensitivity, according to each microphone's sensitivity at 1000 Hz.

If you would like to perform more fine-grained microphone frequency calibration, the EMM6 calibration files are included in the dataset, organized into four subfolders corresponding to the four rooms. Each subfolder contains the microphone calibration files as .txt files named {mic_id}_{mic_serial_number}.txt, where {mic_id} corresponds to the microphone ID number provided in mic_numbers.npy for each monoaural data point.
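If the calibration files follow the {mic_id}_{mic_serial_number}.txt naming above, a small helper can map the IDs in mic_numbers.npy to calibration file paths. A sketch under that assumption (the folder layout is taken from the description; adjust paths to the actual download):

```python
import os
import re

def calibration_files(folder):
    """Map mic ID (as in mic_numbers.npy) -> calibration file path.

    Assumes files in `folder` are named {mic_id}_{mic_serial_number}.txt.
    """
    mapping = {}
    for name in os.listdir(folder):
        match = re.fullmatch(r"(\d+)_(\w+)\.txt", name)
        if match:
            mapping[int(match.group(1))] = os.path.join(folder, name)
    return mapping
```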


Files (50.2 GB)

Fourteen .zip archives ranging from 1.4 GB to 11.6 GB each, plus one 64.5 kB file (50.2 GB total).

Additional details

Related works

Data paper: 10.48550/arXiv.2311.03517 (DOI)
Data paper: 10.48550/arXiv.1612.01840 (DOI)


Funding

  • CCRI: ENS: Activity-Centric Interactive Environments for Embodied AI (award 2120095), National Science Foundation
  • Collaborative Research: RI: Medium: Learning Compositional Implicit Representations for 3D Scene Understanding (award 2211258), National Science Foundation

