Dataset Open Access
Tampere University of Technology (TUT) Tietotalo Ambisonic Impulse Response
This dataset consists of impulse responses (IRs) measured in a real environment using the Eigenmike spherical microphone array. The recordings were made in a fairly large corridor inside the university (Tietotalo building), with classrooms around it. The IRs were acquired using a maximum length sequence (MLS). During the measurement, a Genelec G Two loudspeaker continuously playing the MLS was moved slowly around the Eigenmike in a circular trajectory. The playback volume was set 30 dB above the ambient sound level. The IRs were collected at elevations from −40° to 40° in 10° increments at a distance of 1 m from the Eigenmike, and at elevations from −20° to 20° in 10° increments at 2 m.
The moving-source IRs were obtained with a freely available tool from the CHiME challenge, which estimates the time-varying responses in the STFT domain by forming a least-squares regression between the known measurement signal and the far-field recording, independently at each frequency. The IR for any azimuth within one trajectory can be analyzed by assuming block-wise stationarity of the acoustic channel. The CHiME IR estimation tool was applied independently to all 32 channels of the Eigenmike. To create the dataset, we estimated the direction of arrival (DOA) of each time frame using MUSIC and extracted IRs for azimuth angles at 10° resolution (36 IRs for each elevation).
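The per-frequency least-squares idea can be illustrated with a minimal sketch. This is not the CHiME tool and does not reproduce its interface: it is a simplified single-tap, time-invariant variant in which the function name `estimate_transfer_function` and the STFT frame length of 2048 samples are assumptions made for illustration. A time-varying estimate would apply the same regression over short blocks of frames instead of the whole recording.

```python
import numpy as np
from scipy.signal import stft


def estimate_transfer_function(source, recording, fs, nperseg=2048):
    """Single-tap least-squares estimate of H(f) such that Y(f, t) ~ H(f) X(f, t).

    Hypothetical helper for illustration; nperseg=2048 is an assumption
    (it yields 1025 one-sided frequency bins).
    """
    _, _, X = stft(source, fs=fs, nperseg=nperseg)      # (n_freq, n_frames)
    _, _, Y = stft(recording, fs=fs, nperseg=nperseg)   # (n_freq, n_frames)
    # Closed-form least-squares solution, solved independently per frequency bin:
    # H(f) = sum_t conj(X) * Y / sum_t |X|^2
    numerator = np.sum(np.conj(X) * Y, axis=1)
    denominator = np.sum(np.abs(X) ** 2, axis=1) + 1e-12  # guard against division by zero
    return numerator / denominator


# Usage with synthetic data: the "recording" is the source attenuated by half.
rng = np.random.default_rng(0)
source = rng.standard_normal(16000)
H = estimate_transfer_function(source, 0.5 * source, fs=16000)
```

For a single Eigenmike recording, such an estimate would be computed once per microphone channel, which mirrors the independent per-channel processing described above.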
The IR file is in .mat format and can be read in both Matlab and Python. The details of the IR file are as follows:
Size: (2, 9, 1025, 36, 4, 32) = (distance_wrt_mic, elevation_wrt_mic, FFT, azimuth_wrt_mic, blocks, channels).
distance_wrt_mic = two distances (1 m and 2 m)
elevation_wrt_mic = 9 elevation angles (-40:10:40) at distance 1 m, and 5 elevation angles (-20:10:20) at distance 2 m.
azimuth_wrt_mic = 36 azimuth angles (-180:10:180) for all distance-elevation combinations
The IRs were extracted assuming block-wise stationarity (four blocks) for each frequency bin (1025 bins).
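As a rough guide to navigating the six-dimensional array, the sketch below maps a (distance, elevation, azimuth) triple to array indices. Several things here are assumptions rather than documented facts: the helper name `ir_index`, the azimuth grid starting at -180° and ending at 170° (36 values at 10° steps), and the placement of the 2 m elevations at the same indices as the matching 1 m elevations. It is worth checking which slices of the loaded array are actually non-zero before relying on this mapping.

```python
import numpy as np

# Shape as given in the dataset description:
# (distance, elevation, FFT, azimuth, blocks, channels)
IR_SHAPE = (2, 9, 1025, 36, 4, 32)

DISTANCES_M = (1, 2)
ELEVATIONS_DEG = tuple(range(-40, 41, 10))   # 9 values: -40 ... 40
# Assumption: 36 azimuths at 10 degree steps, starting at -180 (so -180 ... 170).
AZIMUTHS_DEG = tuple(range(-180, 180, 10))


def ir_index(distance_m, elevation_deg, azimuth_deg):
    """Return an index tuple selecting the (FFT, blocks, channels) slice.

    Hypothetical helper. Assumes the 2 m elevations (-20 ... 20) share the
    index positions of the same elevations at 1 m.
    """
    if distance_m == 2 and abs(elevation_deg) > 20:
        raise ValueError("At 2 m only elevations from -20 to 20 degrees were measured")
    d = DISTANCES_M.index(distance_m)
    e = ELEVATIONS_DEG.index(elevation_deg)
    wrapped = (azimuth_deg + 180) % 360 - 180   # wrap into [-180, 180)
    a = AZIMUTHS_DEG.index(wrapped)
    return (d, e, slice(None), a, slice(None), slice(None))


# Loading is left as a comment because the file name and variable key are
# not given in this description (both are placeholders):
# from scipy.io import loadmat
# ir = loadmat("path/to/ir_file.mat")["variable_name"]
# response = ir[ir_index(1, 0, 90)]   # expected shape: (1025, 4, 32)
```

If the file was saved in MAT v7.3 format, `scipy.io.loadmat` cannot read it and an HDF5 reader such as `h5py` would be needed instead.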
During synthesis, after convolving the IR with a sound event, the 32-channel audio must be transformed to the Ambisonic format using the transformation matrix of the Eigenmike.
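The synthesis step can be sketched as follows, under stated assumptions. The sketch treats the 1025 FFT bins of a single block as a one-sided spectrum of a 2048-point transform, which is an inference from the bin count rather than something documented here, and it does not address how the four blocks should be combined. The Eigenmike transformation matrix is not part of this description, so `encoder` is a placeholder of shape (n_ambisonic_channels, 32), and both function names are hypothetical.

```python
import numpy as np


def spatialize(event, freq_response):
    """Convolve a mono sound event with a 32-channel response.

    event:         (n_samples,) mono signal
    freq_response: (1025, 32) one-sided spectrum for one block (assumed layout)
    returns:       (n_samples + 2047, 32) multichannel signal
    """
    ir_time = np.fft.irfft(freq_response, axis=0)        # (2048, 32) time-domain IRs
    n_out = len(event) + ir_time.shape[0] - 1            # full linear convolution length
    nfft = int(2 ** np.ceil(np.log2(n_out)))             # zero-pad to avoid circular wrap
    event_f = np.fft.rfft(event, n=nfft)
    ir_f = np.fft.rfft(ir_time, n=nfft, axis=0)
    out = np.fft.irfft(event_f[:, None] * ir_f, n=nfft, axis=0)
    return out[:n_out]


def to_ambisonic(mic_signals, encoder):
    """Apply a (n_ambisonic_channels, 32) transformation matrix to (n, 32) audio.

    `encoder` stands in for the Eigenmike transformation matrix, which must
    be obtained separately.
    """
    return mic_signals @ encoder.T


# Usage with placeholder data: an all-ones spectrum is an ideal unit impulse.
event = np.random.default_rng(0).standard_normal(1000)
mic_signals = spatialize(event, np.ones((1025, 32), dtype=complex))
ambisonic = to_ambisonic(mic_signals, np.ones((4, 32)) / 32)  # placeholder matrix
```

Doing the convolution by multiplication in the frequency domain keeps the cost low for IRs of this length; the zero-padding to `nfft` is what makes the result a linear rather than circular convolution.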
This dataset was collected as part of the work 'Sound event localization and detection of overlapping sources using convolutional recurrent neural network'; more details about this IR dataset can be found in that work.
Data collector(s): Fagerlund, Eemi; Koskimies, Aino