Published October 27, 2017 | Version 1
Dataset | Open

Dataset for: Application of Machine Learning for the Spatial Analysis of Binaural Room Impulse Responses

  • Michael Lovedee-Turner (University of York)

Description

This repository contains supplementary material for the paper 'Application of Machine Learning for the Spatial Analysis of Binaural Room Impulse Responses', available at: dx.doi.org/10.3390/app8010105. These programs and audio files are distributed under the Creative Commons Attribution 4.0 licence in the hope that they will prove useful, but with no warranty, without even the implied warranty of merchantability or fitness for a particular purpose. Please give appropriate credit to the author for any use of the material provided in this repository.

The MATLAB code requires the Auditory Toolbox by Malcolm Slaney [1] and the Cochleagram function distributed by Bin Gao [2].

The Python scripts require the following libraries to be installed: NumPy [3], SciPy [4], and TensorFlow [5].

The MATLAB code was tested using MATLAB R2017a on a computer running Windows 7.

The Python code was tested using Python 3.2.5, in an Anaconda Python environment, from the Windows command line.

--

The repository contains:

Folders:


1.) neg90 - This folder contains the Gaussian normalisation parameters, stored as text files, and the weights and biases of the trained neural network, all for the -90° rotation network.

2.) pos90 - This folder contains the Gaussian normalisation parameters, stored as text files, and the weights and biases of the trained neural network, all for the +90° rotation network.

3.) testData - This folder contains pre-generated test data for the different combinations of binaural dummy head microphone, speaker, and signal type.

Python Scripts:


1.) AnalyseDoA.py - A Python script that tests the neural network using the pre-generated test data. Running the script prompts the user to select the binaural dummy head, speaker, and signal type. The important variables generated by this script are DoA, the direction of arrival for each signal in the feature vector, and yDiff, the difference between the predicted and expected direction of arrival.

2.) DirectionAnalysis.py - This Python file contains a set of functions that define and run the neural network. The function DoAPrediction takes the feature vector generated by the MATLAB code as its input argument, passes these features to the neural network, and outputs the direction of arrival predicted by the network for each signal. The functions DoAAnalysis_neg90 and DoAAnalysis_pos90, called by DoAPrediction, create the neural network using the NN function, import the weights and biases, and pass the feature matrix (provided as input) through the network; their output is the predicted direction of arrival (see the sketch below).
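
To make the prediction step concrete, the following is a minimal NumPy sketch of the single-layer forward pass that DoAAnalysis_neg90 and DoAAnalysis_pos90 perform (the repository implements this step in TensorFlow; the file names, the loading code, and the absence of an activation function below are assumptions for illustration, not the repository's exact implementation):

    import numpy as np

    # Hypothetical file names; the actual weight and bias files live in the
    # neg90 and pos90 folders of this repository.
    W = np.loadtxt('neg90/weights.txt')
    b = np.loadtxt('neg90/biases.txt')

    def predict_doa(features):
        # features: Gaussian-normalised feature matrix, one row per signal.
        # A single-layer network (see noLayers.txt) maps each row to a
        # predicted direction of arrival.
        return np.dot(features, W) + b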

MATLAB files:


1.) runAnalysis.m - This MATLAB script analyses the dataset provided as part of this repository. Users can change the variables head ('KEMAR' or 'KU100'), signalType ('directSound' or 'reflection'), and speaker ('EquatorD5' or 'Genelec8030'). The script produces the Gaussian-normalised feature vector and the expected direction of arrival for all signals with the chosen head, signal type, and speaker combination. These variables are then saved in .mat files so they can be imported by the Python scripts (see the sketch after this list).

2.) BinauralModelCochlea.m - This MATLAB function analyses a given binaural signal and outputs the interaural cross-correlation, the interaural level difference, the interaural time difference, the cochlea output for the left and right channels, and the centre frequencies of the gammatone filter bank. The input variables are: IR, the signal to be analysed; N, the number of gammatone filters; freqLow, the lowest centre frequency of the gammatone filter bank (the centre frequency of the first filter); and freqHigh, the highest centre frequency of the filter bank (the centre frequency of the Nth filter). This function requires Malcolm Slaney's Auditory Toolbox [1] and Bin Gao's Cochleagram function [2].

3.) generateFeatureVector.m - This MATLAB function generates a feature vector from an input binaural signal x and versions of the signal captured after the binaural dummy head has been rotated by +90° and -90° (variables xPos90 and xNeg90 respectively). If the sampling frequency (Fs) is not 44100 Hz, the signals are resampled to 44100 Hz. This file also contains the function gaussianNormalisationTestData, which Gaussian-normalises the data using the mean and standard deviation calculated from the data used to train the neural networks; these values are stored in the GMParams folder inside the pos90 and neg90 folders (see the sketch after this list).

4.) generateTestData.m - This MATLAB function analyses the included binaural dataset. It takes the input variables: head, the binaural dummy head used for the measurements ('KEMAR' or 'KU100'); speaker, the speaker used for the measurements ('EquatorD5' or 'Genelec8030'); and signalType, the type of signal being analysed ('directSound' or 'reflection').
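
The hand-over from MATLAB to Python happens through the saved .mat files. The following Python sketch shows how such a file could be read with SciPy, and how the Gaussian normalisation performed by gaussianNormalisationTestData works; the file and variable names here are assumptions for illustration only:

    import numpy as np
    from scipy.io import loadmat

    # Hypothetical file and variable names; runAnalysis.m defines the real ones.
    data = loadmat('featureVector.mat')
    features = data['features']          # feature matrix, one row per signal
    expected_doa = data['expectedDoA']   # expected direction of arrival

    # Gaussian normalisation: centre and scale each feature using the
    # training-set mean and standard deviation stored in the GMParams
    # folders (paths are assumptions).
    mu = np.loadtxt('pos90/GMParams/mean.txt')
    sigma = np.loadtxt('pos90/GMParams/std.txt')
    features_norm = (features - mu) / sigma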

Text files:


1.) noLayers.txt - A text file containing the number of layers used when training the neural network; with the current version of the code the network contains only one layer.

2.) README.txt - A read-me file containing information about the repository.

Audio files:


This repository contains 1152 binaural signals. Half are direct sounds segmented from binaural room impulse responses, and the other half are reflections segmented from binaural room impulse responses (as detailed in the paper this material supports). The direct sounds were recorded at angles from 0° to 357.5° in steps of 2.5°, and the reflections at angles from 1° to 358.5° in steps of 2.5°. In the paper, only recordings made with the Equator D5 are analysed.

The audio files comprise:

1.) 144 direct sound recordings captured with the KEMAR 45BC binaural dummy head microphone and the Equator D5 speaker
2.) 144 reflection recordings captured with the KEMAR 45BC binaural dummy head microphone and the Equator D5 speaker
3.) 144 direct sound recordings captured with the KU100 binaural dummy head microphone and the Equator D5 speaker
4.) 144 reflection recordings captured with the KU100 binaural dummy head microphone and the Equator D5 speaker
5.) 144 direct sound recordings captured with the KEMAR 45BC binaural dummy head microphone and the Genelec 8030 speaker
6.) 144 reflection recordings captured with the KEMAR 45BC binaural dummy head microphone and the Genelec 8030 speaker
7.) 144 direct sound recordings captured with the KU100 binaural dummy head microphone and the Genelec 8030 speaker
8.) 144 reflection recordings captured with the KU100 binaural dummy head microphone and the Genelec 8030 speaker
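
Each head, speaker, and signal-type combination therefore contributes 144 recordings, giving 8 × 144 = 1152 signals in total; 144 recordings per combination corresponds to one full revolution sampled in 2.5° steps (360°/2.5° = 144).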

The files are stored using the following naming convention: head_Test3_speaker_signalType_000_0_Degrees.wav, where _000_0 encodes the azimuth direction of arrival. For example, a direct sound measured with the KEMAR unit and the Genelec 8030 at 5 degrees is named 'KEMAR_Test3_Genelec8030_directSound_005_0Degrees.wav', and a reflection measured with the KU100 and the Equator D5 at 298.5 degrees is named 'KU100_Test3_EquatorD5_reflection_298_5Degrees.wav'.
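
A minimal Python sketch (a hypothetical helper, not part of the repository) that recovers the measurement parameters from file names following the two examples above:

    def parse_filename(name):
        # e.g. 'KU100_Test3_EquatorD5_reflection_298_5Degrees.wav'
        stem = name[:-len('.wav')]        # drop the extension
        stem = stem[:-len('Degrees')]     # drop the 'Degrees' suffix
        head, _, speaker, signal_type, deg, frac = stem.split('_')
        azimuth = float(deg) + float(frac) / 10.0   # e.g. 298 + 5/10 = 298.5
        return head, speaker, signal_type, azimuth

    print(parse_filename('KU100_Test3_EquatorD5_reflection_298_5Degrees.wav'))
    # -> ('KU100', 'EquatorD5', 'reflection', 298.5)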

--

Bibliography:
[1] Slaney, M. (1998). Auditory Toolbox. Palo Alto, CA. [Online]. Available: https://engineering.purdue.edu/~malcolm/interval/1998-010/ [Accessed: Oct. 27, 2017]

[2] Gao, B. (2014). Cochleagram and IS-NMF2D for Blind Source Separation. [Online]. Available: http://uk.mathworks.com/matlabcentral/fileexchange/48622-cochleagram-and-is-nmf2d-for-blind-source-separation?focused=3855900&tab=function [Accessed: Oct. 27, 2017]

[3] NumFocus. (n.d.). NumPy. [Online]. Available: http://www.numpy.org/ [Accessed: Oct. 27, 2017]

[4] SciPy. (n.d.). SciPy. [Online]. Available: https://www.scipy.org/ [Accessed: Oct. 27, 2017]

[5] Google. (n.d.). TensorFlow. [Online]. Available: https://www.tensorflow.org/ [Accessed: Oct. 27, 2017]

--

All code and audio produced by: Michael Lovedee-Turner, PhD candidate in Music Technology at the Audio Lab, Department of Electronic Engineering, University of York

Contact: mjlt500@york.ac.uk

Notes

Funding was provided by a UK Engineering and Physical Sciences Research Council (EPSRC) Doctoral Training Award through the Department of Electronic Engineering at the University of York.

Files

Application of Machine Learning for the Spatial Analysis of Binaural Room Impulse Responses.zip

Additional details

Related works

Is supplement to
10.3390/app8010105 (DOI)