Application of Machine Learning for the Spatial Analysis of Binaural Room Impulse Responses

Dataset

Description

This repository contains supplementary material for the paper 'Application of Machine Learning for the Spatial Analysis of Binaural Room Impulse Responses', available at dx.doi.org/10.3390/app8010105. The dataset contains:

Folders:
1.) neg90 - This folder contains the Gaussian normalisation parameters, stored as text files, and the weights and biases of the trained neural network. These are all for the -90° rotation neural network.

2.) pos90 - This folder contains the Gaussian normalisation parameters, stored as text files, and the weights and biases of the trained neural network. These are all for the +90° rotation neural network.

3.) testData - This folder contains pre-generated test data for the different combinations of binaural dummy head microphone, speaker, and signal type.

Python Scripts:

1.) AnalyseDoA.py - A Python script that tests the neural network using the pre-generated test data. Running the script prompts the user to input the binaural dummy head, speaker, and signal type. The important variables generated by this script are DoA, the direction of arrival predicted for each signal in the feature vector, and yDiff, the difference between the predicted and the expected direction of arrival.

2.) DirectionAnalysis.py - This Python file contains the set of functions used to define and run the neural network. The function DoAPrediction takes the feature vector generated by the MATLAB code as its input argument, passes these features to the neural network, and returns the direction of arrival predicted by the neural network for each signal. DoAPrediction calls the functions DoAAnalysis_neg90 and DoAAnalysis_pos90, which create the neural network using the NN function, import the weights and biases, and pass the feature matrix (provided as input) through the neural network. The output of these functions is the predicted direction of arrival.
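
As an illustration of the yDiff variable described for AnalyseDoA.py, the sketch below computes the difference between predicted and expected azimuths. It is not taken from the repository's scripts. It assumes that a wrap-around difference is wanted (so that 357.5° against 0° counts as -2.5° rather than 357.5°), and the function name and sample values are hypothetical.

```python
def wrapped_angle_difference(predicted_deg, expected_deg):
    """Return predicted - expected in degrees, wrapped to the range [-180, 180)."""
    return (predicted_deg - expected_deg + 180.0) % 360.0 - 180.0


# Hypothetical values for illustration only; not taken from the dataset.
predicted = [2.5, 357.5, 181.0]
expected = [0.0, 0.0, 178.5]

# Assumption: wrap-around is desired; AnalyseDoA.py may compute yDiff differently.
y_diff = [wrapped_angle_difference(p, e) for p, e in zip(predicted, expected)]
print(y_diff)
```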

MATLAB files:

1.) runAnalysis.m - This MATLAB script analyses the dataset provided as part of this repository. Users can change the variables head ('KEMAR' or 'KU100'), signalType ('directSound' or 'reflection'), and speaker ('EquatorD5' or 'Genelec8030'). The script produces the Gaussian-normalised feature vector and the expected direction of arrival for all signals with the chosen head, signal type, and speaker combination. These variables are then saved in .mat files so that they can be imported by the Python scripts.

2.) BinauralModelCochlea.m - This MATLAB function analyses a given binaural signal and outputs the interaural cross-correlation, the interaural level difference, the interaural time difference, the cochlea output for the left and right channels, and the centre frequencies of the gammatone filter bank. The input variables are: IR - the signal to be analysed, N - the number of gammatone filters, freqLow - the lowest centre frequency of the gammatone filter bank (the centre frequency of the first gammatone filter), and freqHigh - the highest centre frequency of the gammatone filter bank (the centre frequency of the Nth gammatone filter). This function requires Malcolm Slaney's Auditory Toolbox and Bin Gao's Cochleagram function (see paper for references).

3.) generateFeatureVector.m - This MATLAB function generates a feature vector from an input binaural signal x and a version of the signal captured after the binaural dummy head has been rotated by either +90° or -90° (variables xPos90 and xNeg90 respectively). If the sampling frequency (Fs) is not 44100 Hz, the signals are resampled to 44100 Hz. This file also contains the function 'gaussianNormalisationTestData', which Gaussian-normalises the data using the mean and standard deviation calculated from the data used to train the neural networks. The mean and standard deviation values are stored in the folder GMParams in the pos90 and neg90 folders.

4.) generateTestData.m - This MATLAB function analyses the included binaural dataset. It takes the input variables: head - the binaural dummy head used for the measurements, either 'KEMAR' or 'KU100', speaker - the speaker used for the measurements, either 'EquatorD5' or 'Genelec8030', and signalType - the type of signal being analysed, either 'directSound' or 'reflection'.
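
The Gaussian normalisation described for generateFeatureVector.m can be sketched as follows. This is a minimal illustration, not the repository's MATLAB code. It assumes that "Gaussian normalisation" means subtracting the training mean and dividing by the training standard deviation for each feature. The function name and all numeric values are hypothetical, and the layout of the files in GMParams is not assumed here.

```python
def gaussian_normalise(features, means, stds):
    """Standardise each feature column using precomputed training statistics."""
    return [
        [(value - mu) / sigma for value, mu, sigma in zip(row, means, stds)]
        for row in features
    ]


# Hypothetical training statistics and test features, for illustration only.
# In the repository these statistics come from the GMParams folders.
train_means = [0.5, 10.0]
train_stds = [0.25, 2.0]
test_features = [[0.75, 12.0], [0.25, 6.0]]

normalised = gaussian_normalise(test_features, train_means, train_stds)
print(normalised)
```

Reusing the training statistics, rather than recomputing them from the test data, keeps the test features on the same scale as the data the networks were trained on.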

Text files:

1.) noLayers.txt - A text file containing the number of layers used when training the neural network. With the current version of the code, the neural network contains only 1 layer.

2.) README.txt - Read me file containing information about the repository.

Audio files:

This repository contains 1152 binaural signals. Half are direct sounds segmented from binaural room impulse responses and the other half are reflections segmented from binaural room impulse responses (detailed in the paper this material supports). The direct sounds are recorded at angles from 0° to 357.5° in steps of 2.5°, and the reflections at angles from 1° to 358.5° in steps of 2.5°. In the paper, only signals recorded with the Equator D5 are analysed.
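
The angle grids and the total file count can be checked with a short sketch. The helper name is hypothetical; the start angles, the step size, the 144 angles per grid, and the eight head, speaker, and signal type combinations are taken from this description.

```python
def angle_grid(start_deg, step_deg, count):
    """Return `count` azimuth angles starting at `start_deg` in steps of `step_deg`."""
    return [start_deg + step_deg * i for i in range(count)]


# 144 angles per grid, as stated for each recording combination.
direct_angles = angle_grid(0.0, 2.5, 144)      # direct sounds
reflection_angles = angle_grid(1.0, 2.5, 144)  # reflections

# 2 heads x 2 speakers x 2 signal types = 8 combinations of 144 files each.
total_files = 2 * 2 * 2 * 144

print(direct_angles[0], direct_angles[-1])
print(reflection_angles[0], reflection_angles[-1])
print(total_files)
```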

The audio files comprise the following combinations:

1.) 144 direct sound recordings captured with the KEMAR 45BC binaural dummy head microphone and the Equator D5 speaker
2.) 144 reflection recordings captured with the KEMAR 45BC binaural dummy head microphone and the Equator D5 speaker
3.) 144 direct sound recordings captured with the KU100 binaural dummy head microphone and the Equator D5 speaker
4.) 144 reflection recordings captured with the KU100 binaural dummy head microphone and the Equator D5 speaker
5.) 144 direct sound recordings captured with the KEMAR 45BC binaural dummy head microphone and the Genelec 8030 speaker
6.) 144 reflection recordings captured with the KEMAR 45BC binaural dummy head microphone and the Genelec 8030 speaker
7.) 144 direct sound recordings captured with the KU100 binaural dummy head microphone and the Genelec 8030 speaker
8.) 144 reflection recordings captured with the KU100 binaural dummy head microphone and the Genelec 8030 speaker

The files are stored using the following file naming convention:
head_Test3_speaker_signalType_000_0_Degrees.wav - where _000_0 defines the azimuth direction of arrival. For example, a direct sound measured with the KEMAR unit and the Genelec 8030 at 5 degrees would be 'KEMAR_Test3_Genelec8030_directSound_005_0Degrees.wav', and a reflection measured with the KU100 and the Equator D5 at 298.5 degrees would be 'KU100_Test3_EquatorD5_reflection_298_5Degrees.wav'.
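
The naming convention can be sketched in Python as below. The function names and the regular expression are hypothetical and not part of the repository. The builder follows the two quoted example file names (no underscore before 'Degrees'), while the parser also accepts the underscore shown in the general pattern, since either form may occur.

```python
import re


def build_filename(head, speaker, signal_type, azimuth_deg):
    """Build a file name in the style of the two quoted examples."""
    tenths = round(azimuth_deg * 10)   # e.g. 298.5 -> 2985
    whole, tenth = divmod(tenths, 10)  # e.g. 2985 -> (298, 5)
    return f"{head}_Test3_{speaker}_{signal_type}_{whole:03d}_{tenth}Degrees.wav"


# Assumption: three digits, an underscore, one decimal digit, then an
# optional underscore before "Degrees".
FILENAME_PATTERN = re.compile(
    r"^(?P<head>[^_]+)_Test3_(?P<speaker>[^_]+)_(?P<signal>[^_]+)"
    r"_(?P<whole>\d{3})_(?P<tenth>\d)_?Degrees\.wav$"
)


def parse_azimuth(filename):
    """Return the azimuth in degrees encoded in a file name."""
    match = FILENAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"Unrecognised file name: {filename}")
    return int(match["whole"]) + int(match["tenth"]) / 10


name = build_filename("KU100", "EquatorD5", "reflection", 298.5)
print(name)
print(parse_azimuth(name))
```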

Date made available: 12 Jan 2018
Publisher: Zenodo
Temporal coverage: 1 Jan 2017 - 12 Jan 2018
Date of data production: 1 Jan 2017 - 12 Jan 2018
Geographical coverage: York, UK
