Abstract
We describe optimal cue mapping (OCM), a potentially real-time binaural signal processing method for segregating a sound source in the presence of multiple interfering 3D sound sources. Spatial cues are extracted from a multisource binaural mixture and used to train artificial neural networks (ANNs) to estimate the spectral energy fraction of a wanted speech source in the mixture. Once trained, the ANN outputs form a spectral ratio mask which is applied frame-by-frame to the mixture to approximate the magnitude spectrum of the wanted speech. The speech intelligibility performance of the OCM algorithm for anechoic sound sources is evaluated on previously unseen speech mixtures using the STOI automated measures, and compared with an established reference method. The optimized integration of multiple cues offers clear performance benefits, and the ability to quantify the relative importance of each cue will facilitate computationally efficient implementations.
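The masking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `energy_fraction` and `apply_ratio_mask`, the `(frames, bins)` array layout, and the choice to multiply the mixture magnitude by the mask directly are all assumptions made for illustration. In the paper the mask is estimated by trained ANNs from spatial cues; here it is computed from known source magnitudes, purely to show what the network output is meant to approximate.

```python
import numpy as np

def energy_fraction(target_mag, interferer_mags, eps=1e-12):
    """Per-bin fraction of spectral energy that belongs to the target source.

    target_mag and each entry of interferer_mags are magnitude spectra of
    shape (frames, bins). Assumes source energies add in each bin, which is
    a simplification. eps guards against division by zero in silent bins.
    """
    target_energy = target_mag ** 2
    total_energy = target_energy + sum(m ** 2 for m in interferer_mags)
    return target_energy / (total_energy + eps)

def apply_ratio_mask(mixture_mag, mask):
    """Apply a ratio mask to the mixture magnitude, one value per bin.

    The mask is clipped to [0, 1] because an estimator such as an ANN may
    produce values slightly outside that range. Direct multiplication is an
    assumption of this sketch.
    """
    return mixture_mag * np.clip(mask, 0.0, 1.0)

# Toy example: one frame, two frequency bins.
target = np.array([[3.0, 0.0]])
interferer = np.array([[4.0, 2.0]])
mask = energy_fraction(target, [interferer])
estimate = apply_ratio_mask(np.array([[5.0, 2.0]]), mask)
```

In the first bin the target holds 9 of the 25 energy units, so the mask is about 0.36. In the second bin the target is silent, so the mask is 0 and the mixture is fully suppressed there.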
Original language | English |
---|---|
Title of host publication | 2015 IEEE International Conference on Acoustics, Speech, and Signal Processing |
Subtitle of host publication | Proceedings |
Place of Publication | Brisbane |
Publisher | IEEE |
Pages | 2095-2099 |
Number of pages | 5 |
ISBN (Print) | 978-1-4673-6997-8 |
Publication status | Published - 19 Apr 2015 |
Keywords
- Speech segregation
- Artificial Neural Networks
- ratio mask
Datasets
- Sydney-York Morphological and Recording of Ears database (SYMARE)
Jin, C. (Creator), Guillon, P. (Creator), Zolfaghari, R. (Creator), Epain, N. (Creator), van Schaik, A. (Creator), Tew, A. I. (Creator), Hetherington, C. T. (Creator) & Thorpe, J. (Creator), University of Sydney, 1 Jul 2012
https://www.morphoacoustics.org/