The segregation of spatialised speech in interference by optimal mapping of diverse cues

Jingbo Gao, Anthony I Tew

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We describe optimal cue mapping (OCM), a potentially real-time binaural signal processing method for segregating a sound source in the presence of multiple interfering 3D sound sources. Spatial cues are extracted from a multisource binaural mixture and used to train artificial neural networks (ANNs) to estimate the spectral energy fraction of a wanted speech source in the mixture. Once trained, the ANN outputs form a spectral ratio mask which is applied frame-by-frame to the mixture to approximate the magnitude spectrum of the wanted speech. The speech intelligibility performance of the OCM algorithm for anechoic sound sources is evaluated on previously unseen speech mixtures using the automated STOI measure, and compared with an established reference method. The optimized integration of multiple cues offers clear performance benefits, and the ability to quantify the relative importance of each cue will facilitate computationally efficient implementations.
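The ratio-mask step in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the per-bin energy fraction is computed here from known source powers (an oracle stand-in for the trained ANN output), the function names and the numeric values are hypothetical, and the sketch assumes the sources are uncorrelated so that mixture power is the sum of the source powers. The abstract does not say whether the mask scales magnitude or power; this sketch scales power and takes the square root.

```python
import math

def energy_fraction_mask(target_power, interferer_power, eps=1e-12):
    """Oracle per-bin energy fraction of the wanted source, in [0, 1].

    In OCM this fraction is estimated by ANNs from spatial cues; here it is
    computed directly from known powers purely for illustration.
    """
    return [t / (t + i + eps) for t, i in zip(target_power, interferer_power)]

def apply_ratio_mask(mixture_power, mask):
    """Scale each bin's mixture power by the mask and return magnitudes."""
    return [math.sqrt(m * p) for m, p in zip(mask, mixture_power)]

# One hypothetical frame with four frequency bins (made-up values).
target_power = [4.0, 1.0, 0.0, 9.0]
interferer_power = [0.0, 3.0, 2.0, 1.0]
# Assumption: uncorrelated sources, so powers add in the mixture.
mixture_power = [t + i for t, i in zip(target_power, interferer_power)]

mask = energy_fraction_mask(target_power, interferer_power)
estimated_magnitude = apply_ratio_mask(mixture_power, mask)
```

Under these assumptions the masked mixture recovers the target magnitude in each bin; with an estimated rather than oracle mask, the result is only an approximation, and the same two steps would be repeated for every frame.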
Original language: English
Title of host publication: 2015 IEEE International Conference on Acoustics, Speech, and Signal Processing
Subtitle of host publication: Proceedings
Place of Publication: Brisbane
Publisher: IEEE
Pages: 2095-2099
Number of pages: 5
ISBN (Print): 978-1-4673-6997-8
DOIs
Publication status: Published - 19 Apr 2015

Keywords

  • Speech segregation
  • Artificial Neural Networks
  • ratio mask
