Masked Conditional Neural Networks for Environmental Sound Classification

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The ConditionaL Neural Network (CLNN) exploits the temporal sequencing of the sound signal represented in a spectrogram, and its variant, the Masked ConditionaL Neural Network (MCLNN), induces the network to learn in frequency bands by embedding a filterbank-like sparseness over the network's links using a binary mask. The masking also automates the concurrent exploration of different feature combinations, analogous to handcrafting the optimum combination of features for a recognition task. We evaluated the MCLNN's performance on the Urbansound8k dataset of environmental sounds. In addition, we present YorNoise, a collection of manually recorded rail and road traffic sounds, to investigate the confusion rates among machine-generated sounds that share low-frequency components. The MCLNN achieved competitive results on Urbansound8k without augmentation, using 12% of the trainable parameters required by an equivalent model based on state-of-the-art Convolutional Neural Networks. We extended the Urbansound8k dataset with YorNoise, where experiments showed that common tonal properties affect classification performance.
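
The filterbank-like sparseness described in the abstract can be pictured as an element-wise binary mask over a dense weight matrix, so that each hidden unit connects only to a contiguous band of spectrogram bins. The Python/NumPy sketch below shows one plausible construction under a bandwidth/overlap parameterization; the function name make_band_mask and the wrap-around band placement are illustrative assumptions, not code taken from the paper.

import numpy as np

def make_band_mask(num_features, num_hidden, bandwidth, overlap):
    # Binary mask of shape (num_features, num_hidden).
    # Hidden unit j connects to `bandwidth` consecutive input
    # features; successive bands shift by (bandwidth - overlap),
    # wrapping around the feature (frequency) axis.
    shift = bandwidth - overlap
    assert shift > 0, "bandwidth must exceed overlap in this sketch"
    mask = np.zeros((num_features, num_hidden), dtype=np.float32)
    for j in range(num_hidden):
        start = (j * shift) % num_features
        for k in range(bandwidth):
            mask[(start + k) % num_features, j] = 1.0
    return mask

# Usage: zero out the out-of-band links of a dense weight matrix W
# (re-applied after every weight update so masked links stay zero).
W = np.random.randn(60, 100).astype(np.float32)  # e.g. 60 mel bins
W_masked = W * make_band_mask(60, 100, bandwidth=20, overlap=5)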
Original language: English
Title of host publication: Artificial Intelligence XXXIV
Subtitle of host publication: 37th SGAI International Conference on Artificial Intelligence, AI 2017, Cambridge, UK, December 12-14, 2017, Proceedings
Publisher: Springer
Pages: 21-33
ISBN (Electronic): 9783319710785
ISBN (Print): 9783319710778
Publication status: Published - Feb 2018
Event: Artificial Intelligence XXXIV, Cambridge, United Kingdom
Duration: 12 Dec 2017 → 14 Dec 2017

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer
Volume: 10630

Conference

Conference: Artificial Intelligence XXXIV
Abbreviated title: SGAI
Country/Territory: United Kingdom
Period: 12/12/17 → 14/12/17
