Publication
IEEE Transactions on Audio, Speech and Language Processing
Paper
Discrimination of speech from nonspeech based on multiscale spectro-temporal modulations
Abstract
We describe a content-based audio classification algorithm based on novel multiscale spectro-temporal modulation features inspired by a model of auditory cortical processing. The task explored is to discriminate speech from nonspeech consisting of animal vocalizations, music, and environmental sounds. Although this is a relatively easy task for humans, it is still difficult to automate well, especially in noisy and reverberant environments. The auditory model captures basic processes occurring from the early cochlear stages to the central cortical areas. The model generates a multidimensional spectro-temporal representation of the sound, which is then analyzed by a multilinear dimensionality reduction technique and classified by a support vector machine (SVM). Generalization of the system to signals with high levels of additive noise and reverberation is evaluated and compared to two existing approaches (Scheirer and Slaney, 2002 and Kingsbury et al., 2002). The results demonstrate the advantages of the auditory model over the other two systems, especially at low signal-to-noise ratios (SNRs) and high reverberation. © 2006 IEEE.
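The abstract refers to evaluation at low signal-to-noise ratios but does not say how the noisy test signals were built. As an illustration only, the sketch below shows the standard way to add noise to a clean signal at a chosen SNR, using the definition SNR_dB = 10 log10(P_signal / P_noise). The function names (`power`, `mix_at_snr`, `measured_snr_db`), the 440 Hz test tone, and the Gaussian noise are assumptions made for this example and are not taken from the paper.

```python
import math
import random


def power(samples):
    """Mean squared amplitude of a sequence of samples."""
    return sum(s * s for s in samples) / len(samples)


def mix_at_snr(signal, noise, target_snr_db):
    """Scale `noise` so the mixture has the requested SNR, then add it to `signal`.

    Hypothetical helper for illustration; not the procedure used in the paper.
    """
    if len(signal) != len(noise):
        raise ValueError("signal and noise must have the same length")
    p_signal = power(signal)
    p_noise = power(noise)
    if p_signal == 0 or p_noise == 0:
        raise ValueError("signal and noise must both have nonzero power")
    # Solve 10*log10(p_signal / (gain**2 * p_noise)) = target_snr_db for gain.
    gain = math.sqrt(p_signal / (p_noise * 10 ** (target_snr_db / 10)))
    return [s + gain * n for s, n in zip(signal, noise)]


def measured_snr_db(signal, noisy):
    """SNR in dB of `noisy` relative to the known clean `signal`."""
    residual = [y - s for s, y in zip(signal, noisy)]
    return 10 * math.log10(power(signal) / power(residual))


# Assumed toy data: one second of a 440 Hz tone at 16 kHz plus Gaussian noise.
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]

noisy_0db = mix_at_snr(clean, noise, target_snr_db=0.0)
noisy_minus5db = mix_at_snr(clean, noise, target_snr_db=-5.0)
```

At 0 dB the scaled noise carries the same power as the signal, and at negative SNRs the noise dominates, which is the regime the abstract singles out as most demanding.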