Evaluating Deep Scattering Spectra with deep neural networks on large scale spontaneous speech task

Petr Fousek; Pierre Dognin; Vaibhava Goel

doi:10.1109/ICASSP.2015.7178832

ICASSP 2015

Conference paper

04 Aug 2015

Abstract

Deep Scattering Network features, introduced for image processing, have recently proved useful in speech recognition as an alternative to log-mel features for Deep Neural Network (DNN) acoustic models. Scattering features use a wavelet decomposition that directly produces log-frequency spectrograms, which are robust to local time warping and provide additional information in their higher-order coefficients. This paper extends previous work by showing how scattering features perform on a state-of-the-art spontaneous speech recognition task using a DNN acoustic model. We revisit feature normalization and compression in an extensive study, with emphasis on comparing models of the same size. We observe that scattering features outperform the log-mel baseline in all conditions, with additional gains from multi-resolution processing.
