Robust speech processing using ARMA spectrogram models
Abstract
Speech applications remain challenging in noisy and degraded channel conditions, especially when the training and test conditions are mismatched. In this paper, a robust speech feature extraction scheme is developed based on autoregressive moving average (ARMA) modeling, which emphasizes the high-energy regions of the signal and applies a data-driven modulation filter. The peak-preserving ability of two-dimensional autoregressive (AR) models is used to emphasize the high-energy regions in the spectrotemporal domain, and the modulation filtering property is achieved by moving average (MA) modeling. The ARMA spectrograms are used to derive features for speech recognition on the Aurora-4 database. In these experiments, the ARMA model features provide relative improvements of 15% over other robust features. The robustness of these features is also verified for language identification (LID) of highly degraded radio channel speech, where the ARMA approach achieves relative improvements of up to 20% over the baseline features.
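The peak-preserving behavior of AR modeling can be illustrated with a minimal one-dimensional sketch. This is not the two-dimensional spectrotemporal model of the paper: it only fits an all-pole envelope to a single synthetic power spectrum using the standard autocorrelation (Yule-Walker) method. The helper name `ar_envelope`, the model order, and the synthetic spectrum are all assumptions made for illustration.

```python
import numpy as np


def ar_envelope(power_spectrum, order):
    """All-pole (AR) envelope of a one-sided power spectrum (hypothetical helper)."""
    # Wiener-Khinchin: the autocorrelation is the inverse DFT of the power spectrum.
    r = np.fft.irfft(power_spectrum)[: order + 1]
    # Yule-Walker normal equations R a = r[1:], with R a symmetric Toeplitz matrix.
    lags = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    a = np.linalg.solve(r[lags], r[1:])
    # Prediction-error power sets the overall level of the envelope.
    gain = r[0] - a @ r[1:]
    # Evaluate gain / |1 - sum_k a_k e^{-jwk}|^2 on the input frequency grid.
    n_fft = 2 * (len(power_spectrum) - 1)
    return gain / np.abs(np.fft.rfft(np.concatenate(([1.0], -a)), n_fft)) ** 2


# Synthetic spectrum (assumed for illustration): strong peak, weak peak, noise floor.
freq = np.linspace(0.0, 1.0, 257)  # normalized frequency, 1.0 = Nyquist
spectrum = (0.01 + np.exp(-((freq - 0.3) / 0.02) ** 2)
            + 0.2 * np.exp(-((freq - 0.7) / 0.02) ** 2))
envelope = ar_envelope(spectrum, order=12)
peak_freq = freq[np.argmax(envelope)]  # expected to lie close to the strong peak
```

Because the AR fit minimizes a prediction error that is dominated by high-energy regions, the envelope tends to follow spectral peaks closely while smoothing over low-energy valleys, which is the property the abstract refers to.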