Model-based noise reduction leveraging frequency-wise confidence metric for in-car speech recognition

Osamu Ichikawa; Steven J. Rennie; Takashi Fukuda; Masafumi Nishimura

doi:10.1109/ICASSP.2012.6289023

ICASSP 2012

Conference paper

23 Oct 2012

Abstract

Model-based approaches for noise reduction effectively improve the performance of automatic speech recognition in noisy environments. Most of them use the Minimum Mean Square Estimate (MMSE) criterion for de-noised speech estimates. In general, an observation has speech-dominant bands and noise-dominant bands in the Mel spectral domain. This paper introduces a method to add weight to speech-dominated bands when evaluating the posterior probability of each speech state, as these bands are generally more reliable. To leverage high-resolution information in the Mel domain, we use Local Peak Weight (LPW) as the confidence metric for the degree of speech dominance. This information is also used to regulate the amount of compensation that is applied to each frequency band during feature reconstruction under an integrated probabilistic model. The method produced relative word error rate improvements of up to 33.8% over the baseline MMSE method on an isolated word task with car noise. © 2012 IEEE.
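
The band-weighting idea in the abstract can be illustrated with a minimal sketch. It is not the paper's formulation. It assumes a diagonal-covariance Gaussian speech model in the log-Mel domain and a per-band confidence weight between 0 and 1 that simply scales each band's log-likelihood. The function name `weighted_state_posteriors` and all of its parameters are hypothetical. Computing LPW and the feature reconstruction step are not covered here.

```python
import math


def weighted_state_posteriors(obs, priors, means, variances, band_weights):
    """Posterior of each speech state with confidence-weighted bands.

    Assumption (not taken from the paper): each band's diagonal-Gaussian
    log-likelihood is multiplied by its confidence weight, so bands judged
    noise-dominant (weight near 0) contribute little to the state score.
    """
    log_scores = []
    for prior, mu, var in zip(priors, means, variances):
        score = math.log(prior)
        for y, m, v, w in zip(obs, mu, var, band_weights):
            log_lik = -0.5 * (math.log(2.0 * math.pi * v) + (y - m) ** 2 / v)
            score += w * log_lik
        log_scores.append(score)

    # Normalize in the log domain for numerical stability.
    peak = max(log_scores)
    unnorm = [math.exp(s - peak) for s in log_scores]
    total = sum(unnorm)
    return [u / total for u in unnorm]


# Toy example: two states, two Mel bands, unit variances, equal priors.
obs = [1.0, 5.0]
priors = [0.5, 0.5]
means = [[1.0, 0.0], [0.0, 5.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]

# Equal trust in both bands: the second band dominates the decision.
uniform = weighted_state_posteriors(obs, priors, means, variances, [1.0, 1.0])

# Second band treated as noise-dominant: the decision rests on the first band.
masked = weighted_state_posteriors(obs, priors, means, variances, [1.0, 0.0])
```

In this toy case, down-weighting the second band changes which state receives the larger posterior, which is the effect the abstract attributes to trusting speech-dominant bands more.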
