Sparse representation features for speech recognition

Tara N. Sainath; Bhuvana Ramabhadran; David Nahamoo; Dimitri Kanevsky; Abhinav Sethy

INTERSPEECH 2010

Conference paper

26 Sep 2010


Abstract

In this paper, we explore the use of exemplar-based sparse representations (SRs) to map test features into the linear span of training examples. We show that the frame classification accuracy with these new features is 1.3% higher than that of a Gaussian Mixture Model (GMM), indicating that SRs not only move test features closer to the training data but also move them closer to the correct class. We then train a Hidden Markov Model (HMM) on these SR features and perform recognition. On the TIMIT corpus, applying the SR features on top of our best discriminatively trained system yields a 0.7% absolute reduction in phonetic error rate (PER), from 19.9% to 19.2%. After applying model adaptation, the PER drops further to 19.0%, the best result on TIMIT to date. Furthermore, on a large-vocabulary 50-hour broadcast news task, we achieve a 0.3% absolute reduction in word error rate (WER), demonstrating the benefit of this method for large-vocabulary speech recognition. © 2010 ISCA.
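The mapping described in the abstract can be illustrated with a minimal sketch. It assumes the training exemplars are stacked as columns of a matrix H, solves for a sparse coefficient vector beta such that y ≈ H·beta, and uses H·beta as the new feature. The solver here is a plain ISTA (iterative soft-thresholding) lasso, chosen only for brevity; it is not claimed to be the solver used in the paper, and the names `sparse_code`, `sr_feature`, `lam`, and the synthetic data are all hypothetical.

```python
import numpy as np

def sparse_code(H, y, lam=0.1, n_iter=500):
    """ISTA for min_b 0.5*||y - H b||^2 + lam*||b||_1 (illustrative solver)."""
    # Step size 1/L, where L is the largest eigenvalue of H^T H.
    step = 1.0 / (np.linalg.norm(H, 2) ** 2)
    beta = np.zeros(H.shape[1])
    for _ in range(n_iter):
        grad = H.T @ (H @ beta - y)          # gradient of the quadratic term
        z = beta - step * grad
        # Soft-thresholding promotes sparsity in beta.
        beta = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return beta

def sr_feature(H, y, lam=0.1):
    """Map a test frame y into the linear span of the exemplars in H."""
    beta = sparse_code(H, y, lam)
    return H @ beta, beta

# Synthetic data (assumption): 40-dim frames, 200 training exemplars.
rng = np.random.default_rng(0)
H = rng.standard_normal((40, 200))
true_beta = np.zeros(200)
true_beta[[3, 17, 42]] = [1.0, -0.5, 0.8]
y = H @ true_beta + 0.01 * rng.standard_normal(40)

feat, beta = sr_feature(H, y)
print("nonzero coefficients:", int(np.count_nonzero(beta)))
print("residual norm:", float(np.linalg.norm(y - feat)))
```

In a recognizer, `feat` would stand in for the original frame `y` when training and decoding with the HMM. How the exemplars in H are selected for each frame is left open in this sketch.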
