Continuous digits recognition leveraging invariant structure

Masayuki Suzuki; Gakuto Kurata; Masafumi Nishimura; Nobuaki Minematsu

INTERSPEECH 2011

Conference paper

01 Dec 2011

Abstract

Recently, an invariant structure of speech was proposed, in which the inevitable acoustic variations caused by non-linguistic factors are effectively removed from speech. The invariant structure was applied to isolated word recognition, and the experimental results showed good performance. However, the previous method cannot be applied directly to continuous speech recognition because no efficient decoding algorithm was available. In this paper, we propose a method to leverage the invariant structure in continuous digits recognition. We use a traditional HMM-based Automatic Speech Recognition (ASR) system to obtain N-best lists with phone alignments. We then construct invariant structures from these phone alignments and re-rank the N-best lists by evaluating which hypothesis is structurally more valid. Experimental results show a relative word error rate (WER) improvement of 17.4% over the baseline HMM-based ASR system. Copyright © 2011 ISCA.
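
The re-ranking step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Hypothesis` fields, the `rerank` function, the linear score combination, the `weight` parameter, and all numeric values are assumptions made purely for illustration. In particular, the abstract does not say how structural validity is scored or how it is combined with the recognizer score, so the sketch treats the structural score as a precomputed number.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    digits: str             # hypothesized digit string
    asr_score: float        # score from the HMM-based recognizer (higher is better)
    structure_score: float  # hypothetical structural validity score (higher is better)


def rerank(nbest, weight=0.5):
    """Sort an N-best list by a weighted sum of the two scores.

    The linear interpolation is an assumption for illustration; the
    abstract does not specify how the scores are combined.
    """
    def combined(h):
        return (1.0 - weight) * h.asr_score + weight * h.structure_score

    return sorted(nbest, key=combined, reverse=True)


# Made-up scores for a three-entry N-best list (not data from the paper).
nbest = [
    Hypothesis("1 9 4", asr_score=-120.0, structure_score=-8.0),
    Hypothesis("1 5 4", asr_score=-121.0, structure_score=-3.0),
    Hypothesis("1 9 8", asr_score=-125.0, structure_score=-6.0),
]

best = rerank(nbest)[0]
print(best.digits)
```

With `weight=0.0` the ordering reduces to the recognizer's own ranking, so the first-pass output is recovered as a special case; a larger weight lets the structural score promote a hypothesis that the recognizer ranked lower.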
