Publication
ChinaSIP 2014
Conference paper

Leveraging phonetic context dependent invariant structure for continuous speech recognition

Abstract

Speech acoustics vary intrinsically due to linguistic and non-linguistic factors. The invariant structure extracted from an utterance is one kind of long-span acoustic representation, in which the acoustic variation caused by non-linguistic factors can be removed to a reasonable degree. It expresses spectral contrasts between acoustic events in the utterance. In previous studies, the invariant structure was leveraged in continuous speech recognition to rerank the N-best candidates hypothesized by a traditional automatic speech recognition (ASR) system. Reranking with invariant structure features proved effective; however, the features were defined, or labeled, in a phonetic-context-independent way. This paper examines the use of phonetic context to define invariant structure features. The proposed method is tested on two tasks: continuous digit speech recognition and large vocabulary continuous speech recognition (LVCSR). Performance improves by 4.7% and 1.2% relative, respectively.

Date

03 Sep 2014
