Main vowel domain tone modeling with lexical and prosodic analysis for Mandarin ASR

Shilei Zhang; Qin Shi; Stephen M. Chu; Yong Qin

doi:10.1109/ICASSP.2009.4960645

ICASSP 2009

Conference paper

23 Sep 2009

Abstract

Tone is a distinctive discriminative feature in Mandarin Chinese. Most large-scale Mandarin speech recognition systems treat tone modeling in a way that is functional but seldom thorough. In particular, many lack the sophistication needed to deal with the myriad variations arising from the combination of acoustic and lexical contexts. This paper reports an attempt to account for these variabilities and to bring richer tone modeling into the IBM Mandarin broadcast transcription system. Specifically, we describe a system that combines the embedded approach with a novel explicit tone modeling technique characterized by (a) robust tone tracking in the main-vowel domain and (b) context-dependent models with lexical and prosodic contexts. The proposed method is validated on a connected-digits set and subsequently evaluated on a large-vocabulary broadcast transcription task, achieving relative reductions in character error rate of 14.8% and 5.4%, respectively. ©2009 IEEE.
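As a reading aid for the figures above, the following minimal sketch shows how a relative reduction in character error rate (CER) relates to absolute error rates. The abstract reports only the relative reductions; the function names and the baseline value used in the usage note are hypothetical and are not taken from the paper.

```python
def relative_reduction(baseline_cer: float, new_cer: float) -> float:
    """Return the relative CER reduction as a fraction of the baseline CER."""
    if baseline_cer <= 0:
        raise ValueError("baseline_cer must be positive")
    return (baseline_cer - new_cer) / baseline_cer


def cer_after_reduction(baseline_cer: float, relative: float) -> float:
    """Return the CER that results from applying a relative reduction to a baseline."""
    return baseline_cer * (1.0 - relative)


if __name__ == "__main__":
    # Hypothetical baseline CER of 10.0% (NOT a figure from the paper),
    # combined with the 14.8% relative reduction quoted in the abstract.
    hypothetical_baseline = 10.0
    improved = cer_after_reduction(hypothetical_baseline, 0.148)
    print(f"CER after reduction: {improved:.2f}%")
    print(f"Relative reduction: {relative_reduction(hypothetical_baseline, improved):.3f}")
```

Under that assumed baseline, a 14.8% relative reduction corresponds to an absolute drop of 1.48 percentage points. The same relative figure implies a smaller absolute drop when the baseline error rate is lower.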
