Publication
ASRU 2003
Conference paper

Improvements in English ASR for the MALACH project using syllable-centric models

Abstract

LVCSR systems have traditionally used phones as the basic acoustic unit for recognition. Syllables and other longer-length units provide an efficient means of modeling long-term temporal dependencies in speech that are difficult to capture in a phone-based recognition framework. However, it is well known that longer-duration units suffer from training data sparsity, since a large number of units in the lexicon will have little or no acoustic training data. Previous research has shown that syllable-based modeling provides improvements over word-internal systems, but performance has lagged behind cross-word context-dependent systems. In this paper, we describe a syllable-centric approach to English LVCSR for the MALACH (Multilingual Access to Large spoken ArCHives) project. The combined modeling of syllables and context-dependent phones provides a 0.5% absolute improvement in recognition accuracy over the state-of-the-art cross-word system for the heavily accented and spontaneous speech seen in oral history archives. More importantly, we report on the improved recognition of names and concepts, which is crucial for subsequent search and retrieval.
