Publication
INTERSPEECH 2014
Conference paper
Exploiting vocal-source features to improve ASR accuracy for low-resource languages
Abstract
A traditional framework in speech production describes the output speech as an interaction between a source excitation and a vocal tract configured by the speaker to impart segmental characteristics. This simplification has generally led to approaches in which systems focused on phonetic-segment tasks (e.g., speech recognition) use a front-end that extracts features designed to distinguish between different vocal-tract configurations, while the excitation signal has received more attention for speaker-characterization tasks. In this work we augment the front-end of a recognition system with vocal-source features, motivated by our work with low-resource languages whose phonology and phonetics suggest the need for approaches complementary to classical ASR features. We demonstrate that adding such features yields improvements over a state-of-the-art system for low-resource languages from the BABEL Program.
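The paper itself contains no code. As a minimal sketch of the general idea of augmenting a spectral front-end with vocal-source features, the Python example below concatenates per-frame MFCCs with a pitch (F0) track and a voicing flag. The use of librosa, the 8 kHz sampling rate, the 25 ms/10 ms framing, and the particular source features are illustrative assumptions, not the authors' pipeline or the BABEL front-end.

```python
# Illustrative sketch (not the authors' pipeline): augment vocal-tract-oriented
# spectral features (MFCCs) with simple vocal-source features (F0 and a voicing
# flag) by concatenating them frame by frame.
import numpy as np
import librosa


def augmented_features(wav_path, sr=8000, hop_length=80):
    """Return a (num_frames, 13 + 2) matrix of MFCC + source features."""
    y, sr = librosa.load(wav_path, sr=sr)

    # Vocal-tract-oriented features: 13 MFCCs, 25 ms window, 10 ms hop
    # (assumed settings; telephone-bandwidth speech at 8 kHz).
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13,
                                n_fft=200, hop_length=hop_length)

    # Vocal-source features: F0 and a voicing decision from pYIN, using a
    # longer analysis window so low pitch periods fit, but the same hop so
    # the frame rate matches the MFCC stream.
    f0, voiced_flag, _ = librosa.pyin(y, fmin=60, fmax=400, sr=sr,
                                      frame_length=1024,
                                      hop_length=hop_length)
    f0 = np.nan_to_num(f0)                     # unvoiced frames -> 0 Hz
    voiced = voiced_flag.astype(np.float32)    # 1.0 = voiced, 0.0 = unvoiced

    # The two analyses share the hop, but frame counts can differ by a frame
    # or two; truncate to the shorter stream and stack per-frame vectors.
    n = min(mfcc.shape[1], f0.shape[0])
    return np.hstack([mfcc[:, :n].T, f0[:n, None], voiced[:n, None]])
```

In a setting like the one the abstract describes, the extra dimensions would simply be appended to the standard observation vector fed to the acoustic model; the specific vocal-source features and recognizer configuration used in the paper are described in the paper itself.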