Publication
INTERSPEECH 2011
Conference paper
"What is.. Dengue fever?" Modeling and predicting pronunciation errors in a text-to-speech system
Abstract
We propose a system that predicts baseform-generation errors in a text-to-speech (TTS) front-end and aids in customizing the synthesis engine to a novel application with a large, open-ended vocabulary. We motivate the system with data collected during the deployment of the IBM TTS engine in the Watson Deep Question-Answering system, which was customized to play the game Jeopardy! We propose a set of features derived from a lexeme's orthography and candidate baseform, and use a variety of learning schemes and data sampling algorithms to address the skewed class priors in the training data. We show that 1) these different approaches provide complementary information that fusion schemes can exploit to improve on the baseline performance, and 2) these techniques can retrieve a list of likely incorrect lexemes, reducing the number of tokens that must be vetted before an error is found and fixed. Copyright © 2011 ISCA.
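The pipeline the abstract describes (rebalancing skewed classes, fusing several classifiers, and ranking lexemes for vetting) can be illustrated with a minimal sketch. This is not the paper's implementation. Random undersampling and score averaging are stand-ins for the unspecified sampling algorithms and fusion schemes, and every function name, lexeme, score, and label below is hypothetical.

```python
import random
from typing import Dict, List, Sequence, Set, Tuple


def undersample(examples: Sequence[Tuple[str, int]], seed: int = 0) -> List[Tuple[str, int]]:
    """Randomly drop majority-class examples until both classes are the same size.

    Each example is (lexeme, label), with label 1 = incorrect baseform.
    Assumption: random undersampling stands in for the paper's sampling algorithms.
    """
    rng = random.Random(seed)
    pos = [e for e in examples if e[1] == 1]
    neg = [e for e in examples if e[1] == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    return minority + rng.sample(majority, len(minority))


def fuse(score_sets: Sequence[Dict[str, float]]) -> Dict[str, float]:
    """Average per-lexeme error scores from several classifiers.

    Assumption: plain averaging stands in for the paper's fusion schemes.
    """
    lexemes = score_sets[0].keys()
    return {lex: sum(s[lex] for s in score_sets) / len(score_sets) for lex in lexemes}


def rank_for_vetting(scores: Dict[str, float]) -> List[str]:
    """Order lexemes from most to least likely to have an incorrect baseform."""
    return sorted(scores, key=lambda lex: (-scores[lex], lex))


def tokens_vetted_until_first_error(ranked: Sequence[str], errors: Set[str]) -> int:
    """Count how many lexemes a reviewer inspects before reaching the first true error."""
    for position, lex in enumerate(ranked, start=1):
        if lex in errors:
            return position
    return len(ranked)


# Hypothetical error scores from two classifiers (higher = more likely wrong).
scores_a = {"dengue": 0.6, "table": 0.7, "quiz": 0.2, "house": 0.1}
scores_b = {"dengue": 0.9, "table": 0.1, "quiz": 0.3, "house": 0.2}
true_errors = {"dengue"}  # hypothetical ground truth

ranked_single = rank_for_vetting(scores_a)
ranked_fused = rank_for_vetting(fuse([scores_a, scores_b]))

print(tokens_vetted_until_first_error(ranked_single, true_errors))
print(tokens_vetted_until_first_error(ranked_fused, true_errors))
```

In this toy data the second classifier corrects the first one's overconfidence on "table", so the fused ranking surfaces the mispronounced lexeme earlier. It shows only the kind of complementarity the abstract refers to and says nothing about the reported results.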