About cookies on this site Our websites require some cookies to function properly (required). In addition, other cookies may be used with your consent to analyze site usage, improve the user experience and for advertising. For more information, please review your options. By visiting our website, you agree to our processing of information as described in IBM’sprivacy statement. To provide a smooth navigation, your cookie preferences will be shared across the IBM web domains listed here.
Publication
ICSLP 1998
Conference paper
Telephone Band LVCSR for Hearing-Impaired Users
Abstract
Large vocabulary automatic speech recognition might assist hearing impaired telephone users by displaying a transcription of the incoming side of the conversation, but the system would have to achieve sufficient accuracy on conversational-style, telephone-bandwidth speech. We describe our development work toward such a system. This work comprised three phases: Experiments with clean data filtered to 200-3500Hz, experiments with real telephone data, and language model development. In the first phase, the speaker independent error rate was reduced from 25% to 12% by using MLLT, increasing the number of cepstral components from 9 to 13, and increasing the number of Gaussians from 30,000 to 120,000. The resulting system, however, performed less well on actual telephony, producing an error rate of 28.4%. By additional adaptation and the use of an LDA and CDCN combination, the error rate was reduced to 19.1%. Speaker adaptation reduces the error rate to 10.96%. These results were obtained with read speech. To explore the language-model requirements in a more realistic situation, we collected some conversational speech with an arrangement in which one participant could not hear the conversation but only saw recognizer output on a screen. We found that a mixture of language models, one derived from the Switchboard corpus and the other from prepared texts, resulted in approximately 10% fewer errors than either model alone.