Publication
INTERSPEECH - Eurospeech 2005
Conference paper
Using random forest language models in the IBM RT-04 CTS system
Abstract
One of the challenges in large vocabulary speech recognition is the availability of large amounts of data for training language models. In most state-of-the-art speech recognition systems, n-gram models with Kneser-Ney smoothing still prevail due to their simplicity and effectiveness. In this paper, we study the performance of a new language model, the random forest language model, in the IBM conversational telephone speech recognition system. We show that although random forest language models are designed to deal with the data sparseness problem, they also achieve statistically significant improvements over n-gram models when the training data contains over 500 million words.
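To make the contrast with Kneser-Ney n-grams concrete, the sketch below illustrates the general idea behind a random forest language model: a collection of randomized decision-tree language models, each of which partitions word histories into equivalence classes, whose probability estimates are averaged. This is a minimal, hypothetical illustration, not the paper's implementation; the history-clustering rule (here, a random hash of the last word) and the smoothing constant are simplifying assumptions made purely for exposition.

```python
import random
from collections import defaultdict

class RandomTreeLM:
    """A toy decision-tree LM: histories are mapped to random equivalence
    classes, and word probabilities are estimated per class with add-one
    smoothing. Stands in for a single randomized tree in the forest."""

    def __init__(self, num_classes, vocab, seed):
        self.rng = random.Random(seed)
        self.num_classes = num_classes
        self.vocab = vocab
        # Random assignment of each possible history word to a class.
        self.class_of = {w: self.rng.randrange(num_classes) for w in vocab}
        self.counts = defaultdict(lambda: defaultdict(int))
        self.totals = defaultdict(int)

    def train(self, sentences):
        for sent in sentences:
            for prev, word in zip(sent, sent[1:]):
                c = self.class_of[prev]
                self.counts[c][word] += 1
                self.totals[c] += 1

    def prob(self, word, prev):
        c = self.class_of[prev]
        # Add-one smoothing within the equivalence class.
        return (self.counts[c][word] + 1) / (self.totals[c] + len(self.vocab))

class RandomForestLM:
    """Averages the predictions of several randomized tree LMs; the
    averaging smooths the sparse per-tree estimates."""

    def __init__(self, num_trees, num_classes, vocab):
        self.trees = [RandomTreeLM(num_classes, vocab, seed=i)
                      for i in range(num_trees)]

    def train(self, sentences):
        for tree in self.trees:
            tree.train(sentences)

    def prob(self, word, prev):
        return sum(t.prob(word, prev) for t in self.trees) / len(self.trees)

# Tiny usage example on made-up data.
corpus = [["the", "cat", "sat"], ["the", "dog", "sat"], ["a", "cat", "ran"]]
vocab = {w for sent in corpus for w in sent}
forest = RandomForestLM(num_trees=10, num_classes=2, vocab=vocab)
forest.train(corpus)
print(forest.prob("sat", "cat"))
```

In the paper's setting each tree is grown on bigram-style histories with data-driven (rather than purely random) splits, but the averaging step shown here is what lets the forest generalize better than a single sparse model.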