Publication
IEEE Transactions on Acoustics, Speech, and Signal Processing
Paper
A Tree-Based Statistical Language Model for Natural Language Speech Recognition
Abstract
This paper is concerned with the problem of “predicting” the next word a speaker will say, given the words already spoken; specifically, the problem is to estimate the probability that a given word will be the next word uttered. Algorithms are presented for automatically constructing a binary decision tree designed to estimate these probabilities. At each node of the tree there is a yes/no question relating to the words already spoken, and at each leaf there is a probability distribution over the allowable vocabulary. Ideally, these nodal questions can take the form of arbitrarily complex Boolean expressions, but computationally cheaper alternatives are also discussed. The paper includes some results obtained on a 5000-word vocabulary with a tree designed to predict the next word spoken from the preceding 20 words. The tree is compared to an equivalent trigram model and shown to be superior. © 1989 IEEE
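The lookup side of such a model can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not describe the tree-construction algorithms, so the tree below is built by hand, and the names `Leaf`, `Node`, `next_word_prob`, the two questions, and all probabilities are hypothetical choices made only for illustration. It shows a history being routed through yes/no questions to a leaf, whose distribution gives the probability of the candidate next word.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence, Union

History = Sequence[str]


@dataclass
class Leaf:
    # Probability distribution over the vocabulary for histories reaching this leaf.
    dist: Dict[str, float]


@dataclass
class Node:
    # A yes/no question about the words already spoken, plus the two subtrees.
    question: Callable[[History], bool]
    yes: Union["Node", Leaf]
    no: Union["Node", Leaf]


def next_word_prob(tree: Union[Node, Leaf], history: History, word: str) -> float:
    """Walk from the root to a leaf by answering each question, then read off P(word)."""
    node = tree
    while isinstance(node, Node):
        node = node.yes if node.question(history) else node.no
    return node.dist.get(word, 0.0)


# Hypothetical questions (assumptions, not taken from the paper).
def last_word_is_the(history: History) -> bool:
    return len(history) >= 1 and history[-1] == "the"


def stock_in_last_20(history: History) -> bool:
    # Looks at up to the 20 preceding words, mirroring the window size in the abstract.
    return "stock" in history[-20:]


# Hand-built toy tree with made-up probabilities over a four-word vocabulary.
toy_tree = Node(
    question=last_word_is_the,
    yes=Node(
        question=stock_in_last_20,
        yes=Leaf({"market": 0.6, "price": 0.3, "dog": 0.1, "the": 0.0}),
        no=Leaf({"market": 0.1, "price": 0.3, "dog": 0.6, "the": 0.0}),
    ),
    no=Leaf({"market": 0.2, "price": 0.1, "dog": 0.2, "the": 0.5}),
)

if __name__ == "__main__":
    print(next_word_prob(toy_tree, ["stock", "fell", "in", "the"], "market"))
    print(next_word_prob(toy_tree, ["walk", "the"], "dog"))
```

Unlike a trigram model, which conditions only on the two preceding words, a question here may inspect any part of the history, as `stock_in_last_20` does. In the paper the questions and leaf distributions are constructed automatically from data rather than written by hand.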