INTERSPEECH - Eurospeech 2003
Conference paper
Hierarchical class n-gram language models: Towards better estimation of unseen events in speech recognition
Abstract
In this paper, we show how a multi-level class hierarchy can be used to better estimate the likelihood of an unseen event. In classical backoff n-gram models, the (n-1)-gram model is used to estimate the probability of an unseen n-gram. In the approach we propose, we use a class hierarchy to define an appropriate context that is more general than the unseen n-gram but more specific than the (n-1)-gram. Each node in the hierarchy is a class containing all the words of its descendant nodes (classes). Hence, the closer a node is to the root, the more general the corresponding class. We also investigate the impact of the hierarchy depth and of the Good-Turing discount coefficient on the performance of the model. We evaluate the backoff hierarchical n-gram models on the WSJ database with two large vocabularies, 5,000 and 20,000 words. Experiments show up to a 26% improvement in perplexity on unseen events and up to a 12% improvement in WER when a backoff hierarchical class trigram language model is used on an ASR test set with a relatively large number of unseen events.
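The backoff scheme described in the abstract can be sketched in a few lines of code. The following is a minimal, illustrative sketch, not the authors' implementation: it assumes a hypothetical two-level word-to-class hierarchy, uses a bigram model in place of the paper's trigram model for brevity, and substitutes a fixed per-level backoff penalty for the Good-Turing discounting the paper actually investigates. The class names, counts, and the `HierarchicalBackoffBigram` class are invented for illustration.

```python
from collections import defaultdict

# Hypothetical two-level hierarchy: word -> class -> superclass -> root.
ANCESTORS = {
    "monday":  ["DAY", "TIME", "<root>"],
    "tuesday": ["DAY", "TIME", "<root>"],
    "june":    ["MONTH", "TIME", "<root>"],
    "on":      ["PREP", "FUNC", "<root>"],
    "in":      ["PREP", "FUNC", "<root>"],
}

def generalizations(word):
    """Yield the word itself, then ever more general ancestor classes."""
    yield word
    yield from ANCESTORS.get(word, ["<root>"])

class HierarchicalBackoffBigram:
    def __init__(self, alpha=0.4):
        self.alpha = alpha            # per-level backoff penalty (a stand-in
                                      # for proper discounting; assumption)
        self.pair = defaultdict(int)  # counts of (context, word)
        self.ctx = defaultdict(int)   # counts of the context alone

    def train(self, bigrams):
        for w1, w2 in bigrams:
            # Credit every generalization of the context, so class-level
            # counts are available when the word-level context is unseen.
            for c in generalizations(w1):
                self.pair[(c, w2)] += 1
                self.ctx[c] += 1

    def prob(self, w1, w2):
        penalty = 1.0
        # Try the most specific context first, then climb the hierarchy.
        for c in generalizations(w1):
            if self.pair[(c, w2)] > 0:
                return penalty * self.pair[(c, w2)] / self.ctx[c]
            penalty *= self.alpha     # pay a penalty for each backoff step
        return penalty * 1e-6         # floor for fully unseen events

lm = HierarchicalBackoffBigram()
lm.train([("on", "monday"), ("on", "tuesday"), ("in", "june")])

# "in monday" was never observed, but backing off to the class context
# (PREP, monday) recovers evidence from "on monday" before falling all
# the way back to the context-free (unigram-like) estimate.
print(lm.prob("in", "monday"))
```

The key design point the sketch illustrates is that the intermediate class contexts sit between the unseen word-level n-gram and the (n-1)-gram fallback, so less evidence is thrown away at each backoff step. A faithful implementation would additionally normalize the backoff weights so the distribution sums to one, as Katz-style backoff with Good-Turing discounting does in the paper.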