Publication
INTERSPEECH 2010
Conference paper
Impact of word classing on shrinkage-based language models
Abstract
This paper investigates the impact of word classing on a recently proposed shrinkage-based language model, Model M [5]. Model M, a class-based n-gram model, has been shown to significantly outperform word-based n-gram models on a variety of domains. In past work, word classes for Model M were induced automatically from unlabeled text using the algorithm of [2]. We take a closer look at the classing and attempt to find out whether improved classing would also translate to improved performance. In particular, we explore the use of manually-assigned classes, part-of-speech (POS) tags, and dialog state information, considering both hard classing and soft classing. In experiments with a conversational dialog system (human-machine dialog) and a speech-to-speech translation system (human-human dialog), we find that better classing can improve Model M performance by up to 3% absolute in word-error rate. © 2010 ISCA.
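To make the idea of class-based modeling concrete, here is a minimal sketch of the classic class-based bigram factorization p(w_i | w_{i-1}) ≈ p(c_i | c_{i-1}) · p(w_i | c_i), where each word w belongs to a class c. This is a toy illustration of the general technique, not Model M itself (whose shrinkage-based parameterization is more involved); the vocabulary, classes, and corpus below are invented for the example.

```python
from collections import defaultdict

# Hypothetical hard word-to-class assignment (invented for illustration).
word2class = {
    "monday": "DAY", "tuesday": "DAY",
    "flight": "NOUN", "meeting": "NOUN",
    "on": "FUNC", "the": "FUNC",
}

# Tiny made-up training corpus.
corpus = "the flight on monday the meeting on tuesday".split()

# Count class bigrams, class unigrams, and word-given-class emissions.
class_bigram = defaultdict(int)
class_unigram = defaultdict(int)
emission = defaultdict(int)
for prev, cur in zip(corpus, corpus[1:]):
    class_bigram[(word2class[prev], word2class[cur])] += 1
for w in corpus:
    c = word2class[w]
    class_unigram[c] += 1
    emission[(c, w)] += 1

def prob(prev_word, word):
    """p(word | prev_word) under the class-based factorization
    p(c | c_prev) * p(word | c), with unsmoothed relative frequencies."""
    c_prev, c = word2class[prev_word], word2class[word]
    p_class = class_bigram[(c_prev, c)] / class_unigram[c_prev]
    p_word = emission[(c, word)] / class_unigram[c]
    return p_class * p_word

# Because "monday" and "tuesday" share the class DAY, the model pools
# their statistics and generalizes across them even for rare bigrams.
print(prob("on", "monday"))   # relies on the shared class DAY
print(prob("on", "tuesday"))
```

The point of the paper is that the quality of the `word2class` mapping matters: replacing an automatically induced mapping with manually-assigned classes, POS tags, or dialog-state-informed classes (or relaxing it to soft, probabilistic class membership) changes how statistics are pooled, and hence recognition accuracy.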