Publication
ICASSP 2007
Conference paper
Data driven approach for language model adaptation using stepwise relative entropy minimization
Abstract
The ability to build domain- and task-specific language models from large generic text corpora is of considerable interest to the language modeling community. One of the key challenges is identifying the relevant text material in the collection. The text selection problem can be cast in a semi-supervised learning framework. Motivated by recent advances in semi-supervised learning that emphasize the need for balanced label assignments, we present a stepwise relative entropy minimization scheme that selects a set of sentences jointly rather than scoring sentences solely on their individual merit. Our results on the IBM European Parliament Plenary Speech (EPPS) transcription system show a significant performance improvement (0.5% absolute on an 8.9% WER baseline) using just a seventh of the out-of-domain data. The IBM EPPS LVCSR system, which has a 60K vocabulary, is a particularly hard baseline for out-of-domain adaptation because of its low WER with in-domain training data. © 2007 IEEE.
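The core idea of stepwise selection can be illustrated with a minimal sketch. The snippet below greedily adds out-of-domain sentences that most reduce the KL divergence (relative entropy) between a smoothed in-domain unigram distribution and the unigram distribution of the selected set, so each sentence is judged by how it helps the set as a whole rather than in isolation. This is an illustrative simplification, not the paper's exact algorithm; the function names, the unigram model, and the add-alpha smoothing are all assumptions made for the sketch.

```python
import math
from collections import Counter

def unigram_dist(tokens, vocab, alpha=0.1):
    """Add-alpha smoothed unigram distribution over a fixed vocabulary."""
    counts = Counter(tokens)
    total = sum(counts.values()) + alpha * len(vocab)
    return {w: (counts.get(w, 0) + alpha) / total for w in vocab}

def kl(p, q):
    """Relative entropy D(p || q); q is strictly positive due to smoothing."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p if p[w] > 0)

def stepwise_select(in_domain, pool, n_select):
    """Greedily pick pool sentences that minimize the KL divergence
    between the in-domain distribution and the selected set's
    distribution (a simplified stand-in for the paper's criterion)."""
    vocab = set(in_domain) | {w for s in pool for w in s}
    p = unigram_dist(in_domain, vocab)
    selected, sel_tokens = [], []
    remaining = list(pool)
    for _ in range(min(n_select, len(remaining))):
        best_i, best_kl = None, float("inf")
        for i, sent in enumerate(remaining):
            q = unigram_dist(sel_tokens + list(sent), vocab)
            d = kl(p, q)
            if d < best_kl:
                best_i, best_kl = i, d
        sent = remaining.pop(best_i)
        selected.append(sent)
        sel_tokens.extend(sent)
    return selected

# Toy example: in-domain text resembles parliamentary speech.
in_domain = "the parliament debates the motion".split()
pool = [tuple("the parliament adopts the motion".split()),
        tuple("stock prices fell sharply today".split()),
        tuple("the debates continue in parliament".split())]
chosen = stepwise_select(in_domain, pool, 2)
```

On this toy pool, the two parliament-like sentences are selected and the unrelated one is left out, mirroring how the criterion filters a generic corpus down to in-domain material.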