Static interpolation of exponential n-gram models using features of features
Abstract
The best language model performance for a task is often achieved by interpolating language models built separately on corpora from multiple sources. While common practice is to use a single set of fixed interpolation weights to combine models, past work has found that gains can be achieved by allowing the weights to vary by n-gram when linearly interpolating word n-gram models. In this work, we investigate whether similar ideas can be used to improve log-linear interpolation for Model M, an exponential class-based n-gram model with state-of-the-art performance. We focus on log-linear interpolation because Model M models combined via (regular) linear interpolation cannot be statically compiled into a single model, as is required for many applications due to resource constraints. We present a general parameter interpolation framework in which a weight prediction model is used to compute the interpolation weights for each n-gram. The weight prediction model takes a rich representation of n-gram features as input, and is trained to optimize the perplexity of a held-out set. In experiments on Broadcast News, we show that a mixture of experts weight prediction model yields significant perplexity and word-error rate improvements as compared to static linear interpolation.
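To make the framework concrete, the following is a minimal Python sketch of log-linear interpolation with per-history weights produced by a mixture-of-experts weight predictor. It is an illustrative assumption, not the paper's implementation: the function names (`moe_weights`, `log_linear_prob`), the linear gating over features, and the fixed per-expert weight vectors are all hypothetical, and the training loop that optimizes held-out perplexity is omitted.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def moe_weights(features, gate_params, expert_weights):
    """Mixture-of-experts weight prediction (illustrative sketch).

    features:       feature vector describing an n-gram/history.
    gate_params:    one linear gate vector per expert.
    expert_weights: one interpolation-weight vector (one weight per
                    component model) per expert.
    Returns a gated average of the expert weight vectors, standing in
    for the paper's trained weight prediction model.
    """
    gate_scores = [sum(g * f for g, f in zip(gate, features))
                   for gate in gate_params]
    gates = softmax(gate_scores)
    n_models = len(expert_weights[0])
    return [sum(g * ew[m] for g, ew in zip(gates, expert_weights))
            for m in range(n_models)]

def log_linear_prob(word, history, models, weights, vocab):
    """Log-linear interpolation: p(w|h) is proportional to
    prod_m p_m(w|h)^{lambda_m}, renormalized over the vocabulary."""
    def score(w):
        # Floor probabilities to avoid log(0) for unseen n-grams.
        return math.exp(sum(
            lam * math.log(max(m.get((history, w), 1e-12), 1e-12))
            for lam, m in zip(weights, models)))
    z = sum(score(w) for w in vocab)
    return score(word) / z

if __name__ == "__main__":
    # Toy example: two component models over a two-word vocabulary.
    vocab = ["a", "b"]
    m1 = {("h", "a"): 0.7, ("h", "b"): 0.3}
    m2 = {("h", "a"): 0.4, ("h", "b"): 0.6}
    lam = moe_weights(features=[1.0],
                      gate_params=[[0.5], [-0.5]],
                      expert_weights=[[0.8, 0.2], [0.3, 0.7]])
    print(log_linear_prob("a", "h", [m1, m2], lam, vocab))
```

Note the renormalization over the vocabulary in `log_linear_prob`: unlike linear interpolation, a weighted product of component probabilities is not itself normalized, which is also why log-linearly combined models can be statically compiled into a single model while linearly combined Model Ms cannot.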