Word class information has long been known to be useful in language modeling (LM). However, the improvement of class-based LMs over word n-gram models generally comes at the cost of greater decoding complexity and larger model size. In this paper, we propose a modified version of the Maximum Entropy token-based language model of  that matches the performance of the best existing class-based models while decoding as fast as a word n-gram model. In addition, although word n-gram models built on different corpora are easily combined statically into a single word n-gram model for fast decoding, it is not known how to combine class-based LMs statically in an effective way. As a further contribution, we propose a novel combination method that retains the gain of class-based LMs over word n-gram models. Experimental results on several spoken language translation tasks show that our model performs significantly better than word n-gram models, with comparable decoding speed and only a modest increase in model size. © 2011 IEEE.
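The abstract calls static combination of word n-gram models easy. The reason is that linear interpolation, P(w|h) = λ·P1(w|h) + (1 − λ)·P2(w|h), can be precomputed over the union of the two models' entries. The result is again a single word n-gram table, so decoding needs only one lookup. The sketch below is a toy illustration and not the paper's combination method. It assumes each model is a plain dictionary mapping (history, word) to a probability over a shared support, and it ignores backoff weights. Real backoff models, as built by toolkits such as SRILM, also require renormalizing the backoff weights after merging. The name `interpolate_static` is hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical toy representation: (history, word) -> P(word | history).
# Real n-gram toolkits also store backoff weights; this sketch omits them.
NGramTable = Dict[Tuple[Tuple[str, ...], str], float]


def interpolate_static(p1: NGramTable, p2: NGramTable, lam: float) -> NGramTable:
    """Merge two models into one table with
    P(w|h) = lam * P1(w|h) + (1 - lam) * P2(w|h)."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    merged: NGramTable = {}
    for key in set(p1) | set(p2):
        # In this toy version a missing entry counts as probability 0;
        # a backoff model would instead back off to a lower order.
        merged[key] = lam * p1.get(key, 0.0) + (1.0 - lam) * p2.get(key, 0.0)
    return merged


if __name__ == "__main__":
    p1 = {(("the",), "cat"): 0.6, (("the",), "dog"): 0.4}
    p2 = {(("the",), "cat"): 0.2, (("the",), "dog"): 0.8}
    print(interpolate_static(p1, p2, 0.5))
```

With λ = 0.5, P(cat | the) becomes approximately 0.4 and P(dog | the) becomes approximately 0.6. Both are stored directly in the merged table. A class-based LM has no such direct fold: its probability factors through class histories as well as word histories, so it does not reduce to a single word-level table.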