Enhanced word classing for Model M

Stanley F. Chen; Stephen M. Chu

INTERSPEECH 2010

Conference paper

26 Sep 2010

Abstract

Model M is a superior class-based n-gram model that has shown improvements on a variety of tasks and domains. In previous work with Model M, bigram mutual information clustering has been used to derive word classes. In this paper, we introduce a new word classing method designed to closely match Model M. The proposed classing technique reduces speech recognition word-error rate by up to 1.1% absolute relative to the baseline clustering, for a total reduction of up to 3.0% absolute relative to a Katz-smoothed trigram model, the largest such gain ever reported for a class-based language model. © 2010 ISCA.