Language model adaptation using word clustering

Shinsuke Mori; Masafumi Nishimura; Nobuyasu Itoh

INTERSPEECH - Eurospeech 2003

Conference paper

01 Sep 2003


Abstract

Building a stochastic language model (LM) for speech recognition requires a large corpus from the target task. For some tasks, no sufficiently large corpus is available, which is an obstacle to achieving high recognition accuracy. In this paper, we propose a method that uses large corpora from different tasks to build an LM with higher predictive power than an LM estimated from a small corpus of the specific target task. In our experiment, we used transcriptions of air university lectures and articles from the Nikkei newspaper, and compared an existing interpolation-based method with our new method. The results show that our new method reduces perplexity by 9.71%.
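
For readers unfamiliar with the terms, the following is a minimal sketch of what an interpolation-based baseline and a perplexity comparison can look like. It is not the method proposed in the paper: it assumes unigram models, a fixed interpolation weight `lam`, and toy probability tables, and all function names and numbers are hypothetical.

```python
import math


def interpolate(p_target, p_general, lam):
    """Linearly interpolate two word-probability tables.

    lam is the weight of the target-task model (an assumed, fixed value;
    in practice it would be tuned on held-out data).
    """
    vocab = set(p_target) | set(p_general)
    return {
        w: lam * p_target.get(w, 0.0) + (1.0 - lam) * p_general.get(w, 0.0)
        for w in vocab
    }


def perplexity(model, words):
    """Perplexity of a unigram model on a word sequence."""
    log_sum = sum(math.log2(model[w]) for w in words)
    return 2.0 ** (-log_sum / len(words))


def relative_reduction(pp_baseline, pp_new):
    """Relative perplexity reduction in percent."""
    return 100.0 * (pp_baseline - pp_new) / pp_baseline


# Toy distributions (hypothetical): a small target-task model and a
# larger general-domain model that covers one extra word.
p_target = {"lecture": 0.5, "model": 0.5}
p_general = {"lecture": 0.1, "model": 0.3, "market": 0.6}

mixed = interpolate(p_target, p_general, lam=0.5)
test_words = ["lecture", "model", "market"]
pp_mixed = perplexity(mixed, test_words)
```

In this toy setting the target-only model assigns zero probability to "market", so its perplexity on `test_words` is undefined. Mixing in the general-domain model avoids this, which is the basic motivation for drawing on corpora from other tasks.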
