Converting Written Language to Spoken Language with Neural Machine Translation for Language Modeling
Abstract
When building a language model (LM) for spontaneous speech, the ideal situation is to have a large amount of spoken, in-domain training data. In practice, however, such abundant data is rarely available. We address this problem by generating spoken-language texts from written-language texts using a neural machine translation (NMT) model. We collected faithful transcripts of fully spontaneous speech together with corresponding written versions and used them as a parallel corpus to train the NMT model. For generation we used top-k random sampling, which produces a greater variety of higher-quality texts than other NMT generation methods. We show that the NMT model can convert written texts in a given domain into spoken-style texts, and that the converted texts are effective for training LMs. Our experimental results show a significant improvement in speech recognition accuracy with these LMs.
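The abstract names top-k random sampling as the decoding method but gives no implementation here; the following is a minimal sketch of one top-k sampling step, assuming a NumPy array of per-token logits from the NMT decoder. The function name `top_k_sample` and the toy vocabulary are illustrative, not from the paper.

```python
import numpy as np

def top_k_sample(logits: np.ndarray, k: int, rng: np.random.Generator) -> int:
    """Sample a token id from the k highest-scoring entries of `logits`.

    Probability mass is renormalized over the top-k candidates, so the
    decoder still prefers likely tokens while retaining output variety.
    """
    top_ids = np.argpartition(logits, -k)[-k:]     # indices of the k largest logits
    top_logits = logits[top_ids]
    probs = np.exp(top_logits - top_logits.max())  # softmax over the k candidates
    probs /= probs.sum()
    return int(rng.choice(top_ids, p=probs))

# Example: repeated draws from a toy 10-token vocabulary distribution.
rng = np.random.default_rng(0)
logits = rng.normal(size=10)
samples = [top_k_sample(logits, k=3, rng=rng) for _ in range(5)]
print(samples)  # varied samples, restricted to the 3 most probable tokens
```

Restricting each draw to the k most probable tokens is what lets repeated decoding of the same written sentence yield many distinct spoken-style variants while avoiding the low-probability noise that unrestricted sampling would admit.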