Empirical exploration of novel architectures and objectives for language models
While recurrent neural network language models based on Long Short-Term Memory (LSTM) have shown good gains in many automatic speech recognition tasks, Convolutional Neural Network (CNN) language models are relatively new and have not been studied in depth. In this paper we present an empirical comparison of LSTM and CNN language models on English broadcast news and various conversational telephone speech transcription tasks. We also present a new type of CNN language model that leverages dilated causal convolution to efficiently exploit long-range history. We propose a novel criterion for training language models that combines word and class prediction in a multi-task learning framework. We apply this criterion to train word- and character-based LSTM language models and CNN language models, and show that it improves performance. Our results also show that CNN and LSTM language models are complementary and can be combined to obtain further gains.
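The two technical ideas in the abstract can be illustrated with a minimal, framework-free sketch. It rests on several assumptions that are not taken from the paper. The abstract gives no kernel sizes, dilation schedule, or loss weighting. The combined criterion is therefore modelled here as a weighted sum of word-level and class-level cross-entropies, with a hypothetical `weight` hyperparameter. The helper names (`dilated_causal_conv1d`, `receptive_field`, `multitask_loss`) are illustrative only. The convolution is causal because output step `t` reads only inputs at `t, t - d, t - 2d, ...`. Stacking layers with growing dilations widens the visible history exponentially while adding only a linear number of layers.

```python
import math


def dilated_causal_conv1d(x, kernel, dilation):
    """Causal 1-D convolution: y[t] = sum_k kernel[k] * x[t - k * dilation].

    Positions before the start of the sequence are treated as zero (implicit
    left padding), so y[t] never depends on any x[t'] with t' > t.
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            src = t - k * dilation  # look only backwards in time
            if src >= 0:
                acc += w * x[src]
        y.append(acc)
    return y


def receptive_field(kernel_size, dilations):
    """Number of past positions (including the current one) visible after
    stacking one causal layer per entry in `dilations`."""
    # Each layer extends the visible history by (kernel_size - 1) * dilation steps.
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def multitask_loss(word_probs, word_target, class_probs, class_target, weight=0.5):
    """Hypothetical multi-task criterion: a weighted sum of the cross-entropy of
    the next word and the cross-entropy of its word class. The weighting scheme
    is an assumption, not the paper's stated formulation."""
    word_ce = -math.log(word_probs[word_target])
    class_ce = -math.log(class_probs[class_target])
    return weight * word_ce + (1.0 - weight) * class_ce


if __name__ == "__main__":
    print(dilated_causal_conv1d([1.0, 2.0, 3.0, 4.0], [1.0, 1.0], dilation=2))
    print(receptive_field(kernel_size=2, dilations=[1, 2, 4, 8]))
```

With a kernel size of 2 and dilations 1, 2, 4, 8, four layers already cover 16 positions of history. Without dilation, 15 layers would be needed for the same span.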