While recurrent neural network language models based on Long Short-Term Memory (LSTM) have shown good gains in many automatic speech recognition tasks, Convolutional Neural Network (CNN) language models are relatively new and have not been studied in depth. In this paper we present an empirical comparison of LSTM and CNN language models on English broadcast news and various conversational telephone speech transcription tasks. We also present a new type of CNN language model that leverages dilated causal convolution to efficiently exploit long-range history. We propose a novel criterion for training language models that combines word and class prediction in a multi-task learning framework. We apply this criterion to train word- and character-based LSTM language models and CNN language models and show that it improves performance. Our results also show that CNN and LSTM language models are complementary and can be combined to obtain further gains.
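To make the two ideas mentioned above concrete, the sketch below (in Python with PyTorch) shows a stack of dilated causal 1-D convolutions over word embeddings together with a multi-task output that predicts both the next word and its class, combining the two cross-entropy losses. This is only an illustrative sketch under assumed settings: the layer widths, dilation rates, ReLU nonlinearity, and the weighting factor `alpha` are assumptions for exposition, not the configuration used in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DilatedCausalCNNLM(nn.Module):
    """Illustrative dilated causal CNN language model with word and class heads."""
    def __init__(self, vocab_size, num_classes, embed_dim=256, hidden_dim=256,
                 kernel_size=3, dilations=(1, 2, 4, 8)):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.convs = nn.ModuleList()
        self.pads = []
        in_dim = embed_dim
        for d in dilations:
            # Left-pad by (kernel_size - 1) * d so each output sees only past context (causal),
            # while the dilation d lets deeper layers cover exponentially longer history.
            self.pads.append((kernel_size - 1) * d)
            self.convs.append(nn.Conv1d(in_dim, hidden_dim, kernel_size, dilation=d))
            in_dim = hidden_dim
        self.word_head = nn.Linear(hidden_dim, vocab_size)    # next-word prediction
        self.class_head = nn.Linear(hidden_dim, num_classes)  # next-word-class prediction

    def forward(self, tokens):                      # tokens: (batch, time)
        x = self.embed(tokens).transpose(1, 2)      # (batch, embed_dim, time)
        for pad, conv in zip(self.pads, self.convs):
            x = F.relu(conv(F.pad(x, (pad, 0))))    # pad only on the left (causal)
        h = x.transpose(1, 2)                       # (batch, time, hidden_dim)
        return self.word_head(h), self.class_head(h)

def multitask_loss(word_logits, class_logits, word_targets, class_targets, alpha=0.5):
    """Joint criterion: word cross-entropy plus a weighted class cross-entropy.
    The interpolation weight alpha is an assumed hyperparameter, not from the paper."""
    word_loss = F.cross_entropy(word_logits.reshape(-1, word_logits.size(-1)),
                                word_targets.reshape(-1))
    class_loss = F.cross_entropy(class_logits.reshape(-1, class_logits.size(-1)),
                                 class_targets.reshape(-1))
    return word_loss + alpha * class_loss
```

In this sketch the class targets would come from a fixed word-to-class mapping (e.g. frequency- or clustering-based word classes), so the auxiliary class prediction task adds no extra labeling cost; the same joint loss can be attached to an LSTM encoder in place of the convolutional stack.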