About cookies on this site Our websites require some cookies to function properly (required). In addition, other cookies may be used with your consent to analyze site usage, improve the user experience and for advertising. For more information, please review your options. By visiting our website, you agree to our processing of information as described in IBM’sprivacy statement. To provide a smooth navigation, your cookie preferences will be shared across the IBM web domains listed here.
Publication
ICASSP 2018
Conference paper
Whole Sentence Neural Language Models
Abstract
Recurrent neural networks have become increasingly popular for the task of language modeling achieving impressive gains in state-of-the-art speech recognition and natural language processing (NLP) tasks. Recurrent models exploit word dependencies over a much longer context window (as retained by the history states) than what is feasible with n-gram language models. However the training criterion of choice for recurrent language models continues to be the local conditional likelihood of generating the current word given the (pos-sibly long) word context, thus making local decisions at each word. This locally-conditional design fundamentally limits the ability of the model in exploiting whole sentence structures. In this paper, we present our initial results at whole sentence neural language models which assign a probability to the entire word sequence. We extend the previous work on whole sentence maximum entropy models to recurrent language models while using Noise Contrastive Estimation (NCE) for training, as these sentence models are fundamentally un-normalizable. We present results on a range of tasks: from sequence identification tasks such as, palindrome detection to large vocabulary automatic speech recognition (LVCSR) and demonstrate the modeling power of this approach.