Publication
ICPR 2016
Conference paper
Wake-up-word spotting using end-to-end deep neural network system
Abstract
Deep neural networks (DNNs) have tremendously improved the performance of automatic speech recognition (ASR). In addition, end-to-end speech recognition systems can achieve state-of-the-art performance on unsegmented sequence data using Long Short-Term Memory (LSTM) recurrent neural networks (RNNs) and the Connectionist Temporal Classification (CTC) method. In this paper, we therefore propose a lightweight wake-up-word (WUW) spotting system based on an end-to-end DNN architecture, which is intended to provide a great balance between decoding speed, accuracy, and model size. The objective is to introduce the CTC framework into the spotting process and to enhance the system through WUW-oriented model training and refinement steps. We test the proposed architecture on a conversational telephone dataset; the results show that the computation time can be significantly reduced without a significant decrease in spotting accuracy.
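To make the role of CTC in spotting more concrete, the following is a minimal sketch, not the system described in the paper. It assumes per-frame label posteriors are already available from some acoustic model, that label index 0 is the CTC blank, and that the wake-up word is represented as a fixed label sequence. The function names `ctc_greedy_decode` and `contains_wake_word`, the label inventory, and the toy posteriors are all hypothetical and chosen only for illustration. The sketch applies the standard CTC collapsing rule (merge repeated labels, then drop blanks) and checks whether the wake-up-word labels appear in the result.

```python
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank symbol


def ctc_greedy_decode(frame_posteriors: Sequence[Sequence[float]]) -> List[int]:
    """Pick the most likely label per frame, merge repeats, then drop blanks."""
    decoded: List[int] = []
    previous = BLANK
    for frame in frame_posteriors:
        best = max(range(len(frame)), key=lambda i: frame[i])
        # Emit a label only when it differs from the previous frame's label
        # and is not the blank symbol.
        if best != previous and best != BLANK:
            decoded.append(best)
        previous = best
    return decoded


def contains_wake_word(decoded: Sequence[int], wake_word: Sequence[int]) -> bool:
    """Return True if the wake-word label sequence occurs contiguously."""
    n = len(wake_word)
    if n == 0:
        return False
    return any(
        list(decoded[i:i + n]) == list(wake_word)
        for i in range(len(decoded) - n + 1)
    )


# Toy example: 3 labels (0 = blank, 1 and 2 = hypothetical sub-word units).
toy_posteriors = [
    [0.1, 0.8, 0.1],    # label 1
    [0.1, 0.7, 0.2],    # label 1 (repeat, merged)
    [0.9, 0.05, 0.05],  # blank
    [0.2, 0.1, 0.7],    # label 2
    [0.1, 0.1, 0.8],    # label 2 (repeat, merged)
]
decoded_labels = ctc_greedy_decode(toy_posteriors)
wake_word_found = contains_wake_word(decoded_labels, [1, 2])
```

A practical spotter would more likely score the wake-up word against the full posterior lattice and apply a confidence threshold rather than rely on a single best path; the greedy path is used here only because it is the simplest correct instance of CTC decoding.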