Deep neural networks (DNNs) have substantially improved the performance of automatic speech recognition (ASR). In addition, end-to-end speech recognition systems can achieve state-of-the-art performance by combining Long Short-Term Memory (LSTM) recurrent neural networks (RNNs) with the Connectionist Temporal Classification (CTC) method for unsegmented sequence data. In this paper, we therefore propose a lightweight wake-up-word (WUW) spotting system based on an end-to-end DNN architecture, which is intended to provide a good balance between decoding speed, accuracy, and model size. The objective is to introduce the CTC framework into the spotting process and to enhance the system through WUW-oriented model training and refinement steps. We evaluate the proposed architecture on a conversational telephone dataset; the results show that computation time can be significantly reduced without a significant decrease in spotting accuracy.