Publication
SLT 2018
Conference paper
Improved Knowledge Distillation from Bi-Directional to Uni-Directional LSTM CTC for End-to-End Speech Recognition
Abstract
End-to-end automatic speech recognition (ASR) promises to simplify model training and deployment. Most end-to-end ASR systems use a bi-directional Long Short-Term Memory (BiLSTM) acoustic model because it can capture acoustic context from the entire utterance. However, BiLSTM models have high latency and cannot be used in streaming applications. One option is to leverage knowledge distillation to train a low-latency end-to-end uni-directional LSTM (UniLSTM) model from a BiLSTM model; however, conventional distillation makes the strict assumption that the two models share frame-wise time alignments. We propose an improved knowledge distillation algorithm that relaxes this assumption and improves the accuracy of the UniLSTM model. We confirmed the advantage of the proposed method on a standard English conversational telephone speech recognition task.
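To make the baseline the abstract describes concrete, below is a minimal PyTorch sketch of conventional frame-wise knowledge distillation for CTC models, where a frame-level KL divergence between the BiLSTM teacher's and UniLSTM student's posteriors is interpolated with the student's CTC loss. This is the standard approach that assumes shared frame-wise alignments, not the paper's improved algorithm (which the abstract does not spell out); the function name, tensor shapes, temperature, and interpolation weight `alpha` are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def frame_wise_kd_loss(student_logits, teacher_logits,
                       targets, input_lengths, target_lengths,
                       temperature=1.0, alpha=0.5):
    """Conventional frame-level distillation loss for CTC models.

    student_logits, teacher_logits: (T, B, V) unnormalized outputs,
    one posterior per acoustic frame. Matching per-frame posteriors
    embodies the strict shared-alignment assumption that the paper's
    improved method relaxes.
    """
    # Soft targets from the (frozen) BiLSTM teacher.
    with torch.no_grad():
        teacher_log_probs = F.log_softmax(teacher_logits / temperature, dim=-1)
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)

    # Frame-wise KL divergence between teacher and student posteriors.
    kd = F.kl_div(student_log_probs, teacher_log_probs,
                  log_target=True, reduction="batchmean")

    # Standard CTC loss against the ground-truth label sequence.
    ctc = F.ctc_loss(F.log_softmax(student_logits, dim=-1),
                     targets, input_lengths, target_lengths)

    # Interpolate distillation and CTC objectives (alpha is a tunable weight).
    return alpha * kd + (1.0 - alpha) * ctc
```

Because the UniLSTM student sees no future context, its output tends to lag the BiLSTM teacher's in time, so forcing frame-by-frame agreement as above can be a poor target; relaxing that alignment assumption is the contribution the abstract highlights.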