Improved Knowledge Distillation from Bi-Directional to Uni-Directional LSTM CTC for End-to-End Speech Recognition

Gakuto Kurata; Kartik Audhkhasi

doi:10.1109/SLT.2018.8639629

SLT 2018

Conference paper

11 Feb 2019

Abstract

End-to-end automatic speech recognition (ASR) promises to simplify model training and deployment. Most end-to-end ASR systems use a bi-directional Long Short-Term Memory (BiLSTM) acoustic model because it can capture acoustic context from the entire utterance. However, BiLSTM models have high latency and cannot be used in streaming applications. One option is to use knowledge distillation to train a low-latency end-to-end uni-directional LSTM (UniLSTM) model from a BiLSTM model. However, this approach makes the strict assumption that the two models share frame-wise time alignments. We propose an improved knowledge distillation algorithm that relaxes this assumption and improves the accuracy of the UniLSTM model. We confirm the advantage of the proposed method on a standard English conversational telephone speech recognition task.
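
To make the shared-alignment assumption concrete, the following is a minimal sketch of a generic frame-wise distillation loss. It is not the algorithm proposed in the paper. The function names, the two-symbol vocabulary, and the toy logits are all illustrative assumptions. The loss compares frame t of the teacher only with frame t of the student, so a student that emits the same symbol one frame later is still penalized.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def framewise_kd_loss(teacher_logits, student_logits):
    """Mean over frames of KL(teacher || student).

    Both inputs have shape (T, V): T frames, V output symbols
    (including the CTC blank). Frame t of the teacher is compared
    only with frame t of the student, which is the shared-alignment
    assumption discussed in the abstract.
    """
    t_logp = log_softmax(teacher_logits)
    s_logp = log_softmax(student_logits)
    kl_per_frame = (np.exp(t_logp) * (t_logp - s_logp)).sum(axis=-1)
    return float(kl_per_frame.mean())


# Hypothetical toy data: column 0 is the blank, column 1 is a label.
# The teacher spikes on the label at frame 1.
teacher = np.array([[5.0, 0.0], [0.0, 5.0], [5.0, 0.0], [5.0, 0.0]])
# Assumed student behaviour: the same spike, delayed by one frame.
delayed = np.roll(teacher, 1, axis=0)

loss_same = framewise_kd_loss(teacher, teacher)
loss_delayed = framewise_kd_loss(teacher, delayed)
```

In this toy setup `loss_same` is zero, while `loss_delayed` is clearly positive even though both outputs decode to the same label sequence. That mismatch is the kind of penalty a relaxed alignment assumption would aim to avoid.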