Simplified LSTMs for speech recognition

George Saon; Zoltan Tuske; Kartik Audhkhasi; Brian Kingsbury; Michael Picheny; Samuel Thomas

doi:10.1109/ASRU46091.2019.9003898

ASRU 2019

Conference paper

01 Dec 2019

Abstract

In this paper we explore new variants of Long Short-Term Memory (LSTM) networks for sequential modeling of acoustic features. In particular, we show that (i) removing the output gate, (ii) replacing the hyperbolic tangent nonlinearity at the cell output with hard tanh, and (iii) collapsing the cell and hidden state vectors together lead to a model that is conceptually simpler than a regular LSTM and comparable to it in effectiveness for speech recognition. The proposed model has 25% fewer parameters than an LSTM with the same number of cells, trains faster because it has larger gradients leading to larger steps in weight space, and reaches a better optimum because there are fewer nonlinearities to traverse across layers. We report experimental results for both hybrid and CTC acoustic models on three publicly available English datasets: 300 hours of Switchboard telephone conversations, 400 hours of broadcast news, and the 176-hour MALACH corpus of Holocaust survivor testimonies. In all cases the proposed models achieve similar or better accuracy than regular LSTMs.
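The three simplifications and the 25% parameter saving can be illustrated with a short sketch. The abstract does not give the cell equations, so the update rule below is only one plausible reading of it: an input gate, a forget gate, and a tanh candidate remain, the previous hidden state stands in for the previous cell state, and hard tanh is applied to the result. The function names, the dictionary layout of the parameters, and the layer sizes are hypothetical, and the parameter count assumes a plain LSTM without peephole connections or projection layers.

```python
import math
import random


def lstm_param_count(input_dim, cells, gates=4):
    """Count weights and biases for `gates` affine blocks acting on [x_t, h_{t-1}].

    A regular LSTM has 4 blocks (input gate, forget gate, output gate, candidate).
    Dropping the output gate leaves 3.
    """
    return gates * (cells * (input_dim + cells) + cells)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def hard_tanh(z):
    # Piecewise-linear stand-in for tanh: identity on [-1, 1], clipped outside.
    return max(-1.0, min(1.0, z))


def affine(W, U, b, x, h):
    """Compute W x + U h + b for one gate, using plain Python lists."""
    return [
        sum(w * xj for w, xj in zip(W[k], x))
        + sum(u * hj for u, hj in zip(U[k], h))
        + b[k]
        for k in range(len(b))
    ]


def simplified_lstm_step(params, x, h_prev):
    """One time step of the assumed simplified cell (not taken from the paper).

    There is no output gate, and h_prev plays the role of the cell state.
    """
    i = [sigmoid(z) for z in affine(*params["i"], x, h_prev)]    # input gate
    f = [sigmoid(z) for z in affine(*params["f"], x, h_prev)]    # forget gate
    g = [math.tanh(z) for z in affine(*params["g"], x, h_prev)]  # candidate
    return [hard_tanh(fk * hk + ik * gk) for fk, hk, ik, gk in zip(f, h_prev, i, g)]


def init_params(input_dim, cells, seed=0):
    """Small random weights for the three remaining blocks (hypothetical init)."""
    rng = random.Random(seed)

    def block():
        W = [[rng.uniform(-0.1, 0.1) for _ in range(input_dim)] for _ in range(cells)]
        U = [[rng.uniform(-0.1, 0.1) for _ in range(cells)] for _ in range(cells)]
        b = [0.0] * cells
        return W, U, b

    return {name: block() for name in ("i", "f", "g")}
```

Under these assumptions, `lstm_param_count(input_dim, cells, gates=3)` is exactly three quarters of `lstm_param_count(input_dim, cells, gates=4)` for any layer size, which matches the 25% reduction stated in the abstract. Calling `simplified_lstm_step(init_params(3, 4), [0.5, -0.2, 0.1], [0.0] * 4)` returns a new four-element state whose entries all lie in [-1, 1] because of the hard tanh.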
