A comparison of two optimization techniques for sequence discriminative training of deep neural networks

George Saon; Hagen Soltau

doi:10.1109/ICASSP.2014.6854668

ICASSP 2014

Conference paper

04 May 2014


Abstract

We compare two optimization methods for lattice-based sequence discriminative training of neural network acoustic models: distributed Hessian-free (DHF) and stochastic gradient descent (SGD). Our findings on two different LVCSR tasks suggest that SGD running on a single GPU machine reaches its best accuracy 2.5 times faster than DHF running on multiple non-GPU machines; however, DHF training achieves a higher accuracy at the end of the optimization. In addition, we present an improved modified forward-backward algorithm for computing lattice-based expected loss functions and gradients that results in a 34% speedup for SGD. © 2014 IEEE.
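To make the phrase "lattice-based expected loss functions and gradients" concrete, the following is a minimal sketch of a textbook forward-backward pass over a small lattice. It is not the improved modified algorithm the abstract refers to, whose details are not given here. The `Arc` type, the function names, and the assumptions that nodes are numbered in topological order (node 0 is the start, the last node is final) and that the gradient is taken with respect to each arc's log-weight are all illustrative choices, not taken from the paper.

```python
import math
from collections import namedtuple

# Hypothetical lattice arc: source node, destination node,
# log-weight (e.g. scaled acoustic + LM score), and local loss.
Arc = namedtuple("Arc", "src dst logw loss")


def logadd(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def _merge(logp_old, loss_old, logp_inc, loss_inc):
    """Fold one incoming contribution into a running (log-mass, mean loss)."""
    logp_new = logadd(logp_old, logp_inc)
    w_old = math.exp(logp_old - logp_new)
    w_inc = math.exp(logp_inc - logp_new)
    return logp_new, w_old * loss_old + w_inc * loss_inc


def expected_loss(arcs, num_nodes):
    """Return (expected loss, arc posteriors, gradients w.r.t. arc log-weights).

    Assumes src < dst for every arc, node 0 is the start node and
    node num_nodes - 1 is the single final node.
    """
    final = num_nodes - 1
    alpha = [-math.inf] * num_nodes   # forward log-mass per node
    a_loss = [0.0] * num_nodes        # mean loss of partial paths into node
    beta = [-math.inf] * num_nodes    # backward log-mass per node
    b_loss = [0.0] * num_nodes        # mean loss of partial paths out of node
    alpha[0] = 0.0
    beta[final] = 0.0

    # Forward pass: arcs ordered by source node.
    for arc in sorted(arcs, key=lambda a: a.src):
        if alpha[arc.src] == -math.inf:
            continue  # source not reachable from the start node
        alpha[arc.dst], a_loss[arc.dst] = _merge(
            alpha[arc.dst], a_loss[arc.dst],
            alpha[arc.src] + arc.logw, a_loss[arc.src] + arc.loss)

    # Backward pass: arcs ordered by destination node, last first.
    for arc in sorted(arcs, key=lambda a: a.dst, reverse=True):
        if beta[arc.dst] == -math.inf:
            continue  # destination cannot reach the final node
        beta[arc.src], b_loss[arc.src] = _merge(
            beta[arc.src], b_loss[arc.src],
            arc.logw + beta[arc.dst], arc.loss + b_loss[arc.dst])

    log_z = alpha[final]
    avg = a_loss[final]
    posteriors, grads = [], []
    for arc in arcs:
        # Posterior probability of passing through this arc.
        gamma = math.exp(alpha[arc.src] + arc.logw + beta[arc.dst] - log_z)
        # Mean loss of the complete paths that use this arc.
        c_arc = a_loss[arc.src] + arc.loss + b_loss[arc.dst]
        posteriors.append(gamma)
        grads.append(gamma * (c_arc - avg))
    return avg, posteriors, grads
```

For a toy lattice with two competing paths, the call might look like this:

```python
arcs = [
    Arc(0, 1, math.log(2.0), 1.0),
    Arc(0, 2, math.log(1.0), 0.0),
    Arc(1, 3, math.log(1.0), 0.0),
    Arc(2, 3, math.log(3.0), 2.0),
]
avg, posteriors, grads = expected_loss(arcs, num_nodes=4)
```

Each gradient is the arc posterior times the difference between the mean loss of paths through that arc and the overall expected loss, so arcs on better-than-average paths receive a negative gradient.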
