ICASSP 2014
Conference paper
A comparison of two optimization techniques for sequence discriminative training of deep neural networks
Abstract
We compare two optimization methods for lattice-based sequence discriminative training of neural network acoustic models: distributed Hessian-free (DHF) and stochastic gradient descent (SGD). Our findings on two different LVCSR tasks suggest that SGD running on a single GPU machine achieves the best accuracy 2.5 times faster than DHF running on multiple non-GPU machines; however, DHF training achieves a higher accuracy at the end of the optimization. In addition, we present an improved modified forward-backward algorithm for computing lattice-based expected loss functions and gradients that results in a 34% speedup for SGD. © 2014 IEEE.
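The abstract refers to computing lattice-based expected losses and gradients with a forward-backward pass over word or phone lattices. As a rough illustration of the kind of computation involved (this is a generic sketch, not the paper's improved algorithm; the arc-list lattice representation, function names, and toy weights below are assumptions for illustration), the following Python snippet computes per-arc occupation posteriors, which is the core quantity that sequence discriminative training accumulates.

```python
import math
from collections import defaultdict

def logsumexp(values):
    """Numerically stable log-sum-exp over an iterable of log-values."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))

def arc_posteriors(arcs, start, end):
    """Compute per-arc occupation posteriors over a lattice.

    arcs: list of (src, dst, log_weight) tuples forming a DAG.
    start, end: ids of the initial and final lattice nodes.
    Returns a dict mapping arc index -> posterior probability.
    Assumes node ids are already in topological order.
    """
    out = defaultdict(list)   # src -> list of (dst, log_weight)
    inc = defaultdict(list)   # dst -> list of (src, log_weight)
    nodes = {start, end}
    for s, d, w in arcs:
        out[s].append((d, w))
        inc[d].append((s, w))
        nodes.update((s, d))
    order = sorted(nodes)

    # Forward (alpha) pass: log-probability of reaching each node from start.
    alpha = {n: -math.inf for n in nodes}
    alpha[start] = 0.0
    for n in order:
        for d, w in out[n]:
            alpha[d] = logsumexp([alpha[d], alpha[n] + w])

    # Backward (beta) pass: log-probability of reaching end from each node.
    beta = {n: -math.inf for n in nodes}
    beta[end] = 0.0
    for n in reversed(order):
        for s, w in inc[n]:
            beta[s] = logsumexp([beta[s], w + beta[n]])

    total = alpha[end]  # total log-probability of all lattice paths
    # Arc posterior = exp(alpha[src] + weight + beta[dst] - total).
    return {i: math.exp(alpha[s] + w + beta[d] - total)
            for i, (s, d, w) in enumerate(arcs)}

# Toy lattice: two competing paths from node 0 to node 3.
if __name__ == "__main__":
    arcs = [(0, 1, math.log(0.6)), (0, 2, math.log(0.4)),
            (1, 3, math.log(1.0)), (2, 3, math.log(1.0))]
    print(arc_posteriors(arcs, start=0, end=3))
```

In sequence discriminative training these arc posteriors, weighted by a loss term in the MBR case or combined with numerator/denominator statistics in the MMI case, form the error signal that is backpropagated through the acoustic model, whether the optimizer is SGD or distributed Hessian-free.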