Publication
SLT 2014
Conference paper
A distributed architecture for fast SGD sequence discriminative training of DNN acoustic models
Abstract
We describe a hybrid GPU/CPU architecture for stochastic gradient descent training of neural network acoustic models under a lattice-based minimum Bayes risk (MBR) criterion. The crux of the method is to run SGD on a GPU card which consumes frame-randomized minibatches produced by multiple workers running on a cluster of multi-core CPU nodes which compute HMM state MBR occupancies. To minimize communication cost, a separate thread running on the GPU host receives minibatches from and sends updated models to the workers, and communicates with the SGD thread via a producer-consumer queue of minibatches. Using this architecture, it is possible to match the speed of GPU-based SGD cross-entropy (CE) training (1 hour of processing per 100 hours of audio on Switchboard). Additionally, we compare different ways of doing frame randomization and discuss experimental results on three LVCSR tasks (Switchboard 300 hours, English broadcast news 50 hours, and noisy Levantine telephone conversations 300 hours).
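To illustrate the producer-consumer decoupling the abstract describes, here is a minimal sketch, not the paper's implementation: one thread stands in for the communication thread that collects frame-randomized minibatches from the CPU workers, and a second thread stands in for GPU-side SGD consuming them from a bounded queue. All names, shapes, and the placeholder gradient are illustrative assumptions.

```python
import queue
import threading
import numpy as np

QUEUE_DEPTH = 16                                  # bound the queue to limit host memory
BATCH_SIZE, FEAT_DIM, NUM_STATES = 256, 440, 9300  # hypothetical dimensions

minibatches = queue.Queue(maxsize=QUEUE_DEPTH)
model = {"W": np.zeros((FEAT_DIM, NUM_STATES)), "lr": 1e-3}

def communication_thread(num_batches):
    """Stand-in for the thread that receives minibatches (features plus MBR
    state occupancies) from the CPU workers and returns updated models."""
    rng = np.random.default_rng(0)
    for _ in range(num_batches):
        feats = rng.standard_normal((BATCH_SIZE, FEAT_DIM))
        occupancies = rng.random((BATCH_SIZE, NUM_STATES))
        minibatches.put((feats, occupancies))     # blocks when the queue is full
    minibatches.put(None)                         # sentinel: no more data

def sgd_thread():
    """Stand-in for GPU-side SGD: consume minibatches and update the model."""
    while True:
        item = minibatches.get()                  # blocks when the queue is empty
        if item is None:
            break
        feats, occupancies = item
        grad = feats.T @ occupancies / BATCH_SIZE  # placeholder gradient, not the MBR gradient
        model["W"] -= model["lr"] * grad

producer = threading.Thread(target=communication_thread, args=(100,))
consumer = threading.Thread(target=sgd_thread)
producer.start(); consumer.start()
producer.join(); consumer.join()
```

The bounded queue lets the SGD thread run at full speed whenever minibatches are available while back-pressure on the producer keeps host memory use fixed, which is the role the queue plays in the architecture described above.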