Reduction of acoustic model training time and required data passes via stochastic approaches to maximum likelihood and discriminative training
Abstract
The recent boom in use of speech recognition technology has made the access to potentially large amounts of training data easier. This, however, also constitutes a challenge in processing such large, continuously growing amount of information. Here we present a stochastic modification of traditional iterative training approach which leads to the same or even better accuracy of acoustic models and reduces the cost of processing large data sets. The algorithm relies on model updates from statistics collected on randomly selected subsets of training data. The approach is demonstrated on maximum likelihood (ML) training and on discriminative training (DT) with minimum phone error (MPE) objective function both in the feature and the model space. Based on our experiments on 30 thousand hours of mobile data, the number of data passes can be reduced to 1/5 of the original for ML training and to 1/10 for model space DT training. © 2014 IEEE.