IEEE Transactions on Acoustics, Speech, and Signal Processing

On a Model-Robust Training Method for Speech Recognition

We compare training methods with the aim of designing better decoders, treating training as a statistical parameter estimation problem. In particular, we consider the conditional maximum likelihood estimate (CMLE): the parameter values that maximize the conditional probability of the words given the acoustics on the training data. We compare it with the maximum likelihood estimate (MLE), obtained by maximizing the joint probability of the words and the acoustics. For minimizing the decoding error rate of the ("optimal") maximum a posteriori probability (MAP) decoder, we show that the CMLE (or maximum mutual information estimate, MMIE) may be preferable when the model is incorrect; in this sense, the CMLE/MMIE appears more robust than the MLE. © 1988 IEEE
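To make the two estimators concrete, here is a minimal sketch under a hypothetical toy setup. It is not the paper's experiment. Two "words" w ∈ {0, 1} have a fixed uniform prior, the "acoustics" a are scalars, and the assumed acoustic model is p_θ(a | w) = N(a; μ_w, 1), so θ = (μ_0, μ_1). The MLE maximizes Σ_i log p_θ(w_i, a_i). The CMLE maximizes Σ_i log p_θ(w_i | a_i) = Σ_i [log p_θ(w_i, a_i) - log Σ_w' p_θ(w', a_i)]. Because the prior is fixed, this objective differs from the MMIE objective Σ_i log [p_θ(w_i | a_i) / P(w_i)] only by a constant, so the two estimates coincide in this setting. The synthetic data deliberately violate the model: word 1's acoustics are bimodal. Both estimates are found by the same coarse grid search, and the MAP decoder picks argmax_w p_θ(w | a). All function names, distributions, and grid values are illustrative assumptions.

```python
import math
import random


def log_normal(a, mu):
    # log N(a; mu, 1): unit-variance Gaussian, an assumed toy "acoustic model"
    return -0.5 * (a - mu) ** 2 - 0.5 * math.log(2 * math.pi)


def log_joint(w, a, mus, log_prior):
    # log p_theta(w, a) = log P(w) + log p_theta(a | w); the prior P(w) is fixed
    return log_prior[w] + log_normal(a, mus[w])


def log_conditional(w, a, mus, log_prior):
    # log p_theta(w | a) = log p_theta(w, a) - log sum_w' p_theta(w', a)
    scores = [log_joint(v, a, mus, log_prior) for v in range(len(mus))]
    m = max(scores)  # log-sum-exp with max subtraction for numerical stability
    log_evidence = m + math.log(sum(math.exp(s - m) for s in scores))
    return scores[w] - log_evidence


def mle_objective(data, mus, log_prior):
    # Training-set joint log-likelihood of words and acoustics
    return sum(log_joint(w, a, mus, log_prior) for w, a in data)


def cmle_objective(data, mus, log_prior):
    # Training-set conditional log-likelihood of words given acoustics
    return sum(log_conditional(w, a, mus, log_prior) for w, a in data)


def map_decode(a, mus, log_prior):
    # MAP decoder: argmax_w p_theta(w | a), which equals argmax_w p_theta(w, a)
    return max(range(len(mus)), key=lambda v: log_joint(v, a, mus, log_prior))


def error_rate(data, mus, log_prior):
    # Fraction of samples the MAP decoder gets wrong under parameters mus
    return sum(map_decode(a, mus, log_prior) != w for w, a in data) / len(data)


def grid_argmax(objective, data, log_prior, grid):
    # Exhaustive search over (mu_0, mu_1) pairs drawn from the same grid
    return max(((m0, m1) for m0 in grid for m1 in grid),
               key=lambda mus: objective(data, mus, log_prior))


def make_toy_data(seed=0):
    # Hypothetical data the model cannot represent: word 1 is bimodal
    rng = random.Random(seed)
    data = [(0, rng.gauss(0.0, 1.0)) for _ in range(200)]
    data += [(1, rng.gauss(2.0, 0.5)) for _ in range(150)]
    data += [(1, rng.gauss(8.0, 0.5)) for _ in range(50)]
    return data


if __name__ == "__main__":
    data = make_toy_data()
    log_prior = [math.log(0.5)] * 2
    grid = [x / 2 for x in range(-4, 21)]  # -2.0 .. 10.0 in steps of 0.5
    mle_mus = grid_argmax(mle_objective, data, log_prior, grid)
    cmle_mus = grid_argmax(cmle_objective, data, log_prior, grid)
    for name, mus in [("MLE", mle_mus), ("CMLE", cmle_mus)]:
        print(name, mus, "training MAP error:", error_rate(data, mus, log_prior))
```

Running the script prints each estimate and its training-set MAP error rate. Whether the CMLE wins on a particular dataset depends on how the model is misspecified, so the printed numbers illustrate the comparison and do not reproduce the paper's analysis. The grid search is used only because it applies identically to both objectives; a practical system would use gradient-based optimization.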