Discriminative training for full covariance models

Peder A. Olsen; Vaibhava Goel; Steven J. Rennie

doi:10.1109/ICASSP.2011.5947557

ICASSP 2011

Conference paper

18 Aug 2011

Abstract

In this paper we revisit discriminative training of full covariance acoustic models for automatic speech recognition. One of the difficult aspects of discriminative training is choosing the constant D that appears in the parameter updates. For diagonal covariance models, D is set using the smallest value, D*, for which the resulting covariances remain positive definite. We show how to compute D* analytically for full covariance models, and show empirically that knowing this smallest value is important. Our baseline speech recognition models are state-of-the-art broadcast news systems, built using the boosted Maximum Mutual Information criterion and feature space Maximum Mutual Information for feature selection. We show that discriminatively built full covariance models outperform our best diagonal covariance models. Moreover, full covariance models at optimal performance can be obtained with only a few discriminative iterations, starting from a diagonal covariance model. The experiments also show that systems using full covariance models are less sensitive to the choice of the number of Gaussians. © 2011 IEEE.
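The role of D* can be illustrated with a small numerical sketch. The abstract does not state the update formula or the authors' derivation, so the code below assumes the commonly used extended Baum-Welch form of the mean and covariance update and is not necessarily the method of the paper. Under that assumption, (c + D)^2 times the updated covariance is a quadratic matrix polynomial in D, and the threshold above which the covariance stays positive definite is the largest real root of its determinant, found here through a companion-matrix linearization. All names (`ebw_covariance`, `smallest_safe_D`, `c`, `x`, `S`) are hypothetical: `c`, `x` and `S` stand for the numerator-minus-denominator zeroth-, first- and second-order statistics of one Gaussian.

```python
import numpy as np


def ebw_covariance(D, c, x, S, mu, Sigma):
    """Covariance after one update of the assumed extended Baum-Welch form."""
    mu_new = (x + D * mu) / (c + D)
    second_moment = (S + D * (Sigma + np.outer(mu, mu))) / (c + D)
    return second_moment - np.outer(mu_new, mu_new)


def smallest_safe_D(c, x, S, mu, Sigma):
    """Largest real root of det(Q(D)) = 0, where
    Q(D) = D^2 * A2 + D * A1 + A0 = (c + D)^2 * ebw_covariance(D, ...).
    Q(D) is positive definite for every D above this root, because
    A2 = Sigma is positive definite and dominates as D grows."""
    d = len(mu)
    A2 = Sigma
    A1 = S + c * (Sigma + np.outer(mu, mu)) - np.outer(x, mu) - np.outer(mu, x)
    A0 = c * S - np.outer(x, x)
    # Linearize the quadratic eigenvalue problem as a 2d x 2d companion matrix.
    A2_inv = np.linalg.inv(A2)
    companion = np.block([
        [np.zeros((d, d)), np.eye(d)],
        [-A2_inv @ A0, -A2_inv @ A1],
    ])
    eig = np.linalg.eigvals(companion)
    real_roots = eig.real[np.abs(eig.imag) < 1e-8 * (1.0 + np.abs(eig))]
    # No real root means Q(D) never becomes singular.
    return real_roots.max() if real_roots.size else -np.inf


def is_positive_definite(M):
    return np.linalg.eigvalsh(M).min() > 0.0


# Toy statistics for a single two-dimensional Gaussian (made-up numbers).
mu = np.array([0.5, -0.2])
Sigma = np.array([[1.0, 0.3], [0.3, 0.8]])
c = -1.0
x = np.array([0.4, 0.1])
S = np.array([[1.2, 0.1], [0.1, 0.9]])

D_star = smallest_safe_D(c, x, S, mu, Sigma)
safe = is_positive_definite(ebw_covariance(D_star + 1e-3, c, x, S, mu, Sigma))
unsafe = is_positive_definite(ebw_covariance(D_star - 1e-3, c, x, S, mu, Sigma))
```

In this toy case the updated covariance is positive definite just above `D_star` and loses positive definiteness just below it. In practice D would typically be set to some multiple of such a threshold and kept positive; that choice is outside what the abstract specifies.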
