Discriminatively trained full-covariance Gaussian mixture models have been shown to outperform their diagonal-covariance counterparts on large vocabulary speech recognition tasks. However, a full-covariance model is much larger than a diagonal-covariance model, which makes it impractical for use in a real system. In this paper, we present a method to build a large discriminatively trained full-covariance model from large training corpora (over 9000 hours) that still improves performance over the diagonal-covariance model. We then reduce the full-covariance model to the size of its baseline diagonal-covariance model by using a subspace constrained Gaussian mixture model (SCGMM). The resulting discriminatively trained SCGMM retains the performance of its corresponding full-covariance model and yields a 5% relative improvement over a diagonal-covariance model of the same size on a large vocabulary speech recognition task. © 2013 IEEE.
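
To make the size gap between the two model types concrete, the following minimal sketch counts the free parameters of a single Gaussian component under each covariance structure. The feature dimension of 40 is an assumption chosen for illustration and is not taken from the paper; mixture weights are ignored, and the function names are hypothetical.

```python
def diag_params(dim: int) -> int:
    """Free parameters of one diagonal-covariance Gaussian."""
    # dim mean entries + dim variance entries
    return 2 * dim


def full_params(dim: int) -> int:
    """Free parameters of one full-covariance Gaussian."""
    # dim mean entries + dim*(dim+1)/2 entries of a symmetric covariance matrix
    return dim + dim * (dim + 1) // 2


# Assumed feature dimension (illustrative only, not from the paper).
dim = 40

ratio = full_params(dim) / diag_params(dim)
print(f"diagonal: {diag_params(dim)} parameters per Gaussian")
print(f"full:     {full_params(dim)} parameters per Gaussian")
print(f"ratio:    {ratio:.2f}x")
```

Under this assumed dimension, each full-covariance Gaussian carries roughly an order of magnitude more parameters than its diagonal counterpart, which is the gap that the reduction to an SCGMM is intended to close.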