Publication
ICASSP 1999
Paper

HMM training based on quality measurement

Abstract

This paper presents two discriminant measures for HMM states that improve the effectiveness of HMM training. In HMM-based speech recognition, context-dependent states are usually modeled by Gaussian mixture distributions. In general, the number of Gaussian mixtures for each state is either fixed or proportional to the amount of training data. Our study shows that some states are 'non-aggressive' compared to others and require a higher acoustic resolution. Two methods are presented to identify those non-aggressive states: the first uses the recognition accuracy of the states, and the second is based on a rank distribution of states. Baseline systems trained with a fixed number of Gaussian mixtures for each state, having 33 K and 120 K Gaussians, yield word error rates of 14.57% and 13.04%, respectively. Using our approach, a 38 K Gaussian system was constructed that reduces the error rate to 13.95%. The average ranks of non-aggressive states in the rank lists of the testing data were also seen to improve dramatically compared to the baseline systems.
