A discriminant measure for model complexity adaptation
Abstract
We present a discriminant measure that can be used to determine model complexity in a speech recognition system. During recognition, the conditional probability of each test feature vector has to be computed for several allophone classes (sub-phonetic units), using a Gaussian-mixture density model for each class. The Gaussian-mixture models are constructed from the training data belonging to each allophone class, and the number of mixture components required to adequately model the PDF of each class is determined by some simple rule of thumb: for instance, the number of components has to be sufficient to model the data reasonably well, but not so large as to overmodel the data. A typical choice is to make the number of components proportional to the number of data samples. However, such methods may result in models that are sub-optimal as far as classification accuracy is concerned. We present a new discriminant measure that can be used to determine, in an objective fashion, the number of Gaussians required to best model the PDF of an allophone class. We also present the results of experiments showing the improvement in recognition performance when the number of mixture components is chosen based on the discriminant measure as opposed to the rule of thumb. These results are presented for both the speaker-independent and the speaker-adapted case. © 1998 IEEE.
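As a rough illustration of the baseline described above, the following sketch shows a rule of thumb that sets the number of mixture components in proportion to the number of training samples, together with the class-conditional log-likelihood of a feature vector under a Gaussian mixture. It is not the discriminant measure itself, which the abstract does not define. The function names, the `samples_per_component` ratio, the `max_components` cap, and the use of diagonal covariances are all assumptions made for illustration only.

```python
import math


def num_components_rule_of_thumb(n_samples, samples_per_component=50, max_components=32):
    """Hypothetical rule of thumb: component count proportional to sample count.

    The ratio and the cap are illustrative values, not taken from the paper.
    """
    return max(1, min(max_components, n_samples // samples_per_component))


def gmm_log_likelihood(x, weights, means, variances):
    """Return log p(x | class) under a Gaussian mixture.

    Assumes diagonal covariances (an assumption of this sketch):
    weights is a list of K mixture weights, and means and variances
    are lists of K vectors with the same dimension as x.
    """
    log_terms = []
    for w, mu, var in zip(weights, means, variances):
        # Log density of one diagonal Gaussian, summed over dimensions.
        log_gauss = 0.0
        for xi, mi, vi in zip(x, mu, var):
            log_gauss += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
        log_terms.append(math.log(w) + log_gauss)
    # Log-sum-exp over components for numerical stability.
    m = max(log_terms)
    return m + math.log(sum(math.exp(t - m) for t in log_terms))
```

In a recognizer of the kind described, a function like `gmm_log_likelihood` would be evaluated once per allophone class for each test feature vector, so the component count chosen for each class affects both the modelling accuracy and the computational cost.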