Publication
ICSLP 1998
Conference paper
CLUSTER ADAPTIVE TRAINING FOR SPEECH RECOGNITION
Abstract
When performing speaker adaptation, there are two conflicting requirements. First, the transform must be powerful enough to represent the speaker. Second, the transform must be quickly and easily estimated for any particular speaker. Recently, the most popular adaptation schemes have used many parameters to adapt the models, which limits how rapidly the models may be adapted. This paper examines cluster adaptive training (CAT), an adaptation scheme that requires very few parameters to adapt the models. CAT may be viewed as a simple extension to speaker clustering. Rather than selecting one cluster, a linear interpolation of all the cluster means is used as the mean of the particular speaker. This scheme naturally falls into an adaptive training framework. Maximum likelihood estimates of the interpolation weights are given. Furthermore, simple re-estimation formulae are given for the cluster means, represented both explicitly and by sets of transforms of some canonical mean. On a speaker-independent task, CAT reduced the word error rate using very little adaptation data. In addition, when combined with other adaptation schemes, it gave a 5% reduction in word error rate over adapting a speaker-independent model set.
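The interpolation idea in the abstract can be illustrated with a small sketch. It is not the paper's formulation: it assumes a single Gaussian with a known diagonal covariance, every adaptation frame assigned to that Gaussian, and linearly independent cluster means. Under those assumptions the maximum likelihood weights are the solution of a weighted least-squares problem. The function names (`interpolate_mean`, `solve_linear`, `estimate_weights`) and the toy numbers are hypothetical.

```python
def interpolate_mean(cluster_means, weights):
    """Speaker mean as a linear interpolation of the cluster means."""
    dim = len(cluster_means[0])
    return [sum(w * m[d] for w, m in zip(weights, cluster_means))
            for d in range(dim)]


def solve_linear(a, b):
    """Solve a small dense system a x = b by Gaussian elimination
    with partial pivoting (assumes a is non-singular)."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def estimate_weights(cluster_means, variances, frames):
    """ML interpolation weights for one Gaussian with diagonal covariance.

    Assumption: all frames belong to this Gaussian, so the weights solve
    (N * M^T S^-1 M) w = M^T S^-1 * sum_t o_t, where the cluster means are
    the columns of M and S is the diagonal covariance.
    """
    num_clusters = len(cluster_means)
    dim = len(variances)
    num_frames = len(frames)
    obs_sum = [sum(f[d] for f in frames) for d in range(dim)]
    g = [[num_frames * sum(cluster_means[i][d] * cluster_means[j][d] / variances[d]
                           for d in range(dim))
          for j in range(num_clusters)]
         for i in range(num_clusters)]
    k = [sum(cluster_means[i][d] * obs_sum[d] / variances[d] for d in range(dim))
         for i in range(num_clusters)]
    return solve_linear(g, k)


# Toy data (hypothetical): two cluster means in three dimensions.
cluster_means = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]
variances = [1.0, 0.5, 2.0]
# Two frames placed symmetrically around the mean given by weights (0.25, 0.75).
frames = [[0.35, 0.55, -0.20], [0.15, 0.95, -0.30]]

weights = estimate_weights(cluster_means, variances, frames)
speaker_mean = interpolate_mean(cluster_means, weights)
```

Only one weight per cluster is estimated for the speaker, which is why very little adaptation data is needed. A full system would accumulate the same statistics over all Gaussian components, weighted by their occupation probabilities.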