Improved speaker model migration via stochastic synthesis
Abstract
Model migration in speaker recognition is the task of converting parametrically obsolete models to new structures and configurations without having to store the original speech waveforms or feature-vector sequences alongside the models. The need for migration arises in large-scale deployments of speaker recognition technology, where the potential for legacy problems grows as the evolving technology may require configuration changes that invalidate existing user voice accounts. Migration may be the only alternative to otherwise costly user re-enrollment or waveform storage and, as a new research problem, poses the challenge of developing algorithms that minimize the loss in accuracy of the migrated accounts. This paper reports further enhancements of a previously introduced statistical migration technique based on Gaussian Mixture Models. The approach stochastically synthesizes feature sequences from the obsolete models and uses them to train the new models. Here, in addition to the Gaussian means and priors utilized in the previous contribution, the covariances are also included, yielding significant performance gains in the migrated models over the mean-only method. Overall, measured on the NIST 2003 cellular task, the described algorithm achieves a model migration with a performance loss of 8-20% relative to full re-enrollment from waveforms, depending on the type of mismatch between the obsolete and new configurations. Including the covariance information is shown to reduce the performance loss by a factor of 3-4 compared to the baseline mean-only migration technique. © 2005 IEEE.
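To make the synthesis step concrete, the following is a minimal sketch of drawing feature vectors from an obsolete diagonal-covariance GMM using its priors, means, and covariances; the synthesized vectors would then be passed to the new configuration's training procedure. The function name and the diagonal-covariance assumption are illustrative, not taken from the paper; the mean-only baseline corresponds to omitting the noise term.

```python
import numpy as np

def synthesize_features(weights, means, covars, n_samples, seed=None):
    """Stochastically synthesize feature vectors from a diagonal-covariance GMM.

    weights: (K,) mixture priors summing to 1
    means:   (K, D) component mean vectors
    covars:  (K, D) diagonal covariance entries (per-dimension variances)
    Returns an (n_samples, D) array of synthetic feature vectors.
    """
    rng = np.random.default_rng(seed)
    # Pick a mixture component for each sample according to the priors.
    comps = rng.choice(len(weights), size=n_samples, p=weights)
    # Sample from the chosen Gaussians: mean plus covariance-scaled noise.
    # Dropping the noise term reduces this to the mean-only baseline.
    noise = rng.standard_normal((n_samples, means.shape[1]))
    return means[comps] + noise * np.sqrt(covars[comps])
```

The synthetic sequence acts as a stand-in for the unavailable enrollment features, so any new front-end or model structure can be trained on it as if it were real data.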