E. Eide, B. Maison, et al.
ICSLP 2000
We describe our formulation of transformation-enhanced data modeling, which we use to develop a multi-grained data analysis approach to text-independent speaker recognition. The broad goal is to address difficulties caused by sparse training and test data. First, we detail our development of maximum-likelihood transformation-based recognition with diagonally constrained Gaussian mixture models, and give results showing its robustness to decreasing amounts of training data. Then, using these models as building blocks, we develop a multi-grained model structure. For this, the training data must be labeled, e.g., with an HMM-based phone labeler. A graduated phone class structure is then used to train the speaker model at various levels of detail. This structure is a tree whose root node contains all the phones; subsequent levels partition the phones into increasingly fine-grained linguistic classes. We demonstrate the effectiveness of the modeling with identification and verification experiments.
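The graduated phone class structure described in the abstract can be pictured with a small sketch. This is only an illustration under stated assumptions, not the authors' implementation: the class name `PhoneClassNode`, the helper `frames_per_node`, the six-phone inventory, and the particular class split are all hypothetical, and no Gaussian mixture models are trained here. The sketch only shows how phone-labeled frames could be routed to every node, from the root down to the finest class, so that a model at each level of detail sees the data relevant to it.

```python
from dataclasses import dataclass, field


@dataclass
class PhoneClassNode:
    """One node of a phone class tree (hypothetical structure)."""
    name: str
    phones: frozenset
    children: list = field(default_factory=list)

    def add_child(self, name, phones):
        # A child must cover a subset of its parent's phones.
        phones = frozenset(phones)
        if not phones <= self.phones:
            raise ValueError(f"{name} contains phones outside {self.name}")
        child = PhoneClassNode(name, phones)
        self.children.append(child)
        return child

    def path_for(self, phone):
        # Names of the nodes containing `phone`, from the root downward.
        if phone not in self.phones:
            return []
        for child in self.children:
            sub = child.path_for(phone)
            if sub:
                return [self.name] + sub
        return [self.name]


def frames_per_node(root, labeled_frames):
    # Count how many labeled frames would be available to each node's model.
    counts = {}
    for phone, _frame in labeled_frames:
        for name in root.path_for(phone):
            counts[name] = counts.get(name, 0) + 1
    return counts


# Assumed toy inventory and class split, for illustration only.
root = PhoneClassNode("all", frozenset({"aa", "iy", "s", "z", "m", "n"}))
vowels = root.add_child("vowels", {"aa", "iy"})
consonants = root.add_child("consonants", {"s", "z", "m", "n"})
consonants.add_child("fricatives", {"s", "z"})
consonants.add_child("nasals", {"m", "n"})
```

With this toy tree, `root.path_for("s")` returns `["all", "consonants", "fricatives"]`. Because every frame also counts toward the root, the coarsest level always has the most data, which is one way to read the abstract's point about coping with sparse training data.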
Sabine Deligne
ICSLP 2000
Deepak S. Turaga, Olivier Verscheure, et al.
ICDM 2006
Jiří Navrátil, David Klusáček
ICASSP 2007