Maximum margin linear kernel optimization for speaker verification
Abstract
This paper describes a novel approach to discriminative modeling and its application to automatic text-independent speaker verification. The approach estimates a low-dimensional linear kernel that maximizes the margin between the model scores for pairs of utterances belonging to the same speaker and those for pairs of utterances belonging to different speakers. It thereby emphasizes the features that best discriminate between same-speaker and different-speaker scores. We apply this approach to the NIST 2005 speaker verification task. Compared to the Gaussian mixture model (GMM) baseline system, it yields a 17.7% relative improvement in the minimum detection cost function (DCF) and an 11.7% relative improvement in equal error rate (EER). Applied on top of a nuisance attribute projection (NAP) compensated GMM-based kernel baseline system, it also yields a 5.7% relative improvement in EER and a 2.3% relative improvement in DCF. ©2009 IEEE.
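As an illustration only, the idea of learning a low-dimensional linear kernel that separates same-speaker from different-speaker pair scores can be sketched on toy data. The sketch below is not the paper's algorithm: it assumes a hinge loss with unit margin, batch subgradient descent, and a score of the form (Ax)·(Ay), none of which are specified in the abstract. The names `project`, `score`, `hinge_loss`, and `train`, the 2-D toy utterance vectors, the step size, and the initial matrix are all hypothetical choices made for the example.

```python
from itertools import combinations

def project(A, x):
    # Map x into the low-dimensional space: one output value per row of A.
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def score(A, x, y):
    # Linear kernel in the projected space: (A x) . (A y).
    px, py = project(A, x), project(A, y)
    return sum(u * v for u, v in zip(px, py))

def hinge_loss(A, pairs):
    # t = +1 for same-speaker pairs, -1 for different-speaker pairs.
    return sum(max(0.0, 1.0 - t * score(A, x, y)) for x, y, t in pairs)

def train(A, pairs, lr=0.02, steps=100):
    # Batch subgradient descent on the summed hinge loss (assumed objective).
    # The best iterate is kept because subgradient steps need not be monotone.
    best_A, best_loss = [row[:] for row in A], hinge_loss(A, pairs)
    for _ in range(steps):
        grad = [[0.0] * len(row) for row in A]
        for x, y, t in pairs:
            if 1.0 - t * score(A, x, y) > 0.0:  # pair violates the margin
                px, py = project(A, x), project(A, y)
                for k in range(len(A)):
                    for j in range(len(x)):
                        grad[k][j] -= t * (py[k] * x[j] + px[k] * y[j])
        A = [[a - lr * g for a, g in zip(row, g_row)]
             for row, g_row in zip(A, grad)]
        loss = hinge_loss(A, pairs)
        if loss < best_loss:
            best_A, best_loss = [row[:] for row in A], loss
    return best_A

# Toy utterance vectors (hypothetical): dimension 0 carries speaker identity,
# dimension 1 is a nuisance direction with larger magnitude.
utterances = [("spk1", [1.0, 2.0]), ("spk1", [1.0, -2.0]),
              ("spk2", [-1.0, 2.0]), ("spk2", [-1.0, -2.0])]
pairs = [(x, y, 1 if sx == sy else -1)
         for (sx, x), (sy, y) in combinations(utterances, 2)]

A0 = [[0.5, 0.5]]            # rank-1 projection, arbitrary starting point
A_learned = train(A0, pairs)

same = [score(A_learned, x, y) for x, y, t in pairs if t == 1]
diff = [score(A_learned, x, y) for x, y, t in pairs if t == -1]
print("learned projection:", A_learned)
print("same-speaker scores:", same)
print("different-speaker scores:", diff)
```

In this toy setting the raw dot product is dominated by the nuisance dimension, so a useful projection has to down-weight it in favor of the speaker-bearing dimension. A real system would operate on GMM-derived vectors and a much larger pair set, and would need regularization and a tuned decision threshold.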