Score stabilization for speaker recognition trained on a small development set

Hagai Aronowitz

INTERSPEECH 2015

Conference paper

06 Sep 2015

Abstract

State-of-the-art speaker recognition systems obtain accurate results for both text-independent and text-dependent tasks, provided they are trained on a fair amount of development data from the target domain (assuming clean speech). In this work, we address the challenge of building a speaker recognition system with a small development dataset from the target domain, without using any out-of-domain data. When development data is limited, the Nuisance Attribute Projector (NAP) algorithm is generally superior to the i-vector approach. We investigated the relative degradation contributed by the different components of a NAP system trained on a small dataset and conclude that score normalization is a major source of degradation. We introduce a novel method for stabilizing the normalized scores. We explicitly estimate a low-dimensional subspace in supervector space that accounts for high variability in the score normalization parameters, and then compensate for the estimated subspace. We report experiments on both text-dependent and text-independent tasks which validate our method and show large error reductions.
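The compensation step can be illustrated with a minimal sketch of a generic subspace projection, of the kind NAP-style methods apply to supervectors. The abstract does not say how the subspace is estimated, so the sketch assumes the directions to remove are already given. The function names `orthonormalize` and `compensate` and the toy three-dimensional vectors are hypothetical, and this is not the paper's implementation.

```python
import math

def orthonormalize(directions, tol=1e-12):
    """Gram-Schmidt: return an orthonormal basis spanning `directions`.

    `directions` stands in for an estimated subspace (assumption: how it
    is estimated is outside this sketch).
    """
    basis = []
    for v in directions:
        w = list(v)
        # Subtract the components along the basis vectors found so far.
        for b in basis:
            dot = sum(wi * bi for wi, bi in zip(w, b))
            w = [wi - dot * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm > tol:  # skip directions that are linearly dependent
            basis.append([wi / norm for wi in w])
    return basis

def compensate(x, basis):
    """Remove the component of x lying in span(basis): x - U U^T x."""
    out = list(x)
    for b in basis:
        dot = sum(xi * bi for xi, bi in zip(x, b))
        out = [oi - dot * bi for oi, bi in zip(out, b)]
    return out

# Toy example: one direction to remove from a 3-dimensional "supervector".
basis = orthonormalize([[1.0, 1.0, 0.0]])
cleaned = compensate([3.0, 1.0, 2.0], basis)
```

After compensation, `cleaned` has no component along the removed direction, while the part of the vector orthogonal to it is unchanged.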
