Speaker age estimation on conversational telephone speech using senone posterior based i-vectors
Automatic age estimation from speech has a variety of applications, including natural human-computer interaction, targeted advertising, customer-agent pairing in call centers, and forensics. Recently, i-vectors have shown promise for automatic age estimation. In this paper, we adopt a phonetically-aware, senone posterior based i-vector extractor for the age estimation problem. Such senone i-vector based schemes have demonstrated success in the speaker recognition field. The fixed-length, low-dimensional i-vectors are first conditioned through a linear discriminant analysis (LDA) transform and then used to train a support vector regression (SVR) model. Additionally, in contrast to previous work, we use the logarithm of the age as the SVR training target, which penalizes estimation errors for younger speakers more heavily than for older speakers. The proposed system is evaluated using telephony speech material extracted from the NIST SRE 2008 and 2010 evaluation corpora. Experimental results indicate solid age estimation performance, with a mean absolute error (MAE) of 4.7 years for both male and female speakers on the NIST SRE 2010 telephony test set.
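The effect of the logarithmic target can be illustrated with a small worked example. The sketch below is not the system described above: the ages are made up for illustration, and the helper names (`log_age_error`, `mean_absolute_error`, `to_years`) are hypothetical. It only shows that the same error in years produces a larger residual in the log domain for a younger speaker, and how MAE in years would be computed once log-domain predictions are mapped back with the exponential.

```python
import math


def log_age_error(true_age, predicted_age):
    """Absolute residual in the log-age domain (the regression target space)."""
    return abs(math.log(predicted_age) - math.log(true_age))


def to_years(log_age_prediction):
    """Map a log-domain prediction back to an age in years."""
    return math.exp(log_age_prediction)


def mean_absolute_error(true_ages, predicted_ages):
    """MAE in years over paired lists of true and predicted ages."""
    total = sum(abs(t - p) for t, p in zip(true_ages, predicted_ages))
    return total / len(true_ages)


# Hypothetical speakers: both estimates are off by exactly 5 years.
young_residual = log_age_error(20.0, 25.0)  # log(25/20)
old_residual = log_age_error(60.0, 65.0)    # log(65/60)

# In years the two errors are identical, so the MAE is 5 years.
mae_years = mean_absolute_error([20.0, 60.0], [25.0, 65.0])

# Round trip: a log-domain prediction converts back to years via exp.
recovered_age = to_years(math.log(25.0))
```

Here `young_residual` is larger than `old_residual` although both estimates miss by 5 years, which is the sense in which a log-age target weights errors on younger speakers more heavily.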