Publication
ICASSP 2016
Conference paper
Speaker age estimation on conversational telephone speech using senone posterior based i-vectors
Abstract
Automatic age estimation from speech has a variety of applications, including natural human-computer interaction, targeted advertising, customer-agent pairing in call centers, and forensics. Recently, i-vectors have shown promise for automatic age estimation. In this paper, we adopt a phonetically-aware i-vector extractor for the age estimation problem. Such senone-based i-vector schemes have demonstrated success in the speaker recognition field. Fixed-length, low-dimensional i-vectors are first conditioned through a linear discriminant analysis (LDA) transform and then used to train a support vector regression (SVR) model. Additionally, in contrast to previous work, we use the logarithm of the age as the target in training the SVR, which penalizes estimation errors more heavily for younger speakers than for older speakers. The proposed system is evaluated using telephony speech material extracted from the NIST SRE 2008 and 2010 evaluation corpora. Experimental results indicate solid age estimation performance, with a mean absolute error (MAE) of 4.7 years for both male and female speakers on the NIST SRE 2010 telephony test set.
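The effect of the log-age target and the MAE metric can be illustrated with a small sketch. It is not the paper's implementation: it omits the i-vector extractor, the LDA transform, and the SVR itself, and the function names and example ages are hypothetical. It only shows that the same error in years gives a larger difference in the log domain for a younger speaker, and how an MAE in years would be computed once predictions are mapped back from the log domain.

```python
import math


def log_target(age_years):
    """Map an age in years to a log-domain regression target."""
    return math.log(age_years)


def age_from_prediction(log_age):
    """Invert the log target to recover an age estimate in years."""
    return math.exp(log_age)


def log_domain_error(true_age, predicted_age):
    """Absolute difference between two ages after the log transform."""
    return abs(log_target(true_age) - log_target(predicted_age))


def mean_absolute_error(true_ages, predicted_ages):
    """MAE in years between reference and estimated ages."""
    if len(true_ages) != len(predicted_ages) or not true_ages:
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(abs(t - p) for t, p in zip(true_ages, predicted_ages))
    return total / len(true_ages)


# Hypothetical ages (not data from the paper): the same 5-year
# overestimate for a younger and an older speaker.
young_err = log_domain_error(20.0, 25.0)  # |log(20) - log(25)|
old_err = log_domain_error(60.0, 65.0)    # |log(60) - log(65)|

# In years, both errors are identical, so the MAE treats them equally.
mae_years = mean_absolute_error([20.0, 60.0], [25.0, 65.0])
```

Here `young_err` exceeds `old_err` even though both estimates are off by five years, which is the sense in which a log-domain target weights errors on younger speakers more heavily. A regressor trained this way would output log-ages, so `age_from_prediction` would be applied before computing the MAE in years.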