Intersession variability compensation for language detection

Xi Zhou; Jiří Navrátil; Jason W. Pelecanos; Ganesh N. Ramaswamy; Thomas S. Huang

doi:10.1109/ICASSP.2008.4518570

ICASSP 2008

Conference paper

16 Sep 2008

Abstract

Gaussian mixture models (GMMs) have become one of the standard acoustic approaches to language detection. They are typically used to produce a log-likelihood ratio (LLR) verification statistic. In this framework, intersession variability within each language degrades accuracy. To address this problem, we formulate the LLR as a function of the GMM parameters concatenated into normalized mean supervectors, and estimate the distribution of each language in this high-dimensional supervector space. The goal is to de-emphasize the directions with the largest intersession variability. We compare this method with two other popular intersession variability compensation methods, Nuisance Attribute Projection (NAP) and Within-Class Covariance Normalization (WCCN). Experiments on the NIST LRE 2003 and NIST LRE 2005 speech corpora show that the presented technique reduces the error by 50% relative to the baseline and performs competitively with the NAP and WCCN approaches. Fusion results with a phonotactic component are also presented. ©2008 IEEE.
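
The abstract does not give formulas for the proposed supervector-space method, so the following is not the authors' implementation. It is a minimal sketch of the two comparison techniques, NAP and WCCN, in their commonly described form, applied to small synthetic vectors that stand in for GMM mean supervectors. All function names, the synthetic data, and the regularization constant `eps` are assumptions made for illustration.

```python
import numpy as np

def within_class_scatter(X, labels):
    """Pooled within-class covariance of row-vector samples X."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    centered = np.empty_like(X)
    for lab in np.unique(labels):
        mask = labels == lab
        # Remove each class (language) mean so only session variation remains.
        centered[mask] = X[mask] - X[mask].mean(axis=0)
    return centered.T @ centered / len(X)

def nap_projection(X, labels, k):
    """NAP-style projection P = I - U U^T, where the columns of U are the
    k eigenvectors of the within-class covariance with largest eigenvalues."""
    W = within_class_scatter(X, labels)
    _, eigvecs = np.linalg.eigh(W)   # eigenvalues returned in ascending order
    U = eigvecs[:, -k:]              # top-k nuisance directions
    return np.eye(W.shape[0]) - U @ U.T

def wccn_transform(X, labels, eps=1e-6):
    """WCCN-style matrix B with B B^T = W^{-1}; mapping x -> B^T x makes
    the within-class covariance approximately the identity."""
    W = within_class_scatter(X, labels)
    W += eps * np.eye(W.shape[0])    # assumed regularizer for invertibility
    return np.linalg.cholesky(np.linalg.inv(W))

# Synthetic stand-in for supervectors: two "languages" separated along
# axis 0, with strong session variability along axis 1 (hypothetical data).
rng = np.random.default_rng(0)
d, n_per_lang = 6, 200
nuisance = np.zeros(d)
nuisance[1] = 1.0
means = {0: np.zeros(d), 1: 2.0 * np.eye(d)[0]}
X_parts, y_parts = [], []
for lang, mu in means.items():
    session = rng.normal(scale=3.0, size=(n_per_lang, 1)) * nuisance
    noise = rng.normal(scale=0.1, size=(n_per_lang, d))
    X_parts.append(mu + session + noise)
    y_parts.append(np.full(n_per_lang, lang))
X = np.vstack(X_parts)
labels = np.concatenate(y_parts)

P = nap_projection(X, labels, k=1)   # removes the dominant nuisance direction
B = wccn_transform(X, labels)        # rescales all within-class directions
X_nap = X @ P                        # P is symmetric, so rows map as P x
X_wccn = X @ B                       # rows map as B^T x
```

The design difference is that NAP discards the top `k` nuisance directions outright, whereas WCCN keeps every direction but down-weights those with high within-class variance. The latter is closer in spirit to the abstract's stated goal of de-emphasizing high-variability directions.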
