Speaker recognition using kernel-PCA and intersession variability modeling

Hagai Aronowitz

INTERSPEECH 2007

Conference paper

27 Aug 2007


Abstract

This paper presents a new method for text-independent speaker recognition. We embed both training and test sessions into a session space, which is a direct sum of a common-speaker subspace and a speaker-unique subspace. The common-speaker subspace is Euclidean and is spanned by a set of reference sessions. Kernel-PCA is used to explicitly embed sessions into the common-speaker subspace, which typically captures attributes that are common to many speakers. The speaker-unique subspace is the orthogonal complement of the common-speaker subspace and typically captures attributes that are unique to a speaker. We model intersession variability in the common-speaker subspace and combine it with the information that exists in the speaker-unique subspace. Our suggested framework leads to a 43.5% reduction in error rate compared to a Gaussian Mixture Model (GMM) baseline.
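The direct-sum decomposition described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation, and several things here are assumptions made for illustration: sessions are represented as short explicit vectors, the kernel is a plain dot product (`linear_kernel` is a stand-in, since the abstract does not specify the session kernel), and a Cholesky factorization of the reference Gram matrix is used in place of the eigendecomposition that kernel-PCA would perform. Both factorizations give coordinates in an orthonormal basis of the same subspace and differ only by a rotation, so norms and inner products agree. The helper names (`cholesky`, `forward_solve`, `embed`) are hypothetical.

```python
import math

def linear_kernel(a, b):
    """Stand-in kernel (assumption): dot product of two session vectors."""
    return sum(x * y for x, y in zip(a, b))

def cholesky(K):
    """Lower-triangular L with K = L L^T; K must be symmetric positive definite."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def forward_solve(L, b):
    """Solve L c = b by forward substitution."""
    c = []
    for i in range(len(b)):
        c.append((b[i] - sum(L[i][k] * c[k] for k in range(i))) / L[i][i])
    return c

def embed(session, references, kernel=linear_kernel):
    """Split one session into its two components using kernel values only.

    Returns (coords, unique_energy):
      coords        - coordinates in an orthonormal basis of the subspace
                      spanned by the reference sessions (common-speaker part)
      unique_energy - squared norm of the component orthogonal to that
                      subspace (speaker-unique part)
    """
    K = [[kernel(r, s) for s in references] for r in references]  # Gram matrix
    L = cholesky(K)
    k_x = [kernel(r, session) for r in references]
    coords = forward_solve(L, k_x)
    unique_energy = kernel(session, session) - sum(c * c for c in coords)
    return coords, max(unique_energy, 0.0)  # clamp tiny negative round-off

# Toy data (made up): two reference sessions spanning the x-y plane.
references = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
session = [2.0, 3.0, 4.0]
coords, unique_energy = embed(session, references)
print(coords, unique_energy)
```

In this toy case the references span a plane, so the part of the session lying in that plane becomes the common-speaker coordinates and the remaining energy is attributed to the speaker-unique component. Only kernel evaluations are needed, which is what lets the same computation run with a non-linear session kernel.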
