Publication
ICASSP 2003
Conference paper

Audio-visual speaker recognition using time-varying stream reliability prediction

Abstract

We examine a time-varying, context-dependent information fusion methodology for multi-stream authentication based on audio and video data collected simultaneously during a user's interaction with a system. Scores obtained from the two data streams are combined according to the stability of each stream and its relative local richness, as compared to the training data or derived model. The results show that the proposed technique outperforms the use of audio or video data alone, as well as the fusion of the two data streams by concatenation. Of particular note, the performance improvements are achieved for clean, high-quality speech, whereas previous efforts focused on degraded speech conditions.
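
The abstract does not give the fusion rule itself, so the following is only a minimal sketch of what time-varying score fusion could look like, not the paper's method. It assumes per-frame scores for each stream and a non-negative per-frame reliability value for each stream are already available; the function names, the normalized-weight rule, and the equal-weight fallback are all hypothetical choices made for illustration.

```python
def fuse_scores(audio_scores, video_scores, audio_reliability, video_reliability):
    """Combine per-frame audio and video scores with time-varying weights.

    Assumption (not from the paper): each stream's weight at a frame is its
    reliability divided by the sum of both reliabilities at that frame.
    All four inputs are equal-length sequences of floats, one entry per frame.
    """
    n = len(audio_scores)
    if not (len(video_scores) == len(audio_reliability) == len(video_reliability) == n):
        raise ValueError("all inputs must have the same length")

    fused = []
    for s_a, s_v, r_a, r_v in zip(audio_scores, video_scores,
                                  audio_reliability, video_reliability):
        total = r_a + r_v
        # Hypothetical fallback: equal weights when neither stream is judged reliable.
        w_a = 0.5 if total == 0 else r_a / total
        fused.append(w_a * s_a + (1.0 - w_a) * s_v)
    return fused


def utterance_score(fused):
    """Average the fused per-frame scores into a single utterance-level score."""
    if not fused:
        raise ValueError("no frames to score")
    return sum(fused) / len(fused)
```

In this sketch a frame where the audio reliability is zero is scored by video alone, and vice versa, which is one simple way to let the weighting change over the course of an utterance. A fixed-weight or concatenation-based system would instead treat every frame the same way.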
