Robust audio-visual speech synchrony detection by generalized bimodal linear prediction

Kshitiz Kumar; Jiri Navratil; Etienne Marcheret; Vit Libal; Gerasimos Potamianos

INTERSPEECH 2009

Conference paper

26 Nov 2009

Abstract

We study the problem of detecting audio-visual synchrony in video segments containing a speaker in frontal head pose. The problem has a number of important applications, for example speech source localization, speech activity detection, speaker diarization, speech source separation, and biometric spoofing detection. In particular, we build on earlier work, extending our previously proposed time-evolution model of audio-visual features to include non-causal (future) feature information. As our experiments demonstrate, this significantly improves the robustness of the method to small time-alignment errors between the audio and visual streams. In addition, we compare the proposed model to two approaches known from the literature for audio-visual synchrony detection, namely mutual information and hypothesis testing, and we show that our method is superior to both. Copyright © 2009 ISCA.
