Weighting schemes for audio-visual fusion in speech recognition

Hervé Glotin; Dimitra Vergyri; Chalapathy Neti; Gerasimos Potamianos; Juergen Luettin

doi:10.1109/ICASSP.2001.940795

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

Paper

01 Jan 2001


Abstract

In this work, we demonstrate an improvement in state-of-the-art large-vocabulary continuous speech recognition (LVCSR) performance, under both clean and noisy conditions, by using visual information in addition to the traditional audio stream. We take a decision fusion approach to the audio-visual information: the single-modality (audio-only and visual-only) HMM classifiers are combined to recognize audio-visual speech. More specifically, we tackle the problem of estimating appropriate combination weights for each modality. Two techniques are described. The first uses an automatically extracted estimate of audio stream reliability to modify the weights for each modality (results are reported for both clean and noisy audio). The second is a discriminative model combination approach in which weights on predefined model classes are optimized to minimize word error rate (WER) (results are reported for clean audio only).
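To make the first technique more concrete, the following is a minimal Python sketch of reliability-weighted decision fusion. It is not the authors' implementation. The abstract does not say how the reliability estimate is mapped to stream weights, so the linear mapping, the constraint that the two weights sum to one, the weight bounds, and all names (`audio_weight`, `fuse`, `decide`, and the toy scores) are assumptions chosen for illustration.

```python
def audio_weight(reliability, w_min=0.5, w_max=0.9):
    """Map an audio reliability estimate in [0, 1] to an audio stream weight.

    Assumption: a linear map between w_min and w_max. The source does not
    specify the actual mapping.
    """
    r = min(max(reliability, 0.0), 1.0)  # clamp to [0, 1]
    return w_min + r * (w_max - w_min)


def fuse(audio_loglik, visual_loglik, reliability):
    """Combine per-class log-likelihoods from the two single-modality
    classifiers as a weighted sum. Assumption: weights sum to one."""
    w_a = audio_weight(reliability)
    w_v = 1.0 - w_a
    return {c: w_a * audio_loglik[c] + w_v * visual_loglik[c]
            for c in audio_loglik}


def decide(scores):
    """Return the class with the highest fused score."""
    return max(scores, key=scores.get)


# Hypothetical toy scores: the audio classifier prefers "da",
# the visual classifier strongly prefers "ba".
audio_scores = {"ba": -4.0, "da": -3.0}
visual_scores = {"ba": -1.0, "da": -6.0}

clean_choice = decide(fuse(audio_scores, visual_scores, reliability=1.0))
noisy_choice = decide(fuse(audio_scores, visual_scores, reliability=0.0))
print(clean_choice, noisy_choice)
```

In this toy setup, a high reliability estimate lets the audio stream dominate the decision, while a low estimate shifts weight toward the visual stream and flips the decision. A real system would apply such weights to HMM state or stream likelihoods during decoding rather than to two isolated class scores.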

Conference paper

Large-vocabulary audio-visual speech recognition: A summary of the Johns Hopkins Summer 2000 Workshop

Chalapathy Neti, Gerasimos Potamianos, et al.

MMSP 2001

