About cookies on this site Our websites require some cookies to function properly (required). In addition, other cookies may be used with your consent to analyze site usage, improve the user experience and for advertising. For more information, please review your options. By visiting our website, you agree to our processing of information as described in IBM’sprivacy statement. To provide a smooth navigation, your cookie preferences will be shared across the IBM web domains listed here.
Publication
ICME 2001
Conference paper
A comparison of model and transform-based visual features for audio-visual LVCSR
Abstract
Four different visual speech parameterisation methods are compared on a large vocabulary, continuous, audio-visual speech recognition task using the IBM ViaVoiceTM audio-visual speech database. Three are direct mouth image region based transforms; discrete cosine and wavelet transforms, and principal component analysis. The fourth uses a statistical model of shape and appearance called an active appearance model, to track and obtain model parameters describing the entire face. All parameterisations are compared experimentally using hidden Markov models (HMM's) in a speaker independent test. Visualonly HMM's are used to rescore lattices obtained from audio models trained in noisy conditions.