Publication
INTERSPEECH - Eurospeech 2005
Conference paper
Improving lip-reading with feature space transforms for multi-stream audio-visual speech recognition
Abstract
In this paper we investigate feature space transforms to improve lip-reading performance for multi-stream HMM based audio-visual speech recognition (AVSR). The feature space transforms include a non-linear Gaussianization transform and feature space maximum likelihood linear regression (fMLLR). We apply Gaussianization at various stages of the visual front-end. The results show that Gaussianizing the final visual features achieves the best performance: an 8% gain on lip-reading and a 14% gain on AVSR. We also compare the performance of speaker-based Gaussianization and global Gaussianization. Without fMLLR adaptation, speaker-based Gaussianization yields larger gains on lip-reading and multi-stream AVSR. However, with fMLLR adaptation, global Gaussianization shows better results, achieving an 18% gain over the fMLLR-adapted baseline for AVSR.
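The abstract does not spell out the Gaussianization transform itself. A common formulation, and a plausible reading of the one used here, is rank-based histogram equalization to a standard normal: each feature value is mapped to the Gaussian quantile of its empirical CDF position. A minimal sketch under that assumption (function name and tie handling are illustrative, not from the paper):

```python
from statistics import NormalDist

def gaussianize(values):
    """Rank-based Gaussianization sketch: map each sample to the
    standard-normal quantile of its empirical CDF position.

    In a speaker-based variant, `values` would be one speaker's data
    for a single feature dimension; in a global variant, all speakers'
    data pooled together.
    """
    n = len(values)
    # Sort indices by value; ties are broken arbitrarily in this sketch.
    order = sorted(range(n), key=lambda i: values[i])
    nd = NormalDist()  # standard normal, mean 0 / stdev 1
    out = [0.0] * n
    for rank, i in enumerate(order):
        # (rank + 0.5) / n keeps probabilities strictly inside (0, 1),
        # so inv_cdf never receives 0 or 1.
        out[i] = nd.inv_cdf((rank + 0.5) / n)
    return out
```

Applied per feature dimension, this forces the marginal distribution of the transformed features toward N(0, 1) regardless of the input distribution, which is the property the abstract's speaker-based vs. global comparison hinges on: the normalization statistics come either from one speaker's data or from the whole training set.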