ICIP 2001
Conference paper
Improved ROI and within frame discriminant features for lipreading
Abstract
We study three aspects of designing appearance-based visual features for automatic lipreading: (a) the choice of the video region of interest (ROI) on which image transform features are computed; (b) the extraction of speech-discriminant features at each frame; and (c) the use of temporal information to improve visual speech modeling. In particular, with respect to (a), we propose an ROI that includes the speaker's jaw and cheeks, in addition to the traditionally used mouth/lip region; with respect to (b) and (c), we propose the use of a two-stage linear discriminant analysis, both within each frame and across a large number of frames. On a large-vocabulary, continuous-speech audio-visual database, the proposed visual features result in a 13% absolute reduction in visual-only word error rate over a baseline visual front end, and in an additional 28% relative improvement in audio-visual over audio-only phonetic classification accuracy.
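The two-stage linear discriminant analysis described in the abstract can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: the feature dimensions, number of classes, and the five-frame context window are invented for the example, and a generic eigenvector-based LDA stands in for whatever formulation the paper uses.

```python
import numpy as np

def lda_projection(X, y, n_components):
    """Generic LDA: top eigenvectors of inv(Sw) @ Sb, where Sw and Sb
    are the within- and between-class scatter matrices."""
    classes = np.unique(y)
    mean = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    # Small ridge on Sw for numerical stability (illustrative choice).
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw + 1e-6 * np.eye(d), Sb))
    order = np.argsort(-evals.real)
    return evecs.real[:, order[:n_components]]

# Toy data: 200 frames of 24-dim image-transform features over the ROI,
# each labeled with one of 3 speech classes (all values synthetic).
rng = np.random.default_rng(0)
y = rng.integers(0, 3, size=200)
X = rng.normal(size=(200, 24)) + 0.5 * y[:, None]

# Stage 1: within-frame LDA compresses each frame's features.
W1 = lda_projection(X, y, n_components=6)
Z = X @ W1                                   # (200, 6)

# Stage 2: concatenate a window of neighboring frames to capture
# temporal information, then apply a second LDA across that window.
ctx = np.hstack([np.roll(Z, s, axis=0) for s in range(-2, 3)])  # (200, 30)
W2 = lda_projection(ctx, y, n_components=4)
F = ctx @ W2                                 # final visual features, (200, 4)
```

The key design point the abstract highlights is that discrimination is applied twice: once per frame to reduce dimensionality, and once over a stack of consecutive frames so the final features reflect visual speech dynamics rather than single-frame appearance alone.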