MMSP 2006
Conference paper
Lipreading using profile versus frontal views
Abstract
Visual information from a speaker's mouth region is known to improve the robustness of automatic speech recognition. However, the vast majority of audio-visual automatic speech recognition (AVASR) studies assume frontal images of the speaker's face. In contrast, this paper investigates extracting visual speech information from the speaker's profile view and, to our knowledge, constitutes the first real attempt at this problem. As with any AVASR system, overall recognition performance depends heavily on the visual front end. This is especially true for profile-view data, where the facial features are heavily compacted compared to the frontal scenario. In this paper, we describe our visual front-end approach in particular, and report experiments on a multi-subject, small-vocabulary, bimodal, multi-sensory database containing synchronously captured audio together with frontal and profile face video. Our experiments show that AVASR is possible from profile views, with moderate performance degradation compared to frontal video data. © 2006 IEEE.