About cookies on this site Our websites require some cookies to function properly (required). In addition, other cookies may be used with your consent to analyze site usage, improve the user experience and for advertising. For more information, please review your options. By visiting our website, you agree to our processing of information as described in IBM’sprivacy statement. To provide a smooth navigation, your cookie preferences will be shared across the IBM web domains listed here.
Publication
ICASSP 2007
Conference paper
Dynamic stream weight modeling for audio-visual speech recognition
Abstract
To generate optimal multi-stream audio-visual speech recognition performance, appropriate dynamic weighting of each modality is desired. In this paper, we propose to estimate such weights based on a combination of acoustic signal space observations and singlemodality audio and visual speech model likelihoods. Two modeling approaches are investigated for such weight estimation: one based on a sigmoid fitting function, the other employing Gaussian mixture models. Reported experiments demonstrate that the later approach outperforms sigmoid based modeling, and is dramatically superior to the static weighting scheme. © 2007 IEEE.