Abstract
We describe a multimodal attentive environment system that performs joint audio-visual information processing to interact intelligently with people. It integrates real-time video and audio processing techniques to detect and track multiple persons in a scene. Speech recognition and eye contact provide a natural, human-like communication interface with participants. We have implemented the system as a visually interactive toy robot (VTOYS) and demonstrated it successfully to people across a wide range of ages. This allows us to explore novel human-machine interactions and interfaces, specifically the possibilities that arise when the machine has only a limited ability to perceive its environment.
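To make the joint audio-visual attention idea concrete, the sketch below shows one plausible fusion step: visual person tracks are scored against an estimated sound direction to select whom the system should attend to. The paper does not publish its implementation; every name here (PersonTrack, fuse_attention, the weighting scheme) is a hypothetical illustration, not the authors' method.

```python
# Minimal sketch of an audio-visual attention selection step, assuming
# a face tracker that reports horizontal angles and a sound-source
# localizer that reports a single azimuth estimate.

from dataclasses import dataclass

@dataclass
class PersonTrack:
    track_id: int
    azimuth_deg: float   # horizontal direction of the tracked face
    confidence: float    # detector confidence in [0, 1]

def fuse_attention(tracks, sound_azimuth_deg, audio_weight=0.5):
    """Score each visual track by detector confidence and by how well
    its direction agrees with the estimated sound direction; return
    the best-scoring track (the person the robot should look at)."""
    def score(t):
        # Angular agreement falls off linearly over a 90-degree window.
        agreement = max(0.0, 1.0 - abs(t.azimuth_deg - sound_azimuth_deg) / 90.0)
        return (1 - audio_weight) * t.confidence + audio_weight * agreement
    return max(tracks, key=score) if tracks else None

# Example: two tracked people; speech is localized near the second one.
tracks = [PersonTrack(0, -30.0, 0.9), PersonTrack(1, 20.0, 0.7)]
target = fuse_attention(tracks, sound_azimuth_deg=25.0)
print(f"attend to track {target.track_id}")  # -> attend to track 1
```

A weighted combination like this is only one design choice; a real system would also smooth the decision over time so the robot's gaze does not flicker between people.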