Abstract
Experiments in emotive spoken-language user interfaces are described. The experiments show that optimizing the use of multimodal information, in both the audio and video modalities, for synthesis and for recognition improves recognition accuracy and synthesis quality. Specific topics covered include: speech and emotion recognition by humans; automatic audiovisual speech and emotion recognition; audiovisual speech synthesis; emotive prosody; and emotionally nuanced audiovisual speech.