Publication
ASRU 2003
Conference paper
Expressive speech synthesis using American English ToBI: Questions and contrastive emphasis
Abstract
We describe American English concatenative text-to-speech synthesis experiments in which "expressions," here questioning and contrastive emphasis, are each associated with a ToBI prosodic template. ToBI labels, along with text features, are in turn incorporated into decision-tree models of F0 and segment duration that are used during synthesis, removing the need for large expression-specific corpora and decision trees. With this approach, listeners perform the difficult task of distinguishing yes-no questions from identically worded declarative sentences correctly 78% of the time, compared with 50% for the baseline system. For contrastive emphasis, a sentence is synthesized with emphasis on a word that is chosen appropriately or inappropriately given a preceding sentence. Listeners' mean opinion scores for appropriate emphases exceed those for inappropriate emphases by 0.40 on a 1-to-5 scale for the experimental system, compared with a difference of 0.11 for the baseline, a significant difference between systems (p < 0.01).
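As a rough illustration of the idea of attaching a ToBI template to an expression and exposing the labels as per-word features, the following is a minimal Python sketch. It is not the paper's implementation: the template contents (`H* L-L%` for declaratives, `L* H-H%` for yes-no questions, `L+H*` for contrastive emphasis) are common textbook ToBI patterns assumed here for illustration, and the names `TEMPLATES` and `tobi_features` are hypothetical. A real system would combine such labels with text features and pass them to trained F0 and duration models.

```python
# Hypothetical expression-to-ToBI templates (assumed, not taken from the paper).
TEMPLATES = {
    "declarative": {"accent": "H*", "boundary": "L-L%"},
    "yes_no_question": {"accent": "L*", "boundary": "H-H%"},
    "contrastive": {"accent": "L+H*", "boundary": "L-L%"},
}


def tobi_features(words, expression, focus_index=None):
    """Return one feature dict per word, labeled from the chosen template.

    The pitch accent goes on the word at focus_index (default: last word);
    the phrase accent / boundary tone goes on the final word.
    """
    if not words:
        return []
    template = TEMPLATES[expression]
    last = len(words) - 1
    if focus_index is None:
        focus_index = last
    if not 0 <= focus_index <= last:
        raise IndexError("focus_index is outside the sentence")

    features = []
    for i, word in enumerate(words):
        features.append({
            "word": word,
            "position": i,
            "pitch_accent": template["accent"] if i == focus_index else "none",
            "boundary_tone": template["boundary"] if i == last else "none",
        })
    return features


if __name__ == "__main__":
    sentence = ["you", "saw", "him"]
    for expression in ("declarative", "yes_no_question"):
        print(expression, tobi_features(sentence, expression))
```

In this sketch, the same word sequence yields different feature vectors depending only on the expression, which is the property that would let a single set of prosody models serve several expressions.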