Publication
INTERSPEECH 2010
Conference paper
HMM based TTS for mixed language text
Abstract
When synthesizing Chinese text mixed with English text, it is usually preferable to synthesize the mixed-language content with a single voice. However, the synthesized English of HMM-based TTS may sound unnatural if the models are built directly from a Chinese speaker's non-professional English data. In this paper, we propose using the MLLR speaker adaptation method to leverage a native English speaker's model to generate more natural English for the Chinese speaker. The adapted F0 and spectrum models are used together with the original English speaker's duration models for better prosody. In the synthesis stage, mixed-language content shares a unified prosody tree to improve the continuity between the Chinese and English content. Evaluation results show that the proposed method significantly improves the speaker consistency and naturalness of synthesized speech for mixed-language text compared with using directly built models. © 2010 ISCA.
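As an illustration of the adaptation step mentioned in the abstract, the following is a minimal sketch of the standard MLLR mean transform, in which each Gaussian mean is mapped by a shared affine transform (adapted mean = A · mean + b). The abstract does not give the transform details, so the function names, the 2-D toy means, and the values of `A` and `b` below are all hypothetical; in practice the transform would be estimated from the target speaker's adaptation data rather than set by hand.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def mllr_adapt_mean(mean: Vector, A: Matrix, b: Vector) -> Vector:
    """Apply an MLLR-style affine transform: adapted = A @ mean + b."""
    if len(A) != len(b) or any(len(row) != len(mean) for row in A):
        raise ValueError("shape mismatch between A, b and mean")
    return [
        sum(a_ij * m_j for a_ij, m_j in zip(row, mean)) + b_i
        for row, b_i in zip(A, b)
    ]


def adapt_model(means: List[Vector], A: Matrix, b: Vector) -> List[Vector]:
    """Adapt every Gaussian mean in one regression class with a shared transform."""
    return [mllr_adapt_mean(m, A, b) for m in means]


# Hypothetical 2-D Gaussian means standing in for a native English speaker's model.
english_means = [[1.0, 2.0], [0.0, -1.0]]

# Hypothetical transform (assumed here, not estimated from data).
A = [[1.0, 0.0], [0.0, 2.0]]
b = [0.5, -0.5]

adapted = adapt_model(english_means, A, b)
print(adapted)
```

Sharing one transform across many Gaussians is what lets MLLR work with a small amount of adaptation data: only `A` and `b` need to be estimated, not every mean individually.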