HMM based TTS for mixed language text

Zhiwei Shuang; Shiyin Kang; Yong Qin; Lirong Dai; Lianhong Cai

INTERSPEECH 2010

Conference paper

26 Sep 2010

Abstract

When synthesizing Chinese text mixed with English text, it is usually preferable to synthesize the mixed-language content with a single voice. However, the synthesized English of HMM-based TTS may sound unnatural if the models are built directly from a Chinese speaker's non-professional English data. In this paper, we propose to use the MLLR speaker adaptation method to leverage a native English speaker's model to generate more natural English for the Chinese speaker. The adapted F0 and spectrum models are used together with the original English speaker's duration models for better prosody. In the synthesis stage, the mixed-language content shares a unified prosody tree to improve the continuity between the Chinese and English content. Evaluation results show that the proposed method significantly improves the speaker consistency and naturalness of synthesized speech for mixed-language text compared with using directly built models. © 2010 ISCA.
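
As a rough illustration of the adaptation idea in the abstract, the sketch below applies an MLLR-style affine transform (mean' = A * mean + b) to the Gaussian means of the F0 and spectrum streams of an English speaker's model while leaving the duration stream untouched. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names, the dictionary layout of the model, and all numeric values are hypothetical, and the transforms are given by hand here, whereas in practice they would be estimated by maximum likelihood from the target speaker's adaptation data.

```python
def apply_mllr_mean_transform(mean, A, b):
    """Return A * mean + b for one Gaussian mean vector (pure Python)."""
    return [
        sum(a_ij * m_j for a_ij, m_j in zip(row, mean)) + b_i
        for row, b_i in zip(A, b)
    ]


def adapt_model(english_model, transforms):
    """Adapt only the streams that have a transform; copy the others as-is.

    english_model: hypothetical dict mapping a stream name to a list of
                   Gaussian mean vectors.
    transforms:    hypothetical dict mapping a stream name to (A, b).
    """
    adapted = {}
    for stream, means in english_model.items():
        if stream in transforms:
            A, b = transforms[stream]
            adapted[stream] = [apply_mllr_mean_transform(m, A, b) for m in means]
        else:
            # No transform for this stream (e.g. duration): keep the
            # English speaker's original parameters.
            adapted[stream] = [list(m) for m in means]
    return adapted


# Toy values, invented for illustration only.
english_model = {
    "spectrum": [[1.0, 2.0], [0.5, -1.0]],
    "f0": [[5.0]],
    "duration": [[3.0], [4.0]],
}
transforms = {
    "spectrum": ([[1.0, 0.0], [0.0, 2.0]], [0.5, 0.0]),
    "f0": ([[1.0]], [0.25]),
}

adapted = adapt_model(english_model, transforms)
```

Keeping the duration stream out of `transforms` mirrors the abstract's choice of combining adapted F0 and spectrum models with the original English speaker's duration models.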
