Trainable Cantonese/English dual language speech synthesis system
Abstract
The Cantonese/English dual-language text-to-speech (TTS) system introduced in this paper is built on IBM's trainable TTS technology, which uses trainable statistical models to automate speech data processing and selection. Cantonese and English phonological, syntactic and prosodic rules were built into a dual-language Delta module, which processes mixed-language input and generates mixed Cantonese and English speech with coherent prosody. To approximate the speaker's characteristics, a speaker prosody profile was extracted from the dataset and incorporated into Delta speech rule processing to improve the prediction of duration, lexical tone and intonation. For the selection of the concatenative unit set, experiments were conducted with different Cantonese syllable decomposition schemes. Although the system is currently implemented only for Cantonese, it can be easily adapted to other tonal languages.