Haiping Li, Fangxin Chen, et al.
ICASSP 2003
A statistical selection method is proposed for generating an optimized recording script for a concatenative speech synthesizer. The method first traverses a large text corpus to collect statistics on the Context Variation Unit Vectors (CVUVs), which represent the multi-dimensional phonetic contexts and properties of the synthesis unit. Each CVUV descriptor is stored as a node in a sorted tree of the CVUV forest, recording its dimension values and an index to its position in the corpus. Sentences are then selected according to predefined criteria on the CVUV distribution in the corpus. The selection algorithm has been implemented to generate a syllable-based Chinese script and yielded satisfactory results. The paper describes the definition of the context dimensions and reports a coverage analysis and computing-time estimates.
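The two-pass idea in the abstract (collect context statistics over the corpus, then select sentences against those statistics) can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not state the selection criteria, so a greedy, frequency-weighted coverage rule is assumed here. The context vector is reduced to a hypothetical (left neighbor, unit, right neighbor) triple, a plain dictionary stands in for the sorted-tree CVUV forest, and the names `context_units` and `select_script` are invented for illustration.

```python
from collections import Counter


def context_units(syllables):
    """Yield (left, unit, right) triples for one sentence.

    Assumption: a three-element context stands in for the paper's
    multi-dimensional CVUV; the real dimensions are not given in the abstract.
    """
    padded = ["<s>"] + list(syllables) + ["</s>"]
    for i in range(1, len(padded) - 1):
        yield (padded[i - 1], padded[i], padded[i + 1])


def select_script(corpus, max_sentences):
    """Greedily choose sentence indices that maximize weighted context coverage.

    Assumption: the selection criterion is "largest sum of corpus frequencies
    of not-yet-covered contexts"; the paper's criteria may differ.
    Returns (chosen_indices, fraction_of_distinct_contexts_covered).
    """
    # Pass 1: traverse the corpus and collect context statistics.
    sentence_units = [set(context_units(s)) for s in corpus]
    freq = Counter(u for s in corpus for u in context_units(s))

    # Pass 2: repeatedly take the sentence with the highest remaining gain.
    covered = set()
    chosen = []
    remaining = set(range(len(corpus)))

    def gain(i):
        return sum(freq[u] for u in sentence_units[i] - covered)

    while remaining and len(chosen) < max_sentences:
        # sorted() makes ties resolve to the lowest sentence index.
        best = max(sorted(remaining), key=gain)
        if gain(best) == 0:
            break  # nothing new left to cover
        chosen.append(best)
        covered |= sentence_units[best]
        remaining.discard(best)

    coverage = len(covered) / len(freq) if freq else 1.0
    return chosen, coverage


# Toy corpus of pre-segmented syllables (illustrative data only).
corpus = [["ni", "hao"], ["ni", "hao", "ma"], ["zai", "jian"]]
chosen, coverage = select_script(corpus, max_sentences=2)
```

Each round of this sketch rescans every remaining sentence, which is adequate for a toy corpus; an indexed structure such as the sorted tree mentioned in the abstract would presumably serve to avoid that cost at scale.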
Jennifer C. Lai, Kwan Min Lee
ICSLP 2002
Dan Chazan, Ron Hoory, et al.
ICSLP 2002
Pratibha Jain, Hynek Hermansky, et al.
ICSLP 2002