Applying scalable phonetic context similarity in unit selection of concatenative Text-to-Speech

Wei Zhang; Xiaodong Cui

INTERSPEECH 2010

Conference paper

26 Sep 2010

Abstract

This paper presents an approach that uses phonetic context similarity as a cost function in unit selection for concatenative Text-to-Speech. The approach measures the degree of similarity between the desired context and the candidate segment under different phonetic contexts. It accounts for the influence of relatively distant contexts when plenty of candidates are available, and it can draw on data from other, symbolically different contexts when candidates are sparse. Moreover, the cost function provides an efficient way to prune the search space. Different parameters for modeling, normalization, and integerization are discussed. A MOS evaluation shows that the approach can improve synthesis quality significantly. © 2010 ISCA.
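
The abstract does not give the similarity model itself, so the following is only a minimal sketch of the general idea, under stated assumptions. It scores a candidate by a distance-weighted similarity between its neighbouring phones and the desired ones, maps the cost to an integer, and keeps the lowest-cost candidates. The phone classes, the 0.5 partial-match score, the weights, and the names `phone_similarity`, `context_cost`, `quantize`, and `prune` are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence

# Hypothetical phone classes, used only for illustration.
PHONE_CLASS: Dict[str, str] = {
    "p": "stop", "b": "stop", "t": "stop", "d": "stop",
    "m": "nasal", "n": "nasal",
    "a": "vowel", "i": "vowel", "u": "vowel",
}


def phone_similarity(a: str, b: str) -> float:
    """Return 1.0 for identical phones, 0.5 for the same class, else 0.0."""
    if a == b:
        return 1.0
    class_a, class_b = PHONE_CLASS.get(a), PHONE_CLASS.get(b)
    if class_a is not None and class_a == class_b:
        return 0.5  # assumed partial credit for symbolically different phones
    return 0.0


def context_cost(target: Sequence[str], candidate: Sequence[str],
                 weights: Sequence[float]) -> float:
    """Cost in [0, 1]: one minus the weighted mean similarity per position.

    Each sequence lists neighbouring phones in a fixed order, for example
    [left-2, left-1, right-1, right-2]; smaller weights on the outer
    positions let distant context matter less than adjacent context.
    """
    if not (len(target) == len(candidate) == len(weights)):
        raise ValueError("target, candidate and weights must align")
    total = sum(weights)
    similarity = sum(w * phone_similarity(t, c)
                     for w, t, c in zip(weights, target, candidate))
    return 1.0 - similarity / total


def quantize(cost: float, levels: int = 255) -> int:
    """Map a cost in [0, 1] to an integer in [0, levels]."""
    return int(round(cost * levels))


def prune(target: Sequence[str], candidates: List[Sequence[str]],
          weights: Sequence[float], keep: int) -> List[Sequence[str]]:
    """Keep the `keep` candidates with the lowest context cost."""
    ranked = sorted(candidates, key=lambda c: context_cost(target, c, weights))
    return ranked[:keep]
```

In this sketch a candidate recorded in a symbolically different but similar context (for example "d" in place of "t") still receives partial credit, which is one way sparse data could be reused; the actual modeling, normalization, and integerization choices are those discussed in the paper, not the ones assumed here.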
