Small footprint concatenative text-to-speech synthesis system using complex spectral envelope modeling

Dan Chazan; Ron Hoory; Zvi Rons; Ariel Sagi; Slava Shechtman; Alexander Sorin

INTERSPEECH - Eurospeech 2005

Conference paper

01 Dec 2005

Abstract

In this paper we present a method for speech modeling and its utilization in IBM's small footprint concatenative text-to-speech system. The method is based on frequency-domain complex spectral envelope modeling, in which the phase component plays a crucial role in attaining high-quality speech synthesis. The modeling scheme enables low-bit-rate compression of the amplitude and phase information and low-complexity reconstruction of high-quality speech with wide-range pitch modification. Listening tests conducted on the overall text-to-speech system show a major improvement in MOS compared with a previous MFCC-based system.
