Synthesizing breathiness in natural speech with sinusoidal modelling
Abstract
This paper discusses recent work in synthesizing a breathy voice quality in pre-recorded speech, which has applications in voice morphing and concatenative text-to-speech (TTS) synthesis. Previous work has shown that breathiness in speech is characterized in part by the presence of random noise in the upper region of the spectrum [1]. The sinusoidal modelling representation of speech facilitates high-quality modifications to speech signals and allows regions of the spectrum to be modified independently. We use sinusoidal modelling, along with techniques borrowed from analog communication systems, to simulate aspiration noise in wideband speech signals above a lower cutoff frequency. Specifically, we use techniques based on amplitude modulation (AM) and phase modulation (PM), with the harmonics from the sinusoidal model of speech as carriers and lowpass random noise as the message signal. In formal listening tests, listeners rated the synthesized speech as "breathy" more often than natural non-breathy speech, but significantly less often than naturally breathy speech.
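As an illustration of the modulation idea only, the following minimal sketch applies AM or PM to the harmonics of a stationary harmonic signal at or above a cutoff frequency, using lowpass noise as the message. It is not the system evaluated in the paper. The function names (`lowpass_noise`, `synthesize`), the moving-average lowpass filter, the single noise signal shared by all harmonics, the fixed fundamental frequency, the zero initial phases, and the `depth` parameter are all assumptions made for the example.

```python
import math
import random


def lowpass_noise(n_samples, smooth=8, seed=0):
    """White Gaussian noise smoothed by a moving average (a crude lowpass
    filter, assumed here for simplicity), scaled to a peak magnitude of 1."""
    rng = random.Random(seed)
    white = [rng.gauss(0.0, 1.0) for _ in range(n_samples + smooth - 1)]
    smoothed = [sum(white[i:i + smooth]) / smooth for i in range(n_samples)]
    peak = max(abs(v) for v in smoothed)
    return [v / peak for v in smoothed]


def synthesize(f0, amps, fs, n_samples, cutoff_hz, mode="am", depth=0.5, seed=0):
    """Sum of harmonics k*f0 with amplitudes amps[k-1].

    Harmonics at or above cutoff_hz act as carriers and are modulated by
    lowpass noise; harmonics below the cutoff are left untouched.
    mode="am" scales the carrier amplitude by (1 + depth * noise);
    mode="pm" adds depth * noise (in radians) to the carrier phase.
    """
    noise = lowpass_noise(n_samples, seed=seed)
    out = [0.0] * n_samples
    for k, a in enumerate(amps, start=1):
        fk = k * f0
        modulate = fk >= cutoff_hz
        for n in range(n_samples):
            phase = 2.0 * math.pi * fk * n / fs
            if modulate and mode == "am":
                out[n] += a * (1.0 + depth * noise[n]) * math.cos(phase)
            elif modulate and mode == "pm":
                out[n] += a * math.cos(phase + depth * noise[n])
            else:
                out[n] += a * math.cos(phase)
    return out


if __name__ == "__main__":
    fs, f0 = 16000, 200.0
    amps = [1.0 / k for k in range(1, 31)]  # 30 harmonics, up to 6 kHz
    breathy_am = synthesize(f0, amps, fs, 512, cutoff_hz=2000.0, mode="am")
    breathy_pm = synthesize(f0, amps, fs, 512, cutoff_hz=2000.0, mode="pm")
    print(len(breathy_am), len(breathy_pm))
```

In this sketch the cutoff decides which harmonics carry noise, so the lower part of the spectrum stays unmodified. With real speech, the harmonic amplitudes, frequencies, and phases would come from frame-by-frame sinusoidal analysis rather than being fixed.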