Speech reconstruction from mel frequency cepstral coefficients and pitch frequency

Dan Chazan; Ron Hoory; Gilad Cohen; Meir Zibulski

doi:10.1109/ICASSP.2000.861816

ICASSP 2000

Conference paper

05 Jun 2000

Abstract

This paper presents a novel low-complexity, frequency-domain algorithm for reconstructing speech from the mel-frequency cepstral coefficients (MFCC) commonly used by speech recognition systems, together with the pitch frequency values. The reconstruction technique is based on the sinusoidal speech representation. A set of sine-wave frequencies is derived from the pitch frequency and voicing decisions, and a synthetic phase is then assigned to each sine wave. The sine-wave amplitudes are generated by sampling a linear combination of frequency-domain basis functions. The basis function gains are determined such that the mel-frequency binned spectrum of the reconstructed speech is similar to the mel-frequency binned spectrum obtained from the original MFCC vector by IDCT and antilog operations. This procedure yields natural-sounding, intelligible speech of good quality.
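As a rough illustration of three steps named in the abstract, the sketch below recovers a mel-binned spectrum from an MFCC vector by IDCT and antilog, derives harmonic sine-wave frequencies from a pitch value, and sums sinusoids for one frame. It is not the authors' implementation. The orthonormal DCT-II convention, the natural logarithm, the restriction to harmonics strictly below the Nyquist frequency, and all function names are assumptions made for illustration. The estimation of basis function gains and the assignment of synthetic phases are not shown, so amplitudes and phases are simply passed in.

```python
import math


def dct_ii(x):
    """Orthonormal DCT-II (assumed MFCC convention; real front ends may differ)."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        acc = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        out.append(scale * acc)
    return out


def idct_ii(coeffs, n_bins):
    """Inverse of dct_ii. Coefficients beyond len(coeffs) are treated as zero,
    which mirrors the usual truncation of the MFCC vector."""
    out = []
    for i in range(n_bins):
        acc = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt(1.0 / n_bins) if k == 0 else math.sqrt(2.0 / n_bins)
            acc += scale * c * math.cos(math.pi * (i + 0.5) * k / n_bins)
        out.append(acc)
    return out


def mfcc_to_mel_spectrum(mfcc, n_bins):
    """IDCT followed by antilog (natural exponential assumed)."""
    return [math.exp(v) for v in idct_ii(mfcc, n_bins)]


def harmonic_frequencies(f0_hz, sample_rate_hz):
    """Multiples of the pitch frequency strictly below Nyquist (voiced frame)."""
    freqs = []
    k = 1
    while k * f0_hz < sample_rate_hz / 2:
        freqs.append(k * f0_hz)
        k += 1
    return freqs


def synthesize_frame(freqs_hz, amps, phases, n_samples, sample_rate_hz):
    """Sum of cosines with the given frequencies, amplitudes, and phases."""
    return [
        sum(
            a * math.cos(2.0 * math.pi * f * t / sample_rate_hz + p)
            for f, a, p in zip(freqs_hz, amps, phases)
        )
        for t in range(n_samples)
    ]
```

With a full-length MFCC vector the IDCT and antilog steps invert the forward transform exactly. With a truncated vector, as is typical in recognition front ends, they return a smoothed approximation of the mel-binned spectrum.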