Publication
ICASSP 2017
Conference paper

Voice-transformation-based data augmentation for prosodic classification

Abstract

In this work we explore data-augmentation techniques for improving the performance of a supervised recurrent-neural-network classifier that predicts prosodic-boundary and pitch-accent labels. The technique applies voice transformations to the training data, modifying the pitch baseline and range as well as the vocal-tract and vocal-source characteristics of the speakers, to generate additional training examples. We demonstrate the validity of the approach both when the amount of labeled training data is small (showing error-rate reductions in the 7%-12% range under reduced-data conditions) and in its generalization to speakers unseen in the training set (showing average relative error-rate reductions of 8.74% and 4.75% for the boundary and accent tasks, respectively, in leave-one-speaker-out validation).
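
To make the augmentation idea concrete, here is a minimal sketch of one component: generating pitch-shifted copies of a training utterance that reuse the original prosodic labels. This is not the paper's implementation. It assumes librosa-style pitch shifting, the function name augment_with_pitch_shifts and the semitone step values are illustrative, and it approximates only the pitch-baseline modification; the pitch-range, vocal-tract, and vocal-source transformations described in the abstract would typically require a vocoder-based resynthesis.

import librosa

def augment_with_pitch_shifts(wav_path, semitone_steps=(-2, -1, 1, 2)):
    """Generate pitch-shifted copies of one training utterance.

    A constant shift in semitones moves the pitch baseline of the
    whole utterance. Each shifted copy inherits the original
    boundary/accent labels, since those annotations are tied to
    word positions rather than to absolute pitch values.
    """
    y, sr = librosa.load(wav_path, sr=None)  # keep native sample rate
    augmented = []
    for n_steps in semitone_steps:
        shifted = librosa.effects.pitch_shift(y, sr=sr, n_steps=n_steps)
        augmented.append((n_steps, shifted))
    return augmented

In a training pipeline, each returned waveform would be passed through the same feature extraction as the original audio, multiplying the number of labeled examples by the number of transformations applied.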

Date

16 Jun 2017
