Publication
ICASSP 2017
Conference paper
Voice-transformation-based data augmentation for prosodic classification
Abstract
In this work we explore data-augmentation techniques for improving the performance of a supervised recurrent-neural-network classifier that predicts prosodic-boundary and pitch-accent labels. The technique applies voice transformations to the training data, modifying the pitch baseline and range as well as the speakers' vocal-tract and vocal-source characteristics, to generate additional training examples. We demonstrate the validity of the approach both when the amount of labeled training data is small (showing error-rate reductions in the range of 7%-12% under reduced-data conditions) and in terms of generalization to speakers unseen in the training set (showing average relative error-rate reductions of 8.74% and 4.75% for the boundary and accent tasks, respectively, in leave-one-speaker-out validation).
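The paper itself does not include code, but the core idea, perturbing the pitch and vocal-tract characteristics of existing utterances to synthesize additional training examples, can be sketched roughly as below. This is a minimal illustration using off-the-shelf librosa effects, not the authors' voice-transformation pipeline; the file names, perturbation ranges, and the crude resampling-based vocal-tract-length approximation are all hypothetical choices for illustration only.

```python
# Illustrative sketch of voice-transformation data augmentation (assumed
# parameters; not the paper's implementation).
import librosa
import soundfile as sf

# Hypothetical input utterance from one training speaker.
y, sr = librosa.load("speaker01_utt.wav", sr=16000)

# Shift the pitch baseline by a few semitones up or down to create
# pitch-perturbed copies of the utterance.
for n_steps in (-2, -1, 1, 2):
    y_pitch = librosa.effects.pitch_shift(y, sr=sr, n_steps=n_steps)
    sf.write(f"aug_pitch{n_steps:+d}.wav", y_pitch, sr)

# Crudely approximate a vocal-tract-length change: resample by a factor
# alpha but keep the original playback rate, which warps the formant
# frequencies (this also changes duration and pitch, unlike a true
# vocal-tract transformation).
for alpha in (0.9, 1.1):
    y_vtl = librosa.resample(y, orig_sr=sr, target_sr=int(sr * alpha))
    sf.write(f"aug_vtl{alpha:.2f}.wav", y_vtl, sr)
```

In this sketch each source utterance yields several transformed variants, which would then be added to the training set alongside the original labeled examples.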