Publication
ASRU 2019
Conference paper
Data Augmentation Based on Vowel Stretch for Improving Children's Speech Recognition
Abstract
Prolongation is a speech disfluency that lengthens some portions of speech utterances. It is frequently observed in children's spontaneous speech but is rare in read speech. Making acoustic models more robust to children's spontaneous speech usually requires collecting a large amount of children's speech data containing prolongation, which is impractical in many cases. To tackle this problem, we propose a novel data augmentation method that generates additional data by simulating prolongation. The method inserts pseudo frames into specific positions of speech utterances, and the acoustic features of each inserted frame are calculated from the original frames on both sides. This is based on our analysis showing that many vowels are actually stretched in children's spontaneous speech. Our proposed procedure can generate partially stretched utterances at low computational cost, unlike conventional speed or tempo perturbation methods, which extend or shrink entire utterances at a uniform rate. The effectiveness of the proposed method was confirmed in acoustic model adaptation experiments, in which our method focusing on vowel stretch showed consistent improvement over conventional speed and tempo perturbation approaches.
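The frame-insertion idea described in the abstract can be illustrated with a minimal sketch. The abstract only states that inserted frames are calculated from the original frames on both sides; the linear interpolation used here, the per-frame vowel labels (for example, from a forced alignment), and the names `stretch_vowels`, `is_vowel`, and `n_insert` are all assumptions made for illustration, not the paper's actual procedure.

```python
from typing import List, Sequence

Frame = List[float]


def stretch_vowels(
    frames: Sequence[Sequence[float]],
    is_vowel: Sequence[bool],
    n_insert: int = 1,
) -> List[Frame]:
    """Insert pseudo frames between adjacent vowel frames.

    frames   : per-frame acoustic feature vectors (hypothetical input).
    is_vowel : per-frame flags, assumed to come from an alignment.
    n_insert : number of pseudo frames added between each vowel pair.

    Assumption: each pseudo frame is a linear interpolation of the two
    neighbouring original frames. The paper may compute it differently.
    """
    if len(frames) != len(is_vowel):
        raise ValueError("frames and is_vowel must have the same length")
    if n_insert < 0:
        raise ValueError("n_insert must be non-negative")

    out: List[Frame] = []
    for t, frame in enumerate(frames):
        out.append(list(frame))
        has_next = t + 1 < len(frames)
        # Only stretch inside a vowel: both neighbours must be vowel frames.
        if has_next and is_vowel[t] and is_vowel[t + 1]:
            nxt = frames[t + 1]
            for k in range(1, n_insert + 1):
                w = k / (n_insert + 1)  # interpolation weight toward nxt
                out.append([(1 - w) * a + w * b for a, b in zip(frame, nxt)])
    return out


# Toy example: four 2-dimensional frames, the middle two labelled as vowel.
frames = [[0.0, 0.0], [2.0, 4.0], [4.0, 8.0], [1.0, 1.0]]
is_vowel = [False, True, True, False]
stretched = stretch_vowels(frames, is_vowel)
print(len(stretched))  # one pseudo frame added, so 5 frames in total
```

Because only frames inside vowel segments gain neighbours, the rest of the utterance keeps its original timing, in contrast to speed or tempo perturbation, which rescales the whole utterance uniformly.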