Exploiting active-learning strategies for annotating prosodic events with limited labeled data
Abstract
Many applications of spoken-language systems can benefit from access to annotations of prosodic events. Unfortunately, obtaining human annotations of these events, even in the modest amounts needed to train a supervised system, can be laborious and costly. Given these constraints, the task is a good case study for approaches that judiciously guide data selection, either to maximize the gain from the human-labeling process or to minimize the size of the training set. We therefore explore active learning techniques with the objective of reducing the amount of human-annotated data needed to attain a given level of performance. We review strategies that guide the selection of sequences by combining the output of a classifier with information about the structure of the data. The resulting criterion is used during learning to query the labels of data points that are both informative and representative of the task. We show that, in most of the cases considered, active selection strategies for labeling pitch accents and prosodic boundaries match or exceed the performance of random data selection. © 2011 IEEE.
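To make the idea of a criterion that favors points that are both informative and representative more concrete, the following is a minimal sketch of one common way to combine the two signals: classifier uncertainty (entropy of the posterior) weighted by how similar a candidate is to the rest of the unlabeled pool. It is not the criterion used in the paper, whose exact form the abstract does not give. The function names, the cosine-similarity density, the `beta` exponent, and the toy data are all assumptions made for illustration, and each candidate is treated as a single feature vector rather than a full sequence.

```python
import math


def entropy(posterior):
    """Shannon entropy of a class posterior; higher means more uncertain."""
    return -sum(p * math.log(p) for p in posterior if p > 0.0)


def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0.0 else 0.0


def density(index, features):
    """Mean similarity of one candidate to every item in the pool (itself included)."""
    return sum(cosine(features[index], other) for other in features) / len(features)


def select_queries(posteriors, features, k=1, beta=1.0):
    """Rank candidates by uncertainty * density**beta and return the top-k indices.

    Assumption: beta controls how strongly representativeness is weighted;
    beta=0 reduces the score to plain uncertainty sampling.
    """
    scores = [
        entropy(post) * max(density(i, features), 0.0) ** beta
        for i, post in enumerate(posteriors)
    ]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Hypothetical pool: classifier posteriors and feature vectors for four candidates.
posteriors = [[0.5, 0.5], [0.5, 0.5], [0.99, 0.01], [0.9, 0.1]]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.1], [0.9, 0.0]]

chosen = select_queries(posteriors, features, k=2)
print(chosen)
```

In this toy pool, candidates 0 and 1 are equally uncertain, but candidate 1 is an outlier in feature space, so the density term ranks candidate 0 first and pushes candidate 1 below the less uncertain but more typical candidate 3. In a real active-learning loop the selected items would be sent to an annotator, added to the training set, and the classifier retrained before the next round of selection.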