Conference paper

A new data augmentation method for intent classification enhancement and its application on spoken conversation datasets



We present the Nearest Neighbors Scores Improvement (NNSI) algorithm for text classifiers such as intent classifiers. NNSI augments the training data by selecting high-ambiguity examples from a large corpus of unlabeled data and labeling them automatically. The method can be useful for conversation systems, where many unlabeled samples can be extracted from user interactions with the system. We demonstrate the method on two large-scale, real-life voice conversation systems and find that it can automatically select and label useful samples with high accuracy. Adding these samples to the baseline training data significantly increases the classifier's accuracy and reduces its error by up to 10%.
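The abstract does not say how NNSI scores or labels candidates, so the sketch below is not the NNSI algorithm. It uses a simple, swapped-in heuristic to illustrate the general idea of picking ambiguous unlabeled samples and labeling them from nearby labeled data. It assumes that each utterance already has an embedding vector, that the classifier's per-intent scores are available for each unlabeled utterance, and that "ambiguity" means a small gap between the top two scores. A candidate is kept only if all of its k nearest labeled neighbors agree on one intent. The function names, thresholds, and intent labels (`play_music`, `weather`) are hypothetical.

```python
from collections import Counter
from math import dist


def margin(scores):
    """Gap between the top two class scores; a small gap means high ambiguity."""
    top = sorted(scores, reverse=True)
    return top[0] - top[1]


def knn_label(x, labeled, k):
    """Majority label among the k nearest labeled points, plus its vote share.

    `labeled` is a list of (embedding, label) pairs (hypothetical format).
    """
    nearest = sorted(labeled, key=lambda item: dist(x, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    label, count = votes.most_common(1)[0]
    return label, count / len(nearest)


def select_and_label(unlabeled, labeled, max_margin=0.2, k=3, min_agreement=1.0):
    """Return (embedding, label) pairs that are ambiguous for the classifier
    but confidently labeled by their nearest labeled neighbors.

    `unlabeled` is a list of (embedding, classifier_scores) pairs.
    Thresholds are illustrative assumptions, not values from the paper.
    """
    selected = []
    for x, scores in unlabeled:
        if margin(scores) > max_margin:
            continue  # classifier is already confident; skip this sample
        label, agreement = knn_label(x, labeled, k)
        if agreement >= min_agreement:
            selected.append((x, label))
    return selected


# Toy 2-D "embeddings" with hypothetical intent labels.
labeled = [
    ((0.0, 0.0), "play_music"), ((0.1, 0.0), "play_music"), ((0.0, 0.1), "play_music"),
    ((1.0, 1.0), "weather"), ((1.1, 1.0), "weather"), ((1.0, 1.1), "weather"),
    ((0.6, 0.5), "weather"),
]
unlabeled = [
    ((0.05, 0.05), [0.55, 0.45]),  # ambiguous, neighbors agree: kept
    ((1.05, 1.05), [0.95, 0.05]),  # classifier confident: skipped
    ((0.5, 0.5), [0.50, 0.50]),    # ambiguous, neighbors disagree: skipped
]

if __name__ == "__main__":
    print(select_and_label(unlabeled, labeled))
    # → [((0.05, 0.05), 'play_music')]
```

In this setup, requiring full neighbor agreement trades recall for label precision. Lowering `min_agreement` admits more samples at the risk of noisier labels.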