Towards semi-automatic generation of proposition banks for low-resource languages

Alan Akbik; Vishwajeet Kumar; Yunyao Li

doi:10.18653/v1/d16-1102

EMNLP 2016

Conference paper

01 Nov 2016

Abstract

Annotation projection based on parallel corpora has shown great promise in inexpensively creating Proposition Banks for languages for which high-quality parallel corpora and syntactic parsers are available. In this paper, we present an experimental study where we apply this approach to three languages that lack such resources: Tamil, Bengali and Malayalam. We find an average quality difference of 6 to 20 absolute F-measure points vis-a-vis high-resource languages, which indicates that annotation projection alone is insufficient in low-resource scenarios. Based on these results, we explore the possibility of using annotation projection as a starting point for inexpensive data curation involving both experts and non-experts. We give an outline of what such a process may look like and present an initial study to discuss its potential and challenges.
