Distant supervision for relation extraction with an incomplete knowledge base

Bonan Min; Ralph Grishman; Li Wan; Chang Wang; David Gondek

NAACL-HLT 2013

Conference paper

09 Jun 2013


Abstract

Distant supervision, heuristically labeling a corpus using a knowledge base, has emerged as a popular choice for training relation extractors. In this paper, we show that a significant number of "negative" examples generated by the labeling process are false negatives because the knowledge base is incomplete. Therefore, the heuristic for generating negative examples has a serious flaw. Building on a state-of-the-art distantly-supervised extraction algorithm, we propose an algorithm that learns from only positive and unlabeled labels at the pair-of-entity level. Experimental results demonstrate its advantage over existing algorithms.
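
The labeling heuristic and its flaw can be illustrated with a minimal sketch. This is not the algorithm proposed in the paper; it only shows how an incomplete knowledge base turns a true relation into a false negative, and how the same entity pair could instead be left unlabeled. The toy knowledge base, the relation name `born_in`, the `NONE` label, and both function names are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

Pair = Tuple[str, str]

# Toy knowledge base (illustrative only). It is deliberately incomplete:
# the (Ada Lovelace, London) fact is true but missing.
KB: Dict[Pair, str] = {
    ("Barack Obama", "Honolulu"): "born_in",
}

# Entity pairs assumed to co-occur in some sentence of a toy corpus.
CORPUS_PAIRS: List[Pair] = [
    ("Barack Obama", "Honolulu"),
    ("Ada Lovelace", "London"),
]


def label_with_negatives(pairs: List[Pair], kb: Dict[Pair, str]) -> Dict[Pair, str]:
    """Standard heuristic: any pair not found in the KB is labeled negative."""
    return {pair: kb.get(pair, "NONE") for pair in pairs}


def label_positive_unlabeled(
    pairs: List[Pair], kb: Dict[Pair, str]
) -> Dict[Pair, Optional[str]]:
    """Alternative labeling: pairs not found in the KB stay unlabeled (None)."""
    return {pair: kb.get(pair) for pair in pairs}


if __name__ == "__main__":
    for pair, label in label_with_negatives(CORPUS_PAIRS, KB).items():
        print("with negatives:", pair, label)
    for pair, label in label_positive_unlabeled(CORPUS_PAIRS, KB).items():
        print("positive/unlabeled:", pair, label)
```

Under the first labeling, the pair missing from the knowledge base is treated as a negative training example even though the relation holds, which is the false-negative problem the abstract describes. Under the second, it is only marked unlabeled, leaving the decision to the learner.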
