Publication
AAAI-FS 2013
Conference paper

Measuring crowd truth for medical relation extraction

Abstract

One of the critical steps in analytics for big data is creating a human-annotated ground truth. Crowdsourcing has proven to be a scalable and cost-effective approach to gathering ground truth data, but most annotation tasks are based on the assumption that for each annotated instance there is a single right answer. From this assumption it follows that ground truth quality can be measured by inter-annotator agreement, and unfortunately crowdsourcing typically results in high disagreement. We have been working on a different assumption: that disagreement is not noise but signal, and that crowdsourcing can not only be cheaper and more scalable, it can also be higher quality. In this paper we present a framework for continuously gathering, analyzing and understanding large amounts of gold standard annotation disagreement data. We discuss experimental results demonstrating that there is useful information in human disagreement on annotation tasks. Our results show .98 accuracy in detecting low-quality crowdsource workers, and .87 F-measure at recognizing useful sentences for training relation extraction systems.
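To illustrate how annotation disagreement can be turned into quality signals of the kind the abstract describes, the sketch below is a minimal, assumption-based example rather than the paper's actual metrics: it assumes a hypothetical relation set, represents each worker's judgment on a sentence as a binary vector over that set, flags potentially low-quality workers by leave-one-out cosine similarity against the rest of the crowd, and scores how clearly a sentence expresses a single relation.

```python
import numpy as np

# Illustrative sketch only (not the paper's exact formulas): each worker marks,
# for one sentence, which relations from a fixed set it expresses. Annotations
# are 0/1 vectors over the relation set.

RELATIONS = ["treats", "causes", "prevents", "symptom_of", "none"]  # hypothetical set

def worker_sentence_agreement(worker_vec, all_vecs):
    """Cosine similarity between one worker's vector and the aggregate of all
    *other* workers' vectors on the same sentence (leave-one-out)."""
    rest = np.sum(all_vecs, axis=0) - worker_vec
    num = float(np.dot(worker_vec, rest))
    denom = np.linalg.norm(worker_vec) * np.linalg.norm(rest)
    return num / denom if denom > 0 else 0.0

def worker_quality(annotations_by_sentence):
    """Average leave-one-out agreement for one worker across all sentences.
    `annotations_by_sentence` maps sentence id -> (worker_vec, all_vecs)."""
    scores = [worker_sentence_agreement(w, a) for w, a in annotations_by_sentence.values()]
    return float(np.mean(scores)) if scores else 0.0

def sentence_clarity(all_vecs):
    """How strongly the crowd converges on one relation: the largest component
    of the normalized sentence vector (sum of all workers' vectors)."""
    sent = np.sum(all_vecs, axis=0).astype(float)
    norm = np.linalg.norm(sent)
    return float(np.max(sent / norm)) if norm > 0 else 0.0

# Toy example: three workers annotate one sentence.
vecs = np.array([
    [1, 0, 0, 0, 0],   # worker A: "treats"
    [1, 0, 0, 0, 0],   # worker B: "treats"
    [0, 0, 0, 0, 1],   # worker C: "none" (disagrees with the crowd)
])
print(worker_sentence_agreement(vecs[2], vecs))  # 0.0 -> candidate low-quality worker
print(sentence_clarity(vecs))                    # ~0.89 -> fairly clear training example
```

In this toy run, worker C's score is low because C's vector is orthogonal to the rest of the crowd, while the sentence still scores as relatively clear because two of three workers converge on the same relation; aggregating such scores over many sentences is one plausible way to separate low-quality workers from genuinely ambiguous sentences.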

Date

15 Nov 2013

Authors
