When crowdsourcing gold standards for NLP tasks, workers may not reach a consensus on a single correct solution for each task. The goal of Crowd Truth is to embrace such disagreement between individual annotators and harness it as useful information that signals vague or ambiguous examples. Even though the technique relies on disagreement, we also assume that differing opinions will cluster around the more plausible alternatives. It is therefore possible to identify workers who systematically disagree, both with the majority opinion and with the rest of their co-workers, as low-quality or spam workers. In this paper we present a more detailed formalization of the Crowd Truth metrics in the context of medical relation extraction, together with a set of additional filtering techniques that require workers to briefly justify their answers. These explanation-based techniques prove particularly useful in conjunction with the disagreement-based metrics, achieving 95% accuracy in identifying low-quality and spam submissions in crowdsourcing settings where the spam rate is quite high.
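The worker-filtering idea above can be illustrated with a minimal sketch. This is not the paper's exact formalization: the data, label set, and threshold below are hypothetical, and the score is a simple leave-one-out cosine agreement between a worker's annotation vector and the aggregated vector of the other workers on the same unit, averaged over units, which is one common way to operationalize "systematic disagreement with co-workers".

```python
import math

# Hypothetical toy label space for a relation-extraction task
# (names are illustrative, not taken from the paper).
LABELS = ["cause", "treat", "none"]

def to_vector(labels):
    """Binary vector over the label space for one worker's annotation."""
    return [1.0 if lab in labels else 0.0 for lab in LABELS]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def worker_agreement(annotations):
    """annotations: {unit: {worker: set_of_labels}}.
    For each worker, average the cosine between their vector and the
    unit vector summed over all *other* workers (leave-one-out), so a
    worker is compared against co-workers rather than against themselves."""
    scores = {}
    for unit, by_worker in annotations.items():
        for worker, labels in by_worker.items():
            others = [to_vector(l) for w, l in by_worker.items() if w != worker]
            if not others:
                continue
            unit_vec = [sum(col) for col in zip(*others)]
            scores.setdefault(worker, []).append(cosine(to_vector(labels), unit_vec))
    return {w: sum(s) / len(s) for w, s in scores.items()}

# Toy data: w3 consistently disagrees with w1 and w2.
annotations = {
    "s1": {"w1": {"cause"}, "w2": {"cause"}, "w3": {"none"}},
    "s2": {"w1": {"treat"}, "w2": {"treat"}, "w3": {"cause"}},
    "s3": {"w1": {"cause"}, "w2": {"cause", "treat"}, "w3": {"none"}},
}

scores = worker_agreement(annotations)
# Workers whose mean agreement falls below a (hypothetical) threshold
# are flagged as low-quality or spam candidates.
spam = {w for w, s in scores.items() if s < 0.5}
# → spam == {"w3"}
```

Note that legitimate disagreement on a few ambiguous units lowers a worker's score only slightly, whereas a spammer's score drops across all units, which is what makes the averaged metric a usable filter.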