Designing ground truth and the social life of labels

Michael Muller; Christine T. Wolf; Josh Andrés; Zahra Ashktorab; Narendra Nath Joshi; Michael Desmond; Aabhas Sharma; Kristina Brimijoin; Qian Pan; Evelyn Duesterwald; Casey Dugan

doi:10.1145/11445.11456

CHI 2021

Conference paper

06 May 2021

Abstract

Ground-truth labeling is an important activity in machine learning. Many studies have examined how crowdworkers apply labels to records in machine learning datasets. However, few studies have examined the work of domain experts when their knowledge and expertise are needed to apply labels. We provide a grounded account of the work of labeling teams with domain experts, including the experiences of labeling, collaborative configurations and work-practices, and quality issues. We show three major patterns in the social design of ground truth data: Principled design, Iterative design, and Improvisational design. We interpret our results through theories from Human Centered Data Science, particularly work on human interventions in data science work through the design and creation of data.
