Data Silences: How to Unsilence the Uncertainties in Data Science
Abstract
When we wrangle data in data science, we design the data to make it fit for analysis. Wrangling involves removing or reducing uncertainties, such as outliers, missing values, and mal-distributions, and it includes the detailed choices of feature engineering. Many steps of data wrangling go unrecorded or are poorly recorded, both in what was done and in why it was done. In this way, we impose multiple types of data silences on the data, and often on the sources (people) who are “behind” the data. In this paper, we articulate the multiple types of silencing that we may perform. We challenge comfortable conceptions of the nature of data, and we call on the data-science community to devise and adopt methodologies to unsilence data.