Publication
SDM 2024
Conference paper
Data Silences: How to Unsilence the Uncertainties in Data Science
Abstract
When we wrangle data in data science, we design the data to make it fit for analysis. Wrangling involves the removal or reduction of uncertainties, such as outliers, missing values, mal-distributions, and the details of feature engineering. Many of the steps of data wrangling go unrecorded or are poorly recorded, both in what was done and in the rationale for doing it. In this way, we impose multiple types of silence on the data, and often on the sources (people) who are “behind” the data. In this paper, we articulate how we may perform multiple types of silencing. We challenge comfortable conceptions of the nature of data, and we call on the data-science community to devise and adopt methodologies to unsilence data.