On the challenges of balancing privacy and utility of open health data

Christian Guttmann; Xingzhi Sun; Chaitanya Rao; Carlos Queiroz; Benjamin I. P. Rubinstein

doi:10.1145/2516911.2516916

IJCAI 2013

Conference paper

31 Dec 2013

Abstract

Although health data has been collected at large scale for many years, it is often difficult to obtain for research purposes. This is partly due to the cost and complexity of preparing the data for third parties. Health data must be adequately de-identified, a complex process that results in fully or partially "synthetic" data. This paper discusses the technological challenges of this process, which must balance preserving an individual's privacy against preserving the data's utility. Open health data is a case in point: de-identification is often so rigorous that the data becomes useless for meaningful observational studies. We make the discussion concrete by considering an open health data set released by the U.S. Centers for Medicare & Medicaid Services (CMS). © 2013 ACM.
