IJCAI 2013
Conference paper

On the challenges of balancing privacy and utility of open health data


While health data has been collected at large scale for many years, it is often difficult to obtain for research purposes. This is due in part to the cost and complexity of preparing the data for third parties. Health data must be adequately de-identified, a complex process that results in fully or partially "synthetic" data. This paper discusses the technological challenges of this process, in which preserving an individual's privacy must be balanced against preserving the data's utility. Open health data is a case in point: de-identification is often so rigorous that the data becomes useless for meaningful observational studies. We make the discussion concrete by considering an open health data set released by the U.S. Centers for Medicare & Medicaid Services (CMS). © 2013 ACM.