Publication
IBM J. Res. Dev
Paper

Data quality challenges for person-generated health and wellness data

Abstract

Person-generated health data (PGHD) collected by wearable devices and smartphone applications are growing rapidly. There is increasing effort to apply advanced analytical methods to these data to generate insights that help people change their lifestyle and improve their health. PGHD, such as step counts, exercise logs, nutritional diaries, and sleep records, are often incomplete, inaccurate, and collected over too short a duration. Data quality also suffers from insufficient user engagement with wearable and mobile technologies, and from a lack of sensor validation, standardized data collection, transparency about data processing assumptions, and access to relevant data from consumer-grade sensors. The literature on PGHD data quality is sparse and fragmented, and it offers data analysts little guidance on how to assess and prioritize data quality concerns. In this paper, we summarize our experiences as data analysts working with PGHD, outline some of the challenges in using PGHD for insight generation, and discuss established methods for addressing these challenges. We review the literature on PGHD data quality, identify the major stakeholders in the PGHD ecosystem, and apply an established data quality framework to present the most relevant data quality challenges for each stakeholder.
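To make three of the problems named above concrete (incomplete, inaccurate, and too-short records), the following is a minimal sketch of simple quality indicators for daily step counts. It is an illustration only, not the framework or methods used in the paper. The function `assess_daily_steps`, the `QualityReport` fields, the plausibility ceiling of 100,000 steps per day, the 28-day minimum window, and the rule of treating a zero count as probable non-wear are all hypothetical assumptions chosen for the example.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class QualityReport:
    completeness: float    # fraction of calendar days with a plausible, non-zero count
    implausible_days: int  # days whose count falls outside the assumed plausible range
    duration_days: int     # calendar days from first to last record, inclusive
    long_enough: bool      # whether the window meets the assumed minimum duration


def assess_daily_steps(
    steps: dict[date, Optional[int]],
    max_plausible: int = 100_000,  # hypothetical upper bound, not from the paper
    min_duration_days: int = 28,   # hypothetical minimum observation window
) -> QualityReport:
    """Compute simple completeness, plausibility, and duration indicators (illustrative)."""
    if not steps:
        return QualityReport(0.0, 0, 0, False)

    first, last = min(steps), max(steps)
    duration = (last - first).days + 1

    # Assumption: None means no data was synced, and 0 is treated as probable
    # non-wear rather than true inactivity; neither counts as an observed day.
    valid = [v for v in steps.values() if v is not None and 0 < v <= max_plausible]
    implausible = sum(
        1 for v in steps.values() if v is not None and (v < 0 or v > max_plausible)
    )

    return QualityReport(
        completeness=len(valid) / duration,
        implausible_days=implausible,
        duration_days=duration,
        long_enough=duration >= min_duration_days,
    )


# Usage with synthetic data: ten days of 8,000 steps with three injected problems.
start = date(2024, 1, 1)
data = {start + timedelta(days=i): 8000 for i in range(10)}
data[start + timedelta(days=3)] = None     # missing sync
data[start + timedelta(days=5)] = 0        # likely non-wear
data[start + timedelta(days=7)] = 250_000  # implausible spike
report = assess_daily_steps(data)
print(report)
```

Each indicator maps to one problem: completeness flags gaps, the plausibility count flags likely sensor or entry errors, and the duration check flags windows too short for stable insights. In practice, thresholds like these would need to be validated per device and per population rather than fixed as here.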
