Publication
IBM J. Res. Dev.
Paper
Curating and integrating user-generated health data from multiple sources to support healthcare analytics
Abstract
As the volume and variety of healthcare-related data continue to grow, the analysis and use of these data will increasingly depend on the ability to appropriately collect, curate, and integrate disparate data from many different sources, including user-generated health data. We describe our approach to, and highlight our experiences with, the development of a robust data curation process that supports healthcare analytics. The process consists of the following steps: collection, understanding, validation, cleaning, integration, enrichment, and storage. It has been successfully applied to the processing of a variety of data types, including clinical data from electronic health records and observational studies, genomic data, microbiome data, self-reported data from surveys, and self-tracked data from wearables from more than 600 subjects. The curated data have been used to support a number of healthcare analytic applications, including descriptive analytics, data visualization, patient stratification, and predictive modeling.
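
The abstract names the curation steps but not how they are implemented. As a rough illustration only, the sketch below chains the automatable steps (collection, validation, cleaning, integration, enrichment, storage) as plain Python functions over dictionary records. Every function name, field name (`subject_id`, `measure`, `value`), rule, and sample value here is a hypothetical assumption made for the example, not the authors' implementation or data. The "understanding" step is a manual, human activity and is not modeled.

```python
# Illustrative sketch only: all names, fields, and rules below are assumptions,
# not taken from the paper.

REQUIRED_FIELDS = ("subject_id", "measure", "value")


def collect(sources):
    """Collection: flatten per-source record lists, tagging each record with its source."""
    return [{**rec, "source": name} for name, recs in sources.items() for rec in recs]


def validate(records):
    """Validation: keep records that have all required fields and a numeric value."""
    return [
        r for r in records
        if all(r.get(f) is not None for f in REQUIRED_FIELDS)
        and isinstance(r["value"], (int, float))
    ]


def clean(records):
    """Cleaning: normalize subject IDs and drop exact duplicate observations."""
    seen, out = set(), []
    for r in records:
        r = {**r, "subject_id": str(r["subject_id"]).strip().upper()}
        key = (r["subject_id"], r["source"], r["measure"], r["value"])
        if key not in seen:
            seen.add(key)
            out.append(r)
    return out


def integrate(records):
    """Integration: merge all sources into one entry per subject."""
    subjects = {}
    for r in records:
        fields = subjects.setdefault(r["subject_id"], {})
        fields[f'{r["source"]}.{r["measure"]}'] = r["value"]
    return subjects


def enrich(subjects):
    """Enrichment: add a derived field (here, how many sources cover each subject)."""
    for fields in subjects.values():
        fields["n_sources"] = len({key.split(".")[0] for key in fields})
    return subjects


def curate(sources):
    """Run the steps in order; 'storage' is just the returned in-memory dict."""
    return enrich(integrate(clean(validate(collect(sources)))))


# Made-up sample input: one survey source and one wearable source.
raw = {
    "survey": [
        {"subject_id": " s01 ", "measure": "sleep_hours", "value": 7},
        {"subject_id": None, "measure": "sleep_hours", "value": 6},   # fails validation
    ],
    "wearable": [
        {"subject_id": "S01", "measure": "steps", "value": 8000},
        {"subject_id": "S01", "measure": "steps", "value": 8000},     # duplicate
        {"subject_id": "S02", "measure": "steps", "value": "n/a"},    # non-numeric
    ],
}
curated = curate(raw)
print(curated)
```

Keeping each step as a separate function makes it possible to inspect or test the records between stages. A real pipeline would write the result to a database or file store rather than return a dictionary.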