Publication
AMIA Annual Symposium 2022
Conference paper
Facilitating Federated Genomic Data Analysis by Identifying Record Correlations while Ensuring Privacy
Abstract
With falling sequencing costs and the pervasiveness of computing devices, genomic data collection continues to grow. Identifying related records is a fundamental step in creating high-quality datasets for genomic research. However, genomic data may reveal sensitive information about individuals. In this paper, we present a privacy-preserving solution for identifying samples with high kinship relationships across federated datasets. In the client-server setting, the researchers lightly synchronize to decide which metadata to share with the server. To improve privacy, we propose a framework based on random shuffling, synthetic record generation, and a variant of local differential privacy. Furthermore, we provide a detailed privacy analysis and extensive evaluations on real genomic data from OpenSNP. The experimental results show that our proposed scheme is secure against honest-but-curious servers and allows related samples to be identified efficiently and with high accuracy.
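The abstract names three ingredients (random shuffling, synthetic records, and a variant of local differential privacy) without giving their details. As a rough illustration only, the sketch below shows how a client might combine them before uploading metadata. It is not the paper's mechanism: it assumes records are binary vectors, substitutes classic randomized response for the unspecified local differential privacy variant, draws synthetic records uniformly at random, and uses the hypothetical function names `randomized_response` and `prepare_upload`.

```python
import math
import random


def randomized_response(bits, epsilon, rng):
    """Classic randomized response (an assumed stand-in, not the paper's variant).

    Each bit is kept with probability e^eps / (1 + e^eps) and flipped otherwise,
    which satisfies eps-local differential privacy per bit.
    """
    keep_prob = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return [b if rng.random() < keep_prob else 1 - b for b in bits]


def prepare_upload(records, num_synthetic, epsilon, seed=0):
    """Hypothetical client-side step: perturb, add synthetic records, shuffle."""
    rng = random.Random(seed)
    length = len(records[0])

    # Perturb every real record locally, before anything leaves the client.
    noisy = [randomized_response(r, epsilon, rng) for r in records]

    # Assumption: synthetic records are uniformly random bit vectors.
    synthetic = [
        [rng.randint(0, 1) for _ in range(length)] for _ in range(num_synthetic)
    ]

    # Shuffle so the server cannot tell real from synthetic by position.
    batch = noisy + synthetic
    rng.shuffle(batch)
    return batch


if __name__ == "__main__":
    real = [[0, 1, 1, 0], [1, 1, 0, 0]]
    for row in prepare_upload(real, num_synthetic=3, epsilon=1.0, seed=42):
        print(row)
```

A smaller `epsilon` flips more bits and so gives stronger privacy at the cost of noisier kinship estimates. How the paper balances this trade-off is not stated in the abstract.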