Privacy preserving identification of population stratification for collaborative genomic research.
Dervishi, Leonard; Li, Wenbiao; Halimi, Anisa; Jiang, Xiaoqian; Vaidya, Jaideep; Ayday, Erman.
Affiliation
  • Dervishi L; Computer and Data Sciences, Case Western Reserve University, OH 44106, United States.
  • Li W; Computer and Data Sciences, Case Western Reserve University, OH 44106, United States.
  • Halimi A; Research, IBM, D15HN66, Ireland.
  • Jiang X; School of Biomedical Informatics, University of Texas Health Science Center at Houston, TX 77030, United States.
  • Vaidya J; Management Science and Information Systems Department, Rutgers University, NJ 07102, United States.
  • Ayday E; Computer and Data Sciences, Case Western Reserve University, OH 44106, United States.
Bioinformatics; 39(39 Suppl 1): i168-i176, 2023 06 30.
Article in English | MEDLINE | ID: mdl-37387172
ABSTRACT
The rapid improvements in genomic sequencing technology have led to the proliferation of locally collected genomic datasets. Given the sensitivity of genomic data, it is crucial to conduct collaborative studies while preserving the privacy of the individuals. However, before starting any collaborative research effort, the quality of the data needs to be assessed. One of the essential steps of the quality control process is population stratification: identifying the presence of genetic differences among individuals due to subpopulations. A common method for grouping the genomes of individuals by ancestry is principal component analysis (PCA). In this article, we propose a privacy-preserving framework that uses PCA to assign individuals to populations across multiple collaborators as part of the population stratification step. In our proposed client-server scheme, the server first trains a global PCA model on a publicly available genomic dataset that contains individuals from multiple populations. Each collaborator (client) later uses the global PCA model to reduce the dimensionality of its local data. After adding noise to achieve local differential privacy (LDP), the collaborators send metadata (in the form of their local PCA outputs) about their research datasets to the server, which then aligns the local PCA results to identify the genetic differences among collaborators' datasets. Our results on real genomic data show that the proposed framework can perform population stratification analysis with high accuracy while preserving the privacy of the research participants.
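The first steps described in the abstract (a server-side global PCA, a client-side projection, and noise addition before sharing) can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not say which LDP mechanism is used, so the clipped Laplace mechanism below is an assumption, as are the function names `fit_global_pca` and `project_with_ldp`, the parameters `epsilon` and `clip`, and the synthetic genotype data standing in for a public reference panel. The server-side alignment step is not covered.

```python
import numpy as np

def fit_global_pca(reference, n_components):
    """Server side: fit PCA on a public reference genotype matrix (rows = individuals)."""
    mean = reference.mean(axis=0)
    # SVD of the centered matrix; the rows of vt are the principal axes.
    _, _, vt = np.linalg.svd(reference - mean, full_matrices=False)
    return mean, vt[:n_components]

def project_with_ldp(local, mean, components, epsilon, clip, rng):
    """Client side: project local genotypes, clip the scores, add Laplace noise.

    Assumption: a clipped Laplace mechanism, chosen only for illustration.
    """
    scores = (local - mean) @ components.T
    clipped = np.clip(scores, -clip, clip)
    # After clipping, any two individuals' score vectors differ by at most
    # 2 * clip * k in L1 norm, which sets the Laplace scale for epsilon-LDP.
    k = components.shape[0]
    scale = 2.0 * clip * k / epsilon
    return clipped + rng.laplace(0.0, scale, size=clipped.shape)

rng = np.random.default_rng(0)

# Synthetic stand-in for a public panel: two populations with different
# allele frequencies, genotypes coded as 0, 1, or 2 minor-allele copies.
freqs = rng.uniform(0.05, 0.95, size=(2, 200))
reference = np.vstack(
    [rng.binomial(2, f, size=(50, 200)) for f in freqs]
).astype(float)
mean, components = fit_global_pca(reference, n_components=2)

# One collaborator's local dataset, drawn here from the first population.
local = rng.binomial(2, freqs[0], size=(30, 200)).astype(float)
noisy_scores = project_with_ldp(
    local, mean, components, epsilon=5.0, clip=10.0, rng=rng
)
print(noisy_scores.shape)  # (30, 2)
```

Only `noisy_scores` would leave the client in this sketch. A smaller `epsilon` gives stronger privacy at the cost of noisier scores, which is the accuracy and privacy trade-off the abstract refers to.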
Subject(s)

Full text: 1 Collection: 01-international Database: MEDLINE Main subject: Privacy / Genomics Type of study: Diagnostic_studies / Prognostic_studies Limits: Humans Language: En Journal: Bioinformatics Journal subject: MEDICAL INFORMATICS Year: 2023 Document type: Article Affiliation country: United States
