Unsupervised representation learning on high-dimensional clinical data improves genomic discovery and prediction.
Yun, Taedong; Cosentino, Justin; Behsaz, Babak; McCaw, Zachary R; Hill, Davin; Luben, Robert; Lai, Dongbing; Bates, John; Yang, Howard; Schwantes-An, Tae-Hwi; Zhou, Yuchen; Khawaja, Anthony P; Carroll, Andrew; Hobbs, Brian D; Cho, Michael H; McLean, Cory Y; Hormozdiari, Farhad.
Affiliation
  • Yun T; Google Research, Cambridge, MA, USA. tedyun@google.com.
  • Cosentino J; Google Research, Mountain View, CA, USA.
  • Behsaz B; Google Research, Cambridge, MA, USA.
  • McCaw ZR; Google Research, Mountain View, CA, USA.
  • Hill D; Insitro, South San Francisco, CA, USA.
  • Luben R; Department of Electrical and Computer Engineering, Northeastern University, Boston, MA, USA.
  • Lai D; Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, MA, USA.
  • Bates J; NIHR Biomedical Research Centre at Moorfields Eye Hospital and University College London (UCL) Institute of Ophthalmology, London, UK.
  • Yang H; MRC Epidemiology Unit, University of Cambridge, Cambridge, UK.
  • Schwantes-An TH; Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN, USA.
  • Zhou Y; Verily Life Sciences, South San Francisco, CA, USA.
  • Khawaja AP; Google Research, Mountain View, CA, USA.
  • Carroll A; Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN, USA.
  • Hobbs BD; Division of Cardiology, Department of Medicine, Indiana University School of Medicine, Indianapolis, IN, USA.
  • Cho MH; Google Research, Cambridge, MA, USA.
  • McLean CY; NIHR Biomedical Research Centre at Moorfields Eye Hospital and University College London (UCL) Institute of Ophthalmology, London, UK.
  • Hormozdiari F; MRC Epidemiology Unit, University of Cambridge, Cambridge, UK.
Nat Genet ; 56(8): 1604-1613, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38977853
ABSTRACT
Although high-dimensional clinical data (HDCD) are increasingly available in biobank-scale datasets, their use for genetic discovery remains challenging. Here we introduce an unsupervised deep learning model, Representation Learning for Genetic Discovery on Low-Dimensional Embeddings (REGLE), for discovering associations between genetic variants and HDCD. REGLE leverages variational autoencoders to compute nonlinear disentangled embeddings of HDCD, which become the inputs to genome-wide association studies (GWAS). REGLE can uncover features not captured by existing expert-defined features and enables the creation of accurate disease-specific polygenic risk scores (PRSs) in datasets with very little labeled data. We apply REGLE to perform GWAS on two kinds of respiratory and circulatory HDCD: spirograms measuring lung function and photoplethysmograms measuring blood volume changes. REGLE replicates known loci while identifying others not previously detected. REGLE embeddings are predictive of overall survival, and PRSs constructed from REGLE loci improve disease prediction across multiple biobanks. Overall, REGLE embeddings contain clinically relevant information beyond that captured by existing expert-defined features, leading to improved genetic discovery and disease prediction.
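The idea of using learned embeddings as GWAS inputs can be illustrated with a minimal sketch. This is not the REGLE implementation: as a loudly labeled assumption, a linear PCA projection stands in for the variational autoencoder encoder, the data are synthetic, and the association test is a plain per-variant linear regression with no covariates. All function and variable names (`pca_embed`, `association_scan`, `two_sided_p`, the planted causal variant) are hypothetical and chosen for illustration only.

```python
import math
import numpy as np


def pca_embed(curves, n_dims):
    """Stand-in encoder (assumption): project centered curves onto the
    top principal components. REGLE uses a VAE here; PCA is only a placeholder."""
    centered = curves - curves.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_dims].T


def association_scan(genotypes, phenotype):
    """Simple linear regression of one phenotype on each variant's dosage.
    Returns per-variant effect sizes and z-scores (no covariate adjustment)."""
    n = phenotype.shape[0]
    g = genotypes - genotypes.mean(axis=0)
    y = phenotype - phenotype.mean()
    gg = (g ** 2).sum(axis=0)                  # per-variant sum of squares
    beta = (g.T @ y) / gg                      # least-squares slope
    resid_ss = (y ** 2).sum() - beta ** 2 * gg
    se = np.sqrt(resid_ss / (n - 2) / gg)      # standard error of the slope
    return beta, beta / se


def two_sided_p(z):
    """Two-sided p-value under a normal approximation."""
    return math.erfc(abs(z) / math.sqrt(2))


# Synthetic cohort: 2000 individuals, 50 variants, curves sampled at 100 points.
rng = np.random.default_rng(0)
n, m, t = 2000, 50, 100
genotypes = rng.binomial(2, 0.3, size=(n, m)).astype(float)

causal = 7  # hypothetical variant planted to influence the curve shape
latent1 = 0.6 * genotypes[:, causal] + rng.normal(size=n)
latent2 = 0.5 * rng.normal(size=n)
grid = np.linspace(0, 2 * np.pi, t)
curves = (np.outer(latent1, np.sin(grid))
          + np.outer(latent2, np.cos(grid))
          + 0.1 * rng.normal(size=(n, t)))

# Step 1: compress each curve into a low-dimensional embedding.
embeddings = pca_embed(curves, n_dims=2)

# Step 2: treat each embedding coordinate as a phenotype in its own scan.
z_scores = np.column_stack(
    [association_scan(genotypes, embeddings[:, k])[1] for k in range(2)]
)

# Step 3: report the variant with the strongest signal on any coordinate.
top_variant = int(np.argmax(np.abs(z_scores).max(axis=1)))
top_p = two_sided_p(np.abs(z_scores).max())
print(top_variant, top_p)
```

In this toy setting the scan recovers the planted variant because its effect dominates one embedding coordinate. A real analysis would additionally adjust for covariates such as age, sex, and genetic ancestry, and would apply a genome-wide significance threshold.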

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Genome-Wide Association Study Limits: Humans Language: En Journal: Nat Genet Journal subject: MEDICAL GENETICS Year of publication: 2024 Document type: Article Country of affiliation: United States