Search | BVS IEC

1.

Unsupervised representation learning on high-dimensional clinical data improves genomic discovery and prediction.

Yun, Taedong; Cosentino, Justin; Behsaz, Babak; McCaw, Zachary R; Hill, Davin; Luben, Robert; Lai, Dongbing; Bates, John; Yang, Howard; Schwantes-An, Tae-Hwi; Zhou, Yuchen; Khawaja, Anthony P; Carroll, Andrew; Hobbs, Brian D; Cho, Michael H; McLean, Cory Y; Hormozdiari, Farhad.

Nat Genet ; 56(8): 1604-1613, 2024 Aug.

Article in English | MEDLINE | ID: mdl-38977853

ABSTRACT

Although high-dimensional clinical data (HDCD) are increasingly available in biobank-scale datasets, their use for genetic discovery remains challenging. Here we introduce an unsupervised deep learning model, Representation Learning for Genetic Discovery on Low-Dimensional Embeddings (REGLE), for discovering associations between genetic variants and HDCD. REGLE leverages variational autoencoders to compute nonlinear disentangled embeddings of HDCD, which become the inputs to genome-wide association studies (GWAS). REGLE can uncover features not captured by existing expert-defined features and enables the creation of accurate disease-specific polygenic risk scores (PRSs) in datasets with very little labeled data. We apply REGLE to perform GWAS on respiratory and circulatory HDCD: spirograms measuring lung function and photoplethysmograms measuring blood volume changes. REGLE replicates known loci while identifying others not previously detected. REGLE embeddings are predictive of overall survival, and PRSs constructed from REGLE loci improve disease prediction across multiple biobanks. Overall, REGLE embeddings contain clinically relevant information beyond that captured by existing expert-defined features, leading to improved genetic discovery and disease prediction.

Subjects

Genome-Wide Association Study; Humans; Genome-Wide Association Study/methods; Multifactorial Inheritance/genetics; Genetic Predisposition to Disease; Unsupervised Machine Learning; Genomics/methods; Deep Learning; Polymorphism, Single Nucleotide
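
The abstract of entry 1 describes using each embedding coordinate as the phenotype in a GWAS. As a rough illustration only, the per-variant test can be sketched as a simple least-squares regression of one embedding coordinate on allele dosage. The function name `ols_association` and the toy numbers are hypothetical, and the sketch omits the covariates, relatedness handling, and multiple-testing correction that a real GWAS pipeline needs; it is not the authors' implementation.

```python
import math


def ols_association(dosages, phenotype):
    """Regress one phenotype on one variant's allele dosages (0, 1, 2).

    Returns (slope, standard_error, z_score) from simple least squares
    with an intercept and no covariates (an illustrative simplification).
    """
    n = len(dosages)
    mean_g = sum(dosages) / n
    mean_y = sum(phenotype) / n
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(dosages, phenotype))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_g
    # Residual sum of squares, used for the slope's standard error.
    rss = sum((y - intercept - slope * g) ** 2 for g, y in zip(dosages, phenotype))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, se, slope / se


# Toy data (made up): one embedding coordinate for six people and
# their dosages at a single hypothetical variant.
embedding_coordinate = [0.1, -0.1, 0.6, 0.4, 1.1, 0.9]
variant_dosages = [0, 0, 1, 1, 2, 2]

slope, se, z = ols_association(variant_dosages, embedding_coordinate)
print(f"slope={slope:.3f} se={se:.3f} z={z:.2f}")
```

Repeating this test for every variant and every embedding coordinate gives the general shape of the analysis; the choice of regression model here is an assumption made for brevity.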

2.

Polygenic and transcriptional risk scores identify chronic obstructive pulmonary disease subtypes.

Moll, Matthew; Hecker, Julian; Platig, John; Zhang, Jingzhou; Ghosh, Auyon J; Pratte, Katherine A; Wang, Rui-Sheng; Hill, Davin; Konigsberg, Iain R; Chiles, Joe W; Hersh, Craig P; Castaldi, Peter J; Glass, Kimberly; Dy, Jennifer G; Sin, Don D; Tal-Singer, Ruth; Mouded, Majd; Rennard, Stephen I; Anderson, Gary P; Kinney, Gregory L; Bowler, Russell P; Curtis, Jeffrey L; McDonald, Merry-Lynn; Silverman, Edwin K; Hobbs, Brian D; Cho, Michael H.

medRxiv ; 2024 May 20.

Article in English | MEDLINE | ID: mdl-38826461

ABSTRACT

Rationale: Genetic variants and gene expression predict risk of chronic obstructive pulmonary disease (COPD), but their effect on COPD heterogeneity is unclear. Objectives: Define high-risk COPD subtypes using both genetics (polygenic risk score, PRS) and blood gene expression (transcriptional risk score, TRS) and assess differences in clinical and molecular characteristics. Methods: We defined high-risk groups based on PRS and TRS quantiles by maximizing differences in protein biomarkers in a COPDGene training set and identified these groups in COPDGene and ECLIPSE test sets. We tested multivariable associations of subgroups with clinical outcomes and compared protein-protein interaction networks and drug repurposing analyses between high-risk groups. Measurements and Main Results: We examined two high-risk omics-defined groups in non-overlapping test sets (n=1,133 non-Hispanic white (NHW) COPDGene, n=299 African American (AA) COPDGene, n=468 ECLIPSE). We defined "high activity" (low PRS/high TRS) and "severe risk" (high PRS/high TRS) subgroups. Participants in both subgroups had lower body-mass index (BMI), lower lung function, and alterations in metabolic, growth, and immune signaling processes compared to a low-risk (low PRS/low TRS) reference subgroup. "High activity" but not "severe risk" participants had greater prospective FEV1 decline (COPDGene: -51 mL/year; ECLIPSE: -40 mL/year), and their proteomic profiles were enriched in gene sets perturbed by treatment with 5-lipoxygenase inhibitors and angiotensin-converting enzyme (ACE) inhibitors. Conclusions: Concomitant use of polygenic and transcriptional risk scores identified clinical and molecular heterogeneity amongst high-risk individuals. Proteomic and drug repurposing analyses identified subtype-specific enrichment for therapies and suggest that prior drug repurposing failures may be explained by patient selection.
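
The subgroup definitions in entry 2 can be read as a two-way split on PRS and TRS. The sketch below shows that labeling step under stated assumptions: the function name `assign_subgroup` is hypothetical, the median is used as the cutoff purely for illustration (the abstract says the quantiles were chosen by maximizing protein biomarker differences and does not give their values), and the high PRS/low TRS cell is left unlabeled because the abstract does not name it.

```python
import statistics


def assign_subgroup(prs, trs, prs_cut, trs_cut):
    """Label one participant from PRS and TRS relative to two cutoffs."""
    high_prs = prs >= prs_cut
    high_trs = trs >= trs_cut
    if high_trs and not high_prs:
        return "high activity"   # low PRS / high TRS
    if high_trs and high_prs:
        return "severe risk"     # high PRS / high TRS
    if not high_trs and not high_prs:
        return "low risk"        # low PRS / low TRS (reference)
    return "unlabeled"           # high PRS / low TRS: not named in the abstract


# Toy (made-up) standardized scores: (PRS, TRS) per participant.
participants = [(-1.2, 0.9), (1.4, 1.1), (-0.8, -0.7), (0.9, -1.0)]

# Assumption: median cutoffs stand in for the tuned quantiles.
prs_cut = statistics.median(p for p, _ in participants)
trs_cut = statistics.median(t for _, t in participants)

labels = [assign_subgroup(p, t, prs_cut, trs_cut) for p, t in participants]
print(labels)
```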

3.

Inference of chronic obstructive pulmonary disease with deep learning on raw spirograms identifies new genetic loci and improves risk models.

Cosentino, Justin; Behsaz, Babak; Alipanahi, Babak; McCaw, Zachary R; Hill, Davin; Schwantes-An, Tae-Hwi; Lai, Dongbing; Carroll, Andrew; Hobbs, Brian D; Cho, Michael H; McLean, Cory Y; Hormozdiari, Farhad.

Nat Genet ; 55(5): 787-795, 2023 May.

Article in English | MEDLINE | ID: mdl-37069358

ABSTRACT

Chronic obstructive pulmonary disease (COPD), the third leading cause of death worldwide, is highly heritable. While COPD is clinically defined by applying thresholds to summary measures of lung function, a quantitative liability score has more power to identify genetic signals. Here we train a deep convolutional neural network on noisy self-reported and International Classification of Diseases labels to predict COPD case-control status from high-dimensional raw spirograms and use the model's predictions as a liability score. The machine-learning-based (ML-based) liability score accurately discriminates COPD cases and controls, and predicts COPD-related hospitalization without any domain-specific knowledge. Moreover, the ML-based liability score is associated with overall survival and exacerbation events. A genome-wide association study on the ML-based liability score replicates existing COPD and lung function loci and also identifies 67 new loci. Lastly, our method provides a general framework to use ML methods and medical-record-based labels that does not require domain knowledge or expert curation to improve disease prediction and genomic discovery for drug design.

Subjects

Deep Learning; Pulmonary Disease, Chronic Obstructive; Humans; Genome-Wide Association Study/methods; Pulmonary Disease, Chronic Obstructive/genetics; Genetic Loci; Polymorphism, Single Nucleotide/genetics

4.

Deep Learning Utilizing Suboptimal Spirometry Data to Improve Lung Function and Mortality Prediction in the UK Biobank.

Hill, Davin; Torop, Max; Masoomi, Aria; Castaldi, Peter J; Silverman, Edwin K; Bodduluri, Sandeep; Bhatt, Surya P; Yun, Taedong; McLean, Cory Y; Hormozdiari, Farhad; Dy, Jennifer; Cho, Michael H; Hobbs, Brian D.

medRxiv ; 2023 Apr 29.

Article in English | MEDLINE | ID: mdl-37162978

ABSTRACT

Background: Spirometry measures lung function by selecting the best of multiple efforts meeting pre-specified quality control (QC), and reporting two key metrics: forced expiratory volume in 1 second (FEV1) and forced vital capacity (FVC). We hypothesize that discarded submaximal and QC-failing data meaningfully contribute to the prediction of airflow obstruction and all-cause mortality. Methods: We evaluated volume-time spirometry data from the UK Biobank. We identified "best" spirometry efforts as those passing QC with the maximum FVC. "Discarded" efforts were either submaximal or failed QC. To create a combined representation of lung function we implemented a contrastive learning approach, Spirogram-based Contrastive Learning Framework (Spiro-CLF), which utilized all recorded volume-time curves per participant and applied different transformations (e.g. flow-volume, flow-time). In a held-out 20% testing subset we applied the Spiro-CLF representation of a participant's overall lung function to 1) binary predictions of FEV1/FVC < 0.7 and FEV1 percent predicted (FEV1PP) < 80%, indicative of airflow obstruction, and 2) Cox regression for all-cause mortality. Findings: We included 940,705 volume-time curves from 352,684 UK Biobank participants with 2-3 spirometry efforts per individual (66.7% with 3 efforts) and at least one QC-passing spirometry effort. Of all spirometry efforts, 24.1% failed QC and 37.5% were submaximal. Spiro-CLF prediction of FEV1/FVC < 0.7 utilizing discarded spirometry efforts had an area under the receiver operating characteristic curve (AUROC) of 0.981 (0.863 for FEV1PP prediction). Incorporating discarded spirometry efforts in all-cause mortality prediction was associated with a concordance index (c-index) of 0.654, which exceeded the c-indices from FEV1 (0.590), FVC (0.559), or FEV1/FVC (0.599) from each participant's single best effort.
Interpretation: A contrastive learning model using raw spirometry curves can accurately predict lung function using submaximal and QC-failing efforts. This model also has superior prediction of all-cause mortality compared to standard lung function measurements. Funding: MHC is supported by NIH R01HL137927, R01HL135142, HL147148, and HL089856. BDH is supported by NIH K08HL136928, U01 HL089856, and an Alpha-1 Foundation Research Grant. DH is supported by NIH 2T32HL007427-41. EKS is supported by NIH R01 HL152728, R01 HL147148, U01 HL089856, R01 HL133135, P01 HL132825, and P01 HL114501. PJC is supported by NIH R01HL124233 and R01HL147326. SPB is supported by NIH R01HL151421 and UH3HL155806. TY, FH, and CYM are employees of Google LLC.
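
Entry 4 compares mortality models by concordance index (c-index). For readers unfamiliar with the metric, here is a minimal sketch of Harrell's c-index on toy data: among pairs where the earlier time is an observed event, it counts how often the participant who died earlier had the higher predicted risk. The function name `concordance_index` and all numbers are illustrative assumptions; tied survival times are simply skipped, and this is not the evaluation code used in the study.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's c-index for right-censored survival data.

    times: follow-up time per participant
    events: 1 if death was observed, 0 if censored
    risk_scores: higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when i had an observed event
            # strictly before j's follow-up time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    return concordant / comparable


# Toy (made-up) data: four participants, the third is censored.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risk_scores = [0.9, 0.4, 0.5, 0.1]

c_index = concordance_index(times, events, risk_scores)
print(c_index)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported c-indices (0.654 versus 0.559 to 0.599) are compared.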

5.

Unsupervised representation learning improves genomic discovery and risk prediction for respiratory and circulatory functions and diseases.

Yun, Taedong; Cosentino, Justin; Behsaz, Babak; McCaw, Zachary R; Hill, Davin; Luben, Robert; Lai, Dongbing; Bates, John; Yang, Howard; Schwantes-An, Tae-Hwi; Zhou, Yuchen; Khawaja, Anthony P; Carroll, Andrew; Hobbs, Brian D; Cho, Michael H; McLean, Cory Y; Hormozdiari, Farhad.

medRxiv ; 2023 Aug 29.

Article in English | MEDLINE | ID: mdl-37163049

ABSTRACT

High-dimensional clinical data are becoming more accessible in biobank-scale datasets. However, effectively utilizing high-dimensional clinical data for genetic discovery remains challenging. Here we introduce a general deep learning-based framework, REpresentation learning for Genetic discovery on Low-dimensional Embeddings (REGLE), for discovering associations between genetic variants and high-dimensional clinical data. REGLE uses convolutional variational autoencoders to compute a non-linear, low-dimensional, disentangled embedding of the data with highly heritable individual components. REGLE can incorporate expert-defined or clinical features and provides a framework to create accurate disease-specific polygenic risk scores (PRS) in datasets which have minimal expert phenotyping. We apply REGLE to both respiratory and circulatory systems: spirograms which measure lung function and photoplethysmograms (PPG) which measure blood volume changes. Genome-wide association studies on REGLE embeddings identify more genome-wide significant loci than existing methods and replicate known loci for both spirograms and PPG, demonstrating the generality of the framework. Furthermore, these embeddings are associated with overall survival. Finally, we construct a set of PRSs that improve predictive performance of asthma, chronic obstructive pulmonary disease, hypertension, and systolic blood pressure in multiple biobanks. Thus, REGLE embeddings can quantify clinically relevant features that are not currently captured in a standardized or automated way.
