Results 1 - 20 of 129
1.
Nat Commun ; 15(1): 8891, 2024 Oct 15.
Article in English | MEDLINE | ID: mdl-39406732

ABSTRACT

Identifying genetic drivers of chronic diseases is necessary for drug discovery. Here, we develop a machine learning-assisted genetic priority score, which we call ML-GPS, that incorporates genetic associations with predicted disease phenotypes to enhance target discovery. First, we construct gradient boosting models to predict 112 chronic disease phecodes in the UK Biobank and analyze associations of predicted and observed phenotypes with common, rare, and ultra-rare variants to model the allelic series. We integrate these associations with existing evidence using gradient boosting with continuous feature encoding to construct ML-GPS, training it to predict drug indications in Open Targets and externally testing it in SIDER. We then generate ML-GPS predictions for 2,362,636 gene-phecode pairs. We find that the use of predicted phenotypes, which identify substantially more genetic associations than observed phenotypes across the allele frequency spectrum, significantly improves the performance of ML-GPS. ML-GPS increases coverage of drug targets, with the top 1% of all scores providing support for 15,077 gene-phecode pairs that previously had no support. ML-GPS can also identify well-known target-disease relationships, promising targets without indicated drugs, and targets for several drugs in clinical trials, including LRRK2 inhibitors for Parkinson's disease and olpasiran for cardiovascular disease.


Subject(s)
Drug Discovery , Machine Learning , Phenotype , Humans , Chronic Disease/drug therapy , Drug Discovery/methods , Gene Frequency , Genetic Predisposition to Disease
2.
Obes Rev ; : e13823, 2024 Sep 04.
Article in English | MEDLINE | ID: mdl-39233338

ABSTRACT

We systematically reviewed observational and Mendelian randomization (MR) articles that evaluated the association between obesity and 17 gastrointestinal (GI) diseases to integrate causal and observational evidence. A total of 594 observational studies from 26 systematic reviews and meta-analyses and nine MR articles were included. For every 5 kg/m² increase in body mass index (BMI), there was an increased risk of GI diseases ranging from 2% for rectal cancer (relative risk [RR]: 1.02, 95% confidence interval [CI]: 1.01 to 1.03) to 63% for gallbladder disease (RR: 1.63, 95% CI: 1.50 to 1.77). MR articles indicated that the risk of developing GI diseases increased with each 1 standard deviation increase in genetically predicted BMI, ranging from 11% for Crohn's disease to 189% for nonalcoholic fatty liver disease. Moreover, upper GI conditions were less susceptible, whereas hepatobiliary organs were more vulnerable to increased adiposity. Among the associations between obesity and the 17 GI conditions, causal relationships were inferred from only approximately half (10/17, 59%). This study reveals a substantial gap between observational and causal evidence, indicating that a combined approach is necessary to effectively inform public health policies and guide epidemiological research on obesity and GI diseases.

3.
Hepatol Commun ; 8(9)2024 09 01.
Article in English | MEDLINE | ID: mdl-39185915

ABSTRACT

BACKGROUND: Liver fibrosis is a critical public health concern, necessitating early detection to prevent progression. This study evaluates the recently developed LiverRisk score and steatosis-associated Fibrosis Estimator (SAFE) score against established indices for prognostication and/or fibrosis prediction in 4 diverse cohorts, including participants with metabolic dysfunction-associated steatotic liver disease (MASLD). METHODS: We used data from the Mount Sinai Data Warehouse (32,828 participants without liver disease diagnoses), the Mount Sinai MASLD/MASH Longitudinal Registry (422 participants with MASLD), and National Health and Nutrition Examination Survey 2017-2020 (4133 participants representing the general population) to compare LiverRisk score, FIB-4 index, APRI, and SAFE score. Analyses included Cox proportional hazards regressions, Kaplan-Meier estimates, and classification metrics to evaluate performance in prognostication and fibrosis prediction. RESULTS: In Mount Sinai Data Warehouse, LiverRisk score was significantly associated with future liver-related outcomes but did not significantly outperform FIB-4 or APRI for predicting any of the outcomes. In the general population, LiverRisk score and SAFE score outperformed FIB-4 and APRI in identifying fibrosis, but LiverRisk score underperformed among participants who were non-White or had type 2 diabetes. Among participants with MASLD, SAFE score outperformed FIB-4 and APRI in 1 of 2 cohorts, but there were generally few significant performance differences between all 4 scores. CONCLUSIONS: LiverRisk score does not consistently outperform existing predictors in diverse populations, and further validation is needed before adoption in settings with significant differences from the original derivation cohorts. It remains necessary to replicate the ability of these scores to predict liver-specific mortality, as well as to develop diagnostic tools for liver fibrosis that are accessible and substantially better than current scores, especially among patients with MASLD and other chronic liver conditions.


Subject(s)
Liver Cirrhosis , Nutrition Surveys , Registries , Humans , Liver Cirrhosis/blood , Liver Cirrhosis/pathology , Male , Female , Middle Aged , Adult , Prognosis , Severity of Illness Index , Aged , United States/epidemiology , Biomarkers/blood , Fatty Liver/pathology
4.
Ophthalmology ; 2024 Aug 14.
Article in English | MEDLINE | ID: mdl-39128550

ABSTRACT

PURPOSE: We used a polygenic risk score (PRS) to identify high-risk groups for primary open-angle glaucoma (POAG) within population-based cohorts. DESIGN: Secondary analysis of 4 prospective population-based studies. PARTICIPANTS: We included four European-ancestry cohorts: the United States-based Nurses' Health Study, Nurses' Health Study 2, and the Health Professionals Follow-up Study and the Rotterdam Study (RS) in The Netherlands. The United States cohorts included female nurses and male health professionals ≤ 55 years of age. The RS included residents ≤ 45 years of age living in Rotterdam, The Netherlands. METHODS: Polygenic risk score weights were estimated by applying the lassosum method on imputed genotype and phenotype data from the UK Biobank. This resulted in 144 020 variants, single nucleotide polymorphisms and insertions or deletions, with nonzero βs that we used to calculate a PRS in the target populations. Using multivariable Cox proportional hazard models, we estimated the relationship between the standardized PRS and relative risk for POAG. Additionally, POAG prediction was tested by calculating these models' concordance (Harrell's C statistic). Finally, we assessed the association between PRS tertiles and glaucoma-related traits. MAIN OUTCOME MEASURES: The relative risk for POAG and Harrell's C statistic. RESULTS: Among 1046 patients and 38 809 control participants, the relative risk (95% confidence interval) for POAG for participants in the highest PRS quintile was 3.99 (3.08-5.18) times higher in the United States cohorts and 4.89 (2.93-8.17) times higher in the RS, compared with participants with median genetic risk (third quintile). Combining age, sex, intraocular pressure of more than 25 mmHg, and family history resulted in a meta-analyzed concordance of 0.75 (95% CI, 0.73-0.75). Adding the PRS to this model improved the concordance to 0.82 (95% CI, 0.80-0.84). In a meta-analysis of all cohorts, patients in the highest tertile showed a larger cup-to-disc ratio at diagnosis, by 0.10 (95% CI, 0.06-0.14), and a 2.07-fold increased risk of requiring glaucoma surgery (95% CI, 1.19-3.60). CONCLUSIONS: Incorporating a PRS into a POAG predictive model improves identification concordance from 0.75 up to 0.82, supporting its potential for guiding more cost-effective screening strategies. FINANCIAL DISCLOSURE(S): Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.
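
The abstract above describes calculating a PRS from variant weights and then standardizing it. As a rough illustration only, a PRS is commonly computed as a weighted sum of allele dosages and then rescaled to mean 0 and standard deviation 1. The sketch below assumes that general form; the weights and dosages are hypothetical and are not taken from the study, and the lassosum estimation step is not shown.

```python
from statistics import mean, pstdev


def polygenic_score(dosages, weights):
    """Weighted sum of allele dosages (0, 1, or 2 copies) for one person."""
    return sum(w * d for w, d in zip(weights, dosages))


def standardize(scores):
    """Rescale scores to mean 0 and (population) standard deviation 1."""
    mu = mean(scores)
    sd = pstdev(scores)
    return [(s - mu) / sd for s in scores]


# Hypothetical per-variant weights and dosages for four people (illustration only).
weights = [0.12, -0.05, 0.30]
cohort = [
    [0, 1, 2],
    [1, 1, 0],
    [2, 0, 1],
    [0, 2, 0],
]

raw_scores = [polygenic_score(person, weights) for person in cohort]
standardized_scores = standardize(raw_scores)
```

A standardized score lets a hazard ratio be read per 1 standard deviation of genetic risk, which is one reason such scores are usually rescaled before modeling.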

5.
NPJ Digit Med ; 7(1): 226, 2024 Aug 24.
Article in English | MEDLINE | ID: mdl-39181999

ABSTRACT

Congenital long QT syndrome (LQTS) diagnosis is complicated by limited genetic testing at scale, low prevalence, and normal QT corrected interval in patients with high-risk genotypes. We developed a deep learning approach combining electrocardiogram (ECG) waveform and electronic health record data to assess whether patients had pathogenic variants causing LQTS. We defined patients with high-risk genotypes as having ≥1 pathogenic variant in one of the LQTS-susceptibility genes. We trained the model using data from United Kingdom Biobank (UKBB) and then fine-tuned it in a racially/ethnically diverse cohort using Mount Sinai BioMe Biobank. Following group-stratified 5-fold splitting, the fine-tuned model achieved area under the precision-recall curve of 0.29 (95% confidence interval [CI] 0.28-0.29) and area under the receiver operating characteristic curve of 0.83 (0.82-0.83) on independent testing data from BioMe. Multimodal fusion learning has promise to identify individuals with pathogenic genetic mutations to enable patient prioritization for further workup.

6.
medRxiv ; 2024 Jul 01.
Article in English | MEDLINE | ID: mdl-39006413

ABSTRACT

Background: Circulating biomarkers play a pivotal role in personalized medicine, offering potential for disease screening, prevention, and treatment. Despite established associations between numerous biomarkers and diseases, elucidating their causal relationships is challenging. Mendelian Randomization (MR) can address this issue by employing genetic instruments to discern causal links. Additionally, using multiple MR methods with overlapping results enhances the reliability of discovered relationships. Methods: Here we report an MR study using multiple methods, including inverse variance weighted, simple mode, weighted mode, weighted median, and MR-Egger. We use the MR-base resource (v0.5.6) to evaluate causal relationships between 212 circulating biomarkers (curated from UK Biobank analyses by Neale lab and from Shin et al. 2014, Roederer et al. 2015, and Kettunen et al. 2016) and 99 complex diseases (curated from several consortia by MRC IEU and Biobank Japan). Results: We report novel causal relationships found by 4 or more MR methods between glucose and bipolar disorder (Mean Effect Size estimate across methods: 0.39) and between cystatin C and bipolar disorder (Mean Effect Size: -0.31). Based on agreement in 4 or more methods, we also identify previously known links between urate with gout and creatine with chronic kidney disease, as well as biomarkers that may be causal of cardiovascular conditions: apolipoprotein B, cholesterol, LDL, lipoprotein A, and triglycerides in coronary heart disease, as well as lipoprotein A, LDL, cholesterol, and apolipoprotein B in myocardial infarction. Conclusions: This Mendelian Randomization study not only corroborates known causal relationships between circulating biomarkers and diseases but also uncovers two novel biomarkers associated with bipolar disorder that warrant further investigation. Our findings provide insight into understanding how biological processes reflecting circulating biomarkers and their associated effects may contribute to disease etiology, which can eventually help improve precision diagnostics and intervention.

7.
Cell Metab ; 36(7): 1494-1503.e3, 2024 Jul 02.
Article in English | MEDLINE | ID: mdl-38959863

ABSTRACT

The extent to which modifiable lifestyle factors offset the determined genetic risk of obesity and obesity-related morbidities remains unknown. We explored how the interaction between genetic and lifestyle factors influences the risk of obesity and obesity-related morbidities. The polygenic score for body mass index was calculated to quantify inherited susceptibility to obesity in 338,645 UK Biobank European participants, and a composite lifestyle score was derived from five obesogenic factors (physical activity, diet, sedentary behavior, alcohol consumption, and sleep duration). We observed significant interaction between high genetic risk and poor lifestyles (p for interaction < 0.001). Absolute differences in obesity risk between those who adhere to healthy lifestyles and those who do not gradually expanded with an increase in polygenic score. Despite a high genetic risk for obesity, individuals can prevent obesity-related morbidities by adhering to a healthy lifestyle and maintaining a normal body weight. Healthy lifestyles should be promoted irrespective of genetic background.


Subject(s)
Body Mass Index , Genetic Predisposition to Disease , Life Style , Obesity , Humans , Obesity/genetics , Male , Female , Middle Aged , Risk Factors , Adult , Aged , Exercise , Sedentary Behavior , United Kingdom/epidemiology
8.
Int J Mol Sci ; 25(13)2024 Jul 05.
Article in English | MEDLINE | ID: mdl-39000484

ABSTRACT

Circulating biomarkers play a pivotal role in personalized medicine, offering potential for disease screening, prevention, and treatment. Despite established associations between numerous biomarkers and diseases, elucidating their causal relationships is challenging. Mendelian Randomization (MR) can address this issue by employing genetic instruments to discern causal links. Additionally, using multiple MR methods with overlapping results enhances the reliability of discovered relationships. Here, we report an MR study using multiple methods, including inverse variance weighted, simple mode, weighted mode, weighted median, and MR-Egger. We use the MR-base resource (v0.5.6) from Hemani et al. 2018 to evaluate causal relationships between 212 circulating biomarkers (curated from UK Biobank analyses by Neale lab and from Shin et al. 2014, Roederer et al. 2015, and Kettunen et al. 2016) and 99 complex diseases (curated from several consortia by MRC IEU and Biobank Japan). We report novel causal relationships found by four or more MR methods between glucose and bipolar disorder (Mean Effect Size estimate across methods: 0.39) and between cystatin C and bipolar disorder (Mean Effect Size: -0.31). Based on agreement in four or more methods, we also identify previously known links between urate with gout and creatine with chronic kidney disease, as well as biomarkers that may be causal of cardiovascular conditions: apolipoprotein B, cholesterol, LDL, lipoprotein A, and triglycerides in coronary heart disease, as well as lipoprotein A, LDL, cholesterol, and apolipoprotein B in myocardial infarction. This Mendelian Randomization study not only corroborates known causal relationships between circulating biomarkers and diseases but also uncovers two novel biomarkers associated with bipolar disorder that warrant further investigation. Our findings provide insight into understanding how biological processes reflecting circulating biomarkers and their associated effects may contribute to disease etiology, which can eventually help improve precision diagnostics and intervention.


Subject(s)
Biomarkers , Mendelian Randomization Analysis , Humans , Biomarkers/blood , Bipolar Disorder/genetics , Bipolar Disorder/blood , Cardiovascular Diseases/genetics , Cardiovascular Diseases/blood , Risk Factors , Cystatin C/blood , Cystatin C/genetics , Gout/genetics , Gout/blood
9.
Am J Ophthalmol ; 267: 204-212, 2024 Nov.
Article in English | MEDLINE | ID: mdl-38906208

ABSTRACT

PURPOSE: Polygenic risk scores (PRSs) likely predict risk and prognosis of glaucoma. We compared the PRS performance for primary open-angle glaucoma (POAG), defined using International Classification of Diseases (ICD) codes vs manual medical record review. DESIGN: Retrospective cohort study. METHODS: We identified POAG cases in the Mount Sinai BioMe and Mass General Brigham (MGB) biobanks using ICD codes. We confirmed POAG based on optical coherence tomograms and visual fields. In a separate 5% sample, the absence of POAG was confirmed with intraocular pressure and cup-disc ratio criteria. We used genotype data and either self-reported glaucoma diagnoses or ICD-10 codes for glaucoma diagnoses from the UK Biobank and the lassosum method to compute a genome-wide POAG PRS. We compared the area under the curve (AUC) for POAG prediction based on ICD codes vs medical records. RESULTS: We reviewed 804 of 996 BioMe and 367 of 1006 MGB ICD-identified cases. In BioMe and MGB, respectively, positive predictive value was 53% and 55%; negative predictive value was 96% and 97%; sensitivity was 97% and 97%; and specificity was 44% and 53%. Adjusted PRS AUCs for POAG using ICD codes vs manual record review in BioMe were not statistically different (P ≥.21) by ancestry: 0.77 vs 0.75 for African, 0.80 vs 0.80 for Hispanic, and 0.81 vs 0.81 for European. Results were similar in MGB (P ≥.18): 0.72 vs 0.80 for African, 0.83 vs 0.86 for Hispanic, and 0.74 vs 0.73 for European. CONCLUSIONS: A POAG PRS performed similarly using either manual review or ICD codes in 2 electronic health record-linked biobanks; manual assessment of glaucoma status might not be necessary for some PRS studies. However, caution should be exercised when using ICD codes for glaucoma diagnosis given their low specificity (44%-53%) for manually confirmed cases of glaucoma.
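
The abstract above reports positive predictive value, negative predictive value, sensitivity, and specificity for ICD-coded cases checked against manual record review. For readers less familiar with these measures, the sketch below shows their standard definitions from a 2×2 table. The counts are hypothetical and chosen only for easy arithmetic; they are not the study's data.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard 2x2 diagnostic accuracy measures.

    tp: flagged by the code and confirmed on review
    fp: flagged by the code but not confirmed
    fn: not flagged by the code but disease present on review
    tn: not flagged and disease absent on review
    """
    return {
        "sensitivity": tp / (tp + fn),  # share of true cases that were flagged
        "specificity": tn / (tn + fp),  # share of non-cases that were not flagged
        "ppv": tp / (tp + fp),          # share of flagged records that are true cases
        "npv": tn / (tn + fn),          # share of unflagged records that are non-cases
    }


# Hypothetical counts (illustration only).
metrics = diagnostic_metrics(tp=90, fp=60, fn=10, tn=40)
```

With these made-up counts, sensitivity is high while specificity is low, the same qualitative pattern the abstract cautions about for ICD-based case definitions.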


Subject(s)
Electronic Health Records , Glaucoma, Open-Angle , Intraocular Pressure , Humans , Glaucoma, Open-Angle/genetics , Glaucoma, Open-Angle/diagnosis , Retrospective Studies , Male , Female , Intraocular Pressure/physiology , Aged , Middle Aged , Biological Specimen Banks , Risk Factors , International Classification of Diseases , Visual Fields/physiology , Multifactorial Inheritance , Area Under Curve , Tomography, Optical Coherence , Genome-Wide Association Study , Risk Assessment/methods , ROC Curve , Predictive Value of Tests , Genetic Risk Score
10.
Nat Genet ; 56(7): 1412-1419, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38862854

ABSTRACT

Coronary artery disease (CAD) exists on a spectrum of disease represented by a combination of risk factors and pathogenic processes. An in silico score for CAD built using machine learning and clinical data in electronic health records captures disease progression, severity and underdiagnosis on this spectrum and could enhance genetic discovery efforts for CAD. Here we tested associations of rare and ultrarare coding variants with the in silico score for CAD in the UK Biobank, All of Us Research Program and BioMe Biobank. We identified associations in 17 genes; of these, 14 show at least moderate levels of prior genetic, biological and/or clinical support for CAD. We also observed an excess of ultrarare coding variants in 321 aggregated CAD genes, suggesting more ultrarare variant associations await discovery. These results expand our understanding of the genetic etiology of CAD and illustrate how digital markers can enhance genetic association investigations for complex diseases.


Subject(s)
Coronary Artery Disease , Genetic Predisposition to Disease , Machine Learning , Coronary Artery Disease/genetics , Humans , Exome/genetics , Exome Sequencing/methods , Genetic Variation , Genome-Wide Association Study/methods , Female , Polymorphism, Single Nucleotide
11.
JACC Adv ; 3(4)2024 Apr.
Article in English | MEDLINE | ID: mdl-38737007

ABSTRACT

BACKGROUND: Diet is a key modifiable risk factor of coronary artery disease (CAD). However, the causal effects of specific dietary traits on CAD risk remain unclear. With the expansion of dietary data in population biobanks, Mendelian randomization (MR) could help enable the efficient estimation of causality in diet-disease associations. OBJECTIVES: The primary goal was to test causality for 13 common dietary traits on CAD risk using a systematic 2-sample MR framework. A secondary goal was to identify plasma metabolites mediating diet-CAD associations suspected to be causal. METHODS: Cross-sectional genetic and dietary data on up to 420,531 UK Biobank and 184,305 CARDIoGRAMplusC4D individuals of European ancestry were used in 2-sample MR. The primary analysis used fixed effect inverse-variance weighted regression, while sensitivity analyses used weighted median estimation, MR-Egger regression, and MR-Pleiotropy Residual Sum and Outlier. RESULTS: Genetic variants serving as proxies for muesli intake were negatively associated with CAD risk (OR: 0.74; 95% CI: 0.65-0.84; P = 5.385 × 10⁻⁴). Sensitivity analyses using weighted median estimation supported this with a significant association in the same direction. Additionally, we identified higher plasma acetate levels as a potential mediator (OR: 0.03; 95% CI: 0.01-0.12; P = 1.15 × 10⁻⁴). CONCLUSIONS: Muesli, a mixture of oats, seeds, nuts, dried fruit, and milk, may causally reduce CAD risk. Circulating levels of acetate, a gut microbiota-derived short-chain fatty acid, could be mediating its cardioprotective effects. These findings highlight the role of gut flora in cardiovascular health and help prioritize randomized trials on dietary interventions for CAD.
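
The primary analysis named in the abstract above is fixed-effect inverse-variance weighted (IVW) regression. The following is a minimal sketch of the textbook fixed-effect IVW estimator from per-variant summary statistics, assuming first-order weights based on the outcome standard errors. The function name and the input values are hypothetical, and this is not the authors' pipeline.

```python
from math import exp, sqrt


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW causal estimate and its standard error.

    Equivalent to a weighted regression of outcome effects on exposure
    effects through the origin, with weights 1 / se_outcome**2.
    """
    numerator = sum(
        bx * by / se ** 2
        for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome)
    )
    denominator = sum(
        bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)
    )
    return numerator / denominator, sqrt(1.0 / denominator)


# Hypothetical summary statistics for three variants (illustration only).
beta_x = [0.10, 0.20, 0.30]      # variant effects on the exposure
beta_y = [-0.03, -0.06, -0.09]   # variant effects on the outcome (log odds)
se_y = [0.010, 0.020, 0.015]     # standard errors of the outcome effects

estimate, std_error = ivw_estimate(beta_x, beta_y, se_y)
odds_ratio = exp(estimate)       # log odds converted to an odds ratio
```

When the outcome effects are on the log-odds scale, exponentiating the estimate gives an odds ratio per unit of the exposure; a value below 1 corresponds to a protective direction of effect.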

12.
Cell Rep Med ; 5(5): 101518, 2024 May 21.
Article in English | MEDLINE | ID: mdl-38642551

ABSTRACT

Population-based genomic screening may help diagnose individuals with disease-risk variants. Here, we perform a genome-first evaluation for nine disorders in 29,039 participants with linked exome sequences and electronic health records (EHRs). We identify 614 individuals with 303 pathogenic/likely pathogenic or predicted loss-of-function (P/LP/LoF) variants, yielding 644 observations; 487 observations (76%) lack a corresponding clinical diagnosis in the EHR. Upon further investigation, 75 clinically undiagnosed observations (15%) have evidence of symptomatic untreated disease, including familial hypercholesterolemia (3 of 6 [50%] undiagnosed observations with disease evidence) and breast cancer (23 of 106 [22%]). These genetic findings enable targeted phenotyping that reveals new diagnoses in previously undiagnosed individuals. Disease yield is greater with variants in penetrant genes for which disease is observed in carriers in an independent cohort. The prevalence of P/LP/LoF variants exceeds that of clinical diagnoses, and some clinically undiagnosed carriers are discovered to have disease. These results highlight the potential of population-based genomic screening.


Subject(s)
Exome Sequencing , Exome , Humans , Female , Male , Exome/genetics , Exome Sequencing/methods , Middle Aged , Adult , Genetic Diseases, Inborn/genetics , Genetic Diseases, Inborn/diagnosis , Genetic Diseases, Inborn/epidemiology , Genetic Predisposition to Disease , Electronic Health Records , Genetic Testing/methods , Genome, Human , Aged , Delivery of Health Care , Adolescent , Genomics/methods , Young Adult
13.
Diabetes Care ; 47(6): 1042-1047, 2024 Jun 01.
Article in English | MEDLINE | ID: mdl-38652672

ABSTRACT

OBJECTIVE: To identify genetic risk factors for incident cardiovascular disease (CVD) among people with type 2 diabetes (T2D). RESEARCH DESIGN AND METHODS: We conducted a multiancestry time-to-event genome-wide association study for incident CVD among people with T2D. We also tested 204 known coronary artery disease (CAD) variants for association with incident CVD. RESULTS: Among 49,230 participants with T2D, 8,956 had incident CVD events (event rate 18.2%). We identified three novel genetic loci for incident CVD: rs147138607 (near CACNA1E/ZNF648, hazard ratio [HR] 1.23, P = 3.6 × 10⁻⁹), rs77142250 (near HS3ST1, HR 1.89, P = 9.9 × 10⁻⁹), and rs335407 (near TFB1M/NOX3, HR 1.25, P = 1.5 × 10⁻⁸). Among 204 known CAD loci, 5 were associated with incident CVD in T2D (multiple comparison-adjusted P < 0.00024, 0.05/204). A standardized polygenic score of these 204 variants was associated with incident CVD with HR 1.14 (P = 1.0 × 10⁻¹⁶). CONCLUSIONS: The data point to novel and known genomic regions associated with incident CVD among individuals with T2D.
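
Two figures in the abstract above follow from simple arithmetic: the event rate and the multiple-comparison threshold (0.05/204, a Bonferroni-style correction). The short check below reproduces both from the numbers stated in the abstract; the variable names are illustrative.

```python
# Figures taken from the abstract.
participants = 49_230
cvd_events = 8_956
known_cad_variants = 204
alpha = 0.05

# Proportion of participants with an incident CVD event.
event_rate = cvd_events / participants

# Bonferroni-style threshold: family-wise alpha divided by the number of tests.
threshold = alpha / known_cad_variants

event_rate_percent = round(event_rate * 100, 1)
```

The threshold works out to roughly 0.000245, which the abstract reports truncated as 0.00024.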


Subject(s)
Cardiovascular Diseases , Diabetes Mellitus, Type 2 , Genome-Wide Association Study , Humans , Diabetes Mellitus, Type 2/genetics , Diabetes Mellitus, Type 2/epidemiology , Diabetes Mellitus, Type 2/complications , Cardiovascular Diseases/genetics , Cardiovascular Diseases/epidemiology , Female , Male , Middle Aged , Aged , Polymorphism, Single Nucleotide
14.
Nat Commun ; 15(1): 3441, 2024 Apr 24.
Article in English | MEDLINE | ID: mdl-38658550

ABSTRACT

Hyperuricemia is an essential causal risk factor for gout and is associated with cardiometabolic diseases. Given the limited contribution of East Asian ancestry to genome-wide association studies of serum urate, the genetic architecture of serum urate requires exploration. A large-scale cross-ancestry genome-wide association meta-analysis of 1,029,323 individuals and ancestry-specific meta-analysis identifies a total of 351 loci, including 17 previously unreported loci. The genetic architecture of serum urate control is similar between European and East Asian populations. A transcriptome-wide association study, enrichment analysis, and colocalization analysis in relevant tissues identify candidate serum urate-associated genes, including CTBP1, SKIV2L, and WWP2. A phenome-wide association study using polygenic risk scores identifies serum urate-correlated diseases including heart failure and hypertension. Mendelian randomization and mediation analyses show that serum urate-associated genes might have a causal relationship with serum urate-correlated diseases via mediation effects. This study elucidates our understanding of the genetic architecture of serum urate control.


Subject(s)
Genome-Wide Association Study , Hyperuricemia , Uric Acid , Humans , DNA-Binding Proteins/genetics , Genetic Predisposition to Disease , Gout/genetics , Gout/blood , Heart Failure/genetics , Heart Failure/blood , Hypertension/genetics , Hypertension/blood , Hyperuricemia/genetics , Hyperuricemia/blood , Mendelian Randomization Analysis , Multifactorial Inheritance , Polymorphism, Single Nucleotide , Transcriptome , Uric Acid/blood
15.
Nat Genet ; 56(1): 51-59, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38172303

ABSTRACT

Studies have shown that drug targets with human genetic support are more likely to succeed in clinical trials. Hence, a tool integrating genetic evidence to prioritize drug target genes is beneficial for drug discovery. We built a genetic priority score (GPS) by integrating eight genetic features with drug indications from the Open Targets and SIDER databases. The top 0.83%, 0.28% and 0.19% of the GPS conferred a 5.3-, 9.9- and 11.0-fold increased effect of having an indication, respectively. In addition, we observed that targets in the top 0.28% of the score were 1.7-, 3.7- and 8.8-fold more likely to advance from phase I to phases II, III and IV, respectively. Complementary to the GPS, we incorporated the direction of genetic effect and drug mechanism into a directional version of the score called the GPS with direction of effect. We applied our method to 19,365 protein-coding genes and 399 drug indications and made all results available through a web portal.


Subject(s)
Human Genetics , Pharmacogenetics , Humans , Drug Discovery
17.
Arterioscler Thromb Vasc Biol ; 44(2): 491-504, 2024 02.
Article in English | MEDLINE | ID: mdl-38095106

ABSTRACT

BACKGROUND: Venous thromboembolism (VTE) is a major cause of morbidity and mortality worldwide. Current risk assessment tools, such as the Caprini and Padua scores and Wells criteria, have limitations in their applicability and accuracy. This study aimed to develop machine learning models using structured electronic health record data to predict diagnosis and 1-year risk of VTE. METHODS: We trained and validated models on data from 159 001 participants in the Mount Sinai Data Warehouse. We then externally tested them on 401 723 participants in the UK Biobank and 123 039 participants in All of Us. All data sets contain populations of diverse ancestries and clinical histories. We used these data sets to develop small, medium, and large models with increasing numbers of features, spanning a range from optimizing portability to maximizing performance. We make trained models publicly available in click-and-run format at https://doi.org/10.17632/tkwzysr4y6.6. RESULTS: In the holdout and external test sets, respectively, models achieved areas under the receiver operating characteristic curve of 0.80 to 0.83 and 0.72 to 0.82 for VTE diagnosis prediction and 0.76 to 0.78 and 0.64 to 0.69 for 1-year risk prediction, significantly outperforming the Padua score. Models also demonstrated robust performance across different VTE types and patient subsets, including ethnicity, age, and surgical and hospitalization status. Models identified both established and novel clinical features contributing to VTE risk, offering valuable insights into its underlying pathophysiology. CONCLUSIONS: Machine learning models using structured electronic health record data can significantly improve VTE diagnosis and 1-year risk prediction in diverse populations. Model probability scores exist on a continuum, affecting mortality risk in both healthy individuals and VTE cases. Integrating these models into electronic health record systems to generate real-time predictions may enhance VTE risk assessment, early detection, and preventative measures, ultimately reducing the morbidity and mortality associated with VTE.


Subject(s)
Population Health , Venous Thromboembolism , Humans , Electronic Health Records , Risk Factors , Venous Thromboembolism/diagnosis , Venous Thromboembolism/epidemiology , Venous Thromboembolism/etiology , Risk Assessment , Machine Learning , Retrospective Studies
18.
J Am Heart Assoc ; 13(1): e031671, 2024 Jan 02.
Article in English | MEDLINE | ID: mdl-38156471

ABSTRACT

BACKGROUND: Right ventricular ejection fraction (RVEF) and end-diastolic volume (RVEDV) are not readily assessed through traditional modalities. Deep learning-enabled ECG analysis for estimation of right ventricular (RV) size or function is unexplored. METHODS AND RESULTS: We trained a deep learning-ECG model to predict RV dilation (RVEDV >120 mL/m²), RV dysfunction (RVEF ≤40%), and numerical RVEDV and RVEF from a 12-lead ECG paired with reference-standard cardiac magnetic resonance imaging volumetric measurements in UK Biobank (UKBB; n=42 938). We fine-tuned in a multicenter health system (MSHoriginal [Mount Sinai Hospital]; n=3019) with prospective validation over 4 months (MSHvalidation; n=115). We evaluated performance with area under the receiver operating characteristic curve for categorical and mean absolute error for continuous measures overall and in key subgroups. We assessed the association of RVEF prediction with transplant-free survival with Cox proportional hazards models. The prevalence of RV dysfunction for UKBB/MSHoriginal/MSHvalidation cohorts was 1.0%/18.0%/15.7%, respectively. RV dysfunction model area under the receiver operating characteristic curve for UKBB/MSHoriginal/MSHvalidation cohorts was 0.86/0.81/0.77, respectively. The prevalence of RV dilation for UKBB/MSHoriginal/MSHvalidation cohorts was 1.6%/10.6%/4.3%. RV dilation model area under the receiver operating characteristic curve for UKBB/MSHoriginal/MSHvalidation cohorts was 0.91/0.81/0.92, respectively. MSHoriginal mean absolute error was RVEF=7.8% and RVEDV=17.6 mL/m². The performance of the RVEF model was similar in key subgroups including with and without left ventricular dysfunction. Over a median follow-up of 2.3 years, predicted RVEF was associated with adjusted transplant-free survival (hazard ratio, 1.40 for each 10% decrease; P=0.031). CONCLUSIONS: Deep learning-ECG analysis can identify significant cardiac magnetic resonance imaging RV dysfunction and dilation with good performance. Predicted RVEF is associated with clinical outcome.


Subject(s)
Ventricular Dysfunction, Right , Ventricular Function, Right , Humans , Stroke Volume , Magnetic Resonance Imaging/methods , Heart , Electrocardiography
19.
medRxiv ; 2023 Oct 25.
Article in English | MEDLINE | ID: mdl-37961657

ABSTRACT

Metabolic dysfunction-associated steatotic liver disease (MASLD) affects 30% of the global population but is often underdiagnosed. To fill this diagnostic gap, we developed a digital score reflecting presence and severity of MASLD. We fitted a machine learning model to electronic health records from 37,212 UK Biobank participants with proton density fat fraction measurements and/or a MASLD diagnosis to generate a "MASLD score". In holdout testing, our model achieved areas under the receiver-operating curve of 0.83-0.84 for MASLD diagnosis and 0.90-0.91 for identifying MASLD-associated advanced fibrosis. MASLD score was significantly associated with MASLD risk factors, progression to cirrhosis, and mortality. External testing in 252,725 diverse American participants demonstrated consistent results, and hepatologist chart review showed MASLD score identified probable MASLD underdiagnosis. The MASLD score could improve early diagnosis and intervention of chronic liver disease by providing a non-invasive, low-cost method for population-wide screening of MASLD.

20.
Pharmacol Ther ; 251: 108544, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37848164

ABSTRACT

Severe hypertriglyceridemia (sHTG), defined as a triglyceride (TG) concentration ≥ 500 mg/dL (≥ 5.7 mmol/L), is an important risk factor for acute pancreatitis. Although lifestyle, some medications, and certain conditions such as diabetes may lead to HTG, sHTG results from a combination of major and minor genetic defects in proteins that regulate TG lipolysis. Familial chylomicronemia syndrome (FCS) is a rare disorder caused by complete loss of function in lipoprotein lipase (LPL) or LPL activating proteins due to two homozygous recessive traits or compound heterozygous traits. Multifactorial chylomicronemia syndrome (MCS) and sHTG are due to the accumulation of rare heterozygous variants and polygenic defects that predispose individuals to sHTG phenotypes. Until recently, treatment of sHTG focused on lifestyle interventions, control of secondary factors, and nonselective pharmacotherapies that had modest TG-lowering efficacy and no corresponding reductions in atherosclerotic cardiovascular disease events. Genetic discoveries have allowed for the development of novel pathway-specific therapeutics targeting LPL modulating proteins. New targets directed towards inhibition of apolipoprotein C-III (apoC-III), angiopoietin-like protein 3 (ANGPTL3), angiopoietin-like protein 4 (ANGPTL4), and fibroblast growth factor-21 (FGF21) offer far more efficacy in treating the various phenotypes of sHTG and opportunities to reduce the risk of acute pancreatitis and atherosclerotic cardiovascular disease events.


Subject(s)
Cardiovascular Diseases , Hyperlipoproteinemia Type I , Hypertriglyceridemia , Pancreatitis , Humans , Acute Disease , Pancreatitis/genetics , Pancreatitis/therapy , Pancreatitis/complications , Hyperlipoproteinemia Type I/drug therapy , Hyperlipoproteinemia Type I/genetics , Hypertriglyceridemia/drug therapy , Hypertriglyceridemia/genetics , Angiopoietin-Like Protein 3