Results 1 - 20 of 33
1.
Cell ; 161(3): 647-660, 2015 Apr 23.
Article in English | MEDLINE | ID: mdl-25910212

ABSTRACT

How disease-associated mutations impair protein activities in the context of biological networks remains mostly undetermined. Although a few renowned alleles are well characterized, functional information is missing for over 100,000 disease-associated variants. Here we functionally profile several thousand missense mutations across a spectrum of Mendelian disorders using various interaction assays. The majority of disease-associated alleles exhibit wild-type chaperone binding profiles, suggesting they preserve protein folding or stability. While common variants from healthy individuals rarely affect interactions, two-thirds of disease-associated alleles perturb protein-protein interactions, with half corresponding to "edgetic" alleles affecting only a subset of interactions while leaving most other interactions unperturbed. With transcription factors, many alleles that leave protein-protein interactions intact affect DNA binding. Different mutations in the same gene leading to different interaction profiles often result in distinct disease phenotypes. Thus disease-associated alleles that perturb distinct protein activities rather than grossly affecting folding and stability are relatively widespread.


Subject(s)
Disease/genetics , Mutation, Missense , Protein Interaction Maps , Proteins/genetics , Proteins/metabolism , DNA-Binding Proteins/genetics , DNA-Binding Proteins/metabolism , Genome-Wide Association Study , Humans , Open Reading Frames , Protein Folding , Protein Stability
2.
Am J Hum Genet ; 109(1): 33-49, 2022 01 06.
Article in English | MEDLINE | ID: mdl-34951958

ABSTRACT

The identification of genes that evolve under recessive natural selection is a long-standing goal of population genetics research that has important applications to the discovery of genes associated with disease. We found that commonly used methods to evaluate selective constraint at the gene level are highly sensitive to genes under heterozygous selection but ubiquitously fail to detect recessively evolving genes. Additionally, more sophisticated likelihood-based methods designed to detect recessivity similarly lack power for a human gene of realistic length from current population sample sizes. However, extensive simulations suggested that recessive genes may be detectable in aggregate. Here, we offer a method informed by population genetics simulations designed to detect recessive purifying selection in gene sets. Applying this to empirical gene sets produced significant enrichments for strong recessive selection in genes previously inferred to be under recessive selection in a consanguineous cohort and in genes involved in autosomal recessive monogenic disorders.


Subject(s)
Gene Frequency , Genes, Recessive , Genetics, Population , Selection, Genetic , Algorithms , Alleles , Genes, Dominant , Genetic Predisposition to Disease , Genetic Variation , Genetics, Population/methods , Genomics/methods , Genotype , Humans , Inheritance Patterns , Likelihood Functions , Models, Genetic , Mutation , United Kingdom
3.
PLoS Genet ; 18(11): e1010367, 2022 11.
Article in English | MEDLINE | ID: mdl-36327219

ABSTRACT

Host genetics is a key determinant of COVID-19 outcomes. Previously, the COVID-19 Host Genetics Initiative genome-wide association study used common variants to identify multiple loci associated with COVID-19 outcomes. However, variants with the largest impact on COVID-19 outcomes are expected to be rare in the population. Hence, studying rare variants may provide additional insights into disease susceptibility and pathogenesis, thereby informing therapeutics development. Here, we combined whole-exome and whole-genome sequencing from 21 cohorts across 12 countries and performed rare variant exome-wide burden analyses for COVID-19 outcomes. In an analysis of 5,085 severe disease cases and 571,737 controls, we observed that carrying a rare deleterious variant in the SARS-CoV-2 sensor toll-like receptor TLR7 (on chromosome X) was associated with a 5.3-fold increase in severe disease (95% CI: 2.75-10.05, p = 5.41 × 10⁻⁷). This association was consistent across sexes. These results further support TLR7 as a genetic determinant of severe disease and suggest that larger studies on rare variants influencing COVID-19 outcomes could provide additional insights.


Subject(s)
COVID-19 , Exome , Humans , Exome/genetics , Genome-Wide Association Study , COVID-19/genetics , Genetic Predisposition to Disease , Toll-Like Receptor 7/genetics , SARS-CoV-2/genetics
4.
Lancet ; 401(10372): 215-225, 2023 Jan 21.
Article in English | MEDLINE | ID: mdl-36563696

ABSTRACT

BACKGROUND: Binary diagnosis of coronary artery disease does not preserve the complexity of disease or quantify its severity or its associated risk with death; hence, a quantitative marker of coronary artery disease is warranted. We evaluated a quantitative marker of coronary artery disease derived from probabilities of a machine learning model. METHODS: In this cohort study, we developed and validated a coronary artery disease-predictive machine learning model using 95 935 electronic health records and assessed its probabilities as in-silico scores for coronary artery disease (ISCAD; range 0 [lowest probability] to 1 [highest probability]) in participants in two longitudinal biobank cohorts. We measured the association of ISCAD with clinical outcomes, namely coronary artery stenosis, obstructive coronary artery disease, multivessel coronary artery disease, all-cause death, and coronary artery disease sequelae. FINDINGS: Among 95 935 participants, 35 749 were from the BioMe Biobank (median age 61 years [IQR 18]; 14 599 [41%] were male and 21 150 [59%] were female; 5130 [14%] were with diagnosed coronary artery disease) and 60 186 were from the UK Biobank (median age 62 [15] years; 25 031 [42%] male and 35 155 [58%] female; 8128 [14%] with diagnosed coronary artery disease). The model predicted coronary artery disease with an area under the receiver operating characteristic curve of 0·95 (95% CI 0·94-0·95; sensitivity of 0·94 [0·94-0·95] and specificity of 0·82 [0·81-0·83]) and 0·93 (0·92-0·93; sensitivity of 0·90 [0·89-0·90] and specificity of 0·88 [0·87-0·88]) in the BioMe validation and holdout sets, respectively, and 0·91 (0·91-0·91; sensitivity of 0·84 [0·83-0·84] and specificity of 0·83 [0·82-0·83]) in the UK Biobank external test set. ISCAD captured coronary artery disease risk from known risk factors, pooled cohort equations, and polygenic risk scores.
Coronary artery stenosis increased quantitatively with ascending ISCAD quartiles (increase per quartile of 12 percentage points), including risk of obstructive coronary artery disease, multivessel coronary artery disease, and stenosis of major coronary arteries. Hazard ratios (HRs) and prevalence of all-cause death increased stepwise over ISCAD deciles (decile 1: HR 1·0 [95% CI 1·0-1·0], 0·2% prevalence; decile 6: 11 [3·9-31], 3·1% prevalence; and decile 10: 56 [20-158], 11% prevalence). A similar trend was observed for recurrent myocardial infarction. 12 (46%) undiagnosed individuals with high ISCAD (≥0·9) had clinical evidence of coronary artery disease according to the 2014 American College of Cardiology/American Heart Association Task Force guidelines. INTERPRETATION: Electronic health record-based machine learning was used to generate an in-silico marker for coronary artery disease that can non-invasively quantify atherosclerosis and risk of death on a continuous spectrum, and identify underdiagnosed individuals. FUNDING: National Institutes of Health.


Subject(s)
Coronary Artery Disease , Coronary Stenosis , Humans , Male , Female , Middle Aged , Coronary Artery Disease/diagnosis , Coronary Artery Disease/epidemiology , Cohort Studies , Predictive Value of Tests , Coronary Stenosis/diagnosis , Risk Factors , Machine Learning , Coronary Angiography
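The Lancet abstract above reports model discrimination as an area under the receiver operating characteristic curve (AUROC). As a minimal sketch of what that statistic measures, the following computes it as the probability that a randomly chosen case receives a higher score than a randomly chosen control. The function name `auroc` and the scores and labels are hypothetical illustrations, not data or code from the study.

```python
def auroc(scores, labels):
    """Probability that a random case outscores a random control (ties count 0.5)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical probabilities in [0, 1], in the spirit of an in-silico score.
example_scores = [0.95, 0.80, 0.60, 0.40, 0.30, 0.10]
example_labels = [1, 1, 0, 1, 0, 0]  # 1 = diagnosed, 0 = not diagnosed
print(f"AUROC = {auroc(example_scores, example_labels):.3f}")
```

The pairwise form is quadratic in sample size and is meant for reading, not for biobank-scale data, where a rank-based implementation would be the usual choice.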
5.
PLoS Genet ; 17(1): e1009337, 2021 01.
Article in English | MEDLINE | ID: mdl-33493176

ABSTRACT

Understanding the relationship between natural selection and phenotypic variation has been a long-standing challenge in human population genetics. With the emergence of biobank-scale datasets, along with new statistical metrics to approximate strength of purifying selection at the variant level, it is now possible to correlate a proxy of individual relative fitness with a range of medical phenotypes. We calculated a per-individual deleterious load score by summing the total number of derived alleles per individual after incorporating a weight that approximates strength of purifying selection. We assessed four methods for the weight, including GERP, phyloP, CADD, and fitcons. By quantitatively tracking each of these scores with the site frequency spectrum, we identified phyloP as the most appropriate weight. The phyloP-weighted load score was then calculated across 15,129,142 variants in 335,161 individuals from the UK Biobank and tested for association on 1,380 medical phenotypes. After accounting for multiple test correction, we observed a strong association of the load score amongst coding sites only on 27 traits including body mass, adiposity and metabolic rate. We further observed that the association signals were driven by common variants (derived allele frequency > 5%) with high phyloP score (phyloP > 2). Finally, through permutation analyses, we showed that the load score amongst coding sites had an excess of nominally significant associations on many medical phenotypes. These results suggest a broad impact of deleterious load on medical phenotypes and highlight the deleterious load score as a tool to disentangle the complex relationship between natural selection and medical phenotypes.


Subject(s)
Evolution, Molecular , Genetic Fitness/genetics , Genetics, Population , Selection, Genetic/genetics , Alleles , Biological Specimen Banks , Body Mass Index , Female , Gene Frequency , Genetic Association Studies , Genetic Predisposition to Disease , Genetic Variation/genetics , Humans , Male , United Kingdom
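The PLoS Genet abstract above describes a per-individual deleterious load score: derived alleles are summed per individual after applying a weight that approximates the strength of purifying selection. A minimal sketch follows, assuming the weight enters multiplicatively (the abstract does not state the exact form). The function name `load_score`, the weights, and the genotype vector are hypothetical.

```python
def load_score(derived_allele_counts, weights):
    """Weighted sum of derived-allele counts (0, 1, or 2 per variant) for one individual.

    Assumption: each variant's count is multiplied by its conservation weight
    (for example, a phyloP score) before summing.
    """
    if len(derived_allele_counts) != len(weights):
        raise ValueError("one weight is required per variant")
    return sum(count * weight for count, weight in zip(derived_allele_counts, weights))


# Hypothetical conservation weights for four variants and one individual's genotypes.
example_weights = [2.5, 0.3, 4.1, 1.2]
example_genotype = [1, 0, 2, 1]  # derived-allele count at each variant
print(f"load score = {load_score(example_genotype, example_weights):.2f}")
```

Restricting the inputs to coding sites, or to variants above a weight threshold, would be a filtering step applied before calling the function.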
6.
JAMA ; 327(4): 350-359, 2022 01 25.
Article in English | MEDLINE | ID: mdl-35076666

ABSTRACT

Importance: Population-based assessment of disease risk associated with gene variants informs clinical decisions and risk stratification approaches. Objective: To evaluate the population-based disease risk of clinical variants in known disease predisposition genes. Design, Setting, and Participants: This cohort study included 72 434 individuals with 37 780 clinical variants who were enrolled in the BioMe Biobank from 2007 onwards with follow-up until December 2020 and the UK Biobank from 2006 to 2010 with follow-up until June 2020. Participants had linked exome and electronic health record data, were older than 20 years, and were of diverse ancestral backgrounds. Exposures: Variants previously reported as pathogenic or predicted to cause a loss of protein function by bioinformatic algorithms (pathogenic/loss-of-function variants). Main Outcomes and Measures: The primary outcome was the disease risk associated with clinical variants. The risk difference (RD) between the prevalence of disease in individuals with a variant allele (penetrance) vs in individuals with a normal allele was measured. Results: Among 72 434 study participants, 43 395 were from the UK Biobank (mean [SD] age, 57 [8.0] years; 24 065 [55%] women; 2948 [7%] non-European) and 29 039 were from the BioMe Biobank (mean [SD] age, 56 [16] years; 17 355 [60%] women; 19 663 [68%] non-European). Of 5360 pathogenic/loss-of-function variants, 4795 (89%) were associated with an RD less than or equal to 0.05. Mean penetrance was 6.9% (95% CI, 6.0%-7.8%) for pathogenic variants and 0.85% (95% CI, 0.76%-0.95%) for benign variants reported in ClinVar (difference, 6.0 [95% CI, 5.6-6.4] percentage points), with a median of 0% for both groups due to large numbers of nonpenetrant variants. 
Penetrance of pathogenic/loss-of-function variants for late-onset diseases was modified by age: mean penetrance was 10.3% (95% CI, 9.0%-11.6%) in individuals 70 years or older and 8.5% (95% CI, 7.9%-9.1%) in individuals 20 years or older (difference, 1.8 [95% CI, 0.40-3.3] percentage points). Penetrance of pathogenic/loss-of-function variants was heterogeneous even in known disease predisposition genes, including BRCA1 (mean [range], 38% [0%-100%]), BRCA2 (mean [range], 38% [0%-100%]), and PALB2 (mean [range], 26% [0%-100%]). Conclusions and Relevance: In 2 large biobank cohorts, the estimated penetrance of pathogenic/loss-of-function variants was variable but generally low. Further research of population-based penetrance is needed to refine variant interpretation and clinical evaluation of individuals with these variant alleles.


Subject(s)
Genetic Predisposition to Disease , Genetic Variation , Loss of Function Mutation , Penetrance , Aged , Biological Specimen Banks , Cohort Studies , Female , Humans , Male , Mutation , United Kingdom
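The JAMA abstract above defines its outcome as the risk difference (RD) between disease prevalence in individuals with a variant allele (penetrance) and prevalence in individuals with a normal allele. A minimal sketch of that arithmetic follows. The function name `risk_difference` and all counts are hypothetical, chosen only to show the calculation.

```python
def risk_difference(carriers_affected, carriers_total,
                    noncarriers_affected, noncarriers_total):
    """Return (penetrance, baseline prevalence, risk difference)."""
    if carriers_total <= 0 or noncarriers_total <= 0:
        raise ValueError("group sizes must be positive")
    penetrance = carriers_affected / carriers_total
    baseline = noncarriers_affected / noncarriers_total
    return penetrance, baseline, penetrance - baseline


# Hypothetical counts: 6 of 50 carriers affected, 700 of 70,000 non-carriers affected.
penetrance, baseline, rd = risk_difference(6, 50, 700, 70_000)
print(f"penetrance = {penetrance:.1%}, baseline = {baseline:.1%}, RD = {rd:.2f}")
```

A variant with no affected carriers has a penetrance of 0, which is consistent with the abstract's note that the median penetrance was 0% because many variants were nonpenetrant.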
7.
Annu Rev Genomics Hum Genet ; 19: 289-301, 2018 08 31.
Article in English | MEDLINE | ID: mdl-29641912

ABSTRACT

While sequence-based genetic tests have long been available for specific loci, especially for Mendelian disease, the rapidly falling costs of genome-wide genotyping arrays, whole-exome sequencing, and whole-genome sequencing are moving us toward a future where full genomic information might inform the prognosis and treatment of a variety of diseases, including complex disease. Similarly, the availability of large populations with full genomic information has enabled new insights about the etiology and genetic architecture of complex disease. Insights from the latest generation of genomic studies suggest that our categorization of diseases as complex may conceal a wide spectrum of genetic architectures and causal mechanisms that ranges from Mendelian forms of complex disease to complex regulatory structures underlying Mendelian disease. Here, we review these insights, along with advances in the prediction of disease risk and outcomes from full genomic information.


Subject(s)
Genetic Diseases, Inborn/genetics , Genetic Diseases, Inborn/complications , Genetic Predisposition to Disease , Genome-Wide Association Study , Humans , Exome Sequencing
8.
Nature ; 524(7564): 225-9, 2015 Aug 13.
Article in English | MEDLINE | ID: mdl-26123021

ABSTRACT

Patterns of amino acid conservation have served as a tool for understanding protein evolution. The same principles have also found broad application in human genomics, driven by the need to interpret the pathogenic potential of variants in patients. Here we performed a systematic comparative genomics analysis of human disease-causing missense variants. We found that an appreciable fraction of disease-causing alleles are fixed in the genomes of other species, suggesting a role for genomic context. We developed a model of genetic interactions that predicts most of these to be simple pairwise compensations. Functional testing of this model on two known human disease genes revealed discrete cis amino acid residues that, although benign on their own, could rescue the human mutations in vivo. This approach was also applied to ab initio gene discovery to support the identification of a de novo disease driver in BTG2 that is subject to protective cis-modification in more than 50 species. Finally, on the basis of our data and models, we developed a computational tool to predict candidate residues subject to compensation. Taken together, our data highlight the importance of cis-genomic context as a contributor to protein evolution; they provide an insight into the complexity of allele effect on phenotype; and they are likely to assist methods for predicting allele pathogenicity.


Subject(s)
Disease/genetics , Genomics , Mutation, Missense/genetics , Suppression, Genetic/genetics , Adaptor Proteins, Signal Transducing/genetics , Alleles , Animals , Evolution, Molecular , Genome, Human/genetics , Humans , Immediate-Early Proteins/genetics , Microcephaly/genetics , Microtubule-Associated Proteins , Phenotype , Proteins/genetics , Sequence Alignment , Tumor Suppressor Proteins/genetics
9.
PLoS Med ; 16(1): e1002725, 2019 01.
Article in English | MEDLINE | ID: mdl-30645594

ABSTRACT

BACKGROUND: Studies have shown strong positive associations between serum urate (SU) levels and chronic kidney disease (CKD) risk; however, whether the relation is causal remains uncertain. We evaluate whether genetic data are consistent with a causal impact of SU level on the risk of CKD and estimated glomerular filtration rate (eGFR). METHODS AND FINDINGS: We used Mendelian randomization (MR) methods to evaluate the presence of a causal effect. We used aggregated genome-wide association data (N = 110,347 for SU, N = 69,374 for gout, N = 133,413 for eGFR, N = 117,165 for CKD), electronic-medical-record-linked UK Biobank data (N = 335,212), and population-based cohorts (N = 13,425), all in individuals of European ancestry, for SU levels and CKD. Our MR analysis showed that SU has a causal effect on neither eGFR level nor CKD risk across all MR analyses (all P > 0.05). These null associations contrasted with our epidemiological association findings from the 4 population-based cohorts (change in eGFR level per 1-mg/dl [59.48 µmol/l] increase in SU: -1.99 ml/min/1.73 m²; 95% CI -2.86 to -1.11; P = 8.08 × 10⁻⁶; odds ratio [OR] for CKD: 1.48; 95% CI 1.32 to 1.65; P = 1.52 × 10⁻¹¹). In contrast, the same MR approaches showed that SU has a causal effect on the risk of gout (OR estimates ranging from 3.41 to 6.04 per 1-mg/dl increase in SU, all P < 10⁻³), which served as a positive control of our approach. Overall, our MR analysis had >99% power to detect a causal effect of SU level on the risk of CKD of the same magnitude as the observed epidemiological association between SU and CKD. Limitations of this study include the lifelong effect of a genetic perturbation not being the same as an acute perturbation, the inability to study non-European populations, and some sample overlap between the datasets used in the study.
CONCLUSIONS: Evidence from our series of causal inference approaches using genetics does not support a causal effect of SU level on eGFR level or CKD risk. Reducing SU levels is unlikely to reduce the risk of CKD development.


Subject(s)
Renal Insufficiency, Chronic/etiology , Uric Acid/blood , Adult , Age Factors , Female , Genome-Wide Association Study , Glomerular Filtration Rate/genetics , Humans , Male , Mendelian Randomization Analysis , Renal Insufficiency, Chronic/blood , Renal Insufficiency, Chronic/genetics , Sex Factors , Young Adult
10.
Genet Med ; 20(9): 936-941, 2018 09.
Article in English | MEDLINE | ID: mdl-29388949

ABSTRACT

PURPOSE: Over 150,000 variants have been reported to cause Mendelian disease in the medical literature. It is still difficult to leverage this knowledge base in clinical practice, as many reports lack strong statistical evidence or may include false associations. Clinical laboratories assess whether these variants (along with newly observed variants that are adjacent to these published ones) underlie clinical disorders. METHODS: We investigated whether citation data, including journal impact factor and the number of cited variants (NCV) in each gene with published disease associations, can be used to improve variant assessment. RESULTS: Surprisingly, we found that impact factor is not predictive of pathogenicity, but the NCV score for each gene can provide statistical support for prediction of pathogenicity. When this gene-level citation metric is combined with variant-level evolutionary conservation and structural features, classification accuracy reaches 89.5%. Further, variants identified in clinical exome sequencing cases have higher NCVs than do simulated rare variants from the Exome Aggregation Consortium database within the same set of genes and functional consequences (P < 2.22 × 10⁻¹⁶). CONCLUSION: Aggregate citation data can complement existing variant-based predictive algorithms, and can boost their performance without the need to access and review large numbers of papers. The NCV is a slow-growing metric of scientific knowledge about each gene's association with disease.


Subject(s)
Computational Biology/methods , Genome-Wide Association Study/methods , Algorithms , Databases, Genetic , Forecasting , Genetic Variation , Humans , Journal Impact Factor
11.
PLoS Genet ; 11(10): e1005622, 2015 Oct.
Article in English | MEDLINE | ID: mdl-26509271

ABSTRACT

Large genome-wide association studies (GWAS) have identified many genetic loci associated with risk for myocardial infarction (MI) and coronary artery disease (CAD). Concurrently, efforts such as the National Institutes of Health (NIH) Roadmap Epigenomics Project and the Encyclopedia of DNA Elements (ENCODE) Consortium have provided unprecedented data on functional elements of the human genome. In the present study, we systematically investigate the biological link between genetic variants associated with this complex disease and their impacts on gene function. First, we examined the heritability of MI/CAD according to genomic compartments. We observed that single nucleotide polymorphisms (SNPs) residing within nearby regulatory regions show significant polygenicity and contribute between 59-71% of the heritability for MI/CAD. Second, we showed that the polygenicity and heritability explained by these SNPs are enriched in histone modification marks in specific cell types. Third, we found that a statistically higher number of 45 MI/CAD-associated SNPs that have been identified from large-scale GWAS studies reside within certain functional elements of the genome, particularly in active enhancer and promoter regions. Finally, we observed significant heterogeneity of this signal across cell types, with strong signals observed within adipose nuclei, as well as brain and spleen cell types. These results suggest that the genetic etiology of MI/CAD is largely explained by tissue-specific regulatory perturbation within the human genome.


Subject(s)
Coronary Artery Disease/genetics , Genetic Predisposition to Disease , Genome-Wide Association Study , Polymorphism, Single Nucleotide/genetics , Coronary Artery Disease/pathology , Genome, Human , Genotype , Humans , Regulatory Sequences, Nucleic Acid , Risk Factors
12.
Mol Biol Evol ; 33(10): 2555-64, 2016 10.
Article in English | MEDLINE | ID: mdl-27436009

ABSTRACT

Deleterious mutations are expected to evolve under negative selection and are usually purged from the population. However, deleterious alleles segregate in the human population and some disease-associated variants are maintained at considerable frequencies. Here, we test the hypothesis that balancing selection may counteract purifying selection in neighboring regions and thus maintain deleterious variants at higher frequency than expected from their detrimental fitness effect. We first show in realistic simulations that balancing selection reduces the density of polymorphic sites surrounding a locus under balancing selection, but at the same time markedly increases the population frequency of the remaining variants, including even substantially deleterious alleles. To test the predictions of our simulations empirically, we then use whole-exome sequencing data from 6,500 human individuals and focus on the most established example for balancing selection in the human genome, the major histocompatibility complex (MHC). Our analysis shows an elevated frequency of putatively deleterious coding variants in nonhuman leukocyte antigen (non-HLA) genes localized in the MHC region. The mean frequency of these variants declined with physical distance from the classical HLA genes, indicating dependency on genetic linkage. These results reveal an indirect cost of the genetic diversity maintained by balancing selection, which has hitherto been perceived as mostly advantageous, and have implications both for the evolution of recombination and also for the epidemiology of various MHC-associated diseases.


Subject(s)
HLA Antigens/genetics , Major Histocompatibility Complex/genetics , Selection, Genetic , Sequence Deletion , Alleles , Biological Evolution , Computer Simulation , Databases, Genetic , Evolution, Molecular , Gene Frequency/genetics , Genetic Variation , Genome, Human , Haplotypes/genetics , Humans , Polymorphism, Genetic/genetics
13.
Hum Mutat ; 36(10): 998-1003, 2015 Oct.
Article in English | MEDLINE | ID: mdl-26378430

ABSTRACT

Clinical sequencing is expanding, but causal variants are still not identified in the majority of cases. These unsolved cases can aid in gene discovery when individuals with similar phenotypes are identified in systems such as the Matchmaker Exchange. We describe risks for gene discovery in this growing set of unsolved cases. In a set of rare disease cases with the same phenotype, it is not difficult to find two individuals with the same phenotype that carry variants in the same gene. We quantify the risk of false-positive association in a cohort of individuals with the same phenotype, using the prior probability of observing a variant in each gene from over 60,000 individuals (Exome Aggregation Consortium). Based on the number of individuals with a genic variant, cohort size, specific gene, and mode of inheritance, we calculate a P value that the match represents a true association. A match in two of 10 patients in MECP2 is statistically significant (P = 0.0014), whereas a match in TTN would not reach significance, as expected (P > 0.999). Finally, we analyze the probability of matching in clinical exome cases to estimate the number of cases needed to identify genes related to different disorders. We offer Rare Disease Match, an online tool to mitigate the uncertainty of false-positive associations.


Subject(s)
Computational Biology/methods , Genetic Association Studies/methods , Rare Diseases/genetics , Algorithms , Databases, Genetic , Exome , False Positive Reactions , Genetic Variation , Humans , Phenotype , Web Browser
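The Hum Mutat abstract above quantifies how often two or more individuals in a same-phenotype cohort carry variants in the same gene by chance, given a per-gene prior probability of observing a variant. One way to sketch that idea is a binomial tail probability. This is an illustrative assumption, not the formula used by Rare Disease Match, which also accounts for mode of inheritance. The function name `match_p_value` and the per-gene probabilities are hypothetical.

```python
from math import comb


def match_p_value(cohort_size, carriers, variant_probability):
    """Chance probability that at least `carriers` of `cohort_size` unrelated
    individuals carry a qualifying variant in a given gene.

    Assumption: each individual independently carries such a variant with
    probability `variant_probability` (a per-gene population prior).
    """
    if not 0.0 <= variant_probability <= 1.0:
        raise ValueError("variant_probability must be in [0, 1]")
    if not 0 <= carriers <= cohort_size:
        raise ValueError("carriers must be between 0 and cohort_size")
    q = variant_probability
    return sum(
        comb(cohort_size, i) * q**i * (1.0 - q) ** (cohort_size - i)
        for i in range(carriers, cohort_size + 1)
    )


# Hypothetical priors: a gene that rarely carries variants versus a large,
# highly variable gene, each with a match in 2 of 10 patients.
print(f"rarely variable gene: P = {match_p_value(10, 2, 0.001):.2e}")
print(f"highly variable gene: P = {match_p_value(10, 2, 0.5):.3f}")
```

The contrast between the two calls mirrors the abstract's qualitative point: the same two-of-ten match is far less surprising in a gene where variants are common in the general population.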
14.
Am J Hum Genet ; 88(2): 183-92, 2011 Feb 11.
Article in English | MEDLINE | ID: mdl-21310275

ABSTRACT

Assessing the significance of novel genetic variants revealed by DNA sequencing is a major challenge to the integration of genomic techniques with medical practice. Many variants remain difficult to classify by traditional genetic methods. Computational methods have been developed that could contribute to classifying these variants, but they have not been properly validated and are generally not considered mature enough to be used effectively in a clinical setting. We developed a computational method for predicting the effects of missense variants detected in patients with hypertrophic cardiomyopathy (HCM). We used a curated clinical data set of 74 missense variants in six genes associated with HCM to train and validate an automated predictor. The predictor is based on support vector regression and uses phylogenetic and structural features specific to genes involved in HCM. Ten-fold cross validation estimated our predictor's sensitivity at 94% (95% confidence interval: 83%-98%) and specificity at 89% (95% confidence interval: 72%-100%). This corresponds to an odds ratio of 10 for a prediction of pathogenic (95% confidence interval: 4.0-infinity), or an odds ratio of 9.9 for a prediction of benign (95% confidence interval: 4.6-21). Coverage (proportion of variants for which a prediction was made) was 57% (95% confidence interval: 49%-64%). This performance exceeds that of existing methods that are not specifically designed for HCM. The accuracy of this predictor provides support for the clinical use of automated predictions alongside family segregation and population frequency data in the interpretation of new missense variants and suggests future development of similar tools for other diseases.


Subject(s)
Cardiomyopathy, Hypertrophic/genetics , Computational Biology , Genetic Variation/genetics , Mutation, Missense/genetics , Nuclear Proteins/genetics , Genetic Predisposition to Disease , Humans
15.
Nat Genet ; 56(7): 1412-1419, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38862854

ABSTRACT

Coronary artery disease (CAD) exists on a spectrum of disease represented by a combination of risk factors and pathogenic processes. An in silico score for CAD built using machine learning and clinical data in electronic health records captures disease progression, severity and underdiagnosis on this spectrum and could enhance genetic discovery efforts for CAD. Here we tested associations of rare and ultrarare coding variants with the in silico score for CAD in the UK Biobank, All of Us Research Program and BioMe Biobank. We identified associations in 17 genes; of these, 14 show at least moderate levels of prior genetic, biological and/or clinical support for CAD. We also observed an excess of ultrarare coding variants in 321 aggregated CAD genes, suggesting more ultrarare variant associations await discovery. These results expand our understanding of the genetic etiology of CAD and illustrate how digital markers can enhance genetic association investigations for complex diseases.


Subject(s)
Coronary Artery Disease , Genetic Predisposition to Disease , Machine Learning , Coronary Artery Disease/genetics , Humans , Exome/genetics , Exome Sequencing/methods , Genetic Variation , Genome-Wide Association Study/methods , Female , Polymorphism, Single Nucleotide
16.
JACC Adv ; 3(4), 2024 Apr.
Article in English | MEDLINE | ID: mdl-38737007

ABSTRACT

BACKGROUND: Diet is a key modifiable risk factor of coronary artery disease (CAD). However, the causal effects of specific dietary traits on CAD risk remain unclear. With the expansion of dietary data in population biobanks, Mendelian randomization (MR) could help enable the efficient estimation of causality in diet-disease associations. OBJECTIVES: The primary goal was to test causality for 13 common dietary traits on CAD risk using a systematic 2-sample MR framework. A secondary goal was to identify plasma metabolites mediating diet-CAD associations suspected to be causal. METHODS: Cross-sectional genetic and dietary data on up to 420,531 UK Biobank and 184,305 CARDIoGRAMplusC4D individuals of European ancestry were used in 2-sample MR. The primary analysis used fixed effect inverse-variance weighted regression, while sensitivity analyses used weighted median estimation, MR-Egger regression, and MR-Pleiotropy Residual Sum and Outlier. RESULTS: Genetic variants serving as proxies for muesli intake were negatively associated with CAD risk (OR: 0.74; 95% CI: 0.65-0.84; P = 5.385 × 10⁻⁴). Sensitivity analyses using weighted median estimation supported this with a significant association in the same direction. Additionally, we identified higher plasma acetate levels as a potential mediator (OR: 0.03; 95% CI: 0.01-0.12; P = 1.15 × 10⁻⁴). CONCLUSIONS: Muesli, a mixture of oats, seeds, nuts, dried fruit, and milk, may causally reduce CAD risk. Circulating levels of acetate, a gut microbiota-derived short-chain fatty acid, could be mediating its cardioprotective effects. These findings highlight the role of gut flora in cardiovascular health and help prioritize randomized trials on dietary interventions for CAD.
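The JACC Adv abstract above names fixed-effect inverse-variance weighted (IVW) regression as its primary 2-sample MR analysis. A minimal sketch of the standard fixed-effect IVW estimator from summary statistics follows, using first-order weights (outcome standard errors only). The function name `ivw_fixed_effect` and the per-variant effect sizes are hypothetical and do not come from the study.

```python
from math import exp, sqrt


def ivw_fixed_effect(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate of the causal effect and its standard error.

    Each variant contributes the ratio beta_outcome / beta_exposure, weighted
    by beta_exposure**2 / se_outcome**2 (first-order weights).
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("inputs must have the same length")
    numerator = sum(
        bx * by / se**2 for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome)
    )
    denominator = sum(bx**2 / se**2 for bx, se in zip(beta_exposure, se_outcome))
    return numerator / denominator, sqrt(1.0 / denominator)


# Hypothetical summary statistics for three instruments.
bx = [0.10, 0.20, 0.15]     # variant effects on the exposure
by = [-0.02, -0.04, -0.03]  # variant effects on the outcome (log odds)
se = [0.01, 0.02, 0.015]    # standard errors of the outcome effects

estimate, std_error = ivw_fixed_effect(bx, by, se)
print(f"log OR = {estimate:.3f} (SE {std_error:.3f}), OR = {exp(estimate):.3f}")
```

The sensitivity analyses named in the abstract (weighted median, MR-Egger, MR-PRESSO) relax the assumption that every instrument is valid, which this estimator takes for granted.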

17.
Nat Commun ; 15(1): 8891, 2024 Oct 15.
Article in English | MEDLINE | ID: mdl-39406732

ABSTRACT

Identifying genetic drivers of chronic diseases is necessary for drug discovery. Here, we develop a machine learning-assisted genetic priority score, which we call ML-GPS, that incorporates genetic associations with predicted disease phenotypes to enhance target discovery. First, we construct gradient boosting models to predict 112 chronic disease phecodes in the UK Biobank and analyze associations of predicted and observed phenotypes with common, rare, and ultra-rare variants to model the allelic series. We integrate these associations with existing evidence using gradient boosting with continuous feature encoding to construct ML-GPS, training it to predict drug indications in Open Targets and externally testing it in SIDER. We then generate ML-GPS predictions for 2,362,636 gene-phecode pairs. We find that the use of predicted phenotypes, which identify substantially more genetic associations than observed phenotypes across the allele frequency spectrum, significantly improves the performance of ML-GPS. ML-GPS increases coverage of drug targets, with the top 1% of all scores providing support for 15,077 gene-phecode pairs that previously had no support. ML-GPS can also identify well-known target-disease relationships, promising targets without indicated drugs, and targets for several drugs in clinical trials, including LRRK2 inhibitors for Parkinson's disease and olpasiran for cardiovascular disease.


Subject(s)
Drug Discovery , Machine Learning , Phenotype , Humans , Chronic Disease/drug therapy , Drug Discovery/methods , Gene Frequency , Genetic Predisposition to Disease
18.
Nat Genet ; 56(1): 51-59, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38172303

ABSTRACT

Studies have shown that drug targets with human genetic support are more likely to succeed in clinical trials. Hence, a tool integrating genetic evidence to prioritize drug target genes is beneficial for drug discovery. We built a genetic priority score (GPS) by integrating eight genetic features with drug indications from the Open Targets and SIDER databases. The top 0.83%, 0.28% and 0.19% of the GPS conferred a 5.3-, 9.9- and 11.0-fold increased effect of having an indication, respectively. In addition, we observed that targets in the top 0.28% of the score were 1.7-, 3.7- and 8.8-fold more likely to advance from phase I to phases II, III and IV, respectively. Complementary to the GPS, we incorporated the direction of genetic effect and drug mechanism into a directional version of the score called the GPS with direction of effect. We applied our method to 19,365 protein-coding genes and 399 drug indications and made all results available through a web portal.


Subject(s)
Human Genetics , Pharmacogenetics , Humans , Drug Discovery
19.
Hum Mutat ; 34(9): 1216-20, 2013 Sep.
Article in English | MEDLINE | ID: mdl-23818451

ABSTRACT

It is now affordable to order clinically interpreted whole-genome sequence reports from clinical laboratories. One major component of these reports is derived from the knowledge base of previously identified pathogenic variants, including research articles, locus-specific, and other databases. While over 150,000 such pathogenic variants have been identified, many of these were originally discovered in small cohort studies of affected individuals, so their applicability to asymptomatic populations is unclear. We analyzed the prevalence of a large set of pathogenic variants from the medical and scientific literature in a large set of asymptomatic individuals (N = 1,092) and found 8.5% of these pathogenic variants in at least one individual. In the average individual in the 1000 Genomes Project, previously identified pathogenic variants occur on average 294 times (σ = 25.5) in homozygous form and 942 times (σ = 68.2) in heterozygous form. We also find that many of these pathogenic variants are frequently occurring: there are 3,744 variants with minor allele frequency (MAF) ≥ 0.01 (4.6%) and 2,837 variants with MAF ≥ 0.05 (3.5%). This indicates that many of these variants may be erroneous findings or have lower penetrance than previously expected.


Subject(s)
Gene Frequency , Genetic Variation , Sequence Analysis, DNA , Databases, Genetic , Genome, Human , Genotype , Heterozygote , Homozygote , Humans , Incidental Findings , Penetrance
20.
medRxiv ; 2023 Dec 24.
Article in English | MEDLINE | ID: mdl-38196638

ABSTRACT

It is estimated that as many as 1 in 16 people worldwide suffer from rare diseases. Rare disease patients face difficulty finding diagnosis and treatment for their conditions, including long diagnostic odysseys, multiple incorrect diagnoses, and unavailable or prohibitively expensive treatments. As a result, it is likely that large electronic health record (EHR) systems include high numbers of participants suffering from undiagnosed rare disease. While this has been shown in detail for specific diseases, these studies are expensive and time consuming and have only been feasible to perform for a handful of the thousands of known rare diseases. The bulk of these undiagnosed cases are effectively hidden, with no straightforward way to differentiate them from healthy controls. The ability to access them at scale would enormously expand our capacity to study and develop drugs for rare diseases, adding to tools aimed at increasing availability of study cohorts for rare disease. In this study, we train a deep learning transformer algorithm, RarePT (Rare-Phenotype Prediction Transformer), to impute undiagnosed rare disease from EHR diagnosis codes in 436,407 participants in the UK Biobank and validated on an independent cohort from 3,333,560 individuals from the Mount Sinai Health System. We applied our model to 155 rare diagnosis codes with fewer than 250 cases each in the UK Biobank and predicted participants with elevated risk for each diagnosis, with the number of participants predicted to be at risk ranging from 85 to 22,000 for different diagnoses. These risk predictions are significantly associated with increased mortality for 65% of diagnoses, with disease burden expressed as disability-adjusted life years (DALY) for 73% of diagnoses, and with 72% of available disease-specific diagnostic tests. 
They are also highly enriched for known rare diagnoses in patients not included in the training set, with an odds ratio (OR) of 48.0 in cross-validation cohorts of the UK Biobank and an OR of 30.6 in the independent Mount Sinai Health System cohort. Most importantly, RarePT successfully screens for undiagnosed patients in 32 rare diseases with available diagnostic tests in the UK Biobank. Using the trained model to estimate the prevalence of undiagnosed disease in the UK Biobank for these 32 rare phenotypes, we find that at least 50% of patients remain undiagnosed for 20 of 32 diseases. These estimates provide empirical evidence of a high prevalence of undiagnosed rare disease, as well as demonstrating the enormous potential benefit of using RarePT to screen for undiagnosed rare disease patients in large electronic health systems.
