Results 1-20 of 32
1.
Cell ; 161(3): 647-660, 2015 Apr 23.
Article in English | MEDLINE | ID: mdl-25910212

ABSTRACT

How disease-associated mutations impair protein activities in the context of biological networks remains mostly undetermined. Although a few renowned alleles are well characterized, functional information is missing for over 100,000 disease-associated variants. Here we functionally profile several thousand missense mutations across a spectrum of Mendelian disorders using various interaction assays. The majority of disease-associated alleles exhibit wild-type chaperone binding profiles, suggesting they preserve protein folding or stability. While common variants from healthy individuals rarely affect interactions, two-thirds of disease-associated alleles perturb protein-protein interactions, with half corresponding to "edgetic" alleles affecting only a subset of interactions while leaving most other interactions unperturbed. With transcription factors, many alleles that leave protein-protein interactions intact affect DNA binding. Different mutations in the same gene leading to different interaction profiles often result in distinct disease phenotypes. Thus disease-associated alleles that perturb distinct protein activities rather than grossly affecting folding and stability are relatively widespread.


Subjects
Disease/genetics , Mutation, Missense , Protein Interaction Maps , Proteins/genetics , Proteins/metabolism , DNA-Binding Proteins/genetics , DNA-Binding Proteins/metabolism , Genome-Wide Association Study , Humans , Open Reading Frames , Protein Folding , Protein Stability
2.
Am J Hum Genet ; 109(1): 33-49, 2022 01 06.
Article in English | MEDLINE | ID: mdl-34951958

ABSTRACT

The identification of genes that evolve under recessive natural selection is a long-standing goal of population genetics research that has important applications to the discovery of genes associated with disease. We found that commonly used methods to evaluate selective constraint at the gene level are highly sensitive to genes under heterozygous selection but ubiquitously fail to detect recessively evolving genes. Additionally, more sophisticated likelihood-based methods designed to detect recessivity similarly lack power for a human gene of realistic length from current population sample sizes. However, extensive simulations suggested that recessive genes may be detectable in aggregate. Here, we offer a method informed by population genetics simulations designed to detect recessive purifying selection in gene sets. Applying this to empirical gene sets produced significant enrichments for strong recessive selection in genes previously inferred to be under recessive selection in a consanguineous cohort and in genes involved in autosomal recessive monogenic disorders.


Subjects
Gene Frequency , Genes, Recessive , Genetics, Population , Selection, Genetic , Algorithms , Alleles , Genes, Dominant , Genetic Predisposition to Disease , Genetic Variation , Genetics, Population/methods , Genomics/methods , Genotype , Humans , Inheritance Patterns , Likelihood Functions , Models, Genetic , Mutation , United Kingdom
3.
PLoS Genet ; 18(11): e1010367, 2022 11.
Article in English | MEDLINE | ID: mdl-36327219

ABSTRACT

Host genetics is a key determinant of COVID-19 outcomes. Previously, the COVID-19 Host Genetics Initiative genome-wide association study used common variants to identify multiple loci associated with COVID-19 outcomes. However, variants with the largest impact on COVID-19 outcomes are expected to be rare in the population. Hence, studying rare variants may provide additional insights into disease susceptibility and pathogenesis, thereby informing therapeutics development. Here, we combined whole-exome and whole-genome sequencing from 21 cohorts across 12 countries and performed rare variant exome-wide burden analyses for COVID-19 outcomes. In an analysis of 5,085 severe disease cases and 571,737 controls, we observed that carrying a rare deleterious variant in the SARS-CoV-2 sensor toll-like receptor TLR7 (on chromosome X) was associated with a 5.3-fold increase in severe disease (95% CI: 2.75-10.05, p = 5.41 × 10⁻⁷). This association was consistent across sexes. These results further support TLR7 as a genetic determinant of severe disease and suggest that larger studies on rare variants influencing COVID-19 outcomes could provide additional insights.


Subjects
COVID-19 , Exome , Humans , Exome/genetics , Genome-Wide Association Study , COVID-19/genetics , Genetic Predisposition to Disease , Toll-Like Receptor 7/genetics , SARS-CoV-2/genetics
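The TLR7 result above is reported as a fold increase in odds with a 95% CI. As a minimal sketch, assuming a simple unadjusted 2×2 table of carriers and non-carriers among cases and controls, the odds ratio and a Woolf (log-scale Wald) interval can be computed as below. The published estimate comes from the study's own exome-wide burden analyses, which this does not reproduce; the function name and the counts are hypothetical.

```python
import math
from typing import Tuple


def odds_ratio_wald(case_carriers: int, case_noncarriers: int,
                    ctrl_carriers: int, ctrl_noncarriers: int,
                    z: float = 1.96) -> Tuple[float, float, float]:
    """Unadjusted odds ratio with a Woolf (log-scale Wald) confidence interval.

    Cases and controls are each split into carriers / non-carriers of a
    qualifying rare variant.
    """
    a, b, c, d = case_carriers, case_noncarriers, ctrl_carriers, ctrl_noncarriers
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive (apply a continuity correction first)")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical counts, not the study's data.
or_, lo, hi = odds_ratio_wald(10, 90, 20, 880)
print(f"OR {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

Working on the log scale keeps the interval asymmetric around the odds ratio, which matches the shape of the reported CI (2.75-10.05 around 5.3).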
4.
Lancet ; 401(10372): 215-225, 2023 Jan 21.
Article in English | MEDLINE | ID: mdl-36563696

ABSTRACT

BACKGROUND: Binary diagnosis of coronary artery disease does not preserve the complexity of disease or quantify its severity or its associated risk with death; hence, a quantitative marker of coronary artery disease is warranted. We evaluated a quantitative marker of coronary artery disease derived from probabilities of a machine learning model. METHODS: In this cohort study, we developed and validated a coronary artery disease-predictive machine learning model using 95 935 electronic health records and assessed its probabilities as in-silico scores for coronary artery disease (ISCAD; range 0 [lowest probability] to 1 [highest probability]) in participants in two longitudinal biobank cohorts. We measured the association of ISCAD with clinical outcomes: coronary artery stenosis, obstructive coronary artery disease, multivessel coronary artery disease, all-cause death, and coronary artery disease sequelae. FINDINGS: Among 95 935 participants, 35 749 were from the BioMe Biobank (median age 61 years [IQR 18]; 14 599 [41%] were male and 21 150 [59%] were female; 5130 [14%] with diagnosed coronary artery disease) and 60 186 were from the UK Biobank (median age 62 [15] years; 25 031 [42%] male and 35 155 [58%] female; 8128 [14%] with diagnosed coronary artery disease). The model predicted coronary artery disease with an area under the receiver operating characteristic curve of 0·95 (95% CI 0·94-0·95; sensitivity of 0·94 [0·94-0·95] and specificity of 0·82 [0·81-0·83]) and 0·93 (0·92-0·93; sensitivity of 0·90 [0·89-0·90] and specificity of 0·88 [0·87-0·88]) in the BioMe validation and holdout sets, respectively, and 0·91 (0·91-0·91; sensitivity of 0·84 [0·83-0·84] and specificity of 0·83 [0·82-0·83]) in the UK Biobank external test set. ISCAD captured coronary artery disease risk from known risk factors, pooled cohort equations, and polygenic risk scores.
Coronary artery stenosis increased quantitatively with ascending ISCAD quartiles (increase per quartile of 12 percentage points), including risk of obstructive coronary artery disease, multivessel coronary artery disease, and stenosis of major coronary arteries. Hazard ratios (HRs) and prevalence of all-cause death increased stepwise over ISCAD deciles (decile 1: HR 1·0 [95% CI 1·0-1·0], 0·2% prevalence; decile 6: 11 [3·9-31], 3·1% prevalence; and decile 10: 56 [20-158], 11% prevalence). A similar trend was observed for recurrent myocardial infarction. 12 (46%) undiagnosed individuals with high ISCAD (≥0·9) had clinical evidence of coronary artery disease according to the 2014 American College of Cardiology/American Heart Association Task Force guidelines. INTERPRETATION: Electronic health record-based machine learning was used to generate an in-silico marker for coronary artery disease that can non-invasively quantify atherosclerosis and risk of death on a continuous spectrum, and identify underdiagnosed individuals. FUNDING: National Institutes of Health.


Subjects
Coronary Artery Disease , Coronary Stenosis , Humans , Male , Female , Middle Aged , Coronary Artery Disease/diagnosis , Coronary Artery Disease/epidemiology , Cohort Studies , Predictive Value of Tests , Coronary Stenosis/diagnosis , Risk Factors , Machine Learning , Coronary Angiography
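The abstract above summarizes the model's probabilities (ISCAD, 0 to 1) by AUROC, sensitivity and specificity. A minimal sketch of how those three metrics are computed from any vector of scores and 0/1 diagnosis labels follows. The pairwise (Mann-Whitney) formulation of AUROC and the cut-off `threshold` are illustrative choices, not details taken from the paper, and the model itself is not reproduced.

```python
from typing import Sequence, Tuple


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random case outscores a random control (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(scores: Sequence[float], labels: Sequence[int],
                            threshold: float) -> Tuple[float, float]:
    """Classify a score >= threshold as predicted disease, then compare with labels."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Toy scores and labels for illustration only.
scores = [0.9, 0.4, 0.6, 0.2]
labels = [1, 1, 0, 0]
print(auroc(scores, labels), sensitivity_specificity(scores, labels, 0.5))
```

The nested loop is O(cases × controls), which is fine for illustration; at biobank scale a rank-based computation of the same statistic would be used instead.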
5.
PLoS Genet ; 17(1): e1009337, 2021 01.
Article in English | MEDLINE | ID: mdl-33493176

ABSTRACT

Understanding the relationship between natural selection and phenotypic variation has been a long-standing challenge in human population genetics. With the emergence of biobank-scale datasets, along with new statistical metrics to approximate strength of purifying selection at the variant level, it is now possible to correlate a proxy of individual relative fitness with a range of medical phenotypes. We calculated a per-individual deleterious load score by summing the total number of derived alleles per individual after incorporating a weight that approximates strength of purifying selection. We assessed four methods for the weight, including GERP, phyloP, CADD, and fitcons. By quantitatively tracking each of these scores with the site frequency spectrum, we identified phyloP as the most appropriate weight. The phyloP-weighted load score was then calculated across 15,129,142 variants in 335,161 individuals from the UK Biobank and tested for association on 1,380 medical phenotypes. After accounting for multiple test correction, we observed a strong association of the load score amongst coding sites only on 27 traits including body mass, adiposity and metabolic rate. We further observed that the association signals were driven by common variants (derived allele frequency > 5%) with high phyloP score (phyloP > 2). Finally, through permutation analyses, we showed that the load score amongst coding sites had an excess of nominally significant associations on many medical phenotypes. These results suggest a broad impact of deleterious load on medical phenotypes and highlight the deleterious load score as a tool to disentangle the complex relationship between natural selection and medical phenotypes.


Subjects
Evolution, Molecular , Genetic Fitness/genetics , Genetics, Population , Selection, Genetic/genetics , Alleles , Biological Specimen Banks , Body Mass Index , Female , Gene Frequency , Genetic Association Studies , Genetic Predisposition to Disease , Genetic Variation/genetics , Humans , Male , United Kingdom
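The load score described above is a weighted sum. For individual j it is L_j = Σ_i g_ij · w_i, where g_ij is the number of derived alleles (0, 1 or 2) at site i and w_i is that site's conservation weight (phyloP in the final analysis). A minimal sketch follows, assuming dosages are already coded this way and there is one weight per site. The optional `keep` mask is a hypothetical stand-in for the subsetting the abstract mentions, such as coding sites only or phyloP > 2, and all names are illustrative.

```python
from typing import List, Optional, Sequence


def load_scores(dosages: Sequence[Sequence[int]],
                weights: Sequence[float],
                keep: Optional[Sequence[bool]] = None) -> List[float]:
    """Per-individual load: sum over kept sites of derived-allele dosage x site weight.

    dosages[j][i]: derived alleles (0, 1 or 2) carried by individual j at site i.
    weights[i]:    weight of site i (e.g. its phyloP score), used as-is.
    keep[i]:       whether site i contributes (defaults to all sites).
    """
    n_sites = len(weights)
    if keep is None:
        keep = [True] * n_sites
    if len(keep) != n_sites:
        raise ValueError("keep must have one entry per site")
    scores = []
    for row in dosages:
        if len(row) != n_sites:
            raise ValueError("each individual needs one dosage per site")
        scores.append(sum(g * w for g, w, k in zip(row, weights, keep) if k))
    return scores


# Two toy individuals at three sites; the third site is excluded in the second call.
print(load_scores([[0, 1, 2], [2, 0, 0]], [1.5, 3.0, -0.5]))
print(load_scores([[0, 1, 2], [2, 0, 0]], [1.5, 3.0, -0.5], keep=[True, True, False]))
```

Whether negative phyloP values are clipped or kept is not stated in the abstract. This sketch keeps them unchanged.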
6.
JAMA ; 327(4): 350-359, 2022 01 25.
Article in English | MEDLINE | ID: mdl-35076666

ABSTRACT

Importance: Population-based assessment of disease risk associated with gene variants informs clinical decisions and risk stratification approaches. Objective: To evaluate the population-based disease risk of clinical variants in known disease predisposition genes. Design, Setting, and Participants: This cohort study included 72 434 individuals with 37 780 clinical variants who were enrolled in the BioMe Biobank from 2007 onwards with follow-up until December 2020 and the UK Biobank from 2006 to 2010 with follow-up until June 2020. Participants had linked exome and electronic health record data, were older than 20 years, and were of diverse ancestral backgrounds. Exposures: Variants previously reported as pathogenic or predicted to cause a loss of protein function by bioinformatic algorithms (pathogenic/loss-of-function variants). Main Outcomes and Measures: The primary outcome was the disease risk associated with clinical variants. The risk difference (RD) between the prevalence of disease in individuals with a variant allele (penetrance) vs in individuals with a normal allele was measured. Results: Among 72 434 study participants, 43 395 were from the UK Biobank (mean [SD] age, 57 [8.0] years; 24 065 [55%] women; 2948 [7%] non-European) and 29 039 were from the BioMe Biobank (mean [SD] age, 56 [16] years; 17 355 [60%] women; 19 663 [68%] non-European). Of 5360 pathogenic/loss-of-function variants, 4795 (89%) were associated with an RD less than or equal to 0.05. Mean penetrance was 6.9% (95% CI, 6.0%-7.8%) for pathogenic variants and 0.85% (95% CI, 0.76%-0.95%) for benign variants reported in ClinVar (difference, 6.0 [95% CI, 5.6-6.4] percentage points), with a median of 0% for both groups due to large numbers of nonpenetrant variants. 
Penetrance of pathogenic/loss-of-function variants for late-onset diseases was modified by age: mean penetrance was 10.3% (95% CI, 9.0%-11.6%) in individuals 70 years or older and 8.5% (95% CI, 7.9%-9.1%) in individuals 20 years or older (difference, 1.8 [95% CI, 0.40-3.3] percentage points). Penetrance of pathogenic/loss-of-function variants was heterogeneous even in known disease predisposition genes, including BRCA1 (mean [range], 38% [0%-100%]), BRCA2 (mean [range], 38% [0%-100%]), and PALB2 (mean [range], 26% [0%-100%]). Conclusions and Relevance: In 2 large biobank cohorts, the estimated penetrance of pathogenic/loss-of-function variants was variable but generally low. Further research of population-based penetrance is needed to refine variant interpretation and clinical evaluation of individuals with these variant alleles.


Subjects
Genetic Predisposition to Disease , Genetic Variation , Loss of Function Mutation , Penetrance , Aged , Biological Specimen Banks , Cohort Studies , Female , Humans , Male , Mutation , United Kingdom
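The abstract above defines penetrance as the prevalence of disease among individuals with a variant allele, and the risk difference (RD) as that prevalence minus the prevalence among individuals with a normal allele. It then counts variants with RD ≤ 0.05. A minimal sketch of that arithmetic for a single variant-disease pair follows, with hypothetical counts. It ignores the age stratification and phenotype definitions the study applied.

```python
from typing import Tuple


def penetrance_and_risk_difference(carrier_cases: int, carriers: int,
                                   noncarrier_cases: int, noncarriers: int) -> Tuple[float, float]:
    """Penetrance = disease prevalence in carriers; RD = penetrance - prevalence in non-carriers."""
    if carriers == 0 or noncarriers == 0:
        raise ValueError("both groups must be non-empty")
    penetrance = carrier_cases / carriers
    baseline = noncarrier_cases / noncarriers
    return penetrance, penetrance - baseline


# Hypothetical counts: 3 of 40 carriers affected vs 100 of 10,000 non-carriers.
pen, rd = penetrance_and_risk_difference(3, 40, 100, 10_000)
print(f"penetrance={pen:.3f} RD={rd:.3f} RD<=0.05: {rd <= 0.05}")
```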
7.
Annu Rev Genomics Hum Genet ; 19: 289-301, 2018 08 31.
Article in English | MEDLINE | ID: mdl-29641912

ABSTRACT

While sequence-based genetic tests have long been available for specific loci, especially for Mendelian disease, the rapidly falling costs of genome-wide genotyping arrays, whole-exome sequencing, and whole-genome sequencing are moving us toward a future where full genomic information might inform the prognosis and treatment of a variety of diseases, including complex disease. Similarly, the availability of large populations with full genomic information has enabled new insights about the etiology and genetic architecture of complex disease. Insights from the latest generation of genomic studies suggest that our categorization of diseases as complex may conceal a wide spectrum of genetic architectures and causal mechanisms that ranges from Mendelian forms of complex disease to complex regulatory structures underlying Mendelian disease. Here, we review these insights, along with advances in the prediction of disease risk and outcomes from full genomic information.


Subjects
Genetic Diseases, Inborn/genetics , Genetic Diseases, Inborn/complications , Genetic Predisposition to Disease , Genome-Wide Association Study , Humans , Exome Sequencing
8.
Nature ; 524(7564): 225-9, 2015 Aug 13.
Article in English | MEDLINE | ID: mdl-26123021

ABSTRACT

Patterns of amino acid conservation have served as a tool for understanding protein evolution. The same principles have also found broad application in human genomics, driven by the need to interpret the pathogenic potential of variants in patients. Here we performed a systematic comparative genomics analysis of human disease-causing missense variants. We found that an appreciable fraction of disease-causing alleles are fixed in the genomes of other species, suggesting a role for genomic context. We developed a model of genetic interactions that predicts most of these to be simple pairwise compensations. Functional testing of this model on two known human disease genes revealed discrete cis amino acid residues that, although benign on their own, could rescue the human mutations in vivo. This approach was also applied to ab initio gene discovery to support the identification of a de novo disease driver in BTG2 that is subject to protective cis-modification in more than 50 species. Finally, on the basis of our data and models, we developed a computational tool to predict candidate residues subject to compensation. Taken together, our data highlight the importance of cis-genomic context as a contributor to protein evolution; they provide an insight into the complexity of allele effect on phenotype; and they are likely to assist methods for predicting allele pathogenicity.


Subjects
Disease/genetics , Genomics , Mutation, Missense/genetics , Suppression, Genetic/genetics , Adaptor Proteins, Signal Transducing/genetics , Alleles , Animals , Evolution, Molecular , Genome, Human/genetics , Humans , Immediate-Early Proteins/genetics , Microcephaly/genetics , Microtubule-Associated Proteins , Phenotype , Proteins/genetics , Sequence Alignment , Tumor Suppressor Proteins/genetics
9.
PLoS Med ; 16(1): e1002725, 2019 01.
Article in English | MEDLINE | ID: mdl-30645594

ABSTRACT

BACKGROUND: Studies have shown strong positive associations between serum urate (SU) levels and chronic kidney disease (CKD) risk; however, whether the relation is causal remains uncertain. We evaluate whether genetic data are consistent with a causal impact of SU level on the risk of CKD and estimated glomerular filtration rate (eGFR). METHODS AND FINDINGS: We used Mendelian randomization (MR) methods to evaluate the presence of a causal effect. We used aggregated genome-wide association data (N = 110,347 for SU, N = 69,374 for gout, N = 133,413 for eGFR, N = 117,165 for CKD), electronic-medical-record-linked UK Biobank data (N = 335,212), and population-based cohorts (N = 13,425), all in individuals of European ancestry, for SU levels and CKD. Our MR analysis showed that SU has a causal effect on neither eGFR level nor CKD risk across all MR analyses (all P > 0.05). These null associations contrasted with our epidemiological association findings from the 4 population-based cohorts (change in eGFR level per 1-mg/dl [59.48 µmol/l] increase in SU: -1.99 ml/min/1.73 m²; 95% CI -2.86 to -1.11; P = 8.08 × 10⁻⁶; odds ratio [OR] for CKD: 1.48; 95% CI 1.32 to 1.65; P = 1.52 × 10⁻¹¹). In contrast, the same MR approaches showed that SU has a causal effect on the risk of gout (OR estimates ranging from 3.41 to 6.04 per 1-mg/dl increase in SU, all P < 10⁻³), which served as a positive control of our approach. Overall, our MR analysis had >99% power to detect a causal effect of SU level on the risk of CKD of the same magnitude as the observed epidemiological association between SU and CKD. Limitations of this study include the lifelong effect of a genetic perturbation not being the same as an acute perturbation, the inability to study non-European populations, and some sample overlap between the datasets used in the study.
CONCLUSIONS: Evidence from our series of causal inference approaches using genetics does not support a causal effect of SU level on eGFR level or CKD risk. Reducing SU levels is unlikely to reduce the risk of CKD development.


Subjects
Renal Insufficiency, Chronic/etiology , Uric Acid/blood , Adult , Age Factors , Female , Genome-Wide Association Study , Glomerular Filtration Rate/genetics , Humans , Male , Mendelian Randomization Analysis , Renal Insufficiency, Chronic/blood , Renal Insufficiency, Chronic/genetics , Sex Factors , Young Adult
10.
Genet Med ; 20(9): 936-941, 2018 09.
Article in English | MEDLINE | ID: mdl-29388949

ABSTRACT

PURPOSE: Over 150,000 variants have been reported to cause Mendelian disease in the medical literature. It is still difficult to leverage this knowledge base in clinical practice, as many reports lack strong statistical evidence or may include false associations. Clinical laboratories assess whether these variants (along with newly observed variants that are adjacent to these published ones) underlie clinical disorders. METHODS: We investigated whether citation data, including journal impact factor and the number of cited variants (NCV) in each gene with published disease associations, can be used to improve variant assessment. RESULTS: Surprisingly, we found that impact factor is not predictive of pathogenicity, but the NCV score for each gene can provide statistical support for prediction of pathogenicity. When this gene-level citation metric is combined with variant-level evolutionary conservation and structural features, classification accuracy reaches 89.5%. Further, variants identified in clinical exome sequencing cases have higher NCVs than do simulated rare variants from the Exome Aggregation Consortium database within the same set of genes and functional consequences (P < 2.22 × 10⁻¹⁶). CONCLUSION: Aggregate citation data can complement existing variant-based predictive algorithms, and can boost their performance without the need to access and review large numbers of papers. The NCV is a slow-growing metric of scientific knowledge about each gene's association with disease.


Subjects
Computational Biology/methods , Genome-Wide Association Study/methods , Algorithms , Databases, Genetic , Forecasting , Genetic Variation , Humans , Journal Impact Factor
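The NCV score described above is a per-gene count of variants cited as disease-associated. A minimal sketch follows. It assumes the citation data arrive as (gene, variant) pairs and that a variant cited in several papers is counted once. Both of these are assumptions, not details from the paper, and the gene and variant labels in the example are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def ncv_scores(citations: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Number of distinct cited variants per gene (repeat citations counted once)."""
    variants: Dict[str, Set[str]] = defaultdict(set)
    for gene, variant in citations:
        variants[gene].add(variant)
    return {gene: len(vs) for gene, vs in variants.items()}


# Hypothetical citation records: (gene, variant identifier).
records = [("GENE_A", "v1"), ("GENE_A", "v1"), ("GENE_A", "v2"), ("GENE_B", "v7")]
print(ncv_scores(records))  # {'GENE_A': 2, 'GENE_B': 1}
```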
11.
PLoS Genet ; 11(10): e1005622, 2015 Oct.
Article in English | MEDLINE | ID: mdl-26509271

ABSTRACT

Large genome-wide association studies (GWAS) have identified many genetic loci associated with risk for myocardial infarction (MI) and coronary artery disease (CAD). Concurrently, efforts such as the National Institutes of Health (NIH) Roadmap Epigenomics Project and the Encyclopedia of DNA Elements (ENCODE) Consortium have provided unprecedented data on functional elements of the human genome. In the present study, we systematically investigate the biological link between genetic variants associated with this complex disease and their impacts on gene function. First, we examined the heritability of MI/CAD according to genomic compartments. We observed that single nucleotide polymorphisms (SNPs) residing within nearby regulatory regions show significant polygenicity and contribute between 59-71% of the heritability for MI/CAD. Second, we showed that the polygenicity and heritability explained by these SNPs are enriched in histone modification marks in specific cell types. Third, we found that a statistically higher number of 45 MI/CAD-associated SNPs that have been identified from large-scale GWAS studies reside within certain functional elements of the genome, particularly in active enhancer and promoter regions. Finally, we observed significant heterogeneity of this signal across cell types, with strong signals observed within adipose nuclei, as well as brain and spleen cell types. These results suggest that the genetic etiology of MI/CAD is largely explained by tissue-specific regulatory perturbation within the human genome.


Subjects
Coronary Artery Disease/genetics , Genetic Predisposition to Disease , Genome-Wide Association Study , Polymorphism, Single Nucleotide/genetics , Coronary Artery Disease/pathology , Genome, Human , Genotype , Humans , Regulatory Sequences, Nucleic Acid , Risk Factors
12.
Mol Biol Evol ; 33(10): 2555-64, 2016 10.
Article in English | MEDLINE | ID: mdl-27436009

ABSTRACT

Deleterious mutations are expected to evolve under negative selection and are usually purged from the population. However, deleterious alleles segregate in the human population and some disease-associated variants are maintained at considerable frequencies. Here, we test the hypothesis that balancing selection may counteract purifying selection in neighboring regions and thus maintain deleterious variants at higher frequency than expected from their detrimental fitness effect. We first show in realistic simulations that balancing selection reduces the density of polymorphic sites surrounding a locus under balancing selection, but at the same time markedly increases the population frequency of the remaining variants, including even substantially deleterious alleles. To test the predictions of our simulations empirically, we then use whole-exome sequencing data from 6,500 human individuals and focus on the most established example for balancing selection in the human genome, the major histocompatibility complex (MHC). Our analysis shows an elevated frequency of putatively deleterious coding variants in nonhuman leukocyte antigen (non-HLA) genes localized in the MHC region. The mean frequency of these variants declined with physical distance from the classical HLA genes, indicating dependency on genetic linkage. These results reveal an indirect cost of the genetic diversity maintained by balancing selection, which has hitherto been perceived as mostly advantageous, and have implications both for the evolution of recombination and also for the epidemiology of various MHC-associated diseases.


Subjects
HLA Antigens/genetics , Major Histocompatibility Complex/genetics , Selection, Genetic , Sequence Deletion , Alleles , Biological Evolution , Computer Simulation , Databases, Genetic , Evolution, Molecular , Gene Frequency/genetics , Genetic Variation , Genome, Human , Haplotypes/genetics , Humans , Polymorphism, Genetic/genetics
13.
Hum Mutat ; 36(10): 998-1003, 2015 Oct.
Article in English | MEDLINE | ID: mdl-26378430

ABSTRACT

Clinical sequencing is expanding, but causal variants are still not identified in the majority of cases. These unsolved cases can aid in gene discovery when individuals with similar phenotypes are identified in systems such as the Matchmaker Exchange. We describe risks for gene discovery in this growing set of unsolved cases. In a set of rare disease cases with the same phenotype, it is not difficult to find two individuals with the same phenotype that carry variants in the same gene. We quantify the risk of false-positive association in a cohort of individuals with the same phenotype, using the prior probability of observing a variant in each gene from over 60,000 individuals (Exome Aggregation Consortium). Based on the number of individuals with a genic variant, cohort size, specific gene, and mode of inheritance, we calculate a P value that the match represents a true association. A match in two of 10 patients in MECP2 is statistically significant (P = 0.0014), whereas a match in TTN would not reach significance, as expected (P > 0.999). Finally, we analyze the probability of matching in clinical exome cases to estimate the number of cases needed to identify genes related to different disorders. We offer Rare Disease Match, an online tool to mitigate the uncertainty of false-positive associations.


Subjects
Computational Biology/methods , Genetic Association Studies/methods , Rare Diseases/genetics , Algorithms , Databases, Genetic , Exome , False Positive Reactions , Genetic Variation , Humans , Phenotype , Web Browser
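The abstract above derives a P value for a gene match from the number of individuals with a variant, the cohort size, a per-gene prior from ExAC, and the mode of inheritance. One simple version of that idea is sketched below. It assumes that each of n unrelated patients independently carries a qualifying variant in the gene with prior probability p, and computes the binomial upper tail P(X ≥ k). The paper's exact model, including how it handles recessive inheritance, may differ. The priors in the example are hypothetical, so the output does not reproduce the MECP2 or TTN figures.

```python
from math import comb


def match_p_value(k: int, n: int, p: float) -> float:
    """P(at least k of n patients carry a variant in the gene), with X ~ Binomial(n, p)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical priors: a rarely varying gene vs a long gene where most people carry a variant.
print(match_p_value(2, 10, 0.001))
print(match_p_value(2, 10, 0.9))
```

The contrast mirrors the abstract's point: the same two-of-ten match is strong evidence for a gene with a low prior but uninformative for a gene in which rare variants are common.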
14.
Am J Hum Genet ; 88(2): 183-92, 2011 Feb 11.
Article in English | MEDLINE | ID: mdl-21310275

ABSTRACT

Assessing the significance of novel genetic variants revealed by DNA sequencing is a major challenge to the integration of genomic techniques with medical practice. Many variants remain difficult to classify by traditional genetic methods. Computational methods have been developed that could contribute to classifying these variants, but they have not been properly validated and are generally not considered mature enough to be used effectively in a clinical setting. We developed a computational method for predicting the effects of missense variants detected in patients with hypertrophic cardiomyopathy (HCM). We used a curated clinical data set of 74 missense variants in six genes associated with HCM to train and validate an automated predictor. The predictor is based on support vector regression and uses phylogenetic and structural features specific to genes involved in HCM. Ten-fold cross validation estimated our predictor's sensitivity at 94% (95% confidence interval: 83%-98%) and specificity at 89% (95% confidence interval: 72%-100%). This corresponds to an odds ratio of 10 for a prediction of pathogenic (95% confidence interval: 4.0-infinity), or an odds ratio of 9.9 for a prediction of benign (95% confidence interval: 4.6-21). Coverage (proportion of variants for which a prediction was made) was 57% (95% confidence interval: 49%-64%). This performance exceeds that of existing methods that are not specifically designed for HCM. The accuracy of this predictor provides support for the clinical use of automated predictions alongside family segregation and population frequency data in the interpretation of new missense variants and suggests future development of similar tools for other diseases.


Subjects
Cardiomyopathy, Hypertrophic/genetics , Computational Biology , Genetic Variation/genetics , Mutation, Missense/genetics , Nuclear Proteins/genetics , Genetic Predisposition to Disease , Humans
15.
JACC Adv ; 3(4), 2024 Apr.
Article in English | MEDLINE | ID: mdl-38737007

ABSTRACT

BACKGROUND: Diet is a key modifiable risk factor of coronary artery disease (CAD). However, the causal effects of specific dietary traits on CAD risk remain unclear. With the expansion of dietary data in population biobanks, Mendelian randomization (MR) could help enable the efficient estimation of causality in diet-disease associations. OBJECTIVES: The primary goal was to test causality for 13 common dietary traits on CAD risk using a systematic 2-sample MR framework. A secondary goal was to identify plasma metabolites mediating diet-CAD associations suspected to be causal. METHODS: Cross-sectional genetic and dietary data on up to 420,531 UK Biobank and 184,305 CARDIoGRAMplusC4D individuals of European ancestry were used in 2-sample MR. The primary analysis used fixed effect inverse-variance weighted regression, while sensitivity analyses used weighted median estimation, MR-Egger regression, and MR-Pleiotropy Residual Sum and Outlier. RESULTS: Genetic variants serving as proxies for muesli intake were negatively associated with CAD risk (OR: 0.74; 95% CI: 0.65-0.84; P = 5.385 × 10⁻⁴). Sensitivity analyses using weighted median estimation supported this with a significant association in the same direction. Additionally, we identified higher plasma acetate levels as a potential mediator (OR: 0.03; 95% CI: 0.01-0.12; P = 1.15 × 10⁻⁴). CONCLUSIONS: Muesli, a mixture of oats, seeds, nuts, dried fruit, and milk, may causally reduce CAD risk. Circulating levels of acetate, a gut microbiota-derived short-chain fatty acid, could be mediating its cardioprotective effects. These findings highlight the role of gut flora in cardiovascular health and help prioritize randomized trials on dietary interventions for CAD.
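The primary analysis above is a fixed-effect inverse-variance weighted (IVW) estimate. A minimal sketch of the standard first-order IVW formula follows. It assumes per-instrument summary statistics for independent variants: SNP-exposure effects β_X, SNP-outcome effects β_Y on the log-odds scale, and the outcome standard errors σ_Y. The estimate is β_IVW = Σ β_X β_Y σ_Y⁻² / Σ β_X² σ_Y⁻², with standard error (Σ β_X² σ_Y⁻²)^(-1/2). The example numbers are invented, and the sensitivity methods named in the abstract (weighted median, MR-Egger, MR-PRESSO) are not shown.

```python
import math
from typing import Sequence, Tuple


def ivw_fixed_effect(beta_exposure: Sequence[float],
                     beta_outcome: Sequence[float],
                     se_outcome: Sequence[float]) -> Tuple[float, float]:
    """First-order fixed-effect IVW causal estimate and its standard error."""
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("inputs must have one entry per instrument")
    num = sum(bx * by / se**2 for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    den = sum(bx**2 / se**2 for bx, se in zip(beta_exposure, se_outcome))
    return num / den, math.sqrt(1.0 / den)


# Invented summary statistics for three instruments (outcome on the log-odds scale).
beta, se = ivw_fixed_effect([0.10, 0.08, 0.12], [-0.030, -0.020, -0.040], [0.010, 0.012, 0.015])
odds_ratio = math.exp(beta)  # OR per unit increase in the exposure
ci = (math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se))
print(round(odds_ratio, 3), tuple(round(x, 3) for x in ci))
```

With a binary outcome such as CAD, exponentiating the IVW estimate gives the odds ratio per unit of exposure. That is the scale on which the abstract reports OR 0.74.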

16.
Nat Genet ; 56(7): 1412-1419, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38862854

ABSTRACT

Coronary artery disease (CAD) exists on a spectrum of disease represented by a combination of risk factors and pathogenic processes. An in silico score for CAD built using machine learning and clinical data in electronic health records captures disease progression, severity and underdiagnosis on this spectrum and could enhance genetic discovery efforts for CAD. Here we tested associations of rare and ultrarare coding variants with the in silico score for CAD in the UK Biobank, All of Us Research Program and BioMe Biobank. We identified associations in 17 genes; of these, 14 show at least moderate levels of prior genetic, biological and/or clinical support for CAD. We also observed an excess of ultrarare coding variants in 321 aggregated CAD genes, suggesting more ultrarare variant associations await discovery. These results expand our understanding of the genetic etiology of CAD and illustrate how digital markers can enhance genetic association investigations for complex diseases.


Subjects
Coronary Artery Disease , Genetic Predisposition to Disease , Machine Learning , Coronary Artery Disease/genetics , Humans , Exome/genetics , Exome Sequencing/methods , Genetic Variation , Genome-Wide Association Study/methods , Female , Polymorphism, Single Nucleotide
17.
Nat Genet ; 56(1): 51-59, 2024 Jan.
Artigo em Inglês | MEDLINE | ID: mdl-38172303

ABSTRACT

Studies have shown that drug targets with human genetic support are more likely to succeed in clinical trials. Hence, a tool integrating genetic evidence to prioritize drug target genes is beneficial for drug discovery. We built a genetic priority score (GPS) by integrating eight genetic features with drug indications from the Open Targets and SIDER databases. The top 0.83%, 0.28% and 0.19% of the GPS conferred a 5.3-, 9.9- and 11.0-fold increased effect of having an indication, respectively. In addition, we observed that targets in the top 0.28% of the score were 1.7-, 3.7- and 8.8-fold more likely to advance from phase I to phases II, III and IV, respectively. Complementary to the GPS, we incorporated the direction of genetic effect and drug mechanism into a directional version of the score called the GPS with direction of effect. We applied our method to 19,365 protein-coding genes and 399 drug indications and made all results available through a web portal.


Subjects
Human Genetics, Pharmacogenetics, Humans, Drug Discovery
18.
Hum Mutat ; 34(9): 1216-20, 2013 Sep.
Article in English | MEDLINE | ID: mdl-23818451

ABSTRACT

It is now affordable to order clinically interpreted whole-genome sequence reports from clinical laboratories. One major component of these reports is derived from the knowledge base of previously identified pathogenic variants, drawn from research articles, locus-specific databases, and other databases. While over 150,000 such pathogenic variants have been identified, many were originally discovered in small cohort studies of affected individuals, so their applicability to asymptomatic populations is unclear. We analyzed the prevalence of a large set of pathogenic variants from the medical and scientific literature in a large set of asymptomatic individuals (N = 1,092) and found 8.5% of these pathogenic variants in at least one individual. Each individual in the 1000 Genomes Project carries previously identified pathogenic variants on average 294 times (σ = 25.5) in homozygous form and 942 times (σ = 68.2) in heterozygous form. We also find that many of these pathogenic variants occur frequently: 3,744 variants (4.6%) have a minor allele frequency (MAF) ≥ 0.01, and 2,837 variants (3.5%) have MAF ≥ 0.05. This indicates that many of these variants may be erroneous findings or have lower penetrance than previously expected.


Subjects
Gene Frequency, Genetic Variation, DNA Sequence Analysis, Genetic Databases, Human Genome, Genotype, Heterozygote, Homozygote, Humans, Incidental Findings, Penetrance
19.
medRxiv ; 2023 Dec 24.
Article in English | MEDLINE | ID: mdl-38196638

ABSTRACT

It is estimated that as many as 1 in 16 people worldwide suffer from rare diseases. Rare disease patients face difficulty obtaining a diagnosis and treatment for their conditions, including long diagnostic odysseys, multiple incorrect diagnoses, and unavailable or prohibitively expensive treatments. As a result, large electronic health record (EHR) systems likely include many participants with undiagnosed rare disease. While this has been shown in detail for specific diseases, such studies are expensive and time-consuming and have only been feasible for a handful of the thousands of known rare diseases. The bulk of these undiagnosed cases are effectively hidden, with no straightforward way to differentiate them from healthy controls. The ability to access them at scale would greatly expand our capacity to study and develop drugs for rare diseases, complementing tools aimed at increasing the availability of study cohorts for rare disease. In this study, we train a deep learning transformer algorithm, RarePT (Rare-Phenotype Prediction Transformer), to impute undiagnosed rare disease from EHR diagnosis codes in 436,407 participants in the UK Biobank and validate it on an independent cohort of 3,333,560 individuals from the Mount Sinai Health System. We applied our model to 155 rare diagnosis codes with fewer than 250 cases each in the UK Biobank and predicted participants with elevated risk for each diagnosis, with the number of participants predicted to be at risk ranging from 85 to 22,000 across diagnoses. These risk predictions are significantly associated with increased mortality for 65% of diagnoses, with disease burden expressed as disability-adjusted life years (DALY) for 73% of diagnoses, and with 72% of available disease-specific diagnostic tests. They are also highly enriched for known rare diagnoses in patients not included in the training set, with an odds ratio (OR) of 48.0 in cross-validation cohorts of the UK Biobank and an OR of 30.6 in the independent Mount Sinai Health System cohort. Most importantly, RarePT successfully screens for undiagnosed patients in 32 rare diseases with available diagnostic tests in the UK Biobank. Using the trained model to estimate the prevalence of undiagnosed disease in the UK Biobank for these 32 rare phenotypes, we find that at least 50% of patients remain undiagnosed for 20 of 32 diseases. These estimates provide empirical evidence of a high prevalence of undiagnosed rare disease and demonstrate the potential benefit of using RarePT to screen for undiagnosed rare disease patients in large electronic health systems.

20.
Nat Commun ; 14(1): 2385, 2023 04 25.
Article in English | MEDLINE | ID: mdl-37169741

ABSTRACT

Systemic autoimmune rheumatic diseases (SARDs) can lead to irreversible damage if left untreated, yet these patients often endure long diagnostic journeys before being diagnosed and treated. Machine learning may help overcome the challenges of diagnosing SARDs and inform clinical decision-making. Here, we developed and tested a machine learning model to identify patients who should receive rheumatological evaluation for SARDs using longitudinal electronic health records of 161,584 individuals from two institutions. The model demonstrated high performance for predicting cases of autoantibody-tested individuals in a validation set, an external test set, and an independent cohort with a broader case definition. This approach identified more individuals for autoantibody testing compared with current clinical standards and a greater proportion of autoantibody carriers among those tested. Diagnoses of SARDs and other autoimmune conditions increased with higher model probabilities. The model detected a need for autoantibody testing and rheumatology encounters up to five years before the test date and assessment date, respectively. Altogether, these findings illustrate that the clinical manifestations of a diverse array of autoimmune conditions are detectable in electronic health records using machine learning, which may help systematize and accelerate autoimmune testing.


Subjects
Autoimmune Diseases, Electronic Health Records, Humans, Autoimmune Diseases/diagnosis, Patients, Autoantibodies, Machine Learning