ABSTRACT
Integrating human genomics and proteomics can help elucidate disease mechanisms, identify clinical biomarkers and discover drug targets [1-4]. Because previous proteogenomic studies have focused on common variation via genome-wide association studies, the contribution of rare variants to the plasma proteome remains largely unknown. Here we identify associations between rare protein-coding variants and 2,923 plasma protein abundances measured in 49,736 UK Biobank individuals. Our variant-level exome-wide association study identified 5,433 rare genotype-protein associations, of which 81% were undetected in a previous genome-wide association study of the same cohort [5]. We then looked at aggregate signals using gene-level collapsing analysis, which revealed 1,962 gene-protein associations. Of the 691 gene-level signals from protein-truncating variants, 99.4% were associated with decreased protein levels. STAB1 and STAB2, encoding scavenger receptors involved in plasma protein clearance, emerged as pleiotropic loci, with 77 and 41 protein associations, respectively. We demonstrate the utility of our publicly accessible resource through several applications. These include detailing an allelic series in NLRC4, identifying potential biomarkers for a fatty liver disease-associated variant in HSD17B13 and bolstering phenome-wide association studies by integrating protein quantitative trait loci with protein-truncating variants in collapsing analyses. Finally, we uncover distinct proteomic consequences of clonal haematopoiesis (CH), including an association between TET2-CH and increased FLT3 levels. Our results highlight a considerable role for rare variation in plasma protein abundance and the value of proteogenomics in therapeutic discovery.
Subjects
Biological Specimen Banks, Blood Proteins, Genetic Association Studies, Genomics, Proteomics, Humans, Alleles, Biomarkers/blood, Blood Proteins/analysis, Blood Proteins/genetics, Factual Databases, Exome/genetics, Hematopoiesis, Mutation, Plasma/chemistry, United Kingdom
ABSTRACT
Genome-wide association studies have uncovered thousands of common variants associated with human disease, but the contribution of rare variants to common disease remains relatively unexplored. The UK Biobank contains detailed phenotypic data linked to medical records for approximately 500,000 participants, offering an unprecedented opportunity to evaluate the effect of rare variation on a broad collection of traits [1,2]. Here we study the relationships between rare protein-coding variants and 17,361 binary and 1,419 quantitative phenotypes using exome sequencing data from 269,171 UK Biobank participants of European ancestry. Gene-based collapsing analyses revealed 1,703 statistically significant gene-phenotype associations for binary traits, with a median odds ratio of 12.4. Furthermore, 83% of these associations were undetectable via single-variant association tests, emphasizing the power of gene-based collapsing analysis in the setting of high allelic heterogeneity. Gene-phenotype associations were also significantly enriched for loss-of-function-mediated traits and approved drug targets. Finally, we performed ancestry-specific and pan-ancestry collapsing analyses using exome sequencing data from 11,933 UK Biobank participants of African, East Asian or South Asian ancestry. Our results highlight a significant contribution of rare variants to common disease. Summary statistics are publicly available through an interactive portal (http://azphewas.com/).
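A gene-based collapsing analysis of a binary trait can be sketched as follows, as a minimal illustration rather than the study's pipeline. Each individual is marked as a carrier if they have at least one qualifying variant in the gene (the qualifying-variant filters are assumed to be applied upstream and are not shown), carriers are tabulated against cases and controls, and the resulting 2×2 table is tested with a two-sided Fisher's exact test, one common choice for this setting. The function names `fisher_exact_two_sided` and `collapse_gene` and the toy data are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Layout assumed here: rows = carriers / non-carriers,
    columns = cases / controls.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def table_prob(x: int) -> float:
        # Hypergeometric probability that x of the col1 cases are carriers,
        # holding all row and column totals fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = table_prob(a)
    p_value = 0.0
    for x in range(max(0, col1 - row2), min(row1, col1) + 1):
        p = table_prob(x)
        # Count every table no more likely than the observed one
        # (small relative tolerance so exact ties are not lost to rounding).
        if p <= p_observed * (1 + 1e-7):
            p_value += p
    return min(1.0, p_value)


def collapse_gene(samples):
    """Collapse per-individual carrier status into a 2x2 test.

    samples: iterable of (carries_qualifying_variant, is_case) booleans.
    Returns ((a, b, c, d), odds_ratio, p_value).
    """
    a = b = c = d = 0
    for carrier, case in samples:
        if carrier and case:
            a += 1  # carrier case
        elif carrier:
            b += 1  # carrier control
        elif case:
            c += 1  # non-carrier case
        else:
            d += 1  # non-carrier control
    if b * c > 0:
        odds_ratio = (a * d) / (b * c)
    else:
        # Infinite or undefined when a cell is empty; no continuity correction here.
        odds_ratio = float("inf") if a * d > 0 else float("nan")
    return (a, b, c, d), odds_ratio, fisher_exact_two_sided(a, b, c, d)


# Hypothetical toy data: 8 carrier cases, 2 carrier controls,
# 1 non-carrier case, 5 non-carrier controls.
toy = [(True, True)] * 8 + [(True, False)] * 2 + [(False, True)] * 1 + [(False, False)] * 5
table, odds_ratio, p_value = collapse_gene(toy)
print(table, odds_ratio, round(p_value, 4))  # (8, 2, 1, 5) 20.0 0.035
```

A real analysis would also need multiple-testing control across genes and phenotypes and, for cohorts of this size, a faster routine such as SciPy's `scipy.stats.fisher_exact`.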
Subjects
Biological Specimen Banks, Genetic Databases, Disease/genetics, Exome/genetics, Genetic Variation/genetics, Adult, Aged, Female, Genome-Wide Association Study, Humans, Male, Middle Aged, Phenotype, Proteins/chemistry, Proteins/genetics, United Kingdom, Exome Sequencing
ABSTRACT
Genome-wide association studies (GWASs) have established the contribution of common and low-frequency variants to metabolic blood measurements in the UK Biobank (UKB). To complement existing GWAS findings, we assessed the contribution of rare protein-coding variants to 355 metabolic blood measurements, comprising 325 predominantly lipid-related nuclear magnetic resonance (NMR)-derived blood metabolite measurements (Nightingale Health Plc) and 30 clinical blood biomarkers, using 412,393 exome sequences from four genetically diverse ancestries in the UKB. Gene-level collapsing analyses were conducted to evaluate a diverse range of rare-variant architectures for the metabolic blood measurements. Altogether, we identified significant associations (p < 1 × 10⁻⁸) for 205 distinct genes, involving 1,968 significant relationships for the Nightingale blood metabolite measurements and 331 for the clinical blood biomarkers. These include associations of rare non-synonymous variants in PLIN1 and CREB3L3 with lipid metabolite measurements and in SYT7 with creatinine, among others, which may not only provide insights into novel biology but also deepen our understanding of established disease mechanisms. Of the study-wide significant clinical biomarker associations, 40% were not previously detected when coding variants were analyzed in a GWAS of the same cohort, reinforcing the importance of studying rare variation to fully understand the genetic architecture of metabolic blood measurements.
Subjects
Genetic Predisposition to Disease, Genome-Wide Association Study, Humans, Biological Specimen Banks, Biomarkers, Lipids, United Kingdom, Single Nucleotide Polymorphism
ABSTRACT
Encephalitis with antibodies to leucine-rich glioma-inactivated 1 (LGI1-Ab-E) is a common form of autoimmune encephalitis, presenting with seizures and neuropsychiatric changes, predominantly in older males. More than 90% of patients carry the human leucocyte antigen (HLA) class II allele HLA-DRB1*07:01. However, this allele is also present in 25% of healthy controls. We therefore hypothesised the presence of additional genetic predispositions. In this genome-wide association study and meta-analysis, we studied a discovery cohort of 131 French LGI1-Ab-E patients and a validation cohort of 126 American, British and Irish LGI1-Ab-E patients, ancestry-matched to 2613 and 2538 European controls, respectively. Outside the known major HLA signal, we found two single nucleotide polymorphisms (SNPs) at genome-wide significance (p < 5 × 10⁻⁸), implicating PTPRD, a protein tyrosine phosphatase, and LINC00670, a non-protein-coding RNA gene. Meta-analysis defined four additional non-HLA loci, including the protein-coding COBL gene. Polygenic risk scores with and without HLA variants suggested a contribution of non-HLA loci. In silico network analyses suggested LGI1- and PTPRD-mediated interactions via the established LGI1 receptors, ADAM22 and ADAM23. Our results identify new genetic loci in LGI1-Ab-E. These findings present opportunities for mechanistic studies and offer potential markers of susceptibility, prognosis and therapeutic response.
ABSTRACT
Large-scale phenome-wide association studies performed using densely phenotyped cohorts, such as the UK Biobank (UKB), reveal many statistically robust gene-phenotype relationships for both clinical and continuous traits. Here, we present Gene-SCOUT, a tool for identifying genes whose continuous-trait fingerprints are similar to that of a gene of interest. A fingerprint reflects the continuous traits identified as statistically associated with a gene of interest across multiple underlying rare-variant genetic architectures. Similarities between genes are evaluated using the cosine similarity measure, which captures concordant effect directionality and elucidates clusters of genes in a high-dimensional space. The underlying gene-biomarker population-scale association statistics were obtained from a gene-level rare-variant collapsing analysis of over 1500 continuous traits using 394 692 UKB participant exomes, with additional metabolomic trait associations provided through Nightingale Health's recent study of 121 394 of these participants. We demonstrate that gene similarity estimates from Gene-SCOUT provide stronger enrichments for clinical traits than existing methods. Furthermore, we provide a fully interactive web resource (http://genescout.public.cgr.astrazeneca.com) to explore the pre-calculated exome-wide similarities. This resource enables users to examine the biological relevance of the most similar genes through Gene Ontology (GO) enrichment and UKB clinical trait enrichment statistics, as well as a detailed breakdown of the traits underpinning a given fingerprint.
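Why cosine similarity captures concordant directionality can be shown with a minimal sketch. It assumes each fingerprint is a vector of signed association statistics over the same ordered traits (signed z-scores are used here as a stand-in; the exact statistic Gene-SCOUT uses is not stated above). The gene names, trait values and helper functions are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two fingerprints over the same ordered traits."""
    if len(u) != len(v):
        raise ValueError("fingerprints must cover the same traits in the same order")
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a gene with no associations has no direction to compare
    return dot / (norm_u * norm_v)


def most_similar(query, fingerprints, k=3):
    """Rank all other genes by cosine similarity to the query gene's fingerprint."""
    q = fingerprints[query]
    scores = [(gene, cosine_similarity(q, f)) for gene, f in fingerprints.items() if gene != query]
    return sorted(scores, key=lambda item: item[1], reverse=True)[:k]


# Hypothetical signed fingerprints over four traits (e.g. signed z-scores).
fingerprints = {
    "GENE_A": [2.0, -1.0, 0.0, 3.0],
    "GENE_B": [4.0, -2.0, 0.0, 6.0],   # same directions as GENE_A, larger effects
    "GENE_C": [-2.0, 1.0, 0.0, -3.0],  # every effect reversed
    "GENE_D": [0.0, 0.0, 5.0, 0.0],    # associated with an unrelated trait only
}
print(most_similar("GENE_A", fingerprints))
```

Because cosine similarity ignores vector length but keeps sign, GENE_B scores close to 1 despite its larger effects, while GENE_C, associated with the same traits in the opposite direction, scores close to -1 instead of being called similar.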
Subjects
Genome-Wide Association Study, Phenomics, Humans, Genome-Wide Association Study/methods, Phenotype, Exome Sequencing, Exome, Single Nucleotide Polymorphism
ABSTRACT
The combination of next-generation sequencing technologies and high-throughput genotyping platforms has revolutionized the pursuit of genetic variants that contribute towards disease. Furthermore, these technologies have provided invaluable insight into the genetic factors that prevent individuals from developing disease. Exploiting the evolutionary mechanisms that were designed by nature to help prevent disease is an attractive line of enquiry. Such efforts have the potential to generate a therapeutic target roadmap and rejuvenate the current drug-discovery pathway. By delineating the genomic factors that are protective against disease, there is potential to derive highly effective, genomically anchored medicines that assist in maintaining health.
Subjects
Alleles, Drug Discovery, Genetic Predisposition to Disease, Genetic Variation, High-Throughput Nucleotide Sequencing, Humans
ABSTRACT
BACKGROUND: Left atrial (LA) size and function are known predictors of new onset atrial fibrillation (AF) in hypertrophic cardiomyopathy (HCM) patients. Components of LA deformation including reservoir, conduit, and booster function provide additional information on atrial mechanics. Whether or not LA deformation can augment our ability to predict the risk of new onset AF in HCM patients beyond standard measurements is unknown. METHODS: We assessed LA size, function, and deformation on cardiovascular magnetic resonance (CMR) in 238 genotyped HCM patients and compared these with twenty control subjects matched for age, sex, blood pressure and body mass index. We further evaluated the determinants of new onset AF in HCM patients. RESULTS: Compared to control subjects, HCM patients had higher LA antero-posterior diameter, lower LA ejection fraction and lower LA reservoir (19.9 [17.1, 22.2], 21.6 [19.9, 22.9], P = 0.047) and conduit strain (10.6 ± 4.4, 13.7 ± 3.3, P = 0.002). LA booster strain did not differ between healthy controls and HCM patients, but HCM patients who developed new onset AF (n = 33) had lower booster strain (7.6 ± 3.3, 9.5 ± 3.0, P = 0.001) than those who did not (n = 205). In separate multivariate models, age, LA ejection fraction, and LA booster and reservoir strain were each independent determinants of AF. Age ≥ 55 years was the strongest determinant (HR 6.62, 95% CI 2.79-15.70), followed by LA booster strain ≤ 8% (HR 3.69, 95% CI 1.81-7.52) and LA reservoir strain ≤ 18% (HR 2.56, 95% CI 1.24-5.27). Conventional markers of HCM phenotypic severity, age and sudden death risk factors were associated with LA strain components. CONCLUSIONS: LA strain components are impaired in HCM and, together with age, independently predicted the risk of new onset AF. Increasing age and phenotypic severity were associated with LA strain abnormalities.
Our findings suggest that the routine assessment of LA strain components and consideration of age could augment LA size in predicting risk of AF, and potentially guide prophylactic anticoagulation use in HCM.
Subjects
Atrial Fibrillation, Hypertrophic Cardiomyopathy, Atrial Fibrillation/diagnostic imaging, Atrial Fibrillation/etiology, Hypertrophic Cardiomyopathy/diagnostic imaging, Heart Atria/diagnostic imaging, Humans, Magnetic Resonance Spectroscopy, Middle Aged, Predictive Value of Tests
ABSTRACT
PURPOSE: Increasing numbers of genes are being implicated in Mendelian disorders and incorporated into clinical test panels. However, lack of evidence supporting the gene-disease relationship can hinder interpretation. We explored the utility of testing 51 additional genes for hypertrophic cardiomyopathy (HCM), one of the most commonly tested Mendelian disorders. METHODS: Using genome sequencing data from 240 sarcomere gene-negative HCM cases and 6229 controls, we undertook case-control and individual variant analyses to assess 51 genes that have been proposed for HCM testing. RESULTS: We found no evidence to suggest that rare variants in these genes are prevalent causes of HCM. One variant, in a single case, was categorized as likely to be pathogenic. Over 99% of variants were classified as variants of uncertain significance (VUS), and 54% of cases had one or more VUS. CONCLUSION: For almost all genes, the gene-disease relationship could not be validated and lack of evidence precluded variant interpretation. Thus, the incremental diagnostic yield of extending testing was negligible, and would, we propose, be outweighed by problems that arise with a high rate of uninterpretable findings. These findings highlight the need for rigorous, evidence-based selection of genes for clinical test panels.
Subjects
Hypertrophic Cardiomyopathy/genetics, Sarcomeres, Adolescent, Adult, Aged, Hypertrophic Cardiomyopathy/diagnosis, Hypertrophic Cardiomyopathy/pathology, Case-Control Studies, Female, Genetic Association Studies, Humans, Male, Middle Aged, Whole Genome Sequencing, Young Adult
ABSTRACT
We have reported previously that a missense mutation in the mitochondrial fission gene Dynamin-related protein 1 (Drp1) underlies the Python mouse model of monogenic dilated cardiomyopathy. The aim of this study was to investigate the consequences of the C452F mutation on Drp1 protein function and to define the cellular sequelae leading to heart failure in the Python monogenic dilated cardiomyopathy model. We found that the C452F mutation increased Drp1 GTPase activity. The mutation also conferred resistance to oligomer disassembly by guanine nucleotides and high ionic strength solutions. In a mouse embryonic fibroblast model, Drp1 C452F cells exhibited abnormal mitochondrial morphology and defective mitophagy. Mitochondria in C452F mouse embryonic fibroblasts were depolarized and had reduced calcium uptake with impaired ATP production by oxidative phosphorylation. In the Python heart, we found a corresponding progressive decline in oxidative phosphorylation with age and activation of sterile inflammation. As a corollary, enhancing autophagy by exposure to a prolonged low-protein diet improved cardiac function in Python mice. In conclusion, failure of Drp1 disassembly impairs mitophagy, leading to a downstream cascade of mitochondrial depolarization, aberrant calcium handling, impaired ATP synthesis, and activation of sterile myocardial inflammation, resulting in heart failure.
Subjects
Biopolymers/physiology, Dynamins/physiology, Heart Failure/etiology, Mitophagy, Myocarditis/etiology, Animals, Biopolymers/genetics, Biopolymers/metabolism, Cultured Cells, Dynamins/genetics, Dynamins/metabolism, Heart Failure/physiopathology, Mice, Mutation, Myocarditis/physiopathology, Oxidative Phosphorylation
ABSTRACT
The ongoing expansion of human genomic datasets propels therapeutic target identification; however, extracting gene-disease associations from gene annotations remains challenging. Here, we introduce Mantis-ML 2.0, a framework that integrates AstraZeneca's Biological Insights Knowledge Graph and numerous tabular datasets to assess gene-disease probabilities throughout the phenome. We use graph neural networks, which capture the graph's holistic structure, and train them on hundreds of balanced datasets via a robust semi-supervised learning framework to provide gene-disease probabilities across the human exome. Mantis-ML 2.0 incorporates natural language processing to automate disease-relevant feature selection for thousands of diseases. The enhanced models demonstrate an average 6.9% boost in classification power, achieving a median receiver operating characteristic (ROC) area under the curve (AUC) of 0.90 across 5220 diseases from the Human Phenotype Ontology, OpenTargets, and Genomics England. Notably, Mantis-ML 2.0 prioritizes associations from an independent UK Biobank phenome-wide association study (PheWAS), providing a stronger form of triaging and mitigating the limitations of underpowered PheWAS associations. Results are available through an interactive web resource.
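The median ROC AUC quoted above summarises per-disease classification performance. As a hedged sketch (not Mantis-ML 2.0 code), the AUC for one disease can be computed from its rank interpretation, the probability that a known disease gene is scored above a non-disease gene, and the headline figure is then the median over diseases. The scores, labels and disease names below are hypothetical, and the pairwise loop is quadratic, chosen for readability rather than speed.

```python
import statistics


def roc_auc(scores, labels):
    """ROC AUC via its rank interpretation: the probability that a randomly chosen
    positive is scored above a randomly chosen negative (ties count as 0.5)."""
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical per-disease gene scores and labels (1 = known disease gene).
per_disease = {
    "disease_1": ([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]),
    "disease_2": ([0.95, 0.7, 0.2, 0.1], [1, 1, 0, 0]),
    "disease_3": ([0.6, 0.5, 0.5, 0.2], [1, 0, 1, 0]),
}
aucs = {name: roc_auc(s, y) for name, (s, y) in per_disease.items()}
print(aucs, statistics.median(aucs.values()))
# {'disease_1': 0.75, 'disease_2': 1.0, 'disease_3': 0.875} 0.875
```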
Subjects
Computer Neural Networks, Humans, Algorithms, Computational Biology/methods, Genetic Databases, Genetic Predisposition to Disease, Genome-Wide Association Study/methods, Genomics/methods, Phenomics/methods, Phenotype, UK Biobank, United Kingdom
ABSTRACT
The emergence of biobank-level datasets offers new opportunities to discover novel biomarkers and develop predictive algorithms for human disease. Here, we present an ensemble machine-learning framework (machine learning with phenotype associations, MILTON) utilizing a range of biomarkers to predict 3,213 diseases in the UK Biobank. Leveraging the UK Biobank's longitudinal health record data, MILTON predicts incident disease cases undiagnosed at time of recruitment, largely outperforming available polygenic risk scores. We further demonstrate the utility of MILTON in augmenting genetic association analyses in a phenome-wide association study of 484,230 genome-sequenced samples, along with 46,327 samples with matched plasma proteomics data. This resulted in improved signals for 88 known (P < 1 × 10⁻⁸) gene-disease relationships alongside 182 gene-disease relationships that did not achieve genome-wide significance in the nonaugmented baseline cohorts. We validated these discoveries in the FinnGen biobank alongside two orthogonal machine-learning methods built for gene-disease prioritization. All extracted gene-disease associations and incident disease predictive biomarkers are publicly available (http://milton.public.cgr.astrazeneca.com).
Subjects
Biological Specimen Banks, Biomarkers, Genetic Predisposition to Disease, Genome-Wide Association Study, Machine Learning, Humans, United Kingdom, Genome-Wide Association Study/methods, Case-Control Studies, Multifactorial Inheritance/genetics, Proteomics/methods, Phenotype, Single Nucleotide Polymorphism, Algorithms, Multiomics, UK Biobank
ABSTRACT
Antiplatelet therapy with aspirin and a platelet P2Y12 receptor antagonist reduces thrombotic and ischemic events after percutaneous coronary intervention (PCI) and acute coronary syndrome. The platelet inhibitory effect of the thienopyridine clopidogrel varies widely among individuals, and high on-treatment platelet reactivity has been associated with a substantial hazard for post-PCI cardiovascular events, including stent thrombosis. The clinical availability of ex vivo methods to measure the antiplatelet effect of P2Y12 antagonists raises the possibility that incorporating platelet function testing into clinical practice could facilitate a stratified and efficient approach to antiplatelet therapy following PCI, although data from definitive randomized trials supporting a routine approach are currently lacking.
Subjects
Blood Platelets/drug effects, Drug Monitoring/methods, Platelet Aggregation Inhibitors/therapeutic use, Ticlopidine/analogs & derivatives, Blood Platelets/physiology, Clopidogrel, Humans, Myocardial Ischemia/prevention & control, Platelet Aggregation Inhibitors/pharmacokinetics, Platelet Function Tests/methods, Thrombosis/prevention & control, Ticlopidine/pharmacokinetics, Ticlopidine/therapeutic use
ABSTRACT
BACKGROUND: Left ventricular maximum wall thickness (LVMWT) is an important biomarker of left ventricular hypertrophy and provides diagnostic and prognostic information in hypertrophic cardiomyopathy (HCM). Limited information is available on the genetic determinants of LVMWT. METHODS: We performed a genome-wide association study of LVMWT measured from the cardiovascular magnetic resonance examinations of 42 176 European individuals. We evaluated the genetic relationship between LVMWT and HCM by performing pairwise analysis using the data from the Hypertrophic Cardiomyopathy Registry in which the controls were randomly selected from UK Biobank individuals not included in the cardiovascular magnetic resonance sub-study. RESULTS: Twenty-one genetic loci were discovered at P<5×10⁻⁸. Several novel candidate genes were identified including PROX1, PXN, and PTK2, with known functional roles in myocardial growth and sarcomere organization. The LVMWT genetic risk score is predictive of HCM in the Hypertrophic Cardiomyopathy Registry (odds ratio per SD: 1.18 [95% CI, 1.13-1.23]) with pairwise analyses demonstrating a moderate genetic correlation (rg=0.53) and substantial loci overlap (19/21). CONCLUSIONS: Our findings provide novel insights into the genetic underpinning of LVMWT and highlight its shared genetic background with HCM, supporting future endeavours to elucidate the genetic etiology of HCM.
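The genetic risk score result above is reported as an odds ratio per SD. A minimal sketch of what that means, assuming the common weighted-sum construction of a genetic risk score (per-variant effect-allele dosages multiplied by weights, for example GWAS effect sizes; the abstract does not state the weighting) and a logistic model on the standardized score: a score k SD higher multiplies the odds by 1.18^k, for example about 1.39 for 2 SD. The weights, genotypes and helper functions below are hypothetical.

```python
import statistics


def genetic_risk_score(dosages, weights):
    """Weighted sum of effect-allele dosages (0, 1 or 2) for one individual."""
    if len(dosages) != len(weights):
        raise ValueError("one weight per variant is required")
    return sum(d * w for d, w in zip(dosages, weights))


def standardize(values):
    """Convert raw scores to SD units (z-scores) within the cohort."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


def odds_ratio_for_shift(or_per_sd, n_sd):
    """Under a logistic model on the standardized score, odds multiply per SD."""
    return or_per_sd ** n_sd


# Hypothetical per-variant weights and genotypes for three individuals.
weights = [0.5, 0.2, 0.1]
cohort = [[0, 1, 2], [1, 1, 0], [2, 2, 2]]
raw = [genetic_risk_score(g, weights) for g in cohort]
z = standardize(raw)
print(round(odds_ratio_for_shift(1.18, 2), 4))  # 1.3924
```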
Subjects
Hypertrophic Cardiomyopathy, Left Ventricular Hypertrophy, Humans, Biological Specimen Banks, Hypertrophic Cardiomyopathy/diagnosis, Hypertrophic Cardiomyopathy/genetics, Genome-Wide Association Study, Left Ventricular Hypertrophy/diagnosis, Left Ventricular Hypertrophy/genetics, United Kingdom
ABSTRACT
BACKGROUND: A large proportion of genetic risk remains unexplained for structural heart disease involving the interventricular septum (IVS), including hypertrophic cardiomyopathy and ventricular septal defects. This study sought to develop a reproducible proxy of IVS structure from standard medical imaging, discover novel genetic determinants of IVS structure, and relate these loci to diseases of the IVS, hypertrophic cardiomyopathy, and ventricular septal defect. METHODS: We estimated the cross-sectional area of the IVS from the 4-chamber view of cardiac magnetic resonance imaging in 32 219 individuals from the UK Biobank and used this measure as the basis for genome-wide association studies and Mendelian randomization. RESULTS: Measures of IVS cross-sectional area at diastole were a strong proxy for the 3-dimensional volume of the IVS (Pearson r=0.814, P=0.004), and correlated with anthropometric measures, blood pressure, and diagnostic codes related to cardiovascular physiology. Seven loci with clear genomic consequence and relevance to cardiovascular biology were uncovered by genome-wide association studies, most notably a single nucleotide polymorphism in an intron of CDKN1A (rs2376620; β, 7.7 mm² [95% CI, 5.8-11.0]; P=6.0×10⁻¹⁰), and a common inversion incorporating KANSL1 predicted to disrupt local chromatin structure (β, 8.4 mm² [95% CI, 6.3-10.9]; P=4.2×10⁻¹⁴). Mendelian randomization suggested that inheritance of larger IVS cross-sectional area at diastole was strongly associated with hypertrophic cardiomyopathy risk (pIVW=4.6×10⁻¹⁰), while inheritance of smaller IVS cross-sectional area at diastole was associated with risk for ventricular septal defect (pIVW=0.007). CONCLUSIONS: Automated estimates of the cross-sectional area of the IVS support discovery of novel loci related to cardiac development and Mendelian disease.
Inheritance of genetic liability for either a small or a large IVS appears to confer risk for ventricular septal defect or hypertrophic cardiomyopathy, respectively. These data suggest that a proportion of risk for structural and congenital heart disease can be localized to the common genetic determinants of size and shape of cardiovascular anatomy.
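The pIVW values above come from an inverse-variance weighted (IVW) Mendelian randomization estimate. As a minimal sketch, assuming fixed-effect IVW on aligned per-variant summary statistics (the abstract does not say whether fixed or random effects were used), the causal estimate is the weighted regression of variant-outcome effects on variant-exposure effects through the origin, β_IVW = Σ(βX·βY/σY²) / Σ(βX²/σY²), with standard error sqrt(1 / Σ(βX²/σY²)). The function name and the summary statistics are hypothetical.

```python
import math


def ivw_estimate(beta_x, beta_y, se_y):
    """Fixed-effect inverse-variance weighted (IVW) Mendelian randomization.

    beta_x: per-variant effects on the exposure (e.g. IVS area)
    beta_y: per-variant effects on the outcome (e.g. disease log-odds)
    se_y:   standard errors of beta_y
    Returns (causal estimate, standard error, two-sided normal p-value).
    """
    if not (len(beta_x) == len(beta_y) == len(se_y)):
        raise ValueError("summary statistics must be aligned per variant")
    num = sum(bx * by / s ** 2 for bx, by, s in zip(beta_x, beta_y, se_y))
    den = sum(bx ** 2 / s ** 2 for bx, s in zip(beta_x, se_y))
    estimate = num / den
    se = math.sqrt(1.0 / den)
    z = estimate / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided p from the standard normal
    return estimate, se, p_value


# Hypothetical summary statistics for two instruments.
print(ivw_estimate([0.1, 0.2], [0.2, 0.4], [0.05, 0.1]))
```

In practice, sensitivity analyses robust to pleiotropy (for example MR-Egger or weighted-median estimators) are usually reported alongside IVW.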
Subjects
Hypertrophic Cardiomyopathy, Ventricular Septal Defects, Humans, Genome-Wide Association Study, Hypertrophic Cardiomyopathy/diagnostic imaging, Hypertrophic Cardiomyopathy/genetics, Hypertrophic Cardiomyopathy/complications, Ventricular Septal Defects/diagnostic imaging, Ventricular Septal Defects/genetics, Ventricular Septal Defects/complications, Heart, Magnetic Resonance Imaging
ABSTRACT
Hypertrophic cardiomyopathy (HCM) is an important cause of morbidity and mortality with both monogenic and polygenic components. We here report results from the largest HCM genome-wide association study (GWAS) and multi-trait analysis (MTAG) including 5,900 HCM cases, 68,359 controls, and 36,083 UK Biobank (UKB) participants with cardiac magnetic resonance (CMR) imaging. We identified a total of 70 loci (50 novel) associated with HCM, and 62 loci (32 novel) associated with relevant left ventricular (LV) structural or functional traits. Amongst the common variant HCM loci, we identify a novel HCM disease gene, SVIL, which encodes the actin-binding protein supervillin, showing that rare truncating SVIL variants cause HCM. Mendelian randomization analyses support a causal role of increased LV contractility in both obstructive and non-obstructive forms of HCM, suggesting common disease mechanisms and anticipating shared response to therapy. Taken together, the findings significantly increase our understanding of the genetic basis and molecular mechanisms of HCM, with potential implications for disease management.
ABSTRACT
The 3-dimensional spatial and 2-dimensional frontal QRS-T angles are measures derived from the vectorcardiogram. They are independent risk predictors for arrhythmia, but the underlying biology is unknown. Using multi-ancestry genome-wide association studies, we identify 61 (58 previously unreported) loci for the spatial QRS-T angle (N = 118,780) and 11 for the frontal QRS-T angle (N = 159,715). Seven of the 61 spatial QRS-T angle loci have not been reported for other electrocardiographic measures. Enrichments are observed in pathways related to cardiac and vascular development, muscle contraction, and hypertrophy. Pairwise genome-wide association studies with classical ECG traits identify shared genetic influences with PR interval and QRS duration. Phenome-wide scanning indicates associations with atrial fibrillation, atrioventricular block and arterial embolism, and genetically determined QRS-T angle measures are associated with fascicular and bundle branch block (and also atrioventricular block for the frontal QRS-T angle). We identify potential biology underlying the QRS-T angles and their genetic relationships with cardiovascular traits and diseases, which may inform future research and risk prediction.
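Both angles are simple geometric quantities. A minimal sketch, assuming the QRS and T vectors (for example mean or peak vectors; the abstract does not say which definition was used) and the frontal-plane QRS and T axes have already been derived from the ECG: the spatial angle is the angle between the two 3-D vectors, and the frontal angle is the absolute difference between the frontal QRS and T axes, folded into 0-180°. The function names and inputs are hypothetical.

```python
import math


def spatial_qrs_t_angle(qrs, t):
    """Angle (degrees) between the 3-D QRS and T vectors given as (x, y, z)."""
    dot = sum(a * b for a, b in zip(qrs, t))
    norms = math.sqrt(sum(a * a for a in qrs)) * math.sqrt(sum(b * b for b in t))
    if norms == 0:
        raise ValueError("QRS and T vectors must be non-zero")
    cos_angle = max(-1.0, min(1.0, dot / norms))  # clamp rounding error before acos
    return math.degrees(math.acos(cos_angle))


def frontal_qrs_t_angle(qrs_axis_deg, t_axis_deg):
    """Absolute difference between frontal-plane QRS and T axes, folded into 0-180 degrees."""
    diff = abs(qrs_axis_deg - t_axis_deg) % 360
    return 360 - diff if diff > 180 else diff


# Hypothetical vectors and axes.
print(spatial_qrs_t_angle((1.0, 0.0, 0.0), (0.0, 1.0, 0.0)))
print(frontal_qrs_t_angle(-150, 170))  # 40
```

Folding matters for the frontal angle: axes of -150° and 170° differ by 320° numerically but are only 40° apart around the circle.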
Subjects
Atrioventricular Block, Cardiovascular Diseases, Humans, Cardiovascular Diseases/genetics, Genome-Wide Association Study, Risk Factors, Cardiac Arrhythmias/genetics, Electrocardiography/methods, Biomarkers
ABSTRACT
The druggability of targets is a crucial consideration in drug target selection. Here, we adopt a stochastic semi-supervised ML framework to develop DrugnomeAI, which estimates the druggability likelihood for every protein-coding gene in the human exome. DrugnomeAI integrates gene-level properties from 15 sources resulting in 324 features. The tool generates exome-wide predictions based on labelled sets of known drug targets (median AUC: 0.97), highlighting features from protein-protein interaction networks as top predictors. DrugnomeAI provides generic as well as specialised models stratified by disease type or drug therapeutic modality. The top-ranking DrugnomeAI genes were significantly enriched for genes previously selected for clinical development programs (p value < 1 × 10⁻³⁰⁸) and for genes achieving genome-wide significance in phenome-wide association studies of 450K UK Biobank exomes for binary (p value = 1.7 × 10⁻⁵) and quantitative traits (p value = 1.6 × 10⁻⁷). We accompany our method with a web application (http://drugnomeai.public.cgr.astrazeneca.com) to visualise the druggability predictions and the key features that define gene druggability, per disease type and modality.
Subjects
Machine Learning, Software, Humans, Drug Delivery Systems
ABSTRACT
Large reference datasets of protein-coding variation in human populations have allowed us to determine which genes and genic subregions are intolerant to germline genetic variation. There is also a growing number of genes implicated in severe Mendelian diseases that overlap with genes implicated in cancer. We hypothesized that cancer-driving mutations might be enriched in genic subregions that are depleted of germline variation relative to somatic variation. We introduce a new metric, OncMTR (oncology missense tolerance ratio), which uses 125,748 exomes in the Genome Aggregation Database (gnomAD) to identify these genic subregions. We demonstrate that OncMTR can significantly predict driver mutations implicated in hematologic malignancies. Divergent OncMTR regions were enriched for cancer-relevant protein domains, and overlaying OncMTR scores on protein structures identified functionally important protein residues. Last, we performed a rare-variant, gene-based collapsing analysis on an independent set of 394,694 exomes from the UK Biobank and found that OncMTR markedly improves genetic signals for hematologic malignancies.
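The abstract does not give the OncMTR formula. As an assumption for illustration only, the sketch below computes the original missense tolerance ratio (MTR) that OncMTR is named after: within a window of codons, the fraction of observed variants that are missense divided by the fraction of all possible variants that are missense. Values below 1 flag subregions depleted of missense variation. The per-codon counts and the window size are hypothetical parameters, and how OncMTR contrasts germline with somatic variation is not modelled here.

```python
def missense_tolerance_ratio(obs_mis, obs_syn, exp_mis, exp_syn):
    """MTR = observed missense fraction / possible missense fraction in a region."""
    observed = obs_mis / (obs_mis + obs_syn)
    expected = exp_mis / (exp_mis + exp_syn)
    return observed / expected


def sliding_mtr(codons, window=31):
    """MTR centred on each codon.

    codons: per-codon tuples (observed missense, observed synonymous,
            possible missense, possible synonymous).
    window: number of codons per window (a hypothetical parameter here).
    Returns a list with None wherever the ratio is undefined.
    """
    half = window // 2
    scores = []
    for i in range(len(codons)):
        lo, hi = max(0, i - half), min(len(codons), i + half + 1)
        obs_mis, obs_syn, exp_mis, exp_syn = (sum(c[k] for c in codons[lo:hi]) for k in range(4))
        if obs_mis + obs_syn == 0 or exp_mis == 0:
            scores.append(None)  # no observed variants or no possible missense sites
        else:
            scores.append(missense_tolerance_ratio(obs_mis, obs_syn, exp_mis, exp_syn))
    return scores


# Hypothetical region where no missense variants are observed at all.
print(sliding_mtr([(0, 2, 3, 1)] * 5, window=3))  # [0.0, 0.0, 0.0, 0.0, 0.0]
```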