ABSTRACT
Integrating human genomics and proteomics can help elucidate disease mechanisms, identify clinical biomarkers and discover drug targets [1-4]. Because previous proteogenomic studies have focused on common variation via genome-wide association studies, the contribution of rare variants to the plasma proteome remains largely unknown. Here we identify associations between rare protein-coding variants and 2,923 plasma protein abundances measured in 49,736 UK Biobank individuals. Our variant-level exome-wide association study identified 5,433 rare genotype-protein associations, of which 81% were undetected in a previous genome-wide association study of the same cohort [5]. We then looked at aggregate signals using gene-level collapsing analysis, which revealed 1,962 gene-protein associations. Of the 691 gene-level signals from protein-truncating variants, 99.4% were associated with decreased protein levels. STAB1 and STAB2, encoding scavenger receptors involved in plasma protein clearance, emerged as pleiotropic loci, with 77 and 41 protein associations, respectively. We demonstrate the utility of our publicly accessible resource through several applications. These include detailing an allelic series in NLRC4, identifying potential biomarkers for a fatty liver disease-associated variant in HSD17B13 and bolstering phenome-wide association studies by integrating protein quantitative trait loci with protein-truncating variants in collapsing analyses. Finally, we uncover distinct proteomic consequences of clonal haematopoiesis (CH), including an association between TET2-CH and increased FLT3 levels. Our results highlight a considerable role for rare variation in plasma protein abundance and the value of proteogenomics in therapeutic discovery.
Subject(s)
Biological Specimen Banks; Blood Proteins; Genetic Association Studies; Genomics; Proteomics; Humans; Alleles; Biomarkers/blood; Blood Proteins/analysis; Blood Proteins/genetics; Databases, Factual; Exome/genetics; Hematopoiesis; Mutation; Plasma/chemistry; United Kingdom
ABSTRACT
Genome-wide association studies have uncovered thousands of common variants associated with human disease, but the contribution of rare variants to common disease remains relatively unexplored. The UK Biobank contains detailed phenotypic data linked to medical records for approximately 500,000 participants, offering an unprecedented opportunity to evaluate the effect of rare variation on a broad collection of traits [1,2]. Here we study the relationships between rare protein-coding variants and 17,361 binary and 1,419 quantitative phenotypes using exome sequencing data from 269,171 UK Biobank participants of European ancestry. Gene-based collapsing analyses revealed 1,703 statistically significant gene-phenotype associations for binary traits, with a median odds ratio of 12.4. Furthermore, 83% of these associations were undetectable via single-variant association tests, emphasizing the power of gene-based collapsing analysis in the setting of high allelic heterogeneity. Gene-phenotype associations were also significantly enriched for loss-of-function-mediated traits and approved drug targets. Finally, we performed ancestry-specific and pan-ancestry collapsing analyses using exome sequencing data from 11,933 UK Biobank participants of African, East Asian or South Asian ancestry. Our results highlight a significant contribution of rare variants to common disease. Summary statistics are publicly available through an interactive portal ( http://azphewas.com/ ).
Subject(s)
Biological Specimen Banks; Databases, Genetic; Disease/genetics; Exome/genetics; Genetic Variation/genetics; Adult; Aged; Female; Genome-Wide Association Study; Humans; Male; Middle Aged; Phenotype; Proteins/chemistry; Proteins/genetics; United Kingdom; Exome Sequencing
ABSTRACT
Genome-wide association studies (GWASs) have established the contribution of common and low-frequency variants to metabolic blood measurements in the UK Biobank (UKB). To complement existing GWAS findings, we assessed the contribution of rare protein-coding variants in relation to 355 metabolic blood measurements, including 325 predominantly lipid-related nuclear magnetic resonance (NMR)-derived blood metabolite measurements (Nightingale Health Plc) and 30 clinical blood biomarkers, using 412,393 exome sequences from four genetically diverse ancestries in the UKB. Gene-level collapsing analyses were conducted to evaluate a diverse range of rare-variant architectures for the metabolic blood measurements. Altogether, we identified significant associations (p < 1 × 10⁻⁸) for 205 distinct genes that involved 1,968 significant relationships for the Nightingale blood metabolite measurements and 331 for the clinical blood biomarkers. These include associations for rare non-synonymous variants in PLIN1 and CREB3L3 with lipid metabolite measurements and SYT7 with creatinine, among others, which may not only provide insights into novel biology but also deepen our understanding of established disease mechanisms. Of the study-wide significant clinical biomarker associations, 40% were not previously detected on analyzing coding variants in a GWAS in the same cohort, reinforcing the importance of studying rare variation to fully understand the genetic architecture of metabolic blood measurements.
Subject(s)
Genetic Predisposition to Disease; Genome-Wide Association Study; Humans; Biological Specimen Banks; Biomarkers; Lipids; United Kingdom; Polymorphism, Single Nucleotide
ABSTRACT
Synonymous mutations change the DNA sequence of a gene without affecting the amino acid sequence of the encoded protein. Although some synonymous mutations can affect RNA splicing, translational efficiency, and mRNA stability, studies in human genetics, mutagenesis screens, and other experiments and evolutionary analyses have repeatedly shown that most synonymous variants are neutral or only weakly deleterious, with some notable exceptions. Based on a recent study in yeast, there have been claims that synonymous mutations could be as important as nonsynonymous mutations in causing disease, assuming the yeast findings hold up and translate to humans. Here, we argue that there is insufficient evidence to overturn the large, coherent body of knowledge establishing the predominant neutrality of synonymous variants in the human genome.
Subject(s)
Biological Evolution; Saccharomyces cerevisiae; Humans; Mutation/genetics; Amino Acid Sequence; Genome, Human/genetics
ABSTRACT
Large-scale phenome-wide association studies performed using densely phenotyped cohorts such as the UK Biobank (UKB) reveal many statistically robust gene-phenotype relationships for both clinical and continuous traits. Here, we present Gene-SCOUT, a tool for identifying genes with continuous trait fingerprints similar to that of a gene of interest. A fingerprint reflects the continuous traits found to be statistically associated with a gene of interest based on multiple underlying rare variant genetic architectures. Similarities between genes are evaluated by the cosine similarity measure, to capture concordant effect directionality, elucidating clusters of genes in a high dimensional space. The underlying gene-biomarker population-scale association statistics were obtained from a gene-level rare variant collapsing analysis performed on over 1500 continuous traits using 394 692 UKB participant exomes, with additional metabolomic trait associations provided through Nightingale Health's recent study of 121 394 of these participants. We demonstrate that gene similarity estimates from Gene-SCOUT provide stronger enrichments for clinical traits compared to existing methods. Furthermore, we provide a fully interactive web-resource (http://genescout.public.cgr.astrazeneca.com) to explore the pre-calculated exome-wide similarities. This resource enables a user to examine the biological relevance of the most similar genes for Gene Ontology (GO) enrichment and UKB clinical trait enrichment statistics, as well as a detailed breakdown of the traits underpinning a given fingerprint.
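The cosine-similarity comparison described in this abstract can be illustrated with a minimal sketch. It assumes that each fingerprint is a mapping from trait name to a signed effect value (the abstract does not specify the encoding), and the trait names and values are hypothetical, not data from the study.

```python
import math

def cosine_similarity(fp_a, fp_b):
    """Cosine similarity between two trait fingerprints.

    Each fingerprint is a dict {trait_name: signed_effect}; traits missing
    from one fingerprint are treated as 0 (an assumption of this sketch).
    """
    traits = sorted(set(fp_a) | set(fp_b))
    va = [fp_a.get(t, 0.0) for t in traits]
    vb = [fp_b.get(t, 0.0) for t in traits]
    dot = sum(x * y for x, y in zip(va, vb))
    norm_a = math.sqrt(sum(x * x for x in va))
    norm_b = math.sqrt(sum(y * y for y in vb))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty fingerprint is similar to nothing
    return dot / (norm_a * norm_b)

# Hypothetical fingerprints: positive = trait increased, negative = decreased.
gene_a = {"LDL": -1.0, "ApoB": -0.8, "HDL": 0.2}
gene_b = {"LDL": -0.9, "ApoB": -0.7}              # concordant directions
gene_c = {"LDL": 1.0, "ApoB": 0.8, "HDL": -0.2}   # opposite directions

print(cosine_similarity(gene_a, gene_b))  # close to +1
print(cosine_similarity(gene_a, gene_c))  # close to -1
```

Because the sign of each effect is kept, two genes that influence the same traits in opposite directions score near -1 rather than near +1, which is the "concordant effect directionality" property the abstract mentions.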
Subject(s)
Genome-Wide Association Study; Phenomics; Humans; Genome-Wide Association Study/methods; Phenotype; Exome Sequencing; Exome; Polymorphism, Single Nucleotide
ABSTRACT
Access to large-scale genomics datasets has increased the utility of hypothesis-free genome-wide analyses. However, gene signals are often insufficiently powered to reach experiment-wide significance, triggering a process of laborious triaging of genomic-association-study results. We introduce mantis-ml, a multi-dimensional, multi-step machine-learning framework that allows objective assessment of the biological relevance of genes to disease studies. Mantis-ml is an automated machine-learning framework that follows a multi-model approach of stochastic semi-supervised learning to rank disease-associated genes through iterative learning sessions on random balanced datasets across the protein-coding exome. When applied to a range of human diseases, including chronic kidney disease (CKD), epilepsy, and amyotrophic lateral sclerosis (ALS), mantis-ml achieved an average area under curve (AUC) prediction performance of 0.81-0.89. Critically, to prove its value as a tool that can be used to interpret exome-wide association studies, we overlapped mantis-ml predictions with data from published cohort-level association studies. We found a statistically significant enrichment of high mantis-ml predictions among the highest-ranked genes from hypothesis-free cohort-level statistics, indicating a substantial improvement over the performance of current state-of-the-art methods and pointing to the capture of true prioritization signals for disease-associated genes. Finally, we introduce a generic mantis-ml score (GMS) trained with over 1,200 features as a generic-disease-likelihood estimator, outperforming published gene-level scores. In addition to our tool, we provide a gene prioritization atlas that includes mantis-ml's predictions across ten disease areas and empowers researchers to interactively navigate through the gene-triaging framework. 
Mantis-ml is an intuitive tool that supports the objective triaging of large-scale genomic discovery studies and enhances our understanding of complex genotype-phenotype associations.
Subject(s)
Amyotrophic Lateral Sclerosis/genetics; Epilepsy/genetics; Genomics/methods; Renal Insufficiency, Chronic/genetics; Supervised Machine Learning; Animals; Area Under Curve; Deep Learning; Disease Models, Animal; Exome/genetics; Genetic Association Studies; Humans; Mice; Neural Networks, Computer; ROC Curve; Reproducibility of Results; Stochastic Processes
ABSTRACT
A fundamental principle in biology is that the program for early development is established during oogenesis in the form of the maternal transcriptome. How the maternal transcriptome acquires the appropriate content and dosage of transcripts is not fully understood. Here we show that 3' terminal uridylation of mRNA mediated by TUT4 and TUT7 sculpts the mouse maternal transcriptome by eliminating transcripts during oocyte growth. Uridylation mediated by TUT4 and TUT7 is essential for both oocyte maturation and fertility. In comparison to somatic cells, the oocyte transcriptome has a shorter poly(A) tail and a higher relative proportion of terminal oligo-uridylation. Deletion of TUT4 and TUT7 leads to the accumulation of a cohort of transcripts with a high frequency of very short poly(A) tails, and a loss of 3' oligo-uridylation. By contrast, deficiency of TUT4 and TUT7 does not alter gene expression in a variety of somatic cells. In summary, we show that poly(A) tail length and 3' terminal uridylation have essential and specific functions in shaping a functional maternal transcriptome.
Subject(s)
Maternal Inheritance/genetics; Oocytes/metabolism; Poly A/metabolism; RNA, Messenger/genetics; RNA, Messenger/metabolism; Transcriptome; Uridine Monophosphate/metabolism; Animals; Cell Line; DNA-Binding Proteins/deficiency; DNA-Binding Proteins/genetics; Female; Infertility, Female/genetics; Male; Mice; Mice, Knockout; Mothers; Nucleotidyltransferases/deficiency; Nucleotidyltransferases/genetics; Oocytes/growth & development; Organ Specificity; Poly A/chemistry; RNA Stability
ABSTRACT
BACKGROUND: Studies have identified many common genetic associations that influence renal function and all-cause CKD, but these explain only a small fraction of variance in these traits. The contribution of rare variants has not been systematically examined. METHODS: We performed exome sequencing of 3150 individuals, who collectively encompassed diverse CKD subtypes, and 9563 controls. To detect causal genes and evaluate the contribution of rare variants we used collapsing analysis, in which we compared the proportion of cases and controls carrying rare variants per gene. RESULTS: The analyses captured five established monogenic causes of CKD: variants in PKD1, PKD2, and COL4A5 achieved study-wide significance, and we observed suggestive case enrichment for COL4A4 and COL4A3. Beyond known disease-associated genes, collapsing analyses incorporating regional variant intolerance identified suggestive dominant signals in CPT2 and several other candidate genes. Biallelic mutations in CPT2 cause carnitine palmitoyltransferase II deficiency, sometimes associated with rhabdomyolysis and acute renal injury. Genetic modifier analysis among cases with APOL1 risk genotypes identified a suggestive signal in AHDC1, implicated in Xia-Gibbs syndrome, which involves intellectual disability and other features. On the basis of the observed distribution of rare variants, we estimate that a two- to three-fold larger cohort would provide 80% power to implicate new genes for all-cause CKD. CONCLUSIONS: This study demonstrates that rare-variant collapsing analyses can validate known genes and identify candidate genes and modifiers for kidney disease. In so doing, these findings provide a motivation for larger-scale investigation of rare-variant risk contributions across major clinical CKD categories.
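The collapsing comparison described in METHODS (proportion of cases versus controls carrying a qualifying rare variant in a gene) can be sketched as follows. The abstract does not name the statistical test, so the two-sided Fisher's exact test used here is an assumption, and the counts are made up for illustration.

```python
from math import comb

def fisher_exact_two_sided(case_carriers, n_cases, control_carriers, n_controls):
    """Two-sided Fisher's exact test on a 2x2 carrier table for one gene.

    Rows: cases / controls. Columns: carriers / non-carriers of at least
    one qualifying variant. Returns the p-value obtained by summing all
    table probabilities no larger than that of the observed table.
    """
    carriers = case_carriers + control_carriers
    total = n_cases + n_controls
    denom = comb(total, carriers)

    def prob(x):
        # Hypergeometric probability of x carriers among the cases.
        return comb(n_cases, x) * comb(n_controls, carriers - x) / denom

    p_obs = prob(case_carriers)
    lo = max(0, carriers - n_controls)
    hi = min(n_cases, carriers)
    # Small relative tolerance so ties with the observed table are counted.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)

# Hypothetical gene: 3 of 4 cases and 1 of 4 controls carry a qualifying variant.
print(fisher_exact_two_sided(3, 4, 1, 4))
```

In an exome-wide analysis this test would be repeated for every gene and the significance threshold corrected for the number of genes tested; that step is omitted from this sketch.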
Subject(s)
Collagen Type IV/genetics; Exome Sequencing; Genetic Variation/genetics; Protein Kinases/genetics; Renal Insufficiency, Chronic/genetics; TRPP Cation Channels/genetics; Case-Control Studies; Female; Humans; Male; Prognosis; Protein Kinase D2; Reference Values; Renal Insufficiency, Chronic/diagnosis
ABSTRACT
MicroRNAs are important genetic regulators in both animals and plants. They have a range of functions spanning development, differentiation, growth, metabolism and disease. The advent of next-generation sequencing technologies has made it a relatively straightforward task to detect these molecules and their relative expression via sequencing. There are a large number of published studies with deposited datasets. However, there are currently few resources that capitalize on these data to better understand the features, distribution and biogenesis of miRNAs. Herein, we focus on Human and Mouse for which the majority of data are available. We reanalyse sequencing data from 461 samples into a coordinated catalog of microRNA expression. We use this to perform large-scale analyses of miRNA function and biogenesis. These analyses include global expression comparison, co-expression of miRNA clusters and the prediction of miRNA strand-specificity and underlying constraints. Additionally, we report for the first time a global analysis of miRNA epi-transcriptomic modifications and assess their prevalence across tissues, samples and families. Finally, we report a list of potentially mis-annotated miRNAs in miRBase based on their aggregated modification profiles. The results have been collated into a comprehensive online repository of miRNA expression and features such as modifications and RNA editing events, which is available at: http://wwwdev.ebi.ac.uk/enright-dev/miratlas. We believe these findings will further contribute to our understanding of miRNA function in animals and benefit the miRNA community in general.
Subject(s)
MicroRNAs/genetics; MicroRNAs/metabolism; Animals; Databases, Nucleic Acid; Gene Expression; High-Throughput Nucleotide Sequencing; Humans; Mice; Molecular Sequence Annotation; Multigene Family; RNA Processing, Post-Transcriptional; Sequence Analysis, RNA; Transcriptome
ABSTRACT
The discovery of microRNAs (miRNAs) remains an important problem, particularly given the growth of high-throughput sequencing, cell sorting and single cell biology. While a large number of miRNAs have already been annotated, there may well be large numbers of miRNAs that are expressed in very particular cell types and remain elusive. Sequencing allows us to quickly and accurately identify the expression of known miRNAs from small RNA-Seq data. The biogenesis of miRNAs leads to very specific characteristics observed in their sequences. In brief, miRNAs usually have a well-defined 5' end and a more flexible 3' end with the possibility of 3' tailing events, such as uridylation. Previous approaches to the prediction of novel miRNAs usually involve the analysis of structural features of miRNA precursor hairpin sequences obtained from genome sequence. We surmised that it may be possible to identify miRNAs by using these biogenesis features observed directly from sequenced reads, solely or in addition to structural analysis from genome data. To this end, we have developed mirnovo, a machine learning based algorithm, which is able to identify known and novel miRNAs in animals and plants directly from small RNA-Seq data, with or without a reference genome. This method performs comparably to existing tools but is simpler to use and has a reduced run time. Its performance and accuracy have been tested on multiple datasets, including species with poorly assembled genomes, RNaseIII (Drosha and/or Dicer) deficient samples and single cells (at both embryonic and adult stage).
Subject(s)
High-Throughput Nucleotide Sequencing/methods; Machine Learning; MicroRNAs/chemistry; Sequence Analysis, RNA/methods; Software; Algorithms; Animals; Gene Expression Profiling; Genomics; Humans; Mice; MicroRNAs/metabolism; RNA, Plant/chemistry; RNA, Small Untranslated/chemistry; Ribonuclease III/genetics; Single-Cell Analysis
ABSTRACT
Summary: BioPAXViz is a Cytoscape (version 3) application, providing a comprehensive framework for metabolic pathway visualization. Beyond the basic parsing, viewing and browsing roles, the main novel function that BioPAXViz provides is a visual comparative analysis of metabolic pathway topologies across pre-computed pathway phylogenomic profiles given a species phylogeny. Furthermore, BioPAXViz supports the display of hierarchical trees that allow efficient navigation through sets of variants of a single reference pathway. Thus, BioPAXViz can significantly facilitate, and contribute to, the study of metabolic pathway evolution and engineering. Availability and Implementation: BioPAXViz has been developed as a Cytoscape app and is available at: https://github.com/CGU-CERTH/BioPAX.Viz. The software is distributed under the MIT License and is accompanied by example files and data. Additional documentation is available at the aforementioned GitHub repository. Contact: ouzounis@certh.gr.
Subject(s)
Computational Biology/methods; Evolution, Molecular; Metabolic Networks and Pathways/genetics; Software; Phylogeny
ABSTRACT
Chimira is a web-based system for microRNA (miRNA) analysis from small RNA-Seq data. Sequences are automatically cleaned, trimmed, size selected and mapped directly to miRNA hairpin sequences. This generates count-based miRNA expression data for subsequent statistical analysis. Moreover, it is capable of identifying epi-transcriptomic modifications in the input sequences. Supported modification types include multiple types of 3'-modifications (e.g. uridylation, adenylation), 5'-modifications and also internal modifications or variation (ADAR editing or single nucleotide polymorphisms). Besides cleaning and mapping of input sequences to miRNAs, Chimira provides a simple and intuitive set of tools for the analysis and interpretation of the results (see also Supplementary Material). These allow the visual study of the differential expression between two specific samples or sets of samples, the identification of the most highly expressed miRNAs within sample pairs (or sets of samples) and also the projection of the modification profile for specific miRNAs across all samples. Other tools have already been published for various types of small RNA-Seq analysis, such as UEA workbench, seqBuster, MAGI, OASIS and CAP-miRSeq, and CPSS for the identification of modifications. A comprehensive comparison of Chimira with each of these tools is provided in the Supplementary Material. Chimira outperforms all of these tools in total execution speed and aims to facilitate simple, fast and reliable analysis of small RNA-Seq data, also allowing, for the first time, identification of global microRNA modification profiles in a simple intuitive interface. AVAILABILITY AND IMPLEMENTATION: Chimira has been developed as a web application and it is accessible here: http://www.ebi.ac.uk/research/enright/software/chimira. CONTACT: aje@ebi.ac.uk SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
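To make the idea of a 3'-modification concrete, the following is a simplified sketch of how a non-templated 3' tail could be classified once a read has been aligned to its reference miRNA. It is not Chimira's implementation: the function name, the prefix-matching rule and the example sequence are all assumptions made for illustration.

```python
def classify_3p_tail(read, reference):
    """Classify the 3' tail of a read relative to a reference miRNA sequence.

    The read is compared to the reference from the 5' end; everything after
    the longest matching prefix is treated as the tail. Returns a tuple
    (matched_length, tail, label). Simplification: an internal mismatch
    ends the prefix, so the remainder is reported as an "other" tail.
    """
    read = read.upper().replace("T", "U")        # accept DNA-alphabet reads
    reference = reference.upper().replace("T", "U")
    matched = 0
    while (matched < len(read) and matched < len(reference)
           and read[matched] == reference[matched]):
        matched += 1
    tail = read[matched:]
    if not tail:
        label = "templated"
    elif set(tail) == {"U"}:
        label = "uridylation"
    elif set(tail) == {"A"}:
        label = "adenylation"
    else:
        label = "other"
    return matched, tail, label

# Hypothetical 20-nt reference sequence, used only for illustration.
ref = "ACGGUCAGUCCAGUUAGCAG"
print(classify_3p_tail(ref + "UU", ref))   # (20, 'UU', 'uridylation')
print(classify_3p_tail(ref + "A", ref))    # (20, 'A', 'adenylation')
print(classify_3p_tail(ref, ref))          # (20, '', 'templated')
```

A production pipeline would align against the full hairpin rather than the mature sequence alone, since extra nucleotides that match the hairpin downstream of the mature miRNA are templated and should not be counted as modifications.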
Subject(s)
MicroRNAs/chemistry; MicroRNAs/metabolism; Sequence Analysis, RNA/methods; Software; Humans; RNA, Small Untranslated/chemistry
ABSTRACT
The ongoing expansion of human genomic datasets propels therapeutic target identification; however, extracting gene-disease associations from gene annotations remains challenging. Here, we introduce Mantis-ML 2.0, a framework integrating AstraZeneca's Biological Insights Knowledge Graph and numerous tabular datasets, to assess gene-disease probabilities throughout the phenome. We use graph neural networks, capturing the graph's holistic structure, and train them on hundreds of balanced datasets via a robust semi-supervised learning framework to provide gene-disease probabilities across the human exome. Mantis-ML 2.0 incorporates natural language processing to automate disease-relevant feature selection for thousands of diseases. The enhanced models demonstrate a 6.9% average classification power boost, achieving a median receiver operating characteristic (ROC) area under curve (AUC) score of 0.90 across 5220 diseases from Human Phenotype Ontology, OpenTargets, and Genomics England. Notably, Mantis-ML 2.0 prioritizes associations from an independent UK Biobank phenome-wide association study (PheWAS), providing a stronger form of triaging and mitigating against underpowered PheWAS associations. Results are exposed through an interactive web resource.
Subject(s)
Neural Networks, Computer; Humans; Algorithms; Computational Biology/methods; Databases, Genetic; Genetic Predisposition to Disease; Genome-Wide Association Study/methods; Genomics/methods; Phenomics/methods; Phenotype; UK Biobank; United Kingdom
ABSTRACT
The emergence of biobank-level datasets offers new opportunities to discover novel biomarkers and develop predictive algorithms for human disease. Here, we present an ensemble machine-learning framework (machine learning with phenotype associations, MILTON) utilizing a range of biomarkers to predict 3,213 diseases in the UK Biobank. Leveraging the UK Biobank's longitudinal health record data, MILTON predicts incident disease cases undiagnosed at time of recruitment, largely outperforming available polygenic risk scores. We further demonstrate the utility of MILTON in augmenting genetic association analyses in a phenome-wide association study of 484,230 genome-sequenced samples, along with 46,327 samples with matched plasma proteomics data. This resulted in improved signals for 88 known (P < 1 × 10⁻⁸) gene-disease relationships alongside 182 gene-disease relationships that did not achieve genome-wide significance in the nonaugmented baseline cohorts. We validated these discoveries in the FinnGen biobank alongside two orthogonal machine-learning methods built for gene-disease prioritization. All extracted gene-disease associations and incident disease predictive biomarkers are publicly available ( http://milton.public.cgr.astrazeneca.com ).
Subject(s)
Biological Specimen Banks; Biomarkers; Genetic Predisposition to Disease; Genome-Wide Association Study; Machine Learning; Humans; United Kingdom; Genome-Wide Association Study/methods; Case-Control Studies; Multifactorial Inheritance/genetics; Proteomics/methods; Phenotype; Polymorphism, Single Nucleotide; Algorithms; Multiomics; UK Biobank
ABSTRACT
Telomeres protect chromosome ends from damage and their length is linked with human disease and aging. We developed a joint telomere length metric, combining quantitative PCR and whole-genome sequencing measurements from 462,666 UK Biobank participants. This metric increased SNP heritability, suggesting that it better captures genetic regulation of telomere length. Exome-wide rare-variant and gene-level collapsing association studies identified 64 variants and 30 genes significantly associated with telomere length, including allelic series in ACD and RTEL1. Notably, 16% of these genes are known drivers of clonal hematopoiesis, an age-related somatic mosaicism associated with myeloid cancers and several nonmalignant diseases. Somatic variant analyses revealed gene-specific associations with telomere length, including lengthened telomeres in individuals with large SRSF2-mutant clones, compared with shortened telomeres in individuals with clonal expansions driven by other genes. Collectively, our findings demonstrate the impact of rare variants on telomere length, with larger effects observed among genes also associated with clonal hematopoiesis.
Subject(s)
Biological Specimen Banks; Polymorphism, Single Nucleotide; Telomere; Whole Genome Sequencing; Humans; Telomere/genetics; United Kingdom; Whole Genome Sequencing/methods; Telomere Homeostasis/genetics; Male; Female; Clonal Hematopoiesis/genetics; Genome-Wide Association Study/methods; Aged; DNA Helicases/genetics; Middle Aged; UK Biobank
ABSTRACT
The druggability of targets is a crucial consideration in drug target selection. Here, we adopt a stochastic semi-supervised ML framework to develop DrugnomeAI, which estimates the druggability likelihood for every protein-coding gene in the human exome. DrugnomeAI integrates gene-level properties from 15 sources resulting in 324 features. The tool generates exome-wide predictions based on labelled sets of known drug targets (median AUC: 0.97), highlighting features from protein-protein interaction networks as top predictors. DrugnomeAI provides generic as well as specialised models stratified by disease type or drug therapeutic modality. The top-ranking DrugnomeAI genes were significantly enriched for genes previously selected for clinical development programs (p value < 1 × 10⁻³⁰⁸) and for genes achieving genome-wide significance in phenome-wide association studies of 450 K UK Biobank exomes for binary (p value = 1.7 × 10⁻⁵) and quantitative traits (p value = 1.6 × 10⁻⁷). We accompany our method with a web application ( http://drugnomeai.public.cgr.astrazeneca.com ) to visualise the druggability predictions and the key features that define gene druggability, per disease type and modality.
Subject(s)
Machine Learning; Software; Humans; Drug Delivery Systems
ABSTRACT
Large reference datasets of protein-coding variation in human populations have allowed us to determine which genes and genic subregions are intolerant to germline genetic variation. There is also a growing number of genes implicated in severe Mendelian diseases that overlap with genes implicated in cancer. We hypothesized that cancer-driving mutations might be enriched in genic subregions that are depleted of germline variation relative to somatic variation. We introduce a new metric, OncMTR (oncology missense tolerance ratio), which uses 125,748 exomes in the Genome Aggregation Database (gnomAD) to identify these genic subregions. We demonstrate that OncMTR can significantly predict driver mutations implicated in hematologic malignancies. Divergent OncMTR regions were enriched for cancer-relevant protein domains, and overlaying OncMTR scores on protein structures identified functionally important protein residues. Last, we performed a rare variant, gene-based collapsing analysis on an independent set of 394,694 exomes from the UK Biobank and find that OncMTR markedly improves genetic signals for hematologic malignancies.
Subject(s)
Germ-Line Mutation; Hematologic Neoplasms; Germ Cells; Hematologic Neoplasms/genetics; Humans
ABSTRACT
We performed collapsing analyses on 454,796 UK Biobank (UKB) exomes to detect gene-level associations with diabetes. Recessive carriers of nonsynonymous variants in MAP3K15 were 30% less likely to develop diabetes (P = 5.7 × 10⁻¹⁰) and had lower glycosylated hemoglobin (β = -0.14 SD units, P = 1.1 × 10⁻²⁴). These associations were independent of body mass index, suggesting protection against insulin resistance even in the setting of obesity. We replicated these findings in 96,811 Admixed Americans in the Mexico City Prospective Study (P < 0.05). Moreover, the protective effect of MAP3K15 variants was stronger in individuals who did not carry the Latino-enriched SLC16A11 risk haplotype (P = 6.0 × 10⁻⁴). Separately, we identified a Finnish-enriched MAP3K15 protein-truncating variant associated with decreased odds of both type 1 and type 2 diabetes (P < 0.05) in FinnGen. No adverse phenotypes were associated with protein-truncating MAP3K15 variants in the UKB, supporting this gene as a therapeutic target for diabetes.
Subject(s)
Diabetes Mellitus, Type 2; MAP Kinase Kinase Kinases; Humans; Diabetes Mellitus, Type 2/genetics; Genetic Predisposition to Disease; Monocarboxylic Acid Transporters/genetics; Obesity/genetics; Prospective Studies; MAP Kinase Kinase Kinases/genetics
ABSTRACT
Elucidating functionality in non-coding regions is a key challenge in human genomics. It has been shown that intolerance to variation of coding and proximal non-coding sequence is a strong predictor of human disease relevance. Here, we integrate intolerance to variation, functional genomic annotations and primary genomic sequence to build JARVIS: a comprehensive deep learning model to prioritize non-coding regions, outperforming other human lineage-specific scores. Despite being agnostic to evolutionary conservation, JARVIS performs comparably or outperforms conservation-based scores in classifying pathogenic single-nucleotide and structural variants. In constructing JARVIS, we introduce the genome-wide residual variation intolerance score (gwRVIS), applying a sliding-window approach to whole genome sequencing data from 62,784 individuals. gwRVIS distinguishes Mendelian disease genes from more tolerant CCDS regions and highlights ultra-conserved non-coding elements as the most intolerant regions in the human genome. Both JARVIS and gwRVIS capture previously inaccessible human-lineage constraint information and will enhance our understanding of the non-coding genome.
Subject(s)
Deep Learning; Genome, Human; Genomics; DNA, Intergenic; Genetic Variation; Humans; Sequence Analysis, DNA; Whole Genome Sequencing
ABSTRACT
Idiopathic pulmonary fibrosis (IPF) is a fatal disorder characterised by progressive, destructive lung scarring. Despite substantial progress, the genetic determinants of this disease remain incompletely defined. Using whole genome and whole exome sequencing data from 752 individuals with sporadic IPF and 119,055 UK Biobank controls, we performed a variant-level exome-wide association study (ExWAS) and gene-level collapsing analyses. Our variant-level analysis revealed a novel association between a rare missense variant in SPDL1 and IPF (NM_017785.5:g.169588475 G > A p.Arg20Gln; p = 2.4 × 10⁻⁷, odds ratio = 2.87, 95% confidence interval: 2.03-4.07). This signal was independently replicated in the FinnGen cohort, which contains 1028 cases and 196,986 controls (combined p = 2.2 × 10⁻²⁰), firmly establishing this variant as an IPF risk allele. SPDL1 encodes Spindly, a protein involved in mitotic checkpoint signalling during cell division that has not been previously described in fibrosis. To the best of our knowledge, these results highlight a novel mechanism underlying IPF, providing the potential for new therapeutic discoveries in a disease of great unmet need.