ABSTRACT
MOTIVATION: Pathogenic copy-number variants (CNVs) can cause a heterogeneous spectrum of rare and severe disorders. However, most CNVs are benign and are part of natural variation in human genomes. CNV pathogenicity classification, genotype-phenotype analyses, and therapeutic target identification are challenging and time-consuming tasks that require experts to integrate and analyse information from multiple scattered sources. RESULTS: Here, we introduce the CNV-ClinViewer, an open-source web application for clinical evaluation and visual exploration of CNVs. The application enables real-time interactive exploration of large CNV datasets in a user-friendly interface and facilitates semi-automated clinical CNV interpretation following the ACMG guidelines by integrating the ClassifyCNV tool. In combination with clinical judgment, the application enables clinicians and researchers to formulate novel hypotheses and guide their decision-making process. Consequently, the CNV-ClinViewer supports patient care for clinical investigators and translational genomic research for basic scientists. AVAILABILITY AND IMPLEMENTATION: The web application is freely available at https://cnv-ClinViewer.broadinstitute.org and the open-source code can be found at https://github.com/LalResearchGroup/CNV-clinviewer.
Subjects
DNA Copy Number Variations, Software, Humans, Genomics, Phenotype, Human Genome
ABSTRACT
Neurodevelopmental disorders (NDDs), including severe paediatric epilepsy, autism and intellectual disabilities, are heterogeneous conditions in which clinical genetic testing can often identify a pathogenic variant. For many of them, genetic therapies will be tested in clinical trials this year or in the coming years. In contrast to first-generation symptomatic treatments, the new disease-modifying precision medicines require a genetic test-informed diagnosis before a patient can be enrolled in a clinical trial. However, even in 2022, most identified genetic variants in NDD genes are 'variants of uncertain significance'. To safely enrol patients in precision medicine clinical trials, it is important to increase our knowledge about which regions in NDD-associated proteins can 'tolerate' missense variants and which ones are 'essential' and will cause an NDD when mutated. In addition, knowledge about functionally indispensable regions in the 3D structure context of proteins can also provide insights into the molecular mechanisms of disease variants. We developed a novel consensus approach that overlays evolutionary and population-based genomic scores to identify 3D essential sites (Essential3D) on protein structures. After extensive benchmarking of AlphaFold-predicted and experimentally solved protein structures, we generated the currently largest expert-curated protein structure set for 242 NDDs and identified 14 377 Essential3D sites across 189 gene-disorder-associated proteins. We demonstrate that the consensus annotation of Essential3D sites improves prioritization of disease mutations over single annotations. The identified Essential3D sites were enriched for functional features such as intermembrane regions or active sites and revealed key inter-molecule interactions in protein complexes that were otherwise not annotated.
Using the currently largest autism, developmental disorders, and epilepsies exome sequencing studies including >360 000 NDD patients and population controls, we found that missense variants at Essential3D sites are 8-fold enriched in patients. In summary, we developed a comprehensive protein structure set for 242 NDDs and identified 14 377 Essential3D sites in these. All data are available at https://es-ndd.broadinstitute.org for interactive visual inspection to enhance variant interpretation and development of mechanistic hypotheses for 242 NDD genes. The provided resources will enhance clinical variant interpretation and in silico drug target development for NDD-associated genes and encoded proteins.
Subjects
Intellectual Disability, Neurodevelopmental Disorders, Humans, Child, Neurodevelopmental Disorders/genetics, Genetic Testing, Mutation/genetics, Intellectual Disability/genetics, Missense Mutation
ABSTRACT
Understanding the exact molecular mechanisms involved in the aetiology of epileptogenic pathologies with or without tumour activity is essential for improving treatment of drug-resistant focal epilepsy. Here, we characterize the landscape of somatic genetic variants in resected brain specimens from 474 individuals with drug-resistant focal epilepsy using deep whole-exome sequencing (>350×) and whole-genome genotyping. Across the exome, we observe a greater number of somatic single-nucleotide variants in low-grade epilepsy-associated tumours (7.92 ± 5.65 single-nucleotide variants) than in brain tissue from malformations of cortical development (6.11 ± 4 single-nucleotide variants) or hippocampal sclerosis (5.1 ± 3.04 single-nucleotide variants). Tumour tissues also had the largest number of cells carrying likely pathogenic variants. Low-grade epilepsy-associated tumours had the highest proportion of samples with one or more somatic copy-number variants (24.7%), followed by malformations of cortical development (5.4%) and hippocampal sclerosis (4.1%). Recurring somatic whole-chromosome duplications affecting chromosome 7 (16.8%), chromosome 5 (10.9%), and chromosome 20 (9.9%) were observed among low-grade epilepsy-associated tumours. For germline variant-associated malformations of cortical development genes such as TSC2, DEPDC5 and PTEN, germline single-nucleotide variants were frequently identified within large loss of heterozygosity regions, supporting the recently proposed 'second hit' disease mechanism in these genes. We detect somatic variants in 12 established lesional epilepsy genes and demonstrate exome-wide statistical support for three of these in the aetiology of low-grade epilepsy-associated tumours (e.g. BRAF) and malformations of cortical development (e.g. SLC35A2 and MTOR).
We also identify novel significant associations for PTPN11 with low-grade epilepsy-associated tumours and NRAS Q61 mutated protein with a complex malformation of cortical development characterized by polymicrogyria and nodular heterotopia. The variants identified in NRAS are known from cancer studies to lead to hyperactivation of NRAS, which can be targeted pharmacologically. We identify a large recurrent 1q21-q44 duplication including AKT3 in association with focal cortical dysplasia type 2a with hyaline astrocytic inclusions, another rare and possibly under-recognized brain lesion. The clinical-genetic analyses showed that the number of somatic single-nucleotide variants across the exome and the fraction of affected cells were positively correlated with the age at seizure onset and surgery in individuals with low-grade epilepsy-associated tumours. In summary, our comprehensive genetic screen sheds light on the genome-scale landscape of genetic variants in epileptic brain lesions, informs the design of gene panels for clinical diagnostic screening and guides future directions for clinical implementation of epilepsy surgery genetics.
Subjects
Drug Resistant Epilepsy, Partial Epilepsies, Epilepsy, Malformations of Cortical Development, Humans, Epilepsy/pathology, Brain/pathology, Drug Resistant Epilepsy/genetics, Drug Resistant Epilepsy/surgery, Drug Resistant Epilepsy/metabolism, Genomics, Malformations of Cortical Development/complications, Malformations of Cortical Development/genetics, Malformations of Cortical Development/metabolism, Partial Epilepsies/metabolism, Nucleotides/metabolism
ABSTRACT
Genetic variants in the SLC6A1 gene can cause a broad phenotypic disease spectrum by altering the protein function. Thus, systematically curated, clinically relevant genotype-phenotype associations are needed to understand the disease mechanism and improve therapeutic decision-making. We aggregated genetic and clinical data from 172 individuals with likely pathogenic/pathogenic (lp/p) SLC6A1 variants and functional data for 184 variants (14.1% lp/p). Clinical and functional data were available for a subset of 126 individuals. We explored the potential associations of variant positions on the GAT1 3D structure with variant pathogenicity, altered molecular function and phenotype severity using bioinformatic approaches. The GAT1 transmembrane domains 1 and 6 and extracellular loop 4 (EL4) were enriched for patient over population variants. Across functionally tested missense variants (n = 156), spatial proximity to the ligand was associated with loss-of-function in the GAT1 transporter activity. For variants with complete loss of in vitro GABA uptake, we found a 4.6-fold enrichment in patients having severe disease versus non-severe disease (P = 2.9 × 10⁻³, 95% confidence interval: 1.5-15.3). In summary, we delineated associations between the 3D structure and variant pathogenicity, variant function and phenotype in SLC6A1-related disorders. This knowledge supports biology-informed variant interpretation and research on GAT1 function. All our data can be interactively explored in the SLC6A1 portal (https://slc6a1-portal.broadinstitute.org/).
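A fold enrichment with a confidence interval, such as the one reported above, is typically derived from a 2×2 contingency table. The following is a minimal sketch of one common way to do this, an odds ratio with a Woolf (log-scale) interval. It is not stated in the abstract which estimator the authors used, and the function name and the counts in the example are hypothetical, chosen only for illustration.

```python
import math


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf confidence interval for a 2x2 table.

    Hypothetical layout (an assumption, not taken from the study):
        a = severe disease,     complete loss of GABA uptake
        b = non-severe disease, complete loss of GABA uptake
        c = severe disease,     other variants
        d = non-severe disease, other variants
    z = 1.96 corresponds to a 95% interval.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for the Woolf interval")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Made-up counts, not the study's data.
    print(odds_ratio_with_ci(20, 10, 10, 20))
```

An interval whose lower bound stays above 1 indicates enrichment at the chosen confidence level. For small cell counts an exact method (for example Fisher's exact test) is usually preferred over this approximation.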
Subjects
GABA Plasma Membrane Transport Proteins, Genetic Association Studies, Missense Mutation, Humans, GABA Plasma Membrane Transport Proteins/genetics, GABA Plasma Membrane Transport Proteins/metabolism, Phenotype
ABSTRACT
SUMMARY: Literature exploration in PubMed on a large number of biomedical entities (e.g. genes, diseases or experiments) can be time-consuming and challenging, especially when assessing associations between entities. Here, we describe SimText, a user-friendly toolset that provides customizable and systematic workflows for the analysis of similarities among a set of entities based on text. SimText can be used for (i) text collection from PubMed and extraction of words with different text mining approaches, and (ii) interactive analysis and visualization of data using unsupervised learning techniques in an interactive app. AVAILABILITY AND IMPLEMENTATION: We developed SimText as open-source R software and integrated it into Galaxy (https://usegalaxy.eu), an online data analysis platform with supporting self-learning training material available at https://training.galaxyproject.org. A command-line version of the toolset is available for download from GitHub (https://github.com/dlal-group/simtext) or as a Docker image (https://hub.docker.com/r/dlalgroup/simtext/tags). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
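To illustrate what "similarity among entities based on text" can mean in practice, here is a minimal sketch that compares two entities by the cosine similarity of their word counts. This is not SimText's implementation (SimText is written in R and offers several text mining approaches); the function names and the example texts are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Lower-case the text and count alphanumeric tokens (e.g. gene symbols)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(counts_a, counts_b):
    """Cosine similarity between two bag-of-words count vectors."""
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[w] * counts_b[w] for w in shared)
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Hypothetical snippets standing in for text collected per entity.
    gene_a = word_counts("GABA transporter variants cause epilepsy")
    gene_b = word_counts("sodium channel variants cause epilepsy")
    print(cosine_similarity(gene_a, gene_b))
```

A pairwise matrix of such scores is one possible input for the unsupervised techniques (clustering, dimensionality reduction) mentioned above.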
Subjects
Data Mining, Software, Data Mining/methods, PubMed, Statistical Data Interpretation, Data Analysis
ABSTRACT
Many epilepsy-associated genes have been identified over the last three decades, revealing a remarkable molecular heterogeneity with the shared outcome of recurrent seizures. Information about the genetic landscape of epilepsies is scattered throughout the literature, and answering the simple question of how many genes are associated with epilepsy is not straightforward. Here, we present a computationally driven analytical review of epilepsy-associated genes using the complete scientific literature in PubMed. Based on our search criteria, we identified a total of 738 epilepsy-associated genes. We further classified these genes into two tiers: a broad gene list of 738 epilepsy-associated genes (Tier 2) and a narrow gene list composed of 143 epilepsy-associated genes (Tier 1). Our search criteria do not reflect the degree of association. The average yearly number of identified epilepsy-associated genes between 1992 and 2021 was 4.8. However, most of these genes were only identified in the last decade (2010-2019). Ion channels represent the largest class of epilepsy-associated genes. For many of these, both gain- and loss-of-function effects have been associated with epilepsy in recent years. We identify 28 genes frequently reported with heterogeneous variant effects, which should be considered for variant interpretation. Overall, our study provides an updated and manually curated list of epilepsy-related genes together with additional annotations and classifications reflecting the current genetic landscape of epilepsy.
Subjects
Epilepsy, Humans, Epilepsy/genetics, Seizures
ABSTRACT
Copy number variants (CNVs) are established risk factors for neurodevelopmental disorders with seizures or epilepsy. With the hypothesis that seizure disorders share genetic risk factors, we pooled CNV data from 10,590 individuals with seizure disorders, 16,109 individuals with clinically validated epilepsy, and 492,324 population controls and identified 25 genome-wide significant loci, 22 of which are novel for seizure disorders, such as deletions at 1p36.33, 1q44, 2p21-p16.3, 3q29, 8p23.3-p23.2, 9p24.3, 10q26.3, 15q11.2, 15q12-q13.1, 16p12.2, 17q21.31, duplications at 2q13, 9q34.3, 16p13.3, 17q12, 19p13.3, 20q13.33, and reciprocal CNVs at 16p11.2 and 22q11.21. Using genetic data from an additional 248,751 individuals with 23 neuropsychiatric phenotypes, we explored the pleiotropy of these 25 loci. Finally, in a subset of individuals with epilepsy and detailed clinical data available, we performed phenome-wide association analyses between individual CNVs and clinical annotations categorized through the Human Phenotype Ontology (HPO). For six CNVs, we identified 19 significant associations with specific HPO terms and generated, for all CNVs, phenotype signatures across 17 clinical categories relevant for epileptologists. This is the most comprehensive investigation of CNVs in epilepsy and related seizure disorders, with potential implications for clinical practice.
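A case-control comparison of CNV carriers at a single locus can be sketched as a one-sided Fisher's exact test, shown below with the Python standard library only. The abstract does not say which association test the study used, so this is an illustrative assumption; the function name and all counts are hypothetical.

```python
from math import comb


def fisher_one_sided(case_carriers, case_total, control_carriers, control_total):
    """One-sided Fisher's exact p-value for excess CNV carriers among cases.

    Returns P(X >= case_carriers), where X follows the hypergeometric
    distribution obtained by fixing the table margins.
    """
    carriers = case_carriers + control_carriers
    total = case_total + control_total
    denominator = comb(total, case_total)
    largest_k = min(carriers, case_total)
    tail = sum(
        comb(carriers, k) * comb(total - carriers, case_total - k)
        for k in range(case_carriers, largest_k + 1)
    )
    return tail / denominator


if __name__ == "__main__":
    # Made-up counts for one locus, not taken from the study.
    print(fisher_one_sided(8, 200, 2, 800))
```

In a genome-wide scan, the resulting per-locus p-values would additionally need a multiple-testing threshold before any locus is called significant.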