ABSTRACT
MOTIVATION: Pathogenic copy-number variants (CNVs) can cause a heterogeneous spectrum of rare and severe disorders. However, most CNVs are benign and are part of natural variation in human genomes. CNV pathogenicity classification, genotype-phenotype analyses, and therapeutic target identification are challenging and time-consuming tasks that require the integration and analysis of information from multiple scattered sources by experts. RESULTS: Here, we introduce the CNV-ClinViewer, an open-source web application for clinical evaluation and visual exploration of CNVs. The application enables real-time interactive exploration of large CNV datasets in a user-friendly interface and facilitates semi-automated clinical CNV interpretation following the ACMG guidelines by integrating the ClassifyCNV tool. In combination with clinical judgment, the application enables clinicians and researchers to formulate novel hypotheses and guide their decision-making process. Consequently, the CNV-ClinViewer enhances patient care for clinical investigators and translational genomic research for basic scientists. AVAILABILITY AND IMPLEMENTATION: The web application is freely available at https://cnv-ClinViewer.broadinstitute.org and the open-source code can be found at https://github.com/LalResearchGroup/CNV-clinviewer.
Subject(s)
DNA Copy Number Variations , Software , Humans , Genomics , Phenotype , Genome, Human
ABSTRACT
OBJECTIVE: Determining the pathogenicity of missense variants in clinical genetic tests for individuals with epilepsy is crucial for guiding personalized treatment. However, achieving a definitive pathogenic classification remains challenging: most missense variants are still classified as variants of uncertain significance (VUS), and the many available computational tools may provide conflicting predictions. Here, we aim to evaluate the performance of state-of-the-art computational tools in pathogenicity prediction of missense variants in epilepsy-associated genes. This will assist in selecting the most appropriate tool and in critically assessing the use of such tools in clinical settings. METHODS: We assessed the performance of nine in silico pathogenicity prediction tools for missense variants in epilepsy-associated genes on three carefully curated data sets. The first two data sets comprise missense variants in epilepsy-associated genes that have been uploaded to ClinVar in the last year and were, therefore, not part of the training set of any of the nine considered tools. These two data sets are based on two different lists of epilepsy-associated genes and comprise ~700 and ~250 missense variants, respectively. The third data set includes ~400 missense variants within epilepsy-associated genes for which the functional effects have been determined experimentally and are therefore used here to infer pathogenicity. These three data sets represent the best available approximation to blind and independent test sets. RESULTS: Among the nine assessed tools, AlphaMissense (area under the curve [AUC]: .93, .88, and .95) and REVEL (AUC: .93, .88, and .93) showed the best classification performance, also outperforming other tools in the number of classified variants.
SIGNIFICANCE: We show which recently developed prediction tools achieve higher performance in epilepsy-associated genes and should be integrated, therefore, into the American College of Medical Genetics and Genomics/Association for Molecular Pathology (ACMG/AMP) variant classification process. Periodic reevaluation of genetic test results with newly developed or updated tools should be incorporated into standard clinical practice to improve diagnostic yield and better inform precision medicine.
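The AUC values reported above summarize how well a tool's score separates pathogenic from benign variants. A minimal sketch of that metric follows, assuming each variant has one numeric score and a binary label; the function name and the toy scores are hypothetical and are not taken from the study.

```python
def roc_auc(scores, labels):
    """AUC computed as the probability that a randomly chosen pathogenic
    variant (label 1) scores higher than a randomly chosen benign one
    (label 0); tied scores count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one variant of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: three pathogenic and three benign variants.
toy_scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]
print(roc_auc(toy_scores, toy_labels))
```

Variants that a tool leaves unscored would have to be dropped before this calculation, which is why the number of classified variants is reported alongside the AUC.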
ABSTRACT
OBJECTIVE: SCN1A variants are associated with epilepsy syndromes ranging from mild genetic epilepsy with febrile seizures plus (GEFS+) to severe Dravet syndrome (DS). Many variants are de novo, making early phenotype prediction difficult, and genotype-phenotype associations remain poorly understood. METHODS: We assessed data from a retrospective cohort of 1018 individuals with SCN1A-related epilepsies. We explored relationships between variant characteristics (position, in silico prediction scores: Combined Annotation Dependent Depletion (CADD), Rare Exome Variant Ensemble Learner (REVEL), SCN1A genetic score), seizure characteristics, and epilepsy phenotype. RESULTS: DS had earlier seizure onset than other GEFS+ phenotypes (5.3 vs. 12.0 months, p < .001). In silico variant scores were higher in DS versus GEFS+ (p < .001). Patients with missense variants in functionally important regions (conserved N-terminus, S4-S6) exhibited earlier seizure onset (6.0 vs. 7.0 months, p = .003) and were more likely to have DS (280/340); those with missense variants in nonconserved regions had later onset (10.0 vs. 7.0 months, p = .036) and were more likely to have GEFS+ (15/29, χ2 = 19.16, p < .001). A minority of protein-truncating variants were associated with GEFS+ (10/393) and more likely to be located in the proximal first and last exon coding regions than elsewhere in the gene (9.7% vs. 1.0%, p < .001). Carriers of the same missense variant exhibited less variability in age at seizure onset compared with carriers of different missense variants for both DS (1.9 vs. 2.9 months, p = .001) and GEFS+ (8.0 vs. 11.0 months, p = .043). Status epilepticus as presenting seizure type is a highly specific (95.2%) but nonsensitive (32.7%) feature of DS. SIGNIFICANCE: Understanding genotype-phenotype associations in SCN1A-related epilepsies is critical for early diagnosis and management. 
We demonstrate an earlier disease onset in patients with missense variants in important functional regions, the occurrence of GEFS+ truncating variants, and the value of in silico prediction scores. Status epilepticus as initial seizure type is a highly specific, but not sensitive, early feature of DS.
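The specificity and sensitivity quoted for status epilepticus follow the standard confusion-matrix definitions. The sketch below shows the arithmetic under the assumption that DS is the positive class; the counts are hypothetical placeholders, not the cohort's numbers.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    tp: DS patients presenting with the feature
    fn: DS patients presenting without it
    tn: non-DS patients presenting without it
    fp: non-DS patients presenting with it
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical counts chosen only to illustrate a specific but insensitive feature.
sens, spec = sensitivity_specificity(tp=30, fn=70, tn=95, fp=5)
print(f"sensitivity={sens:.1%}, specificity={spec:.1%}")
```

A feature with this profile rarely appears outside DS, so its presence is informative, while its absence says little.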
Subject(s)
Epilepsies, Myoclonic , Epilepsy , Seizures, Febrile , Status Epilepticus , Humans , Retrospective Studies , NAV1.1 Voltage-Gated Sodium Channel/genetics , Epilepsy/genetics , Epilepsy/diagnosis , Epilepsies, Myoclonic/genetics , Seizures, Febrile/genetics , Phenotype , Genetic Association Studies , Mutation/genetics
ABSTRACT
Neurodevelopmental disorders (NDDs), including severe paediatric epilepsy, autism and intellectual disabilities, are heterogeneous conditions in which clinical genetic testing can often identify a pathogenic variant. For many of them, genetic therapies will be tested in this or the coming years in clinical trials. In contrast to first-generation symptomatic treatments, the new disease-modifying precision medicines require a genetic test-informed diagnosis before a patient can be enrolled in a clinical trial. However, even in 2022, most identified genetic variants in NDD genes are 'variants of uncertain significance'. To safely enrol patients in precision medicine clinical trials, it is important to increase our knowledge about which regions in NDD-associated proteins can 'tolerate' missense variants and which ones are 'essential' and will cause an NDD when mutated. In addition, knowledge about functionally indispensable regions in the 3D structure context of proteins can also provide insights into the molecular mechanisms of disease variants. We developed a novel consensus approach that overlays evolutionary and population-based genomic scores to identify 3D essential sites (Essential3D) on protein structures. After extensive benchmarking of AlphaFold predicted and experimentally solved protein structures, we generated the currently largest expert curated protein structure set for 242 NDDs and identified 14 377 Essential3D sites across 189 gene-disorder-associated proteins. We demonstrate that the consensus annotation of Essential3D sites improves prioritization of disease mutations over single annotations. The identified Essential3D sites were enriched for functional features such as intermembrane regions or active sites and discovered key inter-molecule interactions in protein complexes that were otherwise not annotated.
Using the currently largest autism, developmental disorders, and epilepsies exome sequencing studies including >360 000 NDD patients and population controls, we found that missense variants at Essential3D sites are 8-fold enriched in patients. In summary, we developed a comprehensive protein structure set for 242 NDDs and identified 14 377 Essential3D sites in these. All data are available at https://es-ndd.broadinstitute.org for interactive visual inspection to enhance variant interpretation and development of mechanistic hypotheses for 242 NDD genes. The provided resources will enhance clinical variant interpretation and in silico drug target development for NDD-associated genes and encoded proteins.
Subject(s)
Intellectual Disability , Neurodevelopmental Disorders , Humans , Child , Neurodevelopmental Disorders/genetics , Genetic Testing , Mutation/genetics , Intellectual Disability/genetics , Mutation, Missense
ABSTRACT
Clinically identified genetic variants in ion channels can be benign or cause disease by increasing or decreasing the protein function. As a consequence, therapeutic decision-making is challenging without molecular testing of each variant. Our biophysical knowledge of ion-channel structures and function is just emerging, and it is currently not well understood which amino acid residues cause disease when mutated. We sought to systematically identify biological properties associated with variant pathogenicity across all major voltage and ligand-gated ion-channel families. We collected and curated 3049 pathogenic variants from hundreds of neurodevelopmental and other disorders and 12 546 population variants for 30 ion channel or channel subunits for which a high-quality protein structure was available. Using a wide range of bioinformatics approaches, we computed 163 structural features and tested them for pathogenic variant enrichment. We developed a novel 3D spatial distance scoring approach that enables comparisons of pathogenic and population variant distribution across protein structures. We discovered and independently replicated that several pore residue properties and proximity to the pore axis were most significantly enriched for pathogenic variants compared to population variants. Using our 3D scoring approach, we showed that the strongest pathogenic variant enrichment was observed for pore-lining residues and alpha-helix residues within 5Å distance from the pore axis centre and not involved in gating. Within the subset of residues located at the pore, the hydrophobicity of the pore was the feature most strongly associated with variant pathogenicity. We also found an association between the identified properties and both clinical phenotypes and functional in vitro assays for voltage-gated sodium channels (SCN1A, SCN2A, SCN8A) and N-methyl-D-aspartate receptor (GRIN1, GRIN2A, GRIN2B) encoding genes. 
In an independent expert-curated dataset of 1422 neurodevelopmental disorder pathogenic patient variants and 679 electrophysiological experiments, we show that pore axis distance is associated with seizure age of onset and cognitive performance as well as differential gain versus loss-of-channel function. In summary, we identified biological properties associated with ion-channel malfunction and show that these are correlated with in vitro functional readouts and clinical phenotypes in patients with neurodevelopmental disorders. Our results suggest that clinical decision support algorithms that predict variant pathogenicity and function are feasible in the future.
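The pore axis distance used above can be illustrated as a point-to-line distance in 3D. This is a minimal sketch, assuming the pore axis is approximated by a straight line through a known point with a known direction; how the axis was actually derived is not stated here, and the function names and coordinates are hypothetical.

```python
import math


def distance_to_axis(point, axis_point, axis_direction):
    """Perpendicular distance (same units as the inputs, e.g. angstroms)
    from a residue coordinate to a straight line standing in for the pore axis."""
    v = [p - a for p, a in zip(point, axis_point)]
    norm = math.sqrt(sum(d * d for d in axis_direction))
    if norm == 0:
        raise ValueError("axis_direction must be a non-zero vector")
    u = [d / norm for d in axis_direction]
    # Component of v along the axis, then the remainder perpendicular to it.
    along = sum(vi * ui for vi, ui in zip(v, u))
    perpendicular = [vi - along * ui for vi, ui in zip(v, u)]
    return math.sqrt(sum(c * c for c in perpendicular))


def within_cutoff(point, axis_point, axis_direction, cutoff=5.0):
    """True if the residue lies within `cutoff` of the axis (5 by default)."""
    return distance_to_axis(point, axis_point, axis_direction) <= cutoff


# Hypothetical residue coordinate with the axis running along z through the origin.
print(distance_to_axis((3.0, 4.0, 10.0), (0.0, 0.0, 0.0), (0.0, 0.0, 2.0)))
```

Residues flagged by `within_cutoff` could then be compared between pathogenic and population variant sets.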
Subject(s)
Receptors, N-Methyl-D-Aspartate , Seizures , Humans , Virulence , Phenotype , Receptors, N-Methyl-D-Aspartate/genetics , Biophysics
ABSTRACT
Understanding the exact molecular mechanisms involved in the aetiology of epileptogenic pathologies with or without tumour activity is essential for improving treatment of drug-resistant focal epilepsy. Here, we characterize the landscape of somatic genetic variants in resected brain specimens from 474 individuals with drug-resistant focal epilepsy using deep whole-exome sequencing (>350×) and whole-genome genotyping. Across the exome, we observe a greater number of somatic single-nucleotide variants in low-grade epilepsy-associated tumours (7.92 ± 5.65 single-nucleotide variants) than in brain tissue from malformations of cortical development (6.11 ± 4 single-nucleotide variants) or hippocampal sclerosis (5.1 ± 3.04 single-nucleotide variants). Tumour tissues also had the largest number of likely pathogenic variant carrying cells. Low-grade epilepsy-associated tumours had the highest proportion of samples with one or more somatic copy-number variants (24.7%), followed by malformations of cortical development (5.4%) and hippocampal sclerosis (4.1%). Recurring somatic whole chromosome duplications affecting chromosome 7 (16.8%), chromosome 5 (10.9%), and chromosome 20 (9.9%) were observed among low-grade epilepsy-associated tumours. For germline variant-associated malformations of cortical development genes such as TSC2, DEPDC5 and PTEN, germline single-nucleotide variants were frequently identified within large loss of heterozygosity regions, supporting the recently proposed 'second hit' disease mechanism in these genes. We detect somatic variants in 12 established lesional epilepsy genes and demonstrate exome-wide statistical support for three of these in the aetiology of low-grade epilepsy-associated tumours (e.g. BRAF) and malformations of cortical development (e.g. SLC35A2 and MTOR).
We also identify novel significant associations for PTPN11 with low-grade epilepsy-associated tumours and NRAS Q61 mutated protein with a complex malformation of cortical development characterized by polymicrogyria and nodular heterotopia. The variants identified in NRAS are known from cancer studies to lead to hyperactivation of NRAS, which can be targeted pharmacologically. We identify large recurrent 1q21-q44 duplication including AKT3 in association with focal cortical dysplasia type 2a with hyaline astrocytic inclusions, another rare and possibly under-recognized brain lesion. The clinical-genetic analyses showed that the number of somatic single-nucleotide variants across the exome and the fraction of affected cells were positively correlated with the age at seizure onset and surgery in individuals with low-grade epilepsy-associated tumours. In summary, our comprehensive genetic screen sheds light on the genome-scale landscape of genetic variants in epileptic brain lesions, informs the design of gene panels for clinical diagnostic screening and guides future directions for clinical implementation of epilepsy surgery genetics.
Subject(s)
Drug Resistant Epilepsy , Epilepsies, Partial , Epilepsy , Malformations of Cortical Development , Humans , Epilepsy/pathology , Brain/pathology , Drug Resistant Epilepsy/genetics , Drug Resistant Epilepsy/surgery , Drug Resistant Epilepsy/metabolism , Genomics , Malformations of Cortical Development/complications , Malformations of Cortical Development/genetics , Malformations of Cortical Development/metabolism , Epilepsies, Partial/metabolism , Nucleotides/metabolism
ABSTRACT
Genetic variants in the SLC6A1 gene can cause a broad phenotypic disease spectrum by altering the protein function. Thus, systematically curated clinically relevant genotype-phenotype associations are needed to understand the disease mechanism and improve therapeutic decision-making. We aggregated genetic and clinical data from 172 individuals with likely pathogenic/pathogenic (lp/p) SLC6A1 variants and functional data for 184 variants (14.1% lp/p). Clinical and functional data were available for a subset of 126 individuals. We explored the potential associations of variant positions on the GAT1 3D structure with variant pathogenicity, altered molecular function and phenotype severity using bioinformatic approaches. The GAT1 transmembrane domains 1, 6 and extracellular loop 4 (EL4) were enriched for patient over population variants. Across functionally tested missense variants (n = 156), the spatial proximity to the ligand was associated with loss-of-function in the GAT1 transporter activity. For variants with complete loss of in vitro GABA uptake, we found a 4.6-fold enrichment in patients having severe disease versus non-severe disease (P = 2.9 × 10⁻³, 95% confidence interval: 1.5-15.3). In summary, we delineated associations between the 3D structure and variant pathogenicity, variant function and phenotype in SLC6A1-related disorders. This knowledge supports biology-informed variant interpretation and research on GAT1 function. All our data can be interactively explored in the SLC6A1 portal (https://slc6a1-portal.broadinstitute.org/).
Subject(s)
GABA Plasma Membrane Transport Proteins , Genetic Association Studies , Mutation, Missense , Humans , GABA Plasma Membrane Transport Proteins/genetics , GABA Plasma Membrane Transport Proteins/metabolism , Phenotype
ABSTRACT
Missense variant interpretation is challenging. Essential regions for protein function are conserved among gene-family members, and genetic variants within these regions are potentially more likely to confer risk to disease. Here, we generated 2871 gene-family protein sequence alignments involving 9990 genes and performed missense variant burden analyses to identify novel essential protein regions. We mapped 2,219,811 variants from the general population into these alignments and compared their distribution with 76,153 missense variants from patients. With this gene-family approach, we identified 465 regions enriched for patient variants spanning 41,463 amino acids in 1252 genes. As a comparison, by testing the same genes individually, we identified fewer patient variant enriched regions, involving only 2639 amino acids and 215 genes. Next, we selected de novo variants from 6753 patients with neurodevelopmental disorders and 1911 unaffected siblings and observed an 8.33-fold enrichment of patient variants in our identified regions (95% C.I. = 3.90-Inf, P-value = 2.72 × 10⁻¹¹). By using the complete ClinVar variant set, we found that missense variants inside the identified regions are 106-fold more likely to be classified as pathogenic in comparison to benign classification (OR = 106.15, 95% C.I. = 70.66-Inf, P-value < 2.2 × 10⁻¹⁶). All pathogenic variant enriched regions (PERs) identified are available online through "PER viewer," a user-friendly online platform for interactive data mining, visualization, and download. In summary, our gene-family burden analysis approach identified novel PERs in protein sequences. This annotation can empower variant interpretation.
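Enrichment statements of this kind rest on a 2x2 table and an odds ratio. The sketch below uses the sample (cross-product) odds ratio and a one-sided Fisher's exact test as one common way to test such a table; the abstract does not name the test used, and statistical packages may report a conditional maximum-likelihood odds ratio instead. The function names and counts are hypothetical.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Sample odds ratio for the table [[a, b], [c, d]], e.g.
    a = pathogenic inside regions, b = benign inside regions,
    c = pathogenic outside regions, d = benign outside regions."""
    if b == 0 or c == 0:
        return float("inf")
    return (a * d) / (b * c)


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact P-value: probability of observing at least
    `a` in the top-left cell when all table margins are held fixed."""
    row1 = a + b
    col1 = a + c
    n = a + b + c + d
    upper = min(row1, col1)
    numerator = sum(
        comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1)
    )
    return numerator / comb(n, col1)


# Hypothetical toy counts, not taken from the study.
print(odds_ratio(8, 2, 1, 5), fisher_exact_greater(8, 2, 1, 5))
```

A one-sided test of enrichment pairs naturally with a confidence interval whose upper bound is infinite, as in the intervals quoted above.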
Subject(s)
Chromosome Mapping , Genetic Predisposition to Disease , Genetic Variation , Multigene Family , Alleles , Amino Acid Sequence , Amino Acid Substitution , Computational Biology/methods , Female , Genome-Wide Association Study , Humans , Male , Mutation, Missense , Software , User-Computer Interface
ABSTRACT
Pathogenic variants in the voltage-gated sodium channel gene family lead to early onset epilepsies, neurodevelopmental disorders, skeletal muscle channelopathies, peripheral neuropathies and cardiac arrhythmias. Disease-associated variants have diverse functional effects ranging from complete loss-of-function to marked gain-of-function. Therapeutic strategy is likely to depend on functional effect. Experimental studies offer important insights into channel function but are resource intensive and only performed in a minority of cases. Given the evolutionarily conserved nature of the sodium channel genes, we investigated whether similarities in biophysical properties between different voltage-gated sodium channels can predict function and inform precision treatment across sodium channelopathies. We performed a systematic literature search identifying functionally assessed variants in any of the nine voltage-gated sodium channel genes until 28 April 2021. We included missense variants that had been electrophysiologically characterized in mammalian cells in whole-cell patch-clamp recordings. We performed an alignment of linear protein sequences of all sodium channel genes and correlated variants by their overall functional effect on biophysical properties. Of 951 identified records, 437 sodium channel-variants met our inclusion criteria and were reviewed for functional properties. Of these, 141 variants were epilepsy-associated (SCN1/2/3/8A), 79 had a neuromuscular phenotype (SCN4/9/10/11A), 149 were associated with a cardiac phenotype (SCN5/10A) and 68 (16%) were considered benign. We detected 38 missense variant pairs with an identical disease-associated variant in a different sodium channel gene. Thirty-five out of 38 of those pairs resulted in similar functional consequences, indicating up to 92% biophysical agreement between corresponding sodium channel variants (odds ratio = 11.3; 95% confidence interval = 2.8 to 66.9; P < 0.001). 
Pathogenic missense variants were clustered in specific functional domains, whereas population variants were significantly more frequent across non-conserved domains (odds ratio = 18.6; 95% confidence interval = 10.9-34.4; P < 0.001). Pore-loop regions were frequently associated with loss-of-function variants, whereas inactivation sites were associated with gain-of-function (odds ratio = 42.1, 95% confidence interval = 14.5-122.4; P < 0.001), whilst variants occurring in voltage-sensing regions comprised a range of gain- and loss-of-function effects. Our findings suggest that biophysical characterisation of variants in one SCN-gene can predict channel function across different SCN-genes where experimental data are not available. The collected data represent the first gain- versus loss-of-function topological map of SCN proteins indicating shared patterns of biophysical effects aiding variant analysis and guiding precision therapy. We integrated our findings into a free online webtool to facilitate functional sodium channel gene variant interpretation (http://SCN-viewer.broadinstitute.org).
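Finding the corresponding variant in a different sodium channel gene depends on mapping residue numbers through the protein sequence alignment. A minimal sketch of that mapping follows, assuming two gapped sequences of equal length taken from one alignment; the sequences and function names are hypothetical toys.

```python
def residue_to_column(aligned_seq):
    """Map 1-based residue numbers to 0-based alignment columns,
    skipping gap characters ('-')."""
    mapping = {}
    residue = 0
    for column, char in enumerate(aligned_seq):
        if char != "-":
            residue += 1
            mapping[residue] = column
    return mapping


def corresponding_position(aligned_a, aligned_b, pos_a):
    """Return the 1-based residue number in sequence B aligned to residue
    pos_a of sequence A, or None if there is no aligned residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    column = residue_to_column(aligned_a).get(pos_a)
    if column is None or aligned_b[column] == "-":
        return None
    # Count the residues of B up to and including the shared column.
    return sum(1 for char in aligned_b[: column + 1] if char != "-")


# Hypothetical toy alignment of two paralogous fragments.
paralog_a = "MK-LVR"
paralog_b = "M-ALVR"
print(corresponding_position(paralog_a, paralog_b, 3))
```

Variant pairs that land on the same alignment column in two genes can then be compared for agreement in functional effect.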
Subject(s)
Channelopathies , Epilepsy , Peripheral Nervous System Diseases , Voltage-Gated Sodium Channels , Animals , Channelopathies/genetics , Voltage-Gated Sodium Channels/genetics , Epilepsy/genetics , Phenotype , Mammals
ABSTRACT
CACNA1I is implicated in the susceptibility to schizophrenia by large-scale genetic association studies of single nucleotide polymorphisms. However, the channelopathy of CACNA1I in schizophrenia is unknown. CACNA1I encodes CaV3.3, a neuronal voltage-gated calcium channel that underlies a subtype of T-type current that is important for neuronal excitability in the thalamic reticular nucleus and other regions of the brain. Here, we present an extensive functional characterization of 57 naturally occurring rare and common missense variants of CACNA1I derived from a Swedish schizophrenia cohort of more than 10 000 individuals. Our analysis of this allelic series of coding CACNA1I variants revealed that reduced CaV3.3 channel current density was the dominant phenotype associated with rare CACNA1I coding alleles derived from control subjects, whereas rare CACNA1I alleles from schizophrenia patients encoded CaV3.3 channels with altered responses to voltages. CACNA1I variants associated with altered current density primarily impact the ionic channel pore and those associated with altered responses to voltage impact the voltage-sensing domain. CaV3.3 variants associated with altered voltage dependence of the CaV3.3 channel and those associated with peak current density deficits were significantly segregated across affected and unaffected groups (Fisher's exact test, P = 0.034). Our results, together with recent data from the SCHEMA (Schizophrenia Exome Sequencing Meta-Analysis) cohort, suggest that reduced CaV3.3 function may protect against schizophrenia risk in rare cases. We subsequently modelled the effect of the biophysical properties of CaV3.3 channel variants on thalamic reticular nucleus excitability and found that compared with common variants, ultrarare CaV3.3-coding variants derived from control subjects significantly decreased thalamic reticular nucleus excitability (P = 0.011). 
When all rare variants were analysed, there was a non-significant trend between variants that reduced thalamic reticular nucleus excitability and variants that either had no effect or increased thalamic reticular nucleus excitability across disease status. Taken together, the results of our functional analysis of an allelic series of >50 CACNA1I variants in a schizophrenia cohort reveal that loss of function of CaV3.3 is a molecular phenotype associated with reduced disease risk burden, and our approach may serve as a template strategy for channelopathies in polygenic disorders.
Subject(s)
Calcium Channels, T-Type , Channelopathies , Schizophrenia , Alleles , Calcium Channels, T-Type/genetics , Channelopathies/genetics , Humans , Mutation, Missense , Schizophrenia/genetics , Sweden
ABSTRACT
Brain voltage-gated sodium channel NaV1.1 (SCN1A) loss-of-function variants cause the severe epilepsy Dravet syndrome, as well as milder phenotypes associated with genetic epilepsy with febrile seizures plus. Gain of function SCN1A variants are associated with familial hemiplegic migraine type 3. Novel SCN1A-related phenotypes have been described including early infantile developmental and epileptic encephalopathy with movement disorder, and more recently neonatal presentations with arthrogryposis. Here we describe the clinical, genetic and functional evaluation of affected individuals. Thirty-five patients were ascertained via an international collaborative network using a structured clinical questionnaire and from the literature. We performed whole-cell voltage-clamp electrophysiological recordings comparing sodium channels containing wild-type versus variant NaV1.1 subunits. Findings were related to Dravet syndrome and familial hemiplegic migraine type 3 variants. We identified three distinct clinical presentations differing by age at onset and presence of arthrogryposis and/or movement disorder. The most severely affected infants (n = 13) presented with congenital arthrogryposis, neonatal onset epilepsy in the first 3 days of life, tonic seizures and apnoeas, accompanied by a significant movement disorder and profound intellectual disability. Twenty-one patients presented later, between 2 weeks and 3 months of age, with a severe early infantile developmental and epileptic encephalopathy and a movement disorder. One patient presented after 3 months with developmental and epileptic encephalopathy only. Associated SCN1A variants cluster in regions of channel inactivation associated with gain of function, different to Dravet syndrome variants (odds ratio = 17.8; confidence interval = 5.4-69.3; P = 1.3 × 10⁻⁷).
Functional studies of both epilepsy and familial hemiplegic migraine type 3 variants reveal alterations of gating properties in keeping with neuronal hyperexcitability. While epilepsy variants result in a moderate increase in action current amplitude consistent with mild gain of function, familial hemiplegic migraine type 3 variants induce a larger effect on gating properties, in particular the increase of persistent current, resulting in a large increase of action current amplitude, consistent with stronger gain of function. Clinically, 13 out of 16 (81%) gain of function variants were associated with a reduction in seizures in response to sodium channel blocker treatment (carbamazepine, oxcarbazepine, phenytoin, lamotrigine or lacosamide) without evidence of symptom exacerbation. Our study expands the spectrum of gain of function SCN1A-related epilepsy phenotypes, defines key clinical features, provides novel insights into the underlying disease mechanisms between SCN1A-related epilepsy and familial hemiplegic migraine type 3, and identifies sodium channel blockers as potentially efficacious therapies. Gain of function disease should be considered in early onset epilepsies with a pathogenic SCN1A variant and non-Dravet syndrome phenotype.
Subject(s)
Arthrogryposis , Epilepsies, Myoclonic , Epilepsy , Migraine with Aura , Movement Disorders , Spasms, Infantile , Humans , Epilepsies, Myoclonic/drug therapy , Epilepsies, Myoclonic/genetics , Epilepsies, Myoclonic/diagnosis , Epilepsy/genetics , Epilepsy/diagnosis , Gain of Function Mutation , NAV1.1 Voltage-Gated Sodium Channel/genetics , Phenotype , Infant, Newborn , Infant
ABSTRACT
Interpretation of the colossal number of genetic variants identified from sequencing applications is one of the major bottlenecks in clinical genetics, with the inference of the effect of amino acid-substituting missense variations on protein structure and function being especially challenging. Here we characterize the three-dimensional (3D) amino acid positions affected in pathogenic and population variants from 1,330 disease-associated genes using over 14,000 experimentally solved human protein structures. By measuring the statistical burden of variations (i.e., point mutations) from all genes on 40 3D protein features, accounting for the structural, chemical, and functional context of the variations' positions, we identify features that are generally associated with pathogenic and population missense variants. We then perform the same amino acid-level analysis individually for 24 protein functional classes, which reveals unique characteristics of the positions of the altered amino acids: We observe up to 46% divergence of the class-specific features from the general characteristics obtained by the analysis on all genes, which is consistent with the structural diversity of essential regions across different protein classes. We demonstrate that the function-specific 3D features of the variants match the readouts of mutagenesis experiments for BRCA1 and PTEN, and positively correlate with an independent set of clinically interpreted pathogenic and benign missense variants. Finally, we make our results available through a web server to foster accessibility and downstream research. Our findings represent a crucial step toward translational genetics, from highlighting the impact of mutations on protein structure to rationalizing the variants' pathogenicity in terms of the perturbed molecular mechanisms.
Subject(s)
Mutation, Missense/genetics , Proteins/chemistry , Proteins/genetics , Amino Acid Sequence , BRCA1 Protein/chemistry , BRCA1 Protein/genetics , Computational Biology/methods , Humans , Machine Learning , Models, Molecular , Mutation, Missense/physiology , PTEN Phosphohydrolase/chemistry , PTEN Phosphohydrolase/genetics , Protein Conformation , Proteins/physiology
ABSTRACT
SUMMARY: Literature exploration in PubMed on a large number of biomedical entities (e.g. genes, diseases or experiments) can be time-consuming and challenging, especially when assessing associations between entities. Here, we describe SimText, a user-friendly toolset that provides customizable and systematic workflows for the analysis of similarities among a set of entities based on text. SimText can be used for (i) text collection from PubMed and extraction of words with different text mining approaches, and (ii) interactive analysis and visualization of data using unsupervised learning techniques in an interactive app. AVAILABILITY AND IMPLEMENTATION: We developed SimText as an open-source R software and integrated it into Galaxy (https://usegalaxy.eu), an online data analysis platform with supporting self-learning training material available at https://training.galaxyproject.org. A command-line version of the toolset is available for download from GitHub (https://github.com/dlal-group/simtext) or as Docker image (https://hub.docker.com/r/dlalgroup/simtext/tags). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
Data Mining , Software , Data Mining/methods , PubMed , Data Interpretation, Statistical , Data Analysis
ABSTRACT
PURPOSE: Monogenic disorders can present clinically heterogeneous symptoms. We hypothesized that in patients with a monogenic disorder caused by a large deletion, frequently additional loss-of-function (LOF)-intolerant genes are affected, potentially contributing to the phenotype. METHODS: We investigated the LOF-intolerant gene distribution across the genome and its association with benign population and pathogenic classified deletions from individuals with presumably monogenic disorders. For people with presumably monogenic epilepsy, we compared Human Phenotype Ontology terms in people with large and small deletions. RESULTS: We identified LOF-intolerant gene dense regions that were enriched for ClinVar and depleted for population copy number variants. Analysis of data from >143,000 individuals with a suspected monogenic disorder showed that 2.5% of haploinsufficiency disorder-associated deletions can affect at least 1 other LOF-intolerant gene. Focusing on epilepsy, we observed that 13.1% of pathogenic and likely pathogenic ClinVar deletions <3 megabase pair, covering the diagnostically most relevant genes, affected at least 1 additional LOF-intolerant gene. Those patients have potentially more complex phenotypes with increasing deletion size. CONCLUSION: We could systematically show that large deletions frequently affected additional LOF-intolerant genes beyond the established disease gene. Further research is needed to understand how additional potential disease-relevant genes influence monogenic disorders to improve clinical care and the efficacy of targeted therapies.
Subject(s)
DNA Copy Number Variations , Genome , DNA Copy Number Variations/genetics , Haploinsufficiency , Humans , Phenotype
ABSTRACT
PURPOSE: Pathogenic variants in GABRB3 have been associated with a spectrum of phenotypes from severe developmental disorders and epileptic encephalopathies to milder epilepsy syndromes and mild intellectual disability (ID). In this study, we analyzed a large cohort of individuals with GABRB3 variants to deepen the phenotypic understanding and investigate genotype-phenotype correlations. METHODS: Through an international collaboration, we analyzed electro-clinical data of unpublished individuals with variants in GABRB3, and we reviewed previously published cases. All missense variants were mapped onto the 3-dimensional structure of the GABRB3 subunit, and clinical phenotypes associated with the different key structural domains were investigated. RESULTS: We characterized 71 individuals with GABRB3 variants, including 22 novel subjects, expressing a wide spectrum of phenotypes. Interestingly, phenotypes correlated with structural locations of the variants. Generalized epilepsy, with a median age at onset of 12 months, and mild-to-moderate ID were associated with variants in the extracellular domain. Focal epilepsy with earlier onset (median age: 4 months) and severe ID were associated with variants in both the pore-lining helical transmembrane domain and the extracellular domain. CONCLUSION: These genotype-phenotype correlations will aid the genetic counseling and treatment of individuals affected by GABRB3-related disorders. Future studies may reveal whether functional differences underlie the phenotypic differences.
Subject(s)
Epilepsy , Intellectual Disability , Epilepsy/genetics , Genetic Association Studies , Humans , Intellectual Disability/genetics , Mutation , Phenotype , Receptors, GABA-A/genetics
ABSTRACT
Human genome sequencing efforts have greatly expanded, and a plethora of missense variants identified both in patients and in the general population is now publicly accessible. Interpretation of the molecular-level effect of missense variants, however, remains challenging and requires a particular investigation of amino acid substitutions in the context of protein structure and function. Answers to questions like 'Is a variant perturbing a site involved in key macromolecular interactions and/or cellular signaling?', or 'Is a variant changing an amino acid located at the protein core or part of a cluster of known pathogenic mutations in 3D?' are crucial. Motivated by these needs, we developed MISCAST (missense variant to protein structure analysis web suite; http://miscast.broadinstitute.org/). MISCAST is an interactive and user-friendly web server to visualize and analyze missense variants in protein sequence and structure space. Additionally, a comprehensive set of protein structural and functional features has been aggregated in MISCAST from multiple databases and displayed on structures alongside the variants to provide users with the biological context of the variant location in an integrated platform. We further made the annotated data and protein structures readily downloadable from MISCAST to foster advanced offline analysis of missense variants by a wide biological community.
Subject(s)
Mutation, Missense , Protein Conformation , Software , Humans , Internet , Proteins/chemistry , Proteins/genetics
ABSTRACT
BACKGROUND: Parkinson's disease is the second most common neurodegenerative disorder and affects people from all ethnic backgrounds, yet little is known about the genetics of Parkinson's disease in non-European populations. In addition, the overall identification of copy number variants at a genome-wide level has been understudied in Parkinson's patients. The objective of this study was to understand the genome-wide burden of copy number variants in Latinos and its association with Parkinson's disease. METHODS: We used genome-wide genotyping data from 747 Parkinson's disease patients and 632 controls from the Latin American Research Consortium on the Genetics of Parkinson's disease. RESULTS: Genome-wide copy number burden analysis showed that patients were significantly enriched for copy number variants overlapping known Parkinson's disease genes compared with controls (odds ratio, 3.97; 95% CI, 1.69-10.5; P = 0.018). PRKN showed the strongest copy number burden, with 20 copy number variant carriers. These patients presented an earlier age of disease onset compared with patients with other copy number variants (median age at onset, 31 vs 57 years, respectively; P = 7.46 × 10⁻⁷). CONCLUSIONS: We found that although overall genome-wide copy number variant burden was not significantly different, Parkinson's disease patients were significantly enriched with copy number variants affecting known Parkinson's disease genes. We also identified that of 250 patients with early-onset disease, 5.6% carried a copy number variant on PRKN in our cohort. Our study is the first to analyze genome-wide copy number variant association in Latino Parkinson's disease patients and provides insights about this complex disease in this understudied population. © 2020 International Parkinson and Movement Disorder Society.
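As an illustration of how an odds ratio with a 95% confidence interval of the kind reported above can be derived from a 2×2 carrier table, the following is a minimal sketch. It assumes a Wald interval on the log-odds scale; the abstract does not state which method the authors used (an exact or regression-based method may well have been applied), and the function name and the counts in the usage note are hypothetical, not taken from the study.

```python
import math

def odds_ratio_wald(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a: cases carrying a CNV, b: cases without,
    c: controls carrying a CNV, d: controls without.
    Assumption: Wald interval on the log scale (not necessarily the study's method).
    """
    # Add 0.5 to every cell if any cell is zero (Haldane-Anscombe correction)
    if min(a, b, c, d) == 0:
        a, b, c, d = (v + 0.5 for v in (a, b, c, d))
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = math.exp(math.log(odds_ratio) - z * se)
    high = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, low, high
```

For example, `odds_ratio_wald(10, 90, 5, 95)` (hypothetical counts) returns the point estimate together with the lower and upper bounds of the interval; an interval that excludes 1 indicates enrichment at the chosen confidence level.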
Subject(s)
Parkinson Disease , Age of Onset , DNA Copy Number Variations/genetics , Genome-Wide Association Study , Hispanic or Latino/genetics , Humans , Latin America , Middle Aged , Parkinson Disease/genetics
ABSTRACT
OBJECTIVE: Clinical genetic sequencing is frequently utilized to diagnose individuals with neurodevelopmental disorders (NDDs). Here we perform a meta-analysis and systematic review of the success rate (diagnostic yield) of clinical sequencing through next-generation sequencing (NGS) across NDDs. We compare the genetic testing yield across NDD subtypes and sequencing technology. METHODS: We performed a systematic review of the PubMed literature until May 2020. We included clinical sequencing studies that utilized NGS in individuals with epilepsy, autism spectrum disorder (ASD), or intellectual disability (ID). Data were extracted, reviewed, and categorized according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. Two investigators performed clinical evaluation and grouping following the International League Against Epilepsy (ILAE) guidelines. Pooled rates of the diagnostic yield and 95% confidence intervals were estimated with a random-effects model. RESULTS: We identified 103 studies (epilepsy, N = 72; ASD, N = 14; ID, N = 21) across 32,331 individuals. Targeted gene panel sequencing was used in 73, and exome sequencing in 36 cohorts. Given highly selected patient cohorts, the diagnostic yield was 17.1% for ASD, 24% for epilepsy, and 28.2% for ID (23.7% overall). The highest diagnostic yield for epilepsy subtypes was observed in individuals with ID (27.9%) and early onset seizures (36.8%). The diagnostic yield for exome sequencing was higher than for panel sequencing, although the difference was not statistically significant (27.2% vs 22.6%, P = .071). We observed that clinical sequencing studies are performed predominantly in countries with a high Inequality-adjusted Human Development Index (IHDI) (countries with sequencing studies: IHDI median = 0.84, interquartile range [IQR] = 0.09 vs countries without sequencing studies: IHDI median = 0.56, IQR = 0.3). 
No studies from Africa, India, or Latin America were identified, indicating potential barriers to genetic testing. SIGNIFICANCE: This meta-analysis and systematic review provides a comprehensive overview of clinical sequencing studies of NDDs and will help guide policymaking and steer decision-making in patient management.
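The pooling step described in the methods above (pooled diagnostic yield with a 95% confidence interval from a random-effects model) can be sketched as follows. This is only an illustration under stated assumptions: it uses the DerSimonian-Laird estimator on logit-transformed proportions, which is one common choice, whereas the abstract does not specify the estimator or transformation the authors used. The function name and the study counts in the usage note are hypothetical.

```python
import math

def pooled_yield_random_effects(studies, z=1.96):
    """Pool proportions with a DerSimonian-Laird random-effects model.

    studies: list of (diagnosed, total) tuples, one per cohort.
    Returns (pooled_proportion, ci_low, ci_high).
    Assumption: logit transform and DerSimonian-Laird tau^2 (illustrative choice).
    """
    effects, variances = [], []
    for x, n in studies:
        # Continuity correction when a cohort has 0% or 100% yield
        if x == 0 or x == n:
            x, n = x + 0.5, n + 1.0
        effects.append(math.log(x / (n - x)))      # logit of the yield
        variances.append(1.0 / x + 1.0 / (n - x))  # variance of the logit

    # Fixed-effect (inverse-variance) estimate, needed for Cochran's Q
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))

    # Between-study variance tau^2, truncated at zero
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (len(studies) - 1)) / c) if c > 0 else 0.0

    # Random-effects weights and pooled logit
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))

    def inv_logit(t):
        return 1.0 / (1.0 + math.exp(-t))

    return inv_logit(pooled), inv_logit(pooled - z * se), inv_logit(pooled + z * se)
```

Called with hypothetical cohorts such as `[(20, 100), (30, 100), (25, 100)]`, the function returns a pooled yield between the smallest and largest cohort yields together with its interval. Pooling on the logit scale keeps the interval inside 0 to 1, which is why it is used in this sketch.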
Subject(s)
Autism Spectrum Disorder/diagnosis , Epilepsy/diagnosis , Exome Sequencing , Intellectual Disability/diagnosis , Age of Onset , Autism Spectrum Disorder/genetics , Epilepsy/genetics , High-Throughput Nucleotide Sequencing , Humans , Intellectual Disability/genetics , Sequence Analysis, DNA
ABSTRACT
A large fraction of rare and severe neurodevelopmental disorders are caused by sporadic de novo variants. Epidemiological disease estimates are not available for the vast majority of these de novo monogenic neurodevelopmental disorders because of phenotypic heterogeneity and the absence of large-scale genomic screens. Yet, knowledge of disease incidence is important for clinicians and researchers to guide health policy planning. Here, we adjusted a statistical method based on genetic data to predict, for the first time, the incidences of 101 known de novo variant-associated neurodevelopmental disorders as well as 3106 putative monogenic disorders. Two corroboration analyses supported the validity of the calculated estimates. First, greater predicted gene-disorder incidences positively correlated with larger numbers of pathogenic variants collected from patient variant databases (Kendall's τ = 0.093, P-value = 6.9 × 10⁻⁶). Second, for six of seven (86%) de novo variant-associated monogenic disorders for which epidemiological estimates were available (SCN1A, SLC2A1, SALL1, TBX5, KCNQ2, and CDKL5), the predicted incidence estimates matched the reported estimates. We conclude that in the absence of epidemiological data, our catalogue of 3207 incidence estimates for disorders caused by de novo variants can guide patient advocacy groups, clinicians, researchers, and policymakers in strategic decision-making.
Subject(s)
Neurodevelopmental Disorders/epidemiology , Neurodevelopmental Disorders/genetics , Rare Diseases/epidemiology , Rare Diseases/genetics , Genetic Variation , Humans , Incidence
ABSTRACT
Cytogenetic testing is routinely applied in most neurological centres for severe paediatric epilepsies. However, which characteristics of copy number variants (CNVs) confer most epilepsy risk and which epilepsy subtypes carry the most CNV burden have not been explored on a genome-wide scale. Here, we present the largest CNV investigation in epilepsy to date with 10 712 European epilepsy cases and 6746 ancestry-matched controls. Patients with genetic generalized epilepsy, lesional focal epilepsy, non-acquired focal epilepsy, and developmental and epileptic encephalopathy were included. All samples were processed with the same technology and analysis pipeline. All investigated epilepsy types, including lesional focal epilepsy patients, showed an increase in CNV burden in at least one tested category compared to controls. However, we observed striking differences in CNV burden across epilepsy types and investigated CNV categories. Genetic generalized epilepsy patients have the highest CNV burden in all categories tested, followed by developmental and epileptic encephalopathy patients. Both epilepsy types also show association for deletions covering genes intolerant to truncating variants. Genome-wide CNV breakpoint association showed not only significant loci for genetic generalized and developmental and epileptic encephalopathy patients but also for lesional focal epilepsy patients. With a 34-fold risk for developing genetic generalized epilepsy, we show for the first time that the established epilepsy-associated 15q13.3 deletion represents the strongest risk CNV for genetic generalized epilepsy across the whole genome. Using the human interactome, we examined the largest connected component of the genes overlapped by CNVs in the four epilepsy types. We observed that genetic generalized epilepsy and non-acquired focal epilepsy formed disease modules. In summary, we show that in all common epilepsy types, 1.5-3% of patients carry epilepsy-associated CNVs. 
The characteristics of risk CNVs vary tremendously across and within epilepsy types. Thus, we advocate genome-wide genomic testing to identify all disease-associated types of CNVs.