Results 1 - 20 of 33
1.
Brain ; 146(11): 4608-4621, 2023 11 02.
Article in English | MEDLINE | ID: mdl-37394881

ABSTRACT

Within recent years, there has been a growing number of genes associated with amyotrophic lateral sclerosis (ALS), resulting in an increasing number of novel variants, particularly missense variants, many of which are of unknown clinical significance. Here, we leverage the sequencing efforts of the ALS Knowledge Portal (3864 individuals with ALS and 7839 controls) and Project MinE ALS Sequencing Consortium (4366 individuals with ALS and 1832 controls) to perform proteomic and transcriptomic characterization of missense variants in 24 ALS-associated genes. The two sequencing datasets were interrogated for missense variants in the 24 genes, and variants were annotated with gnomAD minor allele frequencies, ClinVar pathogenicity classifications, protein sequence features including UniProt functional site annotations, and PhosphoSitePlus post-translational modification site annotations, structural features from AlphaFold predicted monomeric 3D structures, and transcriptomic expression levels from Genotype-Tissue Expression. We then applied missense variant enrichment and gene-burden testing following binning of variation based on the selected proteomic and transcriptomic features to identify those most relevant to pathogenicity in ALS-associated genes. Using predicted human protein structures from AlphaFold, we determined that missense variants carried by individuals with ALS were significantly enriched in β-sheets and α-helices, as well as in core, buried or moderately buried regions. At the same time, we identified that hydrophobic amino acid residues, compositionally biased protein regions and regions of interest are predominantly enriched in missense variants carried by individuals with ALS. Assessment of expression level based on transcriptomics also revealed enrichment of variants of high and medium expression across all tissues and within the brain. We further explored enriched features of interest using burden analyses and identified that individual genes were indeed driving certain enrichment signals. A case study is presented for SOD1 to demonstrate proof-of-concept of how enriched features may aid in defining variant pathogenicity. Our results present proteomic and transcriptomic features that are important indicators of missense variant pathogenicity in ALS and are distinct from features associated with neurodevelopmental disorders.


Subject(s)
Amyotrophic Lateral Sclerosis , Humans , Amyotrophic Lateral Sclerosis/genetics , Transcriptome/genetics , Proteomics , Missense Mutation/genetics , Genetic Testing
2.
Brain ; 146(2): 519-533, 2023 02 13.
Article in English | MEDLINE | ID: mdl-36256779

ABSTRACT

Neurodevelopmental disorders (NDDs), including severe paediatric epilepsy, autism and intellectual disabilities, are heterogeneous conditions in which clinical genetic testing can often identify a pathogenic variant. For many of them, genetic therapies will be tested in this or the coming years in clinical trials. In contrast to first-generation symptomatic treatments, the new disease-modifying precision medicines require a genetic test-informed diagnosis before a patient can be enrolled in a clinical trial. However, even in 2022, most identified genetic variants in NDD genes are 'variants of uncertain significance'. To safely enrol patients in precision medicine clinical trials, it is important to increase our knowledge about which regions in NDD-associated proteins can 'tolerate' missense variants and which ones are 'essential' and will cause an NDD when mutated. In addition, knowledge about functionally indispensable regions in the 3D structure context of proteins can also provide insights into the molecular mechanisms of disease variants. We developed a novel consensus approach that overlays evolutionary and population-based genomic scores to identify 3D essential sites (Essential3D) on protein structures. After extensive benchmarking of AlphaFold predicted and experimentally solved protein structures, we generated the currently largest expert-curated protein structure set for 242 NDDs and identified 14 377 Essential3D sites across 189 gene disorders associated proteins. We demonstrate that the consensus annotation of Essential3D sites improves prioritization of disease mutations over single annotations. The identified Essential3D sites were enriched for functional features such as intermembrane regions or active sites and discovered key inter-molecule interactions in protein complexes that were otherwise not annotated. Using the currently largest autism, developmental disorders, and epilepsies exome sequencing studies including >360 000 NDD patients and population controls, we found that missense variants at Essential3D sites are 8-fold enriched in patients. In summary, we developed a comprehensive protein structure set for 242 NDDs and identified 14 377 Essential3D sites in these. All data are available at https://es-ndd.broadinstitute.org for interactive visual inspection to enhance variant interpretation and development of mechanistic hypotheses for 242 NDD genes. The provided resources will enhance clinical variant interpretation and in silico drug target development for NDD-associated genes and encoded proteins.


Subject(s)
Intellectual Disability , Neurodevelopmental Disorders , Humans , Child , Neurodevelopmental Disorders/genetics , Genetic Testing , Mutation/genetics , Intellectual Disability/genetics , Missense Mutation
3.
Genome Res ; 30(1): 62-71, 2020 01.
Article in English | MEDLINE | ID: mdl-31871067

ABSTRACT

Missense variant interpretation is challenging. Essential regions for protein function are conserved among gene-family members, and genetic variants within these regions are potentially more likely to confer risk to disease. Here, we generated 2871 gene-family protein sequence alignments involving 9990 genes and performed missense variant burden analyses to identify novel essential protein regions. We mapped 2,219,811 variants from the general population into these alignments and compared their distribution with 76,153 missense variants from patients. With this gene-family approach, we identified 465 regions enriched for patient variants spanning 41,463 amino acids in 1252 genes. As a comparison, by testing the same genes individually, we identified fewer patient variant enriched regions, involving only 2639 amino acids and 215 genes. Next, we selected de novo variants from 6753 patients with neurodevelopmental disorders and 1911 unaffected siblings and observed an 8.33-fold enrichment of patient variants in our identified regions (95% C.I. = 3.90-Inf, P-value = 2.72 × 10⁻¹¹). By using the complete ClinVar variant set, we found that missense variants inside the identified regions are 106-fold more likely to be classified as pathogenic in comparison to benign classification (OR = 106.15, 95% C.I. = 70.66-Inf, P-value < 2.2 × 10⁻¹⁶). All pathogenic variant enriched regions (PERs) identified are available online through "PER viewer," a user-friendly online platform for interactive data mining, visualization, and download. In summary, our gene-family burden analysis approach identified novel PERs in protein sequences. This annotation can empower variant interpretation.
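
The enrichment statistics quoted above (an odds ratio with a one-sided confidence interval and P-value) are the kind of output a 2x2 burden test produces. As a minimal sketch, assuming a one-sided Fisher's exact test on counts of patient and control variants inside versus outside a region, the calculation could look like the following. The function names and the example counts are hypothetical, and the abstract does not state which exact test the authors used.

```python
from math import comb

def odds_ratio(a, b, c, d):
    """Sample odds ratio for the 2x2 table [[a, b], [c, d]].

    a: patient variants inside the region   b: control variants inside
    c: patient variants outside the region  d: control variants outside
    """
    return (a * d) / (b * c)

def fisher_one_sided(a, b, c, d):
    """P(X >= a) under the hypergeometric null, i.e. a test for enrichment
    of patient variants inside the region."""
    inside, patients, n = a + b, a + c, a + b + c + d
    upper = min(inside, patients)
    tail = sum(comb(inside, k) * comb(n - inside, patients - k)
               for k in range(a, upper + 1))
    return tail / comb(n, patients)

# Hypothetical counts, for illustration only
print(odds_ratio(3, 1, 1, 3), fisher_one_sided(3, 1, 1, 3))
```

A one-sided test matches the open-ended upper confidence bound ("Inf") reported in the abstract, but that correspondence is an inference, not something the source states.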


Subject(s)
Chromosome Mapping , Genetic Predisposition to Disease , Genetic Variation , Multigene Family , Alleles , Amino Acid Sequence , Amino Acid Substitution , Computational Biology/methods , Female , Genome-Wide Association Study , Humans , Male , Missense Mutation , Software , User-Computer Interface
4.
PLoS Comput Biol ; 18(3): e1009911, 2022 03.
Article in English | MEDLINE | ID: mdl-35275927

ABSTRACT

All proteomes contain both proteins and polypeptide segments that don't form a defined three-dimensional structure yet are biologically active, called intrinsically disordered proteins and regions (IDPs and IDRs). Most of these IDPs/IDRs lack useful functional annotation, limiting our understanding of their importance for organism fitness. Here we characterized IDRs using protein sequence annotations of functional sites and regions available in the UniProt knowledgebase ("UniProt features": active site, ligand-binding pocket, regions mediating protein-protein interactions, etc.). By measuring the statistical enrichment of twenty-five UniProt features in 981 IDRs of 561 human proteins, we identified eight features that are commonly located in IDRs. We then collected the genetic variant data from the general population and patient-based databases and evaluated the prevalence of population and pathogenic variations in IDPs/IDRs. We observed that some IDRs tolerate 2 to 12 times more single amino acid-substituting missense mutations than synonymous changes in the general population. However, we also found that 37% of all germline pathogenic mutations are located in disordered regions of 96 proteins. Based on the observed-to-expected frequency of mutations, we categorized 34 IDRs in 20 proteins (DDX3X, KIT, RB1, etc.) as intolerant to mutation. Finally, using statistical analysis and a machine learning approach, we demonstrate that mutation-intolerant IDRs carry a distinct signature of functional features. Our study presents a novel approach to assign functional importance to IDRs by leveraging the wealth of available genetic data, which will aid in a deeper understanding of the role of IDRs in biological processes and disease mechanisms.
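
To make the observed-to-expected idea concrete, here is a minimal sketch. It assumes the simplest possible null model, in which variants fall uniformly along the protein, so the expected count in a region scales with its length. The abstract does not describe the authors' actual expectation model, so the function name, its parameters, and the uniform-rate assumption are all hypothetical.

```python
def observed_to_expected(observed_in_region, region_length,
                         total_variants, protein_length):
    """Observed/expected variant count for a region.

    Assumption (not from the source): expected count = protein-wide variant
    count scaled by the fraction of the sequence that the region covers.
    """
    expected = total_variants * region_length / protein_length
    return observed_in_region / expected

# Hypothetical numbers: a 100-residue IDR in a 500-residue protein that
# carries 2 of the protein's 40 population missense variants
ratio = observed_to_expected(2, 100, 40, 500)
print(ratio)
```

Under this toy model, a ratio well below 1 would flag a region as depleted of population variation, which is the intuition behind calling a region intolerant to mutation.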


Subject(s)
Intrinsically Disordered Proteins , Amino Acid Sequence , Genetic Variation/genetics , Humans , Intrinsically Disordered Proteins/chemistry , Protein Conformation , Proteome/genetics
5.
Proc Natl Acad Sci U S A ; 117(45): 28201-28211, 2020 11 10.
Article in English | MEDLINE | ID: mdl-33106425

ABSTRACT

Interpretation of the colossal number of genetic variants identified from sequencing applications is one of the major bottlenecks in clinical genetics, with the inference of the effect of amino acid-substituting missense variations on protein structure and function being especially challenging. Here we characterize the three-dimensional (3D) amino acid positions affected in pathogenic and population variants from 1,330 disease-associated genes using over 14,000 experimentally solved human protein structures. By measuring the statistical burden of variations (i.e., point mutations) from all genes on 40 3D protein features, accounting for the structural, chemical, and functional context of the variations' positions, we identify features that are generally associated with pathogenic and population missense variants. We then perform the same amino acid-level analysis individually for 24 protein functional classes, which reveals unique characteristics of the positions of the altered amino acids: We observe up to 46% divergence of the class-specific features from the general characteristics obtained by the analysis on all genes, which is consistent with the structural diversity of essential regions across different protein classes. We demonstrate that the function-specific 3D features of the variants match the readouts of mutagenesis experiments for BRCA1 and PTEN, and positively correlate with an independent set of clinically interpreted pathogenic and benign missense variants. Finally, we make our results available through a web server to foster accessibility and downstream research. Our findings represent a crucial step toward translational genetics, from highlighting the impact of mutations on protein structure to rationalizing the variants' pathogenicity in terms of the perturbed molecular mechanisms.


Subject(s)
Missense Mutation/genetics , Proteins/chemistry , Proteins/genetics , Amino Acid Sequence , BRCA1 Protein/chemistry , BRCA1 Protein/genetics , Computational Biology/methods , Humans , Machine Learning , Molecular Models , Missense Mutation/physiology , PTEN Phosphohydrolase/chemistry , PTEN Phosphohydrolase/genetics , Protein Conformation , Proteins/physiology
6.
Proteins ; 90(9): 1634-1644, 2022 09.
Article in English | MEDLINE | ID: mdl-35394672

ABSTRACT

The contact topology of a protein determines important aspects of the folding process. The topological measure of contact order has been shown to be predictive of the rate of folding. Circuit topology is emerging as another fundamental descriptor of biomolecular structure, with predicted effects on the folding rate. We analyze the residue-based circuit topological environments of 21 K mutations labeled as pathogenic or benign. Multiple statistical lines of reasoning support the conclusion that the number of contacts in two specific circuit topological arrangements, namely inverse parallel and cross relations, with contacts involving the mutated residue have discriminatory value in determining the pathogenicity of human variants. We investigate how results vary with residue type and according to whether the gene is essential. We further explore the relationship to a number of structural features and find that circuit topology provides nonredundant information on protein structures and pathogenicity of mutations. Results may have implications for the polymer physics of protein folding and suggest that "local" topological information, including residue-based circuit topology and residue contact order, could be useful in improving state-of-the-art machine learning algorithms for pathogenicity prediction.
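
The two topological measures named above can be illustrated with a short sketch. Relative contact order is computed with its standard definition (mean sequence separation of contacts divided by chain length). The pairwise circuit-topology classification is a simplified reading: the assignment of "parallel" versus "inverse parallel" to a given nesting direction is an assumption made here, and contacts that share a residue are lumped into "other". Function names and example contacts are hypothetical.

```python
def relative_contact_order(contacts, length):
    """Mean sequence separation |i - j| over all contacts, divided by chain length."""
    if not contacts:
        return 0.0
    return sum(abs(i - j) for i, j in contacts) / (length * len(contacts))

def relation(a, b):
    """Classify contact a relative to contact b (each a pair of residue indices)."""
    (i, j), (k, l) = sorted(a), sorted(b)
    if j < k or l < i:
        return "series"            # the two contacts do not overlap in sequence
    if i < k and l < j:
        return "parallel"          # b nested inside a (direction is an assumption)
    if k < i and j < l:
        return "inverse parallel"  # a nested inside b
    if i < k < j < l or k < i < l < j:
        return "cross"             # the two contacts interleave
    return "other"                 # shared residues are not handled in this sketch

contacts = [(1, 5), (2, 10), (3, 4)]  # hypothetical contact list
print(relative_contact_order(contacts, length=10))
print(relation((1, 5), (3, 8)))
```

Counting, for a mutated residue, how many contact pairs fall into the "inverse parallel" and "cross" classes would give per-residue features of the kind the abstract describes.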


Subject(s)
Missense Mutation , Protein Folding , Algorithms , Humans , Proteins/chemistry , Virulence
7.
Am J Hum Genet ; 105(3): 509-525, 2019 09 05.
Article in English | MEDLINE | ID: mdl-31422817

ABSTRACT

The human RNA helicase DDX6 is an essential component of membrane-less organelles called processing bodies (PBs). PBs are involved in mRNA metabolic processes including translational repression via coordinated storage of mRNAs. Previous studies in human cell lines have implicated altered DDX6 in molecular and cellular dysfunction, but clinical consequences and pathogenesis in humans have yet to be described. Here, we report the identification of five rare de novo missense variants in DDX6 in probands presenting with intellectual disability, developmental delay, and similar dysmorphic features including telecanthus, epicanthus, arched eyebrows, and low-set ears. All five missense variants (p.His372Arg, p.Arg373Gln, p.Cys390Arg, p.Thr391Ile, and p.Thr391Pro) are located in two conserved motifs of the RecA-2 domain of DDX6 involved in RNA binding, helicase activity, and protein-partner binding. We use functional studies to demonstrate that the first variants identified (p.Arg373Gln and p.Cys390Arg) cause significant defects in PB assembly in primary fibroblast and model human cell lines. These variants' interactions with several protein partners were also disrupted in immunoprecipitation assays. Further investigation via complementation assays included the additional variants p.Thr391Ile and p.Thr391Pro, both of which, similarly to p.Arg373Gln and p.Cys390Arg, demonstrated significant defects in P-body assembly. Complementing these molecular findings, modeling of the variants on solved protein structures showed distinct spatial clustering near known protein binding regions. Collectively, our clinical and molecular data describe a neurodevelopmental syndrome associated with pathogenic missense variants in DDX6. Additionally, we suggest DDX6 join the DExD/H-box genes DDX3X and DHX30 in an emerging class of neurodevelopmental disorders involving RNA helicases.


Subject(s)
DEAD-box RNA Helicases/genetics , Intellectual Disability/genetics , Missense Mutation , Proto-Oncogene Proteins/genetics , RNA/genetics , Humans
8.
Genet Med ; 24(3): 681-693, 2022 03.
Article in English | MEDLINE | ID: mdl-34906499

ABSTRACT

PURPOSE: Pathogenic variants in GABRB3 have been associated with a spectrum of phenotypes from severe developmental disorders and epileptic encephalopathies to milder epilepsy syndromes and mild intellectual disability (ID). In this study, we analyzed a large cohort of individuals with GABRB3 variants to deepen the phenotypic understanding and investigate genotype-phenotype correlations. METHODS: Through an international collaboration, we analyzed electro-clinical data of unpublished individuals with variants in GABRB3, and we reviewed previously published cases. All missense variants were mapped onto the 3-dimensional structure of the GABRB3 subunit, and clinical phenotypes associated with the different key structural domains were investigated. RESULTS: We characterized 71 individuals with GABRB3 variants, including 22 novel subjects, expressing a wide spectrum of phenotypes. Interestingly, phenotypes correlated with structural locations of the variants. Generalized epilepsy, with a median age at onset of 12 months, and mild-to-moderate ID were associated with variants in the extracellular domain. Focal epilepsy with earlier onset (median: age 4 months) and severe ID were associated with variants in both the pore-lining helical transmembrane domain and the extracellular domain. CONCLUSION: These genotype-phenotype correlations will aid the genetic counseling and treatment of individuals affected by GABRB3-related disorders. Future studies may reveal whether functional differences underlie the phenotypic differences.


Subject(s)
Epilepsy , Intellectual Disability , Epilepsy/genetics , Genetic Association Studies , Humans , Intellectual Disability/genetics , Mutation , Phenotype , GABA-A Receptors/genetics
9.
Ann Neurol ; 89(3): 573-586, 2021 03.
Article in English | MEDLINE | ID: mdl-33325057

ABSTRACT

OBJECTIVE: We aimed to characterize the phenotypic spectrum and functional consequences associated with variants in the gene GABRB2, coding for the γ-aminobutyric acid type A (GABA-A) receptor subunit β2. METHODS: We recruited and systematically evaluated 25 individuals with variants in GABRB2, 17 of whom are newly described and 8 previously reported with additional clinical data. Functional analysis was performed using a Xenopus laevis oocyte model system. RESULTS: Our cohort of 25 individuals from 22 families with variants in GABRB2 demonstrated a range of epilepsy phenotypes from genetic generalized epilepsy to developmental and epileptic encephalopathy. Fifty-eight percent of individuals had pharmacoresistant epilepsy; response to medications targeting the GABAergic pathway was inconsistent. Developmental disability (present in 84%) ranged from mild intellectual disability to severe global disability; movement disorders (present in 44%) included choreoathetosis, dystonia, and ataxia. Disease-associated variants cluster in the extracellular N-terminus and transmembrane domains 1-3, with more severe phenotypes seen in association with variants in transmembrane domains 1 and 2 and the allosteric binding site between transmembrane domains 2 and 3. Functional analysis of 4 variants in transmembrane domains 1 or 2 (p.Ile246Thr, p.Pro252Leu, p.Ile288Ser, p.Val282Ala) revealed strongly reduced amplitudes of GABA-evoked anionic currents. INTERPRETATION: GABRB2-related epilepsy ranges broadly in severity from genetic generalized epilepsy to developmental and epileptic encephalopathies. Developmental disability and movement disorder are key features. The phenotypic spectrum is comparable to other GABA-A receptor-encoding genes. Phenotypic severity varies by protein domain. Experimental evidence supports loss of GABAergic inhibition as the mechanism underlying GABRB2-associated neurodevelopmental disorders.


Subject(s)
Epilepsy/physiopathology , Movement Disorders/physiopathology , Neurodevelopmental Disorders/physiopathology , GABA-A Receptors/genetics , Adolescent , Adult , Animals , Ataxia/genetics , Ataxia/physiopathology , Athetosis/genetics , Athetosis/physiopathology , Child , Preschool Child , Chorea/genetics , Chorea/physiopathology , Cohort Studies , Developmental Disabilities/genetics , Developmental Disabilities/physiopathology , Drug Resistant Epilepsy/genetics , Drug Resistant Epilepsy/physiopathology , Dystonia/genetics , Dystonia/physiopathology , Epilepsy/genetics , Female , Genotype , Humans , Intellectual Disability/genetics , Intellectual Disability/physiopathology , Male , Middle Aged , Movement Disorders/genetics , Missense Mutation , Neurodevelopmental Disorders/genetics , Oocytes , Patch-Clamp Techniques , Phenotype , Protein Domains/genetics , Xenopus laevis , Young Adult
10.
Nucleic Acids Res ; 48(W1): W132-W139, 2020 07 02.
Article in English | MEDLINE | ID: mdl-32402084

ABSTRACT

Human genome sequencing efforts have greatly expanded, and a plethora of missense variants identified both in patients and in the general population is now publicly accessible. Interpretation of the molecular-level effect of missense variants, however, remains challenging and requires a particular investigation of amino acid substitutions in the context of protein structure and function. Answers to questions like 'Is a variant perturbing a site involved in key macromolecular interactions and/or cellular signaling?', or 'Is a variant changing an amino acid located at the protein core or part of a cluster of known pathogenic mutations in 3D?' are crucial. Motivated by these needs, we developed MISCAST (missense variant to protein structure analysis web suite; http://miscast.broadinstitute.org/). MISCAST is an interactive and user-friendly web server to visualize and analyze missense variants in protein sequence and structure space. Additionally, a comprehensive set of protein structural and functional features have been aggregated in MISCAST from multiple databases, and displayed on structures alongside the variants to provide users with the biological context of the variant location in an integrated platform. We further made the annotated data and protein structures readily downloadable from MISCAST to foster advanced offline analysis of missense variants by a wide biological community.


Subject(s)
Missense Mutation , Protein Conformation , Software , Humans , Internet , Proteins/chemistry , Proteins/genetics
11.
Bioinformatics ; 35(21): 4478-4479, 2019 11 01.
Article in English | MEDLINE | ID: mdl-31086968

ABSTRACT

MOTIVATION: The correct classification of missense variants as benign or pathogenic remains challenging. Pathogenic variants are expected to have higher deleterious prediction scores than benign variants in the same gene. However, most of the existing variant annotation tools do not reference the score range of benign population variants on gene level. RESULTS: We present a web-application, Variant Score Ranker, which enables users to rapidly annotate variants and perform gene-specific variant score ranking on the population level. We also provide an intuitive example of how gene- and population-calibrated variant ranking scores can improve epilepsy variant prioritization. AVAILABILITY AND IMPLEMENTATION: http://vsranker.broadinstitute.org. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
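
One way to picture gene-specific, population-calibrated ranking is as a percentile: where does a candidate variant's deleteriousness score fall among the scores of population variants in the same gene? The sketch below is an illustration of that idea only. It is not the Variant Score Ranker implementation, and the function name and example scores are hypothetical.

```python
from bisect import bisect_left

def gene_percentile(score, population_scores):
    """Fraction of population variant scores in the same gene that lie
    strictly below the candidate variant's score."""
    ranked = sorted(population_scores)
    return bisect_left(ranked, score) / len(ranked)

# Hypothetical prediction scores for population variants in one gene
population = [10.0, 12.0, 15.0, 20.0, 30.0]
print(gene_percentile(25.0, population))
```

The same raw score can rank very differently from gene to gene, which is why a gene-level reference range can change variant prioritization.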


Subject(s)
Missense Mutation , Software
12.
Bioinformatics ; 34(19): 3289-3299, 2018 10 01.
Article in English | MEDLINE | ID: mdl-29726965

ABSTRACT

Motivation: Machine learning plays a substantial role in bioscience owing to the explosive growth in sequence data and the challenging application of computational methods. Peptide-recognition domains (PRDs) are critical as they promote coupled-binding with short peptide-motifs of functional importance through transient interactions. It is challenging to build a reliable predictor of peptide-binding residues in proteins with diverse types of PRDs from protein sequence alone. On the other hand, it is vital to keep up with the sequencing speed and to broaden the scope of study. Results: In this paper, we propose a machine-learning-based tool, named PBRpredict, to predict residues in peptide-binding domains from protein sequence alone. To develop a generic predictor, we train the models on peptide-binding residues of diverse types of domains. As inputs to the models, we use a high-dimensional feature set of chemical, structural and evolutionary information extracted from protein sequence. We carefully investigate six different state-of-the-art classification algorithms for this application. Finally, we use the stacked generalization approach to non-linearly combine a set of complementary base-level learners using a meta-level learner, which outperformed the winner-takes-all approach. The proposed predictor is found competitive based on statistical evaluation. Availability and implementation: PBRpredict-Suite software: http://cs.uno.edu/~tamjid/Software/PBRpredict/pbrpredict-suite.zip. Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
Amino Acid Sequence , Peptides/chemistry , Proteins/chemistry , Protein Sequence Analysis , Software , Algorithms , Computational Biology
13.
J Theor Biol ; 441: 44-57, 2018 03 14.
Article in English | MEDLINE | ID: mdl-29305182

ABSTRACT

Accessible surface area (ASA) of a protein residue is an effective feature for protein structure prediction, binding region identification, fold recognition problems, etc. Improving the prediction of ASA by the application of effective feature variables is a challenging but explorable task to consider, especially in the field of machine learning. Among the existing predictors of ASA, REGAd3p is a highly accurate ASA predictor which is based on regularized exact regression with polynomial kernel of degree 3. In this work, we present a new predictor, RBSURFpred, which extends REGAd3p on several dimensions by incorporating 58 physicochemical, evolutionary and structural properties into 9-tuple peptides via Chou's general PseAAC, which allowed us to obtain higher accuracies in predicting both real-valued and binary ASA. We have compared RBSURFpred for both real and binary space predictions with state-of-the-art predictors, such as REGAd3p and SPIDER2. We also have carried out a rigorous analysis of the performance of RBSURFpred in terms of different amino acids and their properties, and also with biologically relevant case-studies. The performance of RBSURFpred establishes it as a useful tool for the community.


Subject(s)
Algorithms , Computational Biology/methods , Theoretical Models , Protein Conformation , Proteins/chemistry , Amino Acid Sequence , Protein Databases , Reproducibility of Results
14.
J Theor Biol ; 398: 112-21, 2016 06 07.
Article in English | MEDLINE | ID: mdl-27029514

ABSTRACT

The success of solving the protein folding and structure prediction problems in molecular and structural biology relies on an accurate energy function. With the rapid advancement in the computational biology and bioinformatics fields, there is a growing need to solve unknown folds and structures faster, and thus an accurate energy function is indispensable. To address this need, we develop a new potential function, namely 3DIGARS3.0, which is a linearly weighted combination of 3DIGARS, mined accessible surface area (ASA) and ubiquitously computed Phi (uPhi) and Psi (uPsi) energies - optimized by a Genetic Algorithm (GA). We use a dataset of 4332 protein structures to generate uPhi and uPsi based score libraries to be used within the core 3DIGARS method. The optimized weight of each component is obtained by applying Genetic Algorithm based optimization on three challenging decoy sets. The improved 3DIGARS3.0 outperformed state-of-the-art methods significantly based on a set of independent test datasets.
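
The phrase "linearly weighted combination" can be illustrated with a small sketch: each decoy structure has several energy terms, the total is a weighted sum, and the decoy with the lowest total is selected. The term names, weights, and energy values below are hypothetical placeholders. In the abstract the weights come from Genetic Algorithm optimization, which is not reproduced here.

```python
def combined_energy(terms, weights):
    """Weighted sum of energy terms; both dicts are keyed by term name."""
    return sum(weights[name] * value for name, value in terms.items())

def best_decoy(decoys, weights):
    """Return the id of the decoy with the lowest combined energy."""
    return min(decoys, key=lambda d: combined_energy(decoys[d], weights))

# Hypothetical weights and per-decoy energy terms, for illustration only
weights = {"base": 1.0, "asa": 0.5, "uphi": 0.25, "upsi": 0.25}
decoys = {
    "decoy_1": {"base": -10.0, "asa": -4.0, "uphi": -2.0, "upsi": -2.0},
    "decoy_2": {"base": -12.0, "asa": 2.0, "uphi": 0.0, "upsi": 0.0},
}
print(best_decoy(decoys, weights))
```

An optimizer such as a GA would search over the values in `weights` so that native-like decoys receive the lowest combined energy across a training decoy set.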


Subject(s)
Protein Conformation , Proteins/chemistry , Research Design , Protein Databases , Solvents , Thermodynamics
15.
J Theor Biol ; 389: 60-71, 2016 Jan 21.
Article in English | MEDLINE | ID: mdl-26549467

ABSTRACT

Secondary structure (SS) refers to the local spatial organization of the polypeptide backbone atoms of a protein. Accurate prediction of SS can provide crucial features to form the next higher level of 3D structure of a protein accurately. SS has three different major components, helix (H), beta (E) and coil (C). Most SS predictors show imbalanced accuracies, claiming higher performance in predicting H and C while having low accuracy in E predictions. Because the E component is less frequent, a predictor may show very good overall performance by over-predicting H and C and under-predicting E, which can make such predictors biologically inapplicable. In this work we are motivated to develop a balanced SS predictor by incorporating 33 physicochemical properties into 15-tuple peptides via Chou's general PseAAC, which allowed obtaining higher accuracies in predicting all three SS components. Our approach uses three different support vector machines for binary classification of the major classes and then forms an optimized multiclass predictor using a genetic algorithm (GA). The three trained binary SVMs are E versus non-E (i.e., E/¬E), C/¬C and H/¬H. This GA-based optimized and combined three-class predictor, called cSVM, is further combined with SPINE X to form the proposed final balanced predictor, called MetaSSPred. This novel paradigm assists us in optimizing the precision and recall. We prepared two independent test datasets (CB471 and N295) to compare the performance of our predictors with SPINE X. MetaSSPred significantly increases beta accuracy (QE) for both datasets. The QE scores of MetaSSPred on CB471 and N295 were 71.7% and 74.4% respectively. These scores are 20.9% and 19.0% improvements over the QE scores given by SPINE X alone on the CB471 and N295 datasets respectively. Standard deviations of the accuracies across the three SS classes of MetaSSPred on the CB471 and N295 datasets were 4.2% and 2.3% respectively. On the other hand, for SPINE X, these values are 12.9% and 10.9% respectively. These findings suggest that the proposed MetaSSPred is a well-balanced SS predictor compared to the state-of-the-art SPINE X predictor.
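
The balance argument above rests on two quantities: the per-class accuracies (QH, QE, QC) and their spread across the three classes. A minimal sketch of both follows, assuming per-class accuracy means the fraction of residues of a true class that are predicted as that class. Whether the authors used a population or sample standard deviation is not stated, so the use of `pstdev` is an assumption, and the example strings are hypothetical.

```python
from statistics import pstdev

def per_class_accuracy(true_ss, pred_ss, classes="HEC"):
    """Q score per class: correctly predicted residues of a class divided by
    the number of residues that truly belong to that class."""
    acc = {}
    for c in classes:
        idx = [k for k, t in enumerate(true_ss) if t == c]
        acc[c] = sum(pred_ss[k] == c for k in idx) / len(idx) if idx else float("nan")
    return acc

# Hypothetical true and predicted secondary-structure strings
true_ss = "HHHHEEEECC"
pred_ss = "HHHCEECCCC"
q = per_class_accuracy(true_ss, pred_ss)
print(q, pstdev(q.values()))
```

A predictor that over-predicts coil, as in this toy example, can keep a reasonable overall accuracy while its QE stays low, and the standard deviation across classes exposes that imbalance.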


Subject(s)
Computational Biology/methods , Protein Secondary Structure , Proteins/chemistry , Algorithms , Protein Databases , HIV Protease/chemistry , Internet , Probability , Reproducibility of Results , Protein Sequence Analysis , Support Vector Machine
16.
J Theor Biol ; 380: 380-91, 2015 Sep 07.
Article in English | MEDLINE | ID: mdl-26092374

ABSTRACT

An accurate prediction of real value accessible surface area (ASA) from protein sequence alone has wide application in the field of bioinformatics and computational biology. ASA has been helpful in understanding the 3-dimensional structure and function of a protein, acting as a high-impact feature in secondary structure prediction, disorder prediction, binding region identification and fold recognition applications. To enhance and support broad applications of ASA, we have made an attempt to improve the prediction accuracy of absolute accessible surface area by developing a new predictor paradigm, namely REGAd3p, for real value prediction through classical Exact Regression with Regularization and polynomial kernel of degree 3, which was further optimized using a Genetic Algorithm. Because ASA contributes to an effective energy function, we were motivated to enhance the accuracy of predicted ASA for better energy function application. Our ASA prediction paradigm was trained and tested using a new benchmark dataset, proposed in this work, consisting of 1001 and 298 protein chains, respectively. We achieved a maximum Pearson Correlation Coefficient (PCC) of 0.76 and a 1.45% improvement in PCC when compared with the existing top performing predictor, SPINE-X, in ASA prediction on an independent test set. Furthermore, we modeled the error between actual and predicted ASA in terms of energy and combined this energy linearly with the energy function 3DIGARS, which resulted in an effective energy function, namely 3DIGARS2.0, outperforming all the state-of-the-art energy functions. Based on Rosetta and Tasser decoy sets, 3DIGARS2.0 yielded improvements of 80.78%, 73.77%, 141.24%, 16.52%, and 32.32% over DFIRE, RWplus, dDFIRE, GOAP and 3DIGARS respectively.
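
The headline metric in this abstract is the Pearson Correlation Coefficient between actual and predicted ASA, reported together with a percentage improvement over a baseline. The sketch below computes both in plain Python. It assumes "improvement" means relative change with respect to the baseline value, which the abstract does not spell out. The ASA values are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def percent_improvement(new, old):
    """Relative improvement of `new` over `old`, in percent (assumed definition)."""
    return 100.0 * (new - old) / old

# Hypothetical actual vs. predicted ASA values for a few residues
actual = [12.0, 85.0, 40.0, 130.0, 5.0]
predicted = [20.0, 70.0, 55.0, 110.0, 15.0]
print(pearson(actual, predicted))
```

PCC only measures linear agreement, so a predictor can score well while still being systematically offset. That is one reason ASA predictors are often also evaluated with an absolute error measure.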


Subject(s)
Theoretical Models , Surface Properties , Amino Acids/chemistry , Molecular Structure
17.
bioRxiv ; 2024 Jan 02.
Article in English | MEDLINE | ID: mdl-38260256

ABSTRACT

Recent advances in AI-based methods have revolutionized the field of structural biology. Concomitantly, high-throughput sequencing and functional genomics technologies have enabled the detection and generation of variants at an unprecedented scale. However, efficient tools and resources are needed to link these two disparate data types - to "map" variants onto protein structures, to better understand how the variation causes disease and thereby design therapeutics. Here we present the Genomics 2 Proteins Portal (G2P; g2p.broadinstitute.org/): a human proteome-wide resource that maps 19,996,443 genetic variants onto 42,413 protein sequences and 77,923 structures, with a comprehensive set of structural and functional features. Additionally, the G2P portal generalizes the capability of linking genomics to proteins beyond databases by allowing users to interactively upload protein residue-wise annotations (variants, scores, etc.) as well as the protein structure to establish the connection. The portal serves as an easy-to-use discovery tool for researchers and scientists to hypothesize the structure-function relationship between natural or synthetic variations and their molecular phenotype.

18.
Sci Rep ; 14(1): 2798, 2024 02 02.
Article in English | MEDLINE | ID: mdl-38307912

ABSTRACT

Human genetic studies have revealed rare missense and protein-truncating variants in GRIN2A, which encodes the GluN2A subunit of NMDA receptors, that confer significant risk for schizophrenia (SCZ). Mutations in GRIN2A are also associated with epilepsy and developmental delay/intellectual disability (DD/ID). However, it remains enigmatic how alterations to the same protein can result in diverse clinical phenotypes. Here, we performed functional characterization of human GluN1/GluN2A heteromeric NMDA receptors that contain SCZ-linked GluN2A variants, and compared them to NMDA receptors with GluN2A variants associated with epilepsy or DD/ID. Our findings demonstrate that SCZ-associated GRIN2A variants were predominantly loss-of-function (LoF), whereas epilepsy and DD/ID-associated variants resulted in both gain- and loss-of-function phenotypes. We additionally show that M653I and S809R, LoF GRIN2A variants associated with DD/ID, exert a dominant-negative effect when co-expressed with wild-type GluN2A, whereas E58Ter and Y698C, SCZ-linked LoF variants, and A727T, an epilepsy-linked LoF variant, do not. These data offer a potential mechanism by which SCZ/epilepsy and DD/ID-linked variants can cause different effects on receptor function and therefore result in divergent pathological outcomes.


Subject(s)
Epilepsy , Neurodevelopmental Disorders , Schizophrenia , Humans , Epilepsy/genetics , Mutation , Neurodevelopmental Disorders/genetics , N-Methyl-D-Aspartate Receptors/genetics , N-Methyl-D-Aspartate Receptors/metabolism , Schizophrenia/genetics
19.
bioRxiv ; 2024 Jun 30.
Article in English | MEDLINE | ID: mdl-38979347

ABSTRACT

The large-scale experimental measures of variant functional assays submitted to MaveDB have the potential to provide key information for resolving variants of uncertain significance, but the reporting of results relative to the assayed sequence hinders their downstream utility. The Atlas of Variant Effects Alliance mapped multiplexed assays of variant effect data to human reference sequences, creating a robust set of machine-readable homology mappings. This method processed approximately 2.5 million protein and genomic variants in MaveDB, successfully mapping 98.61% of examined variants and disseminating data to resources such as the UCSC Genome Browser and Ensembl Variant Effect Predictor.

20.
Cancer Discov ; 2024 Apr 03.
Article in English | MEDLINE | ID: mdl-38564707

ABSTRACT

Activating point mutations in the MET tyrosine kinase domain (TKD) are oncogenic in a subset of papillary renal cell carcinomas (PRCC). Here, using comprehensive genomic profiling among >600,000 patients, we identify activating MET TKD point mutations as putative oncogenic drivers across diverse cancers, with a frequency of ~0.5%. The most common mutations in the MET TKD defined as oncogenic or likely oncogenic according to OncoKB resulted in amino acid substitutions at positions H1094, L1195, F1200, D1228, Y1230, M1250, and others. Preclinical modeling of these alterations confirmed their oncogenic potential and also demonstrated differential patterns of sensitivity to type I and type II MET inhibitors. Two patients with metastatic lung adenocarcinoma harboring MET TKD mutations (H1094Y, F1200I) and no other known oncogenic drivers achieved confirmed partial responses to a type I MET inhibitor. Activating MET TKD mutations occur in multiple malignancies and may confer clinical sensitivity to currently available MET inhibitors.
