Results 1 - 20 of 47
1.
Brief Bioinform ; 24(1)2023 01 19.
Article in English | MEDLINE | ID: mdl-36611255

ABSTRACT

Accurate in silico prediction of conformational B-cell epitopes would lead to major improvements in disease diagnostics, drug design and vaccine development. A variety of computational methods, mainly based on machine learning approaches, have been developed in the last decades to tackle this challenging problem. Here, we rigorously benchmarked nine state-of-the-art conformational B-cell epitope prediction webservers, including generic and antibody-specific methods, on a dataset of over 250 antibody-antigen structures. The results of our assessment and statistical analyses show that all the methods achieve very low performances, and some do not perform better than randomly generated patches of surface residues. In addition, we also found that commonly used consensus strategies that combine the results from multiple webservers are at best only marginally better than random. Finally, we applied all the predictors to the SARS-CoV-2 spike protein as an independent case study, and showed that they perform poorly in general, which largely recapitulates our benchmarking conclusions. We hope that these results will lead to greater caution when using these tools until the biases and issues that limit current methods have been addressed, promote the use of state-of-the-art evaluation methodologies in future publications and suggest new strategies to improve the performance of conformational B-cell epitope prediction methods.


Subject(s)
Epitopes, B-Lymphocyte , Spike Glycoprotein, Coronavirus , Humans , Computational Biology/methods , Epitopes, B-Lymphocyte/immunology , SARS-CoV-2 , Spike Glycoprotein, Coronavirus/immunology
2.
Brief Bioinform ; 25(1)2023 11 22.
Article in English | MEDLINE | ID: mdl-38197311

ABSTRACT

Understanding the impact of mutations on protein-protein binding affinity is a key objective for a wide range of biotechnological applications and for shedding light on disease-causing mutations, which are often located at protein-protein interfaces. Over the past decade, many computational methods using physics-based and/or machine learning approaches have been developed to predict how protein binding affinity changes upon mutations. They all claim to achieve astonishing accuracy on both training and test sets, with performances on standard benchmarks such as SKEMPI 2.0 that seem overly optimistic. Here we benchmarked eight well-known and well-used predictors and identified their biases and dataset dependencies, using not only SKEMPI 2.0 as a test set but also deep mutagenesis data on the severe acute respiratory syndrome coronavirus 2 spike protein in complex with the human angiotensin-converting enzyme 2. We showed that, even though most of the tested methods reach a significant degree of robustness and accuracy, they suffer from limited generalizability properties and struggle to predict unseen mutations. Interestingly, the generalizability problems are more severe for pure machine learning approaches, while physics-based methods are less affected by this issue. Moreover, undesirable prediction biases toward specific mutation properties, the most marked being toward destabilizing mutations, are also observed and should be carefully considered by method developers. We conclude from our analyses that there is room for improvement in the prediction models and suggest ways to check, assess and improve their generalizability and robustness.


Subject(s)
Spike Glycoprotein, Coronavirus , Humans , Protein Binding , Mutation , Bias
3.
Bioinformatics ; 40(2)2024 02 01.
Article in English | MEDLINE | ID: mdl-38335928

ABSTRACT

MOTIVATION: The accurate prediction of how mutations change biophysical properties of proteins or RNA is a major goal in computational biology with tremendous impacts on protein design and genetic variant interpretation. Evolutionary approaches such as coevolution can help solve this issue. RESULTS: We present pycofitness, a standalone Python-based software package for the in silico mutagenesis of protein and RNA sequences. It is based on coevolution and, more specifically, on a popular inverse statistical approach, namely direct coupling analysis by pseudo-likelihood maximization. Its efficient implementation and user-friendly command line interface make it an easy-to-use tool even for researchers with no bioinformatics background. To illustrate its strengths, we present three applications in which pycofitness efficiently predicts the deleteriousness of genetic variants and the effect of mutations on protein fitness and thermodynamic stability. AVAILABILITY AND IMPLEMENTATION: https://github.com/KIT-MBS/pycofitness.


Subject(s)
RNA , Software , RNA/genetics , Amino Acid Sequence , Computational Biology , Proteins
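The coevolution-based in silico mutagenesis described in the pycofitness abstract above can be illustrated with a generic Potts-model (direct-coupling-style) energy difference. This is only a minimal sketch under stated assumptions: the function names, the random toy fields `h` and couplings `J`, and the sign convention are hypothetical and are not pycofitness's API. In practice the parameters would be inferred from a multiple sequence alignment, for example by pseudo-likelihood maximization.

```python
import random


def potts_energy(seq, h, J):
    """Statistical energy E(s) = -sum_i h_i(s_i) - sum_{i<j} J_ij(s_i, s_j).

    seq is a list of integer-encoded residues; h[i][a] and J[i][j][a][b]
    are toy parameters (assumed here, not inferred from real data).
    """
    n = len(seq)
    energy = -sum(h[i][seq[i]] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            energy -= J[i][j][seq[i]][seq[j]]
    return energy


def mutation_delta_e(seq, pos, new_state, h, J):
    """Energy change E(mutant) - E(wild type) for a single substitution."""
    mutant = list(seq)
    mutant[pos] = new_state
    return potts_energy(mutant, h, J) - potts_energy(seq, h, J)


# Toy model: 5 positions, 4-letter alphabet, random parameters (illustration only).
random.seed(0)
n, q = 5, 4
h = [[random.gauss(0, 1) for _ in range(q)] for _ in range(n)]
J = [[[[random.gauss(0, 1) for _ in range(q)] for _ in range(q)]
      for _ in range(n)] for _ in range(n)]

wild_type = [0, 1, 2, 3, 0]
delta = mutation_delta_e(wild_type, pos=2, new_state=1, h=h, J=J)
print(f"Delta E for substituting position 2: {delta:.3f}")
```

Under this convention a positive energy change marks a substitution that the model considers less compatible with the sequence family. How such a score is mapped to fitness or stability is a modeling choice that the sketch does not address.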
4.
Hum Genomics ; 18(1): 36, 2024 Apr 16.
Article in English | MEDLINE | ID: mdl-38627807

ABSTRACT

Systematically predicting the effects of mutations on protein fitness is essential for the understanding of genetic diseases. Indeed, predictions complement experimental efforts in analyzing how variants lead to dysfunctional proteins that in turn can cause diseases. Here we present our new fitness predictor, FiTMuSiC, which leverages structural, evolutionary and coevolutionary information. We show that FiTMuSiC predicts fitness with high accuracy despite the simplicity of its underlying model: it was among the top predictors on the hydroxymethylbilane synthase (HMBS) target of the sixth round of the Critical Assessment of Genome Interpretation challenge (CAGI6) and performs as well as much more complex deep learning models such as AlphaMissense. To further demonstrate FiTMuSiC's robustness, we compared its predictions with in vitro activity data on HMBS, variant fitness data on human glucokinase (GCK), and variant deleteriousness data on HMBS and GCK. These analyses further confirm FiTMuSiC's qualities and accuracy, which compare favorably with those of other predictors. Additionally, FiTMuSiC returns two scores that separately describe the functional and structural effects of the variant, thus providing mechanistic insight into why the variant leads to fitness loss or gain. We also provide an easy-to-use webserver at https://babylone.ulb.ac.be/FiTMuSiC, which is freely available for academic use and does not require any bioinformatics expertise, making our tool accessible to the entire scientific community.


Subject(s)
Proteins , Humans , Mutation
5.
Int J Mol Sci ; 25(10)2024 May 16.
Article in English | MEDLINE | ID: mdl-38791470

ABSTRACT

Antibodies play a central role in the adaptive immune response of vertebrates through the specific recognition of exogenous or endogenous antigens. The rational design of antibodies has a wide range of biotechnological and medical applications, such as in disease diagnosis and treatment. However, there are currently no reliable methods for predicting the antibodies that recognize a specific antigen region (or epitope) and, conversely, epitopes that recognize the binding region of a given antibody (or paratope). To fill this gap, we developed ImaPEp, a machine learning-based tool for predicting the binding probability of paratope-epitope pairs, where the epitope and paratope patches were simplified into interacting two-dimensional patches, which were colored according to the values of selected features, and pixelated. The specific recognition of an epitope image by a paratope image was achieved by using a convolutional neural network-based model, which was trained on a set of two-dimensional paratope-epitope images derived from experimental structures of antibody-antigen complexes. Our method performs well in cross-validation, with a balanced accuracy of 0.8. Finally, we showcase example applications of ImaPEp, including extensive screening of large libraries to identify paratope candidates that bind to a selected epitope, and rescoring and refining antibody-antigen docking poses.


Subject(s)
Epitopes , Neural Networks, Computer , Epitopes/immunology , Epitopes/chemistry , Machine Learning , Antigen-Antibody Complex/chemistry , Antigen-Antibody Complex/immunology , Humans , Molecular Docking Simulation , Antibodies/immunology , Antibodies/chemistry , Antigens/immunology , Binding Sites, Antibody
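The ImaPEp abstract above reports its cross-validation performance as a balanced accuracy. For readers unfamiliar with the metric, here is a minimal sketch of how it is usually computed for a binary binder/non-binder classifier. The labels below are invented toy data, not results from the paper, and the function name is an assumption.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)  # fraction of true binders recovered
    specificity = tn / (tn + fp)  # fraction of non-binders rejected
    return (sensitivity + specificity) / 2


# Invented toy labels: 4 binding pairs, 6 non-binding pairs.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
print(round(balanced_accuracy(y_true, y_pred), 3))
```

Unlike plain accuracy, this metric gives a classifier that always predicts "non-binder" a score of 0.5 regardless of class imbalance, which is why it is often preferred when negatives outnumber positives.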
6.
Bioinformatics ; 38(18): 4418-4419, 2022 09 15.
Article in English | MEDLINE | ID: mdl-35861514

ABSTRACT

MOTIVATION: The SARS-CoV-2 virus has shown a remarkable ability to evolve and spread across the globe through successive waves of variants since the original Wuhan lineage. Despite all the efforts of the last 2 years, the early and accurate prediction of variant severity is still a challenging issue which needs to be addressed to help, for example, the decision to activate COVID-19 plans long before the peak of new waves. Upstream preparation would indeed make it possible to avoid the overflow of health systems and limit the most severe cases. RESULTS: We recently developed SpikePro, a structure-based computational model capable of quickly and accurately predicting the viral fitness of a variant from its spike protein sequence. It is based on the impact of mutations on the stability of the spike protein as well as on its binding affinity for the angiotensin-converting enzyme 2 (ACE2) and for a set of neutralizing antibodies. It yields a precise indication of the virus transmissibility, infectivity, immune escape and basic reproduction rate. We present here an updated version of the model that is now available on an easy-to-use webserver, and illustrate its power in a retrospective study of fitness evolution and reproduction rate of the main viral lineages. SpikePro is thus expected to be of great help in assessing the fitness of newly emerging SARS-CoV-2 variants in genomic surveillance and viral evolution programs. AVAILABILITY AND IMPLEMENTATION: SpikePro webserver http://babylone.ulb.ac.be/SpikePro/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
COVID-19 , SARS-CoV-2 , Humans , Spike Glycoprotein, Coronavirus/genetics , Retrospective Studies , Peptidyl-Dipeptidase A , Mutation
7.
J Chem Inf Model ; 63(6): 1766-1775, 2023 03 27.
Article in English | MEDLINE | ID: mdl-36877828

ABSTRACT

The electronic properties of DNA molecules, defined by the sequence-dependent ionization potentials of nucleobases, enable long-range charge transport along the DNA stacks. This has been linked to a range of key physiological processes in the cells and to the triggering of nucleobase substitutions, some of which may cause diseases. To gain molecular-level understanding of the sequence dependence of these phenomena, we estimated the vertical ionization potential (vIP) of all possible nucleobase stacks in B-conformation, containing one to four Gua, Ade, Thy, Cyt, or methylated Cyt. To do this, we used quantum chemistry calculations and more precisely the second-order Møller-Plesset perturbation theory (MP2) and three double-hybrid density functional theory methods, combined with several basis sets for describing atomic orbitals. The calculated vIP of single nucleobases were compared to experimental data and those of nucleobase pairs, triplets, and quadruplets, to observed mutability frequencies in the human genome, reported to be correlated with vIP values. This comparison selected MP2 with the 6-31G* basis set as the best of the tested calculation levels. These results were exploited to set up a recursive model, called vIPer, which estimates the vIP of all possible single-stranded DNA sequences of any length based on the calculated vIPs of overlapping quadruplets. vIPer's vIP values correlate well with oxidation potentials measured by cyclic voltammetry and activities obtained through photoinduced DNA cleavage experiments, further validating our approach. vIPer is freely available on the github.com/3BioCompBio/vIPer repository.


Subject(s)
DNA, Single-Stranded , DNA , Humans , DNA/chemistry , Molecular Conformation
8.
Brain ; 145(4): 1519-1534, 2022 05 24.
Article in English | MEDLINE | ID: mdl-34788392

ABSTRACT

With more than 40 causative genes identified so far, autosomal dominant cerebellar ataxias exhibit a remarkable genetic heterogeneity. Yet, half the patients are lacking a molecular diagnosis. In a large family with nine sampled affected members, we performed exome sequencing combined with whole-genome linkage analysis. We identified a missense variant in NPTX1, NM_002522.3:c.1165G>A: p.G389R, segregating with the phenotype. Further investigations with whole-exome sequencing and an amplicon-based panel identified four additional unrelated families segregating the same variant, for whom a common founder effect could be excluded. A second missense variant, NM_002522.3:c.980A>G: p.E327G, was identified in a fifth familial case. The NPTX1-associated phenotype consists of a late-onset, slowly progressive, cerebellar ataxia, with downbeat nystagmus, cognitive impairment reminiscent of cerebellar cognitive affective syndrome, myoclonic tremor and mild cerebellar vermian atrophy on brain imaging. NPTX1 encodes the neuronal pentraxin 1, a secreted protein with various cellular and synaptic functions. Both variants affect conserved amino acid residues and are extremely rare or absent from public databases. In COS7 cells, overexpression of both neuronal pentraxin 1 variants altered endoplasmic reticulum morphology and induced ATF6-mediated endoplasmic reticulum stress, associated with cytotoxicity. In addition, the p.E327G variant abolished neuronal pentraxin 1 secretion, as well as its capacity to form a high molecular weight complex with the wild-type protein. Co-immunoprecipitation experiments coupled with mass spectrometry analysis demonstrated abnormal interactions of this variant with the cytoskeleton. In agreement with these observations, in silico modelling of the neuronal pentraxin 1 complex evidenced a destabilizing effect for the p.E327G substitution, located at the interface between monomers. 
In contrast, the p.G389 residue, located at the protein surface, had no predictable effect on the complex stability. Our results establish NPTX1 as a new causative gene in autosomal dominant cerebellar ataxias. We suggest that variants in NPTX1 can lead to cerebellar ataxia due to endoplasmic reticulum stress, mediated by ATF6, and associated with a destabilization of NP1 polymers in a dominant-negative manner for one of the variants.


Subject(s)
C-Reactive Protein , Cerebellar Ataxia , Endoplasmic Reticulum Stress , Nerve Tissue Proteins , Humans , C-Reactive Protein/genetics , Cerebellar Ataxia/genetics , Endoplasmic Reticulum Stress/genetics , Exome Sequencing , Mutation , Nerve Tissue Proteins/genetics , Pedigree
9.
Nucleic Acids Res ; 49(22): 12661-12672, 2021 12 16.
Article in English | MEDLINE | ID: mdl-34871451

ABSTRACT

Co-evolutionary models such as direct coupling analysis (DCA) in combination with machine learning (ML) techniques based on deep neural networks are able to predict accurate protein contact or distance maps. Such information can be used as constraints in structure prediction and massively increase prediction accuracy. Unfortunately, the same ML methods cannot readily be applied to RNA as they rely on large structural datasets only available for proteins. Here, we demonstrate how the available smaller data for RNA can be used to improve prediction of RNA contact maps. We introduce an algorithm called CoCoNet that is based on a combination of a Coevolutionary model and a shallow Convolutional Neural Network. Despite its simplicity and the small number of trained parameters, the method boosts the positive predictive value (PPV) of predicted contacts by about 70% with respect to DCA as tested by cross-validation of about eighty RNA structures. However, the direct inclusion of the CoCoNet contacts in 3D modeling tools does not result in a proportional increase of the 3D RNA structure prediction accuracy. Therefore, we suggest that the field develop, in addition to contact PPV, metrics that better estimate the expected impact on 3D structure modeling tools. CoCoNet is freely available and can be found at https://github.com/KIT-MBS/coconet.


Subject(s)
Neural Networks, Computer , RNA/chemistry , Algorithms , Models, Molecular , Nucleic Acid Conformation , Riboswitch
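The CoCoNet abstract above evaluates contact predictions by their positive predictive value (PPV). A minimal sketch of that metric follows, assuming predictions arrive as scored nucleotide pairs and that the top-ranked pairs are compared with contacts taken from a reference structure. The function name, the toy pairs and the choice of `top_n` are illustrative assumptions, not CoCoNet's evaluation code.

```python
def contact_ppv(scored_pairs, true_contacts, top_n):
    """PPV = (true contacts among the top_n predictions) / (predictions kept).

    scored_pairs: list of ((i, j), score); higher score = more confident.
    true_contacts: iterable of (i, j) pairs from a reference structure.
    """
    if top_n <= 0 or not scored_pairs:
        raise ValueError("need at least one prediction and top_n > 0")
    true_set = {tuple(sorted(pair)) for pair in true_contacts}
    ranked = sorted(scored_pairs, key=lambda item: item[1], reverse=True)[:top_n]
    hits = sum(1 for pair, _ in ranked if tuple(sorted(pair)) in true_set)
    return hits / len(ranked)


# Invented toy data: four scored pairs, three of which are real contacts.
predictions = [((1, 10), 0.9), ((2, 9), 0.8), ((3, 20), 0.7), ((4, 7), 0.4)]
reference = [(1, 10), (9, 2), (4, 7)]
print(contact_ppv(predictions, reference, top_n=3))
```

Pairs are stored in sorted order so that (9, 2) and (2, 9) count as the same contact. Published benchmarks typically also exclude pairs that are close in sequence and tie `top_n` to the sequence length; both conventions are left out here for brevity.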
10.
J Biol Chem ; 297(5): 101308, 2021 11.
Article in English | MEDLINE | ID: mdl-34673030

ABSTRACT

The design of allosteric modulators to control protein function is a key objective in drug discovery programs. Altering functionally essential allosteric residue networks provides unique protein family subtype specificity, minimizes unwanted off-target effects, and helps avert resistance acquisition typically plaguing drugs that target orthosteric sites. In this work, we used protein engineering and dimer interface mutations to positively and negatively modulate the immunosuppressive activity of the proapoptotic human galectin-7 (GAL-7). Using the PoPMuSiC and BeAtMuSiC algorithms, mutational sites and residue identity were computationally probed and predicted to either alter or stabilize the GAL-7 dimer interface. By designing a covalent disulfide bridge between protomers to control homodimer strength and stability, we demonstrate the importance of dimer interface perturbations on the allosteric network bridging the two opposite glycan-binding sites on GAL-7, resulting in control of induced apoptosis in Jurkat T cells. Molecular investigation of G16X GAL-7 variants using X-ray crystallography, biophysical, and computational characterization illuminates residues involved in dimer stability and allosteric communication, along with discrete long-range dynamic behaviors involving loops 1, 3, and 5. We show that perturbing the protein-protein interface between GAL-7 protomers can modulate its biological function, even when the overall structure and ligand-binding affinity remain unaltered. This study highlights new avenues for the design of galectin-specific modulators influencing both glycan-dependent and glycan-independent interactions.


Subject(s)
Apoptosis , Galectins , Immune Tolerance , Protein Multimerization , T-Lymphocytes/immunology , Allosteric Regulation , Apoptosis/genetics , Apoptosis/immunology , Galectins/chemistry , Galectins/genetics , Galectins/immunology , Humans , Jurkat Cells , Protein Multimerization/genetics , Protein Multimerization/immunology
11.
RNA ; 26(7): 794-802, 2020 07.
Article in English | MEDLINE | ID: mdl-32276988

ABSTRACT

RNA molecules play many pivotal roles in a cell that are still not fully understood. Any detailed understanding of RNA function requires knowledge of its three-dimensional structure, yet experimental RNA structure resolution remains demanding. Recent advances in sequencing provide unprecedented amounts of sequence data that can be statistically analyzed by methods such as direct coupling analysis (DCA) to determine spatial proximity or contacts of specific nucleic acid pairs, which improve the quality of structure prediction. To quantify this structure prediction improvement, we here present a well curated data set of about 70 RNA structures of high resolution and compare different nucleotide-nucleotide contact prediction methods available in the literature. We observe only minor differences between the performances of the different methods. Moreover, we discuss how robust these predictions are for different contact definitions and how strongly they depend on procedures used to curate and align the families of homologous RNA sequences.


Subject(s)
RNA/genetics , Data Analysis , Datasets as Topic , Nucleic Acid Conformation , Sequence Alignment/methods
12.
Bioinformatics ; 38(1): 265-266, 2021 12 22.
Article in English | MEDLINE | ID: mdl-34165491

ABSTRACT

MOTIVATION: High-throughput experiments are generating ever increasing amounts of various -omics data, thus shedding new light on the link between human disorders, their genetic causes and the related impact on protein behavior and structure. While numerous bioinformatics tools now exist that predict which variants in the human exome cause diseases, few tools predict the reasons why they might do so. Yet, understanding the impact of variants at the molecular level is a prerequisite for the rational development of targeted drugs or personalized therapies. RESULTS: We present the updated MutaFrame webserver, which aims to meet this need. It offers two deleteriousness prediction programs, DEOGEN2 and SNPMuSiC, and is designed for bioinformaticians and medical researchers who want to gain insights into the origins of monogenic diseases. It contains information at two levels for each human protein: its amino acid sequence and its three-dimensional structure; we used the experimental structures whenever available, and modeled structures otherwise. MutaFrame also includes higher-level information, such as protein essentiality and protein-protein interactions. It has a user-friendly interface for the interpretation of results and a convenient visualization system for protein structures, in which the variant positions introduced by the user and other structural information are shown. In this way, MutaFrame aids our understanding of the pathogenic processes caused by single-site mutations and their molecular and contextual interpretation. AVAILABILITY AND IMPLEMENTATION: MutaFrame webserver at http://mutaframe.com/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Computational Biology , Exome , Humans , Software , Proteins , Mutation, Missense
13.
Bioinformatics ; 37(14): 1963-1971, 2021 07 15.
Article in English | MEDLINE | ID: mdl-33471089

ABSTRACT

MOTIVATION: Although structured proteins adopt their lowest free energy conformation in physiological conditions, the individual residues are generally not in their lowest free energy conformation. Residues that are stability weaknesses are often involved in functional regions, whereas stability strengths ensure local structural stability. The detection of strengths and weaknesses provides key information to guide protein engineering experiments aiming to modulate folding and various functional processes. RESULTS: We developed the SWOTein predictor which identifies strong and weak residues in proteins on the basis of three types of statistical energy functions describing local interactions along the chain, hydrophobic forces and tertiary interactions. The large-scale analysis of the different types of strengths and weaknesses demonstrated their complementarity and the enhancement of the information they provide. Moreover, a good average correlation was observed between predicted and experimental strengths and weaknesses obtained from native hydrogen exchange data. SWOTein application to three test cases further showed its suitability to predict and interpret strong and weak residues in the context of folding, conformational changes and protein-protein binding. In summary, SWOTein is both fast and accurate and can be applied at small and large scale to analyze and modulate folding and molecular recognition processes. AVAILABILITY: The SWOTein webserver provides the list of predicted strengths and weaknesses and a protein structure visualization tool that facilitates the interpretation of the predictions. It is freely available for academic use at http://babylone.ulb.ac.be/SWOTein/.

14.
Int J Mol Sci ; 23(4)2022 Feb 14.
Article in English | MEDLINE | ID: mdl-35216194

ABSTRACT

SARS-CoV-2 infection elicits a polyclonal neutralizing antibody (nAb) response that primarily targets the spike protein, but it is still unclear which nAbs are immunodominant and what distinguishes them from subdominant nAbs. This information would however be crucial to predict the evolutionary trajectory of the virus and design future vaccines. To shed light on this issue, we gathered 83 structures of nAbs in complex with spike protein domains. We analyzed in silico the ability of these nAbs to bind the full spike protein trimer in open and closed conformations, and predicted the change in binding affinity of the most frequently observed spike protein variants in the circulating strains. This led us to define four nAb classes with distinct variant escape fractions. By comparing these fractions with those measured from plasma of infected patients, we showed that the class of nAbs that most contributes to the immune response is able to bind the spike protein in its closed conformation. Although this class of nAbs only partially inhibits the spike protein binding to the host's angiotensin converting enzyme 2 (ACE2), it has been suggested to lock the closed pre-fusion spike protein conformation and therefore prevent its transition to an open state. Furthermore, comparison of our predictions with mRNA-1273 vaccinated patient plasma measurements suggests that spike proteins contained in vaccines elicit a different nAb class than the one elicited by natural SARS-CoV-2 infection and suggests the design of highly stable closed-form spike proteins as next-generation vaccine immunogens.


Subject(s)
Antibodies, Neutralizing/immunology , SARS-CoV-2/metabolism , Spike Glycoprotein, Coronavirus/immunology , Angiotensin-Converting Enzyme 2/chemistry , Angiotensin-Converting Enzyme 2/metabolism , Antibodies, Monoclonal/immunology , Antigen-Antibody Reactions , COVID-19/pathology , COVID-19/virology , Epitopes/immunology , Humans , Mutagenesis , Protein Binding , Protein Conformation , SARS-CoV-2/isolation & purification , Spike Glycoprotein, Coronavirus/chemistry , Spike Glycoprotein, Coronavirus/genetics , Spike Glycoprotein, Coronavirus/metabolism
15.
Bioinformatics ; 36(7): 2264-2265, 2020 04 01.
Article in English | MEDLINE | ID: mdl-31778142

ABSTRACT

MOTIVATION: The ongoing advances in sequencing technologies have provided a massive increase in the availability of sequence data. This made it possible to study the patterns of correlated substitution between residues in families of homologous proteins or RNAs and to retrieve structural and stability information. Direct coupling analysis (DCA) infers coevolutionary couplings between pairs of residues indicating their spatial proximity, making such information a valuable input for subsequent structure prediction. RESULTS: Here, we present pydca, a standalone Python-based software package for the DCA of protein- and RNA-homologous families. It is based on two popular inverse statistical approaches, namely, the mean-field and the pseudo-likelihood maximization and is equipped with a series of functionalities that range from multiple sequence alignment trimming to contact map visualization. Thanks to its efficient implementation, features and user-friendly command line interface, pydca is a modular and easy-to-use tool that can be used by researchers with a wide range of backgrounds. AVAILABILITY AND IMPLEMENTATION: pydca can be obtained from https://github.com/KIT-MBS/pydca or from the Python Package Index under the MIT License. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
RNA , Software , Amino Acid Sequence , Proteins , Sequence Alignment
16.
Bioinformatics ; 36(5): 1445-1452, 2020 03 01.
Article in English | MEDLINE | ID: mdl-31603466

ABSTRACT

MOTIVATION: The solubility of a protein is often decisive for its proper functioning. Lack of solubility is a major bottleneck in high-throughput structural genomic studies and in high-concentration protein production, and the formation of protein aggregates causes a wide variety of diseases. Since solubility measurements are time-consuming and expensive, there is a strong need for solubility prediction tools. RESULTS: We have recently introduced solubility-dependent distance potentials that are able to unravel the role of residue-residue interactions in promoting or decreasing protein solubility. Here, we extended their construction by defining solubility-dependent potentials based on backbone torsion angles and solvent accessibility, and integrated them, together with other structure- and sequence-based features, into a random forest model trained on a set of Escherichia coli proteins with experimental structures and solubility values. We thus obtained the SOLart protein solubility predictor, whose most informative features turned out to be folding free energy differences computed from our solubility-dependent statistical potentials. SOLart performs very well, with a Pearson correlation coefficient between experimental and predicted solubility values of almost 0.7 both in cross-validation on the training dataset and in an independent set of Saccharomyces cerevisiae proteins. On test sets of modeled structures, only a limited drop in performance is observed. SOLart can thus be used with both high-resolution and low-resolution structures, and clearly outperforms state-of-the-art solubility predictors. It is available through a user-friendly webserver, which is easy to use by non-expert scientists. AVAILABILITY AND IMPLEMENTATION: The SOLart webserver is freely available at http://babylone.ulb.ac.be/SOLART/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Computational Biology , Escherichia coli Proteins , Solubility , Solvents
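The SOLart abstract above summarizes performance with a Pearson correlation coefficient between experimental and predicted solubility values. The sketch below shows how that coefficient is computed. The solubility values are invented toy numbers, not data from the paper, and the function name is an assumption.

```python
import math


def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of spreads."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Invented toy values (percent solubility), for illustration only.
experimental = [12.0, 35.0, 48.0, 60.0, 81.0, 95.0]
predicted = [20.0, 30.0, 55.0, 50.0, 70.0, 88.0]
print(round(pearson_r(experimental, predicted), 3))
```

Because the coefficient only measures linear association, a predictor can score well while being systematically shifted or rescaled, so it is usually reported alongside an error measure.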
17.
BMC Biol ; 18(1): 146, 2020 10 20.
Article in English | MEDLINE | ID: mdl-33081759

ABSTRACT

BACKGROUND: How, and the extent to which, evolution acts on DNA and protein sequences to ensure mutational robustness and evolvability is a long-standing open question in the field of molecular evolution. We addressed this issue through the first structurome-scale computational investigation, in which we estimated the change in folding free energy upon all possible single-site mutations introduced in more than 20,000 protein structures, as well as through available experimental stability and fitness data. RESULTS: At the amino acid level, we found the protein surface to be more robust against random mutations than the core, this difference being stronger for small proteins. The destabilizing and neutral mutations are more numerous in the core and on the surface, respectively, whereas the stabilizing mutations are about 4% in both regions. At the genetic code level, we observed the smallest destabilization for mutations that are due to substitutions of base III in the codon, followed by base I, bases I+III, base II, and other multiple base substitutions. This ranking highly anticorrelates with the codon-anticodon mispairing frequency in the translation process. This suggests that the standard genetic code is optimized to limit the impact of random mutations, but even more so to limit translation errors. At the codon level, both the codon usage and the usage bias appear to optimize mutational robustness and translation accuracy, especially for surface residues. CONCLUSION: Our results highlight the non-universality of mutational robustness and its multiscale dependence on protein features, the structure of the genetic code, and the codon usage. Our analyses and approach are strongly supported by available experimental mutagenesis data.


Subject(s)
Genetic Code , Mutagenesis , Mutation , Protein Biosynthesis , Codon Usage , Computer Simulation
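The abstract above groups codon changes by which bases differ (base III, base I, bases I+III, base II, and other multiple base substitutions). A minimal sketch of that bookkeeping is shown below. The function names and example codons are illustrative assumptions, and the sketch only classifies substitutions; it does not estimate any folding free energy change.

```python
def changed_positions(codon_a, codon_b):
    """Return the 1-based codon positions at which two codons differ."""
    if len(codon_a) != 3 or len(codon_b) != 3:
        raise ValueError("codons must be exactly three bases long")
    return tuple(i + 1 for i, (a, b) in enumerate(zip(codon_a, codon_b)) if a != b)


def substitution_class(codon_a, codon_b):
    """Map a codon pair to the substitution categories named in the abstract."""
    labels = {
        (): "identical",
        (1,): "base I",
        (2,): "base II",
        (3,): "base III",
        (1, 3): "bases I+III",
    }
    return labels.get(changed_positions(codon_a, codon_b),
                      "other multiple base substitutions")


for pair in [("GCU", "GCC"), ("AUG", "CUG"), ("AUG", "CUA"), ("AUG", "ACG")]:
    print(pair, "->", substitution_class(*pair))
```

Combining these labels with a per-mutation stability estimate would allow the average destabilization of each class to be compared, which is the kind of ranking the abstract reports.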
18.
Int J Mol Sci ; 22(9)2021 Apr 26.
Article in English | MEDLINE | ID: mdl-33925997

ABSTRACT

Sphingomyelin phosphodiesterase (SMPD1) is a key enzyme in sphingolipid metabolism. Genetic SMPD1 variants have been related to the Niemann-Pick lysosomal storage disorder, which has different degrees of phenotypic severity ranging from severe symptomatology involving the central nervous system (type A) to milder ones (type B). They have also been linked to neurodegenerative disorders such as Parkinson's and Alzheimer's diseases. In this paper, we leveraged structural, evolutionary and stability information on SMPD1 to predict and analyze the impact of variants at the molecular level. We developed the SMPD1-ZooM algorithm, which is able to predict with good accuracy whether variants cause Niemann-Pick disease and its phenotypic severity; the predictor is freely available for download. We performed a large-scale analysis of all possible SMPD1 variants, which led us to identify protein regions that are either robust or fragile with respect to amino acid variations, and show the importance of aromatic-involving interactions in SMPD1 function and stability. Our study also revealed a good correlation between SMPD1-ZooM scores and in vitro loss of SMPD1 activity. The understanding of the molecular effects of SMPD1 variants is of crucial importance to improve genetic screening of SMPD1-related disorders and to develop personalized treatments that restore SMPD1 functionality.


Subject(s)
Niemann-Pick Diseases/genetics , Sphingomyelin Phosphodiesterase/genetics , Computer Simulation , Databases, Genetic , Exons/genetics , Genetic Variation/genetics , Humans , Mutation/genetics , Niemann-Pick Diseases/metabolism , Phenotype , Severity of Illness Index , Sphingolipids/genetics , Sphingolipids/metabolism , Sphingomyelin Phosphodiesterase/metabolism
19.
Methods ; 162-163: 68-73, 2019 06 01.
Article in English | MEDLINE | ID: mdl-31028927

ABSTRACT

Structured RNA plays many functionally relevant roles in molecular life. Structural information, while required to understand the functional cycles in detail, is challenging to gather. Computational methods promise to complement experimental efforts by predicting three-dimensional RNA models. Here, we provide a concise view of the state-of-the-art methodologies with a focus on the strengths and the weaknesses of the different approaches. Furthermore, we analyze the recent developments regarding the use of coevolutionary information and how it can boost prediction performance. We finally discuss some open perspectives and challenges for the near future in the RNA structural stability field.


Subject(s)
Computational Biology/methods , Models, Molecular , Nucleic Acid Conformation , RNA/chemistry , Sequence Analysis, RNA/methods , RNA/genetics , RNA Stability/genetics , Software
20.
BMC Genomics ; 20(Suppl 8): 551, 2019 Jul 16.
Article in English | MEDLINE | ID: mdl-31307386

ABSTRACT

BACKGROUND: It is now clear that single base substitutions that occur in the human genome, of which some lead to pathogenic conditions, are non-random and influenced by their flanking nucleobase sequences. However, despite recent progress, these "non-local" effects are still far from being understood. RESULTS: To make progress on this problem, we analyzed the relationship between the base mutability in specific gene regions and the electron hole transport along the DNA base stacks, as it is one of the mechanisms that have been suggested to contribute to these effects. More precisely, we studied the connection between the normalized frequency of single base substitutions and the vertical ionization potential (vIP) of the base and its flanking sequence, estimated using MP2/6-31G* ab initio quantum chemistry calculations. We found a statistically significant overall anticorrelation between these two quantities: the lower the vIP value, the more probable the substitution. Moreover, the slope of the regression lines varies. It is larger for introns than for exons and untranslated regions, and for synonymous than for missense substitutions. Interestingly, the correlation appears to be more pronounced when considering the flanking sequence of the substituted base in the 3' rather than in the 5' direction, which corresponds to the preferred direction of charge migration. A weaker but still statistically significant correlation is found between the ionization potentials and the pathogenicity of the base substitutions. Moreover, pathogenicity is also preferentially associated with larger changes in ionization potentials upon base substitution. CONCLUSIONS: With this analysis we gained new insights into the complex biophysical mechanisms that are at the basis of mutagenesis and pathogenicity, and supported the role of electron-hole transport in these matters.


Subject(s)
Computational Biology/methods , DNA/chemistry , DNA/genetics , Disease/genetics , Electrons , Polymorphism, Single Nucleotide , Databases, Genetic , Mutation , Nucleotide Motifs