ABSTRACT
Mutations that result in amino acid substitutions influence the stability of proteins and their binding to biomolecules. A molecular understanding of the effects of protein mutations is of both biotechnological and medical relevance. Empirical free energy functions that quickly estimate the free energy change upon mutation (ΔΔG) can be exploited for systematic screenings of proteins and protein complexes. In silico saturation mutagenesis can guide the design of new experiments or rationalize the consequences of known mutations. Software such as FoldX, while fast and reliable, often lacks the automation features needed to apply it in a high-throughput manner. We introduce MutateX, software that automates the prediction of ΔΔGs associated with the systematic mutation of each residue within a protein or protein complex to all other possible residue types, using the FoldX energy function. MutateX also supports ΔΔG calculations over protein ensembles, upon post-translational modifications, and in multimeric assemblies. At the heart of MutateX lies an automated pipeline engine that handles input preparation and parallelization and outputs publication-ready figures. We illustrate the MutateX protocol applied to different case studies. The results of the high-throughput scan provided by our tools can help in different applications, such as analyzing disease-associated mutations, complementing experimental deep mutational scans, or assisting the design of variants for industrial applications. MutateX is a collection of Python tools that relies on open-source libraries. It is available free of charge under the GNU General Public License from https://github.com/ELELAB/mutatex.
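As a rough illustration of what an in silico saturation scan has to enumerate, the sketch below lists every single-residue substitution for a short sequence. It is not MutateX's interface: the function name `saturation_mutations`, the label format, and the three-residue example sequence are assumptions made for this example only.

```python
# The 20 standard amino acids as one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def saturation_mutations(sequence, start=1):
    """Return every single-residue substitution as a label such as 'M1A'.

    Hypothetical helper for illustration; `start` is the number assigned
    to the first residue of `sequence`.
    """
    mutations = []
    for offset, wild_type in enumerate(sequence):
        position = start + offset
        for mutant in AMINO_ACIDS:
            if mutant != wild_type:  # skip the self-substitution
                mutations.append(f"{wild_type}{position}{mutant}")
    return mutations


scan = saturation_mutations("MKT")
print(len(scan))  # 3 residues x 19 substitutions each
print(scan[:3])
```

Each label in such a list would correspond to one ΔΔG calculation, which is why automation and parallelization matter as the protein grows.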
Subject(s)
Proteins, Software, Amino Acid Substitution, Mutagenesis, Mutation, Proteins/chemistry, Proteins/genetics
ABSTRACT
Computational methods relying on protein structure strongly depend on the structure selected for investigation. Typical sources of protein structures include experimental structures available at the Protein Data Bank (PDB) and high-quality in silico model structures, such as those available at the AlphaFold Protein Structure Database. Either option has significant advantages and drawbacks, and exploring the wealth of available structures to identify the most suitable ones for specific applications can be a daunting task. We provide an open-source software package, PDBminer, with the purpose of making structure identification and selection easier, faster, and less error-prone. PDBminer searches the AlphaFold Database and the PDB for available structures of interest and provides an up-to-date, quality-ranked table of structures applicable for further use. PDBminer provides an overview of the protein structures available for one or more input proteins, parallelizing the runs if multiple cores are specified. The output table reports the coverage of the protein structures aligned to the UniProt sequence, overcoming numbering differences in PDB structures and providing information regarding model quality, protein complexes, ligands, and nucleic acid chain binding. The PDBminer2coverage and PDBminer2network tools assist in visualizing the results. PDBminer can be applied to overcome the tedious task of choosing a PDB structure without losing the wealth of additional information available in the PDB. Here, we showcase the main functionalities of the package on the p53 tumor suppressor protein. The package is available at http://github.com/ELELAB/PDBminer.
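The idea of reporting how much of a reference sequence a set of structures covers can be sketched as follows. This is a minimal illustration, not PDBminer's code: the function `coverage` and the segment boundaries are invented for the example, and the segments are assumed to be already expressed in UniProt numbering.

```python
def coverage(segments, sequence_length):
    """Fraction of positions 1..sequence_length covered by any segment.

    `segments` is a list of inclusive (start, end) residue ranges,
    assumed to be already mapped to the reference sequence numbering.
    """
    covered = set()
    for start, end in segments:
        # Clip each segment to the reference sequence before counting.
        covered.update(range(max(start, 1), min(end, sequence_length) + 1))
    return len(covered) / sequence_length


# Invented example: two overlapping fragments of a 393-residue protein.
fraction = coverage([(94, 292), (280, 312)], 393)
print(f"{fraction:.1%} of the sequence is covered")
```

Using a set of positions keeps overlapping fragments from being counted twice, at the cost of memory proportional to the sequence length.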
Subject(s)
Proteins, Software, Proteins/chemistry, Computer Simulation, Protein Databases, Ligands
ABSTRACT
Due to the complex nature of noncovalent interactions and their long-range effects, analyzing protein conformations using network theory can be enlightening. Protein Structure Networks (PSNs) provide a convenient formalism to study protein structures in relation to essential properties such as key residues for structural stability, allosteric communication, and the effects of modifications of the protein. PSNs can be defined according to very different principles, and the available tools have limitations in input formats, supported models, and version control. Other outstanding problems are related to the definition of network cutoffs and the assessment of the stability of the network properties. The protein science community could benefit from a common framework to carry out these analyses and make them easier to reproduce, reuse, and evaluate. We here provide two open-source software packages, PyInteraph2 and PyInKnife2, to implement and analyze PSNs in a reproducible and documented manner. PyInteraph2 interfaces with multiple formats for protein ensembles and incorporates different network models, with the possibility of integrating them into a macronetwork and performing various downstream analyses, including hubs, connected components, and several other centrality measures. The networks can be visualized or further analyzed thanks to compatibility with Cytoscape. PyInKnife2 supports the network models implemented in PyInteraph2. It employs a jackknife resampling approach to estimate the convergence of network properties and streamline the selection of distance cutoffs. We foresee that the modular structure of the code and the supported version control system will promote the transition to a community-driven effort, boost reproducibility, and establish common protocols in the PSN field. As developers, we will guarantee the introduction of new functionalities, maintenance, assistance, and the training of new contributors.
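The jackknife idea can be sketched on a toy ensemble: a network property is recomputed with each block of frames left out, and a small spread across the subsamples suggests the property has converged. This is not PyInKnife2's implementation. The delete-one-block scheme, the persistence-based edge definition, and the names `edge_count` and `jackknife` are assumptions made for illustration.

```python
from statistics import mean, pstdev


def edge_count(frames, persistence_cutoff):
    """Count contacts present in at least `persistence_cutoff` of the frames."""
    counts = {}
    for contacts in frames:
        for edge in contacts:
            counts[edge] = counts.get(edge, 0) + 1
    return sum(1 for n in counts.values() if n / len(frames) >= persistence_cutoff)


def jackknife(frames, n_blocks, statistic):
    """Recompute `statistic` with each contiguous block of frames left out.

    If len(frames) is not a multiple of n_blocks, the trailing frames
    are never left out.
    """
    block = len(frames) // n_blocks
    values = []
    for i in range(n_blocks):
        subsample = frames[: i * block] + frames[(i + 1) * block:]
        values.append(statistic(subsample))
    return values


# Toy ensemble: each frame is the set of residue pairs in contact.
frames = [
    {(1, 2), (2, 3)},
    {(1, 2), (2, 3)},
    {(1, 2)},
    {(1, 2), (3, 4)},
]
values = jackknife(frames, n_blocks=4, statistic=lambda f: edge_count(f, 0.5))
print(values, mean(values), pstdev(values))
```

Repeating the procedure for several distance cutoffs and comparing the spreads is one way such resampling could inform cutoff selection.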
Subject(s)
Proteins, Software, Reproducibility of Results, Proteins/chemistry, Protein Conformation
ABSTRACT
Reliable prediction of free energy changes upon amino acid substitutions (ΔΔGs) is crucial to investigate their impact on protein stability and protein-protein interaction. Advances in experimental mutational scans allow high-throughput studies thanks to multiplex techniques. On the other hand, genomics initiatives provide a large amount of data on disease-related variants that can benefit from analyses with structure-based methods. Therefore, the computational field should keep the same pace and provide new tools for fast and accurate high-throughput ΔΔG calculations. In this context, the Rosetta modeling suite implements effective approaches to predict folding/unfolding ΔΔGs in a protein monomer upon amino acid substitutions and calculate the changes in binding free energy in protein complexes. However, applying these approaches can be challenging for users without extensive experience with Rosetta. Furthermore, Rosetta protocols for ΔΔG prediction are designed considering one variant at a time, making the setup of high-throughput screenings cumbersome. For these reasons, we devised RosettaDDGPrediction, a customizable Python wrapper designed to run free energy calculations on a set of amino acid substitutions using Rosetta protocols with little intervention from the user. Moreover, RosettaDDGPrediction assists with checking completed runs, aggregates raw data for multiple variants, and generates publication-ready graphics. We showed the potential of the tool in four case studies, including variants of uncertain significance in childhood cancer, proteins with known experimental unfolding ΔΔG values, interactions between target proteins and disordered motifs, and phosphomimetics. RosettaDDGPrediction is available, free of charge and under GNU General Public License v3.0, at https://github.com/ELELAB/RosettaDDGPrediction.
Subject(s)
Proteins, Software, Proteins/chemistry, Mutation, Entropy, Protein Stability
ABSTRACT
The tumor protein 53 (p53) is involved in transcription-dependent and -independent processes. Several p53 variants related to cancer have been found to impact protein stability. Other variants, on the contrary, might have little impact on structural stability and have local or long-range effects on the p53 interactome. Our group previously identified a loop in the DNA binding domain (DBD) of p53 (residues 207-213) which can recruit different interactors. Experimental structures of p53 in complex with other proteins strengthen the importance of this interface for protein-protein interactions. Here, we used structure-based approaches to characterize somatic and germline variants of p53 that could have a marginal effect on stability and act locally or allosterically on the region 207-213, with consequences for the cytosolic functions of this protein. To this end, we studied 1132 variants in the p53 DBD, also accounting for protein dynamics. We focused on variants predicted to have marginal effects on structural stability. We then investigated each of these variants for their impact on DNA binding, dimerization of the p53 DBD, and intramolecular contacts with the 207-213 region. Furthermore, we identified variants that could modulate the conformation of the region 207-213 at long range, using a coarse-grain model for allostery and all-atom molecular dynamics simulations. Our predictions have been further validated using enhanced sampling methods for 15 variants. The methodologies used in this study could be more broadly applied to other p53 variants or to cases where conformational changes of loop regions are essential in the function of disease-related proteins.
Subject(s)
Neoplasms, Tumor Suppressor Protein p53, Allosteric Regulation/genetics, DNA/chemistry, Humans, Molecular Dynamics Simulation, Mutation, Neoplasms/genetics, Protein Binding, Protein Domains, Tumor Suppressor Protein p53/chemistry, Tumor Suppressor Protein p53/genetics
ABSTRACT
Climate change and emerging drug resistance make the control of many infectious diseases increasingly challenging and undermine the reliance on drug treatment as the sole solution to the problem. As disease transmission often depends on environmental conditions that can be modified, such modifications may become crucial to risk reduction if we can assess their potential benefit at policy-relevant scales. However, so far, the value of environmental management for this purpose has received little attention. Here, using the parasitic disease of fasciolosis in livestock in the UK as a case study, we demonstrate how mechanistic hydro-epidemiological modelling can be applied to understand disease risk drivers and the efficacy of environmental management across a large heterogeneous domain. Our results show how weather and other environmental characteristics interact to define disease transmission potential and reveal that environmental interventions such as risk avoidance management strategies can provide a valuable alternative or complement to current treatment-based control practice.
Subject(s)
Communicable Disease Control/methods, Environment, Fascioliasis/prevention & control, Livestock/parasitology, Animals, Cattle, Fasciola/pathogenicity, Fascioliasis/transmission, Fascioliasis/veterinary, Hydrology, Statistical Models
ABSTRACT
The majority of existing models for predicting disease risk in response to climate change are empirical. These models exploit correlations between historical data, rather than explicitly describing relationships between cause and response variables. Therefore, they are unsuitable for capturing impacts beyond historically observed variability and have limited ability to guide interventions. In this study, we integrate environmental and epidemiological processes into a new mechanistic model, taking the widespread parasitic disease of fasciolosis as an example. The model simulates environmental suitability for disease transmission at a daily time step and 25 m resolution, explicitly linking the parasite life cycle to key weather-water-environment conditions. Using epidemiological data, we show that the model can reproduce observed infection levels in time and space for two case studies in the UK. To overcome data limitations, we propose a calibration approach combining Monte Carlo sampling and expert opinion, which allows constraint of the model in a process-based way, including a quantification of uncertainty. The simulated disease dynamics agree with information from the literature, and comparison with a widely used empirical risk index shows that the new model provides better insight into the time-space patterns of infection, which will be valuable for decision support.
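A calibration that combines Monte Carlo sampling with expert opinion can be pictured as follows: parameter sets are drawn at random from plausible ranges, and only those whose simulated output satisfies expert-derived rules are retained, so the spread of the retained sets gives a measure of uncertainty. The sketch below is a generic illustration of that idea under stated assumptions, not the model or procedure from the study. The stand-in `toy_model`, the parameter names, the weather values, and the acceptance rule are all invented.

```python
import random


def toy_model(temp_threshold, moisture_threshold, weather):
    """Hypothetical stand-in model: fraction of days suitable for transmission."""
    suitable = [
        t >= temp_threshold and m >= moisture_threshold for t, m in weather
    ]
    return sum(suitable) / len(weather)


def calibrate(weather, ranges, rules, n_samples, seed=0):
    """Keep the sampled parameter sets whose output satisfies every rule."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    retained = []
    for _ in range(n_samples):
        # Draw each parameter uniformly from its plausible range.
        params = {name: rng.uniform(lo, hi) for name, (lo, hi) in ranges.items()}
        output = toy_model(
            params["temp_threshold"], params["moisture_threshold"], weather
        )
        if all(rule(output) for rule in rules):
            retained.append((params, output))
    return retained


# Invented daily (temperature in deg C, soil moisture fraction) pairs.
weather = [(4, 0.9), (8, 0.8), (12, 0.7), (16, 0.5), (20, 0.3), (11, 0.6)]
ranges = {"temp_threshold": (5.0, 15.0), "moisture_threshold": (0.2, 0.8)}
# Invented expert rule: between one third and two thirds of days are suitable.
rules = [lambda frac: 1 / 3 <= frac <= 2 / 3]

retained = calibrate(weather, ranges, rules, n_samples=1000)
outputs = [out for _, out in retained]
print(len(retained), min(outputs), max(outputs))
```

Expressing the expert knowledge as simple pass/fail rules keeps the constraint step transparent, although it discards any information about how well a retained set fits.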