ABSTRACT
Enhancing protein stability is of major importance in biotechnology, therapeutics, and the food industry. Circular permutation offers a distinctive way to manipulate protein stability while keeping intra-protein interactions intact. When creating circular permutants, determining the optimal placement of the new N- and C-termini is a pivotal, yet largely unexplored, problem. In this study, we used PONDR-FIT predictions of disorder propensity to guide the design of circular permutants of the GroEL apical domain (residues 191-345). We hypothesized that a higher predicted disorder value would correspond to lower stability of the circular permutant, owing to the increased likelihood of fluctuations in the new N- and C-termini. To test this hypothesis, we engineered six circular permutants, placing the new N- and C-termini at glycines within loops. Melting temperatures measured by circular dichroism and differential scanning microcalorimetry confirmed the hypothesis across the set of designed circular permutants. Consequently, we propose a novel computational approach that rationalizes the design of circular permutants with anticipated stability.
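The design logic above can be illustrated with a small Python sketch. It assumes that per-residue disorder scores have already been obtained (for example, exported from PONDR-FIT; the plain-list format used here is hypothetical) and that loop positions are known. Following the stated hypothesis, glycine positions in loops are ranked so that the lowest predicted disorder, and therefore the highest expected stability, comes first. The helper names `rank_cut_sites` and `circular_permutant` are illustrative assumptions and not part of the published method. The same applies to the optional `linker` that joins the original termini.

```python
def rank_cut_sites(seq, disorder, loop_positions):
    """Rank candidate new-terminus positions (0-based) by predicted disorder.

    Only glycines inside loops are considered, as in the design described above.
    Lower disorder is assumed to predict a more stable circular permutant.
    """
    if len(disorder) != len(seq):
        raise ValueError("need one disorder score per residue")
    candidates = [i for i in loop_positions if seq[i] == "G"]
    return sorted(candidates, key=lambda i: disorder[i])


def circular_permutant(seq, new_start, linker=""):
    """Return the permutant whose N-terminus is residue `new_start` (0-based).

    The original C- and N-termini are joined, optionally through a linker
    (a hypothetical parameter; the abstract does not specify one).
    """
    if not 0 < new_start < len(seq):
        raise ValueError("new_start must fall strictly inside the sequence")
    return seq[new_start:] + linker + seq[:new_start]


# Toy example with made-up disorder scores.
seq = "MAGKLGTEGS"
disorder = [0.9, 0.8, 0.3, 0.2, 0.1, 0.6, 0.5, 0.4, 0.2, 0.9]
ranking = rank_cut_sites(seq, disorder, loop_positions=[2, 3, 5, 8])
best = circular_permutant(seq, ranking[0], linker="GG")
```

In this toy case, position 3 is skipped because it is not a glycine. The remaining candidates are ordered 8, 2, 5 by increasing disorder.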
ABSTRACT
MOTIVATION: Prediction of the protein stability change upon mutation (ΔΔG) is crucial for protein engineering and for understanding the principles of protein folding. Robust prediction of the folding free energy change requires knowledge of the protein three-dimensional (3D) structure. If the 3D structure is not available, it can be predicted from the protein sequence; however, how well ΔΔG predictions perform on predicted protein structures is unknown. The accuracy of ΔΔG predictions based on the 3D structures of the best templates is also unclear. RESULTS: To investigate these questions, we used a representative set of seven diverse and accurate publicly available tools for stability change prediction (FoldX, Eris, Rosetta, DDGun, ACDC-NN, ThermoNet and DynaMut), combined with AlphaFold or I-Tasser for protein 3D structure prediction. We found that the best templates perform consistently better than (or similar to) homology models for all ΔΔG predictors. Our findings suggest using the best template structure to predict the protein stability change upon mutation when the protein 3D structure is not available. AVAILABILITY AND IMPLEMENTATION: The data are available at https://github.com/ivankovlab/template-vs-model. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
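Comparing structure sources, as in the study above, amounts to scoring each predictor's ΔΔG values against experiment once per type of input structure. The minimal Python sketch below makes two assumptions. First, it uses Pearson correlation and root-mean-square error as the scoring metrics, which the abstract does not name. Second, it uses invented toy numbers in place of real FoldX, Rosetta or other predictions.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rmse(pred, exp):
    """Root-mean-square error of predictions against experimental values."""
    return math.sqrt(sum((p - e) ** 2 for p, e in zip(pred, exp)) / len(pred))


def evaluate(pred, exp):
    """Score one set of ddG predictions against experiment."""
    return {"r": pearson_r(pred, exp), "rmse": rmse(pred, exp)}


# Hypothetical experimental ddG values (kcal/mol) and predictions made on two
# kinds of input structure for the same mutations.
experimental = [0.5, 1.2, 2.0, -0.3, 1.5]
on_template = [0.7, 1.0, 1.8, -0.1, 1.6]
on_model = [1.5, 0.2, 2.9, 0.8, 0.4]

scores = {
    "best template": evaluate(on_template, experimental),
    "homology model": evaluate(on_model, experimental),
}
```

The structure source with higher correlation and lower error on the same mutations would be preferred. In this invented data that is the template, but the numbers carry no evidential weight.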
Subject(s)
Algorithms, Proteins, Protein Stability, Proteins/genetics, Proteins/chemistry, Mutation, Protein Folding
ABSTRACT
Mutations in the DMD gene that prevent production of the dystrophin protein cause Duchenne muscular dystrophy. Most frequently, these are deletions leading to a reading-frame shift. The "reading-frame rule" states that deletions that preserve the open reading frame (ORF) result in the milder Becker muscular dystrophy (BMD). By removing several exons, new genome-editing tools enable reading-frame restoration in DMD, with the production of BMD-like dystrophins. However, not every truncated dystrophin with a significant internal loss functions properly. To determine the effectiveness of a potential genome-editing strategy, each variant should be carefully studied in vitro or in vivo. In this study, we focused on the deletion of exons 8-50 as a potential reading-frame restoration option. Using the CRISPR-Cas9 tool, we created a novel mouse model, DMDdel8-50, which carries an in-frame deletion in the DMD gene. We compared DMDdel8-50 mice with C57Bl6/CBA background control mice and with previously generated DMDdel8-34 KO mice. We found that the shortened protein was expressed and correctly localized at the sarcolemma. However, the truncated protein was unable to function like full-length dystrophin and prevent disease progression. On the basis of protein expression, histological examination, and physical assessment of the mice, we concluded that the deletion of exons 8-50 is an exception to the reading-frame rule.
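The reading-frame rule mentioned above reduces to simple arithmetic. A deletion of whole exons preserves the ORF when the total number of deleted nucleotides is a multiple of three. The minimal Python sketch below uses hypothetical exon lengths, not the real DMD exon sizes.

```python
def deleted_length(exon_lengths, first, last):
    """Total nucleotides removed by deleting exons first..last (inclusive)."""
    return sum(exon_lengths[e] for e in range(first, last + 1))


def is_in_frame(exon_lengths, first, last):
    """True if deleting exons first..last keeps the downstream codons in frame."""
    return deleted_length(exon_lengths, first, last) % 3 == 0


# Hypothetical exon lengths in nucleotides, for illustration only.
exon_lengths = {1: 31, 2: 62, 3: 93, 4: 100, 5: 50}
```

As the DMDdel8-50 result shows, passing this arithmetic check does not by itself guarantee a functional dystrophin.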
Subject(s)
Dystrophin, Duchenne Muscular Dystrophy, Mice, Animals, Dystrophin/genetics, Inbred CBA Mice, Duchenne Muscular Dystrophy/metabolism, Phenotype, Exons/genetics, Gene Deletion
ABSTRACT
Fitness landscapes depict how genotypes manifest at the phenotypic level and form the basis of our understanding of many areas of biology, yet their properties remain elusive. Previous studies have analysed specific genes, often using their function as a proxy for fitness, experimentally assessing the effect on function of single mutations and their combinations in a specific sequence or in different sequences. However, systematic high-throughput studies of the local fitness landscape of an entire protein have not yet been reported. Here we visualize an extensive region of the local fitness landscape of the green fluorescent protein from Aequorea victoria (avGFP) by measuring the native function (fluorescence) of tens of thousands of derivative genotypes of avGFP. We show that the fitness landscape of avGFP is narrow: three-quarters of the single-mutation derivatives show reduced fluorescence, and half of the derivatives with four mutations are completely non-fluorescent. The narrowness is enhanced by epistasis, which was detected in up to 30% of genotypes with multiple mutations and mostly arose from the cumulative effect of slightly deleterious mutations causing a threshold-like decrease in protein stability and a concomitant loss of fluorescence. A model of orthologous sequence divergence spanning hundreds of millions of years predicted the extent of epistasis in our data, indicating congruence between the fitness landscape properties at the local and global scales. The characterization of the local fitness landscape of avGFP has important implications for several fields, including molecular evolution, population genetics and protein design.
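The threshold mechanism described above can be illustrated with a toy model. In it, mutational effects on folding free energy add up, but fluorescence tracks the fraction of folded protein, which is a sigmoid function of stability. Three choices here are hypothetical and made only for illustration; none comes from the study:

- the wild-type stability of 3.0 kcal/mol;
- the 1.5 kcal/mol effect per mutation;
- the assumption that fluorescence is proportional to the folded fraction.

```python
import math

RT = 0.593  # kcal/mol, approximately RT at 298 K


def fraction_folded(ddgs, dg_wt=3.0):
    """Fraction of folded protein in a two-state toy model.

    dg_wt: hypothetical wild-type folding stability (kcal/mol, positive = stable).
    ddgs: destabilizing effects of each mutation (kcal/mol), assumed additive.
    """
    stability = dg_wt - sum(ddgs)
    return 1.0 / (1.0 + math.exp(-stability / RT))


def log_epistasis(f_wt, f_a, f_b, f_ab):
    """Epistasis on a log scale; 0 means the two mutations act multiplicatively."""
    return math.log(f_ab) - math.log(f_a) - math.log(f_b) + math.log(f_wt)


# Two mildly destabilizing mutations (hypothetical 1.5 kcal/mol each).
f_wt = fraction_folded([])
f_a = fraction_folded([1.5])
f_b = fraction_folded([1.5])
f_ab = fraction_folded([1.5, 1.5])
eps = log_epistasis(f_wt, f_a, f_b, f_ab)
```

In this toy model, each single mutant keeps about 93% of its folded fraction, but the double mutant drops to 50%. Epistasis on the log-fluorescence scale is therefore negative, even though the underlying ΔΔG values are strictly additive.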
Subject(s)
Genetic Fitness, Green Fluorescent Proteins/genetics, Green Fluorescent Proteins/metabolism, Animals, Genetic Epistasis, Molecular Evolution, Fluorescence, Genetic Association Studies, Genotype, Hydrozoa/chemistry, Hydrozoa/genetics, Mutant Proteins/genetics, Mutant Proteins/metabolism, Mutation/genetics, Phenotype
ABSTRACT
Characterizing the fitness landscape, a representation of fitness for a large set of genotypes, is key to understanding how genetic information is interpreted to create functional organisms. Here we determined the evolutionarily relevant segment of the fitness landscape of His3, a gene coding for an enzyme in the histidine synthesis pathway, focusing on combinations of amino acid states found at orthologous sites of extant species. Only 15% of amino acids found in yeast His3 orthologues were always neutral, while the impact on fitness of the remaining 85% depended on the genetic background. Furthermore, at 67% of sites, amino acid replacements were under sign epistasis, having both strongly positive and strongly negative effects in different genetic backgrounds, and 46% of sites were under reciprocal sign epistasis. The fitness impact of amino acid replacements was influenced by only a few genetic backgrounds but involved interactions among multiple sites, shaping a rugged fitness landscape in which many of the shortest paths between highly fit genotypes are inaccessible.
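The epistasis categories used above can be told apart using fitness values for a single mutant square: the reference, two single mutants and the double mutant. The minimal Python sketch below makes two assumptions. Fitness is on an additive scale (for example, log fitness), and a small tolerance defines "no epistasis". The function name and the tolerance are illustrative.

```python
def classify_epistasis(f_wt, f_a, f_b, f_ab, tol=1e-9):
    """Classify pairwise epistasis from fitness on an additive (e.g. log) scale.

    f_wt: reference genotype; f_a, f_b: single mutants; f_ab: double mutant.
    """
    epsilon = f_ab - f_a - f_b + f_wt
    if abs(epsilon) <= tol:
        return "none"
    # Compare each mutation's effect in the reference vs. the other mutant's background.
    a_flips = (f_a - f_wt) * (f_ab - f_b) < 0
    b_flips = (f_b - f_wt) * (f_ab - f_a) < 0
    if a_flips and b_flips:
        return "reciprocal sign"
    if a_flips or b_flips:
        return "sign"
    return "magnitude"
```

Zero effects are treated here as no sign change. How to handle effects that fall within measurement noise is a separate design choice.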
Subject(s)
Molecular Evolution, Fungal Proteins/genetics, Fungal Proteins/metabolism, Genetic Fitness, Yeasts/genetics, Yeasts/metabolism, Amino Acid Sequence, Amino Acid Substitution, Amino Acids/genetics, Amino Acids/metabolism, Genetic Epistasis, Fungal Proteins/chemistry, Fungal Genes, Genotype, Hydro-Lyases/chemistry, Hydro-Lyases/genetics, Hydro-Lyases/metabolism, Genetic Models, Molecular Models, Phylogeny, Saccharomyces cerevisiae/genetics, Saccharomyces cerevisiae/metabolism, Saccharomyces cerevisiae Proteins/chemistry, Saccharomyces cerevisiae Proteins/genetics, Saccharomyces cerevisiae Proteins/metabolism
ABSTRACT
Elevated plasma levels of hyaluronic acid (HA) are a disease marker in liver pathology and other inflammatory disorders. Inhibition of HA synthesis with the coumarin 4-methylumbelliferone (4MU) has a beneficial effect in animal models of fibrosis, inflammation, cancer and metabolic syndrome. 4MU is the active compound of the approved choleretic drug hymecromone, which has low bioavailability and a broad spectrum of action. New, more specific and more efficient inhibitors of hyaluronan synthases (HAS) are therefore required. We tested several newly synthesized coumarin compounds and commercial chitin synthesis inhibitors for their ability to inhibit HA production in a cell culture assay. The coumarin derivative compound VII (10'-methyl-6'-phenyl-3'H-spiro[piperidine-4,2'-pyrano[3,2-g]chromene]-4',8'-dione) inhibited HA secretion by NIH3T3 cells with a half-maximal inhibitory concentration (IC50) of 1.69 ± 0.75 µM, outperforming 4MU (IC50 = 8.68 ± 1.6 µM). The chitin synthesis inhibitors etoxazole, buprofezin and triflumuron reduced HA deposition with IC50 values of 4.21 ± 3.82 µM, 1.24 ± 0.87 µM and 1.48 ± 1.44 µM, respectively. Like 4MU, etoxazole reduced HA production and prevented collagen fibre formation in the CCl4 liver fibrosis model in mice. Bioinformatics analysis revealed homology between chitin synthases and HAS enzymes, particularly in the pore-forming domain, which contains the proposed etoxazole binding site.
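The IC50 values above can be related to expected inhibition with the standard Hill (logistic) dose-response form, in which remaining activity is 1 / (1 + (c/IC50)^h). The minimal Python sketch below assumes a Hill slope h = 1, which the abstract does not report.

```python
def remaining_activity(conc_um, ic50_um, hill=1.0):
    """Fraction of HA secretion remaining at a given inhibitor concentration.

    Uses the Hill/logistic form 1 / (1 + (c / IC50) ** h); hill = 1 is an assumption.
    """
    return 1.0 / (1.0 + (conc_um / ic50_um) ** hill)


IC50_VII = 1.69  # µM, compound VII (from the abstract)
IC50_4MU = 8.68  # µM, 4MU (from the abstract)

vii_at_1_69 = remaining_activity(1.69, IC50_VII)
fourmu_at_1_69 = remaining_activity(1.69, IC50_4MU)
potency_ratio = IC50_4MU / IC50_VII
```

Under this assumption, 1.69 µM of compound VII halves HA secretion, whereas 4MU at the same concentration would leave about 84% of activity. The IC50 ratio corresponds to roughly a fivefold difference in potency.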
Subject(s)
Hyaluronic Acid, Hymecromone, Animals, Chitin, Hyaluronan Synthases/metabolism, Hyaluronic Acid/metabolism, Hymecromone/pharmacology, Mice, NIH 3T3 Cells
ABSTRACT
MOTIVATION: Epistasis, the context dependence of the contribution of an amino acid substitution to fitness, is common in evolution. To detect epistasis, fitness must be measured for at least four genotypes: the reference genotype, two different single mutants and the double mutant carrying both single mutations. For higher-order epistasis of order n, fitness has to be measured for all 2^n genotypes of an n-dimensional hypercube in genotype space, which form a "combinatorially complete dataset". So far, only a handful of such datasets have been produced, by manual curation. Meanwhile, random mutagenesis experiments have produced high-throughput measurements of fitness and other phenotypes, potentially containing a number of combinatorially complete datasets. RESULTS: We present an efficient recursive algorithm for finding all hypercube structures in random mutagenesis experimental data. To test the algorithm, we applied it to data from a recent HIS3 protein dataset and found all 199,847,053 unique combinatorially complete genotype combinations, with dimensionality ranging from two to twelve. The algorithm may be useful for researchers looking for higher-order epistasis in their high-throughput experimental data. AVAILABILITY: https://github.com/ivankovlab/HypercubeME.git. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
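To make the idea of a combinatorially complete dataset concrete, the following Python sketch finds hypercubes in a set of genotypes. It recursively extends a base genotype one mutation at a time and checks that every shifted vertex was measured. It is a simplified illustration, not the HypercubeME implementation, and it rests on three assumptions:

- genotypes are sets of (site, amino acid) substitutions relative to a single reference;
- each added mutation must hit a new site;
- mutations are added in sorted order, so each hypercube is reported once, with its minimal vertex as the base.

```python
def find_hypercubes(genotypes, min_dim=2, max_dim=None):
    """Find hypercubes (combinatorially complete subsets) among measured genotypes.

    genotypes: iterable of collections of (site, amino_acid) substitutions,
    each relative to one reference sequence. Returns (base, mutations) pairs
    such that base plus every subset of mutations is a measured genotype.
    """
    present = {frozenset(g) for g in genotypes}
    all_muts = sorted(set().union(*present))
    cubes = []

    def extend(base, muts, vertices):
        if len(muts) >= min_dim:
            cubes.append((base, tuple(muts)))
        if max_dim is not None and len(muts) == max_dim:
            return
        used_sites = {site for site, _ in base} | {site for site, _ in muts}
        for m in all_muts:
            # Add mutations in sorted order so each cube is reported only once.
            if muts and m <= muts[-1]:
                continue
            if m[0] in used_sites:
                continue
            # Every existing vertex shifted by m must also have been measured.
            shifted = [v | {m} for v in vertices]
            if all(v in present for v in shifted):
                extend(base, muts + [m], vertices + shifted)

    for base in present:
        extend(base, [], [base])
    return cubes
```

For the eight genotypes of a complete three-mutation dataset, this sketch reports the six square faces and the single cube. The search is exponential in the worst case, which is why an efficient algorithm matters at the scale reported above.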
ABSTRACT
Motivation: Computational prediction of the effect of mutations on protein stability is used by researchers in many fields. The utility of the prediction methods is affected by their accuracy and bias. Bias, a systematic shift of the predicted change of stability, has been noted as an issue for several methods but has not been investigated systematically. The presence of bias may lead to misleading results, especially when exploring the effects of combinations of different mutations. Results: Here we use a protocol to measure the bias as a function of the number of introduced mutations. It is based on a self-consistency test of the reciprocity of the effect of a mutation. An advantage of this approach is that it relies solely on crystal structures, without experimentally measured stability values. We applied the protocol to four popular algorithms predicting the change of protein stability upon mutation (FoldX, Eris, Rosetta and I-Mutant) and found an inherent bias. For one program, FoldX, we managed to substantially reduce the bias using additional relaxation by Modeller. Researchers using algorithms for predicting the effects of mutations should be aware of the bias described here. Availability and implementation: All calculations were implemented in in-house Perl scripts. Supplementary information: Supplementary data are available at Bioinformatics online. Note: The article 10.1093/bioinformatics/bty348, published alongside this paper, also addresses the problem of biases in protein stability change predictions.
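The reciprocity idea can be stated simply. For an unbiased predictor, the predicted ΔΔG of a forward substitution and that of its reverse (predicted from the mutant structure back to the original) should sum to zero. The Python sketch below only aggregates such forward/reverse pairs by the number of mutations separating the two structures. The record format is an assumption, and the published protocol's structure selection and statistics are not reproduced here.

```python
from collections import defaultdict
from statistics import mean


def reciprocity_bias(records):
    """Mean of (forward + reverse) predicted ddG, grouped by number of mutations.

    records: iterable of (n_mutations, ddg_forward, ddg_reverse) in kcal/mol.
    A perfectly self-consistent predictor gives 0 for every group.
    """
    sums_by_n = defaultdict(list)
    for n_mutations, ddg_forward, ddg_reverse in records:
        sums_by_n[n_mutations].append(ddg_forward + ddg_reverse)
    return {n: mean(values) for n, values in sorted(sums_by_n.items())}


# Hypothetical predictions for illustration.
records = [(1, 1.0, -0.8), (1, -0.5, 0.7), (2, 2.0, -1.0)]
bias = reciprocity_bias(records)
```

A group mean that drifts away from zero as the number of mutations grows is the kind of systematic shift the protocol is designed to expose.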
Subject(s)
Proteins/genetics, Software, Algorithms, Bias, Mutation, Protein Stability
ABSTRACT
The ability of protein chains to spontaneously form their spatial structures is a long-standing puzzle in molecular biology. Experimentally measured rates of spontaneous folding of single-domain globular proteins range from microseconds to hours: the difference (11 orders of magnitude) is akin to the difference between the life span of a mosquito and the age of the universe. Here, we show that physical theory with biological constraints outlines a "golden triangle" limiting the possible range of folding rates for single-domain globular proteins of various sizes and stabilities. The experimentally measured folding rates fall within this narrow triangle, which is built without any adjustable parameters, and fill it almost completely. In addition, the golden triangle predicts the maximal size of protein domains that fold under solely thermodynamic (rather than kinetic) control. It also predicts the maximal allowed size of "foldable" protein domains, and the size of domains found in known protein structures is in good agreement with this limit.
Subject(s)
Biological Models, Molecular Models, Protein Folding, Protein Tertiary Structure/physiology, Proteins/metabolism, Biophysics, Thermodynamics
ABSTRACT
Regulated intramembrane proteolysis (RIP) is a critical mechanism for intercellular communication that regulates the function of membrane proteins through sequential proteolysis. RIP typically starts with ectodomain shedding of membrane proteins by extracellular membrane-bound proteases, followed by intramembrane proteolysis of the resulting membrane-tethered fragment. However, for the majority of RIP proteases, the corresponding substrates, and thus their functions, remain unknown. Proteome-wide identification of RIP protease substrates is possible by mass spectrometry-based quantitative comparison of RIP substrates or their cleavage products between different biological states. However, this requires quantification of peptides from only the ectodomain or only the cytoplasmic domain, and current analysis software does not allow peptides to be matched to either domain. Here we present the QARIP (Quantitative Analysis of Regulated Intramembrane Proteolysis) web server, which matches identified peptides to the protein transmembrane topology. QARIP allows determination of quantitative ratios separately for the topological domains (cytoplasmic, ectodomain) of a given protein. It is thus a powerful tool for quality control, improvement of quantitative ratios and identification of novel substrates in proteomic RIP datasets. To our knowledge, the QARIP web server is the first tool directly addressing the phenomenon of RIP. The web server is available at http://webclu.bio.wzw.tum.de/qarip/. This website is free and open to all users and there is no login requirement.
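The core step described above is assigning peptides to topological domains and computing per-domain ratios. It can be sketched in Python as follows. This is not QARIP's code, and three choices are assumptions for illustration:

- the topology format, which uses 1-based inclusive intervals with labels;
- the rule that a peptide must lie entirely within one domain;
- the use of the median ratio.

```python
from statistics import median


def domain_ratios(topology, peptides):
    """Median quantitative ratio per topological domain.

    topology: list of (start, end, label) with 1-based inclusive residue ranges.
    peptides: list of (start, end, ratio) for identified peptides.
    Peptides that span a domain boundary are left unassigned.
    """
    ratios_by_domain = {}
    for pep_start, pep_end, ratio in peptides:
        for dom_start, dom_end, label in topology:
            if dom_start <= pep_start and pep_end <= dom_end:
                ratios_by_domain.setdefault(label, []).append(ratio)
                break
    return {label: median(values) for label, values in ratios_by_domain.items()}


# Hypothetical single-pass membrane protein and peptide ratios.
topology = [(1, 100, "ectodomain"), (101, 121, "transmembrane"), (122, 200, "cytoplasmic")]
peptides = [(10, 20, 0.5), (30, 45, 0.7), (150, 160, 1.2), (95, 110, 9.9)]
ratios = domain_ratios(topology, peptides)
```

In this toy input, the boundary-spanning peptide (residues 95-110) is excluded, so its outlying ratio does not distort either domain's value.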
Subject(s)
Membrane Proteins/metabolism, Software, Aspartic Acid Endopeptidases/metabolism, HEK293 Cells, Humans, Internet, Mass Spectrometry, Membrane Proteins/chemistry, Peptides/analysis, Protein Tertiary Structure, Proteolysis, Proteomics
ABSTRACT
Over the last 5 years, proteogenomics (using mass spectrometry to identify proteins predicted from genomic sequences) has emerged as a promising approach to the high-throughput identification of protein N-termini, which remains a problem in genome annotation. Comparing experimentally determined N-termini with those predicted by sequence analysis tools allows signal peptides to be identified. This in turn supports conclusions about the cytoplasmic or extracytoplasmic (periplasmic or extracellular) localization of the respective proteins. We present here the results of a proteogenomic study of the signal peptides in Escherichia coli K-12 and compare them with the available experimental data and with predictions by software tools such as SignalP and Phobius. A single proteogenomics experiment recovered more than a third of all signal peptides that had been experimentally determined during the past three decades. It also confirmed at least 31 additional signal peptides, mostly in known exported proteins, which had been previously predicted but not validated. Filtering putative signal peptides by peptide length, the presence of an eight-residue hydrophobic patch and a typical signal peptidase cleavage site proved sufficient to eliminate false-positive hits. Surprisingly, the results of this proteogenomics study, as well as a re-analysis of the E. coli genome with the latest version of the SignalP program, show that the fraction of proteins containing signal peptides is only about 10%, half of previous estimates.
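The three filters named above can be expressed as a short Python check on a candidate signal peptide, that is, the sequence from the start codon up to the putative cleavage site. Several parameters are assumptions chosen for illustration, not the thresholds used in the study:

- the length window;
- the residue set counted as hydrophobic;
- the residues allowed at positions -1 and -3 before the cleavage site, a simplified form of the -3/-1 rule.

```python
HYDROPHOBIC = set("AVLIFMWC")  # assumed hydrophobic residue set
ALLOWED_MINUS_1 = set("AGS")  # assumed small residues allowed at position -1
ALLOWED_MINUS_3 = set("AGSTVCIL")  # assumed residues allowed at position -3


def has_hydrophobic_patch(seq, patch_len=8):
    """True if seq contains patch_len consecutive hydrophobic residues."""
    return any(
        all(res in HYDROPHOBIC for res in seq[i:i + patch_len])
        for i in range(len(seq) - patch_len + 1)
    )


def passes_signal_peptide_filters(signal_peptide, min_len=15, max_len=40):
    """Apply length, hydrophobic-patch and cleavage-site filters.

    signal_peptide: residues from the N-terminus up to the putative cleavage
    site, so its last residue is position -1 relative to the cleavage.
    """
    if not min_len <= len(signal_peptide) <= max_len:
        return False
    if not has_hydrophobic_patch(signal_peptide):
        return False
    return (signal_peptide[-1] in ALLOWED_MINUS_1
            and signal_peptide[-3] in ALLOWED_MINUS_3)
```

For example, the illustrative sequence "MKKTAIAIAVALAGFATVAQA" passes all three filters. Changing its last residue to K makes it fail the cleavage-site check.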
Subject(s)
Escherichia coli K12/chemistry, Peptides/analysis, Protein Sorting Signals, Proteome, Base Sequence, Mass Spectrometry, Membrane Proteins/analysis, Peptide Mapping, Peptides/classification, Proteins/analysis, Proteins/classification, Sequence Analysis, Serine Endopeptidases/analysis, Software
ABSTRACT
The molecular toxicity of the uranyl ion (UO₂²⁺) in living cells is primarily determined by its high affinity for both native and potential metal-binding sites that commonly occur in the structure of biomolecules. Recent advances in computational and experimental research have shed light on the structural properties and functional impacts of uranyl binding to proteins, organic ligands, nucleic acids, and their complexes. In the present work, we report the results of a computational investigation of the uranyl-mediated loss of DNA-binding activity of PARP-1, a eukaryotic enzyme that participates in DNA repair, cell differentiation, and the induction of inflammation. The latest experimental studies have shown that the uranyl ion directly interacts with its DNA-binding subdomains, the zinc fingers Zn1 and Zn2, and alters their tertiary structure. Here, we propose an atomistic mechanism underlying this process and compute the free energy change along the suggested pathway. Our quantum mechanics/molecular mechanics (QM/MM) simulations of the Zn2-UO₂²⁺ complex indicate that the uranyl ion replaces zinc in its native binding site. However, the resulting state is destroyed by spontaneous internal hydrolysis of the U-Cys162 coordination bond. Although the enthalpy of hydrolysis is +2.8 kcal/mol, the overall reaction free energy change is -0.6 kcal/mol, which is attributed to the loss of the domain's native tertiary structure, originally maintained by the zinc ion. The subsequent reorganization of the binding site includes the association of the uranyl ion with the Glu190/Asp191 acidic cluster. It also involves significant perturbations in the domain's tertiary structure, driven by a further decrease in free energy of 6.8 kcal/mol. The disruption of the DNA-binding interface revealed in our study is consistent with previous experimental findings and explains the loss of affinity of PARP-like zinc fingers for nucleic acids.
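The thermodynamic argument above can be made explicit with ΔG = ΔH - TΔS. This assumes that the reported enthalpy and free energy refer to the same hydrolysis step at the same temperature. On that assumption, the implied entropic term is favorable, and the two reported free energy steps add up along the pathway:

```latex
\Delta G_{\mathrm{hydr}} = \Delta H_{\mathrm{hydr}} - T\Delta S_{\mathrm{hydr}}
\quad\Longrightarrow\quad
T\Delta S_{\mathrm{hydr}} = \Delta H_{\mathrm{hydr}} - \Delta G_{\mathrm{hydr}}
  = (+2.8) - (-0.6) = +3.4\ \mathrm{kcal/mol}

\Delta G_{\mathrm{total}} = \Delta G_{\mathrm{hydr}} + \Delta G_{\mathrm{reorg}}
  = -0.6 + (-6.8) = -7.4\ \mathrm{kcal/mol}
```

The second line further assumes that the 6.8 kcal/mol decrease is measured from the hydrolyzed state. The total is then relative to the state in which uranyl occupies the native zinc site.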
Subject(s)
Nucleic Acids, Poly(ADP-ribose) Polymerase-1, Computer Simulation, Protein Domains, DNA
ABSTRACT
AlphaFold changed the field of structural biology by achieving three-dimensional (3D) structure prediction from protein sequence at experimental quality. Its astounding success even led to claims that the protein folding problem is "solved". However, the protein folding problem is more than just structure prediction from sequence. At present, it is unknown whether the AlphaFold-triggered revolution could help to solve other problems related to protein folding. Here we assess the ability of AlphaFold to predict the impact of single mutations on protein stability (ΔΔG) and function. To study the question we extracted the pLDDT and
Subject(s)
Protein Folding, Proteins, Proteins/chemistry, Mutation, Amino Acid Sequence, Protein Stability
ABSTRACT
The ability of protein chains to spontaneously form their three-dimensional structures is a long-standing mystery in molecular biology. The most conceptual aspect of this mystery is how the protein chain can find its native, "working" spatial structure (which, for protein chains that are not too large, corresponds to the global free energy minimum) in a biologically reasonable time. It must do so without exhaustive enumeration of all possible conformations, which would take billions of years. This is the so-called "Levinthal's paradox." In this review, we discuss the key ideas and discoveries leading to the current understanding of protein folding kinetics, including folding landscapes and funnels, free energy barriers on the folding/unfolding pathways, and the solution of Levinthal's paradox. A special role here is played by the "all-or-none" phase transition that occurs during protein folding and unfolding. Another is played by the point of thermodynamic (and kinetic) equilibrium between the "native" and "unfolded" phases of the protein chain, where the theory takes its simplest form. The modern theory provides an understanding of key features of protein folding and, in good agreement with experiments, it (i) outlines the chain length-dependent range of protein folding times and (ii) predicts the observed maximal size of "foldable" proteins and domains. In addition, it predicts the maximal size of proteins and domains that fold under solely thermodynamic (rather than kinetic) control. Complementarily, a theoretical analysis of the number of possible protein folding patterns, performed at the level of formation and assembly of secondary structures, correctly outlines the upper limit of protein folding times.
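The "billions of years" figure above follows from a back-of-the-envelope Levinthal estimate. The minimal Python sketch below assumes three conformations per residue and a sampling rate of 10^13 conformations per second. These values come from common textbook versions of the argument, not from this review specifically.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def levinthal_search_years(n_residues, states_per_residue=3, rate_per_s=1e13):
    """Years needed to visit every conformation once by exhaustive search.

    states_per_residue and rate_per_s are illustrative textbook assumptions.
    """
    conformations = states_per_residue ** n_residues
    return conformations / rate_per_s / SECONDS_PER_YEAR


years_100 = levinthal_search_years(100)
```

For a 100-residue chain, this gives 3^100 ≈ 5 × 10^47 conformations and a search time on the order of 10^27 years, far longer than the age of the universe.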
ABSTRACT
Here we present KineticDB, a systematically compiled database of protein folding kinetics that contains about 90 unique proteins. The main goal of KineticDB is to provide users with a diverse set of experimentally determined protein folding rates. The search for the determinants of protein folding is still in progress, with the aim of gaining a new understanding of the folding process. During the last 10 years, comparison with experimental protein folding rates has been the main tool for validating both theoretical models and empirical relationships. It is therefore necessary to provide researchers with as much data as possible in a simple and easy-to-use way. At present, KineticDB contains the results of folding kinetics measurements of single-domain proteins and separate protein domains, as well as short peptides without disulfide bonds. It includes data on about 90 unique proteins and many mutants, systematically accumulated over the last 10 years. It is the largest collection of protein folding kinetic data presented as a database. KineticDB is available at http://kineticdb.protres.ru/db/index.pl.
Subject(s)
Protein Databases, Protein Tertiary Structure, Kinetics, Peptides/chemistry, Protein Folding
ABSTRACT
"How do proteins fold?" Researchers have been studying different aspects of this question for more than 50 years. The most conceptual aspect of the problem is how a protein can find the global free energy minimum in a biologically reasonable time without exhaustive enumeration of all possible conformations, the so-called "Levinthal's paradox." Less conceptual but still critical are the factors that determine the folding times of particular proteins and the prospects of machine learning for predicting them. In this review, we discuss the key ideas and discoveries leading to the current understanding of folding kinetics, including the solution of Levinthal's paradox, as well as the current state of the art in the prediction of protein folding times.
Subject(s)
Protein Folding, Proteins/chemistry, Proteins/metabolism, Entropy, Kinetics, Protein Conformation, Thermodynamics
ABSTRACT
We have demonstrated that, among proteins of the same size, alpha/beta proteins on average have a greater number of contacts per residue because of their more compact (more "spherical") structure rather than tighter packing. We examined the relationship between the average number of contacts per residue and folding rates in globular proteins according to general structural class (all-alpha, all-beta, alpha/beta, alpha+beta). Our analysis demonstrates that alpha/beta proteins have both the greatest number of contacts and the slowest folding rates compared with proteins from the other structural classes. Because alpha/beta proteins are also known to be the oldest proteins, it can be suggested that proteins have evolved to pack more quickly and into looser structures.
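The minimal Python sketch below computes the average number of contacts per residue. It assumes one representative atom per residue (for example, Cα), an 8 Å distance cutoff and exclusion of near-sequence neighbours (|i - j| < 3). These definitions are illustrative and may differ from those used in the study. Each contact is shared by two residues, hence the factor of 2.

```python
import math


def contacts_per_residue(coords, cutoff=8.0, min_separation=3):
    """Average number of contacts per residue.

    coords: one (x, y, z) position per residue, in Angstroms (e.g. C-alpha atoms).
    A contact is a residue pair closer than `cutoff` whose sequence separation
    is at least `min_separation`; both thresholds are illustrative assumptions.
    """
    n = len(coords)
    if n == 0:
        return 0.0
    contacts = 0
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) < cutoff:
                contacts += 1
    # Each contact involves two residues.
    return 2 * contacts / n
```

With this definition, a more spherical chain brings more sequence-distant residues within the cutoff than an extended chain of the same length. This is the geometric effect the abstract attributes to alpha/beta proteins.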
Subject(s)
Protein Folding, Proteins/chemistry, Protein Conformation
ABSTRACT
We have demonstrated here that protein compactness is one of the factors determining the mechanism of protein folding. We define compactness as the ratio of the accessible surface area of a protein to that of an ideal sphere of the same volume, so lower values indicate a more compact protein. Proteins with multi-state kinetics are, on average, more compact (compactness 1.49 ± 0.02 for proteins within the size range of 101-151 amino acid residues) than proteins with two-state kinetics (compactness 1.59 ± 0.03 for proteins within the same size range). We have shown that the compactness of homologous proteins can explain both the difference in their folding rates and the difference in their folding mechanisms.
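The compactness measure defined above compares a protein's accessible surface area with the surface of a sphere of equal volume. That surface is 4πr² with r = (3V/4π)^(1/3), which simplifies to (36π)^(1/3) V^(2/3). The minimal Python sketch below makes two assumptions. First, the ASA and volume have already been computed by an external tool. Second, it uses the geometric sphere surface; whether the original definition adds a probe radius to the sphere is not stated in the abstract.

```python
import math


def sphere_surface_for_volume(volume):
    """Surface area of a sphere with the given volume: (36*pi)**(1/3) * V**(2/3)."""
    return (36 * math.pi) ** (1 / 3) * volume ** (2 / 3)


def compactness(accessible_surface_area, volume):
    """Ratio of a protein's ASA to the surface of an equal-volume sphere."""
    return accessible_surface_area / sphere_surface_for_volume(volume)
```

In this sketch, a perfect sphere scores exactly 1.0, and larger values mean a larger departure from a spherical shape.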
Subject(s)
Chemical Models, Molecular Models, Protein Folding, Proteins/chemistry, Proteins/ultrastructure, Computer Simulation, Protein Conformation
ABSTRACT
The first part of this paper contains an overview of protein structures, their spontaneous formation ("folding"), and the thermodynamic and kinetic aspects of this phenomenon, as revealed by in vitro experiments. It is stressed that universal features of folding are observed near the point of thermodynamic equilibrium between the native and denatured states of the protein. Here the "two-state" ("denatured state" <--> "native state") transition proceeds without accumulation of metastable intermediates, but includes only the unstable "transition state". This state, which is the most unstable in the folding pathway, and its structured core (a "nucleus") are distinguished by their essential influence on the folding/unfolding kinetics. In the second part of the paper, a theory of protein folding rates and related phenomena is presented. First, it is shown that the protein size determines the range of a protein's folding rates in the vicinity of the point of thermodynamic equilibrium between the native and denatured states of the protein. Then, we present methods for calculating folding and unfolding rates of globular proteins from their sizes, stabilities and either 3D structures or amino acid sequences. Finally, we show that the same theory outlines the location of the protein folding nucleus (i.e., the structured part of the transition state) in reasonable agreement with experimental data.
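For the two-state transition discussed above, folding and unfolding rates are linked to stability by the standard relation k_f / k_u = exp(-ΔG/RT), where ΔG = G_native - G_denatured. At the point of thermodynamic equilibrium, ΔG = 0 and k_f = k_u. The minimal Python sketch below implements this textbook relation. It is not the size-based rate theory presented in the paper.

```python
import math

R = 1.987e-3  # gas constant, kcal/(mol*K)


def unfolding_rate(k_f, dg_fold, temperature=298.15):
    """Unfolding rate from folding rate and stability for a two-state protein.

    dg_fold = G_native - G_denatured in kcal/mol (negative for a stable protein),
    so k_u = k_f * exp(dg_fold / RT) is smaller than k_f when dg_fold < 0.
    """
    return k_f * math.exp(dg_fold / (R * temperature))


def observed_rate(k_f, k_u):
    """Observed relaxation rate of a two-state reaction."""
    return k_f + k_u
```

At the equilibrium point, the observed relaxation rate is simply twice the folding rate. This is one reason kinetics measured near that point are convenient to interpret.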