ABSTRACT
We describe the machine learning tool that we applied in the CAGI 6 experiment to predict whether single-residue mutations in proteins are deleterious or benign. The tool was trained using only single sequences, i.e., without multiple sequence alignments or structural information; instead, we used global characterizations of the protein sequence. Training and testing data for human gene mutations were obtained from ClinVar (ncbi.nlm.nih.gov/pub/ClinVar/), and for non-human gene mutations from UniProt (www.uniprot.org). Testing was done on ClinVar data released after training. This testing yielded a high AUC and Matthews correlation coefficient (MCC) for well-trained examples but low generalizability: for genes with either sparse or unbalanced training data, the prediction accuracy is poor. The resulting prediction server is available online at http://www.mamiris.com/Shoni.cagi6.
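The two metrics reported above can be computed from binary labels and predicted scores as follows. This is a minimal sketch, not the CAGI 6 tool's code: the function names, the 1 = deleterious / 0 = benign encoding and the 0.5 decision threshold are assumptions made for illustration.

```python
import math
from typing import Sequence


def mcc(labels: Sequence[int], preds: Sequence[int]) -> float:
    """Matthews correlation coefficient (assumed encoding: 1 = deleterious, 0 = benign)."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is 0 when any marginal count is empty (e.g. one predicted class only).
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the chance that a random positive outscores a random negative (ties = 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    y = [1, 1, 0, 0, 1, 0]
    s = [0.9, 0.7, 0.4, 0.2, 0.3, 0.6]          # hypothetical predictor scores
    yhat = [1 if v >= 0.5 else 0 for v in s]    # assumed 0.5 threshold
    print(mcc(y, yhat), auc(y, s))
```

AUC is computed from the raw scores, whereas MCC needs thresholded predictions, so the two can diverge on unbalanced data; that is one reason to report both.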
Subjects
Machine Learning, Missense Mutation, Humans, Missense Mutation/genetics, Software, Computational Biology/methods, Proteins/genetics
ABSTRACT
Kinesin-mediated transport along microtubules is critical for axon development and health. Mutations in the kinesin Kif21a, or in the microtubule subunit β-tubulin, inhibit axon growth and/or maintenance, resulting in the eye-movement disorder congenital fibrosis of the extraocular muscles (CFEOM). While most examined CFEOM-causing β-tubulin mutations inhibit kinesin-microtubule interactions, Kif21a mutations activate the motor protein. These contrasting observations have led to opposing models of inhibited or hyperactive Kif21a in CFEOM. We show that, unlike other CFEOM-causing β-tubulin mutations, R380C enhances kinesin activity. Expression of β-tubulin-R380C increases kinesin-mediated peroxisome transport in S2 cells. The binding frequency, percentage of motile engagements, run length and plus-end dwell time of Kif21a are also elevated on β-tubulin-R380C compared with wild-type microtubules in vitro. This conserved effect persists across tubulins from multiple species and kinesins from different families. The enhanced activity is independent of tail-mediated kinesin autoinhibition and thus uses a mechanism distinct from that of CFEOM-causing Kif21a mutations. Using molecular dynamics, we visualize how β-tubulin-R380C allosterically alters critical structural elements within the kinesin motor domain, suggesting a basis for the enhanced motility. These findings resolve the disparate models and confirm that both inhibited and increased kinesin activity can contribute to CFEOM. They also demonstrate the microtubule's role in regulating kinesins and highlight the importance of balanced transport for cellular and organismal health.
Subjects
Ophthalmoplegia, Tubulin, Humans, Tubulin/metabolism, Kinesins/metabolism, Ophthalmoplegia/genetics, Ophthalmoplegia/metabolism, Mutation/genetics, Microtubules/metabolism, Motor Activity
ABSTRACT
Understanding protein sequences and how they relate to protein function is extremely important. One of the most basic operations in bioinformatics is sequence alignment, and usually the first thing learned from an alignment is which positions are the most conserved; these are often critical parts of the structure, such as enzyme active-site residues. In addition, the contact pairs in a protein usually correspond closely to the correlations between residue positions in the multiple sequence alignment: such pairs usually change in a systematic and coordinated way, so that if one position changes, the other member of the pair also changes to compensate. In the present work, these correlated pairs are taken as anchor points for a new type of sequence alignment. The main advantage of the method is that it combines the remote homolog detection of our method PROST with the pairwise sequence substitutions of the rigorous method of Kleinjung et al. We show a few examples of the resulting sequence alignments and how they can improve alignments for function, even for a disordered protein.
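Correlated alignment positions are often found by scoring pairs of columns in a multiple sequence alignment. The sketch below uses plain mutual information (MI); it is an illustration under that assumption, not the anchoring procedure of this work, and it omits the corrections (sequence weighting, average product correction) that practical coevolution methods apply. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple


def column_mi(msa: List[str], i: int, j: int) -> float:
    """Mutual information (in nats) between alignment columns i and j."""
    n = len(msa)
    pi = Counter(seq[i] for seq in msa)
    pj = Counter(seq[j] for seq in msa)
    pij = Counter((seq[i], seq[j]) for seq in msa)
    mi = 0.0
    for (a, b), count in pij.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((pi[a] / n) * (pj[b] / n)))
    return mi


def top_pairs(msa: List[str], k: int = 3) -> List[Tuple[float, int, int]]:
    """Column pairs ranked by MI; high-scoring pairs are candidate anchor points."""
    length = len(msa[0])
    scored = [(column_mi(msa, i, j), i, j)
              for i in range(length) for j in range(i + 1, length)]
    return sorted(scored, reverse=True)[:k]


if __name__ == "__main__":
    toy_msa = ["AKLE", "AKVE", "GRLE", "GRVE"]  # hypothetical alignment
    print(top_pairs(toy_msa, k=2))
```

In the toy alignment, columns 0 and 1 change together (A with K, G with R), so they receive MI = ln 2, while independently varying or fully conserved columns score 0.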
ABSTRACT
Motivation: Presenting the integrated results of bioinformatics research can be challenging and requires sophisticated visualization components, which can be time-consuming to develop. This article presents a new way to communicate research findings effectively. Results: We have developed a static web page generator, JSONWP, which is specifically designed for protein bioinformatics research. Using React (a JavaScript library for building interactive and dynamic user interfaces for web applications), we have integrated publicly available bioinformatics visualization components and provide standardized access to them. JSON (JavaScript Object Notation), a lightweight textual data format often used to structure and exchange information between software tools, is used as the input source because it can represent nearly all types of data as key-value pairs. This allows researchers to use their preferred programming language to create a JSON representation, which JSONWP then converts into a website. No server or domain is required to host the website; only a publicly accessible JSON file is needed. Conclusions: Overall, JSONWP provides a useful new tool for bioinformatics researchers to communicate their findings effectively. The open-source implementation is located at https://github.com/MesihK/react-json-wpbuilder, and the tool can be used at jsonwp.onrender.com.
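The workflow described above starts from any program that can write JSON. A minimal Python sketch follows; the keys (`title`, `sections`, `type` and so on) are hypothetical placeholders and do not reflect the schema JSONWP actually reads, which is defined in the project's repository.

```python
import json

# Hypothetical page description: these keys and values are illustrative
# placeholders, not the input schema that JSONWP defines.
page = {
    "title": "Example protein analysis",
    "sections": [
        {"type": "text", "content": "Summary of results."},
        {"type": "table", "columns": ["residue", "score"],
         "rows": [["A12", 0.91], ["K45", 0.37]]},
    ],
}

# Serialize to a JSON string; this text would be saved to a publicly
# accessible file whose URL is then given to the page generator.
text = json.dumps(page, indent=2)
print(text)
```

Any language with a JSON serializer can produce the same file, which is the point of using JSON as the interchange format.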
ABSTRACT
Cadherin intermolecular interactions are critical for cell-cell adhesion and play essential roles in tissue formation and the maintenance of tissue structures. In this study, we focus on E-cadherin, a classical cadherin that connects epithelial cells, to understand how E-cadherin molecules interact in cis and trans conformations, i.e., when attached to the same cell or to opposing cells. We employ coevolutionary sequence analysis and molecular dynamics simulations to confirm previously known interaction sites and to identify new ones. The sequence coevolution analysis yields a surprising result: there are no strongly favored intermolecular interaction sites. This is unusual and suggests that many interaction sites may be possible, with none strongly preferred over the others. Using molecular dynamics, we test the persistence of these interactions and how they facilitate adhesion. We build several types of cadherin assemblages, with different numbers and combinations of cis and trans interfaces, to understand how these conformations facilitate adhesion. Our results suggest that, in addition to the established interaction sites on the EC1 and EC2 domains, a plausible additional cis interface exists at the EC3-EC5 domains. Furthermore, we identify specific mutations at cis/trans binding sites that impair adhesion within E-cadherin assemblages.
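Interaction persistence in a trajectory is commonly summarized as the fraction of frames in which a contact is present. The sketch below assumes coordinates have already been read from a trajectory into plain tuples and uses a 4.5 Å distance cutoff; both the cutoff and the function names are illustrative assumptions, not the settings used in this study.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vec = Tuple[float, float, float]


def contact_persistence(frames: Sequence[Sequence[Vec]], i: int, j: int,
                        cutoff: float = 4.5) -> float:
    """Fraction of frames in which particles i and j lie within `cutoff` (Å)."""
    hits = sum(1 for frame in frames if math.dist(frame[i], frame[j]) <= cutoff)
    return hits / len(frames)


def interface_persistence(frames: Sequence[Sequence[Vec]],
                          pairs: List[Tuple[int, int]],
                          cutoff: float = 4.5) -> Dict[Tuple[int, int], float]:
    """Persistence of each candidate cis/trans interface contact (hypothetical helper)."""
    return {(i, j): contact_persistence(frames, i, j, cutoff) for i, j in pairs}
```

A contact that holds in nearly every frame indicates a stable interface, whereas one that appears only occasionally is a weaker candidate.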
Subjects
Cadherins, Binding Sites, Cadherins/chemistry, Cadherins/metabolism, Cell Adhesion, Mutation, Protein Binding, Animals, Mice
ABSTRACT
MOTIVATION: Allostery enables binding at one site to change the dynamic behavior of a protein at distant positions. Here, we present APOP, a new allosteric pocket prediction method, which perturbs the pockets formed in the structure by stiffening the pairwise interactions of the elastic network across each pocket, to emulate ligand binding. Ranking the pockets by the resulting shifts in the global mode frequencies, together with their mean local hydrophobicities, leads to high prediction success when tested on a dataset of allosteric proteins composed of both monomers and multimeric assemblages. RESULTS: Out of the 104 test cases, APOP ranks a known allosteric pocket within the top 3 of the candidate pockets in the protein for 92. In addition, we demonstrate that APOP can also find new alternative allosteric pockets in proteins. A particularly interesting finding is the discovery of previously overlooked large pockets located at the centers of many biological protein assemblages; binding of ligands at these sites would likely be particularly effective in changing the protein's global dynamics. AVAILABILITY AND IMPLEMENTATION: APOP is freely available as open-source code (https://github.com/Ambuj-UF/APOP) and as a web server at https://apop.bb.iastate.edu/.
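The perturbation idea can be sketched with a Gaussian network model (GNM) on a toy network: stiffen the springs between residues lining a pocket and measure how much the lowest nonzero eigenvalues (the global mode frequencies) shift. This is a hypothetical illustration; APOP's actual choice of perturbed pairs, stiffening factor, number of modes and hydrophobicity weighting are not reproduced here, and the small Jacobi eigensolver stands in for a numerical library.

```python
import math
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]


def kirchhoff(n: int, springs: Dict[Edge, float]) -> List[List[float]]:
    """Kirchhoff (weighted graph Laplacian) matrix of a Gaussian network model."""
    g = [[0.0] * n for _ in range(n)]
    for (i, j), k in springs.items():
        g[i][j] -= k
        g[j][i] -= k
        g[i][i] += k
        g[j][j] += k
    return g


def eigenvalues(a: List[List[float]], sweeps: int = 100) -> List[float]:
    """Sorted eigenvalues of a small symmetric matrix via cyclic Jacobi rotations."""
    a = [row[:] for row in a]
    n = len(a)
    for _ in range(sweeps):
        if sum(a[p][q] ** 2 for p in range(n) for q in range(n) if p != q) < 1e-20:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A R (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- R^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


def pocket_score(n: int, springs: Dict[Edge, float], pocket: Set[int],
                 factor: float = 10.0, modes: int = 3) -> float:
    """Summed relative shift of the lowest nonzero modes after stiffening springs
    between pocket residues (hypothetical stand-in for ligand binding).
    Assumes a connected network, so exactly one eigenvalue is zero."""
    stiff = {e: k * factor if e[0] in pocket and e[1] in pocket else k
             for e, k in springs.items()}
    before = eigenvalues(kirchhoff(n, springs))[1:1 + modes]  # skip the zero mode
    after = eigenvalues(kirchhoff(n, stiff))[1:1 + modes]
    return sum((lam_a - lam_b) / lam_b for lam_a, lam_b in zip(after, before))
```

Stiffening only adds positive semidefinite terms to the Kirchhoff matrix, so the sorted eigenvalues cannot decrease; candidate pockets differ in how large the shift is, which is what a ranking can exploit.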
Subjects
Proteins, Software, Proteins/chemistry, Ligands, Protein Binding, Binding Sites, Protein Conformation, Allosteric Site
ABSTRACT
There are several hundred million protein sequences, but the relationships among them are not fully available from existing homolog detection methods. There is an essential need for an improved method to push homolog detection to lower levels of sequence identity. The method used here relies on a language model to represent proteins numerically as a matrix (an embedding) and uses discrete cosine transforms to compress the data, extracting the most essential part and significantly reducing the data size. This PRotein Ortholog Search Tool (PROST) is significantly faster, with linear runtimes, and, most importantly, it computes the distances between pairs of protein sequences to yield homologs at significantly lower levels of sequence identity than previous methods. The extent of allosteric effects in proteins points to the importance of global aspects of structure and sequence. PROST excels at global homology detection but not at detecting local homologs. Results are validated by strong similarities between the corresponding pairs of structures. The number of remote homologs detected increases significantly, pushing the effective sequence matches more deeply into the twilight zone. For human protein sequences that presently have no assigned function, PROST finds significant numbers of putative homologs in 93% of cases and structurally verified functional assignments for 76.4% of these cases. The data compression enables massive searches for homologs with short search times while yielding significant gains in the number of remote homologs detected. The method is sufficiently efficient to permit whole-genome/proteome comparisons. The PROST web server is accessible at https://mesihk.github.io/prost.
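The compression step can be illustrated with a discrete cosine transform (DCT-II) applied along the sequence axis of a per-residue embedding, keeping only the lowest-frequency coefficients so that proteins of any length map to vectors of the same size. This is a sketch under stated assumptions: the one-dimensional transform, the number of kept coefficients and the Manhattan distance are illustrative choices, not PROST's published settings, and the embedding itself (which would come from a protein language model) is replaced by toy numbers.

```python
import math
from typing import List


def dct2(x: List[float]) -> List[float]:
    """DCT-II scaled by 1/N: X_k = (1/N) * sum_n x_n * cos(pi/N * (n + 1/2) * k)."""
    n = len(x)
    return [sum(x[t] * math.cos(math.pi / n * (t + 0.5) * k) for t in range(n)) / n
            for k in range(n)]


def compress(embedding: List[List[float]], keep: int) -> List[float]:
    """Map an L x d per-residue embedding to a fixed-length vector of d * keep numbers.

    Each embedding dimension is transformed along the sequence axis and only the
    `keep` lowest-frequency coefficients are retained (hypothetical setting).
    """
    length, dim = len(embedding), len(embedding[0])
    if length < keep:
        raise ValueError("sequence is shorter than the number of kept coefficients")
    out: List[float] = []
    for d in range(dim):
        column = [embedding[r][d] for r in range(length)]
        out.extend(dct2(column)[:keep])
    return out


def manhattan(a: List[float], b: List[float]) -> float:
    """L1 distance between two compressed vectors (an assumed choice of metric)."""
    return sum(abs(x - y) for x, y in zip(a, b))
```

Because every protein ends up as a vector of the same length, distances can be computed in time independent of sequence length once the vectors are stored.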
Subjects
Data Compression, Proteome, Humans, Amino Acid Sequence, Search Engine, Genome, Protein Databases
ABSTRACT
The sequence correlations within a protein multiple sequence alignment are routinely used to predict contacts within its structure, but here we point out that these data can also be used to predict a protein's dynamics directly. Elastic network models of protein dynamics rely directly upon the contacts: the normal modes of motion are obtained from the decomposition of the (pseudo-)inverse of the Kirchhoff matrix constructed from the contact map. To make the direct connection between sequence and dynamics, the structure must be coarse-grained to one point per amino acid. This has often been done, and coarse-grained dynamics from elastic network models has been highly successful, particularly in representing the large-scale motions of proteins that usually relate closely to their functions. The interesting implication is that the structure itself need not be known to obtain a protein's dynamics; instead, the sequence information can be used directly.
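Going from contacts to dynamics can be sketched with the Gaussian network model: build the Kirchhoff matrix from a contact list and read the relative mean-square fluctuations off the diagonal of its pseudo-inverse. Here the contacts are assumed to be given (for example, the top-ranked coevolution pairs); the uniform spring constant and the function names are illustrative assumptions.

```python
from typing import List, Sequence, Tuple


def inverse(a: List[List[float]]) -> List[List[float]]:
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    n = len(a)
    m = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        d = m[col][col]
        m[col] = [v / d for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def gnm_fluctuations(n: int, contacts: Sequence[Tuple[int, int]]) -> List[float]:
    """Relative mean-square fluctuations diag(Gamma^+) from a contact list.

    Gamma is the Kirchhoff matrix with a uniform spring constant (assumption).
    For a connected network, Gamma^+ = (Gamma + J/n)^-1 - J/n, J = all-ones matrix.
    """
    g = [[0.0] * n for _ in range(n)]
    for i, j in contacts:
        g[i][j] -= 1.0
        g[j][i] -= 1.0
        g[i][i] += 1.0
        g[j][j] += 1.0
    inv = inverse([[g[i][j] + 1.0 / n for j in range(n)] for i in range(n)])
    return [inv[i][i] - 1.0 / n for i in range(n)]
```

For a three-residue chain 0-1-2, the ends come out more mobile than the middle: the fluctuations are 5/9, 2/9 and 5/9 in units of k_B T/γ.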
Subjects
Amino Acids, Proteins, Protein Conformation, Molecular Models, Proteins/chemistry, Motion
ABSTRACT
A fast, simple, yet robust method to calculate protein entropy from a single protein structure is presented here. The focus is on the atomic packing details, which are calculated by combining Voronoi diagrams and Delaunay tessellations. Even though the method is simple, the computed entropies exhibit an extremely high correlation with the entropies previously derived by other methods based on quasi-harmonic motions, quantum mechanics, and molecular dynamics simulations. These packing-based entropies account directly for the local freedom and provide an entropy for any individual protein structure; this entropy could be used to compute free energies directly during simulations, to generate more reliable trajectories, and to better evaluate modeled protein structures. Physico-chemical properties of amino acids are compared with these packing entropies to uncover how they relate to the entropies of different residue types. A public packing entropy web server is provided at packing-entropy.bb.iastate.edu, and the application programming interface is available within the PACKMAN (https://github.com/Pranavkhade/PACKMAN) package.
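A basic ingredient of Delaunay-based packing analysis is the volume of each tetrahedral cell. The sketch below computes cell volumes and sums them around an atom as a simple local packing measure; this summed-volume measure is a hypothetical illustration, and the conversion from packing to entropy used in PACKMAN is not reproduced. In practice the cells would come from a triangulation library, for example the `simplices` attribute of `scipy.spatial.Delaunay`.

```python
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def tetra_volume(a: Vec, b: Vec, c: Vec, d: Vec) -> float:
    """Volume of the tetrahedron (e.g. a Delaunay cell) spanned by four atom centers."""
    u = [b[k] - a[k] for k in range(3)]
    v = [c[k] - a[k] for k in range(3)]
    w = [d[k] - a[k] for k in range(3)]
    # Scalar triple product u . (v x w) equals six times the signed volume.
    det = (u[0] * (v[1] * w[2] - v[2] * w[1])
           - u[1] * (v[0] * w[2] - v[2] * w[0])
           + u[2] * (v[0] * w[1] - v[1] * w[0]))
    return abs(det) / 6.0


def local_packing_volume(atom: int, cells: Sequence[Tuple[int, int, int, int]],
                         coords: Sequence[Vec]) -> float:
    """Summed volume of the cells containing `atom` (hypothetical packing measure)."""
    return sum(tetra_volume(*(coords[i] for i in cell)) for cell in cells if atom in cell)
```

A smaller summed volume around an atom indicates tighter packing and hence less local freedom.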
ABSTRACT
SUMMARY: A new dynamic community identifier (DCI) is presented that relies upon residue dynamic cross-correlations generated by Gaussian elastic network models to identify the clusters of residues that move together within a protein. A number of examples of communities are shown for diverse proteins, including GPCRs. DCI can immediately simplify and clarify the most essential functional moving parts of any given protein. Proteins usually can be subdivided into groups of residues that move as communities. These are usually densely packed local substructures, but in some cases physically distant residues are identified as belonging to the same community. The set of communities of a protein constitutes its moving parts, and the way in which these are organized overall can aid in understanding many aspects of functional dynamics and allostery. DCI enables a more direct understanding of functions, including enzyme activity, action across membranes, and changes in the community structure caused by mutations or ligand binding. The DCI server is freely available on a website (https://dci.bb.iastate.edu/). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
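Once residue cross-correlations are available, communities can be extracted by grouping residues whose correlations exceed a cutoff. This sketch uses single-linkage grouping with a union-find structure and a 0.8 threshold; both are assumptions for illustration, not DCI's documented algorithm, and the correlation matrix is a toy input rather than one computed from an elastic network model.

```python
from typing import Dict, List


def communities(corr: List[List[float]], threshold: float = 0.8) -> List[List[int]]:
    """Group residues linked by cross-correlations >= `threshold` (single linkage)."""
    n = len(corr)
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if corr[i][j] >= threshold:
                parent[find(i)] = find(j)  # merge the two groups
    groups: Dict[int, List[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Single linkage lets physically distant residues join a community through a chain of strongly correlated neighbors, which mirrors the non-local communities described above.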
Subjects
Grain Proteins, Motion, Normal Distribution, Protein Conformation, Proteins/chemistry
ABSTRACT
Studying the interactions within protein structures can reveal details of how proteins of various types interact and aggregate. Empirical contact potentials have proven extremely important in the evaluation of individual modeled protein structures, but have found few applications to protein-protein interactions. In part, this is due to a lack of potentials formulated with a proper reference state. Since the comparisons are made between different bound structures, the reference state should take other contacts into account. Therefore, a preferred reference state should be defined with respect to a given residue type interacting with an average residue, instead of interacting with solvent as is typically done in the derivation of statistical contact potentials. Here, a two-stage procedure for generating and evaluating interacting protein pairs is described, and an example of E-cadherin interactions is shown.
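One way to express the reference state described above is to compare observed contact frequencies with those expected if each residue type met partners drawn from the average contact composition. The sketch below implements that reading as e_ij = -ln(p_ij / (p_i p_j)); it is an assumption about the formulation, not the authors' exact potential, and the optional pseudocount is an illustrative addition.

```python
import math
from typing import List


def contact_energies(counts: List[List[float]], pseudocount: float = 0.0) -> List[List[float]]:
    """Statistical contact energies e_ij = -ln(p_ij / (p_i * p_j)).

    counts[i][j] is a symmetric matrix of observed residue-residue contact counts.
    The reference p_i * p_j is the chance that type i meets type j when its partner
    is drawn from the average contact composition, so no solvent term is involved.
    """
    c = [[v + pseudocount for v in row] for row in counts]
    n = len(c)
    total = sum(sum(row) for row in c)
    p = [sum(c[i]) / total for i in range(n)]  # marginal frequency of each type
    return [[-math.log((c[i][j] / total) / (p[i] * p[j])) for j in range(n)]
            for i in range(n)]
```

Negative energies mark pairs seen more often than the average-residue reference predicts (favorable), positive energies pairs seen less often; all counts must be positive unless a pseudocount is supplied.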
Subjects
Cadherins, Binding Sites, Cadherins/metabolism, Hydrophobic and Hydrophilic Interactions, Protein Binding, Solvents
ABSTRACT
PACKMAN-Molecule is a structural bioinformatics toolbox in the form of an application programming interface (API) that contains several utilities for structural bioinformatics applications. It has already been used in several applications, and its added features and unique object hierarchy make it readily extensible, feature-rich and user-friendly. A tutorial is available at: https://py-packman.readthedocs.io/en/latest/tutorials/molecule.html. Availability and implementation: PACKMAN-Molecule is freely available under an MIT license on GitHub at https://github.com/Pranavkhade/PACKMAN.
ABSTRACT
Drug extrusion through molecular efflux pumps is an important survival mechanism for many pathogenic bacteria: by removing drugs, the pumps confer multidrug resistance (MDR). Understanding the molecular mechanisms of drug extrusion in multidrug efflux pumps is important for the development of new antiresistance drugs. The AbgT family of transporters, involved in the folic acid biosynthesis pathway, represents one such important efflux pump system. In addition to transporting the folic acid precursor p-aminobenzoic acid (PABA), members of this family are involved in the efflux of several sulfa drugs, conferring drug resistance on the bacteria. With structures available for two members of this family (YdaH and MtrF), we investigate the molecular pathways for transport of PABA and a sulfa drug (sulfamethazine), particularly for the YdaH transporter, using steered molecular dynamics. Our analyses reveal the probable ligand migration pathways through the transporter and identify key residues along the transport pathway. In addition, simulations with both PABA and sulfamethazine show how the protein is able to transport ligands of different shapes and sizes out of the pathogen. Our observations confirm previously reported functional residues along the transport pathways by which YdaH transporters shuttle drugs out of the cell and thereby achieve antibiotic resistance.
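In constant-velocity steered molecular dynamics, a virtual spring attached to the ligand is pulled at a fixed speed, and the spring force and accumulated work are monitored. The one-dimensional sketch below only shows these two quantities for a recorded ligand trajectory; it is not a simulation, the parameter values are arbitrary, and in practice the MD engine applies the force itself.

```python
from typing import List


def smd_force(x: float, t: float, x0: float, k: float, v: float) -> float:
    """Spring force along the pulling direction: k * (x0 + v*t - x)."""
    return k * (x0 + v * t - x)


def smd_work(positions: List[float], dt: float, k: float, v: float) -> float:
    """External work W = v * integral of F dt (trapezoid rule) over a recorded trajectory.

    positions[i] is the ligand coordinate along the pulling direction at time i * dt;
    the spring's anchor starts at positions[0] and moves at constant speed v.
    """
    x0 = positions[0]
    forces = [smd_force(x, i * dt, x0, k, v) for i, x in enumerate(positions)]
    return sum(0.5 * (forces[i] + forces[i + 1]) * v * dt for i in range(len(forces) - 1))
```

A ligand that lags behind the moving anchor builds up force, so peaks in the force profile flag the residues that obstruct passage along the pathway.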
Subjects
Membrane Transport Proteins, Pharmaceutical Preparations, Anti-Bacterial Agents/pharmacology, Bacteria/metabolism, Bacterial Proteins/metabolism, Drug Resistance
ABSTRACT
Hinge motions are essential for many protein functions, and their dynamics are important for understanding the underlying biological mechanisms. The ways that these motions are represented by various computational methods differ significantly. By focusing on a specific class of motion, we have developed a new hinge-domain anisotropic network model (hdANM) that is based on the prior identification of flexible hinges and rigid domains in the protein structure and the subsequent generation of global hinge motions. This yields a set of motions in which the relative translations and rotations of the rigid domains are modulated and controlled by the deformation of the flexible hinges, leading to a more restricted, specific view of these motions. To our knowledge, hdANM is the first model that combines information about protein hinges and domains to model the characteristic hinge motions of a protein. The motions predicted with this new elastic network model provide important conceptual advantages for understanding the underlying biological mechanisms. Indeed, the generated hinge movements resemble the expected mechanisms required for the biological functions of diverse proteins. Another advantage of this model is that the domain-level coarse-graining makes it significantly more computationally efficient, enabling the generation of hinge motions even within the largest molecular assemblies, such as those from cryo-electron microscopy. hdANM is also general: depending on the definition of flexible and rigid parts in the protein structure and on whether the motions are extrapolated linearly or nonlinearly, it can behave in the same way as well-known protein dynamics models (the anisotropic network model, rotations-translations of blocks, and nonlinear rigid block normal mode analysis). Furthermore, our results indicate that hdANM produces more realistic motions than the anisotropic network model.
hdANM is open-source software, freely available and hosted on a user-friendly website.
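The rigid-domain part of a hinge motion amounts to rotating every atom of a domain about an axis through the hinge. The sketch below applies Rodrigues' rotation formula; the pivot, axis and angle are hypothetical inputs, whereas hdANM derives the motions from the identified hinges and domains.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def rotate_about_hinge(x: Vec, pivot: Vec, axis: Vec, angle: float) -> Vec:
    """Rotate point x by `angle` (radians) about the line through `pivot` along `axis`.

    Rodrigues' formula: v' = v cos a + (k x v) sin a + k (k . v)(1 - cos a),
    with v = x - pivot and k the unit axis.
    """
    norm = math.sqrt(sum(c * c for c in axis))
    k = tuple(c / norm for c in axis)
    v = tuple(x[i] - pivot[i] for i in range(3))
    cross = (k[1] * v[2] - k[2] * v[1],
             k[2] * v[0] - k[0] * v[2],
             k[0] * v[1] - k[1] * v[0])
    dot = sum(k[i] * v[i] for i in range(3))
    c, s = math.cos(angle), math.sin(angle)
    return tuple(pivot[i] + v[i] * c + cross[i] * s + k[i] * dot * (1 - c) for i in range(3))


def move_domain(domain: Sequence[Vec], pivot: Vec, axis: Vec, angle: float) -> List[Vec]:
    """Apply one rigid rotation to every atom of a domain (hinge-bending step)."""
    return [rotate_about_hinge(x, pivot, axis, angle) for x in domain]
```

Because the whole domain shares one rotation, its internal geometry is preserved and only the hinge region would need to deform, which is the restriction that distinguishes hinge motions from general elastic network modes.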
Subjects
Algorithms, Proteins, Computer Simulation, Cryoelectron Microscopy, Molecular Models, Protein Conformation
ABSTRACT
Allostery is usually considered to be a mechanism for transmitting signals associated with physical or dynamic changes in some part of a protein. Here, we investigate the changes in fluctuations across a protein upon ligand binding, based on fluctuations computed with elastic network models. The results suggest that binding reduces the fluctuations at the binding site and increases fluctuations at remote sites, but not to a fully compensating extent. If entropy were completely conserved, only the enthalpies of binding would matter, not the entropies; however, this does not appear to be the case. Experimental evidence also suggests that the energies and entropies of binding can compensate, but that the extent of compensation varies widely from case to case. Our results do, however, always show transmission of an allosteric signal to distant locations where the fluctuations are increased. These fluctuations could be used to compute entropies to improve evaluations of the thermodynamics of binding. We also show the allosteric relationship by which peptide binding in the GroEL trans-ring leads directly to the release of GroES from the GroEL-GroES cis-ring. This finding provides an example of how the changes in protein dynamics induced by the binding of an allosteric ligand, which can be calculated, can regulate protein function and mechanism.
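The comparison described above can be sketched with a Gaussian network model: fluctuations are the diagonal of the Kirchhoff pseudo-inverse, and binding is emulated by stiffening the springs among binding-site residues. The stiffening factor, the toy chain and the function names are assumptions; this toy does not attempt to reproduce the remote increases reported above, which depend on the network topology and the elastic model used.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]


def inverse(a: List[List[float]]) -> List[List[float]]:
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    n = len(a)
    m = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        d = m[col][col]
        m[col] = [v / d for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def fluctuations(n: int, springs: Dict[Edge, float]) -> List[float]:
    """GNM mean-square fluctuations diag(Gamma^+), up to the k_B T prefactor."""
    g = [[0.0] * n for _ in range(n)]
    for (i, j), k in springs.items():
        g[i][j] -= k
        g[j][i] -= k
        g[i][i] += k
        g[j][j] += k
    # For a connected network, Gamma^+ = (Gamma + J/n)^-1 - J/n (J = all-ones matrix).
    inv = inverse([[g[i][j] + 1.0 / n for j in range(n)] for i in range(n)])
    return [inv[i][i] - 1.0 / n for i in range(n)]


def binding_change(n: int, springs: Dict[Edge, float], site: Set[int],
                   factor: float = 10.0) -> List[float]:
    """Per-residue change in fluctuation when springs within the site are stiffened."""
    bound = {e: k * factor if e[0] in site and e[1] in site else k
             for e, k in springs.items()}
    return [a - b for a, b in zip(fluctuations(n, bound), fluctuations(n, springs))]
```

Summing the changes over residues gives a single number that could feed an entropy estimate, as the abstract suggests.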
ABSTRACT
Glycosyltransferases (GTs) are a large family of enzymes that add sugars to a broad range of acceptor substrates, including polysaccharides, proteins and lipids, using a wide variety of donor substrates in the form of activated sugars. Individual GTs have generally been considered to exhibit a high level of substrate specificity, but this has not been thoroughly investigated across the extremely large set of GTs. Here we investigate xyloglucan xylosyltransferase 1 (XXT1), a GT involved in the synthesis of the plant cell wall polysaccharide xyloglucan. Xyloglucan has a glucan backbone whose initial side-chain substitutions are composed exclusively of xylose from uridine diphosphate (UDP)-xylose. While this conserved substitution pattern suggests high substrate specificity for XXT1, our in vitro kinetic studies reveal more complex behavior. They show comparable kcat values for reactions with UDP-xylose and UDP-glucose, while reactions with UDP-arabinose and UDP-galactose are over 10-fold slower. Using kcat/KM as a measure of efficiency, UDP-xylose is 8-fold more efficient as a substrate than the next best alternative, UDP-glucose. To the best of our knowledge, we are the first to demonstrate that not all plant XXTs are highly substrate-specific: some show significant promiscuity in their in vitro reactions. Kinetic parameters alone likely do not explain the high substrate selectivity in planta, suggesting that additional control mechanisms operate during polysaccharide biosynthesis. Improved understanding of the substrate specificity of GTs will aid protein engineering, the development of diagnostic tools, and the understanding of biological systems.
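The efficiency comparison above rests on kcat/KM. The sketch below computes it, the fold difference between donors, and the Michaelis-Menten rate; the donor names and parameter values are placeholders, not the measured XXT1 constants.

```python
from typing import Dict, Tuple


def efficiency(kcat: float, km: float) -> float:
    """Catalytic efficiency kcat/KM (e.g. s^-1 / mM -> mM^-1 s^-1)."""
    return kcat / km


def relative_efficiency(params: Dict[str, Tuple[float, float]],
                        reference: str) -> Dict[str, float]:
    """Fold by which the reference donor's kcat/KM exceeds each donor's."""
    ref = efficiency(*params[reference])
    return {donor: ref / efficiency(kcat, km) for donor, (kcat, km) in params.items()}


def mm_rate(kcat: float, km: float, enzyme: float, substrate: float) -> float:
    """Michaelis-Menten initial rate v = kcat [E][S] / (KM + [S])."""
    return kcat * enzyme * substrate / (km + substrate)


if __name__ == "__main__":
    # Placeholder (kcat, KM) pairs for two hypothetical donors.
    donors = {"donor_A": (2.0, 0.5), "donor_B": (1.5, 1.5)}
    print(relative_efficiency(donors, "donor_A"))
```

Two donors with similar kcat can still differ several-fold in kcat/KM if their KM values differ, which is how comparable turnover and clearly different efficiency can coexist.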
Subjects
Glucans/biosynthesis, Pentosyltransferases/genetics, Plant Proteins/genetics, Plants/enzymology, Glucans/genetics, Kinetics, Pentosyltransferases/metabolism, Plant Proteins/metabolism, Plants/metabolism, Substrate Specificity
ABSTRACT
Protein sequence matching presently fails to identify many structures that are highly similar, even when they are known to have the same function. The high packing densities in globular proteins lead to interdependent substitutions, which have not previously been considered in amino acid similarities. At present, sequence matching compares sequences based only on the similarities of single amino acids, ignoring the fact that in densely packed proteins there are additional conservative substitutions, exchanges between two interacting amino acids such as a small-large pair changing to a large-small pair, that are not so conservative individually. Here we show that including information on such pairs of substitutions yields improved sequence matches, with significant gains in the agreement between sequence alignments and structure matches of the same protein pair: sequence segments are matched where the structural segments are aligned. There are gains for all 2002 collected cases in which the sequence alignments were previously not congruent with the structure matches. Our results also demonstrate a significant gain in detecting homology for "twilight zone" protein sequences. The derived amino acid substitution metrics have many other potential applications, for annotation, protein design, mutagenesis design, and empirical potential derivation.
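The idea of scoring pair substitutions alongside single-site substitutions can be sketched for a gapless alignment as follows. The scores, the contact list and the weighting are hypothetical toy values, not the substitution metrics derived in this work, and a real aligner would embed such terms in dynamic programming with gaps.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def alignment_score(seq_a: str, seq_b: str,
                    single: Dict[Pair, float],
                    pair: Dict[Tuple[Pair, Pair], float],
                    contacts: List[Tuple[int, int]], weight: float = 1.0) -> float:
    """Score a gapless alignment with single-site terms plus pair terms at contacts.

    single[(a, b)]: score for residue a aligned to residue b.
    pair[((a1, a2), (b1, b2))]: extra score when the contacting pair (a1, a2) in
    seq_a is aligned to (b1, b2) in seq_b, e.g. a small-large pair swapped to
    large-small. Missing entries score 0 (an assumption of this toy).
    """
    score = sum(single.get((a, b), 0.0) for a, b in zip(seq_a, seq_b))
    for i, j in contacts:
        key = ((seq_a[i], seq_a[j]), (seq_b[i], seq_b[j]))
        score += weight * pair.get(key, 0.0)
    return score


if __name__ == "__main__":
    single = {("G", "W"): -3.0, ("W", "G"): -3.0, ("A", "A"): 2.0}  # toy values
    pair = {(("G", "W"), ("W", "G")): 5.0}  # toy bonus for a compensating swap
    print(alignment_score("GAW", "WAG", single, pair, contacts=[(0, 2)]))
```

In the toy case, the two G/W mismatches are penalized individually, but because positions 0 and 2 are in contact the compensating swap is rewarded and the total score becomes positive.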
Subjects
Algorithms, Amino Acid Substitution, Amino Acids/chemistry, Proteins/chemistry, Amino Acid Sequence, Amino Acids/metabolism, Protein Databases, Datasets as Topic, Humans, Molecular Models, Protein Engineering/methods, Proteins/metabolism, Sequence Alignment, Amino Acid Sequence Homology
ABSTRACT
We have studied the ability of three types of neural networks to predict the closeness of a given protein model to the native structure associated with its sequence. We show that a partial combination of the Levenberg-Marquardt algorithm and the back-propagation algorithm produced the best results, giving the lowest error and the largest Pearson correlation coefficient. We also find, as in previous studies, that adding associative memory to a neural network improves its performance. Additionally, we find that the proposed hybrid method was the most robust, in the sense that its other configurations showed less decline in performance than those of the other methods. The hybrid networks also undergo more fluctuations on the path to convergence, and we propose that these fluctuations allow for better sampling. Overall, we find that it may be beneficial to treat different parts of a neural network with different computational approaches during optimization.
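The Levenberg-Marquardt (LM) update used by the hybrid networks can be shown on a two-parameter curve fit. This sketch is not the authors' network training: the exponential model and data are hypothetical, and it uses Levenberg's damping (lam times the identity) rather than Marquardt's diagonal scaling.

```python
import math
from typing import List, Tuple


def levenberg_marquardt(xs: List[float], ys: List[float], a: float, b: float,
                        lam: float = 1e-2, iters: int = 200) -> Tuple[float, float]:
    """Fit y = a * exp(b * x) by Levenberg-Marquardt.

    Each step solves (J^T J + lam * I) delta = J^T r. lam shrinks after an accepted
    step (toward Gauss-Newton) and grows after a rejected one (toward gradient descent).
    """
    def cost(a_: float, b_: float) -> float:
        return sum((y - a_ * math.exp(b_ * x)) ** 2 for x, y in zip(xs, ys))

    current = cost(a, b)
    for _ in range(iters):
        if current < 1e-24:  # already at an exact fit
            break
        r = [y - a * math.exp(b * x) for x, y in zip(xs, ys)]       # residuals
        J = [(math.exp(b * x), a * x * math.exp(b * x)) for x in xs]  # [df/da, df/db]
        g11 = sum(j[0] * j[0] for j in J) + lam
        g12 = sum(j[0] * j[1] for j in J)
        g22 = sum(j[1] * j[1] for j in J) + lam
        h1 = sum(j[0] * ri for j, ri in zip(J, r))
        h2 = sum(j[1] * ri for j, ri in zip(J, r))
        det = g11 * g22 - g12 * g12
        da = (g22 * h1 - g12 * h2) / det  # 2x2 solve by Cramer's rule
        db = (g11 * h2 - g12 * h1) / det
        trial = cost(a + da, b + db)
        if trial < current:
            a, b, current = a + da, b + db, trial
            lam /= 10.0
        else:
            lam *= 10.0
    return a, b
```

The adaptive damping is what lets LM switch between cautious gradient-like steps far from a minimum and fast Gauss-Newton steps near it.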
Subjects
Neural Networks (Computer), Proteins/chemistry, Algorithms
ABSTRACT
Proteins are the active players in performing essential molecular activities throughout biology, and their dynamics have been broadly demonstrated to relate to their mechanisms. Intrinsic fluctuations have often been used to represent their dynamics and then compared with experimental B-factors. However, proteins do not move in a vacuum, and their motions are modulated by the solvent, which can impose forces on the structure. In this paper, we introduce a new structural concept, called structural compliance, for evaluating the global and local deformability of a protein structure in response to intramolecular and solvent forces. This quantity is computed by applying pairwise pulling forces to a protein elastic network, and it is sometimes found to yield an improved correlation with the experimental B-factors, meaning that it may serve as a better metric for protein flexibility. The inverse of structural compliance, the structural stiffness, is also defined and shows a clear anticorrelation with the experimental data. Although the present applications are to proteins, this approach can also be applied to other biomolecular structures such as RNA. The present study considers only elastic network models, but the approach could be extended to conventional atomic molecular dynamics. Compliance is found to agree slightly better with the experimental B-factors, perhaps reflecting its bias toward the effects of local perturbations, in contrast to mean-square fluctuations. The code for calculating protein compliance and stiffness is freely accessible at https://jerniganlab.github.io/Software/PACKMAN/Tutorials/compliance.
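For a scalar (GNM-like) spring network, the response to a unit pulling force applied between residues i and j can be obtained by holding j fixed and solving the reduced Kirchhoff system; the resulting relative displacement equals the effective resistance between i and j. This is a simplified sketch under that assumption, not PACKMAN's implementation, whose elastic network and force setup are described in the linked tutorial; the function names are hypothetical.

```python
from typing import Dict, List, Tuple


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def pair_compliance(n: int, springs: Dict[Tuple[int, int], float], i: int, j: int) -> float:
    """Relative displacement of nodes i and j per unit pulling force between them.

    Fixing node j removes the rigid-body (zero) mode, so Gamma_reduced u = e_i
    gives the response and u_i is the compliance.
    """
    g = [[0.0] * n for _ in range(n)]
    for (p, q), k in springs.items():
        g[p][q] -= k
        g[q][p] -= k
        g[p][p] += k
        g[q][q] += k
    keep = [r for r in range(n) if r != j]
    reduced = [[g[r][c] for c in keep] for r in keep]
    rhs = [1.0 if r == i else 0.0 for r in keep]
    u = solve(reduced, rhs)
    return u[keep.index(i)]
```

The stiffness is then 1 / compliance; for a single spring of constant k the compliance is 1/k, as expected, and springs in series add compliance while parallel paths reduce it.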