Search | VHL Search Portal

1.

Application of artificial intelligence and machine learning techniques to the analysis of dynamic protein sequences.

Kombo, David C; LaMarche, Matthew J; Konkankit, Chilaluck C; Rackovsky, S.

Proteins ; 2024 May 29.

Article in English | MEDLINE | ID: mdl-38808365

ABSTRACT

We apply methods of Artificial Intelligence and Machine Learning to protein dynamic bioinformatics. We rewrite the sequences of a large protein data set, containing both folded and intrinsically disordered molecules, using a representation developed previously, which encodes the intrinsic dynamic properties of the naturally occurring amino acids. We Fourier analyze the resulting sequences. It is demonstrated that classification models built using several different supervised learning methods are able to successfully distinguish folded from intrinsically disordered proteins from sequence alone. It is further shown that the most important sequence property for this discrimination is the sequence mobility, which is the sequence averaged value of the residue-specific average alpha carbon B factor. This is in agreement with previous work, in which we have demonstrated the central role played by the sequence mobility in protein dynamic bioinformatics and biophysics. This finding opens a path to the application of dynamic bioinformatics, in combination with machine learning algorithms, to a range of significant biomedical problems.
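The sequence mobility described above is a simple average, so it can be sketched in a few lines. This is a minimal illustration that assumes the caller supplies a table `avg_b` of residue-specific average alpha-carbon B factors. The function name and the toy values are hypothetical placeholders, not the published table.

```python
from statistics import mean


def sequence_mobility(sequence: str, avg_b: dict[str, float]) -> float:
    """Sequence mobility: mean of the residue-specific average B factors.

    `avg_b` maps one-letter amino acid codes to residue-specific average
    alpha-carbon B factors; real values must come from the published table.
    """
    values = [avg_b[residue] for residue in sequence.upper()]
    if not values:
        raise ValueError("sequence must not be empty")
    return mean(values)


# Hypothetical placeholder values for three residues (NOT measured B factors).
toy_b = {"A": 1.0, "G": 2.0, "L": 0.5}
print(sequence_mobility("GAGL", toy_b))  # (2.0 + 1.0 + 2.0 + 0.5) / 4 = 1.375
```

In a supervised setting like the one described, this scalar would be one input feature alongside the Fourier-derived features.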

2.

The structure of protein dynamic space.

Rackovsky, S; Scheraga, Harold A.

Proc Natl Acad Sci U S A ; 117(33): 19938-19942, 2020 08 18.

Article in English | MEDLINE | ID: mdl-32759212

ABSTRACT

We use a bioinformatic description of amino acid dynamic properties, based on residue-specific average B factors, to construct a dynamics-based, large-scale description of a space of protein sequences. We examine the relationship between that space and an independently constructed, structure-based space comprising the same sequences. It is demonstrated that structure and dynamics are only moderately correlated. It is further shown that helical proteins fall into two classes with very different structure-dynamics relationships. We suggest that dynamics in the two helical classes are dominated by distinctly different modes: pseudo-one-dimensional, localized helical modes in one case, and pseudo-three-dimensional (3D) global modes in the other. Sheet/barrel and mixed-α/β proteins exhibit more conventional structure-dynamics relationships. It is found that the strongest correlation between structure and dynamic properties arises when the latter are represented by the sequence average of the dynamic index, which corresponds physically to the overall mobility of the protein. None of these results are accessible to bioinformatic methods hitherto available.
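The abstract does not state how the two spaces were compared. One common way to quantify how closely a dynamics-based space and a structure-based space agree is to correlate their pairwise distances. The sketch below computes a Pearson correlation over the upper triangles of two distance matrices (a Mantel-style comparison). The function names and toy matrices are hypothetical, and this is not necessarily the authors' procedure.

```python
import math


def upper_triangle(matrix: list[list[float]]) -> list[float]:
    """Entries above the diagonal of a square distance matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def space_correlation(dyn_dist: list[list[float]], struct_dist: list[list[float]]) -> float:
    """Pearson correlation between pairwise dynamic and structural distances."""
    x, y = upper_triangle(dyn_dist), upper_triangle(struct_dist)
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Toy 3-protein distance matrices (hypothetical numbers).
dyn = [[0, 1, 2], [1, 0, 3], [2, 3, 0]]
struct = [[0, 2, 4], [2, 0, 6], [4, 6, 0]]
print(space_correlation(dyn, struct))  # linearly related distances, so 1.0 up to rounding
```

A value near 1 would mean the two spaces are organized alike; the abstract reports only a moderate correlation.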

Subject(s)

Proteins/chemistry , Computational Biology , Protein Structure, Secondary

3.

The dynamic basis of structural order in proteins.

Konkankit, Chilaluck; Rackovsky, S.

Proteins ; 90(5): 1115-1118, 2022 05.

Article in English | MEDLINE | ID: mdl-34981860

ABSTRACT

We compare the sequences of folded and intrinsically disordered proteins (IDPs), using bioinformatic methods recently developed to study protein dynamic properties. We demonstrate that the two classes of sequences are organized in diametrically opposite ways with respect to long-length-scale dynamic properties. We further demonstrate a statistically significant difference between the amino acid compositions of folded and disordered proteins, which is expressed in dynamic properties. Our results indicate that the long-length-scale properties of sequences are critical in determining whether proteins are able to fold, and, more generally, that they are central to an understanding of protein physics. They further provide a physical basis for the empirically observed differences in amino acid composition between folded proteins and IDPs.

Subject(s)

Intrinsically Disordered Proteins , Protein Folding , Amino Acids , Computational Biology , Intrinsically Disordered Proteins/chemistry , Protein Conformation

4.

Dynamic and conformational switching in proteins.

Scheraga, H A; Rackovsky, S.

Biopolymers ; 112(10): e23411, 2021 Oct.

Article in English | MEDLINE | ID: mdl-33270217

ABSTRACT

Using bioinformatic methods for treating protein dynamics, developed in earlier work, we study the relationship between sequence mobility and dynamics in proteins. It is shown that sequence mobility drives a transition between two dynamic regimes in proteins, and that the specific details of this transition differ qualitatively between α-helical proteins and those in other structural classes. We examine the possibility that conformational switching is related to dynamic switching, by considering a specific system of sequences which exhibit the switching phenomenon. It is shown that a relationship between dynamic and conformational switching is entirely plausible.

Subject(s)

Computational Biology , Proteins , Protein Conformation , Protein Structure, Secondary

5.

Sequence-, structure-, and dynamics-based comparisons of structurally homologous CheY-like proteins.

He, Yi; Maisuradze, Gia G; Yin, Yanping; Kachlishvili, Khatuna; Rackovsky, S; Scheraga, Harold A.

Proc Natl Acad Sci U S A ; 114(7): 1578-1583, 2017 02 14.

Article in English | MEDLINE | ID: mdl-28143938

ABSTRACT

We recently introduced a physically based approach to sequence comparison, the property factor method (PFM). In the present work, we apply the PFM approach to the study of a challenging set of sequences: the bacterial chemotaxis protein CheY, the N-terminal receiver domain of the nitrogen regulation protein NT-NtrC, and the sporulation response regulator Spo0F. These are all response regulators involved in signal transduction. Despite functional similarity and structural homology, they exhibit low sequence identity. PFM sequence comparison demonstrates a statistically significant qualitative difference between the sequence of CheY and those of the other two proteins that is not found using conventional alignment methods. This difference is shown to be consonant with structural characteristics, using distance matrix comparisons. We also demonstrate that residues participating strongly in native contacts during unfolding are distributed differently in CheY than in the other two proteins. The PFM result is also in accord with dynamic simulation results of several types. Molecular dynamics simulations of all three proteins were carried out at several temperatures, and it is shown that the dynamics of CheY are predicted to differ from those of NT-NtrC and Spo0F. The predicted dynamic properties of the three proteins are in good agreement with experimentally determined B factors and with fluctuations predicted by the Gaussian network model. We pinpoint the differences between the PFM and traditional sequence comparisons and discuss the informatic basis for the ability of the PFM approach to detect physical differences between these sequences that are not apparent from traditional alignment-based comparison.

Subject(s)

Bacterial Proteins/genetics , Methyl-Accepting Chemotaxis Proteins/genetics , Sequence Alignment/methods , Signal Transduction/genetics , Amino Acid Sequence , Bacterial Proteins/chemistry , Bacterial Proteins/metabolism , Binding Sites/genetics , Computational Biology/methods , Methyl-Accepting Chemotaxis Proteins/chemistry , Methyl-Accepting Chemotaxis Proteins/metabolism , Models, Molecular , Protein Domains , Sequence Homology, Amino Acid

6.

Sequence-specific dynamic information in proteins.

Scheraga, H A; Rackovsky, S.

Proteins ; 87(10): 799-804, 2019 10.

Article in English | MEDLINE | ID: mdl-31134683

ABSTRACT

We examine the local and global properties of the average B-factor, ⟨B⟩, as a residue-specific indicator of protein dynamic characteristics. It has been shown that values of ⟨B⟩ for the 20 amino acids differ in a statistically significant manner, and that, while strongly determined by the static physical properties of amino acids, they also encode averaged information about the influence of global fold on single-residue dynamics. Therefore, complete sequences of amino acids also encode fold-related global dynamic information, in addition to the local information that arises from static physical properties. We show that the relative magnitudes of these two contributions can be determined using Fourier methods, which represent the global properties of the sequences. It has also been shown that the behavior of Fourier components of ⟨B⟩ differs, with very high statistical significance, between structural groups, and that this information is not available from a comparable analysis of static amino acid properties.
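To make the Fourier decomposition concrete, the sketch below computes cosine and sine coefficients of a numeric per-residue average B-factor profile. It assumes a plain discrete Fourier transform normalized by sequence length; the published analysis may use a different normalization or wave-number range, and `fourier_components` is a hypothetical name. The k = 0 cosine coefficient is simply the sequence average (the sequence mobility), while small nonzero k describe long-length-scale organization.

```python
import math


def fourier_components(profile: list[float], max_k: int) -> list[tuple[float, float]]:
    """Return (a_k, b_k) cosine/sine coefficients of a profile for k = 0..max_k.

    Plain DFT normalized by the profile length (an assumption of this sketch).
    """
    n = len(profile)
    coeffs = []
    for k in range(max_k + 1):
        a_k = sum(x * math.cos(2 * math.pi * k * j / n) for j, x in enumerate(profile)) / n
        b_k = sum(x * math.sin(2 * math.pi * k * j / n) for j, x in enumerate(profile)) / n
        coeffs.append((a_k, b_k))
    return coeffs


# Toy per-residue profile (hypothetical values, not measured B factors).
profile = [2.0, 1.0, 2.0, 0.5]
coeffs = fourier_components(profile, 2)
print(coeffs[0][0])  # a_0 is the mean of the profile: 1.375
```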

Subject(s)

Algorithms , Amino Acids/chemistry , Computational Biology/methods , Proteins/chemistry , Sequence Analysis, Protein/methods , Amino Acids/analysis , Humans , Protein Conformation , Protein Domains , Protein Folding , Proteins/analysis

7.

Global informatics and physical property selection in protein sequences.

Scheraga, Harold A; Rackovsky, S.

Proc Natl Acad Sci U S A ; 113(7): 1808-10, 2016 Feb 16.

Article in English | MEDLINE | ID: mdl-26831093

ABSTRACT

The degree of informatic independence between the physical properties of amino acids as encoded in actual protein sequences is calculated. It is shown that no physical property can be identified that carries significantly less information than others and that the information overlap between different properties and different length scales along the sequence is essentially zero. These observations suggest that bioinformatic models based on arbitrarily selected sets of physical properties are inherently deficient.

Subject(s)

Computational Biology , Proteins/chemistry , Amino Acid Sequence , Fourier Analysis

8.

Alternative approach to protein structure prediction based on sequential similarity of physical properties.

He, Yi; Rackovsky, S; Yin, Yanping; Scheraga, Harold A.

Proc Natl Acad Sci U S A ; 112(16): 5029-32, 2015 Apr 21.

Article in English | MEDLINE | ID: mdl-25848034

ABSTRACT

The relationship between protein sequence and structure arises entirely from amino acid physical properties. An alternative method is therefore proposed to identify homologs in which residue equivalence is based exclusively on the pairwise physical property similarities of sequences. This approach, the property factor method (PFM), is entirely different from those in current use. A comparison is made between our method and PSI BLAST. We demonstrate that traditionally defined sequence similarity can be very low for pairs of sequences (which therefore cannot be identified using PSI BLAST), but similarity of physical property distributions results in almost identical 3D structures. The performance of PFM is shown to be better than that of PSI BLAST when sequence matching is comparable, based on a comparison using targets from CASP10 (89 targets) and CASP11 (51 targets). It is also shown that PFM outperforms PSI BLAST in informatically challenging targets.

Subject(s)

Computational Biology/methods , Physical Phenomena , Proteins/chemistry , Sequence Homology, Amino Acid , Amino Acid Sequence , Models, Molecular , Molecular Sequence Data

9.

Homolog detection using global sequence properties suggests an alternate view of structural encoding in protein sequences.

Scheraga, Harold A; Rackovsky, S.

Proc Natl Acad Sci U S A ; 111(14): 5225-9, 2014 Apr 08.

Article in English | MEDLINE | ID: mdl-24706836

ABSTRACT

We show that a Fourier-based sequence distance function is able to identify structural homologs of target sequences with high accuracy. It is shown that Fourier distances correlate very strongly with independently determined structural distances between molecules, a property of the method that is not attainable using conventional representations. It is further shown that the ability of the Fourier approach to identify protein folds is statistically far in excess of random expectation. It is then shown that, in actual searches for structural homologs of selected target sequences, the Fourier approach gives excellent results. On the basis of these results, we suggest that the global information detected by the Fourier representation is an essential feature of structure encoding in protein sequences and a key to structural homology detection.
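The abstract does not give the distance function itself. As a hedged illustration, one simple Fourier-based distance compares the amplitudes of the lowest nonzero wave numbers of two property-encoded sequences; amplitudes ignore phase, so the comparison is global rather than position by position. The functions, the cutoff `max_k`, and the Euclidean metric are assumptions of this sketch.

```python
import math


def low_k_amplitudes(profile: list[float], max_k: int) -> list[float]:
    """Amplitudes of the Fourier components k = 1..max_k of a numeric profile."""
    n = len(profile)
    if not 1 <= max_k <= n // 2:
        raise ValueError("max_k must lie between 1 and len(profile) // 2")
    amps = []
    for k in range(1, max_k + 1):
        re = sum(x * math.cos(2 * math.pi * k * j / n) for j, x in enumerate(profile)) / n
        im = sum(x * math.sin(2 * math.pi * k * j / n) for j, x in enumerate(profile)) / n
        amps.append(math.hypot(re, im))
    return amps


def fourier_distance(p: list[float], q: list[float], max_k: int) -> float:
    """Euclidean distance between the low-wave-number amplitude spectra of p and q."""
    a, b = low_k_amplitudes(p, max_k), low_k_amplitudes(q, max_k)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Toy profiles (hypothetical property values per residue).
print(fourier_distance([1.0, 1.0, 1.0, 1.0], [1.0, 2.0, 1.0, 2.0], max_k=2))  # about 0.5
```

Because amplitudes are unchanged by a cyclic shift of the profile, two sequences can be close in this metric even when they share no aligned positions.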

Subject(s)

Proteins/chemistry , Protein Conformation , Protein Folding

10.

Nonlinearities in protein space limit the utility of informatics in protein biophysics.

Rackovsky, S.

Proteins ; 83(11): 1923-8, 2015 Nov.

Article in English | MEDLINE | ID: mdl-26315852

ABSTRACT

We examine the utility of informatic-based methods in computational protein biophysics. To do so, we use newly developed metric functions to define completely independent sequence and structure spaces for a large database of proteins. By investigating the relationship between these spaces, we demonstrate quantitatively the limits of knowledge-based correlation between the sequences and structures of proteins. It is shown that there are well-defined, nonlinear regions of protein space in which dissimilar structures map onto similar sequences (the conformational switch), and dissimilar sequences map onto similar structures (remote homology). These nonlinearities are shown to be quite common: almost half the proteins in our database fall into one or the other of these two regions. They are not anomalies, but rather intrinsic properties of structural encoding in amino acid sequences. It follows that extreme care must be exercised in using bioinformatic data as a basis for computational structure prediction. The implications of these results for protein evolution are examined.
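The two nonlinear regions are defined by mismatches between sequence distance and structure distance, so they can be flagged pair by pair once both distance matrices are available. The sketch assumes precomputed matrices and two similarity cutoffs (`seq_cut`, `struct_cut`); these names and thresholds are hypothetical and are not the metric functions or criteria used in the paper.

```python
def classify_pairs(
    seq_dist: list[list[float]],
    struct_dist: list[list[float]],
    seq_cut: float,
    struct_cut: float,
) -> tuple[list[tuple[int, int]], list[tuple[int, int]]]:
    """Return (conformational_switch_pairs, remote_homology_pairs) as index pairs."""
    n = len(seq_dist)
    switches, remote = [], []
    for i in range(n):
        for j in range(i + 1, n):
            similar_seq = seq_dist[i][j] <= seq_cut
            similar_struct = struct_dist[i][j] <= struct_cut
            if similar_seq and not similar_struct:
                switches.append((i, j))  # similar sequences, dissimilar structures
            elif similar_struct and not similar_seq:
                remote.append((i, j))  # dissimilar sequences, similar structures
    return switches, remote


# Toy 3-protein example (hypothetical distances).
seq_d = [[0, 0.1, 0.9], [0.1, 0, 0.8], [0.9, 0.8, 0]]
struct_d = [[0, 0.9, 0.2], [0.9, 0, 0.7], [0.2, 0.7, 0]]
print(classify_pairs(seq_d, struct_d, 0.5, 0.5))  # ([(0, 1)], [(0, 2)])
```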

Subject(s)

Amino Acid Sequence , Computational Biology/methods , Protein Conformation , Proteins/chemistry , Biophysical Phenomena , Databases, Protein , Fourier Analysis , Sequence Homology, Amino Acid

11.

Sequence determinants of protein architecture.

Rackovsky, S.

Proteins ; 81(10): 1681-5, 2013 Oct.

Article in English | MEDLINE | ID: mdl-23720385

ABSTRACT

Delineation of the relationship between sequence and structure in proteins has proven elusive. Most studies of this problem use alignment methods and other approaches based on the characteristics of individual residues. It is demonstrated herein that the sequence-structure relationship is determined in significant part by global characteristics of sequence organization. Information encoded in complete sequences is required to distinguish proteins in different architectural groups. It is found that the statistically significant differences between sequences encoding different architectures are encoded in a surprisingly small set of low-wave-number sequence periodicities. It would therefore appear that unexpected simplicity in an appropriately defined Fourier space may be an inherent characteristic of the sequences of folded proteins.

Subject(s)

Protein Conformation , Protein Folding , Proteins , Sequence Analysis, Protein/methods , Computational Biology/methods , Databases, Factual , Fourier Analysis , Proteins/chemistry , Proteins/metabolism

12.

Global characteristics of protein sequences and their implications.

Rackovsky, S.

Proc Natl Acad Sci U S A ; 107(19): 8623-6, 2010 May 11.

Article in English | MEDLINE | ID: mdl-20421501

ABSTRACT

Computational studies of the relationships between protein sequence, structure, and folding have traditionally relied on purely local sequence representations. Here we show that global representations, on the basis of parameters that encode information about complete sequences, contain otherwise inaccessible information about the organization of sequences. By studying the spectral properties of these parameters, we demonstrate that amino acid physical properties fall into two distinct classes. One class consists of properties that favor sequentially localized interaction clusters. The other class consists of properties that favor globally distributed interactions. This observation provides a bridge between two classic models of protein folding, the collapse model and the nucleation model, and provides a basis for understanding how any degree of intermediacy between these two extremes can occur.

Subject(s)

Proteins/chemistry , Sequence Analysis, Protein , Amino Acid Sequence , Molecular Sequence Data

13.

Global Survey of Protein Dynamic Properties.

Konkankit, Chilaluck C; Rackovsky, S.

J Phys Chem B ; 127(27): 6073-6077, 2023 07 13.

Article in English | MEDLINE | ID: mdl-37368985

ABSTRACT

Using tools developed to study the dynamic bioinformatics of proteins, we are able to study the dynamic characteristics of very large numbers of protein sequences simultaneously. We study herein the distribution of protein sequences in a space determined by sequence mobility. It is shown that there are statistically significant differences in mobility distribution between folded sequences of different structural classes and between those and sequences of intrinsically disordered proteins. It is also shown that the several regions of mobility space differ significantly with respect to structural makeup. Helical proteins are shown to have distinctive dynamic characteristics at both extremes of the mobility spectrum.

Subject(s)

Intrinsically Disordered Proteins , Intrinsically Disordered Proteins/chemistry , Amino Acid Sequence , Protein Conformation , Protein Folding

14.

Sequence physical properties encode the global organization of protein structure space.

Rackovsky, S.

Proc Natl Acad Sci U S A ; 106(34): 14345-8, 2009 Aug 25.

Article in English | MEDLINE | ID: mdl-19706520

ABSTRACT

It is demonstrated that, properly represented, the amino acid composition of protein sequences contains the information necessary to delineate the global properties of protein structure space. A numerical representation of amino acid sequence in terms of a set of property factors is used, and the values of those property factors are averaged over individual sequences and then over sets of sequences belonging to structurally defined groups. These sequence sets then can be viewed as points in a 10-dimensional space, and the organization of that space, determined only by sequence properties, is similar at both local and global scales to that of the space of protein structures determined previously.
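The two-stage averaging described here (over residues within a sequence, then over sequences within a structural group) can be sketched as follows. The sketch assumes a caller-supplied mapping from each amino acid to its property-factor vector (ten factors in the paper); the toy two-component vectors and the function names are hypothetical.

```python
def sequence_vector(sequence: str, factors: dict[str, list[float]]) -> list[float]:
    """Average the property-factor vectors of all residues in one sequence."""
    vectors = [factors[aa] for aa in sequence.upper()]
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def group_point(sequences: list[str], factors: dict[str, list[float]]) -> list[float]:
    """Average the per-sequence vectors over a structurally defined group."""
    seq_vectors = [sequence_vector(s, factors) for s in sequences]
    dim = len(seq_vectors[0])
    return [sum(v[i] for v in seq_vectors) / len(seq_vectors) for i in range(dim)]


# Toy two-component "property factors" (hypothetical; the paper uses ten per residue).
toy_factors = {"A": [1.0, 0.0], "G": [0.0, 1.0]}
print(group_point(["AG", "AAAA"], toy_factors))  # [0.75, 0.25]
```

Each structural group then becomes one point in property-factor space, and the arrangement of those points can be compared with the arrangement of the same groups in structure space.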

Subject(s)

Algorithms , Proteins/chemistry , Computer Simulation , Databases, Protein , Physical Phenomena , Protein Conformation , Sequence Analysis, Protein

15.

Structure Class Encoding in Protein Dynamic Bioinformatics.

Rackovsky, S.

J Phys Chem B ; 126(31): 5730-5734, 2022 08 11.

Article in English | MEDLINE | ID: mdl-35900129

ABSTRACT

Using recently developed methods for studying the bioinformatics of protein dynamics, we investigate differences in dynamic characteristics between the sequences of proteins that fall into different structural classes. It is shown that there is a clear differentiation of dynamic properties of sequences as a function of structural class. Taken together with previous results we have developed, the present work demonstrates that dynamic properties are associated with structural behavior in two ways. The determination as to whether a given sequence folds is governed by the long-length-scale organization of the sequence. If the sequence folds, the choice of architectural class is governed by short- and intermediate-length-scale organization.

Subject(s)

Computational Biology , Proteins , Proteins/chemistry

16.

Spectral analysis of a protein conformational switch.

Rackovsky, S.

Phys Rev Lett ; 106(24): 248101, 2011 Jun 17.

Article in English | MEDLINE | ID: mdl-21770602

ABSTRACT

The existence of conformational switching in proteins, induced by single amino acid mutations, presents an important challenge to our understanding of the physics of protein folding. Sequence-local methods, commonly used to detect structural homology, are incapable of accounting for this phenomenon. We examine a set of proteins, derived from the G(A) and G(B) domains of Streptococcus protein G, which are known to show a dramatic conformational change as a result of single-residue replacement. It is shown that these sequences, which are almost identical locally, can have very different global patterns of physical properties. These differences are consistent with the observed complete change in conformation. These results suggest that sequence-local methods for identifying structural homology can be misleading. They point to the importance of global sequence analysis in understanding sequence-structure relationships.

Subject(s)

Bacterial Proteins/chemistry , Spectrum Analysis/methods , Fourier Analysis , Protein Structure, Tertiary

17.

Beyond Supersecondary Structure: Physics-Based Sequence Alignment.

Rackovsky, S.

Methods Mol Biol ; 1958: 341-346, 2019.

Article in English | MEDLINE | ID: mdl-30945228

ABSTRACT

Traditional approaches to sequence alignment are based on evolutionary ideas. As a result, they are prebiased toward results which are in accord with initial expectations. We present here a method of sequence alignment which is based entirely on the physical properties of the amino acids. This approach has no inherent bias, eliminates much of the computational complexity associated with methods currently in use, and has been shown to give good results for structures which were poorly predicted by traditional methods in recent CASP competitions and to identify sequence differences which correlate with structural and dynamic differences not detectable by traditional methods.

Subject(s)

Amino Acid Motifs , Computational Biology/methods , Proteins/genetics , Sequence Alignment/methods , Algorithms , Amino Acid Sequence/genetics , Physics , Proteins/chemistry , Sequence Homology, Amino Acid

18.

Information and discrimination in pairwise contact potentials.

Solis, Armando D; Rackovsky, S.

Proteins ; 71(3): 1071-87, 2008 May 15.

Article in English | MEDLINE | ID: mdl-18004788

ABSTRACT

We examine the information-theoretic characteristics of statistical potentials that describe pairwise long-range contacts between amino acid residues in proteins. In our work, we seek to map out an efficient information-based strategy to detect and optimally utilize the structural information latent in empirical data, to make contact potentials, and other statistically derived folding potentials, more effective tools in protein structure prediction. Foremost, we establish fundamental connections between basic information-theoretic quantities (including the ubiquitous Z-score) and contact "energies" or scores used routinely in protein structure prediction, and demonstrate that the informatic quantity that mediates fold discrimination is the total divergence. We find that pairwise contacts between residues bear a moderate amount of fold information, and if optimized, can assist in the discrimination of native conformations from large ensembles of native-like decoys. Using an extensive battery of threading tests, we demonstrate that parameters that affect the information content of contact potentials (e.g., choice of atoms to define residue location and the cut-off distance between pairs) have a significant influence in their performance in fold recognition. We conclude that potentials that have been optimized for mutual information and that have high number of score events per sequence-structure alignment are superior in identifying the correct fold. We derive the quantity "information product" that embodies these two critical factors. We demonstrate that the information product, which does not require explicit threading to compute, is as effective as the Z-score, which requires expensive decoy threading to evaluate. This new objective function may be able to speed up the multidimensional parameter search for better statistical potentials. 
Lastly, by demonstrating the functional equivalence of quasi-chemically approximated "energies" to fundamental informatic quantities, we make statistical potentials less dependent on theoretically tenuous biophysical formalisms and more amenable to direct bioinformatic optimization.
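The Z-score mentioned above measures how far the native score sits from the distribution of decoy scores. A minimal sketch, assuming lower scores are better (energy-like) and using the population standard deviation; both conventions vary between studies, and the function name is hypothetical.

```python
from statistics import mean, pstdev


def native_z_score(native_score: float, decoy_scores: list[float]) -> float:
    """Z = (score_native - mean(decoys)) / sd(decoys).

    With energy-like scores, a more negative Z means the native conformation
    is better separated from the decoys.
    """
    sd = pstdev(decoy_scores)
    if sd == 0:
        raise ValueError("decoy scores have zero spread")
    return (native_score - mean(decoy_scores)) / sd


print(native_z_score(0.0, [1.0, 3.0]))  # (0 - 2) / 1 = -2.0
```

Computing this requires scoring every decoy, which is the threading cost the information product is said to avoid.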

Subject(s)

Information Storage and Retrieval/methods , Models, Molecular , Protein Folding , Sequence Analysis, Protein/methods , Models, Statistical , Protein Conformation , Sequence Alignment/methods , Thermodynamics

19.

Property-based sequence representations do not adequately encode local protein folding information.

Solis, A D; Rackovsky, S.

Proteins ; 67(4): 785-8, 2007 Jun 01.

Article in English | MEDLINE | ID: mdl-17387739

ABSTRACT

We examine the informatic characteristics of amino acid representations based on physical properties. We demonstrate that sequences rewritten using contracted alphabets based on physical properties do not encode local folding information well. The best four-character alphabet can only encode approximately 57% of the maximum possible amount of structural information. This result suggests that property-based representations that operate on a local length scale are not likely to be useful in homology searches and fold-recognition exercises.
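Rewriting a sequence in a contracted alphabet is a many-to-one mapping of residues onto group letters. The four-letter grouping below (hydrophobic `h`, polar/other `p`, positive `+`, negative `-`) is a hypothetical example chosen for illustration; it is not the best four-character alphabet evaluated in the paper.

```python
# Hypothetical four-group alphabet covering the 20 standard amino acids.
GROUPS = {
    "h": "AVLIMFWC",  # hydrophobic (illustrative assignment)
    "p": "STNQGPYH",  # polar / other
    "+": "KR",        # positively charged
    "-": "DE",        # negatively charged
}
RESIDUE_TO_GROUP = {aa: letter for letter, members in GROUPS.items() for aa in members}


def contract(sequence: str) -> str:
    """Rewrite a protein sequence in the contracted four-letter alphabet."""
    return "".join(RESIDUE_TO_GROUP[aa] for aa in sequence.upper())


print(contract("ACDKG"))  # hh-+p
```

Many distinct 20-letter sequences collapse onto the same contracted string, which is one intuitive reason such alphabets can lose local folding information.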

Subject(s)

Protein Folding , Proteins/chemistry , Proteins/metabolism , Amino Acid Sequence , Cluster Analysis , Molecular Sequence Data

20.

Improvement of statistical potentials and threading score functions using information maximization.

Solis, Armando D; Rackovsky, S.

Proteins ; 62(4): 892-908, 2006 Mar 01.

Article in English | MEDLINE | ID: mdl-16395676

ABSTRACT

We show that statistical potentials and threading score functions, derived from finite data sets, are informatic functions, and that their performance depends on the manner in which data are classified and compressed. The choice of sequence and structural parameters affects estimates of the conditional probabilities P(C|S), the quantification of the effect of sequence S on conformation C, and determines the amount of information extracted from the data set, as measured by information gain. The mathematical link between information gain and mean conformational energy, established in this work using the local backbone potential as model, demonstrates that manipulation of descriptive parameters also alters the "energy" values assigned to native conformation and to decoy structures in the test pool, and consequently, the performance of such statistical potential functions in fold recognition exercises. We show that sequence and structural partitions that maximize information gain also minimize the mean energy of the ensemble of native conformations. Moreover, we establish an informatic basis for the placement of the native score within an energy spectrum given by the decoy pool in a threading exercise. We discover that, among all informatic quantities, information gain is the best predictor of threading success, even better than the standard Z-score. Consequently, the choices of sequence and structural descriptors, extent of compression, and levels of discretization that maximize information gain must also produce the best potential functions. Strategies to optimize these parameters with respect to information extraction are therefore relevant to building better statistical potentials. Last, we demonstrate that the backbone torsion potential, defined by the trimer sequence, can be an effective tool in greatly reducing the set of possible conformations from a vast decoy pool.
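Information gain is commonly computed as the mutual information I(S;C) between a sequence descriptor S and a conformation class C, which equals the expected divergence of P(C|S) from P(C). A minimal sketch, assuming that definition and a table of joint counts; the function name and toy counts are hypothetical, and the paper's exact estimator may differ.

```python
import math


def information_gain(counts: dict[tuple[str, str], int]) -> float:
    """Mutual information I(S;C) in bits, estimated from joint counts N(s, c)."""
    total = sum(counts.values())
    p_s: dict[str, float] = {}
    p_c: dict[str, float] = {}
    for (s, c), n in counts.items():
        p_s[s] = p_s.get(s, 0.0) + n / total  # marginal P(S)
        p_c[c] = p_c.get(c, 0.0) + n / total  # marginal P(C)
    gain = 0.0
    for (s, c), n in counts.items():
        if n == 0:
            continue  # zero-count cells contribute nothing
        p_sc = n / total
        gain += p_sc * math.log2(p_sc / (p_s[s] * p_c[c]))
    return gain


# Toy counts: sequence class fully determines conformation class.
perfect = {("a", "x"): 5, ("b", "y"): 5}
print(information_gain(perfect))  # 1.0
```

Coarser or finer partitions of S and C change these counts, which is how the choice of descriptors and discretization alters the information extracted.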

Subject(s)

Proteins/chemistry , Amino Acid Sequence , Databases, Protein , Entropy , Information Systems , Models, Statistical , Probability , Protein Conformation , Proteins/metabolism , Sequence Alignment , Sequence Homology, Amino Acid , Thermodynamics
