1.
Curr Protein Pept Sci ; 13(1): 6-18, 2012 Feb.
Article in English | MEDLINE | ID: mdl-22044149

ABSTRACT

Intrinsic disorder is relatively common in proteins, plays important roles in numerous cellular activities, and its prevalence has been implicated in various human diseases. However, annotations of disorder lag behind the rapidly increasing number of known protein chains. The last decade saw the development of a relatively large number of in-silico methods that predict disorder using the protein sequence as their input. We perform a first-of-its-kind comprehensive empirical evaluation of the disorder predictors, which is characterized by three novel aspects: (1) we evaluate the quality of the disorder predictions at the residue, segment, and chain levels; (2) we consider a large number of published predictors that are accessible to the end user, evaluated on a relatively large dataset with close to 500 proteins; and (3) we assess the statistical significance of differences between the considered methods. Our study reveals that there is no universally superior predictor and that the top-performing methods are complementary. We show that while recent consensus-based predictors outperform other considered methods for the residue-level predictions, some older methods perform better for the prediction of the disordered segments. Our analysis indicates that certain predictors are biased to under-predict the disorder, while some other solutions tend to over-predict the number of the disordered residues. We also evaluate the utility of the predicted residue-level disorder for prediction of proteins with long disordered segments and prediction of the chain-level disorder content. Lastly, we provide recommendations concerning development of a new generation of consensus-based methods and specialized methods for improved prediction of the disorder content.
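
To make the residue-level and chain-level evaluation concrete, the following is a minimal sketch, not the authors' evaluation protocol. It assumes each annotation is a string with '1' for a disordered residue and '0' for an ordered one; the function names and the toy strings are hypothetical.

```python
def residue_level_scores(true_labels, pred_labels):
    """Compare two per-residue annotations ('1' = disordered, '0' = ordered)."""
    if len(true_labels) != len(pred_labels):
        raise ValueError("annotations must have the same length")
    pairs = list(zip(true_labels, pred_labels))
    tp = sum(t == "1" and p == "1" for t, p in pairs)
    tn = sum(t == "0" and p == "0" for t, p in pairs)
    fp = sum(t == "0" and p == "1" for t, p in pairs)
    fn = sum(t == "1" and p == "0" for t, p in pairs)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


def disorder_content(labels):
    """Chain-level disorder content: fraction of residues marked disordered."""
    if not labels:
        raise ValueError("empty annotation")
    return labels.count("1") / len(labels)


# Toy example (made-up annotations, not data from the study).
native = "0011110000"
predicted = "0001111000"
scores = residue_level_scores(native, predicted)
# A positive bias means over-prediction, a negative bias under-prediction.
content_bias = disorder_content(predicted) - disorder_content(native)
```

Comparing predicted and native disorder content in this way is one simple way to see whether a predictor tends to over-predict or under-predict disorder for a given chain.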


Subject(s)
Proteins/chemistry , Algorithms , Computational Biology/methods , Protein Conformation , ROC Curve
2.
BMC Bioinformatics ; 11: 420, 2010 Aug 09.
Article in English | MEDLINE | ID: mdl-20696050

ABSTRACT

BACKGROUND: G-protein-coupled receptors (GPCRs) play a key role in diverse physiological processes and are the targets of almost two-thirds of the marketed drugs. The 3D structures of GPCRs are largely unavailable; however, a large number of GPCR primary sequences are known. To facilitate the identification and characterization of novel receptors, it is therefore very valuable to develop a computational method that accurately predicts GPCRs from protein primary sequences.
RESULTS: We propose a new method, called PCA-GPCR, to predict GPCRs using a comprehensive set of 1497 sequence-derived features. Principal component analysis is first employed to reduce the dimension of the feature space to 32. Then, the resulting 32-dimensional feature vectors are fed into a simple yet powerful classification algorithm, called intimate sorting, to predict GPCRs at five levels. The prediction at the first level determines whether a protein is a GPCR or a non-GPCR. If it is predicted to be a GPCR, it is further classified into a family, subfamily, sub-subfamily, and subtype by the classifiers at the second, third, fourth, and fifth levels, respectively. To train the classifiers applied at the five levels, a non-redundant dataset was carefully constructed, which contains 3178, 1589, 4772, 4924, and 2741 protein sequences at the respective levels. Jackknife tests on this training dataset show that the overall accuracies of PCA-GPCR at the five levels (from the first to the fifth) reach up to 99.5%, 88.8%, 80.47%, 80.3%, and 92.34%, respectively. We further perform predictions on a dataset of 1238 GPCRs at the second level, and on two further datasets of 167 and 566 GPCRs at the fourth level. The overall prediction accuracies of our method are consistently higher than those of the existing methods compared.
CONCLUSIONS: The comprehensive set of 1497 features is believed to be capable of capturing information about amino acid composition, sequence order, and various physicochemical properties of proteins. Therefore, high accuracies are achieved when predicting GPCRs at all five levels with our proposed method.
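
The jackknife (leave-one-out) test mentioned in the abstract can be illustrated with a minimal sketch. It assumes feature vectors that are already reduced in dimension (the PCA step is omitted), and it swaps in a plain nearest-neighbor rule as a stand-in, since the abstract does not describe how intimate sorting works. The function names and the toy two-dimensional data are hypothetical.

```python
import math


def nearest_neighbor_label(query, samples):
    """Return the label of the sample closest to `query` (Euclidean distance)."""
    closest = min(samples, key=lambda sample: math.dist(query, sample[0]))
    return closest[1]


def jackknife_accuracy(samples):
    """Leave-one-out test: each sample is predicted from all the others."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    correct = 0
    for i, (features, label) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        if nearest_neighbor_label(features, rest) == label:
            correct += 1
    return correct / len(samples)


# Toy first-level data (GPCR vs. non-GPCR); the values are made up.
toy_samples = [
    ((0.0, 0.1), "GPCR"),
    ((0.1, 0.0), "GPCR"),
    ((0.2, 0.2), "GPCR"),
    ((1.0, 1.1), "non-GPCR"),
    ((1.1, 0.9), "non-GPCR"),
    ((0.9, 1.0), "non-GPCR"),
]
toy_accuracy = jackknife_accuracy(toy_samples)
```

In a five-level scheme like the one described, the same test would be run separately on the dataset built for each level.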


Subject(s)
Algorithms , Receptors, G-Protein-Coupled/chemistry , Receptors, G-Protein-Coupled/classification , Amino Acids/analysis , Principal Component Analysis , Receptors, G-Protein-Coupled/genetics , Sequence Analysis, Protein
3.
BMC Bioinformatics ; 11 Suppl 1: S9, 2010 Jan 18.
Article in English | MEDLINE | ID: mdl-20122246

ABSTRACT

BACKGROUND: Prediction of protein structural classes (alpha, beta, alpha + beta and alpha/beta) from amino acid sequences is of great importance, as it benefits the study of protein function, regulation and interactions. Many methods have been developed for high-homology protein sequences, and their prediction accuracies can reach up to 90%. However, for low-homology sequences whose average pairwise sequence identity lies between 20% and 40%, they perform relatively poorly, with prediction accuracy often below 60%.
RESULTS: We propose a new method to predict protein structural classes on the basis of features extracted from the predicted secondary structures of proteins rather than directly from their amino acid sequences. It first uses PSIPRED to predict the secondary structure for each protein sequence. Then, the chaos game representation is employed to represent the predicted secondary structure as two time series, from which we generate a comprehensive set of 24 features using recurrence quantification analysis, K-string based information entropy and segment-based analysis. The resulting feature vectors are finally fed into a simple yet powerful Fisher's discriminant algorithm for the prediction of protein structural classes. We tested the proposed method on three low-homology benchmark datasets and achieved overall prediction accuracies of 82.9%, 83.1% and 81.3%, respectively. Comparisons with ten existing methods showed that our method consistently performs better on all the tested datasets, with overall accuracy improvements ranging from 2.3% to 27.5%. A web server that implements the proposed method is freely available at http://www1.spms.ntu.edu.sg/~chenxin/RKS_PPSC/.
CONCLUSION: The high prediction accuracy achieved by our proposed method is attributed to the design of a comprehensive feature set on the predicted secondary structure sequences, which is capable of characterizing the sequence order information, local interactions of the secondary structural elements, and spatial arrangements of alpha helices and beta strands. Thus, it is a valuable method to predict protein structural classes, particularly for low-homology amino acid sequences.
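
As an illustration of how a chaos game representation can turn a secondary-structure string into two time series, here is a minimal sketch. The abstract does not give the geometry, so the triangle vertices for H (helix), E (strand) and C (coil), the starting point, and the function name are all assumptions made for this example.

```python
# Assumed vertex coordinates for the three secondary-structure states.
VERTICES = {"H": (0.0, 0.0), "E": (1.0, 0.0), "C": (0.5, 1.0)}


def cgr_time_series(secondary_structure, start=(0.5, 0.5)):
    """Map a string over {H, E, C} to two time series (x and y coordinates).

    At each step the current point moves halfway toward the vertex assigned
    to the next symbol; the successive x and y values form the two series.
    """
    x, y = start
    xs, ys = [], []
    for symbol in secondary_structure:
        if symbol not in VERTICES:
            raise ValueError(f"unexpected symbol: {symbol!r}")
        vx, vy = VERTICES[symbol]
        x, y = (x + vx) / 2, (y + vy) / 2
        xs.append(x)
        ys.append(y)
    return xs, ys


# Toy predicted secondary structure (made up, not PSIPRED output).
xs, ys = cgr_time_series("CCHHHHEEC")
```

Each series has one value per residue, so features such as those from recurrence quantification analysis can then be computed on `xs` and `ys` separately.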


Subject(s)
Algorithms , Protein Structure, Secondary , Proteins/chemistry , Sequence Homology, Amino Acid , Amino Acid Sequence , Entropy , Models, Molecular , Molecular Sequence Data , Protein Conformation , Proteins/classification
4.
J Theor Biol ; 257(4): 618-26, 2009 Apr 21.
Article in English | MEDLINE | ID: mdl-19183559

ABSTRACT

In this paper, we aim to predict protein structural classes (alpha, beta, alpha+beta, or alpha/beta) for low-homology data sets. Two widely used data sets are considered, 1189 (containing 1092 proteins) and 25PDB (containing 1673 proteins), with sequence homology of 40% and 25%, respectively. We propose to decompose the chaos game representation of proteins into two kinds of time series. Then, a novel and powerful nonlinear analysis technique, recurrence quantification analysis (RQA), is applied to analyze these time series. For a given protein sequence, a total of 16 characteristic parameters can be calculated with RQA, which are treated as the feature representation of the protein sequence. Based on this feature representation, the structural class of each protein is predicted with Fisher's linear discriminant algorithm. The jackknife test is used to test and compare our method with other existing methods. The overall accuracies with the step-by-step procedure are 65.8% and 64.2% for the 1189 and 25PDB data sets, respectively. With the widely used one-against-others procedure, we compare our method with five other existing methods. In particular, the overall accuracies of our method are 6.3% and 4.1% higher for the two data sets, respectively. Furthermore, only 16 parameters are used in our method, which is fewer than the number used by other methods. This suggests that the current method may play a complementary role to the existing methods and is promising for the prediction of protein structural classes.
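
To give a flavor of recurrence quantification analysis, the sketch below computes one basic RQA quantity, the recurrence rate, for a single time series. It is a simplified illustration and not the 16-parameter feature set of the paper: it assumes no time-delay embedding (embedding dimension 1), a hand-picked threshold `eps`, and it excludes the main diagonal, which is only one of several conventions. All names are hypothetical.

```python
def recurrence_matrix(series, eps):
    """R[i][j] = 1 when points i and j of the series are within eps."""
    n = len(series)
    return [
        [1 if abs(series[i] - series[j]) <= eps else 0 for j in range(n)]
        for i in range(n)
    ]


def recurrence_rate(series, eps):
    """Fraction of recurrent point pairs, ignoring the trivial diagonal."""
    n = len(series)
    if n < 2:
        raise ValueError("need at least two points")
    matrix = recurrence_matrix(series, eps)
    recurrent = sum(
        matrix[i][j] for i in range(n) for j in range(n) if i != j
    )
    return recurrent / (n * (n - 1))


# Toy series (made up); two close pairs out of six possible pairs.
toy_series = [0.0, 0.1, 0.5, 0.55]
toy_rate = recurrence_rate(toy_series, eps=0.15)
```

Other RQA measures, such as determinism, are derived from the diagonal and vertical line structures of the same recurrence matrix.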


Subject(s)
Protein Conformation , Algorithms , Amino Acid Sequence , Amino Acids/genetics , Base Sequence , Computational Biology/methods , Nonlinear Dynamics , Sequence Homology, Amino Acid