Search | BVS IEC

DescFold: a web server for protein fold recognition.

Yan, Ren-Xiang; Si, Jing-Na; Wang, Chuan; Zhang, Ziding.

BMC Bioinformatics ; 10: 416, 2009 Dec 14.

Article in English | MEDLINE | ID: mdl-20003426

ABSTRACT

BACKGROUND: Machine learning-based methods have proven powerful for developing new fold recognition tools. In our previous work [Zhang, Kochhar and Grigorov (2005) Protein Science, 14: 431-444], a machine learning-based method called DescFold was established by using Support Vector Machines (SVMs) to combine the following four descriptors: a profile-sequence-alignment-based descriptor using PSI-BLAST e-values and bit scores, a sequence-profile-alignment-based descriptor using RPS-BLAST e-values and bit scores, a descriptor based on secondary structure element alignment (SSEA), and a descriptor based on the occurrence of PROSITE functional motifs. In this work, we focus on improving DescFold by incorporating more powerful descriptors and setting up a user-friendly web server. RESULTS: In seeking more powerful descriptors, the profile-profile alignment score generated by the COMPASS algorithm was first considered as a new descriptor (termed PPA). When considering a profile-profile alignment between two proteins in the context of fold recognition, one protein is regarded as a template (i.e., its 3D structure is known). Instead of a sequence profile derived from a PSI-BLAST search, a structure-seeded profile for the template protein was generated by searching for its structural neighbors with the TM-align structural alignment algorithm. The COMPASS algorithm was then used again to derive a profile-structural-profile-alignment-based descriptor (termed PSPA). We trained and tested the new DescFold on a total of 1,835 highly diverse proteins extracted from SCOP 1.73. Introducing the PPA and PSPA descriptors substantially improved the fold recognition performance of the new DescFold. The DescFold web server was then built on the trained SVM models, using the SCOP_1.73_40% dataset as the fold library. To provide a large-scale test for the new DescFold, a stringent test set of 1,866 proteins was selected from SCOP 1.75. At a false positive rate below 5%, the new DescFold correctly recognizes structural homologs at the fold level for nearly 46% of the test proteins. We also benchmarked the DescFold method against several well-established fold recognition algorithms on the LiveBench targets and the Lindahl dataset. CONCLUSIONS: Extensive benchmarking shows that the new DescFold method performs very competitively against some well-established fold recognition methods, suggesting that it can serve as a useful tool to assist in template-based protein structure prediction. The DescFold server is freely accessible at http://202.112.170.199/DescFold/index.html.
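
The figure quoted above (recognition rate at a controlled false positive rate) can be illustrated with a minimal sketch. This is a generic illustration, not the authors' evaluation protocol: the function names, the scores, and the rule "predict positive when the score is strictly above the cutoff" are all assumptions made for the example.

```python
def threshold_at_fpr(negative_scores, max_fpr=0.05):
    """Return a score cutoff whose false positive rate is at most max_fpr.

    Assumption for this sketch: a prediction counts as positive when its
    score is strictly greater than the cutoff.
    """
    ranked = sorted(negative_scores, reverse=True)
    # Number of negatives that may score above the cutoff.
    allowed = int(max_fpr * len(ranked))
    if allowed >= len(ranked):
        return float("-inf")  # every negative may be called positive
    # At most `allowed` negatives are strictly above this score.
    return ranked[allowed]


def recognition_rate(positive_scores, cutoff):
    """Fraction of true positives scoring strictly above the cutoff."""
    hits = sum(1 for score in positive_scores if score > cutoff)
    return hits / len(positive_scores)


# Hypothetical scores, chosen only to make the arithmetic easy to follow.
negatives = [0.9, 0.8, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
positives = [0.95, 0.7, 0.45, 0.6]

cutoff = threshold_at_fpr(negatives, max_fpr=0.25)
rate = recognition_rate(positives, cutoff)
```

With these made-up numbers, two of the eight negatives are allowed above the cutoff, so the cutoff falls on the third-highest negative score, and the recognition rate is the share of positives above it.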

Subjects

Computational Biology/methods; Internet; Proteins/chemistry; Software; Artificial Intelligence; Databases, Protein; Models, Molecular; Protein Folding; Proteins/metabolism; Sequence Alignment; Sequence Analysis, Protein/methods

TIM-Finder: a new method for identifying TIM-barrel proteins.

Si, Jing-Na; Yan, Ren-Xiang; Wang, Chuan; Zhang, Ziding; Su, Xiao-Dong.

BMC Struct Biol ; 9: 73, 2009 Dec 14.

Article in English | MEDLINE | ID: mdl-20003393

ABSTRACT

BACKGROUND: The triosephosphate isomerase (TIM)-barrel fold occurs frequently in the proteomes of different organisms, and the known TIM-barrel proteins have been found to play diverse functional roles. To accelerate the exploration of the sequence-structure protein landscape in the TIM-barrel fold, a computational tool that allows sensitive detection of TIM-barrel proteins is required. RESULTS: To develop a new TIM-barrel protein identification method, we considered three descriptors: a sequence-alignment-based descriptor using PSI-BLAST e-values and bit scores, a descriptor based on secondary structure element alignment (SSEA), and a descriptor based on the occurrence of PROSITE functional motifs. The three descriptors were combined using a Support Vector Machine (SVM) to obtain a new method with improved performance, which we call TIM-Finder. When tested on the whole proteome of Bacillus subtilis, TIM-Finder detected 194 TIM-barrel proteins at a 99% confidence level, outperforming the PSI-BLAST search as well as one existing fold recognition method. CONCLUSIONS: TIM-Finder can serve as a competitive tool for proteome-wide TIM-barrel protein identification. The TIM-Finder web server is freely accessible at http://202.112.170.199/TIM-Finder/.

Subjects

Bacillus subtilis/chemistry; Computational Biology/methods; Protein Folding; Proteome/analysis; Triose-Phosphate Isomerase/analysis; Amino Acid Motifs; Databases, Nucleic Acid; Isoenzymes/analysis; Isoenzymes/chemistry; Isoenzymes/metabolism; Models, Molecular; Protein Structure, Secondary; Protein Structure, Tertiary; Proteome/chemistry; Proteome/metabolism; Sequence Analysis, Protein; Triose-Phosphate Isomerase/chemistry; Triose-Phosphate Isomerase/metabolism

Comparison of linear gap penalties and profile-based variable gap penalties in profile-profile alignments.

Wang, Chuan; Yan, Ren-Xiang; Wang, Xiao-Feng; Si, Jing-Na; Zhang, Ziding.

Comput Biol Chem ; 35(5): 308-18, 2011 Oct 12.

Article in English | MEDLINE | ID: mdl-22000802

ABSTRACT

Profile-profile alignment algorithms have proven powerful for recognizing remote homologs and generating alignments by effectively integrating sequence evolutionary information into scoring functions. In comparison with scoring functions, the development of gap penalty functions has rarely been addressed in profile-profile alignment algorithms. Although indel frequency profiles have been used to construct profile-based variable gap penalties in some profile-profile alignment algorithms, there has been no fair comparison between variable gap penalties and traditional linear gap penalties to quantify the improvement in alignment accuracy. We compared two linear gap penalty functions, the traditional affine gap penalty (AGP) and the bilinear gap penalty (BGP), with two profile-based variable gap penalty functions, the Profile-based Gap Penalty used in SP(5) (SPGP) and a new Weighted Profile-based Gap Penalty (WPGP) developed by us, on some well-established benchmark datasets. Our results show that profile-based variable gap penalties yield only limited improvements over linear gap penalties, whether or not secondary structure information is incorporated. Secondary structure information appears to be less useful when incorporated into gap penalties than when incorporated into scoring functions. Analysis of gap length distributions indicates that each gap penalty stably maintains a corresponding distribution of gap lengths in its alignments, but the difference between that distribution and the one in the reference alignments does not reflect the performance of the gap penalty. Indel frequency profiles contain useful information, but it is still not sufficient to improve alignment accuracy when used in profile-based variable gap penalties. All of the methods tested in this work are freely accessible at http://protein.cau.edu.cn/gppat/.
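
To make the two linear penalty families named above concrete, here is a minimal sketch of an affine gap penalty and of one common piecewise-linear ("bilinear") form. The exact BGP definition and parameter values used in the paper are not given in this abstract, so the functional form of `bilinear_gap_penalty`, the `knee` parameter, and all default values are assumptions for illustration only.

```python
def affine_gap_penalty(length, gap_open=10.0, gap_extend=1.0):
    """Affine penalty: a fixed opening cost plus a constant cost per
    additional gap position. Parameter values are illustrative."""
    if length <= 0:
        return 0.0
    return gap_open + gap_extend * (length - 1)


def bilinear_gap_penalty(length, gap_open=10.0, extend_short=1.0,
                         extend_long=0.5, knee=3):
    """Piecewise-linear penalty with two extension slopes (assumed form).

    Positions up to `knee` are extended at `extend_short`; positions
    beyond `knee` are extended at the cheaper `extend_long`.
    """
    if length <= 0:
        return 0.0
    short_part = min(length, knee)
    long_part = max(length - knee, 0)
    return gap_open + extend_short * (short_part - 1) + extend_long * long_part


# Penalties for gaps of length 1 to 6 under both schemes.
table = [(k, affine_gap_penalty(k), bilinear_gap_penalty(k)) for k in range(1, 7)]
```

Under these assumed parameters the two functions agree up to the knee, after which the bilinear form charges less for each extra position, so long gaps are penalized more gently.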

Subjects

Sequence Alignment/methods; Sequence Analysis, Protein/methods; Algorithms; Amino Acid Sequence; Computational Biology; Databases, Protein; Molecular Sequence Data; Protein Structure, Secondary; Sensitivity and Specificity; Sequence Alignment/statistics & numerical data; Sequence Analysis, Protein/statistics & numerical data
