pTop 1.0: A High-Accuracy and High-Efficiency Search Engine for Intact Protein Identification.

Sun, Rui-Xiang; Luo, Lan; Wu, Long; Wang, Rui-Min; Zeng, Wen-Feng; Chi, Hao; Liu, Chao; He, Si-Min

Affiliations

Sun RX; Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China.
Luo L; Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China.
Wu L; University of Chinese Academy of Sciences, Beijing 100049, China.
Wang RM; Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China.
Zeng WF; University of Chinese Academy of Sciences, Beijing 100049, China.
Chi H; Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China.
Liu C; University of Chinese Academy of Sciences, Beijing 100049, China.
He SM; Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China.

Anal Chem; 88(6): 3082-90, 2016 Mar 15.

Article in English | MEDLINE | ID: mdl-26844380

ABSTRACT

There has been tremendous progress in top-down proteomics (TDP) in the past 5 years, particularly in intact protein separation and high-resolution mass spectrometry. However, bioinformatics to deal with large-scale mass spectra has lagged behind, in both algorithmic research and software development. In this study, we developed pTop 1.0, a novel software tool to significantly improve the accuracy and efficiency of mass spectral data analysis in TDP. The precursor mass offers crucial clues to infer the potential post-translational modifications co-occurring on the protein, the reliability of which relies heavily on its mass accuracy. Concentrating on detecting the precursors more accurately, a machine-learning model incorporating a variety of spectral features was trained online in pTop via a support vector machine (SVM). pTop employs the sequence tags extracted from the MS/MS spectra and a dynamic programming algorithm to accelerate the search speed, especially for those spectra with multiple post-translational modifications. We tested pTop on three publicly available data sets and compared it with ProSight and MS-Align+ in terms of its recall, precision, running time, and so on. The results showed that pTop can, in general, outperform ProSight and MS-Align+. pTop recalled 22% more correct precursors, although it exported 30% fewer precursors than Xtract (in ProSight) from a human histone data set. The running speed of pTop was about 1 to 2 orders of magnitude faster than that of MS-Align+. This algorithmic advancement in pTop, including both accuracy and speed, will inspire the development of other similar software to analyze the mass spectra from the entire proteins.
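The abstract says pTop uses sequence tags extracted from MS/MS spectra to speed up the search but gives no algorithmic detail. As a rough illustration of the general idea only, and not pTop's actual implementation, the sketch below links fragment masses whose differences match an amino acid residue mass and reads off maximal chains as tags. The function name `extract_tags`, the tolerance, the minimum tag length, and the assumption that peaks are already deconvoluted to neutral monoisotopic masses are all hypothetical choices made for this example.

```python
from typing import Dict, List, Tuple

# Standard monoisotopic residue masses in Da. "L" stands for both Leu and Ile,
# which are isobaric and cannot be told apart by mass alone.
RESIDUE_MASS: Dict[str, float] = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def extract_tags(masses: List[float], min_len: int = 3, tol: float = 0.01) -> List[str]:
    """Return maximal sequence tags of at least `min_len` residues.

    `masses` are assumed to be deconvoluted fragment masses from one ion
    series (hypothetical input format for this sketch).
    """
    peaks = sorted(masses)
    n = len(peaks)
    # edges[i] lists (j, residue) pairs where peaks[j] - peaks[i] matches a residue mass.
    edges: List[List[Tuple[int, str]]] = [[] for _ in range(n)]
    has_incoming = [False] * n
    for i in range(n):
        for j in range(i + 1, n):
            delta = peaks[j] - peaks[i]
            for aa, mass in RESIDUE_MASS.items():
                if abs(delta - mass) <= tol:
                    edges[i].append((j, aa))
                    has_incoming[j] = True

    tags = set()

    def walk(i: int, prefix: str) -> None:
        # Follow every chain to its end; keep it if it is long enough.
        if not edges[i]:
            if len(prefix) >= min_len:
                tags.add(prefix)
            return
        for j, aa in edges[i]:
            walk(j, prefix + aa)

    # Start only at peaks with no predecessor so that each tag is maximal.
    for i in range(n):
        if not has_incoming[i]:
            walk(i, "")
    return sorted(tags)
```

In a tag-based search, tags like these would typically be used to shortlist candidate proteins before any more expensive modification-aware scoring; how pTop combines its tags with dynamic programming is not described in the abstract.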

Subjects

Databases, Protein; Information Storage and Retrieval; Proteins/analysis; Algorithms; Machine Learning; Software

Full text: 1 Database: MEDLINE Main subject: Proteins / Information Storage and Retrieval / Databases, Protein Type of study: Diagnostic_studies Language: En Publication year: 2016 Document type: Article