Prediction of Protein Subcellular Localization Based on Fusion of Multi-view Features.
Li, Bo; Cai, Lijun; Liao, Bo; Fu, Xiangzheng; Bing, Pingping; Yang, Jialiang.
Affiliations
  • Li B; College of Information Science and Engineering, Hunan University, Changsha 410082, China. hn.libo@163.com.
  • Cai L; College of Information Science and Engineering, Hunan University, Changsha 410082, China. ljcai@hnu.edu.cn.
  • Liao B; College of Information Science and Engineering, Hunan University, Changsha 410082, China. dragonbw@163.com.
  • Fu X; School of Mathematics and Statistics, Hainan Normal University, Haikou 570100, China. dragonbw@163.com.
  • Bing P; College of Information Science and Engineering, Hunan University, Changsha 410082, China. excelsior511@126.com.
  • Yang J; Academics Working Station, Changsha Medical University, Changsha 410219, China. bpping@163.com.
Molecules; 24(5), 2019 Mar 06.
Article in English | MEDLINE | ID: mdl-30845684
ABSTRACT
The prediction of protein subcellular localization is critical for inferring protein functions, gene regulation and protein-protein interactions. With advances in high-throughput sequencing technologies and proteomic methods, the protein sequences of numerous yeasts have become publicly available, which enables us to predict yeast protein subcellular localization computationally. However, widely used protein sequence representation techniques, such as amino acid composition and Chou's pseudo amino acid composition (PseAAC), have difficulty extracting adequate information about the interactions between residues and the positional distribution of each residue. There is therefore still an urgent need for novel sequence representations. In this study, we present two novel protein sequence representation techniques: Generalized Chaos Game Representation (GCGR), based on the frequency and distribution of residues in the protein primary sequence, and novel statistics and information theory (NSI), which reflects local position information of the sequence. In the GCGR + NSI representation, a protein primary sequence is represented by a simple 5-dimensional feature vector, whereas other popular methods such as PseAAC and dipeptide composition use feature vectors of hundreds of dimensions. In practice, this representation is highly efficient for predicting protein subcellular localization: even without a machine learning-based classifier, a simple model based on the feature vector achieves prediction accuracies of 0.8825 and 0.7736 on the CL317 and ZW225 datasets, respectively. To further evaluate the proposed encoding schemes, we introduce a multi-view feature-based method that combines these two features with other well-known features, including PseAAC and dipeptide composition, and uses a support vector machine as the classifier to predict protein subcellular localization.
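Two of the baseline views named above, amino acid composition and dipeptide composition, have standard definitions (20 and 400 relative residue frequencies, respectively), so they can be sketched directly. The snippet below is a minimal illustration, not the authors' implementation: it does not reproduce GCGR, NSI or PseAAC, whose exact formulas are not given in this abstract, and the `multi_view` function is hypothetical, assuming that views are fused by simple concatenation.

```python
from itertools import product

# The 20 standard amino acids in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order ("AA", "AC", ..., "YY").
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def clean(seq: str) -> str:
    """Uppercase the sequence and drop non-standard residue codes."""
    return "".join(r for r in seq.upper() if r in AMINO_ACIDS)


def aac(seq: str) -> list[float]:
    """20-dim amino acid composition: relative frequency of each residue."""
    residues = clean(seq)
    n = len(residues)
    return [residues.count(a) / n if n else 0.0 for a in AMINO_ACIDS]


def dpc(seq: str) -> list[float]:
    """400-dim dipeptide composition: relative frequency of each adjacent pair.

    Pairs are formed after non-standard residues have been removed.
    """
    residues = clean(seq)
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = len(residues) - 1
    for i in range(total):
        counts[residues[i:i + 2]] += 1
    return [counts[d] / total if total > 0 else 0.0 for d in DIPEPTIDES]


def multi_view(seq: str) -> list[float]:
    """Hypothetical fusion: concatenate the per-view vectors into one vector."""
    return aac(seq) + dpc(seq)
```

For example, `len(multi_view("MKTAYIAK"))` is 420 (20 composition values plus 400 dipeptide values), which illustrates the dimensionality gap with the 5-dimensional GCGR + NSI vector described above.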
This model achieves prediction accuracies of 0.927 and 0.871 on the CL317 and ZW225 datasets, respectively, outperforming other existing methods in jackknife tests. The results suggest that the GCGR and NSI features are useful complements to popular protein sequence representations for predicting yeast protein subcellular localization. Finally, we validate several newly predicted protein subcellular localizations using evidence from articles published in authoritative journals and from books.
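A jackknife test is leave-one-out cross-validation: each sequence is held out once, the model is trained on all remaining sequences, and accuracy is the fraction of held-out sequences predicted correctly. The sketch below assumes feature vectors have already been computed. To stay dependency-free it uses a nearest-centroid rule (`nearest_centroid_predict`, a hypothetical stand-in) in place of the support vector machine used in the study.

```python
import math


def nearest_centroid_predict(train_X, train_y, x):
    """Hypothetical stand-in classifier: pick the class whose mean vector is closest to x."""
    sums, counts = {}, {}
    for vec, label in zip(train_X, train_y):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    best_label, best_dist = None, math.inf
    for label, total in sums.items():
        centroid = [v / counts[label] for v in total]
        dist = math.dist(centroid, x)  # Euclidean distance (Python 3.8+)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def jackknife_accuracy(X, y, predict=nearest_centroid_predict):
    """Leave-one-out: hold out each sample once, train on the rest, score the held-out one."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if predict(train_X, train_y, X[i]) == y[i]:
            correct += 1
    return correct / len(X)
```

Note that a class with a single member can never be predicted correctly under this protocol, because holding it out removes that class from the training set. With scikit-learn, the same protocol can be run with `sklearn.model_selection.LeaveOneOut` and an `sklearn.svm.SVC` classifier.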
Full text: 1 | Database: MEDLINE | Main subject: Proteins / Computational Biology / Proteomics | Study type: Prognostic_studies / Risk_factors_studies | Language: English | Journal: Molecules | Journal subject: Biology | Year of publication: 2019 | Document type: Article | Country of affiliation: China