Predicting protein-protein interactions from sequence using correlation coefficient and high-quality interaction dataset.
Amino Acids; 38(3): 891-9, 2010 Mar.
Article in English | MEDLINE | ID: mdl-19387790
Identifying protein-protein interactions (PPIs) is critical for understanding the cellular functions of proteins and the machinery of a proteome. PPI data derived from high-throughput technologies are often incomplete and noisy. Therefore, it is important to develop computational methods and high-quality interaction datasets for predicting PPIs. A sequence-based method is proposed that combines correlation coefficient (CC) transformation with a support vector machine (SVM). The CC transformation not only adequately considers the neighboring effect of the protein sequence but also describes the level of CC between two protein sequences. A gold standard positives (interacting) dataset, MIPS Core, and a gold standard negatives (non-interacting) dataset, GO-NEG, of the yeast Saccharomyces cerevisiae were mined to objectively evaluate the above method and attenuate bias. The SVM model combined with CC transformation yielded the best performance, with an accuracy of 87.94% on the gold standard positives and gold standard negatives datasets. The MATLAB source code and the datasets are available on request from smgsmg@mail.ustc.edu.cn.
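The abstract does not state the CC formula, so the following Python sketch is only an illustration of the general idea of a correlation-based sequence encoding, not the authors' MATLAB implementation. It assumes that residues are mapped to Kyte-Doolittle hydropathy values and that each protein is described by the Pearson correlation between that numeric series and lag-shifted copies of itself. The property scale, the `max_lag` parameter, and the function names `lagged_correlation`, `cc_features`, and `pair_features` are all hypothetical choices made for this example.

```python
import math

# Kyte-Doolittle hydropathy values. Using this particular scale is an
# assumption for illustration; the abstract does not name the properties used.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def lagged_correlation(values, lag):
    """Pearson correlation between a series and its copy shifted by `lag` (lag >= 1)."""
    x = values[:-lag]
    y = values[lag:]
    n = len(x)
    if n < 2:
        return 0.0  # too short to correlate at this lag
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant series: correlation is undefined, use 0
    return cov / math.sqrt(var_x * var_y)


def cc_features(sequence, max_lag=5):
    """Fixed-length descriptor of one protein: one correlation per lag."""
    # Residues outside the 20 standard amino acids are skipped.
    values = [KD[aa] for aa in sequence.upper() if aa in KD]
    return [lagged_correlation(values, lag) for lag in range(1, max_lag + 1)]


def pair_features(seq_a, seq_b, max_lag=5):
    """Concatenate both proteins' descriptors into one vector for a protein pair."""
    return cc_features(seq_a, max_lag) + cc_features(seq_b, max_lag)
```

Because the output length depends only on `max_lag` and not on sequence length, vectors like these could be passed to any SVM implementation, labeled with interacting pairs as positives and non-interacting pairs as negatives.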
Full text:
1
Collections:
01-international
Database:
MEDLINE
Main subject:
Proteome
/
Saccharomyces cerevisiae Proteins
/
Protein Interaction Mapping
/
Amino Acids
Type of study:
Evaluation_studies
/
Prognostic_studies
/
Risk_factors_studies
Language:
En
Journal:
Amino Acids
Journal subject:
BIOCHEMISTRY
Year of publication:
2010
Document type:
Article
Country of affiliation:
China