ABSTRACT
Motivation: Protein-protein interactions (PPI) play a crucial role in our understanding of protein function and biological processes. Experimental findings are increasingly standardized and recorded in ontologies, with the Gene Ontology (GO) being one of the most successful projects. Several PPI evaluation algorithms apply probabilistic frameworks or machine learning algorithms to GO properties. Here, we introduce a new training set design and a machine learning based approach that combines dependent heterogeneous protein annotations from the entire ontology to evaluate putative co-complex protein interactions determined by empirical studies. Results: PPI annotations are built combinatorially from the corresponding GO terms and InterPro annotations. We use a high-confidence S. cerevisiae complex dataset as the positive training set. A series of classifiers based on Maximum Entropy and support vector machines (SVMs), each with a composite counterpart algorithm, are trained on a series of training sets. These achieve an area under the ROC curve of up to 0.97, outperforming go2ppi, a previously established GO-based tool for predicting PPIs. Availability and implementation: https://github.com/ima23/maxent-ppi. Contact: sbh11@cl.cam.ac.uk. Supplementary information: Supplementary data are available at Bioinformatics online.
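The abstract does not spell out how the pairwise annotations are assembled. One plausible reading of "built combinatorially" is that each candidate protein pair is described by every unordered pairing of the two proteins' annotation terms. The sketch below illustrates that reading only; the function name `pair_features`, the protein labels, and the term identifiers are hypothetical and are not taken from the maxent-ppi repository.

```python
from itertools import product


def pair_features(annotations_a, annotations_b):
    """Return the set of unordered term pairs formed from two proteins' annotations.

    Each term pair is sorted so that (protein A, protein B) and
    (protein B, protein A) produce the same feature set.
    """
    return {tuple(sorted(pair)) for pair in product(annotations_a, annotations_b)}


# Hypothetical annotations: GO terms and InterPro entries mixed in one set per protein.
annotations = {
    "P1": {"GO:0005634", "IPR000001"},
    "P2": {"GO:0005634", "GO:0006412"},
}

features = pair_features(annotations["P1"], annotations["P2"])
for feature in sorted(features):
    print(feature)
```

Feature sets of this kind could then be passed to a classifier as sparse binary indicators, with pairs drawn from known complexes labelled as positives.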
Subjects
Computational Biology/methods, Gene Ontology, Molecular Sequence Annotation, Support Vector Machine, Entropy
ABSTRACT
The support vector machine (SVM) methodology has become a popular and widely used component of current chemometric analysis. We assess a relatively recent development of the algorithm, multiple kernel learning (MKL), on published structure-property relationship (SPR) data. The MKL algorithm learns a weighting across multiple kernel-based representations of the data during supervised classifier creation and may thereby be used to describe the influence of distinct groups of structural descriptors on a single structure-property classifier without explicitly omitting any of them. We observe a statistically significant performance improvement over a conventional single-kernel SVM on all three SPR data sets analysed. Furthermore, the MKL output provides useful information about the relative influence of the five distinct descriptor subsets present in each data set.
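To make the idea of a weighting across kernels concrete, the sketch below builds one linear kernel per descriptor subset and combines them as a weighted sum. The weights come from kernel-target alignment, a simple heuristic swapped in here for illustration. A true MKL solver, as assessed in the abstract, learns the weights jointly with the classifier. The toy data and all function names are hypothetical.

```python
def linear_kernel(X):
    """Gram matrix K[i][j] = <x_i, x_j> for a list of feature vectors."""
    return [[sum(a * b for a, b in zip(xi, xj)) for xj in X] for xi in X]


def frobenius(A, B):
    """Frobenius inner product of two equally sized matrices."""
    return sum(a * b for row_a, row_b in zip(A, B) for a, b in zip(row_a, row_b))


def alignment(K, y):
    """Kernel-target alignment: <K, yy^T>_F / (||K||_F * ||yy^T||_F)."""
    Y = [[yi * yj for yj in y] for yi in y]
    return frobenius(K, Y) / (frobenius(K, K) ** 0.5 * frobenius(Y, Y) ** 0.5)


def combine(kernels, weights):
    """Weighted sum of Gram matrices, the form of kernel that MKL produces."""
    n = len(kernels[0])
    return [
        [sum(w * K[i][j] for w, K in zip(weights, kernels)) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical toy data: four compounds, class labels, two descriptor subsets.
y = [1, 1, -1, -1]
subset_a = [[1.0], [0.9], [-1.0], [-0.8]]  # tracks the class label
subset_b = [[1.0], [-1.0], [1.0], [-1.0]]  # unrelated to the class label

kernels = [linear_kernel(subset_a), linear_kernel(subset_b)]
scores = [max(alignment(K, y), 0.0) for K in kernels]
weights = [s / sum(scores) for s in scores]  # heuristic stand-in for learned MKL weights
K_combined = combine(kernels, weights)

print(weights)
```

The combined Gram matrix can be supplied to any SVM implementation that accepts a precomputed kernel, and the weights give a rough indication of how much each descriptor subset contributes.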