Search | VHL Regional Portal

SeRenDIP: SEquential REmasteriNg to DerIve profiles for fast and accurate predictions of PPI interface positions.

Hou, Qingzhen; De Geest, Paul F G; Griffioen, Christian J; Abeln, Sanne; Heringa, Jaap; Feenstra, K Anton.

Bioinformatics ; 35(22): 4794-4796, 2019 11 01.

Article in English | MEDLINE | ID: mdl-31116381

ABSTRACT

MOTIVATION: Interpretation of ubiquitous protein sequence data has become a bottleneck in biomolecular research, due to a lack of structural and other experimental annotation data for these proteins. Prediction of protein interaction sites from sequence may be a viable substitute. We therefore recently developed a sequence-based random forest method for protein-protein interface prediction, which yielded a significantly increased performance than other methods on both homomeric and heteromeric protein-protein interactions. Here, we present a webserver that implements this method efficiently. RESULTS: With the aim of accelerating our previous approach, we obtained sequence conservation profiles by re-mastering the alignment of homologous sequences found by PSI-BLAST. This yielded a more than 10-fold speedup and at least the same accuracy, as reported previously for our method; these results allowed us to offer the method as a webserver. The web-server interface is targeted to the non-expert user. The input is simply a sequence of the protein of interest, and the output a table with scores indicating the likelihood of having an interaction interface at a certain position. As the method is sequence-based and not sensitive to the type of protein interaction, we expect this webserver to be of interest to many biological researchers in academia and in industry. AVAILABILITY AND IMPLEMENTATION: Webserver, source code and datasets are available at www.ibi.vu.nl/programs/serendipwww/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Software , Algorithms , Amino Acid Sequence , Proteins , Sequence Analysis, Protein

Seeing the trees through the forest: sequence-based homo- and heteromeric protein-protein interaction sites prediction using random forest.

Hou, Qingzhen; De Geest, Paul F G; Vranken, Wim F; Heringa, Jaap; Feenstra, K Anton.

Bioinformatics ; 33(10): 1479-1487, 2017 May 15.

Article in English | MEDLINE | ID: mdl-28073761

ABSTRACT

MOTIVATION: Genome sequencing is producing an ever-increasing amount of associated protein sequences. Few of these sequences have experimentally validated annotations, however, and computational predictions are becoming increasingly successful in producing such annotations. One key challenge remains the prediction of the amino acids in a given protein sequence that are involved in protein-protein interactions. Such predictions are typically based on machine learning methods that take advantage of the properties and sequence positions of amino acids that are known to be involved in interaction. In this paper, we evaluate the importance of various features using Random Forest (RF), and include as a novel feature backbone flexibility predicted from sequences to further optimise protein interface prediction. RESULTS: We observe that there is no single sequence feature that enables pinpointing interacting sites in our Random Forest models. However, combining different properties does increase the performance of interface prediction. Our homomeric-trained RF interface predictor is able to distinguish interface from non-interface residues with an area under the ROC curve of 0.72 in a homomeric test-set. The heteromeric-trained RF interface predictor performs better than existing predictors on a independent heteromeric test-set. We trained a more general predictor on the combined homomeric and heteromeric dataset, and show that in addition to predicting homomeric interfaces, it is also able to pinpoint interface residues in heterodimers. This suggests that our random forest model and the features included capture common properties of both homodimer and heterodimer interfaces. AVAILABILITY AND IMPLEMENTATION: The predictors and test datasets used in our analyses are freely available ( http://www.ibi.vu.nl/downloads/RF_PPI/ ). CONTACT: k.a.feenstra@vu.nl. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Algorithms , Models, Statistical , Protein Interaction Domains and Motifs , Protein Interaction Mapping/methods , Protein Multimerization , Computational Biology/methods , ROC Curve , Sequence Analysis, Protein/methods

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

SEND TO:

SELECTION OF CITATIONS

SEARCH DETAIL