MetaPSICOV: combining coevolution methods for accurate prediction of contacts and long range hydrogen bonding in proteins.
Bioinformatics; 31(7): 999-1006, 2015 Apr 01.
Article in English | MEDLINE | ID: mdl-25431331
ABSTRACT
MOTIVATION: Recent developments in statistical techniques for inferring direct evolutionary couplings between residue pairs have made covariation-based contact prediction a viable means of accurate 3D modelling of proteins, with no information required other than the sequence. To extend the usefulness of contact prediction, we have designed a new meta-predictor (MetaPSICOV) which combines three distinct approaches for inferring covariation signals from multiple sequence alignments, considers a broad range of other sequence-derived features and, uniquely, a range of metrics which describe both the local and global quality of the input multiple sequence alignment. Finally, we use a two-stage predictor, where the second stage filters the output of the first stage. This two-stage predictor is additionally evaluated on its ability to accurately predict the long-range network of hydrogen bonds, including correctly assigning the donor and acceptor residues.
RESULTS: Using the original PSICOV benchmark set of 150 protein families, MetaPSICOV achieves a mean precision of 0.54 for top-L predicted long-range contacts, around 60% higher than PSICOV and around 40% better than CCMpred. In de novo protein structure prediction using FRAGFOLD, MetaPSICOV is able to improve the TM-scores of models by a median of 0.05 compared with PSICOV. Lastly, for predicting long-range hydrogen bonding, MetaPSICOV-HB achieves a precision of 0.69 for the top-L/10 hydrogen bonds compared with just 0.26 for the baseline MetaPSICOV.
AVAILABILITY AND IMPLEMENTATION: MetaPSICOV is freely available as a web server at http://bioinf.cs.ucl.ac.uk/MetaPSICOV. Raw data (predicted contact lists and 3D models) and source code can be downloaded from http://bioinf.cs.ucl.ac.uk/downloads/MetaPSICOV.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
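The top-L and top-L/10 precision figures quoted in the results can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function name, the input layout, and the sequence-separation cutoff of 24 residues for "long range" are assumptions made for illustration (the abstract does not state the cutoff).

```python
from typing import Iterable, Set, Tuple

ScoredPair = Tuple[int, int, float]  # (residue i, residue j, predicted score)


def top_l_precision(
    scored_pairs: Iterable[ScoredPair],
    true_pairs: Set[Tuple[int, int]],
    seq_len: int,
    divisor: int = 1,   # 1 -> top-L, 10 -> top-L/10
    min_sep: int = 24,  # assumed long-range cutoff, not taken from the abstract
) -> float:
    """Fraction of the top-ranked long-range predictions that are true pairs."""
    # Keep only pairs whose sequence separation qualifies as long range.
    long_range = [p for p in scored_pairs if abs(p[0] - p[1]) >= min_sep]
    # Rank by predicted score, highest first.
    long_range.sort(key=lambda p: p[2], reverse=True)
    n_top = max(1, seq_len // divisor)
    top = long_range[:n_top]
    if not top:
        return 0.0
    # true_pairs is expected to hold (i, j) tuples with i < j.
    hits = sum(1 for i, j, _ in top if (min(i, j), max(i, j)) in true_pairs)
    return hits / len(top)


# Toy example: a 30-residue protein evaluated at top-L/10 (three predictions).
predictions = [
    (0, 25, 0.90),
    (1, 28, 0.80),
    (2, 5, 0.95),   # short range, ignored despite the high score
    (3, 29, 0.40),
    (0, 29, 0.70),
]
observed = {(0, 25), (0, 29)}
precision = top_l_precision(predictions, observed, seq_len=30, divisor=10)
```

In this toy case two of the three top-ranked long-range pairs are in the observed set. The same ranking-and-counting scheme would apply to hydrogen-bond predictions, except that donor and acceptor order would then matter, so the pairs would not be normalised to i < j.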
Full text:
1
Database:
MEDLINE
Main subject:
Algorithms
/
Software
/
Proteins
/
Sequence Alignment
/
Computational Biology
/
Sequence Analysis, Protein
Study type:
Prognostic_studies
/
Risk_factors_studies
Limits:
Humans
Language:
En
Publication year:
2015
Document type:
Article