Predicting the accuracy of ligand overlay methods with Random Forest models.
J Chem Inf Model; 48(12): 2386-94, 2008 Dec.
Article in English | MEDLINE | ID: mdl-19053524
The accuracy of binding mode prediction using standard molecular overlay methods (ROCS, FlexS, Phase, and FieldCompare) is studied. Previous work has shown that simple decision tree modeling can be used to improve accuracy by selection of the best overlay template. This concept is extended to the use of Random Forest (RF) modeling for template and algorithm selection. An extensive data set of 815 ligand-bound X-ray structures representing 5 gene families was used for generating ca. 70,000 overlays using four programs. RF models, trained using standard measures of ligand and protein similarity and Lipinski-related descriptors, are used for automatically selecting the reference ligand and overlay method maximizing the probability of reproducing the overlay deduced from X-ray structures (i.e., using rmsd ≤ 2 Å as the criterion for success). RF model scores are highly predictive of overlay accuracy, and their use in template and method selection produces correct overlays in 57% of cases for 349 overlay ligands not used for training RF models. The inclusion in the models of protein sequence similarity enables the use of templates bound to related protein structures, yielding useful results even for proteins having no available X-ray structures.
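The success criterion and the selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The names `rmsd`, `is_correct_overlay`, `Candidate`, and `select_overlay` are hypothetical, the RMSD is computed over already matched atoms without any refitting, and the `score` field stands in for whatever success probability a trained model would supply.

```python
import math
from typing import NamedTuple, Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """Root-mean-square deviation between two matched atom lists (no refitting)."""
    if len(a) != len(b) or not a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
        for (x1, y1, z1), (x2, y2, z2) in zip(a, b)
    )
    return math.sqrt(squared / len(a))


def is_correct_overlay(predicted: Sequence[Coord],
                       reference: Sequence[Coord],
                       cutoff: float = 2.0) -> bool:
    """Apply the abstract's criterion: rmsd <= 2 angstroms counts as a success."""
    return rmsd(predicted, reference) <= cutoff


class Candidate(NamedTuple):
    """One (template ligand, overlay method) pair with an assumed model score."""
    template_id: str
    method: str
    score: float  # hypothetical: model-estimated probability of a correct overlay


def select_overlay(candidates: Sequence[Candidate]) -> Candidate:
    """Pick the template/method pair with the highest predicted success score."""
    if not candidates:
        raise ValueError("no candidates to choose from")
    return max(candidates, key=lambda c: c.score)
```

In this sketch, selection reduces to an argmax over scored template and method pairs; how the scores are produced (descriptors, training data, model settings) is left outside the example.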
Full text: 1
Collections: 01-international
Database: MEDLINE
Main subject: Algorithms / Proteins / Drug Discovery
Type of study: Clinical_trials / Prognostic_studies / Risk_factors_studies
Language: En
Journal: J Chem Inf Model
Journal subject: MEDICAL INFORMATICS / CHEMISTRY
Year of publication: 2008
Document type: Article
Country of affiliation: United Kingdom