Comparison of random forest and Pipeline Pilot Naïve Bayes in prospective QSAR predictions.

Chen, Bin; Sheridan, Robert P; Hornak, Viktor; Voigt, Johannes H

Affiliation

Chen B; School of Informatics and Computing, Indiana University at Bloomington, Bloomington, Indiana 47405, USA.

J Chem Inf Model; 52(3): 792-803, 2012 Mar 26.

Article in English | MEDLINE | ID: mdl-22360769

ABSTRACT

Random forest is currently considered one of the best QSAR methods available in terms of accuracy of prediction. However, it is computationally intensive. Naïve Bayes is a simple, robust classification method. The Laplacian-modified Naïve Bayes implementation is the preferred QSAR method in the widely used commercial chemoinformatics platform Pipeline Pilot. We made a comparison of the ability of Pipeline Pilot Naïve Bayes (PLPNB) and random forest to make accurate predictions on 18 large, diverse in-house QSAR data sets. These include on-target and ADME-related activities. These data sets were set up as classification problems with either binary or multicategory activities. We used a time-split method of dividing training and test sets, as we feel this is a realistic way of simulating prospective prediction. PLPNB is computationally efficient. However, random forest predictions are at least as good and in many cases significantly better than those of PLPNB on our data sets. PLPNB performs better with ECFP4 and ECFP6 descriptors, which are native to Pipeline Pilot, and more poorly with other descriptors we tried.
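
The time-split division mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the split ratio or the data layout, so the record fields (`id`, `tested_on`, `active`), the 75% training fraction, and the toy compounds below are all assumptions made for illustration only, not the authors' procedure. The idea is simply that the model is trained on the compounds tested earliest and evaluated on those tested later, which mimics predicting compounds that did not yet exist when the model was built.

```python
from datetime import date


def time_split(records, train_fraction=0.75):
    """Sort records by test date; the earliest fraction becomes the training set.

    `train_fraction` is an assumed value, not one taken from the abstract.
    """
    ordered = sorted(records, key=lambda r: r["tested_on"])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


# Hypothetical toy data: compound id, date the assay was run, binary activity.
compounds = [
    {"id": "C1", "tested_on": date(2009, 3, 10), "active": 1},
    {"id": "C2", "tested_on": date(2011, 7, 22), "active": 0},
    {"id": "C3", "tested_on": date(2010, 1, 5), "active": 0},
    {"id": "C4", "tested_on": date(2010, 11, 30), "active": 1},
]

train, test = time_split(compounds)
print([r["id"] for r in train])  # compounds tested earliest
print([r["id"] for r in test])   # compounds tested latest
```

Unlike a random split, every test compound here is newer than every training compound, so the test set tends to contain chemistry the model has not seen.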

Subjects

Decision Trees; Quantitative Structure-Activity Relationship; Bayes Theorem; Enzyme Inhibitors/chemistry; Enzyme Inhibitors/pharmacology; Humans; Time Factors

Full text: 1 | Collections: 01-international | Database: MEDLINE | Main subject: Decision Trees / Quantitative Structure-Activity Relationship | Type of study: Clinical_trials / Prognostic_studies / Risk_factors_studies | Limits: Humans | Language: En | Journal: J Chem Inf Model | Journal subject: MEDICAL INFORMATICS / CHEMISTRY | Year of publication: 2012 | Document type: Article | Country of affiliation: United States