Classifying large chemical data sets: using a regularized potential function method.

Mussa, Hamse Y; Hawizy, Lezan; Nigsch, Florian; Glen, Robert C


Affiliation

Mussa HY; Unilever Centre for Molecular Sciences Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom. hym21@cam.ac.uk

J Chem Inf Model ; 51(1): 4-14, 2011 Jan 24.

Article in En | MEDLINE | ID: mdl-21155612

ABSTRACT


In recent years, classifiers generated with kernel-based methods, such as support vector machines (SVM), Gaussian processes (GP), regularization networks (RN), and binary kernel discrimination (BKD), have been very popular in chemoinformatics data analysis. Aizerman et al. were the first to introduce the notion of employing kernel-based classifiers in the area of pattern recognition. Their original scheme, which they termed the potential function method (PFM), can basically be viewed as a kernel-based perceptron procedure and arguably subsumes the modern kernel-based algorithms. PFM can be computationally much cheaper than modern kernel-based classifiers; furthermore, PFM is far simpler conceptually and easier to implement than the SVM, GP, and RN algorithms. Unfortunately, unlike, e.g., SVM, GP, and RN, PFM is not endowed with both theoretical guarantees and practical strategies to safeguard it against generating overfitting classifiers. This is, in our opinion, the reason why this simple and elegant method has not been taken up in chemoinformatics. In this paper, we empirically address this drawback while maintaining the method's simplicity: we demonstrate that PFM combined with a simple regularization scheme may yield binary classifiers that can be, in practice, as efficient as classifiers obtained by employing state-of-the-art kernel-based methods. Using a realistic classification example, the augmented PFM was used to generate binary classifiers. Using a large chemical data set, the generalization ability of PFM classifiers was then compared with the prediction power of Laplacian-modified naive Bayesian (LmNB), Winnow (WN), and SVM classifiers.
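The abstract describes PFM as essentially a kernel-based perceptron procedure. As a rough illustration of that idea only, a minimal kernel perceptron might look like the sketch below. The Gaussian kernel, the function names (`gaussian_kernel`, `train_kernel_perceptron`, `predict`), the `gamma` and `max_epochs` parameters, and the toy data are all assumptions made for this example; the regularization scheme proposed in the paper is not specified in the abstract and is therefore not shown.

```python
import math


def gaussian_kernel(x, z, gamma=1.0):
    """Gaussian (RBF) kernel; an assumed choice of potential function."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def decision_score(x, X, y, alpha, kernel=gaussian_kernel):
    """Weighted sum of kernel values between x and the training points."""
    return sum(a * y_j * kernel(x_j, x)
               for a, y_j, x_j in zip(alpha, y, X) if a)


def train_kernel_perceptron(X, y, kernel=gaussian_kernel, max_epochs=100):
    """Return per-sample mistake counts for labels y in {-1, +1}.

    Plain, unregularized kernel perceptron: each misclassified training
    point has its coefficient incremented by one.
    """
    alpha = [0] * len(X)
    for _ in range(max_epochs):
        mistakes = 0
        for i, x_i in enumerate(X):
            if y[i] * decision_score(x_i, X, y, alpha, kernel) <= 0:
                alpha[i] += 1
                mistakes += 1
        if mistakes == 0:  # every training point is classified correctly
            break
    return alpha


def predict(x, X, y, alpha, kernel=gaussian_kernel):
    """Classify x as +1 or -1 from the sign of the decision score."""
    return 1 if decision_score(x, X, y, alpha, kernel) > 0 else -1


# Toy XOR-style data (hypothetical, not from the paper).
X_train = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0), (1.0, 0.0)]
y_train = [-1, -1, 1, 1]
alpha = train_kernel_perceptron(X_train, y_train)
predictions = [predict(x, X_train, y_train, alpha) for x in X_train]
```

Because nothing in this sketch limits how closely the classifier fits the training points, it also illustrates the overfitting risk that the abstract says the authors' regularization scheme is meant to address.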

Subjects

Chemistry/methods; Classification/methods; Informatics/methods; Decision Making; Discriminant Analysis; Nonlinear Dynamics


Full text: 1 Database: MEDLINE Main subject: Chemistry / Classification / Informatics Language: En Publication year: 2011 Document type: Article
