Definition of loss functions for learning from imbalanced data to minimize evaluation metrics.

García-Gómez, Juan Miguel; Tortajada, Salvador

Affiliation

García-Gómez JM; Biomedical Informatics group (IBIME), Instituto de las Aplicaciones de las Tecnologías de la Información y de las Comunicaciones Avanzadas (ITACA), Universitat Politècnica de València, Valencia, Spain, juanmig@ibime.upv.es.

Methods Mol Biol ; 1246: 19-37, 2015.

Article in English | MEDLINE | ID: mdl-25417077

ABSTRACT

Most learning algorithms for classification use objective functions based on regularized and/or continuous versions of the 0-1 loss function. Moreover, the performance of classification models is usually measured by the empirical error, or misclassification rate. Nevertheless, neither those loss functions nor the empirical error is adequate for learning from imbalanced data. In these problems, the empirical error is uninformative about the performance of the classifier, and the loss functions usually produce models that are shifted toward the majority class. This study defines the loss function L_BER, whose associated empirical risk is equal to the balanced error rate (BER). Our results show that classifiers based on the L_BER loss function are optimal in terms of the BER evaluation metric. Furthermore, the decision boundaries of the classifiers were invariant to the imbalance ratio of the training dataset. The L_BER-based models outperformed the 0-1-based models and other algorithms for imbalanced data in terms of BER, regardless of the prevalence of the positive class. Finally, we demonstrate the equivalence of the loss function to the method of inverted prior probabilities, and, by generalizing L_BER, we define the family of loss functions L_WER associated with any weighted error rate (WER) evaluation metric.
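The central property stated in the abstract, a loss whose empirical risk equals the BER, can be illustrated with a small numerical sketch. The code below is not the authors' definition of L_BER, which the abstract does not give. It assumes one simple form that has the stated property: a 0-1 loss in which each sample is weighted by N / (K * N_c), where N is the sample count, K the number of classes, and N_c the size of the sample's true class. The function names `balanced_error_rate` and `class_weighted_risk` are hypothetical.

```python
import numpy as np


def balanced_error_rate(y_true, y_pred):
    """Mean of the per-class error rates (BER)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    per_class = [np.mean(y_pred[y_true == c] != c) for c in np.unique(y_true)]
    return float(np.mean(per_class))


def class_weighted_risk(y_true, y_pred):
    """Empirical risk of a 0-1 loss weighted by N / (K * N_c).

    Assumption: this weighting is an illustrative choice, picked because
    it makes the empirical risk coincide with the BER. It is not taken
    from the paper.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    classes, counts = np.unique(y_true, return_counts=True)
    n, k = y_true.size, classes.size
    weight_of = {c: n / (k * m) for c, m in zip(classes, counts)}
    weights = np.array([weight_of[c] for c in y_true])
    return float(np.mean(weights * (y_true != y_pred)))


# Imbalanced toy data: 90 negatives, 10 positives, and a classifier
# that always predicts the majority class.
y_true = np.array([0] * 90 + [1] * 10)
y_pred = np.zeros(100, dtype=int)

plain_error = float(np.mean(y_true != y_pred))
ber = balanced_error_rate(y_true, y_pred)
risk = class_weighted_risk(y_true, y_pred)
print(plain_error, ber, risk)
```

On this toy data the plain misclassification rate looks good for a classifier that never detects the minority class, while the BER and the class-weighted risk agree with each other and expose the failure. The weights are proportional to the inverse of the empirical class priors, which is in the spirit of the equivalence to inverted prior probabilities mentioned in the abstract.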

Subjects

Algorithms; Artificial Intelligence; Statistics as Topic

Full text: 1 | Collections: 01-international | Database: MEDLINE | Main subject: Algorithms / Artificial Intelligence | Type of study: Risk_factors_studies | Language: En | Journal: Methods Mol Biol | Journal subject: MOLECULAR BIOLOGY | Publication year: 2015 | Document type: Article
