Definition of loss functions for learning from imbalanced data to minimize evaluation metrics.
Methods Mol Biol; 1246: 19-37, 2015.
Article in En | MEDLINE | ID: mdl-25417077
Most learning algorithms for classification use objective functions based on regularized and/or continuous versions of the 0-1 loss function, and the performance of the resulting classification models is usually measured by the empirical error or misclassification rate. Nevertheless, neither those loss functions nor the empirical error is adequate for learning from imbalanced data: in these problems, the empirical error is uninformative about the performance of the classifier, and the loss functions usually produce models that are shifted toward the majority class. This study defines the loss function L_BER, whose associated empirical risk is equal to the balanced error rate (BER). Our results show that classifiers based on the L_BER loss function are optimal in terms of the BER evaluation metric and that their decision boundaries are invariant to the imbalance ratio of the training dataset. The L_BER-based models outperformed the 0-1-based models and other algorithms for imbalanced data in terms of BER, regardless of the prevalence of the positive class. Finally, we demonstrate that the loss function is equivalent to the method of inverted prior probabilities, and, by generalizing L_BER, we define the family of loss functions L_WER associated with any weighted error rate (WER) evaluation metric.
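The record does not reproduce the definition of L_BER itself, but the stated property (an empirical risk equal to the BER, equivalent to inverting the prior probabilities) can be illustrated with per-example weights that are inversely proportional to class frequency. The sketch below is not taken from the paper; the helper names (ber_weights, balanced_error_rate) and the weighting scheme w_i = 1 / (k * n_{y_i}) for k classes are illustrative assumptions showing how a weighted empirical 0-1 risk reproduces the BER on a small imbalanced sample.

import numpy as np

def ber_weights(y):
    # Per-example weights w_i = 1 / (k * n_{y_i}), where k is the number of
    # classes and n_{y_i} the count of examples sharing example i's label.
    # With these weights, the weighted sum of 0-1 losses equals the BER.
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    count_of = dict(zip(classes.tolist(), counts.tolist()))
    return np.array([1.0 / (len(classes) * count_of[label]) for label in y.tolist()])

def balanced_error_rate(y_true, y_pred):
    # BER computed directly: mean of the per-class error rates.
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    per_class = [np.mean(y_pred[y_true == c] != c) for c in np.unique(y_true)]
    return float(np.mean(per_class))

# Toy imbalanced sample: 8 negatives, 2 positives; one false positive, one false negative.
y_true = np.array([0]*8 + [1]*2)
y_pred = np.array([0]*7 + [1, 1, 0])

weights = ber_weights(y_true)
weighted_risk = float(np.sum(weights * (y_true != y_pred)))
print(weighted_risk)                        # 0.3125 = 0.5 * (1/8 + 1/2)
print(balanced_error_rate(y_true, y_pred))  # 0.3125, the same value

Such inverse-frequency weighting (e.g., passed as sample weights to a learner) is a common way to realize the inverted-prior idea the abstract mentions; the paper itself should be consulted for the exact formulation of L_BER and L_WER.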
Full text: 1
Collections: 01-internacional
Database: MEDLINE
Main subject: Algorithms / Artificial Intelligence
Study type: Risk_factors_studies
Language: En
Journal: Methods Mol Biol
Journal subject: MOLECULAR BIOLOGY
Publication year: 2015
Document type: Article