A refined approach for evaluating small datasets via binary classification using machine learning.
PLoS One; 19(5): e0301276, 2024.
Article in English | MEDLINE | ID: mdl-38771767
ABSTRACT
Classical statistical analysis of data can be complemented or replaced with data analysis based on machine learning. However, in certain disciplines, such as education research, studies are frequently limited to small datasets, which raises several questions regarding biases and coincidentally positive results. In this study, we present a refined approach for evaluating the performance of a binary classification based on machine learning for small datasets. The approach includes a non-parametric permutation test as a method to quantify the probability of the results generalising to new data. Furthermore, we found that a repeated nested cross-validation is almost free of biases and yields reliable results that are only slightly dependent on chance. Considering the advantages of several evaluation metrics, we suggest a combination of more than one metric to train and evaluate machine learning classifiers. In the specific case that both classes are equally important, the Matthews correlation coefficient exhibits the lowest bias and chance for coincidentally good results. The results indicate that it is essential to avoid several biases when analysing small datasets using machine learning.
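As a rough illustration of two ideas named in the abstract, the sketch below computes the Matthews correlation coefficient (MCC) for binary labels and runs a simple non-parametric permutation test on it. It is not the authors' pipeline: the function names `mcc` and `permutation_p_value`, the toy labels, and the choice to shuffle the true labels against fixed predictions (rather than retraining the classifier inside a repeated nested cross-validation for every permutation) are assumptions made to keep the example short and dependency-free.

```python
import math
import random


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention (an assumption here): return 0.0 when a row or
    # column of the confusion matrix is empty and the MCC is undefined.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def permutation_p_value(y_true, y_pred, n_permutations=1000, seed=0):
    """Estimate how often shuffled labels score at least as well as the real ones.

    Simplification (assumption): predictions stay fixed and only the true
    labels are shuffled; the classifier is not retrained per permutation.
    """
    rng = random.Random(seed)
    observed = mcc(y_true, y_pred)
    shuffled = list(y_true)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if mcc(shuffled, y_pred) >= observed:
            hits += 1
    # The +1 correction keeps the estimated p-value strictly positive.
    return observed, (hits + 1) / (n_permutations + 1)


# Hypothetical toy data: 10 samples, balanced classes, 2 misclassifications.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
score, p_value = permutation_p_value(y_true, y_pred)
print(f"MCC = {score:.2f}, permutation p-value = {p_value:.3f}")
```

With only ten samples, even a fairly high MCC can be matched by shuffled labels in a noticeable share of permutations, which is the kind of coincidentally good result the abstract warns about for small datasets.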
Full text: 1
Database: MEDLINE
Main subject: Machine Learning
Limits: Humans
Language: En
Journal: PLoS One
Journal subject: Science / Medicine
Year of publication: 2024
Document type: Article
Country of affiliation: Germany