Limitations in Evaluating Machine Learning Models for Imbalanced Binary Outcome Classification in Spine Surgery: A Systematic Review.
Ghanem, Marc; Ghaith, Abdul Karim; El-Hajj, Victor Gabriel; Bhandarkar, Archis; de Giorgio, Andrea; Elmi-Terander, Adrian; Bydon, Mohamad.
Affiliation
  • Ghanem M; Mayo Clinic Neuro-Informatics Laboratory, Mayo Clinic, Rochester, MN 55902, USA.
  • Ghaith AK; Department of Neurological Surgery, Mayo Clinic, Rochester, MN 55902, USA.
  • El-Hajj VG; School of Medicine, Lebanese American University, Byblos 4504, Lebanon.
  • Bhandarkar A; Mayo Clinic Neuro-Informatics Laboratory, Mayo Clinic, Rochester, MN 55902, USA.
  • de Giorgio A; Department of Neurological Surgery, Mayo Clinic, Rochester, MN 55902, USA.
  • Elmi-Terander A; Mayo Clinic Neuro-Informatics Laboratory, Mayo Clinic, Rochester, MN 55902, USA.
  • Bydon M; Department of Neurological Surgery, Mayo Clinic, Rochester, MN 55902, USA.
Brain Sci; 13(12), 2023 Dec 16.
Article in En | MEDLINE | ID: mdl-38137171
ABSTRACT
Clinical prediction models for spine surgery applications are on the rise, with an increasing reliance on machine learning (ML) and deep learning (DL). Many of the predicted outcomes are uncommon; therefore, proper evaluation is crucial to ensure the models' effectiveness in clinical practice. This systematic review aims to identify and evaluate current research-based ML and DL models applied to spine surgery, specifically those predicting binary outcomes, with a focus on their evaluation metrics. Overall, 60 papers were included, and the findings were reported according to the PRISMA guidelines. A total of 13 papers focused on length of stay (LOS), 12 on readmissions, 12 on non-home discharge, 6 on mortality, and 5 on reoperations. The target outcomes exhibited data imbalances ranging from 0.44% to 42.4%. A total of 59 papers reported the model's area under the receiver operating characteristic curve (AUROC), 28 reported accuracy, 33 sensitivity, 29 specificity, 28 positive predictive value (PPV), 24 negative predictive value (NPV), 25 the Brier score (10 of which also provided a null-model Brier score), and 8 the F1 score. Additionally, data visualization varied among the included papers. This review discusses the use of appropriate evaluation schemes in ML and identifies several common errors and potential bias sources in the literature. Embracing these recommendations as the field advances may facilitate the integration of reliable and effective ML models in clinical settings.
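As an illustration of why the choice of metric matters at the lower end of the reported imbalance range, the following is a minimal sketch and not code from the review. It assumes a synthetic cohort of 10,000 patients with a 0.44% event rate and a hypothetical "null" model that assigns every patient the cohort prevalence as its predicted probability. The function names (`confusion_counts`, `summarize`) and the 0.5 decision threshold are assumptions made for this example only.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (0 or 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def safe_div(num, den):
    """Return num / den, or NaN when the denominator is zero (metric undefined)."""
    return num / den if den else float("nan")


def summarize(y_true, y_prob, threshold=0.5):
    """Compute the threshold-based metrics and Brier scores named in the abstract."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    n = len(y_true)
    prevalence = sum(y_true) / n
    # Brier score: mean squared difference between predicted probability and outcome.
    brier = sum((p - t) ** 2 for p, t in zip(y_prob, y_true)) / n
    return {
        "prevalence": prevalence,
        "accuracy": (tp + tn) / n,
        "sensitivity": safe_div(tp, tp + fn),
        "specificity": safe_div(tn, tn + fp),
        "ppv": safe_div(tp, tp + fp),
        "npv": safe_div(tn, tn + fn),
        "f1": safe_div(2 * tp, 2 * tp + fp + fn),
        "brier": brier,
        # Null-model Brier: score obtained by predicting the prevalence for everyone.
        "null_brier": prevalence * (1 - prevalence),
    }


# Synthetic cohort (assumption): 44 events among 10,000 patients, i.e. 0.44%.
y_true = [1] * 44 + [0] * 9956
# Hypothetical null model: every patient receives the cohort prevalence.
y_prob = [0.0044] * len(y_true)

result = summarize(y_true, y_prob)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

In this sketch the uninformative model reaches an accuracy above 99% while its sensitivity and F1 score are zero and its PPV is undefined, and its Brier score coincides with the null-model Brier score. This is one reason a Brier score is easier to interpret when the null-model value is reported alongside it.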

Full text: 1 Collections: 01-international Topics: General Database: MEDLINE Study type: Systematic_reviews Language: En Journal: Brain Sci Publication year: 2023 Document type: Article Affiliation country: United States
