Assessing the Accuracy of Artificial Intelligence Models in Scoliosis Classification and Suggested Therapeutic Approaches.

Fabijan, Artur; Zawadzka-Fabijan, Agnieszka; Fabijan, Robert; Zakrzewski, Krzysztof; Nowoslawska, Emilia; Polis, Bartosz

Affiliations

Fabijan A; Department of Neurosurgery, Polish-Mother's Memorial Hospital Research Institute, 93-338 Lodz, Poland.
Zawadzka-Fabijan A; Department of Rehabilitation Medicine, Faculty of Health Sciences, Medical University of Lodz, 90-419 Lodz, Poland.
Fabijan R; Independent Researcher, Luton LU2 0GS, UK.
Zakrzewski K; Department of Neurosurgery, Polish-Mother's Memorial Hospital Research Institute, 93-338 Lodz, Poland.
Nowoslawska E; Department of Neurosurgery, Polish-Mother's Memorial Hospital Research Institute, 93-338 Lodz, Poland.
Polis B; Department of Neurosurgery, Polish-Mother's Memorial Hospital Research Institute, 93-338 Lodz, Poland.

J Clin Med; 13(14), 2024 Jul 09.

Article in English | MEDLINE | ID: mdl-39064053

ABSTRACT

Background:

Open-source artificial intelligence models (OSAIMs) are increasingly applied in fields including IT and medicine, where they offer promising solutions for diagnostic and therapeutic interventions. In response to growing interest in AI for clinical diagnostics, we evaluated several OSAIMs, including ChatGPT 4, Microsoft Copilot, Gemini, PopAi, You Chat, Claude, and the specialized PMC-LLaMA 13B, and assessed their ability to classify scoliosis severity and recommend treatments based on radiological descriptions from anteroposterior (AP) radiographs.

Methods:

Our study used a two-stage methodology: descriptions of single-curve scoliosis were first evaluated by two independent neurosurgeons and then analyzed by the AI models. Statistical analysis included the Shapiro-Wilk test for normality; non-normally distributed data were described using medians and interquartile ranges. Inter-rater reliability was assessed with Fleiss' kappa, and the AI systems' classification performance was evaluated using accuracy, sensitivity, specificity, and F1 score.
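
The abstract names these statistics but not how they were computed. The Python sketch below shows the standard textbook formulas for Fleiss' kappa and for accuracy, sensitivity, specificity, and F1 score. The function names `fleiss_kappa` and `one_vs_rest_metrics`, the one-vs-rest treatment of severity classes, and the labels `mild`/`moderate`/`severe` are illustrative assumptions, not the authors' code.

```python
from collections import Counter


def fleiss_kappa(ratings: list[list[str]]) -> float:
    """Fleiss' kappa for N subjects, each labelled by the same number of raters.

    ratings[i] lists the category each rater assigned to subject i.
    Returns NaN when every rating falls into one category (kappa undefined).
    """
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(r) != n_raters for r in ratings):
        raise ValueError("need >= 2 raters and the same number per subject")
    categories = sorted({label for row in ratings for label in row})
    counts = [Counter(row) for row in ratings]  # n_ij for each subject i
    # Observed agreement for subject i: P_i = (sum_j n_ij^2 - n) / (n(n - 1))
    p_i = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_bar = sum(p_i) / n_subjects
    # Chance agreement: P_e = sum_j p_j^2, p_j = share of all ratings in category j
    total = n_subjects * n_raters
    p_e = sum((sum(c[cat] for c in counts) / total) ** 2 for cat in categories)
    if p_e == 1.0:
        return float("nan")
    return (p_bar - p_e) / (1 - p_e)


def one_vs_rest_metrics(
    y_true: list[str], y_pred: list[str], positive: str
) -> dict[str, float]:
    """Accuracy, sensitivity, specificity and F1 with `positive` as the target class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # also called recall
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (
        2 * precision * sensitivity / (precision + sensitivity)
        if precision + sensitivity
        else 0.0
    )
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


if __name__ == "__main__":
    # Hypothetical severity labels from two raters for three radiograph descriptions
    raters = [["mild", "mild"], ["moderate", "severe"], ["severe", "severe"]]
    print(f"Fleiss' kappa: {fleiss_kappa(raters):.3f}")
    # Hypothetical reference labels versus one model's output
    reference = ["mild", "moderate", "severe", "moderate"]
    model = ["mild", "moderate", "moderate", "moderate"]
    print(one_vs_rest_metrics(reference, model, positive="moderate"))
```

Scoring each severity class one-vs-rest is only one way to apply binary metrics to a three-class problem; the abstract does not state which averaging scheme the authors used.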

Results:

The analysis indicated that some AI systems, such as ChatGPT 4, Copilot, and PopAi, accurately reflected the recommended Cobb angle ranges for disease severity and treatment, whereas others, such as Gemini and Claude, required further calibration. In particular, PMC-LLaMA 13B widened the Cobb angle range it classified as moderate scoliosis, which could influence clinical decisions and delay interventions.

Conclusions:

These findings highlight the need for the continuous refinement of AI models to enhance their clinical applicability.

Keywords

ChatGPT 4; PMC-LLaMA; artificial intelligence; clinical decision support systems; scoliosis

Collection: 01-internacional | Database: MEDLINE | Language: English | Journal: J Clin Med | Year: 2024 | Document type: Article | Country of affiliation: Poland | Country of publication: Switzerland