RESUMEN
BACKGROUND: The surge in emergency head CT imaging and artificial intelligence (AI) advancements, especially deep learning (DL) and convolutional neural networks (CNN), have accelerated the development of computer-aided diagnosis (CADx) for emergency imaging. External validation assesses model generalizability, providing preliminary evidence of clinical potential. OBJECTIVES: This study systematically reviews externally validated CNN-CADx models for emergency head CT scans, critically appraises diagnostic test accuracy (DTA), and assesses adherence to reporting guidelines. METHODS: Studies comparing CNN-CADx model performance to reference standard were eligible. The review was registered in PROSPERO (CRD42023411641) and conducted on Medline, Embase, EBM-Reviews and Web of Science following PRISMA-DTA guideline. DTA reporting were systematically extracted and appraised using standardised checklists (STARD, CHARMS, CLAIM, TRIPOD, PROBAST, QUADAS-2). RESULTS: Six of 5636 identified studies were eligible. The common target condition was intracranial haemorrhage (ICH), and intended workflow roles auxiliary to experts. Due to methodological and clinical between-study variation, meta-analysis was inappropriate. The scan-level sensitivity exceeded 90 % in 5/6 studies, while specificities ranged from 58,0-97,7 %. The SROC 95 % predictive region was markedly broader than the confidence region, ranging above 50 % sensitivity and 20 % specificity. All studies had unclear or high risk of bias and concern for applicability (QUADAS-2, PROBAST), and reporting adherence was below 50 % in 20 of 32 TRIPOD items. CONCLUSION: 0.01 % of identified studies met the eligibility criteria. The evidence on the DTA of CNN-CADx models for emergency head CT scans remains limited in the scope of this review, as the reviewed studies were scarce, inapt for meta-analysis and undermined by inadequate methodological conduct and reporting. Properly conducted, external validation remains preliminary for evaluating the clinical potential of AI-CADx models, but prospective and pragmatic clinical validation in comparative trials remains most crucial. In conclusion, future AI-CADx research processes should be methodologically standardized and reported in a clinically meaningful way to avoid research waste.