Results 1 - 3 of 3
1.
Emerg Radiol ; 2024 Jul 22.
Article in English | MEDLINE | ID: mdl-39034382

ABSTRACT

PURPOSE: To evaluate whether a commercial AI tool for intracranial hemorrhage (ICH) detection on head CT exhibited sociodemographic biases. METHODS: Our retrospective study reviewed 9736 consecutive, adult non-contrast head CT scans performed between November 2021 and February 2022 in a single healthcare system. Each CT scan was evaluated by a commercial ICH AI tool and a board-certified neuroradiologist; ground truth was defined as the final radiologist determination of ICH presence/absence. After evaluating the AI tool's aggregate diagnostic performance, sub-analyses based on sociodemographic groups (age, sex, race, ethnicity, insurance status, and Area of Deprivation Index [ADI] scores) assessed for biases. χ2 or Fisher's exact tests evaluated statistical significance, defined as p ≤ 0.05. RESULTS: Our patient population was 50% female (mean age 60 ± 19 years). The AI tool had an aggregate accuracy of 93% [9060/9736], sensitivity of 85% [1140/1338], specificity of 94% [7920/8398], positive predictive value (PPV) of 71% [1140/1618], and negative predictive value (NPV) of 98% [7920/8118]. Sociodemographic biases were identified, including lower PPV for patients who were female (67.3% [441/656] vs. 72.7% [699/962], p = 0.02), Black (66.7% [454/681] vs. 73.2% [686/937], p = 0.005), non-Hispanic/non-Latino (69.7% [1038/1490] vs. 95.4% [417/437], p = 0.009), and who had Medicaid/Medicare (69.9% [754/1078]) or private (66.5% [228/343]) primary insurance (p = 0.003). Lower sensitivity was seen for patients in the third quartile of national (78.8% [241/306], p = 0.001) and state ADI scores (79.0% [227/287], p = 0.001). CONCLUSIONS: In our healthcare system, a commercial AI tool had lower performance for ICH detection than previously reported and demonstrated several sociodemographic biases.
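The aggregate metrics reported in this abstract can be reproduced from the confusion-matrix counts implied by its bracketed fractions (sensitivity 1140/1338 gives TP = 1140, FN = 198; specificity 7920/8398 gives TN = 7920, FP = 478). A minimal sketch, not the study's actual analysis code:

```python
# Diagnostic performance from a 2x2 confusion matrix, using counts
# derived from the abstract's bracketed fractions:
#   sensitivity 1140/1338 -> TP = 1140, FN = 198
#   specificity 7920/8398 -> TN = 7920, FP = 478
def diagnostic_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    total = tp + fn + fp + tn
    return {
        "accuracy":    (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # recall on ICH-positive scans
        "specificity": tn / (tn + fp),
        "ppv":         tp / (tp + fp),  # positive predictive value
        "npv":         tn / (tn + fn),  # negative predictive value
    }

m = diagnostic_metrics(tp=1140, fn=198, fp=478, tn=7920)
for name, value in m.items():
    print(f"{name}: {value:.1%}")
```

The subgroup comparisons in the paper would then apply χ2 or Fisher's exact tests to these counts split by demographic group (e.g. via `scipy.stats.chi2_contingency` or `scipy.stats.fisher_exact`).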

2.
Radiol Artif Intell ; 6(3): e230240, 2024 May.
Article in English | MEDLINE | ID: mdl-38477660

ABSTRACT

Purpose To evaluate the robustness of an award-winning bone age deep learning (DL) model to extensive variations in image appearance. Materials and Methods In December 2021, the DL bone age model that won the 2017 RSNA Pediatric Bone Age Challenge was retrospectively evaluated using the RSNA validation set (1425 pediatric hand radiographs; internal test set in this study) and the Digital Hand Atlas (DHA) (1202 pediatric hand radiographs; external test set). Each test image underwent seven types of transformations (rotations, flips, brightness, contrast, inversion, laterality marker, and resolution) to represent a range of image appearances, many of which simulate real-world variations. Computational "stress tests" were performed by comparing the model's predictions on baseline and transformed images. Mean absolute differences (MADs) of predicted bone ages compared with radiologist-determined ground truth on baseline versus transformed images were compared using Wilcoxon signed rank tests. The proportion of clinically significant errors (CSEs) was compared using McNemar tests. Results There was no evidence of a difference in MAD of the model on the two baseline test sets (RSNA = 6.8 months, DHA = 6.9 months; P = .05), indicating good model generalization to external data. Except for the RSNA dataset images with an appended radiologic laterality marker (P = .86), there were significant differences in MAD for both the DHA and RSNA datasets among other transformation groups (rotations, flips, brightness, contrast, inversion, and resolution). There were significant differences in proportion of CSEs for 57% of the image transformations (19 of 33) performed on the DHA dataset. Conclusion Although an award-winning pediatric bone age DL model generalized well to curated external images, it had inconsistent predictions on images that had undergone simple transformations reflective of several real-world variations in image appearance. 
Keywords: Pediatrics, Hand, Convolutional Neural Network, Radiography. Supplemental material is available for this article. © RSNA, 2024. See also commentary by Faghani and Erickson in this issue.


Subject(s)
Age Determination by Skeleton , Deep Learning , Child , Humans , Algorithms , Neural Networks, Computer , Radiography , Retrospective Studies , Age Determination by Skeleton/methods
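The computational "stress tests" this study describes amount to applying simple image transformations and comparing a model's predictions on baseline versus transformed copies. The sketch below illustrates the pattern only: images are plain 2D intensity lists, and `predict` is a toy stand-in for the bone age model, not the challenge-winning network.

```python
# Robustness "stress test" sketch: transform each image, then compare
# predictions on baseline vs. transformed copies.

def flip_horizontal(img):
    return [row[::-1] for row in img]

def rotate_90(img):
    return [list(row) for row in zip(*img[::-1])]

def invert(img, max_val=255):
    return [[max_val - px for px in row] for row in img]

def brighten(img, delta=40, max_val=255):
    return [[min(px + delta, max_val) for px in row] for row in img]

TRANSFORMS = {
    "flip": flip_horizontal,
    "rotate": rotate_90,
    "invert": invert,
    "brighten": brighten,
}

def predict(img):
    # Toy "model": mean intensity scaled to months (illustrative only).
    flat = [px for row in img for px in row]
    return sum(flat) / len(flat) / 2.0

def stress_test(images):
    """Mean absolute difference between baseline and transformed
    predictions, per transform."""
    results = {}
    for name, transform in TRANSFORMS.items():
        diffs = [abs(predict(img) - predict(transform(img))) for img in images]
        results[name] = sum(diffs) / len(diffs)
    return results

images = [[[10, 200], [30, 120]], [[0, 255], [128, 64]]]
mads = stress_test(images)
```

Note that this mean-based toy model is invariant to flips and rotations but shifts under intensity transforms; the study compared such paired prediction differences formally with Wilcoxon signed rank tests (e.g. `scipy.stats.wilcoxon`) and compared clinically significant error rates with McNemar tests.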
3.
Radiology ; 306(2): e220505, 2023 Feb.
Article in English | MEDLINE | ID: mdl-36165796

ABSTRACT

Background Although deep learning (DL) models have demonstrated expert-level ability for pediatric bone age prediction, they have shown poor generalizability and bias in other use cases. Purpose To quantify generalizability and bias in a bone age DL model measured by performance on external versus internal test sets and performance differences between different demographic groups, respectively. Materials and Methods The winning DL model of the 2017 RSNA Pediatric Bone Age Challenge, trained on 12 611 pediatric hand radiographs from two U.S. hospitals, was retrospectively evaluated. The DL model was tested from September 2021 to December 2021 on an internal validation set and an external test set of pediatric hand radiographs with diverse demographic representation. Images with a reported ground-truth bone age were included. Mean absolute difference (MAD) between ground-truth bone age and model-predicted bone age was calculated for each set. Generalizability was evaluated by comparing MAD between internal and external evaluation sets with use of t tests. Bias was evaluated by comparing MAD and clinically significant error rate (rate of errors changing the clinical diagnosis) between demographic groups with use of t tests or analysis of variance and χ2 tests, respectively (statistically significant difference defined as P < .05). Results The internal validation set had images from 1425 individuals (773 boys), and the external test set had images from 1202 individuals (mean age, 133 months ± 60 [SD]; 614 boys). The bone age model generalized well to the external test set, with no difference in MAD (6.8 months in the validation set vs 6.9 months in the external set; P = .64). Model predictions would have led to clinically significant errors in 194 of 1202 images (16%) in the external test set.
The MAD was greater for girls than boys in the internal validation set (P = .01) and in the subcategories of age and Tanner stage in the external test set (P < .001 for both). Conclusion A deep learning (DL) bone age model generalized well to an external test set, although clinically significant sex-, age-, and sexual maturity-based biases in DL bone age were identified. © RSNA, 2022. Online supplemental material is available for this article. See also the editorial by Larson in this issue.


Subject(s)
Deep Learning , Male , Female , Humans , Child , Infant , Retrospective Studies , Radiography
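The bias analysis in this study reduces to computing MAD and the clinically-significant-error (CSE) rate per demographic subgroup and then comparing groups. A minimal sketch on hypothetical paired predictions — the records and the fixed 12-month CSE threshold are illustrative assumptions; the paper defines CSEs as errors that change the clinical diagnosis:

```python
from collections import defaultdict

def subgroup_bias_report(records, cse_threshold=12.0):
    """Group (ground_truth, prediction, group) triples by subgroup and
    report MAD and CSE rate per group. The fixed 12-month threshold is
    an illustrative simplification of the study's diagnosis-change
    definition of a clinically significant error."""
    by_group = defaultdict(list)
    for truth, pred, group in records:
        by_group[group].append(abs(pred - truth))
    report = {}
    for group, errors in by_group.items():
        report[group] = {
            "n": len(errors),
            "mad": sum(errors) / len(errors),
            "cse_rate": sum(e > cse_threshold for e in errors) / len(errors),
        }
    return report

# Hypothetical paired (ground truth, prediction, sex) bone ages in months.
records = [
    (120, 126, "M"), (96, 94, "M"), (140, 139, "M"), (60, 75, "M"),
    (120, 133, "F"), (96, 88, "F"), (140, 126, "F"), (60, 71, "F"),
]
report = subgroup_bias_report(records)
```

A full analysis would then compare the per-group MADs with t tests or ANOVA and the CSE rates with χ2 tests, as the abstract describes.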