Range of Radiologist Performance in a Population-based Screening Cohort of 1 Million Digital Mammography Examinations.
Radiology; 297(1): 33-39, 2020 Oct.
Article | En | MEDLINE | ID: mdl-32720866
Background There is great interest in developing artificial intelligence (AI)-based computer-aided detection (CAD) systems for use in screening mammography. Comparative performance benchmarks from true screening cohorts are needed.
Purpose To determine the range of human first-reader performance measures within a population-based screening cohort of 1 million screening mammograms to gauge the performance of emerging AI CAD systems.
Materials and Methods This retrospective study consisted of all screening mammograms in women aged 40-74 years in Stockholm County, Sweden, who underwent screening with full-field digital mammography between 2008 and 2015. There were 110 interpreting radiologists, of whom 24 were defined as high-volume readers (ie, those who interpreted more than 5000 annual screening mammograms). A true-positive finding was defined as the presence of a pathology-confirmed cancer within 12 months. Performance benchmarks included sensitivity and specificity, examined per quartile of radiologists' performance. First-reader sensitivity was determined for each tumor subgroup, overall and by quartile of high-volume reader sensitivity. Screening outcomes were examined based on the first reader's sensitivity quartile with 10 000 screening mammograms per quartile. Linear regression models were fitted to test for a linear trend across quartiles of performance.
Results A total of 418 041 women (mean age, 54 years ± 10 [standard deviation]) were included, and 1 186 045 digital mammograms were evaluated, with 972 899 assessed by high-volume readers. Overall sensitivity was 73% (95% confidence interval [CI]: 69%, 77%), and overall specificity was 96% (95% CI: 95%, 97%). The mean values per quartile of high-volume reader performance ranged from 63% to 84% for sensitivity and from 95% to 98% for specificity. The sensitivity difference was very large for basal cancers, with the least sensitive and most sensitive high-volume readers detecting 53% and 89% of cancers, respectively (P < .001).
Conclusion Benchmarks showed a wide range of performance differences between high-volume readers. Sensitivity varied by tumor characteristics.
© RSNA, 2020
Online supplemental material is available for this article.
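The benchmark measures named in the abstract can be illustrated with a minimal sketch. It is not the authors' analysis code. The reader counts below are invented for illustration only, and the function names (`sensitivity`, `specificity`, `quartile_means`) are hypothetical. The sketch assumes the standard definitions, sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP), and groups readers into quartiles by sorting their per-reader values.

```python
from statistics import mean


def sensitivity(tp, fn):
    """Fraction of cancers the reader flagged: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of cancer-free examinations correctly not flagged: TN / (TN + FP)."""
    return tn / (tn + fp)


def quartile_means(values):
    """Sort per-reader values, split them into four groups, return each group's mean.

    Assumes at least four values; group sizes differ by at most one
    when the count is not divisible by four.
    """
    ordered = sorted(values)
    n = len(ordered)
    return [mean(ordered[n * q // 4 : n * (q + 1) // 4]) for q in range(4)]


# Hypothetical counts per reader: (TP, FN, TN, FP). Not data from the study.
readers = [
    (60, 40, 9500, 400),
    (65, 35, 9550, 350),
    (70, 30, 9600, 300),
    (72, 28, 9620, 280),
    (75, 25, 9650, 250),
    (78, 22, 9700, 200),
    (82, 18, 9750, 150),
    (88, 12, 9800, 100),
]

per_reader_sens = [sensitivity(tp, fn) for tp, fn, _, _ in readers]
per_reader_spec = [specificity(tn, fp) for _, _, tn, fp in readers]

print("Mean sensitivity per quartile:", quartile_means(per_reader_sens))
print("Mean specificity per quartile:", quartile_means(per_reader_spec))
```

Sensitivity and specificity are ranked independently here, so a reader in the top sensitivity quartile need not be in the top specificity quartile. The abstract does not say how the study grouped readers for each measure.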
Full text: 1
Collection: 01-internacional
Database: MEDLINE
Health context: 2_ODS3
Health problem: 2_muertes_prematuras_enfermedades_notrasmisibles
Main subject: Breast Neoplasms / Artificial Intelligence / Clinical Competence
Study type: Diagnostic_studies / Observational_studies / Prognostic_studies / Screening_studies
Limits: Adult / Aged / Female / Humans / Middle aged
Country/Region as subject: Europe
Language: En
Journal: Radiology
Year: 2020
Document type: Article