ABSTRACT
OBJECTIVE: Digital breast tomosynthesis (DBT) is more accurate than full-field digital mammography alone but requires a longer reading time. A radiologist reader study evaluated the use of concurrent computer-aided detection (CAD) to shorten the reading time while maintaining interpretation performance. MATERIALS AND METHODS: A CAD system was developed to detect suspicious soft-tissue densities in DBT planes. Abnormalities are extracted from the plane in which they are detected and blended into the corresponding synthetic image. The study used an enriched sample of 240 DBT cases with 68 malignancies in 61 patients. Twenty radiologists retrospectively reviewed all 240 cases in a multireader multicase crossover design to compare reading time and performance with and without CAD. The performance of CAD alone was also evaluated. RESULTS: Reading time improved by 29.2% with CAD (95% CI, 21.1-36.5%; p < 0.01). Reader performance, measured by ROC AUC, was noninferior with CAD (p < 0.01). The mean AUC increased from 0.841 without to 0.850 with CAD (95% CI, -0.012 to 0.030). Mean sensitivity increased from 0.847 without to 0.871 with CAD (difference 95% CI, -0.005 to 0.055), showing a 0.033 increase in sensitivity for cases with soft-tissue densities (95% CI, -0.002 to 0.068). Mean specificity decreased from 0.527 without to 0.509 with CAD (difference 95% CI, -0.041 to 0.005), and mean recall rate for noncancers slightly increased from 0.474 without to 0.492 with CAD (difference 95% CI, -0.006 to 0.041). CONCLUSION: Concurrent use of CAD with DBT resulted in 29.2% faster reading time, while maintaining reader interpretation performance.
Subject(s)
Breast Neoplasms/diagnostic imaging , Diagnosis, Computer-Assisted/methods , Mammography/methods , Adult , Aged , Breast Density , Efficiency , Female , France , Humans , Middle Aged , Observer Variation , Retrospective Studies , Sensitivity and Specificity , Time Factors , United States
ABSTRACT
OBJECTIVE. The purpose of this study was to assess the diagnostic performance of supplemental screening molecular breast imaging (MBI) in women with mammographically dense breasts after system modifications to permit radiation dose reduction. SUBJECTS AND METHODS. A total of 1651 asymptomatic women with mammographically dense breasts on prior mammography underwent screening mammography and adjunct MBI performed with 300-MBq (99m)Tc-sestamibi and a direct-conversion (cadmium zinc telluride) gamma camera, both interpreted independently. The cancer detection rate, sensitivity, specificity, and positive predictive value of biopsies performed (PPV3) were determined. RESULTS. In 1585 participants with a complete reference standard, 21 were diagnosed with cancer: two detected by mammography only, 14 by MBI only, three by both modalities, and two by neither. Of 14 participants with cancers detected only by MBI, 11 had invasive disease (median size, 0.9 cm; range, 0.5-4.1 cm). Nine of 11 (82%) were node negative, and two had bilateral cancers. With the addition of MBI to mammography, the overall cancer detection rate (per 1000 screened) increased from 3.2 to 12.0 (p < 0.001; supplemental yield, 8.8). The invasive cancer detection rate increased from 1.9 to 8.8 (p < 0.001; supplemental yield, 6.9), a relative increase of 363%, while the change in DCIS detection was not statistically significant (from 1.3 to 3.2, p = 0.250). For mammography alone, sensitivity was 24%; specificity, 89%; and PPV3, 25%. For the combination, sensitivity was 91% (p < 0.001); specificity, 83% (p < 0.001); and PPV3, 28% (p = 0.70). The recall rate increased from 11.0% with mammography alone to 17.6% (p < 0.001) for the combination; the biopsy rate increased from 1.3% for mammography alone to 4.2% (p < 0.001). CONCLUSION. When added to screening mammography, MBI performed using a radiopharmaceutical activity acceptable for screening (effective dose 2.4 mSv) yielded a supplemental cancer detection rate of 8.8 per 1000 women with mammographically dense breasts.
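The per-1000 detection rates and the 363% relative increase follow directly from the participant counts in the RESULTS. The short sketch below recomputes them. It assumes the 1585 participants with a complete reference standard are the denominator; that choice reproduces the reported rates, whereas 1651 would not. The helper name `per_1000` is illustrative.

```python
def per_1000(cases: int, participants: int) -> float:
    """Cancer detection rate per 1000 screened participants."""
    return 1000 * cases / participants

PARTICIPANTS = 1585            # assumption: participants with a complete reference standard
mx_detected = 2 + 3            # mammography only + both modalities
combo_detected = 2 + 14 + 3    # detected by at least one modality
mbi_only = 14

cdr_mx = per_1000(mx_detected, PARTICIPANTS)
cdr_combo = per_1000(combo_detected, PARTICIPANTS)
supplemental = per_1000(mbi_only, PARTICIPANTS)

# Relative increase in invasive detection, computed from the rounded rates (1.9 -> 8.8)
relative_increase = (8.8 - 1.9) / 1.9

print(f"CDR, mammography alone: {cdr_mx:.1f} per 1000")
print(f"CDR, mammography + MBI: {cdr_combo:.1f} per 1000")
print(f"Supplemental yield:     {supplemental:.1f} per 1000")
print(f"Sensitivity, mammography alone: {mx_detected / 21:.0%}")
print(f"Relative increase, invasive: {relative_increase:.0%}")
```

The supplemental yield is simply the MBI-only cancers over the same denominator, which is why it equals the difference between the two detection rates.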
Subject(s)
Breast Neoplasms/diagnostic imaging , Early Detection of Cancer/methods , Mammography/methods , Molecular Imaging , Adult , Aged , Aged, 80 and over , Female , Humans , Middle Aged , Prospective Studies , Radiation Dosage
ABSTRACT
OBJECTIVE: To evaluate the clinical value of combining one-view mammography (craniocaudal, CC) with tomosynthesis in the complementary view (mediolateral oblique, MLO), compared with standard two-view mammography (MX), in terms of both lesion detection and characterization. METHODS: A free-response receiver operating characteristic (FROC) experiment was conducted independently by six breast radiologists, obtaining data from 463 breasts of 250 patients. Differences in mean lesion detection fraction (LDF) and mean lesion characterization fraction (LCF) were analysed by analysis of variance (ANOVA) to compare the clinical performance of the combined techniques with that of standard two-view digital mammography. RESULTS: The 463 cases (breasts) reviewed included 258 with one to three lesions each and 205 with no lesions. The 258 cases with lesions included 77 cancers in 68 breasts and 271 benign lesions, for a total of 348 proven lesions. The combination DBT(MLO)+MX(CC) was superior to MX(CC+MLO) in both lesion detection (LDF) and lesion characterization (LCF), overall and for benign lesions. DBT(MLO)+MX(CC) was non-inferior to two-view MX for malignant lesions. CONCLUSIONS: This study shows that readers' capabilities in detecting and characterizing breast lesions are improved by combining single-view digital breast tomosynthesis and single-view mammography compared with two-view digital mammography. KEY POINTS:
• Digital breast tomosynthesis (DBT) is increasingly being adopted as an adjunct to mammography (MX)
• DBT(MLO)+MX(CC) is superior to MX(CC+MLO) in lesion detection (overall and benign lesions)
• DBT(MLO)+MX(CC) is non-inferior to MX(CC+MLO) in cancer detection
• DBT(MLO)+MX(CC) is superior to MX(CC+MLO) in lesion characterization (overall and benign lesions)
• DBT(MLO)+MX(CC) is non-inferior to MX(CC+MLO) in characterization of malignant lesions
Subject(s)
Breast Neoplasms/diagnostic imaging , Imaging, Three-Dimensional/methods , Mammography/methods , Radiographic Image Interpretation, Computer-Assisted/methods , Adult , Aged , Aged, 80 and over , Breast/pathology , False Positive Reactions , Female , Humans , Middle Aged , Multimodal Imaging/methods , Observer Variation , ROC Curve , Radiographic Image Enhancement/methods , Reproducibility of Results
ABSTRACT
OBJECTIVE: The purpose of this study is to assess the "real-world" impact of an artificial intelligence (AI) tool designed to detect breast cancer in digital breast tomosynthesis (DBT) screening exams after 12 months of use in a subspecialized academic breast center. METHODS: Following IRB approval, mammography audit reports, as specified in the BI-RADS atlas, were retrospectively generated for five radiologists reading at three locations during a 12-month time frame. One location had the AI tool (iCAD ProFound AI v2.0), and the other two locations did not. The co-primary endpoints were cancer detection rate (CDR) and abnormal interpretation rate (AIR). Secondary endpoints included positive predictive values (PPVs) for cancer among screenings with abnormal interpretations (PPV1) and for biopsies performed (PPV3). Odds ratios (ORs) with two-sided 95% confidence intervals (CIs) summarized the impact of AI across radiologists using generalized estimating equations. RESULTS: Nonsignificant differences were observed in CDR, AIR, and PPVs. The CDR was 7.3 per 1000 examinations with AI and 5.9 per 1000 without AI (OR 1.3, 95% CI: 0.9-1.7). The AIR was 11.7% with AI and 11.8% without AI (OR 1.0, 95% CI: 0.8-1.3). The PPV1 was 6.2% with AI and 5.0% without AI (OR 1.3, 95% CI: 0.97-1.7). The PPV3 was 33.3% with AI and 32.0% without AI (OR 1.1, 95% CI: 0.8-1.5). CONCLUSION: Although we were unable to show statistically significant differences in CDR and AIR between the two groups, the results are consistent with prior reader studies. There is a nonsignificant trend toward improvement in CDR with AI, without a significant increase in AIR.
Subject(s)
Artificial Intelligence , Breast Neoplasms , Humans , Female , Retrospective Studies , Early Detection of Cancer/methods , Mammography/methods , Breast Neoplasms/diagnostic imaging
ABSTRACT
INTRODUCTION: The purpose of this study was to compare the diagnostic accuracy of dual-energy contrast-enhanced digital mammography (CEDM) as an adjunct to mammography (MX) ± ultrasonography (US) with the diagnostic accuracy of MX ± US alone. METHODS: One hundred ten consenting women with 148 breast lesions (84 malignant, 64 benign) underwent two-view dual-energy CEDM in addition to MX and US using a specially modified digital mammography system (Senographe DS, GE Healthcare). The reference standard was histology for 138 lesions and follow-up for 12 lesions. Six radiologists from four institutions interpreted the images using high-resolution softcopy workstations. Confidence of presence (5-point scale), probability of cancer (7-point scale), and BI-RADS scores were evaluated for each finding. Sensitivity, specificity, and ROC curve areas were estimated for each reader and overall. Visibility of findings on MX ± CEDM and MX ± US was evaluated with a Likert scale. RESULTS: The average per-lesion sensitivity across all readers was significantly higher for MX ± US + CEDM than for MX ± US (0.78 vs. 0.71 using BI-RADS, p = 0.006). All readers improved their clinical performance, and the average area under the ROC curve was significantly greater for MX ± US + CEDM than for MX ± US (0.87 vs. 0.83, p = 0.045). Finding visibility was similar or better on MX ± CEDM than on MX ± US in 80% of cases. CONCLUSIONS: Dual-energy contrast-enhanced digital mammography as an adjunct to MX ± US improves diagnostic accuracy compared with MX ± US alone. Addition of an iodinated contrast agent to MX facilitates the visualization of breast lesions.
Subject(s)
Breast Neoplasms/diagnostic imaging , Mammography/methods , Ultrasonography, Mammary/methods , Contrast Media , Female , Humans , Middle Aged , Radiographic Image Enhancement
ABSTRACT
PURPOSE: To conduct post-hoc analysis of National CT Colonography Trial data and compare the sensitivity and specificity of computed tomographic (CT) colonography in participants younger than 65 years with those in participants aged 65 years and older. MATERIALS AND METHODS: Of 2600 asymptomatic participants recruited at 15 centers for the trial, 497 were 65 years of age or older. Approval of this HIPAA-compliant study was obtained from the institutional review board of each site, and informed consent was obtained from each subject. Radiologists certified in CT colonography reported lesions 5 mm in diameter or larger. Screening detection of large (≥10-mm) histologically confirmed colorectal neoplasia was the primary end point; screening detection of smaller (6-9-mm) colorectal neoplasia was a secondary end point. The differences in sensitivity and specificity of CT colonography in the two age cohorts (age < 65 years and age ≥ 65 years) were estimated with bootstrap confidence intervals (CIs). RESULTS: Complete data were available for 477 participants 65 years of age or older (among 2531 evaluable participants). Prevalence of adenomas 1 cm or larger for the older participants versus the younger participants was 6.9% (33 of 477) versus 3.7% (76 of 2054) (P < .004). For large neoplasms, mean estimates for CT colonography sensitivity and specificity among the older cohort were 0.82 (95% CI: 0.644, 0.944) and 0.83 (95% CI: 0.779, 0.883), respectively. For large neoplasms in the younger group, CT colonography sensitivity and specificity were 0.92 (95% CI: 0.837, 0.967) and 0.86 (95% CI: 0.816, 0.899), respectively. Per-polyp sensitivity for large neoplasms for the older and younger populations was 0.75 (95% CI: 0.578, 0.869) and 0.84 (95% CI: 0.717, 0.924), respectively. For the older and younger groups, per-participant sensitivity was 0.72 (95% CI: 0.565, 0.854) and 0.81 (95% CI: 0.745, 0.882) for detecting adenomas 6 mm in diameter or larger. 
CONCLUSION: For most measures of diagnostic performance and in most subsets, the difference between senior-aged participants and those younger than 65 years was not statistically significant.
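The age-cohort differences were estimated with bootstrap confidence intervals. The abstract does not describe the resampling scheme, so the sketch below is a generic percentile bootstrap for a difference in per-participant sensitivity, assuming participants are resampled with replacement within each cohort, 2000 replicates, and a fixed seed. The outcome vectors are hypothetical, not trial data.

```python
import random

def sensitivity(outcomes):
    """Fraction of diseased participants detected (1 = detected, 0 = missed)."""
    return sum(outcomes) / len(outcomes)

def bootstrap_diff_ci(older, younger, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for sensitivity(older) - sensitivity(younger).

    Each cohort is resampled with replacement on its own, keeping its size fixed.
    """
    rng = random.Random(seed)
    diffs = sorted(
        sensitivity(rng.choices(older, k=len(older)))
        - sensitivity(rng.choices(younger, k=len(younger)))
        for _ in range(n_boot)
    )
    k = int(n_boot * alpha / 2)  # replicates cut from each tail
    point = sensitivity(older) - sensitivity(younger)
    return point, diffs[k], diffs[n_boot - 1 - k]

# Hypothetical detection outcomes among participants with large neoplasms
older = [1] * 27 + [0] * 6
younger = [1] * 70 + [0] * 6
point, lo, hi = bootstrap_diff_ci(older, younger)
print(f"difference = {point:.3f}, 95% CI ({lo:.3f}, {hi:.3f})")
```

A difference is called statistically significant at the 5% level under this scheme only when the interval excludes zero; with small older-cohort counts the interval is wide, which is consistent with the mostly nonsignificant findings reported.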
Subject(s)
Colonography, Computed Tomographic , Colorectal Neoplasms/diagnostic imaging , Age Factors , Aged , Aged, 80 and over , Clinical Trials as Topic , Colorectal Neoplasms/epidemiology , Female , Humans , Imaging, Three-Dimensional , Male , Mass Screening , Middle Aged , Predictive Value of Tests , Prevalence , Sensitivity and Specificity , United States/epidemiology
ABSTRACT
BACKGROUND: Computed tomographic (CT) colonography is a noninvasive option in screening for colorectal cancer. However, its accuracy as a screening tool in asymptomatic adults has not been well defined. METHODS: We recruited 2600 asymptomatic study participants, 50 years of age or older, at 15 study centers. CT colonographic images were acquired with the use of standard bowel preparation, stool and fluid tagging, mechanical insufflation, and multidetector-row CT scanners (with 16 or more rows). Radiologists trained in CT colonography reported all lesions measuring 5 mm or more in diameter. Optical colonoscopy and histologic review were performed according to established clinical protocols at each center and served as the reference standard. The primary end point was detection by CT colonography of histologically confirmed large adenomas and adenocarcinomas (10 mm in diameter or larger) that had been detected by colonoscopy; detection of smaller colorectal lesions (6 to 9 mm in diameter) was also evaluated. RESULTS: Complete data were available for 2531 participants (97%). For large adenomas and cancers, the mean (+/-SE) per-patient estimates of the sensitivity, specificity, positive and negative predictive values, and area under the receiver-operating-characteristic curve for CT colonography were 0.90+/-0.03, 0.86+/-0.02, 0.23+/-0.02, 0.99+/-<0.01, and 0.89+/-0.02, respectively. The sensitivity of 0.90 (i.e., 90%) indicates that CT colonography failed to detect a lesion measuring 10 mm or more in diameter in 10% of patients. The per-polyp sensitivity for large adenomas or cancers was 0.84+/-0.04. The per-patient sensitivity for detecting adenomas that were 6 mm or more in diameter was 0.78. CONCLUSIONS: In this study of asymptomatic adults, CT colonographic screening identified 90% of subjects with adenomas or cancers measuring 10 mm or more in diameter. 
These findings augment published data on the role of CT colonography in screening patients with an average risk of colorectal cancer. (ClinicalTrials.gov number, NCT00084929; American College of Radiology Imaging Network [ACRIN] number, 6664.)
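Predictive values depend on prevalence as well as on sensitivity and specificity. As a rough consistency check, and not the trial's estimation method (which used per-patient data directly), the sketch applies Bayes' theorem to the reported sensitivity 0.90 and specificity 0.86, assuming a prevalence of about 4.3% (the 33 + 76 participants with adenomas 1 cm or larger among 2531 in the age-cohort abstract above; cancers are not counted separately here). It yields a PPV near 0.22, close to the reported 0.23, and an NPV above 0.99.

```python
def predictive_values(sens: float, spec: float, prevalence: float):
    """Return (PPV, NPV) from sensitivity, specificity, and prevalence via Bayes' theorem."""
    tp = sens * prevalence               # true positives per person screened
    fp = (1 - spec) * (1 - prevalence)   # false positives
    tn = spec * (1 - prevalence)         # true negatives
    fn = (1 - sens) * prevalence         # false negatives
    return tp / (tp + fp), tn / (tn + fn)

# Assumption: large-adenoma prevalence taken from the age-cohort abstract
prevalence = (33 + 76) / 2531
ppv, npv = predictive_values(0.90, 0.86, prevalence)
print(f"prevalence={prevalence:.3f}  PPV={ppv:.2f}  NPV={npv:.3f}")
```

At this low prevalence, most positive examinations are false positives even with 86% specificity, which is why the PPV is far below the sensitivity while the NPV is close to 1.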
Subject(s)
Adenocarcinoma/diagnostic imaging , Adenoma/diagnostic imaging , Colonography, Computed Tomographic , Colorectal Neoplasms/diagnostic imaging , Adenocarcinoma/diagnosis , Adenocarcinoma/pathology , Adenoma/diagnosis , Adenoma/pathology , Aged , Colon/diagnostic imaging , Colonic Polyps/diagnosis , Colonic Polyps/diagnostic imaging , Colonic Polyps/pathology , Colonoscopy , Colorectal Neoplasms/diagnosis , Colorectal Neoplasms/pathology , Female , Humans , Male , Middle Aged , Predictive Value of Tests , ROC Curve , Sensitivity and Specificity
ABSTRACT
OBJECTIVE: To compare the clinical performance of digital breast tomosynthesis (DBT) with that of full-field digital mammography (FFDM) in a diagnostic population. METHODS: The study enrolled 200 consenting women who had at least one breast lesion discovered by mammography and/or ultrasound and classified as doubtful, suspicious, or probably malignant. They underwent tomosynthesis in one view [mediolateral oblique (MLO)] of both breasts at a dose comparable to that of standard screen-film mammography in two views [craniocaudal (CC) and MLO]. Images were rated by six breast radiologists using the BI-RADS score. Ratings were compared with the truth established according to the standard of care, and a multiple-reader multiple-case (MRMC) receiver-operating characteristic (ROC) analysis was performed. Clinical performance of DBT compared with that of FFDM was evaluated in terms of the difference between areas under ROC curves (AUCs) for BI-RADS scores. RESULTS: Overall clinical performance with DBT and FFDM for malignant versus all other cases was not significantly different (AUCs 0.851 vs 0.836, p = 0.645). The lower limit of the 95% CI of the difference between DBT and FFDM AUCs was -4.9%. CONCLUSION: Clinical performance of tomosynthesis in one view at the same total dose as standard screen-film mammography is not inferior to digital mammography in two views.
Subject(s)
Breast Neoplasms/diagnosis , Breast/pathology , Mammography/methods , Radiographic Image Enhancement/methods , Breast Neoplasms/diagnostic imaging , Female , Humans , Observer Variation , ROC Curve , Sensitivity and Specificity
ABSTRACT
OBJECTIVE: The objective of this article is to describe the experience of the National CT Colonography Trial with radiologist training and qualification testing in CT colonography (CTC) and to correlate this experience with subsequent performance in a prospective screening study. SUBJECTS AND METHODS: Ten inexperienced radiologists participated in a 1-day educational course, during which partial CTC examinations of 27 cases with neoplasia and full CTC examinations of 15 cases were reviewed using primary 2D and 3D search. Subsequently, 15 radiologists took a qualification examination composed of 20 CTC cases. Radiologists who did not pass the first qualification examination attended a second day of focused retraining on 30 cases, which was followed by a second qualification examination. The results of the initial and subsequent qualification tests were compared with reader performance in a large prospective screening trial. RESULTS: All radiologists took and passed the qualification examinations. Seven radiologists passed the qualification examination the first time it was offered, and eight passed after focused retraining. For the retrained radiologists, sensitivity was significantly better on the second examination than on the first (difference = 16%, p < 0.001). In the prospective study, there was no significant difference in sensitivity between the radiologists who passed the qualification examination the first time and those who passed the second time (88% vs 92%, respectively; p = 0.612). Also in the prospective study, the odds of correctly identifying diseased cases increased 1.5-fold for every 50-case increase in reader experience or formal training (p < 0.025).
CONCLUSION: A significant difference in performance was observed among radiologists before formalized training, but testing and focused retraining improved radiologist performance, resulting in an overall high sensitivity across radiologists in a subsequent, prospective screening study.
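The training effect is reported as an odds ratio from a logistic model: the odds of a correct call on a diseased case multiply by about 1.5 for every 50 cases of experience or training. The sketch converts that into probabilities, assuming the effect is linear on the log-odds scale (so the per-case slope is ln(1.5)/50). The baseline probability of 0.80 is hypothetical, not a trial figure.

```python
import math

OR_PER_50_CASES = 1.5
beta_per_case = math.log(OR_PER_50_CASES) / 50  # log-odds change per additional case

def prob_after(extra_cases: int, baseline_prob: float) -> float:
    """Probability of a correct call after extra_cases more cases, under the logistic model."""
    odds = baseline_prob / (1 - baseline_prob)
    odds *= math.exp(beta_per_case * extra_cases)
    return odds / (1 + odds)

baseline = 0.80  # hypothetical baseline probability of identifying a diseased case
for extra in (0, 50, 100, 150):
    print(f"+{extra:3d} cases: p = {prob_after(extra, baseline):.3f}")
```

Because the model is multiplicative on the odds scale, each additional 50 cases gives a smaller absolute gain in probability as performance approaches 1.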
Subject(s)
Clinical Competence , Colonic Polyps/diagnostic imaging , Colonography, Computed Tomographic/standards , Education, Medical, Graduate , Radiology/education , Diagnostic Errors , Educational Measurement , Humans , Imaging, Three-Dimensional , Logistic Models , Prospective Studies , Sensitivity and Specificity , Statistics, Nonparametric
ABSTRACT
PURPOSE: To evaluate the use of artificial intelligence (AI) to shorten digital breast tomosynthesis (DBT) reading time while maintaining or improving accuracy. MATERIALS AND METHODS: A deep learning AI system was developed to identify suspicious soft-tissue and calcified lesions in DBT images. A reader study compared the performance of 24 radiologists (13 of whom were breast subspecialists) reading 260 DBT examinations (including 65 cancer cases) both with and without AI. Readings occurred in two sessions separated by at least 4 weeks. Area under the receiver operating characteristic curve (AUC), reading time, sensitivity, specificity, and recall rate were evaluated with statistical methods for multireader, multicase studies. RESULTS: Radiologist performance for the detection of malignant lesions, measured by mean AUC, increased 0.057 with the use of AI (95% confidence interval [CI]: 0.028, 0.087; P < .01), from 0.795 without AI to 0.852 with AI. Reading time decreased 52.7% (95% CI: 41.8%, 61.5%; P < .01), from 64.1 seconds without to 30.4 seconds with AI. Sensitivity increased from 77.0% without AI to 85.0% with AI (8.0%; 95% CI: 2.6%, 13.4%; P < .01), specificity increased from 62.7% without to 69.6% with AI (6.9%; 95% CI: 3.0%, 10.8%; noninferiority P < .01), and recall rate for noncancers decreased from 38.0% without to 30.9% with AI (7.2%; 95% CI: 3.1%, 11.2%; noninferiority P < .01). CONCLUSION: The concurrent use of an accurate DBT AI system was found to improve cancer detection efficacy in a reader study that demonstrated increases in AUC, sensitivity, and specificity and a reduction in recall rate and reading time. © RSNA, 2019. See also the commentary by Hsu and Hoyt in this issue.
ABSTRACT
PURPOSE: Accurate target definition is considered essential for sophisticated, image-guided radiation therapy; however, relatively little information has been reported that measures our ability to identify the precise shape of targets accurately. We decided to assess the manner in which eight "experts" interpreted the size and shape of tumors based on "real-life" contrast-enhanced computed tomographic (CT) scans. METHODS AND MATERIALS: Four neuroradiologists and four radiation oncologists (the authors) with considerable experience and presumed expertise in treating head-and-neck tumors independently contoured, slice-by-slice, his/her interpretation of the precise gross tumor volume (GTV) on each of 20 sets of CT scans taken from 20 patients who previously were enrolled in Radiation Therapy Oncology Group protocol 91-11. RESULTS: The average proportion of overlap (i.e., the degree of agreement) was 0.532 (95% confidence interval 0.457 to 0.606). There was a slight tendency for the proportion of overlap to increase with increasing average GTV. CONCLUSIONS: Our work suggests that estimation of tumor shape currently is imprecise, even for experienced physicians. In consequence, there appears to be a practical limit to the current trend of smaller fields and tighter margins.
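The abstract does not define the "proportion of overlap" among the eight contours. One common choice, assumed here, is the volume enclosed by every observer divided by the volume enclosed by at least one observer (a multi-observer Jaccard index). The sketch represents each GTV as a set of voxel indices; that representation and the small rectangular example are hypothetical.

```python
from functools import reduce
from typing import List, Set, Tuple

Voxel = Tuple[int, int, int]

def proportion_of_overlap(contours: List[Set[Voxel]]) -> float:
    """Voxels inside every observer's GTV / voxels inside at least one observer's GTV (assumed definition)."""
    common = reduce(set.intersection, contours)
    union = reduce(set.union, contours)
    return len(common) / len(union) if union else 1.0

def box(x0: int, x1: int, y0: int, y1: int, z: int = 0) -> Set[Voxel]:
    """Hypothetical rectangular contour on a single slice (half-open ranges)."""
    return {(x, y, z) for x in range(x0, x1) for y in range(y0, y1)}

# Three hypothetical observers outline slightly shifted 10 x 10 regions
observers = [box(0, 10, 0, 10), box(2, 12, 0, 10), box(0, 10, 1, 11)]
print(f"proportion of overlap = {proportion_of_overlap(observers):.3f}")
```

Other definitions, such as the mean pairwise Dice coefficient, give different values for the same contours, so overlap figures are comparable only when the same definition is used.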
Subject(s)
Carcinoma, Squamous Cell/diagnostic imaging , Laryngeal Neoplasms/diagnostic imaging , Observer Variation , Radiation Oncology/standards , Tomography, X-Ray Computed , Clinical Competence , Female , Humans , Male , Neurology/standards
ABSTRACT
PURPOSE: Evaluate concurrent Computer-Aided Detection (CAD) with Digital Breast Tomosynthesis (DBT) to determine impact on radiologist performance and reading time. MATERIALS AND METHODS: The CAD system detects and extracts suspicious masses, architectural distortions and asymmetries from DBT planes that are blended into corresponding synthetic images to form CAD-enhanced synthetic images. Review of CAD-enhanced images and navigation to corresponding planes to confirm or dismiss potential lesions allows radiologists to more quickly review DBT planes. A retrospective, crossover study with and without CAD was conducted with six radiologists who read an enriched sample of 80 DBT cases including 23 malignant lesions in 21 women. Area Under the Receiver Operating Characteristic (ROC) Curve (AUC) compared the readings with and without CAD to determine the effect of CAD on overall interpretation performance. Sensitivity, specificity, recall rate and reading time were also assessed. Multi-reader, multi-case (MRMC) methods accounting for correlation and requiring correct lesion localization were used to analyze all endpoints. AUCs were based on a 0-100% probability of malignancy (POM) score. Sensitivity and specificity were based on BI-RADS scores, where 3 or higher was positive. RESULTS: Average AUC across readers without CAD was 0.854 (range: 0.785-0.891, 95% confidence interval (CI): 0.769,0.939) and 0.850 (range: 0.746-0.905, 95% CI: 0.751,0.949) with CAD (95% CI for difference: -0.046,0.039), demonstrating non-inferiority of AUC. Average reduction in reading time with CAD was 23.5% (95% CI: 7.0-37.0% improvement), from an average 48.2 (95% CI: 39.1,59.6) seconds without CAD to 39.1 (95% CI: 26.2,54.5) seconds with CAD. Per-patient sensitivity was the same with and without CAD (0.865; 95% CI for difference: -0.070,0.070), and there was a small 0.022 improvement (95% CI for difference: -0.046,0.089) in per-lesion sensitivity from 0.790 without CAD to 0.812 with CAD. 
A slight reduction in specificity with a -0.014 difference (95% CI for difference: -0.079,0.050) and a small 0.025 increase (95% CI for difference: -0.036,0.087) in recall rate in non-cancer cases were observed with CAD. CONCLUSIONS: Concurrent CAD resulted in faster reading time with non-inferiority of radiologist interpretation performance. Radiologist sensitivity, specificity and recall rate were similar with and without CAD.
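In this study, AUC is computed from 0-100% probability-of-malignancy (POM) scores, while sensitivity and specificity use BI-RADS with category 3 or higher counted as positive. Leaving out the MRMC variance modeling and the lesion-localization requirement the study applied, a single reader's empirical AUC is the Mann-Whitney statistic. The sketch shows both calculations; the score lists are hypothetical.

```python
def empirical_auc(pos_scores, neg_scores):
    """Mann-Whitney estimate of AUC: P(cancer score > non-cancer score), ties count 1/2."""
    total = 0.0
    for p in pos_scores:
        for n in neg_scores:
            total += 1.0 if p > n else 0.5 if p == n else 0.0
    return total / (len(pos_scores) * len(neg_scores))

def sens_spec(birads_pos, birads_neg, threshold=3):
    """Sensitivity and specificity with BI-RADS >= threshold treated as a positive call."""
    sens = sum(b >= threshold for b in birads_pos) / len(birads_pos)
    spec = sum(b < threshold for b in birads_neg) / len(birads_neg)
    return sens, spec

# Hypothetical single-reader data
pom_cancer = [90, 75, 60, 40]        # POM scores for cancer cases
pom_noncancer = [10, 20, 40, 55, 5]  # POM scores for non-cancer cases
birads_cancer = [5, 4, 4, 2]
birads_noncancer = [1, 2, 3, 4, 1]

print("AUC:", empirical_auc(pom_cancer, pom_noncancer))
print("sensitivity, specificity:", sens_spec(birads_cancer, birads_noncancer))
```

The AUC uses the full ordering of the POM scores, whereas sensitivity and specificity depend on a single BI-RADS cut point; this is why the two kinds of endpoint can move in different directions with CAD.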
Subject(s)
Breast Neoplasms/diagnostic imaging , Carcinoma, Ductal, Breast/diagnostic imaging , Mammography/standards , Breast Neoplasms/pathology , Diagnosis, Computer-Assisted/methods , Diagnosis, Computer-Assisted/standards , Epidemiologic Methods , Female , Humans , Mammography/methods , Middle Aged
ABSTRACT
RATIONALE AND OBJECTIVES: This study aims to estimate observer performance for a range of dose levels for common computed tomography (CT) examinations (detection of liver metastases or pulmonary nodules, and cause of neurologic deficit) to prioritize noninferior dose levels for further analysis. MATERIALS AND METHODS: Using CT data from 131 examinations (abdominal CT, 44; chest CT, 44; head CT, 43), CT images corresponding to 4%-100% of the routine clinical dose were reconstructed with filtered back projection or iterative reconstruction. Radiologists evaluated CT images, marking specified targets, providing confidence scores, and grading image quality. Noninferiority was assessed using reference standards, reader agreement rules, and jackknife alternative free-response receiver operating characteristic figures of merit. Reader agreement required that a majority of readers at lower dose identify target lesions seen by the majority of readers at routine dose. RESULTS: Reader agreement identified dose levels lower than 50% and 4% as having inadequate performance for detection of hepatic metastases and pulmonary nodules, respectively, but could not exclude any low dose levels for head CT. Estimated differences in jackknife alternative free-response receiver operating characteristic figures of merit between routine and lower dose configurations found that only the lowest dose configurations tested (ie, 30%, 4%, and 10% of routine dose levels for abdominal, chest, and head CT examinations, respectively) did not meet criteria for noninferiority. At lower doses, subjective image quality declined before observer performance did. Iterative reconstruction was beneficial only when filtered back projection did not result in noninferior performance. CONCLUSION: Opportunity exists for substantial radiation dose reduction using existing CT technology for common diagnostic tasks.
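The reader-agreement rule can be written as a set operation over lesion marks: a target counts as seen at routine dose if a majority of readers marked it, and a lower-dose configuration loses that target if a majority of readers did not also mark it there. The data layout (reader ID mapped to a set of marked lesion IDs) and the strict-majority threshold are assumptions for illustration; the abstract does not say how ties or partial localizations were handled.

```python
from typing import Dict, Set

def majority_seen(marks: Dict[str, Set[str]]) -> Set[str]:
    """Lesion IDs marked by a strict majority of readers (assumed threshold)."""
    n_readers = len(marks)
    counts: Dict[str, int] = {}
    for lesions in marks.values():
        for lesion in lesions:
            counts[lesion] = counts.get(lesion, 0) + 1
    return {lesion for lesion, c in counts.items() if c > n_readers / 2}

def missed_at_low_dose(routine: Dict[str, Set[str]], low: Dict[str, Set[str]]) -> Set[str]:
    """Targets seen by the majority at routine dose but not by the majority at lower dose."""
    return majority_seen(routine) - majority_seen(low)

# Hypothetical marks from three readers on one abdominal CT case
routine = {"R1": {"met1", "met2"}, "R2": {"met1", "met2"}, "R3": {"met1"}}
low_dose = {"R1": {"met1"}, "R2": {"met1", "met2"}, "R3": {"met1"}}
print("targets lost at lower dose:", missed_at_low_dose(routine, low_dose))
```

With three readers, a strict majority means at least two; here "met2" is seen by two readers at routine dose but by only one at lower dose, so that configuration would be flagged for this target.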
Subject(s)
Liver Neoplasms/diagnostic imaging , Multiple Pulmonary Nodules/diagnostic imaging , Radiation Dosage , Tomography, X-Ray Computed/methods , Female , Humans , Male , Observer Variation , ROC Curve , Radiographic Image Interpretation, Computer-Assisted/methods
ABSTRACT
The development and implementation of quantitative imaging biomarkers has been hampered by the inconsistent and often incorrect use of terminology related to these markers. Sponsored by the Radiological Society of North America, an interdisciplinary group of radiologists, statisticians, physicists, and other researchers worked to develop a comprehensive terminology to serve as a foundation for quantitative imaging biomarker claims. Where possible, this working group adapted existing definitions derived from national or international standards bodies rather than invent new definitions for these terms. This terminology also serves as a foundation for the design of studies that evaluate the technical performance of quantitative imaging biomarkers and for studies of algorithms that generate the quantitative imaging biomarkers from clinical scans. This paper provides examples of research studies and quantitative imaging biomarker claims that use terminology consistent with these definitions as well as examples of the rampant confusion in this emerging field. We provide recommendations for appropriate use of quantitative imaging biomarker terminological concepts. It is hoped that this document will assist researchers and regulatory reviewers who examine quantitative imaging biomarkers and will also inform regulatory guidance. More consistent and correct use of terminology could advance regulatory science, improve clinical research, and provide better care for patients who undergo imaging studies.
Subject(s)
Biomarkers , Diagnostic Imaging , Statistics as Topic , Terminology as Topic , Humans
ABSTRACT
Quantitative biomarkers from medical images are becoming important tools for clinical diagnosis, staging, monitoring, treatment planning, and development of new therapies. While there is a rich history of the development of quantitative imaging biomarker (QIB) techniques, little attention has been paid to the validation and comparison of the computer algorithms that implement the QIB measurements. In this paper we provide a framework for QIB algorithm comparisons. We first review and compare various study designs, including designs with the true value (e.g. phantoms, digital reference images, and zero-change studies), designs with a reference standard (e.g. studies testing equivalence with a reference standard), and designs without a reference standard (e.g. agreement studies and studies of algorithm precision). The statistical methods for comparing QIB algorithms are then presented for various study types using both aggregate and disaggregate approaches. We propose a series of steps for establishing the performance of a QIB algorithm, identify limitations in the current statistical literature, and suggest future directions for research.
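Zero-change (test-retest) studies, listed above among designs with a known true value, are a standard way to estimate a QIB algorithm's repeatability. A minimal sketch, assuming two replicate measurements per subject and a within-subject variance that does not depend on the measured value: the within-subject SD is sqrt(sum(d^2) / (2n)) for paired differences d, and the repeatability coefficient RC = 1.96·sqrt(2)·wSD ≈ 2.77·wSD bounds about 95% of differences between two repeat measurements. These are standard formulas, not ones the paper is stated to prescribe, and the volumes are hypothetical.

```python
import math

def within_subject_sd(pairs):
    """Within-subject SD from (test, retest) pairs: sqrt(sum(d^2) / (2n))."""
    diffs_sq = [(a - b) ** 2 for a, b in pairs]
    return math.sqrt(sum(diffs_sq) / (2 * len(pairs)))

def repeatability_coefficient(pairs):
    """RC = 1.96 * sqrt(2) * wSD; about 95% of test-retest differences fall within +/- RC."""
    return 1.96 * math.sqrt(2) * within_subject_sd(pairs)

# Hypothetical tumor-volume measurements (mL) from a same-day rescan (zero-change) study
pairs = [(10.2, 10.6), (25.1, 24.3), (7.9, 8.1), (15.0, 15.4), (31.2, 30.5)]
wsd = within_subject_sd(pairs)
print(f"wSD = {wsd:.3f} mL, RC = {repeatability_coefficient(pairs):.3f} mL")
```

When measurement error grows with the size of the quantity, as is common for volumes, the same calculation is often done on log-transformed values so that the result is expressed as a percentage.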
Subject(s)
Algorithms , Biomarkers , Diagnostic Imaging , Research Design , Statistics as Topic , Bias , Computer Simulation , Humans , Phantoms, Imaging , Reference Standards , Reproducibility of Results
ABSTRACT
Videodensitometric analysis of myocardial contrast echocardiography is traditionally performed offline. Recently, an online contrast ultrasound analysis system, Acoustic Densitometry (Hewlett-Packard), was introduced. We compared pixel intensities acquired with Acoustic Densitometry to pixel intensities derived from videodensitometry. A tissue phantom was imaged in phase I using three transducer frequencies (2.5, 3.5, and 5.0 MHz). In phase II, an in vitro flowing tube model with various concentrations of Albunex® was imaged at two flow rates, 0.6 and 1.2 m/sec, and at two transducer frequencies, 2.5 and 3.5 MHz. The relationship between pixel intensities yielded by the two systems for identical ultrasound signals was determined with linear regression. Intensities derived with Acoustic Densitometry strongly correlated with those derived from the offline videodensitometry system. The intensities were related by a predictive multiplicative factor based on display characteristics of the two systems. These results suggest that semiquantitative, online perfusion analysis with Acoustic Densitometry is as sensitive as offline analysis with videodensitometry. (ECHOCARDIOGRAPHY, Volume 13, September 1996)
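A multiplicative relationship between the two systems' intensities corresponds to a least-squares line through the origin, with slope k = Σxy / Σx². The sketch fits that slope and reports Pearson's r. It assumes paired intensity readings for identical ultrasound signals; the numbers are hypothetical, and the abstract says only "linear regression", so whether the original fit included an intercept is not known.

```python
import math

def fit_through_origin(x, y):
    """Least-squares slope k for y ≈ k * x (no intercept), plus Pearson r of x and y."""
    k = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return k, sxy / math.sqrt(sxx * syy)

# Hypothetical paired pixel intensities: offline videodensitometry (x), Acoustic Densitometry (y)
video = [12.0, 25.0, 41.0, 58.0, 77.0]
acoustic = [30.5, 61.0, 104.0, 143.5, 195.0]
k, r = fit_through_origin(video, acoustic)
print(f"multiplicative factor k = {k:.2f}, r = {r:.3f}")
```

Once k is known, intensities from one system can be converted to the other's scale by multiplication, which is what makes the online measurements interchangeable with the offline ones for semiquantitative analysis.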
ABSTRACT
PURPOSE: The purpose of this study was to investigate the correlation between model observer and human observer performance in CT imaging for lesion detection and localization when the lesion location is uncertain. METHODS: Two cylindrical rods (3-mm and 5-mm diameters) were placed in a 35×26 cm torso-shaped water phantom to simulate lesions with -15 HU contrast at 120 kV. The phantom was scanned 100 times on a 128-slice CT scanner at each of four dose levels (CTDIvol = 5.7, 11.4, 17.1, and 22.8 mGy). Regions of interest (ROIs) of 128×128 pixels were extracted around each lesion to generate signal-present images. Corresponding signal-absent ROIs were generated from images acquired without the lesion-mimicking rods. The lesion (rod) location within each ROI was randomized by shifting the ROI around the lesion. In the human observer studies, three trained observers judged whether a lesion was present or absent, marked the lesion location in each image, and scored their detection confidence on a 6-point scale. The same image data were analyzed using a channelized Hotelling observer (CHO) with Gabor channels, with internal noise added to the model observer's decision variables. The areas under the ROC and localization ROC (LROC) curves (AUC) were calculated using a nonparametric approach. Spearman's rank-order correlation between the average human observer performance and the model observer performance was calculated for the ROC and LROC AUCs for both the 3- and 5-mm lesions. RESULTS: In both ROC and LROC analyses, the model observer AUC values agreed well with the average values across the three human observers. The Spearman's rank-order correlation coefficients for both ROC and LROC analyses and for both lesion sizes were all 1.0, indicating perfect agreement in the rank ordering of the figures of merit (AUC) between the average human observer performance and the model observer performance. CONCLUSIONS: For low-contrast lesions (-15 HU) of different sizes in CT imaging, the performance of the Gabor-channel CHO was highly correlated with human observer performance for detection and localization tasks with uncertain lesion location at four clinically relevant dose levels. This suggests that Gabor CHO model observers can meaningfully assess CT image quality for optimizing scan protocols and radiation dose levels in detection and localization tasks for low-contrast lesions.
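The scoring stage of an observer study like this one can be sketched in Python: Gaussian internal noise is added to each decision variable, the ROC AUC is computed nonparametrically as the Wilcoxon-Mann-Whitney statistic, and Spearman's rank correlation compares human and model AUCs across dose levels. This is a simplified illustration under stated assumptions: the CHO template and Gabor channels are not implemented, the decision variables are simulated, the LROC analysis is omitted, and all function names and values are hypothetical.

```python
import random


def auc_mann_whitney(absent, present):
    """Nonparametric AUC: fraction of (present, absent) pairs in which the
    signal-present score is higher; ties count as 1/2."""
    wins = 0.0
    for p in present:
        for a in absent:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(present) * len(absent))


def add_internal_noise(scores, sigma, rng):
    """Add zero-mean Gaussian internal noise to each decision variable."""
    return [s + rng.gauss(0.0, sigma) for s in scores]


def ranks(values):
    """1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks of x and y."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry)) ** 0.5
    return num / den


rng = random.Random(1)
# Simulated stand-ins for CHO test statistics (hypothetical).
absent = [rng.gauss(0.0, 1.0) for _ in range(100)]
present = [rng.gauss(1.5, 1.0) for _ in range(100)]
auc = auc_mann_whitney(add_internal_noise(absent, 0.5, rng),
                       add_internal_noise(present, 0.5, rng))

# Hypothetical mean human AUCs and model AUCs at four dose levels.
human = [0.71, 0.79, 0.84, 0.88]
model = [0.73, 0.80, 0.83, 0.90]
rho = spearman(human, model)  # 1.0: identical rank order
```

Because Spearman's rho depends only on rank order, a value of 1.0 means the model ranks the dose levels exactly as the humans do, even if the absolute AUCs differ slightly.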
Subject(s)
Image Processing, Computer-Assisted/methods , Neoplasms/diagnostic imaging , Tomography, X-Ray Computed/methods , Uncertainty , Observer Variation , Phantoms, Imaging
ABSTRACT
RATIONALE AND OBJECTIVES: The purpose of this study was to assess the performance of a MicroDose photon-counting full-field digital mammography (PCM) system in comparison with conventional full-field digital mammography (FFDM) in terms of area under the receiver operating characteristic (ROC) curve (AUC), sensitivity, specificity, and feature analysis of standard-view mammograms in women presenting for screening mammography, diagnostic mammography, or breast biopsy. MATERIALS AND METHODS: A total of 133 women were enrolled at two European medical centers: 67 women with a prior FFDM examination obtained 10-36 months earlier were enrolled prospectively, and 66 women who underwent breast biopsy and had screening PCM and diagnostic FFDM, including standard craniocaudal and mediolateral oblique views of the breast with the lesion, were enrolled retrospectively. The case mix consisted of 49 cancers, 17 biopsy-proven benign cases, and 67 normal cases. Sixteen radiologists participated in the reader study and interpreted all 133 cases under both conditions, with a washout period of ≥4 weeks between sessions. ROC and free-response ROC analyses were performed to test the noninferiority of PCM compared with FFDM, using a noninferiority margin (Δ) of 0.10. Feature analysis of the 66 cases with lesions was conducted with all 16 readers after the blinded reads were completed. Mean glandular dose was recorded for all cases. RESULTS: The AUC was 0.947 (95% confidence interval [CI], 0.920-0.974) for PCM and 0.931 (95% CI, 0.898-0.964) for FFDM. Sensitivity per case was 0.936 (95% CI, 0.897-0.976) for PCM and 0.908 (95% CI, 0.856-0.960) for FFDM. Specificity per case was 0.764 (95% CI, 0.688-0.841) for PCM and 0.749 (95% CI, 0.668-0.830) for FFDM. Free-response ROC figures of merit were 0.920 (95% CI, 0.881-0.959) and 0.903 (95% CI, 0.858-0.948) for PCM and FFDM, respectively. Sensitivity per lesion was 0.903 (95% CI, 0.846-0.960) and 0.883 (95% CI, 0.823-0.944) for PCM and FFDM, respectively. The average numbers of false-positive marks per image for noncancer cases were 0.265 (95% CI, 0.171-0.359) and 0.281 (95% CI, 0.188-0.374) for PCM and FFDM, respectively. Noninferiority P values for AUC, sensitivity (per case and per lesion), specificity, and average false-positive marks per image were all statistically significant (P < .001). For the free-response ROC figure of merit, the noninferiority P value was <.025, as determined from the 95% CI for the difference. In the feature analysis, readers preferred PCM to FFDM for ≥70% of the cases. The average mean glandular dose was 0.74 mGy (95% CI, 0.722-0.759 mGy) for PCM and 1.23 mGy (95% CI, 1.199-1.262 mGy) for FFDM. CONCLUSIONS: In this study, radiologist performance with PCM was noninferior to that with conventional FFDM, at an average mean glandular dose 40% lower.
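The noninferiority logic can be illustrated with a one-sided z-test on the difference in AUC using the margin Δ = 0.10. In this Python sketch the AUC point estimates come from the abstract, but the standard error (0.02) is a hypothetical placeholder: the study's multireader multicase variance analysis is not reproduced, and the function name is illustrative. The last lines check the reported dose reduction.

```python
from statistics import NormalDist


def noninferiority_test(diff, se, margin, alpha=0.025):
    """One-sided noninferiority z-test of H0: diff <= -margin.

    diff   = metric(new) - metric(reference), e.g. AUC_PCM - AUC_FFDM
    se     = standard error of diff (assumed supplied, e.g. by an MRMC analysis)
    margin = noninferiority margin (Delta)
    """
    z = (diff + margin) / se
    p = NormalDist().cdf(-z)  # upper-tail P value, computed without cancellation
    # Equivalent CI rule: the lower bound of the two-sided (1 - 2*alpha) CI
    # for diff must exceed -margin.
    lower = diff - NormalDist().inv_cdf(1.0 - alpha) * se
    return z, p, lower > -margin


# AUC point estimates from the abstract; the standard error is hypothetical.
diff = 0.947 - 0.931
z, p, noninferior = noninferiority_test(diff, se=0.02, margin=0.10)
print(f"diff={diff:.3f} z={z:.2f} p={p:.2e} noninferior={noninferior}")

# Relative reduction in mean glandular dose reported in the abstract.
reduction = 1.0 - 0.74 / 1.23
print(f"dose reduction = {reduction:.1%}")  # 39.8%, i.e. about 40%
```

With alpha = 0.025, the P-value rule and the 95% CI rule always agree, which is why a noninferiority P value can be read off the 95% CI of the difference.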
Subject(s)
Breast Neoplasms/diagnostic imaging , Mammography/methods , Mass Screening/methods , Photometry/methods , Radiation Protection/methods , Radiographic Image Enhancement/methods , Adult , Aged , Aged, 80 and over , Europe , Female , Humans , Middle Aged , Observer Variation , Photons , Radiation Dosage , Reproducibility of Results , Sensitivity and Specificity
ABSTRACT
This report summarizes the Joint FDA-MIPS Workshop on Methods for the Evaluation of Imaging and Computer-Assist Devices. The purpose of the workshop was to gather information on the current state of the science and facilitate consensus development on statistical methods and study designs for the evaluation of imaging devices to support US Food and Drug Administration submissions. Additionally, participants expected to identify gaps in knowledge and unmet needs that should be addressed in future research. This summary is intended to document the topics that were discussed at the meeting and disseminate the lessons that have been learned through past studies of imaging and computer-aided detection and diagnosis device performance.
Subject(s)
Device Approval , Image Interpretation, Computer-Assisted/instrumentation , Image Interpretation, Computer-Assisted/standards , Technology Assessment, Biomedical/standards , Technology Assessment, Biomedical/trends , United States , United States Food and Drug Administration
ABSTRACT
OBJECTIVE: To demonstrate the safety and effectiveness of MelaFind, a noninvasive and objective computer-vision system designed to aid in the detection of early pigmented cutaneous melanoma. DESIGN: Prospective, multicenter, blinded study. The diagnostic performance of MelaFind and of the study clinicians was evaluated against the histologic reference standard. Standard images and patient information for a subset of 50 randomly selected lesions (25 melanomas) were used in a reader study of 39 independent dermatologists to estimate clinicians' biopsy sensitivity for melanoma. SETTING: Three academic and 4 community practices in the United States with expertise in the management of pigmented skin lesions. PATIENTS: A total of 1383 patients with 1831 lesions were enrolled from January 2007 to July 2008; 1632 lesions, including 127 melanomas (45% in situ; median Breslow thickness of the invasive lesions, 0.36 mm), were eligible and evaluable for the study end points. MAIN OUTCOME MEASURES: Sensitivity of MelaFind; specificities and biopsy ratios for MelaFind and the study investigators; and biopsy sensitivities of independent dermatologists in the reader study. RESULTS: The measured sensitivity of MelaFind was 98.4% (125 of 127 melanomas), with a 95% lower confidence bound of 95.6% and a biopsy ratio of 10.8:1; in the reader study, the average biopsy sensitivity of the dermatologists was 78%. Including borderline lesions (high-grade dysplastic nevi, atypical melanocytic proliferations, or hyperplasias), MelaFind's sensitivity was 98.3% (172 of 175), with a biopsy ratio of 7.6:1. On lesions biopsied mostly to rule out melanoma, MelaFind's average specificity (9.9%) was superior to that of clinicians (3.7%) (P = .02). CONCLUSION: MelaFind is a safe and effective tool to assist in the evaluation of pigmented skin lesions. TRIAL REGISTRATION: clinicaltrials.gov Identifier: NCT00434057.
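A one-sided lower confidence bound on a sensitivity such as 125 of 127 can be computed with the exact (Clopper-Pearson) method, sketched below in Python by bisection on the binomial tail. The abstract does not say which interval method, or which adjustment for multiple lesions per patient, produced the reported 95.6% bound, so the value from this sketch is not expected to match it exactly; the function names are hypothetical.

```python
import math


def binom_upper_tail(x, n, p):
    """P(X >= x) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x, n + 1))


def clopper_pearson_lower(x, n, alpha=0.05, tol=1e-10):
    """Exact one-sided (1 - alpha) lower confidence bound for a proportion:
    the p_L solving P(X >= x | p_L) = alpha, found by bisection."""
    if x == 0:
        return 0.0
    lo, hi = 0.0, x / n  # the tail probability increases with p on this interval
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binom_upper_tail(x, n, mid) < alpha:
            lo = mid  # tail too small: the bound lies above mid
        else:
            hi = mid
    return (lo + hi) / 2


sens = 125 / 127
lb = clopper_pearson_lower(125, 127, alpha=0.05)
print(f"sensitivity = {sens:.1%}, exact one-sided 95% lower bound = {lb:.1%}")
```

The exact method is conservative, so with only two missed melanomas the lower bound sits several percentage points below the point estimate.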