Results 1 - 20 of 33
1.
Radiology ; 282(1): 236-250, 2017 Jan.
Article in English | MEDLINE | ID: mdl-27439324

ABSTRACT

Purpose To conduct a multi-institutional, multireader study to compare the performance of digital tomosynthesis, dual-energy (DE) imaging, and conventional chest radiography for pulmonary nodule detection and management. Materials and Methods In this binational, institutional review board-approved, HIPAA-compliant prospective study, 158 subjects (43 subjects with normal findings) were enrolled at four institutions. Informed consent was obtained prior to enrollment. Subjects underwent chest computed tomography (CT) and imaging with conventional chest radiography (posteroanterior and lateral), DE imaging, and tomosynthesis with a flat-panel imaging device. Three experienced thoracic radiologists identified true locations of nodules (n = 516, 3-20-mm diameters) with CT and recommended case management by using Fleischner Society guidelines. Five other radiologists marked nodules and indicated case management by using images from conventional chest radiography, conventional chest radiography plus DE imaging, tomosynthesis, and tomosynthesis plus DE imaging. Sensitivity, specificity, and overall accuracy were measured by using the free-response receiver operating characteristic method and the receiver operating characteristic method for nodule detection and case management, respectively. Results were further analyzed according to nodule diameter categories (3-4 mm, >4 mm to 6 mm, >6 mm to 8 mm, and >8 mm to 20 mm). Results Maximum lesion localization fraction was higher for tomosynthesis than for conventional chest radiography in all nodule size categories (3.55-fold for all nodules, P < .001; 95% confidence interval [CI]: 2.96, 4.15). Case-level sensitivity was higher with tomosynthesis than with conventional chest radiography for all nodules (1.49-fold, P < .001; 95% CI: 1.25, 1.73). 
Case management decisions showed better overall accuracy with tomosynthesis than with conventional chest radiography, as given by the area under the receiver operating characteristic curve (1.23-fold, P < .001; 95% CI: 1.15, 1.32). There were no differences in any specificity measures. DE imaging did not significantly affect nodule detection when paired with either conventional chest radiography or tomosynthesis. Conclusion Tomosynthesis outperformed conventional chest radiography for lung nodule detection and determination of case management; DE imaging did not show significant differences over conventional chest radiography or tomosynthesis alone. These findings indicate performance likely achievable with a range of reader expertise. © RSNA, 2016 Online supplemental material is available for this article.


Subject(s)
Multiple Pulmonary Nodules/diagnostic imaging , Multiple Pulmonary Nodules/therapy , Radiographic Image Enhancement/methods , Radiography, Dual-Energy Scanned Projection , Radiography, Thoracic , Adult , Aged , Case-Control Studies , Female , Humans , Male , Middle Aged , Sensitivity and Specificity , Sweden , Tomography, X-Ray Computed , United States , X-Ray Intensifying Screens
2.
Eur Radiol ; 26(3): 874-83, 2016 Mar.
Article in English | MEDLINE | ID: mdl-26105023

ABSTRACT

OBJECTIVE: To compare the performance of different types of detectors in breast cancer detection. METHODS: A mammography image set containing subtle malignant non-calcification lesions, biopsy-proven benign lesions, simulated malignant calcification clusters and normal cases was acquired using amorphous-selenium (a-Se) detectors. The images were adapted to simulate four types of detectors at the same radiation dose: digital radiography (DR) detectors with a-Se and caesium iodide (CsI) convertors, and computed radiography (CR) detectors with a powder phosphor (PIP) and a needle phosphor (NIP). Seven observers marked suspicious and benign lesions. Analysis was undertaken using jackknife alternative free-response receiver operating characteristics weighted figure of merit (FoM). The cancer detection fraction (CDF) was estimated for a representative image set from screening. RESULTS: No significant differences in FoM were measured between the DR detectors. For calcification clusters and non-calcification lesions, the FoMs of both CR detectors were significantly lower than those of the DR detectors. For calcification clusters, the FoM for CR NIP was significantly better than that for CR PIP. The estimated CDFs with CR PIP and CR NIP detectors were up to 15% and 22% lower, respectively, than with DR detectors. CONCLUSION: Cancer detection is affected by detector type, and the use of CR in mammography should be reconsidered. KEY POINTS: The type of mammography detector can affect the cancer detection rates. CR detectors performed worse than DR detectors in mammography. Needle phosphor CR performed better than powder phosphor CR. Detection of calcification clusters is more sensitive to detector type than detection of other cancers.


Subject(s)
Breast Neoplasms/diagnostic imaging , Calcinosis/diagnostic imaging , Mammography/instrumentation , Aged , Early Detection of Cancer/instrumentation , Early Detection of Cancer/methods , Female , Humans , Mammography/methods , Mass Screening/instrumentation , Mass Screening/methods , Middle Aged , Needles , Observer Variation , ROC Curve , Radiographic Image Enhancement/methods
3.
AJR Am J Roentgenol ; 203(2): 387-93, 2014 Aug.
Article in English | MEDLINE | ID: mdl-25055275

ABSTRACT

OBJECTIVE. The objective of our study was to investigate the effect of image processing on the detection of cancers in digital mammography images. MATERIALS AND METHODS. Two hundred seventy pairs of breast images (both breasts, one view) were collected from eight systems using Hologic amorphous selenium detectors: 80 image pairs showed breasts containing subtle malignant masses; 30 image pairs, biopsy-proven benign lesions; 80 image pairs, simulated calcification clusters; and 80 image pairs, no cancer (normal). The 270 image pairs were processed with three types of image processing: standard (full enhancement), low contrast (intermediate enhancement), and pseudo-film-screen (no enhancement). Seven experienced observers inspected the images, locating and rating regions they suspected to be cancer for likelihood of malignancy. The results were analyzed using a jackknife-alternative free-response receiver operating characteristic (JAFROC) analysis. RESULTS. The detection of calcification clusters was significantly affected by the type of image processing: The JAFROC figure of merit (FOM) decreased from 0.65 with standard image processing to 0.63 with low-contrast image processing (p = 0.04) and from 0.65 with standard image processing to 0.61 with film-screen image processing (p = 0.0005). The detection of noncalcification cancers was not significantly different among the image-processing types investigated (p > 0.40). CONCLUSION. These results suggest that image processing has a significant impact on the detection of calcification clusters in digital mammography. For the three image-processing versions and the system investigated, standard image processing was optimal for the detection of calcification clusters. The effect on cancer detection should be considered when selecting the type of image processing in the future.


Subject(s)
Breast Neoplasms/diagnostic imaging , Calcinosis/diagnostic imaging , Mammography/methods , Radiographic Image Enhancement/methods , Radiographic Image Interpretation, Computer-Assisted/methods , Aged , Biopsy , Female , Humans , Middle Aged , United Kingdom
4.
Radiology ; 268(1): 46-53, 2013 Jul.
Article in English | MEDLINE | ID: mdl-23481165

ABSTRACT

PURPOSE: To establish the extent to which test set reading can represent actual clinical reporting in screening mammography. MATERIALS AND METHODS: Institutional ethics approval was granted, and informed consent was obtained from each participating screen reader. The need for informed consent with respect to the use of patient materials was waived. Two hundred mammographic examinations were selected from examinations reported by 10 individual expert screen readers, resulting in 10 reader-specific test sets. Data generated from actual clinical reports were compared with three test set conditions: clinical test set reading with prior images, laboratory test set reading with prior images, and laboratory test set reading without prior images. A further set of five expert screen readers was asked to interpret a common set of images in two identical test set conditions to establish a baseline for intraobserver variability. Confidence scores (from 1 to 4) were assigned to the respective decisions made by readers. Region-of-interest (ROI) figures of merit (FOMs) and side-specific sensitivity and specificity were described for the actual clinical reporting of each reader-specific test set and were compared with those for the three test set conditions. Agreement between pairs of readings was assessed by using the Kendall coefficient of concordance. RESULTS: Moderate or acceptable levels of agreement were evident (W = 0.69-0.73, P < .01) when describing group performance between actual clinical reporting and test set conditions that were reasonably close to the established baseline (W = 0.77, P < .01) and were lowest when prior images were excluded. Higher median values for ROI FOMs were demonstrated for the test set conditions than for the actual clinical reporting values; this was possibly linked to changes in sensitivity. 
CONCLUSION: Reasonable levels of agreement between actual clinical reporting and test set conditions can be achieved, although inflated sensitivity may be evident with test set conditions.
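The agreement statistic reported in this abstract is Kendall's coefficient of concordance, W. As a minimal sketch, assuming each reading is supplied as a list of confidence scores for the same cases, W can be computed with the standard correction for tied ranks, which matters here because a four-point confidence scale produces many ties. The function names are illustrative and are not taken from the study.

```python
from collections import Counter


def average_ranks(scores):
    """Rank scores in ascending order, giving tied values the mean of their ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with scores[order[i]].
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kendalls_w(ratings):
    """Kendall's W with tie correction.

    ratings: list of m readings, each a list of n scores for the same n cases.
    """
    m, n = len(ratings), len(ratings[0])
    ranked = [average_ranks(r) for r in ratings]
    # Total rank each case received across the m readings.
    rank_sums = [sum(r[i] for r in ranked) for i in range(n)]
    mean_sum = m * (n + 1) / 2
    s = sum((r - mean_sum) ** 2 for r in rank_sums)
    # Tie correction: sum of (t^3 - t) over groups of tied scores in each reading.
    ties = sum(t ** 3 - t for r in ratings for t in Counter(r).values())
    return 12 * s / (m ** 2 * (n ** 3 - n) - m * ties)
```

Two identical readings give W = 1 and two exactly reversed readings give W = 0; values such as those reported above fall in between.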


Subject(s)
Breast Neoplasms/diagnostic imaging , Mammography , Professional Competence , Decision Making , Diagnosis, Differential , Female , Humans , Observer Variation , Reproducibility of Results , Sensitivity and Specificity
5.
Med Phys ; 39(6): 3202-13, 2012 Jun.
Article in English | MEDLINE | ID: mdl-22755704

ABSTRACT

PURPOSE: This study aims to investigate whether microcalcification detection varies significantly when mammographic images are acquired at different image qualities, including different detectors, dose levels, and image processing algorithms. An additional aim was to determine how the standard European method of measuring image quality using threshold gold thickness measured with a CDMAM phantom and the associated limits in current EU guidelines relate to calcification detection. METHODS: One hundred sixty-two normal breast images were acquired on an amorphous selenium direct digital (DR) system. Microcalcification clusters extracted from magnified images of slices of mastectomies were electronically inserted into half of the images. The calcification clusters had a subtle appearance. All images were adjusted using a validated mathematical method to simulate the appearance of images from a computed radiography (CR) imaging system at the same dose, from both systems at half this dose, and from the DR system at a quarter of this dose. The original 162 images were processed with both Hologic and Agfa (Musica-2) image processing. All other image qualities were processed with Agfa (Musica-2) image processing only. Seven experienced observers marked and rated any identified suspicious regions. Free-response receiver operating characteristic (FROC) and ROC analyses were performed on the data. The lesion sensitivity at a nonlesion localization fraction (NLF) of 0.1 was also calculated. Images of the CDMAM mammographic test phantom were acquired using the automatic setting on the DR system. These images were modified to simulate the additional image qualities used in the observer study. The images were analyzed using automated software. To assess the relationship between threshold gold thickness and calcification detection, a power law was fitted to the data. 
RESULTS: There was a significant reduction in calcification detection using CR compared with DR: the alternative FROC (AFROC) area decreased from 0.84 to 0.63 and the ROC area decreased from 0.91 to 0.79 (p < 0.0001). This corresponded to a 30% drop in lesion sensitivity at an NLF equal to 0.1. Detection was also sensitive to the dose used. There was no significant difference in detection between the two image processing algorithms used (p > 0.05). It was additionally found that lower threshold gold thickness from CDMAM analysis implied better cluster detection. The measured threshold gold thickness passed the acceptable limit set in the EU standards for all image qualities except half-dose CR. However, calcification detection varied significantly between image qualities. This suggests that the current EU guidelines may need revising. CONCLUSIONS: Microcalcification detection was found to be sensitive to detector and dose used. Standard measurements of image quality were a good predictor of microcalcification cluster detection.
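The abstract states that a power law was fitted between threshold gold thickness and calcification detection but does not say how. A common approach, assumed here rather than taken from the paper, is ordinary least squares on the log-transformed data, since y = a·x^b becomes the straight line log y = log a + b·log x. The function name `fit_power_law` is hypothetical.

```python
import math


def fit_power_law(x, y):
    """Fit y = a * x**b by least squares on log(y) = log(a) + b * log(x).

    Returns (a, b). All x and y values must be positive.
    """
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    b = sxy / sxx              # slope in log-log space is the exponent
    a = math.exp(my - b * mx)  # intercept back-transformed to the prefactor
    return a, b
```

Fitting in log space weights relative rather than absolute errors; a direct nonlinear least-squares fit is the usual alternative when absolute errors are more meaningful.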


Subject(s)
Calcinosis/diagnostic imaging , Mammography/methods , Radiographic Image Enhancement/methods , Breast Neoplasms/complications , Breast Neoplasms/diagnostic imaging , Calcinosis/complications , Humans , Image Processing, Computer-Assisted , Phantoms, Imaging , Quality Control , ROC Curve , Radiation Dosage
6.
AJR Am J Roentgenol ; 194(2): 469-74, 2010 Feb.
Article in English | MEDLINE | ID: mdl-20093611

ABSTRACT

OBJECTIVE: Orthopedic injury and intracranial hemorrhage are commonly encountered in emergency radiology, and accurate and timely diagnosis is important. The purpose of this study was to determine whether the diagnostic accuracy of handheld computing devices is comparable to that of monitors that might be used in emergency teleconsultation. SUBJECTS AND METHODS: Two handheld devices, a Dell Axim personal digital assistant (PDA) and an Apple iPod Touch device, were studied. The diagnostic efficacy of each device was tested against that of secondary-class monitors (primary class being clinical workstation displays) for each of two image types (posteroanterior wrist radiographs and slices from CT of the brain), yielding four separate observer performance studies. Participants read a bank of 30 wrist or brain images searching for a specific abnormality (distal radial fracture, fresh intracranial bleed) and rated their confidence in their decisions. A total of 168 readings by examining radiologists of the American Board of Radiology were gathered, and the results were subjected to receiver operating characteristic analysis. RESULTS: In the PDA brain CT study, the scores of PDA readings were significantly higher than those of monitor readings for all observers (p ≤ 0.01) and for radiologists who were not neuroradiology specialists (p ≤ 0.05). No statistically significant differences between handheld device and monitor findings were found for the PDA wrist images or in the iPod Touch device studies, although some comparisons approached significance. CONCLUSION: Handheld devices show promise in the field of emergency teleconsultation for detection of basic orthopedic injuries and intracranial hemorrhage. Further investigation is warranted.


Subject(s)
Brain Injuries/diagnostic imaging , Computers, Handheld , Data Display , Emergencies , Radiology/instrumentation , User-Computer Interface , Wrist Injuries/diagnostic imaging , Humans , ROC Curve , Software , Tomography, X-Ray Computed
7.
AJR Am J Roentgenol ; 192(6): W271-4, 2009 Jun.
Article in English | MEDLINE | ID: mdl-19457787

ABSTRACT

OBJECTIVE: In this experimental study we assessed the diagnostic performance of digital linear slit scanning radiography compared with computed radiography (CR) for the detection of urinary calculi in an anthropomorphic phantom imitating patients weighing approximately 58-88 kg. CONCLUSION: Compared with CR, linear slit scanning radiography is superior for the detection of urinary stones and may be used for pretreatment localization and follow-up at a lower patient exposure.


Subject(s)
Body Burden , Radiation Protection/methods , Radiographic Image Enhancement/methods , Tomography, X-Ray Computed/methods , Urinary Calculi/diagnostic imaging , Humans , Phantoms, Imaging , Radiation Protection/instrumentation , Radiographic Image Enhancement/instrumentation
8.
Acad Radiol ; 15(4): 472-6, 2008 Apr.
Article in English | MEDLINE | ID: mdl-18342772

ABSTRACT

RATIONALE AND OBJECTIVES: In recent years, there has been increasing interest in the impact of environmental factors such as ambient light on radiologist performance. One commonly encountered distractor found within all clinical departments that has received little or no attention is acoustic noise. MATERIALS AND METHODS: The present work records the level of noises encountered within environments where radiologic images are viewed and establishes the impact of a clinically relevant level of noise on the ability of radiologists to perform a typical diagnostic task. Noise levels were recorded 10 times within each of 14 environments, 11 of which were locations where radiologic images are judged. Thirty posteroanterior chest x-ray images were then presented to 26 senior radiologists, who were asked to detect up to three nodular lesions within the images in the absence and presence of noise at an amplitude demonstrated in the clinical environment. Jackknife free-response receiver operating characteristic analysis was performed on the free-response data. RESULTS: The results demonstrated that noise amplitudes rarely exceeded those encountered with normal conversation, with the maximum mean value for an image-viewing environment being 56.1 dB. This level of noise had no impact on the ability of radiologists to identify chest lesions, with figures of merit of 0.68, 0.69, and 0.68 with noise and 0.65, 0.68, and 0.67 without noise for chest radiologists, nonchest radiologists, and all radiologists, respectively. Equally, no differences were seen in false-positive and false-negative scores or in the time required to judge the images. CONCLUSION: These findings suggest that noise at levels encountered within areas where radiologic images are viewed is not a major distractor within the reporting environment, but the need for further work has been identified.


Subject(s)
Clinical Competence , Noise, Occupational/adverse effects , Radiology Department, Hospital , Acoustics , Humans , Occupational Exposure
9.
Med Phys ; 34(6): 2024-38, 2007 Jun.
Article in English | MEDLINE | ID: mdl-17654906

ABSTRACT

Computer-aided detection (CAD) has been attracting extensive research interest during the last two decades. It is recognized that the full potential of CAD can only be realized by improving the performance and robustness of CAD algorithms, and this requires good evaluation methodology that would permit CAD designers to optimize their algorithms. Free-response receiver operating characteristic (FROC) curves are widely used to assess CAD performance; however, evaluation rarely proceeds beyond determination of lesion localization fraction (sensitivity) at an arbitrarily selected value of nonlesion localizations (false marks) per image. This work describes an FROC curve fitting procedure that uses a recent model of visual search that serves as a framework for the free-response task. A maximum likelihood procedure for estimating the parameters of the model from free-response data and fitting CAD-generated FROC curves was implemented. Procedures were implemented to estimate two figures of merit and associated statistics such as 95% confidence intervals and goodness of fit. One of the figures of merit does not require the arbitrary specification of an operating point at which to evaluate CAD performance. For comparison, a related method termed initial detection and candidate analysis was also implemented that is applicable when all suspicious regions are reported. The two methods were tested on seven mammography CAD data sets and both yielded good to excellent fits. The search model approach has the advantage that it can potentially be applied to radiologist-generated free-response data where not all suspicious regions are reported, only the ones that are deemed sufficiently suspicious to warrant clinical follow-up. This work represents the first practical application of the search model to an important evaluation problem in diagnostic radiology. Software based on this work is expected to benefit CAD developers working in diverse areas of medical imaging.
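The conventional evaluation this abstract criticizes reads lesion localization fraction (LLF) off the empirical FROC curve at a chosen nonlesion localization fraction (NLF). A minimal sketch of that empirical curve, assuming marks have already been classified as lesion localizations or nonlesion localizations and that each lesion receives at most one mark; the search-model fit described in the paper is not reproduced, and the function names are hypothetical.

```python
def empirical_froc(ll_ratings, nl_ratings, n_lesions, n_images):
    """Empirical FROC operating points as (NLF, LLF) pairs.

    ll_ratings: ratings of marks that correctly localized a lesion.
    nl_ratings: ratings of marks that did not (false marks).
    Each distinct rating is used as a threshold, from strictest to most lax.
    """
    points = [(0.0, 0.0)]
    for t in sorted(set(ll_ratings) | set(nl_ratings), reverse=True):
        llf = sum(r >= t for r in ll_ratings) / n_lesions  # fraction of lesions found
        nlf = sum(r >= t for r in nl_ratings) / n_images   # false marks per image
        points.append((nlf, llf))
    return points


def llf_at_nlf(points, target_nlf):
    """Highest LLF among operating points whose NLF does not exceed target_nlf."""
    return max(llf for nlf, llf in points if nlf <= target_nlf)
```

The choice of `target_nlf` (for example 0.1) is exactly the arbitrary operating point the abstract refers to; a figure of merit summarizing the whole curve avoids it.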


Subject(s)
Algorithms , Artificial Intelligence , Breast Neoplasms/diagnostic imaging , Mammography/methods , Pattern Recognition, Automated/methods , Radiographic Image Interpretation, Computer-Assisted/methods , Female , Humans , Radiographic Image Enhancement/methods , Sensitivity and Specificity , Software , Software Validation
10.
Med Phys ; 44(6): 2207-2222, 2017 Jun.
Article in English | MEDLINE | ID: mdl-28382718

ABSTRACT

PURPOSE: The objective was to design and implement a bivariate extension to the contaminated binormal model (CBM) to fit paired receiver operating characteristic (ROC) datasets, possibly degenerate, with proper ROC curves. Paired datasets yield two correlated ratings per case. Degenerate datasets have no interior operating points, and proper ROC curves do not inappropriately cross the chance diagonal. The existing method, developed more than three decades ago, utilizes a bivariate extension to the binormal model, implemented in CORROC2 software, which yields improper ROC curves and cannot fit degenerate datasets. CBM can fit proper ROC curves to unpaired (i.e., yielding one rating per case) and degenerate datasets, and there is a clear scientific need to extend it to handle paired datasets. METHODS: In CBM, nondiseased cases are modeled by a probability density function (pdf) consisting of a unit-variance peak centered at zero. Diseased cases are modeled with a mixture distribution whose pdf consists of two unit-variance peaks: one centered at positive µ with integrated probability α, the mixing fraction parameter, corresponding to the fraction of diseased cases where the disease was visible to the radiologist, and one centered at zero, with integrated probability (1-α), corresponding to disease that was not visible. It is shown that: (a) for nondiseased cases the bivariate extension is a unit-variance bivariate normal distribution centered at (0,0) with a specified correlation ρ1; (b) for diseased cases the bivariate extension is a mixture distribution with four peaks, corresponding to disease not visible in either condition, disease visible in only one condition (contributing two peaks), and disease visible in both conditions. An expression for the likelihood function is derived. 
A maximum likelihood estimation (MLE) algorithm, CORCBM, was implemented in the R programming language that yields parameter estimates and the covariance matrix of the parameters, and other statistics. A limited simulation validation of the method was performed. RESULTS: CORCBM and CORROC2 were applied to two datasets containing nine readers each contributing paired interpretations. CORCBM successfully fitted the data for all readers, whereas CORROC2 failed to fit a degenerate dataset. All fits were visually reasonable. All CORCBM fits were proper, whereas all CORROC2 fits were improper. CORCBM and CORROC2 were in agreement (a) in declaring only one of the nine readers as having significantly different performances in the two modalities; (b) in estimating higher correlations for diseased cases than for nondiseased ones; and (c) in finding that the intermodality correlation estimates for nondiseased cases were consistent between the two methods. All CORCBM fits yielded higher area under curve (AUC) than the CORROC2 fits, consistent with the fact that a proper ROC model like CORCBM is based on a likelihood-ratio-equivalent decision variable, and consequently yields higher performance than the binormal model-based CORROC2. The method gave satisfactory fits to four simulated datasets. CONCLUSIONS: CORCBM is a robust method for fitting paired ROC datasets, always yielding proper ROC curves, and able to fit degenerate datasets.
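The univariate CBM described in the METHODS section determines the ROC curve directly: with a threshold ζ, FPF = 1 - Φ(ζ) and TPF = α(1 - Φ(ζ - µ)) + (1 - α)(1 - Φ(ζ)), and the area is αΦ(µ/√2) + (1 - α)/2, because a visible-disease rating beats a nondiseased rating with probability Φ(µ/√2) and an invisible one with probability 1/2. The sketch below covers only this single-condition model; the bivariate CORCBM likelihood and its R implementation are not reproduced, and the function names are hypothetical.

```python
import math


def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def cbm_roc_point(zeta, mu, alpha):
    """(FPF, TPF) of the contaminated binormal model at threshold zeta.

    Nondiseased ratings ~ N(0, 1); diseased ratings are N(mu, 1) with
    probability alpha (visible disease) and N(0, 1) otherwise.
    """
    fpf = 1 - phi(zeta)
    tpf = alpha * (1 - phi(zeta - mu)) + (1 - alpha) * fpf
    return fpf, tpf


def cbm_auc(mu, alpha):
    """Area under the CBM ROC curve: alpha * Phi(mu / sqrt(2)) + (1 - alpha) / 2."""
    return alpha * phi(mu / math.sqrt(2)) + (1 - alpha) / 2
```

For µ ≥ 0 the TPF never falls below the FPF, so the curve stays on or above the chance diagonal, which is the "proper" property the abstract emphasizes.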


Subject(s)
Algorithms , Likelihood Functions , ROC Curve , Area Under Curve , Humans , Models, Statistical , Software
11.
Acad Radiol ; 13(10): 1187-93, 2006 Oct.
Article in English | MEDLINE | ID: mdl-16979067

ABSTRACT

RATIONALE AND OBJECTIVES: The free-response paradigm is being increasingly used in the assessment of medical imaging systems. The currently implemented method of analyzing the data, namely jackknife alternative free-response receiver operating characteristic (JAFROC) analysis, has some validation and applicability limitations. The purpose of this work is to address these limitations. MATERIALS AND METHODS: The general principles of modality evaluation and methodology validation are reviewed. A model for simulating free-response data was used to test the statistical validity of several methods of analyzing the data. The methods differed only in the choice of the figure of merit used to quantify performance. Statistical validity was judged by investigating the behaviors of the methods under null hypothesis conditions of no difference between modalities. RESULTS: The validity of the different methods of analyzing the data was found to be dependent on the choice of figure of merit. A figure of merit is identified that accommodates abnormal images with one or more lesions, whose detections may have different clinical significance (weights). This figure of merit is shown to be statistically valid. An extension of the analysis to single-reader interpretations of images from different modalities is also shown to be statistically valid. CONCLUSION: With the validated enhancements, JAFROC is expected to be of greater utility to users of the free-response method. The extension to single-reader interpretations should be of particular value to developers of image processing algorithms, including developers of computer-aided diagnosis algorithms.
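As its name indicates, JAFROC analysis works with jackknife pseudovalues of the chosen figure of merit, which are then analyzed in a subsequent variance analysis that is not shown here. A minimal sketch of the pseudovalue step, assuming the figure of merit is any function of a list of cases; the function name and the generic interface are illustrative assumptions.

```python
def jackknife_pseudovalues(cases, fom):
    """Jackknife pseudovalues of a figure of merit over cases.

    cases: list of per-case data; fom: function mapping a list of cases to a number.
    Pseudovalue i = n * fom(all cases) - (n - 1) * fom(all cases except case i).
    """
    n = len(cases)
    full = fom(cases)
    return [n * full - (n - 1) * fom(cases[:i] + cases[i + 1:]) for i in range(n)]
```

For the sample mean the pseudovalues reproduce the original values, which is a convenient sanity check; for a figure of merit that compares normal and abnormal cases, the `fom` callback would need to split the remaining cases by truth state.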


Subject(s)
Algorithms , Data Interpretation, Statistical , Image Enhancement/methods , Image Interpretation, Computer-Assisted/methods , Software , Task Performance and Analysis , Observer Variation , Reproducibility of Results , Sensitivity and Specificity
12.
J Soc Inf Disp ; 14(10): 921-926, 2006 Oct.
Article in English | MEDLINE | ID: mdl-17710120

ABSTRACT

Researchers have developed visual discrimination models (VDMs) that can predict a human observer's ability to detect a target object superposed on an image. These models incorporate sophisticated knowledge of the properties of the human visual system. In the predictive approach, termed conventional VDM usage, two input images with and without a target are analyzed by an algorithm that calculates a just-noticeable-difference (JND) index, which is taken as a measure of the detectability of the target. A new method of using the VDM, termed channelized VDM, is described; it involves finding the linear combination of the VDM-generated channels (which are not used in conventional VDM analysis) that has optimal ability to classify normal and abnormal images. The classification ability can be measured using receiver operating characteristic (ROC) or two-alternative forced-choice (2AFC) experiments, and in special cases it can also be predicted by signal detection theory (SDT)-based model-observer methods. In this study, simulated background and nodule-containing regions were used to validate the new method. The channelized VDM predictions were found to be in excellent qualitative agreement with human-observer-validated SDT predictions. Either VDM method (conventional or channelized) has potential applicability to soft-copy display optimization. An advantage of any VDM-based approach is that complex effects such as visual masking, which are usually not included in SDT-based methods, are automatically accounted for.
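The abstract does not specify how the optimal linear combination of channels is found. A standard choice for an optimal linear classifier, assumed here, is the Hotelling observer: weights w = S⁻¹(mean_abnormal - mean_normal), where S is the average of the two class covariance matrices, with detectability SNR² = wᵀ(mean_abnormal - mean_normal). The channel outputs are treated as plain vectors; the VDM that would produce them is not modeled, and all names are hypothetical.

```python
def mean_vector(samples):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(samples)
    return [sum(s[j] for s in samples) / n for j in range(len(samples[0]))]


def covariance(samples):
    """Sample covariance matrix (n - 1 denominator)."""
    m = mean_vector(samples)
    n, d = len(samples), len(m)
    return [[sum((s[i] - m[i]) * (s[j] - m[j]) for s in samples) / (n - 1)
             for j in range(d)] for i in range(d)]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    d = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, d):
            f = aug[r][col] / aug[col][col]
            for c in range(col, d + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * d
    for r in reversed(range(d)):
        x[r] = (aug[r][d] - sum(aug[r][c] * x[c] for c in range(r + 1, d))) / aug[r][r]
    return x


def hotelling_template(normal, abnormal):
    """Channel weights w = S^-1 (mean_abnormal - mean_normal) and the detectability SNR."""
    dmu = [a - b for a, b in zip(mean_vector(abnormal), mean_vector(normal))]
    s0, s1 = covariance(normal), covariance(abnormal)
    s = [[(x + y) / 2 for x, y in zip(r0, r1)] for r0, r1 in zip(s0, s1)]
    w = solve(s, dmu)
    snr = sum(wi * di for wi, di in zip(w, dmu)) ** 0.5
    return w, snr
```

Each image's decision variable would then be the dot product of `w` with its channel outputs, and ROC or 2AFC performance could be computed from those scores.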

13.
Med Phys ; 43(5): 2548, 2016 May.
Article in English | MEDLINE | ID: mdl-27147365

ABSTRACT

PURPOSE: The free-response receiver operating characteristic (FROC) method is being increasingly used to evaluate observer performance in search tasks. Data analysis requires definition of a figure of merit (FOM) quantifying performance. While a number of FOMs have been proposed, the recommended one, namely, the weighted alternative FROC (wAFROC) FOM, is not well understood. The aim of this work is to clarify the meaning of this FOM by relating it to the empirical area under a proposed wAFROC curve. METHODS: The wAFROC FOM is defined in terms of a quasi-Wilcoxon statistic that involves weights, coding the clinical importance, assigned to each lesion. A new wAFROC curve is proposed, the y-axis of which incorporates the weights, giving more credit for marking clinically important lesions, while the x-axis is identical to that of the AFROC curve. An expression is derived relating the area under the empirical wAFROC curve to the wAFROC FOM. Examples are presented with small numbers of cases showing how AFROC and wAFROC curves are affected by correct and incorrect decisions and how the corresponding FOMs credit or penalize these decisions. The wAFROC, AFROC, and inferred ROC FOMs were applied to three clinical data sets involving multiple reader FROC interpretations in different modalities. RESULTS: It is shown analytically that the area under the empirical wAFROC curve equals the wAFROC FOM. This theorem is the FROC analog of a well-known theorem developed in 1975 for ROC analysis, which gave meaning to a Wilcoxon-statistic-based ROC FOM. A similar equivalence applies between the area under the empirical AFROC curve and the AFROC FOM. The examples show explicitly that the wAFROC FOM gives equal importance to all diseased cases, regardless of the number of lesions, a desirable statistical property not shared by the AFROC FOM. Applications to the clinical data sets show that the wAFROC FOM yields results comparable to those obtained with the AFROC FOM. 
CONCLUSIONS: The equivalence theorem gives meaning to the weighted AFROC FOM, namely, it is identical to the empirical area under the weighted AFROC curve.
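The quasi-Wilcoxon definition in this abstract can be sketched as follows, assuming the common convention that each nondiseased case contributes its highest false-mark rating, unmarked cases and unmarked lesions are rated negative infinity, and the lesion weights on each diseased case sum to 1. The kernel scores 1 when the lesion rating exceeds the false-mark rating, 0.5 on a tie and 0 otherwise. The function names are hypothetical.

```python
NOT_MARKED = float("-inf")


def psi(fp, ll):
    """Wilcoxon kernel: 1 if the lesion outranks the false mark, 0.5 if tied, else 0."""
    if ll > fp:
        return 1.0
    if ll == fp:
        return 0.5
    return 0.0


def wafroc_fom(fp_ratings, lesion_ratings, weights):
    """Empirical weighted AFROC figure of merit.

    fp_ratings: highest false-mark rating on each nondiseased case (NOT_MARKED if none).
    lesion_ratings / weights: one list per diseased case, one entry per lesion;
        unmarked lesions are NOT_MARKED and each case's weights sum to 1.
    """
    total = 0.0
    for fp in fp_ratings:
        for ratings, ws in zip(lesion_ratings, weights):
            total += sum(w * psi(fp, r) for r, w in zip(ratings, ws))
    return total / (len(fp_ratings) * len(lesion_ratings))
```

Because each diseased case's weights sum to 1, a case with two lesions can contribute at most as much as a case with one, which is the equal-importance property described in the RESULTS.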


Subject(s)
Models, Statistical , ROC Curve , Algorithms , Area Under Curve , Breast/diagnostic imaging , Breast Diseases/diagnostic imaging , Calcinosis/diagnostic imaging , Computer Simulation , Data Interpretation, Statistical , Datasets as Topic , Humans , Mammography/instrumentation , Mammography/methods , Models, Anatomic , Phantoms, Imaging , Positron-Emission Tomography/instrumentation , Positron-Emission Tomography/methods , Software , Tomography, X-Ray Computed/instrumentation , Tomography, X-Ray Computed/methods
14.
Phys Med ; 32(4): 568-74, 2016 Apr.
Article in English | MEDLINE | ID: mdl-27061872

ABSTRACT

PURPOSE: To investigate the relationship between image quality measurements and the clinical performance of digital mammographic systems. METHODS: Mammograms containing subtle malignant non-calcification lesions and simulated malignant calcification clusters were adapted to appear as if acquired by four types of detector. Observers searched for suspicious lesions and gave these a malignancy score. Analysis was undertaken using jackknife alternative free-response receiver operating characteristics weighted figure of merit (FoM). Images of a CDMAM contrast-detail phantom were adapted to appear as if acquired using the same four detectors as the clinical images. The resultant threshold gold thicknesses were compared to the FoMs using a linear regression model, and an F-test was used to determine whether the gradient of the relationship was significantly non-zero. RESULTS: The detectors with the best image quality measurement also had the highest FoM values. The gradient of the inverse relationship between FoMs and threshold gold thickness for the 0.25 mm diameter disk was significantly different from zero for calcification clusters (p=0.027), but not for non-calcification lesions (p=0.11). Systems performing just above the minimum image quality level set in the European Guidelines for Quality Assurance in Breast Cancer Screening and Diagnosis resulted in reduced cancer detection rates compared to systems performing at the achievable level. CONCLUSIONS: The clinical effectiveness of mammography for the task of detecting calcification clusters was found to be linked to image quality assessment using the CDMAM phantom. The European Guidelines should be reviewed as the current minimum image quality standards may be too low.
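The regression test in the METHODS section can be sketched for a single predictor: fit y = a + b·x by least squares and form F = (regression sum of squares / 1) / (residual sum of squares / (n - 2)), which is compared with an F(1, n - 2) distribution to test whether the gradient b differs from zero. This is the textbook simple-regression F-test, assumed here to match the paper's usage; the function name is hypothetical.

```python
def regression_f_test(x, y):
    """Least-squares line y = a + b x and the F statistic for H0: b = 0.

    Returns (a, b, f_stat); f_stat has 1 and n - 2 degrees of freedom.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((u - mx) ** 2 for u in x)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    ss_reg = b * sxy  # explained sum of squares, equal to b**2 * sxx
    ss_res = sum((v - (a + b * u)) ** 2 for u, v in zip(x, y))
    f_stat = ss_reg / (ss_res / (n - 2))
    return a, b, f_stat
```

With SciPy available, the p-value would be `scipy.stats.f.sf(f_stat, 1, n - 2)`; with only four detector types, n - 2 = 2 residual degrees of freedom, so the test has limited power.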


Subject(s)
Breast Neoplasms/diagnostic imaging , Mammography/methods , Breast Neoplasms/metabolism , Breast Neoplasms/pathology , Calcinosis/diagnostic imaging , Calcinosis/metabolism , Calcinosis/pathology , Female , Guidelines as Topic , Humans , Mammography/standards , Radiographic Image Enhancement/methods
15.
Med Phys ; 32(4): 1031-4, 2005 Apr.
Article in English | MEDLINE | ID: mdl-15895587

ABSTRACT

The authors compared two methodological approaches, Jackknife ROC and JAFROC, in analyzing data ascertained during FROC (free-response receiver operating characteristics) type studies. Observer rating data obtained from two observer performance studies were analyzed. During the first study, seven radiologists interpreted 120 mammography examinations depicting 57 masses under five different conditions with and without the results of computer-aided detection (CAD). In the second study, eight radiologists interpreted 110 examinations depicting 51 masses under six different display conditions with and without CAD results. Readers rated the detection task in a FROC type response. Jackknife ROC (using the software of LABMRMC with the highest rating per case) and JAFROC were used to compute differences, if any, in summary performance levels among all reading modes in each study as well as for all paired data sets. The results of the different analytical approaches are compared. The overall results for all modes were significantly different for the first study (p < 0.05) and not significant (p > 0.05) for the second study using either analytical approach. In the first study, the performance levels represented by three paired data sets were significantly different (p < 0.05) when computed using LABMRMC and four pairs were significantly different (p < 0.05) using JAFROC. In eight of ten pairs, JAFROC produced lower p values than LABMRMC. In the second study, LABMRMC showed no significant differences for any paired data sets and JAFROC showed a significant difference for one pair. In 15 of 16 pairs, p values computed by JAFROC were lower than those computed by LABMRMC.
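The "highest rating per case" reduction used for the Jackknife ROC (LABMRMC) analysis can be sketched as below: each case's FROC marks collapse to a single ROC rating, and the empirical (Wilcoxon) ROC area is computed from those ratings. The function names and the rule that an unmarked case receives the lowest possible rating are illustrative assumptions. Note that this reduction discards where the marks were placed.

```python
import math


def highest_rating_per_case(marks: dict[str, list[float]],
                            case_ids: list[str]) -> dict[str, float]:
    """Collapse FROC mark ratings to a single ROC rating per case.

    Assumption: a case with no marks gets -inf, i.e. it ranks below
    every case that received at least one mark.
    """
    return {cid: max(marks.get(cid, []), default=-math.inf) for cid in case_ids}


def wilcoxon_auc(normal: list[float], abnormal: list[float]) -> float:
    """Empirical (trapezoidal) ROC area; tied pairs count one half."""
    total = 0.0
    for x in normal:
        for y in abnormal:
            if y > x:
                total += 1.0
            elif y == x:
                total += 0.5
    return total / (len(normal) * len(abnormal))


# Hypothetical data: cases n2 and a2 received no marks.
marks = {"n1": [2.0], "a1": [1.0, 4.0]}
ratings = highest_rating_per_case(marks, ["n1", "n2", "a1", "a2"])
auc = wilcoxon_auc([ratings["n1"], ratings["n2"]], [ratings["a1"], ratings["a2"]])
```

Here the two unmarked cases tie at `-inf` and contribute one half, giving `auc == 0.625`.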


Subject(s)
Breast Neoplasms/diagnosis , Mammography/methods , Radiographic Image Interpretation, Computer-Assisted/methods , Confidence Intervals , Diagnosis, Computer-Assisted , False Positive Reactions , Female , Humans , Observer Variation , ROC Curve , Reproducibility of Results , Software
16.
Radiat Prot Dosimetry ; 114(1-3): 26-31, 2005.
Article in English | MEDLINE | ID: mdl-15933077

ABSTRACT

The jackknife free-response receiver operating characteristic (JAFROC) method allows quantitative analysis of observer data of the kind generated when radiologists interpret images that may contain more than one lesion and report a location for each perceived lesion. The method was recently validated with a perception-based simulation model that incorporated the detectability parameter of the standard binormal ROC model and, in addition, allowed simultaneous samples from both noise and signal distributions. The total number of noise samples is an important new parameter that measures reader expertise. The new sampling model incorporates search, which is an integral part of lesion detection that has not been possible to model until now. The model was used to generate simulated FROC ratings data, which were used to assess the statistical validity of JAFROC analysis. We found that JAFROC analysis is a statistically valid approach for analysing FROC data and that JAFROC analysis exhibited significantly greater statistical power than the existing ROC approach.
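As a concrete illustration of the figure of merit that JAFROC analysis jackknifes, the sketch below follows what we understand to be the original definition: the probability that a lesion's rating exceeds the highest non-lesion rating on a normal image, with ties counting one half. Unmarked lesions and unmarked normal images are assigned `-inf`. The function name and input layout are hypothetical, and later variants of the figure of merit differ in which images supply the false-positive ratings.

```python
import math


def jafroc_fom(normal_nl: list[list[float]],
               lesion_ratings: list[list[float]]) -> float:
    """Sketch of the JAFROC figure of merit.

    normal_nl[i]: ratings of all marks on normal image i (all are non-lesion).
    lesion_ratings[j]: one rating per lesion on abnormal image j;
    an unmarked lesion should be given -inf.
    """
    # Highest false-positive rating on each normal image (-inf if unmarked).
    fp = [max(r, default=-math.inf) for r in normal_nl]
    lesions = [y for img in lesion_ratings for y in img]
    total = 0.0
    for x in fp:
        for y in lesions:
            if y > x:
                total += 1.0
            elif y == x:
                total += 0.5
    return total / (len(fp) * len(lesions))


# Hypothetical data: two normal images, two abnormal images with three lesions.
fom = jafroc_fom([[2.0], []], [[3.0, -math.inf], [1.0]])
```

Because every lesion is scored individually by its own mark, location information is retained in a way that the per-case ROC reduction does not provide.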


Subject(s)
Observer Variation , Radiology/methods , Algorithms , Humans , Likelihood Functions , Models, Theoretical , ROC Curve , Radiographic Image Enhancement/methods , Software
17.
Med Phys ; 31(8): 2313-30, 2004 Aug.
Article in English | MEDLINE | ID: mdl-15377098

ABSTRACT

Although the receiver operating characteristic (ROC) paradigm is the accepted method for evaluation of diagnostic imaging systems, it has some serious shortcomings inasmuch as it is restricted to one observer report per image. By contrast, the free-response ROC (FROC) paradigm and its associated analysis method allow the observer to report multiple abnormalities within each imaging study and use the location of reported abnormalities to improve the measurement. Because the ROC method cannot accommodate multiple responses or use location information, its statistical power will suffer. The FROC paradigm/analysis has not enjoyed widespread acceptance because of concern about whether responses made to the same diagnostic study can be treated as independent. We propose a new jackknife FROC analysis method (JAFROC) that does not make the independence assumption. The new analysis method combines elements of FROC and the Dorfman-Berbaum-Metz (DBM) methods. To compare JAFROC to an earlier free-response analysis method (specifically the alternative free-response, or AFROC, method) and to the DBM method, which uses conventional ROC scoring, we developed a model for generating simulated FROC data. The simulation model is based on an eye-movement model of how experts evaluate images. It allowed us to examine null hypothesis (NH) behavior and statistical power of the different methods. We found that AFROC analysis did not pass the NH test, being unduly conservative. Both the JAFROC method and the DBM method passed the NH test, but JAFROC had more statistical power than the DBM method. The results of this comparison suggest that future studies of diagnostic performance may enjoy improved statistical power or reduced sample size requirements through the use of the JAFROC method.
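The jackknife element shared by DBM and JAFROC can be sketched generically: remove one case at a time, recompute the figure of merit, and form case-level pseudovalues Y_k = N*theta - (N - 1)*theta_(-k), which are then analysed as ordinary observations (for example by ANOVA). The sketch below uses the Wilcoxon ROC area as a stand-in figure of merit; a JAFROC analysis would substitute its own figure of merit. Names and data are hypothetical.

```python
from typing import Callable

Case = tuple[float, bool]  # (rating, is_abnormal)


def wilcoxon_auc(cases: list[Case]) -> float:
    """Empirical ROC area over a list of cases; ties count one half."""
    normal = [r for r, abn in cases if not abn]
    abnormal = [r for r, abn in cases if abn]
    total = 0.0
    for x in normal:
        for y in abnormal:
            total += 1.0 if y > x else (0.5 if y == x else 0.0)
    return total / (len(normal) * len(abnormal))


def jackknife_pseudovalues(cases: list[Case],
                           fom: Callable[[list[Case]], float]) -> list[float]:
    """Case-level pseudovalues Y_k = N*theta - (N - 1)*theta_(-k)."""
    n = len(cases)
    theta = fom(cases)
    # Each leave-one-out subset must still contain both normal and abnormal cases.
    return [n * theta - (n - 1) * fom(cases[:k] + cases[k + 1:]) for k in range(n)]


cases = [(1.0, False), (2.0, False), (3.0, True), (1.5, True)]
pv = jackknife_pseudovalues(cases, wilcoxon_auc)
```

For these four cases the full-sample area is 0.75 and the pseudovalues are `[1.5, 0.0, 1.5, 0.0]`.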


Subject(s)
Algorithms , Models, Theoretical , Software , Humans , Mammography/methods , Observer Variation , Reproducibility of Results
18.
Acad Radiol ; 21(4): 538-45, 2014 Apr.
Article in English | MEDLINE | ID: mdl-24594424

ABSTRACT

RATIONALE AND OBJECTIVES: The purpose of this study was to compare lesion-detection performance when interpreting computed tomography (CT) images that are acquired for attenuation correction when performing single photon emission computed tomography/computed tomography (SPECT/CT) myocardial perfusion studies. In the United Kingdom, there is a requirement that these images be interpreted; thus, it is necessary to understand observer performance on these images. MATERIALS AND METHODS: An anthropomorphic chest phantom with inserted spherical lesions of different sizes and contrasts was scanned on five different SPECT/CT systems using site-specific CT protocols for SPECT/CT myocardial perfusion imaging. Twenty-one observers (0-4 years of CT experience) searched 26 image slices (17 abnormal, containing 1-3 lesions, and 9 normal, containing no lesions) for each CT acquisition. The observers marked and rated perceived lesions under the free-response paradigm. Four analyses were conducted using jackknife alternative free-response receiver operating characteristic (JAFROC) analysis: (1) 20-pixel acceptance radius (AR) with all 21 readers, abbreviated to 20/ALL analysis, (2) 40-pixel AR with 21 readers (40/ALL), (3) 20-pixel AR with 14 readers experienced in CT (20/EXP), and (4) 20-pixel AR with 7 readers with no CT experience (20/NOT). The significance level of the test was set so as to conservatively hold the overall probability of a type I error below 0.05. RESULTS: The mean JAFROC figure of merit (FOM) values for the five CT acquisitions in the 20/ALL analysis were 0.602, 0.639, 0.372, 0.475, and 0.719, with a significant difference in lesion-detection performance evident between all individual treatment pairs (P < .0001) with the exception of the 1-2 pairing, which was not significant (these differed only in milliamp seconds). System 5, which had the highest performance, had the smallest slice thickness and the largest matrix size.
For the other analyses, the system orderings remained unchanged, and the significance of FOM difference findings remained identical to those for 20/ALL, with one exception: for 20/EXP analysis the 1-2 difference became significant with the higher milliamp seconds superior. Improved detection performance was associated with a smaller slice thickness, increased matrix size, and, to a lesser extent, increased tube charge. CONCLUSIONS: Protocol variations for CT-based attenuation correction (AC) in SPECT/CT imaging have a measurable impact on lesion-detection performance. The results imply that z-axis resolution and matrix size had the greatest impact on lesion detection, with a weaker but detectable dependence on the product of milliamp and seconds.
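The acceptance radius (AR) determines whether a mark is scored as a lesion localization or a non-lesion localization, which is why the 20-pixel and 40-pixel analyses can differ. The sketch below assumes a simple rule: a mark is credited to the nearest lesion centre within the AR (in pixels), a lesion keeps the highest rating among marks credited to it, and any mark outside every AR is a non-lesion localization. The study's exact scoring rule is not stated here, so `score_marks` and this rule are illustrative assumptions.

```python
import math

Point = tuple[float, float]


def score_marks(marks: list[tuple[Point, float]], lesions: list[Point],
                radius: float) -> tuple[list[float], list[float]]:
    """Split FROC marks into lesion ratings and non-lesion ratings.

    marks: ((x, y), rating) in pixel coordinates; lesions: (x, y) centres.
    Returns (lesion_ratings, nl_ratings); unmarked lesions keep -inf.
    """
    lesion_ratings = [-math.inf] * len(lesions)
    nl_ratings: list[float] = []
    for (mx, my), rating in marks:
        best, best_d = None, radius
        for j, (lx, ly) in enumerate(lesions):
            d = math.hypot(mx - lx, my - ly)
            if d <= best_d:  # nearest lesion centre within the acceptance radius
                best, best_d = j, d
        if best is None:
            nl_ratings.append(rating)
        else:
            lesion_ratings[best] = max(lesion_ratings[best], rating)
    return lesion_ratings, nl_ratings
```

With one lesion at (100, 100) and a mark 25 pixels away, the mark is a non-lesion localization under a 20-pixel AR but a lesion localization under a 40-pixel AR.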


Subject(s)
Algorithms , Incidental Findings , Lung Neoplasms/diagnostic imaging , Phantoms, Imaging , Radiographic Image Enhancement/methods , Radiographic Image Interpretation, Computer-Assisted/methods , Tomography, X-Ray Computed/methods , Artifacts , Clinical Competence , Humans , Observer Variation , Radiography, Thoracic/methods , Reproducibility of Results , Sensitivity and Specificity , Tomography, Emission-Computed, Single-Photon/methods
19.
Acad Radiol ; 20(7): 915-9, 2013 Jul.
Article in English | MEDLINE | ID: mdl-23583665

ABSTRACT

In the receiver operating characteristic paradigm the observer assigns a single rating to each image and the location of the perceived abnormality, if any, is ignored. In the free-response receiver operating characteristic paradigm the observer is free to mark and rate as many suspicious regions as are considered clinically reportable. Credit for a correct localization is given only if a mark is sufficiently close to an actual lesion; otherwise, the observer's mark is scored as a location-level false positive. Until fairly recently there existed no accepted method for analyzing the resulting relatively unstructured data containing random numbers of mark-rating pairs per image. This report reviews the history of work in this field, which has now spanned more than five decades. It introduces terminology used to describe the paradigm, proposed measures of performance (figures of merit), ways of visualizing the data (operating characteristics), and software for analyzing free-response receiver operating characteristic studies.
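One of the operating characteristics mentioned, the FROC curve, plots the lesion localization fraction (LLF, lesion localizations divided by the number of lesions) against the non-lesion localization fraction (NLF, non-lesion localizations divided by the number of images) as the rating threshold is lowered. A minimal sketch, assuming marks have already been scored as lesion or non-lesion localizations; `froc_points` is a hypothetical name.

```python
import math


def froc_points(nl_ratings: list[float], ll_ratings: list[float],
                n_images: int, n_lesions: int) -> list[tuple[float, float]]:
    """Empirical FROC operating points (NLF, LLF), one per distinct rating.

    Unmarked lesions (non-finite ratings) are ignored but still count in n_lesions.
    """
    ll = [r for r in ll_ratings if math.isfinite(r)]
    thresholds = sorted(set(nl_ratings) | set(ll), reverse=True)
    points = [(0.0, 0.0)]  # strictest threshold: nothing reported
    for t in thresholds:
        nlf = sum(r >= t for r in nl_ratings) / n_images
        llf = sum(r >= t for r in ll) / n_lesions
        points.append((nlf, llf))
    return points


points = froc_points([1.0, 3.0], [2.0, 3.0], n_images=2, n_lesions=4)
```

Each point accumulates all marks rated at or above the threshold, so the curve moves up and to the right as more marks are counted.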


Subject(s)
Breast Neoplasms/diagnostic imaging , Data Interpretation, Statistical , Mammography/statistics & numerical data , ROC Curve , Female , Humans , Mammography/methods , Radiographic Image Interpretation, Computer-Assisted/methods , Reproducibility of Results , Software
20.
Phys Med Biol ; 57(10): 2873-904, 2012 May 21.
Article in English | MEDLINE | ID: mdl-22516804

ABSTRACT

Laboratory receiver operating characteristic (ROC) studies, which are often used to evaluate medical imaging systems, differ from 'live' clinical interpretations in several respects which could compromise their clinical relevance. The aim was to develop a methodology for quantifying the clinical relevance of a laboratory ROC study. A simulator was developed to generate ROC ratings data and binary clinical interpretations classified as correct or incorrect for a common set of images interpreted under clinical and laboratory conditions. The area under the trapezoidal ROC curve (AUC) was used as the laboratory figure-of-merit and the fraction of correct clinical decisions as the clinical figure-of-merit. Conventional agreement measures (Pearson, Spearman, Kendall and kappa) between the bootstrap-induced fluctuations of the two figures of merit were estimated. A jackknife pseudovalue transformation applied to the figures of merit was also investigated as a way to capture agreement existing at the individual image level that could be lost at the figure-of-merit level. It is shown that the pseudovalues define a relevance-ROC curve. The area under this curve (rAUC) measures the ability of the laboratory figure-of-merit-based pseudovalues to correctly classify incorrect versus correct clinical interpretations. Therefore, rAUC is a measure of the clinical relevance of an ROC study. The conventional measures and rAUC were compared under varying simulator conditions. It was found that design details of the ROC study, namely the number of bins, the difficulty level of the images, the ratio of disease-present to disease-absent images and the unavoidable difference between laboratory and clinical performance levels, can lead to serious underestimation of the agreement as indicated by conventional agreement measures, even for perfectly correlated data, while rAUC showed high agreement and was relatively immune to these details.
At the same time, rAUC was sensitive to factors such as intrinsic correlation between the laboratory and clinical decision variables and differences in reporting thresholds that are expected to influence agreement both at the individual image level and at the figure-of-merit level. Suggestions are made for how to conduct relevance-ROC studies aimed at assessing agreement between laboratory and clinical interpretations. The method could be used to evaluate the clinical relevance of alternative scalar figures of merit, such as the sensitivity at a predefined specificity.
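The rAUC idea can be sketched as an empirical ROC area computed over image-level pseudovalues, with the clinical outcome (correct or incorrect) as the truth label. The sketch assumes the pseudovalues have already been obtained by jackknifing the laboratory AUC, and it assumes that a higher pseudovalue should indicate a correct clinical interpretation; both the orientation and the name `relevance_auc` are illustrative assumptions rather than the paper's stated implementation.

```python
def relevance_auc(pseudovalues: list[float],
                  clinically_correct: list[bool]) -> float:
    """Sketch of rAUC: probability that the pseudovalue of a correctly
    interpreted image exceeds that of an incorrectly interpreted one
    (ties count one half)."""
    correct = [p for p, ok in zip(pseudovalues, clinically_correct) if ok]
    incorrect = [p for p, ok in zip(pseudovalues, clinically_correct) if not ok]
    if not correct or not incorrect:
        raise ValueError("need both correct and incorrect clinical interpretations")
    total = 0.0
    for x in incorrect:
        for y in correct:
            total += 1.0 if y > x else (0.5 if y == x else 0.0)
    return total / (len(correct) * len(incorrect))
```

A value of 1.0 would mean the laboratory pseudovalues perfectly separate correct from incorrect clinical reads, while 0.5 would indicate no image-level agreement.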


Subject(s)
Image Interpretation, Computer-Assisted/methods , Models, Theoretical , Area Under Curve , Humans , Laboratories , Observer Variation , ROC Curve