
Effect of finite sample size on feature selection and classification: a simulation study.

Way, Ted W; Sahiner, Berkman; Hadjiiski, Lubomir M; Chan, Heang-Ping.

Med Phys ; 37(2): 907-20, 2010 Feb.

Article in English | MEDLINE | ID: mdl-20229900

ABSTRACT

PURPOSE: The small number of samples available for training and testing is often the limiting factor in finding the most effective features and designing an optimal computer-aided diagnosis (CAD) system. Training on a limited set of samples introduces bias and variance in the performance of a CAD system relative to that trained with an infinite sample size. In this work, the authors conducted a simulation study to evaluate the performances of various combinations of classifiers and feature selection techniques and their dependence on the class distribution, dimensionality, and the training sample size. The understanding of these relationships will facilitate development of effective CAD systems under the constraint of limited available samples. METHODS: Three feature selection techniques, the stepwise feature selection (SFS), sequential floating forward search (SFFS), and principal component analysis (PCA), and two commonly used classifiers, Fisher's linear discriminant analysis (LDA) and support vector machine (SVM), were investigated. Samples were drawn from multidimensional feature spaces of multivariate Gaussian distributions with equal or unequal covariance matrices and unequal means, and with equal covariance matrices and unequal means estimated from a clinical data set. Classifier performance was quantified by the area under the receiver operating characteristic curve Az. The mean Az values obtained by resubstitution and hold-out methods were evaluated for training sample sizes ranging from 15 to 100 per class. The number of simulated features available for selection was chosen to be 50, 100, and 200. RESULTS: It was found that the relative performance of the different combinations of classifier and feature selection method depends on the feature space distributions, the dimensionality, and the available training sample sizes. 
The LDA and SVM with radial kernel performed similarly for most of the conditions evaluated in this study, although the SVM classifier showed a slightly higher hold-out performance than LDA for some conditions and vice versa for other conditions. PCA was comparable to or better than SFS and SFFS for LDA at small sample sizes, but inferior for SVM with polynomial kernel. For the class distributions simulated from clinical data, PCA did not show advantages over the other two feature selection methods. Under this condition, the SVM with radial kernel performed better than the LDA when few training samples were available, while LDA performed better when a large number of training samples were available. CONCLUSIONS: None of the investigated feature selection-classifier combinations provided consistently superior performance under the studied conditions for different sample sizes and feature space distributions. In general, the SFFS method was comparable to the SFS method while PCA may have an advantage for Gaussian feature spaces with unequal covariance matrices. The performance of the SVM with radial kernel was better than, or comparable to, that of the SVM with polynomial kernel under most conditions studied.
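The gap between resubstitution and hold-out performance described in this abstract can be illustrated with a small simulation. The sketch below is not the authors' code: it swaps in a nearest-mean linear discriminant (which matches Fisher's LDA only if the covariance is taken to be the identity), uses the empirical Mann-Whitney AUC in place of a fitted Az, and performs no feature selection. The function names, the mean shift of 0.5, the five informative features, and the test-set size are all illustrative assumptions.

```python
import random
from typing import List, Sequence, Tuple


def auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Empirical AUC: fraction of (positive, negative) pairs ranked correctly.

    Ties count as half. This is the Mann-Whitney estimate, an assumption here;
    the abstract's Az may have been obtained by a different fitting method.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def draw(rng: random.Random, n: int, dim: int, shift: float,
         n_informative: int) -> List[List[float]]:
    """Draw n samples from a unit-variance Gaussian; only the first
    n_informative features have mean `shift`, the rest have mean 0."""
    return [[rng.gauss(shift if j < n_informative else 0.0, 1.0)
             for j in range(dim)] for _ in range(n)]


def mean_vector(rows: List[List[float]]) -> List[float]:
    """Column-wise mean of a list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def project(rows: List[List[float]], w: Sequence[float]) -> List[float]:
    """Discriminant score of each sample: dot product with weight vector w."""
    return [sum(wj * xj for wj, xj in zip(w, x)) for x in rows]


def one_trial(rng: random.Random, n_train: int, dim: int, n_test: int = 200,
              shift: float = 0.5, n_informative: int = 5) -> Tuple[float, float]:
    """Train on n_train samples per class; return (resubstitution, hold-out) AUC."""
    train_pos = draw(rng, n_train, dim, shift, n_informative)
    train_neg = draw(rng, n_train, dim, 0.0, n_informative)
    # Nearest-mean discriminant: weights are the difference of class means.
    w = [p - n for p, n in zip(mean_vector(train_pos), mean_vector(train_neg))]
    resub = auc(project(train_pos, w), project(train_neg, w))
    test_pos = draw(rng, n_test, dim, shift, n_informative)
    test_neg = draw(rng, n_test, dim, 0.0, n_informative)
    hold = auc(project(test_pos, w), project(test_neg, w))
    return resub, hold


def mean_auc(n_train: int, dim: int, trials: int = 20,
             seed: int = 0) -> Tuple[float, float]:
    """Average resubstitution and hold-out AUC over repeated random draws."""
    rng = random.Random(seed)
    results = [one_trial(rng, n_train, dim) for _ in range(trials)]
    resub = sum(r for r, _ in results) / trials
    hold = sum(h for _, h in results) / trials
    return resub, hold
```

Calling `mean_auc(15, 50)` and `mean_auc(100, 50)` gives a rough feel for how the optimistic resubstitution estimate and the pessimistic hold-out estimate approach each other as the training sample size per class grows.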

Subject(s)

Algorithms , Artificial Intelligence , Pattern Recognition, Automated/methods , Radiographic Image Enhancement/methods , Radiographic Image Interpretation, Computer-Assisted/methods , Computer Simulation , Models, Biological , Reproducibility of Results , Sample Size , Sensitivity and Specificity , Signal Processing, Computer-Assisted

Computer-aided diagnosis of pulmonary nodules on CT scans: improvement of classification performance with nodule surface features.

Way, Ted W; Sahiner, Berkman; Chan, Heang-Ping; Hadjiiski, Lubomir; Cascade, Philip N; Chughtai, Aamer; Bogot, Naama; Kazerooni, Ella.

Med Phys ; 36(7): 3086-98, 2009 Jul.

Article in English | MEDLINE | ID: mdl-19673208

ABSTRACT

The purpose of this work is to develop a computer-aided diagnosis (CAD) system to differentiate malignant and benign lung nodules on CT scans. A fully automated system was designed to segment the nodule from its surrounding structured background in a local volume of interest (VOI) and to extract image features for classification. Image segmentation was performed with a 3D active contour method. The initial contour was obtained as the boundary of a binary object generated by k-means clustering within the VOI and smoothed by morphological opening. A data set of 256 lung nodules (124 malignant and 132 benign) from 152 patients was used in this study. In addition to morphological and texture features, the authors designed new nodule surface features to characterize the lung nodule surface smoothness and shape irregularity. The effects of two demographic features, age and gender, as adjunct to the image features were also investigated. A linear discriminant analysis (LDA) classifier built with features from stepwise feature selection was trained using simplex optimization to select the most effective features. A two-loop leave-one-out resampling scheme was developed to reduce the optimistic bias in estimating the test performance of the CAD system. The area under the receiver operating characteristic curve, A(z), for the test cases improved significantly (p < 0.05) from 0.821 +/- 0.026 to 0.857 +/- 0.023 when the newly developed image features were included with the original morphological and texture features. A similar experiment performed on the data set restricted to primary cancers and benign nodules, excluding the metastatic cancers, also resulted in an improved test A(z), though the improvement did not reach statistical significance (p = 0.07). 
The two demographic features did not significantly affect the performance of the CAD system (p > 0.05) when they were added to the feature space containing the morphological, texture, and new gradient field and radius features. To investigate if a support vector machine (SVM) classifier can achieve improved performance over the LDA classifier, we compared the performance of the LDA and SVMs with various kernels and parameters. Principal component analysis was used to reduce the dimensionality of the feature space for both the LDA and the SVM classifiers. When the number of selected principal components was varied, the highest test A(z) among the SVMs of various kernels and parameters was slightly higher than that of the LDA in one-loop leave-one-case-out resampling. However, no SVM with fixed architecture consistently performed better than the LDA in the range of principal components selected. This study demonstrated that the authors' proposed segmentation and feature extraction techniques are promising for classifying lung nodules on CT images.
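The idea behind a two-loop leave-one-out scheme can be sketched generically: every design choice is made inside an inner loop that never sees the outer test case, which is what reduces the optimistic bias. The abstract does not give the details of the authors' procedure, so the code below is only one plausible reading. It uses a hypothetical nearest-mean scorer and a hypothetical design parameter `k` (the number of leading features used) in place of the stepwise feature selection, simplex optimization, and LDA named in the abstract. It assumes at least three cases per class so that no class is emptied by the two nested omissions.

```python
from typing import List, Sequence, Tuple

# (feature vector, label), with label 1 = malignant and 0 = benign.
Case = Tuple[Sequence[float], int]


def nearest_mean_score(train: List[Case], x: Sequence[float], k: int) -> float:
    """Score x with a nearest-mean rule using only the first k features.

    Positive scores favour class 1. This stands in for the LDA classifier.
    """
    pos = [f for f, y in train if y == 1]
    neg = [f for f, y in train if y == 0]
    score = 0.0
    for j in range(k):
        mp = sum(f[j] for f in pos) / len(pos)
        mn = sum(f[j] for f in neg) / len(neg)
        # Projection onto the mean-difference direction, centred at the midpoint.
        score += (mp - mn) * (x[j] - 0.5 * (mp + mn))
    return score


def loo_accuracy(cases: List[Case], k: int) -> float:
    """Inner loop: leave-one-out accuracy of the scorer for a given k."""
    correct = 0
    for i in range(len(cases)):
        train = cases[:i] + cases[i + 1:]
        x, y = cases[i]
        pred = 1 if nearest_mean_score(train, x, k) > 0 else 0
        correct += int(pred == y)
    return correct / len(cases)


def two_loop_loo(cases: List[Case],
                 candidate_ks: Sequence[int]) -> List[Tuple[float, int]]:
    """Outer loop: hold out one case, choose k on the rest, score the held-out case.

    Returns (test score, true label) pairs, which can be fed to an ROC analysis.
    """
    results = []
    for i in range(len(cases)):
        outer_train = cases[:i] + cases[i + 1:]
        # The held-out case plays no part in choosing k.
        best_k = max(candidate_ks, key=lambda k: loo_accuracy(outer_train, k))
        x, y = cases[i]
        results.append((nearest_mean_score(outer_train, x, best_k), y))
    return results
```

Because the held-out case never influences the choice of `k`, the pooled test scores give a less optimistic performance estimate than a single loop in which the design choice is made once on all cases.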

Subject(s)

Diagnosis, Computer-Assisted , Image Interpretation, Computer-Assisted/methods , Lung Neoplasms/diagnostic imaging , Lung Neoplasms/diagnosis , Tomography, X-Ray Computed/methods , Age Factors , Algorithms , Area Under Curve , Discriminant Analysis , Female , Humans , Imaging, Three-Dimensional , Lung Neoplasms/pathology , Male , Neoplasm Metastasis/diagnosis , Neoplasm Metastasis/diagnostic imaging , Neoplasm Metastasis/pathology , Principal Component Analysis , Sex Factors

Quantitative CT of lung nodules: dependence of calibration on patient body size, anatomic region, and calibration nodule size for single- and dual-energy techniques.

Goodsitt, Mitchell M; Chan, Heang-Ping; Way, Ted W; Schipper, Mathew J; Larson, Sandra C; Christodoulou, Emmanuel G.

Med Phys ; 36(7): 3107-21, 2009 Jul.

Article in English | MEDLINE | ID: mdl-19673210

ABSTRACT

Calcium concentration may be a useful feature for distinguishing benign from malignant lung nodules in computer-aided diagnosis. The calcium concentration can be estimated from the measured CT number of the nodule and a CT number vs calcium concentration calibration line that is derived from CT scans of two or more calcium reference standards. To account for CT number nonuniformity in the reconstruction field, such calibration lines may be obtained at multiple locations within lung regions in an anthropomorphic phantom. The authors performed a study to investigate the effects of patient body size, anatomic region, and calibration nodule size on the derived calibration lines at ten lung region positions using both single energy (SE) and dual energy (DE) CT techniques. Simulated spherical lung nodules of two concentrations (50 and 100 mg/cc CaCO3) were employed. Nodules of three different diameters (4.8, 9.5, and 16 mm) were scanned in a simulated thorax section representing the middle of the chest with large lung regions. The 4.8 and 9.5 mm nodules were also scanned in a section representing the upper chest with smaller lung regions. Fat rings were added to the peripheries of the phantoms to simulate larger patients. Scans were acquired on a GE-VCT scanner at 80, 120, and 140 kVp and were repeated three times for each condition. The average absolute CT number separations between the calibration lines were computed. In addition, under- or overestimates were determined when the calibration lines for one condition (e.g., small patient) were used to estimate the CaCO3 concentrations of nodules for a different condition (e.g., large patient). The authors demonstrated that, in general, DE is a more accurate method for estimating the calcium contents of lung nodules. The DE calibration lines within the lung field were less affected by patient body size, calibration nodule size, and nodule position than the SE calibration lines. 
Under- or overestimates in CaCO3 concentrations of nodules were also in general smaller in quantity with DE than with SE. However, because the slopes of the calibration lines for DE were about one-half the slopes for SE, the relative improvement in the concentration estimates for DE as compared to SE was about one-half the relative improvement in the separation between the calibration lines. Results in the middle of the chest thorax section with large lungs were nearly completely consistent with the above generalization. On the other hand, results in the upper-chest thorax section with smaller lungs and greater amounts of muscle and bone were mixed. A repeat of the entire study in the upper thorax section yielded similar mixed results. Most of the inconsistencies occurred for the 4.8 mm nodules and may be attributed to errors caused by beam hardening, volume averaging, and insufficient sampling. Targeted, higher resolution reconstructions of the smaller nodules, application of high atomic number filters to the high energy x-ray beam for improved spectral separation, and other future developments in DECT may alleviate these problems and further substantiate the superior accuracy of DECT in quantifying the calcium concentrations of lung nodules.
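The calibration-line approach in this abstract reduces to fitting a straight line through the CT numbers of the reference standards and inverting it for an unknown nodule. A minimal sketch follows, assuming an ordinary least-squares fit; the function names and all numeric values in the usage note are hypothetical and are not taken from the study. The last function illustrates the abstract's remark about slopes: a given CT number offset between two parallel calibration lines translates into a concentration error of offset divided by slope, so halving the slope doubles the error for the same offset.

```python
from typing import Sequence, Tuple


def fit_calibration_line(concentrations: Sequence[float],
                         ct_numbers: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of CT number = intercept + slope * concentration.

    Needs two or more reference standards at distinct concentrations.
    Returns (slope in HU per mg/cc, intercept in HU).
    """
    n = len(concentrations)
    if n < 2 or n != len(ct_numbers):
        raise ValueError("need matching lists with at least two standards")
    mx = sum(concentrations) / n
    my = sum(ct_numbers) / n
    sxx = sum((x - mx) ** 2 for x in concentrations)
    if sxx == 0:
        raise ValueError("standards must have distinct concentrations")
    sxy = sum((x - mx) * (y - my) for x, y in zip(concentrations, ct_numbers))
    slope = sxy / sxx
    return slope, my - slope * mx


def estimate_concentration(ct_number: float, slope: float,
                           intercept: float) -> float:
    """Invert the calibration line to estimate concentration in mg/cc."""
    return (ct_number - intercept) / slope


def concentration_error(ct_separation: float, slope: float) -> float:
    """Concentration error caused by applying a parallel calibration line
    that is offset by ct_separation HU from the correct one."""
    return ct_separation / slope
```

With hypothetical standards of 50 and 100 mg/cc measuring 60 and 120 HU, `fit_calibration_line([50, 100], [60, 120])` gives a slope of 1.2 HU per mg/cc and a zero intercept, so a nodule measuring 90 HU is estimated at 75 mg/cc. If a dual-energy line had half that slope (0.6) and a four-fold smaller separation (5 HU instead of 20 HU), `concentration_error` shows only a two-fold smaller concentration error, matching the abstract's point that the improvement in concentration is about half the improvement in separation.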

Subject(s)

Calcium Carbonate/analysis , Lung Neoplasms/diagnostic imaging , Tomography, X-Ray Computed/methods , Algorithms , Body Size , Calibration , Humans , Image Processing, Computer-Assisted , Lung/chemistry , Lung/diagnostic imaging , Lung/pathology , Lung Neoplasms/chemistry , Lung Neoplasms/pathology , Phantoms, Imaging , Software , Tomography Scanners, X-Ray Computed

Effect of CT scanning parameters on volumetric measurements of pulmonary nodules by 3D active contour segmentation: a phantom study.

Way, Ted W; Chan, Heang-Ping; Goodsitt, Mitchell M; Sahiner, Berkman; Hadjiiski, Lubomir M; Zhou, Chuan; Chughtai, Aamer.

Phys Med Biol ; 53(5): 1295-312, 2008 Mar 07.

Article in English | MEDLINE | ID: mdl-18296763

ABSTRACT

The purpose of this study is to investigate the effects of CT scanning and reconstruction parameters on automated segmentation and volumetric measurements of nodules in CT images. Phantom nodules of known sizes were used so that segmentation accuracy could be quantified in comparison to ground-truth volumes. Spherical nodules having 4.8, 9.5 and 16 mm diameters and 50 and 100 mg/cc calcium contents were embedded in lung-tissue-simulating foam which was inserted in the thoracic cavity of a chest section phantom. CT scans of the phantom were acquired with a 16-slice scanner at various tube currents, pitches, fields-of-view and slice thicknesses. Scans were also taken using identical techniques either within the same day or five months apart for study of reproducibility. The phantom nodules were segmented with a three-dimensional active contour (3DAC) model that we previously developed for use on patient nodules. The percentage volume errors relative to the ground-truth volumes were estimated under the various imaging conditions. There was no statistically significant difference in volume error for repeated CT scans or scans taken with techniques where only pitch, field of view, or tube current (mA) was changed. However, the slice thickness significantly (p < 0.05) affected the volume error. Therefore, to evaluate nodule growth, consistent imaging conditions and high resolution should be used for acquisition of the serial CT scans, especially for smaller nodules. Understanding the effects of scanning and reconstruction parameters on volume measurements by 3DAC allows better interpretation of data and assessment of growth. Tracking nodule growth with computerized segmentation methods would reduce inter- and intraobserver variabilities.
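The percentage volume error used in this abstract can be written out for a spherical phantom nodule of known diameter. The sketch below assumes the usual signed definition, 100 × (measured − truth) / truth; the abstract does not state whether the authors used signed or absolute errors, and the function names are illustrative.

```python
import math


def sphere_volume(diameter_mm: float) -> float:
    """Ground-truth volume in mm^3 of a sphere with the given diameter in mm."""
    return math.pi * diameter_mm ** 3 / 6.0


def percent_volume_error(measured_mm3: float, truth_mm3: float) -> float:
    """Signed percentage error of a segmented volume relative to ground truth.

    Positive values mean the segmentation overestimates the volume.
    """
    if truth_mm3 <= 0:
        raise ValueError("ground-truth volume must be positive")
    return 100.0 * (measured_mm3 - truth_mm3) / truth_mm3
```

For the three nodule diameters in the study (4.8, 9.5, and 16 mm), `sphere_volume` gives the reference volumes against which a segmented volume would be compared. Because volume scales with the cube of the diameter, a fixed boundary error of a fraction of a voxel produces a much larger percentage error for the 4.8 mm nodule than for the 16 mm one.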

Subject(s)

Imaging, Three-Dimensional/methods , Lung/anatomy & histology , Lung/diagnostic imaging , Phantoms, Imaging , Tomography, X-Ray Computed/instrumentation , Observer Variation , Organ Size , Reproducibility of Results

Accuracy of the CT numbers of simulated lung nodules imaged with multi-detector CT scanners.

Goodsitt, Mitchell M; Chan, Heang-Ping; Way, Ted W; Larson, Sandra C; Christodoulou, Emmanuel G; Kim, Jeomsoon.

Med Phys ; 33(8): 3006-17, 2006 Aug.

Article in English | MEDLINE | ID: mdl-16964879

ABSTRACT

A study was performed to determine the accuracies and reproducibilities of the CT numbers of simulated lung nodules imaged with multi-detector CT scanners. The nodules were simulated by spherical balls of three diameters (4.8, 9.5, and 16 mm) and two compositions (50 and 100 mg/cc CaCO3 in water-equivalent plastic). All were scanned in a liquid-water-filled container at the center of a water-equivalent-plastic phantom and in air cavities within the same phantom using GE multi-detector CT scanners. The nodules were also scanned within simulated lung regions in an anthropomorphic thorax section phantom that was bolused on both sides with water-equivalent slabs. Results were compared for three scanning protocols--the protocol for the National Lung Screening Trial (NLST), the protocol for the Lung Tissue Research Consortium (LTRC) study, and a high resolution (small pitch, thin slice and small scan interval) higher dose "gold standard" protocol. Scans were repeated three times with each protocol to assess reproducibility. The CT numbers of the nodules in water were found to be nearly independent of nodule size. However, the presence and the size of an air cavity surrounding a nodule had a significant effect (e.g., the CT number of a 50 mg/cc nodule was 64 HU in water, 37 HU in a 1.8 cm diameter air cavity, and 19 HU in a 4.4 cm diameter air cavity). This variability of CT number with size of air cavity may affect the results of the LTRC study in which patients are scanned at both full inspiration and full expiration. The CT numbers of the 9.5 and 16 mm diameter nodules within the anthropomorphic phantom were highly reproducible (average standard deviations of 2 HU or less) for all protocols. On the other hand, both accuracy and reproducibility were significantly degraded for the 4.8 mm diameter nodules, especially for the NLST (2.5 mm thickness, 2 mm slice interval) technique. 
Use of thinner slice (1.25 mm) and slice interval (1.25 mm) scans that can be reconstructed retrospectively from the multi-detector helical CT projection data of the standard NLST protocol yields CT numbers for the 4.8 mm diameter nodules that are more accurate and reproducible than those of the standard NLST technique. In general, the CT numbers of the nodules were found to be lower at positions near the centers of the lungs and near the spine, which is probably due to increased beam hardening in those regions. Also, larger nodules were found to have higher CT numbers than smaller nodules, consistent with results obtained on early single slice GE CT scanners. Until manufacturers develop quantitative CT scanners with improved x-ray beam hardening and scatter corrections, it is recommended that reference phantoms be employed to more accurately assess the calcium contents of patient lung nodules in screening and tissue characterization studies and in eventual computer-aided detection and diagnosis applications.

Subject(s)

Algorithms , Radiographic Image Enhancement/instrumentation , Radiographic Image Interpretation, Computer-Assisted/instrumentation , Solitary Pulmonary Nodule/diagnostic imaging , Tomography, X-Ray Computed/instrumentation , Transducers , Equipment Design , Equipment Failure Analysis , Humans , Information Storage and Retrieval/methods , Lung Neoplasms/diagnostic imaging , Phantoms, Imaging , Radiographic Image Enhancement/methods , Radiographic Image Interpretation, Computer-Assisted/methods , Reproducibility of Results , Sensitivity and Specificity , Tomography, X-Ray Computed/methods

Computer-aided diagnosis of pulmonary nodules on CT scans: segmentation and classification using 3D active contours.

Way, Ted W; Hadjiiski, Lubomir M; Sahiner, Berkman; Chan, Heang-Ping; Cascade, Philip N; Kazerooni, Ella A; Bogot, Naama; Zhou, Chuan.

Med Phys ; 33(7): 2323-37, 2006 Jul.

Article in English | MEDLINE | ID: mdl-16898434

ABSTRACT

We are developing a computer-aided diagnosis (CAD) system to classify malignant and benign lung nodules found on CT scans. A fully automated system was designed to segment the nodule from its surrounding structured background in a local volume of interest (VOI) and to extract image features for classification. Image segmentation was performed with a three-dimensional (3D) active contour (AC) method. A data set of 96 lung nodules (44 malignant, 52 benign) from 58 patients was used in this study. The 3D AC model is based on two-dimensional AC with the addition of three new energy components to take advantage of 3D information: (1) 3D gradient, which guides the active contour to seek the object surface, (2) 3D curvature, which imposes a smoothness constraint in the z direction, and (3) mask energy, which penalizes contours that grow beyond the pleura or thoracic wall. The search for the best energy weights in the 3D AC model was guided by a simplex optimization method. Morphological and gray-level features were extracted from the segmented nodule. The rubber band straightening transform (RBST) was applied to the shell of voxels surrounding the nodule. Texture features based on run-length statistics were extracted from the RBST image. A linear discriminant analysis classifier with stepwise feature selection was designed using a second simplex optimization to select the most effective features. Leave-one-case-out resampling was used to train and test the CAD system. The system achieved a test area under the receiver operating characteristic curve (A(z)) of 0.83 +/- 0.04. Our preliminary results indicate that use of the 3D AC model and the 3D texture features surrounding the nodule is a promising approach to the segmentation and classification of lung nodules with CAD. The segmentation performance of the 3D AC model trained with our data set was evaluated with 23 nodules available in the Lung Image Database Consortium (LIDC). 
The lung nodule volumes segmented by the 3D AC model for best classification were generally larger than those outlined by the LIDC radiologists using visual judgment of nodule boundaries.

Subject(s)

Diagnosis, Computer-Assisted/methods , Lung Neoplasms/diagnostic imaging , Lung Neoplasms/diagnosis , Solitary Pulmonary Nodule/diagnostic imaging , Solitary Pulmonary Nodule/diagnosis , Tomography, X-Ray Computed/methods , Biopsy , False Positive Reactions , Humans , Image Processing, Computer-Assisted/methods , Imaging, Three-Dimensional , Models, Statistical , Neoplasm Metastasis , ROC Curve
