Results 1-20 of 38

1.
Bioengineering (Basel) ; 11(6), 2024 Jun 15.
Article in English | MEDLINE | ID: mdl-38927849

ABSTRACT

Quantitative and objective evaluation tools are essential for assessing the performance of machine learning (ML)-based magnetic resonance imaging (MRI) reconstruction methods. However, the commonly used fidelity metrics, such as mean squared error (MSE), structural similarity (SSIM), and peak signal-to-noise ratio (PSNR), often fail to capture fundamental and clinically relevant aspects of MR image quality. To address this, we propose evaluating ML-based MRI reconstruction using digital image quality phantoms and automated evaluation methods. Our phantoms are based upon the American College of Radiology (ACR) large physical phantom but are created in k-space to simulate their MR images, and they can vary in object size, signal-to-noise ratio, resolution, and image contrast. Our evaluation pipeline incorporates metrics of geometric accuracy, intensity uniformity, percentage ghosting, sharpness, signal-to-noise ratio, resolution, and low-contrast detectability. We demonstrate the utility of the proposed pipeline by assessing an example ML-based reconstruction model across various training and testing scenarios. The performance results indicate that training data acquired with a lower undersampling factor and coils of larger anatomical coverage yield a better-performing model. The comprehensive and standardized pipeline introduced in this study can help facilitate a better understanding of model performance and guide the future development and advancement of ML-based reconstruction algorithms.
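
As a concrete illustration of two of the named metrics, the sketch below computes an ACR-style percent integral uniformity and percent-signal ghosting from user-supplied ROI masks. The ROI placement and the exact formula variants used by the authors' pipeline are not specified in the abstract, so this is a hedged sketch of the standard ACR formulas rather than their implementation.

```python
import numpy as np

def percent_integral_uniformity(img, large_roi_mask):
    """ACR-style PIU = 100 * (1 - (S_max - S_min) / (S_max + S_min)),
    here simplified to pixel extrema within one large ROI."""
    vals = img[large_roi_mask]
    hi, lo = vals.max(), vals.min()
    return 100.0 * (1.0 - (hi - lo) / (hi + lo))

def percent_ghosting(img, center, top, bottom, left, right):
    """ACR-style ghosting ratio from the mean of a central ROI and four
    background ROIs placed outside the phantom (all boolean masks):
    100 * |((top + bottom) - (left + right)) / (2 * center)|."""
    m = lambda mask: float(img[mask].mean())
    return 100.0 * abs(((m(top) + m(bottom)) - (m(left) + m(right)))
                       / (2.0 * m(center)))
```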

2.
ArXiv ; 2024 May 03.
Article in English | MEDLINE | ID: mdl-38745699

ABSTRACT

Background: The findings of the 2023 AAPM Grand Challenge on Deep Generative Modeling for Learning Medical Image Statistics are reported in this Special Report. Purpose: The goal of this challenge was to promote the development of deep generative models (DGMs) for medical imaging and to emphasize the need for their domain-relevant assessment via the analysis of relevant image statistics. Methods: As part of this Grand Challenge, a common training dataset and an evaluation procedure were developed for benchmarking DGMs for medical image synthesis. To create the training dataset, an established 3D virtual breast phantom was adapted. The resulting dataset comprised about 108,000 images of size 512×512. For the evaluation of submissions to the Challenge, an ensemble of 10,000 DGM-generated images from each submission was employed. The evaluation procedure consisted of two stages. In the first stage, a preliminary check for memorization and image quality (via the Fréchet Inception Distance (FID)) was performed. Submissions that passed the first stage were then evaluated for the reproducibility of image statistics corresponding to several feature families, including texture, morphology, image moments, fractal statistics, and skeleton statistics. A summary measure in this feature space was employed to rank the submissions. Additional analyses of submissions were performed to assess DGM performance specific to individual feature families and to the four classes in the training data, and also to identify various artifacts. Results: Fifty-eight submissions from 12 unique users were received for this Challenge. Of the 12 users' submissions, 9 passed the first stage of evaluation and were eligible for ranking. The top-ranked submission employed a conditional latent diffusion model, whereas the joint runners-up employed a generative adversarial network followed by another network for image superresolution. In general, we observed that the overall ranking of the top 9 submissions according to our evaluation method (i) did not match the FID-based ranking, and (ii) differed with respect to individual feature families. Another important finding from our additional analyses was that different DGMs demonstrated similar kinds of artifacts. Conclusions: This Grand Challenge highlighted the need for domain-specific evaluation to further DGM design as well as deployment. It also demonstrated that the specification of a DGM may differ depending on its intended use.
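
For reference, the FID named above compares Gaussian fits (mean and covariance) of two feature ensembles, conventionally Inception-network activations of real and generated images. A self-contained numpy/scipy sketch of that standard formula; the feature extractor is assumed to be supplied elsewhere.

```python
import numpy as np
from scipy import linalg

def frechet_inception_distance(feats_real, feats_gen):
    """FID = ||mu_r - mu_g||^2 + Tr(C_r + C_g - 2 (C_r C_g)^(1/2)),
    with rows of each array holding per-image feature vectors."""
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    c_r = np.cov(feats_real, rowvar=False)
    c_g = np.cov(feats_gen, rowvar=False)
    covmean = linalg.sqrtm(c_r @ c_g).real  # drop tiny imaginary residue
    return float(np.sum((mu_r - mu_g) ** 2)
                 + np.trace(c_r + c_g - 2.0 * covmean))
```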

3.
Med Phys ; 51(2): 978-990, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38127330

ABSTRACT

BACKGROUND: Deep learning (DL) CT denoising models have the potential to improve image quality for lower radiation dose exams. These models are generally trained with large quantities of adult patient image data. However, CT, and increasingly DL denoising methods, are used in both adult and pediatric populations. Pediatric body habitus and size can differ significantly from adults and vary dramatically from newborns to adolescents. Ensuring that pediatric subgroups of different body sizes are not disadvantaged by DL methods requires evaluations capable of assessing performance in each subgroup. PURPOSE: To assess DL CT denoising in pediatric- and adult-sized patients, we built a framework of computer-simulated image quality (IQ) control phantoms and an evaluation methodology. METHODS: The computer-simulated IQ phantoms in the framework featured pediatric-sized versions of the standard CatPhan 600 and MITA-LCD phantoms, with a range of diameters matching the mean effective diameters of pediatric patients from newborns to 18 years old. These phantoms were used to simulate CT images that then served as inputs to a DL denoiser for evaluating performance in different-sized patients. Adult CT test images were simulated using standard-sized phantoms scanned with adult scan protocols; pediatric CT test images were simulated with pediatric-sized phantoms and adjusted pediatric protocols. The framework's evaluation methodology consisted of denoising both adult and pediatric test images and then assessing changes in image quality, including noise, image sharpness, CT number accuracy, and low-contrast detectability. To demonstrate the use of the framework, a REDCNN denoising model trained on adult patient images was evaluated. To validate that the DL model performance measured with the proposed pediatric IQ phantoms was representative of performance in more realistic patient anatomy, anthropomorphic pediatric XCAT phantoms of the same age range were also used to compare noise-reduction performance. RESULTS: Using the proposed pediatric-sized IQ phantom framework, size differences between adult- and pediatric-sized phantoms were observed to substantially influence the adult-trained DL denoising model's performance. When applied to adult images, the DL model achieved a 60% reduction in noise standard deviation without substantial loss of sharpness at mid or high spatial frequencies. However, in smaller phantoms the denoising performance dropped because of the different image noise textures resulting from the smaller field of view (FOV) used in pediatric protocols. In the validation study, noise-reduction trends in the pediatric-sized IQ phantoms were consistent with those found in the anthropomorphic phantoms. CONCLUSION: We developed a framework of pediatric-sized IQ phantoms for pediatric subgroup evaluation of DL denoising models. Using the framework, we found that the performance of an adult-trained DL denoiser did not generalize well to the smaller-diameter phantoms corresponding to younger pediatric patient sizes. Our work suggests that noise texture differences arising from FOV changes between adult and pediatric protocols can contribute to poor generalizability in DL denoising, and that the proposed framework is an effective means to identify these performance disparities for a given model.
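
The headline figure of merit in the abstract, percent reduction in noise standard deviation, is straightforward to compute once matched uniform-region ROIs are available from the input and denoised reconstructions. A minimal sketch under that assumption; the per-diameter bookkeeping in the usage comment is hypothetical.

```python
import numpy as np

def noise_reduction_percent(noisy_roi, denoised_roi):
    """Percent reduction in noise standard deviation measured in a
    uniform region of an IQ phantom."""
    return 100.0 * (1.0 - np.std(denoised_roi) / np.std(noisy_roi))

# Hypothetical usage: track denoising performance across phantom sizes.
# for diameter_cm, (noisy, den) in rois_by_diameter.items():
#     print(diameter_cm, noise_reduction_percent(noisy, den))
```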


Subjects
Deep Learning; Infant, Newborn; Adult; Humans; Child; Adolescent; Tomography, X-Ray Computed/methods; Signal-To-Noise Ratio; Phantoms, Imaging; Noise; Algorithms; Image Processing, Computer-Assisted/methods; Radiation Dosage
4.
J Am Coll Radiol ; 20(8): 738-741, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37400046

ABSTRACT

Radiology has been a pioneer in adopting artificial intelligence (AI)-enabled devices into the clinic. However, initial clinical experience has identified concerns about inconsistent device performance across different patient populations. Medical devices, including those using AI, are cleared by the FDA for their specific indications for use (IFUs). The IFU describes the disease or condition the device will diagnose or treat, including a description of the intended patient population. Performance data evaluated during the premarket submission support the IFU, including the intended patient population. Understanding the IFU of a given device is thus critical to ensuring that the device is used properly and performs as expected. When devices do not perform as expected or malfunction, medical device reporting is an important way to provide feedback about the device to the manufacturer, the FDA, and other users. This article describes ways to retrieve IFU and performance data information, as well as the FDA medical device reporting systems for unexpected performance discrepancies. It is crucial that imaging professionals, including radiologists, know how to access and use these tools to improve the informed use of medical devices for patients of all ages.


Subjects
Artificial Intelligence; Device Approval; Child; Humans
5.
MAGMA ; 36(3): 347-354, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37191776

ABSTRACT

Although there has been a resurgence of interest in low field magnetic resonance imaging (MRI) systems in recent years, low field MRI is not a new concept. The U.S. Food and Drug Administration (FDA) has a long history of evaluating the safety and effectiveness of MRI systems encompassing a wide range of field strengths. Many systems seeking marketing authorization today include new technological features (such as artificial intelligence), but this does not fundamentally change the regulatory paradigm for MR systems. In this review, we discuss some of the US regulatory considerations for low field MRI systems, including the applicability of existing laws and regulations and how the FDA evaluates low field MRI systems for market authorization. We also discuss regulatory considerations in the review of low field MRI systems incorporating novel AI technology. We foresee that MRI systems of all field strengths intended for general diagnostic use will continue to be evaluated for marketing clearance against the metric of substantial equivalence set forth in the premarket notification pathway.


Subjects
Artificial Intelligence; Magnetic Resonance Imaging; United States; United States Food and Drug Administration
6.
IEEE Trans Med Imaging ; 42(6): 1799-1808, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37022374

ABSTRACT

In recent years, generative adversarial networks (GANs) have gained tremendous popularity for potential applications in medical imaging, such as medical image synthesis, restoration, reconstruction, and translation, as well as objective image quality assessment. Despite the impressive progress in generating high-resolution, perceptually realistic images, it is not clear whether modern GANs reliably learn the statistics that are meaningful to a downstream medical imaging application. In this work, the ability of a state-of-the-art GAN to learn the statistics of canonical stochastic image models (SIMs) relevant to the objective assessment of image quality is investigated. It is shown that although the employed GAN successfully learned several basic first- and second-order statistics of the specific medical SIMs under consideration and generated images with high perceptual quality, it failed to correctly learn several per-image statistics pertinent to these SIMs, highlighting the urgent need to assess medical image GANs in terms of objective measures of image quality.
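
A minimal sketch of the kind of first- and second-order checks described above: per-image means and variances, plus the ensemble-averaged power spectrum, compared between real and GAN-generated stacks. The summary discrepancies are illustrative choices, not the paper's metrics.

```python
import numpy as np

def ensemble_stats(imgs):
    """First-order stats and ensemble-mean power spectrum of an
    N x H x W image stack."""
    means = imgs.mean(axis=(1, 2))
    variances = imgs.var(axis=(1, 2))
    centered = imgs - imgs.mean(axis=(1, 2), keepdims=True)
    spectrum = np.mean(np.abs(np.fft.fft2(centered)) ** 2, axis=0)
    return means, variances, spectrum

def stat_gaps(real, gen):
    """Crude per-statistic discrepancies between the two ensembles."""
    (m_r, v_r, s_r), (m_g, v_g, s_g) = ensemble_stats(real), ensemble_stats(gen)
    return {
        "mean_gap": float(abs(m_r.mean() - m_g.mean())),
        "variance_gap": float(abs(v_r.mean() - v_g.mean())),
        "spectrum_rel_err": float(np.linalg.norm(s_r - s_g) / np.linalg.norm(s_r)),
    }
```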

7.
Med Phys ; 50(7): 4151-4172, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37057360

ABSTRACT

BACKGROUND: This study reports the results of a set of discrimination experiments using simulated images that represent the appearance of subtle lesions in low-dose computed tomography (CT) of the lungs. Noise in these images has a characteristic ramp spectrum before apodization by noise control filters. We consider three specific diagnostic features that determine whether a lesion is considered malignant or benign, two system-resolution levels, and four apodization levels, for a total of 24 experimental conditions. PURPOSE: The goal of the investigation is to better understand how well human observers perform subtle discrimination tasks like these, and the mechanisms of that performance. We use a forced-choice psychophysical paradigm to estimate observer efficiency and classification images. These measures quantify how effectively subjects can read the images, and how they use images to perform discrimination tasks across the different imaging conditions. MATERIALS AND METHODS: The simulated CT images used as stimuli in the psychophysical experiments are generated from high-resolution objects passed through a modulation transfer function (MTF) before down-sampling to the image-pixel grid. Acquisition noise is then added with a ramp noise-power spectrum (NPS), with subsequent smoothing through apodization filters. The features considered are lesion size, indistinct lesion boundary, and nonuniform lesion interior. System resolution is implemented by an MTF with resolution (10% max.) of 0.47 or 0.58 cyc/mm. Apodization is implemented by a Shepp-Logan filter (sinc profile) with various cutoffs. Six medically naïve subjects participated in the psychophysical studies, entailing training and testing components for each condition. Training consisted of staircase procedures to find the 80%-correct threshold for each subject, and testing involved 2000 psychophysical trials at the threshold value for each subject. Human-observer performance is compared to the Ideal Observer to generate estimates of task efficiency. The significance of imaging factors is assessed using ANOVA. Classification images are used to estimate the linear template weights used by subjects to perform these tasks. Classification-image spectra are used to analyze subject weights in the spatial-frequency domain. RESULTS: Overall, average observer efficiency is relatively low in these experiments (10%-40%) relative to detection and localization studies reported previously. We find significant effects of feature type and apodization level on observer efficiency. Somewhat surprisingly, system resolution is not a significant factor. Efficiency effects of the different features appear to be well explained by the profile of the linear templates in the classification images. Increasingly strong apodization is found both to increase the classification-image weights and to increase the mean frequency of the classification-image spectra. A secondary analysis of "unapodized" classification images shows that this is largely due to observers undoing (inverting) the effects of apodization filters. CONCLUSIONS: These studies demonstrate that human observers can be relatively inefficient at feature-discrimination tasks in ramp-spectrum noise. Observers appear to adapt to the frequency suppression implemented in apodization filters, but there are residual effects that are not explained by spatial weighting patterns. The studies also suggest that the mechanisms for improving performance through the application of noise-control filters may require further investigation.
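
In a two-alternative forced-choice design, efficiency is conventionally computed from proportion correct via d' = sqrt(2) * z(Pc) and then squared against the Ideal Observer's d'. A short sketch of that textbook calculation; the function names are ours, and the 80%-correct operating point is the one used in the study.

```python
import numpy as np
from scipy.stats import norm

def dprime_2afc(proportion_correct):
    """d' from proportion correct in a 2AFC task: d' = sqrt(2) * z(Pc)."""
    return np.sqrt(2.0) * norm.ppf(proportion_correct)

def efficiency(pc_human, dprime_ideal):
    """Task efficiency: squared ratio of human to ideal-observer d'."""
    return float((dprime_2afc(pc_human) / dprime_ideal) ** 2)

# e.g., at the 80%-correct threshold:
# eta = efficiency(0.80, dprime_ideal_at_threshold)
```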


Subjects
Image Processing, Computer-Assisted; Tomography, X-Ray Computed; Humans; Image Processing, Computer-Assisted/methods; Phantoms, Imaging; Algorithms
8.
Nat Mach Intell ; 4(11): 922-929, 2022 Nov.
Article in English | MEDLINE | ID: mdl-36935774

ABSTRACT

The metaverse integrates physical and virtual realities, enabling humans and their avatars to interact in an environment supported by technologies such as high-speed internet, virtual reality, augmented reality, mixed and extended reality, blockchain, digital twins, and artificial intelligence (AI), all enriched by effectively unlimited data. The metaverse first emerged in social media and entertainment platforms, but its extension to healthcare could have a profound impact on clinical practice and human health. As a group of academic, industrial, clinical, and regulatory researchers, we identify unique opportunities for metaverse approaches in the healthcare domain. A metaverse of 'medical technology and AI' (MeTAI) can facilitate the development, prototyping, evaluation, regulation, translation, and refinement of AI-based medical practice, especially medical-imaging-guided diagnosis and therapy. Here, we present metaverse use cases, including virtual comparative scanning, raw data sharing, augmented regulatory science, and metaversed medical intervention. We discuss relevant issues in the ecosystem of the MeTAI metaverse, including privacy, security, and disparity. We also identify specific action items for coordinated efforts to build the MeTAI metaverse for improved healthcare quality, accessibility, cost-effectiveness, and patient satisfaction.

9.
Med Phys ; 49(2): 836-853, 2022 Feb.
Article in English | MEDLINE | ID: mdl-34954845

ABSTRACT

PURPOSE: Deep learning (DL) is rapidly finding applications in low-dose CT image denoising. While DL methods have the potential to improve image quality (IQ) over the filtered back projection (FBP) method and to produce images quickly, the performance generalizability of these data-driven methods is not yet fully understood. The main purpose of this work is to investigate the performance generalizability of a low-dose CT image denoising neural network on data acquired under different scan conditions, particularly with respect to three parameters: reconstruction kernel, slice thickness, and dose (noise) level. A secondary goal is to identify any underlying data property associated with the CT scan settings that might help predict the generalizability of the denoising network. METHODS: We select the residual encoder-decoder convolutional neural network (REDCNN) as an example of a low-dose CT image denoising technique. To study how the network generalizes across the three imaging parameters, we grouped the CT volumes in the Low-Dose Grand Challenge (LDGC) data into three pairs of training datasets according to their imaging parameters, changing only one parameter in each pair, and trained REDCNN with them to obtain six denoising models. We test each denoising model on datasets with matching and mismatching parameters relative to its training set, with regard to dose, reconstruction kernel, and slice thickness, to evaluate changes in denoising performance. Denoising performance is evaluated on patient scans, simulated phantom scans, and physical phantom scans using IQ metrics including mean-squared error (MSE), contrast-dependent modulation transfer function (MTF), pixel-level noise power spectrum (pNPS), and low-contrast lesion detectability (LCD). RESULTS: REDCNN had larger MSE when the testing data differed from the training data in reconstruction kernel, but no significant MSE difference when slice thickness varied in the testing data. REDCNN trained with quarter-dose data had slightly worse MSE in denoising higher-dose images than REDCNN trained with mixed-dose data (17%-80%). The MTF tests showed that REDCNN trained with the two reconstruction kernels and slice thicknesses yielded images of similar resolution. However, REDCNN trained with mixed-dose data preserved low-contrast resolution better than REDCNN trained with quarter-dose data. In the pNPS test, REDCNN trained with smooth-kernel data could not remove high-frequency noise in sharp-kernel test data, possibly because the lack of high-frequency noise in the smooth-kernel training data limited the trained model's ability to remove it. Finally, in the LCD test, REDCNN improved lesion detectability over the original FBP images regardless of whether the training and testing data had matching reconstruction kernels. CONCLUSIONS: REDCNN is observed to generalize poorly between reconstruction kernels, to be more robust in denoising data of arbitrary dose levels when trained with mixed-dose data, and not to be highly sensitive to slice thickness. Since reconstruction kernel affects the in-plane pNPS shape of a CT image whereas slice thickness and dose level do not, it is possible that the generalizability of this CT image denoising network correlates strongly with the pNPS similarity between the testing and training data.
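
The pNPS referred to above is, in its standard form, the ensemble-averaged squared DFT of mean-subtracted uniform-region ROIs, scaled by pixel area over ROI size. A numpy sketch of that conventional estimator; ROI extraction and any detrending beyond mean subtraction are omitted.

```python
import numpy as np

def pixel_nps(rois, pixel_size_mm):
    """2D pixel-level NPS from an N x L x L stack of uniform ROIs:
    NPS(fx, fy) = (dx * dy / (Nx * Ny)) * <|DFT2(roi - mean)|^2>."""
    n, ny, nx = rois.shape
    centered = rois - rois.mean(axis=(1, 2), keepdims=True)
    power = np.abs(np.fft.fft2(centered)) ** 2
    nps = (pixel_size_mm ** 2 / (nx * ny)) * power.mean(axis=0)
    freqs = np.fft.fftshift(np.fft.fftfreq(nx, d=pixel_size_mm))  # cyc/mm
    return np.fft.fftshift(nps), freqs
```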


Subjects
Deep Learning; Algorithms; Humans; Image Processing, Computer-Assisted; Neural Networks, Computer; Phantoms, Imaging; Radiation Dosage; Signal-To-Noise Ratio; Tomography, X-Ray Computed
10.
J Med Imaging (Bellingham) ; 7(4): 042802, 2020 Jul.
Article in English | MEDLINE | ID: mdl-32118094

ABSTRACT

A recent study reported on an in silico imaging trial that evaluated the performance of digital breast tomosynthesis (DBT) as a replacement for full-field digital mammography (FFDM) for breast cancer screening. In this in silico trial, the whole imaging chain was simulated, including breast phantom generation, the x-ray transport process, and computational readers for image interpretation. We focus on the design and performance characteristics of the computational reader in the above-mentioned trial. Location-known lesion detection tasks (spiculated mass and clustered microcalcifications) were used to evaluate imaging system performance. The computational readers were designed based on the mechanism of a channelized Hotelling observer (CHO), and the reader models were selected to trend with human performance. Parameters were tuned to ensure stable lesion detectability. A convolutional CHO, which can adapt a round channel function to irregular lesion shapes, was compared with the original CHO and found to be suitable for detecting clustered microcalcifications but less optimal for detecting spiculated masses. A three-dimensional CHO that operated on the multiple slices was compared with a two-dimensional (2-D) CHO that operated on three versions of 2-D slabs converted from the multiple slices, and the former was found to be optimal for detecting lesions in DBT. Multireader multicase reader output analysis was used to analyze the performance difference between FFDM and DBT for various breast and lesion types. The results showed that, compared with FFDM, DBT was more beneficial for detecting masses than for detecting clustered microcalcifications, consistent with the finding of a clinical imaging trial. Statistical uncertainty smaller than 0.01 standard error for the estimated performance differences was achieved with a dataset containing approximately 3000 breast phantoms. The computational reader design methodology presented provides evidence that model observers can be useful in silico tools for supporting the performance comparison of breast imaging systems.
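
For context, a channelized Hotelling observer reduces each image to a few channel outputs and applies a Hotelling (prewhitened matched) template in that low-dimensional space. A generic numpy sketch of the scalar detectability computation; channel design and train/test splitting are left out, and this is not the trial's exact reader.

```python
import numpy as np

def cho_detectability(channels, signal_imgs, noise_imgs):
    """d' of a CHO for a location-known detection task.
    channels: P x C matrix (one channel template per column);
    signal_imgs / noise_imgs: N x P arrays of vectorized images."""
    v_s = signal_imgs @ channels                 # N x C channel outputs
    v_n = noise_imgs @ channels
    dv = v_s.mean(axis=0) - v_n.mean(axis=0)     # mean output difference
    s = 0.5 * (np.cov(v_s, rowvar=False) + np.cov(v_n, rowvar=False))
    w = np.linalg.solve(s, dv)                   # Hotelling template
    return float(np.sqrt(dv @ w))                # d' = sqrt(dv^T S^-1 dv)
```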

11.
Article in English | MEDLINE | ID: mdl-33384465

ABSTRACT

We investigate a series of two-alternative forced-choice (2AFC) discrimination tasks based on malignant features of abnormalities in low-dose lung CT scans. Three tasks are evaluated: a size-discrimination task, a boundary-sharpness task, and an irregular-interior task. Target and alternative signal profiles for these tasks are modulated by one of two system transfer functions and embedded in ramp-spectrum noise that has been apodized for noise control in one of four different ways. This gives the resulting images statistical properties related to weak ground-glass lesions in axial slices of low-dose lung CT images. We investigate observer performance in these tasks using a combination of statistical efficiency and classification images. We report results of 24 2AFC experiments involving the three tasks. A staircase procedure is used to find the approximate 80%-correct discrimination threshold in each task, with a subsequent set of 2,000 trials at this threshold. These data are used to estimate statistical efficiency with respect to the ideal observer for each task, and to estimate the observer template using the classification-image methodology. We find that efficiency varies between the tasks, with the lowest efficiency in the boundary-sharpness task and the highest in the irregular-interior task. All three tasks produce clearly visible patterns of positive and negative weighting in the classification images. The spatial-frequency plots of the classification images show how apodization results in larger weights at higher spatial frequencies.
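
The staircase procedure can be realized in several ways; a transformed 3-down-1-up rule converges near 79.4% correct, close to the study's 80% target. A schematic sketch, with the `respond` callback standing in for a real psychophysical trial (an assumption of ours, not the study's code):

```python
import numpy as np

def staircase_threshold(respond, start_level, step, n_reversals=12):
    """3-down-1-up adaptive staircase: three consecutive correct
    responses lower the signal level (harder), one error raises it.
    Returns the mean of the last eight reversal levels."""
    level, run, reversals, last_dir = start_level, 0, [], 0
    while len(reversals) < n_reversals:
        if respond(level):                 # True if trial answered correctly
            run += 1
            if run < 3:
                continue                   # no level change yet
            run, direction = 0, -1
        else:
            run, direction = 0, +1
        if last_dir and direction != last_dir:
            reversals.append(level)        # direction change = reversal
        last_dir = direction
        level = max(level + direction * step, 0.0)
    return float(np.mean(reversals[-8:]))
```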

12.
Med Phys ; 46(9): 3924-3928, 2019 Sep.
Article in English | MEDLINE | ID: mdl-31228352

ABSTRACT

PURPOSE: In silico imaging clinical trials are emerging alternative sources of evidence for regulatory evaluation and are typically cheaper and faster than human trials. In this Note, we describe the set of in silico imaging software tools used in VICTRE (Virtual Imaging Clinical Trial for Regulatory Evaluation), which replicated a traditional trial using a computational pipeline. MATERIALS AND METHODS: We describe a complete imaging clinical trial software package for comparing two breast imaging modalities (digital mammography and digital breast tomosynthesis). First, digital breast models were developed based on procedural generation techniques for normal anatomy. Second, lesions were inserted in a subset of the breast models. The breasts were imaged using GPU-accelerated Monte Carlo x-ray transport methods and read using image interpretation models for the presence of lesions. All in silico components were assembled into a computational pipeline. The VICTRE images were made available in DICOM format for ease of use and visualization. RESULTS: We describe an open-source collection of in silico tools for running imaging clinical trials. All tools and source code have been made freely available. CONCLUSION: The open-source tools distributed as part of the VICTRE project facilitate the design and execution of other in silico imaging clinical trials. The entire pipeline can be run as a complete imaging chain, modified to match the needs of other trial designs, or used as independent components to build additional pipelines.


Subjects
Clinical Trials as Topic; Computer Simulation; Mammography/methods; Humans; Image Processing, Computer-Assisted; Software
13.
Med Phys ; 46(4): 1634-1647, 2019 Apr.
Article in English | MEDLINE | ID: mdl-30723944

ABSTRACT

PURPOSE: For computed tomography (CT) systems in which noise is nonstationary, a local noise power spectrum (NPS) is often needed to characterize the noise properties. We previously developed a data-efficient radial NPS method to estimate the two-dimensional (2D) local NPS for filtered back projection (FBP)-reconstructed fan-beam CT, utilizing the polar separability of CT NPS. In this work, we extend this method to estimate the three-dimensional (3D) local NPS of Feldkamp-Davis-Kress (FDK)-reconstructed cone-beam CT (CBCT) volumes. METHODS: Starting from the 2D polar separability, we analyze the CBCT geometry and the FDK image reconstruction process to derive a 3D expression of the polar separability for the CBCT local NPS. With this separability, the 3D local NPS of CBCT can be decomposed into a 2D radial NPS shape function and a one-dimensional (1D) angular amplitude function, linked by certain geometrical transforms. The 2D radial NPS shape function is a global function characterizing the noise correlation structure, while the 1D angular amplitude function is a local function reflecting the varying local noise amplitudes. The 3D radial local NPS method is constructed from this polar separability. We evaluate the accuracy of the 3D radial local NPS method using simulated and real CBCT data by comparing the radial local NPS estimates to a reference local NPS in terms of normalized mean squared error (NMSE) and a task-based performance metric (lesion detectability). RESULTS: In both the simulated and the physical CBCT examples, a very small NMSE (<5%) was achieved by the radial local NPS method from as few as two scans, while the traditional local NPS method needed about 20 scans to reach this accuracy. The results also showed that the detectability-based system performance computed using the local NPS estimated with the proposed method from two scans closely reflected the actual system performance. CONCLUSIONS: The polar separability greatly reduces the data dimensionality of the 3D CBCT local NPS. The radial local NPS method developed on this basis is shown to be capable of estimating the 3D local NPS from only two CBCT scans with acceptable accuracy. This minimal data requirement indicates the potential utility of local NPS estimation in CBCT applications, even in clinical situations.
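
Schematically, the separability can be written as below. The notation (transform T_r, angular position θ_r) is ours, intended only to mirror the verbal description of a global 2D radial shape function scaled by a local 1D angular amplitude; the paper's exact geometric transforms differ in detail.

```latex
% 3D local NPS at voxel location r: a local angular amplitude a(.)
% times a global radial shape function S(.) of transformed frequencies.
\mathrm{NPS}^{\mathrm{3D}}_{\vec r}(\vec f)
  \;\approx\; a\!\left(\theta_{\vec r}\right)\,
              S\!\left(T_{\vec r}(\vec f)\right)
```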


Subjects
Algorithms; Cone-Beam Computed Tomography/methods; Four-Dimensional Computed Tomography/methods; Image Processing, Computer-Assisted/methods; Lung Neoplasms/diagnostic imaging; Phantoms, Imaging; Humans; Signal-To-Noise Ratio
14.
Acad Radiol ; 26(7): 937-948, 2019 Jul.
Article in English | MEDLINE | ID: mdl-30292564

ABSTRACT

RATIONALE AND OBJECTIVES: The quantitative assessment of volumetric CT for discriminating small changes in nodule size has been under-examined. This phantom study examined the effect of imaging protocol, nodule size, and measurement method on volume-based change discrimination across low- and high-contrast (object-to-background) tasks. MATERIALS AND METHODS: Eight spherical objects ranging in diameter from 5.0 to 5.75 mm and from 8.0 to 8.75 mm in 0.25 mm increments were scanned within an anthropomorphic phantom with either a foam background (high-contrast task, ∼1000 HU object-to-background difference) or a gelatin background (low-contrast task, ∼50 to 100 HU difference). Ten repeat acquisitions were collected for each protocol, with varying exposures, reconstructed slice thicknesses, and reconstruction kernels. Volume measurements were obtained using a matched-filter approach (MF) and a publicly available 3D segmentation-based tool (SB). Discrimination of nodule sizes was assessed using the area under the ROC curve (AUC). RESULTS: Using a low-dose (1.3 mGy), thin-slice (≤1.5 mm) protocol, changes of 0.25 mm in diameter were detected with AUC = 1.0 for all baseline sizes for the high-contrast task, regardless of measurement method. For the more challenging low-contrast task and the same protocol, MF detected changes of 0.25 mm from baseline sizes ≥5.25 mm and volume changes ≥9.4% with AUC ≥ 0.81, whereas the corresponding results for SB were poor (AUC within 0.49-0.60). Performance for SB improved, but was still inconsistent, when exposure was increased to 4.4 mGy. CONCLUSION: The reliable discrimination of small changes in pulmonary nodule size with low-dose, thin-slice CT protocols suitable for lung cancer screening depended on the inter-related effects of nodule-to-background contrast and measurement method.
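
Given repeated volume measurements at a baseline size and at the next size increment, the AUC used here can be obtained nonparametrically from the Mann-Whitney U statistic. A brief scipy sketch with illustrative variable names:

```python
import numpy as np
from scipy.stats import mannwhitneyu

def discrimination_auc(vols_baseline, vols_larger):
    """Nonparametric AUC for discriminating two nodule sizes from
    repeated volume measurements: AUC = U / (n1 * n2)."""
    u, _ = mannwhitneyu(vols_larger, vols_baseline, alternative="greater")
    return float(u) / (len(vols_baseline) * len(vols_larger))

# e.g., ten repeat measurements per size (hypothetical arrays):
# auc = discrimination_auc(vols_5p00mm, vols_5p25mm)
```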


Subjects
Lung Neoplasms/diagnostic imaging; Solitary Pulmonary Nodule/diagnostic imaging; Tomography, X-Ray Computed/methods; Area Under Curve; Early Detection of Cancer/methods; Humans; Lung/diagnostic imaging; Lung Neoplasms/pathology; Phantoms, Imaging; ROC Curve; Radiation Dosage; Solitary Pulmonary Nodule/pathology; Tumor Burden
15.
Phys Med Biol ; 63(17): 175006, 2018 Aug 30.
Article in English | MEDLINE | ID: mdl-30101756

ABSTRACT

Extracting coronary artery calcium (CAC) scores from contrast-enhanced computed tomography (CT) images using dual-energy (DE)-based material decomposition has been shown to be feasible, mainly through patient studies. However, the quantitative performance of such DE-based CAC scores, particularly per stenosis, is underexamined owing to the lack of a reference standard and of repeated scans. In this work, we conducted a comprehensive quantitative comparison of CAC scores obtained with DE against those from conventional unenhanced single-energy (SE) CT scans, through phantom studies. Synthetic vessels filled with iodinated blood-mimicking material and containing calcium stenoses of different sizes and densities were scanned with a third-generation dual-source CT scanner in a chest phantom, using a DE coronary CT angiography protocol with three exposure/CTDIvol settings: auto-mAs/8 mGy (automatic exposure), 160 mAs/20 mGy, and 260 mAs/34 mGy, with 10 repeats each. As a control, a set of vessel phantoms without iodine was scanned using a standard SE CAC score protocol (3 mGy). Calcium volume, mass, and Agatston scores were estimated for each stenosis. For the DE dataset, image-based three-material decomposition was applied to remove iodine before scoring. The performance of the DE-based calcium scores was analyzed on a per-stenosis level and compared to the SE-based scores. There was excellent correlation between the DE- and SE-based scores (correlation coefficient r: 0.92-0.98). Percent bias for the calcium volume and mass scores varied as a function of stenosis size and density for both modalities. Precision (coefficient of variation) improved with larger and denser stenoses for both DE- and SE-based calcium scores. DE-based scores (20 mGy and 34 mGy) provided per-stenosis precision comparable to the SE-based scores (3 mGy). Our findings suggest that, on a per-stenosis level, DE-based CAC scores from contrast-enhanced CT images can achieve quantification performance comparable to conventional SE-based scores. However, DE-based CAC scoring required more dose than SE to reach high per-stenosis precision, so some caution is necessary with clinical DE-based CAC scoring.
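
For reference, the classical Agatston score thresholds the image at 130 HU and weights each calcified area by its peak attenuation. The sketch below is deliberately simplified: it treats each slice's suprathreshold area as a single lesion, whereas a faithful implementation scores connected components separately and assumes roughly 3 mm axial slices.

```python
import numpy as np

def agatston_score(slices_hu, pixel_area_mm2, threshold_hu=130,
                   min_area_mm2=1.0):
    """Simplified Agatston score: per slice, area (mm^2) of voxels
    >= 130 HU times a peak-HU weight, summed over slices.
    Weights: 1 (130-199 HU), 2 (200-299), 3 (300-399), 4 (>= 400)."""
    total = 0.0
    for sl in slices_hu:
        mask = sl >= threshold_hu
        area = mask.sum() * pixel_area_mm2
        if area < min_area_mm2:        # ignore sub-millimetric specks
            continue
        peak = sl[mask].max()
        if peak < 200:
            weight = 1
        elif peak < 300:
            weight = 2
        elif peak < 400:
            weight = 3
        else:
            weight = 4
        total += area * weight
    return total
```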


Subjects
Computed Tomography Angiography/methods; Coronary Artery Disease/diagnostic imaging; Tomography Scanners, X-Ray Computed/standards; Vascular Calcification/diagnostic imaging; Computed Tomography Angiography/instrumentation; Coronary Vessels/diagnostic imaging; Humans; Phantoms, Imaging; Reproducibility of Results
16.
Med Phys ; 45(7): 3019-3030, 2018 Jul.
Article in English | MEDLINE | ID: mdl-29704868

ABSTRACT

PURPOSE: The task-based assessment of image quality using model observers is increasingly used for the assessment of different imaging modalities. However, the performance computation of model observers needs standardization, as well as well-established trust in its implementation methodology and uncertainty estimation. The purpose of this work was to determine the degree of equivalence of channelized Hotelling observer (CHO) performance and uncertainty estimation using an intercomparison exercise. MATERIALS AND METHODS: Image samples for estimating model observer performance in detection tasks were generated from two-dimensional CT image slices of a uniform water phantom. A common set of images was sent to participating laboratories, which were asked to perform and document the following tasks: (a) estimate the detectability index of a well-defined CHO and its uncertainty in three conditions involving different target sizes, all at the same dose, and (b) apply this CHO to an image set whose ground truth was unknown to participants (lower image dose). In addition, on an optional basis, we asked the participating laboratories to (c) estimate the performance of real human observers from a psychophysical experiment of their choice. Each of the 13 participating laboratories was confidentially assigned a participant number, and image sets could be downloaded through a secure server. Results were distributed with each participant identifiable by number, and each laboratory was then able to modify its results with justification, as model observer calculations are not yet routine and are potentially error prone. RESULTS: The detectability index increased with signal size for all participants and was very consistent for the 6 mm target, while showing higher variability for the 8 and 10 mm targets. There was one order of magnitude between the lowest and largest uncertainty estimates. CONCLUSIONS: This intercomparison helped define the state of the art of model observer performance computation and, with thirteen participants, reflects openness and trust within the medical imaging community. The performance of a CHO with explicitly defined channels and a relatively large number of test images was consistently estimated by all participants. In contrast, the study demonstrates that there is no agreement on estimating the variance of detectability in the training-and-testing setting.
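
One common, though by no means unique, way to attach an uncertainty to a detectability index is to bootstrap over the test images; the spread of defensible choices at exactly this step is what the intercomparison exposed. A sketch assuming scalar observer outputs for signal-present and signal-absent test images:

```python
import numpy as np

def dprime(t_sig, t_noise):
    """SNR-style detectability from scalar observer outputs."""
    return (t_sig.mean() - t_noise.mean()) / np.sqrt(
        0.5 * (t_sig.var(ddof=1) + t_noise.var(ddof=1)))

def bootstrap_dprime_se(t_sig, t_noise, n_boot=2000, seed=0):
    """Bootstrap standard error of d' over the test set (resampling
    with replacement within each class)."""
    rng = np.random.default_rng(seed)
    reps = [dprime(rng.choice(t_sig, t_sig.size),
                   rng.choice(t_noise, t_noise.size))
            for _ in range(n_boot)]
    return float(np.std(reps, ddof=1))
```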


Subjects
Image Processing, Computer-Assisted; Laboratories; Tomography, X-Ray Computed; Observer Variation; Uncertainty
17.
Med Phys ; 45(5): 1970-1984, 2018 May.
Article in English | MEDLINE | ID: mdl-29532479

ABSTRACT

PURPOSE: This study investigates forced localization of targets in simulated images with statistical properties similar to trans-axial sections of x-ray computed tomography (CT) volumes. A total of 24 imaging conditions are considered, comprising two target sizes, three levels of background variability, and four levels of frequency apodization. The goal of the study is to better understand how human observers perform forced-localization tasks in images with CT-like statistical properties. METHODS: The transfer properties of CT systems are modeled by a shift-invariant transfer function in addition to apodization filters that modulate high spatial frequencies. The images contain noise that is the combination of a ramp-spectrum component, simulating the effect of acquisition noise in CT, and a power-law component, simulating the effect of normal anatomy in the background, which are modulated by the apodization filter as well. Observer performance is characterized using two psychophysical techniques: efficiency analysis and classification image analysis. Observer efficiency quantifies how much diagnostic information is being used by observers to perform a task, and classification images show how that information is being accessed in the form of a perceptual filter. RESULTS: Psychophysical studies from five subjects form the basis of the results. Observer efficiency ranges from 29% to 77% across the different conditions. The lowest efficiency is observed in conditions with uniform backgrounds, where significant effects of apodization are found. The classification images, estimated using smoothing windows, suggest that human observers use center-surround filters to perform the task, and these are subjected to a number of subsequent analyses. When implemented as a scanning linear filter, the classification images appear to capture most of the observer variability in efficiency (r2 = 0.86). The frequency spectra of the classification images show that frequency weights generally appear bandpass in nature, with peak frequency and bandwidth that vary with statistical properties of the images. CONCLUSIONS: In these experiments, the classification images appear to capture important features of human-observer performance. Frequency apodization only appears to have a significant effect on performance in the absence of anatomical variability, where the observers appear to underweight low spatial frequencies that have relatively little noise. Frequency weights derived from the classification images generally have a bandpass structure, with adaptation to different conditions seen in the peak frequency and bandwidth. The classification image spectra show relatively modest changes in response to different levels of apodization, with some evidence that observers are attempting to rebalance the apodized spectrum presented to them.


Subjects
Image Processing, Computer-Assisted/methods; Signal-To-Noise Ratio; Statistics as Topic; Tomography, X-Ray Computed
18.
JAMA Netw Open ; 1(7): e185474, 2018 Nov 02.
Article in English | MEDLINE | ID: mdl-30646401

ABSTRACT

Importance: Expensive and lengthy clinical trials can delay regulatory evaluation of innovative technologies, affecting patient access to high-quality medical products. Simulation is increasingly being used in product development but rarely in regulatory applications. Objectives: To conduct a computer-simulated imaging trial evaluating digital breast tomosynthesis (DBT) as a replacement for digital mammography (DM) and to compare the results with a comparative clinical trial. Design, Setting, and Participants: The simulated Virtual Imaging Clinical Trial for Regulatory Evaluation (VICTRE) trial was designed to replicate a clinical trial that used human patients and radiologists. Images obtained with in silico versions of DM and DBT systems via fast Monte Carlo x-ray transport were interpreted by a computational reader detecting the presence of lesions. A total of 2986 synthetic image-based virtual patients with breast sizes and radiographic densities representative of a screening population and compressed thicknesses from 3.5 to 6 cm were generated using an analytic approach in which anatomical structures are randomly created within a predefined breast volume and compressed in the craniocaudal orientation. A positive cohort contained a digitally inserted microcalcification cluster or spiculated mass. Main Outcomes and Measures: The trial end point was the difference in area under the receiver operating characteristic curve between modalities for lesion detection. The trial was sized for an SE of 0.01 in the change in area under the curve (AUC), half the uncertainty in the comparative clinical trial. Results: In this trial, computational readers analyzed 31 055 DM and 27 960 DBT cases from 2986 virtual patients with the following Breast Imaging Reporting and Data System densities: 286 (9.6%) extremely dense, 1200 (40.2%) heterogeneously dense, 1200 (40.2%) scattered fibroglandular densities, and 300 (10.0%) almost entirely fat. The mean (SE) change in AUC was 0.0587 (0.0062) (P < .001) in favor of DBT. The change in AUC was larger for masses (mean [SE], 0.0903 [0.008]) than for calcifications (mean [SE], 0.0268 [0.004]), which was consistent with the findings of the comparative trial (mean [SE], 0.065 [0.017] for masses and -0.047 [0.032] for calcifications). Conclusions and Relevance: The results of the simulated VICTRE trial are consistent with the performance seen in the comparative trial. While further research is needed to assess the generalizability of these findings, in silico imaging trials represent a viable source of regulatory evidence for imaging devices.


Subjects
Mammography/methods; Mammography/standards; Breast/diagnostic imaging; Breast Neoplasms/diagnostic imaging; Calcinosis/diagnostic imaging; Computer Simulation; Female; Humans; ROC Curve
19.
Phys Med Biol ; 62(7): 2598-2611, 2017 Apr 07.
Article in English | MEDLINE | ID: mdl-28151728

ABSTRACT

In earlier work, using simulated breast phantom images and a channelized Hotelling observer (CHO) lesion-detection task, we showed that the choice of reconstruction method does not affect the optimization of digital breast tomosynthesis (DBT) acquisition parameters (angular span and number of views). In this work, we investigate whether that model-observer-based conclusion remains valid when humans interpret the images. We used previously generated DBT breast phantom images and recruited human readers to find the optimal geometry settings associated with two reconstruction algorithms: filtered back projection (FBP) and the simultaneous algebraic reconstruction technique (SART). The human reader results show that image quality trends as a function of the acquisition parameters are consistent between FBP and SART reconstructions. The consistent trends confirm that the optimization of DBT system geometry is insensitive to the choice of reconstruction algorithm. The results also show that humans perform better in SART-reconstructed images than in FBP-reconstructed images. In addition, we applied CHOs with three commonly used channel models: Laguerre-Gauss (LG) channels, square (SQR) channels, and sparse difference-of-Gaussian (sDOG) channels. We found that LG channels predict human performance trends better than the SQR and sDOG channel models for the task of detecting lesions in tomosynthesis backgrounds. Overall, this work confirms that the choice of reconstruction algorithm is not critical for optimizing DBT system acquisition parameters.
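
Laguerre-Gauss channels are rotationally symmetric functions u_n(r) proportional to exp(-pi r^2 / a^2) L_n(2 pi r^2 / a^2). A small numpy/scipy sketch for building a channel matrix suitable for a CHO; the channel count and width parameter a are tunable assumptions, not the paper's settings.

```python
import numpy as np
from scipy.special import eval_laguerre

def lg_channels(n_channels, width_a, grid_size):
    """Laguerre-Gauss channel matrix (P x C, one channel per column):
    u_n(r) ~ exp(-pi r^2 / a^2) * L_n(2 pi r^2 / a^2), unit-normalized."""
    c = (grid_size - 1) / 2.0
    y, x = np.mgrid[0:grid_size, 0:grid_size]
    r2 = (x - c) ** 2 + (y - c) ** 2
    gauss = np.exp(-np.pi * r2 / width_a ** 2)
    cols = []
    for n in range(n_channels):
        u = gauss * eval_laguerre(n, 2.0 * np.pi * r2 / width_a ** 2)
        cols.append((u / np.linalg.norm(u)).ravel())
    return np.stack(cols, axis=1)
```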


Subjects
Algorithms; Breast Neoplasms/pathology; Breast/pathology; Image Processing, Computer-Assisted/standards; Mammography/methods; Phantoms, Imaging; Tomography, X-Ray/methods; Breast/diagnostic imaging; Breast Neoplasms/diagnostic imaging; Female; Humans; Image Processing, Computer-Assisted/methods; Models, Theoretical
20.
Quant Imaging Med Surg ; 7(6): 623-635, 2017 Dec.
Article in English | MEDLINE | ID: mdl-29312867

ABSTRACT

BACKGROUND: To assess the volumetric measurement of small (≤1 cm) nonsolid nodules with computed tomography (CT), focusing on the interaction of state-of-the-art iterative reconstruction (IR) methods and dose with nodule densities, sizes, and shapes. METHODS: Twelve synthetic nodules [5 and 10 mm in diameter; densities of -800, -630, and -10 Hounsfield units (HU); spherical and spiculated shapes] were scanned within an anthropomorphic phantom. Dose [volume CT dose index (CTDIvol)] ranged from standard (4.1 mGy) to below screening levels (0.3 mGy). Data were reconstructed using filtered back-projection and two state-of-the-art IR methods (adaptive and model-based). Measurements were extracted with a previously validated matched-filter-based estimator. Accuracy and precision were analyzed through the percent bias (PB) and the repeatability coefficient (RC), respectively. RESULTS: Density had the most important effect on measurement error, followed by the interaction of density with nodule size. The nonsolid -630 HU nodules had accuracy and precision comparable to the solid (-10 HU) nodules, regardless of reconstruction method and with CTDIvol as low as 0.6 mGy. PB was <5% and <11% for the 10- and 5-mm nominal-diameter -630 HU nodules, respectively, and RC was <5% and <12% for the same nodules. For the nonsolid -800 HU nodules, PB increased to <11% and <30% for the 10- and 5-mm nodules, respectively, whereas RC increased slightly overall but varied widely across dose and reconstruction algorithms for the 5-mm nodules. Model-based IR improved measurement accuracy for the 5-mm, low-density (-800, -630 HU) nodules; for the other nodules the effect of reconstruction method was small. Dose did not affect volumetric accuracy and only slightly affected the precision for the 5-mm nonsolid nodules. CONCLUSIONS: Reasonable accuracy and precision were achieved for volumetric measurements of all 10-mm nonsolid nodules, and of the 5-mm nodules with -630 HU or higher density, when derived from scans acquired at below-screening dose levels as low as 0.6 mGy, regardless of reconstruction algorithm.
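
The two figures of merit reported here have simple standard forms: percent bias against the known phantom volume, and a repeatability coefficient of 1.96 * sqrt(2) ≈ 2.77 times the within-condition standard deviation. A short sketch, expressing RC as a percentage of the mean to match the abstract's units (an assumption on our part):

```python
import numpy as np

def percent_bias(measurements, true_volume):
    """PB = 100 * (mean measured volume - truth) / truth."""
    return 100.0 * (np.mean(measurements) - true_volume) / true_volume

def repeatability_coefficient_percent(measurements):
    """RC = 2.77 * within-condition SD, i.e., the bound under which the
    absolute difference of two repeat measurements falls ~95% of the
    time; reported here as a percentage of the mean."""
    sd = np.std(measurements, ddof=1)
    return 100.0 * 2.77 * sd / np.mean(measurements)
```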
