ABSTRACT
Purpose To determine whether saliency maps in radiology artificial intelligence (AI) are vulnerable to subtle perturbations of the input, which could lead to misleading interpretations, using prediction-saliency correlation (PSC) for evaluating the sensitivity and robustness of saliency methods. Materials and Methods In this retrospective study, locally trained deep learning models and a research prototype provided by a commercial vendor were systematically evaluated on 191 229 chest radiographs from the CheXpert dataset and 7022 MR images from a human brain tumor classification dataset. Two radiologists performed a reader study on 270 chest radiograph pairs. A model-agnostic approach for computing the PSC coefficient was used to evaluate the sensitivity and robustness of seven commonly used saliency methods. Results The saliency methods had low sensitivity (maximum PSC, 0.25; 95% CI: 0.12, 0.38) and weak robustness (maximum PSC, 0.12; 95% CI: 0.0, 0.25) on the CheXpert dataset, as demonstrated by leveraging locally trained model parameters. Further evaluation showed that the saliency maps generated from a commercial prototype could be irrelevant to the model output, without knowledge of the model specifics (area under the receiver operating characteristic curve decreased by 8.6% without affecting the saliency map). The human observer studies confirmed that it is difficult for experts to identify the perturbed images; the experts had less than 44.8% correctness. Conclusion Popular saliency methods scored low PSC values on the two datasets of perturbed chest radiographs, indicating weak sensitivity and robustness. The proposed PSC metric provides a valuable quantification tool for validating the trustworthiness of medical AI explainability. Keywords: Saliency Maps, AI Trustworthiness, Dynamic Consistency, Sensitivity, Robustness Supplemental material is available for this article. © RSNA, 2023 See also the commentary by Yanagawa and Sato in this issue.
Subject(s)
Artificial Intelligence , Radiology , Humans , Retrospective Studies , Radiography , Radiologists
ABSTRACT
PURPOSE: The aim of this study was to identify a method for optimizing the quality of CT scans in oncological patients with port systems. The study investigates the potential of photon-counting computed tomography (PCCT) to reduce beam-hardening artifacts caused by port implants in chest imaging by means of spectral reconstructions. METHOD: In this retrospective single-center study, 8 ROIs were measured for 19 spectral reconstructions (polyenergetic imaging, monoenergetic reconstructions from 40 to 190 keV, iodine maps, and virtual non-contrast (VNC)) of 49 patients with pectoral port systems undergoing PCCT of the chest for staging of oncologic disease. The mean and standard deviation (SD) of Hounsfield unit measurements were obtained for port-chamber-associated hypo- and hyperdense artifacts, bilateral muscles, and vessels. In addition, a structured assessment of artifacts and imaging findings was performed by two radiologists. RESULTS: A significant association of keV with iodine contrast as well as artifact intensity was noted (all p < 0.001). In the qualitative assessment, 120 keV monoenergetic reconstructions eliminated severe and pronounced artifacts, as compared with lower-keV reconstructions (p < 0.001). Regarding imaging findings, no significant difference between monoenergetic reconstructions was noted (all p > 0.05). In cases with very high iodine concentrations in the subclavian vein, image distortions were noted in 40 keV images (p < 0.01). CONCLUSIONS: The present study demonstrates that PCCT-derived spectral reconstructions can be used in oncological imaging of the thorax to reduce port-derived beam-hardening artifacts. When evaluating image data sets for staging, it can be particularly helpful to consider the 120 keV virtual monoenergetic images (VMIs), in which the artifacts are comparatively low.
Subject(s)
Artifacts , Thoracic Radiography , X-Ray Computed Tomography , Humans , Male , Female , Middle Aged , Aged , X-Ray Computed Tomography/methods , Thoracic Radiography/methods , Retrospective Studies , Adult , Aged 80 and Over , Computer-Assisted Radiographic Image Interpretation/methods , Photons , Reproducibility of Results
ABSTRACT
PURPOSE: We created an infrastructure for a no-code machine learning (NML) platform that enables non-programming physicians to create NML models. We tested the platform by creating an NML model for classifying radiographs for the presence or absence of clavicle fractures. METHODS: Our IRB-approved retrospective study included 4135 clavicle radiographs from 2039 patients (mean age 52 ± 20 years, F:M 1022:1017) from 13 hospitals. Each patient had two-view clavicle radiographs with axial and anterior-posterior projections. The positive radiographs had either displaced or non-displaced clavicle fractures. We configured the NML platform to automatically retrieve the eligible exams using the series' unique identification from the hospital virtual network archive via Web Access to DICOM Objects. The platform trained a model until the validation loss plateaued. Once the testing was complete, the platform provided the receiver operating characteristic curve and confusion matrix for estimating sensitivity, specificity, and accuracy. RESULTS: The NML platform successfully retrieved 3917 radiographs (3917/4135, 94.7%) and parsed them to create an ML classifier, with 2151 radiographs in the training, 100 radiographs in the validation, and 1666 radiographs in the testing datasets (772 radiographs with clavicle fracture, 894 without clavicle fracture). The network identified clavicle fracture with 90% sensitivity, 87% specificity, and 88% accuracy, with an AUC of 0.95 (confidence interval 0.94-0.96). CONCLUSION: An NML platform can help physicians create and test machine learning models from multicenter imaging datasets, such as the one in our study, for classifying radiographs based on the presence of clavicle fracture.
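As an illustration of how sensitivity, specificity, and accuracy follow from a confusion matrix, the sketch below uses hypothetical cell counts. The abstract reports only the rates and the class totals (772 with fracture, 894 without); the four counts here are back-calculated assumptions chosen to match those figures, not values reported by the study.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    tp: int  # fracture present, classified as fracture
    fn: int  # fracture present, classified as no fracture
    tn: int  # no fracture, classified as no fracture
    fp: int  # no fracture, classified as fracture

    @property
    def sensitivity(self) -> float:
        # Share of fracture radiographs that were correctly flagged
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        # Share of fracture-free radiographs that were correctly cleared
        return self.tn / (self.tn + self.fp)

    @property
    def accuracy(self) -> float:
        # Share of all test radiographs classified correctly
        total = self.tp + self.fn + self.tn + self.fp
        return (self.tp + self.tn) / total


# Hypothetical counts (assumption): rows sum to the reported 772 and 894.
cm = ConfusionMatrix(tp=695, fn=77, tn=778, fp=116)
print(f"sensitivity={cm.sensitivity:.0%} "
      f"specificity={cm.specificity:.0%} "
      f"accuracy={cm.accuracy:.0%}")
```

With these assumed counts the three rates round to the reported 90%, 87%, and 88%, which shows that the reported accuracy is consistent with the reported sensitivity, specificity, and class balance.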
Subject(s)
Clavicle , Bone Fractures , Machine Learning , Humans , Clavicle/injuries , Clavicle/diagnostic imaging , Bone Fractures/diagnostic imaging , Bone Fractures/classification , Female , Middle Aged , Male , Retrospective Studies , Sensitivity and Specificity , Adult , Radiography/methods
ABSTRACT
PURPOSE: The objective of our IAEA-coordinated international study was to assess CT practices and radiation doses from multiple hospitals across several African countries. METHODS: The study included 13 hospitals from Africa, each of which contributed information on a minimum of 20 consecutive patients who underwent head, chest, and/or abdomen-pelvis CT. Prior to the data recording step, all hospitals had a mandatory one-hour training on the best practices in recording the relevant data elements. The recorded data elements included patient age, weight, protocol name, scanner information, acquisition parameters, and radiation dose descriptors including phase-specific volume CT dose index (CTDIvol in mGy) and dose length product (DLP in mGy.cm). We estimated the median and interquartile range of body-region-specific CTDIvol and DLP and compared data across sites and countries using the Kruskal-Wallis H test (for non-normally distributed data) and analysis of variance. RESULTS: A total of 1061 patients (mean age 50 ± 19 years) were included in the study. Overall, 16% of CT exams had no stated clinical indication, including CT examinations of the head (32/343, 9%), chest (50/281, 18%), abdomen-pelvis (67/243, 28%), and chest-abdomen-pelvis (24/194, 12%). Most hospitals used multiphase CT protocols for abdomen-pelvis (9/11 hospitals) and chest CT (10/12 hospitals), regardless of clinical indications. Total median DLP values for head (953 mGy.cm), chest (405 mGy.cm), and abdomen-pelvis (1195 mGy.cm) CT were above the UK, German, and American College of Radiology Diagnostic Reference Levels (DRLs). CONCLUSIONS: Concerning variations in CT practices and protocols across several hospitals in Africa were demonstrated, emphasizing the need for better protocol optimization to improve patient safety.
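The per-region and overall proportions of exams without a stated clinical indication can be checked directly from the counts quoted in the Results. The sketch below is plain arithmetic on those reported counts; the variable and function names are illustrative only.

```python
# (exams with no stated indication, all exams) per body region, as reported
no_indication = {
    "head": (32, 343),
    "chest": (50, 281),
    "abdomen-pelvis": (67, 243),
    "chest-abdomen-pelvis": (24, 194),
}


def percent(part: int, whole: int) -> float:
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


per_region = {region: percent(k, n) for region, (k, n) in no_indication.items()}
total_missing = sum(k for k, _ in no_indication.values())
total_exams = sum(n for _, n in no_indication.values())
overall = percent(total_missing, total_exams)

for region, value in per_region.items():
    print(f"{region}: {value:.0f}%")
print(f"overall: {total_missing}/{total_exams} = {overall:.0f}%")
```

The region denominators sum to 1061, matching the reported cohort size, and the pooled proportion rounds to the reported 16%.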
Subject(s)
International Agencies , Radiation Dosage , X-Ray Computed Tomography , Humans , Africa , Middle Aged , Male , Female , Adult , Nuclear Energy , Aged
ABSTRACT
OBJECTIVE: To assess the effectiveness of low contrast volume (LCV) chest CT performed with multiple contrast agents on multivendor CT with varying scanning techniques. METHODS: The study included 361 patients (65 ± 15 years; M:F 173:188) who underwent LCV chest CT on one of six 64- to 256-detector-row CT scanners using single-energy (SECT) or dual-energy (DECT) modes. All patients were scanned with either a fixed-LCV (LCVf, n = 103) or weight-based LCV (LCVw, n = 258) protocol. Two thoracic radiologists independently assessed all LCV CT and patients' prior standard contrast volume (SCV, n = 263) chest CT for optimality of contrast enhancement in the thoracic vasculature, cardiac chambers, and pleuro-parenchymal and mediastinal abnormalities. CT attenuation was recorded in the main pulmonary trunk and in the ascending and descending thoracic aorta. To assess the interobserver agreement, pulmonary arterial enhancement was divided into two groups: optimal or suboptimal. RESULTS: There was no significant difference in patients' BMI among the three groups (p = 0.883). DECT had a significantly higher aortic arterial enhancement (250 ± 99 HU vs 228 ± 76 HU for SECT, p < 0.001). Optimal enhancement was present in 558 of 624 chest CT examinations (89.4%); suboptimal enhancement was noted in the remaining 66 of 624 examinations, including 48 of 258 LCVw (18.6%) and 14 of 103 LCVf (13.6%) examinations. Most patients with suboptimal enhancement with the LCVw injection protocol were overweight/obese (30/48; 62.5%; p < 0.001). CONCLUSION: LCV chest CT can be performed across complex multivendor, multicontrast media, multiscanner, and multiprotocol CT practices. However, LCV chest CT examinations can result in suboptimal contrast enhancement in patients with larger body habitus.
Subject(s)
Contrast Media , X-Ray Computed Tomography , Humans , X-Ray Computed Tomography/methods , Thorax , Aorta , Pulmonary Artery
ABSTRACT
Purpose: Motion-impaired CT images can result in limited or suboptimal diagnostic interpretation (with missed or miscalled lesions) and patient recall. We trained and tested an artificial intelligence (AI) model for identifying substantial motion artifacts on CT pulmonary angiography (CTPA) that have a negative impact on diagnostic interpretation. Methods: With IRB approval and HIPAA compliance, we queried our multicenter radiology report database (mPower, Nuance) for CTPA reports between July 2015 and March 2022 for the following terms: "motion artifacts", "respiratory motion", "technically inadequate", and "suboptimal" or "limited exam". All CTPA reports were from two quaternary (Site A, n = 335; Site B, n = 259) and one community (Site C, n = 199) healthcare sites. A thoracic radiologist reviewed CT images of all positive hits for motion artifacts (present or absent) and their severity (no diagnostic effect or major diagnostic impairment). Coronal multiplanar images from 793 CTPA exams were de-identified and exported offline into an AI model building prototype (Cognex Vision Pro, Cognex Corporation) to train an AI model to perform two-class classification ("motion" or "no motion") with data from the three sites (70% training dataset, n = 554; 30% validation dataset, n = 239). Separately, data from Site A and Site C were used for training and validation, and testing was performed on the Site B CTPA exams. A five-fold repeated cross-validation was performed to evaluate the model performance with accuracy and receiver operating characteristic (ROC) analysis. Results: Among the CTPA images from 793 patients (mean age 63 ± 17 years; 391 males, 402 females), 372 had no motion artifacts, and 421 had substantial motion artifacts. The average performance of the AI model after five-fold repeated cross-validation for the two-class classification was 94% sensitivity, 91% specificity, 93% accuracy, and an area under the ROC curve (AUC) of 0.93 (95% CI 0.89-0.97).
Conclusion: The AI model used in this study can successfully identify CTPA exams with motion artifacts that limit diagnostic interpretation in multicenter training and test datasets. Clinical relevance: The AI model used in the study can help alert technologists to the presence of substantial motion artifacts on CTPA, where a repeat image acquisition can help salvage diagnostic information.
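The site-held-out evaluation described in the Methods (training and validation on Sites A and C, testing on Site B) can be sketched as a simple partition by site label. This is a minimal illustration, not the prototype's actual data handling: the exam identifiers and record layout are placeholders, and only the per-site counts come from the abstract.

```python
# Per-site exam counts as reported; everything else below is hypothetical.
SITE_COUNTS = {"A": 335, "B": 259, "C": 199}

# Placeholder exam records carrying only an ID and a site label.
exams = [
    {"exam_id": f"{site}-{i:04d}", "site": site}
    for site, n in SITE_COUNTS.items()
    for i in range(n)
]


def split_by_site(records, held_out_site):
    """Keep one site entirely out of model development for external testing."""
    develop = [r for r in records if r["site"] != held_out_site]
    test = [r for r in records if r["site"] == held_out_site]
    return develop, test


develop, test = split_by_site(exams, held_out_site="B")
print(len(develop), len(test))
```

Splitting on the site label rather than on individual exams keeps every Site B exam unseen during training and validation, so the test result reflects performance on a different institution.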
ABSTRACT
RATIONALE AND OBJECTIVES: Suboptimal chest radiographs (CXR) can limit interpretation of critical findings. Radiologist-trained AI models were evaluated for differentiating suboptimal (sCXR) and optimal (oCXR) chest radiographs. MATERIALS AND METHODS: Our IRB-approved study included 3278 CXRs from adult patients (mean age 55 ± 20 years) identified from a retrospective search of CXR in radiology reports from 5 sites. A chest radiologist reviewed all CXRs for the cause of suboptimality. The de-identified CXRs were uploaded into an AI server application for training and testing 5 AI models. The training set consisted of 2202 CXRs (n = 807 oCXR; n = 1395 sCXR), while 1076 CXRs (n = 729 sCXR; n = 347 oCXR) were used for testing. Data were analyzed with the area under the curve (AUC) for the model's ability to classify oCXR and sCXR correctly. RESULTS: For the two-class classification into sCXR or oCXR from all sites, for CXR with missing anatomy, AI had sensitivity, specificity, accuracy, and AUC of 78%, 95%, 91%, and 0.87 (95% CI 0.82-0.92), respectively. AI identified obscured thoracic anatomy with 91% sensitivity, 97% specificity, 95% accuracy, and 0.94 AUC (95% CI 0.90-0.97). Inadequate exposure was identified with 90% sensitivity, 93% specificity, 92% accuracy, and an AUC of 0.91 (95% CI 0.88-0.95). The presence of low lung volume was identified with 96% sensitivity, 92% specificity, 93% accuracy, and 0.94 AUC (95% CI 0.92-0.96). The sensitivity, specificity, accuracy, and AUC of AI in identifying patient rotation were 92%, 96%, 95%, and 0.94 (95% CI 0.91-0.98), respectively. CONCLUSION: The radiologist-trained AI models can accurately classify optimal and suboptimal CXRs. Such AI models at the front end of radiographic equipment can enable radiographers to repeat sCXRs when necessary.
Subject(s)
Lung , Thoracic Radiography , Adult , Humans , Middle Aged , Aged , Lung/diagnostic imaging , Retrospective Studies , Radiography , Radiologists
ABSTRACT
Chest radiographs (CXR) are the most frequently performed imaging tests and rank high among the radiographic exams with suboptimal quality and high rejection rates. Given their ubiquitous use in the diagnosis and management of acute and chronic ailments, suboptimal CXRs can cause delays in patient care and pitfalls in radiographic interpretation. Suboptimal CXRs can also contribute to high inter-radiologist variation in CXR interpretation. While advances in radiography, with transitions to computed and digital radiography, have reduced the prevalence of suboptimal exams, the problem persists. Advances in machine learning and artificial intelligence (AI), particularly in the radiographic acquisition, triage, and interpretation of CXRs, could offer a plausible solution for suboptimal CXRs. We review the literature on suboptimal CXRs and the potential use of AI to help reduce the prevalence of suboptimal CXRs.
ABSTRACT
The multitude of artificial intelligence (AI)-based solutions, vendors, and platforms poses a challenging proposition to an already complex clinical radiology practice. Apart from assessing and ensuring acceptable local performance and workflow fit to improve imaging services, AI tools require multiple stakeholders, including clinical, technical, and financial, who collaborate to move potential deployable applications to full clinical deployment in a structured and efficient manner. Postdeployment monitoring and surveillance of such tools require an infrastructure that ensures proper and safe use. Herein, the authors describe their experience and framework for implementing and supporting the use of AI applications in radiology workflow.
Subject(s)
Artificial Intelligence , Radiology , Radiology/methods , Diagnostic Imaging , Workflow , Commerce
ABSTRACT
Sino-nasal organized hematoma (OH) is an uncommon, benign condition of the sinuses. It mimics a neoplasm in its clinical presentation as well as in its imaging appearance. Careful evaluation of the clinical history and imaging features is essential to avoid misdiagnosis. We present an interesting case of sino-nasal organized hematoma in a 26-year-old male patient, masquerading as a sino-nasal neoplasm.
ABSTRACT
INTRODUCTION: Pulmonary embolism is a common cause of cardiopulmonary mortality and morbidity worldwide. Survivors of acute pulmonary embolism may experience dyspnea, report reduced exercise capacity, or develop overt pulmonary hypertension. Clinicians must be alert for these phenomena and appreciate the modalities and investigations available for evaluation. AREAS COVERED: In this review, the current understanding of available contemporary imaging and physiologic modalities is discussed, based on available literature and professional society guidelines. The purpose of the review is to provide clinicians with an overview of these modalities, their strengths and disadvantages, and how and when these investigations can support the clinical work-up of patients post-pulmonary embolism. EXPERT OPINION: Echocardiography is the first test in symptomatic patients post-pulmonary embolism, with ventilation/perfusion scanning vital for determining whether there are chronic residual emboli. The role of computed tomography and magnetic resonance in assessing the pulmonary arterial tree in post-pulmonary embolism patients is evolving. Functional testing, in particular cardiopulmonary exercise testing, is emerging as an important modality to quantify functional limitation and determine its cause. It is possible that future investigations of the post-pulmonary embolism recovery period will better inform treatment decisions for acute pulmonary embolism patients.
Subject(s)
Pulmonary Hypertension , Pulmonary Embolism , Acute Disease , Chronic Disease , Exercise Test , Humans , Pulmonary Artery , Pulmonary Embolism/diagnostic imaging
ABSTRACT
(1) Background: Optimal anatomic coverage is important for radiation-dose optimization. We trained and tested two deep learning (DL) algorithms on a machine vision tool library platform (Cognex Vision Pro Deep Learning software) to recognize anatomic landmarks and classify chest CT examinations as having optimum, under-scanned, or over-scanned scan length. (2) Methods: To test our hypothesis, we performed a study with 428 consecutive chest CT examinations (mean age 70 ± 14 years; male:female 190:238) performed at one of four hospitals. CT examinations from two hospitals were used to train the DL classification algorithms to identify lung apices and bases. The developed algorithms were then tested on the data from the remaining two hospitals. For each CT, we recorded the scan lengths above and below the lung apices and bases. Model performance was assessed with receiver operating characteristic (ROC) analysis. (3) Results: The two DL models for lung apex and bases had high sensitivity, specificity, accuracy, and areas under the curve (AUC) for identifying under-scanning (100%, 99%, 99%, and 0.999 (95% CI 0.996-1.000)) and over-scanning (99%, 99%, 99%, and 0.998 (95% CI 0.992-1.000)). (4) Conclusions: Our DL models can accurately identify markers for missing anatomic coverage and over-scanning in chest CTs.
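To make the three scan-length categories concrete, the sketch below labels each end of a scan from a recorded margin relative to the lung apex or base. It is a rule-based illustration only and is not the DL method of the study, which classifies images. The 20 mm tolerance, the sign convention (a negative margin means the landmark was cut off), and all names are assumptions introduced here.

```python
from dataclasses import dataclass

# Assumed tolerance (hypothetical): extra coverage beyond this is "over-scanned".
MAX_MARGIN_MM = 20.0


@dataclass
class ScanExtent:
    above_apex_mm: float  # scanned length above the lung apex; negative if apex is cut off
    below_base_mm: float  # scanned length below the lung base; negative if base is cut off


def classify_end(margin_mm: float, max_margin_mm: float = MAX_MARGIN_MM) -> str:
    """Label one end of the scan from its margin beyond the lung landmark."""
    if margin_mm < 0:
        return "under-scanned"
    if margin_mm > max_margin_mm:
        return "over-scanned"
    return "optimum"


def classify_scan(extent: ScanExtent) -> dict:
    """Classify the apex end and the base end of a scan independently."""
    return {
        "apex": classify_end(extent.above_apex_mm),
        "base": classify_end(extent.below_base_mm),
    }


result = classify_scan(ScanExtent(above_apex_mm=10.0, below_base_mm=45.0))
print(result)
```

Treating the apex and base separately mirrors the abstract's use of two models, one per landmark, so a scan can be optimum at one end and over-scanned at the other.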
ABSTRACT
BACKGROUND: Missed findings in chest X-ray interpretation are common and can have serious consequences. METHODS: Our study included 2407 chest radiographs (CXRs) acquired at three Indian and five US sites. To identify CXRs reported as normal, we used a proprietary radiology report search engine based on natural language processing (mPower, Nuance). Two thoracic radiologists reviewed all CXRs and recorded the presence and clinical significance of abnormal findings on a 5-point scale (1-not important; 5-critical importance). All CXRs were processed with the AI model (Qure.ai) and outputs were recorded for the presence of findings. Data were analyzed to obtain area under the ROC curve (AUC). RESULTS: Of 410 CXRs (410/2407, 18.9%) with unreported/missed findings, 312 (312/410, 76.1%) findings were clinically important: pulmonary nodules (n = 157), consolidation (60), linear opacities (37), mediastinal widening (21), hilar enlargement (17), pleural effusions (11), rib fractures (6) and pneumothoraces (3). AI detected 69 missed findings (69/131, 53%) with an AUC of up to 0.935. The AI model was generalizable across different sites, geographic locations, patient genders and age groups. CONCLUSION: A substantial number of important CXR findings are missed; the AI model can help to identify and reduce the frequency of important missed findings in a generalizable manner.