ABSTRACT
BACKGROUND: Although brain activity in Alzheimer's disease (AD) can be evaluated with MRI and PET, the relationships between brain temperature (BT), the index of diffusivity along the perivascular space (ALPS index), and amyloid deposition in the cerebral cortex remain unclear. PURPOSE: To investigate the relationships between metabolic imaging measurements and clinical information in patients with AD and normal controls (NCs). STUDY TYPE: Retrospective analysis of a prospective dataset. POPULATION: 58 participants (78.3 ± 6.8 years; 30 female): 29 AD patients and 29 age- and sex-matched NCs from the Open Access Series of Imaging Studies dataset. FIELD STRENGTH/SEQUENCE: 3T; T1-weighted magnetization-prepared rapid gradient-echo, diffusion tensor imaging with 64 directions, and dynamic 18F-florbetapir PET. ASSESSMENT: Imaging metrics were compared between AD patients and NCs. These included BT calculated from the diffusivity of the lateral ventricles, the ALPS index, which reflects glymphatic function, the mean standardized uptake value ratio (SUVR) of amyloid PET in the cerebral cortex, and clinical information such as age, sex, and Mini-Mental State Examination (MMSE) score. STATISTICAL TESTS: Pearson's or Spearman's correlation and multiple linear regression analyses. P values <0.05 were considered statistically significant. RESULTS: A significant positive correlation was found between BT and the ALPS index (r = 0.44 for NCs), while significant negative correlations were found between age and the ALPS index (rs = -0.43 for AD and -0.47 for NCs). The SUVR of amyloid PET was not significantly associated with BT (P = 0.81 for AD and 0.21 for NCs) or the ALPS index (P = 0.10 for AD and 0.52 for NCs). In the multiple regression analysis, age was significantly associated with BT, while age, sex, and the presence of AD were significantly associated with the ALPS index. DATA CONCLUSION: Impairment of the glymphatic system measured using MRI was associated with lower BT and aging. LEVEL OF EVIDENCE: 3. TECHNICAL EFFICACY STAGE: 1.
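For readers less familiar with these metrics, the sketch below illustrates how the ALPS index, a DWI-thermometry brain temperature, and the reported correlation and regression analyses could be computed in Python. This is a minimal sketch: the diffusivity values and per-subject data are invented, and the thermometry formula is one published conversion from ventricular CSF diffusivity to temperature, not necessarily the exact one used in this study.

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

# ALPS index (Taoka et al.): diffusivity along the perivascular space (x-axis,
# in projection and association fiber areas) divided by diffusivity
# perpendicular to it. Values below are invented for illustration (mm^2/s).
d_xx_proj, d_xx_assoc = 1.05e-3, 1.10e-3
d_yy_proj, d_zz_assoc = 0.70e-3, 0.72e-3
alps_index = np.mean([d_xx_proj, d_xx_assoc]) / np.mean([d_yy_proj, d_zz_assoc])

# DWI thermometry: brain temperature (deg C) from lateral-ventricle CSF
# diffusivity D (mm^2/s), using one published conversion formula.
def brain_temperature(d_csf):
    return 2256.74 / np.log(4.39221 / d_csf) - 273.15

bt = brain_temperature(3.0e-3)  # ~36-37 deg C for typical CSF diffusivity

# Correlation and multiple linear regression on simulated per-subject data.
rng = np.random.default_rng(0)
age = rng.uniform(65, 90, 29)
sex = rng.integers(0, 2, 29)          # 0 = male, 1 = female
alps = 1.5 - 0.005 * age + rng.normal(0, 0.05, 29)
r, p = stats.pearsonr(alps, age)      # Pearson's r
rs, ps = stats.spearmanr(alps, age)   # Spearman's rho
ols = sm.OLS(alps, sm.add_constant(np.column_stack([age, sex]))).fit()
```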
Subjects
Alzheimer Disease, Humans, Female, Alzheimer Disease/diagnostic imaging, Alzheimer Disease/metabolism, Diffusion Tensor Imaging/methods, Retrospective Studies, Prospective Studies, Access to Information, Positron-Emission Tomography/methods, Magnetic Resonance Imaging/methods, Amyloid, Amyloidogenic Proteins, Cerebral Cortex
ABSTRACT
OBJECTIVES: Large language models like GPT-4 have demonstrated potential for diagnosis in radiology. Previous studies investigating this potential primarily utilized quizzes from academic journals. This study aimed to assess the diagnostic capabilities of GPT-4-based Chat Generative Pre-trained Transformer (ChatGPT) using actual clinical radiology reports of brain tumors and to compare its performance with that of neuroradiologists and general radiologists. METHODS: We collected brain MRI reports written in Japanese from preoperative brain tumor patients at two institutions from January 2017 to December 2021. The MRI reports were translated into English by radiologists. GPT-4 and five radiologists were presented with the same textual findings from the reports and asked to suggest differential and final diagnoses. The pathological diagnosis of the excised tumor served as the ground truth. McNemar's test and Fisher's exact test were used for statistical analysis. RESULTS: In an analysis of 150 radiology reports, GPT-4 achieved a final diagnostic accuracy of 73%, while the radiologists' accuracy ranged from 65% to 79%. GPT-4's final diagnostic accuracy was higher when using reports from neuroradiologists (80%) than when using those from general radiologists (60%). For differential diagnoses, GPT-4's accuracy was 94%, while the radiologists' ranged from 73% to 89%. Notably, for differential diagnoses, GPT-4's accuracy remained consistent whether reports were from neuroradiologists or general radiologists. CONCLUSION: GPT-4 exhibited good diagnostic capability, comparable to that of neuroradiologists, in differentiating brain tumors from MRI reports. GPT-4 can serve as a second opinion for neuroradiologists on final diagnoses and as a guidance tool for general radiologists and residents. CLINICAL RELEVANCE STATEMENT: This study evaluated GPT-4-based ChatGPT's diagnostic capabilities using real-world clinical MRI reports from brain tumor cases, revealing that its accuracy in interpreting brain tumors from MRI findings is competitive with that of radiologists. KEY POINTS: We investigated the diagnostic accuracy of GPT-4 using real-world clinical MRI reports of brain tumors. GPT-4 achieved final and differential diagnostic accuracy comparable with that of neuroradiologists. GPT-4 has the potential to improve the diagnostic process in clinical radiology.
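As a hedged illustration of the paired comparison described above, the sketch below runs McNemar's test on hypothetical per-case correctness labels for GPT-4 and one radiologist. The 150-case size matches the abstract, but the outcome vectors are simulated, not the study's data.

```python
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

# Simulated per-case correctness (True = correct diagnosis) for GPT-4 and one
# radiologist on the same 150 reports; rates loosely follow the abstract.
rng = np.random.default_rng(0)
gpt4_ok = rng.random(150) < 0.73
rad_ok = rng.random(150) < 0.72

# McNemar's test uses the 2x2 table of paired outcomes:
# rows = GPT-4 correct/incorrect, columns = radiologist correct/incorrect.
table = np.array([
    [np.sum(gpt4_ok & rad_ok), np.sum(gpt4_ok & ~rad_ok)],
    [np.sum(~gpt4_ok & rad_ok), np.sum(~gpt4_ok & ~rad_ok)],
])
print(mcnemar(table, exact=True).pvalue)
```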
ABSTRACT
OBJECTIVES: To compare the diagnostic accuracy of Generative Pre-trained Transformer (GPT)-4-based ChatGPT, GPT-4 with vision (GPT-4V)-based ChatGPT, and radiologists in musculoskeletal radiology. MATERIALS AND METHODS: We included 106 "Test Yourself" cases from Skeletal Radiology between January 2014 and September 2023. We input the medical history and imaging findings into GPT-4-based ChatGPT and the medical history and images into GPT-4V-based ChatGPT, and each model generated a diagnosis for each case. Two radiologists (a radiology resident and a board-certified radiologist) independently provided diagnoses for all cases. Diagnostic accuracy rates were determined based on the published ground truth. Chi-square tests were performed to compare the diagnostic accuracy of GPT-4-based ChatGPT, GPT-4V-based ChatGPT, and the radiologists. RESULTS: GPT-4-based ChatGPT significantly outperformed GPT-4V-based ChatGPT (p < 0.001), with accuracy rates of 43% (46/106) and 8% (9/106), respectively. The radiology resident and the board-certified radiologist achieved accuracy rates of 41% (43/106) and 53% (56/106), respectively. The diagnostic accuracy of GPT-4-based ChatGPT was comparable to that of the radiology resident and lower than that of the board-certified radiologist, although neither difference was significant (p = 0.78 and p = 0.22, respectively). The diagnostic accuracy of GPT-4V-based ChatGPT was significantly lower than that of both radiologists (p < 0.001 for each). CONCLUSION: GPT-4-based ChatGPT demonstrated significantly higher diagnostic accuracy than GPT-4V-based ChatGPT. While GPT-4-based ChatGPT's diagnostic performance was comparable to that of the radiology resident, it did not reach the level of the board-certified radiologist in musculoskeletal radiology. CLINICAL RELEVANCE STATEMENT: GPT-4-based ChatGPT outperformed GPT-4V-based ChatGPT and was comparable to the radiology resident, but it did not reach the level of the board-certified radiologist in musculoskeletal radiology. Radiologists should understand ChatGPT's current performance as a diagnostic tool for optimal utilization. KEY POINTS: This study compared the diagnostic performance of GPT-4-based ChatGPT, GPT-4V-based ChatGPT, and radiologists in musculoskeletal radiology. GPT-4-based ChatGPT was comparable to the radiology resident but did not reach the level of the board-certified radiologist. When using ChatGPT, it is crucial to input appropriate text descriptions of imaging findings rather than the images.
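The chi-square comparison of the two ChatGPT configurations can be reproduced directly from the counts reported in the abstract. A minimal sketch (note that a paired test such as McNemar's would also be defensible here, since both models saw the same 106 cases):

```python
from scipy.stats import chi2_contingency

# Correct vs. incorrect counts out of 106 cases, from the abstract.
gpt4_text = [46, 60]     # GPT-4-based ChatGPT (findings given as text)
gpt4_vision = [9, 97]    # GPT-4V-based ChatGPT (findings given as images)

chi2, p, dof, expected = chi2_contingency([gpt4_text, gpt4_vision])
print(f"chi2 = {chi2:.1f}, p = {p:.2e}")
```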
ABSTRACT
PURPOSE: The noteworthy performance of Chat Generative Pre-trained Transformer (ChatGPT), an artificial intelligence text generation model based on the GPT-4 architecture, has been demonstrated in various fields; however, its potential applications in neuroradiology remain underexplored. This study aimed to evaluate the diagnostic performance of GPT-4-based ChatGPT in neuroradiology. METHODS: We collected 100 consecutive "Case of the Week" cases from the American Journal of Neuroradiology between October 2021 and September 2023. ChatGPT generated a diagnosis from each patient's medical history and imaging findings for each case. The diagnostic accuracy rate was then determined against the published ground truth. Each case was categorized by anatomical location (brain, spine, and head & neck), and brain cases were further divided into central nervous system (CNS) tumor and non-CNS tumor groups. Fisher's exact test was conducted to compare the accuracy rates among the three anatomical locations, as well as between the CNS tumor and non-CNS tumor groups. RESULTS: ChatGPT achieved a diagnostic accuracy rate of 50% (50/100 cases). There were no significant differences among the accuracy rates of the three anatomical locations (p = 0.89). The accuracy rate was significantly lower for the CNS tumor group than for the non-CNS tumor group in brain cases (16% [3/19] vs. 62% [36/58], p < 0.001). CONCLUSION: This study demonstrated the diagnostic performance of ChatGPT in neuroradiology. ChatGPT's diagnostic accuracy varied by disease etiology and was significantly lower for CNS tumors than for non-CNS tumors.
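A minimal sketch of the Fisher's exact test behind the CNS versus non-CNS tumor comparison, using the counts given in the abstract:

```python
from scipy.stats import fisher_exact

# Correct vs. incorrect counts in brain cases, from the abstract.
cns_tumor = [3, 16]       # 16% (3/19) correct
non_cns_tumor = [36, 22]  # 62% (36/58) correct

odds_ratio, p = fisher_exact([cns_tumor, non_cns_tumor])
print(f"OR = {odds_ratio:.2f}, p = {p:.5f}")
```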
Subjects
Artificial Intelligence, Neoplasms, Humans, Head, Brain, Neck
ABSTRACT
PURPOSE: Cranial nerve involvement (CNI) influences the treatment strategy and prognosis of head and neck tumors. However, its incidence in skull base chordomas and chondrosarcomas remains to be investigated. This study evaluated the imaging features of chordoma and chondrosarcoma, with a focus on differences in CNI. METHODS: Forty-two patients (26 with chordomas and 16 with chondrosarcomas) treated at our institution between January 2007 and January 2023 were included in this retrospective study. Imaging features, such as maximum diameter, tumor location (midline or off-midline), calcification, signal intensity on T2-weighted images, mean apparent diffusion coefficient (ADC) values, contrast enhancement, and CNI, were evaluated and compared using Fisher's exact test or the Mann-Whitney U-test. Odds ratios (ORs) were calculated to evaluate the association between histological type and imaging features. RESULTS: The incidence of CNI in chondrosarcomas was significantly higher than that in chordomas (63% vs. 8%, P < 0.001). An off-midline location was more common in chondrosarcomas than in chordomas (86% vs. 13%; P < 0.001). The mean ADC values of chondrosarcomas were significantly higher than those of chordomas (P < 0.001). Significant associations were identified between chondrosarcoma and CNI (OR = 20.00; P < 0.001), off-midline location (OR = 53.70; P < 0.001), and mean ADC values (OR = 1.01; P = 0.002). CONCLUSION: The incidences of CNI and off-midline location were significantly higher in chondrosarcomas than in chordomas. CNI, tumor location, and mean ADC values can help distinguish between these entities.
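The reported CNI odds ratio can be reconstructed from the percentages in the abstract (10/16 chondrosarcomas vs. 2/26 chordomas gives OR = 20.00), and the ADC comparison uses the Mann-Whitney U-test. A sketch, with the per-tumor ADC values simulated:

```python
import numpy as np
from scipy.stats import fisher_exact, mannwhitneyu

# CNI counts reconstructed from the reported percentages:
# chondrosarcoma 63% (10/16), chordoma 8% (2/26) -> OR = (10*24)/(6*2) = 20.00.
cni_table = [[10, 6], [2, 24]]
or_cni, p_cni = fisher_exact(cni_table)

# Mean ADC comparison; these ADC values (10^-6 mm^2/s) are simulated.
rng = np.random.default_rng(0)
adc_chondrosarcoma = rng.normal(1800, 200, 16)
adc_chordoma = rng.normal(1300, 200, 26)
u_stat, p_adc = mannwhitneyu(adc_chondrosarcoma, adc_chordoma)
```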
Subjects
Chondrosarcoma, Chordoma, Skull Base Neoplasms, Humans, Female, Male, Retrospective Studies, Middle Aged, Chordoma/diagnostic imaging, Chordoma/pathology, Adult, Chondrosarcoma/diagnostic imaging, Chondrosarcoma/pathology, Aged, Skull Base Neoplasms/diagnostic imaging, Contrast Media, Adolescent, Magnetic Resonance Imaging/methods
ABSTRACT
Magnetic resonance imaging (MRI) is an essential tool for evaluating pelvic disorders affecting the prostate, bladder, uterus, ovaries, and/or rectum. Since the diagnostic pathway of pelvic MRI can involve various complex procedures depending on the affected organ, the Reporting and Data System (RADS) is used to standardize image acquisition and interpretation. Artificial intelligence (AI), which encompasses machine learning and deep learning algorithms, has been integrated into both pelvic MRI and the RADS, particularly for prostate MRI. This review outlines recent developments in the use of AI in various stages of the pelvic MRI diagnostic pathway, including image acquisition, image reconstruction, organ and lesion segmentation, lesion detection and classification, and risk stratification, with special emphasis on recent trends in multi-center studies, which can help to improve the generalizability of AI.
Subjects
Artificial Intelligence, Magnetic Resonance Imaging, Humans, Magnetic Resonance Imaging/methods, Female, Male, Pelvis/diagnostic imaging
ABSTRACT
Background Carbon 11 (11C)-methionine is a useful PET radiotracer for the management of patients with glioma, but radiation exposure and the lack of molecular imaging facilities limit its use. Purpose To generate synthetic methionine PET images from contrast-enhanced (CE) MRI through an artificial intelligence (AI)-based image-to-image translation model and to compare its performance for grading and prognosis of gliomas with that of real PET. Materials and Methods An AI-based model to generate synthetic methionine PET images from CE MRI was developed and validated with patients who underwent both methionine PET and CE MRI at a university hospital from January 2007 to December 2018 (institutional data set). Pearson correlation coefficients for the maximum and mean tumor-to-background ratios (TBRmax and TBRmean, respectively) of methionine uptake and for the lesion volume between synthetic and real PET were calculated. Two additional open-source glioma databases of preoperative CE MRI without methionine PET were used as the external test set. Using the TBRs, the area under the receiver operating characteristic curve (AUC) for classifying high-grade and low-grade gliomas and overall survival were evaluated. Results The institutional data set included 362 patients (mean age, 49 years ± 19 [SD]; 195 female, 167 male; training, n = 294; validation, n = 34; test, n = 34). In the internal test set, Pearson correlation coefficients were 0.68 (95% CI: 0.47, 0.81), 0.76 (95% CI: 0.59, 0.86), and 0.92 (95% CI: 0.85, 0.95) for TBRmax, TBRmean, and lesion volume, respectively. The external test set included 344 patients with gliomas (mean age, 53 years ± 15; 192 male, 152 female; high grade, n = 269). The AUC for TBRmax was 0.81 (95% CI: 0.75, 0.86), and the overall survival analysis showed a significant difference between the high (2-year survival rate, 27%) and low (2-year survival rate, 71%; P < .001) TBRmax groups. Conclusion The AI-based model generated synthetic methionine PET images that strongly correlated with real PET images and showed good performance for glioma grading and prognostication.
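A hedged sketch of the two core quantities in this study: the tumor-to-background ratio computed from a PET volume given tumor and background masks, and the AUC of TBRmax for separating high- from low-grade gliomas. The TBR values below are synthetic, with group sizes borrowed from the abstract:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def tbr_max_mean(pet, tumor_mask, bg_mask):
    """Maximum and mean tumor-to-background ratio from a PET volume."""
    bg = pet[bg_mask].mean()
    return pet[tumor_mask].max() / bg, pet[tumor_mask].mean() / bg

# Grading from TBRmax: simulated values with group sizes from the abstract
# (269 high-grade, 75 low-grade gliomas in the external test set).
rng = np.random.default_rng(0)
tbr_max = np.concatenate([rng.normal(3.5, 1.0, 269), rng.normal(2.0, 0.8, 75)])
high_grade = np.concatenate([np.ones(269), np.zeros(75)])
auc = roc_auc_score(high_grade, tbr_max)
```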
Subjects
Brain Neoplasms, Glioma, Humans, Male, Female, Middle Aged, Methionine, Brain Neoplasms/diagnostic imaging, Brain Neoplasms/pathology, Artificial Intelligence, Positron-Emission Tomography/methods, Neoplasm Grading, Glioma/diagnostic imaging, Glioma/pathology, Magnetic Resonance Imaging/methods, Racemethionine
ABSTRACT
BACKGROUND: Early diagnosis of rotator cuff tears is essential for appropriate and timely treatment. Although radiography is the most commonly used technique in clinical practice, it is difficult to accurately rule out rotator cuff tears with radiography as an initial imaging modality. Deep learning-based artificial intelligence has recently been applied in medicine, especially in diagnostic imaging. This study aimed to develop a deep learning algorithm as a screening tool for rotator cuff tears based on radiography. METHODS: We used 2803 shoulder radiographs of the true anteroposterior view to develop the deep learning algorithm. Radiographs were labeled 0 (intact or low-grade partial-thickness rotator cuff tears) or 1 (high-grade partial-thickness or full-thickness rotator cuff tears). The diagnosis of rotator cuff tears was determined based on arthroscopic findings. The diagnostic performance of the deep learning algorithm was assessed by calculating the area under the curve (AUC), sensitivity, negative predictive value (NPV), and negative likelihood ratio (LR-) on the test datasets, with a cutoff value chosen on the validation datasets to achieve high expected sensitivity. Furthermore, the diagnostic performance for each rotator cuff tear size was evaluated. RESULTS: The AUC, sensitivity, NPV, and LR- at the high-sensitivity cutoff were 0.82, 84/92 (91.3%), 102/110 (92.7%), and 0.16, respectively. For full-thickness rotator cuff tears, the sensitivity, NPV, and LR- were 69/73 (94.5%), 102/106 (96.2%), and 0.10, respectively, while the diagnostic performance for partial-thickness rotator cuff tears was lower, with a sensitivity of 15/19 (78.9%), NPV of 102/106 (96.2%), and LR- of 0.39. CONCLUSIONS: Our algorithm had high diagnostic performance for full-thickness rotator cuff tears. The deep learning algorithm based on shoulder radiography can help screen for rotator cuff tears when an appropriate cutoff value is set. LEVEL OF EVIDENCE: Level III; Diagnostic Study.
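The screening metrics reported above follow directly from a 2x2 confusion table. In the sketch below, TP, FN, and TN are taken from the abstract's fractions, while FP is back-calculated from the reported LR- of 0.16 and is therefore an inference rather than a published count:

```python
# Screening metrics at the high-sensitivity cutoff. TP, FN, and TN follow the
# abstract (sensitivity 84/92, NPV 102/110); FP = 86 is back-calculated from
# LR- = (1 - sensitivity) / specificity = 0.16 and is an inference.
tp, fn, tn, fp = 84, 8, 102, 86

sensitivity = tp / (tp + fn)                 # 0.913
specificity = tn / (tn + fp)                 # ~0.54 (inferred)
npv = tn / (tn + fn)                         # 0.927
lr_minus = (1 - sensitivity) / specificity   # ~0.16
```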
ABSTRACT
Although there is no universally accepted definition of artificial intelligence (AI), the term generally refers to a computer system with intelligence similar to that of humans. Deep learning appeared in 2006, and more than 10 years have passed since the third AI boom was triggered by improvements in computing power, algorithm development, and the use of big data. In recent years, the application and development of AI technology in the medical field have intensified internationally. There is no doubt that AI will be used in clinical practice to assist diagnostic imaging in the future. In qualitative diagnosis, it is desirable to develop explainable AI that represents at least the basis of the diagnostic process. However, it must be kept in mind that AI is a physician-assisting system and that the final decision should be made by the physician, with an understanding of the limitations of AI. The aim of this article is to review applications of AI technology in diagnostic imaging drawn from the PubMed database, focusing in particular on thoracic diagnostic imaging tasks such as lesion detection and qualitative diagnosis, in order to help radiologists and clinicians become more familiar with AI in the thorax.
Subjects
Artificial Intelligence, Deep Learning, Humans, Algorithms, Thorax, Diagnostic Imaging
ABSTRACT
This review outlines the current status and challenges of clinical applications of artificial intelligence in liver imaging using computed tomography or magnetic resonance imaging, based on a topic analysis of PubMed search results using latent Dirichlet allocation (LDA). LDA revealed that "segmentation," "hepatocellular carcinoma and radiomics," "metastasis," "fibrosis," and "reconstruction" were the main current topic keywords. Automatic liver segmentation technology using deep learning is beginning to assume new clinical significance as part of whole-body composition analysis. It has also been applied to the screening of large populations and the acquisition of training data for machine learning models and has resulted in the development of imaging biomarkers that have a significant impact on important clinical issues, such as the estimation of liver fibrosis and the recurrence and prognosis of malignant tumors. Deep learning reconstruction is expanding as a new clinical application of artificial intelligence and has shown results in reducing contrast and radiation doses. However, much evidence is still missing, such as external validation of machine learning models and evaluation of diagnostic performance for specific diseases using deep learning reconstruction, suggesting that the clinical application of these technologies is still in development.
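A minimal sketch of the LDA topic-analysis approach, with a toy corpus standing in for the PubMed search results (the topic count and texts are illustrative only):

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Toy corpus standing in for PubMed titles on AI in liver CT/MRI.
docs = [
    "deep learning liver segmentation on CT",
    "radiomics model for hepatocellular carcinoma recurrence",
    "machine learning estimation of liver fibrosis on MRI",
    "deep learning reconstruction reduces radiation and contrast dose",
    "detection of liver metastasis on contrast-enhanced CT",
]

vec = CountVectorizer(stop_words="english")
counts = vec.fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)
terms = vec.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    print(f"topic {k}:", [terms[i] for i in weights.argsort()[-3:]])
```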
Subjects
Hepatocellular Carcinoma, Liver Neoplasms, Humans, Artificial Intelligence, Hepatocellular Carcinoma/diagnostic imaging, X-Ray Computed Tomography, Liver Neoplasms/diagnostic imaging
ABSTRACT
Accurate estimation of mortality and time to death at admission for COVID-19 patients is important, and several deep learning models have been created for this task. However, there are currently no prognostic models that use end-to-end deep learning to predict time to event for admitted COVID-19 patients using chest radiographs and clinical data. We retrospectively implemented a new artificial intelligence model combining DeepSurv (a multilayer perceptron implementation of the Cox proportional hazards model) and a convolutional neural network (CNN) using 1356 COVID-19 inpatients. For comparison, we also prepared DeepSurv with clinical data only, DeepSurv with images only (CNNSurv), and Cox proportional hazards models. Clinical data and chest radiographs at admission were used to estimate patient outcome (death or discharge) and duration to the outcome. Harrell's concordance index (c-index) for the DeepSurv-with-CNN model was 0.82 (0.75-0.88), significantly higher than for DeepSurv with clinical data only (c-index = 0.77 (0.69-0.84), p = 0.011), CNNSurv (c-index = 0.70 (0.63-0.79), p = 0.001), and the Cox proportional hazards model (c-index = 0.71 (0.63-0.79), p = 0.001). These results suggest that the time-to-event prognosis model became more accurate when chest radiographs and clinical data were used together.
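DeepSurv's core is a neural network trained with the Cox negative log partial likelihood. A minimal PyTorch sketch of that loss, assuming no tied event times; Harrell's c-index can then be computed on held-out data, for example with lifelines.utils.concordance_index:

```python
import torch

def cox_ph_loss(risk, time, event):
    """Negative log partial likelihood of the Cox model, as used by DeepSurv.

    risk:  (N,) log-risk scores output by the network
    time:  (N,) time to death or discharge
    event: (N,) 1.0 if death was observed, 0.0 if censored
    Assumes no tied event times.
    """
    order = torch.argsort(time, descending=True)  # prefix sums become risk sets
    risk, event = risk[order], event[order]
    log_risk_set = torch.logcumsumexp(risk, dim=0)
    return -((risk - log_risk_set) * event).sum() / event.sum()
```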
Subjects
COVID-19, Deep Learning, Humans, Artificial Intelligence, Retrospective Studies, Radiography
ABSTRACT
OBJECTIVE: The purpose of this study was to develop an artificial intelligence (AI)-based model to detect features of atrial fibrillation (AF) on chest radiographs. METHODS: This retrospective study included consecutively collected chest radiographs of patients who had echocardiography at our institution from July 2016 to May 2019. Eligible radiographs had been acquired within 30 days of the echocardiography. These radiographs were labeled as AF-positive or AF-negative based on the associated electronic medical records; the patients were then randomly divided into training, validation, and test datasets in an 8:1:1 ratio. A deep learning-based model to classify radiographs as with or without AF was trained on the training dataset, tuned with the validation dataset, and evaluated with the test dataset. RESULTS: The training dataset included 11,105 images (5637 patients; 3145 male; mean age ± standard deviation, 68 ± 14 years), the validation dataset included 1388 images (704 patients; 397 male; 67 ± 14 years), and the test dataset included 1375 images (706 patients; 395 male; 68 ± 15 years). Applying the model to the validation and test datasets gave a respective area under the curve of 0.81 (95% confidence interval, 0.78-0.85) and 0.80 (0.76-0.84), sensitivity of 0.76 (0.70-0.81) and 0.70 (0.64-0.76), specificity of 0.75 (0.72-0.77) and 0.74 (0.72-0.77), and accuracy of 0.75 (0.72-0.77) and 0.74 (0.71-0.76). CONCLUSION: Our AI can identify AF on chest radiographs, which provides a new way for radiologists to infer AF. KEY POINTS: • A deep learning-based model was trained to detect atrial fibrillation in chest radiographs, showing that there are indicators of atrial fibrillation visible even on static images. • The validation and test datasets each gave a solid performance with area under the curve, sensitivity, and specificity of 0.81, 0.76, and 0.75, respectively, for the validation dataset, and 0.80, 0.70, and 0.74, respectively, for the test dataset. • The saliency maps highlighted anatomical areas consistent with those reported for atrial fibrillation on chest radiographs, such as the atria.
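One detail worth noting is that the 8:1:1 split was done per patient, not per image, so that radiographs of the same patient never leak across datasets. A sketch of such a grouped split with scikit-learn; the image-to-patient mapping below is simulated:

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Hypothetical image-to-patient mapping; what matters is splitting by group.
rng = np.random.default_rng(0)
patient_ids = rng.integers(0, 7000, size=13868)
X = np.zeros(len(patient_ids))  # placeholder features

# ~80% of patients to training, then the remaining ~20% split 1:1.
train_idx, rest_idx = next(
    GroupShuffleSplit(test_size=0.2, random_state=0).split(X, groups=patient_ids)
)
val_rel, test_rel = next(
    GroupShuffleSplit(test_size=0.5, random_state=0).split(
        X[rest_idx], groups=patient_ids[rest_idx]
    )
)
val_idx, test_idx = rest_idx[val_rel], rest_idx[test_rel]
```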
Subjects
Artificial Intelligence, Atrial Fibrillation, Deep Learning, Aged, Aged 80 and Over, Atrial Fibrillation/diagnostic imaging, Female, Humans, Male, Middle Aged, Radiography, Thoracic Radiography/methods, Retrospective Studies
ABSTRACT
PURPOSE: To develop a deep learning (DL) model to generate synthetic, 2-dimensional subtraction angiograms free of artifacts from native abdominal angiograms. MATERIALS AND METHODS: In this retrospective study, 2-dimensional digital subtraction angiography (2D-DSA) images and native angiograms were consecutively collected from July 2019 to March 2020. Images were divided into motion-free (training, validation, and motion-free test datasets) and motion-artifact (motion-artifact test dataset) sets. A total of 3,185, 393, 383, and 345 images from 87 patients (mean age, 71 years ± 10; 64 men and 23 women) were included in the training, validation, motion-free, and motion-artifact test datasets, respectively. Native angiograms and 2D-DSA image pairs were used to train and validate an image-to-image translation model to generate synthetic DL-based subtraction angiography (DLSA) images. DLSA images were quantitatively evaluated by the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) using the motion-free dataset and were qualitatively evaluated via visual assessments by radiologists with a numerical rating scale using the motion-artifact dataset. RESULTS: The DLSA images showed a mean PSNR (± standard deviation) of 43.05 dB ± 3.65 and mean SSIM of 0.98 ± 0.01, indicating high agreement with the original 2D-DSA images in the motion-free dataset. Qualitative visual evaluation by radiologists of the motion-artifact dataset showed that DLSA images contained fewer motion artifacts than 2D-DSA images. Additionally, DLSA images scored similar to or higher than 2D-DSA images for vascular visualization and clinical usefulness. CONCLUSIONS: The developed DL model generated synthetic, motion-free subtraction images from abdominal angiograms with similar imaging characteristics to 2D-DSA images.
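PSNR and SSIM, used here for the motion-free quantitative comparison, are standard reference-based image metrics. A minimal sketch with scikit-image; the arrays below are synthetic stand-ins for a generated subtraction image and its 2D-DSA reference:

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

# Synthetic stand-ins for a DLSA image and its reference 2D-DSA frame.
rng = np.random.default_rng(0)
dsa = rng.random((512, 512))
dlsa = np.clip(dsa + rng.normal(0, 0.01, dsa.shape), 0, 1)

psnr = peak_signal_noise_ratio(dsa, dlsa, data_range=1.0)  # in dB
ssim = structural_similarity(dsa, dlsa, data_range=1.0)    # in [0, 1]
```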
Subjects
Deep Learning, Aged, Digital Subtraction Angiography/methods, Artifacts, Female, Humans, Male, Retrospective Studies, Signal-to-Noise Ratio
ABSTRACT
Background Digital subtraction angiography (DSA) generates an image by subtracting a mask image from a dynamic angiogram. However, misregistration artifacts caused by patient movement can result in unclear DSA images that interrupt procedures. Purpose To train and validate a deep learning (DL)-based model to produce DSA-like cerebral angiograms directly from dynamic angiograms and then to evaluate these angiograms quantitatively and visually for clinical usefulness. Materials and Methods A retrospective model development and validation study was conducted on dynamic and DSA image pairs consecutively collected from January 2019 through April 2019. Angiograms showing misregistration were first separated per patient by two radiologists and sorted into the misregistration test data set. Nonmisregistration angiograms were divided into development and external test data sets at a ratio of 8:1 per patient. The development data set was divided into training and validation data sets at a ratio of 3:1 per patient. The DL model was created by using the training data set, tuned with the validation data set, and then evaluated quantitatively with the external test data set and visually with the misregistration test data set. Quantitative evaluations used the peak signal-to-noise ratio (PSNR) and the structural similarity (SSIM) with mixed linear models. Visual evaluation was conducted by using a numerical rating scale. Results The training, validation, nonmisregistration test, and misregistration test data sets included 10,751, 2784, 1346, and 711 paired images collected from 40 patients (mean age, 62 years ± 11 [standard deviation]; 33 women). In the quantitative evaluation, DL-generated angiograms showed a mean PSNR value of 40.2 dB ± 4.05 and a mean SSIM value of 0.97 ± 0.02, indicating high coincidence with the paired DSA images. In the visual evaluation, the median ratings of the DL-generated angiograms were similar to or better than those of the original DSA images for all 24 sequences. Conclusion The deep learning-based model provided clinically useful cerebral angiograms free from clinically significant artifacts directly from dynamic angiograms.
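Because many frames come from each patient, the quantitative comparison used mixed linear models rather than treating frames as independent. A hedged sketch of a random-intercept-per-patient model with statsmodels; all values are simulated:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated per-frame PSNR with a patient-level random effect.
rng = np.random.default_rng(0)
n_patients, frames = 40, 30
patient_effect = rng.normal(0, 2, n_patients)
df = pd.DataFrame({
    "patient": np.repeat(np.arange(n_patients), frames),
    "psnr": 40
    + np.repeat(patient_effect, frames)
    + rng.normal(0, 3, n_patients * frames),
})

# Random intercept per patient: frames within a patient are correlated.
fit = smf.mixedlm("psnr ~ 1", df, groups=df["patient"]).fit()
```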
Subjects
Cerebral Angiography, Deep Learning, Image Enhancement/methods, Adult, Aged, Aged 80 and Over, Digital Subtraction Angiography, Artifacts, Female, Humans, Computer-Assisted Image Processing/methods, Male, Middle Aged, Retrospective Studies, Signal-to-Noise Ratio
ABSTRACT
BACKGROUND: We investigated the performance improvement of physicians with varying levels of chest radiology experience when using commercially available artificial intelligence (AI)-based computer-assisted detection (CAD) software to detect lung cancer nodules on chest radiographs from multiple vendors. METHODS: Chest radiographs and their corresponding chest CT scans were retrospectively collected from one institution between July 2017 and June 2018. Two author radiologists annotated pathologically proven lung cancer nodules on the chest radiographs while referencing CT. Eighteen readers (nine general physicians and nine radiologists) from nine institutions interpreted the chest radiographs. The readers first interpreted the radiographs alone and then reinterpreted them while referencing the CAD output. Suspected nodules were enclosed with a bounding box. These bounding boxes were judged correct if there was significant overlap with the ground truth, specifically, if the intersection over union was 0.3 or higher. The sensitivity, specificity, accuracy, PPV, and NPV of the readers' assessments were calculated. RESULTS: In total, 312 chest radiographs were collected as a test dataset, including 59 malignant images (59 lung cancer nodules) and 253 normal images. The model provided a modest boost to readers' sensitivity, particularly helping general physicians. The performance of general physicians improved from 0.47 to 0.60 for sensitivity, from 0.96 to 0.97 for specificity, from 0.87 to 0.90 for accuracy, from 0.75 to 0.82 for PPV, and from 0.89 to 0.91 for NPV, while the performance of radiologists improved from 0.51 to 0.60 for sensitivity, from 0.96 to 0.96 for specificity, from 0.87 to 0.90 for accuracy, from 0.76 to 0.80 for PPV, and from 0.89 to 0.91 for NPV. The overall ratios of improvement in sensitivity, specificity, accuracy, PPV, and NPV with the CAD were 1.22 (1.14-1.30), 1.00 (1.00-1.01), 1.03 (1.02-1.04), 1.07 (1.03-1.11), and 1.02 (1.01-1.03), respectively. CONCLUSION: The AI-based CAD improved the ability of physicians to detect lung cancer nodules on chest radiographs. A CAD model can indicate regions physicians may have overlooked during their initial assessment.
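The bounding-box matching criterion (intersection over union of 0.3 or higher) is easy to make concrete. A minimal sketch with hypothetical box coordinates:

```python
def iou(box_a, box_b):
    """Intersection over union for boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# A reader's box counts as correct when IoU with ground truth is >= 0.3.
hit = iou((100, 100, 180, 190), (110, 105, 175, 200)) >= 0.3  # True here
```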
Subjects
Lung Neoplasms/diagnostic imaging, Computer-Assisted Radiographic Image Interpretation/methods, Thoracic Radiography/methods, Solitary Pulmonary Nodule/diagnostic imaging, Adult, Aged, Aged 80 and Over, Deep Learning, Female, General Practitioners, Humans, Lung/diagnostic imaging, Male, Middle Aged, Radiologists, Retrospective Studies, Sensitivity and Specificity
ABSTRACT
Purpose To develop and evaluate a supportive algorithm using deep learning for detecting cerebral aneurysms at time-of-flight MR angiography to provide a second assessment of images already interpreted by radiologists. Materials and Methods MR images reported by radiologists to contain aneurysms were extracted from four institutions for the period from November 2006 through October 2017. The images were divided into three data sets: a training data set, an internal test data set, and an external test data set. The algorithm was constructed by deep learning with the training data set, and its sensitivity for detecting aneurysms in the test data sets was evaluated. To find aneurysms that had been overlooked in the initial reports, two radiologists independently performed a blinded interpretation of aneurysm candidates detected by the algorithm. When there was disagreement, the final diagnosis was made in consensus. The number of newly detected aneurysms was also evaluated. Results The training data set, which provided training and validation data, included 748 aneurysms (mean size, 3.1 mm ± 2.0 [standard deviation]) from 683 examinations; 318 of these examinations were on male patients (mean age, 63 years ± 13) and 365 were on female patients (mean age, 64 years ± 13). Test data were provided by the internal test data set (649 aneurysms [mean size, 4.1 mm ± 3.2] in 521 examinations, including 177 male patients and 344 female patients with mean ages of 66 years ± 12 and 67 years ± 13, respectively) and the external test data set (80 aneurysms [mean size, 4.1 mm ± 2.1] in 67 examinations, including 19 male patients and 48 female patients with mean ages of 63 years ± 12 and 68 years ± 12, respectively). The sensitivity was 91% (592 of 649) and 93% (74 of 80) for the internal and external test data sets, respectively. The algorithm improved aneurysm detection in the internal and external test data sets by 4.8% (31 of 649) and 13% (10 of 80), respectively, compared with the initial reports. Conclusion The deep learning algorithm detected cerebral aneurysms with high sensitivity and improved detection compared with the initial radiology reports.