Pesquisa | Portal Regional da BVS

1.

Diagnostic performances of Claude 3 Opus and Claude 3.5 Sonnet from patient history and key images in Radiology's "Diagnosis Please" cases.

Kurokawa, Ryo; Ohizumi, Yuji; Kanzawa, Jun; Kurokawa, Mariko; Sonoda, Yuki; Nakamura, Yuta; Kiguchi, Takao; Gonoi, Wataru; Abe, Osamu.

Jpn J Radiol ; 2024 Aug 03.

Artigo em Inglês | MEDLINE | ID: mdl-39096483

RESUMO

PURPOSE: The diagnostic performance of large language artificial intelligence (AI) models when utilizing radiological images has yet to be investigated. We employed Claude 3 Opus (released on March 4, 2024) and Claude 3.5 Sonnet (released on June 21, 2024) to investigate their diagnostic performances in response to the Radiology's Diagnosis Please quiz questions. MATERIALS AND METHODS: In this study, the AI models were tasked with listing the primary diagnosis and two differential diagnoses for 322 quiz questions from Radiology's "Diagnosis Please" cases, which included cases 1 to 322, published from 1998 to 2023. The analyses were performed under the following conditions: (1) Condition 1: submitter-provided clinical history (text) alone. (2) Condition 2: submitter-provided clinical history and imaging findings (text). (3) Condition 3: clinical history (text) and key images (PNG file). We applied McNemar's test to evaluate differences in the correct response rates for the overall accuracy under Conditions 1, 2, and 3 for each model and between the models. RESULTS: The correct diagnosis rates were 58/322 (18.0%) and 69/322 (21.4%), 201/322 (62.4%) and 209/322 (64.9%), and 80/322 (24.8%) and 97/322 (30.1%) for Conditions 1, 2, and 3 for Claude 3 Opus and Claude 3.5 Sonnet, respectively. The models provided the correct answer as a differential diagnosis in up to 26/322 (8.1%) for Opus and 23/322 (7.1%) for Sonnet. Statistically significant differences were observed in the correct response rates among all combinations of Conditions 1, 2, and 3 for each model (p < 0.01). Claude 3.5 Sonnet outperformed in all conditions, but a statistically significant difference was observed only in the comparison for Condition 3 (30.1% vs. 24.8%, p = 0.028). CONCLUSION: Two AI models demonstrated a significantly improved diagnostic performance when inputting both key images and clinical history. The models' ability to identify important differential diagnoses under these conditions was also confirmed.

2.

Effect of deep learning reconstruction on the assessment of pancreatic cystic lesions using computed tomography.

Kanzawa, Jun; Yasaka, Koichiro; Ohizumi, Yuji; Morita, Yuichi; Kurokawa, Mariko; Abe, Osamu.

Radiol Phys Technol ; 2024 Aug 15.

Artigo em Inglês | MEDLINE | ID: mdl-39147953

RESUMO

This study aimed to compare the image quality and detection performance of pancreatic cystic lesions between computed tomography (CT) images reconstructed by deep learning reconstruction (DLR) and filtered back projection (FBP). This retrospective study included 54 patients (mean age: 67.7 ± 13.1) who underwent contrast-enhanced CT from May 2023 to August 2023. Among eligible patients, 30 and 24 were positive and negative for pancreatic cystic lesions, respectively. DLR and FBP were used to reconstruct portal venous phase images. Objective image quality analyses calculated quantitative image noise, signal-to-noise ratio (SNR), and contrast-to-noise ratio (CNR) using regions of interest on the abdominal aorta, pancreatic lesion, and pancreatic parenchyma. Three blinded radiologists performed subjective image quality assessment and lesion detection tests. Lesion depiction, normal structure illustration, subjective image noise, and overall image quality were utilized as subjective image quality indicators. DLR significantly reduced quantitative image noise compared with FBP (p < 0.001). SNR and CNR were significantly improved in DLR compared with FBP (p < 0.001). Three radiologists rated significantly higher scores for DLR in all subjective image quality indicators (p ≤ 0.029). Performance of DLR and FBP were comparable in lesion detection, with no statistically significant differences in the area under the receiver operating characteristic curve, sensitivity, specificity and accuracy. DLR reduced image noise and improved image quality with a clearer depiction of pancreatic structures. These improvements may have a positive effect on evaluating pancreatic cystic lesions, which can contribute to appropriate management of these lesions.

3.

The Fine-Tuned Large Language Model for Extracting the Progressive Bone Metastasis from Unstructured Radiology Reports.

Kanemaru, Noriko; Yasaka, Koichiro; Fujita, Nana; Kanzawa, Jun; Abe, Osamu.

J Imaging Inform Med ; 2024 Aug 26.

Artigo em Inglês | MEDLINE | ID: mdl-39187702

RESUMO

Early detection of patients with impending bone metastasis is crucial for prognosis improvement. This study aimed to investigate the feasibility of a fine-tuned, locally run large language model (LLM) in extracting patients with bone metastasis in unstructured Japanese radiology report and to compare its performance with manual annotation. This retrospective study included patients with "metastasis" in radiological reports (April 2018-January 2019, August-May 2022, and April-December 2023 for training, validation, and test datasets of 9559, 1498, and 7399 patients, respectively). Radiologists reviewed the clinical indication and diagnosis sections of the radiological report (used as input data) and classified them into groups 0 (no bone metastasis), 1 (progressive bone metastasis), and 2 (stable or decreased bone metastasis). The data for group 0 was under-sampled in training and test datasets due to group imbalance. The best-performing model from the validation set was subsequently tested using the testing dataset. Two additional radiologists (readers 1 and 2) were involved in classifying radiological reports within the test dataset for testing purposes. The fine-tuned LLM, reader 1, and reader 2 demonstrated an accuracy of 0.979, 0.996, and 0.993, sensitivity for groups 0/1/2 of 0.988/0.947/0.943, 1.000/1.000/0.966, and 1.000/0.982/0.954, and time required for classification (s) of 105, 2312, and 3094 in under-sampled test dataset (n = 711), respectively. Fine-tuned LLM extracted patients with bone metastasis, demonstrating satisfactory performance that was comparable to or slightly lower than manual annotation by radiologists in a noticeably shorter time.

4.

Automated classification of brain MRI reports using fine-tuned large language models.

Kanzawa, Jun; Yasaka, Koichiro; Fujita, Nana; Fujiwara, Shin; Abe, Osamu.

Neuroradiology ; 2024 Jul 12.

Artigo em Inglês | MEDLINE | ID: mdl-38995393

RESUMO

PURPOSE: This study aimed to investigate the efficacy of fine-tuned large language models (LLM) in classifying brain MRI reports into pretreatment, posttreatment, and nontumor cases. METHODS: This retrospective study included 759, 284, and 164 brain MRI reports for training, validation, and test dataset. Radiologists stratified the reports into three groups: nontumor (group 1), posttreatment tumor (group 2), and pretreatment tumor (group 3) cases. A pretrained Bidirectional Encoder Representations from Transformers Japanese model was fine-tuned using the training dataset and evaluated on the validation dataset. The model which demonstrated the highest accuracy on the validation dataset was selected as the final model. Two additional radiologists were involved in classifying reports in the test datasets for the three groups. The model's performance on test dataset was compared to that of two radiologists. RESULTS: The fine-tuned LLM attained an overall accuracy of 0.970 (95% CI: 0.930-0.990). The model's sensitivity for group 1/2/3 was 1.000/0.864/0.978. The model's specificity for group1/2/3 was 0.991/0.993/0.958. No statistically significant differences were found in terms of accuracy, sensitivity, and specificity between the LLM and human readers (p ≥ 0.371). The LLM completed the classification task approximately 20-26-fold faster than the radiologists. The area under the receiver operating characteristic curve for discriminating groups 2 and 3 from group 1 was 0.994 (95% CI: 0.982-1.000) and for discriminating group 3 from groups 1 and 2 was 0.992 (95% CI: 0.982-1.000). CONCLUSION: Fine-tuned LLM demonstrated a comparable performance with radiologists in classifying brain MRI reports, while requiring substantially less time.

5.

Diagnostic performances of GPT-4o, Claude 3 Opus, and Gemini 1.5 Pro in "Diagnosis Please" cases.

Sonoda, Yuki; Kurokawa, Ryo; Nakamura, Yuta; Kanzawa, Jun; Kurokawa, Mariko; Ohizumi, Yuji; Gonoi, Wataru; Abe, Osamu.

Jpn J Radiol ; 2024 Jul 01.

Artigo em Inglês | MEDLINE | ID: mdl-38954192

RESUMO

PURPOSE: Large language models (LLMs) are rapidly advancing and demonstrating high performance in understanding textual information, suggesting potential applications in interpreting patient histories and documented imaging findings. As LLMs continue to improve, their diagnostic abilities are expected to be enhanced further. However, there is a lack of comprehensive comparisons between LLMs from different manufacturers. In this study, we aimed to test the diagnostic performance of the three latest major LLMs (GPT-4o, Claude 3 Opus, and Gemini 1.5 Pro) using Radiology Diagnosis Please Cases, a monthly diagnostic quiz series for radiology experts. MATERIALS AND METHODS: Clinical history and imaging findings, provided textually by the case submitters, were extracted from 324 quiz questions originating from Radiology Diagnosis Please cases published between 1998 and 2023. The top three differential diagnoses were generated by GPT-4o, Claude 3 Opus, and Gemini 1.5 Pro, using their respective application programming interfaces. A comparative analysis of diagnostic performance among these three LLMs was conducted using Cochrane's Q and post hoc McNemar's tests. RESULTS: The respective diagnostic accuracies of GPT-4o, Claude 3 Opus, and Gemini 1.5 Pro for primary diagnosis were 41.0%, 54.0%, and 33.9%, which further improved to 49.4%, 62.0%, and 41.0%, when considering the accuracy of any of the top three differential diagnoses. Significant differences in the diagnostic performance were observed among all pairs of models. CONCLUSION: Claude 3 Opus outperformed GPT-4o and Gemini 1.5 Pro in solving radiology quiz cases. These models appear capable of assisting radiologists when supplied with accurate evaluations and worded descriptions of imaging findings.

6.

Fine-Tuned Large Language Model for Extracting Patients on Pretreatment for Lung Cancer from a Picture Archiving and Communication System Based on Radiological Reports.

Yasaka, Koichiro; Kanzawa, Jun; Kanemaru, Noriko; Koshino, Saori; Abe, Osamu.

J Imaging Inform Med ; 2024 Jul 02.

Artigo em Inglês | MEDLINE | ID: mdl-38955964

RESUMO

This study aimed to investigate the performance of a fine-tuned large language model (LLM) in extracting patients on pretreatment for lung cancer from picture archiving and communication systems (PACS) and comparing it with that of radiologists. Patients whose radiological reports contained the term lung cancer (3111 for training, 124 for validation, and 288 for test) were included in this retrospective study. Based on clinical indication and diagnosis sections of the radiological report (used as input data), they were classified into four groups (used as reference data): group 0 (no lung cancer), group 1 (pretreatment lung cancer present), group 2 (after treatment for lung cancer), and group 3 (planning radiation therapy). Using the training and validation datasets, fine-tuning of the pretrained LLM was conducted ten times. Due to group imbalance, group 2 data were undersampled in the training. The performance of the best-performing model in the validation dataset was assessed in the independent test dataset. For testing purposes, two other radiologists (readers 1 and 2) were also involved in classifying radiological reports. The overall accuracy of the fine-tuned LLM, reader 1, and reader 2 was 0.983, 0.969, and 0.969, respectively. The sensitivity for differentiating group 0/1/2/3 by LLM, reader 1, and reader 2 was 1.000/0.948/0.991/1.000, 0.750/0.879/0.996/1.000, and 1.000/0.931/0.978/1.000, respectively. The time required for classification by LLM, reader 1, and reader 2 was 46s/2539s/1538s, respectively. Fine-tuned LLM effectively extracted patients on pretreatment for lung cancer from PACS with comparable performance to radiologists in a shorter time.

7.

New liver window width in detecting hepatocellular carcinoma on dynamic contrast-enhanced computed tomography with deep learning reconstruction.

Okimoto, Naomasa; Yasaka, Koichiro; Cho, Shinichi; Koshino, Saori; Kanzawa, Jun; Asari, Yusuke; Fujita, Nana; Kubo, Takatoshi; Suzuki, Yuichi; Abe, Osamu.

Radiol Phys Technol ; 17(3): 658-665, 2024 Sep.

Artigo em Inglês | MEDLINE | ID: mdl-38837119

RESUMO

Changing a window width (WW) alters appearance of noise and contrast of CT images. The aim of this study was to investigate the impact of adjusted WW for deep learning reconstruction (DLR) in detecting hepatocellular carcinomas (HCCs) on CT with DLR. This retrospective study included thirty-five patients who underwent abdominal dynamic contrast-enhanced CT. DLR was used to reconstruct arterial, portal, and delayed phase images. The investigation of the optimal WW involved two blinded readers. Then, five other blinded readers independently read the image sets for detection of HCCs and evaluation of image quality with optimal or conventional liver WW. The optimal WW for detection of HCC was 119 (rounded to 120 in the subsequent analyses) Hounsfield unit (HU), which was the average of adjusted WW in the arterial, portal, and delayed phases. The average figures of merit for the readers for the jackknife alternative free-response receiver operating characteristic analysis to detect HCC were 0.809 (reader 1/2/3/4/5, 0.765/0.798/0.892/0.764/0.827) in the optimal WW (120 HU) and 0.765 (reader 1/2/3/4/5, 0.707/0.769/0.838/0.720/0.791) in the conventional WW (150 HU), and statistically significant difference was observed between them (p < 0.001). Image quality in the optimal WW was superior to those in the conventional WW, and significant difference was seen for some readers (p < 0.041). The optimal WW for detection of HCC was narrower than conventional WW on dynamic contrast-enhanced CT with DLR. Compared with the conventional liver WW, optimal liver WW significantly improved detection performance of HCC.

Assuntos

Carcinoma Hepatocelular , Meios de Contraste , Aprendizado Profundo , Processamento de Imagem Assistida por Computador , Neoplasias Hepáticas , Tomografia Computadorizada por Raios X , Humanos , Carcinoma Hepatocelular/diagnóstico por imagem , Neoplasias Hepáticas/diagnóstico por imagem , Masculino , Feminino , Tomografia Computadorizada por Raios X/métodos , Pessoa de Meia-Idade , Idoso , Estudos Retrospectivos , Processamento de Imagem Assistida por Computador/métodos , Fígado/diagnóstico por imagem , Idoso de 80 Anos ou mais , Adulto

8.

Super-resolution Deep Learning Reconstruction for 3D Brain MR Imaging: Improvement of Cranial Nerve Depiction and Interobserver Agreement in Evaluations of Neurovascular Conflict.

Yasaka, Koichiro; Kanzawa, Jun; Nakaya, Moto; Kurokawa, Ryo; Tajima, Taku; Akai, Hiroyuki; Yoshioka, Naoki; Akahane, Masaaki; Ohtomo, Kuni; Abe, Osamu; Kiryu, Shigeru.

Acad Radiol ; 2024 Jun 18.

Artigo em Inglês | MEDLINE | ID: mdl-38897913

RESUMO

RATIONALE AND OBJECTIVES: To determine if super-resolution deep learning reconstruction (SR-DLR) improves the depiction of cranial nerves and interobserver agreement when assessing neurovascular conflict in 3D fast asymmetric spin echo (3D FASE) brain MR images, as compared to deep learning reconstruction (DLR). MATERIALS AND METHODS: This retrospective study involved reconstructing 3D FASE MR images of the brain for 37 patients using SR-DLR and DLR. Three blinded readers conducted qualitative image analyses, evaluating the degree of neurovascular conflict, structure depiction, sharpness, noise, and diagnostic acceptability. Quantitative analyses included measuring edge rise distance (ERD), edge rise slope (ERS), and full width at half maximum (FWHM) using the signal intensity profile along a linear region of interest across the center of the basilar artery. RESULTS: Interobserver agreement on the degree of neurovascular conflict of the facial nerve was generally higher with SR-DLR (0.429-0.923) compared to DLR (0.175-0.689). SR-DLR exhibited increased subjective image noise compared to DLR (p ≥ 0.008). However, all three readers found SR-DLR significantly superior in terms of sharpness (p < 0.001); cranial nerve depiction, particularly of facial and acoustic nerves, as well as the osseous spiral lamina (p < 0.001); and diagnostic acceptability (p ≤ 0.002). The FWHM (mm)/ERD (mm)/ERS (mm-1) for SR-DLR and DLR was 3.1-4.3/0.9-1.1/8795.5-10,703.5 and 3.3-4.8/1.4-2.1/5157.9-7705.8, respectively, with SR-DLR's image sharpness being significantly superior (p ≤ 0.001). CONCLUSION: SR-DLR enhances image sharpness, leading to improved cranial nerve depiction and a tendency for greater interobserver agreement regarding facial nerve neurovascular conflict.

9.

Deep learning reconstruction for improving the visualization of acute brain infarct on computed tomography.

Okimoto, Naomasa; Yasaka, Koichiro; Fujita, Nana; Watanabe, Yusuke; Kanzawa, Jun; Abe, Osamu.

Neuroradiology ; 66(1): 63-71, 2024 Jan.

Artigo em Inglês | MEDLINE | ID: mdl-37991522

RESUMO

PURPOSE: This study aimed to investigate the impact of deep learning reconstruction (DLR) on acute infarct depiction compared with hybrid iterative reconstruction (Hybrid IR). METHODS: This retrospective study included 29 (75.8 ± 13.2 years, 20 males) and 26 (64.4 ± 12.4 years, 18 males) patients with and without acute infarction, respectively. Unenhanced head CT images were reconstructed with DLR and Hybrid IR. In qualitative analyses, three readers evaluated the conspicuity of lesions based on five regions and image quality. A radiologist placed regions of interest on the lateral ventricle, putamen, and white matter in quantitative analyses, and the standard deviation of CT attenuation (i.e., quantitative image noise) was recorded. RESULTS: Conspicuity of acute infarct in DLR was superior to that in Hybrid IR, and a statistically significant difference was observed for two readers (p ≤ 0.038). Conspicuity of acute infarct with time from onset to CT imaging at < 24 h in DLR was significantly improved compared with Hybrid IR for all readers (p ≤ 0.020). Image noise in DLR was significantly reduced compared with Hybrid IR with both the qualitative and quantitative analyses (p < 0.001 for all). CONCLUSION: DLR in head CT helped improve acute infarct depiction, especially those with time from onset to CT imaging at < 24 h.

Assuntos

Aprendizado Profundo , Masculino , Humanos , Estudos Retrospectivos , Infarto Encefálico , Encéfalo , Tomografia Computadorizada por Raios X , Interpretação de Imagem Radiográfica Assistida por Computador , Doses de Radiação , Algoritmos

10.

Massive true thymic hyperplasia with osseous metaplasia.

Kanzawa, Jun; Matsuki, Mitsuru; Kano, Shintaro; Nakamata, Akihiro; Nakata, Waka; Furukawa, Rieko; Baba, Katsuhisa; Ono, Shigeru; Mori, Harushi.

Radiol Case Rep ; 18(6): 2307-2310, 2023 Jun.

Artigo em Inglês | MEDLINE | ID: mdl-37153480

RESUMO

True thymic hyperplasia is defined as an increase in both the size and weight of the gland, while maintaining a normal microscopic architecture. Massive true thymic hyperplasia is a rare type of hyperplasia that compresses adjacent structures and causes various symptoms. Limited reports address the imaging findings of massive true thymic hyperplasia. Herein, we report a case of massive true thymic hyperplasia in a 3-year-old girl with no remarkable medical history. Contrast-enhanced CT revealed an anterior mediastinal mass with a bilobed configuration containing punctate and linear calcifications in curvilinear septa, which corresponded to lamellar bone deposits in the interlobular septa. To our knowledge, this is the first report of massive true thymic hyperplasia with osseous metaplasia. We also discuss the imaging features and etiology of massive true thymic hyperplasia with osseous metaplasia.

RESUMO

RESUMO

RESUMO

RESUMO

RESUMO

RESUMO

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA