1.
Jpn J Radiol; 42(8): 918-926, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38733472

ABSTRACT

PURPOSE: To assess the performance of GPT-4 Turbo with Vision (GPT-4TV), OpenAI's latest multimodal large language model, by comparing its ability to process both text and image inputs with that of the text-only GPT-4 Turbo (GPT-4 T) in the context of the Japan Diagnostic Radiology Board Examination (JDRBE). MATERIALS AND METHODS: The dataset comprised questions from JDRBE 2021 and 2023. Six board-certified diagnostic radiologists discussed the questions and provided ground-truth answers, consulting relevant literature as necessary. The following questions were excluded: those lacking associated images, those without unanimous agreement on answers, and those including images rejected by the OpenAI application programming interface. The inputs for GPT-4TV included both text and images, whereas those for GPT-4 T were text only. Both models were run on the dataset, and their performance was compared using McNemar's exact test. The radiological credibility of the responses was assessed by two diagnostic radiologists, who assigned legitimacy scores on a five-point Likert scale. These scores were then used to compare model performance using Wilcoxon's signed-rank test. RESULTS: The dataset comprised 139 questions. GPT-4TV correctly answered 62 questions (45%), whereas GPT-4 T correctly answered 57 questions (41%). Statistical analysis found no significant performance difference between the two models (P = 0.44). The GPT-4TV responses received significantly lower legitimacy scores from both radiologists than the GPT-4 T responses. CONCLUSION: No significant improvement in accuracy was observed when GPT-4TV was given image input compared with the text-only GPT-4 T on JDRBE questions.
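The paired design described here, two models answering the same 139 questions and two raters scoring the same responses, is exactly what McNemar's exact test and Wilcoxon's signed-rank test are built for. Below is a minimal sketch of how such a comparison might be run in Python with statsmodels and SciPy; all data are randomly generated placeholders, not the study's results.

```python
# Illustrative sketch: paired comparison of two models on the same questions
# (hypothetical data, not the study's results).
import numpy as np
from scipy.stats import wilcoxon
from statsmodels.stats.contingency_tables import mcnemar

rng = np.random.default_rng(0)
n = 139  # number of questions in the study

# Hypothetical per-question correctness (1 = correct) for each model.
gpt4tv_correct = rng.integers(0, 2, n)
gpt4t_correct = rng.integers(0, 2, n)

# 2x2 table of paired outcomes: rows = GPT-4TV, columns = GPT-4 T.
table = np.zeros((2, 2), dtype=int)
for a, b in zip(gpt4tv_correct, gpt4t_correct):
    table[a, b] += 1

# McNemar's exact test uses only the discordant cells (model A right / B wrong
# and vice versa), which is what makes it appropriate for paired accuracy.
result = mcnemar(table, exact=True)
print(f"McNemar exact p-value: {result.pvalue:.3f}")

# Hypothetical five-point legitimacy scores from one radiologist, compared
# per question with Wilcoxon's signed-rank test.
scores_tv = rng.integers(1, 6, n)
scores_t = rng.integers(1, 6, n)
stat, p = wilcoxon(scores_tv, scores_t)
print(f"Wilcoxon signed-rank p-value: {p:.3f}")
```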


Subject(s)
Radiology, Humans, Japan, Radiology/education, Specialty Boards, Clinical Competence, Educational Measurement/methods
2.
Sci Rep; 14(1): 1672, 2024 Jan 19.
Article in English | MEDLINE | ID: mdl-38243054

ABSTRACT

Numerous COVID-19 diagnostic imaging artificial intelligence (AI) studies exist. However, none of their models has been of potential clinical use, primarily owing to methodological defects and a lack of implementation considerations for inference. In this study, all development processes for the deep-learning models followed the strict criteria of the "KAIZEN checklist", which we propose based on previous AI development guidelines to overcome the deficiencies mentioned above. We developed and evaluated two binary-classification deep-learning models to triage COVID-19: a slice model that examines a computed tomography (CT) slice to find COVID-19 lesions, and a series model that examines a series of CT images to find an infected patient. We collected 2,400,200 CT slices from twelve emergency centers in Japan. Area under the curve (AUC) and accuracy were calculated to measure classification performance, and the inference time of the system comprising the two models was measured. On validation data, the slice and series models recognized COVID-19 with AUCs of 0.989 and 0.982 and accuracies of 95.9% and 93.0%, respectively. On test data, the models' AUCs were 0.958 and 0.953 and their accuracies 90.0% and 91.4%, respectively. The average inference time per case was 2.83 s. Our deep-learning system achieves accuracy and inference speed high enough for practical use. The system has already been implemented in four hospitals, with deployment at eight more in progress. We have released the application software and implementation code for free in a highly usable state to allow its use in Japan and globally.
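As a rough illustration of the evaluation described above, AUC, accuracy, and per-case inference time for a binary triage model, here is a minimal sketch assuming scikit-learn. The model, data, and threshold are placeholders, not the authors' released code.

```python
# Minimal sketch: AUC, accuracy, and per-case inference time for a binary
# COVID-19 triage classifier (placeholder model and data, not the authors' code).
import time
import numpy as np
from sklearn.metrics import roc_auc_score, accuracy_score

def evaluate(predict_proba, cases, labels, threshold=0.5):
    """predict_proba: callable mapping one case to a COVID-19 probability."""
    start = time.perf_counter()
    probs = np.array([predict_proba(c) for c in cases])
    elapsed = time.perf_counter() - start

    auc = roc_auc_score(labels, probs)               # threshold-free ranking metric
    acc = accuracy_score(labels, (probs >= threshold).astype(int))
    return auc, acc, elapsed / len(cases)            # seconds per case

# Demo with a dummy "model" and synthetic labels.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, 200)
cases = labels + rng.normal(0, 0.6, 200)  # stand-in for CT series features
auc, acc, sec_per_case = evaluate(
    lambda c: 1 / (1 + np.exp(-4 * (c - 0.5))), cases, labels)
print(f"AUC={auc:.3f}  accuracy={acc:.1%}  {sec_per_case * 1000:.2f} ms/case")
```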


Subject(s)
COVID-19, Deep Learning, Humans, COVID-19/diagnostic imaging, Artificial Intelligence, Tomography, X-Ray Computed/methods, Software, COVID-19 Testing
3.
Chem Commun (Camb); 58(31): 4837-4840, 2022 Apr 14.
Article in English | MEDLINE | ID: mdl-35297931

ABSTRACT

CO2 conversion to CO by the reverse water-gas shift reaction using chemical looping (RWGS-CL) can be conducted at lower temperatures (ca. 723-823 K) than the conventional catalytic RWGS (>973 K), and has been attracting attention as an efficient process for CO production from CO2. In this study, Co-In2O3 was developed as an oxygen storage material (OSM) that enables an efficient RWGS-CL process. Co-In2O3 showed a higher CO2 splitting rate in the mid-temperature range (723-823 K) than previously reported materials and exhibited high durability over redox cycles. Importantly, the maximum CO2 conversion in the CO2 splitting step (ca. 80%) was much higher than the equilibrium conversion of catalytic RWGS in the mid-temperature range, indicating that Co-In2O3 is a suitable OSM for the RWGS-CL process.
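The comparison against catalytic equilibrium can be sanity-checked with a simple van 't Hoff estimate. The sketch below assumes an equimolar CO2/H2 feed and approximate textbook values for the RWGS reaction enthalpy and entropy; neither assumption comes from the paper.

```python
# Back-of-the-envelope equilibrium conversion for catalytic RWGS
# (CO2 + H2 <-> CO + H2O) with an equimolar feed.
# DH and DS are approximate textbook values, NOT taken from the paper.
import math

R = 8.314    # gas constant, J/(mol K)
DH = 41.2e3  # approx. standard reaction enthalpy of RWGS, J/mol
DS = 42.0    # approx. standard reaction entropy of RWGS, J/(mol K)

for T in (723, 773, 823, 973):
    K = math.exp(-(DH - T * DS) / (R * T))  # equilibrium constant from DG = DH - T*DS
    # For an equimolar CO2/H2 feed with conversion x, K = x^2 / (1-x)^2,
    # so x = sqrt(K) / (1 + sqrt(K)).
    x = math.sqrt(K) / (1 + math.sqrt(K))
    print(f"T = {T} K: K = {K:.2f}, equilibrium conversion = {x:.0%}")
```

Under these assumptions the catalytic equilibrium conversion at 723-823 K comes out around 30-40%, consistent with the abstract's point that the ca. 80% conversion of the chemical-looping splitting step far exceeds it.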

4.
Biomolecules; 10(8), 2020 Jul 29.
Article in English | MEDLINE | ID: mdl-32751349

ABSTRACT

Recent studies have demonstrated the usefulness of convolutional neural networks (CNNs) in classifying images of melanoma, with accuracies comparable to those achieved by dermatologists. However, the performance of a CNN trained only on clinical images of pigmented skin lesions, competing against dermatologists on a clinical image classification task, has not been reported to date. In this study, we extracted 5846 clinical images of pigmented skin lesions from 3551 patients. The pigmented skin lesions included malignant tumors (malignant melanoma and basal cell carcinoma) and benign tumors (nevus, seborrhoeic keratosis, senile lentigo, and hematoma/hemangioma). We created the test dataset by randomly selecting 666 patients and picking one image per patient, and created the training dataset by adding bounding-box annotations to the remaining images (4732 images from 2885 patients). We then trained a Faster Region-based CNN (FRCNN) on the training dataset and evaluated the model on the test dataset. In addition, ten board-certified dermatologists (BCDs) and ten dermatologic trainees (TRNs) took the same tests, and we compared their diagnostic accuracy with that of the FRCNN. For six-class classification, the accuracy of the FRCNN was 86.2%, versus 79.5% for the BCDs (p = 0.0081) and 75.1% for the TRNs (p < 0.00001). For two-class classification (benign or malignant), the accuracy, sensitivity, and specificity were 91.5%, 83.3%, and 94.5% for the FRCNN; 86.6%, 86.3%, and 86.6% for the BCDs; and 85.3%, 83.5%, and 85.9% for the TRNs, respectively. False positive rates and positive predictive values were 5.5% and 84.7% for the FRCNN, 13.4% and 70.5% for the BCDs, and 14.1% and 68.5% for the TRNs, respectively. In short, the classification accuracy of the FRCNN was better than that of the 20 dermatologists. In the future, we plan to deploy this system for use by the general public in order to improve the prognosis of skin cancer.
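All of the two-class figures above derive from a single 2x2 confusion matrix. The sketch below spells out the arithmetic with made-up counts (not the study's data); note that the false positive rate is just 1 minus specificity, which is why each quoted FPR/specificity pair (5.5%/94.5%, 13.4%/86.6%, 14.1%/85.9%) sums to 100%.

```python
# How the two-class metrics relate: all are ratios over one confusion matrix.
# Counts below are made up for illustration; they are not the study's data.
def binary_metrics(tp, fn, fp, tn):
    sensitivity = tp / (tp + fn)           # recall on malignant cases
    specificity = tn / (tn + fp)           # recall on benign cases
    fpr = fp / (fp + tn)                   # = 1 - specificity
    ppv = tp / (tp + fp)                   # precision of "malignant" calls
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    return dict(accuracy=accuracy, sensitivity=sensitivity,
                specificity=specificity, fpr=fpr, ppv=ppv)

metrics = binary_metrics(tp=80, fn=20, fp=10, tn=190)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```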


Subject(s)
Deep Learning, Melanoma/classification, Skin Neoplasms/classification, Skin/pathology, Humans, Melanoma/pathology, Neural Networks, Computer, Skin Neoplasms/pathology