1.
J Obstet Gynaecol Res ; 48(11): 2973-2978, 2022 Nov.
Article in English | MEDLINE | ID: mdl-35915563

ABSTRACT

Imaging and histological changes occurring in adenomyosis due to pregnancy are unclear. A 38-year-old nulliparous woman presented with dysmenorrhea and infertility. Pelvic magnetic resonance imaging (MRI) showed diffuse-type adenomyosis. Following pregnancy by in vitro fertilization, she was hospitalized at 23 weeks of gestation due to fetal growth restriction and subsequently diagnosed with preeclampsia. A second MRI performed due to an elevated inflammatory response at 31 weeks of gestation detected no obvious degenerative findings. An emergency cesarean section was performed at 33 weeks of gestation because of nonreassuring fetal status. On postpartum day 2, she showed uterine tenderness with a dramatically elevated inflammatory response. A third MRI showed cyst-like degenerations with hemorrhagic changes without abscess. Her symptoms resolved quickly, and she was discharged from the hospital by postpartum day 7. A fourth MRI at postpartum month 4 confirmed the disappearance of degenerations. This is the first report of imaging findings of early postpartum degeneration of adenomyosis.


Subject(s)
Adenomyosis , Cysts , Humans , Pregnancy , Female , Adult , Cesarean Section , Magnetic Resonance Imaging , Postpartum Period , Hemorrhage
2.
J Obstet Gynaecol Res ; 48(5): 1265-1270, 2022 May.
Article in English | MEDLINE | ID: mdl-35174573

ABSTRACT

Uterine fibroids are known to degenerate during pregnancy, but it is unknown whether a similar pathologic condition occurs in adenomyosis. A 38-year-old para 1 woman exhibited uterine tenderness and a markedly elevated inflammatory response at 22 weeks of gestation. Based on magnetic resonance imaging (MRI) findings indicative of hemorrhagic components in an adenomyosis lesion, we judged that these features resulted from degeneration of adenomyosis after excluding the possibility of underlying infection by amniocentesis. Although these symptoms improved with conservative management, nonreassuring fetal status prompted an emergency cesarean section at 27 weeks of gestation. MRI performed 4 months postpartum revealed that the degeneration had completely disappeared. The present case confirms the presence of a pathologic condition, transient degeneration in adenomyosis, which is triggered by pregnancy.


Subject(s)
Adenomyosis , Leiomyoma , Pregnancy Complications , Adenomyosis/diagnosis , Adult , Cesarean Section , Female , Hemorrhage , Humans , Magnetic Resonance Imaging , Male , Pregnancy
3.
BMC Med Inform Decis Mak ; 21(1): 262, 2021 09 11.
Article in English | MEDLINE | ID: mdl-34511100

ABSTRACT

BACKGROUND: It is essential for radiologists to communicate actionable findings to the referring clinicians reliably. Natural language processing (NLP) has been shown to help identify free-text radiology reports including actionable findings. However, the application of recent deep learning techniques to radiology reports, which can improve the detection performance, has not been thoroughly examined. Moreover, free-text that clinicians input in the ordering form (order information) has seldom been used to identify actionable reports. This study aims to evaluate the benefits of two new approaches: (1) bidirectional encoder representations from transformers (BERT), a recent deep learning architecture in NLP, and (2) using order information in addition to radiology reports. METHODS: We performed a binary classification to distinguish actionable reports (i.e., radiology reports tagged as actionable in actual radiological practice) from non-actionable ones (those without an actionable tag). 90,923 Japanese radiology reports in our hospital were used, of which 788 (0.87%) were actionable. We evaluated four methods, statistical machine learning with logistic regression (LR) and with gradient boosting decision tree (GBDT), and deep learning with a bidirectional long short-term memory (LSTM) model and a publicly available Japanese BERT model. Each method was used with two different inputs, radiology reports alone and pairs of order information and radiology reports. Thus, eight experiments were conducted to examine the performance. RESULTS: Without order information, BERT achieved the highest area under the precision-recall curve (AUPRC) of 0.5138, which showed a statistically significant improvement over LR, GBDT, and LSTM, and the highest area under the receiver operating characteristic curve (AUROC) of 0.9516. Simply coupling the order information with the radiology reports slightly increased the AUPRC of BERT but did not lead to a statistically significant improvement. 
This may be due to the complexity of clinical decisions made by radiologists. CONCLUSIONS: BERT was assumed to be useful for detecting actionable reports. More sophisticated methods are required to use order information effectively.
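As context for the reported AUPRC, it may help to recall that a classifier ranking reports at random has an expected precision equal to the class prevalence. The following is a minimal sketch of that arithmetic using only the counts and the AUPRC stated in the abstract; the variable names are illustrative and do not come from the study.

```python
# Counts stated in the abstract: 788 actionable reports out of 90,923.
n_total = 90_923
n_actionable = 788

# For a classifier that ranks reports at random, expected precision at any
# recall equals the prevalence, so this serves as a chance-level AUPRC baseline.
prevalence = n_actionable / n_total

# AUPRC reported for BERT without order information.
bert_auprc = 0.5138

# Ratio of the reported AUPRC to the chance-level baseline.
lift_over_chance = bert_auprc / prevalence

print(f"prevalence (chance-level AUPRC): {prevalence:.2%}")
print(f"reported AUPRC relative to chance: {lift_over_chance:.1f}x")
```

With such a low prevalence, AUROC can look high even when precision is modest, which is one reason AUPRC is commonly reported alongside it for imbalanced data.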


Subject(s)
Natural Language Processing , Radiology , Humans , Logistic Models , Machine Learning , Radiography
4.
J Digit Imaging ; 34(2): 418-427, 2021 04.
Article in English | MEDLINE | ID: mdl-33555397

ABSTRACT

The purposes of this study are to propose an unsupervised anomaly detection method based on a deep neural network (DNN) model, which requires only normal images for training, and to evaluate its performance with a large chest radiograph dataset. We used the auto-encoding generative adversarial network (α-GAN) framework, which is a combination of a GAN and a variational autoencoder, as a DNN model. A total of 29,684 frontal chest radiographs from the Radiological Society of North America Pneumonia Detection Challenge dataset were used for this study (16,880 male and 12,804 female patients; average age, 47.0 years). All these images were labeled as "Normal," "No Opacity/Not Normal," or "Opacity" by board-certified radiologists. About 70% (6,853/9,790) of the Normal images were randomly sampled as the training dataset, and the rest were randomly split into the validation and test datasets in a ratio of 1:2 (7,610 and 15,221). Our anomaly detection system could correctly visualize various lesions including a lung mass, cardiomegaly, pleural effusion, bilateral hilar lymphadenopathy, and even dextrocardia. Our system detected the abnormal images with an area under the receiver operating characteristic curve (AUROC) of 0.752. The AUROCs for the abnormal labels Opacity and No Opacity/Not Normal were 0.838 and 0.704, respectively. Our DNN-based unsupervised anomaly detection method could successfully detect various diseases or anomalies in chest radiographs by training with only the normal images.
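The split sizes in this abstract can be checked with a few lines of arithmetic. The sketch below assumes that "the rest" means every image not drawn for training (the remaining Normal images plus all abnormal images), which is the reading that matches the stated totals; the variable names are illustrative.

```python
# Figures stated in the abstract.
total_images = 29_684      # all frontal chest radiographs
normal_images = 9_790      # images labeled "Normal"
train_images = 6_853       # Normal images sampled for training

# Fraction of Normal images used for training ("about 70%").
train_fraction = train_images / normal_images

# Assumption: "the rest" is everything not used for training.
remaining = total_images - train_images

# Split the remainder 1:2 into validation and test sets.
validation_images = remaining // 3
test_images = remaining - validation_images

print(f"training fraction of Normal images: {train_fraction:.3f}")
print(f"validation: {validation_images}, test: {test_images}")
```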


Subject(s)
Neural Networks, Computer , Radiography, Thoracic , Female , Humans , Male , Middle Aged , ROC Curve , Radiography , Radiologists
5.
NMR Biomed ; 31(7): e3938, 2018 07.
Article in English | MEDLINE | ID: mdl-29846988

ABSTRACT

Major depressive disorder (MDD) is a globally prevalent psychiatric disorder that results from disruption of multiple neural circuits involved in emotional regulation. Although previous studies using diffusion tensor imaging (DTI) found smaller values of fractional anisotropy (FA) in the white matter, predominantly in the frontal lobe, of patients with MDD, studies using diffusion kurtosis imaging (DKI) are scarce. Here, we used DKI whole-brain analysis with tract-based spatial statistics (TBSS) to investigate the brain microstructural abnormalities in MDD. Twenty-six patients with MDD and 42 age- and sex-matched control subjects were enrolled. To investigate the microstructural pathology underlying the observations in DKI, a compartment model analysis was conducted focusing on the corpus callosum. In TBSS, the patients with MDD showed significantly smaller values of FA in the genu and frontal portion of the body of the corpus callosum. The patients also had smaller values of mean kurtosis (MK) and radial kurtosis (RK), but MK and RK abnormalities were distributed more widely compared with FA, predominantly in the frontal lobe but also in the parietal, occipital, and temporal lobes. Within the callosum, the regions with smaller MK and RK were located more posteriorly than the region with smaller FA. Model analysis suggested significantly smaller values of intra-neurite signal fraction in the body of the callosum and greater fiber dispersion in the genu, which were compatible with the existing literature of white matter pathology in MDD. Our results show that DKI is capable of demonstrating microstructural alterations in the brains of patients with MDD that cannot be fully depicted by conventional DTI. Though the issues of model validation and parameter estimation still remain, it is suggested that diffusion MRI combined with a biophysical model is a promising approach for investigation of the pathophysiology of MDD.


Subject(s)
Depressive Disorder, Major/diagnostic imaging , Depressive Disorder, Major/pathology , Diffusion Tensor Imaging , White Matter/pathology , Adult , Algorithms , Case-Control Studies , Computer Simulation , Corpus Callosum/diagnostic imaging , Corpus Callosum/pathology , Female , Humans , Male , Statistics as Topic , White Matter/diagnostic imaging
6.
J Magn Reson Imaging ; 47(4): 948-953, 2018 04.
Article in English | MEDLINE | ID: mdl-28836310

ABSTRACT

BACKGROUND: The usefulness of computer-assisted detection (CAD) for detecting cerebral aneurysms has been reported; therefore, the improved performance of CAD will help to detect cerebral aneurysms. PURPOSE: To develop a CAD system for intracranial aneurysms on unenhanced magnetic resonance angiography (MRA) images based on a deep convolutional neural network (CNN) and a maximum intensity projection (MIP) algorithm, and to demonstrate the usefulness of the system by training and evaluating it using a large dataset. STUDY TYPE: Retrospective study. SUBJECTS: There were 450 cases with intracranial aneurysms. The diagnoses of brain aneurysms were made on the basis of MRA, which was performed as part of a brain screening program. FIELD STRENGTH/SEQUENCE: Noncontrast-enhanced 3D time-of-flight (TOF) MRA on 3T MR scanners. ASSESSMENT: In our CAD, we used a CNN classifier that predicts whether each voxel is inside or outside aneurysms by inputting MIP images generated from a volume of interest (VOI) around the voxel. The CNN was trained in advance using manually inputted labels. We evaluated our method using 450 cases with intracranial aneurysms, 300 of which were used for training, 50 for parameter tuning, and 100 for the final evaluation. STATISTICAL TESTS: Free-response receiver operating characteristic (FROC) analysis. RESULTS: Our CAD system detected 94.2% (98/104) of aneurysms with 2.9 false positives per case (FPs/case). At a sensitivity of 70%, the number of FPs/case was 0.26. DATA CONCLUSION: We showed that the combination of a CNN and an MIP algorithm is useful for the detection of intracranial aneurysms. LEVEL OF EVIDENCE: 4 Technical Efficacy: Stage 1 J. Magn. Reson. Imaging 2018;47:948-953.


Subject(s)
Cerebral Angiography/methods , Image Interpretation, Computer-Assisted/methods , Imaging, Three-Dimensional/methods , Intracranial Aneurysm/diagnostic imaging , Magnetic Resonance Angiography/methods , Female , Humans , Image Processing, Computer-Assisted , Male , Middle Aged , Retrospective Studies , Sensitivity and Specificity
7.
J Digit Imaging ; 30(5): 629-639, 2017 Oct.
Article in English | MEDLINE | ID: mdl-28405834

ABSTRACT

We propose a generalized framework for developing computer-aided detection (CADe) systems whose characteristics depend only on those of the training dataset. The purpose of this study is to show the feasibility of the framework. Two different CADe systems were experimentally developed by a prototype of the framework, but with different training datasets. The CADe systems include four components: preprocessing, candidate area extraction, candidate detection, and candidate classification. Four pretrained algorithms with dedicated optimization/setting methods corresponding to the respective components were prepared in advance. The pretrained algorithms were sequentially trained in the order of processing of the components. In this study, two different datasets, brain MRA with cerebral aneurysms and chest CT with lung nodules, were collected to develop two different types of CADe systems in the framework. The performances of the developed CADe systems were evaluated by threefold cross-validation. The CADe systems for detecting cerebral aneurysms in brain MRAs and for detecting lung nodules in chest CTs were successfully developed using the respective datasets. The framework was shown to be feasible by the successful development of the two different types of CADe systems. The feasibility of this framework shows promise for a new paradigm in the development of CADe systems: development of CADe systems without any lesion-specific algorithm design.


Subject(s)
Algorithms , Diagnosis, Computer-Assisted/methods , Intracranial Aneurysm/diagnostic imaging , Magnetic Resonance Angiography/methods , Multiple Pulmonary Nodules/diagnostic imaging , Tomography, X-Ray Computed/methods , Feasibility Studies , Female , Humans , Male , Middle Aged
8.
Nihon Hoshasen Gijutsu Gakkai Zasshi ; 70(11): 1290-6, 2014 Nov.
Article in Japanese | MEDLINE | ID: mdl-25410336

ABSTRACT

Magnetic resonance imaging (MRI) enables the evaluation of organ structure and function. Oxygen-enhanced MRI (O2-enhanced MRI) is a method for evaluating the pulmonary ventilation function using oxygen as a contrast agent. We created the Cine View of Relative Enhancement Ratio Map (Cine RER map) in O2-enhanced MRI to easily observe the contrast effect for clinical use. Relative enhancement ratio (RER) was determined as the pixel values of the Cine RER map. Moreover, six healthy volunteers underwent O2-enhanced MRI to determine the appropriate scale width of the Cine RER map. We calculated each RER and set 0 to 1.27 as the scale width of the Cine RER map based on the results. The Cine RER map made it possible to observe the contrast effect over time and thus is a convenient tool for evaluating the pulmonary ventilation function in O2-enhanced MRI.


Subject(s)
Image Enhancement/methods , Magnetic Resonance Imaging/methods , Adult , Humans , Male , Oxygen , Phantoms, Imaging , Young Adult
9.
Insights Imaging ; 15(1): 102, 2024 Apr 05.
Article in English | MEDLINE | ID: mdl-38578554

ABSTRACT

OBJECTIVES: To investigate the relationship between low kidney volume and subsequent estimated glomerular filtration rate (eGFR) decline in the eGFR category G2 (60-89 mL/min/1.73 m²) population. METHODS: In this retrospective study, we evaluated 5531 individuals with eGFR category G2 who underwent medical checkups at our institution between November 2006 and October 2017. Exclusion criteria were absence of a follow-up visit, missing data, prior renal surgery, current renal disease under treatment, large renal masses, and horseshoe kidney. We developed a 3D U-net-based automated system for renal volumetry on CT images. Participants were grouped by sex-specific kidney volume deviations set at mean minus one standard deviation. After 1:1 propensity score matching, we obtained 397 pairs of individuals in the low kidney volume (LKV) and control groups. The primary endpoint was progression of eGFR categories within 5 years, assessed using Cox regression analysis. RESULTS: This study included 3220 individuals (mean age, 60.0 ± 9.7 years; men, n = 2209). The kidney volume was 404.6 ± 67.1 and 376.8 ± 68.0 cm³ in men and women, respectively. The low kidney volume (LKV) cutoff was 337.5 and 308.8 cm³ for men and women, respectively. LKV was a significant risk factor for the endpoint with an adjusted hazard ratio of 1.64 (95% confidence interval: 1.09-2.45; p = 0.02). CONCLUSION: Low kidney volume may adversely affect subsequent eGFR maintenance; hence, the use of imaging metrics may help predict eGFR decline. CRITICAL RELEVANCE STATEMENT: Low kidney volume is a significant predictor of reduced kidney function over time; thus, kidney volume measurements could aid in early identification of individuals at risk for declining kidney health. KEY POINTS: • This study explores how kidney volume affects subsequent kidney function maintenance. • Low kidney volume was associated with estimated glomerular filtration rate decreases. 
• Low kidney volume is a prognostic indicator of estimated glomerular filtration rate decline.
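The sex-specific cutoffs quoted in this abstract follow directly from the stated rule of mean minus one standard deviation. A minimal sketch of that calculation is shown below; the helper `is_low_kidney_volume` and its strict less-than comparison are assumptions for illustration, since the abstract does not say how values exactly at the cutoff were handled.

```python
# Mean and standard deviation of kidney volume (cm^3) from the abstract.
volume_stats = {
    "men": (404.6, 67.1),
    "women": (376.8, 68.0),
}

# Cutoff rule stated in the abstract: mean minus one standard deviation.
lkv_cutoffs = {
    sex: round(mean - sd, 1) for sex, (mean, sd) in volume_stats.items()
}


def is_low_kidney_volume(volume_cm3: float, sex: str) -> bool:
    """Hypothetical helper: flag a volume below the sex-specific cutoff.

    The strict '<' comparison is an assumption, not taken from the study.
    """
    return volume_cm3 < lkv_cutoffs[sex]


print(lkv_cutoffs)
```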

10.
AJNR Am J Neuroradiol ; 45(10): 1506-1511, 2024 Oct 03.
Article in English | MEDLINE | ID: mdl-38719605

ABSTRACT

BACKGROUND AND PURPOSE: The rise of large language models such as generative pretrained transformers (GPTs) has sparked considerable interest in radiology, especially in interpreting radiologic reports and image findings. While existing research has focused on GPTs estimating diagnoses from radiologic descriptions, exploring alternative diagnostic information sources is also crucial. This study introduces the use of GPTs (GPT-3.5 Turbo and GPT-4) for information retrieval and summarization, searching relevant case reports via PubMed, and investigates their potential to aid diagnosis. MATERIALS AND METHODS: From October 2021 to December 2023, we selected 115 cases from the "Case of the Week" series on the American Journal of Neuroradiology website. Their Description and Legend sections were presented to the GPTs for the 2 tasks. For the Direct Diagnosis task, the models provided 3 differential diagnoses that were considered correct if they matched the diagnosis in the diagnosis section. For the Case Report Search task, the models generated 2 keywords per case, creating PubMed search queries to extract up to 3 relevant reports. A response was considered correct if reports containing the disease name stated in the diagnosis section were extracted. The McNemar test was used to evaluate whether adding a Case Report Search to Direct Diagnosis improved overall accuracy. RESULTS: In the Direct Diagnosis task, GPT-3.5 Turbo achieved a correct response rate of 26% (30/115 cases), whereas GPT-4 achieved 41% (47/115). For the Case Report Search task, GPT-3.5 Turbo scored 10% (11/115), and GPT-4 scored 7% (8/115). Correct responses totaled 32% (37/115) with 3 overlapping cases for GPT-3.5 Turbo, whereas GPT-4 had 43% (50/115) of correct responses with 5 overlapping cases. Adding Case Report Search improved GPT-3.5 Turbo's performance (P = .023) but not that of GPT-4 (P = .248). 
CONCLUSIONS: The effectiveness of adding Case Report Search to GPT-3.5 Turbo was particularly pronounced, suggesting its potential as an alternative diagnostic approach to GPTs, particularly in scenarios where direct diagnoses from GPTs are not obtainable. Nevertheless, the overall performance of GPT models in both direct diagnosis and case report retrieval tasks remains suboptimal, and users should be aware of their limitations.
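The McNemar comparison in this abstract can be sketched from the reported totals. The abstract does not state which variant of the test was used, so the continuity-corrected chi-square form below is an assumption, as is the reading that a case counts as correct overall when either task is correct (so no case can move from correct to incorrect). The function name `mcnemar_cc_p` is illustrative.

```python
import math


def mcnemar_cc_p(b: int, c: int) -> float:
    """Two-sided p-value of McNemar's test with continuity correction.

    b and c are the two discordant-pair counts.
    """
    n = b + c
    if n == 0:
        return 1.0
    chi2 = (abs(b - c) - 1) ** 2 / n
    # Survival function of a chi-square variable with 1 degree of freedom:
    # P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


# Assumption: cases gained by adding Case Report Search are the only
# discordant pairs (combined correct minus Direct Diagnosis correct).
p_gpt35 = mcnemar_cc_p(37 - 30, 0)
p_gpt4 = mcnemar_cc_p(50 - 47, 0)

print(f"GPT-3.5 Turbo: p = {p_gpt35:.3f}")
print(f"GPT-4:         p = {p_gpt4:.3f}")
```

Under these assumptions the sketch gives values that round to the P values reported in the abstract, though that agreement alone does not confirm which variant the authors used.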


Subject(s)
Information Storage and Retrieval , Humans , Diagnosis, Differential , Information Storage and Retrieval/methods
11.
Radiol Phys Technol ; 17(3): 725-738, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39048847

ABSTRACT

In this study, we investigated the application of distributed learning, including federated learning and cyclical weight transfer, in the development of computer-aided detection (CADe) software for (1) cerebral aneurysm detection in magnetic resonance (MR) angiography images and (2) brain metastasis detection in brain contrast-enhanced MR images. We used datasets collected from various institutions, scanner vendors, and magnetic field strengths for each target CADe software. We compared the performance of multiple strategies, including a centralized strategy, in which software development is conducted at a development institution after collecting de-identified data from multiple institutions. Our results showed that the performance of CADe software trained through distributed learning was equal to or better than that trained through the centralized strategy. However, the distributed learning strategies that achieved the highest performance depend on the target CADe software. Hence, distributed learning can become one of the strategies for CADe software development using data collected from multiple institutions.


Subject(s)
Intracranial Aneurysm , Magnetic Resonance Imaging , Humans , Magnetic Resonance Imaging/methods , Intracranial Aneurysm/diagnostic imaging , Image Processing, Computer-Assisted/methods , Software , Brain Neoplasms/diagnostic imaging , Head/diagnostic imaging , Machine Learning , Automation
12.
JMIR Med Educ ; 10: e54393, 2024 Mar 12.
Article in English | MEDLINE | ID: mdl-38470459

ABSTRACT

BACKGROUND: Previous research applying large language models (LLMs) to medicine was focused on text-based information. Recently, multimodal variants of LLMs acquired the capability of recognizing images. OBJECTIVE: We aim to evaluate the image recognition capability of generative pretrained transformer (GPT)-4V, a recent multimodal LLM developed by OpenAI, in the medical field by testing how visual information affects its performance to answer questions in the 117th Japanese National Medical Licensing Examination. METHODS: We focused on 108 questions that had 1 or more images as part of a question and presented GPT-4V with the same questions under two conditions: (1) with both the question text and associated images and (2) with the question text only. We then compared the difference in accuracy between the 2 conditions using the exact McNemar test. RESULTS: Among the 108 questions with images, GPT-4V's accuracy was 68% (73/108) when presented with images and 72% (78/108) when presented without images (P=.36). For the 2 question categories, clinical and general, the accuracies with and those without images were 71% (70/98) versus 78% (76/98; P=.21) and 30% (3/10) versus 20% (2/10; P≥.99), respectively. CONCLUSIONS: The additional information from the images did not significantly improve the performance of GPT-4V in the Japanese National Medical Licensing Examination.


Subject(s)
Licensure , Medicine , Japan , Language
13.
Int J Comput Assist Radiol Surg ; 19(10): 1991-2000, 2024 Oct.
Article in English | MEDLINE | ID: mdl-39003437

ABSTRACT

PURPOSE: Many large radiographic datasets of lung nodules are available, but the small and hard-to-detect nodules are rarely validated by computed tomography. Such difficult nodules are crucial for training nodule detection methods. This lack of difficult nodules for training can be addressed by artificial nodule synthesis algorithms, which can create artificially embedded nodules. This study aimed to develop and evaluate a novel cost function for training networks to detect such lesions. Embedding artificial lesions in healthy medical images is effective when positive cases are insufficient for network training. Although this approach provides both positive (lesion-embedded) images and the corresponding negative (lesion-free) images, no known methods effectively use these pairs for training. This paper presents a novel cost function for segmentation-based detection networks when positive-negative pairs are available. METHODS: Based on the classic U-Net, new terms were added to the original Dice loss for reducing false positives and the contrastive learning of diseased regions in the image pairs. The experimental network was trained and evaluated, respectively, on 131,072 fully synthesized pairs of images simulating lung cancer and real chest X-ray images from the Japanese Society of Radiological Technology dataset. RESULTS: The proposed method outperformed RetinaNet and a single-shot multibox detector. The sensitivities were 0.688 and 0.507 when the number of false positives per image was 0.2, respectively, with and without fine-tuning under the leave-one-case-out setting. CONCLUSION: To our knowledge, this is the first study in which a method for detecting pulmonary nodules in chest X-ray images was evaluated on a real clinical dataset after being trained on fully synthesized images. The synthesized dataset is available at https://zenodo.org/records/10648433 .


Subject(s)
Lung Neoplasms , Solitary Pulmonary Nodule , Humans , Lung Neoplasms/diagnostic imaging , Solitary Pulmonary Nodule/diagnostic imaging , Radiographic Image Interpretation, Computer-Assisted/methods , Tomography, X-Ray Computed/methods , Radiography, Thoracic/methods , Algorithms , Multiple Pulmonary Nodules/diagnostic imaging , Neural Networks, Computer
14.
J Imaging Inform Med ; 37(3): 1217-1227, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38351224

ABSTRACT

This study aimed to generate synthetic medical data incorporating image-tabular hybrid data by merging an image encoding/decoding model with a table-compatible generative model and to assess their utility. We used 1342 cases from the Stony Brook University Covid-19-positive cases, comprising chest X-ray radiographs (CXRs) and tabular clinical data as a private dataset (pDS). We generated a synthetic dataset (sDS) through the following steps: (I) dimensionally reducing CXRs in the pDS using a pretrained encoder of the auto-encoding generative adversarial networks (αGAN) and integrating them with the corresponding tabular clinical data; (II) training the conditional tabular GAN (CTGAN) on this combined data to generate synthetic records, encompassing encoded image features and clinical data; and (III) reconstructing synthetic images from these encoded image features in the sDS using a pretrained decoder of the αGAN. The utility of sDS was assessed by the performance of the prediction models for patient outcomes (deceased or discharged). For the pDS test set, the area under the receiver operating characteristic curve (AUC) was calculated to compare the performance of prediction models trained separately with pDS, sDS, or a combination of both. We created an sDS comprising CXRs with a resolution of 256 × 256 pixels and tabular data containing 13 variables. The AUC for the outcome was 0.83 when the model was trained with the pDS, 0.74 with the sDS, and 0.87 when combining pDS and sDS for training. Our method is effective for generating synthetic records consisting of both images and tabular clinical data.


Subject(s)
COVID-19 , Radiography, Thoracic , SARS-CoV-2 , Humans , COVID-19/diagnostic imaging , Radiography, Thoracic/methods , Female , Male , Middle Aged , Aged , ROC Curve , Adult
15.
Radiol Phys Technol ; 17(1): 103-111, 2024 Mar.
Article in English | MEDLINE | ID: mdl-37917288

ABSTRACT

The purpose of the study was to develop a liver nodule diagnostic method that accurately localizes and classifies focal liver lesions and identifies the specific liver segments in which they reside by integrating a liver segment division algorithm using a four-dimensional (4D) fully convolutional residual network (FC-ResNet) with a localization and classification model. We retrospectively collected data and divided 106 gadolinium-ethoxybenzyl-diethylenetriamine pentaacetic acid-enhanced magnetic resonance examinations into Case-sets 1, 2, and 3. A liver segment division algorithm was developed using a 4D FC-ResNet and trained with semi-automatically created silver-standard annotations; performance was evaluated using manually created gold-standard annotations by calculating the Dice scores for each liver segment. The performance of the liver nodule diagnostic method was assessed by comparing the results with those of the original radiology reports. The mean Dice score between the output of the liver segment division model and the gold standard was 0.643 for Case-set 2 (normal liver contours) and 0.534 for Case-set 1 (deformed liver contours). Among the 64 lesions in Case-set 3, the diagnostic method localized 37 lesions, classified 33 lesions, and identified the liver segments for 30 lesions. A total of 28 lesions were true positives, matching the original radiology reports. The liver nodule diagnostic method, which integrates a liver segment division algorithm with a lesion localization and classification model, exhibits great potential for localizing and classifying focal liver lesions and identifying the liver segments in which they reside. Further improvements and validation using larger sample sizes will enhance its performance and clinical applicability.
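The Dice score used here to compare predicted and gold-standard liver segments can be illustrated with a small sketch. Representing each segment as a set of voxel coordinates and averaging over segments are simplifying assumptions for illustration; the toy masks below are made up and are not data from the study.

```python
def dice_score(predicted: set, reference: set) -> float:
    """Dice coefficient: 2 * |A ∩ B| / (|A| + |B|)."""
    if not predicted and not reference:
        return 1.0  # convention chosen here: two empty masks agree perfectly
    return 2 * len(predicted & reference) / (len(predicted) + len(reference))


def mean_dice(predicted_by_segment: dict, reference_by_segment: dict) -> float:
    """Average the per-segment Dice scores over all reference segments."""
    scores = [
        dice_score(predicted_by_segment.get(name, set()), voxels)
        for name, voxels in reference_by_segment.items()
    ]
    return sum(scores) / len(scores)


# Toy example with hypothetical voxel coordinates for two segments.
reference = {"S1": {(0, 0, 0), (0, 0, 1), (0, 1, 0)}, "S2": {(5, 5, 5)}}
predicted = {"S1": {(0, 0, 1), (0, 1, 0), (0, 1, 1)}, "S2": {(5, 5, 5)}}

print(f"mean Dice: {mean_dice(predicted, reference):.3f}")
```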


Subject(s)
Contrast Media , Liver Neoplasms , Humans , Liver Neoplasms/diagnostic imaging , Liver Neoplasms/pathology , Retrospective Studies , Liver/diagnostic imaging , Gadolinium DTPA , Magnetic Resonance Imaging/methods
16.
Jpn J Radiol ; 42(10): 1100-1109, 2024 Oct.
Article in English | MEDLINE | ID: mdl-38856878

ABSTRACT

Medicine and deep learning-based artificial intelligence (AI) engineering represent two distinct fields each with decades of published history. The current rapid convergence of deep learning and medicine has led to significant advancements, yet it has also introduced ambiguity regarding data set terms common to both fields, potentially leading to miscommunication and methodological discrepancies. This narrative review aims to give historical context for these terms, accentuate the importance of clarity when these terms are used in medical deep learning contexts, and offer solutions to mitigate misunderstandings by readers from either field. Through an examination of historical documents, including articles, writing guidelines, and textbooks, this review traces the divergent evolution of terms for data sets and their impact. Initially, the discordant interpretations of the word 'validation' in medical and AI contexts are explored. We then show that in the medical field as well, terms traditionally used in the deep learning domain are becoming more common, with the data for creating models referred to as the 'training set', the data for tuning of parameters referred to as the 'validation (or tuning) set', and the data for the evaluation of models as the 'test set'. Additionally, the test sets used for model evaluation are classified into internal (random splitting, cross-validation, and leave-one-out) sets and external (temporal and geographic) sets. This review then identifies often misunderstood terms and proposes pragmatic solutions to mitigate terminological confusion in the field of deep learning in medicine. We support the accurate and standardized description of these data sets and the explicit definition of data set splitting terminologies in each publication. These are crucial methods for demonstrating the robustness and generalizability of deep learning applications in medicine. 
This review aspires to enhance the precision of communication, thereby fostering more effective and transparent research methodologies in this interdisciplinary field.
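The three-way naming discussed in this review (training, validation or tuning, and test sets) can be made concrete with a random internal split. The sketch below is illustrative only: the function name `random_split`, the 70/10/20 proportions, and the fixed seed are assumptions, not recommendations taken from the review.

```python
import random


def random_split(case_ids, frac_train=0.7, frac_tuning=0.1, seed=0):
    """Randomly split case IDs into training, tuning (validation), and test sets."""
    ids = list(case_ids)
    random.Random(seed).shuffle(ids)  # fixed seed keeps the split reproducible
    n_train = round(len(ids) * frac_train)
    n_tuning = round(len(ids) * frac_tuning)
    training_set = ids[:n_train]                  # used to fit model weights
    tuning_set = ids[n_train:n_train + n_tuning]  # used to tune hyperparameters
    test_set = ids[n_train + n_tuning:]           # held out for final evaluation
    return training_set, tuning_set, test_set


training_set, tuning_set, test_set = random_split(range(100))
print(len(training_set), len(tuning_set), len(test_set))
```

Because all three sets come from the same source, the test set here is an internal one in the review's terms; an external test set would instead be drawn from a different period or institution.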


Subject(s)
Deep Learning , Terminology as Topic , Humans , History, 20th Century , Artificial Intelligence
17.
Int J Comput Assist Radiol Surg ; 19(3): 581-590, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38180621

ABSTRACT

PURPOSE: Standardized uptake values (SUVs) derived from 18F-fluoro-2-deoxy-D-glucose positron emission tomography/computed tomography are a crucial parameter for identifying tumors or abnormalities in an organ. Moreover, exploring ways to improve the identification of tumors or abnormalities using a statistical measurement tool is important in clinical research. Therefore, we developed a fully automatic method to create a personally normalized Z-score map of the liver SUV. METHODS: The normalized Z-score map for each patient was created using the SUV mean and standard deviation estimated from blood-test-derived variables, such as alanine aminotransferase and aspartate aminotransferase, as well as other demographic information. This was performed using the least absolute shrinkage and selection operator (LASSO)-based estimation formula. We also used receiver operating characteristic (ROC) analysis to evaluate the results of people with and without hepatic tumors and compared them to the ROC curve of normal SUV. RESULTS: A total of 7757 people were selected for this study. Of these, 7744 were healthy, while 13 had abnormalities. The area under the ROC curve results indicated that the anomaly detection approach (0.91) outperformed the use of the maximum SUV alone (0.89). To build the LASSO regression, sets of covariates, including sex, weight, body mass index, blood glucose level, triglyceride, total cholesterol, γ-glutamyl transpeptidase, total protein, creatinine, insulin, albumin, and cholinesterase, were used to determine the SUV mean, whereas weight was used to determine the SUV standard deviation. CONCLUSION: The Z-score normalizes the mean and standard deviation. It is effective in ROC curve analysis and increases the clarity of the abnormality. This normalization is a key technique for effective measurement of maximum glucose consumption by tumors in the liver.
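The per-patient normalization described in this abstract amounts to converting each liver SUV value into a Z-score against that patient's estimated mean and standard deviation. A minimal sketch follows; the function name `z_score_map` and the numeric inputs are hypothetical, and the LASSO-based estimation of the mean and standard deviation is not reproduced here.

```python
def z_score_map(suv_values, estimated_mean, estimated_sd):
    """Convert SUV values to Z-scores using patient-specific estimates.

    estimated_mean and estimated_sd stand in for the values that the
    study derives from blood-test and demographic variables.
    """
    if estimated_sd <= 0:
        raise ValueError("estimated_sd must be positive")
    return [(value - estimated_mean) / estimated_sd for value in suv_values]


# Hypothetical liver SUV samples and patient-specific estimates.
suv_samples = [2.0, 2.5, 3.0, 4.0]
z_values = z_score_map(suv_samples, estimated_mean=2.0, estimated_sd=0.5)

# The maximum Z-score is one possible summary for flagging an abnormality.
print(z_values, max(z_values))
```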


Subject(s)
Fluorodeoxyglucose F18 , Neoplasms , Humans , Radiopharmaceuticals , Positron-Emission Tomography/methods , Neoplasms/diagnostic imaging , Liver/diagnostic imaging
18.
Jpn J Radiol ; 42(8): 918-926, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38733472

ABSTRACT

PURPOSE: To assess the performance of GPT-4 Turbo with Vision (GPT-4TV), OpenAI's latest multimodal large language model, by comparing its ability to process both text and image inputs with that of the text-only GPT-4 Turbo (GPT-4 T) in the context of the Japan Diagnostic Radiology Board Examination (JDRBE). MATERIALS AND METHODS: The dataset comprised questions from JDRBE 2021 and 2023. A total of six board-certified diagnostic radiologists discussed the questions and provided ground-truth answers by consulting relevant literature as necessary. The following questions were excluded: those lacking associated images, those with no unanimous agreement on answers, and those including images rejected by the OpenAI application programming interface. The inputs for GPT-4TV included both text and images, whereas those for GPT-4 T were entirely text. Both models were deployed on the dataset, and their performance was compared using McNemar's exact test. The radiological credibility of the responses was assessed by two diagnostic radiologists through the assignment of legitimacy scores on a five-point Likert scale. These scores were subsequently used to compare model performance using Wilcoxon's signed-rank test. RESULTS: The dataset comprised 139 questions. GPT-4TV correctly answered 62 questions (45%), whereas GPT-4 T correctly answered 57 questions (41%). A statistical analysis found no significant performance difference between the two models (P = 0.44). The GPT-4TV responses received significantly lower legitimacy scores from both radiologists than the GPT-4 T responses. CONCLUSION: No significant enhancement in accuracy was observed when using GPT-4TV with image input compared with that of using text-only GPT-4 T for JDRBE questions.
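
For readers unfamiliar with McNemar's exact test, the following is a minimal sketch of the two-sided version for paired correct/incorrect outcomes. It depends only on the two discordant counts (questions that one model answered correctly and the other did not). The abstract does not report these counts, so the values below are hypothetical, and the function name is illustrative.

```python
from math import comb


def mcnemar_exact_p(b, c):
    """Two-sided exact McNemar p-value from the discordant pair counts.

    b: pairs where only model A was correct
    c: pairs where only model B was correct
    Under the null hypothesis, min(b, c) follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical discordant counts (not from the study).
print(mcnemar_exact_p(3, 1))  # 0.625
```

Concordant pairs (both models right or both wrong) do not enter the calculation, which is why two models with similar overall accuracy can still be compared fairly on the same question set.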


Subject(s)
Radiology , Humans , Japan , Radiology/education , Specialty Boards , Clinical Competence , Educational Measurement/methods
19.
Int J Comput Assist Radiol Surg ; 19(8): 1527-1536, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38625446

ABSTRACT

PURPOSE: The quality and bias of annotations by annotators (e.g., radiologists) affect the performance of computer-aided detection (CAD) software that uses machine learning. We hypothesized that the difference in the years of experience in image interpretation among radiologists contributes to annotation variability. In this study, we focused on how the performance of CAD software changes with retraining by incorporating cases annotated by radiologists with varying experience. METHODS: We used two types of CAD software for lung nodule detection in chest computed tomography images and cerebral aneurysm detection in magnetic resonance angiography images. Twelve radiologists with different years of experience independently annotated the lesions, and the performance changes were investigated by repeating the retraining of the CAD software twice, with the addition of cases annotated by each radiologist. Additionally, we investigated the effects of retraining using integrated annotations from multiple radiologists. RESULTS: The performance of the CAD software after retraining differed among annotating radiologists. In some cases, the performance was degraded compared to that of the initial software. Retraining using integrated annotations showed different performance trends depending on the target CAD software, notably in cerebral aneurysm detection, where the performance decreased compared to using annotations from a single radiologist. CONCLUSIONS: Although the performance of the CAD software after retraining varied among the annotating radiologists, no direct correlation with their experience was found. The performance trends differed according to the type of CAD software used when integrated annotations from multiple radiologists were used.


Subject(s)
Intracranial Aneurysm , Radiologists , Software , Tomography, X-Ray Computed , Humans , Intracranial Aneurysm/diagnostic imaging , Intracranial Aneurysm/diagnosis , Tomography, X-Ray Computed/methods , Diagnosis, Computer-Assisted/methods , Clinical Competence , Magnetic Resonance Angiography/methods , Machine Learning , Observer Variation , Lung Neoplasms/diagnostic imaging , Lung Neoplasms/diagnosis , Image Interpretation, Computer-Assisted/methods , Solitary Pulmonary Nodule/diagnostic imaging , Solitary Pulmonary Nodule/diagnosis
20.
Clin Nutr ; 43(1): 134-141, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38041939

ABSTRACT

BACKGROUND & AIMS: While skeletal muscle index (SMI) is the most widely used indicator of low muscle mass (or sarcopenia) in oncology, optimal cut-offs (or definitions) to better predict survival are not standardized. METHODS: We compared five major definitions of SMI-based low muscle mass using an Asian patient cohort with gastrointestinal or genitourinary cancers. We analyzed 2015 patients with surgically-treated gastrointestinal (n = 1382) or genitourinary (n = 633) cancer with pre-surgical computed tomography images. We assessed the associations of clinical parameters, including low muscle mass by each definition, with cancer-specific survival (CSS) and overall survival (OS). RESULTS: During a median follow-up period of 61 months, 303 (15%) died of cancer, and 147 died of other causes. An Asian-based definition diagnosed 17.8% of patients as having low muscle mass, while the other Caucasian-based ones classified most (>70%) patients as such. All definitions significantly discriminated both CSS and OS between patients with low or normal muscle mass. Low muscle mass using any definition but one predicted a lower CSS on multivariate Cox regression analyses. All definitions were independent predictors of lower OS. The original multivariate model without incorporating low muscle mass had c-indices of 0.63 for CSS and 0.66 for OS, which increased to 0.64-0.67 for CSS and 0.67-0.70 for OS when low muscle mass was considered. The model with an Asian-based definition had the highest c-indices (0.67 for CSS and 0.70 for OS). CONCLUSIONS: The Asian-specific definition had the best predictive ability for mortality in this Asian patient cohort.
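
The c-indices reported in this abstract measure how well a model ranks patients by risk. As a minimal sketch, a simplified Harrell's concordance index for right-censored survival data can be computed as below. This version skips pairs with tied observed times and is not the authors' implementation; the function name and the data are hypothetical.

```python
def concordance_index(times, events, risks):
    """Simplified Harrell's c-index for right-censored data.

    times:  observed follow-up times
    events: 1 if the event (death) was observed, 0 if censored
    risks:  model risk scores (higher means worse predicted outcome)
    A pair is comparable when the patient with the shorter time had an
    observed event. Pairs with tied times are skipped in this sketch.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and events[i]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical cohort of four patients (illustrative values only).
times = [5, 10, 15, 20]
events = [1, 0, 1, 1]
risks = [0.9, 0.4, 0.1, 0.6]
print(concordance_index(times, events, risks))  # 0.75
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so the reported increase from 0.63 to 0.67 for CSS reflects a modest gain in discrimination.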


Subject(s)
Neoplasms , Sarcopenia , Humans , Prognosis , Sarcopenia/etiology , Muscle, Skeletal/diagnostic imaging , Muscle, Skeletal/pathology , Tomography, X-Ray Computed , Neoplasms/complications , Retrospective Studies