Abstract
Deep learning applied to whole-slide histopathology images (WSIs) has the potential to enhance precision oncology and alleviate the workload of experts. However, developing these models necessitates large amounts of data with ground truth labels, which can be both time-consuming and expensive to obtain. Pathology reports are typically unstructured or poorly structured texts, and efforts to implement structured reporting templates have been unsuccessful, as these efforts lead to perceived extra workload. In this study, we hypothesised that large language models (LLMs), such as the generative pre-trained transformer 4 (GPT-4), can extract structured data from unstructured plain language reports using a zero-shot approach without requiring any re-training. We tested this hypothesis by utilising GPT-4 to extract information from histopathological reports, focusing on two extensive sets of pathology reports for colorectal cancer and glioblastoma. We found a high concordance between LLM-generated structured data and human-generated structured data. Consequently, LLMs could potentially be employed routinely to extract ground truth data for machine learning from unstructured pathology reports in the future. © 2023 The Authors. The Journal of Pathology published by John Wiley & Sons Ltd on behalf of The Pathological Society of Great Britain and Ireland.
Subjects
Glioblastoma, Precision Medicine, Humans, Machine Learning, United Kingdom
Abstract
PURPOSE OF REVIEW: To evaluate the current applications and prospects of artificial intelligence and machine learning in diagnosing and managing axial spondyloarthritis (axSpA), focusing on their role in medical imaging, predictive modelling, and patient monitoring. RECENT FINDINGS: Artificial intelligence, particularly deep learning, is showing promise in diagnosing axSpA by assisting with X-ray, computed tomography (CT), and MRI analyses, with some models matching or outperforming radiologists in detecting sacroiliitis and markers. Moreover, it is increasingly being used in predictive modelling of disease progression and personalized treatment, and could aid risk assessment, treatment response prediction, and clinical subtype identification. Variable study designs, sample sizes, and the predominance of retrospective, single-centre studies still limit the generalizability of results. SUMMARY: Artificial intelligence technologies have significant potential to advance the diagnosis and treatment of axSpA, providing more accurate, efficient, and personalized healthcare solutions. However, their integration into clinical practice requires rigorous validation, ethical and legal considerations, and comprehensive training for healthcare professionals. Future advances in artificial intelligence could complement clinical expertise and improve patient care through better diagnostic accuracy and tailored therapeutic strategies, but the challenge remains to ensure that these technologies are validated in prospective multicentre trials and ethically integrated into patient care.
Subjects
Artificial Intelligence, Axial Spondyloarthritis, Machine Learning, Humans, Axial Spondyloarthritis/diagnosis, Deep Learning, X-Ray Computed Tomography/methods, Magnetic Resonance Imaging/methods
Abstract
Background Rapid advances in large language models (LLMs) have led to the development of numerous commercial and open-source models. While recent publications have explored OpenAI's GPT-4 to extract information of interest from radiology reports, there has not been a real-world comparison of GPT-4 to leading open-source models. Purpose To compare different leading open-source LLMs to GPT-4 on the task of extracting relevant findings from chest radiograph reports. Materials and Methods Two independent datasets of free-text radiology reports from chest radiograph examinations were used in this retrospective study performed between February 2, 2024, and February 14, 2024. The first dataset consisted of reports from the ImaGenome dataset, providing reference standard annotations from the MIMIC-CXR database acquired between 2011 and 2016. The second dataset consisted of randomly selected reports created at the Massachusetts General Hospital between July 2019 and July 2021. In both datasets, the commercial models GPT-3.5 Turbo and GPT-4 were compared with open-source models that included Mistral-7B and Mixtral-8 × 7B (Mistral AI), Llama 2-13B and Llama 2-70B (Meta), and Qwen1.5-72B (Alibaba Group), as well as CheXbert and CheXpert-labeler (Stanford ML Group), in their ability to accurately label the presence of multiple findings in radiograph text reports using zero-shot and few-shot prompting. The McNemar test was used to compare F1 scores between models. Results On the ImaGenome dataset (n = 450), the open-source model with the highest score, Llama 2-70B, achieved micro F1 scores of 0.97 and 0.97 for zero-shot and few-shot prompting, respectively, compared with the GPT-4 F1 scores of 0.98 and 0.98 (P > .99 and < .001 for superiority of GPT-4). 
On the institutional dataset (n = 500), the open-source model with the highest score, an ensemble model, achieved micro F1 scores of 0.96 and 0.97 for zero-shot and few-shot prompting, respectively, compared with the GPT-4 F1 scores of 0.98 and 0.97 (P < .001 and > .99 for superiority of GPT-4). Conclusion Although GPT-4 was superior to open-source models in zero-shot report labeling, few-shot prompting with a small number of example reports closely matched the performance of GPT-4. The benefit of few-shot prompting varied across datasets and models. © RSNA, 2024 Supplemental material is available for this article.
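The micro F1 score reported above pools true positives, false positives, and false negatives over all findings before computing a single F1 value. A minimal Python sketch follows, assuming per-finding counts are already available; the function name `micro_f1` and the example counts are hypothetical illustrations and are not taken from the study.

```python
from typing import Dict, Tuple


def micro_f1(counts: Dict[str, Tuple[int, int, int]]) -> float:
    """Micro-averaged F1 from per-finding (tp, fp, fn) counts.

    Counts are summed across findings first, so frequent findings
    weigh more than rare ones.
    """
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    denominator = 2 * tp + fp + fn
    # Convention chosen here: return 0.0 when there is nothing to score.
    return 2 * tp / denominator if denominator else 0.0


# Hypothetical counts for two findings (illustration only).
example_counts = {
    "effusion": (40, 2, 3),
    "pneumothorax": (10, 1, 0),
}
score = micro_f1(example_counts)
print(f"micro F1 = {score:.3f}")
```

Because the counts are pooled, micro F1 can stay high even if a rare finding is labeled poorly, which is one reason per-finding scores are often reported alongside it.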
Subjects
Thoracic Radiography, Humans, Thoracic Radiography/methods, Retrospective Studies, Natural Language Processing
Abstract
Structured reporting (SR) has long been a goal in radiology to standardize and improve the quality of radiology reports. Despite evidence that SR reduces errors, enhances comprehensiveness, and increases adherence to guidelines, its widespread adoption has been limited. Recently, large language models (LLMs) have emerged as a promising solution to automate and facilitate SR. Therefore, this narrative review aims to provide an overview of LLMs for SR in radiology and beyond. We found that the current literature on LLMs for SR is limited, comprising ten studies on the generative pre-trained transformer (GPT)-3.5 (n = 5) and/or GPT-4 (n = 8), while two studies additionally examined the performance of Perplexity and Bing Chat or IT5. All studies reported promising results and acknowledged the potential of LLMs for SR, with six out of ten studies demonstrating the feasibility of multilingual applications. Building upon these findings, we discuss limitations, regulatory challenges, and further applications of LLMs in radiology report processing, encompassing four main areas: documentation, translation and summarization, clinical evaluation, and data mining. In conclusion, this review underscores the transformative potential of LLMs to improve efficiency and accuracy in SR and radiology report processing. KEY POINTS: Question How can LLMs help make SR in radiology more ubiquitous? Findings Current literature leveraging LLMs for SR is sparse but shows promising results, including the feasibility of multilingual applications. Clinical relevance LLMs have the potential to transform radiology report processing and enable the widespread adoption of SR. However, their future role in clinical practice depends on overcoming current limitations and regulatory challenges, including opaque algorithms and training data.
Abstract
This study demonstrates that GPT-4V outperforms GPT-4 across radiology subspecialties in analyzing 207 cases with 1312 images from the Radiological Society of North America Case Collection.
Subjects
Radiology, Radiology/methods, Radiology/statistics & numerical data, Humans, Computer-Assisted Image Processing/methods
Abstract
[This corrects the article DOI: 10.2196/54948.]
Abstract
BACKGROUND: The successful integration of artificial intelligence (AI) in healthcare depends on the global perspectives of all stakeholders. This study aims to answer the research question: What are the attitudes of medical, dental, and veterinary students towards AI in education and practice, and what are the regional differences in these perceptions? METHODS: An anonymous online survey was developed based on a literature review and expert panel discussions. The survey assessed students' AI knowledge, attitudes towards AI in healthcare, current state of AI education, and preferences for AI teaching. It consisted of 16 multiple-choice items, eight demographic queries, and one free-field comment section. Medical, dental, and veterinary students from various countries were invited to participate via faculty newsletters and courses. The survey measured technological literacy, AI knowledge, current state of AI education, preferences for AI teaching, and attitudes towards AI in healthcare using Likert scales. Data were analyzed using descriptive statistics, Mann-Whitney U-test, Kruskal-Wallis test, and Dunn-Bonferroni post hoc test. RESULTS: The survey included 4313 medical, 205 dentistry, and 78 veterinary students from 192 faculties and 48 countries. Most participants were from Europe (51.1%), followed by North/South America (23.3%) and Asia (21.3%). Students reported positive attitudes towards AI in healthcare (median: 4, IQR: 3-4) and a desire for more AI teaching (median: 4, IQR: 4-5). However, they reported limited AI knowledge (median: 2, IQR: 2-2) and a lack of AI courses (76.3%), and they felt unprepared to use AI in their careers (median: 2, IQR: 1-3). Subgroup analyses revealed significant differences between the Global North and South (r = 0.025 to 0.185, all P < .001) and across continents (r = 0.301 to 0.531, all P < .001), with generally small effect sizes.
CONCLUSIONS: This large-scale international survey highlights medical, dental, and veterinary students' positive perceptions of AI in healthcare, their strong desire for AI education, and the current lack of AI teaching in medical curricula worldwide. The study identifies a need for integrating AI education into medical curricula, considering regional differences in perceptions and educational needs. TRIAL REGISTRATION: Not applicable (no clinical trial).
Subjects
Artificial Intelligence, Humans, Cross-Sectional Studies, Surveys and Questionnaires, Male, Female, Dental Education, Veterinary Education, Medical Students/psychology, Dental Students/psychology, Dental Students/statistics & numerical data, Adult, Young Adult, Medical Education, Curriculum, Attitude of Health Personnel
Abstract
Generative models, such as DALL-E 2 (OpenAI), could represent promising future tools for image generation, augmentation, and manipulation for artificial intelligence research in radiology, provided that these models have sufficient medical domain knowledge. Herein, we show that DALL-E 2 has learned relevant representations of x-ray images, with promising capabilities in terms of zero-shot text-to-image generation of new images, the continuation of an image beyond its original boundaries, and the removal of elements; however, its capabilities for the generation of images with pathological abnormalities (eg, tumors, fractures, and inflammation) or computed tomography, magnetic resonance imaging, or ultrasound images are still limited. The use of generative models for augmenting and generating radiological data thus seems feasible, even if the further fine-tuning and adaptation of these models to their respective domains are required first.
Subjects
Artificial Intelligence, Radiology, Humans, X-Ray Computed Tomography/methods, Magnetic Resonance Imaging/methods, Ultrasonography
Abstract
Background MRI is frequently used for early diagnosis of axial spondyloarthritis (axSpA). However, evaluation is time-consuming and requires profound expertise because noninflammatory degenerative changes can mimic axSpA, and early signs may therefore be missed. Deep neural networks could function as assistance for axSpA detection. Purpose To create a deep neural network to detect MRI changes in sacroiliac joints indicative of axSpA. Materials and Methods This retrospective multicenter study included MRI examinations of five cohorts of patients with clinical suspicion of axSpA collected at university and community hospitals between January 2006 and September 2020. Data from four cohorts were used as the training set, and data from one cohort as the external test set. Each MRI examination in the training and test sets was scored by six and seven raters, respectively, for inflammatory changes (bone marrow edema, enthesitis) and structural changes (erosions, sclerosis). A deep learning tool to detect changes indicative of axSpA was developed. First, a neural network was trained to homogenize the images; then, a classification network was trained. Performance was evaluated with use of area under the receiver operating characteristic curve (AUC), sensitivity, and specificity. P < .05 was considered indicative of statistically significant difference. Results Overall, 593 patients (mean age, 37 years ± 11 [SD]; 302 women) were studied. Inflammatory and structural changes were found in 197 of 477 patients (41%) and 244 of 477 (51%), respectively, in the training set and 25 of 116 patients (22%) and 26 of 116 (22%) in the test set. The AUCs were 0.94 (95% CI: 0.84, 0.97) for all inflammatory changes, 0.88 (95% CI: 0.80, 0.95) for inflammatory changes fulfilling the Assessment of SpondyloArthritis international Society definition, and 0.89 (95% CI: 0.81, 0.96) for structural changes indicative of axSpA.
Sensitivity and specificity on the external test set were 22 of 25 patients (88%) and 65 of 91 patients (71%), respectively, for inflammatory changes and 22 of 26 patients (85%) and 70 of 90 patients (78%) for structural changes. Conclusion Deep neural networks can detect inflammatory or structural changes to the sacroiliac joint indicative of axial spondyloarthritis at MRI. © RSNA, 2022 Online supplemental material is available for this article.
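The sensitivity and specificity percentages above follow directly from the reported patient counts. As a quick arithmetic check, here is a minimal Python sketch that recomputes them; the helper name `proportion_percent` is a hypothetical choice for illustration, and only the counts quoted in the abstract are used.

```python
def proportion_percent(numerator: int, denominator: int) -> int:
    """Return numerator/denominator as a percentage rounded to a whole number."""
    return round(100 * numerator / denominator)


# Counts quoted in the abstract for the external test set (116 patients).
inflammatory_sensitivity = proportion_percent(22, 25)  # detected / patients with changes
inflammatory_specificity = proportion_percent(65, 91)  # correctly negative / patients without
structural_sensitivity = proportion_percent(22, 26)
structural_specificity = proportion_percent(70, 90)

print(inflammatory_sensitivity, inflammatory_specificity)
print(structural_sensitivity, structural_specificity)
```

The denominators also add up to the test-set size in both cases (25 + 91 = 116 and 26 + 90 = 116), which is consistent with the cohort description.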
Subjects
Axial Spondyloarthritis, Deep Learning, Spondylarthritis, Humans, Female, Adult, Sacroiliac Joint/diagnostic imaging, Spondylarthritis/diagnostic imaging, Magnetic Resonance Imaging/methods
Abstract
MOTIVATION: The development of deep, bidirectional transformers such as Bidirectional Encoder Representations from Transformers (BERT) led to improved performance on several Natural Language Processing (NLP) benchmarks. Especially in radiology, large amounts of free-text data are generated in daily clinical workflow. These report texts could be of particular use for the generation of labels in machine learning, especially for image classification. However, as report texts are mostly unstructured, advanced NLP methods are needed to enable accurate text classification. While neural networks can be used for this purpose, they must first be trained on large amounts of manually labelled data to achieve good results. In contrast, BERT models can be pre-trained on unlabelled data and then only require fine-tuning on a small amount of manually labelled data to achieve even better results. RESULTS: Using BERT to identify the most important findings in intensive care chest radiograph reports, we achieve areas under the receiver operating characteristic curve of 0.98 for congestion, 0.97 for effusion, 0.97 for consolidation and 0.99 for pneumothorax, surpassing the accuracy of previous approaches with comparatively little annotation effort. Our approach could therefore help to improve information extraction from free-text medical reports. AVAILABILITY AND IMPLEMENTATION: We make the source code for fine-tuning the BERT models freely available at https://github.com/fast-raidiology/bert-for-radiology. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Deep Learning, Humans, Information Storage and Retrieval, Machine Learning, Natural Language Processing, Computer Neural Networks
Abstract
OBJECTIVE: To train a convolutional neural network (CNN) to detect the most common causes of shoulder pain on plain radiographs and to assess its potential value as an assistive device for physicians. MATERIALS AND METHODS: We used a CNN of the ResNet-50 architecture, which was trained on 2700 shoulder radiographs from the clinical practice of multiple institutions. All radiographs were reviewed and labeled for six findings: proximal humeral fractures, joint dislocation, periarticular calcification, osteoarthritis, osteosynthesis, and joint endoprosthesis. The trained model was then evaluated on a separate test dataset, which was previously annotated by three independent expert radiologists. Both the training and the test datasets included radiographs of highly variable image quality to reflect the clinical situation and to foster robustness of the CNN. Performance of the model was evaluated using receiver operating characteristic (ROC) curves, the area under the curve (AUC) derived from them, and sensitivity and specificity. RESULTS: The developed CNN demonstrated high accuracy, with an AUC of 0.871 for detecting fractures, 0.896 for joint dislocation, 0.945 for osteoarthritis, and 0.800 for periarticular calcifications. It also detected osteosynthesis and endoprosthesis with near perfect accuracy (AUC 0.998 and 1.0, respectively). Sensitivity and specificity were 0.75 and 0.86 for fractures, 0.95 and 0.65 for joint dislocation, 0.90 and 0.86 for osteoarthritis, and 0.60 and 0.89 for calcification. CONCLUSION: CNNs have the potential to serve as an assistive device by providing clinicians a means to prioritize worklists or by providing additional safety in situations of increased workload.
Subjects
Deep Learning, Area Under Curve, Humans, Computer Neural Networks, ROC Curve, Radiography, Retrospective Studies, Shoulder Pain
Abstract
BACKGROUND: Microwave ablation (MWA) is a minimally invasive treatment option for solid tumors and belongs to the local ablative therapeutic techniques, which are based on thermal tissue coagulation. So far, tissue shrinkage during MWA has been described mainly in ex vivo studies. PURPOSE: To characterize short-term volume changes of the ablated zone following hepatic MWA in an in vivo porcine liver model using contrast-enhanced computed tomography (CECT). MATERIAL AND METHODS: We performed multiple hepatic MWAs with constant energy parameters in healthy, anesthetized, and laparotomized domestic pigs. The volumes of the ablated areas were calculated from venous phase CT scans immediately after the ablation and at short-term follow-up of up to 2 h after MWA. RESULTS: In total, 19 thermally ablated areas in 10 porcine livers could be analyzed (n = 6 with two volume measurements during the measurement period and n = 13 with three measurements). Both groups showed a statistically significant but heterogeneous volume reduction of up to 12% (median 6%) of the ablated zones in CECT scans during the measurement period (P < 0.001 [n = 13] and P = 0.042 [n = 6]). However, the dimension and dynamics of volume changes were heterogeneous both absolutely and relatively. CONCLUSION: We observed a significant short-term volume reduction of ablated liver tissue in vivo. This volume shrinkage must be considered in clinical practice for technically successful tumor treatment by MWA and should therefore be further investigated in in vivo studies.
Subjects
Ablation Techniques/methods, Liver/diagnostic imaging, Liver/surgery, X-Ray Computed Tomography/methods, Animals, Contrast Media, Animal Disease Models, Radiographic Image Enhancement/methods, Swine
Abstract
PURPOSE: Emphysema and chronic obstructive lung disease were previously identified as major risk factors for severe disease progression in COVID-19. Computed tomography (CT)-based lung-density analysis offers a fast, reliable, and quantitative assessment of lung density. Therefore, we aimed to assess the benefit of CT-based lung density measurements to predict possible severe disease progression in COVID-19. MATERIAL AND METHODS: Thirty COVID-19-positive patients were included in this retrospective study. Lung density was quantified based on routinely acquired chest CTs. Presence of COVID-19 was confirmed by reverse transcription polymerase chain reaction (RT-PCR). The Wilcoxon test was used to compare two groups of patients. A multivariate regression analysis, adjusted for age and sex, was employed to model the relative increase of risk for severe disease, depending on the measured densities. RESULTS: Intensive care unit (ICU) patients or patients requiring mechanical ventilation showed a lower proportion of medium- and low-density lung volume compared to patients on the normal ward, but a significantly larger volume of high-density lung volume (12.26 dl, IQR 4.65 dl vs. 7.51 dl, IQR 5.39 dl; p = 0.039). In multivariate regression analysis, high-density lung volume was identified as a significant predictor of severe disease. CONCLUSIONS: The amount of high-density lung tissue showed a significant association with severe COVID-19, with odds ratios of 1.42 (95% CI: 1.09-2.00) and 1.37 (95% CI: 1.03-2.11) for requiring intensive care and mechanical ventilation, respectively. While our small sample size is an important limitation, our study suggests that high-density lung tissue could serve as a possible predictor of severe COVID-19.
Abstract
OBJECTIVES: To assess the potential of T1 mapping-based extracellular volume fraction (ECV) for the identification of higher grade clear cell renal cell carcinoma (cRCC), based on histopathology as the reference standard. METHODS: For this single-center, institutional review board-approved prospective study, 27 patients (17 men, median age 62 ± 12.4 years) with pathologic diagnosis of cRCC (nucleolar International Society of Urological Pathology (ISUP) grading) received abdominal MRI scans at 1.5 T using a modified Look-Locker inversion recovery (MOLLI) sequence between January 2017 and June 2018. Quantitative T1 values were measured at different time points (pre- and postcontrast agent administration) and quantification of the ECV was performed on MRI and histological sections (H&E staining). RESULTS: Reduction in T1 value after contrast agent administration and MR-derived ECV were reliable predictors for differentiating higher from lower grade cRCC. Postcontrast T1diff values (T1diff = T1 difference between the native and nephrogenic phase) and MR-derived ECV were significantly higher for higher grade cRCC (ISUP grades 3-4) compared with lower grade cRCC (ISUP grades 1-2) (p < 0.001). A cutoff value of 700 ms could distinguish higher grade from lower grade tumors with 100% (95% CI 0.69-1.00) sensitivity and 82% (95% CI 0.57-0.96) specificity. There was a positive and strong correlation between MR-derived ECV and histological ECV (p < 0.01, r = 0.88). Interobserver agreement for quantitative longitudinal relaxation times in the T1 maps was excellent. CONCLUSIONS: T1 mapping with ECV measurement could represent a novel in vivo biomarker for the classification of cRCC regarding their nucleolar grade, providing incremental diagnostic value as a quantitative MR marker. 
KEY POINTS:
• Reduction in MRI T1 relaxation times after contrast agent administration and MR-derived extracellular volume fraction are useful parameters for grading of clear cell renal cell carcinoma (cRCC).
• T1 differences between the native and the nephrogenic phase are higher for higher grade cRCC compared with lower grade cRCC, and MRI-derived extracellular volume fraction (ECV) and histological ECV show a strong correlation.
• T1 mapping with ECV measurement may be helpful for the noninvasive assessment of cRCC pathology, being a safe and feasible method, and it has potential to optimize individualized treatment options, e.g., in the decision of active surveillance.
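For readers unfamiliar with how an extracellular volume fraction is obtained from T1 maps, the sketch below uses the commonly cited relation ECV = (1 − hematocrit) × ΔR1(tissue) / ΔR1(blood), where ΔR1 = 1/T1(post) − 1/T1(pre). This is an assumption: the abstract does not state which formula or blood-pool reference the authors used, and the function names and all numeric values here are hypothetical illustrations rather than study data.

```python
def delta_r1(t1_pre_ms: float, t1_post_ms: float) -> float:
    """Change in relaxation rate R1 = 1/T1 between pre- and postcontrast maps."""
    return 1.0 / t1_post_ms - 1.0 / t1_pre_ms


def ecv_fraction(
    tissue_pre_ms: float,
    tissue_post_ms: float,
    blood_pre_ms: float,
    blood_post_ms: float,
    hematocrit: float,
) -> float:
    """ECV under the assumed standard relation (not confirmed by the abstract)."""
    tissue = delta_r1(tissue_pre_ms, tissue_post_ms)
    blood = delta_r1(blood_pre_ms, blood_post_ms)
    return (1.0 - hematocrit) * tissue / blood


# Hypothetical T1 values in milliseconds and a hypothetical hematocrit.
ecv = ecv_fraction(1200.0, 600.0, 1600.0, 400.0, hematocrit=0.40)
print(f"ECV = {ecv:.3f}")
```

The ratio of relaxation-rate changes normalizes the tissue contrast uptake to the blood pool, and the (1 − hematocrit) factor converts that ratio to a fraction of tissue volume.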
Subjects
Renal Cell Carcinoma/pathology, Kidney Neoplasms/pathology, Kidney/pathology, Magnetic Resonance Imaging/methods, Neoplasm Staging/methods, Female, Humans, Male, Middle Aged, Prospective Studies, ROC Curve, Reproducibility of Results
Abstract
Background: Accurate lesion visualization after microwave ablation (MWA) remains a challenge. Computed tomography perfusion (CTP) has been proposed to improve visualization, but different perfusion models have been shown to deliver different results on the same data set. Purpose: To compare different perfusion algorithms and identify the algorithm that enables the best imaging of lesions after hepatic MWA. Materials and methods: Ten MWAs with subsequent CTP were performed in healthy pigs. Parameter maps were generated using a single-input dual-compartment model with Patlak's algorithm (PM), a dual-input maximum-slope model (DIMS), a dual-input one-compartment model (DIOC), a single-input deconvolution model (SIDC), and a dual-input deconvolution model (DIDC). Parameter maps for hepatic arterial blood flow (AF) and portal venous blood flow (PF), mean transit time, hepatic blood volume (HBV), and capillary permeability were compared regarding the values of the normal liver tissue (NLT) and the lesion, signal- and contrast-to-noise ratios (SNR, CNR), and inter- and intrarater reliability using the intraclass correlation coefficient, Bland-Altman plots, and linear regression. Results: Perfusion values differed between algorithms, with especially large fluctuations for the DIOC. A reliable differentiation of the lesion margin appears feasible with parameter maps of PF and HBV for most algorithms, except for the DIOC due to large fluctuations in PF. All algorithms allowed for a demarcation of the central necrotic zone based on hepatic AF and HBV. The DIDC showed the highest CNR and the best inter- and intrarater reliability. Conclusion: The DIDC appears to be the most feasible model to visualize margins and necrosis zones after microwave ablation, but due to its high computational demand, a single-input deconvolution algorithm might be preferable in clinical practice.
Subjects
Ablation Techniques/methods, Four-Dimensional Computed Tomography/methods, Microwaves/therapeutic use, Neoplasms/drug therapy, Neoplasms/radiotherapy, Algorithms, Animals, Animal Disease Models, Humans, Swine
Abstract
This study compares 2 large language models and their performance vs that of competing open-source models.