Results 1-20 of 56
1.
Acad Radiol ; 2024 Sep 30.
Article in English | MEDLINE | ID: mdl-39353826

ABSTRACT

PURPOSE: To quantitatively and qualitatively evaluate and compare the performance of leading large language models (LLMs), including proprietary models (GPT-4, GPT-3.5 Turbo, Claude-3-Opus, and Gemini Ultra) and open-source models (Mistral-7B and Mixtral-8×7B), in simplifying 109 interventional radiology reports. METHODS: Qualitative performance was assessed using a five-point Likert scale for accuracy, completeness, clarity, clinical relevance, naturalness, and error rates, including trust-breaking and post-therapy misconduct errors. Quantitative readability was assessed using Flesch Reading Ease (FRE), Flesch-Kincaid Grade Level (FKGL), SMOG Index, and Dale-Chall Readability Score (DCRS). Paired t-tests and Bonferroni-corrected p-values were used for statistical analysis. RESULTS: Qualitative evaluation showed no significant differences between GPT-4 and Claude-3-Opus for any metrics evaluated (all Bonferroni-corrected p-values: p = 1), while they outperformed other assessed models across five qualitative metrics (p < 0.001). GPT-4 had the fewest content and trust-breaking errors, with Claude-3-Opus second. However, all models exhibited some level of trust-breaking and post-therapy misconduct errors, with GPT-4-Turbo and GPT-3.5-Turbo with few-shot prompting showing the lowest error rates, and Mistral-7B and Mixtral-8×7B showing the highest. Quantitatively, GPT-4 surpassed Claude-3-Opus in all readability metrics (all p < 0.001), with a median FRE score of 69.01 (IQR: 64.88-73.14) versus 59.74 (IQR: 55.47-64.01) for Claude-3-Opus. GPT-4 also outperformed GPT-3.5-Turbo and Gemini Ultra (both p < 0.001). Inter-rater reliability was strong (κ = 0.77-0.84). CONCLUSIONS: GPT-4 and Claude-3-Opus demonstrated superior performance in generating simplified IR reports, but the presence of errors across all models, including trust-breaking errors, highlights the need for further refinement and validation before clinical implementation.
CLINICAL RELEVANCE/APPLICATIONS: With the increasing complexity of interventional radiology (IR) procedures and the growing availability of electronic health records, simplifying IR reports is critical to improving patient understanding and clinical decision-making. This study provides insights into the performance of various LLMs in rewriting IR reports, which can help in selecting the most suitable model for clinical patient-centered applications.
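The two Flesch formulas used in this study are fully specified in the literature; a minimal sketch with a naive regex-based syllable counter follows (the study's actual readability tooling is not stated, and production tools use pronunciation dictionaries rather than this heuristic):

```python
import re

def count_syllables(word: str) -> int:
    # Naive heuristic: count vowel groups; subtract one for a silent final 'e'.
    groups = re.findall(r"[aeiouy]+", word.lower())
    n = len(groups)
    if word.lower().endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def readability(text: str) -> dict:
    """Flesch Reading Ease (higher = easier) and Flesch-Kincaid Grade Level."""
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(count_syllables(w) for w in words)
    w, s = len(words), sentences
    fre = 206.835 - 1.015 * (w / s) - 84.6 * (syllables / w)
    fkgl = 0.39 * (w / s) + 11.8 * (syllables / w) - 15.59
    return {"FRE": round(fre, 2), "FKGL": round(fkgl, 2)}
```

A simple sentence such as "The cat sat on the mat." scores above 100 on FRE and below grade 0 on FKGL, while dense clinical prose scores far lower.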

2.
NPJ Digit Med ; 7(1): 288, 2024 Oct 23.
Article in English | MEDLINE | ID: mdl-39443664

ABSTRACT

Large language models (LLMs) have broad medical knowledge and can reason about medical information across many domains, holding promising potential for diverse medical applications in the near future. In this study, we demonstrate a concerning vulnerability of LLMs in medicine. Through targeted manipulation of just 1.1% of the weights of the LLM, we can deliberately inject incorrect biomedical facts. The erroneous information is then propagated in the model's output while maintaining performance on other biomedical tasks. We validate our findings in a set of 1025 incorrect biomedical facts. This peculiar susceptibility raises serious security and trustworthiness concerns for the application of LLMs in healthcare settings. It accentuates the need for robust protective measures, thorough verification mechanisms, and stringent management of access to these models, ensuring their reliable and safe use in medical practice.

3.
Radiology ; 313(1): e241139, 2024 Oct.
Article in English | MEDLINE | ID: mdl-39470431

ABSTRACT

Background Rapid advances in large language models (LLMs) have led to the development of numerous commercial and open-source models. While recent publications have explored OpenAI's GPT-4 to extract information of interest from radiology reports, there has not been a real-world comparison of GPT-4 to leading open-source models. Purpose To compare different leading open-source LLMs to GPT-4 on the task of extracting relevant findings from chest radiograph reports. Materials and Methods Two independent datasets of free-text radiology reports from chest radiograph examinations were used in this retrospective study performed between February 2, 2024, and February 14, 2024. The first dataset consisted of reports from the ImaGenome dataset, providing reference standard annotations from the MIMIC-CXR database acquired between 2011 and 2016. The second dataset consisted of randomly selected reports created at the Massachusetts General Hospital between July 2019 and July 2021. In both datasets, the commercial models GPT-3.5 Turbo and GPT-4 were compared with open-source models that included Mistral-7B and Mixtral-8×7B (Mistral AI), Llama 2-13B and Llama 2-70B (Meta), and Qwen1.5-72B (Alibaba Group), as well as CheXbert and CheXpert-labeler (Stanford ML Group), in their ability to accurately label the presence of multiple findings in radiograph text reports using zero-shot and few-shot prompting. The McNemar test was used to compare F1 scores between models. Results On the ImaGenome dataset (n = 450), the open-source model with the highest score, Llama 2-70B, achieved micro F1 scores of 0.97 and 0.97 for zero-shot and few-shot prompting, respectively, compared with the GPT-4 F1 scores of 0.98 and 0.98 (P > .99 and < .001 for superiority of GPT-4).
On the institutional dataset (n = 500), the open-source model with the highest score, an ensemble model, achieved micro F1 scores of 0.96 and 0.97 for zero-shot and few-shot prompting, respectively, compared with the GPT-4 F1 scores of 0.98 and 0.97 (P < .001 and > .99 for superiority of GPT-4). Conclusion Although GPT-4 was superior to open-source models in zero-shot report labeling, few-shot prompting with a small number of example reports closely matched the performance of GPT-4. The benefit of few-shot prompting varied across datasets and models. © RSNA, 2024 Supplemental material is available for this article.
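The micro F1 score used throughout this comparison aggregates true positives, false positives, and false negatives across all findings and all reports before computing a single score; a minimal sketch (the labels and example values are illustrative, not the study's data):

```python
def micro_f1(y_true: list[list[int]], y_pred: list[list[int]]) -> float:
    """Micro-averaged F1 over multi-label binary annotations.

    y_true / y_pred: one list of 0/1 labels per report, one entry per
    finding (e.g. pneumothorax, effusion, ...).
    """
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            tp += t and p            # predicted present, truly present
            fp += (not t) and p      # predicted present, truly absent
            fn += t and (not p)      # missed finding
    # Micro F1 = 2*TP / (2*TP + FP + FN)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0
```

For the toy call `micro_f1([[1, 0, 1], [0, 1, 0]], [[1, 0, 0], [0, 1, 1]])`, the aggregate counts are TP = 2, FP = 1, FN = 1, giving an F1 of 2/3.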


Subjects
Radiography, Thoracic , Humans , Radiography, Thoracic/methods , Retrospective Studies , Natural Language Processing
4.
Eur Radiol ; 2024 Oct 23.
Article in English | MEDLINE | ID: mdl-39438330

ABSTRACT

Structured reporting (SR) has long been a goal in radiology to standardize and improve the quality of radiology reports. Despite evidence that SR reduces errors, enhances comprehensiveness, and increases adherence to guidelines, its widespread adoption has been limited. Recently, large language models (LLMs) have emerged as a promising solution to automate and facilitate SR. Therefore, this narrative review aims to provide an overview of LLMs for SR in radiology and beyond. We found that the current literature on LLMs for SR is limited, comprising ten studies on the generative pre-trained transformer (GPT)-3.5 (n = 5) and/or GPT-4 (n = 8), while two studies additionally examined the performance of Perplexity and Bing Chat or IT5. All studies reported promising results and acknowledged the potential of LLMs for SR, with six out of ten studies demonstrating the feasibility of multilingual applications. Building upon these findings, we discuss limitations, regulatory challenges, and further applications of LLMs in radiology report processing, encompassing four main areas: documentation, translation and summarization, clinical evaluation, and data mining. In conclusion, this review underscores the transformative potential of LLMs to improve efficiency and accuracy in SR and radiology report processing. KEY POINTS: Question How can LLMs help make SR in radiology more ubiquitous? Findings Current literature leveraging LLMs for SR is sparse but shows promising results, including the feasibility of multilingual applications. Clinical relevance LLMs have the potential to transform radiology report processing and enable the widespread adoption of SR. However, their future role in clinical practice depends on overcoming current limitations and regulatory challenges, including opaque algorithms and training data.

5.
BMC Med Educ ; 24(1): 1066, 2024 Sep 28.
Article in English | MEDLINE | ID: mdl-39342231

ABSTRACT

BACKGROUND: The successful integration of artificial intelligence (AI) in healthcare depends on the global perspectives of all stakeholders. This study aims to answer the research question: What are the attitudes of medical, dental, and veterinary students towards AI in education and practice, and what are the regional differences in these perceptions? METHODS: An anonymous online survey was developed based on a literature review and expert panel discussions. The survey assessed students' AI knowledge, attitudes towards AI in healthcare, current state of AI education, and preferences for AI teaching. It consisted of 16 multiple-choice items, eight demographic queries, and one free-field comment section. Medical, dental, and veterinary students from various countries were invited to participate via faculty newsletters and courses. The survey measured technological literacy, AI knowledge, current state of AI education, preferences for AI teaching, and attitudes towards AI in healthcare using Likert scales. Data were analyzed using descriptive statistics, Mann-Whitney U-test, Kruskal-Wallis test, and Dunn-Bonferroni post hoc test. RESULTS: The survey included 4313 medical, 205 dentistry, and 78 veterinary students from 192 faculties and 48 countries. Most participants were from Europe (51.1%), followed by North/South America (23.3%) and Asia (21.3%). Students reported positive attitudes towards AI in healthcare (median: 4, IQR: 3-4) and a desire for more AI teaching (median: 4, IQR: 4-5). However, they had limited AI knowledge (median: 2, IQR: 2-2), lack of AI courses (76.3%), and felt unprepared to use AI in their careers (median: 2, IQR: 1-3). Subgroup analyses revealed significant differences between the Global North and South (r = 0.025 to 0.185, all P < .001) and across continents (r = 0.301 to 0.531, all P < .001), with generally small effect sizes. 
CONCLUSIONS: This large-scale international survey highlights medical, dental, and veterinary students' positive perceptions of AI in healthcare, their strong desire for AI education, and the current lack of AI teaching in medical curricula worldwide. The study identifies a need for integrating AI education into medical curricula, considering regional differences in perceptions and educational needs. TRIAL REGISTRATION: Not applicable (no clinical trial).


Subjects
Artificial Intelligence , Humans , Cross-Sectional Studies , Surveys and Questionnaires , Male , Female , Education, Dental , Education, Veterinary , Students, Medical/psychology , Students, Dental/psychology , Students, Dental/statistics & numerical data , Adult , Young Adult , Education, Medical , Curriculum , Attitude of Health Personnel
6.
medRxiv ; 2024 Sep 03.
Article in English | MEDLINE | ID: mdl-39281753

ABSTRACT

In clinical science and practice, text data, such as clinical letters or procedure reports, is stored in an unstructured way. This type of data is not a quantifiable resource for any kind of quantitative investigations and any manual review or structured information retrieval is time-consuming and costly. The capabilities of Large Language Models (LLMs) mark a paradigm shift in natural language processing and offer new possibilities for structured Information Extraction (IE) from medical free text. This protocol describes a workflow for LLM based information extraction (LLM-AIx), enabling extraction of predefined entities from unstructured text using privacy preserving LLMs. By converting unstructured clinical text into structured data, LLM-AIx addresses a critical barrier in clinical research and practice, where the efficient extraction of information is essential for improving clinical decision-making, enhancing patient outcomes, and facilitating large-scale data analysis. The protocol consists of four main processing steps: 1) Problem definition and data preparation, 2) data preprocessing, 3) LLM-based IE and 4) output evaluation. LLM-AIx allows integration on local hospital hardware without the need of transferring any patient data to external servers. As example tasks, we applied LLM-AIx for the anonymization of fictitious clinical letters from patients with pulmonary embolism. Additionally, we extracted symptoms and laterality of the pulmonary embolism of these fictitious letters. We demonstrate troubleshooting for potential problems within the pipeline with an IE on a real-world dataset, 100 pathology reports from the Cancer Genome Atlas Program (TCGA), for TNM stage extraction. LLM-AIx can be executed without any programming knowledge via an easy-to-use interface and in no more than a few minutes or hours, depending on the LLM model selected.
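The four processing steps above can be illustrated with a minimal, stdlib-only sketch. Everything here is an assumption for illustration: the entity names, the prompt wording, and `call_llm` (a stub standing in for the locally hosted model) are not the actual LLM-AIx API.

```python
import json

# Step 1: problem definition -- the entities to extract from each report.
ENTITIES = ["symptoms", "laterality", "tnm_stage"]

def build_prompt(report: str) -> str:
    # Step 2: pair the preprocessed text with an instruction to emit strict JSON.
    keys = ", ".join(f'"{e}"' for e in ENTITIES)
    return (
        f"Extract the fields {keys} from the clinical report below. "
        "Answer with a single JSON object; use null for absent fields.\n\n"
        f"Report:\n{report}"
    )

def call_llm(prompt: str) -> str:
    # Step 3: placeholder for the locally hosted, privacy-preserving LLM;
    # a canned answer stands in for a real model call here.
    return '{"symptoms": ["dyspnea"], "laterality": "right", "tnm_stage": null}'

def extract(report: str) -> dict:
    # Step 4: parse the model output into structured data for evaluation.
    parsed = json.loads(call_llm(build_prompt(report)))
    return {key: parsed.get(key) for key in ENTITIES}
```

Restricting the result to the predefined entity keys keeps the output table schema fixed even when the model emits extra fields.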

8.
Insights Imaging ; 15(1): 208, 2024 Aug 14.
Article in English | MEDLINE | ID: mdl-39143443

ABSTRACT

AIM: To determine the effectiveness of functional stress testing and computed tomography angiography (CTA) for diagnosis of obstructive coronary artery disease (CAD). METHODS AND RESULTS: A total of 2920 symptomatic stable chest pain patients were included in the international Collaborative Meta-Analysis of Cardiac CT consortium to compare CTA with exercise electrocardiography (exercise-ECG) and single-photon emission computed tomography (SPECT) for diagnosis of CAD, defined as ≥ 50% diameter stenosis on invasive coronary angiography (ICA) as the reference standard. Generalised linear mixed models were used for calculating the diagnostic accuracy of each diagnostic test, including non-diagnostic results, as dependent variables in a logistic regression model with random intercepts and slopes. Covariates were the reference standard ICA, the type of diagnostic method, and their interactions. CTA showed significantly better diagnostic performance (p < 0.0001), with a sensitivity of 94.6% (95% CI 92.7-96) and a specificity of 76.3% (72.2-80), compared with exercise-ECG at 54.9% (47.9-61.7) and 60.9% (53.4-66.3) and SPECT at 72.9% (65-79.6) and 44.9% (36.8-53.4), respectively. The positive predictive value of CTA was ≥ 50% in patients with a clinical pretest probability of 10% or more, whereas this threshold was reached for ECG and SPECT only at pretest probabilities of ≥ 40% and ≥ 28%, respectively. CTA reliably excluded obstructive CAD, with a post-test probability below 15%, in patients with a pretest probability of up to 74%. CONCLUSION: In patients with stable chest pain, CTA is more effective than functional testing for the diagnosis as well as for reliable exclusion of obstructive CAD. CTA should become widely adopted in patients with intermediate pretest probability. SYSTEMATIC REVIEW REGISTRATION: PROSPERO Database for Systematic Reviews-CRD42012002780.
CRITICAL RELEVANCE STATEMENT: In symptomatic stable chest pain patients, coronary CTA is more effective than functional testing for diagnosis and reliable exclusion of obstructive CAD in intermediate pretest probability of CAD. KEY POINTS: Coronary computed tomography angiography showed significantly better diagnostic performance (p < 0.0001) for diagnosis of coronary artery disease compared to exercise-ECG and SPECT. The positive predictive value of coronary computed tomography angiography was ≥ 50% in patients with a clinical pretest probability of at least 10%, for ECG ≥ 40%, and for SPECT 28%. Coronary computed tomography angiography reliably excluded obstructive coronary artery disease with a post-test probability of below 15% in patients with a pretest probability of up to 74%.
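Post-test probabilities of the kind quoted here follow from Bayes' theorem given a test's sensitivity and specificity and the patient's pretest probability. The study itself used generalised linear mixed models, so the sketch below is only the textbook approximation, not a reproduction of its figures:

```python
def post_test_positive(sens: float, spec: float, pretest: float) -> float:
    """Probability of disease after a positive test (positive predictive value)."""
    true_pos = sens * pretest
    false_pos = (1 - spec) * (1 - pretest)
    return true_pos / (true_pos + false_pos)

def post_test_negative(sens: float, spec: float, pretest: float) -> float:
    """Residual probability of disease despite a negative test."""
    false_neg = (1 - sens) * pretest
    true_neg = spec * (1 - pretest)
    return false_neg / (false_neg + true_neg)
```

Plugging in the CTA point estimates (sensitivity 94.6%, specificity 76.3%), `post_test_positive(0.946, 0.763, 0.10)` gives roughly 0.31 under this direct calculation; the thresholds reported in the abstract come from the mixed-model analysis.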

12.
Radiol Artif Intell ; 6(5): e230502, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39017033

ABSTRACT

Purpose To develop and evaluate a publicly available deep learning model for segmenting and classifying cardiac implantable electronic devices (CIEDs) on Digital Imaging and Communications in Medicine (DICOM) and smartphone-based chest radiographs. Materials and Methods This institutional review board-approved retrospective study included patients with implantable pacemakers, cardioverter defibrillators, cardiac resynchronization therapy devices, and cardiac monitors who underwent chest radiography between January 2012 and January 2022. A U-Net model with a ResNet-50 backbone was created to classify CIEDs on DICOM and smartphone images. Using 2321 chest radiographs in 897 patients (median age, 76 years [range, 18-96 years]; 625 male, 272 female), CIEDs were categorized into four manufacturers, 27 models, and one "other" category. Five smartphones were used to acquire 11 072 images. Performance was reported using the Dice coefficient on the validation set for segmentation or balanced accuracy on the test set for manufacturer and model classification, respectively. Results The segmentation tool achieved a mean Dice coefficient of 0.936 (IQR: 0.890-0.958). The model had an accuracy of 94.36% (95% CI: 90.93%, 96.84%; 251 of 266) for CIED manufacturer classification and 84.21% (95% CI: 79.31%, 88.30%; 224 of 266) for CIED model classification. Conclusion The proposed deep learning model, trained on both traditional DICOM and smartphone images, showed high accuracy for segmentation and classification of CIEDs on chest radiographs. Keywords: Conventional Radiography, Segmentation Supplemental material is available for this article. © RSNA, 2024 See also the commentary by Júdice de Mattos Farina and Celi in this issue.
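The Dice coefficient used to score the segmentation tool has a simple closed form; a minimal pure-Python sketch over flattened binary masks (the actual model operates on image tensors, but the arithmetic is identical):

```python
def dice_coefficient(pred, target):
    """Dice similarity of two binary masks given as flat 0/1 sequences.

    Dice = 2*|A ∩ B| / (|A| + |B|); 1.0 means perfect overlap.
    """
    inter = sum(p and t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Two empty masks agree perfectly by convention.
    return 2 * inter / total if total else 1.0
```

For example, masks `[1, 1, 0, 0]` and `[1, 0, 1, 0]` overlap in one pixel out of two foreground pixels each, giving a Dice score of 0.5.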


Subjects
Deep Learning , Defibrillators, Implantable , Radiography, Thoracic , Smartphone , Humans , Aged , Female , Male , Adolescent , Radiography, Thoracic/standards , Middle Aged , Aged, 80 and over , Retrospective Studies , Adult , Young Adult , Pacemaker, Artificial
13.
J Med Internet Res ; 26: e54948, 2024 May 01.
Article in English | MEDLINE | ID: mdl-38691404

ABSTRACT

This study demonstrates that GPT-4V outperforms GPT-4 across radiology subspecialties in analyzing 207 cases with 1312 images from the Radiological Society of North America Case Collection.


Subjects
Radiology , Radiology/methods , Radiology/statistics & numerical data , Humans , Image Processing, Computer-Assisted/methods
14.
Curr Opin Rheumatol ; 36(4): 267-273, 2024 07 01.
Article in English | MEDLINE | ID: mdl-38533807

ABSTRACT

PURPOSE OF REVIEW: To evaluate the current applications and prospects of artificial intelligence and machine learning in diagnosing and managing axial spondyloarthritis (axSpA), focusing on their role in medical imaging, predictive modelling, and patient monitoring. RECENT FINDINGS: Artificial intelligence, particularly deep learning, is showing promise in diagnosing axSpA assisting with X-ray, computed tomography (CT) and MRI analyses, with some models matching or outperforming radiologists in detecting sacroiliitis and markers. Moreover, it is increasingly being used in predictive modelling of disease progression and personalized treatment, and could aid risk assessment, treatment response and clinical subtype identification. Variable study designs, sample sizes and the predominance of retrospective, single-centre studies still limit the generalizability of results. SUMMARY: Artificial intelligence technologies have significant potential to advance the diagnosis and treatment of axSpA, providing more accurate, efficient and personalized healthcare solutions. However, their integration into clinical practice requires rigorous validation, ethical and legal considerations, and comprehensive training for healthcare professionals. Future advances in artificial intelligence could complement clinical expertise and improve patient care through improved diagnostic accuracy and tailored therapeutic strategies, but the challenge remains to ensure that these technologies are validated in prospective multicentre trials and ethically integrated into patient care.


Subjects
Artificial Intelligence , Axial Spondyloarthritis , Machine Learning , Humans , Axial Spondyloarthritis/diagnosis , Deep Learning , Tomography, X-Ray Computed/methods , Magnetic Resonance Imaging/methods
15.
JAMA ; 331(15): 1320-1321, 2024 04 16.
Article in English | MEDLINE | ID: mdl-38497956

ABSTRACT

This study compares 2 large language models and their performance vs that of competing open-source models.


Subjects
Artificial Intelligence , Diagnostic Imaging , Medical History Taking , Language
16.
J Pathol ; 262(3): 310-319, 2024 03.
Article in English | MEDLINE | ID: mdl-38098169

ABSTRACT

Deep learning applied to whole-slide histopathology images (WSIs) has the potential to enhance precision oncology and alleviate the workload of experts. However, developing these models necessitates large amounts of data with ground truth labels, which can be both time-consuming and expensive to obtain. Pathology reports are typically unstructured or poorly structured texts, and efforts to implement structured reporting templates have been unsuccessful, as these efforts lead to perceived extra workload. In this study, we hypothesised that large language models (LLMs), such as the generative pre-trained transformer 4 (GPT-4), can extract structured data from unstructured plain language reports using a zero-shot approach without requiring any re-training. We tested this hypothesis by utilising GPT-4 to extract information from histopathological reports, focusing on two extensive sets of pathology reports for colorectal cancer and glioblastoma. We found a high concordance between LLM-generated structured data and human-generated structured data. Consequently, LLMs could potentially be employed routinely to extract ground truth data for machine learning from unstructured pathology reports in the future. © 2023 The Authors. The Journal of Pathology published by John Wiley & Sons Ltd on behalf of The Pathological Society of Great Britain and Ireland.


Subjects
Glioblastoma , Precision Medicine , Humans , Machine Learning , United Kingdom
18.
Acta Radiol Open ; 12(10): 20584601231213740, 2023 Oct.
Article in English | MEDLINE | ID: mdl-38034076

ABSTRACT

Background: The growing role of artificial intelligence (AI) in healthcare, particularly radiology, requires its unbiased and fair development and implementation, starting with the constitution of the scientific community. Purpose: To examine the gender and country distribution among academic editors in leading computer science and AI journals. Material and Methods: This cross-sectional study analyzed the gender and country distribution among editors-in-chief, senior, and associate editors in all 75 Q1 computer science and AI journals in the Clarivate Journal Citations Report and SCImago Journal Ranking 2022. Gender was determined using an open-source algorithm (Gender Guesser™), selecting the gender with the highest calibrated probability. Result: Among 4,948 editorial board members, women were underrepresented in all positions (editors-in-chief/senior editors/associate editors: 14%/18%/17%). The proportion of women correlated positively with the SCImago Journal Rank indicator (ρ = 0.329; p = .004). The U.S., the U.K., and China comprised 50% of editors, while Australia, Finland, Estonia, Denmark, the Netherlands, the U.K., Switzerland, and Slovenia had the highest women editor representation per million women population. Conclusion: Our results highlight gender and geographic disparities on leading computer science and AI journal editorial boards, with women being underrepresented in all positions and a disproportional relationship between the Global North and South.

19.
Joint Bone Spine ; 91(3): 105651, 2023 Oct 04.
Article in English | MEDLINE | ID: mdl-37797827

ABSTRACT

Rheumatic disorders present a global health challenge, marked by inflammation and damage to joints, bones, and connective tissues. Accurate, timely diagnosis and appropriate management are crucial for favorable patient outcomes. Magnetic resonance imaging (MRI) has become indispensable in rheumatology, but interpretation remains laborious and variable. Artificial intelligence (AI), including machine learning (ML) and deep learning (DL), offers a means to improve and advance MRI analysis. This review examines current AI applications in rheumatology MRI analysis, addressing diagnostic support, disease classification, activity assessment, and progression monitoring. AI demonstrates promise, with high sensitivity, specificity, and accuracy, achieving or surpassing expert performance. The review also discusses clinical implementation challenges and future research directions to enhance rheumatic disease diagnosis and management.

20.
Med Sci Educ ; 33(4): 1007-1012, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37546190

ABSTRACT

The increasing use of artificial intelligence (AI) in medicine is associated with new ethical challenges and responsibilities. However, special considerations and concerns should be addressed when integrating AI applications into medical education, where healthcare, AI, and education ethics collide. This commentary explores the biomedical ethical responsibilities of medical institutions in incorporating AI applications into medical education by identifying potential concerns and limitations, with the goal of implementing applicable recommendations. The recommendations presented are intended to assist in developing institutional guidelines for the ethical use of AI for medical educators and students.
