Pesquisa | Biblioteca Virtual em Saúde

1.

An open-source fine-tuned large language model for radiological impression generation: a multi-reader performance study.

Serapio, Adrian; Chaudhari, Gunvant; Savage, Cody; Lee, Yoo Jin; Vella, Maya; Sridhar, Shravan; Schroeder, Jamie Lee; Liu, Jonathan; Yala, Adam; Sohn, Jae Ho.

BMC Med Imaging ; 24(1): 254, 2024 Sep 27.

Artigo em Inglês | MEDLINE | ID: mdl-39333958

RESUMO

BACKGROUND: The impression section integrates key findings of a radiology report but can be subjective and variable. We sought to fine-tune and evaluate an open-source Large Language Model (LLM) in automatically generating impressions from the remainder of a radiology report across different imaging modalities and hospitals. METHODS: In this institutional review board-approved retrospective study, we collated a dataset of CT, US, and MRI radiology reports from the University of California San Francisco Medical Center (UCSFMC) (n = 372,716) and the Zuckerberg San Francisco General (ZSFG) Hospital and Trauma Center (n = 60,049), both under a single institution. The Recall-Oriented Understudy for Gisting Evaluation (ROUGE) score, an automatic natural language evaluation metric that measures word overlap, was used for automatic natural language evaluation. A reader study with five cardiothoracic radiologists was performed to more strictly evaluate the model's performance on a specific modality (CT chest exams) with a radiologist subspecialist baseline. We stratified the results of the reader performance study based on the diagnosis category and the original impression length to gauge case complexity. RESULTS: The LLM achieved ROUGE-L scores of 46.51, 44.2, and 50.96 on UCSFMC and upon external validation, ROUGE-L scores of 40.74, 37.89, and 24.61 on ZSFG across the CT, US, and MRI modalities respectively, implying a substantial degree of overlap between the model-generated impressions and impressions written by the subspecialist attending radiologists, but with a degree of degradation upon external validation. In our reader study, the model-generated impressions achieved overall mean scores of 3.56/4, 3.92/4, 3.37/4, 18.29 s,12.32 words, and 84 while the original impression written by a subspecialist radiologist achieved overall mean scores of 3.75/4, 3.87/4, 3.54/4, 12.2 s, 5.74 words, and 89 for clinical accuracy, grammatical accuracy, stylistic quality, edit time, edit distance, and ROUGE-L score respectively. The LLM achieved the highest clinical accuracy ratings for acute/emergent findings and on shorter impressions. CONCLUSIONS: An open-source fine-tuned LLM can generate impressions to a satisfactory level of clinical accuracy, grammatical accuracy, and stylistic quality. Our reader performance study demonstrates the potential of large language models in drafting radiology report impressions that can aid in streamlining radiologists' workflows.

Assuntos

Processamento de Linguagem Natural , Humanos , Estudos Retrospectivos , Imageamento por Ressonância Magnética/métodos , Tomografia Computadorizada por Raios X/métodos , Variações Dependentes do Observador , Sistemas de Informação em Radiologia

2.

Apple Vision Pro: A Paradigm Shift in Medical Technology.

Shanbhag, Nandan M; Bin Sumaida, Abdulrahman; Al Shamisi, Khalifa; Balaraj, Khalid.

Cureus ; 16(9): e69608, 2024 Sep.

Artigo em Inglês | MEDLINE | ID: mdl-39308843

RESUMO

The introduction of Apple Vision Pro (AVP) marks a significant milestone in the intersection of technology and healthcare, offering unique capabilities in mixed reality, which Apple terms "spatial computing." This narrative review aims to explore the various applications of AVP in medical technology, emphasizing its impact on patient care, clinical practices, medical education, and future directions. The review synthesizes findings from multiple studies and articles published between January 2023 and May 2024, highlighting AVP's potential to enhance visualization in diagnostic imaging and surgical planning, assist visually impaired patients, and revolutionize medical education through immersive learning environments. Despite its promise, challenges remain in integrating AVP into existing healthcare systems and understanding its long-term impact on patient outcomes. As research continues, AVP is poised to play a pivotal role in the future of medicine, offering a transformative tool for healthcare professionals.

3.

Enhancing Medical Interview Skills Through AI-Simulated Patient Interactions: Nonrandomized Controlled Trial.

Yamamoto, Akira; Koda, Masahide; Ogawa, Hiroko; Miyoshi, Tomoko; Maeda, Yoshinobu; Otsuka, Fumio; Ino, Hideo.

JMIR Med Educ ; 10: e58753, 2024 Sep 23.

Artigo em Inglês | MEDLINE | ID: mdl-39312284

RESUMO

BACKGROUND: Medical interviewing is a critical skill in clinical practice, yet opportunities for practical training are limited in Japanese medical schools, necessitating urgent measures. Given advancements in artificial intelligence (AI) technology, its application in the medical field is expanding. However, reports on its application in medical interviews in medical education are scarce. OBJECTIVE: This study aimed to investigate whether medical students' interview skills could be improved by engaging with AI-simulated patients using large language models, including the provision of feedback. METHODS: This nonrandomized controlled trial was conducted with fourth-year medical students in Japan. A simulation program using large language models was provided to 35 students in the intervention group in 2023, while 110 students from 2022 who did not participate in the intervention were selected as the control group. The primary outcome was the score on the Pre-Clinical Clerkship Objective Structured Clinical Examination (pre-CC OSCE), a national standardized clinical skills examination, in medical interviewing. Secondary outcomes included surveys such as the Simulation-Based Training Quality Assurance Tool (SBT-QA10), administered at the start and end of the study. RESULTS: The AI intervention group showed significantly higher scores on medical interviews than the control group (AI group vs control group: mean 28.1, SD 1.6 vs 27.1, SD 2.2; P=.01). There was a trend of inverse correlation between the SBT-QA10 and pre-CC OSCE scores (regression coefficient -2.0 to -2.1). No significant safety concerns were observed. CONCLUSIONS: Education through medical interviews using AI-simulated patients has demonstrated safety and a certain level of educational effectiveness. However, at present, the educational effects of this platform on nonverbal communication skills are limited, suggesting that it should be used as a supplementary tool to traditional simulation education.

Assuntos

Inteligência Artificial , Competência Clínica , Simulação de Paciente , Humanos , Feminino , Masculino , Estudantes de Medicina , Japão , Avaliação Educacional/métodos , Entrevistas como Assunto/métodos , Educação de Graduação em Medicina/métodos , Treinamento por Simulação/métodos

4.

Impact of a Digital Scribe System on Clinical Documentation Time and Quality: Usability Study.

van Buchem, Marieke Meija; Kant, Ilse M J; King, Liza; Kazmaier, Jacqueline; Steyerberg, Ewout W; Bauer, Martijn P.

JMIR AI ; 3: e60020, 2024 Sep 23.

Artigo em Inglês | MEDLINE | ID: mdl-39312397

RESUMO

BACKGROUND: Physicians spend approximately half of their time on administrative tasks, which is one of the leading causes of physician burnout and decreased work satisfaction. The implementation of natural language processing-assisted clinical documentation tools may provide a solution. OBJECTIVE: This study investigates the impact of a commercially available Dutch digital scribe system on clinical documentation efficiency and quality. METHODS: Medical students with experience in clinical practice and documentation (n=22) created a total of 430 summaries of mock consultations and recorded the time they spent on this task. The consultations were summarized using 3 methods: manual summaries, fully automated summaries, and automated summaries with manual editing. We then randomly reassigned the summaries and evaluated their quality using a modified version of the Physician Documentation Quality Instrument (PDQI-9). We compared the differences between the 3 methods in descriptive statistics, quantitative text metrics (word count and lexical diversity), the PDQI-9, Recall-Oriented Understudy for Gisting Evaluation scores, and BERTScore. RESULTS: The median time for manual summarization was 202 seconds against 186 seconds for editing an automatic summary. Without editing, the automatic summaries attained a poorer PDQI-9 score than manual summaries (median PDQI-9 score 25 vs 31, P<.001, ANOVA test). Automatic summaries were found to have higher word counts but lower lexical diversity than manual summaries (P<.001, independent t test). The study revealed variable impacts on PDQI-9 scores and summarization time across individuals. Generally, students viewed the digital scribe system as a potentially useful tool, noting its ease of use and time-saving potential, though some criticized the summaries for their greater length and rigid structure. CONCLUSIONS: This study highlights the potential of digital scribes in improving clinical documentation processes by offering a first summary draft for physicians to edit, thereby reducing documentation time without compromising the quality of patient records. Furthermore, digital scribes may be more beneficial to some physicians than to others and could play a role in improving the reusability of clinical documentation. Future studies should focus on the impact and quality of such a system when used by physicians in clinical practice.

5.

Large language model triaging of simulated nephrology patient inbox messages.

Pham, Justin H; Thongprayoon, Charat; Miao, Jing; Suppadungsuk, Supawadee; Koirala, Priscilla; Craici, Iasmina M; Cheungpasitporn, Wisit.

Front Artif Intell ; 7: 1452469, 2024.

Artigo em Inglês | MEDLINE | ID: mdl-39315245

RESUMO

Background: Efficient triage of patient communications is crucial for timely medical attention and improved care. This study evaluates ChatGPT's accuracy in categorizing nephrology patient inbox messages, assessing its potential in outpatient settings. Methods: One hundred and fifty simulated patient inbox messages were created based on cases typically encountered in everyday practice at a nephrology outpatient clinic. These messages were triaged as non-urgent, urgent, and emergent by two nephrologists. The messages were then submitted to ChatGPT-4 for independent triage into the same categories. The inquiry process was performed twice with a two-week period in between. ChatGPT responses were graded as correct (agreement with physicians), overestimation (higher priority), or underestimation (lower priority). Results: In the first trial, ChatGPT correctly triaged 140 (93%) messages, overestimated the priority of 4 messages (3%), and underestimated the priority of 6 messages (4%). In the second trial, it correctly triaged 140 (93%) messages, overestimated the priority of 9 (6%), and underestimated the priority of 1 (1%). The accuracy did not depend on the urgency level of the message (p = 0.19). The internal agreement of ChatGPT responses was 92% with an intra-rater Kappa score of 0.88. Conclusion: ChatGPT-4 demonstrated high accuracy in triaging nephrology patient messages, highlighting the potential for AI-driven triage systems to enhance operational efficiency and improve patient care in outpatient clinics.

6.

Extracting Critical Information from Unstructured Clinicians' Notes Data to Identify Dementia Severity Using a Rule-Based Approach: Feasibility Study.

Prakash, Ravi; Dupre, Matthew E; Østbye, Truls; Xu, Hanzhang.

JMIR Aging ; 7: e57926, 2024 Sep 24.

Artigo em Inglês | MEDLINE | ID: mdl-39316421

RESUMO

BACKGROUND: The severity of Alzheimer disease and related dementias (ADRD) is rarely documented in structured data fields in electronic health records (EHRs). Although this information is important for clinical monitoring and decision-making, it is often undocumented or "hidden" in unstructured text fields and not readily available for clinicians to act upon. OBJECTIVE: We aimed to assess the feasibility and potential bias in using keywords and rule-based matching for obtaining information about the severity of ADRD from EHR data. METHODS: We used EHR data from a large academic health care system that included patients with a primary discharge diagnosis of ADRD based on ICD-9 (International Classification of Diseases, Ninth Revision) and ICD-10 (International Statistical Classification of Diseases, Tenth Revision) codes between 2014 and 2019. We first assessed the presence of ADRD severity information and then the severity of ADRD in the EHR. Clinicians' notes were used to determine the severity of ADRD based on two criteria: (1) scores from the Mini Mental State Examination and Montreal Cognitive Assessment and (2) explicit terms for ADRD severity (eg, "mild dementia" and "advanced Alzheimer disease"). We compiled a list of common ADRD symptoms, cognitive test names, and disease severity terms, refining it iteratively based on previous literature and clinical expertise. Subsequently, we used rule-based matching in Python using standard open-source data analysis libraries to identify the context in which specific words or phrases were mentioned. We estimated the prevalence of documented ADRD severity and assessed the performance of our rule-based algorithm. RESULTS: We included 9115 eligible patients with over 65,000 notes from the providers. Overall, 22.93% (2090/9115) of patients were documented with mild ADRD, 20.87% (1902/9115) were documented with moderate or severe ADRD, and 56.20% (5123/9115) did not have any documentation of the severity of their ADRD. For the task of determining the presence of any ADRD severity information, our algorithm achieved an accuracy of >95%, specificity of >95%, sensitivity of >90%, and an F1-score of >83%. For the specific task of identifying the actual severity of ADRD, the algorithm performed well with an accuracy of >91%, specificity of >80%, sensitivity of >88%, and F1-score of >92%. Comparing patients with mild ADRD to those with more advanced ADRD, the latter group tended to contain older, more likely female, and Black patients, and having received their diagnoses in primary care or in-hospital settings. Relative to patients with undocumented ADRD severity, those with documented ADRD severity had a similar distribution in terms of sex, race, and rural or urban residence. CONCLUSIONS: Our study demonstrates the feasibility of using a rule-based matching algorithm to identify ADRD severity from unstructured EHR report data. However, it is essential to acknowledge potential biases arising from differences in documentation practices across various health care systems.

Assuntos

Demência , Registros Eletrônicos de Saúde , Estudos de Viabilidade , Índice de Gravidade de Doença , Humanos , Demência/diagnóstico , Masculino , Feminino , Idoso , Doença de Alzheimer/diagnóstico , Idoso de 80 Anos ou mais

7.

Exploring Large Language Models for Detecting Online Vaccine Reactions.

Khademi, Sedigh; Palmer, Christopher; Dimaguila, Gerardo Luis; Javed, Muhammad; Buttery, Jim.

Stud Health Technol Inform ; 318: 30-35, 2024 Sep 24.

Artigo em Inglês | MEDLINE | ID: mdl-39320177

RESUMO

Social media offers a rich source of real-time health data, including potential vaccine reactions. However, extracting meaningful insights is challenging due to the noisy nature of social media content. This paper explores using large language models (LLMs) and prompt engineering to detect personal mentions of vaccine reactions. Different prompting strategies were evaluated on two LLM models (GPT-3.5 and GPT-4) using Reddit data focused on shingles (zoster) vaccines. Zero-shot and few-shot learning approaches with both standard and chain-of-thought prompts were compared. The findings demonstrate that GPT-based models with carefully crafted chain-of-thought prompts could identify the relevant social media posts. Few-shot learning helped GPT4 models to identify more of the marginal cases, although less precisely. The use of LLMs for classification with lightweight supervised pretrained language models (PLMs) found that PLMs outperform LLMs. However, a potential benefit in using LLMs to help identify records for training PLMs was revealed, especially to eliminate false negatives, and LLMs could be used as classifiers when insufficient data exists to train a PLM.

Assuntos

Mídias Sociais , Humanos , Processamento de Linguagem Natural , Vacinas , Aprendizado de Máquina

8.

Generative AI and large language models in nuclear medicine: current status and future prospects.

Hirata, Kenji; Matsui, Yusuke; Yamada, Akira; Fujioka, Tomoyuki; Yanagawa, Masahiro; Nakaura, Takeshi; Ito, Rintaro; Ueda, Daiju; Fujita, Shohei; Tatsugami, Fuminari; Fushimi, Yasutaka; Tsuboyama, Takahiro; Kamagata, Koji; Nozaki, Taiki; Fujima, Noriyuki; Kawamura, Mariko; Naganawa, Shinji.

Ann Nucl Med ; 2024 Sep 25.

Artigo em Inglês | MEDLINE | ID: mdl-39320419

RESUMO

This review explores the potential applications of Large Language Models (LLMs) in nuclear medicine, especially nuclear medicine examinations such as PET and SPECT, reviewing recent advancements in both fields. Despite the rapid adoption of LLMs in various medical specialties, their integration into nuclear medicine has not yet been sufficiently explored. We first discuss the latest developments in nuclear medicine, including new radiopharmaceuticals, imaging techniques, and clinical applications. We then analyze how LLMs are being utilized in radiology, particularly in report generation, image interpretation, and medical education. We highlight the potential of LLMs to enhance nuclear medicine practices, such as improving report structuring, assisting in diagnosis, and facilitating research. However, challenges remain, including the need for improved reliability, explainability, and bias reduction in LLMs. The review also addresses the ethical considerations and potential limitations of AI in healthcare. In conclusion, LLMs have significant potential to transform existing frameworks in nuclear medicine, making it a critical area for future research and development.

9.

ChatGPT Generated Otorhinolaryngology Multiple-Choice Questions: Quality, Psychometric Properties, and Suitability for Assessments.

Lotto, Cecilia; Sheppard, Sean C; Anschuetz, Wilma; Stricker, Daniel; Molinari, Giulia; Huwendiek, Sören; Anschuetz, Lukas.

OTO Open ; 8(3): e70018, 2024.

Artigo em Inglês | MEDLINE | ID: mdl-39328276

RESUMO

Objective: To explore Chat Generative Pretrained Transformer's (ChatGPT's) capability to create multiple-choice questions about otorhinolaryngology (ORL). Study Design: Experimental question generation and exam simulation. Setting: Tertiary academic center. Methods: ChatGPT 3.5 was prompted: "Can you please create a challenging 20-question multiple-choice questionnaire about clinical cases in otolaryngology, offering five answer options?." The generated questionnaire was sent to medical students, residents, and consultants. Questions were investigated regarding quality criteria. Answers were anonymized and the resulting data was analyzed in terms of difficulty and internal consistency. Results: ChatGPT 3.5 generated 20 exam questions of which 1 question was considered off-topic, 3 questions had a false answer, and 3 questions had multiple correct answers. Subspecialty theme repartition was as follows: 5 questions were on otology, 5 about rhinology, and 10 questions addressed head and neck. The qualities of focus and relevance were good while the vignette and distractor qualities were low. The level of difficulty was suitable for undergraduate medical students (n = 24), but too easy for residents (n = 30) or consultants (n = 10) in ORL. Cronbach's α was highest (.69) with 15 selected questions using students' results. Conclusion: ChatGPT 3.5 is able to generate grammatically correct simple ORL multiple choice questions for a medical student level. However, the overall quality of the questions was average, needing thorough review and revision by a medical expert to ensure suitability in future exams.

10.

Effectiveness of the Medical Chatbot PROSCA to Inform Patients About Prostate Cancer: Results of a Randomized Controlled Trial.

Baumgärtner, Kilian; Byczkowski, Michael; Schmid, Tamara; Muschko, Marc; Woessner, Philipp; Gerlach, Axel; Bonekamp, David; Schlemmer, Heinz-Peter; Hohenfellner, Markus; Görtz, Magdalena.

Eur Urol Open Sci ; 69: 80-88, 2024 Nov.

Artigo em Inglês | MEDLINE | ID: mdl-39329071

RESUMO

Background and objective: Artificial intelligence (AI)-powered conversational agents are increasingly finding application in health care, as these can provide patient education at any time. However, their effectiveness in medical settings remains largely unexplored. This study aimed to assess the impact of the chatbot "PROState cancer Conversational Agent" (PROSCA), which was trained to provide validated support from diagnostic tests to treatment options for men facing prostate cancer (PC) diagnosis. Methods: The chatbot PROSCA, developed by urologists at Heidelberg University Hospital and SAP SE, was evaluated through a randomized controlled trial (RCT). Patients were assigned to either the chatbot group, receiving additional access to PROSCA alongside standard information by urologists, or the control group (1:1), receiving standard information. A total of 112 men were included, of whom 103 gave feedback at study completion. Key findings and limitations: Over time, patients' information needs decreased significantly more in the chatbot group than in the control group (p = 0.035). In the chatbot group, 43/54 men (79.6%) used PROSCA, and all of them found it easy to use. Of the men, 71.4% agreed that the chatbot improved their informedness about PC and 90.7% would like to use PROSCA again. Limitations are study sample size, single-center design, and specific clinical application. Conclusions and clinical implications: With the introduction of the PROSCA chatbot, we created and evaluated an innovative, evidence-based AI health information tool as an additional source of information for PC. Our RCT results showed significant benefits of the chatbot in reducing patients' information needs and enhancing their understanding of PC. This easy-to-use AI tool provides accurate, timely, and accessible support, demonstrating its value in the PC diagnosis process. Future steps include further customization of the chatbot's responses and integration with the existing health care systems to maximize its impact on patient outcomes. Patient summary: This study evaluated an artificial intelligence-powered chatbot-PROSCA, a digital tool designed to support men facing prostate cancer diagnosis by providing validated information from diagnosis to treatment. Results showed that patients who used the chatbot as an additional tool felt better informed than those who received standard information from urologists. The majority of users appreciated the ease of use of the chatbot and expressed a desire to use it again; this suggests that PROSCA could be a valuable resource to improve patient understanding in prostate cancer diagnosis.

11.

Artificial Intelligence in Dental Education: Opportunities and Challenges of Large Language Models and Multimodal Foundation Models.

Claman, Daniel; Sezgin, Emre.

JMIR Med Educ ; 10: e52346, 2024 Sep 27.

Artigo em Inglês | MEDLINE | ID: mdl-39331527

RESUMO

Unlabelled: Instructional and clinical technologies have been transforming dental education. With the emergence of artificial intelligence (AI), the opportunities of using AI in education has increased. With the recent advancement of generative AI, large language models (LLMs) and foundation models gained attention with their capabilities in natural language understanding and generation as well as combining multiple types of data, such as text, images, and audio. A common example has been ChatGPT, which is based on a powerful LLM-the GPT model. This paper discusses the potential benefits and challenges of incorporating LLMs in dental education, focusing on periodontal charting with a use case to outline capabilities of LLMs. LLMs can provide personalized feedback, generate case scenarios, and create educational content to contribute to the quality of dental education. However, challenges, limitations, and risks exist, including bias and inaccuracy in the content created, privacy and security concerns, and the risk of overreliance. With guidance and oversight, and by effectively and ethically integrating LLMs, dental education can incorporate engaging and personalized learning experiences for students toward readiness for real-life clinical practice.

Assuntos

Inteligência Artificial , Educação em Odontologia , Humanos , Educação em Odontologia/métodos , Modelos Educacionais

12.

LLMDiff: Diffusion Model Using Frozen LLM Transformers for Precipitation Nowcasting.

She, Lei; Zhang, Chenghong; Man, Xin; Shao, Jie.

Sensors (Basel) ; 24(18)2024 Sep 19.

Artigo em Inglês | MEDLINE | ID: mdl-39338794

RESUMO

Precipitation nowcasting, which involves the short-term, high-resolution prediction of rainfall, plays a crucial role in various real-world applications. In recent years, researchers have increasingly utilized deep learning-based methods in precipitation nowcasting. The exponential growth of spatiotemporal observation data has heightened interest in recent advancements such as denoising diffusion models, which offer appealing prospects due to their inherent probabilistic nature that aligns well with the complexities of weather forecasting. Successful application of diffusion models in rainfall prediction tasks requires relevant conditions and effective utilization to direct the forecasting process of the diffusion model. In this paper, we propose a probabilistic spatiotemporal model for precipitation nowcasting, named LLMDiff. The architecture of LLMDiff includes two networks: a conditional encoder-decoder network and a denoising network. The conditional network provides conditional information to guide the denoising network for high-quality predictions related to real-world earth systems. Additionally, we utilize a frozen transformer block from pre-trained large language models (LLMs) in the denoising network as a universal visual encoder layer, which enables the accurate estimation of motion trend by considering long-term temporal context information and capturing temporal dependencies within the frame sequence. Our experimental results demonstrate that LLMDiff outperforms state-of-the-art models on the SEVIR dataset.

13.

eHealth Assistant AI Chatbot Using a Large Language Model to Provide Personalized Answers through Secure Decentralized Communication.

Pap, Iuliu Alexandru; Oniga, Stefan.

Sensors (Basel) ; 24(18)2024 Sep 23.

Artigo em Inglês | MEDLINE | ID: mdl-39338885

RESUMO

In this paper, we present the implementation of an artificial intelligence health assistant designed to complement a previously built eHealth data acquisition system for helping both patients and medical staff. The assistant allows users to query medical information in a smarter, more natural way, respecting patient privacy and using secure communications through a chat style interface based on the Matrix decentralized open protocol. Assistant responses are constructed locally by an interchangeable large language model (LLM) that can form rich and complete answers like most human medical staff would. Restricted access to patient information and other related resources is provided to the LLM through various methods for it to be able to respond correctly based on specific patient data. The Matrix protocol allows deployments to be run in an open federation; hence, the system can be easily scaled.

Assuntos

Inteligência Artificial , Telemedicina , Humanos , Segurança Computacional , Comunicação , Idioma

14.

FuseLinker: Leveraging LLM's pre-trained text embeddings and domain knowledge to enhance GNN-based link prediction on biomedical knowledge graphs.

Xiao, Yongkang; Zhang, Sinian; Zhou, Huixue; Li, Mingchen; Yang, Han; Zhang, Rui.

J Biomed Inform ; 158: 104730, 2024 Sep 24.

Artigo em Inglês | MEDLINE | ID: mdl-39326691

RESUMO

OBJECTIVE: To develop the FuseLinker, a novel link prediction framework for biomedical knowledge graphs (BKGs), which fully exploits the graph's structural, textual and domain knowledge information. We evaluated the utility of FuseLinker in the graph-based drug repurposing task through detailed case studies. METHODS: FuseLinker leverages fused pre-trained text embedding and domain knowledge embedding to enhance the graph neural network (GNN)-based link prediction model tailored for BKGs. This framework includes three parts: a) obtain text embeddings for BKGs using embedding-visible large language models (LLMs), b) learn the representations of medical ontology as domain knowledge information by employing the Poincaré graph embedding method, and c) fuse these embeddings and further learn the graph structure representations of BKGs by applying a GNN-based link prediction model. We evaluated FuseLinker against traditional knowledge graph embedding models and a conventional GNN-based link prediction model across four public BKG datasets. Additionally, we examined the impact of using different embedding-visible LLMs on FuseLinker's performance. Finally, we investigated FuseLinker's ability to generate medical hypotheses through two drug repurposing case studies for Sorafenib and Parkinson's disease. RESULTS: By comparing FuseLinker with baseline models on four BKGs, our method demonstrates superior performance. The Mean Reciprocal Rank (MRR) and Area Under receiver operating characteristic Curve (AUROC) for KEGG50k, Hetionet, SuppKG and ADInt are 0.969 and 0.987, 0.548 and 0.903, 0.739 and 0.928, and 0.831 and 0.890, respectively. CONCLUSION: Our study demonstrates that FuseLinker is an effective novel link prediction framework that integrates multiple graph information and shows significant potential for practical applications in biomedical and clinical tasks. Source code and data are available at https://github.com/YKXia0/FuseLinker.

15.

Assessing the utility of large language models for phenotype-driven gene prioritization in the diagnosis of rare genetic disease.

Kim, Junyoung; Wang, Kai; Weng, Chunhua; Liu, Cong.

Am J Hum Genet ; 2024 Sep 04.

Artigo em Inglês | MEDLINE | ID: mdl-39255797

RESUMO

Phenotype-driven gene prioritization is fundamental to diagnosing rare genetic disorders. While traditional approaches rely on curated knowledge graphs with phenotype-gene relations, recent advancements in large language models (LLMs) promise a streamlined text-to-gene solution. In this study, we evaluated five LLMs, including two generative pre-trained transformers (GPT) series and three Llama2 series, assessing their performance across task completeness, gene prediction accuracy, and adherence to required output structures. We conducted experiments, exploring various combinations of models, prompts, phenotypic input types, and task difficulty levels. Our findings revealed that the best-performed LLM, GPT-4, achieved an average accuracy of 17.0% in identifying diagnosed genes within the top 50 predictions, which still falls behind traditional tools. However, accuracy increased with the model size. Consistent results were observed over time, as shown in the dataset curated after 2023. Advanced techniques such as retrieval-augmented generation (RAG) and few-shot learning did not improve the accuracy. Sophisticated prompts were more likely to enhance task completeness, especially in smaller models. Conversely, complicated prompts tended to decrease output structure compliance rate. LLMs also achieved better-than-random prediction accuracy with free-text input, though performance was slightly lower than with standardized concept input. Bias analysis showed that highly cited genes, such as BRCA1, TP53, and PTEN, are more likely to be predicted. Our study provides valuable insights into integrating LLMs with genomic analysis, contributing to the ongoing discussion on their utilization in clinical workflows.

16.

Semantic search using protein large language models detects class II microcins in bacterial genomes.

Kulikova, Anastasiya V; Parker, Jennifer K; Davies, Bryan W; Wilke, Claus O.

mSystems ; : e0104424, 2024 Sep 18.

Artigo em Inglês | MEDLINE | ID: mdl-39291976

RESUMO

Class II microcins are antimicrobial peptides that have shown some potential as novel antibiotics. However, to date, only 10 class II microcins have been described, and the discovery of novel microcins has been hampered by their short length and high sequence divergence. Here, we ask if we can use numerical embeddings generated by protein large language models to detect microcins in bacterial genome assemblies and whether this method can outperform sequence-based methods such as BLAST. We find that embeddings detect known class II microcins much more reliably than does BLAST and that any two microcins tend to have a small distance in embedding space even though they typically are highly diverged at the sequence level. In data sets of Escherichia coli, Klebsiella spp., and Enterobacter spp. genomes, we further find novel putative microcins that were previously missed by sequence-based search methods. IMPORTANCE: Antibiotic resistance is becoming an increasingly serious problem in modern medicine, but the development pipeline for conventional antibiotics is not promising. Therefore, alternative approaches to combat bacterial infections are urgently needed. One such approach may be to employ naturally occurring antibacterial peptides produced by bacteria to kill competing bacteria. A promising class of such peptides are class II microcins. However, only a small number of class II microcins have been discovered to date, and the discovery of further such microcins has been hampered by their high sequence divergence and short length, which can cause sequence-based search methods to fail. Here, we demonstrate that a more robust method for microcin discovery can be built on the basis of a protein large language model, and we use this method to identify several putative novel class II microcins.

17.

Extracting seizure control metrics from clinic notes of patients with epilepsy: A natural language processing approach.

Fernandes, Marta; Cardall, Aidan; Moura, Lidia Mvr; McGraw, Christopher; Zafar, Sahar F; Westover, M Brandon.

Epilepsy Res ; 207: 107451, 2024 Sep 10.

Artigo em Inglês | MEDLINE | ID: mdl-39276641

RESUMO

OBJECTIVES: Monitoring seizure control metrics is key to clinical care of patients with epilepsy. Manually abstracting these metrics from unstructured text in electronic health records (EHR) is laborious. We aimed to abstract the date of last seizure and seizure frequency from clinical notes of patients with epilepsy using natural language processing (NLP). METHODS: We extracted seizure control metrics from notes of patients seen in epilepsy clinics from two hospitals in Boston. Extraction was performed with the pretrained model RoBERTa_for_seizureFrequency_QA, for both date of last seizure and seizure frequency, combined with regular expressions. We designed the algorithm to categorize the timing of last seizure ("today", "1-6â¯days ago", "1-4 weeks ago", "more than 1-3â¯months ago", "more than 3-6â¯months ago", "more than 6-12â¯months ago", "more than 1-2â¯years ago", "more than 2 years ago") and seizure frequency ("innumerable", "multiple", "daily", "weekly", "monthly", "once per year", "less than once per year"). Our ground truth consisted of structured questionnaires filled out by physicians. Model performance was measured using the areas under the receiving operating characteristic curve (AUROC) and precision recall curve (AUPRC) for categorical labels, and median absolute error (MAE) for ordinal labels, with 95â¯% confidence intervals (CI) estimated via bootstrapping. RESULTS: Our cohort included 1773 adult patients with a total of 5658 visits with reported seizure control metrics, seen in epilepsy clinics between December 2018 and May 2022. The cohort average age was 42 years old, the majority were female (57â¯%), White (81â¯%) and non-Hispanic (85â¯%). The models achieved an MAE (95â¯% CI) for date of last seizure of 4 (4.00-4.86) weeks, and for seizure frequency of 0.02 (0.02-0.02) seizures per day. CONCLUSIONS: Our NLP approach demonstrates that the extraction of seizure control metrics from EHR is feasible allowing for large-scale EHR research.

18.

Large language models and artificial intelligence chatbots in vascular surgery.

Lareyre, Fabien; Nasr, Bahaa; Poggi, Elise; Lorenzo, Gilles Di; Ballaith, Ali; Sliti, Imen; Chaudhuri, Arindam; Raffort, Juliette.

Semin Vasc Surg ; 37(3): 314-320, 2024 Sep.

Artigo em Inglês | MEDLINE | ID: mdl-39277347

RESUMO

Natural language processing is a subfield of artificial intelligence that aims to analyze human oral or written language. The development of large language models has brought innovative perspectives in medicine, including the potential use of chatbots and virtual assistants. Nevertheless, the benefits and pitfalls of such technology need to be carefully evaluated before their use in health care. The aim of this narrative review was to provide an overview of potential applications of large language models and artificial intelligence chatbots in the field of vascular surgery, including clinical practice, research, and education. In light of the results, we discuss current limits and future directions.

Assuntos

Inteligência Artificial , Processamento de Linguagem Natural , Procedimentos Cirúrgicos Vasculares , Humanos

19.

Clustering of clinical symptoms using large language models reveals low diagnostic specificity of proposed alternatives to consensus mast cell activation syndrome criteria.

Solomon, Benjamin D; Khatri, Purvesh.

J Allergy Clin Immunol ; 2024 Sep 13.

Artigo em Inglês | MEDLINE | ID: mdl-39278360

RESUMO

BACKGROUND: The rate of diagnosis of mast cell activation syndrome (MCAS) has increased since the disorder's original description as a mastocytosis-like phenotype. While a set of consortium MCAS criteria is well described and widely accepted, this increase occurs in the setting of a broader set of proposed alternative MCAS criteria. OBJECTIVE: Effective diagnostic criteria must minimize the range of unrelated diagnoses that can be erroneously classified as the condition of interest. We sought to determine if the symptoms associated with alternative MCAS criteria result in less concise or consistent diagnostic alternatives, reducing diagnostic specificity. METHODS: We used multiple large language models, including ChatGPT, Claude, and Gemini, to bootstrap the probabilities of diagnoses that are compatible with consortium or alternative MCAS criteria. We utilized diversity and network analysis to quantify diagnostic precision and specificity compared to control diagnostic criteria including systemic lupus erythematosus (SLE), Kawasaki disease, and migraines. RESULTS: Compared to consortium MCAS criteria, alternative MCAS criteria are associated with more variable (Shannon diversity 5.8 vs. 4.6, respectively; p-value=0.004) and less precise (mean Bray-Curtis similarity 0.07 vs 0.19, respectively; p-value=0.004) diagnoses. The diagnosis networks derived from consortium and alternative MCAS criteria had lower between-network similarity compared to the similarity between diagnosis networks derived from two distinct SLE criteria (cosine similarity 0.55 vs. 0.86, respectively; p-value=0.0022). CONCLUSION: Alternative MCAS criteria are associated with a distinct set of diagnoses compared to consortium MCAS criteria and have lower diagnostic consistency. This lack of specificity is pronounced in relation to multiple control criteria, raising the concern that alternative criteria could disproportionately contribute to MCAS overdiagnosis, to the exclusion of more appropriate diagnoses.

20.

SciScribe: Automating & Contextualizing Literature Reviews in Cardiac Surgery.

Mahboubi, Rashed; Dinkla, Kasper; Weiss, Aaron; Acierto, Anthony; Staar, Peter; Robinson, Justin; Salim Hammoud, Miza; Karamlou, Tara.

J Thorac Cardiovasc Surg ; 2024 Sep 13.

Artigo em Inglês | MEDLINE | ID: mdl-39278616

RESUMO

OBJECTIVES: The task of writing structured content reviews and guidelines has grown stronger and more complex. We propose to go beyond search tools, toward curation tools, by automating time-consuming and repetitive steps of extracting and organizing information. METHODS: SciScribe is built as an extension of IBM's Deep Search platform, which provides document processing and search capabilities. This platform was used to ingest and search full-content publications from PubMed Central (PMC) and official, structured records from the ClinicalTrials and OpenPayments databases. Author names and NCT numbers, mentioned within the publications, were used to link publications to these official records as context. Search strategies involve traditional keyword-based search as well as natural language question and answering via large language models (LLMs). RESULTS: SciScribe is a web-based tool that helps accelerate literature reviews through key features: 1. Accumulate a personal collection from publication sources, such as PMC or other sources; 2. Incorporate contextual information from external databases into the presented papers, promoting a more informed assessment by readers. 3. Semantic question and answering of a document to quickly assess relevance and hierarchical organization. 4. Semantic question answering for each document within a collection, collated into tables. CONCLUSIONS: Emergent language processing techniques open new avenues to accelerate and enhance the literature review process, for which we have demonstrated a use case implementation within cardiac surgery. SciScribe automates and accelerates this process, mitigates errors associated with repetition and fatigue, as well as contextualizes results by linking relevant external data sources, instantaneously.

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

RESUMO

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

RESUMO

RESUMO

RESUMO

RESUMO

Assuntos

RESUMO

RESUMO

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA