Results 1-20 of 4,636
1.
JMIR AI ; 3: e49546, 2024 Oct 02.
Article in English | MEDLINE | ID: mdl-39357045

ABSTRACT

BACKGROUND: Women have been underrepresented in clinical trials for many years. Machine-learning models trained on clinical trial abstracts may capture and amplify biases in the data. Specifically, word embeddings are models that represent words as vectors and are the building blocks of most natural language processing systems. If word embeddings are trained on clinical trial abstracts, predictive models that use the embeddings will exhibit gender performance gaps. OBJECTIVE: We aim to capture temporal trends in clinical trials through temporal distribution matching on contextual word embeddings (specifically, BERT) and explore its effect on the bias manifested in downstream tasks. METHODS: We present TeDi-BERT, a method to harness the temporal trend of increasing women's inclusion in clinical trials to train contextual word embeddings. We implement temporal distribution matching through an adversarial classifier that tries to distinguish old from new clinical trial abstracts based on their embeddings. The temporal distribution matching acts as a form of domain adaptation from older to more recent clinical trials. We evaluate our model on 2 clinical tasks: prediction of unplanned readmission to the intensive care unit and hospital length of stay prediction. We also conduct an algorithmic analysis of the proposed method. RESULTS: In readmission prediction, TeDi-BERT achieved an area under the receiver operating characteristic curve of 0.64 for female patients versus the baseline of 0.62 (P<.001), and 0.66 for male patients versus the baseline of 0.64 (P<.001). In the length of stay regression, TeDi-BERT achieved a mean absolute error of 4.56 (95% CI 4.44-4.68) for female patients versus 4.62 (95% CI 4.50-4.74, P<.001) and 4.54 (95% CI 4.44-4.65) for male patients versus 4.60 (95% CI 4.50-4.71, P<.001). CONCLUSIONS: In both clinical tasks, TeDi-BERT improved performance for female patients, as expected; but it also improved performance for male patients.
Our results show that accuracy for one gender need not be traded for bias reduction, but rather that good science improves clinical results for all. Contextual word embedding models trained to capture temporal trends can help mitigate the effects of bias that changes over time in the training data.
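The adversarial part of temporal distribution matching can be reduced to a scalar objective: the encoder minimizes its task loss while maximizing the old-vs-new discriminator's loss. This is an illustrative toy, not TeDi-BERT's actual implementation; the function names and the λ weighting are assumptions.

```python
import math

def encoder_objective(task_loss, discriminator_loss, lam=1.0):
    """Loss minimised by the embedding encoder: do well on the clinical
    task while *fooling* the old-vs-new discriminator (maximising its
    loss), so embeddings of old and new abstracts become indistinguishable."""
    return task_loss - lam * discriminator_loss

def discriminator_objective(discriminator_loss):
    """The adversary itself simply minimises its own classification loss."""
    return discriminator_loss

# A binary discriminator reduced to chance has loss ln(2) per example;
# at that point it no longer pushes the encoder in any direction.
chance_loss = math.log(2.0)
print(round(encoder_objective(0.5, chance_loss), 4))
```

In practice the sign flip is usually implemented with a gradient-reversal layer between the encoder and the discriminator so both can be trained with ordinary backpropagation.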

2.
JMIR Med Inform ; 12: e63010, 2024 Oct 02.
Article in English | MEDLINE | ID: mdl-39357052

ABSTRACT

BACKGROUND: Generative artificial intelligence (GAI) systems by Google have recently been updated from Bard to Gemini and Gemini Advanced as of December 2023. Gemini is a basic, free-to-use model after a user's login, while Gemini Advanced operates on a more advanced model requiring a fee-based subscription. These systems have the potential to enhance medical diagnostics. However, the impact of these updates on comprehensive diagnostic accuracy remains unknown. OBJECTIVE: This study aimed to compare the accuracy of the differential diagnosis lists generated by Gemini Advanced, Gemini, and Bard across comprehensive medical fields using a case report series. METHODS: We identified a case report series with relevant final diagnoses published in the American Journal of Case Reports from January 2022 to March 2023. After excluding nondiagnostic cases and patients aged 10 years and younger, we included the remaining case reports. After refining the case sections into case descriptions, we input the same case descriptions into Gemini Advanced, Gemini, and Bard to generate the top 10 differential diagnosis lists. In total, 2 expert physicians independently evaluated whether the final diagnosis was included in the lists and its ranking. Any discrepancies were resolved by another expert physician. Bonferroni correction was applied to adjust the P values for the number of comparisons among the 3 GAI systems, setting the corrected significance level at P<.02. RESULTS: In total, 392 case reports were included. The inclusion rates of the final diagnosis within the top 10 differential diagnosis lists were 73% (286/392) for Gemini Advanced, 76.5% (300/392) for Gemini, and 68.6% (269/392) for Bard. The top diagnoses matched the final diagnoses in 31.6% (124/392) for Gemini Advanced, 42.6% (167/392) for Gemini, and 31.4% (123/392) for Bard. Gemini demonstrated higher diagnostic accuracy than Bard both within the top 10 differential diagnosis lists (P=.02) and as the top diagnosis (P=.001).
In addition, Gemini Advanced achieved significantly lower accuracy than Gemini in identifying the most probable diagnosis (P=.002). CONCLUSIONS: The results of this study suggest that Gemini outperformed Bard in diagnostic accuracy following the model update. However, Gemini Advanced requires further refinement to optimize its performance for future artificial intelligence-enhanced diagnostics. These findings should be interpreted cautiously and considered primarily for research purposes, as these GAI systems have not been adjusted for medical diagnostics nor approved for clinical use.


Subject(s)
Artificial Intelligence, Humans, Differential Diagnosis, Cross-Sectional Studies
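The Bonferroni threshold and the reported inclusion rates follow directly from the counts in the abstract; a minimal sketch (function names are ours):

```python
def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance threshold after Bonferroni correction."""
    return alpha / n_comparisons

def inclusion_rate(hits, total):
    """Percentage of cases whose final diagnosis appeared in the list."""
    return round(100 * hits / total, 1)

# Three pairwise comparisons among Gemini Advanced, Gemini, and Bard:
print(round(bonferroni_alpha(0.05, 3), 4))  # 0.0167, i.e. the P<.02 level
# Top-10 inclusion rates reported in the abstract (out of 392 cases):
print(inclusion_rate(286, 392), inclusion_rate(300, 392), inclusion_rate(269, 392))
```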
3.
J Med Internet Res ; 26: e60601, 2024 Oct 03.
Article in English | MEDLINE | ID: mdl-39361955

ABSTRACT

BACKGROUND: Medical texts present significant domain-specific challenges, and manually curating these texts is a time-consuming and labor-intensive process. To address this, natural language processing (NLP) algorithms have been developed to automate text processing. In the biomedical field, various toolkits for text processing exist, which have greatly improved the efficiency of handling unstructured text. However, these existing toolkits tend to emphasize different perspectives, and none of them offer generation capabilities, leaving a significant gap in the current offerings. OBJECTIVE: This study aims to describe the development and preliminary evaluation of Ascle. Ascle provides biomedical researchers and clinical staff with an easy-to-use, all-in-one solution that requires minimal programming expertise. For the first time, Ascle provides 4 advanced and challenging generative functions: question-answering, text summarization, text simplification, and machine translation. In addition, Ascle integrates 12 essential NLP functions, along with query and search capabilities for clinical databases. METHODS: We fine-tuned 32 domain-specific language models and evaluated them thoroughly on 27 established benchmarks. In addition, for the question-answering task, we developed a retrieval-augmented generation (RAG) framework for large language models that incorporated a medical knowledge graph with ranking techniques to enhance the reliability of generated answers. We also conducted a physician validation to assess the quality of generated content beyond automated metrics. RESULTS: The fine-tuned models and RAG framework consistently enhanced text generation tasks. For example, the fine-tuned models improved the machine translation task by 20.27 BLEU points. In the question-answering task, the RAG framework raised the ROUGE-L score by 18% over the vanilla models.
Physician validation of generated answers showed high scores for readability (4.95/5) and relevancy (4.43/5), with a lower score for accuracy (3.90/5) and completeness (3.31/5). CONCLUSIONS: This study introduces the development and evaluation of Ascle, a user-friendly NLP toolkit designed for medical text generation. All code is publicly available through the Ascle GitHub repository. All fine-tuned language models can be accessed through Hugging Face.


Subject(s)
Natural Language Processing, Humans, Algorithms, Software
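Ascle's RAG pipeline couples retrieval with a medical knowledge graph and learned ranking. As a hedged stand-in, the retrieve-then-rank step that precedes generation can be reduced to plain lexical overlap; the function name and toy corpus below are invented for illustration:

```python
def rank_passages(question, passages):
    """Rank candidate passages by token overlap with the question.

    A real RAG system would embed and rank with learned models; this
    lexical version only shows the shape of the step: passages in,
    relevance-ordered evidence out, which is then fed to the generator.
    """
    q = set(question.lower().split())
    scored = [(len(q & set(p.lower().split())), p) for p in passages]
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [p for score, p in scored if score > 0]

docs = [
    "metformin is first line therapy for type 2 diabetes",
    "influenza vaccines are updated annually",
    "insulin therapy for type 1 diabetes",
]
top = rank_passages("first line therapy for type 2 diabetes", docs)
print(top[0])
```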
4.
Comput Biol Med ; 182: 109233, 2024 Oct 02.
Article in English | MEDLINE | ID: mdl-39362002

ABSTRACT

BACKGROUND: Patient medical information often exists in unstructured text containing abbreviations and acronyms deemed essential to conserve time and space but posing challenges for automated interpretation. Leveraging the efficacy of Transformers in natural language processing, our objective was to use the knowledge acquired by a language model and continue its pre-training to develop a European Portuguese (PT-PT) healthcare-domain language model. METHODS: After carrying out a filtering process, Albertina PT-PT 900M was selected as our base language model, and we continued its pre-training using more than 2.6 million electronic medical records from Portugal's largest public hospital. MediAlbertina 900M was created through domain adaptation on these data using masked language modelling. RESULTS: We compared against our baseline using both perplexity, which decreased from about 20 to 1.6, and the fine-tuning and evaluation of information extraction models for Named Entity Recognition and Assertion Status classification. MediAlbertina PT-PT outperformed Albertina PT-PT in both tasks by 4-6% in recall and F1-score. CONCLUSIONS: This study contributes the first publicly available medical language model trained with PT-PT data. It underscores the efficacy of domain adaptation and offers a contribution to the scientific community in overcoming the obstacles of non-English languages. By fine-tuning MediAlbertina for PT-PT medical tasks, further steps can be taken to assist physicians, such as creating decision support systems or building medical timelines for patient profiling.
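Perplexity, the metric used to compare MediAlbertina with its base model, is the exponential of the average per-token negative log-likelihood. A minimal sketch with made-up token probabilities (the values 0.05 and 0.625 are chosen only to land on the reported 20 and 1.6):

```python
import math

def perplexity(token_log_probs):
    """Perplexity from per-token natural-log probabilities.

    A drop from ~20 to ~1.6, as reported for MediAlbertina versus its
    Albertina base, means the adapted model is far less 'surprised' by
    held-out medical text.
    """
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)

# Toy example: a model assigning probability 0.05 vs 0.625 to each token.
print(round(perplexity([math.log(0.05)] * 4), 2))
print(round(perplexity([math.log(0.625)] * 4), 2))
```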

5.
Front Artif Intell ; 7: 1393903, 2024.
Article in English | MEDLINE | ID: mdl-39351510

ABSTRACT

Introduction: Recent advances in generative Artificial Intelligence (AI) and Natural Language Processing (NLP) have led to the development of Large Language Models (LLMs) and AI-powered chatbots like ChatGPT, which have numerous practical applications. Notably, these models assist programmers with coding queries, debugging, solution suggestions, and providing guidance on software development tasks. Despite known issues with the accuracy of ChatGPT's responses, its comprehensive and articulate language continues to attract frequent use. This indicates potential for ChatGPT to support educators and serve as a virtual tutor for students. Methods: To explore this potential, we conducted a comprehensive analysis comparing the emotional content in responses from ChatGPT and human answers to 2000 questions sourced from Stack Overflow (SO). The emotional aspects of the answers were examined to understand how the emotional tone of AI responses compares to that of human responses. Results: Our analysis revealed that ChatGPT's answers are generally more positive compared to human responses. In contrast, human answers often exhibit emotions such as anger and disgust. Significant differences were observed in emotional expressions between ChatGPT and human responses, particularly in the emotions of anger, disgust, and joy. Human responses displayed a broader emotional spectrum compared to ChatGPT, suggesting greater emotional variability among humans. Discussion: The findings highlight a distinct emotional divergence between ChatGPT and human responses, with ChatGPT exhibiting a more uniformly positive tone and humans displaying a wider range of emotions. This variance underscores the need for further research into the role of emotional content in AI and human interactions, particularly in educational contexts where emotional nuances can impact learning and communication.
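The kind of emotion comparison described above can be sketched with a tiny lexicon-based counter. The lexicon here is invented for illustration; real studies use validated resources (e.g., the NRC emotion lexicon) or transformer-based emotion classifiers:

```python
from collections import Counter

# Hypothetical mini-lexicon mapping words to basic emotions.
EMOTION_LEXICON = {
    "thanks": "joy", "great": "joy", "love": "joy",
    "hate": "anger", "furious": "anger",
    "gross": "disgust", "awful": "disgust",
}

def emotion_profile(text):
    """Count emotion-bearing words in a text, one hit per lexicon match."""
    counts = Counter()
    for word in text.lower().split():
        if word in EMOTION_LEXICON:
            counts[EMOTION_LEXICON[word]] += 1
    return counts

human = emotion_profile("i hate this api it is awful")
bot = emotion_profile("great question thanks for asking")
print(dict(human), dict(bot))
```

Aggregating such profiles over many answers is what lets a study compare the emotional spectrum of human and AI responses.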

6.
J Educ Perioper Med ; 26(3): E729, 2024.
Article in English | MEDLINE | ID: mdl-39354917

ABSTRACT

Background: Natural language processing is a collection of techniques designed to empower computer systems to comprehend and/or produce human language. The purpose of this investigation was to train several large language models (LLMs) to explore the tradeoff between model complexity and performance while classifying narrative feedback on trainees into the Accreditation Council for Graduate Medical Education subcompetencies. We hypothesized that classification accuracy would increase with model complexity. Methods: The authors fine-tuned several transformer-based LLMs (Bidirectional Encoder Representations from Transformers [BERT]-base, BERT-medium, BERT-small, BERT-mini, BERT-tiny, and SciBERT) to predict Accreditation Council for Graduate Medical Education subcompetencies on a curated dataset of 10,218 feedback comments. Performance was compared with the authors' previous work, which trained a FastText model on the same dataset. Performance metrics included F1 score for global model performance and area under the receiver operating characteristic curve for each competency. Results: No models were superior to FastText. Only BERT-tiny performed worse than FastText. The smallest model with performance comparable to FastText, BERT-mini, was 94% smaller. Area under the receiver operating characteristic curve for each competency was similar for BERT-mini and FastText, with the exceptions of Patient Care 7 (Situational Awareness and Crisis Management) and Systems-Based Practice. Discussion: Transformer-based LLMs were fine-tuned to understand anesthesiology graduate medical education language. Complex LLMs did not outperform FastText. However, equivalent performance was achieved with a model that was 94% smaller, which may allow model deployment on personal devices to enhance speed and data privacy. This work advances our understanding of best practices when integrating LLMs into graduate medical education.

7.
Front Artif Intell ; 7: 1408817, 2024.
Article in English | MEDLINE | ID: mdl-39359648

ABSTRACT

Large language models have been shown to excel in many different tasks across disciplines and research sites. They provide novel opportunities to enhance educational research and instruction in different ways such as assessment. However, these methods have also been shown to have fundamental limitations. These include, among others, hallucinated knowledge, the explainability of model decisions, and resource expenditure. As such, more conventional machine learning algorithms might be more convenient for specific research problems because they allow researchers more control over their research. Yet, the circumstances in which either conventional machine learning or large language models are preferable are not well understood. This study examines to what extent conventional machine learning algorithms or a recently advanced large language model perform better in assessing students' concept use in a physics problem-solving task. We found that conventional machine learning algorithms in combination outperformed the large language model. Model decisions were then analyzed via closer examination of the models' classifications. We conclude that in specific contexts, conventional machine learning can supplement large language models, especially when labeled data are available.

8.
3D Print Addit Manuf ; 11(4): 1495-1509, 2024 Aug.
Article in English | MEDLINE | ID: mdl-39360130

ABSTRACT

Bioprinting is a rapidly evolving field, as represented by the exponential growth of articles and reviews published each year on the topic. As the number of publications increases, there is a need for an automatic tool that can help researchers perform more comprehensive literature analysis, standardize the nomenclature, and so accelerate the development of novel manufacturing techniques and materials for the field. In this context, we propose an automatic keyword annotation model, based on Natural Language Processing (NLP) techniques, that can be used to find insights in the bioprinting scientific literature. The approach is based on two main data sources, the abstracts and related author keywords, which are used to train a composite model based on (i) an embeddings part (using the FastText algorithm), which generates word vectors for an input keyword, and (ii) a classifier part (using the Support Vector Machine algorithm), to label the keyword based on its word vector as a manufacturing technique, employed material, or application of the bioprinted product. The composite model was trained and optimized based on a two-stage optimization procedure to yield the best classification performance. The annotated author keywords were then reprojected onto the abstract collection to both generate a lexicon of the bioprinting field and extract relevant information, such as technology trends and the relationships among manufacturing technique, material, and application. The proposed approach can serve as a basis for more complex NLP-related analyses toward the automated analysis of the bioprinting literature.
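The embed-then-classify design above (FastText vectors fed to an SVM) can be sketched with a simpler cosine nearest-centroid rule that has the same interface: embedding in, label out. The 3-d "embeddings" and centroids below are invented for illustration; the paper's actual classifier is an SVM over FastText vectors:

```python
def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv)

def classify(vector, centroids):
    """Label a keyword vector by its closest class centroid."""
    return max(centroids, key=lambda label: cosine(vector, centroids[label]))

centroids = {          # toy 3-d "embeddings" for the three label groups
    "technique":   [1.0, 0.1, 0.0],
    "material":    [0.0, 1.0, 0.1],
    "application": [0.1, 0.0, 1.0],
}
print(classify([0.9, 0.2, 0.1], centroids))
```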

9.
JMIR Form Res ; 8: e50141, 2024 Oct 10.
Article in English | MEDLINE | ID: mdl-39388695

ABSTRACT

BACKGROUND: Varicoceles affect up to 30% of postpubertal adolescent males. Studying this population remains difficult due to this topic's sensitive nature. Using the popularity of social media in this cohort and natural language processing (NLP) techniques, our aim was to identify perceptions of adolescent males on an internet varicocele forum to inform how physicians may better evaluate and counsel this pediatric population. OBJECTIVE: We aimed to characterize themes of discussion and specific concerns expressed by adolescents using a mixed methods approach involving quantitative NLP and qualitative annotation of an online varicocele community. METHODS: We extracted posts from the Reddit community "r/varicocele" (5100 members) with criteria of discussant age ≤21 years and word count >20. We used qualitative thematic analysis and the validated constant comparative method, as well as an NLP technique called the meaning extraction method with principal component analysis (MEM/PCA), to identify discussion themes. Two investigators independently interrogated 150 randomly selected posts to further characterize content based on NLP-identified themes and calculated the Kaiser-Meyer-Olkin (KMO) statistic and the Bartlett test. Both quantitative and qualitative approaches were then compared to identify key themes of discussion. RESULTS: A total of 1103 posts met eligibility criteria from July 2015 to June 2022. Among the 150 randomly selected posts, MEM/PCA and qualitative thematic analysis separately revealed key themes: an overview of varicocele (40/150, 27%), management (29/150, 19%), postprocedural experience (28/150, 19%), seeking community (26/150, 17%), and second opinions after visiting a physician (27/150, 18%). Quantitative analysis also identified "hypogonadism" and "semen analysis" as recurring concerns among discussants. The KMO statistic was >0.60 and the Bartlett test P value was <0.01, indicating the appropriateness of MEM/PCA.
The mean age was 17.5 (SD 2.2; range 14-21) years, and there were trends toward higher-grade (40/45, 89% had a grade of ≥2) and left-sided varicoceles. Urologists were the topic of over 50% (53/82) of discussions among discussants, and varicocelectomy remained the intervention receiving the most interest. A total of 60% (90/150) of discussants described symptomatic varicoceles, with 62 of 90 reporting pain, 24 of 90 reporting hypogonadism symptoms, and 45 of 90 reporting aesthetics as the primary concern. CONCLUSIONS: We applied a mixed methods approach to identify uncensored concerns of adolescents with varicoceles. Both qualitative and quantitative approaches identified that adolescents often turned to social media as an adjunct to doctors' visits and to seek peer support. This population prioritized symptom control, with an emphasis on pain, aesthetics, sexual function, and hypogonadism. These data highlight how each adolescent may approach varicoceles uniquely, informing urologists how to better interface with this pediatric population. Additionally, these data may highlight the key drivers of decision-making when electing for procedural management of varicoceles.


Subject(s)
Varicocele, Varicocele/surgery, Humans, Male, Adolescent, Qualitative Research, Social Media, Natural Language Processing, Young Adult, Internet
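The theme percentages reported above follow directly from the annotation counts over the 150 posts; a quick check (the function name is ours):

```python
def theme_share(counts, total):
    """Integer percentage of annotated posts per NLP-identified theme."""
    return {theme: round(100 * n / total) for theme, n in counts.items()}

themes = {                      # counts from the 150 annotated posts
    "overview of varicocele": 40,
    "management": 29,
    "postprocedural experience": 28,
    "seeking community": 26,
    "second opinions": 27,
}
print(theme_share(themes, 150))
```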
10.
Cureus ; 16(9): e69030, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39391440

ABSTRACT

This study analyzes stress and anxiety in 3,765 Reddit posts to determine key themes and emotional undertones using natural language processing (NLP) techniques. Five major topics were identified from the posts using the latent Dirichlet allocation (LDA) algorithm: general discontent and lack of direction; panic and anxiety attacks; physical symptoms of anxiety; stress and mental health concerns; and seeking help for anxiety. Sentiment analysis with TextBlob showed largely neutral results: an average polarity score of 0.009 and an average subjectivity score of 0.494. Several kinds of visualizations, including word clouds, bar charts, and pie charts, show the distribution and importance of these topics. These findings underscore the important role online communities play in supporting those in distress because of mental health problems, information that is valuable to mental health professionals and researchers. The study demonstrates the effectiveness of combining topic modeling and sentiment analysis to identify mental health problems discussed on social media, and points to future research using more advanced NLP techniques and larger datasets.
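TextBlob's polarity (-1 to 1) and subjectivity (0 to 1) scores are lexicon-based word averages. A hedged stand-in with an invented three-word lexicon shows the mechanics; TextBlob itself ships a far larger lexicon and handles negation and modifiers:

```python
# Hypothetical mini-lexicon of (polarity, subjectivity) per word.
LEXICON = {
    "anxious": (-0.6, 0.9),
    "calm": (0.5, 0.7),
    "help": (0.2, 0.4),
}

def sentiment(text):
    """Average polarity/subjectivity over lexicon words, TextBlob-style."""
    hits = [LEXICON[w] for w in text.lower().split() if w in LEXICON]
    if not hits:
        return 0.0, 0.0
    polarity = sum(p for p, s in hits) / len(hits)
    subjectivity = sum(s for p, s in hits) / len(hits)
    return round(polarity, 3), round(subjectivity, 3)

print(sentiment("feeling anxious please help"))
```

Averaging such scores over thousands of posts is what yields the near-zero mean polarity reported in the study.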

11.
J Med Internet Res ; 26: e52142, 2024 Oct 11.
Article in English | MEDLINE | ID: mdl-39393064

ABSTRACT

BACKGROUND: Obesity is a chronic, multifactorial, and relapsing disease, affecting people of all ages worldwide, and is directly related to multiple complications. Understanding public attitudes and perceptions toward obesity is essential for developing effective health policies, prevention strategies, and treatment approaches. OBJECTIVE: This study investigated the sentiments of the general public, celebrities, and important organizations regarding obesity using social media data, specifically from Twitter (subsequently rebranded as X). METHODS: The study analyzes a dataset of 53,414 tweets related to obesity posted on Twitter during the COVID-19 pandemic, from April 2019 to December 2022. Sentiment analysis was performed using the XLM-RoBERTa-base model, and topic modeling was conducted using the BERTopic library. RESULTS: The analysis revealed that tweets regarding obesity were predominantly negative. Spikes in Twitter activity correlated with significant political events, such as the exchange of obesity-related comments between US politicians and criticism of the United Kingdom's obesity campaign. Topic modeling identified 243 clusters representing various obesity-related topics, such as childhood obesity; the US President's obesity struggle; COVID-19 vaccinations; the UK government's obesity campaign; body shaming; racism and high obesity rates among Black American people; smoking, substance abuse, and alcohol consumption among people with obesity; environmental risk factors; and surgical treatments. CONCLUSIONS: Twitter serves as a valuable source for understanding obesity-related sentiments and attitudes among the public, celebrities, and influential organizations. Sentiments regarding obesity were predominantly negative. Negative portrayals of obesity by influential politicians and celebrities were shown to contribute to negative public sentiments, which can have adverse effects on public health. 
It is essential for public figures to be mindful of their impact on public opinion and the potential consequences of their statements.


Subject(s)
COVID-19, Obesity, Public Opinion, Social Media, Humans, COVID-19/psychology, COVID-19/epidemiology, COVID-19/prevention & control, Obesity/psychology, Obesity/epidemiology, Cross-Sectional Studies, Emotions, Pandemics, United Kingdom, United States, SARS-CoV-2
12.
J Biomed Inform ; : 104735, 2024 Oct 09.
Article in English | MEDLINE | ID: mdl-39393477

ABSTRACT

OBJECTIVE: Medical laboratory testing is essential in healthcare, providing crucial data for diagnosis and treatment. Nevertheless, patients' lab testing results are often transferred via fax across healthcare organizations and are not immediately available for timely clinical decision making. Thus, it is important to develop new technologies to accurately extract lab testing information from scanned laboratory reports. This study aims to develop an advanced deep learning-based Optical Character Recognition (OCR) method to identify tables containing lab testing results in scanned laboratory reports. METHODS: Extracting tabular data from scanned lab reports involves two stages: table detection (i.e., identifying the area of a table object) and table recognition (i.e., identifying and extracting tabular structures and contents). We used the DETR R18 and YOLOv8s algorithms for table detection and compared the performance of PaddleOCR and the encoder-dual-decoder (EDD) model for table recognition. A total of 650 tables from 632 randomly selected laboratory test reports were annotated and used to train and evaluate those models. For table detection evaluation, we used metrics such as Average Precision (AP), Average Recall (AR), AP50, and AP75. For table recognition evaluation, we employed Tree-Edit-Distance-based Similarity (TEDS). RESULTS: For table detection, fine-tuned DETR R18 demonstrated superior performance (AP50: 0.774; AP75: 0.644; AP: 0.601; AR: 0.766). In terms of table recognition, fine-tuned EDD outperformed other models with a TEDS score of 0.815. The proposed OCR pipeline (fine-tuned DETR R18 and fine-tuned EDD) demonstrated impressive results, achieving a TEDS score of 0.699 and a TEDS structure score of 0.764. CONCLUSIONS: Our study presents a dedicated OCR pipeline for scanned clinical documents, utilizing state-of-the-art deep learning models for region-of-interest detection and table recognition.
The high TEDS scores demonstrate the effectiveness of our approach, which has significant implications for clinical data analysis and decision-making.
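The AP50 and AP75 detection metrics reported above hinge on intersection-over-union (IoU): a detected table counts as correct when its IoU with the annotated box exceeds 0.50 or 0.75, respectively. A minimal IoU sketch with made-up box coordinates:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# A detected table shifted 10px right of its annotation still overlaps well:
predicted, annotated = (10, 10, 110, 60), (20, 10, 120, 60)
score = iou(predicted, annotated)
print(round(score, 3), score >= 0.5, score >= 0.75)
```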

13.
J Psychiatr Res ; 179: 322-329, 2024 Sep 24.
Article in English | MEDLINE | ID: mdl-39353293

ABSTRACT

Suicide is a leading cause of death. Suicide rates are particularly elevated among Department of Veterans Affairs (VA) patients. While VA has made impactful suicide prevention advances, efforts primarily target high-risk patients with documented suicide risk. This high-risk population accounts for less than 10% of VA patient suicide deaths. We previously evaluated epidemiological patterns among VA patients who had lower classified suicide risk and derived moderate- and low-risk groupings. Expanding upon VA's leading suicide prediction model, this study uses national VA data to refine high-, moderate-, and low-risk specific suicide prediction methods. We selected all VA patients who died by suicide in 2017 or 2018 (n = 4584), matching each case with five controls who remained alive during the treatment year and shared suicide risk percentiles. We extracted all sample unstructured electronic health record notes, analyzed them using natural language processing, and applied machine-learning classification algorithms to develop risk-tier-specific predictive models. We calculated area under the curve (AUC) and suicide risk concentration to evaluate predictive accuracy and analyzed derived words. RESULTS: Our high-risk (AUC = 0.621, 95% CI: 0.55-0.68), moderate-risk (AUC = 0.669, 95% CI: 0.64-0.71), and low-risk (AUC = 0.673, 95% CI: 0.63-0.72) models offered significant predictive accuracy over VA's leading suicide prediction algorithm. Derived words varied considerably: the high-risk model included words related to chronic condition services, the moderate-risk model to outpatient care, and the low-risk model to acute condition care. This study suggests the benefit of leveraging unstructured electronic health records and expands prediction resources for non-high-risk suicide decedents, a historically underserved population.
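The AUC values reported for each risk tier have a direct probabilistic reading: the chance that a randomly chosen case scores above a randomly chosen control. A minimal sketch via the rank (Mann-Whitney) statistic, with invented scores:

```python
def auc(scores_pos, scores_neg):
    """Area under the ROC curve via the rank (Mann-Whitney) statistic.

    Equals the probability that a randomly chosen positive (case) is
    scored above a randomly chosen negative (control); ties count half.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

cases = [0.9, 0.7, 0.6]      # model scores for suicide decedents (toy)
controls = [0.8, 0.5, 0.4, 0.3]
print(round(auc(cases, controls), 3))
```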

14.
Indian J Otolaryngol Head Neck Surg ; 76(5): 4986-4996, 2024 Oct.
Article in English | MEDLINE | ID: mdl-39376323

ABSTRACT

This systematic literature review aims to study the role and impact of artificial intelligence (AI) in transforming Ear, Nose, and Throat (ENT) healthcare. It aims to compare and analyze literature that applied AI algorithms for ENT disease prediction and detection based on their effectiveness, methods, dataset, and performance. We have also discussed ENT specialists' challenges and AI's role in solving them. This review also discusses the challenges faced by AI researchers. This systematic review was completed using PRISMA guidelines. Data was extracted from several reputable digital databases, including PubMed, Medline, SpringerLink, Elsevier, Google Scholar, ScienceDirect, and IEEE Xplore. The search criteria included studies published between 2018 and 2024 related to the application of AI for ENT healthcare. After removing duplicate studies and performing quality assessments, we reviewed eligible articles and answered the research questions. This review aims to provide a comprehensive overview of the current state of AI applications in ENT healthcare. Among the 3257 unique studies, 27 were selected as primary studies. About 62.5% of the included studies were effective in providing disease predictions. We found that pretrained deep learning (DL) models are applied more often than CNN algorithms for ENT disease prediction. The accuracy of models ranged between 75% and 97%. We also observed the effectiveness of conversational AI models such as ChatGPT in the ENT discipline. The research in AI for ENT is advancing rapidly. Most of the models have achieved accuracy above 90%. However, the lack of good-quality data and data variability limits the overall ability of AI models to perform better for ENT disease prediction. Further research needs to be conducted while considering factors such as external validation and the issue of class imbalance.

15.
BMC Med Inform Decis Mak ; 24(1): 296, 2024 Oct 10.
Article in English | MEDLINE | ID: mdl-39390479

ABSTRACT

BACKGROUND: Social and behavioral determinants of health (SBDH) are associated with a variety of health and utilization outcomes, yet these factors are not routinely documented in the structured fields of electronic health records (EHR). The objective of this study was to evaluate different machine learning approaches for detection of SBDH from the unstructured clinical notes in the EHR. METHODS: Latent Semantic Indexing (LSI) was applied to 2,083,180 clinical notes corresponding to 46,146 patients in the MIMIC-III dataset. Using LSI, patients were ranked based on conceptual relevance to a set of keywords (lexicons) pertaining to 15 different SBDH categories. For Generative Pretrained Transformer (GPT) models, API requests were made with a Python script to connect to the OpenAI services in Azure, using the gpt-3.5-turbo-1106 and gpt-4-1106-preview models. Prediction of SBDH categories was performed using a logistic regression model that included age, gender, race, and SBDH ICD-9 codes. RESULTS: LSI retrieved patients according to 15 SBDH domains, with an overall average PPV ≥ 83%. Using manually curated gold standard (GS) sets for nine SBDH categories, the macro-F1 score of LSI (0.74) was better than ICD-9 (0.71) and GPT-3.5 (0.54), but lower than GPT-4 (0.80). Due to document size limitations, only a subset of the GS cases could be processed by GPT-3.5 (55.8%) and GPT-4 (94.2%), compared to LSI (100%). Using common GS subsets for nine different SBDH categories, the macro-F1 of ICD-9 combined with either LSI (mean 0.88, 95% CI 0.82-0.93), GPT-3.5 (0.86, 0.82-0.91), or GPT-4 (0.88, 0.83-0.94) was not significantly different. After including age, gender, race, and ICD-9 in a logistic regression model, the AUC for prediction of six out of the nine SBDH categories was higher for LSI compared to GPT-4.
CONCLUSIONS: These results demonstrate that the LSI approach performs comparably to more recent large language models, such as GPT-3.5 and GPT-4, when using the same set of documents. Importantly, LSI is robust, deterministic, and free of document-size limitations and cost implications, making it more amenable to real-world applications in health systems.
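The LSI retrieval step described above (ranking notes by conceptual relevance to a keyword lexicon) can be sketched as a toy pipeline built on a truncated SVD of the term-document matrix. The notes, lexicon, and latent dimensionality below are illustrative assumptions, not the study's actual configuration:

```python
import numpy as np

def lsi_rank(docs, lexicon, k=2):
    """Rank documents by cosine similarity to a keyword lexicon in a
    truncated-SVD (latent semantic indexing) space."""
    vocab = sorted({t for d in docs for t in d.split()})
    idx = {t: i for i, t in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))          # term-document counts
    for j, d in enumerate(docs):
        for t in d.split():
            A[idx[t], j] += 1.0
    U, S, Vt = np.linalg.svd(A, full_matrices=False)
    doc_vecs = (np.diag(S[:k]) @ Vt[:k]).T         # documents in latent space
    q = np.zeros(len(vocab))
    for t in lexicon:
        if t in idx:
            q[idx[t]] += 1.0
    q_vec = q @ U[:, :k]                           # fold the lexicon query in
    sims = doc_vecs @ q_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_vec) + 1e-12)
    return np.argsort(-sims)                       # most relevant first

notes = [
    "patient reports homelessness and unstable housing",
    "housing insecurity and homelessness noted in social history",
    "blood pressure well controlled on current medication",
]
ranking = lsi_rank(notes, ["homelessness", "housing"])
```

Because the query is folded into the same latent space as the notes, documents sharing concepts (not just exact terms) with the lexicon rank highest, which is the property the study exploits for SBDH retrieval.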


Subject(s)
Electronic Health Records , Semantics , Social Determinants of Health , Humans , Machine Learning , Male , Female , Adult , Middle Aged
16.
J Med Internet Res ; 26: e51635, 2024 Oct 04.
Article in English | MEDLINE | ID: mdl-39365643

ABSTRACT

Hospital pharmacy plays an important role in ensuring the quality and safety of medical care, especially in the areas of drug information retrieval, therapy guidance, and drug-drug interaction management. ChatGPT is a powerful artificial intelligence language model that can generate natural-language text. Here, we explore the applications of ChatGPT in hospital pharmacy, where it may enhance the quality and efficiency of pharmaceutical care, and examine its prospects by discussing its working principle, diverse applications, and practical cases in daily operations and scientific research. We also discuss the challenges and limitations of ChatGPT, such as data privacy, ethical issues, bias and discrimination, and the need for human oversight. ChatGPT is a promising tool for hospital pharmacy, but it requires careful evaluation and validation before it can be integrated into clinical practice. Some suggestions for future research and development of ChatGPT in hospital pharmacy are provided.


Subject(s)
Pharmacy Service, Hospital , Humans , Artificial Intelligence , Natural Language Processing
17.
BMC Med Inform Decis Mak ; 24(1): 283, 2024 Oct 03.
Article in English | MEDLINE | ID: mdl-39363322

ABSTRACT

AIMS: The primary goal of this study is to evaluate the capabilities of Large Language Models (LLMs) in understanding and processing complex medical documentation. We chose to focus on the identification of pathologic complete response (pCR) in narrative pathology reports. This approach aims to contribute to the advancement of comprehensive reporting, health research, and public health surveillance, thereby enhancing patient care and breast cancer management strategies. METHODS: The study utilized two analytical pipelines, developed with open-source LLMs within the healthcare system's computing environment. First, we extracted embeddings from pathology reports using 15 different transformer-based models and then employed logistic regression on these embeddings to classify the presence or absence of pCR. Second, we fine-tuned the Generative Pre-trained Transformer-2 (GPT-2) model by attaching a simple feed-forward neural network (FFNN) layer to improve the detection of pCR from pathology reports. RESULTS: In a cohort of 351 female breast cancer patients who underwent neoadjuvant chemotherapy (NAC) and subsequent surgery between 2010 and 2017 in Calgary, the optimized method displayed a sensitivity of 95.3% (95% CI: 84.0-100.0%), a positive predictive value of 90.9% (95% CI: 76.5-100.0%), and an F1 score of 93.0% (95% CI: 83.7-100.0%). The results, achieved through diverse LLM integration, surpassed traditional machine learning models, underscoring the potential of LLMs in clinical pathology information extraction. CONCLUSIONS: The study successfully demonstrates the efficacy of LLMs in interpreting and processing digital pathology data, particularly for determining pCR in breast cancer patients post-NAC. The superior performance of LLM-based pipelines over traditional models highlights their significant potential for extracting and analyzing key clinical data from narrative reports.
While promising, these findings highlight the need for future external validation to confirm the reliability and broader applicability of these methods.
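The first pipeline described above follows an embed-then-classify pattern: fixed report embeddings feed a logistic regression classifier. A minimal sketch of that pattern is below; a toy hashing encoder stands in for the actual transformer models, and the report texts, labels, and hyperparameters are invented for illustration:

```python
import zlib
import numpy as np

def embed(text, dim=16):
    """Toy stand-in for a transformer encoder: each token gets a fixed
    pseudo-random vector (seeded by a stable hash) and the report embedding
    is the mean over tokens, mimicking mean-pooled hidden states."""
    vecs = []
    for tok in text.lower().split():
        rng = np.random.default_rng(zlib.crc32(tok.encode()))
        vecs.append(rng.standard_normal(dim))
    return np.mean(vecs, axis=0)

def train_logreg(X, y, lr=0.5, steps=2000):
    """Plain gradient-descent logistic regression on the embeddings."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * X.T @ (p - y) / len(y)
        b -= lr * (p - y).mean()
    return w, b

reports = [                      # invented snippets, 1 = pCR, 0 = no pCR
    ("no residual invasive carcinoma identified", 1),
    ("complete pathologic response with no viable tumor", 1),
    ("residual invasive ductal carcinoma present", 0),
    ("residual tumor measuring 12 mm", 0),
]
X = np.stack([embed(text) for text, _ in reports])
y = np.array([label for _, label in reports], dtype=float)
w, b = train_logreg(X, y)
train_preds = ((X @ w + b) > 0).astype(int)
```

Swapping the toy encoder for real transformer embeddings leaves the downstream classifier unchanged, which is what makes this pipeline easy to repeat across 15 different models.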


Subject(s)
Breast Neoplasms , Humans , Breast Neoplasms/pathology , Female , Middle Aged , Neural Networks, Computer , Natural Language Processing , Adult , Aged , Neoadjuvant Therapy , Pathologic Complete Response
18.
BMC Med Inform Decis Mak ; 24(1): 289, 2024 Oct 08.
Article in English | MEDLINE | ID: mdl-39375687

ABSTRACT

PURPOSE: Rare diseases pose significant challenges in diagnosis and treatment due to their low prevalence and heterogeneous clinical presentations. Unstructured clinical notes contain valuable information for identifying rare diseases, but manual curation is time-consuming and prone to subjectivity. This study aims to develop a hybrid approach combining dictionary-based natural language processing (NLP) tools with large language models (LLMs) to improve rare disease identification from unstructured clinical reports. METHODS: We propose a novel hybrid framework that integrates the Orphanet Rare Disease Ontology (ORDO) and the Unified Medical Language System (UMLS) to create a comprehensive rare disease vocabulary. SemEHR, a dictionary-based NLP tool, is employed to extract rare disease mentions from clinical notes. To refine the results and improve accuracy, we leverage various LLMs, including LLaMA3, Phi3-mini, and domain-specific models like OpenBioLLM and BioMistral. Different prompting strategies, such as zero-shot, few-shot, and knowledge-augmented generation, are explored to optimize the LLMs' performance. RESULTS: The proposed hybrid approach demonstrates superior performance compared to traditional NLP systems and standalone LLMs. LLaMA3 and Phi3-mini achieve the highest F1 scores in rare disease identification. Few-shot prompting with 1-3 examples yields the best results, while knowledge-augmented generation shows limited improvement. Notably, the approach uncovers a significant number of potential rare disease cases not documented in structured diagnostic records, highlighting its ability to identify previously unrecognized patients. CONCLUSION: The hybrid approach combining dictionary-based NLP tools with LLMs shows great promise for improving rare disease identification from unstructured clinical reports. By leveraging the strengths of both techniques, the method demonstrates superior performance and the potential to uncover hidden rare disease cases. 
Further research is needed to address limitations related to ontology mapping and overlapping case identification, and to integrate the approach into clinical practice for early diagnosis and improved patient outcomes.
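A minimal sketch of the two-stage hybrid described above: a dictionary matcher (in the spirit of SemEHR with an ORDO-derived vocabulary) proposes candidate mentions, and a few-shot prompt is then assembled for an LLM to confirm or reject each one. The vocabulary entries, codes, few-shot examples, and note text here are hypothetical placeholders:

```python
RARE_DISEASE_VOCAB = {           # hypothetical ORDO-style entries
    "fabry disease": "ORPHA:324",
    "pompe disease": "ORPHA:365",
    "gaucher disease": "ORPHA:355",
}

def dictionary_mentions(note):
    """Stage 1: surface-form lookup, as a SemEHR-like matcher would do."""
    low = note.lower()
    return [(term, code) for term, code in RARE_DISEASE_VOCAB.items() if term in low]

FEW_SHOT = [                     # invented examples for the few-shot prompt
    ("Family history of Fabry disease; patient asymptomatic.",
     "fabry disease", "not affirmed for this patient"),
    ("Enzyme assay confirms Pompe disease.",
     "pompe disease", "affirmed"),
]

def build_prompt(note, term):
    """Stage 2: assemble a few-shot prompt asking an LLM to verify the mention."""
    lines = ["Decide whether each rare-disease mention applies to the patient."]
    for ex_note, ex_term, ex_label in FEW_SHOT:
        lines.append(f'Note: "{ex_note}" Mention: "{ex_term}" -> {ex_label}')
    lines.append(f'Note: "{note}" Mention: "{term}" ->')
    return "\n".join(lines)

note = "Genetic testing consistent with Gaucher disease type 1."
mentions = dictionary_mentions(note)
prompt = build_prompt(note, mentions[0][0])
```

The dictionary stage gives high recall cheaply; the prompt stage delegates the harder contextual judgment (negation, family history) to the LLM, which is where the few-shot examples matter.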


Subject(s)
Natural Language Processing , Rare Diseases , Unified Medical Language System , Rare Diseases/diagnosis , Humans , Phenotype , Electronic Health Records , Biological Ontologies
19.
Artif Intell Med ; 157: 102985, 2024 Sep 30.
Article in English | MEDLINE | ID: mdl-39383708

ABSTRACT

Developing technology to assist medical experts in their everyday decision-making is currently a hot topic in the field of Artificial Intelligence (AI). This is especially true within the framework of Evidence-Based Medicine (EBM), where the aim is to facilitate the extraction of relevant information using natural language as a tool for mediating human-AI interaction. In this context, AI techniques can be beneficial for finding the arguments behind past decisions in evolution notes or patient journeys, especially when different doctors are involved in a patient's care; these documents report the decision-making process followed in treating the patient. Thus, applying Natural Language Processing (NLP) techniques has the potential to assist doctors in extracting arguments for a more comprehensive understanding of the decisions made. This work focuses on the explanatory argument identification step by setting up the task in a Question Answering (QA) scenario in which clinicians ask questions to the AI model to assist them in identifying those arguments. In order to explore the capabilities of current AI-based language models, we present a new dataset which, unlike previous work: (i) includes not only explanatory arguments for the correct hypothesis, but also arguments for reasoning about the incorrectness of other hypotheses; and (ii) contains explanations originally written in Spanish by doctors to reason over cases from the Spanish Residency Medical Exams. Furthermore, this new benchmark allows us to set up a novel extractive task of identifying the explanation, written by medical doctors, that supports the correct answer within an argumentative text. An additional benefit of our approach lies in its ability to evaluate the extractive performance of language models using automatic metrics, which on the Antidote CasiMedicos dataset corresponds to a 74.47 F1 score.
Comprehensive experimentation shows that our novel dataset and approach are effective in helping practitioners identify relevant evidence-based explanations for medical questions.
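Extractive performance of the kind reported above (an F1 score over the identified explanation span) is typically computed as token-overlap F1 between the predicted and gold spans, in the SQuAD style. A minimal sketch, with invented answer strings:

```python
from collections import Counter

def token_f1(prediction, reference):
    """Token-overlap F1, the standard automatic metric for extractive QA spans."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    common = Counter(pred) & Counter(ref)   # multiset intersection of tokens
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

gold_span = "beta blockers are contraindicated in acute asthma"
predicted = "beta blockers are contraindicated"
score = token_f1(predicted, gold_span)      # precision 4/4, recall 4/7
```

Because the metric is purely lexical, it rewards partial spans proportionally, which is what makes automatic evaluation of the extractive task feasible without human judges.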

20.
Phenomics ; 4(3): 234-249, 2024 Jun.
Article in English | MEDLINE | ID: mdl-39398421

ABSTRACT

Depression is one of the most common mental disorders, and depression rates increase each year. Traditional diagnostic methods are primarily based on professional judgment, which is prone to individual bias. It is therefore crucial to design an effective and robust method for automated depression detection. Current artificial intelligence approaches are limited in their ability to extract features from long sentences, and current models are less robust to high-dimensional inputs. To address these concerns, a multimodal fusion model comprising text, audio, and video was developed for both depression detection and assessment tasks. In the text modality, pre-trained sentence embeddings were used to extract semantic representations, with a bidirectional long short-term memory (BiLSTM) network predicting depression. In the audio modality, principal component analysis (PCA) was used to reduce the dimensionality of the input feature space and a support vector machine (SVM) to predict depression. In the video modality, extreme gradient boosting (XGBoost) was employed for both feature selection and depression detection. The final predictions were obtained by combining the outputs of the different modalities with an ensemble voting algorithm. Experiments on the Distress Analysis Interview Corpus Wizard-of-Oz (DAIC-WOZ) dataset showed a large improvement in performance, with a weighted F1 score of 0.85, a root mean square error (RMSE) of 5.57, and a mean absolute error (MAE) of 4.48. Our proposed model outperforms the baseline in both depression detection and assessment tasks and performs better than other existing state-of-the-art depression detection methods. Supplementary Information: The online version contains supplementary material available at 10.1007/s43657-023-00152-8.
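The fusion step described above can be sketched as a (optionally weighted) majority vote over per-modality predictions. The prediction vectors and weights below are illustrative placeholders, not outputs of the actual BiLSTM, SVM, or XGBoost models:

```python
import numpy as np

def ensemble_vote(preds_by_modality, weights=None):
    """Weighted majority vote over per-modality binary predictions."""
    P = np.asarray(preds_by_modality, dtype=float)   # (n_modalities, n_samples)
    w = np.ones(P.shape[0]) if weights is None else np.asarray(weights, dtype=float)
    score = (w[:, None] * P).sum(axis=0) / w.sum()   # weighted fraction voting 1
    return (score >= 0.5).astype(int)

# hypothetical per-sample outputs of the three modality models
text_preds  = [1, 0, 1, 1]   # e.g. BiLSTM over sentence embeddings
audio_preds = [1, 0, 0, 1]   # e.g. SVM after PCA
video_preds = [0, 0, 1, 1]   # e.g. XGBoost on selected features
final = ensemble_vote([text_preds, audio_preds, video_preds])
```

Voting keeps the modalities decoupled, so a weak or missing modality degrades the ensemble gracefully instead of breaking a jointly trained model.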
