Results 1 - 20 of 751
1.
JMIR AI ; 3: e49546, 2024 Oct 02.
Article in English | MEDLINE | ID: mdl-39357045

ABSTRACT

BACKGROUND: Women have been underrepresented in clinical trials for many years. Machine-learning models trained on clinical trial abstracts may capture and amplify biases in the data. Specifically, word embeddings are models that enable representing words as vectors and are the building block of most natural language processing systems. If word embeddings are trained on clinical trial abstracts, predictive models that use the embeddings will exhibit gender performance gaps. OBJECTIVE: We aim to capture temporal trends in clinical trials through temporal distribution matching on contextual word embeddings (specifically, BERT) and explore its effect on the bias manifested in downstream tasks. METHODS: We present TeDi-BERT, a method to harness the temporal trend of increasing women's inclusion in clinical trials to train contextual word embeddings. We implement temporal distribution matching through an adversarial classifier, trying to distinguish old from new clinical trial abstracts based on their embeddings. The temporal distribution matching acts as a form of domain adaptation from older to more recent clinical trials. We evaluate our model on 2 clinical tasks: prediction of unplanned readmission to the intensive care unit and hospital length of stay prediction. We also conduct an algorithmic analysis of the proposed method. RESULTS: In readmission prediction, TeDi-BERT achieved area under the receiver operating characteristic curve of 0.64 for female patients versus the baseline of 0.62 (P<.001), and 0.66 for male patients versus the baseline of 0.64 (P<.001). In the length of stay regression, TeDi-BERT achieved a mean absolute error of 4.56 (95% CI 4.44-4.68) for female patients versus 4.62 (95% CI 4.50-4.74, P<.001) and 4.54 (95% CI 4.44-4.65) for male patients versus 4.6 (95% CI 4.50-4.71, P<.001). CONCLUSIONS: In both clinical tasks, TeDi-BERT improved performance for female patients, as expected; but it also improved performance for male patients. 
Our results show that accuracy for one gender does not need to be exchanged for bias reduction, but rather that good science improves clinical results for all. Contextual word embedding models trained to capture temporal trends can help mitigate the effects of bias that changes over time in the training data.
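The adversarial temporal-distribution-matching idea described above can be sketched in a few lines. This is an illustrative toy, not the authors' TeDi-BERT code: a linear "encoder" stands in for BERT, and the encoder receives the *reversed* gradient of a logistic discriminator that tries to tell old from new abstracts, so the two eras become harder to distinguish in embedding space. All array sizes and learning rates are invented.

```python
# Toy sketch of adversarial temporal distribution matching (assumption:
# gradient-reversal training, as in domain-adversarial adaptation).
import numpy as np

rng = np.random.default_rng(0)
d_in, d_emb, n = 8, 4, 200

# Toy "old" and "new" abstract vectors with a shifted mean.
x_old = rng.normal(0.0, 1.0, (n, d_in))
x_new = rng.normal(0.5, 1.0, (n, d_in))
X = np.vstack([x_old, x_new])
y = np.concatenate([np.zeros(n), np.ones(n)])  # era label: 0=old, 1=new

W = rng.normal(0, 0.1, (d_in, d_emb))  # "encoder" (stand-in for BERT)
v = rng.normal(0, 0.1, d_emb)          # logistic discriminator weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr, lam = 0.02, 1.0
for step in range(200):
    Z = X @ W                  # embeddings
    p = sigmoid(Z @ v)         # P(era = new)
    err = p - y                # dLoss/dlogit for cross-entropy
    grad_v = Z.T @ err / len(y)
    grad_Z = np.outer(err, v) / len(y)
    v -= lr * grad_v                 # discriminator descends its loss
    W += lr * lam * (X.T @ grad_Z)   # encoder ascends it (reversed gradient)

acc = np.mean((sigmoid((X @ W) @ v) > 0.5) == y)
print(f"discriminator accuracy after adversarial training: {acc:.2f}")
```

The reversed update on `W` is the "adversarial classifier" in miniature: as the discriminator improves at dating abstracts, the encoder is pushed to erase era-specific structure from the embeddings.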

2.
JMIR Res Protoc ; 13: e55511, 2024 Oct 07.
Article in English | MEDLINE | ID: mdl-39374059

ABSTRACT

BACKGROUND: Suicide stands as a global public health concern with a pronounced impact, especially in low- and middle-income countries, where it remains largely unnoticed as a significant health concern, leading to delays in diagnosis and intervention. South Asia, in particular, has seen limited development in this area of research, and applying existing models from other regions is challenging due to cost constraints and the region's distinct linguistics and behavior. Social media analysis, notably on platforms such as Facebook (Meta Platforms Inc), offers the potential for detecting major depressive disorder and aiding individuals at risk of suicidal ideation. OBJECTIVE: This study primarily focuses on India and Bangladesh, both South Asian countries. It aims to construct a predictive model for suicidal ideation by incorporating unique, unexplored features along with masked content from both public and private Facebook profiles. Moreover, the research aims to fill the existing research gap by addressing the distinct challenges posed by South Asia's unique behavioral patterns, socioeconomic conditions, and linguistic nuances. Ultimately, this research strives to enhance suicide prevention efforts in the region by offering a cost-effective solution. METHODS: This quantitative research study will gather data through a web-based platform. Initially, participants will be asked a few demographic questions and to complete the 9-item Patient Health Questionnaire assessment. Eligible participants who provide consent will receive an email requesting them to upload a ZIP file of their Facebook data. The study will begin by determining whether Facebook is the primary application for the participants based on their active hours and Facebook use duration. 
Subsequently, the predictive model will incorporate a wide range of previously unexplored variables, including anonymous postings, and textual analysis features, such as captions, biographic information, group membership, preferred pages, interactions with advertisement content, and search history. The model will also analyze the use of emojis and the types of games participants engage with on Facebook. RESULTS: The study obtained approval from the scientific review committee on October 2, 2023, and subsequently received institutional review committee ethical clearance on December 8, 2023. Our system is anticipated to automatically detect posts related to depression by analyzing the text and use pattern of the individual with the best accuracy possible. Ultimately, our research aims to have practical utility in identifying individuals who may be at risk of depression or in need of mental health support. CONCLUSIONS: This initiative aims to enhance engagement in suicidal ideation medical care in South Asia to improve health outcomes. It is set to be the first study to consider predicting participants' primary social application use before analyzing their content to forecast behavior and mental states. The study holds the potential to revolutionize strategies and offer insights for scalable, accessible interventions while maintaining quality through comprehensive Facebook feature analysis. INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID): DERR1-10.2196/55511.


Subject(s)
Social Media, Suicidal Ideation, Humans, India/epidemiology, Bangladesh/epidemiology, Cohort Studies, Female, Adult, Male, Depression/epidemiology, Depression/psychology, Young Adult, Adolescent, Middle Aged, Surveys and Questionnaires, Depressive Disorder, Major/epidemiology, Depressive Disorder, Major/psychology
3.
Cureus ; 16(9): e69030, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39391440

ABSTRACT

This study analyses the topic of stress and anxiety in 3,765 Reddit posts to determine key themes and emotional undertones using natural language processing (NLP) techniques. Five major topics are identified from the posts using the latent Dirichlet allocation (LDA) algorithm: general discontent and lack of direction; panic and anxiety attacks; physical symptoms of anxiety; stress and mental health concerns; and seeking help for anxiety. Sentiment analysis with TextBlob showed a largely neutral tone: an average polarity score of 0.009 and a subjectivity score of 0.494. Several kinds of visualizations, including word clouds, bar charts, and pie charts, are used to show the distribution and importance of these topics. These findings underscore the important role online communities play in supporting those in distress because of mental health problems, information that is valuable to mental health professionals and researchers. The study demonstrates the effectiveness of combining topic modeling and sentiment analysis to identify mental health problems discussed on social media, and points to future research using advanced NLP techniques on larger datasets.

4.
J Med Internet Res ; 26: e52142, 2024 Oct 11.
Article in English | MEDLINE | ID: mdl-39393064

ABSTRACT

BACKGROUND: Obesity is a chronic, multifactorial, and relapsing disease, affecting people of all ages worldwide, and is directly related to multiple complications. Understanding public attitudes and perceptions toward obesity is essential for developing effective health policies, prevention strategies, and treatment approaches. OBJECTIVE: This study investigated the sentiments of the general public, celebrities, and important organizations regarding obesity using social media data, specifically from Twitter (subsequently rebranded as X). METHODS: The study analyzes a dataset of 53,414 tweets related to obesity posted on Twitter during the COVID-19 pandemic, from April 2019 to December 2022. Sentiment analysis was performed using the XLM-RoBERTa-base model, and topic modeling was conducted using the BERTopic library. RESULTS: The analysis revealed that tweets regarding obesity were predominantly negative. Spikes in Twitter activity correlated with significant political events, such as the exchange of obesity-related comments between US politicians and criticism of the United Kingdom's obesity campaign. Topic modeling identified 243 clusters representing various obesity-related topics, such as childhood obesity; the US President's obesity struggle; COVID-19 vaccinations; the UK government's obesity campaign; body shaming; racism and high obesity rates among Black American people; smoking, substance abuse, and alcohol consumption among people with obesity; environmental risk factors; and surgical treatments. CONCLUSIONS: Twitter serves as a valuable source for understanding obesity-related sentiments and attitudes among the public, celebrities, and influential organizations. Sentiments regarding obesity were predominantly negative. Negative portrayals of obesity by influential politicians and celebrities were shown to contribute to negative public sentiments, which can have adverse effects on public health. 
It is essential for public figures to be mindful of their impact on public opinion and the potential consequences of their statements.


Subject(s)
COVID-19, Obesity, Public Opinion, Social Media, Humans, COVID-19/psychology, COVID-19/epidemiology, COVID-19/prevention & control, Obesity/psychology, Obesity/epidemiology, Cross-Sectional Studies, Emotions, Pandemics, United Kingdom, United States, SARS-CoV-2
5.
Acad Radiol ; 2024 Sep 07.
Article in English | MEDLINE | ID: mdl-39245597

ABSTRACT

RATIONALE AND OBJECTIVE: To compare the performance of the large language model (LLM) based Gemini and Generative Pre-trained Transformers (GPTs) in data mining and generating structured reports from free-text PET/CT reports for breast cancer after user-defined tasks. MATERIALS AND METHODS: Breast cancer patients (mean age, 50 years ± 11 [SD]; all female) who underwent consecutive 18F-FDG PET/CT for follow-up between July 2005 and October 2023 were retrospectively included in the study. A total of 20 reports from 10 patients were used to train user-defined text prompts for Gemini and GPTs, by which structured PET/CT reports were generated. The natural language processing (NLP) generated structured reports and the structured reports annotated by nuclear medicine physicians were compared in terms of data extraction accuracy and capacity for progression decision-making. Statistical methods, including the chi-square test, McNemar test, and paired samples t-test, were employed in the study. RESULTS: Structured PET/CT reports for 131 patients were generated using the two NLP techniques, Gemini and GPTs. In general, GPTs exhibited superiority over Gemini in data mining in terms of primary lesion size (89.6% vs. 53.8%, p < 0.001) and metastatic lesions (96.3% vs. 89.6%, p < 0.001). Moreover, GPTs outperformed Gemini in progression decision-making (p < 0.001) and semantic similarity (F1 score 0.930 vs. 0.907, p < 0.001) for reports. CONCLUSION: GPTs outperformed Gemini in generating structured reports from free-text PET/CT reports and could potentially be applied in clinical practice. DATA AVAILABILITY: The data used and/or analyzed during the current study are available from the corresponding author on reasonable request.

6.
Cureus ; 16(8): e66209, 2024 Aug.
Article in English | MEDLINE | ID: mdl-39233986

ABSTRACT

Extended reality (XR) simulations are becoming increasingly common in educational settings, particularly in medical education. Advancing XR devices to enhance these simulations is a booming field of research. This study seeks to understand the value of a novel, non-wearable mixed reality (MR) display during interactions with a simulated holographic patient, specifically in taking a medical history. Twenty-one first-year medical students at the University of North Carolina at Chapel Hill participated in the virtual patient (VP) simulations. On a five-point Likert scale, students overwhelmingly agreed with the statement that the simulations helped ensure they were progressing along learning objectives related to taking a patient history. However, they found that, at present, the simulations can only partially correct mistakes or provide clear feedback. This finding demonstrates that the novel hardware solution can help students engage in the activity, but the underlying software may need adjustment to attain sufficient pedagogical validity.

7.
Data Brief ; 56: 110855, 2024 Oct.
Article in English | MEDLINE | ID: mdl-39286413

ABSTRACT

With the soaring demand for healthcare systems, chatbots are gaining tremendous popularity and research attention, and a large amount of language-centric healthcare research is conducted every day. Despite significant advances in Arabic Natural Language Processing (NLP), challenges remain in natural language classification and generation, primarily because of the lack of suitable Arabic datasets for training. To address this, the authors introduce a large Arabic Healthcare Dataset (AHD) of textual data. The dataset consists of over 808k questions and answers across 90 categories, offered to the research community for Arabic computational linguistics. The authors anticipate that this rich dataset will be a valuable aid for a variety of NLP tasks on Arabic textual data, especially text classification and generation. The data are presented in raw form. AHD is composed of a main dataset scraped from a medical website, Altibbi. AHD is made public and freely available at http://data.mendeley.com/datasets/mgj29ndgrk/5.

8.
JMIR Aging ; 7: e54655, 2024 Sep 16.
Article in English | MEDLINE | ID: mdl-39283659

ABSTRACT

BACKGROUND: About one-third of adults aged 65 years and older have mild cognitive impairment or dementia. Acoustic and psycholinguistic features derived from conversation may be of great diagnostic value because speech involves verbal memory and cognitive and neuromuscular processes. The relative decline in these processes, however, may not be linear and remains understudied. OBJECTIVE: This study aims to establish associations between cognitive abilities and various attributes of speech and natural language production. To date, the majority of research has been cross-sectional, relying mostly on data from structured interactions and restricted to textual rather than acoustic analyses. METHODS: In a sample of 71 older (mean age 83.3, SD 7.0 years) community-dwelling adults who completed qualitative interviews and cognitive testing, we investigated the performance of both acoustic and psycholinguistic features associated with cognitive deficits contemporaneously and at a 1- to 2-year follow-up (mean follow-up time 512.3, SD 84.5 days). RESULTS: Combined acoustic and psycholinguistic features achieved high performance (F1-scores 0.73-0.86) and sensitivity (up to 0.90) in estimating cognitive deficits across multiple domains. Performance remained high when acoustic and psycholinguistic features were used to predict follow-up cognitive performance. The psycholinguistic features that were most successful at classifying high cognitive impairment reflected vocabulary richness, the quantity of speech produced, and the fragmentation of speech, whereas the analogous top-ranked acoustic features reflected breathing and nonverbal vocalizations such as giggles or laughter. CONCLUSIONS: These results suggest that both acoustic and psycholinguistic features extracted from qualitative interviews may be reliable markers of cognitive deficits in late life.
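Features of the kind named above (vocabulary richness, quantity of speech, fragmentation) can be computed from a transcript with plain Python. The definitions below are illustrative proxies, not the study's actual feature set, and the example transcript is invented.

```python
# Minimal psycholinguistic feature sketch (assumed proxy definitions,
# not the study's implementation).
import re

def psycholinguistic_features(transcript: str) -> dict:
    tokens = re.findall(r"[a-z']+", transcript.lower())
    fillers = {"um", "uh", "er"}
    words = [t for t in tokens if t not in fillers]
    return {
        "n_words": len(words),  # quantity of speech produced
        "type_token_ratio": len(set(words)) / max(len(words), 1),  # richness
        "filler_rate": sum(t in fillers for t in tokens) / max(len(tokens), 1),
        # fragmentation proxy: count of very short utterances
        "short_utterances": sum(
            len(u.split()) <= 2
            for u in re.split(r"[.?!]+", transcript) if u.strip()
        ),
    }

feats = psycholinguistic_features("Well. Um, I went to the, uh, the store. Yes.")
print(feats)
```

In a real pipeline these scalars would be joined with acoustic features (pauses, breathing, nonverbal vocalizations) before being fed to a classifier.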


Subject(s)
Cognitive Dysfunction, Psycholinguistics, Humans, Female, Male, Cognitive Dysfunction/diagnosis, Cognitive Dysfunction/psychology, Aged, 80 and over, Aged, Neuropsychological Tests
9.
Aging Cell ; : e14345, 2024 Sep 25.
Article in English | MEDLINE | ID: mdl-39323014

ABSTRACT

MicroRNA plays a crucial role in post-transcriptional gene regulation and has recently emerged as a factor linked to aging, but the underlying regulatory mechanisms remain incompletely understood. In this study, we observed lifespan-extending effects in miR-80-deficient Caenorhabditis elegans at 20°C but not 25°C. At 20°C, miR-80 deletion leads to NLP-45 upregulation, which positively correlates to increased abu transcripts and extended lifespan. Supportively, we identified miR-80 binding regions in the 5' and 3' UTR of nlp-45. As the temperature rises to 25°C, wildtype increases miR-80 levels, but removal of miR-80 is accompanied by decreased nlp-45 expression, suggesting intervention from other temperature-sensitive mechanisms. These findings support the concept that microRNAs and neuropeptide-like proteins can form molecular regulatory networks involving downstream molecules to regulate lifespan, and such regulatory effects vary on environmental conditions. This study unveils the role of an axis of miR-80/NLP-45/UPRER components in regulating longevity, offering new insights on strategies of aging attenuation and health span prolongation.

10.
JMIR Infodemiology ; 4: e51156, 2024 Sep 13.
Article in English | MEDLINE | ID: mdl-39269743

ABSTRACT

BACKGROUND: The growing availability of big data spontaneously generated by social media platforms allows us to leverage natural language processing (NLP) methods as valuable tools to understand the opioid crisis. OBJECTIVE: We aimed to understand how NLP has been applied to Reddit (Reddit Inc) data to study opioid use. METHODS: We systematically searched for peer-reviewed studies and conference abstracts in PubMed, Scopus, PsycINFO, ACL Anthology, IEEE Xplore, and Association for Computing Machinery data repositories up to July 19, 2022. Inclusion criteria were studies investigating opioid use, using NLP techniques to analyze the textual corpora, and using Reddit as the social media data source. We were specifically interested in mapping studies' overarching goals and findings, methodologies and software used, and main limitations. RESULTS: In total, 30 studies were included, which were classified into 4 nonmutually exclusive overarching goal categories: methodological (n=6, 20% studies), infodemiology (n=22, 73% studies), infoveillance (n=7, 23% studies), and pharmacovigilance (n=3, 10% studies). NLP methods were used to identify content relevant to opioid use among vast quantities of textual data, to establish potential relationships between opioid use patterns or profiles and contextual factors or comorbidities, and to anticipate individuals' transitions between different opioid-related subreddits, likely revealing progression through opioid use stages. Most studies used an embedding technique (12/30, 40%), prediction or classification approach (12/30, 40%), topic modeling (9/30, 30%), and sentiment analysis (6/30, 20%). The most frequently used programming languages were Python (20/30, 67%) and R (2/30, 7%). Among the studies that reported limitations (20/30, 67%), the most cited was the uncertainty regarding whether redditors participating in these forums were representative of people who use opioids (8/20, 40%). 
The papers were very recent (28/30, 93%), from 2019 to 2022, with authors from a range of disciplines. CONCLUSIONS: This scoping review identified a wide variety of NLP techniques and applications used to support surveillance and social media interventions addressing the opioid crisis. Despite the clear potential of these methods to enable the identification of opioid-relevant content in Reddit and its analysis, there are limits to the degree of interpretive meaning that they can provide. Moreover, we identified the need for standardized ethical guidelines to govern the use of Reddit data to safeguard the anonymity and privacy of people using these forums.


Subject(s)
Natural Language Processing, Social Media, Humans, Opioid-Related Disorders/epidemiology, Analgesics, Opioid/adverse effects, Analgesics, Opioid/therapeutic use
11.
J Psychiatr Res ; 179: 266-269, 2024 Sep 14.
Article in English | MEDLINE | ID: mdl-39326221

ABSTRACT

INTRODUCTION: The Danish Health Care Registers rely on the International Statistical Classification of Diseases and Related Health Problems (ICD) classification and stand as a widely utilized resource for health epidemiological research. Eating disorders are multifaceted syndromes for which two distinct diagnoses are defined, anorexia nervosa (AN) and bulimia nervosa (BN). However, the validity of the registered diagnoses remains to be verified. Manual chart review is often the method used to validate diagnosis codes, but there is limited research on how natural language processing (NLP) models could enhance this process. OBJECTIVE: To investigate the accuracy of the clinical use of ICD-10 diagnosis codes F50.0, F50.1, F50.2, and F50.3 in the Danish Health Care Registers, using a manual chart review assisted by NLP. METHOD: From a cohort of all individuals attending hospitals in the Region of Southern Denmark with registered electronic health information, we extracted medical information from the electronic health journal for 100 individuals with each of the four diagnosis codes. After extraction, an NLP model with regular expression search patterns identified relevant text passages for manual chart review. RESULTS: Overall, 372 of the 400 diagnosis codes (93%) were correct. A diagnosis code was correct in 90% of instances for AN, 96% for atypical AN, 96% for BN, and 90% for atypical BN. CONCLUSION: We found the accuracy of the diagnosis codes F50.0, F50.1, F50.2, and F50.3 to be high. This confirms that the generally well-documented validity of the Danish health care registers also applies to the eating disorder diagnoses.
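The regex-assisted review step described above can be sketched as follows. The patterns, window size, and example note are invented illustrations, not the study's actual search patterns: the point is that the program surfaces candidate passages and a human reviewer makes the final call.

```python
# Hypothetical sketch: regular-expression search patterns that surface
# passages relevant to an eating-disorder code for manual chart review.
import re

PATTERNS = {
    "F50.0": re.compile(r"anorexia nervosa|restrict\w* eating|BMI\s*<\s*17", re.I),
    "F50.2": re.compile(r"bulimia nervosa|binge|purg\w*|self-induced vomit\w*", re.I),
}

def candidate_passages(note: str, code: str, window: int = 30):
    """Return text snippets around each pattern hit, for a human reviewer."""
    hits = []
    for m in PATTERNS[code].finditer(note):
        lo, hi = max(0, m.start() - window), m.end() + window
        hits.append(note[lo:hi])
    return hits

note = "Pt reports binge episodes followed by self-induced vomiting twice weekly."
snips = candidate_passages(note, "F50.2")
print(snips)
```

Keeping the human in the loop is what lets this kind of tool speed up validation without itself needing validated accuracy.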

12.
J Med Internet Res ; 26: e60501, 2024 Sep 10.
Article in English | MEDLINE | ID: mdl-39255030

ABSTRACT

BACKGROUND: Prompt engineering, focusing on crafting effective prompts to large language models (LLMs), has garnered attention for its capabilities at harnessing the potential of LLMs. This is even more crucial in the medical domain due to its specialized terminology and language technicity. Clinical natural language processing applications must navigate complex language and ensure privacy compliance. Prompt engineering offers a novel approach by designing tailored prompts to guide models in exploiting clinically relevant information from complex medical texts. Despite its promise, the efficacy of prompt engineering in the medical domain remains to be fully explored. OBJECTIVE: The aim of the study is to review research efforts and technical approaches in prompt engineering for medical applications as well as provide an overview of opportunities and challenges for clinical practice. METHODS: Databases indexing the fields of medicine, computer science, and medical informatics were queried in order to identify relevant published papers. Since prompt engineering is an emerging field, preprint databases were also considered. Multiple data were extracted, such as the prompt paradigm, the involved LLMs, the languages of the study, the domain of the topic, the baselines, and several learning, design, and architecture strategies specific to prompt engineering. We include studies that apply prompt engineering-based methods to the medical domain, published between 2022 and 2024, and covering multiple prompt paradigms such as prompt learning (PL), prompt tuning (PT), and prompt design (PD). RESULTS: We included 114 recent prompt engineering studies. Among the 3 prompt paradigms, we have observed that PD is the most prevalent (78 papers). In 12 papers, PD, PL, and PT terms were used interchangeably. While ChatGPT is the most commonly used LLM, we have identified 7 studies using this LLM on a sensitive clinical data set. 
Chain-of-thought, present in 17 studies, emerges as the most frequent PD technique. While PL and PT papers typically provide a baseline for evaluating prompt-based approaches, 61% (48/78) of the PD studies do not report any nonprompt-related baseline. Finally, we individually examine each of the key prompt engineering-specific information reported across papers and find that many studies neglect to explicitly mention them, posing a challenge for advancing prompt engineering research. CONCLUSIONS: In addition to reporting on trends and the scientific landscape of prompt engineering, we provide reporting guidelines for future studies to help advance research in the medical field. We also disclose tables and figures summarizing medical prompt engineering papers available and hope that future contributions will leverage these existing works to better advance the field.
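The prompt design (PD) paradigm and its chain-of-thought variant can be made concrete with a small template builder. The task wording is a made-up clinical example, and no LLM is called here; this only shows the shape of the artifact that PD studies engineer.

```python
# Illustrative prompt-design sketch (invented template, no API call):
# a zero-shot extraction prompt and an optional chain-of-thought cue.
def build_prompt(report: str, chain_of_thought: bool = False) -> str:
    task = (
        "Extract the lesion size and whether the disease progressed "
        "from the radiology report below. Answer in JSON with keys "
        '"size_mm" and "progression".'
    )
    cot = "Think step by step: list each finding, then decide.\n" if chain_of_thought else ""
    return f"{task}\n{cot}Report:\n{report}\nAnswer:"

p = build_prompt("Lesion now 14 mm, previously 9 mm.", chain_of_thought=True)
print(p)
```

Comparing model outputs with and without the chain-of-thought line, against a nonprompt baseline, is exactly the kind of evaluation the review finds missing in many PD papers.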


Subject(s)
Natural Language Processing, Humans, Medical Informatics/methods
13.
JMIR Med Inform ; 12: e52678, 2024 Sep 20.
Article in English | MEDLINE | ID: mdl-39302636

ABSTRACT

Background: Collaborative documentation (CD) is a behavioral health practice involving shared writing of clinic visit notes by providers and consumers. Despite widespread dissemination of CD, research on its effectiveness or impact on person-centered care (PCC) has been limited. Principles of PCC planning, a recovery-based approach to service planning that operationalizes PCC, can inform the measurement of person-centeredness within clinical documentation. Objective: This study aims to use the clinical informatics approach of natural language processing (NLP) to examine the impact of CD on person-centeredness in clinic visit notes. Using a dictionary-based approach, this study conducts a textual analysis of clinic notes from a community mental health center before and after staff were trained in CD. Methods: This study used visit notes (n=1981) from 10 providers in a community mental health center 6 months before and after training in CD. LIWC-22 was used to assess all notes using the Linguistic Inquiry and Word Count (LIWC) dictionary, which categorizes over 5000 linguistic and psychological words. Twelve LIWC categories were selected and mapped onto PCC planning principles through the consensus of 3 domain experts. The LIWC-22 contextualizer was used to extract sentence fragments from notes corresponding to LIWC categories. Then, fixed-effects modeling was used to identify differences in notes before and after CD training while accounting for nesting within the provider. Results: Sentence fragments identified by the contextualizing process illustrated how visit notes demonstrated PCC. The fixed effects analysis found a significant positive shift toward person-centeredness; this was observed in 6 of the selected LIWC categories post CD. Specifically, there was a notable increase in words associated with achievement (ß=.774, P<.001), power (ß=.831, P<.001), money (ß=.204, P<.001), physical health (ß=.427, P=.03), while leisure words decreased (ß=-.166, P=.002). 
Conclusions: By using a dictionary-based approach, the study identified how CD might influence the integration of PCC principles within clinical notes. Although the results were mixed, the findings highlight the potential effectiveness of CD in enhancing person-centeredness in clinic notes. By leveraging NLP techniques, this research illuminated the value of narrative clinical notes in assessing the quality of care in behavioral health contexts. These findings underscore the promise of NLP for quality assurance in health care settings and emphasize the need for refining algorithms to more accurately measure PCC.
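The dictionary-based scoring at the heart of the study can be sketched in a few lines. The tiny word lists below are placeholders for LIWC-22's dictionary of over 5,000 words, and the note text is invented; the output per category is a rate, which is the kind of variable the fixed-effects models above compare before and after CD training.

```python
# Minimal dictionary-based (LIWC-style) scoring sketch; category word
# lists are invented stand-ins for the real LIWC-22 dictionary.
import re

CATEGORIES = {
    "achievement": {"goal", "progress", "achieved", "success"},
    "leisure": {"movie", "hobby", "game", "walk"},
}

def category_rates(note: str) -> dict:
    """Fraction of tokens in the note falling in each dictionary category."""
    tokens = re.findall(r"[a-z]+", note.lower())
    return {
        cat: sum(t in words for t in tokens) / max(len(tokens), 1)
        for cat, words in CATEGORIES.items()
    }

rates = category_rates("Client achieved one goal and planned a walk as a hobby.")
print(rates)
```

Because the method counts surface forms, refining the dictionaries (as the conclusion suggests) is the main lever for measuring person-centeredness more accurately.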


Subject(s)
Documentation, Natural Language Processing, Patient-Centered Care, Humans, Documentation/methods, Electronic Health Records, Community Mental Health Services/organization & administration
14.
Sensors (Basel) ; 24(18)2024 Sep 19.
Article in English | MEDLINE | ID: mdl-39338806

ABSTRACT

The proliferation of fake news across multiple modalities has emerged as a critical challenge in the modern information landscape, necessitating advanced detection methods. This study proposes a comprehensive framework for fake news detection integrating text, images, and videos using machine learning and deep learning techniques. The research employs a dual-phased methodology, first analyzing textual data using various classifiers, then developing a multimodal approach combining BERT for text analysis and a modified CNN for visual data. Experiments on the ISOT fake news dataset and MediaEval 2016 image verification corpus demonstrate the effectiveness of the proposed models. For textual data, the Random Forest classifier achieved 99% accuracy, outperforming other algorithms. The multimodal approach showed superior performance compared to baseline models, with a 3.1% accuracy improvement over existing multimodal techniques. This research contributes to the ongoing efforts to combat misinformation by providing a robust, adaptable framework for detecting fake news across different media formats, addressing the complexities of modern information dissemination and manipulation.
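The textual phase above (classical classifiers on news text, with Random Forest performing best) can be sketched with scikit-learn. The headlines and labels below are invented toys; nothing here reproduces the reported 99% accuracy on the ISOT dataset.

```python
# Illustrative stand-in for the text-classification phase: TF-IDF
# features plus a Random Forest, on made-up headlines (assumed setup).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline

headlines = [
    "miracle cure doctors dont want you to know",
    "shocking secret the government is hiding",
    "senate passes budget bill after long debate",
    "local council approves new school funding",
]
labels = [1, 1, 0, 0]  # 1 = fake, 0 = real (toy labels)

clf = make_pipeline(TfidfVectorizer(), RandomForestClassifier(random_state=0))
clf.fit(headlines, labels)
pred = clf.predict(["shocking miracle secret cure"])
print(pred)
```

In the multimodal phase, the TF-IDF features would be replaced by BERT text embeddings concatenated with CNN image features before the final classifier.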

15.
JMIR Med Inform ; 12: e58977, 2024 Sep 24.
Article in English | MEDLINE | ID: mdl-39316418

ABSTRACT

BACKGROUND: Natural language processing (NLP) techniques can be used to analyze large amounts of electronic health record texts, which encompasses various types of patient information such as quality of life, effectiveness of treatments, and adverse drug event (ADE) signals. As different aspects of a patient's status are stored in different types of documents, we propose an NLP system capable of processing 6 types of documents: physician progress notes, discharge summaries, radiology reports, radioisotope reports, nursing records, and pharmacist progress notes. OBJECTIVE: This study aimed to investigate the system's performance in detecting ADEs by evaluating the results from multitype texts. The main objective is to detect adverse events accurately using an NLP system. METHODS: We used data written in Japanese from 2289 patients with breast cancer, including medication data, physician progress notes, discharge summaries, radiology reports, radioisotope reports, nursing records, and pharmacist progress notes. Our system performs 3 processes: named entity recognition, normalization of symptoms, and aggregation of multiple types of documents from multiple patients. Among all patients with breast cancer, 103 and 112 with peripheral neuropathy (PN) received paclitaxel or docetaxel, respectively. We evaluate the utility of using multiple types of documents by correlation coefficient and regression analysis to compare their performance with each single type of document. All evaluations of detection rates with our system are performed 30 days after drug administration. RESULTS: Our system underestimates by 13.3 percentage points (74.0%-60.7%), as the incidence of paclitaxel-induced PN was 60.7%, compared with 74.0% in the previous research based on manual extraction. 
The Pearson correlation coefficient between the manual extraction and system results was 0.87. Although the pharmacist progress notes had the highest detection rate of any single document type, that rate did not match the performance using all documents. The estimated median duration of PN with paclitaxel was 92 days, whereas the previously reported median duration of PN with paclitaxel was 727 days. The number of events detected was highest in the physician's progress notes, followed by the pharmacist's and nursing records. CONCLUSIONS: Considering that conditions such as PN inherently require constant monitoring of the patient's condition, our system has a significant advantage in that it can immediately estimate the treatment duration without fine-tuning a new NLP model. Leveraging multiple document types improves detection performance over using a single type. Although the onset time estimation was relatively accurate, the estimated duration might have been influenced by the length of the data follow-up period. The results suggest that our method, using various types of data, can detect more ADEs from clinical documents.
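The normalization-and-aggregation idea in the pipeline above can be sketched simply: map surface forms of a symptom to one canonical term, then pool mentions across a patient's document types. The synonym table and document names are invented English stand-ins; the actual system performs named entity recognition on Japanese clinical text.

```python
# Sketch of symptom normalization + multi-document aggregation
# (invented terms; the real system runs Japanese NER first).
SYNONYMS = {
    "numbness in fingers": "peripheral neuropathy",
    "tingling": "peripheral neuropathy",
    "pn": "peripheral neuropathy",
}

def normalize(term: str) -> str:
    """Map a surface form to its canonical symptom name."""
    return SYNONYMS.get(term.lower(), term.lower())

def detected_events(docs_by_type: dict) -> set:
    """Union of normalized symptoms over all document types for a patient."""
    events = set()
    for doc_terms in docs_by_type.values():
        events |= {normalize(t) for t in doc_terms}
    return events

patient_docs = {
    "physician_note": ["numbness in fingers"],
    "pharmacist_note": ["PN"],
    "nursing_record": ["nausea"],
}
print(detected_events(patient_docs))
```

Taking the union across document types is why the combined detection rate exceeds any single document type's rate, as the results report.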


Subject(s)
Electronic Health Records , Natural Language Processing , Humans , Retrospective Studies , Japan , Breast Neoplasms/pathology , Breast Neoplasms/drug therapy , Female , Drug-Related Side Effects and Adverse Reactions/diagnosis , Drug-Related Side Effects and Adverse Reactions/epidemiology , East Asian People
16.
JMIR Aging ; 7: e57926, 2024 Sep 24.
Article in English | MEDLINE | ID: mdl-39316421

ABSTRACT

BACKGROUND: The severity of Alzheimer disease and related dementias (ADRD) is rarely documented in structured data fields in electronic health records (EHRs). Although this information is important for clinical monitoring and decision-making, it is often undocumented or "hidden" in unstructured text fields and not readily available for clinicians to act upon. OBJECTIVE: We aimed to assess the feasibility and potential bias in using keywords and rule-based matching for obtaining information about the severity of ADRD from EHR data. METHODS: We used EHR data from a large academic health care system that included patients with a primary discharge diagnosis of ADRD based on ICD-9 (International Classification of Diseases, Ninth Revision) and ICD-10 (International Statistical Classification of Diseases, Tenth Revision) codes between 2014 and 2019. We first assessed the presence of ADRD severity information and then the severity of ADRD in the EHR. Clinicians' notes were used to determine the severity of ADRD based on two criteria: (1) scores from the Mini Mental State Examination and Montreal Cognitive Assessment and (2) explicit terms for ADRD severity (eg, "mild dementia" and "advanced Alzheimer disease"). We compiled a list of common ADRD symptoms, cognitive test names, and disease severity terms, refining it iteratively based on previous literature and clinical expertise. Subsequently, we used rule-based matching in Python using standard open-source data analysis libraries to identify the context in which specific words or phrases were mentioned. We estimated the prevalence of documented ADRD severity and assessed the performance of our rule-based algorithm. RESULTS: We included 9115 eligible patients with over 65,000 notes from the providers. Overall, 22.93% (2090/9115) of patients were documented with mild ADRD, 20.87% (1902/9115) were documented with moderate or severe ADRD, and 56.20% (5123/9115) did not have any documentation of the severity of their ADRD. 
For the task of determining the presence of any ADRD severity information, our algorithm achieved an accuracy of >95%, specificity of >95%, sensitivity of >90%, and an F1-score of >83%. For the specific task of identifying the actual severity of ADRD, the algorithm performed well, with an accuracy of >91%, specificity of >80%, sensitivity of >88%, and F1-score of >92%. Compared with patients with mild ADRD, those with more advanced ADRD tended to be older, were more likely to be female or Black, and were more likely to have received their diagnoses in primary care or in-hospital settings. Relative to patients with undocumented ADRD severity, those with documented ADRD severity had a similar distribution in terms of sex, race, and rural or urban residence. CONCLUSIONS: Our study demonstrates the feasibility of using a rule-based matching algorithm to identify ADRD severity from unstructured EHR report data. However, it is essential to acknowledge potential biases arising from differences in documentation practices across various health care systems.
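The abstract describes rule-based matching in Python over explicit severity terms such as "mild dementia" and "advanced Alzheimer disease". A minimal sketch of that idea follows; the patterns are illustrative only, not the study's actual, iteratively refined term lists:

```python
import re

# Illustrative severity vocabularies (hypothetical; the study compiled its
# term lists iteratively from prior literature and clinical expertise).
SEVERITY_RULES = [
    ("mild", re.compile(r"\b(mild|early[- ]stage)\s+(dementia|alzheimer)", re.I)),
    ("moderate_severe",
     re.compile(r"\b(moderate|severe|advanced|late[- ]stage)\s+(dementia|alzheimer)", re.I)),
]

def classify_severity(note: str) -> str:
    """Return the first matching severity label, or 'undocumented'."""
    for label, pattern in SEVERITY_RULES:
        if pattern.search(note):
            return label
    return "undocumented"
```

The "undocumented" fallback mirrors the 56.20% of patients for whom no severity information was found in the notes.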


Subject(s)
Dementia , Electronic Health Records , Feasibility Studies , Severity of Illness Index , Humans , Dementia/diagnosis , Male , Female , Aged , Alzheimer Disease/diagnosis , Aged, 80 and over
17.
JMIR AI ; 3: e60020, 2024 Sep 23.
Article in English | MEDLINE | ID: mdl-39312397

ABSTRACT

BACKGROUND: Physicians spend approximately half of their time on administrative tasks, which is one of the leading causes of physician burnout and decreased work satisfaction. The implementation of natural language processing-assisted clinical documentation tools may provide a solution. OBJECTIVE: This study investigates the impact of a commercially available Dutch digital scribe system on clinical documentation efficiency and quality. METHODS: Medical students with experience in clinical practice and documentation (n=22) created a total of 430 summaries of mock consultations and recorded the time they spent on this task. The consultations were summarized using 3 methods: manual summaries, fully automated summaries, and automated summaries with manual editing. We then randomly reassigned the summaries and evaluated their quality using a modified version of the Physician Documentation Quality Instrument (PDQI-9). We compared the 3 methods using descriptive statistics, quantitative text metrics (word count and lexical diversity), the PDQI-9, Recall-Oriented Understudy for Gisting Evaluation (ROUGE) scores, and BERTScore. RESULTS: The median time for manual summarization was 202 seconds, compared with 186 seconds for editing an automatic summary. Without editing, the automatic summaries attained a poorer PDQI-9 score than manual summaries (median PDQI-9 score 25 vs 31; P<.001, ANOVA). Automatic summaries had higher word counts but lower lexical diversity than manual summaries (P<.001, independent t test). The study revealed variable impacts on PDQI-9 scores and summarization time across individuals. Generally, students viewed the digital scribe system as a potentially useful tool, noting its ease of use and time-saving potential, though some criticized the summaries for their greater length and rigid structure.
CONCLUSIONS: This study highlights the potential of digital scribes in improving clinical documentation processes by offering a first summary draft for physicians to edit, thereby reducing documentation time without compromising the quality of patient records. Furthermore, digital scribes may be more beneficial to some physicians than to others and could play a role in improving the reusability of clinical documentation. Future studies should focus on the impact and quality of such a system when used by physicians in clinical practice.
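The word-count and lexical-diversity comparison above can be illustrated with a type-token ratio, a common lexical-diversity measure (the study's exact metric is not specified, and the example summaries below are invented):

```python
import re

def text_metrics(summary):
    """Word count and lexical diversity (type-token ratio) of a summary."""
    tokens = re.findall(r"[a-z']+", summary.lower())
    return {
        "word_count": len(tokens),
        "lexical_diversity": len(set(tokens)) / len(tokens) if tokens else 0.0,
    }

# Hypothetical summaries illustrating the reported pattern: automatic
# summaries tended to be longer but less lexically diverse than manual ones.
manual_summary = "Patient reports mild headache; advised rest and hydration."
auto_summary = "Patient reports headache. Patient advised rest. Patient advised hydration."
```

On these toy inputs, the automatic summary has the higher word count but the lower type-token ratio, matching the direction of the study's finding.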

18.
J Med Internet Res ; 26: e55648, 2024 Sep 30.
Article in English | MEDLINE | ID: mdl-39348189

ABSTRACT

BACKGROUND: The release of ChatGPT (OpenAI) in November 2022 drastically reduced the barrier to using artificial intelligence by allowing a simple web-based text interface to a large language model (LLM). One use case where ChatGPT could be useful is in triaging patients at the site of a disaster using the Simple Triage and Rapid Treatment (START) protocol. However, LLMs experience several common errors including hallucinations (also called confabulations) and prompt dependency. OBJECTIVE: This study addresses the research problem: "Can ChatGPT adequately triage simulated disaster patients using the START protocol?" by measuring three outcomes: repeatability, reproducibility, and accuracy. METHODS: Nine prompts were developed by 5 disaster medicine physicians. A Python script queried ChatGPT Version 4 for each prompt combined with 391 validated simulated patient vignettes. Ten repetitions of each combination were performed for a total of 35,190 simulated triages. A reference standard START triage code for each simulated case was assigned by 2 disaster medicine specialists (JMF and MV), with a third specialist (LC) added if the first two did not agree. Results were evaluated using a gage repeatability and reproducibility study (gage R and R). Repeatability was defined as variation due to repeated use of the same prompt. Reproducibility was defined as variation due to the use of different prompts on the same patient vignette. Accuracy was defined as agreement with the reference standard. RESULTS: Although 35,102 (99.7%) queries returned a valid START score, there was considerable variability. Repeatability (use of the same prompt repeatedly) was 14% of the overall variation. Reproducibility (use of different prompts) was 4.1% of the overall variation. The accuracy of ChatGPT for START was 63.9% with a 32.9% overtriage rate and a 3.1% undertriage rate. Accuracy varied by prompt with a maximum of 71.8% and a minimum of 46.7%. 
CONCLUSIONS: This study indicates that ChatGPT version 4 is insufficient to triage simulated disaster patients via the START protocol. It demonstrated suboptimal repeatability and reproducibility. The overall accuracy of triage was only 63.9%. Health care professionals are advised to exercise caution while using commercial LLMs for vital medical determinations, given that these tools may commonly produce inaccurate data, colloquially referred to as hallucinations or confabulations. Artificial intelligence-guided tools should undergo rigorous statistical evaluation-using methods such as gage R and R-before implementation into clinical settings.
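Given per-patient triage labels, the reported accuracy, overtriage, and undertriage rates follow directly from comparing predictions against the reference standard; a sketch, assuming a conventional acuity ordering of the START categories:

```python
# START categories ordered by acuity. Treating "expectant" as the highest
# level is an assumption of this sketch, not a claim from the study.
ACUITY = {"minor": 0, "delayed": 1, "immediate": 2, "expectant": 3}

def triage_rates(predicted, reference):
    """Accuracy, overtriage, and undertriage rates vs. a reference standard."""
    n = len(reference)
    accuracy = sum(p == r for p, r in zip(predicted, reference)) / n
    overtriage = sum(ACUITY[p] > ACUITY[r] for p, r in zip(predicted, reference)) / n
    undertriage = sum(ACUITY[p] < ACUITY[r] for p, r in zip(predicted, reference)) / n
    return accuracy, overtriage, undertriage
```

Because every prediction is either correct, too high, or too low, the three rates always sum to 1, which is a useful sanity check when reproducing figures like 63.9% accuracy, 32.9% overtriage, and 3.1% undertriage.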


Subject(s)
Triage , Triage/methods , Humans , Reproducibility of Results , Patient Simulation , Disaster Medicine/methods , Disasters
19.
PeerJ Comput Sci ; 10: e2314, 2024.
Article in English | MEDLINE | ID: mdl-39314723

ABSTRACT

Predicting Bitcoin prices is crucial because they reflect trends in the overall cryptocurrency market. Owing to the market's short history and high price volatility, previous research has focused on the factors influencing Bitcoin price fluctuations. Although previous studies used sentiment analysis or diversified input features, this study's novelty lies in its use of data classified into more than five major categories and spanning more than 2,000 days. With this extensive dataset, the authors aimed to predict Bitcoin prices across various timeframes using time series analysis. The inputs included a broad spectrum of features: technical indicators; sentiment analysis from social media, news sources, and Google Trends; macroeconomic indicators; on-chain Bitcoin transaction details; and traditional financial asset data. The primary objective was to evaluate extensive machine learning and deep learning frameworks for time series prediction, determine optimal window sizes, and enhance Bitcoin price prediction accuracy by leveraging diverse input features. Employing a bidirectional long short-term memory (Bi-LSTM) network yielded significant results even without excluding the COVID-19 outbreak as a black swan outlier. Specifically, using a window size of 3, the Bi-LSTM achieved a root mean squared error of 0.01824, a mean absolute error of 0.01213, a mean absolute percentage error of 2.97%, and an R-squared value of 0.98791. Additionally, gradient importance was examined to identify which input features specifically influenced the prediction results. An ablation test was also conducted to validate the effectiveness and validity of the input features.
The proposed methodology provides a varied examination of the factors influencing price formation, helping investors make informed decisions regarding Bitcoin-related investments, and enabling policymakers to legislate considering these factors.
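The windowed time-series setup the study describes (a window size of 3 performed best) can be sketched as a simple supervised-pair construction over a univariate series:

```python
def make_windows(series, window_size):
    """Split a univariate series into (window, next value) supervised pairs."""
    xs, ys = [], []
    for i in range(len(series) - window_size):
        xs.append(series[i:i + window_size])
        ys.append(series[i + window_size])
    return xs, ys

# With the study's best window size of 3, each training example pairs the
# previous 3 observations with the next one (toy values, not Bitcoin data).
windows, targets = make_windows([10, 11, 13, 12, 15], 3)
```

In the multivariate setting the study uses, each window element would be a feature vector (technical, sentiment, macroeconomic, and on-chain features) rather than a single price, but the slicing logic is the same.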

20.
PeerJ Comput Sci ; 10: e2252, 2024.
Article in English | MEDLINE | ID: mdl-39314736

ABSTRACT

The world faces the ongoing challenge of terrorism and extremism, which threaten the stability of nations, the security of their citizens, and the integrity of political, economic, and social systems. Given the complexity and multifaceted nature of this phenomenon, combating it requires a collective effort, with tailored methods to address its various aspects. Identifying the terrorist organization responsible for an attack is a critical step in combating terrorism. Historical data plays a pivotal role in this process, providing insights that can inform prevention and response strategies. With advancements in technology and artificial intelligence (AI), particularly in military applications, there is growing interest in utilizing these developments to enhance national and regional security against terrorism. Central to this effort are terrorism databases, which serve as rich resources for data on armed organizations, extremist entities, and terrorist incidents. The Global Terrorism Database (GTD) stands out as one of the most widely used and accessible resources for researchers. Recent progress in machine learning (ML), deep learning (DL), and natural language processing (NLP) offers promising avenues for improving the identification and classification of terrorist organizations. This study introduces a framework designed to classify and predict terrorist groups using bidirectional recurrent units and self-attention mechanisms, referred to as BiGRU-SA. This approach utilizes the comprehensive data in the GTD by integrating textual features extracted by DistilBERT with features that show a high correlation with terrorist organizations. Additionally, the Synthetic Minority Over-sampling Technique with Tomek links (SMOTE-T) was employed to address data imbalance and enhance the robustness of our predictions. The BiGRU-SA model captures temporal dependencies and contextual information within the data. 
By processing data sequences in both forward and reverse directions, BiGRU-SA offers a comprehensive view of the temporal dynamics, significantly enhancing classification accuracy. To evaluate the effectiveness of our framework, we compared ten models, including six traditional ML models and four DL algorithms. The proposed BiGRU-SA framework demonstrated outstanding performance in classifying 36 terrorist organizations responsible for terrorist attacks, achieving an accuracy of 98.68%, precision of 96.06%, sensitivity of 96.83%, specificity of 99.50%, and a Matthews correlation coefficient of 97.50%. Compared to state-of-the-art methods, the proposed model outperformed others, confirming its effectiveness and accuracy in the classification and prediction of terrorist organizations.
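The reported Matthews correlation coefficient of 97.50% refers to a multiclass setting (36 organizations); the binary form below shows the underlying formula as a sketch, not the study's implementation:

```python
import math

def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # A zero denominator (e.g., a degenerate confusion matrix) is
    # conventionally mapped to 0.
    return numerator / denominator if denominator else 0.0
```

Unlike accuracy, MCC stays informative under class imbalance, which is why it complements the SMOTE-Tomek resampling the study applies to its imbalanced GTD labels.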
