1 - 20 of 6,716
1.
Med Ref Serv Q ; 43(2): 196-202, 2024.
Article En | MEDLINE | ID: mdl-38722609

Named entity recognition (NER) is a computational technique, in use since the early 1990s, that applies various computing strategies to extract information from raw text input. With rapid advances in AI and computing, NER models have gained significant attention and serve as foundational tools across numerous professional domains to organize unstructured data for research and practical applications. This is particularly evident in the medical and healthcare fields, where NER models are essential for efficiently extracting critical information from complex documents that are challenging to review manually. Despite its successes, NER has limitations in fully comprehending the nuances of natural language. However, the development of more advanced and user-friendly models promises to significantly improve the work experience of professional users.


Information Storage and Retrieval , Natural Language Processing , Information Storage and Retrieval/methods , Humans , Artificial Intelligence
2.
Sci Data ; 11(1): 482, 2024 May 10.
Article En | MEDLINE | ID: mdl-38730023

Prolonged and excessive interaction with cyberspace poses a threat to people's health and leads to Cyber-Syndrome, which covers both physiological and psychological disorders. This paper aims to create a tree-shaped gold-standard corpus, designated CS-A, that annotates Cyber-Syndrome, its clinical manifestations, and the acupoints that can alleviate the associated symptoms or signs. In the CS-A corpus, this paper defines six entities and relations subject to annotation. In total, 448 texts were annotated manually. After three rounds of updating the annotation guidelines, the inter-annotator agreement (IAA) improved significantly, reaching 86.05%. The purpose of constructing the CS-A corpus is to raise awareness of Cyber-Syndrome and draw attention to its subtle impact on people's health. An annotated corpus also supports the development of natural language processing technology. Model experiments can be run on this corpus, such as optimizing and improving models for discontinuous entity recognition and nested entity recognition. The CS-A corpus has been uploaded to figshare.


Acupuncture Points , Humans , Natural Language Processing , Computers , Internet
3.
Sci Rep ; 14(1): 10785, 2024 05 11.
Article En | MEDLINE | ID: mdl-38734712

Large language models (LLMs), like ChatGPT, Google's Bard, and Anthropic's Claude, showcase remarkable natural language processing capabilities. Evaluating their proficiency in specialized domains such as neurophysiology is crucial in understanding their utility in research, education, and clinical applications. This study aims to assess and compare the effectiveness of LLMs in answering neurophysiology questions in both English and Persian (Farsi) covering a range of topics and cognitive levels. Twenty questions covering four topics (general, sensory system, motor system, and integrative) and two cognitive levels (lower-order and higher-order) were posed to the LLMs. Physiologists scored the essay-style answers on a scale of 0-5 points. Statistical analysis compared the scores across factors such as model, language, topic, and cognitive level. Qualitative analysis identified reasoning gaps. In general, the models demonstrated good performance (mean score = 3.87/5), with no significant difference between languages or cognitive levels. Performance was strongest in the motor system (mean = 4.41) and weakest in integrative topics (mean = 3.35). Detailed qualitative analysis uncovered deficiencies in reasoning, discerning priorities, and knowledge integration. This study offers valuable insights into LLMs' capabilities and limitations in the field of neurophysiology. The models demonstrate proficiency in general questions but face challenges in advanced reasoning and knowledge integration. Targeted training could address gaps in knowledge and causal reasoning. As LLMs evolve, rigorous domain-specific assessments will be crucial for evaluating advancements in their performance.


Language , Neurophysiology , Humans , Neurophysiology/methods , Natural Language Processing , Cognition/physiology
4.
Health Informatics J ; 30(2): 14604582241240680, 2024.
Article En | MEDLINE | ID: mdl-38739488

Objective: This study examined major themes and sentiments and their trajectories and interactions over time using subcategories of Reddit data. The aim was to facilitate decision-making for psychosocial rehabilitation. Materials and Methods: We utilized natural language processing techniques, including topic modeling and sentiment analysis, on a dataset consisting of more than 38,000 topics, comments, and posts collected from a subreddit dedicated to the experiences of people who tested positive for COVID-19. In this longitudinal exploratory analysis, we studied the dynamics between the most dominant topics and subjects' emotional states over an 18-month period. Results: Our findings highlight the evolution of the textual and sentimental status of major topics discussed by COVID survivors over an extended period of time during the pandemic. We particularly studied pre- and post-vaccination eras as a turning point in the timeline of the pandemic. The results show that not only does the relevance of topics change over time, but the emotions attached to them also vary. Major social events, such as the administration of vaccines or enforcement of nationwide policies, are also reflected in the discussions and inquiries of social media users, in particular in the emotional state (i.e., sentiments and polarity of feelings) of those who have experienced COVID personally. Discussion: Cumulative societal knowledge regarding the COVID-19 pandemic impacts the patterns with which people discuss their experiences, concerns, and opinions. The subjects' emotional state with respect to different topics was also impacted by extraneous factors and events, such as vaccination.
Conclusion: By mining major topics, sentiments, and trajectories demonstrated in COVID-19 survivors' interactions on Reddit, this study contributes to the emerging body of scholarship on COVID-19 survivors' mental health outcomes, providing insights into the design of mental health support and rehabilitation services for COVID-19 survivors.


COVID-19 , SARS-CoV-2 , Survivors , Humans , COVID-19/psychology , COVID-19/epidemiology , Survivors/psychology , Data Mining/methods , Pandemics , Natural Language Processing , Social Media/trends , Longitudinal Studies
5.
JMIR Public Health Surveill ; 10: e47064, 2024 May 10.
Article En | MEDLINE | ID: mdl-38728069

BACKGROUND: Smell disorders are commonly reported with COVID-19 infection. The smell-related issues associated with COVID-19 may be prolonged, even after the respiratory symptoms are resolved. These smell dysfunctions can range from anosmia (complete loss of smell) or hyposmia (reduced sense of smell) to parosmia (smells perceived differently) or phantosmia (smells perceived without an odor source being present). Similar to the difficulty that people experience when talking about their smell experiences, patients find it difficult to express or label the symptoms they experience, thereby complicating diagnosis. The complexity of these symptoms can be an additional burden for patients and health care providers and thus needs further investigation. OBJECTIVE: This study aims to explore the smell disorder concerns of patients and to provide an overview for each specific smell disorder by using the longitudinal survey conducted in 2020 by the Global Consortium for Chemosensory Research, an international research group that has been created ad hoc for studying chemosensory dysfunctions. We aimed to extend the existing knowledge on smell disorders related to COVID-19 by analyzing a large data set of self-reported descriptive comments by using methods from natural language processing. METHODS: We included self-reported data on the description of changes in smell provided by 1560 participants at 2 timepoints (second survey completed between 23 and 291 days). Text data from participants who still had smell disorders at the second timepoint (long-haulers) were compared with the text data of those who did not (non-long-haulers). Specifically, 3 aims were pursued in this study. The first aim was to classify smell disorders based on the participants' self-reports. 
The second aim was to classify the sentiment of each self-report by using a machine learning approach, and the third aim was to find particular food and nonfood keywords that were more salient among long-haulers than those among non-long-haulers. RESULTS: We found that parosmia (odds ratio [OR] 1.78, 95% CI 1.35-2.37; P<.001) as well as hyposmia (OR 1.74, 95% CI 1.34-2.26; P<.001) were more frequently reported in long-haulers than in non-long-haulers. Furthermore, a significant relationship was found between long-hauler status and sentiment of self-report (P<.001). Finally, we found specific keywords that were more typical for long-haulers than those for non-long-haulers, for example, fire, gas, wine, and vinegar. CONCLUSIONS: Our work shows consistent findings with those of previous studies, which indicate that self-reports, which can easily be extracted online, may offer valuable information to health care and understanding of smell disorders. At the same time, our study on self-reports provides new insights for future studies investigating smell disorders.


COVID-19 , Natural Language Processing , Olfaction Disorders , Self Report , Humans , COVID-19/complications , COVID-19/epidemiology , Olfaction Disorders/epidemiology , Olfaction Disorders/etiology , Cross-Sectional Studies , Male , Female , Longitudinal Studies , Middle Aged , Adult , Aged , Young Adult
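The results in record 5 are reported as odds ratios with 95% confidence intervals. The following is a minimal sketch of how such a figure can be derived from a 2x2 table using the standard Wald interval on the log odds ratio. The function name `odds_ratio_ci` and the counts in the usage note are hypothetical illustrations; they are not the study's data, and the abstract does not state which interval method the authors used.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    a: group 1 with the outcome      b: group 1 without the outcome
    c: group 2 with the outcome      d: group 2 without the outcome
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Wald approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper
```

For example, with made-up counts `odds_ratio_ci(20, 10, 10, 20)`, the odds ratio is 4.0 and the interval is symmetric around it on the log scale.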
6.
J Med Internet Res ; 26: e52399, 2024 05 13.
Article En | MEDLINE | ID: mdl-38739445

BACKGROUND: A large language model (LLM) is a machine learning model inferred from text data that captures subtle patterns of language use in context. Modern LLMs are based on neural network architectures that incorporate transformer methods. They allow the model to relate words together through attention to multiple words in a text sequence. LLMs have been shown to be highly effective for a range of tasks in natural language processing (NLP), including classification and information extraction tasks and generative applications. OBJECTIVE: The aim of this adapted Delphi study was to collect researchers' opinions on how LLMs might influence health care and on the strengths, weaknesses, opportunities, and threats of LLM use in health care. METHODS: We invited researchers in the fields of health informatics, nursing informatics, and medical NLP to share their opinions on LLM use in health care. We started the first round with open questions based on our strengths, weaknesses, opportunities, and threats framework. In the second and third round, the participants scored these items. RESULTS: The first, second, and third rounds had 28, 23, and 21 participants, respectively. Almost all participants (26/28, 93% in round 1 and 20/21, 95% in round 3) were affiliated with academic institutions. Agreement was reached on 103 items related to use cases, benefits, risks, reliability, adoption aspects, and the future of LLMs in health care. Participants offered several use cases, including supporting clinical tasks, documentation tasks, and medical research and education, and agreed that LLM-based systems will act as health assistants for patient education. 
The agreed-upon benefits included increased efficiency in data handling and extraction, improved automation of processes, improved quality of health care services and overall health outcomes, provision of personalized care, accelerated diagnosis and treatment processes, and improved interaction between patients and health care professionals. In total, 5 risks to health care in general were identified: cybersecurity breaches, the potential for patient misinformation, ethical concerns, the likelihood of biased decision-making, and the risk associated with inaccurate communication. Overconfidence in LLM-based systems was recognized as a risk to the medical profession. The 6 agreed-upon privacy risks included the use of unregulated cloud services that compromise data security, exposure of sensitive patient data, breaches of confidentiality, fraudulent use of information, vulnerabilities in data storage and communication, and inappropriate access or use of patient data. CONCLUSIONS: Future research related to LLMs should not only focus on testing their possibilities for NLP-related tasks but also consider the workflows the models could contribute to and the requirements regarding quality, integration, and regulations needed for successful implementation in practice.


Delphi Technique , Natural Language Processing , Humans , Machine Learning , Delivery of Health Care/methods , Medical Informatics/methods
7.
Clin Imaging ; 110: 110164, 2024 Jun.
Article En | MEDLINE | ID: mdl-38691911

Natural Language Processing (NLP), a form of Artificial Intelligence, allows free-text based clinical documentation to be integrated in ways that facilitate data analysis, data interpretation and formation of individualized medical and obstetrical care. In this cross-sectional study, we identified all births during the study period carrying the radiology-confirmed diagnosis of fibroid uterus in pregnancy (defined as size of largest diameter of >5 cm) by using an NLP platform and compared it to non-NLP derived data using ICD10 codes of the same diagnosis. We then compared the two sets of data and stratified documentation gaps by race. Using fibroid uterus in pregnancy as a marker, we found that Black patients were more likely to have the diagnosis entered late into the patient's chart or had missing documentation of the diagnosis. With appropriate algorithm definitions, cross referencing and thorough validation steps, NLP can contribute to identifying areas of documentation gaps and improve quality of care.


Documentation , Natural Language Processing , Uterine Neoplasms , Humans , Female , Pregnancy , Cross-Sectional Studies , Documentation/standards , Documentation/statistics & numerical data , Uterine Neoplasms/diagnostic imaging , Racism , Leiomyoma/diagnostic imaging , Adult , Obstetrics , Pregnancy Complications, Neoplastic/diagnostic imaging
8.
BMC Med Res Methodol ; 24(1): 114, 2024 May 17.
Article En | MEDLINE | ID: mdl-38760718

BACKGROUND: Smoking is a critical risk factor responsible for over eight million annual deaths worldwide. It is essential to obtain information on smoking habits to advance research and implement preventive measures such as screening of high-risk individuals. In most countries, including Denmark, smoking habits are not systematically recorded and at best documented within unstructured free-text segments of electronic health records (EHRs). This would require researchers and clinicians to manually navigate through extensive amounts of unstructured data, which is one of the main reasons that smoking habits are rarely integrated into larger studies. Our aim is to develop machine learning models to classify patients' smoking status from their EHRs. METHODS: This study proposes an efficient natural language processing (NLP) pipeline capable of classifying patients' smoking status and providing explanations for the decisions. The proposed NLP pipeline comprises four distinct components: (1) preprocessing techniques to address abbreviations, punctuation, and other textual irregularities, (2) four feature extraction techniques, i.e., Embedding, BERT, Word2Vec, and Count Vectorizer, employed to extract the optimal features, (3) a Stacking-based Ensemble (SE) model and a Convolutional Long Short-Term Memory Neural Network (CNN-LSTM) for the identification of smoking status, and (4) a local interpretable model-agnostic explanation to explain the decisions rendered by the detection models. The EHRs of 23,132 patients with suspected lung cancer were collected from the Region of Southern Denmark during the period 1/1/2009-31/12/2018. A medical professional annotated the data into 'Smoker' and 'Non-Smoker' with further classifications as 'Active-Smoker', 'Former-Smoker', and 'Never-Smoker'. Subsequently, the annotated dataset was used for the development of binary and multiclass classification models.
An extensive comparison was conducted of the detection performance across various model architectures. RESULTS: The results of experimental validation confirm the consistency among the models. However, for binary classification, BERT method with CNN-LSTM architecture outperformed other models by achieving precision, recall, and F1-scores between 97% and 99% for both Never-Smokers and Active-Smokers. In multiclass classification, the Embedding technique with CNN-LSTM architecture yielded the most favorable results in class-specific evaluations, with equal performance measures of 97% for Never-Smoker and measures in the range of 86 to 89% for Active-Smoker and 91-92% for Never-Smoker. CONCLUSION: Our proposed NLP pipeline achieved a high level of classification performance. In addition, we presented the explanation of the decision made by the best performing detection model. Future work will expand the model's capabilities to analyze longer notes and a broader range of categories to maximize its utility in further research and screening applications.


Electronic Health Records , Natural Language Processing , Smoking , Humans , Denmark/epidemiology , Electronic Health Records/statistics & numerical data , Smoking/epidemiology , Machine Learning , Female , Male , Middle Aged , Neural Networks, Computer
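Record 8 reports class-specific precision, recall, and F1-scores for smoking-status labels. As a rough illustration of what those per-class figures measure, here is a minimal sketch using one-vs-rest counts. The function name `per_class_metrics` and the toy labels are assumptions made for this example; this is not the authors' evaluation code.

```python
def per_class_metrics(y_true, y_pred):
    """Return {label: (precision, recall, f1)} computed one-vs-rest per label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    metrics = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # Fall back to 0.0 when a denominator is zero (label never predicted or never present).
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[label] = (precision, recall, f1)
    return metrics


# Toy, made-up labels purely for illustration.
y_true = ["Active", "Active", "Former", "Never", "Never", "Never"]
y_pred = ["Active", "Former", "Former", "Never", "Never", "Active"]
scores = per_class_metrics(y_true, y_pred)
```

Reporting the three measures per class, rather than a single overall accuracy, makes it visible when one class (here, a hypothetical "Former" label) is recalled well but predicted imprecisely.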
9.
PLoS One ; 19(5): e0301682, 2024.
Article En | MEDLINE | ID: mdl-38768143

AIMS: Alcohol cravings are considered a major factor in relapse among individuals with alcohol use disorder (AUD). This study aims to investigate the frequency and triggers of cravings in the daily lives of people with alcohol-related issues. Large amounts of data are analyzed with Artificial Intelligence (AI) methods to identify possible groupings and patterns. METHODS: For the analysis, posts from the online forum "stopdrinking" on the Reddit platform were used as the dataset from April 2017 to April 2022. The posts were filtered for craving content and processed using the word2vec method to map them into a multi-dimensional vector space. Statistical analyses were conducted to calculate the nature and frequency of craving contexts and triggers (location, time, social environment, and emotions) using word similarity scores. Additionally, the themes of the craving-related posts were semantically grouped using a Latent Dirichlet Allocation (LDA) topic model. The accuracy of the results was evaluated using two manually created test datasets. RESULTS: Approximately 16% of the forum posts discuss cravings. The number of craving-related posts decreases exponentially with the number of days since the author's last alcoholic drink. The topic model confirms that the majority of posts involve individual factors and triggers of cravings. The context analysis aligns with previous craving trigger findings related to the social environment, locations and emotions. Strong semantic craving similarities were found for the emotions boredom, stress and the location airport. The results for each method were successfully validated on test datasets. CONCLUSIONS: This exploratory approach is the first to analyze alcohol cravings in the daily lives of over 24,000 individuals, providing a foundation for further AI-based craving analyses. The analysis confirms commonly known craving triggers and even discovers new important craving contexts.


Behavior, Addictive , Craving , Natural Language Processing , Humans , Craving/physiology , Behavior, Addictive/psychology , Alcoholism/psychology , Emotions/physiology , Artificial Intelligence , Social Media
10.
Int J Public Health ; 69: 1606855, 2024.
Article En | MEDLINE | ID: mdl-38770181

Objectives: Suicide risk is elevated in lesbian, gay, bisexual, and transgender (LGBT) individuals. Limited data on LGBT status in healthcare systems hinder our understanding of this risk. This study used natural language processing to extract LGBT status and a deep neural network (DNN) to examine suicidal death risk factors among US Veterans. Methods: Data on 8.8 million Veterans with visits between 2010 and 2017 were used. A case-control study was performed, and suicide death risk was analyzed by a DNN. Feature impacts and interactions on the outcome were evaluated. Results: The crude suicide mortality rate was higher in LGBT patients. However, after adjusting for over 200 risk and protective factors, known LGBT status was associated with reduced risk compared to LGBT-Unknown status. Among LGBT patients, Black, female, married, and older Veterans have a higher risk, while Veterans of various religions have a lower risk. Conclusion: Our results suggest that disclosed LGBT status is not directly associated with an increased suicide death risk; however, other factors (e.g., depression and anxiety caused by stigma) are associated with suicide death risks.


Artificial Intelligence , Sexual and Gender Minorities , Suicide , Veterans , Humans , Male , Female , Sexual and Gender Minorities/statistics & numerical data , Sexual and Gender Minorities/psychology , Middle Aged , Case-Control Studies , Suicide/statistics & numerical data , Veterans/psychology , Veterans/statistics & numerical data , United States/epidemiology , Adult , Risk Factors , Aged , Natural Language Processing
11.
PLoS One ; 19(5): e0295248, 2024.
Article En | MEDLINE | ID: mdl-38771789

In the dynamic domain of logistics, effective communication is essential for streamlined operations. Our innovative solution, the Multi-Labeling Ensemble (MLEn), tackles the intricate task of extracting multi-labeled data, employing advanced techniques for accurate preprocessing of textual data through the NLTK toolkit. This approach is carefully tailored to the prevailing language used in logistics communication. MLEn utilizes innovative methods, including sentiment intensity analysis, Word2Vec, and Doc2Vec, ensuring comprehensive feature extraction. This proves particularly suitable for logistics in e-commerce, capturing nuanced communication essential for efficient operations. Ethical considerations are a cornerstone in logistics communication, and MLEn plays a pivotal role in detecting and categorizing inappropriate language, aligning inherently with ethical norms. Leveraging Tf-IDF and Vader for feature enhancement, MLEn adeptly discerns and labels ethically sensitive content in logistics communication. Across diverse datasets, including Emotions, MLEn consistently achieves impressive accuracy levels ranging from 92% to 97%, establishing its superiority in the logistics context. Particularly, our proposed method, DenseNet-EHO, outperforms BERT by 8% and surpasses other techniques by a 15-25% efficiency. A comprehensive analysis, considering metrics such as precision, recall, F1-score, Ranking Loss, Jaccard Similarity, AUC-ROC, sensitivity, and time complexity, underscores DenseNet-EHO's efficiency, aligning with the practical demands within the logistics track. Our research significantly contributes to enhancing precision, diversity, and computational efficiency in aspect-based sentiment analysis within logistics. 
By integrating cutting-edge preprocessing, sentiment intensity analysis, and vectorization, MLEn emerges as a robust framework for multi-label datasets, consistently outperforming conventional approaches and giving outstanding precision, accuracy, and efficiency in the logistics field.


Natural Language Processing , Humans , Communication , Machine Learning , Emotions , Algorithms
12.
PLoS One ; 19(5): e0303231, 2024.
Article En | MEDLINE | ID: mdl-38771886

Extracting biological interactions from published literature helps us understand complex biological systems, accelerate research, and support decision-making in drug or treatment development. Despite efforts to automate the extraction of biological relations using text mining tools and machine learning pipelines, manual curation continues to serve as the gold standard. However, the rapidly increasing volume of literature pertaining to biological relations poses challenges in its manual curation and refinement. These challenges are further compounded because only a small fraction of the published literature is relevant to biological relation extraction, and the embedded sentences of relevant sections have complex structures, which can lead to incorrect inference of relationships. To overcome these challenges, we propose GIX, an automated and robust Gene Interaction Extraction framework, based on pre-trained Large Language models fine-tuned through extensive evaluations on various gene/protein interaction corpora including LLL and RegulonDB. GIX identifies relevant publications with minimal keywords, optimises sentence selection to reduce computational overhead, simplifies sentence structure while preserving meaning, and provides a confidence factor indicating the reliability of extracted relations. GIX's Stage-2 relation extraction method performed well on benchmark protein/gene interaction datasets, assessed using 10-fold cross-validation, surpassing state-of-the-art approaches. We demonstrated that the proposed method, although fully automated, performs as well as manual relation extraction, with enhanced robustness. We also observed GIX's capability to augment existing datasets with new sentences, incorporating newly discovered biological terms and processes. Further, we demonstrated GIX's real-world applicability in inferring E. coli gene circuits.


Data Mining , Data Mining/methods , Natural Language Processing , Machine Learning , Computational Biology/methods , Humans , Algorithms
13.
JMIR Ment Health ; 11: e53730, 2024 May 02.
Article En | MEDLINE | ID: mdl-38722220

Background: There is growing concern around the use of sodium nitrite (SN) as an emerging means of suicide, particularly among younger people. Given the limited information on the topic from traditional public health surveillance sources, we studied posts made to an online suicide discussion forum, "Sanctioned Suicide," which is a primary source of information on the use and procurement of SN. Objective: This study aims to determine the trends in SN purchase and use, as obtained via data mining from subscriber posts on the forum. We also aim to determine the substances and topics commonly co-occurring with SN, as well as the geographical distribution of users and sources of SN. Methods: We collected all publicly available posts from the site's inception in March 2018 to October 2022. Using data-driven methods, including natural language processing and machine learning, we analyzed the trends in SN mentions over time, including the locations of SN consumers and the sources from which SN is procured. We developed a transformer-based source and location classifier to determine the geographical distribution of the sources of SN. Results: Posts pertaining to SN show a rise in popularity, and there were statistically significant correlations between real-life use of SN and suicidal intent when compared to data from the Centers for Disease Control and Prevention (CDC) Wide-Ranging Online Data for Epidemiologic Research (ρ=0.727; P<.001) and the National Poison Data System (ρ=0.866; P=.001). We observed frequent co-mentions of antiemetics, benzodiazepines, and acid regulators with SN. Our proposed machine learning-based source and location classifier can detect potential sources of SN with an accuracy of 72.92% and showed consumption in the United States and elsewhere. Conclusions: Vital information about SN and other emerging mechanisms of suicide can be obtained from online forums.


Natural Language Processing , Self-Injurious Behavior , Sodium Nitrite , Humans , Self-Injurious Behavior/epidemiology , Suicide/trends , Suicide/psychology , Adult , Internet , Male , Female , Social Media , Young Adult
14.
PLoS One ; 19(5): e0303519, 2024.
Article En | MEDLINE | ID: mdl-38723044

OBJECTIVE: To establish whether or not a natural language processing technique could identify two common inpatient neurosurgical comorbidities using only text reports of inpatient head imaging. MATERIALS AND METHODS: A training and testing dataset of reports of 979 CT or MRI scans of the brain for patients admitted to the neurosurgery service of a single hospital in June 2021 or to the Emergency Department between July 1-8, 2021, was identified. A variety of machine learning and deep learning algorithms utilizing natural language processing were trained on the training set (84% of the total cohort) and tested on the remaining images. A subset comparison cohort (n = 76) was then assessed to compare output of the best algorithm against real-life inpatient documentation. RESULTS: For "brain compression", a random forest classifier outperformed other candidate algorithms with an accuracy of 0.81 and area under the curve of 0.90 in the testing dataset. For "brain edema", a random forest classifier again outperformed other candidate algorithms with an accuracy of 0.92 and AUC of 0.94 in the testing dataset. In the provider comparison dataset, for "brain compression," the random forest algorithm demonstrated better accuracy (0.76 vs 0.70) and sensitivity (0.73 vs 0.43) than provider documentation. For "brain edema," the algorithm again demonstrated better accuracy (0.92 vs 0.84) and AUC (0.45 vs 0.09) than provider documentation. DISCUSSION: A natural language processing-based machine learning algorithm can reliably and reproducibly identify selected common neurosurgical comorbidities from radiology reports. CONCLUSION: This result may justify the use of machine learning-based decision support to augment provider documentation.


Comorbidity , Natural Language Processing , Humans , Algorithms , Inpatients/statistics & numerical data , Female , Male , Machine Learning , Magnetic Resonance Imaging/methods , Documentation , Middle Aged , Tomography, X-Ray Computed , Neurosurgical Procedures , Aged , Deep Learning
15.
J Orthop Surg Res ; 19(1): 287, 2024 May 10.
Article En | MEDLINE | ID: mdl-38725085

BACKGROUND: The Centers for Medicare & Medicaid Services (CMS) imposes payment penalties for readmissions following total joint replacement surgeries. This study focuses on total hip, knee, and shoulder arthroplasty procedures as they account for most joint replacement surgeries. Apart from being a burden to healthcare systems, readmissions are also troublesome for patients. Several previous studies utilized only structured data from Electronic Health Records (EHR) without considering any gender and payor bias adjustments. METHODS: For this study, a dataset of 38,581 total knee, hip, and shoulder replacement surgeries performed from 2015 to 2021 at Novant Health was gathered. This data was used to train a random forest machine learning model to predict the combined endpoint of emergency department (ED) visit or unplanned readmission within 30 days of discharge or discharge to a Skilled Nursing Facility (SNF) following the surgery. A total of 98 features covering laboratory results, diagnoses, vitals, medications, and utilization history were extracted. A natural language processing (NLP) model fine-tuned from Clinical BERT was used to generate an NLP risk score feature for each patient based on their clinical notes. To address societal biases, a feature bias analysis was performed in conjunction with propensity score matching. A threshold optimization algorithm from the Fairlearn toolkit was used to mitigate gender and payor biases to promote fairness in predictions. RESULTS: The model achieved an Area Under the Receiver Operating Characteristic Curve (AUROC) of 0.738 (95% confidence interval, 0.724 to 0.754) and an Area Under the Precision-Recall Curve (AUPRC) of 0.406 (95% confidence interval, 0.384 to 0.433).
Considering an outcome prevalence of 16%, these metrics indicate the model's ability to accurately discriminate between readmission and non-readmission cases within the context of total arthroplasty surgeries while adjusting patient scores in the model to mitigate bias based on patient gender and payor. CONCLUSION: This work culminated in a model that identifies the most predictive and protective features associated with the combined endpoint. This model serves as a tool to empower healthcare providers to proactively intervene based on these influential factors without introducing bias towards protected patient classes, effectively mitigating the risk of negative outcomes and ultimately improving quality of care regardless of socioeconomic factors.
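To make the reported metrics easier to read, the sketch below computes AUROC as the probability that a randomly chosen positive case outscores a randomly chosen negative one, and reports prevalence, which is the AUPRC a no-skill classifier would be expected to reach. Against a 16% prevalence, the reported AUPRC of 0.406 is roughly 2.5 times that baseline. The function names and the toy labels and scores are illustrative assumptions, not data or code from the study.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct ranking. Labels are 1 (event) or 0 (no event).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def prevalence(labels):
    """Share of positive cases; the expected AUPRC of a no-skill classifier."""
    return sum(labels) / len(labels)


# Toy example (hypothetical values, not study data).
toy_labels = [1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.1]
print(auroc(toy_labels, toy_scores), prevalence(toy_labels))
```

The pairwise loop is quadratic in the number of cases, which is fine for illustration; a rank-based formulation would be preferable at the scale of tens of thousands of surgeries.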


Cost-Benefit Analysis , Machine Learning , Patient Readmission , Humans , Patient Readmission/economics , Patient Readmission/statistics & numerical data , Female , Male , Aged , Natural Language Processing , Middle Aged , Arthroplasty, Replacement, Knee/economics , Arthroplasty, Replacement, Hip/economics , Arthroplasty, Replacement/economics , Arthroplasty, Replacement/adverse effects , Risk Assessment/methods , Preoperative Period , Aged, 80 and over , Quality Improvement , Random Forest
16.
J Med Internet Res ; 26: e53968, 2024 May 20.
Article En | MEDLINE | ID: mdl-38767953

BACKGROUND: In 2023, the United States experienced its highest recorded number of suicides, exceeding 50,000 deaths. In the realm of psychiatric disorders, major depressive disorder stands out as the most common issue, affecting 15% to 17% of the population and carrying a notable suicide risk of approximately 15%. However, not everyone with depression has suicidal thoughts. While "suicidal depression" is not a clinical diagnosis, it may be observed in daily life, emphasizing the need for awareness. OBJECTIVE: This study aims to examine the dynamics, emotional tones, and topics discussed in posts within the r/Depression subreddit, with a specific focus on users who had also engaged in the r/SuicideWatch community. The objective was to use natural language processing techniques and models to better understand the complexities of depression among users with potential suicide ideation, with the goal of improving intervention and prevention strategies for suicide. METHODS: Archived English-language posts from 2019 to 2022 were extracted from the r/Depression and r/SuicideWatch Reddit communities, resulting in a final data set of over 150,000 posts contributed by approximately 25,000 unique overlapping users. A broad mix of methods, including trend and survival analysis, was applied to these posts to explore the dynamics of users in the 2 subreddits. The BERT family of models extracted features from the data for sentiment and thematic analysis. RESULTS: On August 16, 2020, the post count in r/SuicideWatch surpassed that of r/Depression. The transition from r/Depression to r/SuicideWatch in 2020 was the shortest, lasting only 26 days. Sadness emerged as the most prevalent emotion among overlapping users in the r/Depression community. In addition, physical activity changes, negative self-view, and suicidal thoughts were identified as the most common depression symptoms, all showing strong positive correlations with the emotional tone of disappointment. Furthermore, the topic "struggles with depression and motivation in school and work" (12%) emerged as the most discussed topic aside from suicidal thoughts, categorizing users based on their inclination toward suicide ideation. CONCLUSIONS: Our study underscores the effectiveness of using natural language processing techniques to explore language markers and patterns associated with mental health challenges in online communities like r/Depression and r/SuicideWatch. These insights offer novel perspectives distinct from previous research. In the future, there will be potential for further refinement and optimization of machine classifications using these techniques, which could lead to more effective intervention and prevention strategies.


COVID-19 , Suicidal Ideation , Humans , COVID-19/psychology , COVID-19/epidemiology , Natural Language Processing , Depression/psychology , Pandemics , United States , Social Media , Suicide/psychology , Suicide/statistics & numerical data , Depressive Disorder, Major/psychology , SARS-CoV-2
17.
JMIR Ment Health ; 11: e57234, 2024 May 16.
Article En | MEDLINE | ID: mdl-38771256

Background: Rates of suicide have increased by over 35% since 1999. Despite concerted efforts, our ability to predict, explain, or treat suicide risk has not significantly improved over the past 50 years. Objective: The aim of this study was to use large language models to understand natural language use during public web-based discussions (on Reddit) around topics related to suicidality. Methods: We used large language model-based sentence embedding to extract the latent linguistic dimensions of user postings derived from several mental health-related subreddits, with a focus on suicidality. We then applied dimensionality reduction to these sentence embeddings, allowing them to be summarized and visualized in a lower-dimensional Euclidean space for further downstream analyses. We analyzed 2.9 million posts extracted from 30 subreddits, including r/SuicideWatch, between October 1 and December 31, 2022, and the same period in 2010. Results: Our results showed that, in line with existing theories of suicide, posters in the suicidality community (r/SuicideWatch) predominantly wrote about feelings of disconnection, burdensomeness, hopelessness, desperation, resignation, and trauma. Further, we identified distinct latent linguistic dimensions (well-being, seeking support, and severity of distress) among all mental health subreddits, and many of the resulting subreddit clusters were in line with a statistically driven diagnostic classification system, namely the Hierarchical Taxonomy of Psychopathology (HiTOP), by mapping onto the proposed superspectra. Conclusions: Overall, our findings provide data-driven support for several language-based theories of suicide, as well as dimensional classification systems for mental health disorders. Ultimately, this novel combination of natural language processing techniques can assist researchers in gaining deeper insights about emotions and experiences shared on the web and may aid in the validation and refutation of different mental health theories.
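The abstract does not say which dimensionality reduction method was applied to the sentence embeddings. As a minimal sketch of the general idea, the code below assumes principal component analysis and finds the leading component by power iteration, then projects each embedding onto it. The function names and the tiny three-dimensional "embeddings" are hypothetical; real sentence embeddings have hundreds of dimensions and would normally be handled with a numerical library.

```python
import math
import random


def center(rows):
    """Subtract the per-dimension mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def top_component(rows, iters=200, seed=0):
    """Leading principal direction of centered rows via power iteration."""
    rng = random.Random(seed)
    d = len(rows[0])
    v = [rng.gauss(0, 1) for _ in range(d)]
    for _ in range(iters):
        # Apply X^T X to v without building the covariance matrix.
        proj = [sum(r[j] * v[j] for j in range(d)) for r in rows]
        w = [sum(proj[i] * rows[i][j] for i in range(len(rows))) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def project(rows, v):
    """One-dimensional coordinate of each row along direction v."""
    return [sum(r[j] * v[j] for j in range(len(v))) for r in rows]


# Hypothetical toy embeddings that vary mostly along the first axis.
toy = center([[0.0, 0.0, 0.0], [1.0, 0.1, 0.0], [2.0, -0.1, 0.0], [3.0, 0.0, 0.0]])
direction = top_component(toy)
print(project(toy, direction))
```

The sign of the recovered direction is arbitrary, so the projected coordinates may come out in ascending or descending order.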


Linguistics , Mental Disorders , Social Media , Suicide , Humans , Social Media/statistics & numerical data , Suicide/psychology , Mental Disorders/psychology , Mental Disorders/epidemiology , Mental Disorders/classification , Natural Language Processing
18.
Front Public Health ; 12: 1392180, 2024.
Article En | MEDLINE | ID: mdl-38716250

Introduction: Social media platforms serve as a valuable resource for users to share health-related information, aiding in the monitoring of adverse events linked to medications and treatments in drug safety surveillance. However, extracting drug-related adverse events accurately and efficiently from social media poses challenges in both natural language processing research and the pharmacovigilance domain. Method: Recognizing the lack of detailed implementation and evaluation of Bidirectional Encoder Representations from Transformers (BERT)-based models for drug adverse event extraction on social media, we developed a BERT-based language model tailored to identifying drug adverse events in this context. Our model utilized publicly available labeled adverse event data from the ADE-Corpus-V2. Constructing the BERT-based model involved optimizing key hyperparameters, such as the number of training epochs, batch size, and learning rate. Through ten hold-out evaluations on ADE-Corpus-V2 data and external social media datasets, our model consistently demonstrated high accuracy in drug adverse event detection. Result: The hold-out evaluations resulted in average F1 scores of 0.8575, 0.9049, and 0.9813 for detecting words of adverse events, words in adverse events, and words not in adverse events, respectively. External validation using human-labeled adverse event tweet data from SMM4H further substantiated the effectiveness of our model, yielding F1 scores of 0.8127, 0.8068, and 0.9790 for detecting words of adverse events, words in adverse events, and words not in adverse events, respectively. Discussion: This study not only showcases the effectiveness of BERT-based language models in accurately identifying drug-related adverse events in the dynamic landscape of social media data, but also addresses the need for the implementation of a comprehensive study design and evaluation. By doing so, we contribute to the advancement of pharmacovigilance practices and methodologies in the context of emerging information sources like social media.
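The three F1 scores above are reported per word-level category. A minimal sketch of how a per-label F1 can be computed from gold and predicted token labels follows. The label names "B", "I", and "O" are an assumption used here to stand for the three categories; the abstract does not state the tagging scheme, and the toy sequences are hypothetical.

```python
def per_label_f1(gold, pred):
    """F1 for each label, computed token by token.

    F1 = 2*TP / (2*TP + FP + FN); a label with no support and no
    predictions gets 0.0 by convention in this sketch.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    scores = {}
    for label in sorted(set(gold) | set(pred)):
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


# Hypothetical token labels for one short post.
gold = ["O", "B", "I", "O", "O", "B"]
pred = ["O", "B", "O", "O", "B", "B"]
print(per_label_f1(gold, pred))
```

Because tokens outside adverse events usually dominate, their F1 tends to be the highest of the three, which matches the pattern in the reported scores.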


Drug-Related Side Effects and Adverse Reactions , Natural Language Processing , Pharmacovigilance , Social Media , Humans , Adverse Drug Reaction Reporting Systems
19.
JCO Clin Cancer Inform ; 8: e2400051, 2024 May.
Article En | MEDLINE | ID: mdl-38713889

This new editorial discusses the promise and challenges of successful integration of natural language processing methods into electronic health records for timely, robust, and fair oncology pharmacovigilance.


Artificial Intelligence , Electronic Health Records , Medical Oncology , Natural Language Processing , Pharmacovigilance , Humans , Medical Oncology/methods , Data Collection/methods , Neoplasms/drug therapy , Adverse Drug Reaction Reporting Systems
20.
J Med Internet Res ; 26: e52499, 2024 May 02.
Article En | MEDLINE | ID: mdl-38696245

This study explores the potential of using large language models to assist content analysis by conducting a case study to identify adverse events (AEs) in social media posts. The case study compares ChatGPT's performance with human annotators' in detecting AEs associated with delta-8-tetrahydrocannabinol, a cannabis-derived product. Using the identical instructions given to human annotators, ChatGPT closely approximated human results, with a high degree of agreement noted: 94.4% (9436/10,000) for any AE detection (Fleiss κ=0.95) and 99.3% (9931/10,000) for serious AEs (κ=0.96). These findings suggest that ChatGPT has the potential to replicate human annotation accurately and efficiently. The study recognizes possible limitations, including concerns about the generalizability due to ChatGPT's training data, and prompts further research with different models, data sources, and content analysis tasks. The study highlights the promise of large language models for enhancing the efficiency of biomedical research.
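The agreement percentages follow directly from the reported counts: 9436/10,000 rounds to 94.4% and 9931/10,000 rounds to 99.3%. The sketch below shows that arithmetic along with a chance-corrected agreement statistic. Note the swap: the study reports Fleiss κ, whereas this example uses the simpler two-rater Cohen's κ for illustration. The function names and toy ratings are hypothetical.

```python
def percent_agreement(a, b):
    """Share of items on which two raters give the same label."""
    if len(a) != len(b):
        raise ValueError("rating lists must have the same length")
    return sum(1 for x, y in zip(a, b) if x == y) / len(a)


def cohen_kappa(a, b):
    """Cohen's kappa for two raters: (p_o - p_e) / (1 - p_e)."""
    n = len(a)
    p_o = percent_agreement(a, b)
    # Expected agreement if each rater labeled independently at their own rates.
    p_e = sum((a.count(c) / n) * (b.count(c) / n) for c in set(a) | set(b))
    if p_e == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Percentages from the counts reported in the abstract.
print(round(9436 / 10000 * 100, 1), round(9931 / 10000 * 100, 1))

# Hypothetical ratings: 1 = adverse event present, 0 = absent.
human = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
model = [1, 1, 0, 0, 1, 0, 0, 0, 0, 1]
print(percent_agreement(human, model), cohen_kappa(human, model))
```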


Social Media , Humans , Social Media/statistics & numerical data , Dronabinol/adverse effects , Natural Language Processing
...