Results 1 - 19 of 19
1.
Stud Health Technol Inform ; 310: 619-623, 2024 Jan 25.
Article in English | MEDLINE | ID: mdl-38269883

ABSTRACT

According to the World Stroke Organization, 12.2 million people worldwide will have their first stroke this year, almost half of whom will die as a result. Natural Language Processing (NLP) may improve stroke phenotyping; however, existing rule-based classifiers are rigid, resulting in inadequate performance. We report findings from a pilot study using NLP to improve relation detection for stroke assertion detection to support research studies and healthcare operations.


Subject(s)
Natural Language Processing , Stroke , Humans , Pilot Projects , Stroke/diagnosis
2.
J Am Med Dir Assoc ; 25(1): 69-83, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37838000

ABSTRACT

OBJECTIVES: To determine the scope of the application of natural language processing to free-text clinical notes in post-acute care and provide a foundation for future natural language processing-based research in these settings. DESIGN: Scoping review; reported according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews guidelines. SETTING AND PARTICIPANTS: Post-acute care (ie, home health care, long-term care, skilled nursing facilities, and inpatient rehabilitation facilities). METHODS: PubMed, Cumulative Index of Nursing and Allied Health Literature, and Embase were searched in February 2023. Eligible studies had quantitative designs that used natural language processing applied to clinical documentation in post-acute care settings. The quality of each study was appraised. RESULTS: Twenty-one studies were included. Almost all studies were conducted in home health care settings. Most studies extracted data from electronic health records to examine the risk for negative outcomes, including acute care utilization, medication errors, and suicide mortality. About half of the studies did not report age, sex, race, or ethnicity data or use standardized terminologies. Only 8 studies included variables from socio-behavioral domains. Most studies fulfilled all quality appraisal indicators. CONCLUSIONS AND IMPLICATIONS: The application of natural language processing is nascent in post-acute care settings. Future research should apply natural language processing using standardized terminologies to leverage free-text clinical notes in post-acute care to promote timely, comprehensive, and equitable care. Natural language processing could be integrated with predictive models to help identify patients who are at risk of negative outcomes. Future research should incorporate socio-behavioral determinants and diverse samples to improve health equity in informatics tools.


Subject(s)
Natural Language Processing , Subacute Care , Humans , Documentation
3.
Matern Child Health J ; 28(3): 578-586, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38147277

ABSTRACT

INTRODUCTION: Stigma and bias related to race and other minoritized statuses may underlie disparities in pregnancy and birth outcomes. One emerging method to identify bias is the study of stigmatizing language in the electronic health record. The objective of our study was to develop automated natural language processing (NLP) methods to identify two types of stigmatizing language: marginalizing language and its complement, power/privilege language, accurately and automatically in labor and birth notes. METHODS: We analyzed notes for all birthing people > 20 weeks' gestation admitted for labor and birth at two hospitals during 2017. We then employed text preprocessing techniques, specifically using TF-IDF values as inputs, and tested machine learning classification algorithms to identify stigmatizing and power/privilege language in clinical notes. The algorithms assessed included Decision Trees, Random Forest, and Support Vector Machines. Additionally, we applied a feature importance evaluation method (InfoGain) to discern words that are highly correlated with these language categories. RESULTS: For marginalizing language, Decision Trees yielded the best classification with an F-score of 0.73. For power/privilege language, Support Vector Machines performed optimally, achieving an F-score of 0.91. These results demonstrate the effectiveness of the selected machine learning methods in classifying language categories in clinical notes. CONCLUSION: We identified well-performing machine learning methods to automatically detect stigmatizing language in clinical notes. To our knowledge, this is the first study to use NLP performance metrics to evaluate the performance of machine learning methods in discerning stigmatizing language. Future studies should delve deeper into refining and evaluating NLP methods, incorporating the latest algorithms rooted in deep learning.


Subject(s)
Algorithms , Natural Language Processing , Female , Humans , Electronic Health Records , Machine Learning , Language
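The abstract above feeds TF-IDF values into its classifiers. As a rough illustration of what such a weight is, here is a minimal sketch that assumes whitespace tokenization and the plain tf * log(N / df) form; the study's actual preprocessing and weighting variant are not stated, and the function name `tfidf` and the toy notes are invented for the example.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    # Assumption: lowercase + whitespace split stands in for real preprocessing.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Invented toy notes, not data from the study.
notes = ["patient refused exam", "patient tolerated exam well", "patient calm"]
vectors = tfidf(notes)
```

A term that occurs in every note (here "patient") receives weight zero, while rarer terms are weighted up, which is why TF-IDF inputs can help a classifier focus on distinctive wording.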
4.
Front Artif Intell ; 6: 1229609, 2023.
Article in English | MEDLINE | ID: mdl-37693012

ABSTRACT

Purpose: Between 30 and 68% of patients prematurely discontinue their antidepressant treatment, posing significant risks to patient safety and healthcare outcomes. Online healthcare forums have the potential to offer a rich and unique source of data, revealing dimensions of antidepressant discontinuation that may not be captured by conventional data sources. Methods: We analyzed 891 patient narratives from the online healthcare forum, "askapatient.com," utilizing content analysis to create PsyRisk, a corpus highlighting the risk factors associated with antidepressant discontinuation. Leveraging PsyRisk, alongside PsyTAR [a publicly available corpus of adverse drug reactions (ADRs) related to antidepressants], we developed a machine learning-driven algorithm for proactive identification of patients at risk of abrupt antidepressant discontinuation. Results: From the analyzed 891 patients, 232 reported antidepressant discontinuation. Among these patients, 92% experienced ADRs, and 72% found these reactions distressful, negatively affecting their daily activities. Approximately 26% of patients perceived the antidepressants as ineffective. Most reported ADRs were physiological (61%, 411/673), followed by cognitive (30%, 197/673), and psychological (28%, 188/673) ADRs. In our study, we employed a nested cross-validation strategy with an outer 5-fold cross-validation for model selection, and an inner 5-fold cross-validation for hyperparameter tuning. The performance of our risk identification algorithm, as assessed through this robust validation technique, yielded an AUC-ROC of 90.77 and an F1-score of 83.33. The most significant contributors to abrupt discontinuation were high perceived distress from ADRs and perceived ineffectiveness of the antidepressants. Conclusion: The risk factors identified and the risk identification algorithm developed in this study have substantial potential for clinical application. They could assist healthcare professionals in identifying and managing patients with depression who are at risk of prematurely discontinuing their antidepressant treatment.
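The nested cross-validation mentioned in the abstract above can be sketched as an index-splitting scheme. This is only an illustration under stated assumptions: the helper names `k_fold_indices` and `nested_cv_splits` are hypothetical, folds are formed without shuffling or stratification, and no model is trained. The point it shows is that the inner folds are drawn only from each outer training portion, so the outer held-out fold is never seen during tuning.

```python
def k_fold_indices(indices, k):
    """Split a list of indices into k (train, test) pairs with near-equal folds."""
    # Assumption: simple strided folds, no shuffling or stratification.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


def nested_cv_splits(n_samples, outer_k=5, inner_k=5):
    """Yield (outer_train, outer_test, inner_splits) for each outer fold."""
    all_indices = list(range(n_samples))
    for outer_train, outer_test in k_fold_indices(all_indices, outer_k):
        # Inner folds are built from the outer training indices only.
        inner_splits = k_fold_indices(outer_train, inner_k)
        yield outer_train, outer_test, inner_splits


# 25 is an arbitrary toy sample size, not a figure from the study.
splits = list(nested_cv_splits(25))
```

In practice, each inner split would be used to score candidate hyperparameters, and the chosen configuration would then be refit on the outer training indices and scored once on the outer test indices.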

5.
J Am Med Dir Assoc ; 24(12): 1874-1880.e4, 2023 12.
Article in English | MEDLINE | ID: mdl-37553081

ABSTRACT

OBJECTIVE: This study aimed to develop a natural language processing (NLP) system that identified social risk factors in home health care (HHC) clinical notes and to examine the association between social risk factors and hospitalization or an emergency department (ED) visit. DESIGN: Retrospective cohort study. SETTING AND PARTICIPANTS: We used standardized assessments and clinical notes from one HHC agency located in the northeastern United States. This included 86,866 episodes of care for 65,593 unique patients. Patients received HHC services between 2015 and 2017. METHODS: Guided by HHC experts, we created a vocabulary of social risk factors that influence hospitalization or ED visit risk in the HHC setting. We then developed an NLP system to automatically identify social risk factors documented in clinical notes. We used an adjusted logistic regression model to examine the association between the NLP-based social risk factors and hospitalization or an ED visit. RESULTS: On the basis of expert consensus, the following social risk factors emerged: Social Environment, Physical Environment, Education and Literacy, Food Insecurity, Access to Care, and Housing and Economic Circumstances. Our NLP system performed "very good" with an F score of 0.91. Approximately 4% of clinical notes (33% episodes of care) documented a social risk factor. The most frequently documented social risk factors were Physical Environment and Social Environment. Except for Housing and Economic Circumstances, all NLP-based social risk factors were associated with higher odds of hospitalization and ED visits. CONCLUSIONS AND IMPLICATIONS: HHC clinicians assess and document social risk factors associated with hospitalizations and ED visits in their clinical notes. Future studies can explore the social risk factors documented in HHC to improve communication across the health care system and to predict patients at risk for being hospitalized or visiting the ED.


Subject(s)
Home Care Services , Natural Language Processing , Humans , Retrospective Studies , Hospitalization , Risk Factors
6.
J Am Med Inform Assoc ; 30(10): 1622-1633, 2023 09 25.
Article in English | MEDLINE | ID: mdl-37433577

ABSTRACT

OBJECTIVES: Little is known about proactive risk assessment concerning emergency department (ED) visits and hospitalizations in patients with heart failure (HF) who receive home healthcare (HHC) services. This study developed a time series risk model for predicting ED visits and hospitalizations in patients with HF using longitudinal electronic health record data. We also explored which data sources yield the best-performing models over various time windows. MATERIALS AND METHODS: We used data collected from 9362 patients from a large HHC agency. We iteratively developed risk models using both structured (eg, standard assessment tools, vital signs, visit characteristics) and unstructured data (eg, clinical notes). Seven specific sets of variables included: (1) the Outcome and Assessment Information Set, (2) vital signs, (3) visit characteristics, (4) rule-based natural language processing-derived variables, (5) term frequency-inverse document frequency variables, (6) Bio-Clinical Bidirectional Encoder Representations from Transformers variables, and (7) topic modeling. Risk models were developed for 18 time windows (1-15, 30, 45, and 60 days) before an ED visit or hospitalization. Risk prediction performances were compared using recall, precision, accuracy, F1, and area under the receiver operating curve (AUC). RESULTS: The best-performing model was built using a combination of all 7 sets of variables and the time window of 4 days before an ED visit or hospitalization (AUC = 0.89 and F1 = 0.69). DISCUSSION AND CONCLUSION: This prediction model suggests that HHC clinicians can identify patients with HF at risk for visiting the ED or hospitalization within 4 days before the event, allowing for earlier targeted interventions.


Subject(s)
Heart Failure , Hospitalization , Humans , Time Factors , Heart Failure/therapy , Emergency Service, Hospital , Delivery of Health Care
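The abstract above compares models by recall, precision, and F1, among other metrics. For readers less familiar with these, a minimal sketch of how they are derived from binary labels follows; the function name `precision_recall_f1` and the toy labels are illustrative assumptions, with 1 standing for an event such as an ED visit or hospitalization.

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Invented toy labels, not data from the study.
precision, recall, f1 = precision_recall_f1([1, 1, 1, 0, 0], [1, 1, 0, 1, 0])
```

Because F1 ignores true negatives, it can sit well below AUC when events are rare, which is one plausible reading of a model reporting AUC = 0.89 alongside F1 = 0.69.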
7.
J Am Med Inform Assoc ; 30(11): 1801-1810, 2023 10 19.
Article in English | MEDLINE | ID: mdl-37339524

ABSTRACT

OBJECTIVE: This study aimed to identify temporal risk factor patterns documented in home health care (HHC) clinical notes and examine their association with hospitalizations or emergency department (ED) visits. MATERIALS AND METHODS: Data for 73 350 episodes of care from one large HHC organization were analyzed using dynamic time warping and hierarchical clustering analysis to identify the temporal patterns of risk factors documented in clinical notes. The Omaha System nursing terminology represented risk factors. First, clinical characteristics were compared between clusters. Next, multivariate logistic regression was used to examine the association between clusters and risk for hospitalizations or ED visits. Omaha System domains corresponding to risk factors were analyzed and described in each cluster. RESULTS: Six temporal clusters emerged, showing different patterns in how risk factors were documented over time. Patients with a steep increase in documented risk factors over time had a 3 times higher likelihood of hospitalization or ED visit than patients with no documented risk factors. Most risk factors belonged to the physiological domain, and only a few were in the environmental domain. DISCUSSION: An analysis of risk factor trajectories reflects a patient's evolving health status during a HHC episode. Using standardized nursing terminology, this study provided new insights into the complex temporal dynamics of HHC, which may lead to improved patient outcomes through better treatment and management plans. CONCLUSION: Incorporating temporal patterns in documented risk factors and their clusters into early warning systems may activate interventions to prevent hospitalizations or ED visits in HHC.


Subject(s)
Home Care Services , Hospitalization , Humans , Risk Factors , Emergency Service, Hospital , Health Status
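The abstract above uses dynamic time warping (DTW) to compare how risk factors are documented over time. A minimal sketch of the classic DTW distance is given below, assuming one-dimensional numeric series (for example, counts of documented risk factors per visit) and absolute difference as the local cost; the study's actual series construction and distance settings are not described, so `dtw_distance` is illustrative only.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b[j-1] is reused
                cost[i][j - 1],      # b advances, a[i-1] is reused
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]
```

Two trajectories with the same shape but different pacing, such as `[0, 1, 2]` and `[0, 1, 1, 2]`, get a distance of zero, which is what makes DTW suitable as the pairwise distance for a subsequent hierarchical clustering of episodes of unequal length.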
8.
JMIR Nurs ; 6: e42552, 2023 Apr 17.
Article in English | MEDLINE | ID: mdl-37067893

ABSTRACT

BACKGROUND: A clinician's biased behavior toward patients can affect the quality of care. Recent literature reviews report on widespread implicit biases among clinicians. Although emerging studies in hospital settings show racial biases in the language used in clinical documentation within electronic health records, no studies have yet investigated the extent of judgment language in home health care. OBJECTIVE: We aimed to examine racial differences in judgment language use and the relationship between judgment language use and the amount of time clinicians spent on home visits as a reflection of care quality in home health care. METHODS: This study is a retrospective observational cohort study. Study data were extracted from a large urban home health care organization in the Northeastern United States. Study data set included patients (N=45,384) who received home health care services between January 1 and December 31, 2019. The study applied a natural language processing algorithm to automatically detect the language of judgment in clinical notes. RESULTS: The use of judgment language was observed in 38% (n=17,141) of the patients. The highest use of judgment language was found in Hispanic (7,167/66,282, 10.8% of all clinical notes), followed by Black (7,010/65,628, 10.7%), White (10,206/107,626, 9.5%), and Asian (1,756/22,548, 7.8%) patients. Black and Hispanic patients were 14% more likely to have notes with judgment language than White patients. The length of a home health care visit was reduced by 21 minutes when judgment language was used. CONCLUSIONS: Racial differences were identified in judgment language use. When judgment language is used, clinicians spend less time at patients' homes. Because the language clinicians use in documentation is associated with the time spent providing care, further research is needed to study the impact of using judgment language on quality of home health care. 
Policy, education, and clinical practice improvements are needed to address the biases behind judgment language.

9.
J Adv Nurs ; 79(2): 593-604, 2023 Feb.
Article in English | MEDLINE | ID: mdl-36414419

ABSTRACT

AIMS: To identify clusters of risk factors in home health care and determine if the clusters are associated with hospitalizations or emergency department visits. DESIGN: A retrospective cohort study. METHODS: This study included 61,454 patients pertaining to 79,079 episodes receiving home health care between 2015 and 2017 from one of the largest home health care organizations in the United States. Potential risk factors were extracted from structured data and unstructured clinical notes analysed by natural language processing. A K-means cluster analysis was conducted. Kaplan-Meier analysis was conducted to identify the association between clusters and hospitalizations or emergency department visits during home health care. RESULTS: A total of 11.6% of home health episodes resulted in hospitalizations or emergency department visits. Risk factors formed three clusters. Cluster 1 is characterized by a combination of risk factors related to "impaired physical comfort with pain," defined as situations where patients may experience increased pain. Cluster 2 is characterized by "high comorbidity burden" defined as multiple comorbidities or other risks for hospitalization (e.g., prior falls). Cluster 3 is characterized by "impaired cognitive/psychological and skin integrity" including dementia or skin ulcer. Compared to Cluster 1, the risk of hospitalizations or emergency department visits increased by 1.95 times for Cluster 2 and by 2.12 times for Cluster 3 (all p < .001). CONCLUSION: Risk factors were clustered into three types describing distinct characteristics for hospitalizations or emergency department visits. Different combinations of risk factors affected the likelihood of these negative outcomes. IMPACT: Cluster-based risk prediction models could be integrated into early warning systems to identify patients at risk for hospitalizations or emergency department visits leading to more timely, patient-centred care, ultimately preventing these events. 
PATIENT OR PUBLIC CONTRIBUTION: There was no involvement of patients in developing the research question, determining the outcome measures, or implementing the study.


Subject(s)
Home Care Services , Hospitalization , Humans , United States , Retrospective Studies , Risk Factors , Emergency Service, Hospital
10.
AMIA Jt Summits Transl Sci Proc ; 2022: 168-177, 2022.
Article in English | MEDLINE | ID: mdl-35854756

ABSTRACT

One core measure of healthcare quality set forth by the Institute of Medicine is whether care decisions match patient goals. High-quality "serious illness communication" about patient goals and prognosis is required to support patient-centered decision-making; however, current methods are not sensitive enough to measure the quality of this communication or determine whether care delivered matches patient priorities. Natural language processing (NLP) offers an efficient method for identification and evaluation of documented serious illness communication, which could serve as the basis for future quality metrics in oncology and other forms of serious illness. In this study, we trained NLP algorithms to identify and characterize serious illness communication with oncology patients.

11.
J Med Internet Res ; 24(6): e36151, 2022 06 29.
Article in English | MEDLINE | ID: mdl-35767327

ABSTRACT

BACKGROUND: Free-text communication between patients and providers plays an increasing role in chronic disease management, through platforms varying from traditional health care portals to novel mobile messaging apps. These text data are rich resources for clinical purposes, but their sheer volume renders them difficult to manage. Even automated approaches, such as natural language processing, require labor-intensive manual classification for developing training data sets. Automated approaches to organizing free-text data are necessary to facilitate use of free-text communication for clinical care. OBJECTIVE: The aim of this study was to apply unsupervised learning approaches to (1) understand the types of topics discussed and (2) learn medication-related intents from messages sent between patients and providers through a bidirectional text messaging system for managing participant blood pressure (BP). METHODS: This study was a secondary analysis of deidentified messages from a remote, mobile, text-based employee hypertension management program at an academic institution. We trained a latent Dirichlet allocation (LDA) model for each message type (ie, inbound patient messages and outbound provider messages) and identified the distribution of major topics and significant topics (probability >.20) across message types. Next, we annotated all medication-related messages with a single medication intent. Then, we trained a second medication-specific LDA (medLDA) model to assess how well the unsupervised method could identify more fine-grained medication intents. We encoded each medication message with n-grams (n=1-3 words) using spaCy, clinical named entities using Stanza, and medication categories using MedEx; we then applied chi-square feature selection to learn the most informative features associated with each medication intent.
RESULTS: In total, 253 participants and 5 providers engaged in the program, generating 12,131 total messages: 46.90% (n=5689) patient messages and 53.10% (n=6442) provider messages. Most patient messages corresponded to BP reporting, BP encouragement, and appointment scheduling; most provider messages corresponded to BP reporting, medication adherence, and confirmatory statements. Most patient and provider messages contained 1 topic and few contained more than 3 topics identified using LDA. In total, 534 medication messages were annotated with a single medication intent. Of these, 282 (52.8%) were patient medication messages: most referred to the medication request intent (n=134, 47.5%). Most of the 252 (47.2%) provider medication messages referred to the medication question intent (n=173, 68.7%). Although the medLDA model could identify a majority intent within each topic, it could not distinguish medication intents with low prevalence within patient or provider messages. Richer feature engineering identified informative lexical-semantic patterns associated with each medication intent class. CONCLUSIONS: LDA can be an effective method for generating subgroups of messages with similar term usage and facilitating the review of topics to inform annotations. However, few training cases and shared vocabulary between intents preclude the use of LDA for fully automated, deep, medication intent classification. INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID): RR2-10.1101/2021.12.23.21268061.


Subject(s)
Hypertension , Text Messaging , Humans , Hypertension/drug therapy , Pilot Projects , Retrospective Studies , Unsupervised Machine Learning
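The abstract above encodes each medication message with word n-grams (n=1-3). The study used spaCy for this step; the sketch below deliberately swaps in plain whitespace tokenization so that it stays dependency-free, and the function name `word_ngrams` and the sample message are invented for illustration.

```python
def word_ngrams(text, n_min=1, n_max=3):
    """Return all word n-grams with n_min <= n <= n_max, shortest first."""
    # Assumption: lowercase + whitespace split replaces a real tokenizer.
    tokens = text.lower().split()
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


# Invented sample message, not taken from the study data.
features = word_ngrams("Need refill soon")
```

Each distinct n-gram can then be treated as a binary or count feature, which is the kind of input a chi-square feature selection step would rank against the annotated intent labels.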
12.
Front Public Health ; 10: 850619, 2022.
Article in English | MEDLINE | ID: mdl-35615042

ABSTRACT

Background: Opioid use disorder (OUD) is underdiagnosed in health system settings, limiting research on OUD using electronic health records (EHRs). Medical encounter notes can enrich structured EHR data with documented signs and symptoms of OUD and social risks and behaviors. To capture this information at scale, natural language processing (NLP) tools must be developed and evaluated. We developed and applied an annotation schema to deeply characterize OUD and related clinical, behavioral, and environmental factors, and automated the annotation schema using machine learning and deep learning-based approaches. Methods: Using the MIMIC-III Critical Care Database, we queried hospital discharge summaries of patients with International Classification of Diseases (ICD-9) OUD diagnostic codes. We developed an annotation schema to characterize problematic opioid use, identify individuals with potential OUD, and provide psychosocial context. Two annotators reviewed discharge summaries from 100 patients. We randomly sampled patients with their associated annotated sentences and divided them into training (66 patients; 2,127 annotated sentences) and testing (29 patients; 1,149 annotated sentences) sets. We used the training set to generate features, employing three NLP algorithms/knowledge sources. We trained and tested prediction models for classification with a traditional machine learner (logistic regression) and deep learning approach (Autogluon based on ELECTRA's replaced token detection model). We applied a five-fold cross-validation approach to reduce bias in performance estimates. Results: The resulting annotation schema contained 32 classes. We achieved moderate inter-annotator agreement, with F1-scores across all classes increasing from 48 to 66%. 
Five classes had a sufficient number of annotations for automation; of these, we observed consistently high performance (F1-scores) across training and testing sets for drug screening (training: 91-96; testing: 91-94) and opioid type (training: 86-96; testing: 86-99). Performance dropped from training to testing sets for other drug use (training: 52-65; testing: 40-48), pain management (training: 72-78; testing: 61-78) and psychiatric (training: 73-80; testing: 72). Autogluon achieved the highest performance. Conclusion: This pilot study demonstrated that rich information regarding problematic opioid use can be manually identified by annotators. However, more training samples and features would improve our ability to reliably identify less common classes from clinical text, including text from outpatient settings.


Subject(s)
Natural Language Processing , Opioid-Related Disorders , Analgesics, Opioid , Hospitals , Humans , Patient Discharge , Pilot Projects
13.
AMIA Annu Symp Proc ; 2022: 606-615, 2022.
Article in English | MEDLINE | ID: mdl-37128417

ABSTRACT

Our objective was to detect common barriers to post-acute care (B2PAC) among hospitalized older adults using natural language processing (NLP) of clinical notes from patients discharged home when a clinical decision support system recommended post-acute care. We annotated B2PAC sentences from discharge planning notes and developed an NLP classifier to identify the highest-value B2PAC class (negative patient preferences). Thirteen machine learning models were compared with Amazon's AutoGluon deep learning model. The study included 594 acute care notes from 100 patient encounters (1156 sentences contained 11 B2PAC) in a large academic health system. The most frequent and modifiable B2PAC class was negative patient preferences (18.3%). The best supervised model was Extreme Gradient Boosting (F1: 0.859), but the deep learning model performed better (F1: 0.916). Alerting clinicians of negative patient preferences early in the hospitalization can prompt interventions such as patient education to ensure patients receive the right level of care and avoid negative outcomes.


Subject(s)
Natural Language Processing , Patient Preference , Humans , Aged , Subacute Care , Machine Learning , Referral and Consultation , Electronic Health Records
14.
JMIR Med Inform ; 9(2): e21679, 2021 Feb 22.
Article in English | MEDLINE | ID: mdl-33544689

ABSTRACT

BACKGROUND: Scientists are developing new computational methods and prediction models to better clinically understand COVID-19 prevalence, treatment efficacy, and patient outcomes. These efforts could be improved by leveraging documented COVID-19-related symptoms, findings, and disorders from clinical text sources in an electronic health record. Word embeddings can identify terms related to these clinical concepts from both the biomedical and nonbiomedical domains, and are being shared with the open-source community at large. However, it's unclear how useful openly available word embeddings are for developing lexicons for COVID-19-related concepts. OBJECTIVE: Given an initial lexicon of COVID-19-related terms, this study aims to characterize the returned terms by similarity across various open-source word embeddings and determine common semantic and syntactic patterns between the COVID-19 queried terms and returned terms specific to the word embedding source. METHODS: We compared seven openly available word embedding sources. Using a series of COVID-19-related terms for associated symptoms, findings, and disorders, we conducted an interannotator agreement study to determine how accurately the most similar returned terms could be classified according to semantic types by three annotators. We conducted a qualitative study of COVID-19 queried terms and their returned terms to detect informative patterns for constructing lexicons. We demonstrated the utility of applying such learned synonyms to discharge summaries by reporting the proportion of patients identified by concept among three patient cohorts: pneumonia (n=6410), acute respiratory distress syndrome (n=8647), and COVID-19 (n=2397). RESULTS: We observed high pairwise interannotator agreement (Cohen kappa) for symptoms (0.86-0.99), findings (0.93-0.99), and disorders (0.93-0.99). 
Word embedding sources generated based on characters tend to return more synonyms (mean count of 7.2 synonyms) compared to token-based embedding sources (mean counts range from 2.0 to 3.4). Word embedding sources queried using a qualifier term (eg, dry cough or muscle pain) more often returned qualifiers of a similar semantic type (eg, "dry" returns consistency qualifiers like "wet" and "runny") compared to single-term queries (eg, cough or pain). A higher proportion of patients had documented fever (0.61-0.84), cough (0.41-0.55), shortness of breath (0.40-0.59), and hypoxia (0.51-0.56) retrieved than other clinical features. Terms for dry cough returned a higher proportion of patients with COVID-19 (0.07) than the pneumonia (0.05) and acute respiratory distress syndrome (0.03) populations. CONCLUSIONS: Word embeddings are valuable technology for learning related terms, including synonyms. When leveraging openly available word embedding sources, choices made for the construction of the word embeddings can significantly influence the words learned.
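The pairwise interannotator agreement reported in the abstract above is Cohen kappa, which corrects raw agreement for the agreement expected by chance. A minimal sketch of the standard two-annotator formula follows; the function name `cohen_kappa` and the toy labels are assumptions made for the example.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen kappa for two annotators labeling the same items."""
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement from each annotator's marginal label frequencies.
    expected = sum(
        counts_a[label] * counts_b[label]
        for label in set(counts_a) | set(counts_b)
    ) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Invented toy annotations, not data from the study.
annotator_1 = ["symptom", "symptom", "finding", "finding"]
annotator_2 = ["symptom", "symptom", "finding", "symptom"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

Here the annotators agree on 3 of 4 items (0.75), but half of that is expected by chance, so kappa comes out at 0.5; values in the 0.86-0.99 range therefore indicate agreement far above chance.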

15.
BMC Med Inform Decis Mak ; 20(Suppl 11): 338, 2020 12 30.
Article in English | MEDLINE | ID: mdl-33380319

ABSTRACT

BACKGROUND: Age and time information stored within the histories of clinical notes can provide valuable insights for assessing a patient's disease risk, understanding disease progression, and studying therapeutic outcomes. However, details of age and temporally-specified clinical events are not well captured, consistently codified, and readily available to research databases for study. METHODS: We expanded upon existing annotation schemes to capture additional age and temporal information, conducted an annotation study to validate our expanded schema, and developed a prototypical, rule-based Named Entity Recognizer to extract our novel clinical named entities (NE). The annotation study was conducted on 138 discharge summaries from the pre-annotated 2014 ShARe/CLEF eHealth Challenge corpus. In addition to existing NE classes (TIMEX3, SUBJECT_CLASS, DISEASE_DISORDER), our schema proposes 3 additional NEs (AGE, PROCEDURE, OTHER_EVENTS). We also propose new attributes, e.g., "degree_relation" which captures the degree of biological relation for subjects annotated under SUBJECT_CLASS. As a proof of concept, we applied the schema to 49 H&P notes to encode pertinent history information for a lung cancer cohort study. RESULTS: An abundance of information was captured under the new OTHER_EVENTS, PROCEDURE and AGE classes, with 23%, 10% and 8% of all annotated NEs belonging to the above classes, respectively. We observed high inter-annotator agreement of >80% for AGE and TIMEX3; the automated NLP system achieved F1 scores of 86% (AGE) and 86% (TIMEX3). Age and temporally-specified mentions within past medical, family, surgical, and social histories were common in our lung cancer data set; annotation is ongoing to support this translational research study. CONCLUSIONS: Our annotation schema and NLP system can encode historical events from clinical notes to support clinical and translational research studies.


Subject(s)
Natural Language Processing , Aged, 80 and over , Cohort Studies , Humans
16.
J Med Internet Res ; 22(12): e22493, 2020 12 03.
Article in English | MEDLINE | ID: mdl-33270032

ABSTRACT

BACKGROUND: Automated texting platforms have emerged as a tool to facilitate communication between patients and health care providers with variable effects on achieving target blood pressure (BP). Understanding differences in the way patients interact with these communication platforms can inform their use and design for hypertension management. OBJECTIVE: Our primary aim was to explore the unique phenotypes of patient interactions with an automated text messaging platform for BP monitoring. Our secondary aim was to estimate associations between interaction phenotypes and BP control. METHODS: This study was a secondary analysis of data from a randomized controlled trial for adults with poorly controlled hypertension. A total of 201 patients with established primary care were assigned to the automated texting platform; messages exchanged throughout the 4-month program were analyzed. We used the k-means clustering algorithm to characterize two different interaction phenotypes: program conformity and engagement style. First, we identified unique clusters signifying differences in program conformity based on the frequency over time of error alerts, which were generated to patients when they deviated from the requested text message format (eg, ###/## for BP). Second, we explored overall engagement styles, defined by error alerts and responsiveness to text prompts, unprompted messages, and word count averages. Finally, we applied the chi-square test to identify associations between each interaction phenotype and achieving the target BP. RESULTS: We observed 3 categories of program conformity based on their frequency of error alerts: those who immediately and consistently submitted texts without system errors (perfect users, 51/201), those who did so after an initial learning period (adaptive users, 66/201), and those who consistently submitted messages generating errors to the platform (nonadaptive users, 38/201). Next, we observed 3 categories of engagement style: the enthusiast, who tended to submit unprompted messages with high word counts (17/155); the student, who inconsistently engaged (35/155); and the minimalist, who engaged only when prompted (103/155). Of all 6 phenotypes, we observed a statistically significant association between patients demonstrating the minimalist communication style (high adherence, few unprompted messages, limited information sharing) and achieving target BP (P<.001). CONCLUSIONS: We identified unique interaction phenotypes among patients engaging with an automated text message platform for remote BP monitoring. Only the minimalist communication style was associated with achieving target BP. Identifying and understanding interaction phenotypes may be useful for tailoring future automated texting interactions and designing future interventions to achieve better BP control.
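The final analysis step described above, testing whether a phenotype is associated with reaching target BP, can be sketched as a Pearson chi-square test on a 2x2 table. This is only a minimal sketch: the function names are hypothetical, no continuity correction is applied, and the counts are invented for illustration because the abstract does not report the underlying contingency tables.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square statistic (no continuity correction) for a 2x2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def p_value_1df(stat):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


# Invented counts, NOT the study's data.
# Rows: phenotype present / absent. Columns: reached target BP / did not.
example_table = [[30, 10], [10, 30]]
stat = chi_square_2x2(example_table)
p = p_value_1df(stat)
```

With one degree of freedom the p-value has a closed form through the complementary error function, so no statistics library is needed for the 2x2 case.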


Subject(s)
Blood Pressure/physiology , Hypertension/therapy , Monitoring, Physiologic/methods , Text Messaging/standards , Adolescent , Adult , Aged , Female , Humans , Male , Middle Aged , Phenotype , Young Adult
17.
AMIA Jt Summits Transl Sci Proc ; 2020: 136-141, 2020.
Article in English | MEDLINE | ID: mdl-32477632

ABSTRACT

With the increasing use of social media data for health-related research, the credibility of the information from this source has been questioned, as the posts may not originate from personal accounts. While automatic bot detection approaches have been proposed, none have been evaluated on users posting health-related information. In this paper, we extend an existing bot detection system and customize it for health-related research. Using a dataset of Twitter users, we first show that the system, which was designed for political bot detection, underperforms when applied to health-related Twitter users. We then incorporate additional features and a statistical machine learning classifier to improve bot detection performance significantly. Our approach obtains an F1-score of 0.7 for the "bot" class, representing an improvement of 0.339. Our approach is customizable and generalizable for bot detection in other health-related social media cohorts.

18.
PLoS One ; 15(4): e0230947, 2020.
Article in English | MEDLINE | ID: mdl-32287266

ABSTRACT

BACKGROUND: Although studies report that more than 90% of pregnant women utilize digital sources to supplement their maternal healthcare, little is known about the kinds of information that women seek from their peers during pregnancy. To date, most research has used self-report measures to elucidate how and why women turn to digital sources during pregnancy. However, given that these measures may differ from actual utilization of online health information, it is important to analyze the online content pregnant women generate. OBJECTIVE: To apply machine learning methods to analyze online pregnancy forums, to better understand how women seek information from a community of online peers during pregnancy. METHODS: Data from seven WhatToExpect.com "birth club" forums (September 2018; January-June 2018) were scraped. Forum posts were collected for a one-year period, which included three trimesters and three months postpartum. Only initial posts from each thread were analyzed (n = 262,238). Automatic natural language processing (NLP) methods captured 50 discussed topics, which were annotated by two independent coders and grouped categorically. RESULTS: The largest topic categories were maternal health (45%), baby-related topics (29%), and people/relationships (10%). While pain was a popular topic throughout pregnancy, individual topics that were dominant by trimester included miscarriage (first trimester), labor (third trimester), and baby sleeping routine (postpartum period). CONCLUSION: More than just emotional or peer support, pregnant women turn to online forums to discuss their health. Dominant topics, such as labor and miscarriage, suggest unmet informational needs in these domains. With misinformation becoming a growing public health concern, more attention must be directed toward peer-exchange outlets.


Subject(s)
Health Information Exchange/statistics & numerical data , Internet/statistics & numerical data , Abortion, Spontaneous , Emotions/physiology , Female , Humans , Machine Learning/statistics & numerical data , Maternal Health/statistics & numerical data , Parturition , Peer Group , Pregnancy , Pregnant Women , Social Support
19.
AMIA Annu Symp Proc ; 2020: 1268-1276, 2020.
Article in English | MEDLINE | ID: mdl-33936503

ABSTRACT

In the electronic health record, the majority of clinically relevant information is stored within clinical notes. Most clinical notes follow a set organizational structure composed of canonicalized section headers that facilitate clinical review and information gathering. Standardized section header terminologies such as the SecTag terminology permit the identification and standardization of headers to a canonicalized form. Although the SecTag terminology has been evaluated extensively for history & physical notes, the coverage of canonical section header terms has not been assessed across other note types. For this pilot study, we conducted a coverage study and characterization of canonical section headers across 5 common, clinical note types and a generalizability study of canonical section headers detected within two types of clinical notes from Penn Medicine.


Subject(s)
Documentation/methods , Electronic Health Records , Natural Language Processing , Terminology as Topic , Artificial Intelligence , Electronic Health Records/standards , Humans , Medical Records Systems, Computerized , Pilot Projects , Vocabulary, Controlled
...