1.
Antimicrob Agents Chemother ; 65(7): e0006321, 2021 06 17.
Article in English | MEDLINE | ID: mdl-33972243

ABSTRACT

Infection caused by carbapenem-resistant (CR) organisms is a rising problem in the United States. While the risk factors for antibiotic resistance are well known, there remains a large need for the early identification of antibiotic-resistant infections. Using machine learning (ML), we sought to develop a prediction model for carbapenem resistance. All patients >18 years of age admitted to a tertiary-care academic medical center between 1 January 2012 and 10 October 2017 with ≥1 bacterial culture were eligible for inclusion. All demographic, medication, vital sign, procedure, laboratory, and culture/sensitivity data were extracted from the electronic health record. Organisms were considered CR if a single isolate was reported as intermediate or resistant. Patients with CR and non-CR organisms were temporally matched to maintain the positive/negative case ratio. Extreme gradient boosting was used for model development. In total, 68,472 patients met inclusion criteria, with 1,088 patients identified as having CR organisms. Sixty-seven features were used for predictive modeling. The most important features were number of prior antibiotic days, recent central venous catheter placement, and inpatient surgery. After model training, the area under the receiver operating characteristic curve was 0.846. The sensitivity of the model was 30%, with a positive predictive value (PPV) of 30% and a negative predictive value of 99%. Using readily available clinical data, we were able to create an ML model capable of predicting CR infections at the time of culture collection with a high PPV.
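
The reported sensitivity, PPV, and negative predictive value can be checked against each other with a small worked example. The sketch below assumes, purely for illustration, that the reported metrics apply to the full cohort of 68,472 patients with 1,088 CR cases; the abstract does not say which data split they were computed on, and the function name `confusion_from_summary` is hypothetical.

```python
def confusion_from_summary(n_total, n_positive, sensitivity, ppv):
    """Reconstruct expected confusion-matrix cells from summary metrics.

    Assumption (not from the abstract): the metrics describe the whole cohort.
    Cell values are expected counts, so they need not be integers.
    """
    tp = sensitivity * n_positive      # true positives
    fn = n_positive - tp               # missed CR cases
    fp = tp * (1 - ppv) / ppv          # from PPV = TP / (TP + FP)
    tn = n_total - n_positive - fp     # remaining non-CR patients
    return tp, fp, fn, tn


tp, fp, fn, tn = confusion_from_summary(68_472, 1_088, 0.30, 0.30)
npv = tn / (tn + fn)
specificity = tn / (tn + fp)
print(f"NPV = {npv:.3f}, specificity = {specificity:.3f}")
```

Under that assumption the implied negative predictive value is about 0.989, which agrees with the 99% stated in the abstract.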


Subject(s)
Carbapenems , Machine Learning , Carbapenems/pharmacology , Humans , Predictive Value of Tests , Retrospective Studies , Risk Assessment
2.
Crit Care Med ; 49(4): e433-e443, 2021 04 01.
Article in English | MEDLINE | ID: mdl-33591014

ABSTRACT

OBJECTIVES: Assess the impact of heterogeneity among established sepsis criteria (Sepsis-1, Sepsis-3, Centers for Disease Control and Prevention Adult Sepsis Event, and Centers for Medicare and Medicaid severe sepsis core measure 1) through the comparison of corresponding sepsis cohorts. DESIGN: Retrospective analysis of data extracted from electronic health record. SETTING: Single, tertiary-care center in St. Louis, MO. PATIENTS: Adult, nonsurgical inpatients admitted between January 1, 2012, and January 6, 2018. INTERVENTIONS: None. MEASUREMENTS AND MAIN RESULTS: In the electronic health record data, 286,759 encounters met inclusion criteria across the study period. Application of established sepsis criteria yielded cohorts varying in prevalence: Centers for Disease Control and Prevention Adult Sepsis Event (4.4%), Centers for Medicare and Medicaid severe sepsis core measure 1 (4.8%), International Classification of Disease code (7.2%), Sepsis-3 (7.5%), and Sepsis-1 (11.3%). Between the two modern established criteria, Sepsis-3 (n = 21,550) and Centers for Disease Control and Prevention Adult Sepsis Event (n = 12,494), the size of the overlap was 7,763. The sepsis cohorts also varied in time from admission to sepsis onset (hr): Sepsis-1 (2.9), Sepsis-3 (4.1), Centers for Disease Control and Prevention Adult Sepsis Event (4.6), and Centers for Medicare and Medicaid severe sepsis core measure 1 (7.6); sepsis discharge International Classification of Disease code rate: Sepsis-1 (37.4%), Sepsis-3 (40.1%), Centers for Medicare and Medicaid severe sepsis core measure 1 (48.5%), and Centers for Disease Control and Prevention Adult Sepsis Event (54.5%); and inhospital mortality rate: Sepsis-1 (13.6%), Sepsis-3 (18.8%), International Classification of Disease code (20.4%), Centers for Medicare and Medicaid severe sepsis core measure 1 (22.5%), and Centers for Disease Control and Prevention Adult Sepsis Event (24.1%). 
CONCLUSIONS: The application of commonly used sepsis definitions on a single population produced sepsis cohorts with low agreement, significantly different baseline demographics, and clinical outcomes.
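
The "low agreement" between cohorts can be made concrete from the counts given for Sepsis-3 (n = 21,550), the CDC Adult Sepsis Event (n = 12,494), and their overlap (7,763). The sketch below computes the Jaccard index and the share of each cohort that falls inside the other; these particular agreement measures are chosen here for illustration and are not necessarily the ones the authors used.

```python
def overlap_stats(size_a, size_b, size_both):
    """Agreement measures for two cohorts given their sizes and overlap."""
    union = size_a + size_b - size_both
    return {
        "jaccard": size_both / union,      # |A and B| / |A or B|
        "share_of_a": size_both / size_a,  # fraction of A also in B
        "share_of_b": size_both / size_b,  # fraction of B also in A
    }


# A = Sepsis-3 cohort, B = CDC Adult Sepsis Event cohort (counts from the abstract)
stats = overlap_stats(21_550, 12_494, 7_763)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

With these counts the Jaccard index is roughly 0.30: about 36% of Sepsis-3 encounters also meet the Adult Sepsis Event criteria, and about 62% of Adult Sepsis Event encounters also meet Sepsis-3.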


Subject(s)
Databases, Factual/statistics & numerical data , Sepsis/classification , Sepsis/diagnosis , Severity of Illness Index , Humans , International Classification of Diseases , Outcome Assessment, Health Care , Retrospective Studies , Sepsis/epidemiology , Shock, Septic/classification , Shock, Septic/diagnosis , United States
3.
MMWR Morb Mortal Wkly Rep ; 70(12): 449-455, 2021 Mar 26.
Article in English | MEDLINE | ID: mdl-33764961

ABSTRACT

Many kindergarten through grade 12 (K-12) schools offering in-person learning have adopted strategies to limit the spread of SARS-CoV-2, the virus that causes COVID-19 (1). These measures include mandating use of face masks, physical distancing in classrooms, increasing ventilation with outdoor air, identification of close contacts,* and following CDC isolation and quarantine guidance† (2). A 2-week pilot investigation was conducted to investigate occurrences of SARS-CoV-2 secondary transmission in K-12 schools in the city of Springfield, Missouri, and in St. Louis County, Missouri, during December 7-18, 2020. Schools in both locations implemented COVID-19 mitigation strategies; however, Springfield implemented a modified quarantine policy permitting student close contacts aged ≤18 years who had school-associated contact with a person with COVID-19 and met masking requirements during their exposure to continue in-person learning.§ Participating students, teachers, and staff members with COVID-19 (37) from 22 schools and their school-based close contacts (contacts) (156) were interviewed, and contacts were offered SARS-CoV-2 testing. Among 102 school-based contacts who received testing, two (2%) had positive test results indicating probable school-based SARS-CoV-2 secondary transmission. Both contacts were in Springfield and did not meet criteria to participate in the modified quarantine. In Springfield, 42 student contacts were permitted to continue in-person learning under the modified quarantine; among the 30 who were interviewed, 21 were tested, and none received a positive test result. Despite high community transmission, SARS-CoV-2 transmission in schools implementing COVID-19 mitigation strategies was lower than that in the community. 
Until additional data are available, K-12 schools should continue implementing CDC-recommended mitigation measures (2) and follow CDC isolation and quarantine guidance to minimize secondary transmission in schools offering in-person learning.


Subject(s)
COVID-19/prevention & control , COVID-19/transmission , Schools/organization & administration , Schools/statistics & numerical data , Adolescent , Adult , COVID-19/epidemiology , COVID-19 Nucleic Acid Testing , Child , Child, Preschool , Contact Tracing , Female , Humans , Male , Masks/statistics & numerical data , Middle Aged , Missouri/epidemiology , Physical Distancing , Pilot Projects , Quarantine , SARS-CoV-2/isolation & purification , Ventilation/statistics & numerical data
4.
J Biomed Inform ; 60: 95-103, 2016 Apr.
Article in English | MEDLINE | ID: mdl-26828957

ABSTRACT

BACKGROUND: Community-level factors have been clearly linked to health outcomes, but are challenging to incorporate into medical practice. Increasing use of electronic health records (EHRs) makes patient-level data available for researchers in a systematic and accessible way, but these data remain siloed from community-level data relevant to health. PURPOSE: This study sought to link community and EHR data from an older female patient cohort participating in an ongoing intervention at the Ohio State University Wexner Medical Center to associate community-level data with patient-level cardiovascular health (CVH) as well as to assess the utility of this EHR integration methodology. MATERIALS AND METHODS: CVH was characterized among patients using available EHR data collected May through July of 2013. EHR data for 153 patients were linked to United States census-tract level data to explore feasibility and insights gained from combining these disparate data sources. Analyses were conducted in 2014. RESULTS: Using the linked data, weekly per capita expenditure on fruits and vegetables was found to be significantly associated with CVH at the p<0.05 level and three other community-level attributes (median income, average household size, and unemployment rate) were associated with CVH at the p<0.10 level. CONCLUSIONS: This work paves the way for future integration of community and EHR-based data into patient care as a novel methodology to gain insight into multi-level factors that affect CVH and other health outcomes. Further, our findings demonstrate the specific architectural and functional challenges associated with integrating decision support technologies and geographic information to support tailored and patient-centered decision making therein.


Subject(s)
Cardiovascular System , Delivery of Health Care , Electronic Health Records , Health Status , Information Storage and Retrieval , Aged , Cohort Studies , Female , Geographic Information Systems , Humans , Ohio , Residence Characteristics , Socioeconomic Factors
5.
J Biomed Inform ; 58 Suppl: S103-S110, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26375493

ABSTRACT

The second track of the 2014 i2b2 challenge asked participants to automatically identify risk factors for heart disease among diabetic patients using natural language processing techniques for clinical notes. This paper describes a rule-based system developed using a combination of regular expressions, concepts from the Unified Medical Language System (UMLS), and freely-available resources from the community. With a performance (F1=90.7) that is significantly higher than the median (F1=87.20) and close to the top performing system (F1=92.8), it was the best rule-based system of all the submissions in the challenge. We also used this system to evaluate the utility of different terminologies in the UMLS towards the challenge task. Of the 155 terminologies in the UMLS, 129 (76.78%) have no representation in the corpus. The Consumer Health Vocabulary had very good coverage of relevant concepts and was the most useful terminology for the challenge task. While segmenting notes into sections and lists has a significant impact on the performance, identifying negations and experiencer of the medical event results in negligible gain.


Subject(s)
Data Mining/methods , Diabetes Complications/epidemiology , Electronic Health Records/organization & administration , Narration , Natural Language Processing , Unified Medical Language System/organization & administration , Aged , Cohort Studies , Comorbidity , Computer Security , Confidentiality , Coronary Artery Disease/diagnosis , Coronary Artery Disease/epidemiology , Diabetes Complications/diagnosis , Female , Humans , Incidence , Longitudinal Studies , Male , Middle Aged , Ohio/epidemiology , Pattern Recognition, Automated/methods , Risk Assessment/methods , Terminology as Topic , Vocabulary, Controlled
6.
J Biomed Inform ; 58 Suppl: S211-S218, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26376462

ABSTRACT

Clinical trials are essential for determining whether new interventions are effective. In order to determine the eligibility of patients to enroll into these trials, clinical trial coordinators often perform a manual review of clinical notes in the electronic health record of patients. This is a very time-consuming and exhausting task. Efforts in this process can be expedited if these coordinators are directed toward specific parts of the text that are relevant for eligibility determination. In this study, we describe the creation of a dataset that can be used to evaluate automated methods capable of identifying sentences in a note that are relevant for screening a patient's eligibility in clinical trials. Using this dataset, we also present results for four simple methods in natural language processing that can be used to automate this task. We found that this is a challenging task (maximum F-score=26.25), but it is a promising direction for further research.


Subject(s)
Clinical Trials as Topic/methods , Data Mining/methods , Electronic Health Records/organization & administration , Eligibility Determination/methods , Natural Language Processing , Patient Selection , Humans , Pattern Recognition, Automated/methods , Vocabulary, Controlled
7.
bioRxiv ; 2024 Apr 06.
Article in English | MEDLINE | ID: mdl-37808763

ABSTRACT

Objective: Accurately identifying clinical phenotypes from Electronic Health Records (EHRs) provides additional insights into patients' health, especially when such information is unavailable in structured data. This study evaluates the application of OpenAI's Generative Pre-trained Transformer (GPT)-4 model to identify clinical phenotypes from EHR text in non-small cell lung cancer (NSCLC) patients. The goal was to identify disease stages, treatments and progression utilizing GPT-4, and compare its performance against GPT-3.5-turbo, Flan-T5-xl, Flan-T5-xxl, and two rule-based and machine learning-based methods, namely, scispaCy and medspaCy. Materials and Methods: Phenotypes such as initial cancer stage, initial treatment, evidence of cancer recurrence, and affected organs during recurrence were identified from 13,646 records for 63 NSCLC patients from Washington University in St. Louis, Missouri. The performance of the GPT-4 model is evaluated against GPT-3.5-turbo, Flan-T5-xxl, Flan-T5-xl, medspaCy and scispaCy by comparing precision, recall, and micro-F1 scores. Results: GPT-4 achieved higher F1 score, precision, and recall compared to Flan-T5-xl, Flan-T5-xxl, medspaCy and scispaCy's models. GPT-3.5-turbo performed similarly to GPT-4. GPT and Flan-T5 models were not constrained by explicit rule requirements for contextual pattern recognition. SpaCy models relied on predefined patterns, leading to their suboptimal performance. Discussion and Conclusion: GPT-4 improves clinical phenotype identification due to its robust pre-training and remarkable pattern recognition capability on the embedded tokens. It demonstrates data-driven effectiveness even with limited context in the input. While rule-based models remain useful for some tasks, GPT models offer improved contextual understanding of the text, and robust clinical phenotype extraction.

8.
JAMIA Open ; 7(3): ooae060, 2024 Oct.
Article in English | MEDLINE | ID: mdl-38962662

ABSTRACT

Objective: Accurately identifying clinical phenotypes from Electronic Health Records (EHRs) provides additional insights into patients' health, especially when such information is unavailable in structured data. This study evaluates the application of OpenAI's Generative Pre-trained Transformer (GPT)-4 model to identify clinical phenotypes from EHR text in non-small cell lung cancer (NSCLC) patients. The goal was to identify disease stages, treatments and progression utilizing GPT-4, and compare its performance against GPT-3.5-turbo, Flan-T5-xl, Flan-T5-xxl, Llama-3-8B, and 2 rule-based and machine learning-based methods, namely, scispaCy and medspaCy. Materials and Methods: Phenotypes such as initial cancer stage, initial treatment, evidence of cancer recurrence, and affected organs during recurrence were identified from 13 646 clinical notes for 63 NSCLC patients from Washington University in St. Louis, Missouri. The performance of the GPT-4 model is evaluated against GPT-3.5-turbo, Flan-T5-xxl, Flan-T5-xl, Llama-3-8B, medspaCy, and scispaCy by comparing precision, recall, and micro-F1 scores. Results: GPT-4 achieved higher F1 score, precision, and recall compared to Flan-T5-xl, Flan-T5-xxl, Llama-3-8B, medspaCy, and scispaCy's models. GPT-3.5-turbo performed similarly to GPT-4. GPT, Flan-T5, and Llama models were not constrained by explicit rule requirements for contextual pattern recognition. spaCy models relied on predefined patterns, leading to their suboptimal performance. Discussion and Conclusion: GPT-4 improves clinical phenotype identification due to its robust pre-training and remarkable pattern recognition capability on the embedded tokens. It demonstrates data-driven effectiveness even with limited context in the input. While rule-based models remain useful for some tasks, GPT models offer improved contextual understanding of the text, and robust clinical phenotype extraction.
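
The systems in this study are compared on precision, recall, and micro-F1. As a reminder of how micro-averaging works, the sketch below pools true positives, false positives, and false negatives across phenotypes before computing the scores. The phenotype names and counts are invented for illustration and are not results from the study.

```python
def micro_prf(counts):
    """Micro-averaged precision, recall and F1 from per-class tp/fp/fn counts."""
    tp = sum(c["tp"] for c in counts.values())
    fp = sum(c["fp"] for c in counts.values())
    fn = sum(c["fn"] for c in counts.values())
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Hypothetical counts, not taken from the paper.
example_counts = {
    "initial_stage": {"tp": 40, "fp": 5, "fn": 10},
    "initial_treatment": {"tp": 30, "fp": 10, "fn": 5},
    "recurrence": {"tp": 10, "fp": 5, "fn": 5},
}
precision, recall, f1 = micro_prf(example_counts)
print(f"precision={precision:.2f} recall={recall:.2f} micro-F1={f1:.2f}")
```

Because counts are pooled before the ratios are taken, phenotypes with more instances carry more weight in the micro-averaged score than rare ones.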

9.
JAMA Netw Open ; 7(6): e2417977, 2024 Jun 03.
Article in English | MEDLINE | ID: mdl-38904961

ABSTRACT

Importance: It is unclear whether cannabis use is associated with adverse health outcomes in patients with COVID-19 when accounting for known risk factors, including tobacco use. Objective: To examine whether cannabis and tobacco use are associated with adverse health outcomes from COVID-19 in the context of other known risk factors. Design, Setting, and Participants: This retrospective cohort study used electronic health record data from February 1, 2020, to January 31, 2022. This study included patients who were identified as having COVID-19 during at least 1 medical visit at a large academic medical center in the Midwest US. Exposures: Current cannabis use and tobacco smoking, as documented in the medical encounter. Main Outcomes and Measures: Health outcomes of hospitalization, intensive care unit (ICU) admission, and all-cause mortality following COVID-19 infection. The association between substance use (cannabis and tobacco) and these COVID-19 outcomes was assessed using multivariable modeling. Results: A total of 72 501 patients with COVID-19 were included (mean [SD] age, 48.9 [19.3] years; 43 315 [59.7%] female; 9710 [13.4%] had current smoking; 17 654 [24.4%] had former smoking; and 7060 [9.7%] had current use of cannabis). Current tobacco smoking was significantly associated with increased risk of hospitalization (odds ratio [OR], 1.72; 95% CI, 1.62-1.82; P < .001), ICU admission (OR, 1.22; 95% CI, 1.10-1.34; P < .001), and all-cause mortality (OR, 1.37, 95% CI, 1.20-1.57; P < .001) after adjusting for other factors. Cannabis use was significantly associated with increased risk of hospitalization (OR, 1.80; 95% CI, 1.68-1.93; P < .001) and ICU admission (OR, 1.27; 95% CI, 1.14-1.41; P < .001) but not with all-cause mortality (OR, 0.97; 95% CI, 0.82-1.14, P = .69) after adjusting for tobacco smoking, vaccination, comorbidity, diagnosis date, and demographic factors. 
Conclusions and Relevance: The findings of this cohort study suggest that cannabis use may be an independent risk factor for COVID-19-related complications, even after considering cigarette smoking, vaccination status, comorbidities, and other risk factors.


Subject(s)
COVID-19 , Hospitalization , Intensive Care Units , SARS-CoV-2 , Humans , COVID-19/mortality , COVID-19/epidemiology , Female , Male , Middle Aged , Retrospective Studies , Hospitalization/statistics & numerical data , Adult , Risk Factors , Intensive Care Units/statistics & numerical data , Aged , Tobacco Use/adverse effects , Tobacco Use/epidemiology , Tobacco Smoking/adverse effects , Tobacco Smoking/epidemiology , Marijuana Smoking/epidemiology , Marijuana Smoking/adverse effects
10.
JAMIA Open ; 6(1): ooad014, 2023 Apr.
Article in English | MEDLINE | ID: mdl-36844369

ABSTRACT

Objectives: There is much interest in utilizing clinical data for developing prediction models for Alzheimer's disease (AD) risk, progression, and outcomes. Existing studies have mostly utilized curated research registries, image analysis, and structured electronic health record (EHR) data. However, much critical information resides in relatively inaccessible unstructured clinical notes within the EHR. Materials and Methods: We developed a natural language processing (NLP)-based pipeline to extract AD-related clinical phenotypes, documenting strategies for success and assessing the utility of mining unstructured clinical notes. We evaluated the pipeline against gold-standard manual annotations performed by 2 clinical dementia experts for AD-related clinical phenotypes including medical comorbidities, biomarkers, neurobehavioral test scores, behavioral indicators of cognitive decline, family history, and neuroimaging findings. Results: Documentation rates for each phenotype varied in the structured versus unstructured EHR. Interannotator agreement was high (Cohen's kappa = 0.72-1) and positively correlated with the NLP-based phenotype extraction pipeline's performance (average F1-score = 0.65-0.99) for each phenotype. Discussion: We developed an automated NLP-based pipeline to extract informative phenotypes that may improve the performance of eventual machine learning predictive models for AD. In the process, we examined documentation practices for each phenotype relevant to the care of AD patients and identified factors for success. Conclusion: Success of our NLP-based phenotype extraction pipeline depended on domain-specific knowledge and focus on a specific clinical domain instead of maximizing generalizability.
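
Interannotator agreement in this abstract is reported as Cohen's kappa, which corrects raw agreement for the agreement expected by chance. A minimal sketch of the statistic for two annotators follows; the label vectors are made up for illustration and do not come from the study.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the annotators' marginal label rates.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical annotations: 1 = phenotype present, 0 = absent.
annotator_1 = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
annotator_2 = [1, 1, 0, 0, 0, 0, 1, 0, 1, 1]
kappa = cohens_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.2f}")
```

In this toy example the annotators agree on 8 of 10 items, chance agreement is 0.5, and kappa is therefore 0.6.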

11.
J Am Med Inform Assoc ; 30(10): 1730-1740, 2023 09 25.
Article in English | MEDLINE | ID: mdl-37390812

ABSTRACT

OBJECTIVE: We extended a 2013 literature review on electronic health record (EHR) data quality assessment approaches and tools to determine recent improvements or changes in EHR data quality assessment methodologies. MATERIALS AND METHODS: We completed a systematic review of PubMed articles from 2013 to April 2023 that discussed the quality assessment of EHR data. We screened and reviewed papers for the dimensions and methods defined in the original 2013 manuscript. We categorized papers as data quality outcomes of interest, tools, or opinion pieces. We abstracted and defined additional themes and methods through an iterative review process. RESULTS: We included 103 papers in the review, of which 73 were data quality outcomes of interest papers, 22 were tools, and 8 were opinion pieces. The most common dimension of data quality assessed was completeness, followed by correctness, concordance, plausibility, and currency. We abstracted conformance and bias as 2 additional dimensions of data quality and structural agreement as an additional methodology. DISCUSSION: There has been an increase in EHR data quality assessment publications since the original 2013 review. Consistent dimensions of EHR data quality continue to be assessed across applications. Despite consistent patterns of assessment, there still does not exist a standard approach for assessing EHR data quality. CONCLUSION: Guidelines are needed for EHR data quality assessment to improve the efficiency, transparency, comparability, and interoperability of data quality assessment. These guidelines must be both scalable and flexible. Automation could be helpful in generalizing this process.


Subject(s)
Data Accuracy , Electronic Health Records
12.
Neurology ; 101(14): e1424-e1433, 2023 10 03.
Article in English | MEDLINE | ID: mdl-37532510

ABSTRACT

BACKGROUND AND OBJECTIVES: The capacity of specialty memory clinics in the United States is very limited. If lower socioeconomic status or minoritized racial group is associated with reduced use of memory clinics, this could exacerbate health care disparities, especially if more effective treatments of Alzheimer disease become available. We aimed to understand how use of a memory clinic is associated with neighborhood-level measures of socioeconomic factors and the intersectionality of race. METHODS: We conducted an observational cross-sectional study using electronic health record data to compare the neighborhood advantage of patients seen at the Washington University Memory Diagnostic Center with the catchment area using a geographical information system. Furthermore, we compared the severity of dementia at the initial visit between patients who self-identified as Black or White. We used a multinomial logistic regression model to assess the Clinical Dementia Rating at the initial visit and t tests to compare neighborhood characteristics, including Area Deprivation Index, with those of the catchment area. RESULTS: A total of 4,824 patients seen at the memory clinic between 2008 and 2018 were included in this study (mean age 72.7 [SD 11.0] years, 2,712 [56%] female, 543 [11%] Black). Most of the memory clinic patients lived in more advantaged neighborhoods within the overall catchment area. The percentage of patients self-identifying as Black (11%) was lower than the average percentage of Black individuals by census tract in the catchment area (16%) (p < 0.001). Black patients lived in less advantaged neighborhoods, and Black patients were more likely than White patients to have moderate or severe dementia at their initial visit (odds ratio 1.59, 95% CI 1.11-2.25). DISCUSSION: This study demonstrates that patients living in less affluent neighborhoods were less likely to be seen in one large memory clinic. 
Black patients were under-represented in the clinic, and Black patients had more severe dementia at their initial visit. These findings suggest that patients with a lower socioeconomic status and who identify as Black are less likely to be seen in memory clinics, which are likely to be a major point of access for any new Alzheimer disease treatments that may become available.


Subject(s)
Alzheimer Disease , Aged , Female , Humans , Male , Alzheimer Disease/complications , Alzheimer Disease/diagnosis , Alzheimer Disease/epidemiology , Alzheimer Disease/ethnology , Alzheimer Disease/therapy , Black People , Cross-Sectional Studies , Racial Groups , Socioeconomic Factors , United States , Memory Disorders/epidemiology , Memory Disorders/ethnology , Memory Disorders/etiology , White People , Neighborhood Characteristics , Middle Aged , Aged, 80 and over
13.
J Am Med Inform Assoc ; 29(5): 813-821, 2022 04 13.
Article in English | MEDLINE | ID: mdl-35092276

ABSTRACT

OBJECTIVE: Respiratory support status is critical in understanding patient status, but electronic health record data are often scattered, incomplete, and contradictory. Further, there has been limited work on standardizing representations for respiratory support. The objective of this work was to (1) propose a practical terminology system for respiratory support methods; (2) develop (meta-)heuristics for constructing respiratory support episodes; and (3) evaluate the utility of respiratory support information for mortality prediction. MATERIALS AND METHODS: All analyses were performed using electronic health record data of COVID-19-tested, emergency department-admit, adult patients at a large, Midwestern healthcare system between March 1, 2020 and April 1, 2021. Logistic regression and XGBoost models were trained with and without respiratory support information, and performance metrics were compared. Importance of respiratory-support-based features was explored using absolute coefficient values for logistic regression and SHapley Additive exPlanations values for the XGBoost model. RESULTS: The proposed terminology system for respiratory support methods is as follows: Low-Flow Oxygen Therapy (LFOT), High-Flow Oxygen Therapy (HFOT), Non-Invasive Mechanical Ventilation (NIMV), Invasive Mechanical Ventilation (IMV), and ExtraCorporeal Membrane Oxygenation (ECMO). The addition of respiratory support information significantly improved mortality prediction (logistic regression area under receiver operating characteristic curve, median [IQR] from 0.855 [0.852-0.855] to 0.881 [0.876-0.884]; area under precision recall curve from 0.262 [0.245-0.268] to 0.319 [0.313-0.325], both P < 0.01). The proposed generalizable, interpretable, and episodic representation had commensurate performance compared to alternate representations despite loss of granularity. Respiratory support features were among the most important in both models. 
CONCLUSION: Respiratory support information is critical in understanding patient status and can facilitate downstream analyses.


Subject(s)
COVID-19 , Heuristics , Adult , Humans , Machine Learning , Oxygen , Retrospective Studies
14.
JAMIA Open ; 5(4): ooac105, 2022 Dec.
Article in English | MEDLINE | ID: mdl-36570030

ABSTRACT

EHR-based sepsis research often uses heterogeneous definitions of sepsis, leading to poor generalizability and difficulty in comparing studies to each other. We have developed OpenSep, an open-source pipeline for sepsis phenotyping according to the Sepsis-3 definition, as well as determination of time of sepsis onset and SOFA scores. The Minimal Sepsis Data Model was developed alongside the pipeline to enable the execution of the pipeline on diverse sources of electronic health record data. The pipeline's accuracy was validated by applying it to the MIMIC-IV version 1.0 data and comparing sepsis onset and SOFA scores to those produced by the pipeline developed by the curators of MIMIC. We demonstrated high reliability between both the sepsis onsets and SOFA scores; however, the use of the Minimal Sepsis Data Model developed for this work allows our pipeline to be applied more broadly to data sources beyond MIMIC.

15.
Front Digit Health ; 4: 848599, 2022.
Article in English | MEDLINE | ID: mdl-35350226

ABSTRACT

Objective: To develop and evaluate a sepsis prediction model for the general ward setting and extend the evaluation through a novel pseudo-prospective trial design. Design: Retrospective analysis of data extracted from electronic health records (EHR). Setting: Single, tertiary-care academic medical center in St. Louis, MO, USA. Patients: Adult, non-surgical inpatients admitted between January 1, 2012 and June 1, 2019. Interventions: None. Measurements and Main Results: Of the 70,034 included patient encounters, 3.1% were septic based on the Sepsis-3 criteria. Features were generated from the EHR data and were used to develop a machine learning model to predict sepsis 6-h ahead of onset. The best performing model had an Area Under the Receiver Operating Characteristic curve (AUROC or c-statistic) of 0.862 ± 0.011 and Area Under the Precision-Recall Curve (AUPRC) of 0.294 ± 0.021 compared to that of Logistic Regression (0.857 ± 0.008 and 0.256 ± 0.024) and NEWS 2 (0.699 ± 0.012 and 0.092 ± 0.009). In the pseudo-prospective trial, 388 (69.7%) septic patients were alerted on with a specificity of 81.4%. Within 24 h of crossing the alert threshold, 20.9% had a sepsis-related event occur. Conclusions: A machine learning model capable of predicting sepsis in the general ward setting was developed using the EHR data. The pseudo-prospective trial provided a more realistic estimation of implemented performance and demonstrated a 29.1% Positive Predictive Value (PPV) for sepsis-related intervention or outcome within 48 h.

16.
PLoS One ; 17(10): e0266292, 2022.
Article in English | MEDLINE | ID: mdl-36264919

ABSTRACT

OBJECTIVE: To determine whether modified K-12 student quarantine policies that allow some students to continue in-person education during their quarantine period increase schoolwide SARS-CoV-2 transmission risk following the increase in cases in winter 2020-2021. METHODS: We conducted a prospective cohort study of COVID-19 cases and close contacts among students and staff (n = 65,621) in 103 Missouri public schools. Participants were offered free, saliva-based RT-PCR testing. The projected number of school-based transmission events among untested close contacts was extrapolated from the percentage of events detected among tested asymptomatic close contacts and summed with the number of detected events for a projected total. An adjusted Cox regression model compared hazard rates of school-based SARS-CoV-2 infections between schools with a modified versus standard quarantine policy. RESULTS: From January-March 2021, a projected 23 (1%) school-based transmission events occurred among 1,636 school close contacts. There was no difference in the adjusted hazard rates of school-based SARS-CoV-2 infections between schools with a modified versus standard quarantine policy (hazard ratio = 1.00; 95% confidence interval: 0.97-1.03). DISCUSSION: School-based SARS-CoV-2 transmission was rare in 103 K-12 schools implementing multiple COVID-19 prevention strategies. Modified student quarantine policies were not associated with increased school incidence of COVID-19. Modifications to student quarantine policies may be a useful strategy for K-12 schools to safely reduce disruptions to in-person education during times of increased COVID-19 community incidence.


Subject(s)
COVID-19 , SARS-CoV-2 , Humans , Quarantine , COVID-19/epidemiology , COVID-19/prevention & control , Prospective Studies , Students , Policy
17.
Stat Med ; 30(16): 1989-2004, 2011 Jul 20.
Article in English | MEDLINE | ID: mdl-21520454

ABSTRACT

Stochastic curtailment is a sequential method to terminate a study when continuing to the end would be unlikely to change the outcome. This method has been researched most commonly in the context of clinical trials. The current paper explores its use in a different setting: the administration of a health questionnaire to patients via computer. A classification procedure augmenting logistic regression with stochastic curtailment is introduced to avoid burdening the patients with unnecessary questions. In a real-data simulation using responses from the Medicare Health Outcomes Survey, the new procedure substantially reduced the average number of questions administered with a minimal loss of classification accuracy.
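
A rough Python sketch of the general idea follows. It is not the paper's procedure, which the abstract does not specify. It assumes binary (0/1) items, a logistic regression that classifies a respondent as positive when the linear score is at least 0, and unanswered items simulated as independent Bernoulli draws. The function names, weights, and the threshold `gamma` are all hypothetical.

```python
import random


def prob_final_positive(intercept, weights, answers, item_probs,
                        n_sim=2000, seed=0):
    """Monte Carlo estimate of P(full-length classification is positive).

    answers: 0/1 responses to the first len(answers) items.
    item_probs: assumed P(response = 1) for every item; unanswered items
    are simulated as independent Bernoulli draws (a simplifying assumption).
    """
    rng = random.Random(seed)
    k = len(answers)
    partial = intercept + sum(w * a for w, a in zip(weights, answers))
    positive = 0
    for _ in range(n_sim):
        score = partial
        for w, p in zip(weights[k:], item_probs[k:]):
            if rng.random() < p:
                score += w
        # Predicted probability >= 0.5 is equivalent to linear score >= 0.
        positive += score >= 0
    return positive / n_sim


def administer(intercept, weights, responses, item_probs, gamma=0.95):
    """Ask items in order; stop once the final result is unlikely to change.

    Returns (classification, number_of_items_asked).
    """
    if not weights:
        raise ValueError("questionnaire has no items")
    for k in range(1, len(weights) + 1):
        p_pos = prob_final_positive(intercept, weights, responses[:k], item_probs)
        if k == len(weights) or p_pos >= gamma or p_pos <= 1 - gamma:
            return p_pos >= 0.5, k


# Hypothetical five-item questionnaire and one respondent's answers.
weights = [2.0, 1.5, 0.5, 0.5, 0.5]
result = administer(-2.5, weights, [1, 1, 0, 0, 0], [0.3] * 5)
print(result)
```

In this toy case the first two answers already push the score above 0, and the remaining weights are all positive, so no later answer could reverse the classification and the questionnaire stops after two items.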


Subject(s)
Biostatistics/methods , Stochastic Processes , Surveys and Questionnaires , Aged , Clinical Trials as Topic/statistics & numerical data , Data Interpretation, Statistical , Female , Health Surveys/statistics & numerical data , Humans , Male , Medicare , United States
18.
JAMIA Open ; 4(3): ooab052, 2021 Jul.
Article in English | MEDLINE | ID: mdl-34350389

ABSTRACT

OBJECTIVE: Alzheimer disease (AD) is the most common cause of dementia, a syndrome characterized by cognitive impairment severe enough to interfere with activities of daily life. We aimed to conduct a systematic literature review (SLR) of studies that applied machine learning (ML) methods to clinical data derived from electronic health records in order to model risk for progression of AD dementia. MATERIALS AND METHODS: We searched for articles published between January 1, 2010, and May 31, 2020, in PubMed, Scopus, ScienceDirect, IEEE Xplore Digital Library, Association for Computing Machinery Digital Library, and arXiv. We used predefined criteria to select relevant articles and summarized them according to key components of ML analysis such as data characteristics, computational algorithms, and research focus. RESULTS: There has been a considerable rise over the past 5 years in the number of research papers using ML-based analysis for AD dementia modeling. We reviewed 64 relevant articles in our SLR. The results suggest that the majority of existing research has focused on predicting progression of AD dementia using publicly available datasets containing both neuroimaging and clinical data (neurobehavioral status exam scores, patient demographics, neuroimaging data, and laboratory test values). DISCUSSION: Identifying individuals at risk for progression of AD dementia could potentially help to personalize disease management to plan future care. Clinical data consisting of both structured data tables and clinical notes can be effectively used in ML-based approaches to model risk for AD dementia progression. Data sharing and reproducibility of results can enhance the impact, adaptation, and generalizability of this research.

19.
Learn Health Syst ; 5(1): e10235, 2021 Jan.
Article in English | MEDLINE | ID: mdl-32838037

ABSTRACT

Problem: The current coronavirus disease 2019 (COVID-19) pandemic underscores the need for building and sustaining public health data infrastructure to support a rapid local, regional, national, and international response. Despite a historical context of public health crises, data sharing agreements and transactional standards do not uniformly exist between institutions, which hampers a foundational infrastructure to meet data sharing and integration needs for the advancement of public health. Approach: There is a growing need to apply population health knowledge with technological solutions to data transfer, integration, and reasoning, to improve health in a broader learning health system ecosystem. To achieve this, data must be combined from healthcare provider organizations, public health departments, and other settings. Public health entities are in a unique position to consume these data; however, most do not yet have the infrastructure required to integrate data sources and apply computable knowledge to combat this pandemic. Outcomes: Herein, we describe lessons learned and a framework to address these needs, which focus on: (a) identifying and filling technology "gaps"; (b) pursuing collaborative design of data sharing requirements and transmission mechanisms; (c) facilitating cross-domain discussions involving legal and research compliance; and (d) establishing or participating in multi-institutional convening or coordinating activities. Next steps: While by no means a comprehensive evaluation of such issues, we envision that many of our experiences are universal. We hope those elucidated can serve as the catalyst for a robust community-wide dialogue on what steps can and should be taken to ensure that our regional and national health care systems can truly learn, in a rapid manner, so as to respond to this and future emergent public health crises.

20.
JAMIA Open ; 4(3): ooab062, 2021 Jul.
Article in English | MEDLINE | ID: mdl-34820600

ABSTRACT

The objective of this study was to directly compare the ability of commonly used early warning scores (EWS) for early identification and prediction of sepsis in the general ward setting. For general ward patients at a large, academic medical center between early-2012 and mid-2018, common EWS and patient acuity scoring systems were calculated from electronic health records (EHR) data for patients who both met and did not meet Sepsis-3 criteria. For identification of sepsis at index time, National Early Warning Score 2 (NEWS 2) had the highest performance (area under the receiver operating characteristic curve: 0.803 [95% confidence interval [CI]: 0.795-0.811], area under the precision-recall curve: 0.130 [95% CI: 0.121-0.140]), followed by NEWS, Modified Early Warning Score, and quick Sequential Organ Failure Assessment (qSOFA). Using validated thresholds, NEWS 2 also had the highest recall (0.758 [95% CI: 0.736-0.778]) but qSOFA had the highest specificity (0.950 [95% CI: 0.948-0.952]), positive predictive value (0.184 [95% CI: 0.169-0.198]), and F1 score (0.236 [95% CI: 0.220-0.253]). While NEWS 2 outperformed all other compared EWS and patient acuity scores, due to the low prevalence of sepsis, all scoring systems were prone to false positives (low positive predictive value without drastic sacrifices in sensitivity), thus leaving room for more computationally advanced approaches.
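
The link between low prevalence and low positive predictive value follows from Bayes' rule, as this short Python sketch shows. The sensitivity, specificity, and prevalence values are hypothetical round numbers chosen for illustration, not figures from the study.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """PPV via Bayes' rule: true positives / all positive predictions."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    if true_pos + false_pos == 0:
        raise ValueError("no positive predictions; PPV is undefined")
    return true_pos / (true_pos + false_pos)


# Hypothetical score with 75% sensitivity and 90% specificity,
# evaluated at three assumed prevalence levels.
for prevalence in (0.50, 0.10, 0.03):
    ppv = positive_predictive_value(0.75, 0.90, prevalence)
    print(f"prevalence={prevalence:.2f}  PPV={ppv:.3f}")
```

With sensitivity and specificity held fixed, the PPV in this example drops from about 0.88 at 50% prevalence to about 0.19 at 3% prevalence, because false positives from the much larger non-septic group come to dominate the alerts.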
