Results 1 - 20 of 33
1.
Artif Intell Med ; 154: 102898, 2024 May 23.
Article in English | MEDLINE | ID: mdl-38843691

ABSTRACT

We present a neural network framework for learning a survival model to predict a time-to-event outcome while simultaneously learning a topic model that reveals feature relationships. In particular, we model each subject as a distribution over "topics", where a topic could, for instance, correspond to an age group, a disorder, or a disease. The presence of a topic in a subject means that specific clinical features are more likely to appear for the subject. Topics encode information about related features and are learned in a supervised manner to predict a time-to-event outcome. Our framework supports combining many different topic and survival models; training the resulting joint survival-topic model readily scales to large datasets using standard neural net optimizers with minibatch gradient descent. For example, a special case is to combine LDA with a Cox model, in which case a subject's distribution over topics serves as the input feature vector to the Cox model. We explain how to address practical implementation issues that arise when applying these neural survival-supervised topic models to clinical data, including how to visualize results to assist clinical interpretation. We study the effectiveness of our proposed framework on seven clinical datasets on predicting time until death as well as hospital ICU length of stay, where we find that neural survival-supervised topic models achieve competitive accuracy with existing approaches while yielding interpretable clinical topics that explain feature relationships. Our code is available at: https://github.com/georgehc/survival-topics.
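For the LDA-with-Cox special case described above, a subject's topic distribution serves directly as the Cox model's input feature vector. A minimal numpy sketch of the Cox partial log-likelihood evaluated on such topic features (the neural topic inference itself is assumed and not shown; ties are handled Breslow-style):

```python
import numpy as np

def cox_partial_log_likelihood(theta, beta, times, events):
    """Cox partial log-likelihood where each subject's feature vector is a
    topic distribution theta[i] (rows sum to 1). events[i] = 1 if the
    time-to-event outcome was observed, 0 if censored."""
    risk = theta @ beta                    # log-hazard score per subject
    order = np.argsort(-times)             # sort by descending time so the
    risk, ev = risk[order], events[order]  # risk set at i is positions 0..i
    log_risk_set = np.logaddexp.accumulate(risk)
    return float(np.sum((risk - log_risk_set)[ev.astype(bool)]))
```

With beta = 0 all subjects share one hazard, so for n uncensored subjects with distinct event times the value reduces to -log(n!), a convenient sanity check before plugging in a trained topic model.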

2.
Proc Mach Learn Res ; 225: 403-427, 2023.
Article in English | MEDLINE | ID: mdl-38550276

ABSTRACT

We consider the problem of predicting how the likelihood of an outcome of interest for a patient changes over time as we observe more of the patient's data. To solve this problem, we propose a supervised contrastive learning framework that learns an embedding representation for each time step of a patient time series. Our framework learns the embedding space to have the following properties: (1) nearby points in the embedding space have similar predicted class probabilities, (2) adjacent time steps of the same time series map to nearby points in the embedding space, and (3) time steps with very different raw feature vectors map to far apart regions of the embedding space. To achieve property (3), we employ a nearest neighbor pairing mechanism in the raw feature space. This mechanism also serves as an alternative to "data augmentation", a key ingredient of contrastive learning, which lacks a standard procedure that is adequately realistic for clinical tabular data, to our knowledge. We demonstrate that our approach outperforms state-of-the-art baselines in predicting mortality of septic patients (MIMIC-III dataset) and tracking progression of cognitive impairment (ADNI dataset). Our method also consistently recovers the correct synthetic dataset embedding structure across experiments, a feat not achieved by baselines. Our ablation experiments show the pivotal role of our nearest neighbor pairing.
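Property (3)'s nearest neighbor pairing mechanism is easy to sketch in isolation: each raw feature vector is paired with its closest other vector, a stand-in for the augmented views that image-style contrastive learning would generate. A brute-force illustration, not the paper's implementation:

```python
import numpy as np

def nearest_neighbor_pairs(X):
    """For each raw feature vector (row of X), return the index of its
    nearest *other* vector by Euclidean distance."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)   # a point never pairs with itself
    return d.argmin(axis=1)
```

The O(n²) distance matrix is fine for a minibatch; a k-d tree or approximate index would replace it at dataset scale.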

3.
Proc Mach Learn Res ; 219: 94-109, 2023 Aug.
Article in English | MEDLINE | ID: mdl-38476630

ABSTRACT

Reliable extraction of temporal relations from clinical notes is a growing need in many clinical research domains. Our work introduces typed markers to the task of clinical temporal relation extraction. We demonstrate that adding medical entity information to clinical text as tags, combined with context sentences and fed into a transformer-based architecture, can outperform more complex systems requiring feature engineering and temporal reasoning. We propose several strategies for typed marker creation that incorporate entity type information at different granularities, with extensive experiments to test their effectiveness. Our system establishes the best result on I2B2, a clinical benchmark dataset for temporal relation extraction, with an F1 of 83.5%, a substantial 3.3-point improvement over the previous best system.
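A typed marker, at its simplest, wraps each medical entity in tags carrying its entity type. The tag format below is an illustrative assumption, not the paper's exact scheme:

```python
def add_typed_markers(sentence, entities):
    """entities: list of (start, end, entity_type) character spans.
    Wraps each span in typed tags before the text is fed to a transformer."""
    out, prev = [], 0
    for start, end, etype in sorted(entities):
        out.append(sentence[prev:start])
        out.append(f"<{etype}> {sentence[start:end]} </{etype}>")
        prev = end
    out.append(sentence[prev:])
    return "".join(out)
```

Varying what goes inside the tags (coarse vs. fine-grained types) is one axis of the granularity experiments the abstract describes.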

4.
Sci Rep ; 12(1): 22503, 2022 Dec 28.
Article in English | MEDLINE | ID: mdl-36577760

ABSTRACT

Fusion magnets made from high-temperature superconducting ReBCO CORC® cables are typically protected with quench detection systems that use voltage or temperature measurements to trigger current extraction processes. Although small coils with low inductances have been demonstrated, magnet protection remains a challenge and magnets are typically operated with little knowledge of the intrinsic performance parameters. We propose a protection framework based on current distribution monitoring in fusion cables with limited inter-cable current sharing. By applying inverse Biot-Savart techniques to distributed Hall probe arrays around CORC® Cable-In-Conduit-Conductor (CICC) terminations, individual cable currents are recreated and used to extract the parameters of a predictive model. These parameters are shown to be of value for detecting conductor damage and defining safe magnet operating limits. The trained model is then used to predict cable current distributions in real time, and departures between predictions and inverse Biot-Savart recreated current distributions are used to generate quench triggers. The methodology shows promise for quality control, operational planning and real-time quench detection in bundled CORC® cables for compact fusion reactors.
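The reconstruction step is linear: the field at each Hall probe is a weighted sum of the individual cable currents, with weights fixed by the Biot-Savart geometry. A sketch of the inversion, assuming the geometry matrix A has already been computed from the probe and cable positions:

```python
import numpy as np

def reconstruct_currents(A, b):
    """Recover individual cable currents from Hall-probe measurements.
    A[i, j]: field at probe i per unit current in cable j (from Biot-Savart);
    b: measured fields. Overdetermined least-squares is the 'inverse' step."""
    currents, *_ = np.linalg.lstsq(A, b, rcond=None)
    return currents
```

With more probes than cables the system is overdetermined, which is what makes deviations between predicted and reconstructed current distributions usable as quench triggers.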

5.
Methods Mol Biol ; 2496: 91-109, 2022.
Article in English | MEDLINE | ID: mdl-35713860

ABSTRACT

Epidemiological studies identifying biological markers of disease state are valuable, but can be time-consuming, expensive, and require extensive intuition and expertise. Furthermore, not all hypothesized markers will be borne out in a study, suggesting that high-quality initial hypotheses are crucial. In this chapter, we describe a high-throughput pipeline to produce a ranked list of high-quality hypothesized biomarkers for diseases. We review an example use of this approach to generate a large number of candidate disease biomarker hypotheses derived from machine learning models, filter and rank them according to their potential novelty using text mining, and corroborate the most promising hypotheses with further statistical modeling. This example applies the pipeline to a large electronic health record dataset and the PubMed corpus, finding several promising hypothesized laboratory tests with previously undocumented correlations to particular diseases.
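The three stages (generate candidates from ML models, filter and rank by literature novelty, corroborate statistically) can be caricatured in a few lines. The dictionary structure and hit-count novelty criterion are illustrative assumptions, not the chapter's exact implementation, and the lab test and disease names below are placeholders:

```python
def rank_hypotheses(model_scores, literature_hits, min_score=0.1):
    """model_scores: {(lab_test, disease): association strength from ML models}.
    literature_hits: {(lab_test, disease): PubMed co-mention count (text mining)}.
    Filter weak associations, then rank the least-documented pairs first."""
    kept = [(pair, s) for pair, s in model_scores.items() if s >= min_score]
    return [pair for pair, s in
            sorted(kept, key=lambda ps: (literature_hits.get(ps[0], 0), -ps[1]))]
```

Pairs that the models flag strongly but the literature rarely mentions rise to the top, which is exactly where the follow-up statistical modeling is spent.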


Subject(s)
Data Mining , Machine Learning , Electronic Health Records , Models, Statistical , Publications
6.
Lancet Digit Health ; 4(6): e455-e465, 2022 06.
Article in English | MEDLINE | ID: mdl-35623798

ABSTRACT

BACKGROUND: Little is known about whether machine-learning algorithms developed to predict opioid overdose using earlier years and from a single state will perform as well when applied to other populations. We aimed to develop a machine-learning algorithm to predict 3-month risk of opioid overdose using Pennsylvania Medicaid data and to externally validate it in two data sources (ie, later years of Pennsylvania Medicaid data and data from a different state). METHODS: This prognostic modelling study developed and validated a machine-learning algorithm to predict overdose in Medicaid beneficiaries with one or more opioid prescriptions in Pennsylvania and Arizona, USA. To predict risk of hospital or emergency department visits for overdose in the subsequent 3 months, we measured 284 potential predictors from pharmaceutical and health-care encounter claims data in 3-month periods, starting 3 months before the first opioid prescription and continuing until loss to follow-up or study end. We developed and internally validated a gradient-boosting machine algorithm to predict overdose using 2013-16 Pennsylvania Medicaid data (n=639 693). We externally validated the model using (1) 2017-18 Pennsylvania Medicaid data (n=318 585) and (2) 2015-17 Arizona Medicaid data (n=391 959). We reported several prediction performance metrics (eg, C-statistic, positive predictive value). Beneficiaries were stratified into risk-score subgroups to support clinical use. FINDINGS: A total of 8641 (1·35%) 2013-16 Pennsylvania Medicaid beneficiaries, 2705 (0·85%) 2017-18 Pennsylvania Medicaid beneficiaries, and 2410 (0·61%) 2015-17 Arizona beneficiaries had one or more overdoses during the study period.
C-statistics for the algorithm predicting 3-month overdoses developed from the 2013-16 Pennsylvania training dataset and validated on the 2013-16 Pennsylvania internal validation dataset, 2017-18 Pennsylvania external validation dataset, and 2015-17 Arizona external validation dataset were 0·841 (95% CI 0·835-0·847), 0·828 (0·822-0·834), and 0·817 (0·807-0·826), respectively. In external validation datasets, 71 361 (22·4%) of 318 585 2017-18 Pennsylvania beneficiaries were in high-risk subgroups (positive predictive value of 0·38-4·08%; capturing 73% of overdoses in the subsequent 3 months) and 40 041 (10%) of 391 959 2015-17 Arizona beneficiaries were in high-risk subgroups (positive predictive value of 0·19-1·97%; capturing 55% of overdoses). Lower risk subgroups in both validation datasets had few individuals (≤0·2%) with an overdose. INTERPRETATION: A machine-learning algorithm predicting opioid overdose derived from Pennsylvania Medicaid data performed well in external validation with more recent Pennsylvania data and with Arizona Medicaid data. The algorithm might be valuable for overdose risk prediction and stratification in Medicaid beneficiaries. FUNDING: National Institute of Health, National Institute on Drug Abuse, National Institute on Aging.
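Risk-score stratification of the kind reported here amounts to binning predicted risks and reading off the observed event rate in each bin, which is the subgroup's positive predictive value. A sketch with arbitrary quantile cut-points standing in for the study's subgroup definitions:

```python
import numpy as np

def risk_subgroup_ppv(scores, outcomes, quantiles=(0.0, 0.5, 0.75, 0.9, 1.0)):
    """Stratify by predicted risk score and report the observed outcome
    rate (PPV) per subgroup, lowest-risk subgroup first."""
    edges = np.quantile(scores, quantiles)
    edges[-1] += 1e-12                      # include the maximum score
    rates = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (scores >= lo) & (scores < hi)
        rates.append(outcomes[mask].mean() if mask.any() else float("nan"))
    return rates
```

A well-calibrated risk score shows monotonically increasing observed rates across subgroups, the pattern the abstract reports between low- and high-risk groups.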


Subject(s)
Drug Overdose , Opiate Overdose , Algorithms , Analgesics, Opioid , Humans , Machine Learning , Medicaid , Prognosis , United States
7.
Addiction ; 117(8): 2254-2263, 2022 08.
Article in English | MEDLINE | ID: mdl-35315173

ABSTRACT

BACKGROUND AND AIMS: The time lag encountered when accessing health-care data is one major barrier to implementing opioid overdose prediction measures in practice. Little is known regarding how one's opioid overdose risk changes over time. We aimed to identify longitudinal patterns of individual predicted overdose risks among Medicaid beneficiaries after initiation of opioid prescriptions. DESIGN, SETTING AND PARTICIPANTS: A retrospective cohort study in Pennsylvania, USA among Pennsylvania Medicaid beneficiaries aged 18-64 years who initiated opioid prescriptions between July 2017 and September 2018 (318 585 eligible beneficiaries; mean age = 39 ± 12 years; female = 65.7%; White = 62.2%; Black = 24.9%). MEASUREMENTS: We first applied a previously developed and validated machine-learning algorithm to obtain risk scores for opioid overdose emergency room or hospital visits in 3-month intervals for each beneficiary who initiated opioid therapy, until disenrollment from Medicaid, death or the end of observation (December 2018). We performed group-based trajectory modeling to identify trajectories of these predicted overdose risk scores over time. FINDINGS: Among eligible beneficiaries, 0.61% had one or more occurrences of opioid overdose in a median follow-up of 15 months. We identified five unique opioid overdose risk trajectories: three trajectories (accounting for 92% of the cohort) had consistent overdose risk over time, including consistent low-risk (63%), consistent medium-risk (25%) and consistent high-risk (4%) groups; another two trajectories (accounting for 8%) had overdose risks that substantially changed over time, including a group that transitioned from high- to medium-risk (3%) and another group that increased from medium- to high-risk over time (5%). CONCLUSIONS: More than 90% of Medicaid beneficiaries in Pennsylvania USA with one or more opioid prescriptions had consistent, predicted opioid overdose risks over 15 months.
Applying opioid prediction algorithms developed from historical data may not be a major barrier to implementation in practice for the large majority of individuals.


Subject(s)
Drug Overdose , Opiate Overdose , Adult , Analgesics, Opioid/therapeutic use , Drug Overdose/drug therapy , Drug Overdose/epidemiology , Female , Humans , Medicaid , Middle Aged , Opiate Overdose/epidemiology , Retrospective Studies
8.
Diabetes ; 71(5): 1023-1033, 2022 05 01.
Article in English | MEDLINE | ID: mdl-35100352

ABSTRACT

Epigenetic regulation is an important factor in glucose metabolism, but underlying mechanisms remain largely unknown. Here we investigated epigenetic control of systemic metabolism by bromodomain-containing proteins (Brds), which are transcriptional regulators binding to acetylated histone, in both intestinal cells and mice treated with the bromodomain inhibitor JQ-1. In vivo treatment with JQ-1 resulted in hyperglycemia and severe glucose intolerance. Whole-body or tissue-specific insulin sensitivity was not altered by JQ-1; however, JQ-1 treatment reduced insulin secretion during both in vivo glucose tolerance testing and ex vivo incubation of isolated islets. JQ-1 also inhibited expression of fibroblast growth factor (FGF) 15 in the ileum and decreased FGF receptor 4-related signaling in the liver. These adverse metabolic effects of Brd4 inhibition were fully reversed by in vivo overexpression of FGF19, with normalization of hyperglycemia. At a cellular level, we demonstrate Brd4 binds to the promoter region of FGF19 in human intestinal cells; Brd inhibition by JQ-1 reduces FGF19 promoter binding and downregulates FGF19 expression. Thus, we identify Brd4 as a novel transcriptional regulator of intestinal FGF15/19 in ileum and FGF signaling in the liver and a contributor to the gut-liver axis and systemic glucose metabolism.


Subject(s)
Hyperglycemia , Nuclear Proteins , Animals , Epigenesis, Genetic , Fibroblast Growth Factors/metabolism , Glucose , Mice , Nuclear Proteins/genetics , Nuclear Proteins/metabolism , Transcription Factors/genetics , Transcription Factors/metabolism
9.
AMIA Annu Symp Proc ; 2022: 1257-1266, 2022.
Article in English | MEDLINE | ID: mdl-37128459

ABSTRACT

With COVID-19 now pervasive, identification of high-risk individuals is crucial. Using data from a major healthcare provider in Southwestern Pennsylvania, we develop survival models predicting severe COVID-19 progression. In this endeavor, we face a tradeoff between more accurate models relying on many features and less accurate models relying on a few features aligned with clinician intuition. Complicating matters, many EHR features tend to be under-coded, degrading the accuracy of smaller models. In this study, we develop two sets of high-performance risk scores: (i) an unconstrained model built from all available features; and (ii) a pipeline that learns a small set of clinical concepts before training a risk predictor. Learned concepts boost performance over the corresponding features (C-index 0.858 vs. 0.844) and demonstrate improvements over (i) when evaluated out-of-sample (subsequent time periods). Our models outperform previous works (C-index 0.844-0.872 vs. 0.598-0.810).


Subject(s)
COVID-19 , Humans , Machine Learning , Risk Factors , Pennsylvania
10.
Proc Conf Empir Methods Nat Lang Process ; 2022(SD): 109-120, 2022 Dec.
Article in English | MEDLINE | ID: mdl-38476318

ABSTRACT

Diagnostic coding, or ICD coding, is the task of assigning diagnosis codes defined by the ICD (International Classification of Diseases) standard to patient visits based on clinical notes. The current process of manual ICD coding is time-consuming and often error-prone, which suggests the need for automatic ICD coding. However, despite the long history of automatic ICD coding, there have been no standardized frameworks for benchmarking ICD coding models. We open-source an easy-to-use tool named AnEMIC, which provides a streamlined pipeline for preprocessing, training, and evaluating automatic ICD coding models. We correct preprocessing errors made by existing works, and provide key models and weights trained on the correctly preprocessed datasets. We also provide an interactive demo performing real-time inference from custom inputs, and visualizations drawn from explainable AI to analyze the models. We hope the framework helps move research on ICD coding forward and helps professionals explore the potential of ICD coding. The framework and the associated code are available here.

11.
Proc Mach Learn Res ; 174: 234-247, 2022 Apr.
Article in English | MEDLINE | ID: mdl-38665367

ABSTRACT

Spelling correction is a particularly important problem in clinical natural language processing because of the abundant occurrence of misspellings in medical records. However, the scarcity of labeled datasets in a clinical context makes it hard to build a machine learning system for such clinical spelling correction. In this work, we present a probabilistic model of correcting misspellings based on a simple conditional independence assumption, which leads to a modular decomposition into a language model and a corruption model. With a deep character-level language model trained on a large clinical corpus, and a simple edit-based corruption model, we can build a spelling correction model with small or no real data. Experimental results show that our model significantly outperforms baselines on two healthcare spelling correction datasets.
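The conditional independence assumption yields the classic noisy-channel factorization P(word | typo) ∝ P(word) · P(typo | word). A toy version, with a unigram log-probability table standing in for the deep character-level language model and a flat per-edit penalty as the edit-based corruption model:

```python
import math

def edits1(word, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """All strings one edit away: delete, transpose, replace, insert."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [L + R[1:] for L, R in splits if R]
    transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]
    replaces = [L + c + R[1:] for L, R in splits if R for c in alphabet]
    inserts = [L + c + R for L, R in splits for c in alphabet]
    return set(deletes + transposes + replaces + inserts)

def correct(typo, log_p_word, log_p_edit=-5.0):
    """argmax_w log P(w) + log P(typo | w): language model plus corruption
    model, searched over the input and its one-edit neighborhood."""
    best, best_score = typo, log_p_word.get(typo, -math.inf)
    for cand in edits1(typo):
        if cand in log_p_word:
            score = log_p_word[cand] + log_p_edit
            if score > best_score:
                best, best_score = cand, score
    return best
```

The modular decomposition is the point: the language model can be trained on a large unlabeled clinical corpus while the corruption model stays a simple hand-specified edit process, so little or no labeled misspelling data is needed.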

12.
J Gen Intern Med ; 36(4): 908-915, 2021 04.
Article in English | MEDLINE | ID: mdl-33481168

ABSTRACT

BACKGROUND: Survivors of opioid overdose have substantially increased mortality risk, although this risk is not evenly distributed across individuals. No study has focused on predicting an individual's risk of death after a nonfatal opioid overdose. OBJECTIVE: To predict risk of death after a nonfatal opioid overdose. DESIGN AND PARTICIPANTS: This retrospective cohort study included 9686 Pennsylvania Medicaid beneficiaries with an emergency department or inpatient claim for nonfatal opioid overdose in 2014-2016. The index date was the first overdose claim during this period. EXPOSURES, MAIN OUTCOME, AND MEASURES: Predictor candidates were measured in the 180 days before the index overdose. Primary outcome was 180-day all-cause mortality. Using a gradient boosting machine model, we classified beneficiaries into six subgroups according to their risk of mortality (< 25th percentile of the risk score, 25th to < 50th, 50th to < 75th, 75th to < 90th, 90th to < 98th, ≥ 98th). We then measured receipt of medication for opioid use disorder (OUD), risk mitigation interventions (e.g., prescriptions for naloxone), and prescription opioids filled in the 180 days after the index overdose, by risk subgroup. KEY RESULTS: Of eligible beneficiaries, 347 (3.6%) died within 180 days after the index overdose. The C-statistic of the mortality prediction model was 0.71. In the highest risk subgroup, the observed 180-day mortality rate was 20.3%, while in the lowest risk subgroup, it was 1.5%. Medication for OUD and risk mitigation interventions after overdose were more commonly seen in lower risk groups, while opioid prescriptions were more likely to be used in higher risk groups (both p trends < .001). CONCLUSIONS: A risk prediction model performed well for classifying mortality risk after a nonfatal opioid overdose. This prediction score can identify high-risk subgroups to target interventions to improve outcomes among overdose survivors.


Subject(s)
Drug Overdose , Opiate Overdose , Opioid-Related Disorders , Analgesics, Opioid/therapeutic use , Drug Overdose/drug therapy , Emergency Service, Hospital , Hospitals , Humans , Opioid-Related Disorders/drug therapy , Pennsylvania/epidemiology , Retrospective Studies , United States/epidemiology
13.
AMIA Annu Symp Proc ; 2021: 285-294, 2021.
Article in English | MEDLINE | ID: mdl-35308980

ABSTRACT

Since the COVID-19 pandemic began, the United States' case fatality rate (CFR) has plummeted. Using national and Florida data, we unpack the drop in CFR between April and December 2020, accounting for such confounders as expanded testing, age distribution shift, and detection-to-death lags. Guided by the insight that treatment improvements in this period should correspond to decreases in hospitalization fatality rate (HFR), and using a block-bootstrapping procedure to quantify uncertainty, we find that although treatment improvements do not follow the same trajectory in Florida and nationally (with Florida undergoing a comparatively severe second peak), by December, significant improvements are observed both in Florida and nationally (at least 17% and 55% respectively). These estimates paint a more realistic picture of improvements than the drop in aggregate CFR (70.8%-91.1%). We publish a website where users can apply our analyses to selected demographics, regions, and dates of interest.
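The block-bootstrapping step can be sketched generically: resampling contiguous blocks of a daily series preserves its serial correlation, which an ordinary observation-level bootstrap would destroy. A minimal version producing a percentile interval for the mean (the block length and statistic are placeholders, not the paper's exact procedure):

```python
import random

def block_bootstrap_ci(series, block_len, n_boot=1000, seed=0):
    """95% percentile interval for the mean of a serially correlated series
    (e.g. daily fatality counts), resampling contiguous blocks."""
    rng = random.Random(seed)
    n = len(series)
    starts = list(range(n - block_len + 1))
    means = []
    for _ in range(n_boot):
        sample = []
        while len(sample) < n:          # stitch random blocks to full length
            s = rng.choice(starts)
            sample.extend(series[s:s + block_len])
        means.append(sum(sample[:n]) / n)
    means.sort()
    return means[int(0.025 * n_boot)], means[int(0.975 * n_boot)]
```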


Subject(s)
COVID-19 , Age Distribution , COVID-19/epidemiology , Florida/epidemiology , Hospitalization , Humans , Pandemics
14.
Br J Haematol ; 192(1): 158-170, 2021 01.
Article in English | MEDLINE | ID: mdl-33169861

ABSTRACT

Reducing preventable hospital re-admissions in Sickle Cell Disease (SCD) could potentially improve outcomes and decrease healthcare costs. In a retrospective study of electronic health records, we hypothesized Machine-Learning (ML) algorithms may outperform standard re-admission scoring systems (LACE and HOSPITAL indices). Participants (n = 446) included patients with SCD with at least one unplanned inpatient encounter between January 1, 2013, and November 1, 2018. Patients were randomly partitioned into training and testing groups. Unplanned hospital admissions (n = 3299) were stratified to training and testing samples. Potential predictors (n = 486), measured from the last unplanned inpatient discharge to the current unplanned inpatient visit, were obtained via both data-driven methods and clinical knowledge. Three standard ML algorithms, Logistic Regression (LR), Support-Vector Machine (SVM), and Random Forest (RF) were applied. Prediction performance was assessed using the C-statistic, sensitivity, and specificity. In addition, we reported the most important predictors in our best models. In this dataset, ML algorithms outperformed LACE [C-statistic 0·6, 95% Confidence Interval (CI) 0·57-0·64] and HOSPITAL (C-statistic 0·69, 95% CI 0·66-0·72), with the RF (C-statistic 0·77, 95% CI 0·73-0·79) and LR (C-statistic 0·77, 95% CI 0·73-0·8) performing the best. ML algorithms can be powerful tools in predicting re-admission in high-risk patient groups.
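The C-statistic used to compare the ML models with LACE and HOSPITAL is the probability that a randomly chosen re-admitted patient receives a higher score than a randomly chosen non-re-admitted one, with ties counting half. A direct O(n²) sketch:

```python
def c_statistic(scores, outcomes):
    """Concordance: P(score of a random case > score of a random non-case),
    ties counted as 0.5. Equivalent to the area under the ROC curve."""
    pos = [s for s, y in zip(scores, outcomes) if y]
    neg = [s for s, y in zip(scores, outcomes) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

A value of 0.5 means the score is no better than chance; the gap between 0.6 (LACE) and 0.77 (RF, LR) reported above is measured on this scale.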


Subject(s)
Anemia, Sickle Cell/therapy , Machine Learning , Patient Readmission , Adolescent , Adult , Aged , Aged, 80 and over , Algorithms , Female , Hospitalization , Humans , Male , Middle Aged , Risk Assessment , Young Adult
15.
PLoS One ; 15(7): e0235981, 2020.
Article in English | MEDLINE | ID: mdl-32678860

ABSTRACT

OBJECTIVE: To develop and validate a machine-learning algorithm to improve prediction of incident OUD diagnosis among Medicare beneficiaries with ≥1 opioid prescriptions. METHODS: This prognostic study included 361,527 fee-for-service Medicare beneficiaries, without cancer, filling ≥1 opioid prescriptions from 2011-2016. We randomly divided beneficiaries into training, testing, and validation samples. We measured 269 potential predictors including socio-demographics, health status, patterns of opioid use, and provider-level and regional-level factors in 3-month periods, starting from three months before initiating opioids until development of OUD, loss of follow-up or end of 2016. The primary outcome was a recorded OUD diagnosis or initiating methadone or buprenorphine for OUD as proxy of incident OUD. We applied elastic net, random forests, gradient boosting machine, and deep neural network to predict OUD in the subsequent three months. We assessed prediction performance using C-statistics and other metrics (e.g., number needed to evaluate to identify an individual with OUD [NNE]). Beneficiaries were stratified into subgroups by risk-score decile. RESULTS: The training (n = 120,474), testing (n = 120,556), and validation (n = 120,497) samples had similar characteristics (age ≥65 years = 81.1%; female = 61.3%; white = 83.5%; with disability eligibility = 25.5%; 1.5% had incident OUD). In the validation sample, the four approaches had similar prediction performances (C-statistic ranged from 0.874 to 0.882); elastic net required the fewest predictors (n = 48). Using the elastic net algorithm, individuals in the top decile of risk (15.8% [n = 19,047] of validation cohort) had a positive predictive value of 0.96%, negative predictive value of 99.7%, and NNE of 104. Nearly 70% of individuals with incident OUD were in the top two deciles (n = 37,078), having highest incident OUD (36 to 301 per 10,000 beneficiaries). 
Individuals in the bottom eight deciles (n = 83,419) had minimal incident OUD (3 to 28 per 10,000). CONCLUSIONS: Machine-learning algorithms improve risk prediction and risk stratification of incident OUD in Medicare beneficiaries.
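The number needed to evaluate follows directly from the positive predictive value, NNE = 1/PPV:

```python
def number_needed_to_evaluate(ppv):
    """NNE: expected number of flagged individuals that must be evaluated
    to identify one true case, rounded to the nearest whole person."""
    return round(1.0 / ppv)
```

Plugging in the top-decile PPV of 0.96% reproduces the NNE of 104 reported above.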


Subject(s)
Computational Biology/methods , Fee-for-Service Plans/statistics & numerical data , Machine Learning , Medicare/statistics & numerical data , Opioid-Related Disorders/diagnosis , Opioid-Related Disorders/epidemiology , Risk Assessment/methods , Aged , Female , Humans , Male , Middle Aged , Opioid-Related Disorders/complications , Prognosis , United States
16.
Ann Fam Med ; 18(4): 334-340, 2020 07.
Article in English | MEDLINE | ID: mdl-32661034

ABSTRACT

PURPOSE: To develop and test a machine-learning-based model to predict primary care and other specialties using Medicare claims data. METHODS: We used 2014-2016 prescription and procedure Medicare data to train 3 sets of random forest classifiers (prescription only, procedure only, and combined) to predict specialty. Self-reported specialties were condensed to 27 categories. Physicians were assigned to testing and training cohorts, and random forest models were trained and then applied to 2014-2016 data sets for the testing cohort to generate a series of specialty predictions. Comparing the predicted specialty to self-report, we assessed performance with F1 scores and area under the receiver operating characteristic curve (AUROC) values. RESULTS: A total of 564,986 physicians were included. The combined model had a greater aggregate (macro) F1 score (0.876) than the prescription-only (0.745; P <.01) or procedure-only (0.821; P <.01) model. Mean F1 scores across specialties in the combined model ranged from 0.533 to 0.987. The mean F1 score was 0.920 for primary care. The mean AUROC value for the combined model was 0.992, with values ranging from 0.982 to 0.999. The AUROC value for primary care was 0.982. CONCLUSIONS: This novel approach showed high performance and provides a near real-time assessment of current primary care practice. These findings have important implications for primary care workforce research in the absence of accurate data.
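The aggregate (macro) F1 reported here is the unweighted mean of per-specialty F1 scores, so a rare specialty counts as much as a common one. A from-scratch sketch:

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes present in the data."""
    labels = set(y_true) | set(y_pred)
    f1s = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        f1s.append(2 * tp / (2 * tp + fp + fn) if tp else 0.0)
    return sum(f1s) / len(f1s)
```

Because every specialty is weighted equally, the per-specialty spread (0.533 to 0.987) matters as much as the 0.876 aggregate.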


Subject(s)
Machine Learning , Medicare , Physicians, Primary Care/supply & distribution , Primary Health Care , Algorithms , Area Under Curve , Cross-Sectional Studies , Humans , Insurance Claim Review , Physicians, Primary Care/education , Physicians, Primary Care/trends , ROC Curve , United States , Workforce
18.
Diabetes Res Clin Pract ; 154: 130-137, 2019 Aug.
Article in English | MEDLINE | ID: mdl-31279958

ABSTRACT

AIMS: We aimed to confirm the hypothesis that dysglycaemia, including in the pre-diabetes range, affects a majority of patients admitted with acute coronary syndrome (ACS) and is associated with worse outcomes. METHODS: In this prospective observational cohort study, consecutive inpatients aged ≥ 54 years with ACS were uniformly tested and categorised into diabetes (prior diagnosis/HbA1c ≥ 6.5%, ≥48 mmol/mol), pre-diabetes (HbA1c 5.7-6.4%, 39-47 mmol/mol) and no diabetes (HbA1c ≤ 5.6%, ≤38 mmol/mol) groups. RESULTS: Over two years, 847 consecutive inpatients presented with ACS. 313 (37%) inpatients had diabetes, 312 (37%) had pre-diabetes and 222 (25%) had no diabetes. Diabetes, compared with no diabetes, was associated with higher odds of acute pulmonary oedema (APO, odds ratio, OR 2.60, p < 0.01), longer length of stay (LOS, incidence rate ratio, IRR 1.18, p = 0.02) and 12-month ACS recurrence (OR 1.86, p = 0.046) after adjustment, while no significant associations were identified for pre-diabetes. Analysed as a continuous variable, every 1% (11 mmol/mol) increase in HbA1c was associated with increased odds of APO (OR 1.28, P = 0.002) and a longer LOS (IRR 1.05, P = 0.03). CONCLUSIONS: The high prevalence of dysglycaemia and its association with poorer clinical outcomes justify routine HbA1c testing to identify individuals who may benefit from cardioprotective anti-hyperglycaemic agents and lifestyle modification to prevent progression of pre-diabetes.
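The categorisation rule is simple enough to state as code; the cut-points are exactly those in the abstract (HbA1c in %):

```python
def glycaemic_category(hba1c_pct, prior_diabetes=False):
    """Study cut-points: diabetes = prior diagnosis or HbA1c >= 6.5%;
    pre-diabetes = 5.7-6.4%; no diabetes = <= 5.6%."""
    if prior_diabetes or hba1c_pct >= 6.5:
        return "diabetes"
    if hba1c_pct >= 5.7:
        return "pre-diabetes"
    return "no diabetes"
```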


Subject(s)
Acute Coronary Syndrome/epidemiology , Diabetes Mellitus/physiopathology , Hospitalization/statistics & numerical data , Inpatients/statistics & numerical data , Prediabetic State/physiopathology , Aged , Aged, 80 and over , Australia/epidemiology , Female , Humans , Male , Middle Aged , Prevalence , Prognosis , Prospective Studies
19.
AMIA Jt Summits Transl Sci Proc ; 2019: 572-581, 2019.
Article in English | MEDLINE | ID: mdl-31259012

ABSTRACT

Epidemiological studies identifying biological markers of disease state are valuable, but can be time-consuming, expensive, and require extensive intuition and expertise. Furthermore, not all hypothesized markers will be borne out in a study, suggesting that higher-quality initial hypotheses are crucial. In this work, we propose a high-throughput pipeline to produce a ranked list of high-quality hypothesized marker laboratory tests for diagnoses. Our pipeline generates a large number of candidate lab-diagnosis hypotheses derived from machine learning models, filters and ranks them according to their potential novelty using text mining, and corroborates final hypotheses with logistic regression analysis. We test our approach on a large electronic health record dataset and the PubMed corpus, and find several promising candidate hypotheses.

20.
JAMA ; 321(20): 2003-2017, 2019 05 28.
Article in English | MEDLINE | ID: mdl-31104070

ABSTRACT

Importance: Sepsis is a heterogeneous syndrome. Identification of distinct clinical phenotypes may allow more precise therapy and improve care. Objective: To derive sepsis phenotypes from clinical data, determine their reproducibility and correlation with host-response biomarkers and clinical outcomes, and assess the potential causal relationship with results from randomized clinical trials (RCTs). Design, Settings, and Participants: Retrospective analysis of data sets using statistical, machine learning, and simulation tools. Phenotypes were derived among 20 189 total patients (16 552 unique patients) who met Sepsis-3 criteria within 6 hours of hospital presentation at 12 Pennsylvania hospitals (2010-2012) using consensus k means clustering applied to 29 variables. Reproducibility and correlation with biological parameters and clinical outcomes were assessed in a second database (2013-2014; n = 43 086 total patients and n = 31 160 unique patients), in a prospective cohort study of sepsis due to pneumonia (n = 583), and in 3 sepsis RCTs (n = 4737). Exposures: All clinical and laboratory variables in the electronic health record. Main Outcomes and Measures: Derived phenotype (α, ß, γ, and δ) frequency, host-response biomarkers, 28-day and 365-day mortality, and RCT simulation outputs. Results: The derivation cohort included 20 189 patients with sepsis (mean age, 64 [SD, 17] years; 10 022 [50%] male; mean maximum 24-hour Sequential Organ Failure Assessment [SOFA] score, 3.9 [SD, 2.4]). The validation cohort included 43 086 patients (mean age, 67 [SD, 17] years; 21 993 [51%] male; mean maximum 24-hour SOFA score, 3.6 [SD, 2.0]). 
Of the 4 derived phenotypes, the α phenotype was the most common (n = 6625; 33%) and included patients with the lowest administration of a vasopressor; in the ß phenotype (n = 5512; 27%), patients were older and had more chronic illness and renal dysfunction; in the γ phenotype (n = 5385; 27%), patients had more inflammation and pulmonary dysfunction; and in the δ phenotype (n = 2667; 13%), patients had more liver dysfunction and septic shock. Phenotype distributions were similar in the validation cohort. There were consistent differences in biomarker patterns by phenotype. In the derivation cohort, cumulative 28-day mortality was 287 deaths of 5691 unique patients (5%) for the α phenotype; 561 of 4420 (13%) for the ß phenotype; 1031 of 4318 (24%) for the γ phenotype; and 897 of 2223 (40%) for the δ phenotype. Across all cohorts and trials, 28-day and 365-day mortality were highest among the δ phenotype vs the other 3 phenotypes (P < .001). In simulation models, the proportion of RCTs reporting benefit, harm, or no effect changed considerably (eg, varying the phenotype frequencies within an RCT of early goal-directed therapy changed the results from >33% chance of benefit to >60% chance of harm). Conclusions and Relevance: In this retrospective analysis of data sets from patients with sepsis, 4 clinical phenotypes were identified that correlated with host-response patterns and clinical outcomes, and simulations suggested these phenotypes may help in understanding heterogeneity of treatment effects. Further research is needed to determine the utility of these phenotypes in clinical care and for informing trial design and interpretation.
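Consensus k-means repeats a base clustering many times and aggregates the assignments; the base step is ordinary k-means over the standardised clinical variables. A plain single-run sketch (not the paper's full consensus procedure):

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """One k-means run: assign each patient to the nearest centroid,
    recompute centroids, repeat until stable."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(X[:, None] - centers[None], axis=-1)
        labels = d.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if (labels == j).any()
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers
```

Running this across many resamples and choices of k, then clustering the co-assignment matrix, is what makes the derived phenotypes reproducible rather than an artifact of one initialisation.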


Subject(s)
Sepsis/classification , Algorithms , Biomarkers/blood , Cluster Analysis , Datasets as Topic , Hospital Mortality , Humans , Machine Learning , Organ Dysfunction Scores , Phenotype , Reproducibility of Results , Retrospective Studies , Sepsis/mortality , Sepsis/therapy