Results 1 - 20 of 26
1.
J Am Med Inform Assoc ; 31(7): 1514-1521, 2024 Jun 20.
Article in English | MEDLINE | ID: mdl-38767857

ABSTRACT

OBJECTIVE: This study evaluates regularization variants in logistic regression (L1, L2, ElasticNet, Adaptive L1, Adaptive ElasticNet, Broken adaptive ridge [BAR], and Iterative hard thresholding [IHT]) for discrimination and calibration performance, focusing on both internal and external validation. MATERIALS AND METHODS: We use data from 5 US claims and electronic health record databases and develop models for various outcomes in a major depressive disorder patient population. We externally validate all models in the other databases. We use a train-test split of 75%/25% and evaluate performance with discrimination and calibration. Statistical analysis for difference in performance uses Friedman's test and critical difference diagrams. RESULTS: Of the 840 models we develop, L1 and ElasticNet emerge as superior in both internal and external discrimination, with a notable AUC difference. BAR and IHT show the best internal calibration, without a clear external calibration leader. ElasticNet typically has larger model sizes than L1. Methods like IHT and BAR, while slightly less discriminative, significantly reduce model complexity. CONCLUSION: L1 and ElasticNet offer the best discriminative performance in logistic regression for healthcare predictions, maintaining robustness across validations. For simpler, more interpretable models, L0-based methods (IHT and BAR) are advantageous, providing greater parsimony and calibration with fewer features. This study aids in selecting suitable regularization techniques for healthcare prediction models, balancing performance, complexity, and interpretability.
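The regularization variants compared in this study differ in how they sparsify the coefficient vector. A minimal stdlib sketch of the two mechanisms (an illustration only, not the study's Cyclops/IHT implementations): L1 shrinks every coefficient toward zero via soft-thresholding, while iterative hard thresholding (an L0-based method, like BAR) keeps only the k largest-magnitude coefficients, unshrunken.

```python
def soft_threshold(beta, lam):
    """Proximal operator of the L1 penalty: shrink each coefficient toward zero by lam."""
    return [max(abs(b) - lam, 0.0) * (1 if b > 0 else -1) for b in beta]

def hard_threshold(beta, k):
    """L0 projection used by IHT: keep the k largest-magnitude coefficients as-is."""
    keep = set(sorted(range(len(beta)), key=lambda i: -abs(beta[i]))[:k])
    return [b if i in keep else 0.0 for i, b in enumerate(beta)]

beta = [2.5, -0.3, 0.9, -1.8, 0.1]
print(hard_threshold(beta, 2))  # [2.5, 0.0, 0.0, -1.8, 0.0]
```

The hard-threshold step is why IHT and BAR produce the smaller, more parsimonious models reported above: coefficients are either dropped entirely or retained without shrinkage.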


Subject(s)
Depressive Disorder, Major , Humans , Logistic Models , Electronic Health Records , Linear Models , Databases, Factual , United States
2.
Stud Health Technol Inform ; 302: 129-130, 2023 May 18.
Article in English | MEDLINE | ID: mdl-37203625

ABSTRACT

We investigated a stacking ensemble method that combines multiple base learners within a database. The results on external validation across four large databases suggest a stacking ensemble could improve model transportability.
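The stacking pattern described here can be sketched in a few lines (hypothetical toy learners and equal meta-weights, purely for illustration of the structure): base learners each produce a risk prediction, and a meta-learner combines them.

```python
def stack_predict(base_models, meta_weights, x):
    """Combine base-learner risk predictions with meta-learner weights."""
    preds = [m(x) for m in base_models]
    z = sum(w * p for w, p in zip(meta_weights, preds))
    return max(0.0, min(1.0, z))  # clamp to a valid probability

# Two toy base learners; in practice the meta-weights are learned on
# out-of-fold predictions within the development database.
base = [lambda x: 0.2 + 0.1 * x, lambda x: 0.4]
print(stack_predict(base, [0.5, 0.5], 1.0))  # about 0.35
```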


Subject(s)
Databases, Factual
3.
Stud Health Technol Inform ; 302: 139-140, 2023 May 18.
Article in English | MEDLINE | ID: mdl-37203630

ABSTRACT

The Deposit, Evaluate and Lookup Predictive Healthcare Information (DELPHI) library provides a centralised location for depositing, exploring, and analysing patient-level prediction models that are compatible with data mapped to the Observational Medical Outcomes Partnership (OMOP) common data model.

4.
BMC Med Res Methodol ; 22(1): 311, 2022 12 05.
Article in English | MEDLINE | ID: mdl-36471238

ABSTRACT

BACKGROUND: Many dementia prediction models have been developed, but only a few have been externally validated, which hinders clinical uptake and may pose a risk if such models are nonetheless applied to actual patients. Externally validating an existing prediction model is a difficult task in which we mostly rely on the completeness of model reporting in the published article. In this study, we aim to externally validate existing dementia prediction models. To that end, we define model reporting criteria, review published studies, and externally validate three well-reported models using routinely collected health data from administrative claims and electronic health records. METHODS: We identified dementia prediction models developed between 2011 and 2020 and assessed whether they could be externally validated given a set of model criteria. In addition, we externally validated three of these models (Walters' Dementia Risk Score, Mehta's RxDx-Dementia Risk Index, and Nori's ADRD dementia prediction model) on a network of six observational health databases from the United States, United Kingdom, Germany, and the Netherlands, including the original development databases of the models. RESULTS: We reviewed 59 dementia prediction models. All models reported the prediction method, development database, and target and outcome definitions. Less frequently reported were predictor definitions (52 models), the time window in which a predictor is assessed (21 models), predictor coefficients (20 models), and the time-at-risk (42 models). The validation of the Walters model (development c-statistic: 0.84) showed moderate transportability (c-statistic 0.67-0.76). The Mehta model (development c-statistic: 0.81) transported well to some of the external databases (c-statistic 0.69-0.79). The Nori model (development AUROC: 0.69) transported well (AUROC 0.62-0.68) but performed modestly overall.
Recalibration showed improvements for the Walters and Nori models, while recalibration could not be assessed for the Mehta model due to an unreported baseline hazard. CONCLUSION: We observed that reporting is mostly insufficient to fully externally validate published dementia prediction models; it is therefore uncertain how well these models would work in other clinical settings. We emphasize the importance of following established guidelines for reporting clinical prediction models, and we recommend that reporting be more explicit and keep external validation in mind if a model is meant to be applied in different settings.
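The transportability comparisons above all rest on the c-statistic: the probability that a randomly chosen case is ranked above a randomly chosen non-case. A minimal stdlib implementation of the pairwise definition (a sketch for small validation sets; real pipelines use rank-based O(n log n) versions):

```python
def c_statistic(risks, outcomes):
    """Pairwise concordance between predicted risks and binary outcomes."""
    cases = [r for r, y in zip(risks, outcomes) if y == 1]
    controls = [r for r, y in zip(risks, outcomes) if y == 0]
    concordant = sum((c > d) + 0.5 * (c == d) for c in cases for d in controls)
    return concordant / (len(cases) * len(controls))

# Perfect separation gives 1.0; uninformative ranking hovers near 0.5.
assert c_statistic([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]) == 1.0
assert c_statistic([0.5, 0.5], [1, 0]) == 0.5
```

External validation recomputes exactly this quantity on a database the model never saw, which is why development and external c-statistics can diverge as reported above.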


Subject(s)
Dementia , Humans , United Kingdom , Risk Factors , Dementia/diagnosis , Dementia/epidemiology , Netherlands/epidemiology , Germany , Prognosis
5.
BMC Pregnancy Childbirth ; 22(1): 442, 2022 May 26.
Article in English | MEDLINE | ID: mdl-35619056

ABSTRACT

BACKGROUND: Perinatal depression is estimated to affect ~12% of pregnancies and is linked to numerous negative outcomes. There is currently no model to predict perinatal depression at multiple time points during and after pregnancy using variables ascertained early in pregnancy. METHODS: We used a prospective cohort design in which 858 participants completed a baseline self-report survey at weeks 4-10 of pregnancy (covering socioeconomic factors, health history, and various psychiatric measures), with follow-up until 3 months after delivery. Our primary outcome was an Edinburgh Postnatal Depression Scale (EPDS) score of 12 or more (a proxy for perinatal depression), assessed during each trimester and again at two time periods after delivery. Five gradient boosting machines were trained to predict the risk of an EPDS score >= 12 at each of the five follow-up periods. The predictors consisted of 21 variables from 3 validated psychometric scales. As a sensitivity analysis, we also investigated different predictor sets containing: i) 17 of the 21 predictors, obtained by including only two of the psychometric scales, and ii) 143 additional socioeconomic and health-history predictors, for a total of 164 predictors. RESULTS: We developed five prognostic models: PND-T1 (trimester 1), PND-T2 (trimester 2), PND-T3 (trimester 3), PND-A1 (after delivery 1) and PND-A2 (delayed onset after delivery) that calculate personalised risks while requiring only that women answer 21 questions from 3 validated psychometric scales at weeks 4-10 of pregnancy. C-statistics (also known as AUC) ranged between 0.69 (95% CI 0.65-0.73) and 0.77 (95% CI 0.74-0.80). At 50% sensitivity, the positive predictive value ranged between 30% and 50% across the models, generally identifying groups of patients with double the average risk. Models trained using the 17 predictors or the 164 predictors did not improve performance compared to the models trained using 21 predictors.
CONCLUSIONS: The five models can predict risk of perinatal depression within each trimester and in two post-natal periods using survey responses as early as week 4 of pregnancy with modest performance. The models need to be externally validated and prospectively tested to ensure generalizability to any pregnant patient.
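Reporting PPV at a fixed 50% sensitivity means choosing the risk threshold that captures half of the true cases and then asking what fraction of flagged patients are cases. A stdlib sketch on toy data (not the study's cohort):

```python
def ppv_at_sensitivity(risks, outcomes, target_sens):
    """Walk down the ranked patients until the target sensitivity is hit,
    then return the positive predictive value at that threshold."""
    pairs = sorted(zip(risks, outcomes), reverse=True)
    n_cases = sum(outcomes)
    tp = fp = 0
    for r, y in pairs:
        tp += y
        fp += 1 - y
        if tp / n_cases >= target_sens:
            return tp / (tp + fp)

risks = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
outcomes = [1, 0, 1, 1, 0, 1]
print(ppv_at_sensitivity(risks, outcomes, 0.5))  # 2 of 3 flagged are cases
```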


Subject(s)
Depression, Postpartum , Depressive Disorder , Depression/diagnosis , Depression/psychology , Depression, Postpartum/psychology , Female , Humans , Patient Reported Outcome Measures , Pregnancy , Prospective Studies
6.
Drug Saf ; 45(5): 493-510, 2022 05.
Article in English | MEDLINE | ID: mdl-35579813

ABSTRACT

Increasing availability of electronic health databases capturing real-world experiences with medical products has garnered much interest in their use for pharmacoepidemiologic and pharmacovigilance studies. The traditional practice of having numerous groups use single databases to accomplish similar tasks and address common questions about medical products can be made more efficient through well-coordinated multi-database studies, greatly facilitated through distributed data network (DDN) architectures. Access to larger amounts of electronic health data within DDNs has created a growing interest in using data-adaptive machine learning (ML) techniques that can automatically model complex associations in high-dimensional data with minimal human guidance. However, the siloed storage and diverse nature of the databases in DDNs create unique challenges for using ML. In this paper, we discuss opportunities, challenges, and considerations for applying ML in DDNs for pharmacoepidemiologic and pharmacovigilance studies. We first discuss major types of activities performed by DDNs and how ML may be used. Next, we discuss practical data-related factors influencing how DDNs work in practice. We then combine these discussions and jointly consider how opportunities for ML are affected by practical data-related factors for DDNs, leading to several challenges. We present different approaches for addressing these challenges and highlight efforts that real-world DDNs have taken or are currently taking to help mitigate them. Despite these challenges, the time is ripe for the emerging interest to use ML in DDNs, and the utility of these data-adaptive modeling techniques in pharmacoepidemiologic and pharmacovigilance studies will likely continue to increase in the coming years.


Subject(s)
Machine Learning , Pharmacovigilance , Databases, Factual , Humans , Pharmacoepidemiology
7.
Drug Saf ; 45(5): 563-570, 2022 05.
Article in English | MEDLINE | ID: mdl-35579818

ABSTRACT

INTRODUCTION: External validation of prediction models is increasingly seen as a minimum requirement for acceptance in clinical practice. However, the lack of interoperability of healthcare databases has been the biggest barrier to this occurring on a large scale. Recent improvements in database interoperability enable a standardized analytical framework for model development and external validation. External validation of a model in a new database lacks context unless it can be compared with a benchmark in that database. Iterative pairwise external validation (IPEV) is a framework that uses a rotating model development and validation approach to contextualize the assessment of performance across a network of databases. As a use case, we predicted 1-year risk of heart failure in patients with type 2 diabetes mellitus. METHODS: The method follows a two-step process: (1) development of baseline and data-driven models in each database according to best practices and (2) validation of these models across the remaining databases. We introduce a heatmap visualization that supports the assessment of internal and external model performance in all available databases. For the use case, we developed and validated models to predict 1-year risk of heart failure in patients initiating a second pharmacological intervention for type 2 diabetes mellitus. We leveraged the Observational Medical Outcomes Partnership common data model to create an open-source software package that increases the consistency, speed, and transparency of this process. RESULTS: A total of 403,187 patients from five databases were included in the study. We developed five models that, when assessed internally, had a discriminative performance ranging from 0.73 to 0.81 area under the receiver operating characteristic curve, with acceptable calibration.
When we externally validated these models in a new database, three models achieved consistent performance and, in context, often performed similarly to models developed in the database itself. The IPEV visualization provided valuable insights; from it, we identified the model developed in the Commercial Claims and Encounters (CCAE) database as the best-performing model overall. CONCLUSION: Using IPEV lends weight to the model development process. Rotating development through multiple databases provides context to model assessment, leading to improved understanding of transportability and generalizability. Including a baseline model in all modelling steps provides further context for the performance gains of increasing model complexity. The CCAE model was identified as a candidate for clinical use. The use case demonstrates that, in a new era of standardised data and analytics, IPEV provides a substantial opportunity to improve insight into, and trust in, prediction models at an unprecedented scale.
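IPEV's rotation reduces to a nested loop: develop a model in each database, validate it in every database, and collect the results as a matrix whose diagonal holds internal performance; that matrix is what the heatmap visualizes. A sketch with hypothetical `develop`/`evaluate` stand-ins (toy functions, not the actual IPEV package):

```python
def ipev_matrix(databases, develop, evaluate):
    """Rotate development across databases; validate each model everywhere."""
    models = {name: develop(data) for name, data in databases.items()}
    return {dev: {val: evaluate(models[dev], databases[val])
                  for val in databases}
            for dev in databases}

# Toy stand-ins: a "model" is its training data's mean, and the "score" is
# closeness of that mean to the validation data's mean.
dbs = {"A": [1, 2, 3], "B": [2, 4, 6]}
m = ipev_matrix(dbs,
                develop=lambda d: sum(d) / len(d),
                evaluate=lambda mod, d: 1 - abs(mod - sum(d) / len(d)) / 10)
print(m["A"]["A"], m["A"]["B"])  # diagonal = internal performance
```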


Subject(s)
Diabetes Mellitus, Type 2 , Heart Failure , Databases, Factual , Diabetes Mellitus, Type 2/epidemiology , Heart Failure/epidemiology , Humans , Software
8.
Int J Med Inform ; 163: 104762, 2022 07.
Article in English | MEDLINE | ID: mdl-35429722

ABSTRACT

OBJECTIVE: Provide guidance on sample size considerations for developing predictive models by empirically establishing the adequate sample size, which balances the competing objectives of improving model performance and reducing model complexity as well as computational requirements. MATERIALS AND METHODS: We empirically assess the effect of sample size on prediction performance and model complexity by generating learning curves for 81 prediction problems (23 outcomes predicted in a depression cohort, 58 outcomes predicted in a hypertension cohort) in three large observational health databases, requiring training of 17,248 prediction models. The adequate sample size was defined as the sample size for which the performance of a model equalled the maximum model performance minus a small threshold value. RESULTS: The adequate sample size achieves a median reduction in the number of observations of 9.5%, 37.3%, 58.5%, and 78.5% for the thresholds of 0.001, 0.005, 0.01, and 0.02, respectively. The median reduction in the number of predictors in the models was 8.6%, 32.2%, 48.2%, and 68.3% for the same thresholds, respectively. DISCUSSION: Based on our results, a conservative yet significant reduction in sample size and model complexity can be estimated for future prediction work. However, if a researcher is willing to generate a learning curve, a much larger reduction in model complexity may be possible, as suggested by the large outcome-dependent variability. CONCLUSION: Our results suggest that in most cases only a fraction of the available data was sufficient to produce a model close to the performance of one developed on the full data set, but with a substantially reduced model complexity.
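The "adequate sample size" definition above is a simple rule over a learning curve: the smallest n whose performance is within a threshold of the curve's maximum. Given (sample_size, performance) points, it is a one-liner (toy curve values for illustration):

```python
def adequate_sample_size(curve, threshold):
    """curve: list of (n, performance); smallest n within threshold of the max."""
    best = max(perf for _, perf in curve)
    return min(n for n, perf in curve if perf >= best - threshold)

curve = [(1000, 0.70), (5000, 0.74), (20000, 0.755), (80000, 0.76)]
print(adequate_sample_size(curve, 0.01))   # 20000: within 0.01 of the 0.76 maximum
print(adequate_sample_size(curve, 0.001))  # 80000: only the full curve qualifies
```

Looser thresholds admit smaller samples, which is exactly the pattern in the reported reductions (9.5% of observations saved at 0.001 versus 78.5% at 0.02).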


Subject(s)
Logistic Models , Cohort Studies , Humans , Sample Size
9.
BMC Med Res Methodol ; 22(1): 35, 2022 01 30.
Article in English | MEDLINE | ID: mdl-35094685

ABSTRACT

BACKGROUND: We investigated whether we could use influenza data to develop prediction models for COVID-19 to increase the speed at which prediction models can reliably be developed and validated early in a pandemic. We developed COVID-19 Estimated Risk (COVER) scores that quantify a patient's risk of hospital admission with pneumonia (COVER-H), hospitalization with pneumonia requiring intensive services or death (COVER-I), or fatality (COVER-F) in the 30 days following COVID-19 diagnosis, using historical data from patients with influenza or flu-like symptoms, and tested this in COVID-19 patients. METHODS: We analyzed a federated network of electronic medical records and administrative claims data from 14 data sources and 6 countries containing data collected on or before 4/27/2020. We used a 2-step process to develop 3 scores using historical data from patients with influenza or flu-like symptoms any time prior to 2020. The first step was to create a data-driven model using LASSO regularized logistic regression, the covariates of which were used to develop aggregate covariates for the second step, where the COVER scores were developed using a smaller set of features. These 3 COVER scores were then externally validated on patients with 1) influenza or flu-like symptoms and 2) confirmed or suspected COVID-19 diagnosis across 5 databases from South Korea, Spain, and the United States. Outcomes included i) hospitalization with pneumonia, ii) hospitalization with pneumonia requiring intensive services or death, and iii) death in the 30 days after the index date. RESULTS: Overall, 44,507 COVID-19 patients were included for model validation. We identified 7 predictors (history of cancer, chronic obstructive pulmonary disease, diabetes, heart disease, hypertension, hyperlipidemia, kidney disease) which, combined with age and sex, discriminated which patients would experience any of our three outcomes. The models achieved good performance in influenza and COVID-19 cohorts.
For COVID-19, the AUC ranges were COVER-H: 0.69-0.81, COVER-I: 0.73-0.91, and COVER-F: 0.72-0.90. Calibration varied across the validations, with some of the COVID-19 validations being less well calibrated than the influenza validations. CONCLUSIONS: This research demonstrated the utility of using a proxy disease to develop a prediction model. The three 9-predictor COVER models that were developed using influenza data perform well for COVID-19 patients in predicting hospitalization, intensive services, and fatality. The scores showed good discriminatory performance which transferred well to the COVID-19 population. There was some miscalibration in the COVID-19 validations, potentially due to the difference in symptom severity between the two diseases. A possible solution is to recalibrate the models in each location before use.
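The second step of the 2-step process collapses a LASSO model into a small additive score over a handful of predictors plus age and sex. A hedged sketch of how such a score is applied (the weights below are made up for illustration; they are not the published COVER coefficients):

```python
def cover_style_score(patient, weights, intercept=0.0):
    """Sum the point weights of the predictors a patient has."""
    return intercept + sum(w for name, w in weights.items()
                           if patient.get(name, 0))

# Hypothetical integer point weights over the 9 reported predictors.
weights = {"cancer": 1, "copd": 2, "diabetes": 1, "heart_disease": 2,
           "hypertension": 1, "hyperlipidemia": 1, "kidney_disease": 2,
           "age_over_65": 3, "male": 1}
patient = {"diabetes": 1, "hypertension": 1, "age_over_65": 1}
print(cover_style_score(patient, weights))  # 1 + 1 + 3 = 5
```

Recalibration before local use, as the conclusion suggests, would adjust the mapping from this score to absolute risk without changing the ranking.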


Subject(s)
COVID-19 , Influenza, Human , Pneumonia , COVID-19 Testing , Humans , Influenza, Human/epidemiology , SARS-CoV-2 , United States
10.
Knee Surg Sports Traumatol Arthrosc ; 30(9): 3068-3075, 2022 Sep.
Article in English | MEDLINE | ID: mdl-34870731

ABSTRACT

PURPOSE: The purpose of this study was to develop and validate a prediction model for 90-day mortality following a total knee replacement (TKR). TKR is a safe and cost-effective surgical procedure for treating severe knee osteoarthritis (OA). Although complications following surgery are rare, prediction tools could help identify high-risk patients who could be targeted with preventative interventions. The aim was to develop and validate a simple model to help inform treatment choices. METHODS: A mortality prediction model for knee OA patients following TKR was developed and externally validated using a US claims database and a UK general practice database. The target population consisted of patients undergoing a primary TKR for knee OA, aged ≥ 40 years and registered for ≥ 1 year before surgery. LASSO logistic regression models were developed for post-operative (90-day) mortality. A second mortality model was developed with a reduced feature set to increase interpretability and usability. RESULTS: A total of 193,615 patients were included, with 40,950 in The Health Improvement Network (THIN) database and 152,665 in Optum. The full model predicting 90-day mortality yielded an AUROC of 0.78 when trained in Optum and 0.70 when externally validated on THIN. The 12-variable model achieved an internal AUROC of 0.77 and an external AUROC of 0.71 in THIN. CONCLUSIONS: A simple prediction model based on sex, age, and 10 comorbidities that can identify patients at high risk of short-term mortality following TKR was developed and demonstrated good, robust performance. The 12-feature mortality model is easily implemented, and its performance suggests it could be used to inform evidence-based shared decision-making prior to surgery and to target prophylaxis for those at high risk. LEVEL OF EVIDENCE: III.


Subject(s)
Arthroplasty, Replacement, Knee , Osteoarthritis, Knee , Child , Databases, Factual , Humans
11.
BMJ Open ; 11(12): e050146, 2021 12 24.
Article in English | MEDLINE | ID: mdl-34952871

ABSTRACT

OBJECTIVE: The internal validation of prediction models aims to quantify the generalisability of a model. We aim to determine the impact, if any, that the choice of development and internal validation design has on internal performance bias and model generalisability in big data (n~500 000). DESIGN: Retrospective cohort. SETTING: Primary and secondary care; three US claims databases. PARTICIPANTS: 1 200 769 patients pharmaceutically treated for their first occurrence of depression. METHODS: We investigated the impact of the development/validation design across 21 real-world prediction questions. Model discrimination and calibration were assessed. We trained LASSO logistic regression models using US claims data and internally validated the models using eight different designs: 'no test/validation set', 'test/validation set', and 3-fold, 5-fold, or 10-fold cross-validation with and without a test set. We then externally validated each model in two new US claims databases. We estimated the internal validation bias per design by empirically comparing the differences between the estimated internal performance and the external performance. RESULTS: The differences between the models' internal estimated performances and external performances were largest for the 'no test/validation set' design, indicating that even with large data this design causes models to overfit. The seven alternative designs included some validation process to select the hyperparameters and a fair testing process to estimate internal performance. These designs had similar internal performance estimates and performed similarly when externally validated in the two external databases. CONCLUSIONS: Even with big data, it is important to use some validation process to select the optimal hyperparameters and to fairly assess internal performance using a test set or cross-validation.
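The eight designs differ only in whether a held-out test set exists and how the remaining data are folded for hyperparameter selection. A stdlib sketch of carving one cohort into an optional test set plus k cross-validation folds (an illustration of the splitting logic, not the study's pipeline):

```python
import random

def split_design(ids, test_fraction, k, seed=0):
    """Shuffle patient ids, hold out a test set, and fold the rest k ways."""
    rng = random.Random(seed)
    ids = ids[:]
    rng.shuffle(ids)
    n_test = int(len(ids) * test_fraction)
    test, rest = ids[:n_test], ids[n_test:]
    folds = [rest[i::k] for i in range(k)]
    return test, folds

# 'test/validation set' design with 3-fold CV on the remainder:
test, folds = split_design(list(range(100)), test_fraction=0.25, k=3)
print(len(test), [len(f) for f in folds])  # 25 [25, 25, 25]
```

The 'no test/validation set' design corresponds to `test_fraction=0`, which is exactly the configuration the results identify as overfitting-prone: internal performance is then estimated on data the hyperparameters were tuned on.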


Subject(s)
Delivery of Health Care , Bias , Humans , Logistic Models , Prognosis , Retrospective Studies
12.
Transl Psychiatry ; 11(1): 642, 2021 12 20.
Article in English | MEDLINE | ID: mdl-34930903

ABSTRACT

Many patients with bipolar disorder (BD) are initially misdiagnosed with major depressive disorder (MDD) and are treated with antidepressants, whose potential iatrogenic effects are widely discussed. It is unknown whether MDD is a comorbidity of BD or its earlier stage, and no consensus exists on individual conversion predictors, delaying BD's timely recognition and treatment. We aimed to build a predictive model of MDD-to-BD conversion and to validate it across a multi-national network of patient databases using the standardization afforded by the Observational Medical Outcomes Partnership (OMOP) common data model. Five "training" US databases were retrospectively analyzed: IBM MarketScan CCAE, MDCR, MDCD, Optum EHR, and Optum Claims. Cyclops regularized logistic regression models were developed for one-year MDD-to-BD conversion with all standard covariates from the HADES PatientLevelPrediction package. Time-to-conversion Kaplan-Meier analysis was performed up to a decade after MDD, stratified by model-estimated risk. External validation of the final prediction model was performed across 9 patient record databases within the Observational Health Data Sciences and Informatics (OHDSI) network internationally. The model's area under the curve (AUC) varied from 0.633 to 0.745 (µ = 0.689) across the five US training databases. Nine variables predicted one-year MDD-to-BD transition. Factors that increased risk were younger age, severe depression, psychosis, anxiety, substance misuse, self-harm thoughts/actions, and prior mental disorder. AUCs of the validation datasets ranged from 0.570 to 0.785 (µ = 0.664). An assessment algorithm was built for MDD-to-BD conversion that allows distinguishing as much as 100-fold risk differences among patients and validates well across multiple international data sources.
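The time-to-conversion analysis uses Kaplan-Meier curves stratified by model-estimated risk. A minimal stdlib Kaplan-Meier estimator on toy data (ties at the same time are handled sequentially here, which is adequate for a sketch but not for production survival analysis):

```python
def kaplan_meier(times, events):
    """times: follow-up times; events: 1 = converted, 0 = censored.
    Returns (time, survival) steps at each conversion time."""
    order = sorted(zip(times, events))
    at_risk, surv, steps = len(order), 1.0, []
    for t, e in order:
        if e:
            surv *= 1 - 1 / at_risk  # proportion surviving this event time
            steps.append((t, surv))
        at_risk -= 1  # censored patients leave the risk set silently
    return steps

print(kaplan_meier([1, 2, 3, 4], [1, 0, 1, 0]))  # [(1, 0.75), (3, 0.375)]
```

Stratifying by risk means computing one such curve per risk group (e.g. model-score deciles) and comparing their separation over the decade of follow-up.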


Subject(s)
Bipolar Disorder , Depressive Disorder, Major , Psychotic Disorders , Antidepressive Agents , Bipolar Disorder/complications , Bipolar Disorder/diagnosis , Bipolar Disorder/epidemiology , Depressive Disorder, Major/complications , Depressive Disorder, Major/diagnosis , Depressive Disorder, Major/epidemiology , Humans , Retrospective Studies
13.
Comput Methods Programs Biomed ; 211: 106394, 2021 Nov.
Article in English | MEDLINE | ID: mdl-34560604

ABSTRACT

BACKGROUND AND OBJECTIVE: As a response to the ongoing COVID-19 pandemic, several prediction models in the existing literature were rapidly developed, with the aim of providing evidence-based guidance. However, none of these COVID-19 prediction models have been found to be reliable. Models are commonly assessed to have a risk of bias, often due to insufficient reporting, use of non-representative data, and lack of large-scale external validation. In this paper, we present the Observational Health Data Sciences and Informatics (OHDSI) analytics pipeline for patient-level prediction modeling as a standardized approach for rapid yet reliable development and validation of prediction models. We demonstrate how our analytics pipeline and open-source software tools can be used to answer important prediction questions while limiting potential causes of bias (e.g., by validating phenotypes, specifying the target population, performing large-scale external validation, and publicly providing all analytical source code). METHODS: We show step-by-step how to implement the analytics pipeline for the question: 'In patients hospitalized with COVID-19, what is the risk of death 0 to 30 days after hospitalization?'. We develop models using six different machine learning methods in a USA claims database containing over 20,000 COVID-19 hospitalizations and externally validate the models using data containing over 45,000 COVID-19 hospitalizations from South Korea, Spain, and the USA. RESULTS: Our open-source software tools enabled us to efficiently go end-to-end from problem design to reliable model development and evaluation. When predicting death in patients hospitalized with COVID-19, AdaBoost, random forest, gradient boosting machine, and decision tree yielded similar or lower internal and external validation discrimination performance compared to L1-regularized logistic regression, whereas the MLP neural network consistently resulted in lower discrimination.
L1-regularized logistic regression models were well calibrated. CONCLUSION: Our results show that following the OHDSI analytics pipeline for patient-level prediction modelling can enable the rapid development of reliable prediction models. The OHDSI software tools and pipeline are open source and available to researchers around the world.


Subject(s)
COVID-19 , Pandemics , Humans , Logistic Models , Machine Learning , SARS-CoV-2
14.
BMC Med Res Methodol ; 21(1): 180, 2021 08 28.
Article in English | MEDLINE | ID: mdl-34454423

ABSTRACT

BACKGROUND: The goal of our study is to examine the impact of lookback length when engineering features for developing predictive models using observational healthcare data. Using a longer lookback for feature engineering gives more insight about patients but increases the issue of left-censoring. METHODS: We used five US observational databases to develop patient-level prediction models. A target cohort of subjects with hypertensive drug exposures and outcome cohorts of subjects with acute outcomes (stroke and gastrointestinal bleeding) and chronic outcomes (diabetes and chronic kidney disease) were developed. Candidate predictors existing on or prior to the target index date were derived within the following lookback periods: 14, 30, 90, 180, 365, and 730 days, and all days prior to index. We predicted the risk of the outcomes occurring from 1 day until 365 days after index. Ten LASSO logistic models for each lookback period were generated to create a distribution of area under the curve (AUC) metrics for evaluating the discriminative performance of the models. Calibration intercept and slope were also calculated. The impact on external validation performance was investigated across the five databases. RESULTS: The maximum difference in AUC between models developed using different lookback periods within a database was < 0.04 for diabetes (in MDCR, AUC of 0.593 with a 14-day lookback vs. 0.631 with an all-time lookback) and 0.012 for renal impairment (in MDCR, AUC of 0.675 with a 30-day lookback vs. 0.687 with a 365-day lookback). For the acute outcomes, the maximum difference in AUC across lookbacks within a database was 0.015 for stroke (in MDCD, AUC of 0.767 with a 14-day lookback vs. 0.782 with a 365-day lookback) and < 0.03 for gastrointestinal bleeding (in CCAE, AUC of 0.631 with a 14-day lookback vs. 0.660 with a 730-day lookback).
CONCLUSIONS: In general, the choice of covariate lookback had only a small impact on discrimination and calibration, with a short lookback (< 180 days) occasionally decreasing discrimination. Based on these results, if training a logistic regression model for prediction, a 365-day covariate lookback appears to be a good trade-off between performance and interpretability.
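Deriving candidate predictors within a lookback period means keeping only the events recorded in the chosen number of days on or before the index date. A stdlib sketch with a hypothetical (event_date, concept) record layout for illustration:

```python
from datetime import date, timedelta

def features_in_lookback(events, index_date, lookback_days):
    """events: list of (event_date, concept); returns concepts in the window."""
    start = index_date - timedelta(days=lookback_days)
    return {c for d, c in events if start <= d <= index_date}

events = [(date(2020, 1, 1), "diabetes"),
          (date(2020, 6, 1), "stroke"),
          (date(2020, 6, 20), "hypertension")]
print(features_in_lookback(events, date(2020, 6, 30), 30))
```

With a 30-day lookback only the June events survive; an all-time lookback (`lookback_days` large enough to cover the record) would also pick up the January diagnosis, which is the left-censoring trade-off the study quantifies.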


Subject(s)
Stroke , Area Under Curve , Databases, Factual , Humans , Logistic Models , Time
15.
JMIR Med Inform ; 9(4): e21547, 2021 Apr 05.
Article in English | MEDLINE | ID: mdl-33661754

ABSTRACT

BACKGROUND: SARS-CoV-2 is straining health care systems globally. The burden on hospitals during the pandemic could be reduced by implementing prediction models that can discriminate patients who require hospitalization from those who do not. The COVID-19 vulnerability (C-19) index, a model that predicts which patients will be admitted to hospital for treatment of pneumonia or pneumonia proxies, has been developed and proposed as a valuable tool for decision-making during the pandemic. However, the model is at high risk of bias according to the "prediction model risk of bias assessment" criteria, and it has not been externally validated. OBJECTIVE: The aim of this study was to externally validate the C-19 index across a range of health care settings to determine how well it broadly predicts hospitalization due to pneumonia in COVID-19 cases. METHODS: We followed the Observational Health Data Sciences and Informatics (OHDSI) framework for external validation to assess the reliability of the C-19 index. We evaluated the model on two different target populations, 41,381 patients who presented with SARS-CoV-2 at an outpatient or emergency department visit and 9,429,285 patients who presented with influenza or related symptoms during an outpatient or emergency department visit, to predict their risk of hospitalization with pneumonia during the following 0-30 days. In total, we validated the model across a network of 14 databases spanning the United States, Europe, Australia, and Asia. RESULTS: The internal validation performance of the C-19 index had a C statistic of 0.73, and the calibration was not reported by the authors. When we externally validated it by transporting it to SARS-CoV-2 data, the model obtained C statistics of 0.36, 0.53 (0.473-0.584) and 0.56 (0.488-0.636) on Spanish, US, and South Korean data sets, respectively. The calibration was poor, with the model underestimating risk. 
When validated on 12 data sets containing influenza patients across the OHDSI network, the C statistics ranged between 0.40 and 0.68. CONCLUSIONS: Our results show that the discriminative performance of the C-19 index model is low for influenza cohorts and even worse among patients with COVID-19 in the United States, Spain, and South Korea. These results suggest that C-19 should not be used to aid decision-making during the COVID-19 pandemic. Our findings highlight the importance of performing external validation across a range of settings, especially when a prediction model is being extrapolated to a different population. In the field of prediction, extensive validation is required to create appropriate trust in a model.

16.
BMC Med Inform Decis Mak ; 21(1): 43, 2021 02 06.
Article in English | MEDLINE | ID: mdl-33549087

ABSTRACT

BACKGROUND: Researchers developing prediction models face numerous design choices that may impact model performance. One key decision is how to include patients who are lost to follow-up. In this paper we perform a large-scale empirical evaluation of the impact of this decision and aim to provide guidelines for dealing with loss to follow-up. METHODS: We generate a partially synthetic dataset with complete follow-up and simulate loss to follow-up either at random or based on comorbidity. In addition to our synthetic data study, we investigate 21 real-world data prediction problems. We compare four simple strategies for developing models when using a cohort design that encounters loss to follow-up. Three strategies employ a binary classifier with data that: (1) include all patients (including those lost to follow-up), (2) exclude all patients lost to follow-up, or (3) exclude only those patients lost to follow-up who do not have the outcome before being lost to follow-up. The fourth strategy uses a survival model with data that include all patients. We empirically evaluate discrimination and calibration performance. RESULTS: The partially synthetic data study shows that excluding patients who are lost to follow-up can introduce bias when loss to follow-up is common and does not occur at random. When loss to follow-up was completely at random, however, the choice of how to address it had negligible impact on discrimination performance. Our real-world data results showed that the four design choices resulted in comparable performance at a 1-year time-at-risk but exhibited differential bias at a 3-year time-at-risk. Removing patients who are lost to follow-up before experiencing the outcome while keeping those lost to follow-up after the outcome can bias a model and should be avoided.
CONCLUSION: Based on this study we recommend (1) developing models using data that include patients lost to follow-up and (2) evaluating the discrimination and calibration of models twice: on a test set including patients lost to follow-up and on a test set excluding them.
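The three binary-classifier cohort strategies compared in this study can be sketched as simple cohort filters. The DataFrame columns below are illustrative assumptions, not the study's data model; the fourth (survival-model) strategy is not shown, as it changes the model rather than the cohort.

```python
import pandas as pd

# Toy cohort: who was lost to follow-up, and whether the outcome was
# observed before loss (column names are invented for illustration)
df = pd.DataFrame({
    "patient_id": [1, 2, 3, 4],
    "lost_to_fu": [False, True, True, False],
    "outcome":    [1, 0, 1, 0],
})

# (1) include all patients, including those lost to follow-up
cohort_1 = df

# (2) exclude all patients lost to follow-up
cohort_2 = df[~df["lost_to_fu"]]

# (3) exclude only patients lost to follow-up WITHOUT the outcome --
#     keeping those who had the outcome before being lost, the design
#     the paper shows can bias a model and should be avoided
cohort_3 = df[~(df["lost_to_fu"] & (df["outcome"] == 0))]
```

Strategy (3) keeps patient 3 (lost, but with the outcome) while dropping patient 2, which is exactly the asymmetric selection the paper warns against.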


Subject(s)
Lost to Follow-Up , Bias , Calibration , Cohort Studies , Humans , Prognosis
17.
J Am Med Inform Assoc ; 28(6): 1098-1107, 2021 06 12.
Article in English | MEDLINE | ID: mdl-33211841

ABSTRACT

OBJECTIVE: Cause of death is used as an important outcome of clinical research; however, access to cause-of-death data is limited. This study aimed to develop and validate a machine-learning model that predicts the cause of death from the patient's last medical checkup. MATERIALS AND METHODS: To classify the mortality status and each individual cause of death, we used a stacking ensemble method. The prediction outcomes were all-cause mortality, 8 leading causes of death in South Korea, and other causes. The clinical data of study populations were extracted from the national claims (n = 174 747) and electronic health records (n = 729 065) and were used for model development and external validation. Moreover, we imputed the cause of death from the data of 3 US claims databases (n = 994 518, 995 372, and 407 604, respectively). All databases were formatted to the Observational Medical Outcomes Partnership Common Data Model. RESULTS: The generalized area under the receiver operating characteristic curve (AUROC) of the model predicting the cause of death within 60 days was 0.9511. Moreover, the AUROC of the external validation was 0.8887. Among the causes of death imputed in the Medicare Supplemental database, 11.32% of deaths were due to malignant neoplastic disease. DISCUSSION: This study showed the potential of machine-learning models as a new alternative to address the lack of access to cause-of-death data. All processes were disclosed to maintain transparency, and the model was easily applicable to other institutions. CONCLUSION: A machine-learning model with competent performance was developed to predict cause of death.
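A stacking ensemble of the kind used here layers a meta-learner over the predictions of several base classifiers. The sketch below is a generic multi-class illustration on synthetic data; the base learners and meta-learner are assumptions, not the models the study used.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a multi-class outcome (e.g. cause-of-death categories)
X, y = make_classification(n_samples=500, n_classes=3, n_informative=6,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Base learners produce out-of-fold predictions; the final estimator
# learns how to combine them
stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
                ("lr", LogisticRegression(max_iter=1000))],
    final_estimator=LogisticRegression(max_iter=1000),
)
stack.fit(X_tr, y_tr)
proba = stack.predict_proba(X_te)  # one probability column per class
```

Per-class probabilities like `proba` are what an AUROC such as the 0.9511 reported above is computed from.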


Subject(s)
Cause of Death , Machine Learning , Models, Statistical , Area Under Curve , Databases, Factual , Decision Support Systems, Clinical , Humans , Medical Records Systems, Computerized , Observation , Prognosis , ROC Curve , Republic of Korea/epidemiology , United States
18.
BMC Med Res Methodol ; 20(1): 102, 2020 05 06.
Article in English | MEDLINE | ID: mdl-32375693

ABSTRACT

BACKGROUND: To demonstrate how the Observational Health Data Sciences and Informatics (OHDSI) collaborative network and data standardization can be used to scale up external validation of patient-level prediction models, by enabling validation across a large number of heterogeneous observational healthcare datasets. METHODS: Five previously published prognostic models (ATRIA, CHADS2, CHA2DS2-VASc, Q-Stroke, and Framingham) that predict future risk of stroke in patients with atrial fibrillation were replicated using the OHDSI frameworks. A network study was run that enabled the five models to be externally validated across nine observational healthcare datasets spanning three countries and five independent sites. RESULTS: The five existing models were integrated into the OHDSI framework for patient-level prediction and obtained mean c-statistics ranging from 0.57 to 0.63 across the six databases with sufficient data to predict stroke within 1 year of initial atrial fibrillation diagnosis in females with atrial fibrillation. This was comparable with existing validation studies. Once the models were replicated, the validation network study was run across the nine datasets within 60 days. An R package for the study was published at https://github.com/OHDSI/StudyProtocolSandbox/tree/master/ExistingStrokeRiskExternalValidation. CONCLUSION: This study demonstrates the ability to scale up external validation of patient-level prediction models through a collaboration of researchers and a data standardization that enables models to be readily shared across data sites. External validation is necessary to understand the transportability and reproducibility of a prediction model, but without collaborative approaches it can take three or more years for a model to be validated by even one independent researcher.
In this paper we show it is possible to both scale up and speed up external validation, demonstrating that validation can be done across multiple databases in less than 2 months. We recommend that researchers developing new prediction models use the OHDSI network to externally validate their models.
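Once models and data sites share a common data standard, external validation across a network reduces to applying one fitted model to each site's data and collecting the discrimination metric. The sketch below uses synthetic stand-ins for the development set and external databases; all names are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def external_validate(model, databases):
    """Return {name: AUC} for each external (X, y) data set."""
    return {name: roc_auc_score(y, model.predict_proba(X)[:, 1])
            for name, (X, y) in databases.items()}

# Toy stand-ins: one development set and two "external databases" drawn
# from different distributions, mimicking heterogeneous sites
X_dev, y_dev = make_classification(n_samples=400, random_state=0)
X_a, y_a = make_classification(n_samples=200, random_state=1)
X_b, y_b = make_classification(n_samples=200, random_state=2)

model = LogisticRegression(max_iter=1000).fit(X_dev, y_dev)
aucs = external_validate(model, {"db_a": (X_a, y_a), "db_b": (X_b, y_b)})
```

Because the toy "sites" are drawn from different distributions, the external AUCs will typically fall well below the development performance, which is the transportability loss that studies like this one quantify.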


Subject(s)
Atrial Fibrillation , Stroke , Atrial Fibrillation/diagnosis , Atrial Fibrillation/epidemiology , Feasibility Studies , Female , Humans , Prognosis , Reproducibility of Results , Stroke/diagnosis , Stroke/epidemiology
19.
Drug Saf ; 43(5): 447-455, 2020 05.
Article in English | MEDLINE | ID: mdl-31939079

ABSTRACT

INTRODUCTION: In observational studies with mortality endpoints, one needs to consider how to account for subjects whose interventions appear to be part of 'end-of-life' care. OBJECTIVE: The objective of this study was to develop a diagnostic predictive model to identify those in end-of-life care at the time of a drug exposure. METHODS: We used data from four administrative claims datasets from 2000 to 2017. The index date was the date of the first prescription for the last new drug subjects received during their observation period. The outcome of end-of-life care was determined by the presence of one or more codes indicating terminal or hospice care. Models were developed using regularized logistic regression. Internal validation was through examination of the area under the receiver operating characteristic curve (AUC) and through model calibration in a 25% subset of the data held back from model training. External validation was through examination of the AUC after applying the model learned on one dataset to the three other datasets. RESULTS: The models showed excellent performance characteristics. Internal validation resulted in AUCs ranging from 0.918 (95% confidence interval [CI] 0.905-0.930) to 0.983 (95% CI 0.978-0.987) for the four different datasets. Calibration results were also very good, with slopes near unity. External validation also produced very good to excellent performance metrics, with AUCs ranging from 0.840 (95% CI 0.834-0.846) to 0.956 (95% CI 0.952-0.960). CONCLUSION: These results show that developing diagnostic predictive models for determining subjects in end-of-life care at the time of a drug treatment is possible and may improve the validity of the risk profile for those treatments.
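The development-and-internal-validation design described here (regularized logistic regression, a 25% held-back subset, AUC, and a calibration slope near unity) can be sketched as follows. This is a minimal illustration on synthetic data, not the study's code; the regularization settings are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for claims data with a rare outcome (~10% prevalence)
X, y = make_classification(n_samples=2000, weights=[0.9], random_state=1)

# Hold back 25% of the data for internal validation
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=1, stratify=y)

# Regularized (L1) logistic regression
model = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
model.fit(X_tr, y_tr)

p = np.clip(model.predict_proba(X_te)[:, 1], 1e-12, 1 - 1e-12)
auc = roc_auc_score(y_te, p)

# Calibration slope: logistic refit of the outcome on the predicted
# log-odds; a slope near 1 indicates good calibration
logit = np.log(p / (1 - p)).reshape(-1, 1)
slope = LogisticRegression(C=1e6).fit(logit, y_te).coef_[0, 0]
```

The same fitted `model` applied to a different database's covariates would give the external AUCs reported above; a calibration slope well below 1 would instead signal overfitting.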


Subject(s)
Databases, Factual , Models, Theoretical , Terminal Care , Adult , Aged , Aged, 80 and over , Female , Humans , Male , Middle Aged
20.
PLoS One ; 15(1): e0226718, 2020.
Article in English | MEDLINE | ID: mdl-31910437

ABSTRACT

BACKGROUND AND PURPOSE: Hemorrhagic transformation (HT) after cerebral infarction is a complex, multifactorial phenomenon in the acute stage of ischemic stroke and often results in a poor prognosis. Identifying risk factors and predicting HT early in acute cerebral infarction therefore contributes not only to the selection of a therapeutic regimen but also, more importantly, to improving the prognosis of acute cerebral infarction. The purpose of this study was to develop and validate a model to predict a patient's risk of HT within 30 days of initial ischemic stroke. METHODS: We used a retrospective multicenter observational cohort study design to develop a Lasso logistic regression prediction model on a large US electronic health record (EHR) dataset structured to the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM). To examine clinical transportability, the model was externally validated across 10 additional real-world healthcare datasets, including EHR records for patients from America, Europe, and Asia. RESULTS: In the database in which the model was developed, the target population cohort contained 621,178 patients with ischemic stroke, of whom 5,624 had HT within 30 days following initial ischemic stroke. 612 risk predictors were identified, including the distance a patient travels in an ambulance to receive care for an HT. An area under the receiver operating characteristic curve (AUC) of 0.75 was achieved in internal validation of the risk model. External validation was performed across 10 databases totaling 5,515,508 patients with ischemic stroke, of whom 86,401 had HT within 30 days following initial ischemic stroke. The mean external AUC was 0.71, ranging from 0.60 to 0.78. CONCLUSIONS: An HT prognostic prediction model was developed with Lasso logistic regression based on routinely collected EMR data.
This model can identify patients who have a higher risk of HT than the population average, with an AUC of up to 0.78. The study shows that the OMOP CDM is an appropriate data standard for secondary use of EMR data in multicenter clinical research for prognostic prediction model development and validation. In the future, combining this model with clinical information systems could assist clinicians in making the right therapy decisions for patients with acute ischemic stroke.


Subject(s)
Brain Ischemia/complications , Cerebral Hemorrhage/diagnosis , Models, Statistical , Risk Assessment/methods , Stroke/complications , Cerebral Hemorrhage/etiology , Female , Follow-Up Studies , Humans , Male , Middle Aged , Prognosis , ROC Curve , Retrospective Studies , Risk Factors