Results 1 - 20 of 150
1.
Am J Epidemiol ; 2024 May 17.
Article in English | MEDLINE | ID: mdl-38754869

ABSTRACT

We spend a great deal of time on confounding in our teaching, in our methods development, and in our assessment of study results. This may give the impression that uncontrolled confounding is the biggest problem observational epidemiology faces, when in fact other sources of bias, such as selection bias, measurement error, missing data, and misalignment of zero time, may often lead to a stronger deviation from the truth, especially if several are present in a single study. Compared with the amount of time we spend teaching how to address confounding in a data analysis, we spend relatively little time teaching methods for simulating confounding (and other sources of bias) to learn their impact and to develop plans to mitigate or quantify the bias. We review a paper by Desai et al. that uses simulation methods to quantify the impact of an unmeasured confounder when it is completely missing or when a proxy of the confounder is measured. We use this article to discuss how simulating sources of bias can help us generate better and more valid study estimates, and we discuss the importance of simulating realistic datasets with plausible bias structures to guide data collection. If an advanced life form existed outside our current universe and came to Earth with the goal of scouring the published epidemiologic literature to understand the biggest problem epidemiologists face, it would quickly discover that the limitations sections of publications provide all the information it needs. Most likely, it would conclude that the biggest problem we face is uncontrolled confounding. It seems to be an obsession of ours.
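As a minimal illustration of the kind of bias simulation this commentary advocates, the following Python sketch uses an invented data-generating process to show how an unmeasured confounder U distorts an exposure effect estimate and how adjustment recovers the truth; all names and parameter values are assumptions, not taken from Desai et al.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 20_000

# Hypothetical data-generating process: U confounds the X -> Y relation.
u = rng.normal(size=n)                      # unmeasured confounder
x = rng.binomial(1, 1 / (1 + np.exp(-u)))   # exposure depends on U
y = 0.0 * x + 1.0 * u + rng.normal(size=n)  # true exposure effect is zero

for label, covars in [("unadjusted", [x]), ("adjusted for U", [x, u])]:
    X = sm.add_constant(np.column_stack(covars))
    fit = sm.OLS(y, X).fit()
    print(f"{label}: estimated effect of X = {fit.params[1]:.3f}")
```

The unadjusted estimate is biased away from zero; adjusting for U removes the bias, and varying the strength of U quantifies how much bias a confounder of a given magnitude could plausibly introduce.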

2.
Stat Med ; 43(6): 1119-1134, 2024 Mar 15.
Article in English | MEDLINE | ID: mdl-38189632

ABSTRACT

Tuning hyperparameters, such as the regularization parameter in Ridge or Lasso regression, is often aimed at improving the predictive performance of risk prediction models. In this study, various hyperparameter tuning procedures for clinical prediction models were systematically compared and evaluated in low-dimensional data. The focus was on out-of-sample predictive performance (discrimination, calibration, and overall prediction error) of risk prediction models developed using Ridge, Lasso, Elastic Net, or Random Forest. The influence of sample size, number of predictors and events fraction on performance of the hyperparameter tuning procedures was studied using extensive simulations. The results indicate important differences between tuning procedures in calibration performance, while generally showing similar discriminative performance. The one-standard-error rule for tuning applied to cross-validation (1SE CV) often resulted in severe miscalibration. Standard non-repeated and repeated cross-validation (both 5-fold and 10-fold) performed similarly well and outperformed the other tuning procedures. Bootstrap showed a slight tendency to more severe miscalibration than standard cross-validation-based tuning procedures. Differences between tuning procedures were larger for smaller sample sizes, lower events fractions and fewer predictors. These results imply that the choice of tuning procedure can have a profound influence on the predictive performance of prediction models. The results support the application of standard 5-fold or 10-fold cross-validation that minimizes out-of-sample prediction error. Despite an increased computational burden, we found no clear benefit of repeated over non-repeated cross-validation for hyperparameter tuning. We warn against the potentially detrimental effects on model calibration of the popular 1SE CV rule for tuning prediction models in low-dimensional settings.
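To make the contrast between tuning procedures concrete, here is a hedged scikit-learn sketch comparing the penalty chosen by minimizing cross-validated error with the one chosen by the 1SE rule; the 1SE computation is written out manually because scikit-learn does not provide it directly, and the synthetic data are purely illustrative.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV

X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=1)
model = LassoCV(cv=5, random_state=1).fit(X, y)

# mse_path_ has shape (n_alphas, n_folds): CV error per penalty and fold.
mean_mse = model.mse_path_.mean(axis=1)
se_mse = model.mse_path_.std(axis=1, ddof=1) / np.sqrt(model.mse_path_.shape[1])

i_min = mean_mse.argmin()
within_1se = np.where(mean_mse <= mean_mse[i_min] + se_mse[i_min])[0]
# 1SE rule: the most heavily penalized model within one SE of the minimum.
alpha_1se = model.alphas_[within_1se].max()

print("alpha minimizing CV error:", model.alphas_[i_min])
print("alpha from the 1SE rule:  ", alpha_1se)
```

The 1SE alpha is always at least as large (more shrinkage) as the error-minimizing alpha, which is exactly the mechanism behind the miscalibration the study warns about in low-dimensional settings.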


Subject(s)
Research Design , Humans , Computer Simulation , Sample Size
3.
J Biomed Inform ; 155: 104666, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38848886

ABSTRACT

OBJECTIVE: Class imbalance is sometimes considered a problem when developing clinical prediction models and assessing their performance. To address it, correction strategies involving manipulations of the training dataset, such as random undersampling or oversampling, are frequently used. The aim of this article is to illustrate the consequences of these class imbalance correction strategies for clinical prediction models' internal validity in terms of calibration and discrimination performance. METHODS: We used both heuristic intuition and formal mathematical reasoning to characterize the relations between the conditional probabilities of interest and the probabilities targeted when using random undersampling or oversampling. We propose a plug-in estimator that represents a natural correction for predictions obtained from models that have been trained on artificially balanced datasets ("naïve" models). We conducted a Monte Carlo simulation with two different data generation processes and present a real-world example using data from the International Stroke Trial database to empirically demonstrate the consequences of applying random resampling techniques for class imbalance correction on calibration and discrimination (in terms of the area under the ROC curve, AUC) for logistic regression and tree-based prediction models. RESULTS: Across our simulations and in the real-world example, calibration of the naïve models was very poor. The models using the plug-in estimator generally outperformed the models relying on class imbalance correction in terms of calibration while achieving the same discrimination performance. CONCLUSION: Random resampling techniques for class imbalance correction do not generally improve discrimination performance (i.e., AUC), and their use is hard to justify when the aim is to provide calibrated predictions. Improper use of such class imbalance correction techniques can lead to suboptimal data usage and less valid risk prediction models.
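The paper's plug-in estimator is described only abstractly here; the sketch below implements one standard form of prevalence (prior) correction, an odds adjustment between the artificial training prevalence and the target prevalence, which may or may not match the authors' exact estimator.

```python
import numpy as np

def correct_for_resampling(p_naive, prev_train, prev_target):
    """Map probabilities from a model trained at the artificial prevalence
    `prev_train` back to the target (true) prevalence `prev_target`
    via a likelihood-ratio (odds) adjustment."""
    p_naive = np.asarray(p_naive, dtype=float)
    odds = p_naive / (1.0 - p_naive)
    odds_adj = (odds * (prev_target / (1.0 - prev_target))
                     / (prev_train / (1.0 - prev_train)))
    return odds_adj / (1.0 + odds_adj)

# A model trained on a 50:50 balanced sample predicts 0.60,
# but the true event prevalence is 5%.
print(correct_for_resampling(0.60, prev_train=0.5, prev_target=0.05))  # ~0.073
```

Because the adjustment is a monotone transformation of the naive probabilities, it changes calibration without changing the ranking of patients, consistent with the finding that discrimination (AUC) is unaffected.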


Subject(s)
Monte Carlo Method , Humans , Calibration , ROC Curve , Models, Statistical , Area Under Curve , Computer Simulation , Logistic Models , Algorithms , Risk Assessment/methods
4.
Ann Intern Med ; 176(1): 105-114, 2023 01.
Article in English | MEDLINE | ID: mdl-36571841

ABSTRACT

Risk prediction models need thorough validation to assess their performance. Validation of models for survival outcomes poses challenges due to the censoring of observations and the varying time horizon at which predictions can be made. This article describes measures to evaluate predictions and the potential improvement in decision making from survival models based on Cox proportional hazards regression. As a motivating case study, the authors consider the prediction of the composite outcome of recurrence or death (the "event") in patients with breast cancer after surgery. They developed a simple Cox regression model with 3 predictors, as in the Nottingham Prognostic Index, in 2982 women (1275 events over 5 years of follow-up) and externally validated this model in 686 women (285 events over 5 years). Improvement in performance was assessed after the addition of progesterone receptor as a prognostic biomarker. The model predictions can be evaluated across the full range of observed follow-up times or for the event occurring by the end of a fixed time horizon of interest. The authors first discuss recommended statistical measures that evaluate model performance in terms of discrimination, calibration, or overall performance. Further, they evaluate the potential clinical utility of the model to support clinical decision making according to a net benefit measure. They provide SAS and R code to illustrate internal and external validation. The authors recommend the proposed set of performance measures for transparent reporting of the validity of predictions from survival models.
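The article itself provides SAS and R code; as a rough Python analogue of the first steps (discrimination on a validation set and fixed-horizon predicted risks), here is a hedged sketch using the lifelines library, where the Rossi recidivism dataset stands in for the breast cancer cohort and the 26-week horizon is arbitrary.

```python
from lifelines import CoxPHFitter
from lifelines.utils import concordance_index
from lifelines.datasets import load_rossi

df = load_rossi()                      # stand-in dataset: time 'week', event 'arrest'
train, test = df.iloc[:300], df.iloc[300:]

cph = CoxPHFitter().fit(train, duration_col="week", event_col="arrest")

# Discrimination: Harrell's C on the validation set.
# Higher partial hazard means higher risk, so negate it for concordance_index,
# which expects higher scores to mean longer survival.
risk = cph.predict_partial_hazard(test)
c = concordance_index(test["week"], -risk, test["arrest"])
print(f"validation C-index: {c:.3f}")

# Calibration at a fixed horizon: mean predicted event risk by t = 26 weeks,
# to be compared against the observed (censoring-adjusted) event proportion.
surv_26 = cph.predict_survival_function(test, times=[26]).iloc[0]
print(f"mean predicted 26-week event risk: {(1 - surv_26).mean():.3f}")
```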


Subject(s)
Breast Neoplasms , Humans , Female , Proportional Hazards Models , Prognosis
5.
Eur Heart J ; 44(46): 4831-4834, 2023 Dec 07.
Article in English | MEDLINE | ID: mdl-37897346

ABSTRACT

To raise the quality of clinical artificial intelligence (AI) prediction modelling studies in the cardiovascular health domain and thereby improve their impact and relevance, the editors for digital health, innovation, and quality standards of the European Heart Journal propose five minimal quality criteria for AI-based prediction model development and validation studies: complete reporting, a carefully defined intended use of the model, rigorous validation, a large enough sample size, and openness of code and software.


Subject(s)
Artificial Intelligence , Software , Humans , Heart
6.
Eur Heart J ; 44(32): 3073-3081, 2023 08 22.
Article in English | MEDLINE | ID: mdl-37452732

ABSTRACT

AIMS: Risk stratification is used for decisions regarding need for imaging in patients with clinically suspected acute pulmonary embolism (PE). The aim was to develop a clinical prediction model that provides an individualized, accurate probability estimate for the presence of acute PE in patients with suspected disease based on readily available clinical items and D-dimer concentrations. METHODS AND RESULTS: An individual patient data meta-analysis was performed based on sixteen cross-sectional or prospective studies with data from 28 305 adult patients with clinically suspected PE from various clinical settings, including primary care, emergency care, hospitalized and nursing home patients. A multilevel logistic regression model was built and validated including ten a priori defined objective candidate predictors to predict objectively confirmed PE at baseline or venous thromboembolism (VTE) during follow-up of 30 to 90 days. Multiple imputation was used for missing data. Backward elimination was performed with a P-value <0.10. Discrimination (c-statistic with 95% confidence intervals [CI] and prediction intervals [PI]) and calibration (observed:expected [O:E] ratio and calibration plot) were evaluated based on internal-external cross-validation. The accuracy of the model was subsequently compared with algorithms based on the Wells score and D-dimer testing. The final model included age (in years), sex, previous VTE, recent surgery or immobilization, haemoptysis, cancer, clinical signs of deep vein thrombosis, inpatient status, D-dimer (in µg/L), and an interaction term between age and D-dimer. The pooled c-statistic was 0.87 (95% CI, 0.85-0.89; 95% PI, 0.77-0.93) and overall calibration was very good (pooled O:E ratio, 0.99; 95% CI, 0.87-1.14; 95% PI, 0.55-1.79). The model slightly overestimated VTE probability in the lower range of estimated probabilities. Discrimination of the current model in the validation data sets was better than that of the Wells score combined with a D-dimer threshold based on age (c-statistic 0.73; 95% CI, 0.70-0.75) or structured clinical pretest probability (c-statistic 0.79; 95% CI, 0.76-0.81). CONCLUSION: The present model provides an absolute, individualized probability of PE presence in a broad population of patients with suspected PE, with very good discrimination and calibration. Its clinical utility needs to be evaluated in a prospective management or impact study. REGISTRATION: PROSPERO ID 89366.
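The published model is a multilevel logistic regression on individual patient data; the sketch below shows only a single-level analogue of its key feature, the age-by-D-dimer interaction, on simulated data with invented coefficients and variable names.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 5_000
df = pd.DataFrame({
    "age": rng.integers(18, 95, n),          # years
    "sex_male": rng.integers(0, 2, n),
    "ddimer": rng.lognormal(6.5, 1.0, n),    # ug/L, invented distribution
})
# Invented true model with an age-by-D-dimer interaction.
lp = -6 + 0.02 * df.age + 0.0015 * df.ddimer - 0.00001 * df.age * df.ddimer
df["pe"] = rng.binomial(1, 1 / (1 + np.exp(-lp)))

# 'age * ddimer' expands to both main effects plus their interaction term.
fit = smf.logit("pe ~ age * ddimer + sex_male", data=df).fit(disp=False)
print(fit.params)
```

In the multilevel version used in the paper, a study-level random intercept would additionally account for clustering of patients within the sixteen source studies.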


Subject(s)
Pulmonary Embolism , Venous Thromboembolism , Adult , Humans , Venous Thromboembolism/diagnosis , Venous Thromboembolism/epidemiology , Prospective Studies , Cross-Sectional Studies , Models, Statistical , Prognosis , Pulmonary Embolism/diagnosis , Pulmonary Embolism/epidemiology , Fibrin Fibrinogen Degradation Products/analysis
7.
Biom J ; 66(1): e2200108, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37199142

ABSTRACT

Logistic regression is one of the most commonly used approaches to develop clinical risk prediction models. Developers of such models often rely on approaches that aim to minimize the risk of overfitting and improve the predictive performance of the logistic model, such as likelihood penalization and variance decomposition techniques. We present an extensive simulation study that compares the out-of-sample predictive performance of risk prediction models derived using the elastic net, with Lasso and ridge as special cases, and variance decomposition techniques, namely incomplete principal component regression and incomplete partial least squares regression. We varied the expected events per variable, event fraction, number of candidate predictors, presence of noise predictors, and presence of sparse predictors in a full-factorial design. Predictive performance was compared on measures of discrimination, calibration, and prediction error. Simulation metamodels were derived to explain the performance differences within model derivation approaches. Our results indicate that, on average, prediction models developed using penalization and variance decomposition approaches outperform models developed using ordinary maximum likelihood estimation, with penalization approaches being consistently superior to the variance decomposition approaches. Differences in performance were most pronounced for the calibration of the model. Performance differences regarding prediction error and concordance statistic outcomes were often small between approaches. The use of likelihood penalization and variance decomposition techniques was illustrated in the context of peripheral arterial disease.
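To ground the comparison, here is a scikit-learn sketch fitting an elastic net logistic model (tuned over penalty strength and the L1/L2 mix) alongside an "incomplete" principal component regression, taken here to mean logistic regression on the first k components; the paper's exact variants and tuning grids may differ, and the data are synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression, LogisticRegressionCV
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=1000, n_features=30, n_informative=8,
                           weights=[0.8], random_state=2)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=2)

# Elastic net: cross-validate over the penalty strength and the L1/L2 mix.
enet = make_pipeline(
    StandardScaler(),
    LogisticRegressionCV(penalty="elasticnet", solver="saga",
                         l1_ratios=[0.1, 0.5, 0.9], Cs=10, max_iter=5000),
).fit(X_tr, y_tr)

# Incomplete principal component regression: keep the first k components only.
ipcr = make_pipeline(StandardScaler(), PCA(n_components=8),
                     LogisticRegression(max_iter=1000)).fit(X_tr, y_tr)

for name, m in [("elastic net", enet), ("incomplete PCR", ipcr)]:
    auc = roc_auc_score(y_te, m.predict_proba(X_te)[:, 1])
    print(f"{name}: test AUC = {auc:.3f}")
```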


Subject(s)
Research Design , Computer Simulation , Logistic Models , Probability , Least-Squares Analysis
8.
BMC Med ; 21(1): 70, 2023 02 24.
Article in English | MEDLINE | ID: mdl-36829188

ABSTRACT

BACKGROUND: Clinical prediction models should be validated before implementation in clinical practice. But is favorable performance at internal validation or one external validation sufficient to claim that a prediction model works well in the intended clinical context? MAIN BODY: We argue to the contrary because (1) patient populations vary, (2) measurement procedures vary, and (3) populations and measurements change over time. Hence, we have to expect heterogeneity in model performance between locations and settings, and across time. It follows that prediction models are never truly validated. This does not imply that validation is not important. Rather, the current focus on developing new models should shift to a focus on more extensive, well-conducted, and well-reported validation studies of promising models. CONCLUSION: Principled validation strategies are needed to understand and quantify heterogeneity, monitor performance over time, and update prediction models when appropriate. Such strategies will help to ensure that prediction models stay up-to-date and safe to support clinical decision-making.

9.
Ann Intern Med ; 175(2): 244-255, 2022 Feb.
Article in English | MEDLINE | ID: mdl-34904857

ABSTRACT

BACKGROUND: How diagnostic strategies for suspected pulmonary embolism (PE) perform in relevant patient subgroups defined by sex, age, cancer, and previous venous thromboembolism (VTE) is unknown. PURPOSE: To evaluate the safety and efficiency of the Wells and revised Geneva scores combined with fixed and adapted D-dimer thresholds, as well as the YEARS algorithm, for ruling out acute PE in these subgroups. DATA SOURCES: MEDLINE from 1 January 1995 until 1 January 2021. STUDY SELECTION: 16 studies assessing at least 1 diagnostic strategy. DATA EXTRACTION: Individual-patient data from 20 553 patients. DATA SYNTHESIS: Safety was defined as the diagnostic failure rate (the predicted 3-month VTE incidence after exclusion of PE without imaging at baseline). Efficiency was defined as the proportion of individuals classified by the strategy as "PE considered excluded" without imaging tests. Across all strategies, efficiency was highest in patients younger than 40 years (47% to 68%) and lowest in patients aged 80 years or older (6.0% to 23%) or patients with cancer (9.6% to 26%). However, efficiency improved considerably in these subgroups when pretest probability-dependent D-dimer thresholds were applied. Predicted failure rates were highest for strategies with adapted D-dimer thresholds, varying between 2% and 4% in the predefined patient subgroups. LIMITATIONS: Between-study differences in the scoring of predictor items and in D-dimer assays, as well as the presence of differential verification bias (in particular for classifying fatal events and subsegmental PE cases), may have led to an overestimation of the predicted failure rates of adapted D-dimer thresholds. CONCLUSION: Overall, all strategies showed acceptable safety, with pretest probability-dependent D-dimer thresholds having not only the highest efficiency but also the highest predicted failure rate. From an efficiency perspective, this individual-patient data meta-analysis supports application of adapted D-dimer thresholds. PRIMARY FUNDING SOURCE: Dutch Research Council. (PROSPERO: CRD42018089366).
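The two headline metrics have simple definitions, sketched below on toy data; note that in the meta-analysis itself the failure rate is a model-predicted 3-month VTE incidence rather than this raw proportion.

```python
import numpy as np

def failure_rate_and_efficiency(excluded, vte_followup):
    """excluded: True where the strategy ruled out PE without imaging.
    vte_followup: True where VTE occurred at baseline or during follow-up."""
    excluded = np.asarray(excluded, bool)
    vte = np.asarray(vte_followup, bool)
    efficiency = excluded.mean()                  # share managed without imaging
    failure_rate = vte[excluded].mean() if excluded.any() else float("nan")
    return failure_rate, efficiency

# Toy example: 6 of 10 patients ruled out; 1 of those 6 had VTE on follow-up.
fr, eff = failure_rate_and_efficiency(
    excluded=[1, 1, 1, 1, 1, 1, 0, 0, 0, 0],
    vte_followup=[0, 0, 0, 0, 0, 1, 1, 0, 1, 0])
print(f"failure rate = {fr:.1%}, efficiency = {eff:.0%}")   # 16.7%, 60%
```

The trade-off reported in the abstract falls directly out of these definitions: excluding more patients (higher efficiency) can only hold or raise the number of missed cases among the excluded.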


Subject(s)
Neoplasms , Pulmonary Embolism , Venous Thromboembolism , Fibrin Fibrinogen Degradation Products , Humans , Neoplasms/complications , Neoplasms/diagnosis , Probability , Pulmonary Embolism/diagnosis , Pulmonary Embolism/epidemiology , Venous Thromboembolism/diagnosis , Venous Thromboembolism/epidemiology
10.
Eur Heart J ; 43(4): 271-279, 2022 01 31.
Article in English | MEDLINE | ID: mdl-34974610

ABSTRACT

This article presents some of the most important developments in the field of digital medicine that have appeared over the last 12 months and are related to cardiovascular medicine. The article consists of three main sections, as follows: (i) artificial intelligence-enabled cardiovascular diagnostic tools, techniques, and methodologies; (ii) big data and prognostic models for cardiovascular risk prediction; and (iii) wearable devices in cardiovascular risk assessment, cardiovascular disease prevention, diagnosis, and management. To conclude the article, the authors present a brief perspective on the future of this new domain, highlighting existing gaps that are specifically related to artificial intelligence technologies, such as explainability, cost-effectiveness, and, of course, the importance of proper regulatory oversight for each clinical implementation.


Subject(s)
Cardiovascular System , Wearable Electronic Devices , Artificial Intelligence , Big Data , Humans , Precision Medicine
11.
Eur Heart J ; 43(31): 2921-2930, 2022 08 14.
Article in English | MEDLINE | ID: mdl-35639667

ABSTRACT

The medical field has seen a rapid increase in the development of artificial intelligence (AI)-based prediction models. With the introduction of such AI-based prediction model tools and software in cardiovascular patient care, cardiovascular researchers and healthcare professionals are challenged to understand the opportunities as well as the limitations of AI-based predictions. In this article, we present 12 critical questions for cardiovascular health professionals to ask when confronted with an AI-based prediction model. We aim to support medical professionals in distinguishing the AI-based prediction models that can add value to patient care from those that do not.


Subject(s)
Artificial Intelligence , Cardiovascular Diseases , Health Personnel , Humans , Software
12.
Biom J ; 65(8): e2300069, 2023 12.
Article in English | MEDLINE | ID: mdl-37775940

ABSTRACT

The marginality principle guides analysts to avoid omitting lower-order terms from models in which higher-order terms are included as covariates. Lower-order terms are viewed as "marginal" to higher-order terms. We consider how this principle applies to three cases: regression models that may include the ratio of two measured variables; polynomial transformations of a measured variable; and factorial arrangements of defined interventions. For each case, we show that which terms or transformations are considered to be lower-order, and therefore marginal, depends on the scale of measurement, which is frequently arbitrary. Understanding the implications of this point leads to an intuitive understanding of the curse of dimensionality. We conclude that the marginality principle may be useful to analysts in some specific cases but caution against invoking it as a context-free recipe.
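A small numerical demonstration of the scale-dependence point: re-centring a predictor (an arbitrary change of origin) leaves the highest-order coefficient unchanged but alters the "marginal" lower-order terms, so what counts as lower-order depends on the chosen scale. The data-generating values below are invented.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
x = rng.uniform(0, 10, 500)
y = 2 + 0.5 * x + 0.3 * x**2 + rng.normal(size=500)

for shift in (0.0, 5.0):           # re-centre x: an arbitrary change of origin
    xs = x - shift
    X = sm.add_constant(np.column_stack([xs, xs**2]))
    b = sm.OLS(y, X).fit().params
    print(f"shift={shift}: intercept={b[0]:.2f}, "
          f"linear={b[1]:.2f}, quadratic={b[2]:.2f}")
# The quadratic coefficient is invariant (~0.3), but the "lower-order" linear
# term changes from ~0.5 to ~3.5 with the centring: its value and
# interpretation depend on the arbitrary scale of measurement.
```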


Subject(s)
Algorithms , Regression Analysis
13.
PLoS Med ; 19(1): e1003905, 2022 01.
Article in English | MEDLINE | ID: mdl-35077453

ABSTRACT

BACKGROUND: The challenging clinical dilemma of detecting pulmonary embolism (PE) in suspected patients is encountered in a variety of healthcare settings. We hypothesized that the optimal diagnostic approach to detect these patients in terms of safety and efficiency depends on the underlying PE prevalence, case mix, and physician experience, overall reflected by the type of setting where patients are initially assessed. The objective of this study was to assess the capability of ruling out PE by available diagnostic strategies across all possible settings. METHODS AND FINDINGS: We performed a literature search (MEDLINE) followed by an individual patient data (IPD) meta-analysis (MA; 23 studies), including patients from self-referral emergency care (n = 12,612), primary healthcare clinics (n = 3,174), referred secondary care (n = 17,052), and hospitalized or nursing home patients (n = 2,410). Multilevel logistic regression was performed to evaluate the diagnostic performance of the Wells and revised Geneva rules, each with D-dimer thresholds that were fixed or adapted to age or pretest probability (PTP), of the YEARS algorithm, and of the Pulmonary Embolism Rule-out Criteria (PERC). All strategies were tested separately in each healthcare setting. Following earlier studies in this field, the primary diagnostic metrics estimated from the models were the "failure rate" of each strategy, i.e., the proportion of missed PE among patients categorized as "PE excluded", and the "efficiency", defined as the proportion of patients categorized as "PE excluded" among all patients. In self-referral emergency care, the PERC algorithm excludes PE in 21% of suspected patients at a failure rate of 1.12% (95% confidence interval [CI] 0.74 to 1.70), whereas the failure rate increases to 6.01% (4.09 to 8.75) in patients referred to secondary care, at an efficiency of 10%. In patients from primary healthcare and those referred to secondary care, strategies adjusting D-dimer to PTP are the most efficient (range: 43% to 62%) at a failure rate ranging between 0.25% and 3.06%, with higher failure rates observed in patients referred to secondary care. For this latter setting, strategies adjusting D-dimer to age are associated with a lower failure rate, ranging between 0.65% and 0.81%, yet are also less efficient (range: 33% to 35%). For all strategies, failure rates are highest in hospitalized or nursing home patients, ranging between 1.68% and 5.13%, at an efficiency ranging between 15% and 30%. The main limitation of the primary analyses was that the diagnostic performance of each strategy was compared in different sets of studies, since the availability of the items used in each diagnostic strategy differed across the included studies; sensitivity analyses, however, suggested that the findings were robust. CONCLUSIONS: The capability of safely and efficiently ruling out PE differs across diagnostic strategies and healthcare settings. The findings of this IPD MA help in determining the optimal diagnostic strategy for ruling out PE per healthcare setting, balancing the trade-off between the failure rate and efficiency of each strategy.
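As an example of a PTP-adapted D-dimer threshold, the sketch below encodes the published YEARS cut-offs (1000 ng/mL with zero YEARS items, 500 ng/mL with one or more). It illustrates the logic of such a rule only; it is not the meta-analysis implementation and not a clinical decision tool.

```python
def pe_excluded_years(ddimer_ng_ml: float, n_years_items: int) -> bool:
    """YEARS-style pretest-probability-adapted D-dimer rule.
    Published thresholds: 1000 ng/mL with 0 YEARS items (clinical signs of
    DVT, haemoptysis, PE most likely diagnosis), 500 ng/mL with >= 1 item.
    Illustration only."""
    threshold = 1000 if n_years_items == 0 else 500
    return ddimer_ng_ml < threshold

print(pe_excluded_years(800, n_years_items=0))  # True: 1000 ng/mL cut-off applies
print(pe_excluded_years(800, n_years_items=2))  # False: 500 ng/mL cut-off applies
```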


Subject(s)
Data Interpretation, Statistical , Delivery of Health Care/methods , Pulmonary Embolism/diagnosis , Pulmonary Embolism/epidemiology , Delivery of Health Care/statistics & numerical data , Humans , Pulmonary Embolism/therapy
14.
Am J Epidemiol ; 191(2): 282-286, 2022 01 24.
Article in English | MEDLINE | ID: mdl-34613347

ABSTRACT

In this brief communication, we discuss the confusion of mortality with fatality in the interpretation of evidence in the coronavirus disease 2019 (COVID-19) pandemic, and how this confusion affects the translation of science into policy and practice. We discuss how this confusion has influenced COVID-19 policy in France, Sweden, and the United Kingdom and discuss the implications for decision-making about COVID-19 vaccine distribution. We also discuss how this confusion is an example of a more general statistical fallacy we term the "Missing Link Fallacy."
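The distinction driving the confusion is arithmetic, illustrated below with made-up numbers: fatality conditions on having the disease, mortality does not, and the two are linked through the cumulative incidence.

```python
# Illustrative (made-up) numbers for a population of 1,000,000.
population = 1_000_000
cases = 50_000          # infections during the period
deaths = 500            # deaths among those cases

case_fatality = deaths / cases      # risk of death *given* infection
mortality = deaths / population     # risk of death in the whole population

print(f"case fatality: {case_fatality:.1%}")   # 1.0%
print(f"mortality:     {mortality:.2%}")       # 0.05%
# mortality = case fatality x cumulative incidence (50,000 / 1,000,000 = 5%),
# so equating the two silently assumes everyone becomes a case.
```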


Subject(s)
COVID-19/mortality , Health Policy , Policy Making , Vulnerable Populations , Epidemiologic Studies , Humans , Risk , SARS-CoV-2
15.
Am J Epidemiol ; 191(5): 886-899, 2022 03 24.
Article in English | MEDLINE | ID: mdl-35015809

ABSTRACT

Visceral adipose tissue (VAT) is a strong prognostic factor for cardiovascular disease and a potential target for cardiovascular risk stratification. Because VAT is difficult to measure in clinical practice, we estimated prediction models with predictors routinely measured in general practice and VAT as outcome using ridge regression in 2,501 middle-aged participants from the Netherlands Epidemiology of Obesity study, 2008-2012. Adding waist circumference and other anthropometric measurements on top of the routinely measured variables improved the optimism-adjusted R2 from 0.50 to 0.58 with a decrease in the root-mean-square error (RMSE) from 45.6 to 41.5 cm2 and with overall good calibration. Further addition of predominantly lipoprotein-related metabolites from the Nightingale platform did not improve the optimism-corrected R2 and RMSE. The models were externally validated in 370 participants from the Prospective Investigation of Vasculature in Uppsala Seniors (PIVUS, 2006-2009) and 1,901 participants from the Multi-Ethnic Study of Atherosclerosis (MESA, 2000-2007). Performance was comparable to the development setting in PIVUS (R2 = 0.63, RMSE = 42.4 cm2, calibration slope = 0.94) but lower in MESA (R2 = 0.44, RMSE = 60.7 cm2, calibration slope = 0.75). Our findings indicate that the estimation of VAT with routine clinical measurements can be substantially improved by incorporating waist circumference but not by metabolite measurements.
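The validation metrics reported here (R², RMSE, calibration slope) are straightforward to compute for a continuous outcome; below is a small helper on simulated data, where the miscalibration (slope 0.85) is built in for illustration and all numbers are invented.

```python
import numpy as np

def validate_continuous(y_obs, y_pred):
    """R^2, RMSE, and calibration slope/intercept for external validation
    of a model with a continuous outcome (e.g., VAT in cm^2)."""
    y_obs, y_pred = np.asarray(y_obs, float), np.asarray(y_pred, float)
    resid = y_obs - y_pred
    rmse = np.sqrt(np.mean(resid**2))
    r2 = 1 - np.sum(resid**2) / np.sum((y_obs - y_obs.mean())**2)
    slope, intercept = np.polyfit(y_pred, y_obs, 1)  # regress observed on predicted
    return {"R2": r2, "RMSE": rmse, "cal_slope": slope, "cal_intercept": intercept}

rng = np.random.default_rng(3)
pred = rng.normal(100, 40, 400).clip(5)            # hypothetical predicted VAT
obs = 10 + 0.85 * pred + rng.normal(0, 40, 400)    # a miscalibrated external setting
print(validate_continuous(obs, pred))              # cal_slope close to 0.85
```

A calibration slope below 1, as in the MESA validation, indicates that predictions are too extreme in the new population.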


Subject(s)
Intra-Abdominal Fat , Obesity , Adipose Tissue , Body Mass Index , Humans , Metabolomics , Middle Aged , Obesity/epidemiology , Prospective Studies , Waist Circumference
16.
Thorax ; 77(9): 873-881, 2022 09.
Article in English | MEDLINE | ID: mdl-34556554

ABSTRACT

BACKGROUND: Cystic fibrosis (CF) is a life-threatening genetic disease, affecting around 10 500 people in the UK. Precision medicines have been developed to treat specific CF-gene mutations. The newest, elexacaftor/tezacaftor/ivacaftor (ELEX/TEZ/IVA), has been found to be highly effective in randomised controlled trials (RCTs) and became available to a large proportion of UK CF patients in 2020. Understanding the potential health economic impacts of ELEX/TEZ/IVA is vital to planning service provision. METHODS: We combined observational UK CF Registry data with RCT results to project the impact of ELEX/TEZ/IVA on total days of intravenous (IV) antibiotic treatment at a population level. Registry data from 2015 to 2017 were used to develop prediction models for IV days over a 1-year period using several predictors, and to estimate 1-year population total IV days based on standards of care pre-ELEX/TEZ/IVA. We considered two approaches to imposing the impact of ELEX/TEZ/IVA on projected outcomes using effect estimates from RCTs: approach 1 based on effect estimates on FEV1% and approach 2 based on effect estimates on exacerbation rate. RESULTS: ELEX/TEZ/IVA is expected to result in significant reductions in population-level requirements for IV antibiotics of 16.1% (~17 800 days) using approach 1 and 43.6% (~39 500 days) using approach 2. The two approaches require different assumptions. Increased understanding of the mechanisms through which ELEX/TEZ/IVA acts on these outcomes would enable further refinements to our projections. CONCLUSIONS: This work contributes to increased understanding of the changing healthcare needs of people with CF and illustrates how Registry data can be used in combination with RCT evidence to estimate population-level treatment impacts.


Subject(s)
Cystic Fibrosis , Aminophenols/therapeutic use , Anti-Bacterial Agents/therapeutic use , Benzodioxoles/therapeutic use , Cystic Fibrosis/drug therapy , Cystic Fibrosis/genetics , Cystic Fibrosis Transmembrane Conductance Regulator/genetics , Humans , Mutation , Observational Studies as Topic , Randomized Controlled Trials as Topic , Registries
17.
Eur Respir J ; 59(2)2022 02.
Article in English | MEDLINE | ID: mdl-34172467

ABSTRACT

INTRODUCTION: The individual prognostic factors for coronavirus disease 2019 (COVID-19) are unclear. For this reason, we aimed to present a state-of-the-art systematic review and meta-analysis of the prognostic factors for adverse outcomes in COVID-19 patients. METHODS: We systematically reviewed PubMed from 1 January 2020 to 26 July 2020 to identify non-overlapping studies examining the association of any prognostic factor with any adverse outcome in patients with COVID-19. Random-effects meta-analysis was performed, and between-study heterogeneity was quantified using the I2 statistic. The presence of small-study effects was assessed by applying Egger's regression test. RESULTS: We identified 428 eligible articles, which were used in a total of 263 meta-analyses examining the association of 91 unique prognostic factors with 11 outcomes. Angiotensin-converting enzyme inhibitors, obstructive sleep apnoea, pharyngalgia, history of venous thromboembolism, sex, coronary heart disease, cancer, chronic liver disease, COPD, dementia, any immunosuppressive medication, peripheral arterial disease, rheumatological disease and smoking were associated with at least one outcome and had >1000 events, p<0.005, I2<50%, a 95% prediction interval excluding the null value, and absence of small-study effects in the respective meta-analysis. The risk of bias assessment using the Quality in Prognosis Studies tool indicated high risk of bias in 302 out of 428 articles for study participation, 389 articles for adjustment for other prognostic factors, and 396 articles for statistical analysis and reporting. CONCLUSIONS: Our findings could be used for prognostic model building and guide patient selection for randomised clinical trials.
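A from-scratch sketch of the two methods named here, DerSimonian-Laird random-effects pooling with the I2 statistic plus Egger's regression test; the effect sizes are invented, and dedicated meta-analysis packages would be preferred in practice.

```python
import numpy as np
from scipy import stats

def dl_random_effects(effects, ses):
    """DerSimonian-Laird random-effects pooling with I^2 and Egger's test."""
    y, se = np.asarray(effects, float), np.asarray(ses, float)
    w, k = 1 / se**2, len(effects)
    y_fe = np.sum(w * y) / np.sum(w)
    q = np.sum(w * (y - y_fe)**2)                              # Cochran's Q
    tau2 = max(0.0, (q - (k - 1)) / (w.sum() - (w**2).sum() / w.sum()))
    i2 = max(0.0, (q - (k - 1)) / q) * 100 if q > 0 else 0.0   # I^2 in %
    w_re = 1 / (se**2 + tau2)
    pooled = np.sum(w_re * y) / np.sum(w_re)
    # Egger's test: regress standardized effects on precision; a non-zero
    # intercept suggests small-study effects.
    reg = stats.linregress(1 / se, y / se)
    p_egger = 2 * stats.t.sf(abs(reg.intercept / reg.intercept_stderr), df=k - 2)
    return pooled, i2, p_egger

log_ors = [0.40, 0.25, 0.55, 0.10, 0.35, 0.60]   # made-up study effects
ses = [0.10, 0.15, 0.20, 0.12, 0.18, 0.25]
print(dl_random_effects(log_ors, ses))
```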


Subject(s)
COVID-19 , Bias , Humans , Prognosis , SARS-CoV-2
18.
Stat Med ; 41(8): 1334-1360, 2022 04 15.
Article in English | MEDLINE | ID: mdl-34897756

ABSTRACT

Calibration is a vital aspect of the performance of risk prediction models, but research in the context of ordinal outcomes is scarce. This study compared calibration measures for risk models predicting a discrete ordinal outcome, and investigated the impact of the proportional odds assumption on calibration and overfitting. We studied the multinomial, cumulative, adjacent category, continuation ratio, and stereotype logit/logistic models. To assess calibration, we investigated calibration intercepts and slopes, calibration plots, and the estimated calibration index. Using large sample simulations, we studied the performance of models for risk estimation under various conditions, assuming that the true model has either a multinomial logistic form or a cumulative logit proportional odds form. Small sample simulations were used to compare the tendency for overfitting between models. As a case study, we developed models to diagnose the degree of coronary artery disease (five categories) in symptomatic patients. When the true model was multinomial logistic, proportional odds models often yielded poor risk estimates, with calibration slopes deviating considerably from unity even on large model development datasets. The stereotype logistic model improved the calibration slope, but still provided biased risk estimates for individual patients. When the true model had a cumulative logit proportional odds form, multinomial logistic regression provided biased risk estimates, although these biases were modest. Nonproportional odds models require more parameters to be estimated from the data, and hence suffered more from overfitting. Despite larger sample size requirements, we generally recommend multinomial logistic regression for risk prediction modeling of discrete ordinal outcomes.
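A minimal Python comparison of the two bookend models, multinomial logistic regression and the cumulative logit proportional odds model (via statsmodels' OrderedModel), on simulated ordinal data; it shows only how their predicted risks can diverge, not the paper's full calibration assessment, and all data-generating choices are invented.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(5)
n = 2_000
X = rng.normal(size=(n, 3))
# Hypothetical 4-category ordinal outcome driven by a latent score.
latent = X @ np.array([0.8, -0.5, 0.3]) + rng.logistic(size=n)
y = np.digitize(latent, [-1.0, 0.5, 2.0])        # ordinal categories 0..3

# Multinomial logistic: separate coefficients per category (more parameters).
mlr = LogisticRegression(max_iter=2000).fit(X, y)
p_mlr = mlr.predict_proba(X)

# Cumulative logit, proportional odds: one coefficient vector plus thresholds.
po = OrderedModel(y, X, distr="logit").fit(method="bfgs", disp=False)
p_po = po.model.predict(po.params, exog=X)

print("max abs. difference in predicted risks:", np.abs(p_mlr - p_po).max())
```

Here the proportional odds assumption holds by construction, so the multinomial model spends its extra parameters on noise; when the assumption fails, the situation reverses, which is the core of the paper's comparison.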


Subject(s)
Calibration , Humans , Logistic Models , Probability , Sample Size
19.
Stat Med ; 41(7): 1280-1295, 2022 03 30.
Article in English | MEDLINE | ID: mdl-34915593

ABSTRACT

Previous articles in Statistics in Medicine describe how to calculate the sample size required for external validation of prediction models with continuous and binary outcomes. The minimum sample size criteria aim to ensure precise estimation of key measures of a model's predictive performance, including measures of calibration, discrimination, and net benefit. Here, we extend the sample size guidance to prediction models with a time-to-event (survival) outcome, to cover external validation in datasets containing censoring. A simulation-based framework is proposed, which calculates the sample size required to target a particular confidence interval width for the calibration slope measuring the agreement between predicted risks (from the model) and observed risks (derived using pseudo-observations to account for censoring) on the log cumulative hazard scale. Precise estimation of calibration curves, discrimination, and net benefit can also be checked in this framework. The process requires assumptions about the validation population in terms of (i) the distribution of the model's linear predictor and (ii) the event and censoring distributions. Existing information can inform this; in particular, the linear predictor distribution can be approximated using the C-index or Royston's D statistic from the model development article, together with the overall event risk. We demonstrate how the approach can be used to calculate the sample size required to validate a prediction model for recurrent venous thromboembolism. Ideally the sample size should ensure precise calibration across the entire range of predicted risks, but it must at least ensure adequate precision in regions important for clinical decision-making. Stata and R code are provided.
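The authors' framework uses pseudo-observations and provides Stata and R code; the simplified Python sketch below conveys only the simulation logic, refitting a Cox model of the outcome on the linear predictor (whose coefficient estimates the calibration slope) and tracking the implied confidence interval width as the validation sample size grows. All distributional inputs are assumptions standing in for the quantities the paper derives from the development article.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(11)

def ci_width_for_slope(n, n_sims=200, sd_lp=0.8, base_rate=0.05, cens_rate=0.03):
    """Approximate 95% CI width for the calibration slope when validating a
    Cox model in datasets of size n (all inputs here are assumed values)."""
    slopes = []
    for _ in range(n_sims):
        lp = rng.normal(0, sd_lp, n)                       # assumed LP distribution
        t_event = rng.exponential(1 / (base_rate * np.exp(lp)))
        t_cens = rng.exponential(1 / cens_rate, n)         # assumed censoring
        df = pd.DataFrame({"time": np.minimum(t_event, t_cens),
                           "event": (t_event <= t_cens).astype(int), "lp": lp})
        fit = CoxPHFitter().fit(df, "time", "event")
        slopes.append(fit.params_["lp"])                   # calibration slope
    return 2 * 1.96 * np.std(slopes, ddof=1)               # approx. 95% CI width

for n in (250, 500, 1000):
    print(n, round(ci_width_for_slope(n), 3))   # width shrinks roughly as 1/sqrt(n)
```

One would increase n until the width falls below the prespecified target, mirroring the paper's approach of choosing the smallest sample size that achieves the desired precision.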


Subject(s)
Models, Statistical , Calibration , Computer Simulation , Humans , Prognosis , Sample Size
20.
Acta Oncol ; 61(5): 560-565, 2022 May.
Article in English | MEDLINE | ID: mdl-35253593

ABSTRACT

INTRODUCTION: The Memorial Sloan Kettering Cancer Center (MSKCC) nomogram was developed to estimate five-year overall survival (OS) after curative-intent surgery for colon cancer based on age, sex, T stage, differentiation grade, and the numbers of positive and examined regional lymph nodes. This is the first evaluation of the MSKCC model's performance in predicting OS in a European population. MATERIAL AND METHODS: Population-based data from patients with stage I-III colon cancer diagnosed between 2010 and 2016 were obtained from the Netherlands Cancer Registry (NCR) for external validation of the MSKCC prediction model. Five-year survival probabilities were estimated for all patients in our dataset using the MSKCC prediction equation. Histogram density plots were created to depict the distributions of the estimated probabilities and the prognostic index. The performance of the model was evaluated in terms of its overall performance, discrimination, and calibration. RESULTS: A total of 39,805 patients were included. Five-year OS was 71.9% (95% CI 71.5; 72.3) (11,051 events) with a median follow-up of 5.6 years (IQR 4.1; 7.7). The Brier score was 0.10 (95% CI 0.10; 0.10). The C-index was 0.75 (95% CI 0.75; 0.76). The calibration measures and plot indicated that the model slightly overestimated observed mortality (observed/expected ratio = 0.86 [95% CI 0.86; 0.87], calibration intercept = -0.14 [95% CI -0.16; -0.11], slope = 1.07 [95% CI 1.05; 1.09], ICI = 0.04, E50 = 0.04, E90 = 0.05). CONCLUSIONS: The external validation of the MSKCC prediction nomogram in a large Dutch cohort supports the use of this practical tool in the European patient population. These personalised survival estimates may support clinicians when informing patients about prognosis. Adding potentially relevant prognostic factors to the model, such as primary tumour location, might further improve it.


Subject(s)
Colonic Neoplasms , Nomograms , Calibration , Cohort Studies , Colonic Neoplasms/surgery , Humans , Neoplasm Staging , Prognosis