ABSTRACT
BACKGROUND: Differences in clinical presentation of acute ischemic stroke between men and women may affect prehospital identification of anterior circulation large vessel occlusion (aLVO). We assessed sex differences in diagnostic performance of 8 prehospital scales to detect aLVO. METHODS: We analyzed pooled individual patient data from 2 prospective cohort studies (LPSS [Leiden Prehospital Stroke Study] and PRESTO [Prehospital Triage of Patients With Suspected Stroke Study]) conducted in the Netherlands between 2018 and 2019, including consecutive patients ≥18 years suspected of acute stroke who presented within 6 hours after symptom onset. Ambulance paramedics assessed clinical items from 8 prehospital aLVO detection scales: Los Angeles Motor Scale, Rapid Arterial Occlusion Evaluation, Cincinnati Stroke Triage Assessment Tool, Cincinnati Prehospital Stroke Scale, Prehospital Acute Stroke Severity, gaze-face-arm-speech-time, Conveniently Grasped Field Assessment Stroke Triage, and Face-Arm-Speech-Time Plus Severe Arm or Leg Motor Deficit. We assessed the diagnostic performance of these scales for identifying aLVO at prespecified cut points for men and women. RESULTS: Of 2358 patients with suspected stroke (median age, 73 years; 47% women), 231 (10%) had aLVO (100/1114 [9%] women and 131/1244 [11%] men). The area under the curve of the scales ranged from 0.70 (95% CI, 0.65-0.75) to 0.77 (95% CI, 0.73-0.82) in women versus 0.69 (95% CI, 0.64-0.73) to 0.75 (95% CI, 0.71-0.79) in men. Positive predictive values ranged from 0.23 (95% CI, 0.20-0.27) to 0.29 (95% CI, 0.26-0.31) in women versus 0.29 (95% CI, 0.24-0.33) to 0.37 (95% CI, 0.32-0.43) in men. Negative predictive values were similar (0.95 [95% CI, 0.94-0.96] to 0.98 [95% CI, 0.97-0.98] in women versus 0.94 [95% CI, 0.93-0.95] to 0.96 [95% CI, 0.94-0.97] in men). 
Sensitivity of the scales was slightly higher in women than in men (0.53 [95% CI, 0.43-0.63] to 0.76 [95% CI, 0.68-0.84] versus 0.49 [95% CI, 0.40-0.57] to 0.63 [95% CI, 0.55-0.73]), whereas specificity was lower (0.79 [95% CI, 0.76-0.81] to 0.87 [95% CI, 0.84-0.89] versus 0.82 [95% CI, 0.79-0.84] to 0.90 [95% CI, 0.88-0.91]). Rapid arterial occlusion evaluation showed the highest positive predictive values in both sexes (0.29 in women and 0.37 in men), reflecting the different event rates. CONCLUSIONS: aLVO scales show similar diagnostic performance in both sexes. The rapid arterial occlusion evaluation scale may help optimize prehospital transport decision-making in men as well as in women with suspected stroke.
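The remark that the positive predictive values reflect the different event rates can be illustrated with Bayes' rule. The sketch below is not the study's analysis: the sensitivity and specificity are illustrative values held equal for both sexes (an assumption), and only the aLVO prevalences (100/1114 and 131/1244) come from the abstract.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Illustrative sensitivity/specificity (assumed, identical for both sexes).
ASSUMED_SENSITIVITY = 0.60
ASSUMED_SPECIFICITY = 0.85

# Prevalences taken from the abstract.
women_ppv, women_npv = predictive_values(
    ASSUMED_SENSITIVITY, ASSUMED_SPECIFICITY, 100 / 1114
)
men_ppv, men_npv = predictive_values(
    ASSUMED_SENSITIVITY, ASSUMED_SPECIFICITY, 131 / 1244
)

print(f"women: PPV={women_ppv:.2f}, NPV={women_npv:.2f}")
print(f"men:   PPV={men_ppv:.2f}, NPV={men_npv:.2f}")
```

With identical test characteristics, the group with the higher prevalence gets the higher PPV and the slightly lower NPV, which is the direction of the difference reported above.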
Subjects
Arterial Occlusive Diseases , Brain Ischemia , Emergency Medical Services , Ischemic Stroke , Stroke , Humans , Female , Male , Aged , Sex Characteristics , Prospective Studies , Stroke/diagnosis , Triage , Arterial Occlusive Diseases/diagnosis , Brain Ischemia/diagnosis
ABSTRACT
OBJECTIVE: Elderly patients with Chronic Limb Threatening Ischemia (CLTI) undergoing revascularization are prone to delirium and prolonged hospitalization. Preoperative prehabilitation may prevent delirium and reduce the length of stay. This study investigates the effect of multimodal prehabilitation on delirium incidence in elderly CLTI patients undergoing revascularization. METHODS: A comparative observational cohort study was conducted in a large teaching hospital (intervention cohort n=101, retrospective control cohort n=207) and a university hospital (prospective control cohort n=48) from 2020 to 2023. Patients aged ≥ 65 years undergoing revascularization were included, with acute treatment or severe cognitive impairment as exclusion criteria. The three-week prehabilitation program included screening of general health and presence of delirium risk factors by a vascular nurse practitioner, screening and provision of personalized, home-based exercises by a physiotherapist, provision of nutritional advice by a dietician, and if indicated comprehensive geriatric assessment by a geriatrician, assessment of self-reliance and home situation by a prearranged homecare nurse, guidance and support for smoking cessation by a quit smoking coach, and anaemia treatment. Primary outcome was 30-day delirium incidence, analysed using regression models adjusting for potential confounders (age, physical impairment, history of delirium, preoperative anaemia and revascularization type). Secondary outcomes were length of stay, postoperative complications, 30-day mortality, and patient experiences. RESULTS: Median age (IQR) was 76 years (71-82). Delirium incidence was lower in the prehabilitation cohort (n=2/101, 2%) compared to controls (n=23/255, 9%; OR=0.21, 95%CI 0.05-0.89, p=.04). Adjusted analysis showed a non-significant delirium reduction (OR=0.28, 95%CI 0.06-1.3, p=.097). 
The prehabilitation cohort had a significantly shorter length of stay (2 [1-5] vs 4 [2-9] days; p<.001), and fewer minor complications (14% vs 26%, p=.01). No differences were present in major complications and 30-day mortality. Patients reported high compliance and satisfaction (median score 8/10, IQR 7-9). CONCLUSIONS: Prehabilitation among elderly CLTI patients is safe and has the potential to yield multiple beneficial effects on general outcomes following revascularization, while also achieving high levels of patient satisfaction. Further validation and consideration of implementation in surgical settings are recommended.
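As a rough check on the unadjusted comparison, an odds ratio and a Wald (Woolf) confidence interval can be computed directly from the reported counts (2/101 vs 23/255). This is a sketch only: the abstract says regression models were used, so the Woolf interval here is an assumed stand-in, and small differences from the published figures are expected.

```python
import math


def odds_ratio_with_ci(events_a, total_a, events_b, total_b, z=1.96):
    """Unadjusted odds ratio (group A vs group B) with a Woolf log-scale CI."""
    a, b = events_a, total_a - events_a  # events / non-events in group A
    c, d = events_b, total_b - events_b  # events / non-events in group B
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Counts from the abstract: delirium in 2 of 101 (prehabilitation)
# versus 23 of 255 (controls).
estimate, lower, upper = odds_ratio_with_ci(2, 101, 23, 255)
print(f"OR={estimate:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

The result lands close to the reported OR of 0.21 (0.05-0.89); the wide interval follows from there being only two events in the intervention cohort.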
ABSTRACT
Clinical prediction models are estimated using a sample of limited size from the target population, leading to uncertainty in predictions, even when the model is correctly specified. Generally, not all patient profiles are observed uniformly in model development. As a result, sampling uncertainty varies between individual patients' predictions. We aimed to develop an intuitive measure of individual prediction uncertainty. The variance of a patient's prediction can be equated to the variance of the sample mean outcome in $n_{\ast}$ hypothetical patients with the same predictor values. This hypothetical sample size $n_{\ast}$ can be interpreted as the number of similar patients $n_{\mathrm{eff}}$ that the prediction is effectively based on, given that the model is correct. For generalized linear models, we derived analytical expressions for the effective sample size. In addition, we illustrated the concept in patients with acute myocardial infarction. In model development, $n_{\mathrm{eff}}$ can be used to balance accuracy versus uncertainty of predictions. In a validation sample, the distribution of $n_{\mathrm{eff}}$ indicates which patients were more and less represented in the development data, and whether predictions might be too uncertain for some to be practically meaningful. In a clinical setting, the effective sample size may facilitate communication of uncertainty about predictions. We propose the effective sample size as a clinically interpretable measure of uncertainty in individual predictions. Its implications should be explored further for the development, validation and clinical implementation of prediction models.
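For a binary outcome, the idea of equating a prediction's variance to that of a sample mean can be sketched as follows. This assumes the Bernoulli case, where the mean of n outcomes with risk p has variance p(1 - p)/n; the function name and the example numbers are hypothetical, and the analytical expressions derived in the paper for generalized linear models are not reproduced here.

```python
def effective_sample_size(predicted_risk, prediction_variance):
    """Number of hypothetical identical patients whose mean outcome would have
    the same variance as the model's prediction (binary outcome assumed)."""
    outcome_variance = predicted_risk * (1 - predicted_risk)
    return outcome_variance / prediction_variance


# Two hypothetical patients with the same predicted risk of 0.20 but
# different standard errors of the prediction (0.02 vs 0.08).
well_represented = effective_sample_size(0.20, 0.02 ** 2)
poorly_represented = effective_sample_size(0.20, 0.08 ** 2)
print(round(well_represented), round(poorly_represented))
```

Under these assumed numbers the first prediction behaves as if it were based on about 400 similar patients and the second on about 25, which is the kind of contrast the measure is meant to communicate.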
Subjects
Uncertainty , Humans , Linear Models , Sample Size
ABSTRACT
PURPOSE: The use of statins for the primary prevention of cardiovascular diseases (CVD) is associated with various beneficial outcomes, alongside certain undesirable effects. This study aims to determine optimal risk thresholds above which statin therapy yields a net benefit, considering both the positive effects and potential adverse effects, as well as their probabilities and patient preferences. METHODS: Quantitative benefit-harm balance modeling was applied to the Iranian general population aged 40 to 75 years with no history of CVD. The analysis utilized data from prior studies, including statin effect estimates for different outcomes from a meta-analysis, patient preferences obtained from an Iranian survey, and baseline incidence rates of adverse outcomes sourced from the Global Burden of Disease study for Iran. Outcomes were defined as angina, myocardial infarction, fatal coronary heart disease, fatal or non-fatal stroke, and heart failure. Benefit-harm balance indices were calculated for various combinations of age, sex, and 10-year CVD risk. RESULTS: Statin therapy was found to be advantageous at a lower 10-year CVD risk threshold in men (18-23%) compared to women (24-28%). Furthermore, individuals aged 40-45 years exhibited a lower risk threshold (18% in men, 24% in women) than those aged 70-75 years (23% in men, 28% in women). CONCLUSION: The desirable 10-year risk thresholds for statin prescription in the primary prevention of CVD vary by age and gender, ranging from 18 to 28%, encompassing a spectrum of outcomes from angina to CVD mortality. These results suggest hard-CVD risk thresholds of 7.5% to 10% for both sexes.
Subjects
Cardiovascular Diseases , Hydroxymethylglutaryl-CoA Reductase Inhibitors , Primary Prevention , Humans , Middle Aged , Hydroxymethylglutaryl-CoA Reductase Inhibitors/therapeutic use , Hydroxymethylglutaryl-CoA Reductase Inhibitors/adverse effects , Male , Female , Iran/epidemiology , Adult , Aged , Cardiovascular Diseases/prevention & control , Cardiovascular Diseases/epidemiology , Cardiovascular Diseases/mortality , Cardiovascular Diseases/diagnosis , Risk Assessment , Treatment Outcome , Age Factors , Sex Factors , Drug Prescriptions , Heart Disease Risk Factors , Decision Support Techniques , Clinical Decision-Making , Time Factors , Practice Patterns, Physicians'
ABSTRACT
BACKGROUND: The optimal treatment for odontoid fractures in older people remains debated. Odontoid fractures are increasingly relevant to clinical practice due to ageing of the population. METHODS: An international prospective comparative study was conducted in fifteen European centres, involving patients aged ≥55 years with type II/III odontoid fractures. The surgeon and patient jointly decided on the applied treatment. Surgical and conservative treatments were compared. Primary outcomes were Neck Disability Index (NDI) improvement, fracture union and stability at 52 weeks. Secondary outcomes were Visual Analogue Scale neck pain, Likert patient-perceived recovery and EuroQol-5D-3L at 52 weeks. Subgroup analyses considered age, type II and displaced fractures. Multivariable regression analyses adjusted for age, gender and fracture characteristics. RESULTS: The study included 276 patients, of which 144 (52%) were treated surgically and 132 (48%) conservatively (mean (SD) age 77.3 (9.1) vs. 76.6 (9.7), P = 0.56). NDI improvement was largely similar between surgical and conservative treatments (mean (SE) -11 (2.4) vs. -14 (1.8), P = 0.08), as were union (86% vs. 78%, aOR 2.3, 95% CI 0.97-5.7) and stability (99% vs. 98%, aOR NA). NDI improvement did not differ between patients with union and persistent non-union (mean (SE) -13 (2.0) vs. -12 (2.8), P = 0.78). There was no difference for any of the secondary outcomes or subgroups. CONCLUSIONS: Clinical outcome and fracture healing at 52 weeks were similar between treatments. Clinical outcome and fracture union were not associated. Treatments should prioritize favourable clinical over radiological outcomes.
Subjects
Conservative Treatment , Odontoid Process , Spinal Fractures , Humans , Aged , Female , Male , Odontoid Process/injuries , Odontoid Process/diagnostic imaging , Odontoid Process/surgery , Prospective Studies , Conservative Treatment/methods , Conservative Treatment/statistics & numerical data , Aged, 80 and over , Spinal Fractures/therapy , Spinal Fractures/surgery , Treatment Outcome , Europe , Fracture Healing , Age Factors , Disability Evaluation , Middle Aged , Pain Measurement , Time Factors , Recovery of Function , Fracture Fixation/methods , Neck Pain/therapy
ABSTRACT
BACKGROUND: Estimating the risk of revision after arthroplasty could inform patient and surgeon decision-making. However, there is a lack of well-performing prediction models assisting in this task, which may be due to current conventional modeling approaches such as traditional survivorship estimators (such as Kaplan-Meier) or competing risk estimators. Recent advances in machine learning survival analysis might improve decision support tools in this setting. Therefore, this study aimed to assess the performance of machine learning compared with that of conventional modeling to predict revision after arthroplasty. QUESTION/PURPOSE: Does machine learning perform better than traditional regression models for estimating the risk of revision for patients undergoing hip or knee arthroplasty? METHODS: Eleven datasets from published studies from the Dutch Arthroplasty Register reporting on factors associated with revision or survival after partial or total knee and hip arthroplasty between 2018 and 2022 were included in our study. The 11 datasets were observational registry studies, with a sample size ranging from 3038 to 218,214 procedures. We developed a set of time-to-event models for each dataset, leading to 11 comparisons. A set of predictors (factors associated with revision surgery) was identified based on the variables that were selected in the included studies. We assessed the predictive performance of two state-of-the-art statistical time-to-event models for 1-, 2-, and 3-year follow-up: a Fine and Gray model (which models the cumulative incidence of revision) and a cause-specific Cox model (which models the hazard of revision). These were compared with a machine-learning approach (a random survival forest model, which is a decision tree-based machine-learning algorithm for time-to-event analysis). 
Performance was assessed according to discriminative ability (time-dependent area under the receiver operating characteristic curve), calibration (slope and intercept), and overall prediction error (scaled Brier score). Discrimination, quantified by the area under the receiver operating characteristic curve, measures the model's ability to distinguish patients who achieved the outcomes from those who did not and ranges from 0.5 to 1.0, with 1.0 indicating the highest discrimination score and 0.5 the lowest. Calibration plots the predicted versus the observed probabilities; a perfect plot has an intercept of 0 and a slope of 1. The Brier score is a composite of discrimination and calibration, with 0 indicating perfect prediction and 1 the poorest. A scaled version of the Brier score, 1 - (model Brier score/null model Brier score), can be interpreted as the amount of overall prediction error. RESULTS: Using machine learning survivorship analysis, we found no differences between the competing risks estimator and traditional regression models for patients undergoing arthroplasty in terms of discriminative ability (patients who received a revision compared with those who did not). We found no consistent differences between the validated performance (time-dependent area under the receiver operating characteristic curve) of different modeling approaches because the differences in these values ranged from -0.04 to 0.03 across the 11 datasets (the time-dependent area under the receiver operating characteristic curve of the models across 11 datasets ranged from 0.52 to 0.68). In addition, the calibration metrics and scaled Brier scores produced comparable estimates, showing no advantage of machine learning over traditional regression models. CONCLUSION: Machine learning did not outperform traditional regression models. 
CLINICAL RELEVANCE: Neither machine learning modeling nor traditional regression methods were sufficiently accurate in order to offer prognostic information when predicting revision arthroplasty. The benefit of these modeling approaches may be limited in this context.
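The scaled Brier score defined in the methods, 1 - (model Brier score/null model Brier score), can be sketched for the simplest case. The code below assumes fully observed binary outcomes at a fixed horizon and a null model that predicts the overall event rate for everyone; the study's time-to-event setting would additionally need to handle censoring, which this sketch ignores. The data are hypothetical.

```python
def brier_score(predictions, outcomes):
    """Mean squared difference between predicted risks and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(predictions, outcomes)) / len(outcomes)


def scaled_brier_score(predictions, outcomes):
    """1 - Brier(model) / Brier(null), where the null model predicts the
    observed event rate for every patient (no censoring handled)."""
    event_rate = sum(outcomes) / len(outcomes)
    null_score = brier_score([event_rate] * len(outcomes), outcomes)
    return 1 - brier_score(predictions, outcomes) / null_score


# Hypothetical outcomes (1 = revision) and predicted risks.
observed = [0, 0, 1, 1]
print(scaled_brier_score([0.1, 0.2, 0.7, 0.9], observed))
```

A value of 1 corresponds to perfect predictions and a value of 0 to a model that does no better than predicting the event rate for all patients.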
Subjects
Arthroplasty, Replacement, Hip , Arthroplasty, Replacement, Knee , Machine Learning , Reoperation , Humans , Reoperation/statistics & numerical data , Risk Assessment , Registries , Risk Factors , Prosthesis Failure , Female , Male , Aged , Predictive Value of Tests
ABSTRACT
Risk prediction models need thorough validation to assess their performance. Validation of models for survival outcomes poses challenges due to the censoring of observations and the varying time horizon at which predictions can be made. This article describes measures to evaluate predictions and the potential improvement in decision making from survival models based on Cox proportional hazards regression. As a motivating case study, the authors consider the prediction of the composite outcome of recurrence or death (the "event") in patients with breast cancer after surgery. They developed a simple Cox regression model with 3 predictors, as in the Nottingham Prognostic Index, in 2982 women (1275 events over 5 years of follow-up) and externally validated this model in 686 women (285 events over 5 years). Improvement in performance was assessed after the addition of progesterone receptor as a prognostic biomarker. The model predictions can be evaluated across the full range of observed follow-up times or for the event occurring by the end of a fixed time horizon of interest. The authors first discuss recommended statistical measures that evaluate model performance in terms of discrimination, calibration, or overall performance. Further, they evaluate the potential clinical utility of the model to support clinical decision making according to a net benefit measure. They provide SAS and R code to illustrate internal and external validation. The authors recommend the proposed set of performance measures for transparent reporting of the validity of predictions from survival models.
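The net benefit measure mentioned above can be sketched in its standard decision-curve form: true positives per patient minus false positives per patient, weighted by the odds of the risk threshold. The code assumes a binary outcome at a fixed horizon with no censoring, which is a simplification of the survival setting the article addresses; the article's own SAS and R code is not reproduced, and the data are hypothetical.

```python
def net_benefit(predicted_risks, outcomes, threshold):
    """Net benefit of treating patients whose predicted risk is at or above
    the threshold (binary outcomes, no censoring)."""
    n = len(outcomes)
    pairs = list(zip(predicted_risks, outcomes))
    true_pos = sum(1 for p, y in pairs if p >= threshold and y == 1)
    false_pos = sum(1 for p, y in pairs if p >= threshold and y == 0)
    return true_pos / n - (false_pos / n) * threshold / (1 - threshold)


# Hypothetical 5-year risks and observed events.
risks = [0.1, 0.4, 0.6, 0.9]
events = [0, 0, 1, 1]

model_nb = net_benefit(risks, events, threshold=0.5)
treat_all_nb = net_benefit([1.0] * 4, events, threshold=0.5)
print(model_nb, treat_all_nb)
```

Comparing the model against the "treat all" and "treat none" (net benefit 0) strategies across a range of thresholds gives a decision curve.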
Subjects
Breast Neoplasms , Humans , Female , Proportional Hazards Models , Prognosis
ABSTRACT
AIMS: Indications for surgery in patients with degenerative mitral regurgitation (DMR) are increasingly liberal in all clinical guidelines but the role of secondary outcome determinants (left atrial volume index ≥60 mL/m2, atrial fibrillation, pulmonary artery systolic pressure ≥50 mmHg and moderate to severe tricuspid regurgitation) and their impact on post-operative outcome remain disputed. Whether these secondary outcome markers are just reflective of the DMR severity or intrinsically affect survival after DMR surgery is uncertain and may have critical importance in the management of patients with DMR. To address these gaps of knowledge the present study gathered a large cohort of patients with quantified DMR, accounted for the number of secondary outcome markers and examined their independent impact on survival after surgical correction of the DMR. METHODS AND RESULTS: The Mitral Regurgitation International DAtabase-Quantitative registry includes patients with isolated DMR from centres across North America, Europe, and the Middle East. Patient enrolment extended from January 2003 to January 2020. All patients undergoing mitral valve surgery within 1 year of registry enrolment were selected. A total of 2276 patients [65 (55-73) years, 32% male] across five centres met study eligibility criteria. Over a median follow-up of 5.6 (3.6 to 8.7) years, 278 patients (12.2%) died. In a comprehensive multivariable Cox regression model adjusted for age, EuroSCORE II, symptoms, left ventricular ejection fraction (LVEF), left ventricular end-systolic diameter (LV ESD) and DMR severity, the number of secondary outcome determinants was independently associated with post-operative all-cause mortality, with adjusted hazard ratios of 1.56 [95% confidence interval (CI): 1.11-2.20, P = 0.011], 1.78 (95% CI: 1.23-2.58, P = 0.002) and 2.58 (95% CI: 1.73-3.83, P < 0.0001) for patients with one, two, and three or four secondary outcome determinants, respectively. 
A model incorporating the number of secondary outcome determinants demonstrated a higher C-index and was significantly more concordant with post-operative mortality than models incorporating traditional Class I indications alone [the presence of symptoms (P = 0.0003), or LVEF ≤60% (P = 0.006), or LV ESD ≥40 mm (P = 0.014)], while there was no significant difference in concordance observed compared with a model that incorporated the number of Class I indications for surgery combined (P = 0.71). CONCLUSION: In this large cohort of patients treated surgically for DMR, the presence and number of secondary outcome determinants was independently associated with post-surgical survival and demonstrated better outcome discrimination than traditional Class I indications for surgery. Randomised controlled trials are needed to determine if patients with severe DMR who demonstrate a cardiac phenotype with an increasing number of secondary outcome determinants would benefit from earlier surgery.
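The C-index used to compare these models measures how often, among comparable pairs of patients, the one who died earlier had the higher predicted risk. Below is a minimal sketch of Harrell's C for right-censored data; it ignores tied event times, uses hypothetical data, and is not the registry's analysis code.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when patient i had an observed event
    strictly before patient j's follow-up time. Tied risk scores count
    as half concordant; tied times are skipped (a simplification).
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if events[i] != 1:
            continue  # a censored patient cannot be the earlier event
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable


# Hypothetical follow-up (years), event indicators (1 = death), and scores.
follow_up = [1, 2, 3, 4]
died = [1, 1, 0, 1]
print(harrell_c_index(follow_up, died, [0.9, 0.7, 0.4, 0.2]))
```

A value of 0.5 corresponds to chance-level ordering and 1.0 to perfect ordering of patients by risk.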
Subjects
Atrial Fibrillation , Cardiac Surgical Procedures , Mitral Valve Insufficiency , Male , Female , Humans , Mitral Valve Insufficiency/complications , Stroke Volume , Ventricular Function, Left , Atrial Fibrillation/complications
ABSTRACT
Previous research has shown that polygenic risk scores (PRSs) can be used to stratify women according to their risk of developing primary invasive breast cancer. This study aimed to evaluate the association between a recently validated PRS of 313 germline variants (PRS313) and contralateral breast cancer (CBC) risk. We included 56,068 women of European ancestry diagnosed with first invasive breast cancer from 1990 onward with follow-up from the Breast Cancer Association Consortium. Metachronous CBC risk (N = 1,027) according to the distribution of PRS313 was quantified using Cox regression analyses. We assessed PRS313 interaction with age at first diagnosis, family history, morphology, ER status, PR status, and HER2 status, and (neo)adjuvant therapy. In studies of Asian women, with limited follow-up, CBC risk associated with PRS313 was assessed using logistic regression for 340 women with CBC compared with 12,133 women with unilateral breast cancer. Higher PRS313 was associated with increased CBC risk: hazard ratio per standard deviation (SD) = 1.25 (95%CI = 1.18-1.33) for Europeans, and an OR per SD = 1.15 (95%CI = 1.02-1.29) for Asians. The absolute lifetime risks of CBC, accounting for death as competing risk, were 12.4% for European women at the 10th percentile and 20.5% at the 90th percentile of PRS313. We found no evidence of confounding by or interaction with individual characteristics, characteristics of the primary tumor, or treatment. The C-index for the PRS313 alone was 0.563 (95%CI = 0.547-0.586). In conclusion, PRS313 is an independent factor associated with CBC risk and can be incorporated into CBC risk prediction models to help improve stratification and optimize surveillance and treatment strategies.
Subjects
Breast Neoplasms/genetics , Genetic Predisposition to Disease , Genome, Human , Multifactorial Inheritance , Neoplasms, Second Primary/genetics , Adult , Aged , Asian People , Breast Neoplasms/diagnosis , Breast Neoplasms/ethnology , Breast Neoplasms/therapy , Cohort Studies , Estrogen Receptor alpha/genetics , Estrogen Receptor alpha/metabolism , Female , Gene Expression , Genome-Wide Association Study , Humans , Middle Aged , Neoadjuvant Therapy/methods , Neoplasms, Second Primary/diagnosis , Neoplasms, Second Primary/ethnology , Neoplasms, Second Primary/therapy , Prognosis , Proportional Hazards Models , Receptor, ErbB-2/genetics , Receptor, ErbB-2/metabolism , Receptors, Progesterone/genetics , Receptors, Progesterone/metabolism , Risk Assessment , White People
ABSTRACT
OBJECTIVE: To develop 2 distinct preoperative and intraoperative risk scores to predict postoperative pancreatic fistula (POPF) after distal pancreatectomy (DP) to improve preventive and mitigation strategies, respectively. BACKGROUND: POPF remains the most common complication after DP. Despite several known risk factors, an adequate risk model has not been developed yet. METHODS: Two prediction risk scores were designed using data of patients undergoing DP in 2 Italian centers (2014-2016) utilizing multivariable logistic regression. The preoperative score (calculated before surgery) aims to facilitate preventive strategies and the intraoperative score (calculated at the end of surgery) aims to facilitate mitigation strategies. Internal validation was achieved using bootstrapping. These data were pooled with data from 5 centers from the United States and the Netherlands (2007-2016) to assess discrimination and calibration in an internal-external validation procedure. RESULTS: Overall, 1336 patients after DP were included, of whom 291 (22%) developed POPF. The preoperative distal fistula risk score (preoperative D-FRS) included 2 variables: pancreatic neck thickness [odds ratio: 1.14; 95% confidence interval (CI): 1.11-1.17 per mm increase] and pancreatic duct diameter (OR: 1.46; 95% CI: 1.32-1.65 per mm increase). The model performed well with an area under the receiver operating characteristic curve of 0.83 (95% CI: 0.78-0.88) and 0.73 (95% CI: 0.70-0.76) upon internal-external validation. Three risk groups were identified: low risk (<10%), intermediate risk (10%-25%), and high risk (>25%) for POPF with 238 (18%), 684 (51%), and 414 (31%) patients, respectively. The intraoperative risk score (intraoperative D-FRS) added body mass index, pancreatic texture, and operative time as variables with an area under the receiver operating characteristic curve of 0.80 (95% CI: 0.74-0.85). 
CONCLUSIONS: The preoperative and the intraoperative D-FRS are the first validated risk scores for POPF after DP and are readily available at: http://www.pancreascalculator.com . The 3 distinct risk groups allow for personalized treatment and benchmarking.
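To make the reported odds ratios and risk groups concrete, the sketch below shows how a per-mm odds ratio shifts a predicted risk and how a risk maps onto the three groups (<10%, 10%-25%, >25%). The abstract does not report the model intercept, so the baseline risk used here is a hypothetical value and the function names are illustrative; this is not the published calculator.

```python
def shift_risk(baseline_risk, odds_ratio_per_unit, units):
    """Apply an odds ratio per unit increase to a baseline risk."""
    odds = baseline_risk / (1 - baseline_risk)
    odds *= odds_ratio_per_unit ** units
    return odds / (1 + odds)


def risk_group(predicted_risk):
    """Map a predicted POPF risk to the three groups named in the abstract."""
    if predicted_risk < 0.10:
        return "low"
    if predicted_risk <= 0.25:
        return "intermediate"
    return "high"


# Hypothetical patient: assumed baseline risk of 10%, pancreatic duct 2 mm
# wider than the reference (OR 1.46 per mm, from the abstract).
risk = shift_risk(0.10, 1.46, 2)
print(f"{risk:.3f} -> {risk_group(risk)}")
```

Under these assumed inputs the risk rises from 10% to roughly 19%, which still falls in the intermediate group.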
Subjects
Pancreatectomy , Pancreaticoduodenectomy , Humans , Pancreatectomy/adverse effects , Pancreatectomy/methods , Pancreaticoduodenectomy/methods , Risk Assessment/methods , Risk Factors , Pancreatic Fistula/epidemiology , Pancreatic Fistula/etiology , Pancreatic Fistula/prevention & control , Postoperative Complications/epidemiology , Postoperative Complications/etiology , Retrospective Studies
ABSTRACT
One-stage meta-analysis of individual participant data (IPD) poses several statistical and computational challenges. For time-to-event outcomes, the approach requires the estimation of complicated nonlinear mixed-effects models that are flexible enough to realistically capture the most important characteristics of the IPD. We present a model class that incorporates general normally distributed random effects into linear transformation models. We discuss extensions to model between-study heterogeneity in baseline risks and covariate effects and also relax the assumption of proportional hazards. Within the proposed framework, data with arbitrary random censoring patterns can be handled. The accompanying R package tramME utilizes the Laplace approximation and automatic differentiation to perform efficient maximum likelihood estimation and inference in mixed-effects transformation models. We compare several variants of our model to predict the survival of patients with chronic obstructive pulmonary disease using a large data set of prognostic studies. Finally, a simulation study is presented that verifies the correctness of the implementation and highlights its efficiency compared to an alternative approach.
Subjects
Data Analysis , Models, Statistical , Computer Simulation , Humans , Linear Models
ABSTRACT
BACKGROUND: Clinical prediction models should be validated before implementation in clinical practice. But is favorable performance at internal validation or one external validation sufficient to claim that a prediction model works well in the intended clinical context? MAIN BODY: We argue to the contrary because (1) patient populations vary, (2) measurement procedures vary, and (3) populations and measurements change over time. Hence, we have to expect heterogeneity in model performance between locations and settings, and across time. It follows that prediction models are never truly validated. This does not imply that validation is not important. Rather, the current focus on developing new models should shift to a focus on more extensive, well-conducted, and well-reported validation studies of promising models. CONCLUSION: Principled validation strategies are needed to understand and quantify heterogeneity, monitor performance over time, and update prediction models when appropriate. Such strategies will help to ensure that prediction models stay up-to-date and safe to support clinical decision-making.
ABSTRACT
OBJECTIVES: Many machine learning (ML) models have been developed for application in the ICU, but few models have been subjected to external validation. The performance of these models in new settings therefore remains unknown. The objective of this study was to assess the performance of an existing decision support tool based on a ML model predicting readmission or death within 7 days after ICU discharge before, during, and after retraining and recalibration. DESIGN: A gradient boosted ML model was developed and validated on electronic health record data from 2004 to 2021. We performed an independent validation of this model on electronic health record data from 2011 to 2019 from a different tertiary care center. SETTING: Two ICUs in tertiary care centers in The Netherlands. PATIENTS: Adult patients who were admitted to the ICU and stayed for longer than 12 hours. INTERVENTIONS: None. MEASUREMENTS AND MAIN RESULTS: We assessed discrimination by area under the receiver operating characteristic curve (AUC) and calibration (slope and intercept). We retrained and recalibrated the original model and assessed performance via a temporal validation design. The final retrained model was cross-validated on all data from the new site. Readmission or death within 7 days after ICU discharge occurred in 577 of 10,052 ICU admissions (5.7%) at the new site. External validation revealed moderate discrimination with an AUC of 0.72 (95% CI 0.67-0.76). Retrained models showed improved discrimination with AUC 0.79 (95% CI 0.75-0.82) for the final validation model. Calibration was poor initially and good after recalibration via isotonic regression. CONCLUSIONS: In this era of expanding availability of ML models, external validation and retraining are key steps to consider before applying ML models to new settings. Clinicians and decision-makers should take this into account when considering applying new ML models to their local settings.
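Recalibration via isotonic regression replaces each predicted risk with a value from a non-decreasing step function fitted to the observed outcomes. The sketch below uses the pool-adjacent-violators algorithm on hypothetical data; it does not treat tied predictions specially and is not the study's pipeline, which the abstract does not describe in detail.

```python
def isotonic_fit(values):
    """Pool-adjacent-violators: least-squares non-decreasing fit to a sequence."""
    blocks = []  # each block holds [sum of values, number of values]
    for value in values:
        blocks.append([float(value), 1])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted


def recalibrate(predicted_risks, outcomes):
    """Return recalibrated risks: isotonic fit of outcomes ordered by prediction."""
    order = sorted(range(len(predicted_risks)), key=lambda i: predicted_risks[i])
    fitted_sorted = isotonic_fit([outcomes[i] for i in order])
    recalibrated = [0.0] * len(outcomes)
    for rank, index in enumerate(order):
        recalibrated[index] = fitted_sorted[rank]
    return recalibrated


# Hypothetical predictions and observed readmission-or-death outcomes.
print(recalibrate([0.9, 0.1, 0.5, 0.3], [1, 0, 0, 1]))
```

In practice the mapping would be fitted on one data split and applied to new patients, for example by interpolating between the fitted steps.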
Subjects
Patient Discharge , Patient Readmission , Adult , Humans , Intensive Care Units , Hospitalization , Machine Learning
ABSTRACT
OBJECTIVES: Early Warning Scores (EWSs) have a great potential to assist clinical decision-making in the emergency department (ED). However, many EWS contain methodological weaknesses in development and validation and have poor predictive performance in older patients. The aim of this study was to develop and externally validate an International Early Warning Score (IEWS) based on a recalibrated National Early warning Score (NEWS) model including age and sex and evaluate its performance independently at arrival to the ED in three age categories (18-65, 66-80, > 80 yr). DESIGN: International multicenter cohort study. SETTING: Data was used from three Dutch EDs. External validation was performed in two EDs in Denmark. PATIENTS: All consecutive ED patients greater than or equal to 18 years in the Netherlands Emergency department Evaluation Database (NEED) with at least two registered vital signs were included, resulting in 95,553 patients. For external validation, 14,809 patients were included from a Danish Multicenter Cohort (DMC). MEASUREMENTS AND MAIN RESULTS: Model performance to predict in-hospital mortality was evaluated by discrimination, calibration curves and summary statistics, reclassification, and clinical usefulness by decision curve analysis. In-hospital mortality rate was 2.4% ( n = 2,314) in the NEED and 2.5% ( n = 365) in the DMC. Overall, the IEWS performed significantly better than NEWS with an area under the receiving operating characteristic of 0.89 (95% CIs, 0.89-0.90) versus 0.82 (0.82-0.83) in the NEED and 0.87 (0.85-0.88) versus 0.82 (0.80-0.84) at external validation. Calibration for NEWS predictions underestimated risk in older patients and overestimated risk in the youngest, while calibration improved for IEWS with a substantial reclassification of patients from low to high risk and a standardized net benefit of 5-15% in the relevant risk range for all age categories. 
CONCLUSIONS: The IEWS substantially improves in-hospital mortality prediction for all ED patients greater than or equal to 18 years.
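The standardized net benefit reported in the decision curve analysis above can be illustrated with a minimal sketch. It uses the usual definition of net benefit at a risk threshold (true positives minus false positives weighted by the threshold odds, per patient), divided by the outcome prevalence. The function names and the toy data are hypothetical and are not taken from the study.

```python
def net_benefit(y_true, risk, threshold):
    """Net benefit of acting on every patient whose predicted risk >= threshold."""
    n = len(y_true)
    # Patients flagged as high risk, split by observed outcome.
    tp = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 1)
    fp = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold.
    return tp / n - (fp / n) * threshold / (1 - threshold)


def standardized_net_benefit(y_true, risk, threshold):
    """Net benefit divided by prevalence, so that 1.0 is the best achievable value."""
    prevalence = sum(y_true) / len(y_true)
    return net_benefit(y_true, risk, threshold) / prevalence


# Toy example (hypothetical data): 10 patients, 2 deaths.
outcomes = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
predicted = [0.9, 0.2, 0.05, 0.4, 0.6, 0.1, 0.1, 0.05, 0.02, 0.3]
nb = net_benefit(outcomes, predicted, 0.25)
snb = standardized_net_benefit(outcomes, predicted, 0.25)
```

Plotting this quantity over a range of thresholds for two models gives the decision curves that are compared in such an analysis.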
Subjects
Early Warning Score, Humans, Aged, Hospital Mortality, Cohort Studies, Hospital Emergency Service, Vital Signs, ROC Curve
ABSTRACT
OBJECTIVES: To assess the value of machine learning approaches in the development of a multivariable model for early prediction of ICU death in patients with acute respiratory distress syndrome (ARDS). DESIGN: A development, testing, and external validation study using clinical data from four prospective, multicenter, observational cohorts. SETTING: A network of multidisciplinary ICUs. PATIENTS: A total of 1,303 patients with moderate-to-severe ARDS managed with lung-protective ventilation. INTERVENTIONS: None. MEASUREMENTS AND MAIN RESULTS: We developed and tested prediction models in 1,000 ARDS patients. We performed logistic regression analysis following variable selection by a genetic algorithm, random forest, and extreme gradient boosting machine learning techniques. Potential predictors included demographics, comorbidities, ventilatory and oxygenation descriptors, and extrapulmonary organ failures. Risk modeling identified some major prognostic factors for ICU mortality, including age, cancer, immunosuppression, PaO2/FiO2, inspiratory plateau pressure, and number of extrapulmonary organ failures. Together, these characteristics contained most of the prognostic information in the first 24 hours to predict ICU mortality. Performance with machine learning methods was similar to logistic regression (area under the receiver operating characteristic curve [AUC], 0.87; 95% CI, 0.82-0.91). External validation in an independent cohort of 303 ARDS patients confirmed that the performance of the model was similar to a logistic regression model (AUC, 0.91; 95% CI, 0.87-0.94). CONCLUSIONS: Both machine learning and traditional methods lead to promising models to predict ICU death in moderate/severe ARDS patients. More research is needed to identify markers for severity beyond clinical determinants, such as demographics, comorbidities, lung mechanics, oxygenation, and extrapulmonary organ failure to guide patient management.
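The AUC used above to compare models can be read as the probability that a randomly chosen patient who died received a higher predicted risk than a randomly chosen survivor. The sketch below computes it directly from that pairwise definition, counting ties as one half. The function name and the toy data are hypothetical, and the quadratic loop is meant for illustration rather than for large cohorts.

```python
def auc(y_true, score):
    """Area under the ROC curve as the concordance probability over all
    (event, non-event) pairs; tied scores count as half-concordant."""
    events = [s for y, s in zip(y_true, score) if y == 1]
    non_events = [s for y, s in zip(y_true, score) if y == 0]
    if not events or not non_events:
        raise ValueError("AUC needs at least one event and one non-event")
    concordant = 0.0
    for e in events:
        for q in non_events:
            if e > q:
                concordant += 1.0
            elif e == q:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


# Toy example (hypothetical data): 2 deaths, 3 survivors.
example_auc = auc([1, 1, 0, 0, 0], [0.8, 0.4, 0.5, 0.3, 0.4])
```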
Subjects
Respiratory Distress Syndrome, Humans, Intensive Care Units, Lung, Prospective Studies, Artificial Respiration/methods, Respiratory Distress Syndrome/therapy
ABSTRACT
Recent years have seen the rapid proliferation of clinical prediction models aiming to support risk stratification and individualized care within psychiatry. Despite growing interest, attempts to synthesize current evidence in the nascent field of precision psychiatry have remained scarce. This systematic review therefore sought to summarize progress towards clinical implementation of prediction modeling for psychiatric outcomes. We searched MEDLINE, PubMed, Embase, and PsycINFO databases from inception to September 30, 2020, for English-language articles that developed and/or validated multivariable models to predict (at an individual level) onset, course, or treatment response for non-organic psychiatric disorders (PROSPERO: CRD42020216530). Individual prediction models were evaluated based on three key criteria: (i) mitigation of bias and overfitting; (ii) generalizability; and (iii) clinical utility. The Prediction model Risk Of Bias ASsessment Tool (PROBAST) was used to formally appraise each study's risk of bias. In total, 228 studies detailing 308 prediction models were eligible for inclusion. Of the developed prediction models, 94.5% were deemed to be at high risk of bias, largely due to inadequate or inappropriate analytic decisions. Insufficient internal validation efforts (within the development sample) were also observed, while only one-fifth of models underwent external validation in an independent sample. Finally, our search identified just one published model whose potential utility in clinical practice was formally assessed. Our findings illustrated significant growth in precision psychiatry with promising progress towards real-world application. Nevertheless, these efforts have been inhibited by a preponderance of bias and overfitting, while the generalizability and clinical utility of many published models have yet to be formally established.
Through improved methodological rigor during initial development, robust evaluations of reproducibility via independent validation, and evidence-based implementation frameworks, future research has the potential to generate risk prediction tools capable of enhancing clinical decision-making in psychiatric care.
Subjects
Statistical Models, Psychiatry, Bias, Humans, Prognosis, Reproducibility of Results
ABSTRACT
BACKGROUND: Baseline outcome risk can be an important determinant of absolute treatment benefit and has been used in guidelines for "personalizing" medical decisions. We compared easily applicable risk-based methods for optimal prediction of individualized treatment effects. METHODS: We simulated RCT data using diverse assumptions for the average treatment effect, a baseline prognostic index of risk, the shape of its interaction with treatment (none, linear, quadratic or non-monotonic), and the magnitude of treatment-related harms (none or constant independent of the prognostic index). We predicted absolute benefit using: models with a constant relative treatment effect; stratification in quarters of the prognostic index; models including a linear interaction of treatment with the prognostic index; models including an interaction of treatment with a restricted cubic spline transformation of the prognostic index; an adaptive approach using Akaike's Information Criterion. We evaluated predictive performance using root mean squared error and measures of discrimination and calibration for benefit. RESULTS: The linear-interaction model displayed optimal or close-to-optimal performance across many simulation scenarios with moderate sample size (N = 4,250; ~ 785 events). The restricted cubic splines model was optimal for strong non-linear deviations from a constant treatment effect, particularly when sample size was larger (N = 17,000). The adaptive approach also required larger sample sizes. These findings were illustrated in the GUSTO-I trial. CONCLUSIONS: An interaction between baseline risk and treatment assignment should be considered to improve treatment effect predictions.
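The risk-based approaches compared above can be illustrated with a minimal sketch of how predicted absolute benefit follows from a logistic model in which treatment interacts linearly with the prognostic index. The coefficient names (`gamma0`, `gamma1`) and all numeric values are hypothetical and chosen only for illustration; setting `gamma1 = 0` gives the constant relative treatment effect model.

```python
import math


def expit(x):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1.0 - p))


def predicted_benefit(prognostic_index, gamma0, gamma1=0.0):
    """Absolute risk reduction under the assumed model
    logit P(outcome) = PI + treatment * (gamma0 + gamma1 * PI),
    where PI is the prognostic index on the log-odds scale."""
    risk_control = expit(prognostic_index)
    risk_treated = expit(prognostic_index + gamma0 + gamma1 * prognostic_index)
    return risk_control - risk_treated


# Hypothetical constant odds ratio of 0.8 applied to two baseline risks.
log_or = math.log(0.8)
benefit_low = predicted_benefit(logit(0.10), log_or)   # baseline risk 10%
benefit_high = predicted_benefit(logit(0.40), log_or)  # baseline risk 40%
```

Even with a constant odds ratio, the predicted absolute benefit grows with baseline risk in this example, which is why baseline risk matters for individualized decisions; a nonzero `gamma1` lets the relative effect itself vary with the prognostic index.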
Subjects
Randomized Controlled Trials as Topic, Humans, Prognosis, Computer Simulation, Sample Size
ABSTRACT
Half of Barrett's esophagus (BE) surveillance endoscopies do not adhere to guideline recommendations. In this multicenter prospective cohort study, we assessed the clinical consequences of nonadherence to recommended surveillance intervals and biopsy protocol. Data from BE surveillance patients were collected from endoscopy and pathology reports; questionnaires were distributed among endoscopists. We estimated the association between (non)adherence and (i) endoscopic curability of esophageal adenocarcinoma (EAC), (ii) mortality, and (iii) misclassification of histological diagnosis according to a multistate hidden Markov model. Potential explanatory parameters (patient, facility, endoscopist variables) for nonadherence, related to clinical impact, were analyzed. In 726 BE patients, 3802 endoscopies were performed by 167 endoscopists. Adherence to the recommended surveillance interval was 16% for non-dysplastic BE (NDBE) and 55% for low-grade dysplasia (LGD); 54% of endoscopies followed the Seattle protocol. There was no evidence that longer surveillance intervals or fewer biopsies than recommended affect endoscopic curability of EAC or cause-specific mortality (P > 0.20), or that insufficient biopsies affect the probability of NDBE (OR 1.0) or LGD (OR 2.3) being misclassified as high-grade dysplasia/EAC (P > 0.05). Better adherence was associated with older patients (OR 1.1), BE segments ≤ 2 cm (OR 8.3), visible abnormalities (OR 1.8, all P ≤ 0.05), endoscopists with a subspecialty (OR 3.2), and endoscopists who deemed histological diagnosis an adequate marker (OR 2.0). Clinical consequences of nonadherence to guidelines appeared to be limited with respect to endoscopic curability of EAC and mortality. This indicates that BE surveillance recommendations should be optimized to minimize the burden of endoscopies.
Subjects
Barrett Esophagus, Esophageal Neoplasms, Precancerous Conditions, Humans, Barrett Esophagus/complications, Prospective Studies, Precancerous Conditions/pathology, Esophageal Neoplasms/complications, Disease Progression
ABSTRACT
PURPOSE: Evidence regarding the effect of surgery in traumatic intracerebral hematoma (t-ICH) is limited and relies on the STITCH(Trauma) trial. This study aimed to compare the effectiveness of early surgery with conservative treatment in patients with a t-ICH. METHODS: In a prospective cohort, we included patients with a large t-ICH (< 48 h of injury). Primary outcome was the Glasgow Outcome Scale Extended (GOSE) at 6 months, analyzed with multivariable proportional odds logistic regression. Subgroups included injury severity and isolated vs. non-isolated t-ICH. RESULTS: A total of 367 patients with a large t-ICH were included, of whom 160 received early surgery and 207 received conservative treatment. Patients receiving early surgery were younger (median age 54 vs. 58 years) and more severely injured (median Glasgow Coma Scale 7 vs. 10) compared to those treated conservatively. In the overall cohort, early surgery was not associated with better functional outcome (adjusted odds ratio [AOR] 1.1; 95% CI, 0.6-1.7) compared to conservative treatment. Early surgery was associated with better outcome in patients with moderate TBI (AOR 1.5; 95% CI, 1.1-2.0; P value for interaction 0.71) and in those with isolated t-ICH (AOR 1.8; 95% CI, 1.3-2.5; P value for interaction 0.004). Conversely, in mild TBI (AOR 0.6; 95% CI, 0.4-0.9; P value for interaction 0.71) and in those with a smaller t-ICH (< 33 cc; AOR 0.8; 95% CI, 0.5-1.0; P value for interaction 0.32), conservative treatment was associated with better outcome. CONCLUSIONS: Early surgery in t-ICH might benefit those with moderate TBI and isolated t-ICH, comparable with results of the STITCH(Trauma) trial.
Subjects
Conservative Treatment, Traumatic Intracranial Hemorrhage, Humans, Middle Aged, Prospective Studies, Glasgow Coma Scale, Hematoma/surgery, Cerebral Hemorrhage/surgery
ABSTRACT
BACKGROUND: Prediction of contralateral breast cancer (CBC) risk is challenging due to the moderate performance of the known risk factors. We aimed to improve our previous risk prediction model (PredictCBC) by updating follow-up and including additional risk factors. METHODS: We included data from 207,510 invasive breast cancer patients participating in 23 studies. In total, 8225 CBC events occurred over a median follow-up of 10.2 years. In addition to the previously included risk factors, PredictCBC-2.0 included CHEK2 c.1100delC, a 313-variant polygenic risk score (PRS-313), body mass index (BMI), and parity. Fine and Gray regression was used to fit the model. Calibration and a time-dependent area under the curve (AUC) at 5 and 10 years were assessed to determine the performance of the models. Decision curve analysis was performed to evaluate the net benefit of PredictCBC-2.0 and previous PredictCBC models. RESULTS: The discrimination of PredictCBC-2.0 at 10 years was higher than that of PredictCBC, with an AUC of 0.65 (95% prediction interval [PI], 0.56-0.74) versus 0.63 (95% PI, 0.54-0.71). PredictCBC-2.0 was well calibrated, with an observed/expected ratio at 10 years of 0.92 (95% PI, 0.34-2.54). Decision curve analysis for contralateral preventive mastectomy (CPM) showed the potential clinical utility of PredictCBC-2.0 between thresholds of 4 and 12% 10-year CBC risk for BRCA1/2 mutation carriers and non-carriers. CONCLUSIONS: Additional genetic information beyond BRCA1/2 germline mutations improved CBC risk prediction and might help tailor clinical decision-making toward CPM or alternative preventive strategies. Identifying patients who benefit from CPM, especially in the general breast cancer population, remains challenging.