ABSTRACT
BACKGROUND: Supporting decisions for patients who present to the emergency department (ED) with COVID-19 requires accurate prognostication. We aimed to evaluate prognostic models for predicting outcomes in hospitalized patients with COVID-19 in different locations and across time. METHODS: We included patients who presented to the ED with suspected COVID-19 and were admitted to 12 hospitals in the New York City (NYC) area and 4 large Dutch hospitals. We used second-wave patients who presented between September and December 2020 (2137 and 3252 in NYC and the Netherlands, respectively) to evaluate models that were developed on first-wave patients who presented between March and August 2020 (12,163 and 5831). We evaluated two prognostic models for in-hospital death: the Northwell COVID-19 Survival (NOCOS) model, developed on NYC data, and the COVID Outcome Prediction in the Emergency Department (COPE) model, developed on Dutch data. These models were validated on subsequent second-wave data at the same site (temporal validation) and at the other site (geographic validation). We assessed model performance by the area under the receiver operating characteristic curve (AUC), the E-statistic, and net benefit. RESULTS: Twenty-eight-day mortality was considerably higher in the NYC first-wave data (21.0%) than in the second wave (10.1%) and the Dutch data (first wave 10.8%; second wave 10.0%). COPE discriminated well at temporal validation (AUC 0.82), with excellent calibration (E-statistic 0.8%). At geographic validation, discrimination was satisfactory (AUC 0.78), but with moderate over-prediction of mortality risk, particularly in higher-risk patients (E-statistic 2.9%). While discrimination was adequate when NOCOS was tested on second-wave NYC data (AUC 0.77), NOCOS systematically overestimated the mortality risk (E-statistic 5.1%).
Discrimination in the Dutch data was good (AUC 0.81), but with over-prediction of risk, particularly in lower-risk patients (E-statistic 4.0%). Recalibration of COPE and NOCOS led to limited net benefit improvement in Dutch data, but to substantial net benefit improvement in NYC data. CONCLUSIONS: NOCOS performed moderately worse than COPE, probably reflecting unique aspects of the early pandemic in NYC. Frequent updating of prognostic models is likely to be required for transportability over time and space during a dynamic pandemic.
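The discrimination and calibration measures reported in this abstract can be illustrated with a short sketch. This is not the study's code: the AUC below is the standard concordance (c-statistic) calculation, and the grouped E-statistic is a crude quantile-binned stand-in for the smoothed calibration curve an E-statistic is usually derived from; all data and names are illustrative.

```python
import numpy as np

def auc(y, p):
    """Concordance (c) statistic: probability that a randomly chosen case
    is assigned a higher predicted risk than a randomly chosen non-case
    (ties count as half a concordant pair)."""
    cases, controls = p[y == 1], p[y == 0]
    diff = cases[:, None] - controls[None, :]   # all case-control pairs
    return np.mean(diff > 0) + 0.5 * np.mean(diff == 0)

def e_statistic(y, p, bins=10):
    """Mean absolute difference between predicted and observed risk.
    Observed risk is approximated within quantile groups of p; a proper
    E-statistic would use a smoothed calibration curve instead."""
    order = np.argsort(p)
    groups = np.array_split(order, bins)
    total_err = sum(abs(p[g].mean() - y[g].mean()) * len(g) for g in groups)
    return total_err / len(y)
```

On a toy sample, a well-calibrated model yields an E-statistic near zero even when discrimination is imperfect, mirroring how COPE could combine an AUC of 0.82 with an E-statistic of 0.8%.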
Subjects
COVID-19, Humans, Prognosis, COVID-19/diagnosis, Hospital Mortality, ROC Curve, New York City
ABSTRACT
Clinical prediction models (CPMs) are tools that compute the risk of an outcome given a set of patient characteristics and are routinely used to inform patients and to guide treatment decision-making and resource allocation. Although much hope has been placed on CPMs to mitigate human biases, CPMs may potentially contribute to racial disparities in decision-making and resource allocation. While some policymakers, professional organizations, and scholars have called for eliminating race as a variable from CPMs, others raise concerns that excluding race may exacerbate healthcare disparities, and this controversy remains unresolved. The Guidance for Unbiased predictive Information for healthcare Decision-making and Equity (GUIDE), developed through a 5-round modified Delphi process with a diverse 14-person technical expert panel (TEP), provides expert guidance for model developers and health system administrators on the transparent use of race in CPMs and the mitigation of algorithmic bias across contexts. Deliberations affirmed that race is a social construct and that the goals of prediction are distinct from those of causal inference, and emphasized: the importance of decisional context (e.g., shared decision-making versus healthcare rationing); the conflicting nature of different anti-discrimination principles (e.g., anticlassification versus antisubordination principles); and the importance of identifying and balancing trade-offs in achieving equity-related goals with race-aware versus race-unaware CPMs for conditions where racial identity is prognostically informative. The GUIDE, comprising 31 key items in the development and use of CPMs in healthcare, outlines foundational principles, distinguishes between bias and fairness, and offers guidance for examining subgroup invalidity and using race as a variable in CPMs. The GUIDE is a living document that supports the appraisal and reporting of bias in CPMs and best practice in CPM development and use.
ABSTRACT
INTRODUCTION: Clinical prediction models (CPMs) for coronavirus disease 2019 (COVID-19) may support clinical decision making, treatment, and communication. However, attitudes about using CPMs for COVID-19 decision making are unknown. METHODS: Online focus groups and interviews were conducted among health care providers, survivors of COVID-19, and surrogates (i.e., loved ones/surrogate decision makers) in the United States and the Netherlands. Semistructured questions explored experiences with clinical decision making in COVID-19 care, as well as facilitators of and barriers to implementing CPMs. RESULTS: In the United States, we conducted 4 online focus groups with 1) providers and 2) surrogates and survivors of COVID-19 between January 2021 and July 2021. In the Netherlands, we conducted 3 focus groups and 4 individual interviews with 1) providers and 2) surrogates and survivors of COVID-19 between May 2021 and July 2021. Providers expressed concern about CPM validity and the belief that patients may interpret CPM predictions as absolute. They described CPMs as potentially useful for resource allocation, triage, education, and research. Several surrogates and people who had COVID-19 were not given prognostic estimates but believed this information would have supported and influenced their decision making. A limited number of participants felt the data would not have applied to them and that they or their loved ones might not have survived, as a poor prognosis might have prompted withdrawal of treatment. CONCLUSIONS: Many providers had reservations about using CPMs for people with COVID-19 due to concerns about CPM validity and patient-level interpretation of the outcome predictions. However, several people who survived COVID-19 and their surrogates indicated that they would have found this information useful for decision making. Therefore, information provision may be needed to improve provider-level comfort and patient and surrogate understanding of CPMs.
HIGHLIGHTS: While clinical prediction models (CPMs) may provide an objective means of assessing COVID-19 prognosis, provider concerns about CPM validity and the interpretation of CPM predictions may limit their clinical use. Providers felt that CPMs may be most useful for resource allocation, triage, research, or educational purposes for COVID-19. Several survivors of COVID-19 and their surrogates felt that CPMs would have been informative and may have aided them in making COVID-19 treatment decisions, while others felt the data would not have applied to them.
Subjects
COVID-19, Decision Making, Humans, COVID-19 Drug Treatment, Prognosis
ABSTRACT
BACKGROUND: While clinical prediction models (CPMs) are increasingly used to guide patient care, the performance and clinical utility of these CPMs in new patient cohorts are poorly understood. METHODS: We performed 158 external validations of 104 unique CPMs across 3 domains of cardiovascular disease (primary prevention, acute coronary syndrome, and heart failure). Validations were performed in publicly available clinical trial cohorts, and model performance was assessed using measures of discrimination, calibration, and net benefit. To explore potential reasons for poor model performance, CPM-clinical trial cohort pairs were stratified based on relatedness, a domain-specific set of characteristics used to qualitatively grade the similarity of derivation and validation patient populations. We also examined the model-based C-statistic to assess whether changes in discrimination were attributable to differences in case-mix between the derivation and validation samples. The impact of model updating on model performance was also assessed. RESULTS: Discrimination decreased significantly between model derivation (0.76 [interquartile range, 0.73-0.78]) and validation (0.64 [interquartile range, 0.60-0.67], P<0.001), but approximately half of this decrease was attributable to narrower case-mix in the validation samples. CPMs had better discrimination when tested in related compared with distantly related trial cohorts. Calibration slope was also significantly higher in related trial cohorts (0.77 [interquartile range, 0.59-0.90]) than in distantly related cohorts (0.59 [interquartile range, 0.43-0.73], P=0.001). When considering the full range of possible decision thresholds between half and twice the outcome incidence, 91% of models had a risk of harm (net benefit below the default strategy) at some threshold; this risk could be reduced substantially by updating the model intercept, the calibration slope, or via complete re-estimation.
CONCLUSIONS: There are significant decreases in model performance when applying cardiovascular disease CPMs to new patient populations, resulting in substantial risk of harm. Model updating can mitigate these risks. Care should be taken when using CPMs to guide clinical decision-making.
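The two quantities driving that conclusion can be sketched briefly. This is illustrative code, not the study's implementation: it computes net benefit at a chosen decision threshold and refits the intercept and calibration slope by logistic recalibration of the original linear predictor (one common form of model updating); all data are made up.

```python
import numpy as np

def net_benefit(y, p, pt):
    """Net benefit of treating everyone with predicted risk >= pt,
    in units of true positives per patient."""
    treat = p >= pt
    n = len(y)
    tp = np.sum(treat & (y == 1))
    fp = np.sum(treat & (y == 0))
    return tp / n - (fp / n) * pt / (1 - pt)

def recalibrate(y, p, n_iter=25):
    """Logistic recalibration: refit intercept a and calibration slope b in
    logit(risk) = a + b * logit(p), via Newton-Raphson on the log-likelihood."""
    lp = np.log(p / (1 - p))                    # original linear predictor
    X = np.column_stack([np.ones_like(lp), lp])
    beta = np.zeros(2)
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-X @ beta))    # current predicted risks
        grad = X.T @ (y - mu)
        hess = X.T @ (X * (mu * (1 - mu))[:, None])
        beta += np.linalg.solve(hess, grad)
    return beta[0], beta[1]                      # intercept, calibration slope
```

A model "risks harm" at threshold pt when its net benefit falls below the better of the treat-all and treat-none defaults; recalibration pulls predicted risks back toward the observed event rate (after refitting, the mean recalibrated risk equals the observed outcome rate), which is why intercept and slope updates reduced that risk in the validations above.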
Subjects
Cardiovascular Diseases, Heart Failure, Cardiovascular Diseases/diagnosis, Cardiovascular Diseases/epidemiology, Cardiovascular Diseases/therapy, Heart Failure/diagnosis, Heart Failure/epidemiology, Heart Failure/therapy, Humans, Risk Assessment/methods
ABSTRACT
OBJECTIVE: To assess whether the Prediction model Risk Of Bias ASsessment Tool (PROBAST) and a shorter version of this tool can identify clinical prediction models (CPMs) that perform poorly at external validation. STUDY DESIGN AND SETTING: We evaluated risk of bias (ROB) on 102 CPMs from the Tufts CPM Registry, comparing PROBAST to a short form consisting of the six PROBAST items anticipated to best identify high ROB. We then applied the short form to all CPMs in the Registry with at least 1 validation (n=556) and assessed the change in discrimination (dAUC) in external validation cohorts (n=1,147). RESULTS: PROBAST classified 98/102 CPMs as high ROB. The short form identified 96 of these 98 as high ROB (98% sensitivity), with perfect specificity. In the full CPM Registry, 527 of 556 CPMs (95%) were classified as high ROB, 20 (3.6%) as low ROB, and 9 (1.6%) as unclear ROB. Only one model with unclear ROB was reclassified to high ROB after full PROBAST assessment of all low and unclear ROB models. The median change in discrimination was significantly smaller in low ROB models (dAUC -0.9%; IQR, -6.2% to 4.2%) than in high ROB models (dAUC -11.7%; IQR, -33.3% to 2.6%; P<0.001). CONCLUSION: High ROB is pervasive among published CPMs. It is associated with poor discriminative performance at validation, supporting the application of PROBAST or a shorter version in CPM reviews.
Subjects
Biomedical Research/organization & administration, Epidemiologic Studies, Research Design/statistics & numerical data, Research Design/standards, Risk Assessment/methods, Risk Assessment/statistics & numerical data, Bias, Clinical Decision Rules, Discriminant Analysis, Humans, Prognosis
ABSTRACT
BACKGROUND: There are many clinical prediction models (CPMs) available to inform treatment decisions for patients with cardiovascular disease. However, the extent to which they have been externally tested, and how well they generally perform, has not been broadly evaluated. METHODS: A SCOPUS citation search was run on March 22, 2017 to identify external validations of cardiovascular CPMs in the Tufts Predictive Analytics and Comparative Effectiveness CPM Registry. We assessed the extent of external validation and performance heterogeneity across databases, and explored factors associated with model performance, including a global assessment of the clinical relatedness between the derivation and validation data. RESULTS: We identified 2030 external validations of 1382 CPMs. Eight hundred seven (58%) of the CPMs in the Registry have never been externally validated. On average, there were 1.5 validations per CPM (range, 0-94). The median external validation area under the receiver operating characteristic curve (AUC) was 0.73 (interquartile range [IQR], 0.66-0.79), representing a median percent change in discrimination of -11.1% (IQR, -32.4% to +2.7%) compared with performance on derivation data. Of the validations reporting an AUC, 81% (n=1333) showed discrimination below that reported in the derivation dataset. Some measure of CPM calibration was reported in 53% (n=983) of the validations. For CPMs evaluated more than once, there was typically a large range of performance. Of 1702 validations classified by relatedness, the percent change in discrimination was -3.7% (IQR, -13.2 to 3.1) for closely related validations (n=123), -9.0% (IQR, -27.6 to 3.9) for related validations (n=862), and -17.2% (IQR, -42.3 to 0) for distantly related validations (n=717; P<0.001).
CONCLUSIONS: Many published cardiovascular CPMs have never been externally validated, and for those that have, apparent performance during development is often overly optimistic. A single external validation appears insufficient to broadly understand the performance heterogeneity across different settings.
Subjects
Cardiovascular Diseases, Cardiovascular Diseases/diagnosis, Cardiovascular Diseases/epidemiology, Cardiovascular Diseases/therapy, Humans, ROC Curve
ABSTRACT
Background More than 500 000 sudden cardiac arrests (SCAs) occur annually in the United States. Clinical prediction models (CPMs) may be helpful tools to differentiate between patients who are likely to survive or have good neurologic recovery and those who are not. However, which CPMs are most reliable for discriminating between outcomes in SCA is not known. Methods and Results We performed a systematic review of the literature using the Tufts PACE (Predictive Analytics and Comparative Effectiveness) CPM Registry through February 1, 2020, and identified 81 unique CPMs of SCA and 62 subsequent external validation studies. Initial cardiac rhythm, age, and duration of cardiopulmonary resuscitation were the 3 most commonly used predictive variables. Only 33 of the 81 novel SCA CPMs (41%) were validated at least once. Discrimination was reported for 56 of the 81 novel SCA CPMs (69%) and in 61 of the 62 validation studies (98%), with median c-statistics of 0.84 and 0.81, respectively. Calibration was reported in only 29 of 62 validation studies (41.9%). For the novel models that both reported discrimination and were validated (26 models), the median percentage change in discrimination was -1.6%. We identified 3 CPMs that had undergone at least 3 external validation studies: the out-of-hospital cardiac arrest score (9 validations; median c-statistic, 0.79), the cardiac arrest hospital prognosis score (6 validations; median c-statistic, 0.83), and the good outcome following attempted resuscitation score (6 validations; median c-statistic, 0.76). Conclusions Although only a small number of SCA CPMs have been rigorously validated, those that have been demonstrate good discrimination.
Subjects
Cardiopulmonary Resuscitation, Sudden Cardiac Death, Heart Rate, Out-of-Hospital Cardiac Arrest/mortality, Predictive Value of Tests, Age Factors, Aged, Calibration, Female, Humans, Male, Middle Aged, Out-of-Hospital Cardiac Arrest/therapy, Prognosis, Reproducibility of Results
ABSTRACT
Background While many clinical prediction models (CPMs) exist to guide valvular heart disease treatment decisions, the relative performance of these CPMs is largely unknown. We systematically describe the CPMs available for patients with valvular heart disease, with specific attention to performance in external validations. Methods and Results A systematic review identified 49 CPMs for patients with valvular heart disease treated with surgery (n=34), percutaneous interventions (n=12), or no intervention (n=3). There were 204 external validations of these CPMs. Only 35 of the CPMs (71%) have been externally validated. Sixty-five percent (n=133) of the external validations were performed on distantly related populations. There was substantial heterogeneity in model performance, with a median percentage change in discrimination of -27.1% (interquartile range, -49.4% to -5.7%). Nearly two-thirds of validations (n=129) demonstrate at least a 10% relative decline in discrimination. Discriminatory performance of the EuroSCORE II and Society of Thoracic Surgeons (2009) models (accounting for 73% of external validations) varied widely: EuroSCORE II validation c-statistic range, 0.50 to 0.95; Society of Thoracic Surgeons (2009) models validation c-statistic range, 0.50 to 0.86. These models performed well when tested on related populations (median related validation c-statistics: EuroSCORE II, 0.82 [0.76, 0.85]; Society of Thoracic Surgeons [2009], 0.72 [0.67, 0.79]). There remain few (n=9) external validations of transcatheter aortic valve replacement CPMs. Conclusions Many CPMs for patients with valvular heart disease have never been externally validated, and isolated external validations appear insufficient to assess the trustworthiness of predictions. For surgical valve interventions, there are existing predictive models that perform reasonably well on related populations.
For transcatheter aortic valve replacement CPMs, additional external validations are needed to broadly understand the trustworthiness of predictions.