
1.

Calibration plots for multistate risk predictions models.

Pate, Alexander; Sperrin, Matthew; Riley, Richard D; Peek, Niels; Van Staa, Tjeerd; Sergeant, Jamie C; Mamas, Mamas A; Lip, Gregory Y H; O'Flaherty, Martin; Barrowman, Michael; Buchan, Iain; Martin, Glen P.

Stat Med ; 43(14): 2830-2852, 2024 Jun 30.

Article En | MEDLINE | ID: mdl-38720592

INTRODUCTION: There is currently no guidance on how to assess the calibration of multistate models used for risk prediction. We introduce several techniques that can be used to produce calibration plots for the transition probabilities of a multistate model, before assessing their performance in the presence of random and independent censoring through a simulation. METHODS: We studied pseudo-values based on the Aalen-Johansen estimator, binary logistic regression with inverse probability of censoring weights (BLR-IPCW), and multinomial logistic regression with inverse probability of censoring weights (MLR-IPCW). The MLR-IPCW approach results in a calibration scatter plot, providing extra insight about the calibration. We simulated data with varying levels of censoring and evaluated the ability of each method to estimate the calibration curve for a set of predicted transition probabilities. We also developed and evaluated the calibration of a model predicting the incidence of cardiovascular disease, type 2 diabetes and chronic kidney disease among a cohort of patients derived from linked primary and secondary healthcare records. RESULTS: The pseudo-value, BLR-IPCW, and MLR-IPCW approaches give unbiased estimates of the calibration curves under random censoring. These methods remained predominantly unbiased in the presence of independent censoring, even if the censoring mechanism was strongly associated with the outcome, with bias concentrated in low-density regions of predicted transition probability. CONCLUSIONS: We recommend implementing either the pseudo-value or BLR-IPCW approaches to produce a calibration curve, combined with the MLR-IPCW approach to produce a calibration scatter plot. The methods have been incorporated into the "calibmsm" R package available on CRAN.

Computer Simulation , Diabetes Mellitus, Type 2 , Models, Statistical , Humans , Diabetes Mellitus, Type 2/epidemiology , Risk Assessment/methods , Risk Assessment/statistics & numerical data , Logistic Models , Calibration , Cardiovascular Diseases/epidemiology , Renal Insufficiency, Chronic/epidemiology , Probability

2.

Sepsis and case fatality rates and associations with deprivation, ethnicity, and clinical characteristics: population-based case-control study with linked primary care and hospital data in England.

van Staa, Tjeerd Pieter; Pate, Alexander; Martin, Glen P; Sharma, Anita; Dark, Paul; Felton, Tim; Zhong, Xiaomin; Bladon, Sian; Cunningham, Neil; Gilham, Ellie L; Brown, Colin S; Mirfenderesky, Mariyam; Palin, Victoria; Ashiru-Oredope, Diane.

Infection ; 2024 Apr 16.

Article En | MEDLINE | ID: mdl-38627354

PURPOSE: Sepsis is a life-threatening organ dysfunction caused by dysregulated host response to infection. The purpose of the study was to measure the associations of specific exposures (deprivation, ethnicity, and clinical characteristics) with incident sepsis and case fatality. METHODS: Two research databases in England were used including anonymized patient-level records from primary care linked to hospital admission, death certificate, and small-area deprivation. Sepsis cases aged 65-100 years were matched to up to six controls. Predictors for sepsis (including 60 clinical conditions) were evaluated using logistic and random forest models; case fatality rates were analyzed using logistic models. RESULTS: 108,317 community-acquired sepsis cases were analyzed. Severe frailty was strongly associated with the risk of developing sepsis (crude odds ratio [OR] 14.93; 95% confidence interval [CI] 14.37-15.52). The quintile with most deprived patients showed an increased sepsis risk (crude OR 1.48; 95% CI 1.45-1.51) compared to least deprived quintile. Strong predictors for sepsis included antibiotic exposure in prior 2 months, being house bound, having cancer, learning disability, and diabetes mellitus. Severely frail patients had a case fatality rate of 42.0% compared to 24.0% in non-frail patients (adjusted OR 1.53; 95% CI 1.41-1.65). Sepsis cases with recent prior antibiotic exposure died less frequently compared to non-users (adjusted OR 0.7; 95% CI 0.72-0.76). Case fatality strongly decreased over calendar time. CONCLUSION: Given the variety of predictors and their level of associations for developing sepsis, there is a need for prediction models for risk of developing sepsis that can help to target preventative antibiotic therapy.

3.

Rapid systematic review on risks and outcomes of sepsis: the influence of risk factors associated with health inequalities.

Bladon, Siân; Ashiru-Oredope, Diane; Cunningham, Neil; Pate, Alexander; Martin, Glen P; Zhong, Xiaomin; Gilham, Ellie L; Brown, Colin S; Mirfenderesky, Mariyam; Palin, Victoria; van Staa, Tjeerd P.

Int J Equity Health ; 23(1): 34, 2024 Feb 21.

Article En | MEDLINE | ID: mdl-38383380

BACKGROUND AND AIMS: Sepsis is a serious and life-threatening condition caused by a dysregulated immune response to an infection. Recent guidance issued in the UK gave recommendations around recognition and antibiotic treatment of sepsis, but did not consider factors relating to health inequalities. The aim of this study was to summarise the literature investigating associations between health inequalities and sepsis. METHODS: Searches were conducted in Embase for peer-reviewed articles published since 2010 that included sepsis in combination with one of the following five areas: socioeconomic status, race/ethnicity, community factors, medical needs and pregnancy/maternity. RESULTS: Five searches identified 1,402 studies, with 50 unique studies included in the review after screening (13 sociodemographic, 14 race/ethnicity, 3 community, 3 care/medical needs and 20 pregnancy/maternity; 3 papers examined multiple health inequalities). Most of the studies were conducted in the USA (31/50), with only four studies using UK data (all pregnancy related). Socioeconomic factors associated with increased sepsis incidence included lower socioeconomic status, unemployment and lower education level, although findings were not consistent across studies. For ethnicity, mixed results were reported. Living in a medically underserved area or being resident in a nursing home increased risk of sepsis. Mortality rates after sepsis were found to be higher in people living in rural areas or in those discharged to skilled nursing facilities while associations with ethnicity were mixed. Complications during delivery, caesarean-section delivery, increased deprivation and black and other ethnic minority race were associated with post-partum sepsis. CONCLUSION: There are clear correlations between sepsis morbidity and mortality and the presence of factors associated with health inequalities. 
To inform local guidance and drive public health measures, there is a need for studies conducted across more diverse settings and countries.

Ethnicity , Sepsis , Humans , Female , Pregnancy , Minority Groups , Socioeconomic Factors , Risk Factors , Health Inequities

4.

A scoping review finds a growing trend in studies validating multimorbidity patterns and identifies five broad types of validation methods.

Dhafari, Thamer Ba; Pate, Alexander; Azadbakht, Narges; Bailey, Rowena; Rafferty, James; Jalali-Najafabadi, Farideh; Martin, Glen P; Hassaine, Abdelaali; Akbari, Ashley; Lyons, Jane; Watkins, Alan; Lyons, Ronan A; Peek, Niels.

J Clin Epidemiol ; 165: 111214, 2024 Jan.

Article En | MEDLINE | ID: mdl-37952700

OBJECTIVES: Multimorbidity, the presence of two or more long-term conditions, is a growing public health concern. Many studies use analytical methods to discover multimorbidity patterns from data. We aimed to review approaches used in published literature to validate these patterns. STUDY DESIGN AND SETTING: We systematically searched PubMed and Web of Science for studies published between July 2017 and July 2023 that used analytical methods to discover multimorbidity patterns. RESULTS: Out of 31,617 studies returned by the searches, 172 were included. Of these, 111 studies (64%) conducted validation. The proportion of studies with validation increased from 53.13% (17 out of 32 studies) in 2017-2019 to 71.25% (57 out of 80 studies) in 2022-2023. Five types of validation were identified: assessing the association of multimorbidity patterns with clinical outcomes (n = 79), stability across subsamples (n = 26), clinical plausibility (n = 22), stability across methods (n = 7) and exploring common determinants (n = 2). Some studies used multiple types of validation. CONCLUSION: The number of studies conducting a validation of multimorbidity patterns is clearly increasing. The most popular validation approach is assessing the association of multimorbidity patterns with clinical outcomes. Methodological guidance on the validation of multimorbidity patterns is needed.

Multimorbidity , Research Design , Humans , Chronic Disease

5.

Clinical prediction models and the multiverse of madness.

Riley, Richard D; Pate, Alexander; Dhiman, Paula; Archer, Lucinda; Martin, Glen P; Collins, Gary S.

BMC Med ; 21(1): 502, 2023 12 18.

Article En | MEDLINE | ID: mdl-38110939

BACKGROUND: Each year, thousands of clinical prediction models are developed to make predictions (e.g. estimated risk) to inform individual diagnosis and prognosis in healthcare. However, most are not reliable for use in clinical practice. MAIN BODY: We discuss how the creation of a prediction model (e.g. using regression or machine learning methods) is dependent on the sample and size of data used to develop it: were a different sample of the same size used from the same overarching population, the developed model could be very different even when the same model development methods are used. In other words, for each model created, there exists a multiverse of other potential models for that sample size and, crucially, an individual's predicted value (e.g. estimated risk) may vary greatly across this multiverse. The more an individual's prediction varies across the multiverse, the greater the instability. We show how small development datasets lead to more different models in the multiverse, often with vastly unstable individual predictions, and explain how this can be exposed by using bootstrapping and presenting instability plots. We recommend healthcare researchers seek to use large model development datasets to reduce instability concerns. This is especially important to ensure reliability across subgroups and improve model fairness in practice. CONCLUSIONS: Instability is concerning as an individual's predicted value is used to guide their counselling, resource prioritisation, and clinical decision making. If different samples lead to different models with very different predictions for the same individual, then this should cast doubt on using a particular model for that individual. Therefore, visualising, quantifying and reporting the instability in individual-level predictions is essential when proposing a new model.

Models, Statistical , Humans , Prognosis , Reproducibility of Results

6.

Impact of COVID-19 on broad-spectrum antibiotic prescribing for common infections in primary care in England: a time-series analyses using OpenSAFELY and effects of predictors including deprivation.

Zhong, Xiaomin; Pate, Alexander; Yang, Ya-Ting; Fahmi, Ali; Ashcroft, Darren M; Goldacre, Ben; MacKenna, Brian; Mehrkar, Amir; Bacon, Sebastian Cj; Massey, Jon; Fisher, Louis; Inglesby, Peter; Hand, Kieran; van Staa, Tjeerd; Palin, Victoria.

Lancet Reg Health Eur ; : 100653, 2023 May 16.

Article En | MEDLINE | ID: mdl-37363797

Background: The COVID-19 pandemic impacted healthcare systems, adding extra pressure to reduce antimicrobial resistance (AMR). Therefore, we aimed to evaluate changes in antibiotic prescription patterns after COVID-19 started. Methods: With the approval of NHS England, we used the OpenSAFELY platform to access the TPP SystmOne electronic health record (EHR) system in primary care and selected patients prescribed antibiotics from 2019 to 2021. To evaluate the impact of COVID-19 on broad-spectrum antibiotic prescribing, we evaluated prescribing rates and their predictors and used interrupted time series analysis by fitting binomial logistic regression models. Findings: Over 32 million antibiotic prescriptions were extracted over the study period; 8.7% were broad-spectrum. The study showed increases in broad-spectrum antibiotic prescribing (odds ratio [OR] 1.37; 95% confidence interval [CI] 1.36-1.38) as an immediate impact of the pandemic, followed by a gradual recovery with a 1.1-1.2% decrease in odds of broad-spectrum prescription per month. The same pattern was found within subgroups defined by age, sex, region, ethnicity, and socioeconomic deprivation quintiles. More deprived patients were more likely to receive broad-spectrum antibiotics, and these differences remained stable over time. The most significant increase in broad-spectrum prescribing was observed for lower respiratory tract infection (OR 2.33; 95% CI 2.1-2.50) and otitis media (OR 1.96; 95% CI 1.80-2.13). Interpretation: An immediate reduction in antibiotic prescribing and an increase in the proportion of broad-spectrum antibiotic prescribing in primary care were observed. The trends recovered to pre-pandemic levels, but the consequence of the COVID-19 pandemic on AMR needs further investigation. Funding: This work was supported by Health Data Research UK and by National Institute for Health Research.

7.

The impact of COVID-19 on antibiotic prescribing in primary care in England: Evaluation and risk prediction of appropriateness of type and repeat prescribing.

Zhong, Xiaomin; Pate, Alexander; Yang, Ya-Ting; Fahmi, Ali; Ashcroft, Darren M; Goldacre, Ben; MacKenna, Brian; Mehrkar, Amir; Bacon, Sebastian C J; Massey, Jon; Fisher, Louis; Inglesby, Peter; Hand, Kieran; van Staa, Tjeerd; Palin, Victoria.

J Infect ; 87(1): 1-11, 2023 07.

Article En | MEDLINE | ID: mdl-37182748

BACKGROUND: This study aimed to predict risks of potentially inappropriate antibiotic type and repeat prescribing and assess changes during COVID-19. METHODS: With the approval of NHS England, we used the OpenSAFELY platform to access the TPP SystmOne electronic health record (EHR) system and selected patients prescribed antibiotics from 2019 to 2021. Multinomial logistic regression models predicted each patient's probability of receiving an inappropriate antibiotic type or a repeat antibiotic course for each common infection. RESULTS: The population included 9.1 million patients with 29.2 million antibiotic prescriptions. 29.1% of prescriptions were identified as repeat prescribing. Those with same day incident infection coded in the EHR had considerably lower rates of repeat prescribing (18.0%) and 8.6% had potentially inappropriate type. No major changes in the rates of repeat antibiotic prescribing during COVID-19 were found. In the 10 risk prediction models, good levels of calibration and moderate levels of discrimination were found. CONCLUSIONS: Our study found no evidence of changes in level of inappropriate or repeat antibiotic prescribing after the start of COVID-19. Repeat antibiotic prescribing was frequent and varied according to regional and patient characteristics. There is a need for treatment guidelines to be developed around antibiotic failure and for clinicians to be provided with individualised patient information.

COVID-19 , Respiratory Tract Infections , Humans , Anti-Bacterial Agents/therapeutic use , Inappropriate Prescribing , England/epidemiology , Primary Health Care , Respiratory Tract Infections/drug therapy

8.

Developing prediction models to estimate the risk of two survival outcomes both occurring: A comparison of techniques.

Pate, Alexander; Sperrin, Matthew; Riley, Richard D; Sergeant, Jamie C; Van Staa, Tjeerd; Peek, Niels; Mamas, Mamas A; Lip, Gregory Y H; O'Flaherty, Martin; Buchan, Iain; Martin, Glen P.

Stat Med ; 42(18): 3184-3207, 2023 08 15.

Article En | MEDLINE | ID: mdl-37218664

INTRODUCTION: This study considers the prediction of the time until two survival outcomes have both occurred. We compared a variety of analytical methods motivated by a typical clinical problem of multimorbidity prognosis. METHODS: We considered five methods: product (multiply marginal risks), dual-outcome (directly model the time until both events occur), multistate models (msm), and a range of copula and frailty models. We assessed calibration and discrimination under a variety of simulated data scenarios, varying outcome prevalence, and the amount of residual correlation. The simulation focused on model misspecification and statistical power. Using data from the Clinical Practice Research Datalink, we compared model performance when predicting the risk of cardiovascular disease and type 2 diabetes both occurring. RESULTS: Discrimination was similar for all methods. The product method was poorly calibrated in the presence of residual correlation. The msm and dual-outcome models were the most robust to model misspecification but suffered a drop in performance at small sample sizes due to overfitting, to which the copula and frailty models were less susceptible. The performance of the copula and frailty models was highly dependent on the underlying data structure. In the clinical example, the product method was poorly calibrated when adjusting for 8 major cardiovascular risk factors. DISCUSSION: We recommend the dual-outcome method for predicting the risk of two survival outcomes both occurring. It was the most robust to model misspecification, although it was also the most prone to overfitting. The clinical example motivates the use of the methods considered in this study.

Diabetes Mellitus, Type 2 , Frailty , Humans , Models, Statistical , Computer Simulation , Prognosis
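
The abstract's finding that the product method is poorly calibrated under residual correlation can be illustrated with a minimal simulation sketch. This is not the authors' simulation: it simplifies the two survival outcomes to binary events by a fixed horizon with no censoring, and the function name, the latent-normal construction and the default risks (`p_a`, `p_b`) are assumptions made for illustration only.

```python
from statistics import NormalDist

import numpy as np


def product_vs_joint(rho, n=200_000, p_a=0.2, p_b=0.1, seed=0):
    """Return (product of marginal risks, observed joint risk).

    Hypothetical setup: two binary events are generated by thresholding
    correlated standard normal variables, so `rho` plays the role of the
    residual correlation between the outcomes.
    """
    rng = np.random.default_rng(seed)
    cov = [[1.0, rho], [rho, 1.0]]
    z = rng.multivariate_normal([0.0, 0.0], cov, size=n)

    # Thresholds chosen so the marginal risks are approximately p_a and p_b.
    thr_a = NormalDist().inv_cdf(1.0 - p_a)
    thr_b = NormalDist().inv_cdf(1.0 - p_b)
    event_a = z[:, 0] > thr_a
    event_b = z[:, 1] > thr_b

    product = float(event_a.mean() * event_b.mean())  # product method
    joint = float((event_a & event_b).mean())         # risk of both occurring
    return product, joint


if __name__ == "__main__":
    for rho in (0.0, 0.6):
        product, joint = product_vs_joint(rho)
        print(f"rho={rho}: product={product:.4f}, joint={joint:.4f}")
```

With `rho = 0` the two quantities should agree up to sampling noise; with positive `rho` the product of the marginal risks understates the risk of both events occurring, which is the kind of miscalibration the abstract reports.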

9.

Minimum sample size for developing a multivariable prediction model using multinomial logistic regression.

Pate, Alexander; Riley, Richard D; Collins, Gary S; van Smeden, Maarten; Van Calster, Ben; Ensor, Joie; Martin, Glen P.

Stat Methods Med Res ; 32(3): 555-571, 2023 03.

Article En | MEDLINE | ID: mdl-36660777

AIMS: Multinomial logistic regression models allow one to predict the risk of a categorical outcome with > 2 categories. When developing such a model, researchers should ensure the number of participants (n) is appropriate relative to the number of events (Ek) and the number of predictor parameters (pk) for each category k. We propose three criteria to determine the minimum n required in light of existing criteria developed for binary outcomes. PROPOSED CRITERIA: The first criterion aims to minimise model overfitting. The second aims to minimise the difference between the observed and adjusted Nagelkerke R2. The third criterion aims to ensure the overall risk is estimated precisely. For criterion (i), we show the sample size must be based on the anticipated Cox-Snell R2 of distinct 'one-to-one' logistic regression models corresponding to the sub-models of the multinomial logistic regression, rather than on the overall Cox-Snell R2 of the multinomial logistic regression. EVALUATION OF CRITERIA: We tested the performance of the proposed criterion (i) through a simulation study and found that it resulted in the desired level of overfitting. Criteria (ii) and (iii) were natural extensions from previously proposed criteria for binary outcomes and did not require evaluation through simulation. SUMMARY: We illustrated how to implement the sample size criteria through a worked example considering the development of a multinomial risk prediction model for tumour type when presented with an ovarian mass. Code is provided for the simulation and worked example. We will embed our proposed criteria within the pmsampsize R library and Stata modules.

Logistic Models , Humans , Sample Size , Computer Simulation
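
Criterion (i) builds on an existing overfitting criterion for binary outcomes. As a rough sketch, that binary criterion is commonly written as n = p / ((S - 1) ln(1 - R2_CS / S)), where p is the number of predictor parameters, S the targeted shrinkage factor and R2_CS the anticipated Cox-Snell R2. The function below applies this formula to a single logistic model; the function name and the default S = 0.9 are assumptions, and the way the paper combines the 'one-to-one' sub-models into an overall multinomial sample size is not reproduced here.

```python
import math


def min_n_for_shrinkage(n_params, r2_cs, shrinkage=0.9):
    """Minimum n so that the expected shrinkage factor is about `shrinkage`.

    Sketch of the binary-outcome overfitting criterion only; inputs are the
    number of predictor parameters and the anticipated Cox-Snell R-squared.
    """
    if not 0.0 < shrinkage < 1.0:
        raise ValueError("shrinkage must lie strictly between 0 and 1")
    if not 0.0 < r2_cs < shrinkage:
        raise ValueError("r2_cs must be positive and smaller than shrinkage")
    n = n_params / ((shrinkage - 1.0) * math.log(1.0 - r2_cs / shrinkage))
    return math.ceil(n)


if __name__ == "__main__":
    # Illustrative inputs, not taken from the abstract.
    print(min_n_for_shrinkage(n_params=24, r2_cs=0.288))
```

Per the abstract, the anticipated Cox-Snell R2 fed into such a calculation should come from each distinct pairwise sub-model rather than from the overall multinomial model.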

10.

Clinical and health inequality risk factors for non-COVID-related sepsis during the global COVID-19 pandemic: a national case-control and cohort study.

Zhong, Xiaomin; Ashiru-Oredope, Diane; Pate, Alexander; Martin, Glen P; Sharma, Anita; Dark, Paul; Felton, Tim; Lake, Claire; MacKenna, Brian; Mehrkar, Amir; Bacon, Sebastian C J; Massey, Jon; Inglesby, Peter; Goldacre, Ben; Hand, Kieran; Bladon, Sian; Cunningham, Neil; Gilham, Ellie; Brown, Colin S; Mirfenderesky, Mariyam; Palin, Victoria; van Staa, Tjeerd Pieter.

EClinicalMedicine ; 66: 102321, 2023 Dec.

Article En | MEDLINE | ID: mdl-38192590

Background: Sepsis, characterised by significant morbidity and mortality, is intricately linked to socioeconomic disparities and pre-admission clinical histories. This study aspires to elucidate the association between non-COVID-19 related sepsis and health inequality risk factors amidst the pandemic in England, with a secondary focus on their association with 30-day sepsis mortality. Methods: With the approval of NHS England, we harnessed the OpenSAFELY platform to execute a cohort study and a 1:6 matched case-control study. A sepsis diagnosis was identified from the incident hospital admissions record using ICD-10 codes. This encompassed 248,767 cases with non-COVID-19 sepsis from a cohort of 22.0 million individuals spanning January 1, 2019, to June 31, 2022. Socioeconomic deprivation was gauged using the Index of Multiple Deprivation score, reflecting indicators like income, employment, and education. Hospitalisation-related sepsis diagnoses were categorised as community-acquired or hospital-acquired. Cases were matched to controls who had no recorded diagnosis of sepsis, based on age (stepwise), sex, and calendar month. The eligibility criteria for controls were established primarily on the absence of a recorded sepsis diagnosis. Associations between potential predictors and odds of developing non-COVID-19 sepsis underwent assessment through conditional logistic regression models, with multivariable regression determining odds ratios (ORs) for 30-day mortality. Findings: The study included 224,361 (10.2%) cases with non-COVID-19 sepsis and 1,346,166 matched controls. The most socioeconomic deprived quintile was associated with higher odds of developing non-COVID-19 sepsis than the least deprived quintile (crude OR 1.80 [95% CI 1.77-1.83]). 
Other risk factors (after adjusting for comorbidities) such as learning disability (adjusted OR 3.53 [3.35-3.73]), chronic liver disease (adjusted OR 3.08 [2.97-3.19]), chronic kidney disease (stage 4: adjusted OR 2.62 [2.55-2.70], stage 5: adjusted OR 6.23 [5.81-6.69]), cancer, neurological disease, and immunosuppressive conditions were also associated with developing non-COVID-19 sepsis. The incidence rate of non-COVID-19 sepsis decreased during the COVID-19 pandemic and rebounded to pre-pandemic levels (April 2021) after national lockdowns had been lifted. The 30-day mortality risk in cases with non-COVID-19 sepsis was higher for the most deprived quintile across all periods. Interpretation: Socioeconomic deprivation, comorbidity and learning disabilities were associated with increased odds of developing non-COVID-19 related sepsis and 30-day mortality in England. This study highlights the need to improve the prevention of sepsis, including more precise targeting of antimicrobials to higher-risk patients. Funding: The UK Health Security Agency, Health Data Research UK, and National Institute for Health Research.

11.

Ranking sets of morbidities using hypergraph centrality.

Rafferty, James; Watkins, Alan; Lyons, Jane; Lyons, Ronan A; Akbari, Ashley; Peek, Niels; Jalali-Najafabadi, Farideh; Ba Dhafari, Thamer; Pate, Alexander; Martin, Glen P; Bailey, Rowena.

J Biomed Inform ; 122: 103916, 2021 10.

Article En | MEDLINE | ID: mdl-34534697

Multi-morbidity, the health state of having two or more concurrent chronic conditions, is becoming more common as populations age, but is poorly understood. Identifying and understanding commonly occurring sets of diseases is important to inform clinical decisions to improve patient services and outcomes. Network analysis has been previously used to investigate multi-morbidity, but a classic application only allows for information on binary sets of diseases to contribute to the graph. We propose the use of hypergraphs, which allows for the incorporation of data on people with any number of conditions, and also allows us to obtain a quantitative understanding of the centrality, a measure of how well connected items in the network are to each other, of both single diseases and sets of conditions. Using this framework we illustrate its application with the set of conditions described in the Charlson morbidity index using data extracted from routinely collected population-scale, patient level electronic health records (EHR) for a cohort of adults in Wales, UK. Stroke and diabetes were found to be the most central single conditions. Sets of diseases featuring diabetes; diabetes with Chronic Pulmonary Disease, Renal Disease, Congestive Heart Failure and Cancer were the most central pairs of diseases. We investigated the differences between results obtained from the hypergraph and a classic binary graph and found that the centrality of diseases such as paraplegia, which are connected strongly to a single other disease is exaggerated in binary graphs compared to hypergraphs. The measure of centrality is derived from the weighting metrics calculated for disease sets and further investigation is needed to better understand the effect of the metric used in identifying the clinical significance and ranked centrality of grouped diseases. 
These initial results indicate that hypergraphs can be used as a valuable tool for analysing previously poorly understood relationships and information available in EHR data.

Diabetes Mellitus , Adult , Chronic Disease , Cohort Studies , Electronic Health Records , Humans , Morbidity

12.

An assessment of the potential miscalibration of cardiovascular disease risk predictions caused by a secular trend in cardiovascular disease in England.

Pate, Alexander; van Staa, Tjeerd; Emsley, Richard.

BMC Med Res Methodol ; 20(1): 289, 2020 11 30.

Article En | MEDLINE | ID: mdl-33256644

BACKGROUND: A downwards secular trend in the incidence of cardiovascular disease (CVD) in England was identified through previous work and the literature. Risk prediction models for primary prevention of CVD do not model this secular trend, which could result in over prediction of risk for individuals in the present day. We evaluate the effects of modelling this secular trend, and also assess whether it is driven by an increase in statin use during follow up. METHODS: We derived a cohort of patients (1998-2015) eligible for cardiovascular risk prediction from the Clinical Practice Research Datalink with linked hospitalisation and mortality records (N = 3,855,660). Patients were split into development and validation cohorts based on their cohort entry date (before/after 2010). The calibration of a CVD risk prediction model developed in the development cohort was tested in the validation cohort. The calibration was also assessed after modelling the secular trend. Finally, the presence of the secular trend was evaluated under a marginal structural model framework, where the effect of statin treatment during follow up is adjusted for. RESULTS: Substantial over prediction of risks in the validation cohort was found when not modelling the secular trend. This miscalibration could be minimised by explicitly modelling the secular trend. The reduction in risk in the validation cohort when introducing the secular trend was 35.68 and 33.24% in the female and male cohorts respectively. Under the marginal structural model framework, the reductions were 33.31 and 32.67% respectively, indicating that increasing statin use during follow up is not the only cause of the secular trend. CONCLUSIONS: Inclusion of the secular trend into the model substantially changed the CVD risk predictions. Models that are being used in clinical practice in the UK do not model the secular trend and may thus overestimate the risks, possibly leading to patients being treated unnecessarily. 
Wider discussion around the modelling of secular trends in a risk prediction framework is needed.

Cardiovascular Diseases , Hydroxymethylglutaryl-CoA Reductase Inhibitors , Cardiovascular Diseases/epidemiology , Cohort Studies , England/epidemiology , Female , Humans , Hydroxymethylglutaryl-CoA Reductase Inhibitors/therapeutic use , Incidence , Male , Risk Assessment , Risk Factors

13.

Impact of lowering the risk threshold for statin treatment on statin prescribing: a descriptive study in English primary care.

Pate, Alexander; Emsley, Richard; van Staa, Tjeerd.

Br J Gen Pract ; 70(700): e765-e771, 2020 11.

Article En | MEDLINE | ID: mdl-33020170

BACKGROUND: In 2014, the National Institute for Health and Care Excellence (NICE) changed the recommended threshold for initiating statins from a 10-year risk of cardiovascular disease (CVD) of 20% to 10% (Clinical Guideline 181), making 4.5 million extra people eligible for treatment. AIM: To evaluate the impact of this guideline change on statin prescribing behaviour. DESIGN AND SETTING: A descriptive study using data from Clinical Practice Research Datalink (CPRD), a primary care database in England. METHOD: People aged 25-84 years being initiated on statins for the primary prevention of CVD were identified. CVD risk predictions were calculated for every person using data in their medical record (calculated risks), and were extracted directly from their medical record if a QRISK score was recorded (coded risks). The 10-year CVD risks of people initiated on statins in each calendar year were compared. RESULTS: The average 'calculated risk' of all people being initiated on statins was 20.65% in the year before the guideline change, and 20.27% after. When considering only the 'coded risks', the average risk was 21.85% before the guideline change, and 18.65% after. The proportion of people initiating statins that had a coded risk score in their medical record increased significantly from 2010-2017. CONCLUSION: Currently available evidence, which only considers people with coded risk scores in their medical record, indicates the guideline change had a large impact on statin prescribing. However, that analysis likely suffers from selection bias. This new evidence indicates only a modest impact of the guideline change. Further qualitative research about the lack of response to the guideline change is needed.

Cardiovascular Diseases , Hydroxymethylglutaryl-CoA Reductase Inhibitors , Cardiovascular Diseases/drug therapy , Cardiovascular Diseases/epidemiology , Cardiovascular Diseases/prevention & control , England/epidemiology , Humans , Hydroxymethylglutaryl-CoA Reductase Inhibitors/therapeutic use , Primary Health Care , Primary Prevention

14.

Impact of sample size on the stability of risk scores from clinical prediction models: a case study in cardiovascular disease.

Pate, Alexander; Emsley, Richard; Sperrin, Matthew; Martin, Glen P; van Staa, Tjeerd.

Diagn Progn Res ; 4: 14, 2020.

Article En | MEDLINE | ID: mdl-32944655

BACKGROUND: Stability of risk estimates from prediction models may be highly dependent on the sample size of the dataset available for model derivation. In this paper, we evaluate the stability of cardiovascular disease risk scores for individual patients when using different sample sizes for model derivation; such sample sizes include those similar to models recommended in the national guidelines, and those based on a recently published sample size formula for prediction models. METHODS: We mimicked the process of sampling N patients from a population to develop a risk prediction model by sampling patients from the Clinical Practice Research Datalink. A cardiovascular disease risk prediction model was developed on this sample and used to generate risk scores for an independent cohort of patients. This process was repeated 1000 times, giving a distribution of risks for each patient. N = 100,000, 50,000, 10,000, Nmin (derived from the sample size formula) and Nepv10 (meets the 10 events per predictor rule) were considered. The 5-95th percentile range of risks across these models was used to evaluate instability. Patients were grouped by a risk derived from a model developed on the entire population (population-derived risk) to summarise results. RESULTS: For a sample size of 100,000, the median 5-95th percentile range of risks for patients across the 1000 models was 0.77%, 1.60%, 2.42% and 3.22% for patients with population-derived risks of 4-5%, 9-10%, 14-15% and 19-20% respectively; for N = 10,000, it was 2.49%, 5.23%, 7.92% and 10.59%, and for N using the formula-derived sample size, it was 6.79%, 14.41%, 21.89% and 29.21%. Restricting this analysis to models with high discrimination, good calibration or small mean absolute prediction error reduced the percentile range, but high levels of instability remained. CONCLUSIONS: Widely used cardiovascular disease risk prediction models suffer from high levels of instability induced by sampling variation. Many models will also suffer from overfitting (a closely linked concept), but at acceptable levels of overfitting, there may still be high levels of instability in individual risk. Stability of risk estimates should be a criterion when determining the minimum sample size to develop models.
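
The instability measure described in this abstract (the 5-95th percentile range of one patient's predicted risk across repeatedly derived models) can be illustrated with a minimal Python sketch. It is not the authors' code: `simulate_patient_risks` is a hypothetical stand-in that draws noisy risks around a fixed value instead of refitting a model on resampled data, and the `spread` values are illustrative assumptions, not figures from the study.

```python
import random


def percentile(values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a non-empty list."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] * (1 - frac) + ordered[upper] * frac


def percentile_range(risks, low=5, high=95):
    """Width of the low-high percentile range of one patient's risks."""
    return percentile(risks, high) - percentile(risks, low)


def simulate_patient_risks(true_risk, n_models, spread, seed=0):
    """Hypothetical stand-in for deriving n_models models on resampled data.

    Each draw plays the role of the risk one derived model assigns to the
    same patient; a larger spread mimics a smaller derivation sample.
    """
    rng = random.Random(seed)
    return [
        min(1.0, max(0.0, rng.gauss(true_risk, spread)))
        for _ in range(n_models)
    ]


# Illustrative assumption: a noisier "small sample" versus a "large sample".
risks_small = simulate_patient_risks(0.10, 1000, spread=0.03)
risks_large = simulate_patient_risks(0.10, 1000, spread=0.005)
range_small = percentile_range(risks_small)
range_large = percentile_range(risks_large)
print(f"5-95th percentile range, small sample: {range_small:.4f}")
print(f"5-95th percentile range, large sample: {range_large:.4f}")
```

In a real analysis the simulated draws would be replaced by the risks that each derived model assigns to the patient, and the ranges would then be summarised (for example by their median) within groups of population-derived risk.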

15.

Toward a framework for the design, implementation, and reporting of methodology scoping reviews.

Martin, Glen P; Jenkins, David A; Bull, Lucy; Sisk, Rose; Lin, Lijing; Hulme, William; Wilson, Anthony; Wang, Wenjuan; Barrowman, Michael; Sammut-Powell, Camilla; Pate, Alexander; Sperrin, Matthew; Peek, Niels.

J Clin Epidemiol ; 127: 191-197, 2020 11.

Article En | MEDLINE | ID: mdl-32726605

BACKGROUND AND OBJECTIVE: In view of the growth of published articles, there is an increasing need for studies that summarize scientific research. An increasingly common review is a "methodology scoping review," which provides a summary of existing analytical methods, techniques and software that have been proposed or applied in research articles to address an analytical problem or further an analytical approach. However, guidelines for their design, implementation, and reporting are limited. METHODS: Drawing on the experiences of the authors, which were consolidated through a series of face-to-face workshops, we summarize the challenges inherent in conducting a methodology scoping review and offer suggestions of best practice to promote future guideline development. RESULTS: We identified three challenges of conducting a methodology scoping review. First, identification of search terms; one cannot usually define the search terms a priori, and the language used for a particular method can vary across the literature. Second, the scope of the review requires careful consideration because new methodology is often not described (in full) within abstracts. Third, many new methods are motivated by a specific clinical question, where the methodology may only be documented in supplementary materials. We formulated several recommendations that build upon existing review guidelines. These recommendations ranged from an iterative approach to defining search terms through to screening and data extraction processes. CONCLUSION: Although methodology scoping reviews are an important aspect of research, there is currently a lack of guidelines to standardize their design, implementation, and reporting. We recommend a wider discussion on this topic.

Research Design/standards , Review Literature as Topic , Systematic Reviews as Topic/methods , Humans

16.

The impact of statin discontinuation and restarting rates on the optimal time to initiate statins and on the number of cardiovascular events prevented.

Pate, Alexander; Elliott, Rachel A; Gkountouras, Georgios; Thompson, Alexander; Emsley, Richard; van Staa, Tjeerd.

Pharmacoepidemiol Drug Saf ; 29(6): 644-652, 2020 06.

Article En | MEDLINE | ID: mdl-32394495

INTRODUCTION: A patient is eligible for statins in England if they have a 10-year risk of cardiovascular disease >10%. We hypothesize that if statin discontinuation rates are high it may be better to delay statin initiation until patients are at a higher risk, to maximize the benefit of the drug. METHODS: A four-state health state transition model was used to assess the optimal time to initiate statins after a risk assessment, in order to prevent the highest number of cardiovascular events, for a given risk profile (age, gender, risk) and adherence rate. A Clinical Practice Research Datalink dataset linked to Hospital Episode Statistics and Office for National Statistics was used to inform the transition probabilities in this model, taking into account observed statin discontinuation and re-continuation patterns. RESULTS: Our results suggest, if statins are initiated in a cohort of 50-year-old men with a 10% 10-year risk, we prevent 4.78 events per 100 individuals. If we wait 10 years to prescribe, at which point 10-year risk scores are at 20%, we prevent 5.45 events per 100 individuals. If the observed discontinuation rate was reduced by a sixth, third or half in the same cohort, we would prevent 7.29, 9.01 or 10.22 events per 100 individuals. CONCLUSIONS: In certain scenarios, extra cardiovascular disease events could be prevented by delaying statin initiation beyond a risk of 10% until reaching a certain age (59 for men, 63 for women), based on statin discontinuation rates in England. The optimal time to initiate statins was driven by age, not by cardiovascular risk.

Cardiovascular Diseases/prevention & control , Dyslipidemias/drug therapy , Hydroxymethylglutaryl-CoA Reductase Inhibitors/administration & dosage , Primary Prevention , Adult , Age Factors , Aged , Cardiovascular Diseases/diagnosis , Cardiovascular Diseases/epidemiology , Drug Administration Schedule , Dyslipidemias/diagnosis , Dyslipidemias/epidemiology , England/epidemiology , Female , Health Status , Heart Disease Risk Factors , Humans , Hydroxymethylglutaryl-CoA Reductase Inhibitors/adverse effects , Male , Medication Adherence , Middle Aged , Protective Factors , Risk Assessment , Time Factors , Treatment Outcome

17.

Correction to: The uncertainty with using risk prediction models for individual decision making: an exemplar cohort study examining the prediction of cardiovascular disease in English primary care.

Pate, Alexander; Emsley, Richard; Ashcroft, Darren M; Brown, Benjamin; van Staa, Tjeerd.

BMC Med ; 17(1): 158, 2019 Aug 10.

Article En | MEDLINE | ID: mdl-31399095

The original article [1] contained an error in the abstract. The mentioned cohort size now correctly states 'N = 3,855,660'.

18.

Do population-level risk prediction models that use routinely collected health data reliably predict individual risks?

Li, Yan; Sperrin, Matthew; Belmonte, Miguel; Pate, Alexander; Ashcroft, Darren M; van Staa, Tjeerd Pieter.

Sci Rep ; 9(1): 11222, 2019 08 02.

Article En | MEDLINE | ID: mdl-31375726

The objective of this study was to assess the reliability of individual risk predictions based on routinely collected data considering the heterogeneity between clinical sites in data and populations. Cardiovascular disease (CVD) risk prediction with QRISK3 was used as an exemplar. The study included 3.6 million patients in 392 sites from the Clinical Practice Research Datalink. Cox models with QRISK3 predictors and a frailty (random effect) term for each site were used to incorporate unmeasured site variability. There was considerable variation in data recording between general practices (missingness of body mass index ranged from 18.7% to 60.1%). Incidence rates varied considerably between practices (from 0.4 to 1.3 CVD events per 100 patient-years). Individual CVD risk predictions with the random effect model were inconsistent with the QRISK3 predictions. For patients with QRISK3 predicted risk of 10%, the 95% range of predicted risks was between 7.2% and 13.7% with the random effects model. Random variability only explained a small part of this. The random effects model was equivalent to QRISK3 for discrimination and calibration. Risk prediction models based on routinely collected health data perform well for populations but with great uncertainty for individuals. Clinicians and patients need to understand this uncertainty.

Data Collection/standards , Models, Statistical , Risk Assessment/methods , Adult , Cardiovascular Diseases/epidemiology , Female , Humans , Male , Middle Aged , Precision Medicine , Proportional Hazards Models , Reproducibility of Results , Risk Factors

19.

Chronic obstructive pulmonary disease exacerbation episodes derived from electronic health record data validated using clinical trial data.

Sperrin, Matthew; Webb, David J; Patel, Pinal; Davis, Kourtney J; Collier, Susan; Pate, Alexander; Leather, David A; Pimenta, Jeanne M.

Pharmacoepidemiol Drug Saf ; 28(10): 1369-1376, 2019 10.

Article En | MEDLINE | ID: mdl-31385428

PURPOSE: To validate an algorithm for acute exacerbations of chronic obstructive pulmonary disease (AECOPD) episodes derived in an electronic health record (EHR) database, against AECOPD episodes collected in a randomized clinical trial using an electronic case report form (eCRF). METHODS: We analyzed two data sources from the Salford Lung Study in COPD: trial eCRF and the Salford Integrated Record, a linked primary-secondary routine care EHR database of all patients in Salford. For trial participants, AECOPD episodes reported in eCRF were compared with algorithmically derived moderate/severe AECOPD episodes identified in EHR. Episode characteristics (frequency, duration), sensitivity, and positive predictive value (PPV) were calculated. A match between eCRF and EHR episodes was defined as at least 1-day overlap. RESULTS: In the primary effectiveness analysis population (n = 2269), 3791 EHR episodes (mean [SD] length: 15.1 [3.59] days; range: 14-54) and 4403 moderate/severe AECOPD eCRF episodes (mean length: 13.8 [16.20] days; range: 1-372) were identified. eCRF episodes exceeding 28 days were usually broken up into shorter episodes in the EHR. Sensitivity was 63.6% and PPV 71.1%, where concordance was defined as at least 1-day overlap. CONCLUSIONS: The EHR algorithm performance was acceptable, indicating that EHR-derived AECOPD episodes may provide an efficient, valid method of data collection. Comparing EHR-derived AECOPD episodes with those collected by eCRF resulted in slightly fewer episodes, and eCRF episodes of extreme lengths were poorly captured in EHR. Analysis of routinely collected EHR data may be reasonable when relative, rather than absolute, rates of AECOPD are relevant for stakeholders' decision making.

Electronic Health Records/statistics & numerical data , Pharmacoepidemiology/methods , Pulmonary Disease, Chronic Obstructive/epidemiology , Randomized Controlled Trials as Topic/statistics & numerical data , Symptom Flare Up , Aged , Algorithms , Clinical Trials, Phase III as Topic/statistics & numerical data , Data Collection/methods , Databases, Factual/statistics & numerical data , England/epidemiology , Female , Humans , Male , Middle Aged , Patient Admission/statistics & numerical data , Pharmacoepidemiology/statistics & numerical data , Pulmonary Disease, Chronic Obstructive/diagnosis , Pulmonary Disease, Chronic Obstructive/therapy , Sensitivity and Specificity , Severity of Illness Index
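
The matching rule stated in the abstract above (an eCRF episode and an EHR episode match when they overlap by at least 1 day) can be sketched in Python as follows. This is an illustrative assumption about how sensitivity and PPV could be computed from such matches, not the study's algorithm: episodes are hypothetical (start_day, end_day) pairs with inclusive integer days for a single patient, and the example data are invented.

```python
def overlaps(a, b):
    """True if inclusive day intervals a and b share at least one day."""
    return a[0] <= b[1] and b[0] <= a[1]


def sensitivity_ppv(reference, derived):
    """Episode-level sensitivity and PPV under the 1-day overlap rule.

    reference: episodes treated as the reference standard (here, eCRF).
    derived:   episodes produced by the algorithm (here, EHR).
    Sensitivity = share of reference episodes matched by any derived episode.
    PPV         = share of derived episodes matched by any reference episode.
    """
    ref_matched = sum(any(overlaps(r, d) for d in derived) for r in reference)
    der_matched = sum(any(overlaps(d, r) for r in reference) for d in derived)
    sensitivity = ref_matched / len(reference) if reference else float("nan")
    ppv = der_matched / len(derived) if derived else float("nan")
    return sensitivity, ppv


# Invented example: one long eCRF episode is split into two EHR episodes.
ecrf = [(1, 10), (40, 45), (100, 130)]
ehr = [(8, 21), (60, 73), (100, 113), (114, 127)]
sens, ppv = sensitivity_ppv(ecrf, ehr)
print(f"sensitivity = {sens:.3f}, PPV = {ppv:.3f}")
```

A fuller version would match episodes within each patient before pooling the counts; that step is omitted here to keep the sketch short.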

20.

The uncertainty with using risk prediction models for individual decision making: an exemplar cohort study examining the prediction of cardiovascular disease in English primary care.

Pate, Alexander; Emsley, Richard; Ashcroft, Darren M; Brown, Benjamin; van Staa, Tjeerd.

BMC Med ; 17(1): 134, 2019 07 17.

Article En | MEDLINE | ID: mdl-31311543

BACKGROUND: Risk prediction models are commonly used in practice to inform decisions on patients' treatment. Uncertainty around risk scores beyond the confidence interval is rarely explored. We conducted an uncertainty analysis of the QRISK prediction tool to evaluate the robustness of individual risk predictions with varying modelling decisions. METHODS: We derived a cohort of patients eligible for cardiovascular risk prediction from the Clinical Practice Research Datalink (CPRD) with linked hospitalisation and mortality records (N = 3,792,474). Risk prediction models were developed using the methods reported for QRISK2 and 3, before adjusting for additional risk factors, a secular trend, geographical variation in risk and the method for imputing missing data when generating a risk score (model A-model F). Ten-year risk scores were compared across the different models alongside model performance metrics. RESULTS: We found substantial variation in risk on the individual level across the models. The 95 percentile range of risks in model F for patients with risks between 9 and 10% according to model A was 4.4-16.3% and 4.6-15.8% for females and males respectively. Despite this, the models were difficult to distinguish using common performance metrics (Harrell's C ranged from 0.86 to 0.87). The largest contributing factor to variation in risk was adjusting for a secular trend (HR per calendar year, 0.96 [0.95-0.96] and 0.96 [0.96-0.96]). When extrapolating to the UK population, we found that 3.8 million patients may be reclassified as eligible for statin prescription depending on the model used. A key limitation of this study was that we could not assess the variation in risk that may be caused by risk factors missing from the database (such as diet or physical activity). CONCLUSIONS: Risk prediction models that use routinely collected data provide estimates strongly dependent on modelling decisions. Despite this large variability in patient risk, the models appear to perform similarly according to standard performance metrics. Decision-making should be supplemented with clinical judgement and evidence of additional risk factors. The largest source of variability, a secular trend in CVD incidence, can be accounted for and should be explored in more detail.

Cardiovascular Diseases/diagnosis , Decision Making , Decision Support Techniques , Precision Medicine , Primary Health Care/statistics & numerical data , Adult , Aged , Aged, 80 and over , Cardiovascular Diseases/epidemiology , Cardiovascular Diseases/mortality , Cohort Studies , England/epidemiology , Female , Humans , Hydroxymethylglutaryl-CoA Reductase Inhibitors/therapeutic use , Incidence , Male , Middle Aged , Precision Medicine/methods , Precision Medicine/statistics & numerical data , Primary Health Care/methods , Primary Health Care/standards , Risk Assessment/methods , Risk Factors , Uncertainty