Clinical prediction models and the multiverse of madness.

Riley, Richard D; Pate, Alexander; Dhiman, Paula; Archer, Lucinda; Martin, Glen P; Collins, Gary S

Affiliation

Riley RD; College of Medical and Dental Sciences, Institute of Applied Health Research, University of Birmingham, Birmingham, B15 2TT, UK. r.d.riley@bham.ac.uk.
Pate A; National Institute for Health and Care Research (NIHR) Birmingham Biomedical Research Centre, Birmingham, UK. r.d.riley@bham.ac.uk.
Dhiman P; Division of Informatics, Imaging and Data Science, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK.
Archer L; Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, OX3 7LD, UK.
Martin GP; College of Medical and Dental Sciences, Institute of Applied Health Research, University of Birmingham, Birmingham, B15 2TT, UK.
Collins GS; National Institute for Health and Care Research (NIHR) Birmingham Biomedical Research Centre, Birmingham, UK.

BMC Med; 21(1): 502, 2023 Dec 18.

Article in English | MEDLINE | ID: mdl-38110939

ABSTRACT

BACKGROUND:

Each year, thousands of clinical prediction models are developed to make predictions (e.g. estimated risk) to inform individual diagnosis and prognosis in healthcare. However, most are not reliable for use in clinical practice.

MAIN BODY:

We discuss how the creation of a prediction model (e.g. using regression or machine learning methods) depends on the sample and size of data used to develop it: were a different sample of the same size drawn from the same overarching population, the developed model could be very different, even when the same model development methods are used. In other words, for each model created, there exists a multiverse of other potential models for that sample size and, crucially, an individual's predicted value (e.g. estimated risk) may vary greatly across this multiverse. The more an individual's prediction varies across the multiverse, the greater the instability. We show how small development datasets lead to more widely differing models in the multiverse, often with vastly unstable individual predictions, and explain how this can be exposed by using bootstrapping and presenting instability plots. We recommend that healthcare researchers seek to use large model development datasets to reduce instability concerns. This is especially important to ensure reliability across subgroups and improve model fairness in practice.
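The bootstrap check described above can be illustrated with a minimal sketch. It is not the authors' implementation: the one-predictor logistic model, the simulated data, and the names `fit_logistic`, `instability`, `simulate` and `average_mape` are all assumptions made for illustration. The sketch refits the model on bootstrap resamples of the development data and, for each individual, averages the absolute difference between the bootstrap-model predictions and the original-model prediction (a per-individual mean absolute prediction error).

```python
import math
import random


def sigmoid(z):
    # Clamp the linear predictor so math.exp cannot overflow.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, iters=15, ridge=1e-6, max_step=5.0):
    """Fit logit(p) = a + b*x by Newton-Raphson; returns (a, b)."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(a + b * x)
            w = p * (1.0 - p)
            g0 += y - p            # score for the intercept
            g1 += (y - p) * x      # score for the slope
            h00 += w
            h01 += w * x
            h11 += w * x * x
        h00 += ridge               # tiny damping keeps the 2x2 system invertible
        h11 += ridge
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a += max(-max_step, min(max_step, da))
        b += max(-max_step, min(max_step, db))
    return a, b


def instability(xs, ys, n_boot=100, seed=0):
    """Return original predictions and per-individual MAPE across bootstrap models."""
    rng = random.Random(seed)
    n = len(xs)
    a0, b0 = fit_logistic(xs, ys)
    original = [sigmoid(a0 + b0 * x) for x in xs]
    abs_diff_sum = [0.0] * n
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]   # resample with replacement
        a, b = fit_logistic([xs[i] for i in idx], [ys[i] for i in idx])
        for j, x in enumerate(xs):
            abs_diff_sum[j] += abs(sigmoid(a + b * x) - original[j])
    return original, [s / n_boot for s in abs_diff_sum]


def simulate(n, seed):
    """Hypothetical population: one standard-normal predictor, logit(p) = -1 + x."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [1 if rng.random() < sigmoid(-1.0 + x) else 0 for x in xs]
    return xs, ys


def average_mape(n, seed=1):
    xs, ys = simulate(n, seed)
    _, mape = instability(xs, ys, n_boot=100, seed=seed)
    return sum(mape) / len(mape)


if __name__ == "__main__":
    print("average MAPE, n=50: ", average_mape(50))
    print("average MAPE, n=500:", average_mape(500))
```

Plotting each individual's bootstrap-model predictions against their original prediction would give an instability plot; here the average MAPE alone is enough to compare a small development sample with a larger one.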

CONCLUSIONS:

Instability is concerning because an individual's predicted value is used to guide their counselling, resource prioritisation, and clinical decision making. If different samples lead to different models with very different predictions for the same individual, this should cast doubt on using a particular model for that individual. Therefore, visualising, quantifying and reporting the instability in individual-level predictions is essential when proposing a new model.

Subjects

Models, Statistical; Humans; Prognosis; Reproducibility of Results

Keywords

Bootstrapping; Clinical prediction model; Instability; Mean absolute prediction error (MAPE); Risk prediction; Variance

Full text: 1 | Collections: 01-internacional | Database: MEDLINE | Main subject: Models, Statistical | Limit: Humans | Language: En | Publication year: 2023 | Document type: Article