1 - 13 of 13
1.
J Clin Epidemiol ; 170: 111364, 2024 Apr 15.
Article En | MEDLINE | ID: mdl-38631529

OBJECTIVES: To develop a framework to identify and evaluate spin practices and their facilitators in studies on clinical prediction models, regardless of the modeling technique. STUDY DESIGN AND SETTING: We followed a three-phase consensus process: (1) a premeeting literature review to generate items for inclusion; (2) a series of structured meetings with a panel of experienced researchers to discuss comments and exchange viewpoints on the items to be included; and (3) a postmeeting review of the final list of items and examples. Through this iterative consensus process, a framework was derived once all panel members agreed. RESULTS: The consensus process involved a panel of eight researchers and resulted in SPIN-Prediction Models, which consists of two categories of spin (misleading interpretation and misleading transportability) and, within these categories, two forms of spin (spin practices and facilitators of spin). We provide criteria and examples. CONCLUSION: We propose this guidance to facilitate not only accurate reporting but also accurate interpretation and extrapolation of clinical prediction models, which will likely improve the reporting quality of subsequent research and reduce research waste.

2.
J Clin Epidemiol ; 165: 111206, 2024 Jan.
Article En | MEDLINE | ID: mdl-37925059

OBJECTIVES: Risk of bias assessments are important in meta-analyses of both aggregate and individual participant data (IPD). There is limited evidence on whether and how risk of bias of included studies or datasets in IPD meta-analyses (IPDMAs) is assessed. We review how risk of bias is currently assessed, reported, and incorporated in IPDMAs of test accuracy and clinical prediction model studies and provide recommendations for improvement. STUDY DESIGN AND SETTING: We searched PubMed (January 2018-May 2020) to identify IPDMAs of test accuracy and prediction models, then elicited whether each IPDMA assessed risk of bias of included studies and, if so, how assessments were reported and subsequently incorporated into the IPDMAs. RESULTS: Forty-nine IPDMAs were included. Nineteen of 27 (70%) test accuracy IPDMAs assessed risk of bias, compared to 5 of 22 (23%) prediction model IPDMAs. Seventeen of 19 (89%) test accuracy IPDMAs used the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool, but no tool was used consistently among prediction model IPDMAs. Of the IPDMAs assessing risk of bias, 7 (37%) test accuracy IPDMAs and 1 (20%) prediction model IPDMA provided details on the information sources (e.g., the original manuscript, IPD, primary investigators) used to inform judgments, and 4 (21%) test accuracy IPDMAs and 1 (20%) prediction model IPDMA provided information on whether assessments were done before or after obtaining the IPD of the included studies or datasets. Of all included IPDMAs, only seven test accuracy IPDMAs (26%) and one prediction model IPDMA (5%) incorporated risk of bias assessments into their meta-analyses. For future IPDMA projects, we provide guidance on how to adapt tools such as the Prediction model Risk Of Bias ASsessment Tool (PROBAST, for prediction models) and QUADAS-2 (for test accuracy) to assess risk of bias of included primary studies and their IPD. CONCLUSION: Risk of bias assessments and their reporting need to be improved in IPDMAs of test accuracy and, especially, prediction model studies. Using recommended tools, both before and after IPD are obtained, will address this.


Data Accuracy , Models, Statistical , Humans , Prognosis , Bias
3.
BMC Med Inform Decis Mak ; 23(1): 168, 2023 08 28.
Article En | MEDLINE | ID: mdl-37641038

BACKGROUND: Early identification of dementia is crucial for prompt intervention for high-risk individuals in the general population. External validation studies on prognostic models for dementia have highlighted the need for updated models. The use of machine learning in dementia prediction is in its infancy and may improve predictive performance. The current study aimed to explore the difference in performance of machine learning algorithms compared to traditional statistical techniques, such as logistic and Cox regression, for prediction of all-cause dementia. Our secondary aim was to assess the feasibility of using only clinically accessible predictors rather than MRI predictors. METHODS: Data are from 4,793 participants in the population-based AGES-Reykjavik Study without dementia or mild cognitive impairment at baseline (mean age 76 years; 59% female). Cognitive, biometric, and MRI assessments (59 variables in total) were collected at baseline, with follow-up of incident dementia diagnoses for a maximum of 12 years. Machine learning algorithms included elastic net regression, random forest, support vector machine, and elastic net Cox regression. Traditional statistical methods for comparison were logistic and Cox regression. Model 1 was fitted using all variables; model 2 was fitted after feature selection using the Boruta package. A third model explored performance when leaving out neuroimaging markers (clinically accessible model). Ten-fold cross-validation, repeated ten times, was implemented during training. Upsampling was used to account for imbalanced data. Tuning parameters were optimized automatically for recalibration using the caret package in R. RESULTS: 19% of participants developed all-cause dementia. Machine learning algorithms were comparable in performance to logistic regression in all three models. However, a slight performance gain was observed for elastic net Cox regression in the third model (c = 0.78, 95% CI: 0.78-0.78) compared to traditional Cox regression (c = 0.75, 95% CI: 0.74-0.77). CONCLUSIONS: Supervised machine learning only showed added benefit when using survival techniques. Removing MRI markers did not significantly worsen the model's performance. Further, we presented a nomogram derived using machine learning methods, illustrating the transportability of machine learning models to clinical practice. External validation is needed to assess the use of this model in other populations. Identifying high-risk individuals will amplify prevention efforts and selection for clinical trials.
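
To make the resampling scheme described above concrete, here is a minimal R sketch of an elastic net model tuned with ten-fold cross-validation repeated ten times and upsampling, compared against plain logistic regression using the caret package. This is an illustration only, not the study's analysis code: the data frame `dat` and binary factor outcome `dementia` are hypothetical placeholders, not the AGES-Reykjavik data.

```r
# Illustrative sketch only: 'dat' is a hypothetical data frame with a binary
# factor outcome 'dementia' (levels "no"/"yes") and candidate predictors.
library(caret)

set.seed(42)

# Ten-fold cross-validation, repeated ten times, with upsampling of the
# minority class, mirroring the resampling scheme described in the abstract.
ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 10,
                     sampling = "up", classProbs = TRUE,
                     summaryFunction = twoClassSummary)

# Elastic net (glmnet) with automatic tuning of alpha and lambda.
enet_fit <- train(dementia ~ ., data = dat, method = "glmnet",
                  metric = "ROC", trControl = ctrl, tuneLength = 10)

# Conventional logistic regression fitted under the same resampling scheme
# for comparison.
glm_fit <- train(dementia ~ ., data = dat, method = "glm",
                 family = binomial, metric = "ROC", trControl = ctrl)

# Compare cross-validated discrimination of the two approaches.
resamples(list(elastic_net = enet_fit, logistic = glm_fit)) |> summary()
```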


Dementia , Machine Learning , Humans , Female , Aged , Male , Proof of Concept Study , Supervised Machine Learning , Algorithms , Dementia/diagnosis , Dementia/epidemiology
4.
J Clin Epidemiol ; 158: 99-110, 2023 06.
Article En | MEDLINE | ID: mdl-37024020

OBJECTIVES: We evaluated the presence and frequency of spin practices and poor reporting standards in studies that developed and/or validated clinical prediction models using supervised machine learning techniques. STUDY DESIGN AND SETTING: We systematically searched PubMed from 01/2018 to 12/2019 to identify diagnostic and prognostic prediction model studies using supervised machine learning. No restrictions were placed on data source, outcome, or clinical specialty. RESULTS: We included 152 studies: 38% reported diagnostic models and 62% prognostic models. When reported, discrimination was described without precision estimates in 53/71 abstracts (74.6% [95% CI 63.4-83.3]) and 53/81 main texts (65.4% [95% CI 54.6-74.9]). Of the 21 abstracts that recommended the model be used in daily practice, 20 (95.2% [95% CI 77.3-99.8]) lacked any external validation of the developed models. Likewise, 74/133 (55.6% [95% CI 47.2-63.8]) studies made recommendations for clinical use in their main text without any external validation. Reporting guidelines were cited in 13/152 (8.6% [95% CI 5.1-14.1]) studies. CONCLUSION: Spin practices and poor reporting standards are also present in studies on prediction models using machine learning techniques. A tailored framework for the identification of spin will enhance the sound reporting of prediction model studies.
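
The abstract's central reporting complaint is discrimination presented without a precision estimate. As a minimal illustration of the expected practice, the R sketch below reports a c-statistic together with its 95% confidence interval using the pROC package; the data are simulated and the object names (`outcome`, `pred`) are hypothetical.

```r
# Minimal sketch of reporting discrimination *with* a precision estimate,
# using simulated data (not data from the review).
library(pROC)

set.seed(1)
outcome <- rbinom(500, 1, 0.3)                                # simulated binary outcome
pred    <- plogis(qlogis(0.3) + 1.2 * outcome + rnorm(500))   # noisy predicted risks

roc_obj <- roc(outcome, pred, quiet = TRUE)
auc(roc_obj)     # point estimate of the c-statistic (AUC)
ci.auc(roc_obj)  # 95% CI (DeLong method by default): the precision estimate
```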


Machine Learning , Humans , Prognosis
5.
J Clin Epidemiol ; 157: 120-133, 2023 05.
Article En | MEDLINE | ID: mdl-36935090

OBJECTIVES: In biomedical research, spin is the overinterpretation of findings, and it is a growing concern. To date, the presence of spin has not been evaluated in prognostic model research in oncology, including studies developing and validating models for individualized risk prediction. STUDY DESIGN AND SETTING: We conducted a systematic review, searching MEDLINE and EMBASE for oncology-related studies that developed and validated a prognostic model using machine learning, published between 1 January 2019 and 5 September 2019. We used existing spin frameworks and described areas of highly suggestive spin practices. RESULTS: We included 62 publications (152 developed models; 37 validated models). Reporting was inconsistent between the methods and the results in 27% of studies, owing to additional analyses and selective reporting. Thirty-two studies (of 36 applicable studies) reported comparisons between developed models in their discussion, predominantly using discrimination measures to support their claims (78%). Thirty-five studies (56%) used an overly strong or leading word in their title, abstract, results, discussion, or conclusion. CONCLUSION: The potential for spin needs to be considered when reading, interpreting, and using studies that developed and validated prognostic models in oncology. Researchers should carefully report their prognostic model research using words that reflect their actual results and strength of evidence.


Medical Oncology , Research , Humans , Prognosis , Machine Learning
6.
J Clin Epidemiol ; 154: 8-22, 2023 02.
Article En | MEDLINE | ID: mdl-36436815

BACKGROUND AND OBJECTIVES: We sought to summarize the study design, modelling strategies, and performance measures reported in studies on clinical prediction models developed using machine learning techniques. METHODS: We searched PubMed for articles published between 01/01/2018 and 31/12/2019 describing the development, or development with external validation, of a multivariable prediction model using any supervised machine learning technique. No restrictions were made based on study design, data source, or predicted patient-related health outcomes. RESULTS: We included 152 studies; 58 (38.2% [95% CI 30.8-46.1]) were diagnostic and 94 (61.8% [95% CI 53.9-69.2]) were prognostic studies. Most studies reported only the development of prediction models (n = 133, 87.5% [95% CI 81.3-91.8]), focused on binary outcomes (n = 131, 86.2% [95% CI 79.8-90.8]), and did not report a sample size calculation (n = 125, 82.2% [95% CI 75.4-87.5]). The most common algorithms used were support vector machine (n = 86/522, 16.5% [95% CI 13.5-19.9]) and random forest (n = 73/522, 14.0% [95% CI 11.3-17.2]). Values for the area under the receiver operating characteristic curve ranged from 0.45 to 1.00. Calibration metrics were often missing (n = 494/522, 94.6% [95% CI 92.4-96.3]). CONCLUSION: Our review revealed that greater focus is required on the handling of missing values, methods for internal validation, and reporting of calibration to improve the methodological conduct of studies on machine learning-based prediction models. SYSTEMATIC REVIEW REGISTRATION: PROSPERO, CRD42019161764.
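
Since missing calibration metrics are the headline finding here, a brief worked example may help. The R sketch below estimates the two standard calibration summaries, calibration slope and calibration-in-the-large, from predicted probabilities and observed binary outcomes; it is a hedged illustration assuming hypothetical vectors `p` (predicted risks) and `y` (0/1 outcomes) from a validation set, not code from the review.

```r
# Hedged sketch: 'p' = predicted probabilities, 'y' = observed 0/1 outcomes,
# both hypothetical validation-set objects.
lp <- qlogis(p)  # linear predictor (logit of the predicted probability)

# Calibration slope: logistic regression of the outcome on the linear predictor
# (a slope < 1 suggests overfitting / predictions that are too extreme).
slope_fit <- glm(y ~ lp, family = binomial)
coef(slope_fit)["lp"]

# Calibration-in-the-large: intercept with the linear predictor as an offset
# (values away from 0 indicate systematic over- or under-prediction).
citl_fit <- glm(y ~ offset(lp), family = binomial)
coef(citl_fit)["(Intercept)"]
```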


Machine Learning , Supervised Machine Learning , Humans , Algorithms , Prognosis , ROC Curve
7.
Diagn Progn Res ; 6(1): 13, 2022 Jul 07.
Article En | MEDLINE | ID: mdl-35794668

BACKGROUND: Prognostic models are used widely in the oncology domain to guide medical decision-making. Little is known about the risk of bias of prognostic models developed using machine learning and the barriers to their clinical uptake in the oncology domain. METHODS: We conducted a systematic review, searching the MEDLINE and EMBASE databases for oncology-related studies developing a prognostic model using machine learning methods published between 01/01/2019 and 05/09/2019. The primary outcome was risk of bias, judged using the Prediction model Risk Of Bias ASsessment Tool (PROBAST). We described risk of bias overall and for each domain, for development and validation analyses separately. RESULTS: We included 62 publications (48 development-only; 14 development with validation). Across all publications, 152 models were developed and 37 models were validated. 84% (95% CI: 77 to 89) of developed models and 51% (95% CI: 35 to 67) of validated models were at overall high risk of bias. Bias introduced in the analysis was the largest contributor to the overall risk of bias judgement for both model development and validation. 123 (81%, 95% CI: 73.8 to 86.4) developed models and 19 (51%, 95% CI: 35.1 to 67.3) validated models were at high risk of bias in the analysis domain, mostly owing to shortcomings such as insufficient sample size and split-sample internal validation. CONCLUSIONS: The quality of machine learning based prognostic models in the oncology domain is poor and most models have a high risk of bias, contraindicating their use in clinical practice. Adherence to better standards is urgently needed, with a focus on sample size estimation and analysis methods, to improve the quality of these models.

8.
BMC Med Res Methodol ; 22(1): 101, 2022 04 08.
Article En | MEDLINE | ID: mdl-35395724

BACKGROUND: We aimed to describe and evaluate the methodological conduct of prognostic prediction models developed using machine learning methods in oncology. METHODS: We conducted a systematic review, searching MEDLINE and Embase for studies published between 01/01/2019 and 05/09/2019 that developed a prognostic prediction model using machine learning methods in oncology. We used the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement, the Prediction model Risk Of Bias ASsessment Tool (PROBAST), and the CHecklist for critical Appraisal and data extraction for systematic Reviews of prediction Modelling Studies (CHARMS) to assess the methodological conduct of included publications. Results were summarised by modelling type: regression-based, non-regression-based, and ensemble machine learning models. RESULTS: Sixty-two publications met the inclusion criteria, developing 152 models across all publications. Forty-two models were regression-based, 71 were non-regression-based, and 39 were ensemble models. A median of 647 individuals (IQR: 203 to 4,059) and 195 events (IQR: 38 to 1,269) were used for model development, and 553 individuals (IQR: 69 to 3,069) and 50 events (IQR: 17.5 to 326.5) for model validation. A higher number of events per predictor was used for developing regression-based models (median: 8, IQR: 7.1 to 23.5), compared to alternative machine learning models (median: 3.4, IQR: 1.1 to 19.1) and ensemble models (median: 1.7, IQR: 1.1 to 6). Sample size was rarely justified (n = 5/62; 8%). Some or all continuous predictors were categorised before modelling in 24 studies (39%). Of the studies reporting predictor selection before modelling, 46% (n = 24/62) used univariable analyses, a common method across all modelling types. Ten of 24 models for time-to-event outcomes accounted for censoring (42%). A split-sample approach was the most popular method for internal validation (n = 25/62, 40%). Calibration was reported in 11 studies. Fewer than half of the models were fully reported or made available. CONCLUSIONS: The methodological conduct of machine learning based clinical prediction models is poor. Guidance is urgently needed, with increased awareness and education of minimum prediction modelling standards. Particular focus is needed on sample size estimation, development and validation analysis methods, and ensuring the model is available for independent validation, to improve the quality of machine learning based clinical prediction models.
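
The events-per-predictor (EPV) figures above are simple to reproduce, and a short sketch makes the metric explicit. The R snippet below reuses the median number of events reported in the abstract purely for illustration; the number of candidate predictors (25) is a hypothetical value, not taken from the review, and formal minimum sample size criteria (e.g., those of Riley et al., implemented in the R package pmsampsize) are preferable to EPV rules of thumb.

```r
# Events per predictor: number of outcome events divided by the number of
# candidate predictor parameters considered for the model.
epv <- function(events, candidate_predictors) events / candidate_predictors

# Illustration only: 195 is the median number of development events reported
# in the abstract; 25 candidate predictors is a hypothetical value.
epv(events = 195, candidate_predictors = 25)   # = 7.8
```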


Machine Learning , Medical Oncology , Research Design , Bias , Humans , Prognosis
9.
BMC Med Res Methodol ; 22(1): 12, 2022 Jan 13.
Article En | MEDLINE | ID: mdl-35026997

BACKGROUND: While many studies have consistently found incomplete reporting of regression-based prediction model studies, evidence is lacking for machine learning-based prediction model studies. We aimed to systematically review the adherence of machine learning (ML)-based prediction model studies to the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement. METHODS: We included articles reporting on the development or external validation of a multivariable prediction model (either diagnostic or prognostic) developed using supervised ML for individualized predictions across all medical fields. We searched PubMed from 1 January 2018 to 31 December 2019. Data extraction was performed using the 22-item checklist for reporting of prediction model studies (www.TRIPOD-statement.org). We measured the overall adherence per article and the adherence per TRIPOD item. RESULTS: Our search identified 24,814 articles, of which 152 were included: 94 (61.8%) prognostic and 58 (38.2%) diagnostic prediction model studies. Overall, articles adhered to a median of 38.7% (IQR 31.0-46.4%) of TRIPOD items. No article fully adhered to complete reporting of the abstract, and very few reported the flow of participants (3.9%, 95% CI 1.8 to 8.3), an appropriate title (4.6%, 95% CI 2.2 to 9.2), blinding of predictors (4.6%, 95% CI 2.2 to 9.2), model specification (5.2%, 95% CI 2.4 to 10.8), and the model's predictive performance (5.9%, 95% CI 3.1 to 10.9). There was often complete reporting of the source of data (98.0%, 95% CI 94.4 to 99.3) and interpretation of the results (94.7%, 95% CI 90.0 to 97.3). CONCLUSION: As with prediction model studies developed using conventional regression-based techniques, the completeness of reporting is poor. Essential information for deciding whether to use the model (i.e., model specification and its performance) is rarely reported. However, some items and sub-items of TRIPOD might be less suitable for ML-based prediction model studies and thus TRIPOD requires extension. Overall, there is an urgent need to improve the reporting quality and usability of research to avoid research waste. SYSTEMATIC REVIEW REGISTRATION: PROSPERO, CRD42019161764.
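
For readers unfamiliar with how "overall adherence per article and per TRIPOD item" is typically computed, the short R sketch below derives both summaries from a hypothetical adherence matrix. It is an illustration of the general approach under stated assumptions, not the review's own extraction or analysis code.

```r
# Illustrative sketch, assuming 'adh' is a logical matrix (rows = articles,
# columns = TRIPOD items) with TRUE/FALSE for adherence and NA for items that
# do not apply to a given article.
per_article <- rowMeans(adh, na.rm = TRUE) * 100   # % of applicable items met, per article
per_item    <- colMeans(adh, na.rm = TRUE) * 100   # % of articles meeting each item

median(per_article)                         # e.g., median adherence per article
quantile(per_article, c(0.25, 0.75))        # interquartile range
```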


Checklist , Models, Statistical , Humans , Machine Learning , Prognosis , Supervised Machine Learning
10.
BMJ ; 375: n2281, 2021 10 20.
Article En | MEDLINE | ID: mdl-34670780

OBJECTIVE: To assess the methodological quality of studies on prediction models developed using machine learning techniques across all medical specialties. DESIGN: Systematic review. DATA SOURCES: PubMed from 1 January 2018 to 31 December 2019. ELIGIBILITY CRITERIA: Articles reporting on the development, with or without external validation, of a multivariable prediction model (diagnostic or prognostic) developed using supervised machine learning for individualised predictions. No restrictions were applied to study design, data source, or predicted patient related health outcomes. REVIEW METHODS: Methodological quality of the studies was determined and risk of bias evaluated using the prediction model risk of bias assessment tool (PROBAST). This tool contains 20 signalling questions tailored to identify potential biases in four domains. Risk of bias was assessed for each domain (participants, predictors, outcome, and analysis) and for each study overall. RESULTS: 152 studies were included: 58 (38%) included a diagnostic prediction model and 94 (62%) a prognostic prediction model. PROBAST was applied to 152 developed models and 19 external validations. Of these 171 analyses, 148 (87%, 95% confidence interval 81% to 91%) were rated at high risk of bias. The analysis domain was most frequently rated at high risk of bias. Of the 152 models, 85 (56%, 48% to 64%) were developed with an inadequate number of events per candidate predictor, 62 (41%, 33% to 49%) handled missing data inadequately, and 59 (39%, 31% to 47%) assessed overfitting improperly. Most models used appropriate data sources to develop (73%, 66% to 79%) and externally validate (74%, 51% to 88%) the machine learning based prediction models. Information about blinding of outcome and blinding of predictors was, however, absent in 60 (40%, 32% to 47%) and 79 (52%, 44% to 60%) of the developed models, respectively. CONCLUSION: Most studies on machine learning based prediction models show poor methodological quality and are at high risk of bias. Factors contributing to risk of bias include small study size, poor handling of missing data, and failure to deal with overfitting. Efforts to improve the design, conduct, reporting, and validation of such studies are necessary to boost the application of machine learning based prediction models in clinical practice. SYSTEMATIC REVIEW REGISTRATION: PROSPERO CRD42019161764.
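
To clarify how domain-level PROBAST judgements roll up into an overall rating, here is a simplified R sketch of the usual rule (any high domain makes the study high risk; all low gives low risk; otherwise unclear). It deliberately ignores PROBAST's additional nuances, such as possible downgrading of development-only models without external validation, and it is not the review's own code.

```r
# Simplified derivation of an overall PROBAST judgement from the four
# domain-level judgements (participants, predictors, outcome, analysis).
overall_probast <- function(domains) {
  stopifnot(all(domains %in% c("low", "high", "unclear")))
  if (any(domains == "high")) return("high")
  if (all(domains == "low"))  return("low")
  "unclear"
}

overall_probast(c(participants = "low", predictors = "low",
                  outcome = "unclear", analysis = "high"))  # -> "high"
```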


Bias , Clinical Decision Rules , Data Interpretation, Statistical , Machine Learning , Models, Statistical , Humans , Multivariate Analysis , Risk
11.
BMJ Open ; 11(7): e048008, 2021 07 09.
Article En | MEDLINE | ID: mdl-34244270

INTRODUCTION: The Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement and the Prediction model Risk Of Bias ASsessment Tool (PROBAST) were both published to improve the reporting and critical appraisal of prediction model studies for diagnosis and prognosis. This paper describes the processes and methods that will be used to develop an extension to the TRIPOD statement (TRIPOD-artificial intelligence, AI) and to the PROBAST tool (PROBAST-AI) for prediction model studies that apply machine learning techniques. METHODS AND ANALYSIS: TRIPOD-AI and PROBAST-AI will be developed following published guidance from the EQUATOR Network and will comprise five stages. Stage 1 will comprise two systematic reviews (across all medical fields and specifically in oncology) to examine the quality of reporting in published machine-learning-based prediction model studies. In stage 2, we will consult a diverse group of key stakeholders using a Delphi process to identify items to be considered for inclusion in TRIPOD-AI and PROBAST-AI. Stage 3 will comprise virtual consensus meetings to consolidate and prioritise key items to be included in TRIPOD-AI and PROBAST-AI. Stage 4 will involve developing the TRIPOD-AI checklist and the PROBAST-AI tool, and writing the accompanying explanation and elaboration papers. In the final stage, stage 5, we will disseminate TRIPOD-AI and PROBAST-AI via journals, conferences, blogs, websites (including TRIPOD, PROBAST and EQUATOR Network) and social media. TRIPOD-AI will provide researchers working on machine learning based prediction model studies with a reporting guideline that can help them report the key details readers need to evaluate study quality and interpret findings, potentially reducing research waste. We anticipate PROBAST-AI will help researchers, clinicians, systematic reviewers and policymakers critically appraise the design, conduct and analysis of machine learning based prediction model studies, with a robust standardised tool for bias evaluation. ETHICS AND DISSEMINATION: Ethical approval was granted by the Central University Research Ethics Committee, University of Oxford, on 10 December 2020 (R73034/RE001). Findings from this study will be disseminated through peer-reviewed publications. PROSPERO REGISTRATION NUMBER: CRD42019140361 and CRD42019161764.


Artificial Intelligence , Checklist , Bias , Humans , Prognosis , Research Design , Risk Assessment
12.
BMJ Open ; 10(11): e038832, 2020 Nov 11.
Article En | MEDLINE | ID: mdl-33177137

INTRODUCTION: Studies addressing the development and/or validation of diagnostic and prognostic prediction models are abundant in most clinical domains. Systematic reviews have shown that the methodological and reporting quality of prediction model studies is suboptimal. Due to the increasing availability of larger, routinely collected, and complex medical data, and the rising application of artificial intelligence (AI) and machine learning (ML) techniques, the number of prediction model studies is expected to increase even further. Prediction models developed using AI or ML techniques are often labelled as a 'black box', and little is known about their methodological and reporting quality. Therefore, this comprehensive systematic review aims to evaluate the reporting quality, the methodological conduct, and the risk of bias of prediction model studies that applied ML techniques for model development and/or validation. METHODS AND ANALYSIS: A search will be performed in PubMed to identify studies developing and/or validating prediction models using any ML methodology and across all medical fields. Studies will be included if they were published between January 2018 and December 2019, predict patient-related outcomes, use any study design or data source, and are available in English. Screening of search results and data extraction from included articles will be performed by two independent reviewers. The primary outcomes of this systematic review are: (1) the adherence of ML-based prediction model studies to the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement, and (2) the risk of bias in such studies, as assessed using the Prediction model Risk Of Bias ASsessment Tool (PROBAST). A narrative synthesis will be conducted for all included studies. Findings will be stratified by study type, medical field, and prevalent ML methods, and will inform necessary extensions or updates of TRIPOD and PROBAST to better address prediction model studies that use AI or ML techniques. ETHICS AND DISSEMINATION: Ethical approval is not required for this study because only already published data will be analysed. Findings will be disseminated through peer-reviewed publications and scientific conferences. SYSTEMATIC REVIEW REGISTRATION: PROSPERO, CRD42019161764.


Machine Learning , Research Design , Bias , Humans , Prognosis , Systematic Reviews as Topic
13.
BMJ ; 369: m1328, 2020 04 07.
Article En | MEDLINE | ID: mdl-32265220

OBJECTIVE: To review and appraise the validity and usefulness of published and preprint reports of prediction models for diagnosing coronavirus disease 2019 (covid-19) in patients with suspected infection, for prognosis of patients with covid-19, and for detecting people in the general population at increased risk of covid-19 infection or being admitted to hospital with the disease. DESIGN: Living systematic review and critical appraisal by the COVID-PRECISE (Precise Risk Estimation to optimise covid-19 Care for Infected or Suspected patients in diverse sEttings) group. DATA SOURCES: PubMed and Embase through Ovid, up to 1 July 2020, supplemented with arXiv, medRxiv, and bioRxiv up to 5 May 2020. STUDY SELECTION: Studies that developed or validated a multivariable covid-19 related prediction model. DATA EXTRACTION: At least two authors independently extracted data using the CHARMS (critical appraisal and data extraction for systematic reviews of prediction modelling studies) checklist; risk of bias was assessed using PROBAST (prediction model risk of bias assessment tool). RESULTS: 37 421 titles were screened, and 169 studies describing 232 prediction models were included. The review identified seven models for identifying people at risk in the general population; 118 diagnostic models for detecting covid-19 (75 were based on medical imaging, 10 to diagnose disease severity); and 107 prognostic models for predicting mortality risk, progression to severe disease, intensive care unit admission, ventilation, intubation, or length of hospital stay. The most frequent types of predictors included in the covid-19 prediction models are vital signs, age, comorbidities, and image features. Flu-like symptoms are frequently predictive in diagnostic models, while sex, C reactive protein, and lymphocyte counts are frequent prognostic factors. Reported C index estimates from the strongest form of validation available per model ranged from 0.71 to 0.99 in prediction models for the general population, from 0.65 to more than 0.99 in diagnostic models, and from 0.54 to 0.99 in prognostic models. All models were rated at high or unclear risk of bias, mostly because of non-representative selection of control patients, exclusion of patients who had not experienced the event of interest by the end of the study, high risk of model overfitting, and unclear reporting. Many models did not include a description of the target population (n=27, 12%) or care setting (n=75, 32%), and only 11 (5%) were externally validated by a calibration plot. The Jehi diagnostic model and the 4C mortality score were identified as promising models. CONCLUSION: Prediction models for covid-19 are quickly entering the academic literature to support medical decision making at a time when they are urgently needed. This review indicates that almost all published prediction models are poorly reported and at high risk of bias, such that their reported predictive performance is probably optimistic. However, we have identified two (one diagnostic and one prognostic) promising models that should soon be validated in multiple cohorts, preferably through collaborative efforts and data sharing to also allow an investigation of the stability and heterogeneity in their performance across populations and settings. Details on all reviewed models are publicly available at https://www.covprecise.org/.
Methodological guidance as provided in this paper should be followed because unreliable predictions could cause more harm than benefit in guiding clinical decisions. Finally, prediction model authors should adhere to the TRIPOD (transparent reporting of a multivariable prediction model for individual prognosis or diagnosis) reporting guideline. SYSTEMATIC REVIEW REGISTRATION: Protocol https://osf.io/ehc47/, registration https://osf.io/wy245. READERS' NOTE: This article is a living systematic review that will be updated to reflect emerging evidence. Updates may occur for up to two years from the date of original publication. This version is update 3 of the original article published on 7 April 2020 (BMJ 2020;369:m1328). Previous updates can be found as data supplements (https://www.bmj.com/content/369/bmj.m1328/related#datasupp). When citing this paper please consider adding the update number and date of access for clarity.


Coronavirus Infections/diagnosis , Models, Theoretical , Pneumonia, Viral/diagnosis , COVID-19 , Coronavirus , Disease Progression , Hospitalization/statistics & numerical data , Humans , Multivariate Analysis , Pandemics , Prognosis