The importance of being external. Methodological insights for the external validation of machine learning models in medicine.

Cabitza, Federico; Campagner, Andrea; Soares, Felipe; García de Guadiana-Romualdo, Luis; Challa, Feyissa; Sulejmani, Adela; Seghezzi, Michela; Carobene, Anna

Affiliation

Cabitza F; University of Milano-Bicocca, Viale Sarca 336, Milano, 20126, Italy. Electronic address: federico.cabitza@unimib.it.
Campagner A; University of Milano-Bicocca, Viale Sarca 336, Milano, 20126, Italy.
Soares F; Department of Industrial Engineering - Universidade Federal do Rio Grande do Sul. Porto Alegre, Brazil.
García de Guadiana-Romualdo L; Laboratory Medicine Department, Hospital Universitario Santa Lucia, Cartagena, Spain.
Challa F; National Reference Laboratory for Clinical Chemistry, Ethiopian Public Health Institute, Addis Ababa, Ethiopia.
Sulejmani A; Laboratorio di chimica clinica, Ospedale di Desio e Monza, ASST-Monza, Dipartimento di medicina e chirurgia, Università di Milano-Bicocca, Monza, Italy.
Seghezzi M; Laboratorio di chimica clinica, Ospedale Papa Giovanni XXIII, Bergamo, Italy.
Carobene A; Laboratory Medicine, IRCCS San Raffaele Scientific Institute, Milan, Italy.

Comput Methods Programs Biomed ; 208: 106288, 2021 Sep.

Article in English | MEDLINE | ID: mdl-34352688

ABSTRACT

BACKGROUND AND OBJECTIVE:

Medical machine learning (ML) models tend to perform better on data from the same cohort than on new data, often because of overfitting or covariate shift. For these reasons, external validation (EV) is a necessary practice in the evaluation of medical ML. However, there is still a gap in the literature on how to interpret EV results and hence assess the robustness of ML models.

METHODS:

We fill this gap by proposing a meta-validation method to assess the soundness of EV procedures. In doing so, we complement the usual way of assessing EV by considering both the cardinality of the EV dataset and its similarity to the training set. We then investigate how the notions of cardinality and similarity can inform the reliability of a validation procedure by integrating them into two summative data visualizations.

RESULTS:

We illustrate our methodology by applying it to the validation of a state-of-the-art COVID-19 diagnostic model on 8 EV sets collected across 3 continents. Model performance was moderately affected by data similarity (Pearson ρ = 0.38, p < 0.001). On the EV sets, the validated model showed good discrimination (average AUC 0.84), acceptable calibration (average 0.17) and utility (average 0.50). The validation datasets were adequate in terms of both cardinality and similarity, which supports the soundness of the results. We also provide a qualitative guideline for evaluating the reliability of validation procedures, and we discuss the importance of proper external validation in light of the obtained results.
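As a rough illustration of the kind of analysis reported above, the sketch below computes a Pearson correlation coefficient between a similarity score and a performance score. The abstract does not say how similarity is measured or at what granularity the two quantities are paired, so the function name `pearson_r` and the toy values are assumptions made only for this example; they are not data from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy values (NOT taken from the paper): one similarity score
# and one performance score per validation unit.
similarity = [0.2, 0.4, 0.5, 0.7, 0.9]
performance = [0.70, 0.74, 0.80, 0.79, 0.88]
r = pearson_r(similarity, performance)
print(f"Pearson r = {r:.2f}")
```

A positive coefficient here would mean that performance tends to be higher on data that is more similar to the training set; a significance test would additionally be needed to obtain a p-value.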

CONCLUSIONS:

In this paper, we propose a novel, lean methodology to 1) study how the similarity between training and validation sets impacts the generalizability of an ML model; 2) assess the soundness of EV evaluations along three complementary performance dimensions (discrimination, utility and calibration); 3) draw conclusions on the robustness of the model under validation. We applied this methodology to a state-of-the-art model for the diagnosis of COVID-19 from routine blood tests, and showed how to interpret the results in light of the presented framework.
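The three performance dimensions named above can each be summarized by a single number on a validation set. The sketch below uses AUC for discrimination, the Brier score as one common calibration-related summary, and net benefit at a fixed decision threshold for utility. The abstract does not state which calibration and utility metrics the authors used, so the latter two choices, the function names and the toy data are all assumptions made for illustration.

```python
def auc(y_true, y_score):
    """Discrimination: probability that a random positive case is scored
    above a random negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier_score(y_true, y_prob):
    """Calibration-related summary (assumed metric): mean squared difference
    between predicted probability and observed outcome; lower is better."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def net_benefit(y_true, y_prob, threshold):
    """Utility (assumed metric): net benefit of acting on predictions at or
    above `threshold`, as used in decision curve analysis."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


# Hypothetical toy validation set (NOT data from the paper).
y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]

toy_auc = auc(y_true, y_prob)
toy_brier = brier_score(y_true, y_prob)
toy_nb = net_benefit(y_true, y_prob, threshold=0.5)
```

Reporting all three side by side for each external dataset makes it easier to notice a model that discriminates well but is poorly calibrated or offers little practical benefit.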

Subjects

COVID-19; Cohort Studies; Humans; Machine Learning; Reproducibility of Results; SARS-CoV-2

Keywords

COVID-19; Dataset cardinality; Dataset similarity; Medical machine learning; Validation

Full text: 1 | Collections: 01-international | Database: MEDLINE | Main subject: COVID-19 | Study type: Etiology_studies / Guideline / Incidence_studies / Observational_studies / Prognostic_studies / Qualitative_research / Risk_factors_studies | Limits: Humans | Language: En | Journal: Comput Methods Programs Biomed | Journal subject: MEDICAL INFORMATICS | Publication year: 2021 | Document type: Article | Country of publication: IE / IRELAND
