The Impact of Multi-Institution Datasets on the Generalizability of Machine Learning Prediction Models in the ICU.
Rockenschaub, Patrick; Hilbert, Adam; Kossen, Tabea; Elbers, Paul; von Dincklage, Falk; Madai, Vince Istvan; Frey, Dietmar.
Affiliation
  • Rockenschaub P; Charité Lab for Artificial Intelligence in Medicine (CLAIM), Charité - Universitätsmedizin Berlin, Berlin, Germany.
  • Hilbert A; QUEST Center for Responsible Research, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany.
  • Kossen T; Institute of Clinical Epidemiology, Public Health, Health Economics, Medical Statistics and Informatics, Medical University of Innsbruck, Innsbruck, Austria.
  • Elbers P; Charité Lab for Artificial Intelligence in Medicine (CLAIM), Charité - Universitätsmedizin Berlin, Berlin, Germany.
  • von Dincklage F; Charité Lab for Artificial Intelligence in Medicine (CLAIM), Charité - Universitätsmedizin Berlin, Berlin, Germany.
  • Madai VI; Department of Intensive Care Medicine, Amsterdam UMC, Vrije Universiteit, Amsterdam, The Netherlands.
  • Frey D; Department of Anesthesia, Intensive Care, Emergency and Pain Medicine, Universitätsmedizin Greifswald, Greifswald, Germany.
Crit Care Med; 52(11): 1710-1721, 2024 Nov 01.
Article in English | MEDLINE | ID: mdl-38958568
ABSTRACT

OBJECTIVES:

To evaluate the transferability of deep learning (DL) models for the early detection of adverse events to previously unseen hospitals.

DESIGN:

Retrospective observational cohort study utilizing harmonized intensive care data from four public datasets.

SETTING:

ICUs across Europe and the United States.

PATIENTS:

Adult patients admitted to the ICU for at least 6 hours who had good data quality.

INTERVENTIONS:

None.

MEASUREMENTS AND MAIN RESULTS:

Using carefully harmonized data from a total of 334,812 ICU stays, we systematically assessed the transferability of DL models for three common adverse events: death, acute kidney injury (AKI), and sepsis. We tested whether training on more than one data source and/or algorithmically optimizing for generalizability during training improves model performance at new hospitals. Models achieved a high area under the receiver operating characteristic curve (AUROC) for mortality (0.838-0.869), AKI (0.823-0.866), and sepsis (0.749-0.824) at the training hospital. As expected, AUROC dropped when models were applied at other hospitals, sometimes by as much as 0.200. Training on more than one dataset mitigated the performance drop, with multicenter models performing roughly on par with the best single-center model. Dedicated methods promoting generalizability did not noticeably improve performance in our experiments.
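
To make the notion of an AUROC drop between the training hospital and a new hospital concrete, the following is a minimal illustrative sketch, not the authors' code. The function name `auroc` and the toy labels and scores are hypothetical and are not data from the study; the sketch only shows how the internal and external AUROC, and their difference, could be computed.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Hypothetical toy predictions (labels, scores); NOT data from the study.
train_hospital = ([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1])  # same hospital as training
new_hospital = ([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])    # previously unseen hospital

internal_auc = auroc(*train_hospital)
external_auc = auroc(*new_hospital)
drop = internal_auc - external_auc  # positive value = worse at the new hospital

print(f"internal={internal_auc:.3f} external={external_auc:.3f} drop={drop:.3f}")
```

The pairwise loop is quadratic in the number of stays, so for cohorts of the size reported here a rank-based implementation from an established library would be the more practical choice.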

CONCLUSIONS:

Our results emphasize the importance of diverse training data for DL-based risk prediction. They suggest that as data from more hospitals become available for training, models may become increasingly generalizable. Even so, good performance at a new hospital still depended on the inclusion of compatible hospitals during training.
Subject(s)

Full text: 1 Collection: 01-international Database: MEDLINE Main subject: Sepsis / Acute Kidney Injury / Intensive Care Units Limits: Aged / Female / Humans / Male / Middle aged Country/Region as subject: North America / Europe Language: En Journal: Crit Care Med Year: 2024 Document type: Article Affiliation country: Germany Country of publication: United States
