Comparison of machine-learning and logistic regression models for prediction of 30-day unplanned readmission in electronic health records: A development and validation study.
Iwagami, Masao; Inokuchi, Ryota; Kawakami, Eiryo; Yamada, Tomohide; Goto, Atsushi; Kuno, Toshiki; Hashimoto, Yohei; Michihata, Nobuaki; Goto, Tadahiro; Shinozaki, Tomohiro; Sun, Yu; Taniguchi, Yuta; Komiyama, Jun; Uda, Kazuaki; Abe, Toshikazu; Tamiya, Nanako.
Affiliation
  • Iwagami M; Department of Health Services Research, Institute of Medicine, University of Tsukuba, Tsukuba, Ibaraki, Japan.
  • Inokuchi R; Health Services Research and Development Center, University of Tsukuba, Tsukuba, Ibaraki, Japan.
  • Kawakami E; Digital Society Division, Cyber Medicine Research Center, University of Tsukuba, Tsukuba, Ibaraki, Japan.
  • Yamada T; Faculty of Epidemiology and Population Health, London School of Hygiene and Tropical Medicine, London, United Kingdom.
  • Goto A; Department of Health Services Research, Institute of Medicine, University of Tsukuba, Tsukuba, Ibaraki, Japan.
  • Kuno T; Health Services Research and Development Center, University of Tsukuba, Tsukuba, Ibaraki, Japan.
  • Hashimoto Y; Department of Clinical Engineering, The University of Tokyo Hospital, Tokyo, Japan.
  • Michihata N; Department of Emergency and Critical Care Medicine, The University of Tokyo Hospital, Tokyo, Japan.
  • Goto T; Department of Artificial Intelligence Medicine, Graduate School of Medicine, Chiba University, Chiba, Chiba, Japan.
  • Shinozaki T; Advanced Data Science Project (ADSP), RIKEN Information R&D and Strategy Headquarters, RIKEN, Yokohama, Kanagawa, Japan.
  • Sun Y; Department of Diabetes and Metabolic Diseases, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan.
  • Taniguchi Y; Department of Public Health, School of Medicine, Yokohama City University, Yokohama, Kanagawa, Japan.
  • Komiyama J; Division of Cardiology, Montefiore Medical Center, Albert Einstein College of Medicine, NY, United States of America.
  • Uda K; Cardiology Division, Massachusetts General Hospital, Harvard Medical School, Boston, MA, United States of America.
  • Abe T; Department of Ophthalmology, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan.
  • Tamiya N; Department of Clinical Epidemiology and Health Economics, School of Public Health, The University of Tokyo, Tokyo, Japan.
PLOS Digit Health; 3(8): e0000578, 2024 Aug.
Article in English | MEDLINE | ID: mdl-39163277
ABSTRACT
It is expected, but not known, whether machine-learning models can outperform regression models such as logistic regression (LR), especially as the number and types of predictor variables in electronic health records (EHRs) increase. We aimed to compare the predictive performance of gradient-boosted decision trees (GBDT), random forest (RF), a deep neural network (DNN), and LR with the least absolute shrinkage and selection operator (LR-LASSO) for unplanned readmission. We used EHRs of patients discharged alive from 38 hospitals in 2015-2017 for derivation and in 2018 for validation, including basic characteristics, diagnosis, surgery, procedure, and drug codes, and blood-test results. The outcome was 30-day unplanned readmission. We created six data-table patterns that differed in the number of binary variables (those present in ≥5% of patients, ≥1% of patients, or ≥10 patients), each with and without blood-test results. For each data-table pattern, we used the derivation data to develop the machine-learning and LR models and the validation data to evaluate the performance of each model. The incidence of the outcome was 6.8% (23,108/339,513 discharges) in the derivation dataset and 6.4% (7,507/118,074 discharges) in the validation dataset. For the first data table, with the smallest number of variables (102 variables present in ≥5% of patients, without blood-test results), the c-statistic was highest for GBDT (0.740), followed by RF (0.734), LR-LASSO (0.720), and DNN (0.664). For the last data table, with the largest number of variables (1543 variables present in ≥10 patients, including blood-test results), the c-statistic was highest for GBDT (0.764), followed by LR-LASSO (0.755), RF (0.751), and DNN (0.720); however, the difference between GBDT and LR-LASSO was small, and their 95% confidence intervals overlapped.
In conclusion, GBDT generally outperformed LR-LASSO in predicting unplanned readmission, but the difference in c-statistic became smaller as the number of variables increased and blood-test results were added.
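The abstract describes two technical steps without giving code: building binary variables that pass a prevalence threshold (≥5% or ≥1% of patients, or ≥10 patients), and comparing models by c-statistic. The sketch below is a hypothetical illustration of both, not the authors' implementation. The function names `select_binary_codes` and `c_statistic` are made up for this example, each record is assumed to be represented as a set of codes, and thresholds are assumed to be computed on the derivation data only. The c-statistic is computed here with the standard Mann-Whitney rank formula (equivalent to the area under the ROC curve), with tied scores counted as one half.

```python
from collections import Counter
from typing import Iterable, List, Optional, Sequence, Set


def select_binary_codes(
    patient_codes: Sequence[Set[str]],
    *,
    min_fraction: Optional[float] = None,
    min_count: Optional[int] = None,
) -> List[str]:
    """Hypothetical helper: keep codes present in enough records.

    Assumes each element of patient_codes is the set of codes for one record,
    so each code is counted at most once per record.
    """
    if (min_fraction is None) == (min_count is None):
        raise ValueError("specify exactly one of min_fraction or min_count")
    counts = Counter(code for codes in patient_codes for code in codes)
    n = len(patient_codes)
    # e.g. min_fraction=0.05 for the ">=5% of patients" pattern,
    # min_count=10 for the ">=10 patients" pattern
    threshold = min_fraction * n if min_fraction is not None else min_count
    return sorted(code for code, k in counts.items() if k >= threshold)


def c_statistic(y_true: Iterable[int], y_score: Iterable[float]) -> float:
    """c-statistic (AUROC) via the Mann-Whitney rank-sum formula."""
    pairs = sorted(zip(y_score, y_true))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # find the run of tied scores pairs[i..j] and give each the average rank
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative outcome")
    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

For example, `c_statistic([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` returns 0.75: of the four (readmitted, not readmitted) pairs, three rank the readmitted record higher. Counting ties as one half is what makes a model that assigns everyone the same score land at 0.5, the usual "no discrimination" reference point against which values such as 0.664 to 0.764 above are read.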

Full text: 1 Database: MEDLINE Language: English Journal: PLOS Digit Health Year: 2024 Document type: Article