Novel Insights on Establishing Machine Learning-Based Stroke Prediction Models Among Hypertensive Adults.

Huang, Xiao; Cao, Tianyu; Chen, Liangziqian; Li, Junpei; Tan, Ziheng; Xu, Benjamin; Xu, Richard; Song, Yun; Zhou, Ziyi; Wang, Zhuo; Wei, Yaping; Zhang, Yan; Li, Jianping; Huo, Yong; Qin, Xianhui; Wu, Yanqing; Wang, Xiaobin; Wang, Hong; Cheng, Xiaoshu; Xu, Xiping; Liu, Lishun

Affiliation

Huang X; Department of Cardiology, The Second Affiliated Hospital of Nanchang University, Nanchang, China.
Cao T; Biological Anthropology, University of California, Santa Barbara, Santa Barbara, CA, United States.
Chen L; Department of Data Management, Shenzhen Evergreen Medical Institute, Shenzhen, China.
Li J; Department of Cardiology, The Second Affiliated Hospital of Nanchang University, Nanchang, China.
Tan Z; Department of Cardiology, The Second Affiliated Hospital of Nanchang University, Nanchang, China.
Xu B; Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, United States.
Xu R; Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, United States.
Song Y; Department of Data Management, Shenzhen Evergreen Medical Institute, Shenzhen, China.
Zhou Z; Institute of Biomedicine, Anhui Medical University, Hefei, China.
Wang Z; Department of Biomedical Engineering, Graduate School at Shenzhen, Tsinghua University, Shenzhen, China.
Wei Y; Key Laboratory of Precision Nutrition and Food Quality, Ministry of Education, Department of Nutrition and Health, College of Food Sciences and Nutritional Engineering, China Agricultural University, Beijing, China.
Zhang Y; Key Laboratory of Precision Nutrition and Food Quality, Ministry of Education, Department of Nutrition and Health, College of Food Sciences and Nutritional Engineering, China Agricultural University, Beijing, China.
Li J; Department of Cardiology, Peking University First Hospital, Beijing, China.
Huo Y; Department of Cardiology, Peking University First Hospital, Beijing, China.
Qin X; Department of Cardiology, Peking University First Hospital, Beijing, China.
Wu Y; National Clinical Research Study Center for Kidney Disease, The State Key Laboratory for Organ Failure Research, Renal Division, Nanfang Hospital, Southern Medical University, Guangzhou, China.
Wang X; Department of Cardiology, The Second Affiliated Hospital of Nanchang University, Nanchang, China.
Wang H; Department of Population, Family and Reproductive Health, Johns Hopkins University Bloomberg School of Public Health, Baltimore, MD, United States.
Cheng X; Department of Cardiovascular Science, Temple University Lewis Katz School of Medicine, Philadelphia, PA, United States.
Xu X; Department of Cardiology, The Second Affiliated Hospital of Nanchang University, Nanchang, China.
Liu L; Key Laboratory of Precision Nutrition and Food Quality, Ministry of Education, Department of Nutrition and Health, College of Food Sciences and Nutritional Engineering, China Agricultural University, Beijing, China.

Front Cardiovasc Med; 9: 901240, 2022.

Article in English | MEDLINE | ID: mdl-35600480

ABSTRACT

Background: Stroke is a major global health burden, and risk prediction is essential for its primary prevention. However, uncertainty remains about the optimal model for predicting stroke risk. In this study, we aimed to determine the most effective stroke prediction method in a Chinese hypertensive population using machine learning and to establish a general methodological pipeline for future analyses.
Methods: The training set included 70% of the data (n = 14,491) from the China Stroke Primary Prevention Trial (CSPPT). Internal validation was performed with the remaining 30% of the CSPPT data (n = 6,211), and external validation was conducted using a nested case-control (NCC) dataset (n = 2,568). The primary outcome was first stroke. Four widely accepted analysis methods were applied and compared: logistic regression (LR), stepwise logistic regression (SLR), extreme gradient boosting (XGBoost), and random forest (RF). Population characteristic data were analyzed separately with and without laboratory variables. Accuracy, sensitivity, specificity, kappa, and area under the receiver operating characteristic curve (AUC) were used to assess the models, with AUC as the primary criterion. Data balancing techniques, including random under-sampling (RUS) and the synthetic minority over-sampling technique (SMOTE), were applied to the imbalanced training set.
Results: The best model performance was observed for the RUS-applied RF model with laboratory variables. Compared with null models (sensitivity = 0, specificity = 100, and mean AUC = 0.643), data balancing techniques improved overall performance, with RUS demonstrating the more satisfactory effect in the current study (RUS: sensitivity = 63.9; specificity = 53.7; and mean AUC = 0.624). Adding laboratory variables improved the performance of the analysis methods. All results were reconfirmed in the validation sets. The top 10 important variables were determined by the best-performing analysis method.
Conclusion: Among the tested methods, the most effective stroke prediction model in the targeted population was the RUS-applied RF. Based on the insights from the current study, we provide a general framework for building machine learning-based prediction models.
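
The balancing and assessment steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names (`random_under_sample`, `confusion_metrics`, `auc`) are hypothetical, only the Python standard library is used, and the classifiers themselves (LR, SLR, XGBoost, RF) are left out. It assumes binary labels with 1 = stroke and 0 = no stroke.

```python
import random


def random_under_sample(rows, labels, seed=0):
    """Random under-sampling (RUS): randomly drop majority-class rows
    until both classes have the same number of rows."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    keep = sorted(minority + rng.sample(majority, len(minority)))
    return [rows[i] for i in keep], [labels[i] for i in keep]


def confusion_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity and Cohen's kappa for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    n = tp + tn + fp + fn
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # Agreement expected by chance, from the marginal totals.
    p_expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (accuracy - p_expected) / (1 - p_expected) if p_expected != 1 else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "kappa": kappa,
    }


def auc(y_true, scores):
    """AUC as the probability that a random positive case receives a
    higher score than a random negative case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

On a toy set with 5 positive and 95 negative labels, `random_under_sample` returns 10 rows (5 per class), and `confusion_metrics` applied to a predictor that always outputs 0 gives sensitivity 0 and specificity 1. This is the same pattern the abstract reports for the unbalanced null models, and it shows why accuracy alone is a poor criterion on imbalanced data.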

Keywords

XGBoost; machine learning; primary prevention; risk assessment; stroke

Full text: 1 Collection: 01-international Database: MEDLINE Study type: Prognostic_studies / Risk_factors_studies Language: En Journal: Front Cardiovasc Med Year: 2022 Document type: Article Affiliation country: China Publication country: Switzerland
