Enhanced feature selection and ensemble learning for cardiovascular disease prediction: hybrid GOL2-2T and adaptive boosted decision fusion with babysitting refinement.

Praveen, S Phani; Hasan, Mohammad Kamrul; Abdullah, Siti Norul Huda Sheikh; Sirisha, Uddagiri; Tirumanadham, N S Koti Mani Kumar; Islam, Shayla; Ahmed, Fatima Rayan Awad; Ahmed, Thowiba E; Noboni, Ayman Afrin; Sampedro, Gabriel Avelino; Yeun, Chan Yeob; Ghazal, Taher M

Affiliation

Praveen SP; Department of Computer Science and Engineering, Prasad V Potluri Siddhartha Institute of Technology, Vijayawada, India.
Hasan MK; Faculty of Information Science and Technology, University Kebangsaan Malaysia, Bangi, Selangor, Malaysia.
Abdullah SNHS; Faculty of Information Science and Technology, University Kebangsaan Malaysia, Bangi, Selangor, Malaysia.
Sirisha U; Department of Computer Science and Engineering, Prasad V Potluri Siddhartha Institute of Technology, Vijayawada, India.
Tirumanadham NSKMK; Department of Computer Science and Engineering, Sir C R Reddy College of Engineering, Eluru, India.
Islam S; Institute of Computer Science and Digital Innovation, UCSI University, Kuala Lumpur, Malaysia.
Ahmed FRA; Computer Science Department, College of Computer Engineering and Science, Prince Sattam Bin Abdulaziz University, Al-Kharj, Saudi Arabia.
Ahmed TE; Computer Science Department, College of Science and Humanities-Jubail, Imam Abdulrahman Bin Faisal University, Dammam, Saudi Arabia.
Noboni AA; Department of Surgery, Medical College For Women and Hospital, Dhaka, Bangladesh.
Sampedro GA; Faculty of Information and Communication Studies, University of the Philippines Open University, Los Baños, Philippines.
Yeun CY; Center for Computational Imaging and Visual Innovations, De La Salle University, Manila, Philippines.
Ghazal TM; Centre for Cyber Physical Systems, Computer Science Department, Khalifa University, Abu Dhabi, United Arab Emirates.

Front Med (Lausanne) ; 11: 1407376, 2024.

Article in English | MEDLINE | ID: mdl-39071085

ABSTRACT


Introduction:

Cardiovascular disease (CVD) remains one of the leading causes of death worldwide, and better diagnostic methods are needed to detect early signs and predict disease outcomes. Current diagnostic tools are cumbersome and imprecise, especially for complex diseases, which motivates incorporating new machine learning applications into differential diagnosis.

Methods:

This paper presents a machine learning approach that uses multivariate imputation by chained equations (MICE) to handle missing data, the interquartile range (IQR) to handle outliers, and the synthetic minority over-sampling technique (SMOTE) to address class imbalance. To select optimal features, we introduce a Hybrid 2-Tier Grasshopper Optimization with L2 regularization methodology, which we call GOL2-2T. To improve predictive modelling, we use an AdaBoost decision fusion (ABDF) ensemble learning algorithm, with the babysitting technique applied to hyperparameter tuning. Accuracy, recall, and AUC score are the measures used to assess the model.
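Two of the preprocessing steps named above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `iqr_clip` and `smote_like` are hypothetical, clipping to the Tukey fences is only one assumed way of "handling" outliers (dropping them is another), and the oversampler is a simplified version of the SMOTE interpolation idea. MICE, GOL2-2T, and ABDF are not shown.

```python
import random
import statistics


def iqr_clip(values, k=1.5):
    """Clip values to the fences [Q1 - k*IQR, Q3 + k*IQR].

    Assumption: outliers are clipped rather than removed.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority points (simplified SMOTE idea).

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic


# Illustrative values only, not data from the study.
clipped = iqr_clip([1, 2, 3, 4, 5, 6, 7, 8, 100])
new_points = smote_like([(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)], n_new=5)
```

In practice, oversampling of this kind is usually applied to the training split only, so that synthetic points do not leak into the evaluation data.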

Results:

Our heart disease prediction model achieved an accuracy of 83.0% and a balanced F1 score of 84.0%. Integrating SMOTE, IQR outlier detection, MICE, and GOL2-2T feature selection enhances robustness and improves predictive performance. ABDF further refined the model and proved highly effective at predicting heart disease.
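For readers unfamiliar with the reported measures, the sketch below shows how accuracy, recall, F1, and AUC are computed for a binary classifier. The function names `binary_metrics` and `roc_auc` are hypothetical, and the labels and scores are made-up illustrative values, not the study's data.

```python
def binary_metrics(y_true, y_pred):
    """Return accuracy, precision, recall and F1 for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Illustrative labels, predictions and scores only.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1]
metrics = binary_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

Because F1 combines precision and recall, it can differ from accuracy when the two error types (false positives and false negatives) are unevenly distributed.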

Discussion:

These findings demonstrate the effectiveness of additional machine learning methodologies in medical diagnostics, including improved early recognition and trustworthy tools for clinicians. However, the model's applicability and scope depend on the dataset used. As with most models, further work is needed to replicate it across different datasets and samples and to determine whether the results generalize to populations that differ from the patient population used in the current study.

Keywords

adaboost decision fusion (ABDF); adaptive boosted decision fusion; cardiovascular disease; interquartile range; multivariate imputation by chained equations; synthetic minority over-sampling technique

Full text: 1 Collection: 01-internacional Database: MEDLINE Language: English Journal: Front Med (Lausanne) / Front. med. (Lausanne) / Frontiers in medicine (Lausanne) Year: 2024 Document type: Article Country of affiliation: India Country of publication: Switzerland
