An effective up-sampling approach for breast cancer prediction with imbalanced data: A machine learning model-based comparative analysis.
Tran, Tuan; Le, Uyen; Shi, Yihui.
Affiliation
  • Tran T; College of Pharmacy, California Northstate University, Elk Grove, CA, United States of America.
  • Le U; College of Pharmacy, California Northstate University, Elk Grove, CA, United States of America.
  • Shi Y; College of Medicine, California Northstate University, Elk Grove, CA, United States of America.
PLoS One ; 17(5): e0269135, 2022.
Article in English | MEDLINE | ID: mdl-35622821
ABSTRACT
Early detection of breast cancer plays a critical role in successful treatment and saves thousands of patients' lives every year. Although healthcare organizations have collected and stored massive amounts of clinical data, only a small portion of the data has been used to support treatment decision-making. In this study, we proposed an engineered up-sampling method (ENUS) for handling imbalanced data to improve the predictive performance of machine learning models. Our experimental results showed that when the ratio of the minority to the majority class is less than 20%, training models with ENUS improved balanced accuracy by 3.74%, sensitivity by 8.36%, and F1 score by 3.83%. Our study also identified that XGBoost Tree (XGBTree) using ENUS achieved the best performance, with an average balanced accuracy of 97.47% (min = 93%, max = 100%), sensitivity of 97.88% (min = 89%, max = 100%), and F1 score of 96.20% (min = 89.5%, max = 100%) in the validation dataset. Furthermore, our ensemble algorithm identified Cell_Shape and Nuclei as the most important attributes in predicting breast cancer. This finding reaffirms, using a data-driven approach, previous knowledge of the relationship between Cell_Shape, Nuclei, and the grades of breast cancer. Finally, our experiment showed that the Random Forest and Neural Network models had the shortest training times. Our study provided a comprehensive comparison of a wide range of machine learning methods for predicting breast cancer risk. It can be used as a tool to help healthcare practitioners detect and treat breast cancer effectively.
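The abstract does not describe how ENUS works, so the sketch below does not attempt to reproduce it. As an illustration only, it shows plain random up-sampling (duplicating minority-class rows until the classes are balanced) and how the three reported metrics, balanced accuracy, sensitivity, and F1 score, are computed from binary predictions. The function names `random_upsample` and `binary_metrics` and the toy labels are assumptions made for this example, not part of the study.

```python
import random
from collections import Counter


def random_upsample(rows, labels, seed=0):
    """Naive random up-sampling (NOT the paper's ENUS method).

    Duplicates randomly chosen rows of each smaller class until every
    class has as many rows as the largest class.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def binary_metrics(y_true, y_pred, positive=1):
    """Sensitivity, specificity, balanced accuracy, and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # recall on positives
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # recall on negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "f1": f1,
    }


if __name__ == "__main__":
    # Hypothetical toy data: 8 majority rows (label 0), 2 minority rows (label 1).
    rows = [[i] for i in range(10)]
    labels = [0] * 8 + [1] * 2
    up_rows, up_labels = random_upsample(rows, labels)
    print(Counter(up_labels))

    # Hypothetical predictions, used only to exercise the metric formulas.
    print(binary_metrics([1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 1]))
```

In practice, up-sampling would be applied to the training split only, so that duplicated rows do not leak into the validation data on which the metrics are computed.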
Full text: 1 Database: MEDLINE Main subject: Breast Neoplasms Type of study: Diagnostic_studies / Prognostic_studies / Risk_factors_studies / Screening_studies Limits: Female / Humans Language: En Year of publication: 2022 Document type: Article