Comparison of model feature importance statistics to identify covariates that contribute most to model accuracy in prediction of insomnia.
Huang, Alexander A; Huang, Samuel Y.
Affiliation
  • Huang AA; Northwestern University Feinberg School of Medicine, Chicago, IL, United States of America.
  • Huang SY; Virginia Commonwealth University School of Medicine, Richmond, VA, United States of America.
PLoS One ; 19(7): e0306359, 2024.
Article in English | MEDLINE | ID: mdl-38954735
ABSTRACT
IMPORTANCE:

Sleep is critical to a person's physical and mental health, and there is a need both to build high-performing machine-learning models and to critically understand how those models rank covariates.

OBJECTIVE:

The study aimed to compare how different model metrics rank the importance of various covariates.

DESIGN, SETTING, AND PARTICIPANTS:

A cross-sectional cohort study was conducted retrospectively using publicly available data from the National Health and Nutrition Examination Survey (NHANES).

METHODS:

This study employed univariable logistic regression models to select covariates that were strongly and independently associated with the sleep disorder outcome. These covariates were then used to train several machine-learning models, and the best-performing model was chosen. That model was used to rank covariates by gain, cover, and frequency to identify risk factors for sleep disorder, and feature importance was also evaluated using univariable and multivariable t-statistics. A correlation matrix was created to determine how similarly the different model metrics ranked the variables.
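The abstract does not describe how gain, cover, and frequency were extracted or which correlation coefficient was used. The sketch below assumes XGBoost's documented definitions (frequency, called "weight" in XGBoost, counts the splits that use a covariate; gain and cover are averaged over those splits) and uses Spearman's rank correlation to compare metrics, since the study compares rankings. The split records, covariate names, and helpers (`importance`, `rank_features`, `spearman`, `correlation_matrix`) are hypothetical illustrations, not the authors' implementation.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical split records pooled from all trees of a fitted booster:
# (covariate, loss reduction "gain" at the split, hessian sum "cover" at the split).
SPLITS = [
    ("age", 12.0, 300.0),
    ("bmi", 4.0, 120.0),
    ("age", 8.0, 250.0),
    ("depression_score", 20.0, 400.0),
    ("bmi", 6.0, 200.0),
    ("bmi", 2.0, 60.0),
]


def importance(splits):
    """Per-covariate frequency (number of splits), mean gain, and mean cover."""
    by_feature = defaultdict(list)
    for feature, gain, cover in splits:
        by_feature[feature].append((gain, cover))
    return {
        feature: {
            "frequency": len(rows),
            "gain": mean(g for g, _ in rows),
            "cover": mean(c for _, c in rows),
        }
        for feature, rows in by_feature.items()
    }


def rank_features(imp, metric):
    """Covariates ordered from most to least important under one metric."""
    return sorted(imp, key=lambda f: imp[f][metric], reverse=True)


def ranks(values):
    """1-based ranks (smallest value = 1); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def correlation_matrix(imp, metrics=("gain", "cover", "frequency")):
    """Pairwise Spearman correlations between importance metrics across covariates."""
    features = sorted(imp)
    cols = {m: [imp[f][m] for f in features] for m in metrics}
    return {(a, b): spearman(cols[a], cols[b]) for a in metrics for b in metrics}


imp = importance(SPLITS)
print(rank_features(imp, "gain"))  # ['depression_score', 'age', 'bmi']
print(correlation_matrix(imp)[("gain", "cover")])  # 1.0
```

With a real model, the per-covariate scores could instead be read from the trained booster, for example with `Booster.get_score(importance_type="gain")` (or `"cover"`, `"weight"`) in the XGBoost Python package. Absolute univariable or multivariable t-statistics per covariate could be added as further columns of the same matrix, and substituting `pearson` for `spearman` would compare raw importance values rather than ranks.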

RESULTS:

The XGBoost model had the highest mean AUROC, 0.865 (SD = 0.010), with an accuracy of 0.762 (SD = 0.019), F1 score of 0.875 (SD = 0.766), sensitivity of 0.768 (SD = 0.023), specificity of 0.782 (SD = 0.025), positive predictive value of 0.806 (SD = 0.025), and negative predictive value of 0.737 (SD = 0.034). Among the machine-learning importance metrics, gain and cover were strongly positively correlated with each other (r > 0.70). Importance metrics from the multivariable and univariable regression models were weakly negatively correlated with the machine-learning metrics (r between -0.3 and 0).
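The abstract reports each metric as a mean (SD) without stating how the repeated estimates were obtained (for example, cross-validation folds or repeated train/test splits). As a minimal sketch under that assumption, the standard definitions behind these metrics can be written as below; the function names (`threshold_metrics`, `auroc`, `summarize`), counts, and scores are hypothetical and are not the authors' data or code.

```python
from statistics import mean, stdev


def threshold_metrics(tp, fp, tn, fn):
    """Binary-classification metrics from confusion-matrix counts at a fixed threshold."""
    sensitivity = tp / (tp + fn)  # recall / true-positive rate
    specificity = tn / (tn + fp)  # true-negative rate
    ppv = tp / (tp + fp)          # positive predictive value (precision)
    npv = tn / (tn + fn)          # negative predictive value
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        # F1 is the harmonic mean of PPV and sensitivity.
        "f1": 2 * ppv * sensitivity / (ppv + sensitivity),
    }


def auroc(scores, labels):
    """AUROC as P(random positive scores higher than random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def summarize(per_split_values):
    """Mean and sample standard deviation across repeated evaluation splits."""
    return mean(per_split_values), stdev(per_split_values)


# Hypothetical confusion-matrix counts and scores, for illustration only.
print(threshold_metrics(tp=40, fp=10, tn=35, fn=15))
print(auroc([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0]))  # 0.75
```

Because F1 is the harmonic mean of PPV and sensitivity, on any single split it lies between those two values. `summarize` would then be applied to the per-split values of each metric to produce a mean (SD) pair.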

CONCLUSION:

In this cohort, the rankings of important variables associated with sleep disorder produced by the machine-learning models were not related to those produced by the regression models.
Subjects

Full text: 1 | Collections: 01-internacional | Database: MEDLINE | Main subject: Machine Learning / Sleep Initiation and Maintenance Disorders | Limits: Adult / Aged / Female / Humans / Male / Middle aged | Language: English | Journal: PLoS One | Publication year: 2024 | Document type: Article