Search | BVS Bolivia

An explainable artificial intelligence framework for risk prediction of COPD in smokers.

Wang, Xuchun; Qiao, Yuchao; Cui, Yu; Ren, Hao; Zhao, Ying; Linghu, Liqin; Ren, Jiahui; Zhao, Zhiyang; Chen, Limin; Qiu, Lixia.

BMC Public Health; 23(1): 2164, 2023 Nov 06.

Article in English | MEDLINE | ID: mdl-37932692

ABSTRACT

BACKGROUND: Because the early signs of chronic obstructive pulmonary disease (COPD) are inconspicuous, affected individuals often go unidentified, and opportunities for timely prevention and treatment are missed. The purpose of this study was to create an explainable artificial intelligence framework combining data preprocessing methods, machine learning methods, and model interpretability methods to identify people at high risk of COPD in the smoking population and to provide a reasonable interpretation of model predictions.
METHODS: The data comprised questionnaire information, physical examination data and results of pulmonary function tests before and after bronchodilation. First, factorial analysis for mixed data (FAMD), Boruta and NRSBoundary-SMOTE resampling were used to address the problems of missing data, high dimensionality and class imbalance. Then, seven classification models (CatBoost, NGBoost, XGBoost, LightGBM, random forest, SVM and logistic regression) were applied to model the risk level, and the decisions of the best machine learning (ML) model were explained using the Shapley additive explanations (SHAP) method and partial dependence plots (PDP).
RESULTS: In the smoking population, age and 14 other variables were significant factors for predicting COPD. The CatBoost, random forest, and logistic regression models performed reasonably well on the unbalanced datasets. CatBoost with NRSBoundary-SMOTE had the best classification performance on the balanced datasets when composite indicators (the AUC, F1-score, and G-mean) were used as model comparison criteria. Age, COPD Assessment Test (CAT) score, gross annual income, body mass index (BMI), systolic blood pressure (SBP), diastolic blood pressure (DBP), anhelation, respiratory disease, central obesity, use of polluting fuel for household heating, region, use of polluting fuel for household cooking, and wheezing were important factors for predicting COPD in the smoking population.
CONCLUSION: This study combined feature screening methods, unbalanced data processing methods, and advanced machine learning methods to enable early identification of COPD risk groups in the smoking population. COPD risk factors in the smoking population were identified using SHAP and PDP, with the goal of providing theoretical support for targeted screening strategies and self-management strategies for the smoking population.
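
The partial dependence idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' code: `toy_score` is a hypothetical stand-in for a fitted model, the rows are made-up values, and the function only shows the general technique of averaging predictions while one feature is forced to each grid value.

```python
from statistics import mean


def partial_dependence(predict, rows, feature_index, grid):
    """Average prediction over all rows with one feature fixed at each grid value."""
    curve = []
    for value in grid:
        predictions = []
        for row in rows:
            modified = list(row)             # copy so the input data is untouched
            modified[feature_index] = value  # force the feature of interest
            predictions.append(predict(modified))
        curve.append(mean(predictions))
    return curve


def toy_score(row):
    """Hypothetical risk score (assumption): not a real COPD model."""
    age, bmi = row
    return 2 * age + bmi


# Made-up rows of (age, BMI), for illustration only.
rows = [[50, 20], [60, 24], [70, 28]]
pd_curve = partial_dependence(toy_score, rows, feature_index=0, grid=[40, 80])
```

Plotting `pd_curve` against the grid values would give the partial dependence plot for the first feature under these assumptions.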

Subject(s)

Pulmonary Disease, Chronic Obstructive; Smokers; Humans; Adolescent; Artificial Intelligence; Tobacco Smoking; Smoking

Diabetes mellitus early warning and factor analysis using ensemble Bayesian networks with SMOTE-ENN and Boruta.

Wang, Xuchun; Ren, Jiahui; Ren, Hao; Song, Wenzhu; Qiao, Yuchao; Zhao, Ying; Linghu, Liqin; Cui, Yu; Zhao, Zhiyang; Chen, Limin; Qiu, Lixia.

Sci Rep; 13(1): 12718, 2023 Aug 05.

Article in English | MEDLINE | ID: mdl-37543637

ABSTRACT

Diabetes mellitus (DM) has become the third major chronic non-infectious disease affecting patients, after tumors and cardiovascular and cerebrovascular diseases, and is now one of the major public health issues worldwide. Detecting early warning risk factors is key to preventing DM and has been the focus of several previous studies. From the perspective of residents' self-management and prevention, this study constructed Bayesian networks (BNs) that combine feature screening with multiple resampling techniques, using class-imbalanced DM monitoring data from Shanxi Province, China, to detect risk factors in chronic disease monitoring programs and predict the risk of DM. First, univariate analysis and the Boruta feature selection algorithm were used for preliminary screening of all included risk factors. Then, three resampling techniques, SMOTE, Borderline-SMOTE (BL-SMOTE) and SMOTE-ENN, were adopted to deal with the data imbalance. Finally, BNs were learned from the processed data with three algorithms (Tabu, hill-climbing and MMHC) to find the warning factors that correlate strongly with DM. The results showed that BNs constructed from the processed data significantly improved the accuracy of DM classification. In particular, BNs combined with SMOTE-ENN resampling improved the most, and BNs constructed with the Tabu algorithm achieved better classification performance than those built with the hill-climbing and MMHC algorithms. The best-performing joint Boruta-SMOTE-ENN-Tabu model showed that the risk factors of DM included family history, age, central obesity, hyperlipidemia, salt reduction, occupation, heart rate, and BMI.
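
As a rough illustration of the oversampling step behind the SMOTE family, the sketch below generates synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours. It covers plain SMOTE only; the borderline selection of BL-SMOTE and the Edited Nearest Neighbours cleaning of SMOTE-ENN are not implemented. The function name, parameters and sample values are assumptions for illustration and do not come from the study.

```python
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours.

    Assumes at least two minority samples, each a list of numeric features.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])  # one of the k nearest neighbours
        gap = rng.random()                  # interpolation weight in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Made-up minority-class samples (two features each), for illustration only.
minority = [[1.0, 1.0], [2.0, 1.5], [1.5, 2.0], [3.0, 3.0]]
new_points = smote_sketch(minority, n_new=5)
```

Each synthetic point lies on the line segment between two existing minority samples, which is why the method densifies the minority class without duplicating records.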

Subject(s)

Algorithms; Diabetes Mellitus; Humans; Bayes Theorem; Risk Factors; Factor Analysis, Statistical

Machine learning-enabled risk prediction of chronic obstructive pulmonary disease with unbalanced data.

Wang, Xuchun; Ren, Hao; Ren, Jiahui; Song, Wenzhu; Qiao, Yuchao; Ren, Zeping; Zhao, Ying; Linghu, Liqin; Cui, Yu; Zhao, Zhiyang; Chen, Limin; Qiu, Lixia.

Comput Methods Programs Biomed; 230: 107340, 2023 Mar.

Article in English | MEDLINE | ID: mdl-36640604

ABSTRACT

BACKGROUND AND OBJECTIVE: Because the early symptoms of chronic obstructive pulmonary disease (COPD) are not obvious, patients are not easily identified, and the window for timely prevention and treatment is often missed. In the present study, machine learning (ML) methods were employed to construct a risk prediction model for COPD to improve its prediction efficiency.
METHODS: We collected data from a sample of 5807 cases with a complete COPD diagnosis from the 2019 COPD Surveillance Program in Shanxi Province and extracted 34 potentially relevant variables from the dataset. First, we used feature selection methods (i.e., generalized elastic net, lasso and adaptive lasso) to select ten variables. Next, we built supervised classifiers for class-imbalanced data by combining cost-sensitive learning and SMOTE resampling with each of the ML methods (logistic regression, SVM, random forest, XGBoost, LightGBM, NGBoost and stacking). Finally, we assessed their performance.
RESULTS: Frequent cough at or before age 14 and nine other variables were significant predictors of COPD. The stacking heterogeneous ensemble model showed relatively good performance on the unbalanced datasets. Logistic regression with class weighting had the best classification performance on the balanced data when the composite indicators (AUC, F1-score and G-mean) were used as criteria for model comparison. The F1-score and G-mean values for the top three ML models were 0.290/0.660 for logistic regression with class weighting, 0.288/0.649 for stacking with the synthetic minority oversampling technique (SMOTE), and 0.285/0.648 for LightGBM with SMOTE.
CONCLUSIONS: This paper combined feature selection methods, unbalanced data processing methods and machine learning methods with data from disease surveillance questionnaires and physical measurements to identify people at risk of COPD. It concluded that machine learning models based on survey questionnaires could automate the identification of patients at risk of COPD and provide a simple and scientific aid for early identification of COPD.
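
The F1-score and G-mean reported in this abstract can be computed from a binary confusion matrix. The sketch below assumes the usual definitions (F1 as the harmonic mean of precision and recall, G-mean as the square root of sensitivity times specificity); the abstract does not spell them out, and the counts are hypothetical rather than the study's data.

```python
from math import sqrt


def f1_and_gmean(tp, fp, fn, tn):
    """Return (F1-score, G-mean) for a binary confusion matrix."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0        # sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    gmean = sqrt(recall * specificity)
    return f1, gmean


# Hypothetical counts for an imbalanced test set (50 cases, 950 non-cases).
f1, gmean = f1_and_gmean(tp=30, fp=120, fn=20, tn=830)
```

With these made-up counts the F1-score is 0.30 while the G-mean is considerably higher, the same pattern as the reported values: when cases are rare, false positives depress precision and F1 much more than they depress specificity.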

Subject(s)

Pulmonary Disease, Chronic Obstructive; Humans; Adolescent; Pulmonary Disease, Chronic Obstructive/diagnosis; Pulmonary Disease, Chronic Obstructive/epidemiology; Machine Learning; Logistic Models; Support Vector Machine

Exploring influencing factors of chronic obstructive pulmonary disease based on elastic net and Bayesian network.

Quan, Dichen; Ren, Jiahui; Ren, Hao; Linghu, Liqin; Wang, Xuchun; Li, Meichen; Qiao, Yuchao; Ren, Zeping; Qiu, Lixia.

Sci Rep; 12(1): 7563, 2022 May 09.

Article in English | MEDLINE | ID: mdl-35534641

ABSTRACT

This study aimed to construct Bayesian networks (BNs) to analyze the network relationships between COPD and its influencing factors, with the strength of each factor's influence on COPD reflected through network reasoning. Using surveillance data on COPD among residents of Shanxi Province, China, from 2014 to 2015, Elastic Net was adopted to screen the variables and the Max-Min Hill-Climbing (MMHC) algorithm to construct the BNs. Ten variables entered the model after screening by Elastic Net. The BNs constructed by MMHC showed that smoking status, household air pollution, family history, cough, and air hunger or dyspnea were directly related to COPD, and that gender was indirectly linked to COPD through smoking status. Moreover, smoking status, household air pollution and family history were the parent nodes of COPD, and cough and air hunger or dyspnea were its child nodes. In other words, smoking status, household air pollution and family history were related to the occurrence of COPD, and COPD would worsen patients' cough and air hunger or dyspnea. Overall, BNs could reveal the complex network linkages between COPD and its relevant factors well, making it more convenient to carry out targeted prevention and control of COPD.
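
The parent and child relationships described in this abstract can be written down as a small directed graph. The sketch below encodes only the edges the abstract states; the direction of the gender edge and the treatment of "air hunger or dyspnea" as a single node are assumptions, and the remaining screened variables are omitted because the abstract does not describe their links.

```python
# Directed edges (parent, child) taken from the abstract's description.
EDGES = [
    ("smoking status", "COPD"),
    ("household air pollution", "COPD"),
    ("family history", "COPD"),
    ("COPD", "cough"),
    ("COPD", "air hunger or dyspnea"),  # treated as one node (assumption)
    ("gender", "smoking status"),       # edge direction is an assumption
]


def parents(node):
    """Nodes with an edge pointing directly into `node`."""
    return sorted(p for p, c in EDGES if c == node)


def children(node):
    """Nodes that `node` points to directly."""
    return sorted(c for p, c in EDGES if p == node)


def ancestors(node):
    """All nodes with a directed path into `node`."""
    found = set()
    stack = [node]
    while stack:
        for parent in parents(stack.pop()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


copd_parents = parents("COPD")
copd_children = children("COPD")
```

Under these assumptions, gender appears among the ancestors of COPD but not among its parents, which matches the indirect link through smoking status.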

Subject(s)

Cough; Pulmonary Disease, Chronic Obstructive; Bayes Theorem; Child; Dyspnea; Humans; Risk Factors

Nutritional components and protein quality analysis of genetically modified phytase maize.

Hu, Yichun; Linghu, Liqin; Li, Min; Mao, Deqian; Zhang, Yu; Yang, Xiaoguang; Yang, Lichen.

GM Crops Food; 13(1): 15-25, 2022 Dec 31.

Article in English | MEDLINE | ID: mdl-35102811

ABSTRACT

This study analyzed and evaluated the nutritional components and protein quality of genetically modified maize expressing a phytase gene (GM). The nutritional components were analyzed by Chinese national standard methods. Ileostomized Bama miniature pigs were used to determine the true digestibility of protein and amino acids. The digestible indispensable amino acid score (DIAAS) was adopted to evaluate the protein quality of GM, its parental maize (PM) and the commercially available maize Zhengdan 958 (ZD). The widely used protein digestibility corrected amino acid score (PDCAAS) was also calculated and compared with DIAAS. The protein, fat, vitamin and mineral contents of all the maize strains were within the normal ranges of OECD and/or ILSI. The DIAAS of GM, PM, and ZD were 54.57, 31.75, and 33.91, respectively, and the first limiting amino acid for GM, PM, and ZD was lysine. In conclusion, the introduction of the phyA2 gene in GM maize does not impair the digestion of protein or amino acids and can promote the digestion of amino acids.
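
The DIAAS calculation referred to in this abstract can be sketched as follows, assuming the standard definition: for each indispensable amino acid, the digestible content per gram of dietary protein is divided by the content of the same amino acid in a reference protein, and the lowest ratio, times 100, is the score. All numbers below are made-up placeholders, not the study's measurements or an official reference pattern.

```python
def diaas(content_mg_per_g, ileal_digestibility, reference_mg_per_g):
    """Return (DIAAS in percent, first limiting amino acid).

    content_mg_per_g:    mg of each amino acid per g of dietary protein
    ileal_digestibility: true ileal digestibility of each amino acid (0 to 1)
    reference_mg_per_g:  mg of each amino acid per g of reference protein
    """
    ratios = {
        aa: content_mg_per_g[aa] * ileal_digestibility[aa] / reference_mg_per_g[aa]
        for aa in reference_mg_per_g
    }
    limiting = min(ratios, key=ratios.get)  # amino acid with the lowest ratio
    return 100 * ratios[limiting], limiting


# Placeholder values for two amino acids (assumptions, for illustration only).
content = {"lysine": 25.0, "tryptophan": 7.0}
digestibility = {"lysine": 0.8, "tryptophan": 0.9}
reference = {"lysine": 50.0, "tryptophan": 7.0}

score, limiting_aa = diaas(content, digestibility, reference)
```

In this toy example lysine gives the lowest ratio and is therefore the first limiting amino acid, the same role it plays for all three maize varieties in the abstract.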

Subject(s)

6-Phytase; 6-Phytase/genetics; 6-Phytase/metabolism; Amino Acids, Essential/metabolism; Animal Feed/analysis; Animals; Digestion; Ileum/metabolism; Swine; Zea mays/genetics; Zea mays/metabolism
