Results 1 - 20 of 143
1.
Front Aging Neurosci ; 16: 1444998, 2024.
Article in English | MEDLINE | ID: mdl-39314993

ABSTRACT

Objective: Cognitive decline is often considered an inevitable aspect of aging; however, recent research has identified a subset of older adults known as "superagers" who maintain cognitive abilities comparable to those of younger individuals. Investigating the neurobiological characteristics associated with superior cognitive function in superagers is essential for understanding "successful aging." Evidence suggests that the gut microbiome plays a key role in brain function, forming a bidirectional communication network known as the microbiome-gut-brain axis. Alterations in the gut microbiome have been linked to cognitive aging markers such as oxidative stress and inflammation. This study aims to investigate the unique patterns of the gut microbiome in superagers and to develop machine learning-based predictive models to differentiate superagers from typical agers. Methods: We recruited 161 cognitively unimpaired, community-dwelling volunteers aged 60 years or older from dementia prevention centers in Seoul, South Korea. After applying inclusion and exclusion criteria, 115 participants were included in the study. Following the removal of microbiome data outliers, 102 participants, comprising 57 superagers and 45 typical agers, were finally analyzed. Superagers were defined based on memory performance at or above average normative values of middle-aged adults. Gut microbiome data were collected from stool samples, and microbial DNA was extracted and sequenced. Relative abundances of bacterial genera were used as features for model development. We employed the LightGBM algorithm to build predictive models and utilized SHAP analysis for feature importance and interpretability. Results: The predictive model achieved an AUC of 0.832 and accuracy of 0.764 in the training dataset, and an AUC of 0.861 and accuracy of 0.762 in the test dataset. Significant microbiome features for distinguishing superagers included Alistipes, PAC001137_g, PAC001138_g, Leuconostoc, and PAC001115_g. SHAP analysis revealed that higher abundances of certain genera, such as PAC001138_g and PAC001115_g, positively influenced the likelihood of being classified as superagers. Conclusion: Our findings demonstrate that machine learning-based predictive models using gut-microbiome features can differentiate superagers from typical agers with reasonable performance.
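
Note on the metrics in the abstract above: AUC and accuracy answer different questions, which is why a model can score 0.861 on one and 0.762 on the other. The sketch below is only an illustration of how the two are usually computed from predicted scores. The function names and the toy labels and scores are assumptions made for this example and are not data or code from the study.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    A tie between a positive and a negative score counts as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of samples whose thresholded score matches the label."""
    predicted = [1 if s >= threshold else 0 for s in scores]
    return sum(int(p == y) for p, y in zip(predicted, labels)) / len(labels)


# Hypothetical toy data: 1 = superager, 0 = typical ager.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
print("AUC:", roc_auc(toy_labels, toy_scores))
print("accuracy:", accuracy(toy_labels, toy_scores))
```

AUC depends only on how the scores rank the samples, while accuracy also depends on the chosen threshold (0.5 here, an assumption).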

2.
Int J Biol Macromol ; 280(Pt 4): 136140, 2024 Sep 28.
Article in English | MEDLINE | ID: mdl-39349086

ABSTRACT

Lignin has been recognized as a major factor contributing to lignocellulosic recalcitrance in biofuel production and has attracted attention as a high-value product in the biorefinery field. Because traditional wet chemical methods for determining lignin content are labor-intensive, time-consuming, and environmentally toxic, there is an urgent need for high-throughput, environmentally friendly techniques for large-scale screening of crop germplasm. In this study, we conducted a Fourier transform infrared (FTIR) assay on 150 maize germplasms with diverse lignin composition to build predictive models for lignin content in maize stalk. Principal component analysis (PCA) was applied to the FTIR spectra, and the resulting components were used as model inputs. Classification and advanced gradient boosting machine (GBM) algorithms demonstrated higher predictive accuracy (0.82-0.96) compared to traditional linear and regularization algorithms (0.03-0.04) in the training set. Notably, two optimal models, built using the extreme gradient boosting (XGBoost) and light gradient boosting machine (LightGBM) algorithms, achieved R² values of over 0.91 in the training set and over 0.82 in the test set. Overall, the combination of FTIR and machine learning (ML) algorithms offers a high-throughput and efficient method for predicting lignin content. This approach holds significant potential for genetic breeding and the effective utilization of maize in industrial production.

3.
Sensors (Basel) ; 24(18), 2024 Sep 14.
Article in English | MEDLINE | ID: mdl-39338719

ABSTRACT

With the increasing popularity of Android smartphones, malware targeting the Android platform is growing explosively. Currently, mainstream detection methods use static analysis to extract software features and apply machine learning algorithms for detection. However, static analysis can be less effective against Android malware that employs sophisticated obfuscation techniques such as altering code structure. To detect Android malware effectively and improve detection accuracy, this paper proposes a dynamic detection model for Android malware based on the combination of an Improved Zebra Optimization Algorithm (IZOA) and the Light Gradient Boosting Machine (LightGBM) model, called IZOA-LightGBM. By introducing elite opposition-based learning and firefly perturbation strategies, IZOA enhances the convergence speed and search capability of the traditional zebra optimization algorithm. IZOA is then employed to optimize the LightGBM hyperparameters for dynamic multi-class detection of Android malware. The experimental results indicate that the overall accuracy of the proposed IZOA-LightGBM model on the CICMalDroid-2020, CCCS-CIC-AndMal-2020, and CIC-AAGM-2017 datasets is 99.75%, 98.86%, and 97.95%, respectively, higher than that of the other models compared.

4.
Health Informatics J ; 30(3): 14604582241283968, 2024.
Article in English | MEDLINE | ID: mdl-39262121

ABSTRACT

Objectives: Addressing the challenge of cost-effective asthma diagnosis amidst diverse symptom patterns among patients, this study aims to develop a machine learning-based asthma prediction tool for self-detection of asthma. Methods: Data from 6,665 participants in the Sri Lanka Health and Ageing Study (2018-2019) are used for this research. Thirteen machine learning algorithms, including Logistic Regression, Support Vector Machine, Decision Tree, Random Forest, Naïve Bayes, K-Nearest Neighbors, Gradient Boost, XGBoost, AdaBoost, CatBoost, LightGBM, Multi-Layer Perceptron, and Probabilistic Neural Network, are employed. Results: A hybrid version of Logistic Regression and LightGBM outperformed other models, achieving an AUC of 0.9062 and 79.85% sensitivity. Key predictive features for asthma include wheezing, breathlessness with wheezing, shortness of breath attacks, coughing attacks, chest tightness, nasal allergies, physical activity, passive smoking, ethnicity, and residential sector. Conclusion: Combining Logistic Regression and LightGBM models can effectively predict adult asthma based on self-reported symptoms and demographic and behavioural characteristics. The proposed expert system assists clinicians and patients in diagnosing potential asthma cases.


Subjects
Asthma , Machine Learning , Humans , Asthma/diagnosis , Sri Lanka , Female , Male , Middle Aged , Adult , Logistic Models , Aged , Algorithms

5.
Sensors (Basel) ; 24(15), 2024 Jul 31.
Article in English | MEDLINE | ID: mdl-39124011

ABSTRACT

Load recognition has not been comprehensively explored in Home Energy Management Systems (HEMSs). Current approaches leave gaps, such as enhancing appliance identification and increasing the overall performance of the load-recognition system through more robust models. To address these gaps, we propose a novel approach based on the Analysis of Variance (ANOVA) F-test combined with SelectKBest and gradient-boosting machines (GBMs) for load recognition. The proposed approach improves feature selection and consequently aids inter-class separability. Further, we optimized GBM models, such as the histogram-based gradient-boosting machine (HistGBM), light gradient-boosting machine (LightGBM), and extreme gradient boosting (XGBoost), to create a more reliable load-recognition system. Our findings reveal that the ANOVA-GBM approach achieves greater efficiency in training time, even when compared to Principal Component Analysis (PCA) and a higher number of features. ANOVA-XGBoost is approximately 4.31 times faster than PCA-XGBoost, ANOVA-LightGBM is about 5.15 times faster than PCA-LightGBM, and ANOVA-HistGBM is 2.27 times faster than PCA-HistGBM. The general performance results show the effect of the approach on the overall performance of the load-recognition system. Some of the key results show that the ANOVA-LightGBM pair reached 96.42% accuracy, 96.27% F1, and a Kappa index of 0.9404; the ANOVA-HistGBM combination achieved 96.64% accuracy, 96.48% F1, and a Kappa index of 0.9434; and the ANOVA-XGBoost pair attained 96.75% accuracy, 96.64% F1, and a Kappa index of 0.9452; these results surpass rival methods from the literature. In addition, the accuracy gain of the proposed approach is prominent when compared directly with its competitors. The highest accuracy gains were 13.09, 13.31, and 13.42 percentage points (pp) for the pairs ANOVA-LightGBM, ANOVA-HistGBM, and ANOVA-XGBoost, respectively. These significant improvements highlight the effectiveness and refinement of the proposed approach.
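
Note on the feature-selection step named in the abstract above: the ANOVA F-test scores each feature by the ratio of between-class to within-class variance, and a SelectKBest-style step keeps the k highest-scoring features. In scikit-learn this pairing is available as `SelectKBest` with `f_classif`. The plain-Python sketch below only illustrates the idea; the helper names and the toy appliance data are assumptions for this example and do not come from the paper.

```python
def anova_f(values, classes):
    """One-way ANOVA F statistic of a single feature across class labels."""
    groups = {}
    for v, c in zip(values, classes):
        groups.setdefault(c, []).append(v)
    n, k = len(values), len(groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two classes and more samples than classes")
    grand_mean = sum(values) / n
    means = {c: sum(g) / len(g) for c, g in groups.items()}
    ss_between = sum(len(g) * (means[c] - grand_mean) ** 2 for c, g in groups.items())
    ss_within = sum((v - means[c]) ** 2 for c, g in groups.items() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_k_best(columns, classes, k):
    """Return (sorted indices of the k top-scoring features, all F scores)."""
    scores = [anova_f(col, classes) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k]), scores


# Hypothetical toy features for two appliance classes (not data from the study).
toy_classes = ["kettle"] * 3 + ["fridge"] * 3
toy_columns = [
    [10, 11, 12, 1, 2, 3],  # separates the classes well
    [5, 1, 3, 4, 2, 3],     # identical mean in both classes
    [7, 9, 8, 6, 5, 7],     # weak separation
]
selected, f_scores = select_k_best(toy_columns, toy_classes, k=2)
print(selected, f_scores)
```

Unlike PCA, this kind of filter keeps original features rather than building new components, so the selected inputs stay directly interpretable.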

6.
BMC Womens Health ; 24(1): 442, 2024 Aug 05.
Article in English | MEDLINE | ID: mdl-39098907

ABSTRACT

OBJECTIVE: Breast cancer has become the most prevalent malignant tumor in women, and the occurrence of distant metastasis signifies a poor prognosis. Utilizing predictive models to forecast distant metastasis in breast cancer presents a novel approach. This study aims to utilize readily available clinical data and advanced machine learning algorithms to establish an accurate clinical prediction model. The overall objective is to provide effective decision support for clinicians. METHODS: Data from 239 patients from two centers were analyzed, focusing on clinical blood biomarkers (tumor markers, liver and kidney function, lipid profile, cardiovascular markers). Spearman correlation and the least absolute shrinkage and selection operator regression were employed for feature dimension reduction. A predictive model was built using LightGBM and validated in training, testing, and external validation cohorts. Feature importance correlation analysis was conducted on the clinical model and the comprehensive model, followed by univariate and multivariate regression analysis of these features. RESULTS: Through internal and external validation, we constructed a LightGBM model to predict de novo bone metastasis in newly diagnosed breast cancer patients. The area under the receiver operating characteristic curve values of this model in the training, internal validation test, and external validation test1 cohorts were 0.945, 0.892, and 0.908, respectively. Our validation results indicate that the model exhibits high sensitivity, specificity, and accuracy, making it the most accurate model for predicting bone metastasis in breast cancer patients. Carcinoembryonic Antigen, creatine kinase, albumin-globulin ratio, Apolipoprotein B, and Cancer Antigen 153 (CA153) play crucial roles in the model's predictions. Lipoprotein a, CA153, gamma-glutamyl transferase, α-Hydroxybutyrate dehydrogenase, alkaline phosphatase, and creatine kinase are positively correlated with breast cancer bone metastasis, while white blood cell ratio and total cholesterol are negatively correlated. CONCLUSION: This study successfully utilized clinical blood biomarkers to construct an artificial intelligence model for predicting distant metastasis in breast cancer, demonstrating high accuracy. This suggests potential clinical utility in predicting and identifying distant metastasis in breast cancer. These findings underscore the potential prospect of developing economically efficient and readily accessible predictive tools in clinical oncology.


Subjects
Artificial Intelligence , Tumor Biomarkers , Bone Neoplasms , Breast Neoplasms , Humans , Breast Neoplasms/pathology , Female , Bone Neoplasms/secondary , Bone Neoplasms/blood , Middle Aged , Tumor Biomarkers/blood , Adult , Aged , ROC Curve , Machine Learning , Predictive Value of Tests

7.
J Asthma Allergy ; 17: 783-789, 2024.
Article in English | MEDLINE | ID: mdl-39157425

ABSTRACT

Asthma is a chronic inflammatory airway disease with significant burden; exacerbations can severely affect quality of life and healthcare costs. Advances in big data analysis and artificial intelligence have made it easier to predict future exacerbations more accurately. This study used an integrated dataset of Korean National Health Insurance, meteorological, air pollution, and viral data from national public databases to develop a model to predict asthma exacerbations on a daily basis in South Korea. We merged these sources and applied random forest, AdaBoost, XGBoost, and LightGBM machine learning models to compare their performances at predicting future exacerbations. Of the models, XGBoost (AUROC of 0.68 and accuracy of 0.96) and LightGBM (AUROC of 0.67 and accuracy of 0.96) were the most promising. Common important variables were the number of visits and exacerbations per year, and medical resource utilization, including the prescription of asthma medications. Comorbid diabetes, hypertension, gastroesophageal reflux, arthritis, metabolic syndrome, osteoporosis, and ischemic heart disease were also associated with elevated exacerbation risk. The models examined in this study highlight the importance of previous exacerbations, use of medical resources, and comorbidities in the prediction of future exacerbations in patients with asthma.

8.
Sci Rep ; 14(1): 19293, 2024 Aug 20.
Article in English | MEDLINE | ID: mdl-39164297

ABSTRACT

Geomagnetic storms can cause variations in the ionization levels of the ionosphere, which is commonly studied using the total electron content (TEC). TEC is a crucial parameter to identify the possible effects of ionospheric variations on satellite communication and navigation. This paper assesses the performance of light gradient boosting machine (LGB) and deep neural network (DNN) machine learning algorithms in modeling ionospheric vertical TEC (VTEC) during geomagnetic disturbances. GPS VTEC data for the years 2011-2016 from 13 dual-frequency receiver stations over Ethiopia were used. Input parameters for the models were derived from the factors that influence VTEC, such as time, location, geomagnetic activity, solar activity, solar wind, and the interplanetary magnetic field. The LGB model improved on the DNN model's predictions, moving the root mean squared error (RMSE), mean absolute percentage error (MAPE), and R² from 5.45 TECU, 21%, and 0.93 to 4.98 TECU, 18%, and 0.94, respectively, on the testing data. The two machine learning models significantly outperformed the International Reference Ionosphere (IRI 2020) model during the selected geomagnetic storm periods. This study could provide insight into the impacts of ionosphere variations on satellite communication and navigation systems in the low-latitude ionospheric region.
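
Note on the error metrics in the abstract above: RMSE, MAPE, and R² summarize the same residuals in different units (TECU, percent, and a unitless fraction of explained variance). The sketch below shows the textbook definitions; the function names and the toy observed and predicted values are assumptions for illustration and are not VTEC data from the study. Other definitions of R² exist, so the paper's exact formula may differ.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error, in the units of the observations."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(observed))


def mape(observed, predicted):
    """Mean absolute percentage error, in percent (observations must be nonzero)."""
    ratios = [abs((o - p) / o) for o, p in zip(observed, predicted)]
    return 100.0 * sum(ratios) / len(observed)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical toy values standing in for observed and modeled VTEC.
toy_observed = [10.0, 20.0, 30.0, 40.0]
toy_predicted = [12.0, 18.0, 33.0, 37.0]
print(rmse(toy_observed, toy_predicted))
print(mape(toy_observed, toy_predicted))
print(r_squared(toy_observed, toy_predicted))
```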

9.
Environ Monit Assess ; 196(8): 738, 2024 Jul 16.
Article in English | MEDLINE | ID: mdl-39009752

ABSTRACT

Accurate retrieval of land surface temperature (LST) is crucial for understanding and mitigating the effects of urban heat islands, and ultimately addressing the broader challenge of global warming. This study emphasizes the importance of single-day satellite imagery for large-scale LST retrieval. It explores the impact of spectral indices of surface parameters, using machine learning algorithms to enhance accuracy. The research proposes a novel approach of capturing satellite data on a single day to reduce uncertainties in LST estimations. A case study over Chandigarh city using Extreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), and Random Forest (RF) reveals RF's superior performance in LST estimation during both summer and winter seasons. All the ML models achieved an R-squared above 0.8, with RF achieving slightly higher values in both summer (0.93) and winter (0.85). Building on these findings, the study extends its focus to Ranchi, demonstrating RF's robustness with impressive accuracy in capturing LST variations. The research contributes to bridging existing gaps in large-scale LST estimation methodologies, offering valuable insights for its diverse applications in understanding Earth's dynamic systems.


Subjects
Environmental Monitoring , Machine Learning , Satellite Imagery , Seasons , Temperature , Environmental Monitoring/methods , Global Warming

10.
Front Oncol ; 14: 1409273, 2024.
Article in English | MEDLINE | ID: mdl-38947897

ABSTRACT

Objective: This study aims to develop an artificial intelligence model utilizing clinical blood markers, ultrasound data, and breast biopsy pathological information to predict distant metastasis in breast cancer patients. Methods: Data from two medical centers were utilized. Clinical blood markers, ultrasound data, and breast biopsy pathological information were separately extracted and selected. Feature dimensionality reduction was performed using Spearman correlation and LASSO regression. Predictive models were constructed using LR and LightGBM machine learning algorithms and validated on internal and external validation sets. Feature correlation analysis was conducted for both models. Results: The LR model achieved AUC values of 0.892, 0.816, and 0.817 for the training, internal validation, and external validation cohorts, respectively. The LightGBM model achieved AUC values of 0.971, 0.861, and 0.890 for the same cohorts, respectively. Clinical decision curve analysis showed a superior net benefit of the LightGBM model over the LR model in predicting distant metastasis in breast cancer. Key features identified included creatine kinase isoenzyme (CK-MB) and alpha-hydroxybutyrate dehydrogenase. Conclusion: This study developed an artificial intelligence model using clinical blood markers, ultrasound data, and pathological information to identify distant metastasis in breast cancer patients. The LightGBM model demonstrated superior predictive accuracy and clinical applicability, suggesting it as a promising tool for early diagnosis of distant metastasis in breast cancer.

11.
J Cardiovasc Dev Dis ; 11(7), 2024 Jul 01.
Article in English | MEDLINE | ID: mdl-39057627

ABSTRACT

Stroke constitutes a significant public health concern due to its impact on mortality and morbidity. This study investigates the utility of machine learning algorithms in predicting stroke and identifying key risk factors using data from the Suita study, comprising 7389 participants and 53 variables. Initially, unsupervised k-prototype clustering categorized participants into risk clusters, while five supervised models, namely Logistic Regression (LR), Random Forest (RF), Support Vector Machine (SVM), Extreme Gradient Boosting (XGBoost), and Light Gradient Boosted Machine (LightGBM), were employed to predict stroke outcomes. According to the findings, stroke incidence differed substantially among the risk clusters identified by unsupervised k-prototype clustering. Supervised learning, particularly RF, was the preferable option because of its higher performance metrics. The Shapley Additive Explanations (SHAP) method identified age, systolic blood pressure, hypertension, estimated glomerular filtration rate, metabolic syndrome, and blood glucose level as key predictors of stroke, aligning with findings from the unsupervised clustering approach in high-risk groups. Additionally, previously unidentified risk factors such as elbow joint thickness, fructosamine, hemoglobin, and calcium level demonstrate potential for stroke prediction. In conclusion, machine learning facilitated accurate stroke risk predictions and highlighted potential biomarkers, offering a data-driven framework for risk assessment and biomarker discovery.

12.
Foods ; 13(13), 2024 Jun 26.
Article in English | MEDLINE | ID: mdl-38998534

ABSTRACT

To enhance the accuracy of identifying fresh meat varieties using laser-induced breakdown spectroscopy (LIBS), we utilized the LightGBM model in combination with the Optuna algorithm. The procedure involved flattening fresh meat slices with glass slides and collecting spectral data of the plasma from the surfaces of the fresh meat tissues (pork, beef, and chicken) using LIBS technology. A total of 900 spectra were collected. Initially, we established LightGBM and SVM (support vector machine) models for the collected spectra. Subsequently, we applied information gain and peak extraction algorithms to select the features for each model. We then employed Optuna to optimize the hyperparameters of the LightGBM model, while a 10-fold cross-validation was conducted to determine the optimal parameters for SVM. Ultimately, the LightGBM model achieved higher accuracy, macro-F1, and Cohen's kappa coefficient (kappa coefficient) values of 0.9370, 0.9364, and 0.9244, respectively, compared to the SVM model's values of 0.8888, 0.8881, and 0.8666. This study provides a novel method for the rapid classification of fresh meat varieties using LIBS.
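
Note on the three scores compared in the abstract above: accuracy, macro-F1, and Cohen's kappa can all be derived from the same pair of true and predicted label lists. The sketch below uses the standard definitions. The function name and the nine toy labels are assumptions for illustration and are not spectra or predictions from the study.

```python
def classification_metrics(true_labels, predicted_labels):
    """Return (accuracy, macro-F1, Cohen's kappa) for a multi-class prediction."""
    classes = sorted(set(true_labels) | set(predicted_labels))
    n = len(true_labels)
    pairs = list(zip(true_labels, predicted_labels))
    accuracy = sum(t == p for t, p in pairs) / n

    f1_scores = []
    expected_agreement = 0.0  # agreement expected by chance, used by kappa
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)
        true_count = sum(t == c for t in true_labels)
        pred_count = sum(p == c for p in predicted_labels)
        precision = tp / pred_count if pred_count else 0.0
        recall = tp / true_count if true_count else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
        expected_agreement += (true_count / n) * (pred_count / n)

    macro_f1 = sum(f1_scores) / len(classes)
    if expected_agreement == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    kappa = (accuracy - expected_agreement) / (1.0 - expected_agreement)
    return accuracy, macro_f1, kappa


# Hypothetical toy predictions for three meat classes.
toy_true = ["pork"] * 3 + ["beef"] * 3 + ["chicken"] * 3
toy_pred = ["pork", "pork", "beef", "beef", "beef", "beef", "chicken", "chicken", "pork"]
toy_accuracy, toy_macro_f1, toy_kappa = classification_metrics(toy_true, toy_pred)
print(toy_accuracy, toy_macro_f1, toy_kappa)
```

Kappa is lower than accuracy here because it discounts the agreement that would be expected by chance from the class frequencies alone.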

13.
Molecules ; 29(13), 2024 Jun 22.
Article in English | MEDLINE | ID: mdl-38998926

ABSTRACT

As an important photovoltaic material, organic-inorganic hybrid perovskites have attracted much attention in the field of solar cells, but their instability is one of the main challenges limiting their commercial application. Moreover, the search for stable perovskites among the thousands of perovskite materials still faces great challenges. In this work, the energy above the convex hull (Ehull) of organic-inorganic hybrid perovskites was predicted using four machine learning algorithms, namely random forest regression (RFR), support vector machine regression (SVR), XGBoost regression, and LightGBM regression, to study the thermodynamic phase stability of organic-inorganic hybrid perovskites. The results show that the LightGBM algorithm has a low prediction error and can effectively capture the key features related to the thermodynamic phase stability of organic-inorganic hybrid perovskites. Meanwhile, the Shapley Additive Explanation (SHAP) method was used to analyze the prediction results based on the LightGBM algorithm. The third ionization energy of the B element is the most critical feature related to the thermodynamic phase stability, and the second key feature is the electron affinity of ions at the X site; both are significantly negatively correlated with the predicted Ehull values. In the screening of organic-inorganic perovskites with high stability, the third ionization energy of the B element and the electron affinity of ions at the X site are therefore worth prioritizing. The results of this study can help us to understand the correlation between the thermodynamic phase stability of organic-inorganic hybrid perovskites and the key features, which can assist with the rapid discovery of highly stable perovskite materials.

14.
PeerJ Comput Sci ; 10: e2044, 2024.
Article in English | MEDLINE | ID: mdl-38855258

ABSTRACT

Patent lifespan is commonly used as a quantitative measure in patent assessments. Patent holders maintain exclusive rights by paying significant maintenance fees, suggesting a strong correlation between a patent's lifespan and its business potential or economic value. Therefore, accurately forecasting the duration of a patent is of great significance. This study introduces a highly effective method that combines LightGBM, a sophisticated machine learning algorithm, with a customized loss function derived from Focal Loss. The purpose of this approach is to accurately predict the probability of a patent remaining valid until its maximum expiration date. This research differs from previous studies that have examined the various stages and phases of patents. Instead, it assesses the commercial viability of individual patents by considering their lifespan. The evaluation process utilizes a dataset consisting of 200,000 patents. The experimental results show a significant improvement in the performance of the model by combining Focal Loss with LightGBM. By incorporating Focal Loss into LightGBM, its ability to give priority to difficult instances during training is enhanced, resulting in an overall improvement in performance. This targeted approach enhances the model's ability to distinguish between different samples and its ability to recover from challenges by giving priority to difficult samples. As a result, it improves the model's accuracy in making predictions and its ability to apply those predictions to new data.
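
Note on the loss function mentioned in the abstract above: focal loss multiplies the usual cross-entropy by (1 - p_t)^gamma, where p_t is the predicted probability of the true class, so confidently correct samples contribute little and hard samples dominate. Gradient boosting libraries that accept a custom objective, LightGBM among them, generally expect the first and second derivatives of the loss with respect to the raw score. The sketch below is a generic binary focal loss without class weighting; it is an assumption for illustration and not the customized loss from the paper. The Hessian is approximated numerically to keep the example short.

```python
import math


def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def _p_true(raw_score, label):
    """Probability assigned to the true class, plus the sign used in derivatives."""
    sign = 1.0 if label == 1 else -1.0
    return _sigmoid(sign * raw_score), sign


def focal_loss(raw_score, label, gamma=2.0):
    """Binary focal loss: -(1 - p_t)^gamma * log(p_t). gamma=0 gives cross-entropy."""
    p_t, _ = _p_true(raw_score, label)
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


def focal_gradient(raw_score, label, gamma=2.0):
    """Analytic derivative of focal_loss with respect to the raw score."""
    p_t, sign = _p_true(raw_score, label)
    return sign * (1.0 - p_t) ** gamma * (gamma * p_t * math.log(p_t) + p_t - 1.0)


def focal_hessian(raw_score, label, gamma=2.0, step=1e-4):
    """Second derivative, approximated by a central difference of the gradient."""
    upper = focal_gradient(raw_score + step, label, gamma)
    lower = focal_gradient(raw_score - step, label, gamma)
    return (upper - lower) / (2.0 * step)


# An easy positive (raw score 3.0) versus a hard positive (raw score -3.0).
print(focal_loss(3.0, 1), focal_loss(-3.0, 1))
print(focal_gradient(0.3, 1), focal_hessian(0.3, 1))
```

With gamma set to 0 the functions reduce to ordinary log loss and its gradient, which is a convenient sanity check before plugging such an objective into a training loop.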

15.
Med Biol Eng Comput ; 2024 Jun 14.
Article in English | MEDLINE | ID: mdl-38874706

ABSTRACT

The work elucidates the importance of accurate Parkinson's disease classification within medical diagnostics and introduces a novel framework for achieving this goal. Specifically, the study focuses on enhancing disease identification accuracy utilizing boosting methods. A standout contribution of this work lies in the utilization of a light gradient boosting machine (LGBM) coupled with hyperparameter tuning through grid search optimization (GSO) on the Parkinson's disease dataset derived from speech recording signals. In addition, the Synthetic Minority Over-sampling Technique (SMOTE) has been employed as a pre-processing technique to balance the dataset, enhancing the robustness and reliability of the analysis. This approach is a novel addition to the study and underscores its potential to enhance disease identification accuracy. The datasets employed in this work include both gender-specific and combined cases, utilizing several distinctive feature subsets including baseline, Mel-frequency cepstral coefficients (MFCC), time-frequency, wavelet transform (WT), vocal fold, and tunable-Q-factor wavelet transform (TQWT). Comparative analyses against state-of-the-art boosting methods, such as AdaBoost and XGBoost, reveal the superior performance of our proposed approach across diverse datasets and metrics. Notably, on the male cohort dataset, our method achieves exceptional results, demonstrating an accuracy of 0.98, precision of 1.00, sensitivity of 0.97, F1-Score of 0.98, and specificity of 1.00 when utilizing all features with GSO-LGBM. In comparison to AdaBoost and XGBoost, the proposed framework utilizing LGBM demonstrates superior accuracy, achieving an average improvement of 5% in classification accuracy across all feature subsets and datasets. These findings underscore the potential of the proposed methodology to enhance disease identification accuracy and provide valuable insights for further advancements in medical diagnostics.

16.
Front Artif Intell ; 7: 1401810, 2024.
Article in English | MEDLINE | ID: mdl-38887604

ABSTRACT

Introduction: Regulatory agencies generate a vast amount of textual data in the review process. For example, drug labeling serves as a valuable resource for regulatory agencies, such as the U.S. Food and Drug Administration (FDA) and the European Medicines Agency (EMA), to communicate drug safety and effectiveness information to healthcare professionals and patients. Drug labeling also serves as a resource for pharmacovigilance and drug safety research. Automated text classification would significantly improve the analysis of drug labeling documents and conserve reviewer resources. Methods: We utilized artificial intelligence in this study to classify drug-induced liver injury (DILI)-related content from drug labeling documents based on FDA's DILIrank dataset. We employed text mining and XGBoost models and utilized the Preferred Terms of Medical queries for adverse event standards to simplify the elimination of common words and phrases while retaining medical standard terms for FDA and EMA drug label datasets. Then, we constructed a document term matrix using weights computed by Term Frequency-Inverse Document Frequency (TF-IDF) for each included word/term/token. Results: The automatic text classification model exhibited robust performance in predicting DILI, achieving cross-validation AUC scores exceeding 0.90 for both drug labels from FDA and EMA and literature abstracts from the Critical Assessment of Massive Data Analysis (CAMDA). Discussion: Moreover, the text mining and XGBoost functions demonstrated in this study can be applied to other text processing and classification tasks.
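
Note on the document term matrix described in the abstract above: TF-IDF weights a term by how often it occurs in a document, scaled down by how many documents contain it, so words common to every label carry no weight. The sketch below uses the textbook form tf * log(N / df) with whitespace tokenization. Both choices, the function name, and the three toy snippets are assumptions for illustration; libraries often use smoothed variants, and the study's own preprocessing is not reproduced here.

```python
import math
from collections import Counter


def tfidf_matrix(documents):
    """Return (vocabulary, matrix) with one row per document and one column per term."""
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({token for doc in tokenized for token in doc})
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term at least once.
    doc_freq = Counter(token for doc in tokenized for token in set(doc))

    matrix = []
    for doc in tokenized:
        counts = Counter(doc)
        length = len(doc) or 1  # avoid division by zero for an empty document
        row = [
            (counts[term] / length) * math.log(n_docs / doc_freq[term])
            for term in vocabulary
        ]
        matrix.append(row)
    return vocabulary, matrix


# Hypothetical toy snippets standing in for drug-label sentences.
toy_documents = [
    "liver injury reported",
    "no liver findings",
    "liver enzymes elevated injury",
]
toy_vocabulary, toy_matrix = tfidf_matrix(toy_documents)
print(toy_vocabulary)
for row in toy_matrix:
    print([round(weight, 3) for weight in row])
```

Each row of the resulting matrix could then serve as the feature vector for a classifier such as XGBoost.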

17.
Sci Rep ; 14(1): 13511, 2024 Jun 12.
Article in English | MEDLINE | ID: mdl-38866817

ABSTRACT

The growing application of carbon dioxide (CO2) in various environmental and energy fields, including carbon capture and storage (CCS) and several CO2-based enhanced oil recovery (EOR) techniques, highlights the importance of studying the phase equilibria of this gas with water. CO2 solubility in water is therefore an important thermodynamic property, and predicting it accurately is essential. This study focused on developing two powerful intelligent models, namely gradient boosting (GBoost) and light gradient boosting machine (LightGBM), that predict CO2 solubility in water with high accuracy. The results revealed that the GBoost model performed best, with a root mean square error (RMSE) and determination coefficient (R²) of 0.137 mol/kg and 0.9976, respectively. The trend analysis demonstrated that the developed models were highly capable of detecting the physical trend of CO2 solubility in water across various pressure and temperature ranges. Moreover, the Leverage technique was employed to identify suspected data points as well as the applicability domain of the proposed models. The results showed that less than 5% of the data points were detected as outliers, indicating a large applicability domain for the intelligent models. The outcome of this research provided insight into the potential of intelligent models in predicting the solubility of CO2 in pure water.
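
Note on the Leverage technique mentioned in the abstract above: leverage (the diagonal of the hat matrix) measures how far a sample's inputs lie from the bulk of the training inputs, and samples above a warning value are treated as outside the applicability domain. The sketch below is a simplified single-predictor version using the closed form h_i = 1/n + (x_i - mean)^2 / Sxx and the commonly quoted warning value 3(p + 1)/n. The single predictor, the warning rule, the function names, and the toy pressures are all assumptions for illustration; the study's models use more inputs, and its exact procedure is not given in the abstract.

```python
def leverages(x):
    """Hat-matrix diagonal for simple linear regression (intercept plus one predictor)."""
    n = len(x)
    mean_x = sum(x) / n
    sxx = sum((v - mean_x) ** 2 for v in x)
    if sxx == 0:
        raise ValueError("all predictor values are identical")
    return [1.0 / n + (v - mean_x) ** 2 / sxx for v in x]


def outside_domain(x, n_predictors=1):
    """Return (indices whose leverage exceeds the warning value, the warning value)."""
    hat_values = leverages(x)
    warning = 3.0 * (n_predictors + 1) / len(x)  # assumed rule of thumb
    flagged = [i for i, h in enumerate(hat_values) if h > warning]
    return flagged, warning


# Hypothetical pressures: nine clustered points and one far from the rest.
toy_pressures = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 30.0]
toy_flagged, toy_warning = outside_domain(toy_pressures)
print(toy_flagged, toy_warning)
print([round(h, 3) for h in leverages(toy_pressures)])
```

In practice leverage is usually paired with standardized residuals in a Williams plot, so a point is judged both by its inputs and by how badly the model misses it.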

18.
Sensors (Basel) ; 24(10), 2024 May 20.
Article in English | MEDLINE | ID: mdl-38794105

ABSTRACT

Heavy metal pollution in farmland soil threatens soil environmental quality. Quickly assessing the status of heavy metal pollution in the farmland soil of a region is an important task. Hyperspectral remote sensing technology has been widely used in monitoring soil heavy metal concentrations, and improving the accuracy and reliability of its estimation models is an active research topic. This study analyzed 440 soil samples from Sihe Town and the surrounding agricultural areas in Yushu City, Jilin Province. Considering the differences between soil types, a local regression model of heavy metal concentrations (As and Cu) was established based on projection pursuit (PP) and light gradient boosting machine (LightGBM) algorithms. Based on the estimations, a spatial distribution map of soil heavy metals in the region was drawn. The findings showed that accounting for differences between soils when constructing a local regression estimation model of soil heavy metal concentration improved the estimation accuracy. Specifically, the relative percent difference (RPD) of As and Cu element estimations in black soil increased the most, by 0.30 and 0.26, respectively. The regional spatial distribution map of heavy metal concentration derived from local regression showed high spatial variability. The number of characteristic bands screened by the PP method accounted for 10-13% of the total spectral bands, effectively reducing the model complexity. Compared with traditional machine learning models, the LightGBM model showed better estimation ability, and the highest determination coefficients (R²) of the different soil validation sets reached 0.73 (As) and 0.75 (Cu), respectively. The constructed PP-LightGBM estimation model takes into account the differences in soil types, which effectively improves the accuracy and reliability of hyperspectral image estimation of soil heavy metal concentration and provides a reference for drawing large-scale spatial distributions of heavy metals from hyperspectral images and for assessing soil environmental quality.

19.
Plant Methods ; 20(1): 77, 2024 May 26.
Article in English | MEDLINE | ID: mdl-38797847

ABSTRACT

BACKGROUND: Taraxacum kok-saghyz Rodin (TKS) is a highly promising source of natural rubber (NR) because of its wide range of suitable planting areas, strong adaptability, and suitability for mechanized planting and harvesting. However, current methods for detecting NR content are relatively cumbersome, necessitating the development of a rapid detection model. This study used near-infrared spectroscopy to establish a rapid detection model for NR content in TKS root segments and powder samples. Near-infrared spectral data were obtained from the K445 strain at different growth stages within a year and from 129 TKS samples hybridized with dandelion. The rubber content in the roots of the samples was determined using the alkaline boiling method. The Monte Carlo sampling method (MCS) was used to filter abnormal data from the TKS root segment and powder samples separately. The SPXY algorithm was used to divide the data into a training set and a validation set in a 3:1 ratio. The original spectra were preprocessed using moving window smoothing (MWS), standard normal variate (SNV), multiplicative scatter correction (MSC), and first derivative (FD) algorithms. The competitive adaptive reweighted sampling (CARS) algorithm and the chemical characteristic bands corresponding to NR were used to screen the bands. Partial least squares (PLS), random forest (RF), light gradient boosting machine (LightGBM), and convolutional neural network (CNN) algorithms were employed to build models using the optimal spectral processing method on three band sets: the full band, the CARS-selected bands, and the chemical characteristic bands corresponding to NR. The model with the best predictive performance in the high rubber content interval (rubber content > 15%) was identified. RESULTS: The optimal rubber content prediction models for TKS root segments and powder samples were MWS-FD CARS-RF and MWS-FD chemical characteristic band RF, respectively. Their respective R²P, RMSEP, and RPDP values were 0.951 and 0.979, 1.814 and 1.133, and 4.498 and 6.845. In the high rubber content range, the model based on the LightGBM algorithm had the best prediction performance, with RMSEP values of 0.752 and 0.918 for the root segments and powder samples, respectively. CONCLUSIONS: Dried TKS root powder samples are more appropriate than root segment samples for constructing a rubber content prediction model, and models built on powder samples predict better than those built on segments. In the high rubber content range in particular, the model built with the LightGBM algorithm has superior predictive performance, which could offer a theoretical basis for future rapid detection of rubber content in TKS.
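
Three of the preprocessing steps named in the abstract (MWS, SNV, FD) and two of its evaluation metrics can be sketched in plain Python. This is an illustrative sketch under assumptions, not the authors' implementation: the window size and edge handling are arbitrary choices, and RPD is computed as the standard deviation of the reference values divided by RMSEP, a common chemometrics convention that the abstract does not spell out.

```python
from math import sqrt
from statistics import mean, stdev


def moving_window_smooth(spectrum, window=3):
    """Moving-average smoothing; the window shrinks at the edges (assumption)."""
    half = window // 2
    smoothed = []
    for i in range(len(spectrum)):
        lo = max(0, i - half)
        hi = min(len(spectrum), i + half + 1)
        smoothed.append(mean(spectrum[lo:hi]))
    return smoothed


def snv(spectrum):
    """Standard normal variate: center and scale each spectrum by its own mean and SD."""
    m, s = mean(spectrum), stdev(spectrum)
    return [(v - m) / s for v in spectrum]


def first_derivative(spectrum):
    """First difference between adjacent bands (output is one point shorter)."""
    return [b - a for a, b in zip(spectrum, spectrum[1:])]


def rmsep(reference, predicted):
    """Root mean square error of prediction."""
    return sqrt(mean((r - p) ** 2 for r, p in zip(reference, predicted)))


def rpd(reference, predicted):
    """SD of the reference values divided by RMSEP (assumed convention)."""
    return stdev(reference) / rmsep(reference, predicted)


if __name__ == "__main__":
    raw = [0.42, 0.45, 0.51, 0.49, 0.55, 0.61]  # synthetic absorbance values
    processed = first_derivative(moving_window_smooth(raw))
    print(processed)
```

Chaining `moving_window_smooth` and `first_derivative` corresponds to the MWS-FD combination reported as optimal in the abstract; `snv` is an alternative scatter correction applied to each spectrum independently.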

20.
Front Big Data ; 7: 1353469, 2024.
Article in English | MEDLINE | ID: mdl-38817683

ABSTRACT

Objective: To develop a robust machine learning prediction model for the automatic screening and diagnosis of obstructive sleep apnea (OSA) using five algorithms, namely Extreme Gradient Boosting (XGBoost), Logistic Regression (LR), Support Vector Machine (SVM), Light Gradient Boosting Machine (LightGBM), and Random Forest (RF), to support early clinical diagnosis and intervention. Methods: We conducted a retrospective analysis of clinical data from 439 patients who underwent polysomnography at the Affiliated Hospital of Xuzhou Medical University between October 2019 and October 2022. Predictor variables included demographic information [age, sex, height, weight, body mass index (BMI)], medical history, and the Epworth Sleepiness Scale (ESS). Univariate analysis was used to identify variables with significant differences, and the dataset was then divided into training and validation sets in a 4:1 ratio. The training set was used to build a model predicting OSA severity grade, and the validation set was used to assess model performance using the area under the curve (AUC). Additionally, a separate analysis was conducted with the normal population as one group and patients with moderate-to-severe OSA as the other. The same univariate analysis was applied, and the dataset was again divided into training and validation sets in a 4:1 ratio. The training set was used to build a prediction model for screening moderate-to-severe OSA, while the validation set was used to verify the model's performance. Results: In the four-group severity classification, the LightGBM model outperformed the others; its top five features by importance were ESS total score, BMI, sex, hypertension, and gastroesophageal reflux disease (GERD), with age, ESS total score, and BMI playing the most significant roles. In the dichotomous model, RF was the best performer of the five models. The top five features by importance in the best-performing RF model were ESS total score, BMI, GERD, age, and dry mouth, with ESS total score and BMI being particularly pivotal. Conclusion: Machine learning-based prediction models for OSA grading and screening are instrumental in the early identification of patients with moderate-to-severe OSA, revealing pertinent risk factors and facilitating timely interventions to counter pathological changes induced by OSA. Notably, ESS total score and BMI emerge as the most critical features for predicting OSA, emphasizing their significance in clinical assessments. The dataset will be publicly available on my GitHub.
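
The 4:1 split and AUC-based validation described in the Methods can be sketched as follows. This is a minimal illustration, not the authors' code: the shuffling seed and function names are hypothetical, and AUC is computed directly from its pairwise definition (the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counted as one half).

```python
import random


def split_4_to_1(rows, seed=0):
    """Shuffle a copy of rows and split it into training and validation sets at 4:1."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = len(shuffled) * 4 // 5
    return shuffled[:cut], shuffled[cut:]


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # 439 placeholder patient indices, matching the cohort size in the abstract.
    train, valid = split_4_to_1(range(439))
    print(len(train), len(valid))
    # Toy labels and model scores (synthetic).
    print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

The pairwise form is quadratic in the number of cases, which is acceptable for a validation set of this size; a rank-based formulation gives the same value more efficiently on large datasets.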
