Results 1 - 20 of 1,246
1.
Sci Rep ; 14(1): 18792, 2024 Aug 13.
Article in English | MEDLINE | ID: mdl-39138235

ABSTRACT

Machine learning (ML) models have been increasingly employed to predict osteoporosis. However, the incorporation of hair minerals into ML models remains unexplored. This study aimed to develop ML models for predicting low bone mass (LBM) using health checkup data and hair mineral analysis. A total of 1206 postmenopausal women and 820 men aged 50 years or older at a health promotion center were included in this study. LBM was defined as a T-score below -1 at the lumbar spine, femoral neck, or total hip. The proportion of individuals with LBM was 59.4% (n = 1205). The features used in the models comprised 50 health checkup items and 22 hair minerals. The ML algorithms employed were Extreme Gradient Boosting (XGB), Random Forest (RF), Gradient Boosting (GB), and Adaptive Boosting (AdaBoost). The subjects were divided into training and test datasets in an 80:20 ratio. Model performance was evaluated using the area under the receiver operating characteristic curve (AUROC), accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and F1 score. Over 50 repetitions, the mean (standard deviation) AUROC for LBM was highest for XGB at 0.744 (± 0.021), followed by AdaBoost at 0.737 (± 0.023), GB at 0.733 (± 0.023), and RF at 0.732 (± 0.021). The XGB model had an accuracy of 68.7%, sensitivity of 80.7%, specificity of 51.1%, PPV of 70.9%, NPV of 64.3%, and an F1 score of 0.754. However, these performance metrics did not differ notably among the models. The XGB model identified sulfur, sodium, mercury, copper, magnesium, arsenic, and phosphate as crucial hair mineral features. The study findings emphasize the value of ML algorithms for predicting LBM. Integrating health checkup data and hair mineral analysis into these models may provide valuable insights for identifying individuals at risk of LBM.
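All of the threshold-based metrics listed above derive from the four cells of a binary confusion matrix. The sketch below spells out those standard definitions; the helper names and the example counts are hypothetical, and this is not the study's evaluation code. As a sanity check, F1 is the harmonic mean of PPV and sensitivity, so the reported XGB values (PPV 70.9%, sensitivity 80.7%) give about 0.755, close to the reported 0.754; small differences are expected if the reported figures are averaged over the 50 repetitions.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    sensitivity: float  # recall of the positive (e.g. LBM) class
    specificity: float
    ppv: float          # precision
    npv: float
    f1: float


def f1_from_rates(ppv: float, sensitivity: float) -> float:
    """Harmonic mean of PPV (precision) and sensitivity (recall)."""
    return 2 * ppv * sensitivity / (ppv + sensitivity)


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> BinaryMetrics:
    """Derive threshold-based metrics from confusion-matrix counts (hypothetical helper)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return BinaryMetrics(accuracy, sensitivity, specificity, ppv, npv,
                         f1_from_rates(ppv, sensitivity))


# Cross-check against the reported XGB point estimates.
print(round(f1_from_rates(0.709, 0.807), 3))  # 0.755
```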


Subjects
Bone Density , Hair , Machine Learning , Humans , Middle Aged , Female , Hair/chemistry , Hair/metabolism , Male , Aged , Osteoporosis/diagnosis , Osteoporosis/metabolism , ROC Curve , Algorithms , Minerals/analysis , Minerals/metabolism
2.
BMC Bioinformatics ; 25(1): 265, 2024 Aug 13.
Article in English | MEDLINE | ID: mdl-39138564

ABSTRACT

BACKGROUND: Survival analysis is used to characterize time-to-event data. In medical studies, a typical application is analyzing the survival time of specific cancers using high-dimensional gene expression data. The main challenges include the presence of non-informative gene expressions and possibly nonlinear relationships between survival time and gene expressions. Moreover, because of imprecise data collection or incorrect records, measurement error may be ubiquitous in survival times and their censoring status. Ignoring measurement error effects may yield biased estimators and wrong conclusions. RESULTS: To tackle these challenges and derive reliable estimates with an efficient computational implementation, we developed the R package AFFECT, which stands for Accelerated Functional Failure time model with Error-Contaminated survival Times. CONCLUSIONS: This package aims to correct for measurement error effects in survival times and implements a boosting algorithm on the corrected data to identify informative gene expressions and derive the corresponding nonlinear functions.
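AFFECT's boosting step picks informative genes out of many candidates. The general idea can be illustrated with component-wise L2 boosting: at each iteration, only the single feature that best fits the current residuals is updated, so uninformative features tend to keep a zero coefficient. This is a simplified, swapped-in technique and not AFFECT's algorithm: it uses linear base learners on an uncensored, centered response and performs no measurement-error correction, whereas AFFECT fits nonlinear functions to error-corrected survival times. The function name and the `n_iter` and `nu` (step size) parameters are hypothetical.

```python
def componentwise_l2_boost(X, y, n_iter=200, nu=0.1):
    """Component-wise L2 boosting with one-feature linear base learners.

    X: list of n rows with p features each, columns assumed centered.
    y: list of n responses, assumed centered.
    Returns one coefficient per feature; never-selected features stay at 0.0.
    """
    n, p = len(X), len(X[0])
    coef = [0.0] * p
    resid = list(y)
    col_ss = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_iter):
        best_j, best_beta, best_rss = None, 0.0, float("inf")
        for j in range(p):
            if col_ss[j] == 0:
                continue  # a constant column carries no information
            # Least-squares fit of the current residuals on feature j alone
            beta = sum(X[i][j] * resid[i] for i in range(n)) / col_ss[j]
            rss = sum((resid[i] - beta * X[i][j]) ** 2 for i in range(n))
            if rss < best_rss:
                best_j, best_beta, best_rss = j, beta, rss
        if best_j is None:
            break
        # Shrunken update of the single best feature, then refresh residuals
        coef[best_j] += nu * best_beta
        for i in range(n):
            resid[i] -= nu * best_beta * X[i][best_j]
    return coef


X = [[1, 1, 1], [-1, 1, -1], [1, -1, -1], [-1, -1, 1]]
y = [3, -3, 3, -3]  # depends only on feature 0
print([round(c, 3) for c in componentwise_l2_boost(X, y)])  # [3.0, 0.0, 0.0]
```

The shrinkage factor `nu` trades speed for stability: smaller steps need more iterations but usually select features more conservatively.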


Subjects
Algorithms , Humans , Survival Analysis , Neoplasms/genetics , Neoplasms/mortality , Software , Gene Expression Profiling/methods , Gene Expression/genetics
3.
Transl Cancer Res ; 13(7): 3370-3381, 2024 Jul 31.
Article in English | MEDLINE | ID: mdl-39145065

ABSTRACT

Background: The incidence of diffuse large B-cell lymphoma (DLBCL) in children is increasing globally. Because the immune system is immature in children, the prognosis of pediatric DLBCL differs markedly from that in adults. We aimed to study the prognosis of the disease using a large multicenter retrospective analysis. Methods: For our retrospective analysis, we retrieved data from the Surveillance, Epidemiology, and End Results (SEER) database on 836 DLBCL patients under 18 years old who were treated at 22 central institutions between 2000 and 2019. The patients were randomly divided into a modeling group and a validation group in a 7:3 ratio. Cox stepwise regression, generalized Cox regression, and eXtreme Gradient Boosting (XGBoost) were used to screen all variables. The selected prognostic variables were used to construct a nomogram through Cox stepwise regression. Variable importance was ranked using XGBoost. The predictive performance of the model was assessed using the C-index, the area under the receiver operating characteristic (ROC) curve (AUC), sensitivity, and specificity. Model calibration was evaluated using a calibration curve. The clinical practicality of the model was verified through decision curve analysis (DCA). Results: ROC curves demonstrated that all models except the non-proportional hazards and non-log-linearity (NPHNLL) model achieved AUC values above 0.7, indicating high accuracy. The calibration curves and DCA further confirmed strong predictive performance and clinical practicability. Conclusions: In this study, we constructed a machine learning model by combining XGBoost with Cox and generalized Cox regression models. This integrated approach accurately predicts the prognosis of children with DLBCL from multiple dimensions. These findings provide a scientific basis for accurate clinical prognosis prediction.
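The C-index used above measures how well predicted risks rank patients by survival time. A minimal sketch of Harrell's concordance index for right-censored data follows. It skips pairs with tied survival times (implementations differ on how ties are handled), and the function name and toy data are hypothetical rather than taken from the study.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when subject i had an observed event and a
    strictly shorter time than subject j. It is concordant when subject i
    also has the higher predicted risk; tied risk scores count as one half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


times = [1, 2, 3]
events = [1, 0, 1]   # 1 = event observed, 0 = censored
risk = [2, 3, 1]     # higher score = higher predicted risk
print(harrell_c_index(times, events, risk))  # 0.5
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is why the C-index is read much like an AUC.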

4.
Resuscitation ; : 110359, 2024 Aug 12.
Article in English | MEDLINE | ID: mdl-39142467

ABSTRACT

Out-of-hospital cardiac arrest (OHCA) is a critical condition with low survival rates. In patients with return of spontaneous circulation, brain injury is a leading cause of death. In this study, we propose an interpretable machine learning approach for predicting neurologic outcome after OHCA using information available at the time of hospital admission. METHODS: The study population comprised 55,615 OHCA cases registered in the Swedish Cardiopulmonary Resuscitation Registry between 2010 and 2020. The dataset was split into training and validation sets (for model development) and a test set (for evaluation of the final model). We used an XGBoost algorithm with stratified, repeated 10-fold cross-validation and the Optuna framework for hyperparameter tuning. The final model was trained on 10 features selected based on importance scores and evaluated on the test set in terms of discrimination, calibration, and bias-variance tradeoff. We used SHapley Additive exPlanations (SHAP) to address the 'black-box' nature of the model, in line with explainable artificial intelligence. RESULTS: The final model achieved an area under the receiver operating characteristic curve of 0.964 (95% confidence interval (CI) [0.960-0.968]), sensitivity 0.606 (95% CI [0.573-0.634]), specificity 0.975 (95% CI [0.972-0.978]), positive predictive value (PPV) 0.664 (95% CI [0.625-0.696]), negative predictive value (NPV) 0.969 (95% CI [0.966-0.972]), and macro F1 0.803 (95% CI [0.788-0.816]), and showed very good calibration. The features with the highest SHAP impact on the model's output were 'ROSC on arrival to hospital', 'Initial rhythm asystole', and 'Conscious on arrival to hospital'. CONCLUSIONS: The XGBoost model with 10 features available at the time of hospital admission showed good performance for predicting neurologic outcome after OHCA, with no apparent signs of overfitting.
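If macro F1 is taken to be the unweighted mean of the per-class F1 scores for the two outcome classes, the reported value can be reproduced from the other point estimates. For the positive class, precision is the PPV and recall is the sensitivity. For the negative class, precision is the NPV and recall is the specificity. The sketch below is illustrative only, and the function names are hypothetical.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def macro_f1_from_rates(ppv: float, sensitivity: float,
                        npv: float, specificity: float) -> float:
    """Unweighted mean of the positive-class and negative-class F1 scores."""
    f1_positive = f1(ppv, sensitivity)   # precision = PPV, recall = sensitivity
    f1_negative = f1(npv, specificity)   # precision = NPV, recall = specificity
    return (f1_positive + f1_negative) / 2


# Reported point estimates: PPV, sensitivity, NPV, specificity
print(round(macro_f1_from_rates(0.664, 0.606, 0.969, 0.975), 3))  # 0.803
```

Macro averaging weights both classes equally, so the rarer outcome class (positive F1 about 0.63 here) pulls the score well below the accuracy-dominated negative class (about 0.97).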

5.
Ecotoxicol Environ Saf ; 284: 116867, 2024 Aug 17.
Article in English | MEDLINE | ID: mdl-39154501

ABSTRACT

Nitrogen loss from soil damages the environment. Clarifying the mechanism of ammonium nitrogen (NH4+-N) transport in soil and increasing NH4+-N fixation after N application are effective ways to improve N use efficiency. However, the main factors are not easily identified because transport and retardation factors vary in complicated ways across soils. This study employed machine learning (ML) to identify the main factors influencing the retardation factor (Rf) of NH4+-N in soil. First, NH4+-N transport in soil was investigated using column experiments and a transport model. The Rf (1.29-17.42) was calculated and used as a proxy for the efficacy of NH4+-N transport. Second, the physicochemical parameters of the soil were determined and screened using lasso and ridge regressions as inputs for the ML models. Third, six ML models were evaluated: Adaptive Boosting, Extreme Gradient Boosting (XGB), Random Forest, Gradient Boosting Regression, Multilayer Perceptron, and Support Vector Regression. XGB was the optimal model, with a low mean absolute error (0.81) and mean squared error (0.50) and a high test R² (0.97), obtained by random sampling and five-fold cross-validation. Finally, SHapley Additive exPlanations, entropy-based feature importance, and permutation feature importance were used for global interpretation. Cation exchange capacity (CEC), total organic carbon (TOC), and kaolin had the greatest effects on NH4+-N transport in the soil. Accumulated local effect analysis offered a fundamental insight: when CEC > 6 cmol+ kg⁻¹ and TOC > 40 g kg⁻¹, the maximum resistance to NH4+-N transport within the soil was observed. This study provides a novel approach for predicting the impact of the soil environment on NH4+-N transport and for guiding the establishment of an early-warning system for nutrient loss.
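The abstract does not state which transport model was fitted to the column data. In the common case of the convection-dispersion equation with linear, equilibrium sorption, the retardation factor is Rf = 1 + ρb·Kd/θ, where ρb is the dry bulk density, Kd the linear distribution coefficient, and θ the volumetric water content, and a sorbing solute travels at v/Rf, where v is the pore-water velocity. The sketch below encodes that relationship under this assumption; the function names and example values are hypothetical.

```python
def retardation_factor(bulk_density: float, water_content: float, kd: float) -> float:
    """Rf = 1 + rho_b * Kd / theta, assuming linear, equilibrium sorption.

    bulk_density: dry bulk density rho_b (g/cm^3)
    water_content: volumetric water content theta (cm^3/cm^3)
    kd: linear distribution coefficient Kd (cm^3/g)
    """
    if water_content <= 0:
        raise ValueError("water content must be positive")
    return 1.0 + bulk_density * kd / water_content


def solute_velocity(pore_water_velocity: float, rf: float) -> float:
    """A retarded solute moves Rf times slower than the pore water."""
    return pore_water_velocity / rf


rf = retardation_factor(bulk_density=1.5, water_content=0.5, kd=2.0)
print(rf)                          # 7.0
print(solute_velocity(14.0, rf))   # 2.0
```

Under this model, soil properties that raise sorption (a larger Kd) raise Rf and slow NH4+-N movement, which is consistent with the abstract's finding that CEC and TOC increase resistance to transport.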

6.
Am J Transl Res ; 16(7): 2864-2876, 2024.
Article in English | MEDLINE | ID: mdl-39114712

ABSTRACT

OBJECTIVE: To explore the application value of a gradient boosting decision tree (GBDT) model in predicting postoperative atelectasis in patients with destroyed lungs. METHODS: A total of 170 patients with destroyed lungs who underwent surgical treatment at the Chest Hospital of Guangxi Zhuang Autonomous Region from January 2021 to May 2023 were retrospectively selected. The patients were divided into a training set (n = 119) and a validation set (n = 51). A GBDT model and a logistic regression model for predicting postoperative atelectasis were constructed. Receiver operating characteristic (ROC) curves, calibration curves, and decision curves were used to evaluate the predictive performance of the models. RESULTS: The GBDT model indicated that the relative importance scores of the four influencing factors were operation time (51.037), intraoperative blood loss (38.657), presence of lung function (9.126), and sputum obstruction (1.180). Multivariate logistic regression analysis revealed that operation duration and sputum obstruction were significant predictors of postoperative atelectasis in the training set (P = 0.048 and P = 0.002, respectively). ROC curve analysis showed that the areas under the curve (AUCs) of the GBDT and logistic models were 0.795 and 0.763 in the training set and 0.776 and 0.811 in the validation set, respectively. The GBDT model's predictions closely matched the ideal calibration curve, and decision curve analysis showed a higher net benefit than the reference line. CONCLUSIONS: The GBDT model is suitable for predicting the incidence of complications in small samples.
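For readers unfamiliar with GBDT, the sketch below shows the core loop for binary classification. The model starts from the log-odds of the base rate, then repeatedly fits a small regression tree (here a one-split stump) to the negative gradient of the log-loss, y - p, and adds a shrunken copy of that tree to the model. This is a teaching sketch, not the study's model. Leaf values are simply the mean residual rather than the Newton-step values most libraries use, and all names, parameters, and the toy data are hypothetical.

```python
import math


def _sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def _fit_stump(X, residuals):
    """Least-squares one-split regression stump fitted to the residuals."""
    best = None  # (sse, feature, threshold, left_value, right_value)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lv, rv)
    if best is None:  # no valid split: fall back to a constant prediction
        mean = sum(residuals) / len(residuals)
        return 0, math.inf, mean, mean
    return best[1:]


def fit_gbdt(X, y, n_rounds=50, learning_rate=0.3):
    """Gradient boosting with stumps under log-loss; y must contain both 0 and 1."""
    base_rate = sum(y) / len(y)
    f0 = math.log(base_rate / (1 - base_rate))  # initial log-odds
    F = [f0] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # Negative gradient of the log-loss with respect to the log-odds
        residuals = [yi - _sigmoid(fi) for yi, fi in zip(y, F)]
        j, t, lv, rv = _fit_stump(X, residuals)
        stumps.append((j, t, lv, rv))
        F = [fi + learning_rate * (lv if row[j] <= t else rv) for fi, row in zip(F, X)]
    return f0, learning_rate, stumps


def predict_proba(model, row):
    f0, learning_rate, stumps = model
    score = f0 + sum(learning_rate * (lv if row[j] <= t else rv) for j, t, lv, rv in stumps)
    return _sigmoid(score)


# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise
X = [[1.0, 0.3], [1.5, 0.9], [2.0, 0.1], [3.0, 0.8], [3.5, 0.2], [4.0, 0.7]]
y = [0, 0, 0, 1, 1, 1]
model = fit_gbdt(X, y)
print([round(predict_proba(model, row)) for row in X])  # [0, 0, 0, 1, 1, 1]
```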

7.
Digit Health ; 10: 20552076241272697, 2024.
Article in English | MEDLINE | ID: mdl-39130518

ABSTRACT

Objective: Urinary tract infection is one of the most prevalent bacterial infectious diseases in outpatient treatment; 50-80% of women experience it more than once, with a recurrence rate of 40-50% within a year. Consequently, preventing re-hospitalization of patients is critical. However, in urology, no research analyzing re-hospitalization for urinary tract infections using machine learning algorithms has been reported to date. Therefore, this study used various machine learning algorithms to analyze the clinical and nonclinical factors related to patients who were re-hospitalized within 30 days of a urinary tract infection. Methods: Data were collected from 497 patients re-hospitalized for urinary tract infections within 30 days and 496 patients who did not require re-hospitalization. Re-hospitalization factors were analyzed using four machine learning algorithms: gradient boosting classifier, random forest, naive Bayes, and logistic regression. Results: The best-performing gradient boosting classifier identified respiratory rate, days of hospitalization, albumin, diastolic blood pressure, blood urea nitrogen, body mass index, systolic blood pressure, body temperature, total bilirubin, and pulse as the top 10 factors affecting re-hospitalization for urinary tract infections. The 993 patients were divided into risk groups based on these factors, and the re-hospitalization rate, days of hospitalization, and medical expenses decreased from the high-risk to the low-risk group. Conclusions: This study demonstrated new possibilities for analyzing urinary tract infection-related re-hospitalization using machine learning. Identifying factors affecting re-hospitalization and incorporating preventive and reinforcement-based treatment programs can help reduce the re-hospitalization rate and the average length of hospitalization, thereby reducing medical expenses.
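The abstract does not specify how the risk groups were formed. One simple option, shown here purely as an assumption, is to rank patients by a model's predicted risk, cut the ranking into tertiles, and compare observed re-hospitalization rates across the groups. The function name and the data are hypothetical.

```python
def event_rate_by_tertile(scores, outcomes):
    """Rank subjects by risk score, split into tertiles, and return each group's event rate."""
    if len(scores) < 3:
        raise ValueError("need at least three subjects")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    n = len(order)
    groups = {
        "low": order[: n // 3],
        "medium": order[n // 3 : 2 * n // 3],
        "high": order[2 * n // 3 :],
    }
    return {name: sum(outcomes[i] for i in idx) / len(idx) for name, idx in groups.items()}


risk = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90]
readmitted = [0, 0, 0, 0, 1, 0, 1, 1, 1]
print(event_rate_by_tertile(risk, readmitted))
# {'low': 0.0, 'medium': 0.3333333333333333, 'high': 1.0}
```

A monotone decrease in event rate from the high-risk to the low-risk group, as the abstract reports, is the pattern such a table is meant to reveal.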

8.
Foods ; 13(15)2024 Jul 29.
Article in English | MEDLINE | ID: mdl-39123583

ABSTRACT

Fermented foods and ingredients, including fermentates derived from lactic acid bacteria (LAB) in dairy products, can modulate the immune system. Here, we describe the use of reconstituted skimmed milk powder to generate novel fermentates from Lactobacillus helveticus strains SC232, SC234, SC212, and SC210, and from Lacticaseibacillus casei strains SC209 and SC229. Using in vitro assays, we demonstrate that these fermentates can differentially modulate cytokine secretion by bone-marrow-derived dendritic cells (BMDCs) activated with either the viral ligand loxoribine or the inflammatory stimulus lipopolysaccharide (LPS). Specifically, SC232 and SC234 increase the cytokines IL-6, TNF-α, IL-12p40, IL-23, IL-27, and IL-10 and decrease IL-1β in primary BMDCs stimulated with a viral ligand. In contrast, exposure of these cells to SC212 and SC210 increased IL-10, IL-1β, and IL-23 and decreased IL-12p40 following activation with LPS. Interestingly, SC209 and SC229 had little or no effect on cytokine secretion by BMDCs. Overall, our data demonstrate that these novel fermentates have specific effects and can differentially enhance key immune mechanisms critical to antiviral immune responses or suppress responses involved in chronic inflammatory conditions such as ulcerative colitis (UC) and Crohn's disease (CD).

9.
Sensors (Basel) ; 24(15)2024 Jul 31.
Article in English | MEDLINE | ID: mdl-39124011

ABSTRACT

Load recognition remains insufficiently explored in Home Energy Management Systems (HEMSs). Current approaches to load recognition leave gaps, such as improving appliance identification and increasing the overall performance of the load-recognition system through more robust models. To address this issue, we propose a novel approach that combines the Analysis of Variance (ANOVA) F-test with SelectKBest and gradient-boosting machines (GBMs) for load recognition. The proposed approach improves feature selection and consequently aids inter-class separability. Further, we optimized GBM models, namely the histogram-based gradient-boosting machine (HistGBM), the light gradient-boosting machine (LightGBM), and XGBoost (extreme gradient boosting), to create a more reliable load-recognition system. Our findings reveal that the ANOVA-GBM approach trains more efficiently than Principal Component Analysis (PCA)-based pipelines, even with a higher number of features. ANOVA-XGBoost is approximately 4.31 times faster than PCA-XGBoost, ANOVA-LightGBM about 5.15 times faster than PCA-LightGBM, and ANOVA-HistGBM 2.27 times faster than PCA-HistGBM. The overall performance results also show the approach's effect on the load-recognition system. Key results show that the ANOVA-LightGBM pair reached 96.42% accuracy, 96.27% F1, and a Kappa index of 0.9404; the ANOVA-HistGBM combination achieved 96.64% accuracy, 96.48% F1, and a Kappa index of 0.9434; and the ANOVA-XGBoost pair attained 96.75% accuracy, 96.64% F1, and a Kappa index of 0.9452, outperforming rival methods from the literature. In addition, the accuracy gain of the proposed approach is prominent when compared directly with its competitors: the largest accuracy gains were 13.09, 13.31, and 13.42 percentage points (pp) for the ANOVA-LightGBM, ANOVA-HistGBM, and ANOVA-XGBoost pairs, respectively.
These significant improvements highlight the effectiveness and refinement of the proposed approach.
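SelectKBest with the ANOVA F-test scores each feature independently by how well its class means separate relative to the within-class spread, then keeps the k highest-scoring features. In scikit-learn this corresponds to `SelectKBest(f_classif, k=...)`. The dependency-free sketch below computes the same one-way ANOVA F statistic by hand to show what is being ranked. The function names, appliance labels, and toy data are hypothetical, and this is not the authors' pipeline.

```python
import math


def anova_f(values, labels):
    """One-way ANOVA F statistic for one feature across class labels."""
    groups = {}
    for v, c in zip(values, labels):
        groups.setdefault(c, []).append(v)
    n, k = len(values), len(groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two classes and more samples than classes")
    grand_mean = sum(values) / n
    means = {c: sum(g) / len(g) for c, g in groups.items()}
    ss_between = sum(len(g) * (means[c] - grand_mean) ** 2 for c, g in groups.items())
    ss_within = sum((v - means[c]) ** 2 for c, g in groups.items() for v in g)
    if ss_within == 0:
        return math.inf if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_k_best(X, labels, k):
    """Indices (ascending) of the k features with the largest F statistics, plus all scores."""
    p = len(X[0])
    scores = [anova_f([row[j] for row in X], labels) for j in range(p)]
    ranked = sorted(range(p), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k]), scores


X = [[1.0, 5.0], [1.2, 3.0], [0.8, 4.0], [3.0, 4.5], [3.2, 3.5], [2.8, 4.0]]
y = ["fridge", "fridge", "fridge", "heater", "heater", "heater"]
chosen, f_scores = select_k_best(X, y, k=1)
print(chosen)  # [0]
```

Because each feature is scored on its own, this filter is cheap compared with PCA, which must estimate a full covariance structure, although it cannot detect features that are informative only in combination.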

10.
Int J Med Inform ; 191: 105585, 2024 Jul 31.
Article in English | MEDLINE | ID: mdl-39098165

ABSTRACT

BACKGROUND: Atrial fibrillation (AF) is common among intensive care unit (ICU) patients and significantly raises the in-hospital mortality rate. Existing scoring systems and models have limited predictive capability for AF patients in the ICU. Our study developed and validated machine learning models to predict the risk of in-hospital mortality in ICU patients with AF. METHODS AND RESULTS: The Medical Information Mart for Intensive Care (MIMIC)-IV dataset and the eICU Collaborative Research Database (eICU-CRD) were analyzed. Among the ten classifiers compared, adaptive boosting (AdaBoost) showed the best performance in predicting all-cause mortality in AF patients. A compact model with 15 features was developed and validated. Both the all-variable and compact models exhibited excellent performance, with areas under the receiver operating characteristic curve (AUCs) of 1.0 (95% confidence interval [CI]: 1.0-1.0) in the training set. In the MIMIC-IV testing set, the AUCs of the all-variable and compact models were 0.978 (95% CI: 0.973-0.982) and 0.977 (95% CI: 0.972-0.982), respectively. In the external validation set, the AUCs of the all-variable and compact models were 0.825 (95% CI: 0.815-0.834) and 0.807 (95% CI: 0.796-0.817), respectively. CONCLUSION: An AdaBoost-based predictive model was subjected to internal and external validation, highlighting its strong predictive capacity for assessing the risk of in-hospital mortality in ICU patients with AF.
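The abstract reports AUCs with 95% CIs but does not say how the intervals were computed. A common choice is the percentile bootstrap. Under that assumption, the sketch below computes the AUC as the Mann-Whitney probability that a randomly chosen positive case scores above a randomly chosen negative case (ties count one half) and resamples patients with replacement to form the interval. Names, defaults, and data are hypothetical.

```python
import random


def auc_mann_whitney(scores, labels):
    """AUC = P(score of a random positive > score of a random negative), ties = 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(scores, labels, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC, resampling patients with replacement."""
    n = len(scores)
    if not 0 < sum(labels) < n:
        raise ValueError("both classes are required")
    rng = random.Random(seed)
    aucs = []
    while len(aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that lack one of the classes
            aucs.append(auc_mann_whitney([scores[i] for i in idx], ys))
    aucs.sort()
    lower = aucs[int(n_boot * alpha / 2)]
    upper = aucs[min(n_boot - 1, int(n_boot * (1 - alpha / 2)))]
    return lower, upper


scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
print(auc_mann_whitney(scores, labels))  # 0.75
print(bootstrap_auc_ci(scores * 10, labels * 10, n_boot=200))
```

Under this kind of procedure, a training AUC of exactly 1.0 yields a degenerate 1.0-1.0 interval, because every resample of perfectly separated data is also perfectly separated.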

11.
BMC Med Inform Decis Mak ; 24(1): 223, 2024 Aug 08.
Article in English | MEDLINE | ID: mdl-39118128

ABSTRACT

BACKGROUND: There is a growing demand for advanced methods to improve the understanding and prediction of illnesses. This study focuses on sepsis, a critical response to infection, and aims to enhance early detection and mortality prediction for Sepsis-3 patients to improve hospital resource allocation. METHODS: We developed a machine learning (ML) framework to predict 30-day mortality of ICU patients with Sepsis-3 using the MIMIC-III database. Big data extraction tools such as Snowflake were used to identify eligible patients. Decision tree models and entropy analyses helped refine feature selection, resulting in 30 relevant features curated with clinical experts. We employed the Light Gradient Boosting Machine (LightGBM) model for its efficiency and predictive power. RESULTS: The study comprised a cohort of 9118 Sepsis-3 patients. Our preprocessing techniques significantly improved both AUC and accuracy. The LightGBM model achieved an AUC of 0.983 (95% CI: [0.980-0.990]), an accuracy of 0.966, and an F1-score of 0.910. Notably, LightGBM showed a 6% improvement over our best baseline model and a 14% improvement over the best result in the existing literature. These advances are attributed to (I) the inclusion of the novel and pivotal feature Hospital Length of Stay (HOSP_LOS), absent in previous studies, and (II) LightGBM's gradient boosting architecture, which enables robust predictions with high-dimensional data while maintaining computational efficiency, as demonstrated by its learning curve. CONCLUSIONS: Our preprocessing methodology reduced the number of relevant features and identified a crucial feature overlooked in previous studies. The proposed model demonstrated high predictive power and generalization capability, highlighting the potential of ML in ICU settings. This model can streamline ICU resource allocation and inform tailored interventions for Sepsis-3 patients.
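The abstract mentions entropy analyses for refining feature selection without giving details. A standard entropy-based criterion is information gain: the reduction in outcome entropy after grouping patients by a feature's values. The sketch below assumes categorical (or pre-binned continuous) features, and the function names and data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus their weighted entropy within each feature value."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


died = [0, 0, 1, 1]
los_bin = ["short", "short", "long", "long"]  # hypothetical, perfectly informative
sex = ["F", "M", "F", "M"]                    # hypothetical, uninformative here
print(information_gain(los_bin, died), information_gain(sex, died))  # 1.0 0.0
```

Continuous variables such as a length of stay would first need to be binned, and the choice of bins affects the resulting gain.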


Subjects
Intensive Care Units , Machine Learning , Sepsis , Humans , Sepsis/mortality , Hospital Mortality , Male , Female , Middle Aged , Aged , Prognosis
12.
Article in English | MEDLINE | ID: mdl-39121441

ABSTRACT

Applying machine-learning techniques to imbalanced data sets presents a significant challenge in materials science, since the underrepresented characteristics of minority classes are often buried by the abundance of unrelated characteristics in the majority classes. Existing approaches to this problem focus on balancing class counts using oversampling or synthetic data generation techniques. However, these methods can lead to loss of valuable information or overfitting. Here, we introduce a deep learning framework to predict minority-class materials, specifically metal-insulator transition (MIT) materials. The proposed approach, termed boosting-CGCNN, combines the crystal graph convolutional neural network (CGCNN) model with a gradient-boosting algorithm. The model effectively handled extreme class imbalance in MIT material data by sequentially building a deeper neural network. Comparative evaluations demonstrated the superior performance of the proposed model over other approaches. Our approach is a promising solution for handling imbalanced data sets in materials science.

13.
Prev Med Rep ; 44: 102806, 2024 Aug.
Article in English | MEDLINE | ID: mdl-39091569

ABSTRACT

Background: Many individuals with hypertension remain undiagnosed. We aimed to develop a predictive model for hypertension using diagnostic codes from existing electronic medical records in Swedish primary care. Methods: This sex- and age-matched case-control (1:5) study included patients aged 30-65 years living in the Stockholm Region, Sweden, with a newly recorded diagnosis of hypertension during 2010-19 (cases) and individuals without a recorded hypertension diagnosis during 2010-19 (controls), 507,618 individuals in total. Patients with diagnoses of cardiovascular diseases or diabetes were excluded. A stochastic gradient boosting machine learning model was constructed using the 1,309 most frequently registered ICD-10 codes from primary care during the three years prior to the hypertension diagnosis. Results: The model showed an area under the curve (95 % confidence interval) of 0.748 (0.742-0.753) for females and 0.745 (0.740-0.751) for males for predicting a diagnosis of hypertension within three years. The sensitivity was 63 % and 68 %, and the specificity 76 % and 73 %, for females and males, respectively. The 25 diagnoses that contributed most to the model for females and males all exhibited a normalized relative influence >1 %. The codes contributing most to the model, all with an odds ratio of marginal effects >1 for both sexes, were dyslipidaemia, obesity, and encountering health services in other circumstances. Conclusions: This machine learning model, using routinely recorded diagnoses in primary health care, may contribute to the identification of patients at risk of unrecognized hypertension. The added value of this predictive model beyond blood pressure information warrants further study.

14.
Talanta ; 279: 126639, 2024 Jul 31.
Article in English | MEDLINE | ID: mdl-39094531

ABSTRACT

In this paper, an ultra-small CuOx/GDYO nanozyme grown in situ on ITO glass was rationally synthesized from mixed precursors of graphdiyne oxide (GDYO) and a copper-based infinite coordination polymer (Cu-ICP, consisting of Cu ions and two organic ligands, 3,5-di-tert-butylcatechol and 1,4-bis(imidazole-1-ylmethyl)benzene) via a mild and simple electrochemical strategy. On one hand, the preferential electro-reduction of Cu-ICP enabled the formation of ultra-small CuOx with Cu(I) as the main component and avoided the loss of oxygen-containing functional groups and defects on the GDYO surface. On the other hand, GDYO can also serve as an electroless reductive species that facilitates the electrochemical deposition of CuOx while itself being converted to a higher oxidation state with more exposed functional groups and defects. This two-birds-with-one-stone electrochemical strategy endowed the CuOx/GDYO nanozyme with superior peroxidase-mimicking activity and robust anchoring stability on ITO glass, thereby enabling further exploration of a portable device for point-of-use applications. Based on the inhibition of acetylcholinesterase (AChE) activity by organophosphorus pesticides (OPs), the competitive redox reaction was regulated to initiate the chromogenic reaction of 3,3',5,5'-tetramethylbenzidine (TMB) catalyzed by the peroxidase-like CuOx/GDYO nanozyme, laying a foundation for the detection of OPs (with chlorpyrifos as an example). With a limit of detection as low as 0.57 nM, OP residues during agricultural production can be directly monitored by the portable device we developed.

15.
Toxicol Mech Methods ; : 1-9, 2024 Aug 05.
Article in English | MEDLINE | ID: mdl-39104137

ABSTRACT

Per- and polyfluoroalkyl substances (PFASs), a class of persistent organic pollutants, have immunosuppressive effects. The evaluation of this effect has been a focus of regulatory toxicology. In this investigation, 146 PFASs (immunosuppressive or nonimmunosuppressive) and their corresponding concentration gradients were collected from the literature, and their structures were characterized using Dragon descriptors. Feature importance analysis and stepwise feature elimination were used for feature selection. Three machine learning (ML) methods, namely Random Forest (RF), Extreme Gradient Boosting Machine (XGB), and Categorical Boosting Machine (CB), were used for model development. Model interpretability was explored by feature importance analysis and correlation analysis. All three models exhibited excellent performance; the best-performing RF model had an average AUC of 0.9720 on the testing set. Feature importance analysis showed that concentration, SpPosA_X, IVDE, R2s, and SIC2 were the crucial molecular features. Applicability domain analysis was also performed to determine reliable prediction boundaries for the model. In conclusion, this study is the first application of ML models to investigate the immunosuppressive activity of PFASs. The variables used in the models can help explain the mechanism of the immunosuppressive activity of PFASs, allow researchers to assess the immunosuppressive potential of large numbers of PFASs more effectively, and thus better guide environmental and health risk assessment efforts.

16.
Comput Biol Med ; 179: 108859, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39029431

ABSTRACT

O-linked glycosylation is a complex post-translational modification (PTM) in human proteins that plays a critical role in regulating various cellular metabolic and signaling pathways. In contrast to N-linked glycosylation, O-linked glycosylation lacks specific sequence features and maintains an unstable core structure. Identifying O-linked threonine glycosylation sites (OTGs) remains challenging and requires extensive experimental testing. While bioinformatics tools have emerged for predicting OTGs, their reliance on a limited set of conventional features and the absence of well-defined feature selection strategies limit their effectiveness. To address these limitations, we introduced HOTGpred (Human O-linked Threonine Glycosylation predictor), which employs a multi-stage feature selection process to identify the optimal feature set for accurately identifying OTGs. Initially, we assessed 25 different feature sets derived from various pretrained protein language model (PLM)-based embeddings and conventional feature descriptors using nine classifiers. Subsequently, we integrated the top five embeddings linearly, determined the most effective scoring function for ranking hybrid features, and identified the optimal feature set through sequential forward search. Among the classifiers, the extreme gradient boosting (XGBT)-based model using the optimal feature set (HOTGpred) achieved 92.03 % accuracy on the training dataset and 88.25 % on the balanced independent dataset. Notably, HOTGpred significantly outperformed current state-of-the-art methods on both the balanced and imbalanced independent datasets, demonstrating its superior prediction capability. Additionally, SHapley Additive exPlanations (SHAP) and ablation analyses were conducted to identify the features contributing most to HOTGpred.
Finally, we developed an easy-to-navigate web server, accessible at https://balalab-skku.org/HOTGpred/, to support glycobiologists in their research on glycosylation structure and function.
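Sequential forward search grows a feature set one feature at a time. The sketch below shows the textbook greedy variant: at each step, add the single feature that most improves a user-supplied score, and stop when no addition helps. HOTGpred's procedure may differ (for example, adding features in a precomputed ranking order and keeping the best-scoring prefix), so treat this as an illustration of the general technique. The function name, the scoring callback, and the toy benefits are hypothetical; in practice the score would be a cross-validated metric of a classifier trained on the candidate subset.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def sequential_forward_search(
    n_features: int,
    score: Callable[[Sequence[int]], float],
    max_features: Optional[int] = None,
) -> Tuple[List[int], float]:
    """Greedy forward selection: add the feature that most improves `score`, stop when none does."""
    selected: List[int] = []
    best_score = float("-inf")
    remaining = set(range(n_features))
    limit = n_features if max_features is None else max_features
    while remaining and len(selected) < limit:
        # Evaluate every one-feature extension of the current subset
        candidate, candidate_score = max(
            ((j, score(selected + [j])) for j in sorted(remaining)),
            key=lambda pair: pair[1],
        )
        if candidate_score <= best_score:
            break  # no single addition improves the score
        selected.append(candidate)
        remaining.remove(candidate)
        best_score = candidate_score
    return selected, best_score


# Hypothetical score: each feature has a benefit, and every feature costs 0.5
benefit = [1.0, 0.2, 2.0, 0.7]


def toy_score(subset: Sequence[int]) -> float:
    return sum(benefit[j] - 0.5 for j in subset)


print(sequential_forward_search(len(benefit), toy_score))
```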


Subjects
Threonine , Glycosylation , Humans , Threonine/metabolism , Threonine/chemistry , Protein Processing, Post-Translational , Software , Computational Biology/methods , Databases, Protein , Proteins/chemistry , Proteins/metabolism
17.
Sci Total Environ ; 948: 174584, 2024 Oct 20.
Article in English | MEDLINE | ID: mdl-38977098

ABSTRACT

Acid-modified biochar is a modified biochar material with convenient preparation, high specific surface area, and a rich pore structure. It has great potential for heavy metal remediation, soil amendment, and catalyst support. Specific surface area (SSA), average pore size (APS), and total pore volume (TPV) are the key properties that determine its adsorption capacity, reactivity, and water holding capacity, and an intensive study of these properties is essential for optimizing biochar performance. However, the complex interactions among the preparation conditions make the optimal modification strategy difficult to find. This study collected a dataset through bibliometric analysis and used four typical machine learning models to predict the SSA, APS, and TPV of acid-modified biochar. The extreme gradient boosting (XGB) model performed best on the test set (SSA R² = 0.92, APS R² = 0.87, TPV R² = 0.96). Model interpretation revealed that the modification conditions were the major factors affecting SSA and TPV, whereas the pyrolysis conditions were the major factors affecting APS. Based on the XGB model, the modification conditions of biochar were optimized, revealing the ideal preparation conditions for producing the optimal biochar (SSA = 727.02 m²/g, APS = 5.34 nm, TPV = 0.68 cm³/g). Moreover, biochar produced under specific conditions verified the generalization ability of the XGB model (R² = 0.99, RMSE = 12.355). This study provides guidance for optimizing the preparation strategy of acid-modified biochar and promotes its potential for industrial application.
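R², RMSE, and related error metrics recur across the regression studies in this list. The sketch below computes them from paired observed and predicted values. R² here is the coefficient of determination, 1 - SS_res/SS_tot, the definition scikit-learn's `r2_score` uses, which is not always the same as a squared correlation coefficient. The function name and data are hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """MAE, MSE, RMSE, and R^2 (coefficient of determination) for paired observations."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    mse = ss_res / n
    return {
        "MAE": sum(abs(r) for r in residuals) / n,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "R2": 1 - ss_res / ss_tot,
    }


print(regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0]))
```

RMSE is reported in the target's own units (m²/g for SSA, for example), so it is comparable only across models predicting the same quantity, whereas R² is unitless.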


Subjects
Charcoal , Machine Learning , Charcoal/chemistry , Bibliometrics , Porosity , Adsorption
18.
Sci Total Environ ; 948: 174462, 2024 Oct 20.
Article in English | MEDLINE | ID: mdl-38992374

ABSTRACT

This comprehensive study assesses the global potential of microalgae as a sustainable bioenergy source. It focuses on the use of marginal lands and applies machine learning techniques to predict biomass productivity. By identifying approximately 7.37 million square kilometers of marginal land suitable for microalgae cultivation, the research reveals the extensive potential of these underutilized areas for microalgae bioenergy development, particularly in equatorial and low-latitude regions. Because it relies on marginal land, this approach reduces competition with food production and conserves freshwater supplies. Using machine learning algorithms trained on datasets from global microalgae cultivation experiments conducted between 1994 and 2017, the study integrates key environmental variables to produce a detailed projection of potential yields across a variety of landscapes. The analysis further delineates the bioenergy and carbon sequestration potential of two cultivation methods, photobioreactors (PBRs) and open ponds. PBRs showed the highest productivity, with a global average daily biomass productivity of 142.81 mg L⁻¹ d⁻¹, followed by open ponds at 122.57 mg L⁻¹ d⁻¹. Projections based on optimal PBR conditions suggest an annual yield of 99.54 gigatons of microalgae biomass. This biomass could be converted into 64.70 gigatons of biodiesel, equivalent to 58.68 gigatons of conventional diesel, while sequestering 182.16 gigatons of CO₂, approximately 4.5 times the global CO₂ emissions projected for 2023. Australia leads in microalgae biomass production, with an annual output of 16.19 gigatons. It is followed by significant contributions from Kazakhstan, Sudan, Brazil, the United States, and China, illustrating the diverse global potential for microalgae bioenergy across varied ecological and geographical landscapes.
Through this investigation, the study emphasizes the strategic importance of microalgae cultivation for sustainable energy and climate change mitigation. It also acknowledges the scalability challenges and the need for substantial economic and energy investment.
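The headline totals above are linked by conversion ratios that the abstract does not state. The sketch below back-calculates the ratios implied by the reported figures. These are derived values, not the authors' stated factors. The implied CO₂ ratio of about 1.83 t CO₂ per t of biomass matches a commonly cited estimate, but whether the study used exactly that factor is an assumption. The function name `implied_ratios` is hypothetical.

```python
# Reported totals from the abstract (gigatons per year, optimal PBR scenario).
BIOMASS_GT = 99.54
BIODIESEL_GT = 64.70
DIESEL_EQUIV_GT = 58.68
CO2_SEQUESTERED_GT = 182.16
EMISSIONS_MULTIPLE = 4.5  # "approximately 4.5 times" projected 2023 global CO2 emissions


def implied_ratios() -> dict[str, float]:
    """Back-calculate the conversion ratios implied by the reported totals.

    Hypothetical helper: the ratios are derived from the published numbers,
    not taken from the study's methods.
    """
    return {
        # Mass of biodiesel obtained per unit mass of biomass.
        "biodiesel_per_biomass": BIODIESEL_GT / BIOMASS_GT,
        # Energy-equivalent diesel per unit mass of biodiesel.
        "diesel_equiv_per_biodiesel": DIESEL_EQUIV_GT / BIODIESEL_GT,
        # CO2 fixed per unit mass of biomass.
        "co2_per_biomass": CO2_SEQUESTERED_GT / BIOMASS_GT,
        # 2023 emissions baseline implied by the "4.5 times" comparison.
        "implied_2023_emissions_gt": CO2_SEQUESTERED_GT / EMISSIONS_MULTIPLE,
    }


if __name__ == "__main__":
    for name, value in implied_ratios().items():
        print(f"{name}: {value:.3f}")
```

Run as a script, this prints roughly 0.650 for biodiesel per biomass and 0.907 for diesel equivalent per biodiesel. It also gives about 1.830 for CO₂ per biomass and an implied 2023 baseline of about 40.48 Gt CO₂.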


Subjects
Biofuels, Biomass, Carbon Sequestration, Machine Learning, Microalgae, Microalgae/growth & development
19.
Sci Rep ; 14(1): 16400, 2024 Jul 16.
Article in English | MEDLINE | ID: mdl-39013923

ABSTRACT

To further promote the application of cementitious sand gravel (CSG), this study investigated the mechanical properties of CSG under triaxial testing and how they vary. Four factors were considered: fly ash content, water-binder ratio, sand rate, and lateral confining pressure. A total of 81 cylindrical specimens were designed and cast for conventional triaxial tests. The analysis covered the stress-strain curves, failure patterns, elastic modulus, energy dissipation, and damage evolution of the specimens. As the confining pressure increased, the peak of the stress-strain curve rose, and the peak stress, peak strain, and energy dissipation all increased significantly, whereas the damage variable D decreased. Under triaxial compression, the specimens failed mainly in shear along the bonding surface, and the aggregate generally did not fracture. Sand rate had a significant effect on the peak stress of CSG, which decreased as the sand rate increased. With the same cement content, fly ash content, and confining pressure, an optimal water-binder ratio of 1.2 was found for sand rates of 0.2 and 0.3. After the triaxial stress-strain curves were analyzed and processed, a Cuckoo Search-eXtreme Gradient Boosting (CS-XGBoost) curve prediction model was established and evaluated using R², RMSE, and MAE. Across 18 different output features, the XGBoost model with initial parameters achieved an average R² of 0.8573, while the CS-XGBoost model achieved an average R² of 0.9516, an increase of 10.10%. Moreover, the predicted curves were highly consistent with the test curves, indicating that the CS algorithm offered significant advantages. The CS-XGBoost model could accurately predict the triaxial stress-strain curves of CSG.
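The abstract scores the predicted curves with R², RMSE, and MAE and reports R² averaged over 18 output features. The sketch below implements those metrics using only the standard library, assuming the usual definitions (R² = 1 − SS_res/SS_tot). The abstract does not say how the authors aggregated across features, so the unweighted mean in `mean_r2` is an assumption. The function names and the sample values are hypothetical, and the Cuckoo Search tuning itself is not shown.

```python
import math
from typing import Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> dict[str, float]:
    """Return R^2 (coefficient of determination), RMSE and MAE for one curve."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)     # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
    }


def mean_r2(curves: Sequence[tuple[Sequence[float], Sequence[float]]]) -> float:
    """Unweighted mean R^2 over several output features (one (true, pred) pair each)."""
    scores = [regression_metrics(t, p)["r2"] for t, p in curves]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Made-up values for a single output feature, for illustration only.
    measured = [1.0, 2.0, 3.0, 4.0]
    predicted = [1.1, 1.9, 3.2, 3.8]
    print(regression_metrics(measured, predicted))
```

For the made-up curve, R² is about 0.98, RMSE about 0.158, and MAE 0.15. RMSE penalizes large deviations more heavily than MAE, which is one reason the two are usually reported together.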

20.
BMC Med Inform Decis Mak ; 24(1): 199, 2024 Jul 22.
Article in English | MEDLINE | ID: mdl-39039467

ABSTRACT

OBJECTIVE: To develop and validate machine learning models for predicting coronary artery disease (CAD) in a Taiwanese cohort, with an emphasis on identifying significant predictors and comparing the performance of various models. METHODS: This study analyzed clinical, demographic, and laboratory data from 8,495 subjects in the Taiwan Biobank (TWB) after propensity score matching to address potential confounding factors. Key variables included age, gender, lipid profiles (T-CHO, HDL_C, LDL_C, TG), smoking and alcohol consumption habits, and renal and liver function markers. The performance of multiple machine learning models was evaluated. RESULTS: The cohort included 1,699 individuals with CAD, identified through self-reported questionnaires. Significant differences in demographic and clinical features were observed between CAD and non-CAD individuals. The Gradient Boosting model was the most accurate, achieving an AUC of 0.846 (95% confidence interval [CI] 0.819-0.873), a sensitivity of 0.776 (95% CI 0.732-0.820), and a specificity of 0.759 (95% CI 0.736-0.782). Its accuracy was 0.762 (95% CI 0.742-0.782). Age was the most influential predictor of CAD risk in this dataset. CONCLUSION: The Gradient Boosting model showed the best performance in predicting CAD in the Taiwanese cohort, with age as a critical predictor. These findings underscore the potential of machine learning models to improve the accuracy of CAD prediction, thereby supporting early detection and targeted intervention strategies. TRIAL REGISTRATION: Not applicable.
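The RESULTS figures rest on standard binary-classification definitions. The sketch below computes them, assuming labels coded 1 = CAD and 0 = no CAD and a fixed decision threshold. The ROC AUC is computed as the Mann-Whitney probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as half. The abstract does not describe how the 95% CIs were obtained, so intervals are not reproduced. Function names and example data are hypothetical.

```python
from typing import Sequence


def confusion_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Sensitivity, specificity and accuracy for binary labels (1 = CAD, 0 = no CAD)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),  # true-positive rate
        "specificity": tn / (tn + fp),  # true-negative rate
        "accuracy": (tp + tn) / len(pairs),
    }


def pairwise_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as P(score of a random positive > score of a random negative).

    Ties count as half a win. O(n_pos * n_neg), which is fine for a sketch.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    labels = [1, 1, 1, 0, 0, 0]
    risk = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]       # invented model scores
    preds = [1 if s >= 0.5 else 0 for s in risk]  # threshold of 0.5 (an assumption)
    print(confusion_metrics(labels, preds), pairwise_auc(labels, risk))
```

With the invented data, sensitivity, specificity, and accuracy are each 2/3, and the AUC is 8/9 (about 0.889). The AUC does not depend on the threshold, while the other three metrics do.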


Subjects
Coronary Artery Disease, Machine Learning, Humans, Taiwan, Female, Male, Middle Aged, Adult, Aged, Risk Assessment, Heart Disease Risk Factors, Algorithms, Risk Factors, Cardiovascular Diseases