Results 1 - 20 of 75
1.
J Proteome Res ; 2024 Aug 16.
Article in English | MEDLINE | ID: mdl-39150755

ABSTRACT

Given recent technological advances in proteomics, it is now possible to quantify plasma proteomes in large cohorts of patients to screen for biomarkers and to guide the early diagnosis and treatment of depression. Here we used CatBoost machine learning to model and discover biomarkers of depression in UK Biobank data sets (depression n = 4,479, healthy control n = 19,821). CatBoost was employed for model construction, with Shapley Additive Explanations (SHAP) being utilized to interpret the resulting model. Model performance was corroborated through 5-fold cross-validation, and its diagnostic efficacy was evaluated based on the area under the receiver operating characteristic (AUC) curve. A total of 45 depression-related proteins were screened based on the top 20 important features output by the CatBoost model in six data sets. Of the nine diagnostic models for depression, the performance of the traditional risk factor model was improved after the addition of proteomic data, with the best model having an average AUC of 0.764 in the test sets. KEGG pathway analysis of 45 screened proteins showed that the most significant pathway involved was the cytokine-cytokine receptor interaction. It is feasible to explore diagnostic biomarkers of depression using data-driven machine learning methods and large-scale data sets, although the results require validation.
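The evaluation protocol described above (5-fold cross-validation scored by AUC) can be sketched in plain Python. This is only an illustration under stated assumptions: the helper names `roc_auc` and `k_fold_indices` are hypothetical, the AUC is computed with the rank-based (Mann-Whitney) formulation, and the CatBoost model itself is not reproduced here.

```python
import random


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle the indices once, then yield (train, test) index lists per fold."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test
```

In use, a model would be fitted on each `train` split and `roc_auc` averaged over the five `test` splits; an average test AUC such as the 0.764 reported above is a mean of this kind.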

2.
Sci Rep ; 14(1): 18834, 2024 Aug 13.
Article in English | MEDLINE | ID: mdl-39138311

ABSTRACT

Momentum is widely believed to play a crucial role in ball games. Based on data from the 2023 Wimbledon final, this paper investigated momentum in tennis. First, we trained a decision tree regression model on reprocessed data for prediction, and established the CBRF model based on CatBoost regression and random forest regression models to obtain prediction data. Second, significant non-zero autocorrelation coefficients were found, confirming the correlation between momentum and success. Third, based on these key factors, we proposed winning strategies for the players and conducted predictive analyses for six specific time intervals of the game. Finally, applying these models to women's matches, championships, and matches on different surfaces demonstrated that the models generalize effectively.

3.
Biomimetics (Basel) ; 9(7)2024 Jun 28.
Article in English | MEDLINE | ID: mdl-39056835

ABSTRACT

The design of a safe and rational laneway support scheme is a crucial prerequisite for ensuring the safety and efficiency of mining operations. However, the conventional empirical support system for mining laneways makes it difficult to assess the rationality of support methods, which can compromise the safety and reliability of the laneways. To address this issue, the safety factor was incorporated into research on laneway support, and a safety evaluation method for laneway support based on the safety factor was established. Using data from a specific iron mine laneway in central China, the CRITIC method was employed to preprocess the sample data. A Bayesian algorithm was then used to optimize the hyperparameters of the CatBoost model, and a prediction model based on the resulting BO-CatBoost model was proposed for evaluating laneway safety factors of plain shotcrete support. Performance indexes, namely the root mean square error (RMSE), the mean absolute error (MAE), the coefficient of determination (R²), the variance accounted for (VAF), and the a-20 index, were computed to examine the predictive performance of each proposed model. Compared with the other models, the BO-CatBoost model gave the best predictions of the safety factor on the test set, with the lowest RMSE (0.5688) and MAE (0.4074), the largest R² (0.9553) and VAF (95.25%), and an appropriate a-20 index (0.9167). The BO-CatBoost model was therefore the most appropriate of the machine learning methods compared for predicting the safety factor, which provides a novel approach for optimizing laneway support design and laneway safety evaluation.
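The five performance indexes named in this abstract can be written out as a short sketch. The definitions below are the commonly used ones and are an assumption here, since the abstract does not state them: VAF is taken as (1 - Var(y - ŷ) / Var(y)) × 100, and the a-20 index as the fraction of samples whose predicted-to-measured ratio lies in [0.8, 1.2] (some papers use the inverse ratio). The function name `regression_metrics` is hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE, R^2, VAF (%) and the a-20 index for paired values.

    Assumes y_true has non-zero variance and no zero entries (needed for
    R^2, VAF and the a-20 ratio respectively).
    """
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)

    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(e) for e in errors) / n
    r2 = 1.0 - ss_res / ss_tot

    # VAF compares the variance of the residuals with the variance of the targets.
    mean_err = sum(errors) / n
    var_err = sum((e - mean_err) ** 2 for e in errors) / n
    vaf = (1.0 - var_err / (ss_tot / n)) * 100.0

    # a-20: share of predictions within +/-20 % of the measured value.
    a20 = sum(1 for t, p in zip(y_true, y_pred) if 0.8 <= p / t <= 1.2) / n
    return {"rmse": rmse, "mae": mae, "r2": r2, "vaf": vaf, "a20": a20}
```

Lower RMSE and MAE, and higher R², VAF, and a-20 values, indicate a better fit, which is the ordering used to rank the models above.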

4.
Front Artif Intell ; 7: 1401810, 2024.
Article in English | MEDLINE | ID: mdl-38887604

ABSTRACT

Introduction: Regulatory agencies generate a vast amount of textual data in the review process. For example, drug labeling serves as a valuable resource for regulatory agencies, such as the U.S. Food and Drug Administration (FDA) and the European Medicines Agency (EMA), to communicate drug safety and effectiveness information to healthcare professionals and patients. Drug labeling also serves as a resource for pharmacovigilance and drug safety research. Automated text classification would significantly improve the analysis of drug labeling documents and conserve reviewer resources. Methods: We used artificial intelligence in this study to classify drug-induced liver injury (DILI)-related content from drug labeling documents based on FDA's DILIrank dataset. We employed text mining and XGBoost models and used the Preferred Terms of Medical queries for adverse event standards to simplify the elimination of common words and phrases while retaining medical standard terms for FDA and EMA drug label datasets. Then, we constructed a document term matrix using weights computed by Term Frequency-Inverse Document Frequency (TF-IDF) for each included word/term/token. Results: The automatic text classification model exhibited robust performance in predicting DILI, achieving cross-validation AUC scores exceeding 0.90 for both drug labels from FDA and EMA and literature abstracts from the Critical Assessment of Massive Data Analysis (CAMDA). Discussion: Moreover, the text mining and XGBoost functions demonstrated in this study can be applied to other text processing and classification tasks.
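The TF-IDF document-term matrix mentioned in the Methods can be illustrated with a minimal pure-Python sketch. It assumes whitespace tokenization, term frequency as count divided by document length, and an unsmoothed inverse document frequency log(N / df); the study's actual preprocessing and weighting variant are not given in the abstract, and the function name `tfidf_matrix` is hypothetical.

```python
import math
from collections import Counter


def tfidf_matrix(documents):
    """Build a document-term matrix weighted by TF-IDF.

    Returns (vocabulary, matrix), where matrix[i][j] is the weight of
    vocabulary[j] in documents[i].
    """
    tokenized = [doc.lower().split() for doc in documents]
    vocab = sorted({tok for doc in tokenized for tok in doc})
    n_docs = len(tokenized)

    # Document frequency: number of documents containing each token.
    doc_freq = Counter(tok for doc in tokenized for tok in set(doc))
    idf = {tok: math.log(n_docs / doc_freq[tok]) for tok in vocab}

    matrix = []
    for doc in tokenized:
        counts = Counter(doc)
        total = len(doc)
        row = [(counts[tok] / total) * idf[tok] if total else 0.0 for tok in vocab]
        matrix.append(row)
    return vocab, matrix
```

With this weighting, a term that occurs in every document gets weight zero, so the ubiquitous words contribute nothing to the classifier's input while terms that are specific to a subset of labels are emphasized.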

5.
J Environ Manage ; 363: 121273, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38850918

ABSTRACT

Carbon price is a pivotal element in the carbon trading sector. Accurate estimation of carbon price can offer precise guidance for the carbon market participants. This study introduces a novel prediction model encompassing both point and interval prediction for the carbon price. Firstly, to distill the volatility traits inherent in carbon price, the successive variational mode decomposition is utilized to adaptively decompose the carbon price into regular sequences. Secondly, to obtain the optimal input variables, the partial autocorrelation function and random forest are employed to filter the influencing factors and historical carbon price. Then, to avoid single model constraint, a combination model of categorical boosting and kernel extreme learning machine optimized by the sparrow search algorithm is employed for the point prediction, and the shapley additive explanation is employed to elucidate the model prediction process. Finally, to provide more efficient information, the adaptive bandwidth kernel density estimation is applied to the interval prediction. The data from Hubei carbon market is adopted as a case study, and the results indicate that the mean absolute error, mean absolute percentage error, root mean square error and R2 of the proposed model are 0.1022, 0.0022, 0.1262 and 0.9921, respectively. The historical carbon price, Brent crude oil futures settlement price and European Union allowance futures carbon price have a positive impact on carbon price, and Hushen 300 has a negative impact on carbon price. Compared with the constant kernel density estimation, the proposed model achieves higher interval coverage probability and lower interval width. Thus, the application of the hybrid model can promote the operational efficiency of the carbon market and facilitate the implementation of carbon emission reduction policies.
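The two interval-prediction criteria compared in this abstract, coverage probability and interval width, can be sketched as follows. This assumes the usual definitions (fraction of observations falling inside their interval, and mean of upper minus lower bound); normalized width variants also exist, and the function name `interval_scores` is hypothetical.

```python
def interval_scores(y_true, lower, upper):
    """Return (coverage probability, mean interval width) for prediction intervals."""
    n = len(y_true)
    covered = sum(1 for y, lo, hi in zip(y_true, lower, upper) if lo <= y <= hi)
    coverage = covered / n
    mean_width = sum(hi - lo for lo, hi in zip(lower, upper)) / n
    return coverage, mean_width
```

A good interval forecast has high coverage at a small width, which is why the abstract reports both together.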


Subject(s)
Algorithms , Carbon , Theoretical Models , Commerce
6.
Genes (Basel) ; 15(6)2024 May 23.
Article in English | MEDLINE | ID: mdl-38927611

ABSTRACT

Protein-DNA complex interactivity plays a crucial role in biological activities such as gene expression, modification, replication and transcription. Understanding the physiological significance of protein-DNA binding interfacial hot spots, as well as the development of computational biology, depends on the precise identification of these regions. In this paper, a hot spot prediction method called EC-PDH is proposed. First, we extracted features of these hot spots' solid solvent-accessible surface area (ASA) and secondary structure, and then the mean, variance, energy and autocorrelation function values of the first three intrinsic mode functions (IMFs) of these conventional features were extracted as new features via the empirical mode decomposition (EMD) algorithm. A total of 218 dimensional features were obtained. For feature selection, we used the minimum-redundancy maximum-relevance sequential forward selection method (mRMR-SFS) to obtain an optimal 11-dimensional feature subset. To address the issue of data imbalance, we used the SMOTE-Tomek algorithm to balance positive and negative samples and finally used categorical boosting (CatBoost) to construct our hot spot prediction model for protein-DNA binding interfaces. Our method performs well on the test set, with AUC, MCC and F1 score values of 0.847, 0.543 and 0.772, respectively. In a comparative evaluation, EC-PDH outperforms the existing state-of-the-art methods in identifying hot spots.
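Two of the reported scores, the Matthews correlation coefficient (MCC) and the F1 score, are both functions of the confusion matrix. The sketch below uses the standard formulas; the function name `mcc_and_f1` and the convention of returning 0.0 when a denominator vanishes are assumptions made for illustration.

```python
import math


def mcc_and_f1(tp, fp, tn, fn):
    """Compute MCC and F1 from confusion-matrix counts.

    Returns 0.0 for a score whose denominator is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    f1_denom = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denom if f1_denom else 0.0
    return mcc, f1
```

MCC uses all four cells of the confusion matrix, which makes it a useful complement to F1 on imbalanced data such as hot spot versus non-hot-spot residues.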


Subject(s)
Algorithms , DNA , Machine Learning , DNA/genetics , DNA/chemistry , DNA/metabolism , Protein Binding , DNA-Binding Proteins/chemistry , DNA-Binding Proteins/genetics , DNA-Binding Proteins/metabolism , Computational Biology/methods , Binding Sites
7.
Sci Rep ; 14(1): 14590, 2024 Jun 25.
Article in English | MEDLINE | ID: mdl-38918511

ABSTRACT

This study explores machine learning (ML) capabilities for predicting the shear strength of reinforced concrete deep beams (RCDBs). For this purpose, eight typical machine-learning models, i.e., symbolic regression (SR), XGBoost (XGB), CatBoost (CATB), random forest (RF), LightGBM, support vector regression (SVR), artificial neural networks (ANN), and Gaussian process regression (GPR) models, are selected and compared based on a database of 840 samples with 14 input features. The hyperparameter tuning of the introduced ML models is performed using the Bayesian optimization (BO) technique. The comparison results show that the CatBoost model is the most reliable and accurate ML model (R2 = 0.997 and 0.947 in the training and testing sets, respectively). In addition, simple and practical design expressions for RCDBs have been proposed based on the SR model with a physical meaning and acceptable accuracy (an average prediction-to-test ratio of 0.935 and a standard deviation of 0.198). Meanwhile, the shear strength predicted by ML models was then compared with classical mechanics-driven shear models, including two prominent practice codes (i.e., ACI318, EC2) and two previous mechanical models, which indicated that the ML approach is highly reliable and accurate over conventional methods. In addition, a reliability-based design was conducted on two ML models, and their reliability results were compared with those of two code standards. The findings revealed that the ML models demonstrate higher reliability compared to code standards.

8.
Lipids Health Dis ; 23(1): 152, 2024 May 21.
Article in English | MEDLINE | ID: mdl-38773573

ABSTRACT

BACKGROUND: Alzheimer's disease (AD) is a chronic neurodegenerative disorder that poses a substantial economic burden. The Random forest algorithm is effective in predicting AD; however, the key factors influencing AD onset remain unclear. This study aimed to analyze the key lipoprotein and metabolite factors influencing AD onset using machine-learning methods. It provides new insights for researchers and medical personnel to understand AD and provides a reference for the early diagnosis, treatment, and early prevention of AD. METHODS: A total of 603 participants, including controls and patients with AD with complete lipoprotein and metabolite data from the Alzheimer's disease Neuroimaging Initiative (ADNI) database between 2005 and 2016, were enrolled. Random forest, Lasso regression, and CatBoost algorithms were employed to rank and filter 213 lipoprotein and metabolite variables. Variables with consistently high importance rankings from any two methods were incorporated into the models. Finally, the variables selected from the three methods, with the participants' age, sex, and marital status, were used to construct a random forest predictive model. RESULTS: Fourteen lipoprotein and metabolite variables were screened using the three methods, and 17 variables were included in the AD prediction model based on age, sex, and marital status of the participants. The optimal random forest modeling was constructed with "mtry" set to 3 and "ntree" set to 300. The model exhibited an accuracy of 71.01%, a sensitivity of 79.59%, a specificity of 65.28%, and an AUC (95%CI) of 0.724 (0.645-0.804). When Mean Decrease Accuracy and Gini were used to rank the proteins, age, phospholipids to total lipids ratio in intermediate-density lipoproteins (IDL_PL_PCT), and creatinine were among the top five variables. CONCLUSIONS: Age, IDL_PL_PCT, and creatinine levels play crucial roles in AD onset. 
Regular monitoring of lipoproteins and their metabolites in older individuals is significant for early AD diagnosis and prevention.
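The selection rule described in the Methods (keep variables that rank highly in at least two of the three methods) amounts to a simple vote over ranked lists. The sketch below is an illustration under assumptions: the cutoff `top_k`, the function name `select_by_agreement`, and the variable names in the usage note are hypothetical, and the study's actual importance thresholds are not given in the abstract.

```python
from collections import Counter


def select_by_agreement(rankings, top_k, min_votes=2):
    """Keep variables that appear in the top_k of at least min_votes rankings.

    rankings: list of lists, each ordered from most to least important.
    """
    votes = Counter()
    for ranking in rankings:
        # Each method casts at most one vote per variable.
        votes.update(set(ranking[:top_k]))
    return sorted(v for v, count in votes.items() if count >= min_votes)
```

For example, with hypothetical rankings from random forest, Lasso, and CatBoost, `select_by_agreement([rf_rank, lasso_rank, catboost_rank], top_k=30)` would return the variables shared by any two of the three top-30 lists.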


Subject(s)
Alzheimer Disease , Lipoproteins , Machine Learning , Humans , Alzheimer Disease/diagnosis , Alzheimer Disease/blood , Alzheimer Disease/metabolism , Female , Male , Aged , Lipoproteins/blood , Aged 80 and over , Algorithms , Biomarkers/blood
9.
Heliyon ; 10(9): e29497, 2024 May 15.
Article in English | MEDLINE | ID: mdl-38699007

ABSTRACT

Objective: Diabetic retinopathy is one of the major complications of diabetes. In this study, a diabetic retinopathy risk prediction model integrating machine learning models and SHAP was established to increase the accuracy of risk prediction for diabetic retinopathy, explain the rationality of the findings from model prediction and improve the reliability of prediction results. Methods: Data were preprocessed for missing values and outliers, features were selected through information gain, a diabetic retinopathy risk prediction model was established using CatBoost, and the outputs of the model were interpreted using SHAP. Results: One thousand records from the diabetes complication early warning dataset of the National Clinical Medical Sciences Data Center were used in this study. The CatBoost-based model for diabetic retinopathy prediction performed the best in the comparative model test. ALB_CR, HbA1c, UPR_24, NEPHROPATHY and SCR were positively correlated with diabetic retinopathy, while CP, HB, ALB, DBILI and CRP were negatively correlated with diabetic retinopathy. The relationships between HEIGHT, WEIGHT and ESR characteristics and diabetic retinopathy were not significant. Conclusion: The risk factors for diabetic retinopathy include poor renal function, elevated blood glucose level, liver disease, hematonosis and dysarteriotony, among others. Diabetic retinopathy can be prevented by monitoring and effectively controlling relevant indices. In this study, the relationships between the features were also analyzed to further explore the potential factors of diabetic retinopathy, which can provide new methods and new ideas for the early prevention and clinical diagnosis of subsequent diabetic retinopathy.

10.
Antimicrob Agents Chemother ; 68(7): e0026524, 2024 Jul 09.
Article in English | MEDLINE | ID: mdl-38808999

ABSTRACT

In order to predict the anti-trypanosome effect of carbazole-derived compounds by quantitative structure-activity relationship, five models were established by the linear method, random forest, radial basis kernel function support vector machine, linear combination mix-kernel function support vector machine, and nonlinear combination mix-kernel function support vector machine (NLMIX-SVM). The heuristic method and optimized CatBoost were used to select two different key descriptor sets for building linear and nonlinear models, respectively. Hyperparameters in all nonlinear models were optimized by comprehensive learning particle swarm optimization with low complexity and fast convergence. Furthermore, the models' robustness and reliability underwent rigorous assessment using fivefold and leave-one-out cross-validation, y-randomization, and statistics including concordance correlation coefficient (CCC), [Formula: see text] , [Formula: see text] , and [Formula: see text] . Among all the models, the NLMIX-SVM model, which was established by support vector regression using a nonlinear combination of radial basis kernel function, sigmoid kernel function, and linear kernel function as a new kernel function, demonstrated excellent learning and generalization abilities as well as robustness: [Formula: see text] = 0.9581, mean square error (MSE) = 0.0199 for the training set and [Formula: see text] = 0.9528, MSE = 0.0174 for the test set. [Formula: see text] , [Formula: see text] , CCC, [Formula: see text] , [Formula: see text], and [Formula: see text] are 0.9539, 0.8908, 0.9752, 0.9529, 0.9528, and 0.9633, respectively. The NLMIX-SVM method proved to be a promising way in quantitative structure-activity relationship research. In addition, molecular docking experiments were conducted to analyze the properties of new derivatives, and a new potential candidate drug molecule was ultimately found. 
In summary, this study will provide help for the design and screening of novel anti-trypanosome drugs.
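Among the validation statistics listed, the concordance correlation coefficient (CCC) has a compact closed form. The sketch below assumes Lin's definition with population (divide-by-n) variances, CCC = 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²); the function name is hypothetical, and the statistics shown as "[Formula: see text]" in the abstract are not reconstructed here.

```python
def concordance_correlation(observed, predicted):
    """Lin's concordance correlation coefficient for paired values.

    Assumes the two series are not both constant and equal (non-zero denominator).
    """
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    var_o = sum((o - mean_o) ** 2 for o in observed) / n
    var_p = sum((p - mean_p) ** 2 for p in predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted)) / n
    return 2.0 * cov / (var_o + var_p + (mean_o - mean_p) ** 2)
```

Unlike the Pearson correlation, CCC penalizes a systematic offset between predicted and observed activities, so a perfectly correlated but shifted prediction scores below 1.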


Subject(s)
Carbazoles , Quantitative Structure-Activity Relationship , Support Vector Machine , Carbazoles/pharmacology , Trypanocidal Agents/pharmacology
11.
Clin Neurol Neurosurg ; 242: 108308, 2024 07.
Article in English | MEDLINE | ID: mdl-38733759

ABSTRACT

OBJECT: This study aimed to build an effective machine learning model for predicting stroke recurrence in adult stroke patients with moyamoya disease (MMD) and to analyze the factors associated with stroke recurrence. METHODS: The data for this retrospective study came from the database of the JiangXi Province Medical Big Data Engineering & Technology Research Center. Information on MMD patients admitted to the Second Affiliated Hospital of Nanchang University from January 1st, 2007 to December 31st, 2019 was collected. The training set comprised 661 patients from January 1st, 2007 to February 28th, 2017, while the external validation set comprised 284 patients from March 1st, 2017 to December 31st, 2019. First, the characteristics of all subjects were compared between the training set and the external validation set. The key influencing variables were then screened using the Lasso regression algorithm. Models for predicting stroke recurrence in 1, 2, and 3 years after the initial stroke were built with five different machine learning algorithms, and all models were externally validated and compared. Lastly, the CatBoost model, which had the best performance, was explained using the SHapley Additive exPlanations (SHAP) interpretation model. RESULT: In total, 945 patients with MMD were included, and the recurrence rate of acute stroke in 1, 2, and 3 years after the initial stroke was 11.43% (108/945), 18.94% (179/945), and 23.17% (219/945), respectively. The CatBoost models exhibited the best prediction performance among all models; the area under the curve (AUC) of these models for predicting stroke recurrence in 1, 2, and 3 years was 0.794 (0.787, 0.801), 0.813 (0.807, 0.818), and 0.789 (0.783, 0.795), respectively.
According to the SHAP interpretation model, a high Suzuki stage, young adulthood (age 18-44 years), no surgical treatment, and the presence of an aneurysm were likely to be significantly correlated with stroke recurrence in adult stroke patients with MMD. CONCLUSION: In adult stroke patients with MMD, the CatBoost model was confirmed to be effective in stroke recurrence prediction, yielding accurate and reliable prediction outcomes. A high Suzuki stage, young adulthood (age 18-44 years), no surgical treatment, and the presence of an aneurysm are likely to be significantly correlated with stroke recurrence in adult stroke patients with MMD.


Subject(s)
Machine Learning , Moyamoya Disease , Recurrence , Stroke , Humans , Moyamoya Disease/complications , Male , Female , Adult , Middle Aged , Retrospective Studies , Risk Factors , Predictive Value of Tests , Aged
12.
BMC Geriatr ; 24(1): 472, 2024 May 30.
Article in English | MEDLINE | ID: mdl-38816811

ABSTRACT

BACKGROUND: This study aims to implement a validated prediction model and application medium for postoperative pneumonia (POP) in elderly patients with hip fractures in order to facilitate individualized intervention by clinicians. METHODS: Employing clinical data from elderly patients with hip fractures, we derived and externally validated machine learning models for predicting POP. Model derivation used a registry from Nanjing First Hospital, and external validation was performed using data from patients at the Fourth Affiliated Hospital of Nanjing Medical University. The derivation cohort was divided into a training set and a testing set. The least absolute shrinkage and selection operator (LASSO) and multivariable logistic regression were used for feature screening. We compared the performance of the models to select the best one and introduced SHapley Additive exPlanations (SHAP) to interpret it. RESULTS: The derivation and validation cohorts comprised 498 and 124 patients, with 14.3% and 10.5% POP rates, respectively. Among these models, categorical boosting (CatBoost) demonstrated superior discrimination ability. AUROC was 0.895 (95% CI: 0.841-0.949) and 0.835 (95% CI: 0.740-0.930) on the training and testing sets, respectively. At external validation, the AUROC was 0.894 (95% CI: 0.821-0.966). The SHAP method showed that CRP, the modified five-item frailty index (mFI-5), and ASA physical status were the top three predictors of POP. CONCLUSION: Our model's good early prediction ability, combined with the implementation of a network risk calculator based on the CatBoost model, is expected to effectively distinguish high-risk POP groups, facilitating timely intervention.


Subject(s)
Hip Fractures , Machine Learning , Pneumonia , Postoperative Complications , Humans , Male , Female , Machine Learning/trends , Hip Fractures/surgery , Aged , Pneumonia/diagnosis , Pneumonia/epidemiology , Pneumonia/etiology , Postoperative Complications/diagnosis , Postoperative Complications/etiology , Postoperative Complications/epidemiology , Aged 80 and over , Frailty/diagnosis , Risk Assessment/methods , Frail Elderly
13.
J Environ Manage ; 357: 120785, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38583378

ABSTRACT

Accurate air quality index (AQI) prediction is essential in environmental monitoring and management. Given that previous studies neglect the importance of uncertainty estimation and the necessity of constraining the output during prediction, we proposed a new hybrid model, namely TMSSICX, to forecast the AQI of multiple cities. Firstly, time-varying filter-based empirical mode decomposition (TVFEMD) was adopted to decompose the AQI sequence into multiple intrinsic mode function (IMF) components. Secondly, multi-scale fuzzy entropy (MFE) was applied to evaluate the complexity of each IMF component, and the components were clustered into high- and low-frequency portions. In addition, the high-frequency portion was secondarily decomposed by successive variational mode decomposition (SVMD) to reduce volatility. Then, six air pollutant concentrations, namely CO, SO2, PM2.5, PM10, O3, and NO2, were used as inputs. The secondary decomposition and preliminary portion were employed as the outputs for the bidirectional long short-term memory network optimized by the snake optimization algorithm (SOABiLSTM) and improved CatBoost (ICatboost), respectively. Furthermore, extreme gradient boosting (XGBoost) was applied to ensemble the predicted sub-models and obtain the final prediction. Ultimately, we introduced adaptive kernel density estimation (AKDE) for interval estimation. The empirical results indicated that the TMSSICX model outperformed the other 23 models across all datasets. Moreover, using XGBoost to ensemble the predicted sub-models led to an 8.73%, 8.94%, and 0.19% reduction in RMSE, compared to SVM. Additionally, using SHapley Additive exPlanations (SHAP) to assess the impact of the six pollutant concentrations on AQI revealed that PM2.5 and PM10 had the most notable positive effects on the long-term trend of AQI. We hope this model can provide guidance for air quality management.


Subject(s)
Air Pollutants , Air Pollution , Artificial Intelligence , Uncertainty , Air Pollution/analysis , Air Pollutants/analysis , Particulate Matter/analysis
14.
Surg Oncol ; 54: 102079, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38688191

ABSTRACT

INTRODUCTION: Colorectal cancer (CRC) is a global public health concern, ranking among the most commonly diagnosed malignancies worldwide. Despite advancements in treatment modalities, the specter of CRC recurrence remains a significant challenge, demanding innovative solutions for early detection and intervention. The integration of machine learning into oncology offers a promising avenue to address this issue, providing data-driven insights and personalized care. METHODS: This retrospective study analyzed data from 396 patients who underwent surgical procedures for colon cancer (CC) between 2010 and 2021. Machine learning algorithms were employed to predict CC recurrence, with a focus on demographic, clinicopathological, and laboratory characteristics. A range of evaluation metrics, including AUC (Area Under the Receiver Operating Characteristic), accuracy, recall, precision, and F1 scores, assessed the performance of machine learning algorithms. RESULTS: Significant risk factors for CC recurrence were identified, including sex, carcinoembryonic antigen (CEA) levels, tumor location, depth, lymphatic and venous invasion, and lymph node involvement. The CatBoost Classifier demonstrated exceptional performance, achieving an AUC of 0.92 and an accuracy of 88 % on the test dataset. Feature importance analysis highlighted the significance of CEA levels, albumin levels, N stage, weight, platelet count, height, neutrophil count, lymphocyte count, and gender in determining recurrence risk. DISCUSSION: The integration of machine learning into healthcare, exemplified by this study's findings, offers a pathway to personalized patient risk stratification and enhanced clinical decision-making. Early identification of individuals at risk of CC recurrence holds the potential for more effective therapeutic interventions and improved patient outcomes. 
CONCLUSION: Machine learning has the potential to revolutionize our approach to CC recurrence prediction, emphasizing the synergy between medical expertise and cutting-edge technology in the fight against cancer. This study represents a vital step toward precision medicine in CC management, showcasing the transformative power of data-driven insights in oncology.


Subject(s)
Colonic Neoplasms , Machine Learning , Local Neoplasm Recurrence , Humans , Male , Female , Local Neoplasm Recurrence/pathology , Retrospective Studies , Colonic Neoplasms/pathology , Colonic Neoplasms/surgery , Middle Aged , Aged , Follow-Up Studies , Prognosis , Risk Factors , Aged 80 and over , Adult
15.
Math Biosci Eng ; 21(2): 2943-2969, 2024 Jan 29.
Article in English | MEDLINE | ID: mdl-38454714

ABSTRACT

Cardiovascular disease (CVD) is a leading cause of mortality worldwide, and it is of utmost importance to accurately assess the risk of cardiovascular disease for prevention and intervention purposes. In recent years, machine learning has shown significant advancements in the field of cardiovascular disease risk prediction. In this context, we propose a novel framework known as CVD-OCSCatBoost, designed for the precise prediction of cardiovascular disease risk and the assessment of various risk factors. The framework utilizes Lasso regression for feature selection and incorporates an optimized category-boosting tree (CatBoost) model. Furthermore, we propose the opposition-based learning cuckoo search (OCS) algorithm. By integrating OCS with the CatBoost model, our objective is to develop OCSCatBoost, an enhanced classifier offering improved accuracy and efficiency in predicting CVD. Extensive comparisons with popular algorithms like the particle swarm optimization (PSO) algorithm, the seagull optimization algorithm (SOA), the cuckoo search algorithm (CS), K-nearest-neighbor classification, decision tree, logistic regression, grid-search support vector machine (SVM), grid-search XGBoost, default CatBoost, and grid-search CatBoost validate the efficacy of the OCSCatBoost algorithm. The experimental results demonstrate that the OCSCatBoost model achieves superior performance compared to other models, with overall accuracy, recall, and AUC values of 73.67%, 72.17%, and 0.8024, respectively. These outcomes highlight the potential of CVD-OCSCatBoost for improving cardiovascular disease risk prediction.
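The opposition-based learning idea behind the OCS algorithm can be sketched generically: for a candidate x in [lower, upper], the opposite point is lower + upper − x, and the better of the two is kept. The code below is a textbook-style illustration of opposition-based initialization only, not the authors' OCS implementation; the function names, the minimization convention, and the stand-in fitness in the usage note are all assumptions.

```python
import random


def opposite_point(x, lower, upper):
    """Mirror each coordinate of x inside its [lower, upper] bounds."""
    return [lo + hi - xi for xi, lo, hi in zip(x, lower, upper)]


def opposition_based_init(fitness, lower, upper, pop_size, seed=0):
    """Sample a random population, add the opposite points, keep the best half.

    fitness is minimized (hypothetical convention for this sketch).
    """
    rng = random.Random(seed)
    population = [
        [rng.uniform(lo, hi) for lo, hi in zip(lower, upper)]
        for _ in range(pop_size)
    ]
    candidates = population + [opposite_point(x, lower, upper) for x in population]
    candidates.sort(key=fitness)
    return candidates[:pop_size]
```

In a hyperparameter search, `fitness` would be something like a cross-validated error of a CatBoost model at the given parameter vector; here any callable works, for example `lambda x: sum(v * v for v in x)` as a stand-in.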


Subject(s)
Cardiovascular Diseases , Humans , Cardiovascular Diseases/epidemiology , Algorithms , Machine Learning , Risk Factors , Support Vector Machine
16.
Sci Rep ; 14(1): 7201, 2024 Mar 26.
Article in English | MEDLINE | ID: mdl-38532140

ABSTRACT

This study aims to explore the effects of different non-landslide sampling strategies on machine learning models in landslide susceptibility mapping. Non-landslide samples are inherently uncertain, and the selection of non-landslide samples may suffer from issues such as noisy or insufficient regional representations, which can affect the accuracy of the results. In this study, a positive-unlabeled (PU) bagging semi-supervised learning method was introduced for non-landslide sample selection. In addition, buffer control sampling (BCS) and K-means (KM) clustering were applied for comparative analysis. Based on landslide data from Qiaojia County, Yunnan Province, China, collected in 2014, three machine learning models, namely, random forest, support vector machine, and CatBoost, were used for landslide susceptibility mapping. The results show that the quality of samples selected using different non-landslide sampling strategies varies significantly. Overall, the quality of non-landslide samples selected using the PU bagging method is superior, and this method performs best when combined with CatBoost for predicting (AUC = 0.897) landslides in very high and high susceptibility zones (82.14%). Additionally, the KM results indicated overfitting, displaying high accuracy for validation but poor statistical outcomes for zoning. The BCS results were the worst.

17.
Sensors (Basel) ; 24(5)2024 Feb 25.
Article in English | MEDLINE | ID: mdl-38475028

ABSTRACT

When hyperspectral techniques are used to invert the concentrations of multiple heavy metal elements in soil, the selection of feature bands is very important. However, interactions among soil elements can lead to redundancy and instability of spectral features. In this study, heavy metal elements (Pb, Zn, Mn, and As) in entisols around a mining area in Harbin, Heilongjiang Province, China, were studied. To optimise the combination of spectral indices and their weights, radar plots of characteristic-band Pearson coefficients (RCBP) were used to screen three-band spectral index combinations for Pb, Zn, Mn, and As, while the CatBoost algorithm was used to invert the concentration of each element. The correlations of Fe with the four heavy metals were analysed from both concentration and characteristic-band perspectives, and the effect of spectral inversion was further evaluated via spatial analysis. The regression model for the inversion of the Zn concentration based on the optimised spectral index combinations had the best fit, with R² = 0.8786 for the test set, followed by Mn (R² = 0.8576), As (R² = 0.7916), and Pb (R² = 0.6022). For the characteristic bands, the best correlations of Fe with Pb, Zn, Mn, and As were 0.837, 0.711, 0.542, and 0.303, respectively. The spatial distribution and correlation of the spectrally inverted concentrations of As and Mn were consistent with the measured concentrations, whereas the results for Zn and Pb showed some differences. Therefore, hyperspectral techniques and analysis of Fe have potential applications in the inversion of heavy metal concentrations in entisols and can improve the efficiency of quality monitoring for these soils.

18.
Sci Rep ; 14(1): 3965, 2024 Feb 17.
Article in English | MEDLINE | ID: mdl-38368476

ABSTRACT

Superconductivity is a remarkable phenomenon in condensed matter physics, comprising a fascinating array of properties expected to revolutionize energy-related technologies and pertinent fundamental research. However, the field faces the challenge of achieving superconductivity at room temperature. In recent years, Artificial Intelligence (AI) approaches have emerged as a promising tool for predicting properties such as the transition temperature (Tc), enabling the rapid screening of large databases to discover new superconducting materials. This study employs SuperCon, the largest superconducting materials dataset, and applies various data pre-processing steps to derive the clean DataG dataset, containing 13,022 compounds. In another stage of the study, we apply the CatBoost algorithm to predict the transition temperatures of novel superconducting materials. In addition, we developed a package called Jabir, which generates 322 atomic descriptors. We also designed an innovative hybrid method, the Soraya package, to select the most critical features from the feature space. Together, these yield R² and RMSE values (0.952 and 6.45 K, respectively) superior to those previously reported in the literature. Finally, as a novel contribution to the field, a web application was designed for predicting and determining the Tc values of superconducting materials.
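
For readers comparing the R² and RMSE figures quoted in the abstract above, the following Python sketch shows how the two metrics are conventionally computed from measured and predicted values. It uses the standard definitions (R² as one minus the ratio of residual to total sum of squares, RMSE as the root of the mean squared error); the function names and the four made-up temperatures are assumptions for illustration and are not data from the study.

```python
import math


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root-mean-square error, in the same unit as the target (here kelvin)."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


# Made-up transition temperatures in kelvin, for illustration only
tc_true = [10.0, 20.0, 30.0, 40.0]
tc_pred = [12.0, 18.0, 33.0, 37.0]

r2 = r2_score(tc_true, tc_pred)    # 1 - 26/500, about 0.948
error_k = rmse(tc_true, tc_pred)   # sqrt(6.5), about 2.55 K
```

Because RMSE carries the unit of the target, a value such as 6.45 K is only meaningful relative to the spread of Tc in the dataset, whereas R² is unitless.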

19.
Bioresour Technol ; 397: 130501, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38417462

ABSTRACT

A robust modeling approach for predicting heavy metal removal by sulfate-reducing bacteria (SRB) is currently missing. In this study, four machine learning models were constructed and compared to predict the removal of Cd, Cu, Pb, and Zn as individual ions by SRB. The CatBoost model exhibited the best predictive performance across the four subsets, achieving R² values of 0.83, 0.91, 0.92, and 0.83 for the Cd, Cu, Pb, and Zn models, respectively. Feature analysis revealed that temperature, pH, sulfate concentration, and C/S (the mass ratio of chemical oxygen demand to sulfate) had significant impacts on the outcomes. Metal removal was most effective at 35 °C and sulfate concentrations of 1000-1200 mg/L, with variations observed in pH and C/S ratios. This study introduced a new modeling approach for predicting the treatment of metal-containing wastewater by SRB, offering guidance for optimizing operational parameters in the biological sulfidogenic process.


Subject(s)
Desulfovibrio, Heavy Metals, Cadmium, Lead, Sulfates
20.
Comput Methods Programs Biomed ; 246: 108005, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38354578

ABSTRACT

PURPOSE: This study used intelligent devices to remotely monitor patients with chronic obstructive pulmonary disease (COPD), aiming to construct and evaluate machine learning (ML) models that predict the probability of acute exacerbations of COPD (AECOPD). METHODS: Patients diagnosed with COPD Group C/D at our hospital between March 2019 and June 2021 were enrolled in this study. The diagnosis of COPD Group C/D and AECOPD was based on the GOLD 2018 guidelines. We developed a series of ML-based models, including XGBoost, LightGBM, and CatBoost, to predict AECOPD events. These models used data collected from portable spirometers and electronic stethoscopes within a five-day time window. The area under the ROC curve (AUC) was used to assess the effectiveness of the models. RESULTS: A total of 66 patients were enrolled in COPD groups C/D, with 32 in group C and 34 in group D. Using observational data within a five-day time window, the ML models effectively predicted AECOPD events, achieving high AUC scores. Among these models, the CatBoost model exhibited superior performance, with the highest AUC score (0.9721, 95% CI: 0.9623-0.9810). Notably, the boosting tree methods significantly outperformed the time-series based methods, thanks to our feature engineering efforts. A post-hoc analysis of the CatBoost model revealed that features extracted from the electronic stethoscope (e.g., max/min vibration energy) were more important than those from the portable spirometer. CONCLUSIONS: The tree-based boosting models proved effective in predicting AECOPD events in our study. Consequently, these models have the potential to enhance remote monitoring, enable early risk assessment, and inform treatment decisions for homebound patients with COPD.


Subject(s)
Chronic Obstructive Pulmonary Disease, Humans, Chronic Obstructive Pulmonary Disease/diagnosis, Risk Assessment, Machine Learning, Disease Progression
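
Several of the records above, including the AECOPD study, report the area under the ROC curve. As a reminder of what that number measures, the Python sketch below computes AUC directly from its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name and the four toy scores are assumptions for illustration; the sketch does not reproduce any study's evaluation pipeline or its confidence-interval procedure.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy example: 1 = exacerbation within the window, 0 = no exacerbation
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)  # 3 of the 4 pairs are ordered correctly: 0.75
```

The pairwise loop is quadratic in the number of cases, which is acceptable for small cohorts; larger data sets would normally use a sort-based implementation.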