Results 1 - 20 of 43
1.
Mol Divers ; 28(4): 2153-2161, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38554168

ABSTRACT

Cancer is the second leading cause of death globally, so the development of effective anticancer treatments is crucial in medicine. Anticancer peptides (ACPs) have shown promising therapeutic potential in cancer treatment compared with traditional methods. However, identifying ACPs experimentally is often time-intensive and expensive. To overcome this issue, we employed, for the first time, a machine learning-based approach to develop an anticancer model using small molecules. Anticancer small molecules (ACSMs) are compounds that have been developed to target and inhibit cancer cells. In this study, we used 10,000 compounds to develop machine learning models with five algorithms: Random Forest (RF), Light Gradient Boosting Machine (LightGBM), K-nearest neighbors (KNN), Decision Tree (DT) and Extreme Gradient Boosting (XGB). The models were evaluated on the test set, and the top three (RF, LightGBM and XGB) were identified. Furthermore, to validate the predictive performance of our models, we performed external validation using FDA-approved anticancer compounds/drugs. In this analysis, the LightGBM model correctly predicted 9 of 10 compounds as active, whereas RF and XGB predicted 8 and 7 of 10 as active, respectively. These results indicate that, compared with RF and XGB, the LightGBM model shows more robust prediction capabilities, achieving an accuracy of 79% with an AUC of 0.88. These findings provide promising insights into the potential of our approach for predicting anticancer small molecules, highlighting the role of machine learning in advancing cancer treatment research.


Subject(s)
Algorithms , Antineoplastic Agents , Machine Learning , Antineoplastic Agents/pharmacology , Antineoplastic Agents/chemistry , Humans , Small Molecule Libraries/pharmacology , Small Molecule Libraries/chemistry , Neoplasms/drug therapy , Drug Discovery/methods
2.
Graefes Arch Clin Exp Ophthalmol ; 262(1): 203-210, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37773288

ABSTRACT

PURPOSE: To develop a machine learning model to evaluate the activity stage of extraocular muscles in thyroid-associated ophthalmopathy (TAO). METHODS: This study retrospectively analysed data from patients with TAO who underwent contrast-enhanced magnetic resonance imaging (MRI) from 2015 to 2022. Three independent machine learning models, namely, extreme gradient boosting (XGBoost), light gradient boosting machine (LightGBM), and deep neural networks (DNNs), were constructed using common clinical features. The performance of these models was compared using evaluation metrics such as the area under the receiver operating characteristic curve (AUC), accuracy, precision, recall, and F1 score. The importance of features was explained using Shapley additive explanations (SHAP). RESULTS: A total of 2561 eyes of 1479 TAO patients were included in this study. The original dataset was randomly divided into a training set (80%, n = 2048) and a test set (20%, n = 513). In the performance evaluation of the test set, the LightGBM model had the best diagnostic performance (AUC 0.9260). According to the SHAP results, features such as conjunctival congestion, swollen caruncles, oedema of the upper eyelid, course of TAO, and intraocular pressure had the most significant impact on the LightGBM model. CONCLUSION: This study used contrast-enhanced MRI as an objective evaluation criterion and constructed a LightGBM model based on readily accessible clinical data. The model had good classification performance, making it a promising artificial intelligence (AI)-assisted tool to help community hospitals evaluate the inflammatory activity of extraocular muscles in TAO patients in a timely manner.


Subject(s)
Graves Ophthalmopathy , Humans , Graves Ophthalmopathy/diagnosis , Oculomotor Muscles , Artificial Intelligence , Retrospective Studies , Neural Networks, Computer , Eyelids
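
The 80/20 split and the evaluation metrics named in this abstract can be illustrated with a minimal sketch. The reported counts (2048 + 513 = 2561) suggest the split was made per eye, so the sketch shuffles eye indices; the seed, the function names and the floor rounding are assumptions made for illustration, not details from the study.

```python
import random


def split_indices(n_samples, train_fraction=0.8, seed=0):
    """Shuffle sample indices and cut them into train and test parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = int(n_samples * train_fraction)  # floor rounding: 2561 -> 2048
    return indices[:n_train], indices[n_train:]


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


train_idx, test_idx = split_indices(2561)
print(len(train_idx), len(test_idx))
```

With 2561 samples the floor of 0.8 × 2561 gives 2048 training and 513 test samples, matching the counts in the abstract.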
3.
BMC Bioinformatics ; 24(1): 129, 2023 Apr 04.
Article in English | MEDLINE | ID: mdl-37016308

ABSTRACT

BACKGROUND: Identification of hot spots in protein-DNA binding interfaces is extremely important for understanding the underlying mechanisms of protein-DNA interactions and for drug design. Experimental methods for identifying hot spots are time-consuming and expensive, and most existing computational methods predict hot spots from traditional protein-DNA features and do not make full use of the information those features contain. RESULTS: In this work, a method named WTL-PDH is proposed for hot spot prediction. To deal with the unbalanced dataset, we used the Synthetic Minority Over-sampling Technique to generate minority class samples and balance the dataset. First, we extracted the solvent accessible surface area features and structural features, and then processed these traditional features using the discrete wavelet transform and the wavelet packet transform to extract wavelet energy and wavelet entropy information, obtaining a total of 175 feature dimensions. To obtain the best feature subset, we systematically evaluated these features under various feature selection strategies. Finally, light gradient boosting machine (LightGBM) was used to establish the model. CONCLUSIONS: Our method achieved good results on the independent test set, with AUC, MCC and F1 scores of 0.838, 0.533 and 0.750, respectively. WTL-PDH generally achieves better performance in predicting hot spots than state-of-the-art methods. The dataset and source code are available at https://github.com/chase2555/WTL-PDH .


Subject(s)
Software , Wavelet Analysis , Models, Molecular , Databases, Protein , Protein Binding , Algorithms
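
The idea of turning a feature sequence into wavelet energy and wavelet entropy values can be sketched as follows. This is only an illustration under stated assumptions: it uses the Haar wavelet, two decomposition levels, relative sub-band energies and Shannon entropy with the natural logarithm. The abstract does not name the wavelet family, the number of levels or the exact definitions, and the wavelet packet transform is not reproduced here.

```python
import math


def haar_dwt(signal):
    """One level of the orthonormal Haar transform (even-length input)."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail


def wavelet_energy_entropy(signal, levels=2):
    """Relative energy per sub-band and the Shannon entropy of those energies.

    The signal length must be divisible by 2**levels and the signal must
    not be all zeros.
    """
    bands = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        bands.append(detail)          # detail coefficients of this level
    bands.append(approx)              # final approximation coefficients
    energies = [sum(c * c for c in band) for band in bands]
    total = sum(energies)
    relative = [e / total for e in energies]
    entropy = -sum(p * math.log(p) for p in relative if p > 0)
    return relative, entropy


relative, entropy = wavelet_energy_entropy([1.0, -1.0, 1.0, -1.0])
print(relative, entropy)
```

A rapidly alternating input places all of its energy in the first detail band, while a constant input places it in the final approximation band; in both cases the entropy is zero because a single band holds all the energy.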
4.
Brief Bioinform ; 22(6)2021 11 05.
Article in English | MEDLINE | ID: mdl-34245238

ABSTRACT

In this paper, for accurate prediction of protein-protein interaction (PPI), a novel hybrid classifier is developed by combining the functional-link Siamese neural network (FSNN) with the light gradient boosting machine (LGBM) classifier. The hybrid classifier (FSNN-LGBM) uses the fusion of features derived using pseudo amino acid composition and conjoint triad descriptors. The FSNN extracts the high-level abstraction features from the raw features and LGBM performs the PPI prediction task using these abstraction features. On performing 5-fold cross-validation experiments, the proposed hybrid classifier provides average accuracies of 98.70 and 98.38%, respectively, on the intraspecies PPI data sets of Saccharomyces cerevisiae and Helicobacter pylori. Similarly, the average accuracies for the interspecies PPI data sets of the Human-Bacillus and Human-Yersinia data sets are 98.52 and 97.40%, respectively. Compared with the existing methods, the hybrid classifier achieves higher prediction accuracy on the independent test sets and network data sets. The improved prediction performance obtained by the FSNN-LGBM makes it a flexible and effective PPI prediction model.


Subject(s)
Algorithms , Computational Biology/methods , Neural Networks, Computer , Protein Interaction Mapping/methods , Amino Acids , Databases, Genetic , Humans , Machine Learning , Proteins/chemistry , Proteins/metabolism , Reproducibility of Results
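
The conjoint triad descriptor mentioned in this abstract can be sketched briefly. The sketch assumes the seven-class amino acid grouping commonly used for this descriptor and a min-max normalisation of the 343 triad counts; neither detail is given in the abstract, and the names `GROUPS` and `conjoint_triad` are illustrative.

```python
from itertools import product

# Assumed seven-class grouping of the 20 standard amino acids.
GROUPS = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
CLASS_OF = {aa: idx for idx, group in enumerate(GROUPS) for aa in group}


def conjoint_triad(sequence):
    """Return a 343-dimensional vector of normalised class-triad counts."""
    counts = {triad: 0 for triad in product(range(7), repeat=3)}
    # Non-standard residues are skipped, so windows may span the gap.
    classes = [CLASS_OF[aa] for aa in sequence if aa in CLASS_OF]
    for i in range(len(classes) - 2):
        counts[tuple(classes[i:i + 3])] += 1
    values = list(counts.values())
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


vector = conjoint_triad("MKVLAAGIVGDE")
print(len(vector))
```

Each protein in a pair would be encoded this way and the two vectors combined with other descriptors before classification; how the study fuses them with pseudo amino acid composition is not described in enough detail to reproduce here.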
5.
Ren Fail ; 45(2): 2251597, 2023.
Article in English | MEDLINE | ID: mdl-37724550

ABSTRACT

BACKGROUND: Established prognostic models of idiopathic membranous nephropathy (IMN) were limited to traditional modeling methods and did not comprehensively consider clinical and pathological patient data. Based on the electronic medical record (EMR) system, machine learning (ML) was used to construct a risk prediction model for the prognosis of IMN. METHODS: The study included 418 patients with IMN diagnosed by renal biopsy at the Fifth Clinical Medical College of Shanxi Medical University. Fifty-nine medical features of the patients were obtained from the EMR, and prediction models were established based on five ML algorithms. The area under the curve (AUC), recall, accuracy, and F1 were used to evaluate and compare the performance of the models. Shapley additive explanation (SHAP) was used to explain the results of the best-performing model. RESULTS: One hundred and seventeen patients (28.0%) with IMN experienced adverse events: 28 had compound outcomes (ESRD or doubling of serum creatinine (SCr)), and 89 had relapsed. The light gradient boosting machine (LightGBM) model had the best performance, with the highest AUC (0.892 ± 0.052, 95% CI 0.840-0.945), accuracy (0.909 ± 0.016), recall (0.741 ± 0.092), precision (0.906 ± 0.027), and F1 (0.905 ± 0.020). Recursive feature elimination with random forest and SHAP plots based on LightGBM showed that anti-phospholipase A2 receptor (anti-PLA2R), immunohistochemical immunoglobulin G4 (IHC IgG4), D-dimer (D-DIMER), triglyceride (TG), serum albumin (ALB), aspartate transaminase (AST), β2-microglobulin (BMG), SCr, and fasting plasma glucose (FPG) were important risk factors for the prognosis of IMN. Increased risk of adverse events in IMN patients was correlated with high anti-PLA2R and low IHC IgG4. CONCLUSIONS: This study established a risk prediction model for the prognosis of IMN using ML based on clinical and pathological patient data. The LightGBM model may become a tool for personalized management of IMN patients.


Subject(s)
Glomerulonephritis, Membranous , Humans , Prognosis , Glomerulonephritis, Membranous/diagnosis , Algorithms , Immunoglobulin G , Machine Learning
6.
Proteins ; 90(2): 443-454, 2022 02.
Article in English | MEDLINE | ID: mdl-34528291

ABSTRACT

Feature fusion and selection strategies have been applied to improve accuracy in the prediction of protein-protein interaction (PPI). In this paper, an embedded feature selection framework, termed AVPSO, is developed by integrating a cost function based on analysis of variance (ANOVA) with particle swarm optimization (PSO). Initially, the features of the protein sequences extracted using pseudo-amino acid composition (PseAAC), conjoint triad composition, and local descriptor are fused. Then, AVPSO is employed to select the optimal set of features. The light gradient boosting machine (LGBM) classifier is used to predict the PPIs using the optimal feature subset. In five-fold cross-validation, the proposed model (AVPSO-LGBM) achieved average accuracies of 97.12% and 95.09%, respectively, on the intraspecies PPI datasets Saccharomyces cerevisiae and Helicobacter pylori. On the interspecies PPI datasets Human-Bacillus and Human-Yersinia, average accuracies of 95.20% and 93.44%, respectively, were achieved. Results obtained on independent test datasets and network datasets show that the prediction accuracy of AVPSO-LGBM is better than that of existing methods, demonstrating its generalization ability. The improved prediction performance obtained by the proposed model makes it a reliable and effective PPI prediction model.


Subject(s)
Bacteria/metabolism , Computational Biology/methods , Protein Interaction Mapping , Proteins/metabolism , Humans , Machine Learning , Protein Binding
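
The ANOVA ingredient of the feature selection described above can be illustrated with a one-way F statistic computed for a single feature. This is a sketch of the standard statistic only; how AVPSO folds it into the particle swarm cost function is not specified in the abstract, and the function name `anova_f` is illustrative.

```python
def anova_f(values, labels):
    """One-way ANOVA F statistic of one feature across class labels."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    n = len(values)
    k = len(groups)                      # needs k >= 2 and n > k
    grand_mean = sum(values) / n
    means = {label: sum(g) / len(g) for label, g in groups.items()}
    # Variation of the class means around the grand mean.
    ss_between = sum(len(g) * (means[label] - grand_mean) ** 2
                     for label, g in groups.items())
    # Variation of the samples around their own class mean.
    ss_within = sum((v - means[label]) ** 2
                    for label, g in groups.items() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


score = anova_f([1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1])
print(score)
```

A larger F value means the feature separates interacting from non-interacting pairs more clearly relative to its within-class spread, which is why such a score is a natural term in a feature selection objective.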
7.
J Transl Med ; 20(1): 143, 2022 03 26.
Article in English | MEDLINE | ID: mdl-35346252

ABSTRACT

BACKGROUND: Established prediction models of diabetic kidney disease (DKD) are limited to the analysis of clinical research data or general population data and do not consider hospital visits. This study aimed to construct a 3-year DKD risk prediction model in patients with type 2 diabetes mellitus (T2DM) using machine learning, based on electronic medical records (EMR). METHODS: Data were drawn from 816 patients (585 males) with T2DM and 3 years of follow-up at the PLA General Hospital. Forty-six medical characteristics that are readily available from the EMR were used to develop prediction models based on seven machine learning algorithms (light gradient boosting machine [LightGBM], eXtreme gradient boosting, adaptive boosting, artificial neural network, decision tree, support vector machine, logistic regression). Model performance was evaluated using the area under the receiver operating characteristic curve (AUC). Shapley additive explanation (SHAP) was used to interpret the results of the best-performing model. RESULTS: The LightGBM model had the highest AUC (0.815, 95% CI 0.747-0.882). Recursive feature elimination with random forest and a SHAP plot based on LightGBM showed that older patients with T2DM with high homocysteine (Hcy), poor glycemic control, low serum albumin (ALB), low estimated glomerular filtration rate (eGFR), and high bicarbonate had an increased risk of developing DKD over the next 3 years. CONCLUSIONS: This study constructed a 3-year DKD risk prediction model in patients with T2DM and normo-albuminuria using machine learning and EMR. The LightGBM model is a tool with potential to facilitate population management strategies for T2DM care in the EMR era.


Subject(s)
Diabetes Mellitus, Type 2 , Diabetic Nephropathies , Diabetes Mellitus, Type 2/complications , Diabetes Mellitus, Type 2/epidemiology , Diabetic Nephropathies/epidemiology , Electronic Health Records , Humans , Logistic Models , Machine Learning , Male
8.
Biol Pharm Bull ; 45(8): 1142-1157, 2022 Aug 01.
Article in English | MEDLINE | ID: mdl-35644566

ABSTRACT

A system for predicting apparent bidirectional permeability (Papp) across Caco-2 cells of diverse chemicals has been reported. The present study aimed to investigate the relationship between in silico-generated Papp (from apical to basal side, Papp A to B) for 301 substances with diverse structures and a binary classification of the reported roles of efflux P-glycoprotein or breast cancer resistance protein. The in silico log(Papp A to B/Papp B to A) values of 70 substances with reported active efflux and 231 substances with no reported active efflux were significantly different (p < 0.01). The probabilities of active efflux transport estimated by trivariate analysis with log MW, log DpH 6.0, and log DpH 7.4 for the 70 active-efflux-positive compounds were higher than those of the other 231 substances (p < 0.01); the area under the corresponding receiver operating characteristic (ROC) curve was 0.81. Further probability values estimated using a machine learning algorithm with 30 chemical descriptors as inputs yielded an area under the ROC curve of 0.79. Using a secondary set of 52 efflux-positive and 48 efflux-negative medicines, the final trivariate-generated probabilities resulted in no significant differences between these binary groups (p = 0.09); however, the final machine learning model demonstrated a good area under the ROC curve of 0.79. Consequently, a combination of the previously established system for generating the permeability coefficients across intestinal monolayers (a continuous variable) and the currently proposed system for predicting the roles of additional active efflux (a binary classification) could prove useful; high accuracy was achieved by applying machine learning using in silico-generated chemical descriptors.


Subject(s)
Machine Learning , Membrane Transport Proteins , Biological Transport , Caco-2 Cells , Humans , Linear Models , Membrane Transport Proteins/metabolism , Permeability
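
The area under the ROC curve reported above has a direct interpretation: it is the probability that a randomly chosen efflux-positive compound receives a higher predicted probability than a randomly chosen efflux-negative one, with ties counted as one half. A minimal sketch of that pairwise definition follows; the scores are made-up illustrative values, not data from the study.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0      # positive ranked above negative
            elif p == q:
                wins += 0.5      # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical predicted efflux probabilities and known labels.
auc = roc_auc([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0])
print(auc)
```

The pairwise form is quadratic in the number of samples, which is acceptable for sets of a few hundred compounds such as the 301 and 100 substances described here.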
9.
BMC Bioinformatics ; 22(1): 358, 2021 Jul 02.
Article in English | MEDLINE | ID: mdl-34215183

ABSTRACT

BACKGROUND: A growing body of research has shown that microRNAs (miRNAs) can regulate the function of target genes and are closely related to various diseases. Developing computational methods to uncover more potential miRNA-disease associations can provide clues for further functional research. RESULTS: Inspired by previous work, we find that noise hidden in the data can affect prediction performance and propose an anti-noise algorithm (ANMDA) to predict potential miRNA-disease associations. First, we calculate the similarity among miRNAs and among diseases to construct features, and obtain positive samples from the Human MicroRNA Disease Database version 2.0 (HMDD v2.0). Then, we apply k-means to the undetected miRNA-disease associations and sample negative examples equally from the k clusters. Next, we construct several data subsets through sampling with replacement and feed them into the light gradient boosting machine (LightGBM) method. Finally, a voting method is applied to predict potential miRNA-disease relationships. As a result, ANMDA achieves an area under the receiver operating characteristic curve (AUROC) of 0.9373 ± 0.0005 in five-fold cross-validation, which is superior to several published methods. In addition, in a case study we analyze the miRNA-disease associations predicted with high probability and compare them with the data in HMDD v3.0. The results show that ANMDA is a novel and practical algorithm that can be used to infer potential miRNA-disease associations. CONCLUSION: The results indicate that noise hidden in the data has an obvious impact on predicting potential miRNA-disease associations. We believe ANMDA can achieve better results on this task as more methods for dealing with data noise are incorporated.


Subject(s)
MicroRNAs , Algorithms , Area Under Curve , Computational Biology , Genetic Predisposition to Disease , Humans , MicroRNAs/metabolism , ROC Curve
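
The sampling and voting steps described in this abstract can be sketched without the clustering or the classifier themselves. The sketch assumes cluster labels have already been assigned to the unlabelled miRNA-disease pairs (for example by k-means) and that each trained model emits a 0/1 vote; all function names and the toy data are illustrative, not taken from ANMDA.

```python
import random


def sample_negatives_equally(cluster_of, n_total, rng):
    """Draw about the same number of negatives from every cluster.

    cluster_of maps a candidate pair id to its cluster label.
    """
    by_cluster = {}
    for item, cluster in cluster_of.items():
        by_cluster.setdefault(cluster, []).append(item)
    per_cluster = n_total // len(by_cluster)
    chosen = []
    for cluster in sorted(by_cluster):
        members = by_cluster[cluster]
        chosen.extend(rng.sample(members, min(per_cluster, len(members))))
    return chosen


def bootstrap_subsets(samples, n_subsets, rng):
    """Build training subsets by sampling with replacement."""
    return [[rng.choice(samples) for _ in samples] for _ in range(n_subsets)]


def majority_vote(votes):
    """Combine 0/1 votes from the individual models (ties give 0)."""
    return int(2 * sum(votes) > len(votes))


rng = random.Random(0)
clusters = {pair_id: pair_id % 3 for pair_id in range(30)}  # toy labels
negatives = sample_negatives_equally(clusters, 6, rng)
subsets = bootstrap_subsets(negatives, 4, rng)
print(negatives, majority_vote([1, 1, 0]))
```

Drawing negatives evenly across clusters keeps any one region of the unlabelled space from dominating the negative set, and training one model per bootstrap subset lets the vote average out the effect of mislabelled negatives.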
10.
Brief Bioinform ; 20(6): 2185-2199, 2019 11 27.
Article in English | MEDLINE | ID: mdl-30351377

ABSTRACT

As a newly discovered post-translational modification (PTM), lysine malonylation (Kmal) regulates a myriad of cellular processes from prokaryotes to eukaryotes and has important implications in human diseases. Despite its functional significance, computational methods to accurately identify malonylation sites are still lacking and urgently needed. In particular, there is currently no comprehensive analysis and assessment of different features and machine learning (ML) methods that are required for constructing the necessary prediction models. Here, we review, analyze and compare 11 different feature encoding methods, with the goal of extracting key patterns and characteristics from residue sequences of Kmal sites. We identify optimized feature sets, with which four commonly used ML methods (random forest, support vector machines, K-nearest neighbor and logistic regression) and one recently proposed [Light Gradient Boosting Machine (LightGBM)] are trained on data from three species, namely, Escherichia coli, Mus musculus and Homo sapiens, and compared using randomized 10-fold cross-validation tests. We show that integration of the single method-based models through ensemble learning further improves the prediction performance and model robustness on the independent test. When compared to the existing state-of-the-art predictor, MaloPred, the optimal ensemble models were more accurate for all three species (AUC: 0.930, 0.923 and 0.944 for E. coli, M. musculus and H. sapiens, respectively). Using the ensemble models, we developed an accessible online predictor, kmal-sp, available at http://kmalsp.erc.monash.edu/. 
We hope that this comprehensive survey and the proposed strategy for building more accurate models can serve as a useful guide for inspiring future developments of computational methods for PTM site prediction, expedite the discovery of new malonylation and other PTM types and facilitate hypothesis-driven experimental validation of novel malonylated substrates and malonylation sites.


Subject(s)
Computational Biology , Lysine/metabolism , Machine Learning , Malonates/metabolism , Animals , Humans
11.
Sensors (Basel) ; 21(24)2021 Dec 08.
Article in English | MEDLINE | ID: mdl-34960287

ABSTRACT

High-throughput, nondestructive, and precise measurement of seeds is critical for the evaluation of seed quality and the improvement of agricultural production. To this end, we have developed a novel end-to-end platform named HyperSeed to provide hyperspectral information for seeds. As a test case, the hyperspectral images of rice seeds are obtained from a high-performance line-scan image spectrograph covering the spectral range from 600 to 1700 nm. The acquired images are processed via graphical user interface (GUI)-based open-source software for background removal and seed segmentation. The output is generated in the form of a hyperspectral cube and curve for each seed. In our experiment, we presented the visual results of seed segmentation on different seed species. Moreover, we conducted a classification of seeds raised in heat stress and control environments using both traditional machine learning models and neural network models. The results show that the proposed 3D convolutional neural network (3D CNN) model has the highest accuracy, which is 97.5% in seed-based classification and 94.21% in pixel-based classification, compared to 80.0% in seed-based classification and 85.67% in pixel-based classification from the support vector machine (SVM) model. Moreover, our pipeline enables systematic analysis of spectral curves and identification of wavelengths of biological interest.


Subject(s)
Neural Networks, Computer , Oryza , Spectrum Analysis , Support Vector Machine
12.
Sensors (Basel) ; 21(17)2021 Aug 31.
Article in English | MEDLINE | ID: mdl-34502747

ABSTRACT

Sign language is designed to help the deaf and hard of hearing community convey messages and connect with society. Sign language recognition has been an important domain of research for a long time. Previously, sensor-based approaches have obtained higher accuracy than vision-based approaches. Because vision-based approaches are cost-effective, research has also been conducted on them despite the drop in accuracy. The purpose of this research is to recognize American sign characters using hand images obtained from a web camera. In this work, the MediaPipe Hands algorithm was used to estimate hand joints from RGB images of hands obtained from a web camera, and two types of features were generated from the estimated joint coordinates for classification: the distances between the joint points, and the angles between vectors and the 3D axes. The classifiers used to classify the characters were support vector machine (SVM) and light gradient boosting machine (GBM). Three character datasets were used for recognition: the ASL Alphabet dataset, the Massey dataset, and the Finger Spelling A dataset. The accuracies obtained were 99.39% for the Massey dataset, 87.60% for the ASL Alphabet dataset, and 98.45% for the Finger Spelling A dataset. The proposed design for automatic American sign language recognition is cost-effective, computationally inexpensive, does not require any special sensors or devices, and has outperformed previous studies.


Subject(s)
Hand , Sign Language , Algorithms , Fingers , Humans , Recognition, Psychology , United States
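
The two feature types described above, distances between joint points and angles between joint-to-joint vectors and the 3D axes, can be sketched from a list of (x, y, z) joint coordinates. The sketch assumes that every pair of joints is used; the abstract does not say which pairs the authors selected, and the function names are illustrative.

```python
import math
from itertools import combinations


def joint_distances(joints):
    """Euclidean distance between every pair of joints."""
    return [math.dist(a, b) for a, b in combinations(joints, 2)]


def axis_angles(joints):
    """Angles (radians) between each joint-to-joint vector and x, y, z."""
    angles = []
    for a, b in combinations(joints, 2):
        vector = [b[i] - a[i] for i in range(3)]
        norm = math.sqrt(sum(c * c for c in vector))
        for component in vector:
            if norm == 0:
                angles.append(0.0)       # coincident joints: no direction
            else:
                cosine = max(-1.0, min(1.0, component / norm))
                angles.append(math.acos(cosine))
    return angles


# Three toy joints; a real hand estimate would supply 21 landmarks.
toy_joints = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
distances = joint_distances(toy_joints)
angles = axis_angles(toy_joints)
print(distances, angles[:3])
```

With 21 hand landmarks, using all pairs would give 210 distances and 630 angles per image, which would then be passed to the SVM or gradient boosting classifier.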
13.
Micromachines (Basel) ; 15(2)2024 Jan 30.
Article in English | MEDLINE | ID: mdl-38398939

ABSTRACT

Detecting inclusions in materials at small scales is of high importance to ensure the quality, structural integrity and performance efficiency of microelectromechanical machines and products. Ultrasound waves are commonly used as a non-destructive method to find inclusions or structural flaws in a material. Mathematical continuum models can be used to enable ultrasound techniques to provide quantitative information about the change in the mechanical properties due to the presence of inclusions. In this paper, a nonlocal size-dependent poroelasticity model integrated with machine learning is developed for the description of the mechanical behaviour of spherical inclusions under uniform radial compression. The scale effects on fluid pressure and radial displacement are captured using Eringen's theory of nonlocality. The conservation of mass law is utilised for both the solid matrix and fluid content of the poroelastic material to derive the storage equation. The governing differential equations are derived by decoupling the equilibrium equation and effective stress-strain relations in the spherical coordinate system. An accurate numerical solution is obtained using the Galerkin discretisation technique and a precise integration method. A Dormand-Prince solution is also developed for comparison purposes. A light gradient boosting machine learning model in conjunction with the nonlocal model is used to extract the pattern of changes in the mechanical response of the poroelastic inclusion. The optimised hyperparameters are calculated by a grid search cross validation. The modelling estimation power is enhanced by considering nonlocal effects and applying machine learning processes, facilitating the detection of ultrasmall inclusions within a poroelastic medium at micro/nanoscales.

14.
J Contam Hydrol ; 261: 104300, 2024 02.
Article in English | MEDLINE | ID: mdl-38242063

ABSTRACT

Long-term agricultural activities have affected the sustainable development of groundwater in the Northern Anhui Plain, East China. It is, therefore, important to identify areas at high groundwater pollution risk in the Northern Anhui Plain to ensure effective protection of regional water resources. In this study, 60 groundwater samples were collected from the shallow aquifer of the plain and analyzed for nitrate (NO3-) concentrations. In addition, 10 environmental and geological factors including the elevations, distances-to-rivers, slope angles, orientations of slopes, land cover types, topographic wetness index (TWI), geomorphology, lithology, soil types, and precipitation amounts in the study area were selected as input layers. The light gradient boosting machine (LightGBM) and random forest (RF) algorithms, combined with the geographic information system (GIS), were performed to generate the groundwater pollution occurrence probability maps. The descriptive statistics showed that the NO3- concentrations in the shallow groundwater ranged from 4.3 to 73.6 mg/L. Most sampling wells exhibited NO3- concentrations above the threshold of 18.3 mg/L. The prediction results of the LightGBM and RF algorithms indicated a high groundwater NO3- pollution risk in the southern part of the plain. However, the LightGBM algorithm had a better prediction performance than RF, with a higher Kappa value of 0.84. Moreover, the frequency ratio method revealed that the precipitation amounts contributed to the groundwater NO3- pollution risk in the study area by 38.14%, followed by the elevations, slope angles, TWI, land cover types, and slope aspects, with contributions of 21.4, 13.02, 8.37, 7.44, and 6.51%, respectively. In the future, sampling of additional wells and further anthropogenic factors shall be considered for the development of more effective groundwater nitrate pollution prevention strategies provided to decision makers.


Subject(s)
Groundwater , Water Pollutants, Chemical , Nitrates/analysis , Geographic Information Systems , Environmental Monitoring/methods , Water Pollutants, Chemical/analysis , China , Risk Assessment , Machine Learning
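
The Kappa value used above to compare LightGBM and RF is Cohen's kappa, which measures agreement between predicted and observed classes after removing the agreement expected by chance. A minimal sketch follows; the toy labels are illustrative and not taken from the study.

```python
from collections import Counter


def cohens_kappa(y_true, y_pred):
    """Cohen's kappa: (observed - chance agreement) / (1 - chance)."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement from the marginal class frequencies.
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    # Undefined when expected == 1 (both label sets are one constant class).
    return (observed - expected) / (1 - expected)


# 1 = nitrate above threshold, 0 = below (hypothetical wells).
kappa = cohens_kappa([1, 1, 0, 0], [1, 1, 0, 1])
print(kappa)
```

Kappa is useful here because most sampled wells exceeded the nitrate threshold, so plain accuracy could look high for a model that simply predicts the majority class.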
15.
Ying Yong Sheng Tai Xue Bao ; 35(5): 1321-1330, 2024 May.
Article in Chinese | MEDLINE | ID: mdl-38886431

ABSTRACT

Rapid acquisition of soil moisture content (SMC) and soil organic matter (SOM) content data is crucial for the improvement and utilization of saline-alkali farmland soil. Based on field measurements of hyperspectral reflectance and soil properties of farmland soil in the Hetao Plain, we used the competitive adaptive reweighted sampling (CARS) algorithm to screen sensitive bands after transforming the original spectral reflectance (Ref) with the standard normal variate (SNV). Three input-variable strategies were modelled: Ⅰ, Ref; Ⅱ, Ref-SNV; and Ⅲ, Ref-SNV plus soil covariates (SC) and a digital elevation model (DEM). We constructed SMC and SOM estimation models based on random forest (RF) and light gradient boosting machine (LightGBM), and then verified and compared the accuracy of the models. The results showed that after CARS screening, the sensitive bands of SMC and SOM were compressed to below 3.3% of the entire band range, which effectively optimized band selection and reduced redundant spectral information. Compared with the LightGBM model, the RF model had higher accuracy in SMC and SOM estimation, and input variable strategy Ⅲ was better than Ⅱ and Ⅰ. The introduction of auxiliary variables effectively improved the estimation ability of the model. In the validation of the SMC estimation model based on strategy Ⅲ-RF, the coefficient of determination (Rp²), root mean square error (RMSE), and relative analysis error (RPD) were 0.63, 3.16, and 2.01, respectively. The SOM estimation model based on strategy Ⅲ-RF had Rp², RMSE, and RPD of 0.93, 1.15, and 3.52, respectively. The strategy Ⅲ-RF model was an effective method for estimating SMC and SOM. Our results could provide a new method for the rapid estimation of soil moisture and organic matter content in saline-alkali farmland.


Subject(s)
Algorithms , Organic Chemicals , Soil , Water , Soil/chemistry , Organic Chemicals/analysis , Water/analysis , Crops, Agricultural/growth & development , Crops, Agricultural/chemistry , Alkalies/analysis , Alkalies/chemistry , China , Ecosystem
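
Two quantities in this abstract lend themselves to a short worked sketch: the SNV transform applied to each reflectance spectrum, and RPD as a validation statistic. The sketch assumes the usual definitions, SNV as centring and scaling each spectrum by its own mean and standard deviation, and RPD as the standard deviation of the observed values divided by the RMSE. The use of the sample standard deviation and the toy numbers are assumptions.

```python
import math
import statistics


def snv(spectrum):
    """Standard normal variate: centre and scale one spectrum by itself."""
    mean = statistics.fmean(spectrum)
    sd = statistics.stdev(spectrum)
    return [(x - mean) / sd for x in spectrum]


def rmse(observed, predicted):
    """Root mean square error of the predictions."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def rpd(observed, predicted):
    """Ratio of the spread of the observations to the prediction error."""
    return statistics.stdev(observed) / rmse(observed, predicted)


print(snv([1.0, 2.0, 3.0]))
print(rpd([1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 3.5, 4.5]))
```

Because SNV is computed per spectrum, it removes multiplicative scatter and baseline offsets between samples before band selection; a larger RPD indicates that the prediction error is small relative to the natural variation in the measured soil property.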
16.
Sci Rep ; 14(1): 23028, 2024 Oct 03.
Article in English | MEDLINE | ID: mdl-39362913

ABSTRACT

The accurate prediction of uneven rock mass classes is crucial for intelligent operation in tunnel-boring machine (TBM) tunneling. However, the classification of rock masses presents significant challenges due to the variability and complexity of geological conditions. To address these challenges, this study introduces an innovative predictive model combining the improved EWOA (IEWOA) and the light gradient boosting machine (LightGBM). The proposed IEWOA algorithm incorporates a novel parameter l for more effective position updates during the exploration stage and utilizes sine functions during the exploitation stage to optimize the search process. Additionally, the model integrates a minority class technique enhanced with a random walk strategy (MCT-RW) to extend the boundaries of minority classes, such as Classes II, IV, and V. This approach significantly improves the recall and F1-score for these rock mass classes. The proposed methodology was rigorously evaluated against other predictive algorithms, demonstrating superior performance with an accuracy of 94.74%. This innovative model not only enhances the accuracy of rock mass classification but also contributes significantly to the intelligent and efficient construction of TBM tunnels, providing a robust solution to one of the key challenges in underground engineering.

17.
Phys Med Biol ; 69(11)2024 May 30.
Article in English | MEDLINE | ID: mdl-38749471

ABSTRACT

Accurate diagnosis and treatment assessment of liver fibrosis face significant challenges, including inherent limitations in current techniques like sampling errors and inter-observer variability. Addressing this, our study introduces a novel machine learning (ML) framework, which integrates light gradient boosting machine and multivariate imputation by chained equations to enhance liver status assessment using biomechanical markers. Building upon our previously established multiscale mechanical characteristics in fibrotic and treated livers, this framework employs Gaussian Bayesian optimization for post-imputation, significantly improving classification performance. Our findings indicate a marked increase in the precision of liver fibrosis diagnosis and provide a novel, quantitative approach for assessing fibrosis treatment. This innovative combination of multiscale biomechanical markers with advanced ML algorithms represents a transformative step in liver disease diagnostics and treatment evaluation, with potential implications for other areas in medical diagnostics.


Subject(s)
Liver Cirrhosis, Machine Learning, Biomechanical Phenomena, Humans, Mechanical Phenomena, Bayes Theorem, Animals, Biomarkers/metabolism
18.
Sci Rep ; 14(1): 12539, 2024 May 31.
Article in English | MEDLINE | ID: mdl-38822049

ABSTRACT

Mine water inrush is a serious threat to safe mine production, and quickly identifying the type of inrush source is essential for preventing and controlling water damage. In this study, the hydrochemical components Na⁺ + K⁺, Ca²⁺, Mg²⁺, Cl⁻, SO₄²⁻, and HCO₃⁻ of different aquifers in the Pingdingshan coalfield were selected as features, and surface water, Quaternary pore water, Carboniferous limestone karst water, Permian sandstone water, and Cambrian limestone karst water were used as labels. An intelligent water source discrimination model is proposed by combining data mining, classification models, and reinforcement learning. Because outliers in the samples may interfere with the model's recognition ability, the data distribution range was analyzed using box plots, and 20 groups of abnormal samples were excluded. The processed water chemistry data were divided into 80% training samples and 20% test samples, and the training samples were fed into a light gradient boosting machine (LightGBM). The tree-structured Parzen estimator (TPE) obtained the optimal values of the main LightGBM hyperparameters in a very short time. Substituting these hyperparameters back into the model yielded a 13.9% improvement in accuracy, demonstrating the effectiveness of the TPE algorithm. To further validate the performance of the model, TPE-LightGBM was compared with a Random Search-Multi-Layer Perceptron (RS-MLP) and a Genetic Algorithm-Extreme Gradient Boosting Tree (GA-SVM). The accuracies of TPE-LightGBM, RS-MLP, and GA-SVM are 0.931, 0.759, and 0.724, respectively, and the generalization errors (RMSE) are 0.415, 1.05, and 1.313, respectively. The results show that TPE-LightGBM performs better in water source identification and is more resistant to overfitting. A comparison of the information gain of each variable shows that Ca²⁺ contributes the most, so changes in Ca²⁺ concentration warrant attention. With its high accuracy and generalization ability, TPE-LightGBM shows good prospects for identifying the source type of sudden water inrush.
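
The preprocessing described in this abstract (box-plot screening of abnormal samples followed by an 80/20 split) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: it assumes the conventional 1.5 × IQR whisker rule, which the abstract does not specify, and the function names and sample values are hypothetical.

```python
import random
import statistics


def tukey_fences(values, k=1.5):
    """Return the (lower, upper) box-plot whisker limits for one feature."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_outliers(samples, k=1.5):
    """Keep samples whose every feature lies inside that feature's fences."""
    fences = [tukey_fences(column, k) for column in zip(*samples)]
    return [
        row for row in samples
        if all(low <= value <= high for value, (low, high) in zip(row, fences))
    ]


def train_test_split(samples, train_fraction=0.8, seed=0):
    """Shuffle a copy of the samples and cut it into training and test parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical two-feature samples; the last row has an extreme first value.
samples = [
    (1.0, 10.0), (2.0, 11.0), (3.0, 12.0),
    (4.0, 13.0), (5.0, 14.0), (100.0, 12.0),
]
cleaned = drop_outliers(samples)
train, test = train_test_split(cleaned)
```

In practice the fences would be computed per ion concentration, and the resulting training part would be passed to the classifier and its hyperparameter search.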

19.
Heliyon ; 10(4): e25406, 2024 Feb 29.
Article in English | MEDLINE | ID: mdl-38370176

ABSTRACT

Objective: This study aims to develop a predictive model using artificial intelligence to estimate the ICU length of stay (LOS) for Congenital Heart Defects (CHD) patients after surgery, improving care planning and resource management. Design: We analyze clinical data from 2240 CHD surgery patients to create and validate the predictive model. Twenty AI models are developed and evaluated for accuracy and reliability. Setting: The study is conducted in a Brazilian hospital's Cardiovascular Surgery Department, focusing on transplants and cardiopulmonary surgeries. Participants: Retrospective analysis is conducted on data from 2240 consecutive CHD patients undergoing surgery. Interventions: Ninety-three pre- and intraoperative variables are used as ICU LOS predictors. Measurements and main results: Utilizing regression and clustering methodologies for ICU LOS estimation, the Light Gradient Boosting Machine, using regression, achieved a Mean Squared Error (MSE) of 15.4, 11.8, and 15.2 days on the training, testing, and unseen data, respectively. Key predictors included metrics such as "Mechanical Ventilation Duration", "Weight on Surgery Date", and "Vasoactive-Inotropic Score". Meanwhile, the clustering model, CatBoost Classifier, attained an accuracy of 0.6917 and an AUC of 0.8559 with similar key predictors. Conclusions: Patients with higher ventilation times, vasoactive-inotropic scores, anoxia time, and cardiopulmonary bypass time, and with lower weight, height, BMI, age, hematocrit, and presurgical oxygen saturation, have longer ICU stays, aligning with existing literature.
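
For reference, the regression error metric named in this abstract can be computed as in the sketch below. Strictly, an MSE is expressed in squared units of the target (days squared for ICU LOS), while its square root, the RMSE, is in days. If the reported figures are MSE values, 15.4 would correspond to an RMSE of about 3.9 days. The function names and sample values are hypothetical and are not taken from the study.

```python
import math


def mean_squared_error(actual, predicted):
    """Average of squared residuals; units are the target's units squared."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Square root of the MSE; units match the target (here, days)."""
    return math.sqrt(mean_squared_error(actual, predicted))


# Hypothetical ICU stays in days and hypothetical model predictions.
actual_days = [2.0, 4.0, 6.0, 8.0]
predicted_days = [4.0, 4.0, 6.0, 4.0]

mse = mean_squared_error(actual_days, predicted_days)
rmse = root_mean_squared_error(actual_days, predicted_days)
```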

20.
Mar Pollut Bull ; 208: 116946, 2024 Sep 17.
Article in English | MEDLINE | ID: mdl-39293369

ABSTRACT

Maritime operations face significant challenges in environmental stewardship, particularly in managing oil discharges from tankers as mandated by the International Convention for the Prevention of Pollution from Ships (MARPOL) Annex I, Regulation 34. Traditional Oil Discharge Monitoring Equipment (ODME) methods rely on manual decision-making, often failing to accurately identify MARPOL-defined no-go zones, estimate operation completion times, and recommend course alterations during decanting operations. This study introduces a novel approach by integrating two advanced machine learning techniques, Extreme Gradient Boosting (XGBoost) and Light Gradient Boosting Machine (LightGBM), to enhance ODME operations. Specifically, these models automate the identification of no-go zones and optimize operational decisions, leading to a 99% accuracy rate in compliance with MARPOL regulations and an operational time estimation error margin of <1%. Unlike traditional methods, our approach leverages large datasets and real-time GPS (Global Positioning System) data, significantly reducing human error and enhancing both environmental compliance and operational efficiency. To our knowledge, this is the first study to specifically address the application of machine learning to decanting operations under MARPOL Annex I, marking a significant advancement in maritime environmental management.
