Results 1 - 20 of 309
1.
Methods ; 223: 56-64, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38237792

ABSTRACT

DNA-binding proteins are a class of proteins that can interact with DNA molecules through physical and chemical interactions. Their main functions include regulating gene expression, maintaining chromosome structure and stability, and more. DNA-binding proteins play a crucial role in cellular and molecular biology, as they are essential for maintaining normal cellular physiological functions and adapting to environmental changes. The prediction of DNA-binding proteins has been a hot topic in the field of bioinformatics. The key to accurately classifying DNA-binding proteins is to find suitable feature sources and explore the information they contain. Although there are already many models for predicting DNA-binding proteins, there is still room for improvement in mining feature source information and calculation methods. In this study, we created a model called DBPboost to better identify DNA-binding proteins. The innovation of this study lies in the use of eight feature extraction methods, the improvement of the feature selection step, which involves selecting some features first and then performing feature selection again after feature fusion, and the optimization of the differential evolution algorithm in feature fusion, which improves the performance of feature fusion. The experimental results show that the prediction accuracy of the model on the UniSwiss dataset is 89.32%, and the sensitivity is 89.01%, which is better than most existing models.


Subject(s)
DNA-Binding Proteins , Support Vector Machine , DNA-Binding Proteins/chemistry , Algorithms , DNA/chemistry , Computational Biology/methods
2.
BMC Bioinformatics ; 25(1): 282, 2024 Aug 28.
Article in English | MEDLINE | ID: mdl-39198740

ABSTRACT

BACKGROUND: Thermostability is a fundamental property that allows proteins to maintain their biological functions. Predicting protein stability changes upon mutation is important for understanding the protein structure-function relationship, and is also of great interest in protein engineering and pharmaceutical design. RESULTS: Here we present mutDDG-SSM, a deep learning-based framework that uses the geometric representations encoded in protein structure to predict mutation-induced protein stability changes. mutDDG-SSM consists of two parts: a graph attention network-based protein structural feature extractor, trained with a self-supervised learning scheme on large-scale high-resolution protein structures, and an eXtreme Gradient Boosting model-based stability change predictor, which has the advantage of alleviating the overfitting problem. The performance of mutDDG-SSM was tested on several widely used independent datasets. Myoglobin and p53 were then used as case studies to illustrate the effectiveness of the model in predicting protein stability changes upon mutation. Our results show that mutDDG-SSM achieved high performance in estimating the effects of mutations on protein stability. In addition, mutDDG-SSM exhibited good unbiasedness: the prediction accuracy on the inverse mutations is as good as that on the direct mutations. CONCLUSION: Meaningful features can be extracted from our pre-trained model to build downstream tasks, and our model may serve as a valuable tool for protein engineering and drug design.


Subject(s)
Mutation , Protein Stability , Proteins , Proteins/chemistry , Proteins/genetics , Proteins/metabolism , Myoglobin/chemistry , Myoglobin/genetics , Tumor Suppressor Protein p53/genetics , Tumor Suppressor Protein p53/chemistry , Tumor Suppressor Protein p53/metabolism , Computational Biology/methods , Deep Learning , Supervised Machine Learning , Databases, Protein , Protein Conformation
3.
Chemphyschem ; : e202400629, 2024 Jul 09.
Article in English | MEDLINE | ID: mdl-38982718

ABSTRACT

Electrode materials are essential to the electrochemical process of storing charge in supercapacitors and have a significant impact on the cost and capacitive performance of the final product. Hence, precise predictions of the capacitance of electrode materials are needed to further the development of supercapacitors. MgCo2O4, with a theoretical capacitance of up to 3122 F g-1, holds immense research value as an electrode material. The objective of this study is to predict the capacitance of MgCo2O4 with high accuracy, by extracting data from published papers and using selected parameters as input features. The Recursive Feature Elimination (RFE) method was employed, using Random Forest (RF), Extreme Gradient Boosting (XGBoost) and Regression Tree (RT) as selectors to identify the optimal feature subset. Each selector was then combined with each of these three regression models to construct nine machine learning (ML) models. After performance evaluation and outlier analysis, the XGB-RFE-XGB model achieved an R-squared (R²), root mean squared error (RMSE), and mean absolute error (MAE) of 0.95, 111.83 F g-1 and 68.25 F g-1, respectively, demonstrating its stability and reliability. Therefore, the XGB-RFE-XGB model can be used as a reliable predictive tool in subsequent experimental designs.
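
As an illustration of the design described in this abstract, the sketch below enumerates the nine selector/regressor pairings and computes the three reported error metrics in plain Python. It is not the authors' code: the label order (selector first, regressor last) is an assumption, and the capacitance values are hypothetical numbers chosen only to exercise the formulas.

```python
from itertools import product
from math import sqrt

# Model abbreviations come from the abstract; the label order
# "<selector>-RFE-<regressor>" is an assumption made for this sketch.
SELECTORS = ["RF", "XGB", "RT"]   # models used inside RFE to rank features
REGRESSORS = ["RF", "XGB", "RT"]  # models fitted on the selected features


def pipeline_names():
    """Enumerate the 3 x 3 = 9 selector/regressor combinations."""
    return [f"{sel}-RFE-{reg}" for sel, reg in product(SELECTORS, REGRESSORS)]


def regression_metrics(y_true, y_pred):
    """Return (R^2, RMSE, MAE) for two equal-length sequences."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    rmse = sqrt(ss_res / n)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    return r2, rmse, mae


# Hypothetical capacitances in F g-1, for illustration only.
measured = [400.0, 800.0, 1200.0, 1600.0]
predicted = [450.0, 750.0, 1250.0, 1500.0]

names = pipeline_names()
r2, rmse, mae = regression_metrics(measured, predicted)
```

RMSE and MAE carry the unit of the target (here F g-1), whereas R² is dimensionless, which is why the abstract reports units for only two of the three figures.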

4.
Br J Clin Pharmacol ; 90(3): 691-699, 2024 Mar.
Article in English | MEDLINE | ID: mdl-37845041

ABSTRACT

AIMS: Heart failure with reduced ejection fraction (HFrEF) poses significant challenges for clinicians and researchers, owing to its multifaceted aetiology and complex treatment regimens. In light of this, artificial intelligence methods offer an innovative approach to identifying relationships within complex clinical datasets. Our study aims to explore the potential for machine learning algorithms to provide deeper insights into datasets of HFrEF patients. METHODS: To this end, we analysed a cohort of 386 HFrEF patients who had been initiated on sodium-glucose co-transporter-2 inhibitor treatment and had completed a minimum of a 6-month follow-up. RESULTS: In traditional frequentist statistical analyses, patients receiving the highest doses of beta-blockers (BBs) (chi-square test, P = .036) and those newly initiated on sacubitril-valsartan (chi-square test, P = .023) showed better outcomes. However, none of these pharmacological features stood out as independent predictors of improved outcomes in the Cox proportional hazards model. In contrast, when employing eXtreme Gradient Boosting (XGBoost) algorithms in conjunction with the data using Shapley additive explanations (SHAP), we identified several models with significant predictive power. The XGBoost algorithm inherently accommodates non-linear distribution, multicollinearity and confounding. Within this framework, pharmacological categories like 'newly initiated treatment with sacubitril/valsartan' and 'BB dose escalation' emerged as strong predictors of long-term outcomes. CONCLUSIONS: In this manuscript, we not only emphasize the strengths of this machine learning approach but also discuss its potential limitations and the risk of identifying statistically significant yet clinically irrelevant predictors.
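
To make the idea behind Shapley additive explanations concrete, the sketch below computes exact Shapley values for a single prediction by enumerating every feature coalition. Everything in it is an assumption for illustration: `toy_model`, its coefficients, and the patient and reference vectors are hypothetical and are not taken from the study. Libraries such as SHAP use much faster algorithms for tree ensembles; brute-force enumeration is only feasible for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley attributions for one instance.

    Features outside a coalition are replaced by their baseline value.
    The cost grows as 2^n, so this is only practical for a few features.
    """
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


# Hypothetical risk score with an interaction term (not the study's model).
# Features: [new sacubitril/valsartan, BB dose escalation, age in decades]
def toy_model(f):
    return 0.5 - 0.10 * f[0] - 0.05 * f[1] + 0.02 * f[2] - 0.04 * f[0] * f[1]


patient = [1, 1, 7]
reference = [0, 0, 6]
phi = shapley_values(toy_model, patient, reference)
```

The attributions sum to the difference between the patient's prediction and the reference prediction, and the interaction term is split equally between the two features involved. This additivity is what lets SHAP rank predictors even when, as the abstract notes, they are collinear or act non-linearly.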


Subject(s)
Heart Failure , Humans , Heart Failure/drug therapy , Heart Failure/chemically induced , Tetrazoles/adverse effects , Artificial Intelligence , Stroke Volume , Machine Learning
5.
BMC Neurol ; 24(1): 332, 2024 Sep 10.
Article in English | MEDLINE | ID: mdl-39256684

ABSTRACT

BACKGROUND: Accurately predicting the walking independence of stroke patients is important. Our objective was to determine and compare the performance of logistic regression (LR) and three machine learning models (eXtreme Gradient Boosting (XGBoost), Support Vector Machines (SVM), and Random Forest (RF)) in predicting walking independence at discharge in stroke patients, as well as to explore the variables that predict prognosis. METHODS: A total of 778 stroke patients admitted to China Rehabilitation Research Center between February 2020 and January 2023 were retrospectively included (80% for the training set and 20% for the test set). The training set was used for training models. The test set was used to validate and compare the performance of the four models in terms of area under the curve (AUC), accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and F1 score. RESULTS: Among the three ML models, the AUC of the XGBoost model was significantly higher than that of the SVM and RF models (P < 0.001, P = 0.024, respectively). There was no significant difference in the AUCs between the XGBoost model and the LR model (0.891 vs. 0.880, P = 0.560). Compared with the LR model, the XGBoost model demonstrated superior accuracy (87.82% vs. 86.54%), sensitivity (50.00% vs. 39.39%), PPV (73.68% vs. 73.33%), NPV (89.78% vs. 87.94%), and F1 score (59.57% vs. 51.16%), with only slightly lower specificity (96.09% vs. 96.88%). Together, the XGBoost model and the stepwise LR model identified age, FMA-LE at admission, FAC at admission, and lower limb spasticity as key factors influencing independent walking. CONCLUSION: Overall, the XGBoost model performed best in predicting independent walking after stroke. The XGBoost and LR models together confirm that age, admission FMA-LE, admission FAC, and lower extremity spasticity are the key factors influencing independent walking in stroke patients at hospital discharge. TRIAL REGISTRATION: Not applicable.
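
The six threshold-based metrics compared in this abstract all derive from one 2 x 2 confusion matrix, as the sketch below shows. The abstract does not report the underlying counts. The values used here (TP = 14, FN = 14, FP = 5, TN = 123, total 156, roughly 20% of 778) are an assumption, picked because they are consistent with the percentages quoted for the XGBoost model.

```python
def binary_metrics(tp, fn, fp, tn):
    """Derive common classification metrics from confusion-matrix counts."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),      # recall, true-positive rate
        "specificity": tn / (tn + fp),      # true-negative rate
        "ppv": tp / (tp + fp),              # precision
        "npv": tn / (tn + fn),
        "f1": 2 * tp / (2 * tp + fp + fn),  # harmonic mean of PPV and sensitivity
    }


# Assumed counts, NOT reported in the abstract (see the note above).
xgb = binary_metrics(tp=14, fn=14, fp=5, tn=123)
percentages = {name: round(100 * value, 2) for name, value in xgb.items()}
```

With these assumed counts the function returns accuracy 87.82%, sensitivity 50.00%, specificity 96.09%, PPV 73.68%, NPV 89.78% and F1 59.57%, matching the XGBoost figures above. The gap between high specificity and modest sensitivity shows why accuracy alone can flatter a model on an imbalanced test set.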


Subject(s)
Machine Learning , Stroke Rehabilitation , Stroke , Walking , Humans , Female , Male , Middle Aged , Retrospective Studies , Stroke/physiopathology , Stroke/diagnosis , Aged , Walking/physiology , Stroke Rehabilitation/methods , Support Vector Machine , Prognosis , Predictive Value of Tests , Adult
6.
Environ Sci Technol ; 58(2): 1255-1264, 2024 Jan 16.
Article in English | MEDLINE | ID: mdl-38164924

ABSTRACT

Lithium (Li) concentrations in drinking-water supplies are not regulated in the United States; however, Li is included in the 2022 U.S. Environmental Protection Agency list of unregulated contaminants for monitoring by public water systems. Li is used pharmaceutically to treat bipolar disorder, and studies have linked its occurrence in drinking water to human-health outcomes. An extreme gradient boosting model was developed to estimate geogenic Li in drinking-water supply wells throughout the conterminous United States. The model was trained using Li measurements from ∼13,500 wells and predictor variables related to its natural occurrence in groundwater. The model predicts the probability of Li in four concentration classifications, ≤4 µg/L, >4 to ≤10 µg/L, >10 to ≤30 µg/L, and >30 µg/L. Model predictions were evaluated using wells held out from model training and with new data and have an accuracy of 47-65%. Important predictor variables include average annual precipitation, well depth, and soil geochemistry. Model predictions were mapped at a spatial resolution of 1 km2 and represent well depths associated with public- and private-supply wells. This model was developed by hydrologists and public-health researchers to estimate Li exposure from drinking water and compare to national-scale human-health data for a better understanding of dose-response to low (<30 µg/L) concentrations of Li.
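
The four concentration classes named in this abstract can be expressed as a small binning function, sketched below. The function names and the choice of reporting the class with the highest predicted probability are assumptions for illustration. The abstract states only that the model predicts a probability for each class.

```python
from bisect import bisect_left

# Upper bounds (inclusive) of the first three classes, in µg/L, from the abstract.
UPPER_BOUNDS = [4.0, 10.0, 30.0]
CLASS_LABELS = ["<=4 µg/L", ">4 to <=10 µg/L", ">10 to <=30 µg/L", ">30 µg/L"]


def lithium_class(conc_ug_per_l):
    """Map a measured Li concentration to its class index (0 to 3)."""
    # bisect_left keeps a value equal to a bound in the lower class.
    return bisect_left(UPPER_BOUNDS, conc_ug_per_l)


def most_likely_class(probabilities):
    """Return the index of the class with the highest predicted probability."""
    return max(range(len(probabilities)), key=lambda k: probabilities[k])


# Hypothetical per-class probabilities for one well, for illustration only.
example_probs = [0.10, 0.25, 0.45, 0.20]
predicted_label = CLASS_LABELS[most_likely_class(example_probs)]
```

Because the boundaries are inclusive on the upper side, a well measured at exactly 10 µg/L falls in the second class rather than the third.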


Subject(s)
Drinking Water , Groundwater , Water Pollutants, Chemical , United States , Humans , Lithium , Water Supply , Water Wells , Water Pollutants, Chemical/analysis , Environmental Monitoring
7.
Environ Sci Technol ; 58(36): 15938-15948, 2024 Sep 10.
Article in English | MEDLINE | ID: mdl-39192575

ABSTRACT

Accurately mapping ground-level ozone concentrations at high spatiotemporal resolution (daily, 1 km) is essential for evaluating human exposure and conducting public health assessments. This requires identifying and understanding a proxy that is well-correlated with ground-level ozone variation and available with spatiotemporal high-resolution data. This study introduces a high-resolution ozone modeling method utilizing the XGBoost algorithm with satellite-derived land surface temperature (LST) as the primary predictor. Focusing on China in 2019, our model achieved a cross-validation R2 of 0.91 and a root-mean-square error (RMSE) of 13.51 µg/m3. We provide detailed maps highlighting ground-level ozone concentrations in urban areas, uncovering spatial variations previously unresolved, along with time series aligning with established understandings of ozone dynamics. Our local interpretation of the machine learning model underscores the significant contribution of LST to spatiotemporal ozone variations, surpassing other meteorological, pollutant, and geographical predictors in its influence. Validation results indicate that model performance decreases as spatial resolution becomes coarser, with R2 decreasing from 0.91 for the 1 km model to 0.85 for the 25 km model. The methodology and data sets generated by this study offer new insights into ground-level ozone variability and mapping and can significantly aid in exposure assessment and epidemiological research related to this critical environmental challenge.


Subject(s)
Machine Learning , Ozone , Temperature , Ozone/analysis , Environmental Monitoring/methods , China , Air Pollutants , Humans
8.
Environ Res ; 245: 117784, 2024 Mar 15.
Article in English | MEDLINE | ID: mdl-38065392

ABSTRACT

Nanotechnology has emerged as a promising frontier in revolutionizing the early diagnosis and surgical management of gastric cancers. The primary factors influencing curative efficacy in GIC patients are drug inefficacy and high recurrence rates after surgical and pharmacological therapy. Due to its unique optical features, good biocompatibility, surface effects, and small size effects, nanotechnology is a developing and advanced area of study for detecting and treating cancer. Considering the limitations of GIC MRI and endoscopy and the complexity of gastric surgery, the early diagnosis and prompt treatment of gastric illnesses by nanotechnology has been a promising development. Nanoparticles directly target tumor cells, allowing their detection and removal. They can also be engineered to carry specific payloads, such as drugs or contrast agents, and enhance the efficacy and precision of cancer treatment. In this research, the boosting technique of machine learning was utilized to capture nonlinear interactions between a large number of input variables and outputs by using XGBoost and RNN-CNN as a classification method. The research sample included 350 patients, comprising 200 males and 150 females. The patients' mean age (± SD) was 50.34 ± 13.04. High-risk behaviors (P = 0.070), age at diagnosis (P = 0.034), distant metastasis (P = 0.004), and tumor stage (P = 0.014) were shown to have a statistically significant link with GC patient survival. AUC was 93.54%, Accuracy 93.54%, F1-score 93.57%, Precision 93.65%, and Recall 93.87% when analyzing stomach pictures. Integrating nanotechnology with advanced machine learning techniques holds promise for improving the diagnosis and treatment of gastric cancer, providing new avenues for precision medicine and better patient outcomes.


Subject(s)
Stomach Neoplasms , Male , Female , Humans , Adult , Middle Aged , Stomach Neoplasms/diagnosis , Stomach Neoplasms/surgery , Stomach Neoplasms/pathology , Early Detection of Cancer , Machine Learning , Magnetic Resonance Imaging
9.
Nutr Metab Cardiovasc Dis ; 34(6): 1456-1466, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38508988

ABSTRACT

BACKGROUND AND AIMS: Non-alcoholic fatty liver disease (NAFLD) is a common chronic liver disease, which lacks effective drug treatments. This study aimed to construct an eXtreme Gradient Boosting (XGBoost) prediction model to identify or evaluate potential NAFLD patients. METHODS AND RESULTS: We conducted a longitudinal study of 22,140 individuals from the Beijing Health Management Cohort. Variable filtering was performed using the least absolute shrinkage and selection operator. Random Over Sampling Examples was used to address imbalanced data. Next, the XGBoost model and three other machine learning (ML) models were built using balanced data. Finally, the variable importance of the XGBoost model was ranked. Among the four ML algorithms, the XGBoost model outperformed the other models, with the following results: accuracy of 0.835, sensitivity of 0.835, specificity of 0.834, Youden index of 0.669, precision of 0.831, recall of 0.835, F-1 score of 0.833, and an area under the curve of 0.914. The top five variables with the greatest impact on the onset of NAFLD were aspartate aminotransferase, cardiometabolic index, body mass index, alanine aminotransferase, and triglyceride-glucose index. CONCLUSION: The predictive model based on the XGBoost algorithm enables early prediction of the onset of NAFLD. Additionally, assessing variable importance provides valuable insights into the prevention and treatment of NAFLD.
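
Two of the figures in this abstract are derived quantities, and the short sketch below recomputes them from the other reported values using the standard definitions. The function names are chosen for this example.

```python
def youden_index(sensitivity, specificity):
    """Youden's J statistic: sensitivity + specificity - 1."""
    return sensitivity + specificity - 1.0


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2.0 * precision * recall / (precision + recall)


# Inputs are the values reported in the abstract.
j = youden_index(sensitivity=0.835, specificity=0.834)
f1 = f1_score(precision=0.831, recall=0.835)
```

Rounded to three decimals these give 0.669 and 0.833, in agreement with the reported Youden index and F-1 score. Recall and sensitivity are the same quantity, which is why both appear as 0.835.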


Subject(s)
Biomarkers , Machine Learning , Non-alcoholic Fatty Liver Disease , Predictive Value of Tests , Humans , Non-alcoholic Fatty Liver Disease/diagnosis , Non-alcoholic Fatty Liver Disease/epidemiology , Non-alcoholic Fatty Liver Disease/blood , Longitudinal Studies , Male , Female , Middle Aged , Adult , Risk Assessment , Biomarkers/blood , Beijing/epidemiology , Prognosis , Reproducibility of Results , Decision Support Techniques , Risk Factors , Diagnosis, Computer-Assisted
10.
Graefes Arch Clin Exp Ophthalmol ; 262(1): 203-210, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37773288

ABSTRACT

PURPOSE: To develop a machine learning model to evaluate the activity stage of extraocular muscles in thyroid-associated ophthalmopathy (TAO). METHODS: This study retrospectively analysed data from patients with TAO who underwent contrast-enhanced magnetic resonance imaging (MRI) from 2015 to 2022. Three independent machine learning models, namely, extreme gradient boosting (XGBoost), light gradient boosting machine (LightGBM), and deep neural networks (DNNs), were constructed using common clinical features. The performance of these models was compared using evaluation metrics such as the area under the receiver operating characteristic curve (AUC), accuracy, precision, recall, and F1 score. The importance of features was explained using Shapley additive explanations (SHAP). RESULTS: A total of 2561 eyes of 1479 TAO patients were included in this study. The original dataset was randomly divided into a training set (80%, n = 2048) and a test set (20%, n = 513). In the performance evaluation of the test set, the LightGBM model had the best diagnostic performance (AUC 0.9260). According to the SHAP results, features such as conjunctival congestion, swollen caruncles, oedema of the upper eyelid, course of TAO, and intraocular pressure had the most significant impact on the LightGBM model. CONCLUSION: This study used contrast-enhanced MRI as an objective evaluation criterion and constructed a LightGBM model based on readily accessible clinical data. The model had good classification performance, making it a promising artificial intelligence (AI)-assisted tool to help community hospitals evaluate the inflammatory activity of extraocular muscles in TAO patients in a timely manner.
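
The 80/20 random split reported in this abstract can be sketched as follows. The seed, the function name, and rounding the training-set size down are assumptions. The abstract gives only the resulting sizes (2048 and 513 of 2561 eyes).

```python
import random


def split_indices(n_samples, train_fraction=0.8, seed=0):
    """Shuffle sample indices and cut them into a training and a test part."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # reproducible shuffle
    n_train = int(n_samples * train_fraction)     # rounds down (assumption)
    return indices[:n_train], indices[n_train:]


train_idx, test_idx = split_indices(2561)
```

Rounding down reproduces the reported sizes. A split made per eye, as here, can place the two eyes of one patient on opposite sides of the cut. Whether the study split by eye or by patient is not stated in the abstract.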


Subject(s)
Graves Ophthalmopathy , Humans , Graves Ophthalmopathy/diagnosis , Oculomotor Muscles , Artificial Intelligence , Retrospective Studies , Neural Networks, Computer , Eyelids
11.
BMC Pulm Med ; 24(1): 308, 2024 Jul 01.
Article in English | MEDLINE | ID: mdl-38956528

ABSTRACT

AIM: To develop a decision-support tool for predicting extubation failure (EF) in neonates with bronchopulmonary dysplasia (BPD) using a set of machine-learning algorithms. METHODS: A dataset of 284 BPD neonates on mechanical ventilation was used to develop predictive models via machine-learning algorithms, including extreme gradient boosting (XGBoost), random forest, support vector machine, naïve Bayes, logistic regression, and k-nearest neighbor. The top three models were assessed by the area under the receiver operating characteristic curve (AUC), and their performance was tested by decision curve analysis (DCA). A confusion matrix was used to show the performance of the best model. The importance matrix plot and SHapley Additive exPlanations values were calculated to evaluate the feature importance and visualize the results. The nomogram and clinical impact curves were used to validate the final model. RESULTS: According to the AUC values and DCA results, the XGBoost model performed best (AUC = 0.873, sensitivity = 0.896, specificity = 0.838). The nomogram and clinical impact curve verified that the XGBoost model possessed significant predictive value. The following were predictive factors for EF: pO2, hemoglobin, mechanical ventilation (MV) rate, pH, Apgar score at 5 min, FiO2, C-reactive protein, Apgar score at 1 min, red blood cell count, PIP, gestational age, highest FiO2 in the first 24 h, heart rate, birth weight, and pCO2. Further, pO2, hemoglobin, and MV rate were the three most important factors for predicting EF. CONCLUSIONS: The present study indicated that the XGBoost model was effective in predicting EF in BPD neonates on mechanical ventilation, which is helpful in determining the right extubation time among neonates with BPD to reduce the occurrence of complications.
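
Decision curve analysis, mentioned in this abstract, compares models by their net benefit at a chosen risk threshold. The sketch below uses the standard net-benefit formula. The counts and the threshold are hypothetical, because the abstract does not report them.

```python
def net_benefit(tp, fp, n, threshold):
    """Net benefit = TP/n - (FP/n) * pt / (1 - pt) at risk threshold pt."""
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(n_events, n, threshold):
    """Net benefit of labelling every patient as high risk."""
    return net_benefit(n_events, n - n_events, n, threshold)


# Hypothetical cohort: 100 neonates, 30 with extubation failure.
# Hypothetical model result at a 20% threshold: 25 true and 10 false positives.
nb_model = net_benefit(tp=25, fp=10, n=100, threshold=0.20)
nb_all = net_benefit_treat_all(n_events=30, n=100, threshold=0.20)
```

A model is considered useful at a given threshold when its net benefit exceeds both the treat-all value and zero (the treat-none value). A decision curve plots this comparison across a range of thresholds.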


Subject(s)
Airway Extubation , Bronchopulmonary Dysplasia , Machine Learning , Nomograms , Respiration, Artificial , Humans , Bronchopulmonary Dysplasia/therapy , Infant, Newborn , Female , Male , Respiration, Artificial/methods , ROC Curve , Retrospective Studies , Decision Support Techniques , Treatment Failure , Logistic Models
12.
Ecotoxicol Environ Saf ; 284: 116867, 2024 Oct 01.
Article in English | MEDLINE | ID: mdl-39154501

ABSTRACT

The loss of nitrogen in soil damages the environment. Clarifying the mechanism of ammonium nitrogen (NH4+-N) transport in soil and increasing the fixation of NH4+-N after N application are effective methods for improving N use efficiency. However, the main factors are not easily identified because of the complicated transport and retardation factors in different soils. This study employed machine learning (ML) to identify the main influencing factors that contribute to the retardation factor (Rf) of NH4+-N in soil. First, NH4+-N transport in the soil was investigated using column experiments and a transport model. The Rf (1.29 - 17.42) was calculated and used as a proxy for the efficacy of NH4+-N transport. Second, the physicochemical parameters of the soil were determined and screened using lasso and ridge regressions as inputs for the ML model. Third, six machine learning models were evaluated: Adaptive Boosting, Extreme Gradient Boosting (XGB), Random Forest, Gradient Boosting Regression, Multilayer Perceptron, and Support Vector Regression. The XGB model was the optimal ML model, with a low mean absolute error (0.81) and mean squared error (0.50) and a high test r2 (0.97), obtained by random sampling and five-fold cross-validation. Finally, SHapley Additive exPlanations, entropy-based feature importance, and permutation characteristic importance were used for global interpretation. The cation exchange capacity (CEC), total organic carbon (TOC), and kaolin had the greatest effects on NH4+-N transport in the soil. The accumulated local effect offered a fundamental insight: when CEC > 6 cmol+ kg-1 and TOC > 40 g kg-1, the maximum resistance to NH4+-N transport within the soil was observed. This study provides a novel approach for predicting the impact of the soil environment on NH4+-N transport and guiding the establishment of an early-warning system of nutrient loss.
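
Permutation importance, one of the interpretation methods listed in this abstract, measures how much a model's error grows when one input column is shuffled. The sketch below is a plain-Python version under stated assumptions: `toy_rf_model`, its coefficients, and the six soil samples are hypothetical, and mean squared error is chosen here as the score.

```python
import random


def mse(model, X, y):
    """Mean squared error of a callable model over rows X and targets y."""
    return sum((model(row) - t) ** 2 for row, t in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=20, seed=0):
    """Average increase in MSE after shuffling each column in turn."""
    rng = random.Random(seed)
    base = mse(model, X, y)
    importances = []
    for col in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            # Rebuild the rows with only this column permuted.
            X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
            increases.append(mse(model, X_perm, y) - base)
        importances.append(sum(increases) / n_repeats)
    return importances


# Hypothetical model: the retardation factor depends on CEC (column 0) only.
def toy_rf_model(row):
    return 1.0 + 0.5 * row[0]


# Hypothetical samples: [CEC, an unrelated soil property].
X = [[2.0, 5.0], [4.0, 1.0], [6.0, 3.0], [8.0, 2.0], [10.0, 4.0], [12.0, 6.0]]
y = [toy_rf_model(row) for row in X]
importances = permutation_importance(toy_rf_model, X, y)
```

Shuffling the column the model ignores leaves the error unchanged, so its importance is zero. Shuffling CEC breaks the link to the target and raises the error. Unlike the Shapley-based method, this gives one global score per feature rather than a per-sample attribution.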


Subject(s)
Ammonium Compounds , Machine Learning , Nitrogen , Soil , Soil/chemistry , Ammonium Compounds/analysis , Nitrogen/analysis , Soil Pollutants/analysis , Environmental Monitoring/methods
13.
J Korean Med Sci ; 39(22): e176, 2024 Jun 10.
Article in English | MEDLINE | ID: mdl-38859739

ABSTRACT

BACKGROUND: Malaria elimination strategies in the Republic of Korea (ROK) have decreased malaria incidence but face challenges due to delayed case detection and response. To improve this, machine learning models for predicting malaria, focusing on high-risk areas, have been developed. METHODS: The study targeted the northern region of ROK, near the demilitarized zone, using a 1-km grid to identify areas for prediction. Grid cells without residential buildings were excluded, leaving 8,425 cells. The prediction was based on whether at least one malaria case was reported in each grid cell per month, using spatial data of patient locations. Four algorithms were used: gradient boosted (GBM), generalized linear (GLM), extreme gradient boosted (XGB), and ensemble models, incorporating environmental, sociodemographic, and meteorological data as predictors. The models were trained with data from May to October (2019-2021) and tested with data from May to October 2022. Model performance was evaluated using the area under the receiver operating characteristic curve (AUROC). RESULTS: The prediction models achieved excellent AUROCs (GBM = 0.9243, GLM = 0.9060, XGB = 0.9180, and ensemble model = 0.9301). Previous malaria risk, population size, and meteorological factors influenced the model most in GBM and XGB. CONCLUSION: Machine-learning models with properly preprocessed malaria case data can provide reliable predictions. Additional predictors, such as mosquito density, should be included in future studies to improve the performance of models.
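
The AUROC used in this abstract can be read as the probability that a randomly chosen positive grid cell receives a higher score than a randomly chosen negative one. The sketch below computes it directly from that pairwise definition. The labels and scores are hypothetical, and the quadratic loop is meant for clarity rather than for 8,425 cells per month.

```python
def auroc(labels, scores):
    """Area under the ROC curve from pairwise comparisons (ties count 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical data: 1 = at least one malaria case in the cell that month.
labels = [0, 0, 1, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.70, 0.90]
auc = auroc(labels, scores)
```

In this toy data six of the nine positive/negative pairs are ordered correctly, giving an AUROC of about 0.667. The metric depends only on the ranking of the scores and not on any threshold, so it stays informative when positive cells are rare, although it says nothing about calibration.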


Subject(s)
Machine Learning , Malaria, Vivax , Plasmodium vivax , ROC Curve , Republic of Korea/epidemiology , Humans , Malaria, Vivax/epidemiology , Plasmodium vivax/isolation & purification , Algorithms , Area Under Curve , Incidence , Risk Factors
14.
BMC Med Inform Decis Mak ; 24(1): 2, 2024 Jan 02.
Article in English | MEDLINE | ID: mdl-38167056

ABSTRACT

BACKGROUND: Acute Myeloid Leukemia (AML) generally has a relatively low survival rate after treatment. There is an urgent need to find new biomarkers that may improve the survival prognosis of patients. Machine-learning tools are increasingly used in the screening of biomarkers. METHODS: Least Absolute Shrinkage and Selection Operator (LASSO), Support Vector Machine-Recursive Feature Elimination (SVM-RFE), Random Forest (RF), eXtreme Gradient Boosting (XGBoost), lrFuncs, IdaProfile, caretFuncs, and nbFuncs models were used to screen key genes closely associated with AML. Then, based on the Cancer Genome Atlas (TCGA), pan-cancer analysis was performed to determine the correlation between important genes and AML or other cancers. Finally, the diagnostic value of important genes for AML was verified in different data sets. RESULTS: The survival analysis results of the training set showed 26 genes with survival differences. After the intersection of the results of each machine learning method, DNM1, MEIS1, and SUSD3 were selected as key genes for subsequent analysis. The results of the pan-cancer analysis showed that MEIS1 and DNM1 were significantly highly expressed in AML; MEIS1 and SUSD3 are potential risk factors for the prognosis of AML, and DNM1 is a potential protective factor. The three key genes were significantly associated with AML immune subtypes and multiple immune checkpoints in AML. The results of the verification analysis show that DNM1, MEIS1, and SUSD3 have potential diagnostic value for AML. CONCLUSION: Multiple machine learning methods identified DNM1, MEIS1, and SUSD3, which can be regarded as prognostic biomarkers for AML.
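
The consensus step described in this abstract, keeping only the genes selected by every method, amounts to a set intersection. In the sketch below the per-method gene lists are hypothetical: only DNM1, MEIS1 and SUSD3 come from the abstract, the `GENE_*` names are placeholders, and only four of the eight methods are shown.

```python
# Hypothetical selections per method. Only the three shared genes are from
# the abstract; GENE_A to GENE_E are placeholders, not real results.
selected_by_method = {
    "LASSO": {"DNM1", "MEIS1", "SUSD3", "GENE_A", "GENE_B"},
    "SVM-RFE": {"DNM1", "MEIS1", "SUSD3", "GENE_B", "GENE_C"},
    "RF": {"DNM1", "MEIS1", "SUSD3", "GENE_A", "GENE_D"},
    "XGBoost": {"DNM1", "MEIS1", "SUSD3", "GENE_E"},
}

# Keep only the genes that every method selected.
key_genes = set.intersection(*selected_by_method.values())
ordered_key_genes = sorted(key_genes)
```

Requiring unanimity across methods is a conservative rule: it reduces the chance that a gene survives because of one algorithm's bias, but it can discard genes that most methods agree on.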


Subject(s)
Leukemia, Myeloid, Acute , Humans , Prognosis , Leukemia, Myeloid, Acute/diagnosis , Leukemia, Myeloid, Acute/genetics , Machine Learning , Risk Factors , Support Vector Machine
15.
Chem Pharm Bull (Tokyo) ; 72(6): 529-539, 2024.
Article in English | MEDLINE | ID: mdl-38839372

ABSTRACT

Lipid nanoparticles (LNPs), used for mRNA vaccines against severe acute respiratory syndrome coronavirus 2, protect mRNA and deliver it into cells, making them an essential delivery technology for RNA medicine. The LNPs manufacturing process consists of two steps, the upstream process of preparing LNPs and the downstream process of removing ethyl alcohol (EtOH) and exchanging buffers. Generally, a microfluidic device is used in the upstream process, and a dialysis membrane is used in the downstream process. However, there are many parameters in the upstream and downstream processes, and it is difficult to determine the effects of variations in the manufacturing parameters on the quality of the LNPs and establish a manufacturing process to obtain high-quality LNPs. This study focused on manufacturing mRNA-LNPs using a microfluidic device. Extreme gradient boosting (XGBoost), which is a machine learning technique, identified EtOH concentration (flow rate ratio), buffer pH, and total flow rate as the process parameters that significantly affected the particle size and encapsulation efficiency. Based on these results, we derived the manufacturing conditions for different particle sizes (approximately 80 and 200 nm) of LNPs using Bayesian optimization. In addition, the particle size of the LNPs significantly affected the protein expression level of mRNA in cells. The findings of this study are expected to provide useful information that will enable the rapid and efficient development of mRNA-LNPs manufacturing processes using microfluidic devices.


Subject(s)
Lipids , Machine Learning , Nanoparticles , Particle Size , RNA, Messenger , Nanoparticles/chemistry , Lipids/chemistry , Humans , SARS-CoV-2/genetics , Ethanol/chemistry , Bayes Theorem , Lab-On-A-Chip Devices , Liposomes
16.
Sensors (Basel) ; 24(5)2024 Feb 21.
Article in English | MEDLINE | ID: mdl-38474935

ABSTRACT

Hyperspectral imaging (HSI) has become a very compelling technique in different scientific areas; indeed, many researchers use it in the fields of remote sensing, agriculture, forensics, and medicine. In the latter, HSI plays a crucial role as a diagnostic support and for surgery guidance. However, the computational effort in elaborating hyperspectral data is not trivial. Furthermore, the demand for detecting diseases in a short time is undeniable. In this paper, we take up this challenge by parallelizing three machine-learning methods among those that are the most intensively used: Support Vector Machine (SVM), Random Forest (RF), and eXtreme Gradient Boosting (XGB) algorithms using the Compute Unified Device Architecture (CUDA) to accelerate the classification of hyperspectral skin cancer images. They all showed a good performance in HS image classification, in particular when the size of the dataset is limited, as demonstrated in the literature. We illustrate the parallelization techniques adopted for each approach, highlighting the suitability of Graphical Processing Units (GPUs) to this aim. Experimental results show that parallel SVM and XGB algorithms significantly improve the classification times in comparison with their serial counterparts.


Subject(s)
Algorithms , Skin Neoplasms , Humans , Machine Learning , Hyperspectral Imaging , Acceleration , Support Vector Machine
17.
Sensors (Basel) ; 24(12)2024 Jun 11.
Article in English | MEDLINE | ID: mdl-38931566

ABSTRACT

Mapping soil properties in sub-watersheds is critical for agricultural productivity, land management, and ecological security. Machine learning has been widely applied to digital soil mapping due to a rapidly increasing number of environmental covariates. However, the inclusion of many environmental covariates in machine learning models leads to the problem of multicollinearity, with poorly understood consequences for prediction performance. Here, we explored the effects of variable selection on the prediction performance of two machine learning models for multiple soil properties in the Haihun River sub-watershed, Jiangxi Province, China. Surface soils (0-20 cm) were collected from a total of 180 sample points in 2022. The optimal covariates were selected from 40 environmental covariates using a recursive feature elimination algorithm. Compared to all-variable models, the random forest (RF) and extreme gradient boosting (XGBoost) models with variable selection improved in prediction accuracy. The R2 values of the RF and XGBoost models increased by 0.34 and 0.47 for the soil organic carbon, by 0.67 and 0.62 for the total phosphorus, and by 0.43 and 0.62 for the available phosphorus, respectively. The models with variable selection presented reduced global uncertainty, and the overall uncertainty of the RF model was lower than that of the XGBoost model. The soil properties showed high spatial heterogeneity based on the models with variable selection. Remote sensing covariates (particularly principal component 2) were the major factors controlling the distribution of the soil organic carbon. Human activity covariates (mainly land use) and organism covariates (mainly potential evapotranspiration) played a predominant role in driving the distribution of the soil total and soil available phosphorus, respectively. This study indicates the importance of variable selection for predicting multiple soil properties and mapping their spatial distribution in sub-watersheds.
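
The recursive feature elimination step described in this abstract can be sketched as a generic loop: refit on the current covariates, rank them, drop the weakest, and repeat. The sketch below is only an illustration under stated assumptions. The covariate names, the `toy_importance` function, and its scores are hypothetical stand-ins; in the study the ranking would come from the fitted RF or XGBoost model, and the abstract does not say how many covariates were retained.

```python
from typing import Callable, Dict, List, Sequence


def recursive_feature_elimination(
    features: Sequence[str],
    importance_fn: Callable[[List[str]], Dict[str, float]],
    n_keep: int,
) -> List[str]:
    """Drop the least important feature, one per round, until n_keep remain."""
    if n_keep < 1:
        raise ValueError("n_keep must be at least 1")
    remaining = list(features)
    while len(remaining) > n_keep:
        # Recompute importances on the current subset (a "refit" each round).
        importances = importance_fn(remaining)
        weakest = min(remaining, key=lambda name: importances[name])
        remaining.remove(weakest)
    return remaining


def toy_importance(subset: List[str]) -> Dict[str, float]:
    """Hypothetical importance scores, renormalised over the current subset."""
    base = {"ndvi": 0.50, "elevation": 0.30, "slope": 0.15, "noise": 0.05}
    total = sum(base[name] for name in subset)
    return {name: base[name] / total for name in subset}


selected = recursive_feature_elimination(
    ["ndvi", "elevation", "slope", "noise"], toy_importance, n_keep=2
)
print(selected)
```

With a real model, `importance_fn` would train on the listed columns and return the model's feature importances, so the ranking can change from round to round as correlated covariates are removed.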

18.
Sensors (Basel) ; 24(6)2024 Mar 14.
Article in English | MEDLINE | ID: mdl-38544140

ABSTRACT

Long-span bridges are susceptible to damage, aging, and deformation when exposed to harsh environments over long periods. Therefore, structural health monitoring (SHM) systems need to be used for reasonable monitoring and maintenance. Among various indicators, bridge displacement is a crucial parameter reflecting the bridge's health condition. Because suspension bridges bear multiple environmental loads simultaneously, determining the impact of each load on displacement helps in understanding the health condition of the bridge. Given that extreme gradient boosting (XGBoost) offers high prediction performance and robustness, the authors of this paper have developed a data-driven approach based on the XGBoost model to quantify the impact of different environmental loads on the displacement of a suspension bridge. In addition, this study combined wavelet threshold (WT) denoising and the variational mode decomposition (VMD) method to conduct a modal decomposition of three-dimensional (3D) displacement, further investigating the interrelationships between different loads and bridge displacements. This model links wind speed, temperature, air pressure, and humidity with the 3D displacement response of the span using the bridge monitoring data provided by the GNSS and Earth Observation for Structural Health Monitoring (GeoSHM) system of the Forth Road Bridge (FRB) in the United Kingdom (UK), thus eliminating the temperature time-lag effect on displacement data. The effects of the different loads on the displacement are quantified individually with partial dependence plots (PDPs). Testing showed that the XGBoost model predicts the displacement target variable well. The analysis of quantification and correlation reveals that lateral displacement is primarily affected by same-direction wind, showing a clear positive correlation, and vertical displacement is mainly influenced by temperature and exhibits a negative correlation. Longitudinal displacement is jointly influenced by various environmental loads, showing a positive correlation with atmospheric pressure, temperature, and vertical wind and a negative correlation with longitudinal wind, lateral wind, and humidity. The results can guide bridge structural health monitoring in extreme weather to avoid accidents.
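
The partial dependence plots mentioned in this abstract rest on a simple computation: fix one input at a grid value for every observation, average the model's predictions, and repeat across the grid. A minimal sketch follows, assuming a hypothetical linear `toy_model` and made-up (wind, temperature) rows in place of the trained XGBoost model and the GeoSHM monitoring data.

```python
from typing import Callable, List, Sequence


def partial_dependence(
    model: Callable[[Sequence[float]], float],
    rows: Sequence[Sequence[float]],
    feature_index: int,
    grid: Sequence[float],
) -> List[float]:
    """Average prediction when one feature is forced to each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)
            modified[feature_index] = value  # override only the studied feature
            total += model(modified)
        curve.append(total / len(rows))
    return curve


def toy_model(x: Sequence[float]) -> float:
    """Hypothetical displacement model: x[0] is wind speed, x[1] is temperature."""
    return 2.0 * x[0] - 0.5 * x[1]


observations = [(1.0, 10.0), (3.0, 20.0)]  # made-up (wind, temperature) rows
wind_curve = partial_dependence(toy_model, observations, 0, [0.0, 1.0, 2.0])
print(wind_curve)
```

A rising curve indicates a positive association between the feature and the predicted displacement, which is the kind of per-load reading the abstract reports.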

19.
J Clin Ultrasound ; 52(3): 305-314, 2024.
Article in English | MEDLINE | ID: mdl-38149658

ABSTRACT

OBJECTIVES: A radiomics-based eXtreme gradient boosting (XGBoost) model was developed to differentiate benign thyroid nodules from malignant thyroid nodules and to prevent unnecessary thyroid biopsies, including positive and negative effects. METHODS: The study retrospectively evaluated a data set of ultrasound images of thyroid nodules in patients who initially received ultrasound-guided fine-needle aspiration biopsy (FNAB) for diagnostic purposes. According to ACR TI-RADS, a total of five ultrasound feature categories and the maximum size of the nodule were determined by four radiologists. A radiomics score was developed by the LASSO algorithm from the ultrasound-based radiomics features. An interpretative method based on Shapley additive explanation (SHAP) was developed. XGBoost was compared with ACR TI-RADS for its diagnostic performance and FNAB rate and was compared with six other machine learning models to evaluate the model performance. RESULTS: Finally, 191 thyroid nodules from 177 patients were examined. The radiomics score was calculated using 8 features, which were selected from 789 candidate features generated from the ultrasound images. The model yielded an AUC of 93% in the training cohort and 92% in the test cohort. It outperformed traditional machine learning models in assessing the nature of thyroid nodules. Compared with ACR TI-RADS, the FNAB rate decreased from 34% to 30% in training and from 35% to 41% in test. CONCLUSIONS: The proposed radiomics-based XGBoost model could distinguish benign from malignant thyroid nodules, thereby significantly reducing the number of unnecessary FNABs. It was effective in making preoperative decisions and managing selected patients using the SHAP visual interpretation tools.


Subject(s)
Thyroid Nodule , Humans , Thyroid Nodule/diagnostic imaging , Thyroid Nodule/pathology , Retrospective Studies , Radiomics , Diagnosis, Differential , Ultrasonography/methods , Biopsy, Fine-Needle
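
The AUC figures reported in the thyroid nodule abstract above can be read as the probability that a randomly chosen malignant nodule receives a higher model score than a randomly chosen benign one. The sketch below computes AUC directly from that definition. The labels and scores are invented for illustration and are not taken from the study.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Invented example: 1 = malignant, 0 = benign.
example_auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
print(example_auc)
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a cohort of a few hundred nodules. Larger data sets usually call for a rank-based implementation.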
20.
Water Sci Technol ; 89(10): 2605-2624, 2024 May.
Article in English | MEDLINE | ID: mdl-38822603

ABSTRACT

Floods are one of the most destructive disasters that cause loss of life and property worldwide every year. This study aimed to find the best-performing model for flood sensitivity assessment and to analyze key characteristic factors. The spatial pattern of flood sensitivity was evaluated using three machine learning (ML) models: Logistic Regression (LR), eXtreme Gradient Boosting (XGBoost), and Random Forest (RF). Suqian City in Jiangsu Province was selected as the study area, and a random sample dataset of historical flood points was constructed. Fifteen different meteorological, hydrological, and geographical spatial variables were considered in the flood sensitivity assessment, of which 12 were selected based on the multi-collinearity study. Among the selected ML models, the RF method had the highest AUC value, accuracy, and comprehensive evaluation effect, making it a reliable and effective flood risk assessment model. As the main output of this study, the flood sensitivity map is divided into five categories, ranging from very low to very high sensitivity. Using the RF model (i.e., the model with the highest accuracy), the high-risk area covers about 44% of the study area, mainly concentrated in the central, eastern, and southern parts of the old city area.


Subject(s)
Floods , Logistic Models , Machine Learning , China , Models, Theoretical , Random Forest
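
The flood sensitivity abstract above reduces 15 candidate variables to 12 through a multi-collinearity study, without stating the criterion used. One common screening approach, shown here purely as an assumption, keeps a variable only if its absolute Pearson correlation with every variable already kept stays below a threshold. The variable names, values, and the 0.9 threshold are hypothetical.

```python
from math import sqrt
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 for a constant column."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def drop_collinear(
    columns: Dict[str, Sequence[float]], threshold: float = 0.9
) -> List[str]:
    """Greedy filter: keep a column unless it correlates strongly with a kept one."""
    kept: List[str] = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical predictors; "rainfall_x2" duplicates "rainfall" up to scale.
predictors = {
    "rainfall": [1.0, 2.0, 3.0, 4.0],
    "rainfall_x2": [2.0, 4.0, 6.0, 8.0],
    "elevation": [4.0, 1.0, 3.0, 2.0],
}
kept_predictors = drop_collinear(predictors)
print(kept_predictors)
```

The result depends on column order, because the first of two correlated variables is the one retained. Variance inflation factors are a frequent alternative when collinearity involves more than two variables at once.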