Results 1 - 20 of 416
1.
BMC Biol ; 22(1): 44, 2024 Feb 27.
Article in English | MEDLINE | ID: mdl-38408987

ABSTRACT

BACKGROUND: Circular RNAs (circRNAs) can regulate microRNA activity and are associated with various diseases, such as cancer. Functional research on circRNAs is a focus of current scientific research, and accurate identification of circRNAs is important for gaining insight into their functions. Although several circRNA prediction models have been developed, their prediction accuracy is still unsatisfactory. Therefore, a more accurate computational framework to predict circRNAs and analyse their looping characteristics is crucial for systematic annotation. RESULTS: We developed a novel framework, CircDC, for distinguishing circRNAs from other lncRNAs. CircDC uses four different feature encoding schemes and adopts a multilayer convolutional neural network and a bidirectional long short-term memory network to learn high-order feature representations and make circRNA predictions. The results demonstrate that the proposed CircDC model is more accurate than existing models. In addition, an interpretability analysis of the features affecting the model is performed, and the computational framework is applied to extended circRNA identification tasks. CONCLUSIONS: CircDC is suitable for the prediction of circRNAs. The identification of circRNAs helps in understanding the related biological processes and functions. Feature importance analysis increases model interpretability and uncovers significant biological properties. The code and data for this article are freely available at https://github.com/nmt315320/CircDC.git .


Subject(s)
MicroRNAs , Neoplasms , Humans , RNA, Circular/genetics , Neural Networks, Computer , Neoplasms/genetics , Computational Biology/methods
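The CircDC abstract does not name its four feature encoding schemes. As a hedged illustration of what a sequence encoding can look like, the sketch below computes normalized k-mer frequencies, a common encoding for RNA classifiers. Treating it as one of CircDC's schemes is an assumption, and the function name `kmer_frequencies` is hypothetical.

```python
from itertools import product


def kmer_frequencies(seq: str, k: int = 2) -> dict:
    """Normalized frequency of every possible k-mer over A, C, G, T."""
    seq = seq.upper().replace("U", "T")  # map the RNA alphabet onto DNA letters
    counts = {"".join(p): 0 for p in product("ACGT", repeat=k)}
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows containing N or other symbols
            counts[kmer] += 1
            total += 1
    return {kmer: (c / total if total else 0.0) for kmer, c in counts.items()}


features = kmer_frequencies("AUGGCAUG", k=2)
print(len(features), round(features["AT"], 3))  # 16 0.286
```

With k = 2 every sequence maps to a fixed 16-dimensional vector regardless of its length, which is what allows sequences of different lengths to feed the same network.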
2.
BMC Bioinformatics ; 25(1): 246, 2024 Jul 24.
Article in English | MEDLINE | ID: mdl-39048979

ABSTRACT

Metagenomic data play a crucial role in analyzing the relationship between microbes and diseases. However, the limited number of samples and the high dimensionality and sparsity of metagenomic data pose significant challenges for applying deep learning to data classification and prediction. Previous studies have shown that using the phylogenetic tree structure to transform metagenomic abundance data into a 2D matrix input for convolutional neural networks (CNNs) improves classification performance. Inspired by the success of a Permutable MLP-like architecture in visual recognition, we propose Metagenomic Permutator (MetaP), which applies the Permutable MLP-like network structure to capture the phylogenetic information of microbes within the 2D matrix formed from the phylogenetic tree. Our experiments demonstrate that our model achieves competitive performance compared with other deep neural networks and traditional machine learning methods, and shows good prospects for multi-class classification and large sample sizes. Furthermore, we use the SHAP (SHapley Additive exPlanations) method to interpret our model's predictions, identifying the microbial features that are associated with diseases.


Subject(s)
Gastrointestinal Microbiome , Metagenomics , Metagenomics/methods , Gastrointestinal Microbiome/genetics , Humans , Neural Networks, Computer , Phylogeny , Machine Learning , Deep Learning , Metagenome/genetics
3.
Pflugers Arch ; 2024 Aug 01.
Article in English | MEDLINE | ID: mdl-39088045

ABSTRACT

Explainable artificial intelligence (XAI) has gained significant attention in various domains, including natural and medical image analysis. However, its application in spectroscopy remains relatively unexplored. This systematic review aims to fill this gap by providing a comprehensive overview of the current landscape of XAI in spectroscopy and identifying potential benefits and challenges associated with its implementation. Following the PRISMA 2020 guideline, we conducted a systematic search across major journal databases, which returned 259 initial results. After removing duplicates and applying inclusion and exclusion criteria, 21 scientific studies were included in this review. Notably, most of the studies used XAI methods for spectral data analysis, with an emphasis on identifying significant spectral bands rather than specific intensity peaks. Among the most used XAI techniques were SHapley Additive exPlanations (SHAP), masking methods inspired by Local Interpretable Model-agnostic Explanations (LIME), and Class Activation Mapping (CAM). These methods were favored for their model-agnostic nature and ease of use, enabling interpretable explanations without modifying the original models. Future research should propose new methods and explore adapting XAI methods used in other domains to better suit the unique characteristics of spectroscopic data.
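The review attributes the popularity of SHAP and masking methods to their model-agnostic nature. A minimal sketch of why they are model-agnostic: exact Shapley values need nothing but calls to the model, with "absent" features replaced by baseline values. This brute-force version enumerates every coalition, so it is only practical for a handful of features (SHAP libraries rely on approximations instead). The function `exact_shapley`, the toy model and the zero baseline are illustrative assumptions.

```python
from itertools import combinations
from math import factorial


def exact_shapley(f, x, baseline):
    """Exact Shapley values of model f at input x.

    Features outside a coalition are "masked" by replacing them with the
    corresponding baseline value, so only calls to f are needed.
    """
    n = len(x)

    def value(coalition):
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):  # coalition sizes 0 .. n-1 drawn from the other features
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Toy model with an interaction between features 0 and 2 (illustrative only).
def model(z):
    return 2 * z[0] + 3 * z[1] + z[0] * z[2]


phi = exact_shapley(model, x=[1.0, 2.0, 3.0], baseline=[0.0, 0.0, 0.0])
print([round(v, 6) for v in phi])  # [3.5, 6.0, 1.5]
```

The interaction term contributes 3 to the prediction and is split equally between features 0 and 2, and the values sum to model(x) - model(baseline) = 11, the additivity property that gives SHAP its name.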

4.
Glob Chang Biol ; 30(1): e17006, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37909670

ABSTRACT

Uncovering the mechanisms that lead to variations in Amazon forest resilience is crucial to predicting the impact of future climatic and anthropogenic disturbances. Here, we apply a previously used empirical resilience metric, lag-1 month temporal autocorrelation (TAC), to vegetation optical depth data in the C-band (a good proxy for whole-canopy water content) to explore how forest resilience variations are affected by human disturbances and environmental drivers in the Brazilian Amazon. We found that human disturbances significantly increase the risk of critical transitions and that the median TAC value is ~2.4 times higher in human-disturbed forests than in intact forests, suggesting much lower resilience in disturbed forests. Additionally, human-disturbed forests are less resilient to land surface heat stress and atmospheric water stress than intact forests. Among human-disturbed forests, those with a more closed and thicker canopy structure, which is linked to higher forest cover and a lower disturbance fraction, are comparatively more resilient. These results further emphasize the urgent need to limit deforestation and degradation through policy intervention to maintain the resilience of the Amazon rainforest.


Subject(s)
Rainforest , Resilience, Psychological , Anthropogenic Effects , Conservation of Natural Resources/methods , Forests
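Lag-1 TAC in the Amazon resilience abstract above measures how strongly each month's value resembles the previous month's; values closer to 1 indicate slower recovery from perturbations, which is why a higher TAC is read as lower resilience. A minimal sketch, assuming an evenly spaced monthly series; resilience studies typically compute TAC on detrended, deseasonalized anomalies within moving windows, a preprocessing step omitted here and not taken from this abstract. The function name is hypothetical.

```python
from statistics import fmean


def lag1_autocorrelation(series):
    """Lag-1 temporal autocorrelation (TAC) of an evenly spaced series."""
    if len(series) < 3:
        raise ValueError("need at least three observations")
    m = fmean(series)
    dev = [v - m for v in series]
    den = sum(d * d for d in dev)
    if den == 0:
        raise ValueError("TAC is undefined for a constant series")
    # Covariance between each value and the one that follows it.
    num = sum(a * b for a, b in zip(dev[:-1], dev[1:]))
    return num / den


print(lag1_autocorrelation([1, 2, 3, 4, 5]))  # 0.4: slow drift, strong memory
print(lag1_autocorrelation([1, -1, 1, -1]))   # -0.75: rapid alternation
```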
5.
Br J Clin Pharmacol ; 2024 Jun 06.
Article in English | MEDLINE | ID: mdl-38845212

ABSTRACT

AIMS: Although there are various model-based approaches to individualized vancomycin (VCM) administration, few have been reported for adult patients with periprosthetic joint infection (PJI). This work aimed to develop a machine learning (ML)-based model for predicting VCM trough concentrations in adult PJI patients. METHODS: The dataset of 287 VCM trough concentrations from 130 adult PJI patients was split into a training set (229) and a testing set (58) at a ratio of 8:2, and an independent external set of 32 concentrations was collected for validation. A total of 13 covariates and the target variable (VCM trough concentration) were included in the dataset. Covariate models were constructed using support vector regression, random forest regression and gradient boosted regression trees, and were interpreted with SHapley Additive exPlanations (SHAP). RESULTS: The SHAP plots visualized the weights of the covariates in the models, with estimated glomerular filtration rate and VCM daily dose as the 2 most important factors; these were adopted for model construction. Random forest regression was the optimal ML algorithm, with a relative accuracy of 82.8% and an absolute accuracy of 67.2% (R² = 0.61, mean absolute error = 2.4, mean squared error = 10.1), and its prediction performance was verified in the validation set. CONCLUSION: The proposed ML-based model can satisfactorily predict VCM trough concentrations in adult PJI patients. It can be constructed with only 2 clinical parameters (estimated glomerular filtration rate and VCM daily dose), and its predictions can be rationalized by SHAP values, which gives it considerable practical value for clinical dosing guidance and timely treatment.
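The 8:2 split and the error metrics reported in the vancomycin abstract above can be sketched as follows. This assumes a simple random split of the 287 concentrations; the abstract does not say whether samples were split per concentration or per patient (with several concentrations per patient, a patient-level split would avoid leakage). The function names are hypothetical.

```python
import random
from statistics import fmean


def split_80_20(items, seed=0):
    """Shuffle, then put the first 80% (rounded down) in the training set."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = int(0.8 * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


def regression_metrics(y_true, y_pred):
    """MAE, MSE and coefficient of determination (R^2)."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = fmean(y_true)
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "MAE": fmean(abs(r) for r in residuals),
        "MSE": ss_res / len(residuals),
        "R2": 1 - ss_res / ss_tot,
    }


train, test = split_80_20(range(287))
print(len(train), len(test))  # 229 58
print(regression_metrics([10.0, 12.0, 14.0], [10.0, 12.0, 16.0]))
```

Rounding the training share down reproduces the reported 229/58 partition of 287 concentrations.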

6.
BMC Med Res Methodol ; 24(1): 23, 2024 Jan 25.
Article in English | MEDLINE | ID: mdl-38273257

ABSTRACT

Sepsis remains a critical concern in intensive care units due to its high mortality rate. Early identification and intervention are paramount to improving patient outcomes. In this study, we propose predictive models for early sepsis prediction based on time-series data, using both CNN-Transformer and LSTM-Transformer architectures. We collected time-series data from patients at 4, 8, and 12 h prior to sepsis diagnosis and analysed and compared them with various network models. Compared with traditional recurrent neural networks, our model showed a substantial improvement of approximately 20%. On average, our model achieved an accuracy of 0.964 (± 0.018), a precision of 0.956 (± 0.012), a recall of 0.967 (± 0.012), and an F1 score of 0.959 (± 0.014). Furthermore, by adjusting the time window, we observed that the Transformer-based model had exceptional predictive capability, particularly within the earliest time window (i.e., 12 h before onset), which holds significant promise for early clinical diagnosis and intervention. In addition, we employed the SHAP algorithm to visualize the weight distribution of different features, enhancing the interpretability of our model.


Subject(s)
Sepsis , Humans , Time Factors , Sepsis/diagnosis , Sepsis/therapy , Algorithms , Intensive Care Units , Mental Recall
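The four metrics reported for the sepsis models all derive from the confusion matrix. A minimal sketch for binary labels (1 = sepsis); the ± values in the abstract would come from repeating the evaluation (for example, across folds) and taking the standard deviation, which is an assumption, since the abstract does not say how they were obtained. The function name is hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "F1": f1}


print(classification_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```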
7.
Popul Health Metr ; 22(1): 10, 2024 Jun 03.
Article in English | MEDLINE | ID: mdl-38831424

ABSTRACT

BACKGROUND: There are significant geographic inequities in COVID-19 case fatality rates (CFRs), and a comprehensive, global understanding of their country-level determinants is necessary. This study aims to quantify the country-specific risk of COVID-19 CFR and propose tailored response strategies, including vaccination strategies, in 156 countries. METHODS: Cross-temporal and cross-country variations in COVID-19 CFR were identified using extreme gradient boosting (XGBoost), including 35 factors from seven dimensions in 156 countries from 28 January 2020 to 31 January 2022. SHapley Additive exPlanations (SHAP) was used to further clarify the clustering of countries by the key factors driving CFR and the effect of concurrent risk factors for each country. Increases in vaccination rates were simulated to illustrate the reduction of CFR in different classes of countries. FINDINGS: Overall COVID-19 CFRs varied across countries from 28 January 2020 to 31 January 2022, ranging from 68 to 6373 per 100,000 population. During the COVID-19 pandemic, the determinants of CFRs changed first from health conditions to universal health coverage, and then to a multifactorial mixed effect dominated by vaccination. In the Omicron period, countries were divided into five classes according to their risk determinants. The low vaccination-driven class (70 countries) was mainly distributed in sub-Saharan Africa and Latin America and included the majority of low-income countries (95.7%), with many concurrent risk factors. The aging-driven class (26 countries) consisted mainly of high-income European countries. The high disease burden-driven class (32 countries) was mainly distributed in Asia and North America. The low GDP-driven class (14 countries) was scattered across continents.
Simulating a 5% increase in vaccination rate resulted in CFR reductions of 31.2% and 15.0% for the low vaccination-driven class and the high disease burden-driven class, respectively, with greater CFR reductions for countries with high overall risk (SHAP value > 0.1), but only 3.1% for the aging-driven class. CONCLUSIONS: Evidence from this study suggests that geographic inequities in COVID-19 CFR are jointly determined by key and concurrent risks, and that reducing COVID-19 CFR requires not only increased vaccination coverage but also targeted intervention strategies based on country-specific risks.


Subject(s)
COVID-19 , Global Health , Machine Learning , SARS-CoV-2 , Humans , COVID-19/mortality , Risk Factors , Pandemics , COVID-19 Vaccines , Vaccination
8.
Environ Sci Technol ; 58(19): 8372-8379, 2024 May 14.
Article in English | MEDLINE | ID: mdl-38691628

ABSTRACT

The development of highly efficient catalysts for formaldehyde (HCHO) oxidation is of significant interest for improving indoor air quality. Up to 400 works on the catalytic oxidation of HCHO have been published to date; however, analysing them for collective inference through conventional literature search remains challenging. A machine learning (ML) framework is presented to predict catalyst performance from experimental descriptors based on an HCHO oxidation catalyst database. MnOx, CeO2, Co3O4, TiO2, FeOx, ZrO2, Al2O3, SiO2, and carbon-based catalysts with different promoters were compiled from the literature. Notably, 20 descriptors covering catalyst composition, reaction conditions, and catalyst physical properties were collected for data mining (2263 data points). The eXtreme Gradient Boosting algorithm was then employed and successfully predicted the HCHO conversion efficiency with an R-squared value of 0.81. Shapley additive explanation analysis of the entire database suggested that Pt/MnO2 and Ag/Ce-Co3O4 exhibit excellent catalytic performance for HCHO oxidation. Validated by experimental tests and theoretical simulations, the key descriptor identified by ML, i.e., the first promoter, was further explained in terms of metal-support interactions. This study highlights ML as a useful tool for database establishment and for rational catalyst design based on the importance of experimental descriptors to the performance of complex catalytic systems.


Subject(s)
Air Pollution, Indoor , Formaldehyde , Machine Learning , Oxidation-Reduction , Formaldehyde/chemistry , Catalysis
9.
Environ Sci Technol ; 58(1): 488-497, 2024 Jan 09.
Article in English | MEDLINE | ID: mdl-38134352

ABSTRACT

Per- and polyfluoroalkyl substances (PFAS) are widely employed anthropogenic fluorinated chemicals known to disrupt hepatic lipid metabolism by binding to human peroxisome proliferator-activated receptor alpha (PPARα). Therefore, screening for PFAS that bind to PPARα is of critical importance. Machine learning approaches are promising techniques for rapid screening of PFAS. However, traditional machine learning approaches lack interpretability, posing challenges in investigating the relationship between molecular descriptors and PPARα binding. In this study, we aimed to develop a novel, explainable machine learning approach to rapidly screen for PFAS that bind to PPARα. We calculated the PPARα-PFAS binding score and 206 molecular descriptors for PFAS. Through systematic and objective selection of important molecular descriptors, we developed a machine learning model with good predictive performance using only three descriptors. The molecular size (b_single) and electrostatic properties (BCUT_PEOE_3 and PEOE_VSA_PPOS) are important for PPARα-PFAS binding. Alternative PFAS are considered safer than their legacy predecessors. However, we found that alternative PFAS with many carbon atoms and ether groups exhibited a higher affinity for PPARα. Therefore, confirming the toxicity of these alternative PFAS compounds with such characteristics through biological experiments is important.


Subject(s)
Fluorocarbons , PPAR alpha , Humans , PPAR alpha/metabolism , Liver/metabolism
10.
Environ Sci Technol ; 58(23): 10128-10139, 2024 Jun 11.
Article in English | MEDLINE | ID: mdl-38743597

ABSTRACT

Pervaporation (PV) is an effective membrane separation process for organic dehydration, recovery, and upgrading. However, it is crucial to improve membrane materials beyond the current permeability-selectivity trade-off. In this research, we introduce machine learning (ML) models to identify high-potential polymers, greatly improving efficiency and reducing cost compared with the conventional trial-and-error approach. We used the largest PV data set to date and incorporated polymer fingerprints and features, including membrane structure, operating conditions, and solute properties. Dimensionality reduction, missing data treatment, seed randomness, and data leakage management were employed to ensure model robustness. The optimized LightGBM models achieved RMSEs of 0.447 and 0.360 (on a logarithmic scale) for separation factor and total flux, respectively. Screening approximately 1 million hypothetical polymers with the ML models identified polymers with a predicted permeation separation index >30 and a synthetic accessibility score <3.7 for acetic acid extraction. This study demonstrates the promise of ML to accelerate tailored membrane design.


Subject(s)
Machine Learning , Polymers , Polymers/chemistry , Membranes, Artificial , Permeability
11.
Environ Sci Technol ; 58(29): 13035-13046, 2024 Jul 23.
Article in English | MEDLINE | ID: mdl-38982681

ABSTRACT

Gaseous nitrous acid (HONO) is a critical precursor of hydroxyl radicals (OH), influencing atmospheric oxidation capacity and the formation of secondary pollutants. However, large uncertainties persist regarding its formation and elimination mechanisms, impeding accurate simulation of HONO levels with chemical models. In this study, a deep neural network (DNN) model was established based on routine air quality data (O3, NO2, CO, and PM2.5) and meteorological parameters (temperature, relative humidity, solar zenith angle, and season) collected from four typical megacity clusters in China. The model exhibited robust performance on both the training sets [slope = 1.0, r² = 0.94, root mean squared error (RMSE) = 0.29 ppbv] and two independent test sets (slope = 1.0, r² = 0.79, and RMSE = 0.39 ppbv), demonstrated excellent capability in reproducing the spatiotemporal variations of HONO, and outperformed an observation-constrained box model incorporating newly proposed HONO formation mechanisms. Nitrogen dioxide (NO2) was identified as the most impactful feature for HONO prediction using the SHapley Additive exPlanations (SHAP) approach, highlighting the importance of NO2 conversion in HONO formation. The DNN model was further employed to predict future changes in HONO levels under different NOx abatement scenarios: HONO is expected to decrease by 27-44% in summer as a result of a 30-50% NOx reduction. These results suggest a dual benefit of NOx emission abatement, leading not only to reductions in O3 and nitrate precursors but also to decreases in HONO levels and hence in primary radical production rates (PROx). In summary, this study demonstrates the feasibility of using a deep learning approach to predict HONO concentrations, offering a promising supplement to traditional chemical models. Additionally, stringent NOx abatement would be beneficial for the joint alleviation of O3 and secondary PM2.5.


Subject(s)
Air Pollutants , Deep Learning , Nitrous Acid , Nitrous Acid/chemistry , Air Pollutants/analysis , China , Environmental Monitoring/methods , Air Pollution
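The slope, r² and RMSE used to evaluate the HONO model compare predicted with observed concentrations. A sketch under stated assumptions: the slope is taken as the ordinary least-squares slope of predicted on observed values and r² as the squared Pearson correlation; the abstract does not specify the regression type (some air-quality studies use reduced major axis regression instead). The function name is hypothetical.

```python
from math import sqrt
from statistics import fmean


def evaluation_stats(observed, predicted):
    """OLS slope of predicted on observed, squared Pearson r, and RMSE."""
    mo, mp = fmean(observed), fmean(predicted)
    sxx = sum((o - mo) ** 2 for o in observed)
    syy = sum((p - mp) ** 2 for p in predicted)
    sxy = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    slope = sxy / sxx
    r2 = sxy * sxy / (sxx * syy)
    rmse = sqrt(fmean((p - o) ** 2 for o, p in zip(observed, predicted)))
    return slope, r2, rmse


# A constant +1 ppbv bias leaves slope and r^2 perfect but shows up in RMSE.
print(evaluation_stats([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0]))  # (1.0, 1.0, 1.0)
```

The example shows why the abstract reports RMSE alongside slope and r²: the first two are blind to a constant offset.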
12.
Network ; : 1-38, 2024 Mar 21.
Article in English | MEDLINE | ID: mdl-38511557

ABSTRACT

Interpretable machine learning models are instrumental in disease diagnosis and clinical decision-making, shedding light on relevant features. In this study, Boruta, SHAP (SHapley Additive exPlanations), and BorutaShap were employed for feature selection, each contributing to the identification of crucial features. The selected features were then used to train six machine learning algorithms, including LR, SVM, ETC, AdaBoost, RF, and LGBM, on diverse medical datasets obtained from public sources after rigorous preprocessing. The performance of each feature selection technique was evaluated across multiple ML models in terms of accuracy, precision, recall, and F1-score. Among these, SHAP showed superior performance, achieving average accuracies of 80.17%, 85.13%, 90.00%, and 99.55% on the diabetes, cardiovascular, statlog, and thyroid disease datasets, respectively. Notably, LGBM emerged as the most effective algorithm, with an average accuracy of 91.00% across most disease states. Moreover, SHAP enhanced the interpretability of the models, providing valuable insights into the underlying mechanisms driving disease diagnosis. This comprehensive study contributes significant insights into feature selection techniques and machine learning algorithms for disease diagnosis, benefiting researchers and practitioners in the medical field. Further exploration of feature selection methods and algorithms holds promise for advancing disease diagnosis methodologies, paving the way for more accurate and interpretable diagnostic models.

13.
Eur Neurol ; 87(2): 54-66, 2024.
Article in English | MEDLINE | ID: mdl-38565087

ABSTRACT

INTRODUCTION: Malignant cerebral edema (MCE) is a serious complication and the main cause of poor prognosis in patients with large-hemisphere infarction (LHI). Therefore, the rapid and accurate identification of patients at risk of MCE is essential for timely therapy. This study used an artificial intelligence-based machine learning approach to establish an interpretable model for predicting MCE in patients with LHI. METHODS: This study included 314 patients with LHI not undergoing recanalization therapy. The patients were divided into MCE and non-MCE groups, and an eXtreme Gradient Boosting (XGBoost) model was developed. A confusion matrix was used to measure the prediction performance of the XGBoost model. We also used the SHapley Additive exPlanations (SHAP) method to explain the XGBoost model. Decision curve and receiver operating characteristic curve analyses were performed to evaluate the net benefits of the model. RESULTS: MCE was observed in 121 (38.5%) of the 314 patients with LHI. The model showed excellent predictive performance, with an area under the curve of 0.916. The SHAP method revealed the top 10 predictive variables for MCE, in order of importance: ASPECTS score, NIHSS score, CS score, APACHE II score, HbA1c, AF, NLR, PLT, GCS, and age. CONCLUSION: An interpretable predictive model can increase transparency and help doctors accurately predict the occurrence of MCE within 48 h of onset in LHI patients not undergoing recanalization therapy, providing patients with better treatment strategies and enabling optimal resource allocation.


Subject(s)
Artificial Intelligence , Brain Edema , Humans , Male , Female , Aged , Brain Edema/etiology , Middle Aged , Machine Learning , Cerebral Infarction/etiology , Cerebral Infarction/diagnostic imaging , Retrospective Studies , Prognosis , Aged, 80 and over
14.
BMC Public Health ; 24(1): 2131, 2024 Aug 06.
Article in English | MEDLINE | ID: mdl-39107721

ABSTRACT

BACKGROUND: The temporal relationships across cardiometabolic diseases (CMDs) were recently conceptualized as the cardiometabolic continuum (CMC), a sequence of cardiovascular events that stem from gene-environment interactions, unhealthy lifestyle influences, and metabolic diseases such as diabetes and hypertension. While the physiological pathways linking metabolic and cardiovascular diseases have been investigated, sex and population differences in the CMC have not yet been described. METHODS: We present a machine learning approach to model the CMC and investigate sex and population differences in two distinct cohorts: the UK Biobank (17,700 participants) and the Brazilian Longitudinal Study of Adult Health (ELSA-Brasil) (7162 participants). We consider the following CMDs: hypertension (Hyp), diabetes (DM), heart diseases (HD: angina, myocardial infarction, or heart failure), and stroke (STK). To identify CMC patterns, individual trajectories with the time of disease occurrence were clustered using k-means. Based on clinical, sociodemographic, and lifestyle characteristics, we built multiclass random forest classifiers and used the SHAP methodology to evaluate feature importance. RESULTS: Five CMC patterns were identified across both sexes and cohorts: EarlyHyp, FirstDM, FirstHD, Healthy, and LateHyp, named according to disease prevalence and time of occurrence, which characterized around 95%, 78%, 75%, 88%, and 99% of individuals, respectively. Within the UK Biobank, more women were classified in the Healthy cluster and more men in all others. In the EarlyHyp and LateHyp clusters, isolated hypertension occurred earlier among women. Smoking habits and education had high importance and clear directionality for both sexes. For ELSA-Brasil, more men were classified in the Healthy cluster and more women in FirstDM. When diabetes was followed by hypertension, diabetes occurred earlier among women.
Education and ethnicity had high importance and clear directionality for women, while for men these features were smoking, alcohol, and coffee consumption. CONCLUSIONS: There are clear sex differences in the CMC that varied between the UK and Brazilian cohorts. In particular, disadvantages for women regarding incidence and time to disease onset were more pronounced in Brazil. The results show the need to strengthen public health policies to prevent and control the time course of CMD, with an emphasis on women.


Subject(s)
Cardiovascular Diseases , Machine Learning , Humans , Male , Female , Middle Aged , United Kingdom/epidemiology , Brazil/epidemiology , Adult , Cardiovascular Diseases/epidemiology , Sex Factors , Longitudinal Studies , Aged , Biological Specimen Banks , Cardiometabolic Risk Factors , Cohort Studies , UK Biobank
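The CMC patterns in this entry were found by clustering individual disease trajectories with k-means. A minimal Lloyd's-algorithm sketch, assuming each participant is encoded as a vector of ages at onset for Hyp, DM, HD and STK, with disease-free conditions set to a hypothetical end-of-follow-up age of 80; the study's actual trajectory encoding and initialization are not described in the abstract.

```python
from math import dist


def kmeans(points, initial_centroids, max_iter=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and mean update."""
    centroids = [list(c) for c in initial_centroids]
    labels = []
    for _ in range(max_iter):
        labels = [min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
                  for p in points]
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            # Keep the old centroid if a cluster ends up empty.
            new_centroids.append([sum(col) / len(members) for col in zip(*members)]
                                 if members else old)
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical ages at onset of (Hyp, DM, HD, STK); 80 = disease-free at follow-up end.
people = [[45, 80, 80, 80], [47, 80, 80, 80], [70, 55, 80, 80], [72, 53, 80, 80]]
labels, centroids = kmeans(people, initial_centroids=[people[0], people[2]])
print(labels)  # [0, 0, 1, 1]
```

In practice k-means is run from several initializations (for example, k-means++ seeding) and the solution with the lowest within-cluster sum of squares is kept; explicit initial centroids are used here only to keep the example deterministic.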
15.
Article in English | MEDLINE | ID: mdl-38993175

ABSTRACT

PURPOSE: The objective was to predict proliferative diabetic retinopathy (PDR) in non-Hispanic Black (NHB) and Latino (LA) patients by applying machine learning algorithms to routinely collected blood and urine laboratory results. METHODS: Electronic medical records of 1124 type 2 diabetes patients treated at the Bronxcare Hospital eye clinic between January and December 2019 were analysed. Data collected included demographic information (ethnicity, age and sex), blood test results (fasting glucose, haemoglobin A1C [HbA1c], high-density lipoprotein [HDL], low-density lipoprotein [LDL], serum creatinine and estimated glomerular filtration rate [eGFR]), urine test results (albumin-to-creatinine ratio [ACR]) and the outcome measure of retinopathy status. The efficacy of different machine learning models was assessed and compared. SHapley Additive exPlanations (SHAP) analysis was employed to evaluate the contribution of each feature to the model's predictions. RESULTS: The balanced random forest model surpassed other models in predicting PDR for both NHB and LA cohorts, achieving an AUC (area under the curve) of 83%. Regarding sex, the model performed particularly well for the female LA demographic, with an AUC of 87%. The SHAP analysis revealed that PDR-related factors influenced NHB and LA patients differently, with a more pronounced disparity between sexes. Furthermore, the optimal cut-off values for these factors varied by sex and ethnicity. CONCLUSIONS: This study demonstrates the potential of machine learning for identifying individuals at higher risk of PDR by leveraging routine blood and urine test results. It allows clinicians to prioritise at-risk individuals for timely evaluations. Furthermore, the findings emphasise the importance of accounting for both ethnicity and sex when analysing risk factors for PDR in individuals with type 2 diabetes.
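The PDR abstract reports sex- and ethnicity-specific optimal cut-off values but does not state how they were chosen. One common choice is the threshold that maximizes Youden's J (sensitivity + specificity - 1); the sketch below assumes that criterion and a rule of the form "value >= threshold predicts PDR", with hypothetical HbA1c values and a hypothetical function name.

```python
def youden_cutoff(values, labels):
    """Return (threshold, J) maximizing sensitivity + specificity - 1.

    The rule assumed here is: value >= threshold predicts the positive class.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_t, best_j = None, -1.0
    for t in sorted(set(values)):
        tp = sum(1 for v, y in zip(values, labels) if v >= t and y == 1)
        tn = sum(1 for v, y in zip(values, labels) if v < t and y == 0)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


# Hypothetical HbA1c values (%) with PDR status (1 = PDR).
hba1c = [5.5, 6.0, 7.2, 8.1, 9.0, 10.3]
pdr = [0, 0, 0, 1, 1, 1]
print(youden_cutoff(hba1c, pdr))  # (8.1, 1.0)
```

Running the same function separately within each subgroup (for example, female LA patients) would yield subgroup-specific cut-offs of the kind the abstract describes.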

16.
J Med Internet Res ; 26: e51354, 2024 May 01.
Article in English | MEDLINE | ID: mdl-38691403

ABSTRACT

BACKGROUND: Acute kidney disease (AKD) affects more than half of critically ill elderly patients with acute kidney injury (AKI) and leads to worse short-term outcomes. OBJECTIVE: We aimed to establish 2 machine learning models to predict the risk and prognosis of AKD in the elderly and to deploy the models as online apps. METHODS: Data on elderly patients with AKI (n=3542) and AKD (n=2661) from the Medical Information Mart for Intensive Care IV (MIMIC-IV) database were used to develop 2 models for predicting AKD risk and in-hospital mortality, respectively. Data collected from Xiangya Hospital of Central South University were used for external validation. A bootstrap method was used for internal validation to obtain relatively stable results. We extracted indicators within 24 hours of the first diagnosis of AKI, together with the fluctuation range of some indicators, namely delta (day 3 after AKI minus day 1), as features. Six machine learning algorithms were used for modeling; the area under the receiver operating characteristic curve (AUROC), decision curve analysis, and calibration curves were used for evaluation; Shapley additive explanation (SHAP) analysis was used for visual interpretation; and the Heroku platform was used to deploy the best-performing models as web-based apps. RESULTS: For the model predicting the risk of AKD in elderly patients with AKI during hospitalization, the Light Gradient Boosting Machine (LightGBM) showed the best overall performance in the training (AUROC=0.844, 95% CI 0.831-0.857), internal validation (AUROC=0.853, 95% CI 0.841-0.865), and external (AUROC=0.755, 95% CI 0.699-0.811) cohorts. In addition, LightGBM performed well for AKD prognostic prediction in the training (AUROC=0.861, 95% CI 0.843-0.878), internal validation (AUROC=0.868, 95% CI 0.851-0.885), and external (AUROC=0.746, 95% CI 0.673-0.820) cohorts. The models deployed as online prediction apps allow users to make predictions and to submit new data as feedback for model iteration.
The top 10 influencing factors of each model were ranked by importance and visualized based on SHAP values, and partial dependence plots revealed optimal cutoffs for some modifiable indicators. The top 5 factors predicting the risk of AKD were creatinine on day 3, sepsis, delta blood urea nitrogen (BUN), diastolic blood pressure (DBP), and heart rate, while the top 5 factors determining in-hospital mortality were age, BUN on day 1, vasopressor use, BUN on day 3, and partial pressure of carbon dioxide (PaCO2). CONCLUSIONS: We developed and validated 2 online apps for predicting the risk of AKD and its associated mortality in elderly patients, respectively. The top 10 factors that influenced AKD risk and in-hospital mortality were identified and explained visually, which may provide useful applications for intelligent management and suggestions for future prospective research.


Subject(s)
Acute Kidney Injury , Critical Illness , Hospitalization , Internet , Machine Learning , Humans , Aged , Critical Illness/mortality , Prognosis , Acute Kidney Injury/mortality , Acute Kidney Injury/diagnosis , Female , Male , Hospitalization/statistics & numerical data , Aged, 80 and over , Hospital Mortality , Risk Assessment/methods
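The 95% CIs attached to each AUROC in the AKD abstract can be obtained by bootstrap resampling. The sketch below computes AUROC as the probability that a randomly chosen positive outranks a randomly chosen negative (ties count half) and takes a percentile interval over resamples. The abstract does not detail the authors' bootstrap procedure, so this is one common variant rather than their method; the scores and function names are hypothetical.

```python
import random


def auroc(scores, labels):
    """Probability that a random positive scores above a random negative (ties = 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auroc_ci(scores, labels, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUROC."""
    n = len(scores)
    if not 0 < sum(labels) < n:
        raise ValueError("both classes must be present")
    rng = random.Random(seed)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # AUROC needs both classes in the resample
            stats.append(auroc([scores[i] for i in idx], ys))
    stats.sort()
    lower = stats[round(alpha / 2 * n_boot)]
    upper = stats[round((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


scores = [0.1, 0.4, 0.35, 0.8, 0.65, 0.2, 0.9, 0.55]
labels = [0, 0, 1, 1, 1, 0, 1, 0]
print(auroc(scores, labels))  # 0.875
print(bootstrap_auroc_ci(scores, labels))
```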
17.
J Med Internet Res ; 26: e55913, 2024 May 17.
Article in English | MEDLINE | ID: mdl-38758578

ABSTRACT

BACKGROUND: Suicide is the second-leading cause of death among adolescents and is associated with clusters of suicides. Despite numerous studies on this preventable cause of death, the focus has primarily been on single nations and traditional statistical methods. OBJECTIVE: This study aims to develop a predictive model for adolescent suicidal thinking using multinational data sets and machine learning (ML). METHODS: We used data from the Korea Youth Risk Behavior Web-based Survey with 566,875 adolescents aged between 13 and 18 years and conducted external validation using the Youth Risk Behavior Survey with 103,874 adolescents and Norway's University National General Survey with 19,574 adolescents. Several tree-based ML models were developed, and feature importance and Shapley additive explanations values were analyzed to identify risk factors for adolescent suicidal thinking. RESULTS: When trained on the Korea Youth Risk Behavior Web-based Survey data from South Korea, the XGBoost model achieved an area under the receiver operating characteristic curve (AUROC) of 90.06% (95% CI 89.97-90.16), outperforming the other models. For external validation using the Youth Risk Behavior Survey data from the United States and the University National General Survey from Norway, the XGBoost model achieved AUROCs of 83.09% and 81.27%, respectively. Across all data sets, XGBoost consistently achieved the highest AUROC and was selected as the optimal model. In terms of predictors of suicidal thinking, feelings of sadness and despair were the most influential, accounting for 57.4% of the impact, followed by stress status (19.8%), age (5.7%), household income (4%), academic achievement (3.4%), sex (2.1%), and other factors, each of which contributed less than 2%. CONCLUSIONS: This study used ML to integrate diverse data sets from 3 countries to address adolescent suicide.
The findings highlight the important role of emotional health indicators in predicting suicidal thinking among adolescents. Specifically, sadness and despair were identified as the most significant predictors, followed by stressful conditions and age. These findings emphasize the critical need for early diagnosis and prevention of mental health issues during adolescence.
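The percentage "impact" shares reported above are consistent with normalizing each feature's mean absolute SHAP value by the total across features. The abstract does not state the exact procedure, so the following is a minimal sketch under that assumption; the SHAP values and feature names are made up for illustration and are not taken from the study.

```python
from typing import Dict, List


def shap_importance_shares(shap_values: List[List[float]],
                           feature_names: List[str]) -> Dict[str, float]:
    """Return each feature's share (in %) of the total mean |SHAP| value.

    Assumed layout: one row per sample, one column per feature.
    """
    n_samples = len(shap_values)
    # Mean absolute SHAP value per feature across all samples
    mean_abs = [
        sum(abs(row[j]) for row in shap_values) / n_samples
        for j in range(len(feature_names))
    ]
    total = sum(mean_abs)
    # Normalize so that the shares sum to 100%
    return {name: 100.0 * m / total for name, m in zip(feature_names, mean_abs)}


# Illustrative (made-up) SHAP values: three samples x three hypothetical features
shap_vals = [
    [0.6, -0.2, 0.2],
    [-0.4, 0.2, 0.0],
    [0.5, -0.2, 0.1],
]
shares = shap_importance_shares(shap_vals, ["sadness_despair", "stress", "age"])
for name, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {pct:.1f}%")
```

Using absolute values means that features pushing predictions strongly in either direction count as influential, which matches how SHAP-based rankings are usually summarized.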


Subject(s)
Machine Learning , Suicidal Ideation , Humans , Adolescent , Female , Male , Republic of Korea , Algorithms , Cohort Studies , Adolescent Behavior/psychology , Suicide/statistics & numerical data , Suicide/psychology , Norway , Surveys and Questionnaires , Risk Factors , Risk-Taking
18.
J Med Internet Res ; 26: e52134, 2024 Jan 11.
Article in English | MEDLINE | ID: mdl-38206673

ABSTRACT

BACKGROUND: Robust and accurate prediction of severity for patients with COVID-19 is crucial for patient triaging decisions. Many proposed models were prone to either high bias risk or low-to-moderate discrimination. Some also lacked clinical interpretability or were developed using data from the early pandemic period. Hence, there has been a compelling need for improved prediction models with better clinical applicability. OBJECTIVE: The primary objective of this study was to develop and validate a machine learning-based Robust and Interpretable Early Triaging Support (RIETS) system that predicts severity progression (involving any of the following events: intensive care unit admission, in-hospital death, mechanical ventilation required, or extracorporeal membrane oxygenation required) within 15 days of hospitalization based on routinely available clinical and laboratory biomarkers. METHODS: We included data from 5945 hospitalized patients with COVID-19 from 19 hospitals in South Korea collected between January 2020 and August 2022. For model development and external validation, the whole data set was partitioned into 2 independent cohorts by stratified random cluster sampling according to hospital type (general and tertiary care) and geographical location (metropolitan and nonmetropolitan). Machine learning models were trained and internally validated through cross-validation on the development cohort. They were externally validated using bootstrapped sampling on the external validation cohort. The best-performing model was selected primarily based on the area under the receiver operating characteristic curve (AUROC), and its robustness was evaluated using bias risk assessment. For model interpretability, we used Shapley and patient clustering methods. 
RESULTS: Our final model, RIETS, was developed based on a deep neural network of 11 clinical and laboratory biomarkers that are readily available within the first day of hospitalization. The features predictive of severity included lactate dehydrogenase, age, absolute lymphocyte count, dyspnea, respiratory rate, diabetes mellitus, C-reactive protein, absolute neutrophil count, platelet count, white blood cell count, and saturation of peripheral oxygen. RIETS demonstrated excellent discrimination (AUROC=0.937; 95% CI 0.935-0.938) with high calibration (integrated calibration index=0.041), satisfied all the criteria of low bias risk in a risk assessment tool, and provided detailed interpretations of model parameters and patient clusters. In addition, RIETS showed potential for transportability across variant periods, with sustained prediction performance on Omicron cases (AUROC=0.903, 95% CI 0.897-0.910). CONCLUSIONS: RIETS was developed and validated to assist early triaging by promptly predicting the severity of hospitalized patients with COVID-19. Its high performance with low bias risk supports highly reliable prediction. The use of a nationwide multicenter cohort in model development and validation suggests generalizability. The use of routinely collected features may enable wide adaptability. Interpretations of model parameters and patients can promote clinical applicability. Together, we anticipate that RIETS will facilitate the patient triaging workflow and efficient resource allocation when incorporated into routine clinical practice.
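The abstract reports AUROCs with 95% CIs obtained by bootstrapped external validation. The sketch below shows one common way to obtain such an interval, a percentile bootstrap of the AUROC. It is not the RIETS authors' code; the labels, scores, number of resamples, and seed are illustrative assumptions.

```python
import random
from typing import List, Sequence, Tuple


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive outranks a random negative.

    Ties count as half a win (equivalent to the Mann-Whitney U statistic).
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auroc_ci(y_true: Sequence[int], scores: Sequence[float],
                       n_boot: int = 1000, alpha: float = 0.05,
                       seed: int = 0) -> Tuple[float, float, float]:
    """Return (point AUROC, lower, upper) using a percentile bootstrap."""
    rng = random.Random(seed)
    n = len(y_true)
    stats: List[float] = []
    while len(stats) < n_boot:
        # Resample patients with replacement
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [y_true[i] for i in idx]
        if len(set(yb)) < 2:
            continue  # skip resamples that contain only one class
        stats.append(auroc(yb, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int(round((alpha / 2) * n_boot))]
    hi = stats[min(n_boot - 1, int(round((1 - alpha / 2) * n_boot)) - 1)]
    return auroc(y_true, scores), lo, hi


# Made-up labels (1 = severe progression) and predicted risks
y = [0, 0, 1, 1, 0, 1, 0, 1, 1, 0]
p = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.5, 0.9, 0.6, 0.3]
point, lo, hi = bootstrap_auroc_ci(y, p, n_boot=500)
print(f"AUROC={point:.3f} (95% CI {lo:.3f}-{hi:.3f})")
```

The pairwise AUROC is O(positives × negatives), which is fine for a sketch; for cohorts of thousands of patients a rank-based implementation would be faster.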


Subject(s)
Algorithms , COVID-19 , Triage , Humans , Biomarkers , COVID-19/diagnosis , Hospital Mortality , Neural Networks, Computer , Triage/methods , Republic of Korea
19.
BMC Med Inform Decis Mak ; 24(1): 24, 2024 Jan 24.
Article in English | MEDLINE | ID: mdl-38267946

ABSTRACT

BACKGROUND AND AIMS: Sexually transmitted infections (STIs) are a significant global public health challenge due to their high incidence rate and potential for severe consequences when early intervention is neglected. Research shows an upward trend in the absolute number of STI cases and disability-adjusted life years (DALYs), with syphilis, chlamydia, trichomoniasis, and genital herpes exhibiting an increasing trend in age-standardized rate (ASR) from 2010 to 2019. Machine learning (ML) offers significant advantages in disease prediction, and several studies have explored its potential for STI prediction. The objective of this study is to build sex-specific (male and female) STI risk prediction models based on the CatBoost algorithm, using data from the National Health and Nutrition Examination Survey (NHANES) for training and validation, with subgroup analysis performed for each STI. The female subgroup analysis also includes human papillomavirus (HPV) infection. METHODS: The study used NHANES data to build male and female STI risk prediction models with the CatBoost algorithm. Data were collected from 12,053 participants aged 18 to 59 years, with general demographic characteristics and sexual behavior questionnaire responses included as features. The Adaptive Synthetic Sampling Approach (ADASYN) algorithm was used to address data imbalance, and 15 machine learning algorithms were evaluated before the CatBoost algorithm was ultimately selected. The SHAP method was employed to enhance interpretability by identifying feature importance in the model's STI risk predictions. RESULTS: The CatBoost classifier achieved AUC values of 0.9995, 0.9948, 0.9923, 0.9996, and 0.9769 for predicting chlamydia, genital herpes, genital warts, gonorrhea, and overall STI infection, respectively, among males. 
The CatBoost classifier achieved AUC values of 0.9971, 0.972, 0.9765, 1, 0.9485, and 0.8819 for predicting chlamydia, genital herpes, genital warts, gonorrhea, HPV, and overall STI infection, respectively, among females. Having sex with a new partner in the past year, the number of times having sex without a condom in the past year, and the lifetime number of female vaginal sex partners were identified as the top three predictors of overall STI risk in males. Similarly, ever having anal sex with a man, age, and the lifetime number of male vaginal sex partners were identified as the top three predictors of overall STI risk in females. CONCLUSIONS: This study demonstrated the effectiveness of the CatBoost classifier in predicting STI risks among both male and female populations. The SHAP algorithm revealed key predictors for each infection, highlighting consistent demographic characteristics and sexual behaviors across different STIs. These insights can guide targeted prevention strategies and interventions to alleviate the impact of STIs on public health.
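ADASYN, used above to address class imbalance, oversamples the minority class adaptively: minority points surrounded by more majority neighbors (harder to learn) receive more synthetic samples, each created by interpolating toward a nearby minority point. The following is a simplified pure-Python sketch of those steps on made-up 2-D data, not the study's pipeline; the function name `adasyn` and its parameters are hypothetical. In practice a maintained implementation such as the `ADASYN` class in the imbalanced-learn package would normally be used.

```python
import math
import random
from typing import List, Sequence

Point = List[float]


def adasyn(minority: Sequence[Point], majority: Sequence[Point],
           k: int = 5, beta: float = 1.0, seed: int = 0) -> List[Point]:
    """Simplified ADASYN: return synthetic minority samples (Euclidean distance)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    # Total number of synthetic samples; beta=1 aims for balanced classes
    g_total = int(round((len(majority) - len(minority)) * beta))
    data = [(list(p), 1) for p in minority] + [(list(p), 0) for p in majority]

    # Step 1: difficulty ratio r_i = share of majority points among the
    # k nearest neighbors of each minority point (excluding the point itself)
    ratios = []
    for i, x in enumerate(minority):
        others = [d for j, d in enumerate(data) if j != i]
        others.sort(key=lambda d: math.dist(x, d[0]))
        neighbors = others[:k]
        ratios.append(sum(1 for _, label in neighbors if label == 0) / len(neighbors))

    total = sum(ratios)
    if g_total <= 0 or total == 0:
        return []  # already balanced, or no minority point borders the majority

    synthetic: List[Point] = []
    for i, x in enumerate(minority):
        # Step 2: number of samples for this point, proportional to its normalized ratio
        g_i = int(round(ratios[i] / total * g_total))
        # Step 3: k nearest *minority* neighbors of x (excluding x itself)
        min_nbrs = sorted((m for j, m in enumerate(minority) if j != i),
                          key=lambda m: math.dist(x, m))[:k]
        for _ in range(g_i):
            z = rng.choice(min_nbrs)
            lam = rng.random()  # interpolation weight in [0, 1)
            synthetic.append([a + lam * (b - a) for a, b in zip(x, z)])
    return synthetic


# Made-up example: 4 minority points, 10 majority points
minority = [[0.0, 0.0], [0.1, 0.2], [0.9, 0.8], [1.0, 1.0]]
majority = [[0.5, 0.5], [0.6, 0.4], [0.4, 0.6], [0.55, 0.45], [0.45, 0.55],
            [0.2, 0.1], [0.8, 0.9], [0.3, 0.3], [0.7, 0.7], [0.5, 0.6]]
new_points = adasyn(minority, majority, k=3)
print(len(new_points), "synthetic minority samples")
```

Because each synthetic point lies on a segment between two minority samples, the oversampled data never extend beyond the region spanned by the original minority class. Resampling of this kind should be applied only to training folds, so that evaluation metrics such as AUC are computed on untouched data.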


Subject(s)
Gonorrhea , Herpes Genitalis , Papillomavirus Infections , Sexually Transmitted Diseases , Warts , Female , Male , Humans , Adolescent , Young Adult , Adult , Middle Aged , Nutrition Surveys , Sexually Transmitted Diseases/epidemiology , Algorithms
20.
J Neuroeng Rehabil ; 21(1): 69, 2024 May 09.
Article in English | MEDLINE | ID: mdl-38725065

ABSTRACT

BACKGROUND: In the practical application of sarcopenia screening, there is a need for faster, community-friendly detection methods. The primary purpose of this study was to perform sarcopenia screening in community-dwelling older adults and to investigate whether surface electromyogram (sEMG) signals recorded during hand grip could be used to detect sarcopenia using machine learning (ML) methods with suitable features extracted from the sEMG signals. The secondary aim was to interpret the resulting ML models using a novel feature importance estimation method. METHODS: A total of 158 community-dwelling older residents (≥ 60 years old) were recruited. After screening with the diagnostic criteria of the Asian Working Group for Sarcopenia 2019 (AWGS 2019) and a data quality check, participants were assigned to the healthy group (n = 45) and the sarcopenic group (n = 48). sEMG signals from six forearm muscles were recorded during the hand grip task at 20% and 50% of maximal voluntary contraction (MVC). After the recorded signals were filtered, nine representative features were extracted: six time-domain features and three time-frequency-domain features. Then, a voting classifier ensembling a support vector machine (SVM), a random forest (RF), and a gradient boosting machine (GBM) was implemented to classify healthy versus sarcopenic participants. Finally, the SHapley Additive exPlanations (SHAP) method was used to investigate feature importance during classification. RESULTS: Seven of the nine features exhibited statistically significant differences between healthy and sarcopenic participants in both the 20% and 50% MVC tests. Using these features, the voting classifier achieved 80% sensitivity and 73% accuracy in five-fold cross-validation. This performance exceeded that of the SVM, RF, and GBM models individually. 
Lastly, SHAP results revealed that the waveform length (WL) and the kurtosis of continuous wavelet transform coefficients (CWT_kurtosis) had the highest feature impact scores. CONCLUSION: This study proposed a method for community-based sarcopenia screening using sEMG signals of forearm muscles. With nine representative features, the voting classifier achieved accuracy above 70% and sensitivity above 75%, indicating moderate classification performance. The SHAP results suggest that the motor unit (MU) activation mode may be a key factor affecting sarcopenia.
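Waveform length (WL), the feature with the highest SHAP impact above, is a standard time-domain sEMG feature: the sum of absolute differences between consecutive samples, so it grows with both signal amplitude and frequency content. A minimal sketch follows; the non-overlapping windowing scheme, the helper names, and the toy signal are illustrative assumptions, since the abstract does not describe the study's segmentation.

```python
from typing import List, Sequence


def waveform_length(signal: Sequence[float]) -> float:
    """Waveform length: cumulative absolute sample-to-sample change."""
    x = list(signal)
    return sum(abs(b - a) for a, b in zip(x, x[1:]))


def windowed_waveform_length(signal: Sequence[float], window: int) -> List[float]:
    """WL computed separately over consecutive non-overlapping windows (hypothetical helper)."""
    x = list(signal)
    return [waveform_length(x[start:start + window])
            for start in range(0, len(x) - window + 1, window)]


# Toy (made-up) filtered sEMG samples from one forearm channel
emg = [0.0, 0.5, -0.5, 1.0]
print(waveform_length(emg))
print(windowed_waveform_length(emg, window=2))
```

For a multi-channel recording such as the six forearm muscles described above, the same function would simply be applied to each channel, yielding one WL value per muscle per contraction level.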


Subject(s)
Electromyography , Hand Strength , Independent Living , Machine Learning , Sarcopenia , Humans , Sarcopenia/diagnosis , Sarcopenia/physiopathology , Electromyography/methods , Aged , Male , Female , Hand Strength/physiology , China , Middle Aged , Muscle, Skeletal/physiopathology , Support Vector Machine , Aged, 80 and over , East Asian People