Search | BVS Infectious and Parasitic Diseases

1.

CircRNA identification and feature interpretability analysis.

Niu, Mengting; Wang, Chunyu; Chen, Yaojia; Zou, Quan; Qi, Ren; Xu, Lei.

BMC Biol ; 22(1): 44, 2024 Feb 27.

Article in English | MEDLINE | ID: mdl-38408987

ABSTRACT

BACKGROUND: Circular RNAs (circRNAs) can regulate microRNA activity and are related to various diseases, such as cancer. Functional research on circRNAs is an active focus of research. Accurate identification of circRNAs is important for gaining insight into their functions. Although several circRNA prediction models have been developed, their prediction accuracy is still unsatisfactory. Therefore, providing a more accurate computational framework to predict circRNAs and analyse their looping characteristics is crucial for systematic annotation. RESULTS: We developed a novel framework, CircDC, for distinguishing circRNAs from other lncRNAs. CircDC uses four different feature encoding schemes and adopts a multilayer convolutional neural network and bidirectional long short-term memory network to learn high-order feature representation and make circRNA predictions. The results demonstrate that the proposed CircDC model is more accurate than existing models. In addition, an interpretable analysis of the features affecting the model is performed, and the framework is applied to extended circRNA identification tasks. CONCLUSIONS: CircDC is suitable for the prediction of circRNA. The identification of circRNA helps to understand and delve into the related biological processes and functions. Feature importance analysis increases model interpretability and uncovers significant biological properties. The relevant code and data in this article can be accessed for free at https://github.com/nmt315320/CircDC.git .

Subjects

MicroRNAs , Neoplasms , Humans , RNA, Circular/genetics , Neural Networks, Computer , Neoplasms/genetics , Computational Biology/methods
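
The abstract of entry 1 does not say which four feature encoding schemes CircDC uses. As a rough illustration only, the sketch below shows two encodings that are commonly applied to RNA sequences before a convolutional or recurrent network: one-hot encoding and k-mer frequency. The function names, the `ACGU` alphabet, and the choice of encodings are assumptions, not taken from the paper or its repository.

```python
from itertools import product

ALPHABET = "ACGU"  # assumed RNA alphabet; not specified in the abstract


def one_hot(seq):
    """Encode each nucleotide as a 4-element indicator vector."""
    index = {base: i for i, base in enumerate(ALPHABET)}
    encoded = []
    for base in seq.upper():
        row = [0] * len(ALPHABET)
        if base in index:  # unknown symbols (e.g. N) stay all-zero
            row[index[base]] = 1
        encoded.append(row)
    return encoded


def kmer_frequencies(seq, k=2):
    """Return normalized counts of every k-mer, ordered by ALPHABET."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows containing unknown symbols
            counts[kmer] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]
```

The one-hot matrix keeps positional information, which suits convolutional layers, while the k-mer vector has a fixed length regardless of sequence length.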

2.

A permutable MLP-like architecture for disease prediction from gut metagenomic data.

Jiang, Cong; Yang, Jian; Peng, Xiaogang; Li, Xiaozheng.

BMC Bioinformatics ; 25(1): 246, 2024 Jul 24.

Article in English | MEDLINE | ID: mdl-39048979

ABSTRACT

Metagenomic data plays a crucial role in analyzing the relationship between microbes and diseases. However, the limited number of samples, high dimensionality, and sparsity of metagenomic data pose significant challenges for the application of deep learning in data classification and prediction. Previous studies have shown that utilizing the phylogenetic tree structure to transform metagenomic abundance data into a 2D matrix input for convolutional neural networks (CNNs) improves classification performance. Inspired by the success of a Permutable MLP-like architecture in visual recognition, we propose Metagenomic Permutator (MetaP), which applies the Permutable MLP-like network structure to capture the phylogenetic information of microbes within the 2D matrix formed by the phylogenetic tree. Our experiments demonstrate that our model achieves competitive performance compared to other deep neural networks and traditional machine learning, and has good prospects for multi-classification and large sample sizes. Furthermore, we utilize the SHAP (SHapley Additive exPlanations) method to interpret our model predictions, identifying the microbial features that are associated with diseases.

Subjects

Gastrointestinal Microbiome , Metagenomics , Metagenomics/methods , Gastrointestinal Microbiome/genetics , Humans , Neural Networks, Computer , Phylogeny , Machine Learning , Deep Learning , Metagenome/genetics

3.

Explainable artificial intelligence for spectroscopy data: a review.

Contreras, Jhonatan; Bocklitz, Thomas.

Pflugers Arch ; 2024 Aug 01.

Article in English | MEDLINE | ID: mdl-39088045

ABSTRACT

Explainable artificial intelligence (XAI) has gained significant attention in various domains, including natural and medical image analysis. However, its application in spectroscopy remains relatively unexplored. This systematic review aims to fill this gap by providing a comprehensive overview of the current landscape of XAI in spectroscopy and identifying potential benefits and challenges associated with its implementation. Following the PRISMA 2020 guideline, we conducted a systematic search across major journal databases, resulting in 259 initial search results. After removing duplicates and applying inclusion and exclusion criteria, 21 scientific studies were included in this review. Notably, most of the studies focused on using XAI methods for spectral data analysis, emphasizing the identification of significant spectral bands rather than specific intensity peaks. Among the most utilized AI techniques were SHapley Additive exPlanations (SHAP), masking methods inspired by Local Interpretable Model-agnostic Explanations (LIME), and Class Activation Mapping (CAM). These methods were favored due to their model-agnostic nature and ease of use, enabling interpretable explanations without modifying the original models. Future research should propose new methods and explore the adaptation of XAI methods employed in other domains to better suit the unique characteristics of spectroscopic data.
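
Several entries in this listing rely on SHAP, which attributes a prediction to input features using Shapley values. As a minimal sketch of the underlying idea, the code below computes exact Shapley values for a toy model by enumerating all feature coalitions, with features outside a coalition set to a baseline value. This is not the `shap` library and not any paper's implementation; `toy_model` and the baseline convention are assumptions for illustration, and practical tools use approximations because the exact computation grows as 2^n.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features outside a coalition are replaced by their baseline value.
    Cost grows as 2**n, so this is only practical for a few features.
    """
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_model(features):
    # Hypothetical model with one additive term and one interaction term.
    a, b, c = features
    return 2.0 * a + b * c
```

The attributions sum to the difference between the prediction and the baseline prediction, which is the "additive" property in the method's name; the interaction term `b * c` is split equally between the two features involved.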

4.

Tree-based ensemble machine learning models in the prediction of acute respiratory distress syndrome following cardiac surgery: a multicenter cohort study.

Zhang, Hang; Qian, Dewei; Zhang, Xiaomiao; Meng, Peize; Huang, Weiran; Gu, Tongtong; Fan, Yongliang; Zhang, Yi; Wang, Yuchen; Yu, Min; Yuan, Zhongxiang; Chen, Xin; Zhao, Qingnan; Ruan, Zheng.

J Transl Med ; 22(1): 772, 2024 Aug 15.

Article in English | MEDLINE | ID: mdl-39148090

ABSTRACT

BACKGROUND: Acute respiratory distress syndrome (ARDS) after cardiac surgery is a severe respiratory complication with high mortality and morbidity. Traditional clinical approaches may lead to under-recognition of this heterogeneous syndrome, potentially resulting in diagnostic delay. This study aims to develop and externally validate seven machine learning (ML) models, trained on electronic health records data, for predicting ARDS after cardiac surgery. METHODS: This multicenter, observational cohort study included patients who underwent cardiac surgery in the training and testing cohorts (data from Nanjing First Hospital), as well as patients who had cardiac surgery in a validation cohort (data from Shanghai General Hospital). The number of important features was determined using the sliding windows sequential forward feature selection method (SWSFS). We developed a set of tree-based ML models, including Decision Tree, GBDT, AdaBoost, XGBoost, LightGBM, Random Forest, and Deep Forest. Model performance was evaluated using the area under the receiver operating characteristic curve (AUC) and Brier score. The SHapley Additive exPlanation (SHAP) technique was employed to interpret the ML model. Furthermore, a comparison was made between the ML models and traditional scoring systems. ARDS is defined according to the Berlin definition. RESULTS: A total of 1996 patients who had cardiac surgery were included in the study. The top five important features identified by the SWSFS were chronic obstructive pulmonary disease, preoperative albumin, central venous pressure_T4, cardiopulmonary bypass time, and left ventricular ejection fraction. Among the seven ML models, Deep Forest demonstrated the best performance, with an AUC of 0.882 and a Brier score of 0.809 in the validation cohort. Notably, the SHAP values effectively illustrated the contribution of the 13 features attributed to the model output and the individual feature's effect on model prediction. In addition, the ensemble ML models demonstrated better performance than the other six traditional scoring systems. CONCLUSIONS: Our study identified 13 important features and provided multiple ML models to enhance the risk stratification for ARDS after cardiac surgery. Using these predictors and ML models might provide a basis for early diagnostic and preventive strategies in the perioperative management of ARDS patients.

Subjects

Cardiac Surgical Procedures , Machine Learning , Respiratory Distress Syndrome , Humans , Respiratory Distress Syndrome/etiology , Male , Female , Middle Aged , Cohort Studies , Cardiac Surgical Procedures/adverse effects , Aged , ROC Curve , Area Under Curve
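
Entry 4 evaluates its models with AUC and the Brier score. The sketch below shows the standard definitions of both for binary outcomes, written in plain Python so the arithmetic is visible. The function names are illustrative, and nothing here reproduces the study's own evaluation code.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome (0 or 1)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def roc_auc(y_true, y_score):
    """AUC as the probability that a positive case outranks a negative case.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For binary outcomes the Brier score lies between 0 and 1, with 0 indicating perfect probabilistic predictions; a constant prediction of 0.5 scores 0.25. AUC measures ranking only, so the two metrics capture different aspects of performance.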

5.

Anthropogenic disturbance exacerbates resilience loss in the Amazon rainforests.

Wang, Huan; Ciais, Philippe; Sitch, Stephen; Green, Julia K; Tao, Shengli; Fu, Zheng; Albergel, Clément; Bastos, Ana; Wang, Mengjia; Fawcett, Dominic; Frappart, Frédéric; Li, Xiaojun; Liu, Xiangzhuo; Li, Shuangcheng; Wigneron, Jean-Pierre.

Glob Chang Biol ; 30(1): e17006, 2024 Jan.

Article in English | MEDLINE | ID: mdl-37909670

ABSTRACT

Uncovering the mechanisms that lead to Amazon forest resilience variations is crucial to predict the impact of future climatic and anthropogenic disturbances. Here, we apply a previously used empirical resilience metric, lag-1 month temporal autocorrelation (TAC), to vegetation optical depth data in C-band (a good proxy of the whole canopy water content) in order to explore how forest resilience variations are impacted by human disturbances and environmental drivers in the Brazilian Amazon. We found that human disturbances significantly increase the risk of critical transitions, and that the median TAC value is ~2.4 times higher in human-disturbed forests than in intact forests, suggesting a much lower resilience in disturbed forests. Additionally, human-disturbed forests are less resilient to land surface heat stress and atmospheric water stress than intact forests. Among human-disturbed forests, forests with a more closed and thicker canopy structure, which is linked to a higher forest cover and a lower disturbance fraction, are comparably more resilient. These results further emphasize the urgent need to limit deforestation and degradation through policy intervention to maintain the resilience of the Amazon rainforests.

Subjects

Rainforest , Resilience, Psychological , Anthropogenic Effects , Conservation of Natural Resources/methods , Forests
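
The resilience metric in entry 5, lag-1 temporal autocorrelation (TAC), is the correlation of a time series with itself shifted by one step. A higher value means the system recovers more slowly from perturbations, which is why the abstract reads a higher TAC as lower resilience. The sketch below uses the common sample estimator on a plain list of monthly values; any detrending or deseasonalizing the authors applied beforehand is not described in the abstract and is not shown here.

```python
def lag1_autocorrelation(series):
    """Sample lag-1 autocorrelation of an evenly spaced time series."""
    n = len(series)
    if n < 3:
        raise ValueError("need at least three observations")
    mean = sum(series) / n
    denom = sum((v - mean) ** 2 for v in series)
    if denom == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    # Covariance between each value and its successor, using the global mean.
    num = sum((series[t] - mean) * (series[t + 1] - mean) for t in range(n - 1))
    return num / denom
```

A steadily drifting series gives a positive value, while a series that flips sign every step gives a negative one.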

6.

Factors affecting biochemical pregnancy loss (BPL) in preimplantation genetic testing for aneuploidy (PGT-A) cycles: machine learning-assisted identification.

Ortiz, José A; Lledó, B; Morales, R; Máñez-Grau, A; Cascales, A; Rodríguez-Arnedo, A; Castillo, Juan C; Bernabeu, A; Bernabeu, R.

Reprod Biol Endocrinol ; 22(1): 101, 2024 Aug 08.

Article in English | MEDLINE | ID: mdl-39118049

ABSTRACT

PURPOSE: To determine the factors influencing the likelihood of biochemical pregnancy loss (BPL) after transfer of a euploid embryo from preimplantation genetic testing for aneuploidy (PGT-A) cycles. METHODS: The study employed an observational, retrospective cohort design, encompassing 6020 embryos from 2879 PGT-A cycles conducted between February 2013 and September 2021. Trophectoderm biopsies in day 5 (D5) or day 6 (D6) blastocysts were analyzed by next generation sequencing (NGS). Only single embryo transfers (SET) were considered, totaling 1161 transfers. Of these, 49.9% resulted in positive pregnancy tests, with 18.3% experiencing BPL. To establish a predictive model for BPL, both classical statistical methods and five different supervised classification machine learning algorithms were used. A total of forty-seven factors were incorporated as predictor variables in the machine learning models. RESULTS: Throughout the optimization process for each model, various performance metrics were computed. The Random Forest model emerged as the best model, with the highest area under the ROC curve (AUC) value of 0.913, alongside an accuracy of 0.830, positive predictive value of 0.857, and negative predictive value of 0.807. For the selected model, SHAP (SHapley Additive exPlanations) values were determined for each of the variables to establish which had the best predictive ability. Notably, variables pertaining to embryo biopsy demonstrated the greatest predictive capacity, followed by factors associated with ovarian stimulation (COS), maternal age, and paternal age. CONCLUSIONS: The Random Forest model had a higher predictive power for identifying BPL occurrences in PGT-A cycles. Specifically, variables associated with the embryo biopsy procedure (biopsy day, number of biopsied embryos, and number of biopsied cells) and ovarian stimulation (number of oocytes retrieved and duration of stimulation) exhibited the strongest predictive power.

Subjects

Abortion, Spontaneous , Aneuploidy , Genetic Testing , Machine Learning , Preimplantation Diagnosis , Humans , Female , Pregnancy , Preimplantation Diagnosis/methods , Retrospective Studies , Adult , Genetic Testing/methods , Abortion, Spontaneous/diagnosis , Abortion, Spontaneous/genetics , Abortion, Spontaneous/epidemiology , Embryo Transfer/methods , Blastocyst
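
Entry 6 reports accuracy, positive predictive value (PPV), and negative predictive value (NPV). All three follow from the counts in a 2x2 confusion matrix, as the sketch below shows for binary labels coded 0 and 1. The function name and the returned dictionary keys are illustrative choices, not part of the study.

```python
def classification_summary(y_true, y_pred):
    """Accuracy, PPV, and NPV from binary labels (1 = event, 0 = no event)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == 1 and truth == 1:
            tp += 1
        elif pred == 1 and truth == 0:
            fp += 1
        elif pred == 0 and truth == 0:
            tn += 1
        else:
            fn += 1

    def ratio(num, den):
        # Undefined ratios (no predictions of that class) are returned as NaN.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "ppv": ratio(tp, tp + fp),  # share of predicted events that are real
        "npv": ratio(tn, tn + fn),  # share of predicted non-events that are real
    }
```

Unlike sensitivity and specificity, PPV and NPV depend on how common the event is in the evaluated sample, so they do not transfer directly to populations with a different event rate.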

7.

Vancomycin trough concentration in adult patients with periprosthetic joint infection: A machine learning-based covariate model.

Chen, Yue-Wen; Lin, Xi-Kai; Huang, Chen; Wu, Wei; Lin, Wei-Wei; Chen, Si; Lu, Zong-Xing; You, Ya-Yi; Liu, Zhou-Jie.

Br J Clin Pharmacol ; 90(9): 2188-2199, 2024 Sep.

Article in English | MEDLINE | ID: mdl-38845212

ABSTRACT

AIMS: Although there are various model-based approaches to individualized vancomycin (VCM) administration, few have been reported for adult patients with periprosthetic joint infection (PJI). This work attempted to develop a machine learning (ML)-based model for predicting VCM trough concentration in adult PJI patients. METHODS: The dataset of 287 VCM trough concentrations from 130 adult PJI patients was split into a training set (229) and a testing set (58) at a ratio of 8:2, and an independent external set of 32 concentrations was collected as a validation set. A total of 13 covariates and the target variable (VCM trough concentration) were included in the dataset. Covariate models were constructed by support vector regression, random forest regression, and gradient boosted regression trees, and interpreted by SHapley Additive exPlanation (SHAP). RESULTS: The SHAP plots visualized the weight of the covariates in the models, with estimated glomerular filtration rate and VCM daily dose as the 2 most important factors, which were adopted for the model construction. Random forest regression was the optimal ML algorithm with a relative accuracy of 82.8% and absolute accuracy of 67.2% (R2 = 0.61, mean absolute error = 2.4, mean square error = 10.1), and its prediction performance was verified in the validation set. CONCLUSION: The proposed ML-based model can satisfactorily predict the VCM trough concentration in adult PJI patients. Its construction can be facilitated with only 2 clinical parameters (estimated glomerular filtration rate and VCM daily dose), and prediction accuracy can be rationalized by SHAP values, which highlights a profound practical value for clinical dosing guidance and timely treatment.

Subjects

Anti-Bacterial Agents , Machine Learning , Prosthesis-Related Infections , Vancomycin , Humans , Female , Male , Vancomycin/pharmacokinetics , Vancomycin/administration & dosage , Vancomycin/blood , Anti-Bacterial Agents/pharmacokinetics , Anti-Bacterial Agents/administration & dosage , Anti-Bacterial Agents/blood , Middle Aged , Aged , Prosthesis-Related Infections/drug therapy , Adult , Glomerular Filtration Rate , Retrospective Studies , Models, Biological , Aged, 80 and over
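
Entry 7 splits 287 observations 8:2 into 229 training and 58 testing records and reports R2, mean absolute error, and mean square error. The sketch below reproduces that arithmetic with a simple shuffled split and the standard metric definitions. How the authors actually shuffled or stratified the data is not stated in the abstract, so `split_train_test`, its seed, and the record-level split are assumptions; the "relative" and "absolute" accuracy measures are not defined in the abstract and are left out.

```python
import random


def split_train_test(records, train_fraction=0.8, seed=0):
    """Shuffle records and cut them into training and testing subsets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def regression_metrics(y_true, y_pred):
    """Return R^2, mean absolute error, and mean squared error."""
    n = len(y_true)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    # R^2 compares the model's squared error with that of predicting the mean.
    r2 = 1.0 - (mse * n) / ss_tot if ss_tot else float("nan")
    return {"r2": r2, "mae": mae, "mse": mse}
```

With 287 records and a fraction of 0.8, the cut falls at 229, leaving 58 for testing, which matches the counts in the abstract.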

8.

A time series driven model for early sepsis prediction based on transformer module.

Tang, Yan; Zhang, Yu; Li, Jiaxi.

BMC Med Res Methodol ; 24(1): 23, 2024 Jan 25.

Article in English | MEDLINE | ID: mdl-38273257

ABSTRACT

Sepsis remains a critical concern in intensive care units due to its high mortality rate. Early identification and intervention are paramount to improving patient outcomes. In this study, we propose predictive models for early sepsis prediction based on time-series data, utilizing both CNN-Transformer and LSTM-Transformer architectures. We collected time-series data from patients at 4, 8, and 12 h prior to sepsis diagnosis and subjected it to various network models for analysis and comparison. In contrast to traditional recurrent neural networks, our model exhibited a substantial improvement of approximately 20%. On average, our model demonstrated an accuracy of 0.964 (± 0.018), a precision of 0.956 (± 0.012), a recall of 0.967 (± 0.012), and an F1 score of 0.959 (± 0.014). Furthermore, by adjusting the time window, it was observed that the Transformer-based model demonstrated exceptional predictive capabilities, particularly within the earlier time window (i.e., 12 h before onset), thus holding significant promise for early clinical diagnosis and intervention. In addition, we employed the SHAP algorithm to visualize the weight distribution of different features, enhancing the interpretability of our model and facilitating early clinical diagnosis and intervention.

Subjects

Sepsis , Humans , Time Factors , Sepsis/diagnosis , Sepsis/therapy , Algorithms , Intensive Care Units , Mental Recall
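
Entry 8 builds its inputs from time-series data taken 4, 8, and 12 h before sepsis diagnosis. One plausible way to construct such inputs is sketched below: from a list of hourly records, take a fixed-length window that ends a given number of hours before onset. The hourly sampling, the window length, and the function name `window_before_onset` are all assumptions, since the abstract does not describe the preprocessing.

```python
def window_before_onset(records, onset_index, lead_hours, window_length):
    """Return `window_length` hourly records ending `lead_hours` before onset.

    `records` is a list with one entry per hour; `onset_index` is the position
    of the diagnosis hour. Returns None if the stay is too short.
    """
    end = onset_index - lead_hours      # exclusive end of the window
    start = end - window_length
    if start < 0:
        return None                     # not enough history for this lead time
    return records[start:end]
```

For a patient with onset at hour 20 and a six-hour window, a 4 h lead time gives hours 10 to 15, and a 12 h lead time gives hours 2 to 7. Longer lead times leave the model with information further from the event, which is what makes the 12 h result in the abstract notable.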

9.

Country-specific determinants for COVID-19 case fatality rate and response strategies from a global perspective: an interpretable machine learning framework.

Zhou, Cui; Wheelock, Åsa M; Zhang, Chutian; Ma, Jian; Li, Zhichao; Liang, Wannian; Gao, Jing; Xu, Lei.

Popul Health Metr ; 22(1): 10, 2024 Jun 03.

Article in English | MEDLINE | ID: mdl-38831424

ABSTRACT

BACKGROUND: There are significant geographic inequities in COVID-19 case fatality rates (CFRs), and a comprehensive understanding of their country-level determinants from a global perspective is necessary. This study aims to quantify the country-specific risk of COVID-19 CFR and propose tailored response strategies, including vaccination strategies, in 156 countries. METHODS: Cross-temporal and cross-country variations in COVID-19 CFR were identified using extreme gradient boosting (XGBoost) including 35 factors from seven dimensions in 156 countries from 28 January 2020 to 31 January 2022. SHapley Additive exPlanations (SHAP) was used to further clarify the clustering of countries by the key factors driving CFR and the effect of concurrent risk factors for each country. Increases in vaccination rates were simulated to illustrate the reduction of CFR in different classes of countries. FINDINGS: Overall COVID-19 CFRs varied across countries from 28 January 2020 to 31 January 2022, ranging from 68 to 6373 per 100,000 population. During the COVID-19 pandemic, the determinants of CFRs first changed from health conditions to universal health coverage, and then to a multifactorial mixed effect dominated by vaccination. In the Omicron period, countries were divided into five classes according to risk determinants. The low vaccination-driven class (70 countries) was mainly distributed in sub-Saharan Africa and Latin America, and included the majority of low-income countries (95.7%) with many concurrent risk factors. The aging-driven class (26 countries) was mainly distributed in high-income European countries. The high disease burden-driven class (32 countries) was mainly distributed in Asia and North America. The low GDP-driven class (14 countries) was scattered across continents. Simulating a 5% increase in vaccination rate resulted in CFR reductions of 31.2% and 15.0% for the low vaccination-driven class and the high disease burden-driven class, respectively, with greater CFR reductions for countries with high overall risk (SHAP value > 0.1), but only 3.1% for the aging-driven class. CONCLUSIONS: Evidence from this study suggests that geographic inequities in COVID-19 CFR are jointly determined by key and concurrent risks, and achieving a decreasing COVID-19 CFR requires more than increasing vaccination coverage, but rather targeted intervention strategies based on country-specific risks.

Subjects

COVID-19 , Global Health , Machine Learning , SARS-CoV-2 , Humans , COVID-19/mortality , Risk Factors , Pandemics , COVID-19 Vaccines , Vaccination

10.

Machine-Learning-Assisted Descriptors Identification for Indoor Formaldehyde Oxidation Catalysts.

Cao, Xinyuan; Huang, Jisi; Du, Kexin; Tian, Yawen; Hu, Zhixin; Luo, Zhu; Wang, Jinlong; Guo, Yanbing.

Environ Sci Technol ; 58(19): 8372-8379, 2024 May 14.

Article in English | MEDLINE | ID: mdl-38691628

ABSTRACT

The development of highly efficient catalysts for formaldehyde (HCHO) oxidation is of significant interest for the improvement of indoor air quality. Up to 400 works relating to the catalytic oxidation of HCHO have been published to date; however, their analysis for collective inference through conventional literature search is still a challenging task. A machine learning (ML) framework was presented to predict catalyst performance from experimental descriptors based on an HCHO oxidation catalysts database. MnOx, CeO2, Co3O4, TiO2, FeOx, ZrO2, Al2O3, SiO2, and carbon-based catalysts with different promoters were compiled from the literature. Notably, 20 descriptors including reaction catalyst composition, reaction conditions, and catalyst physical properties were collected for data mining (2263 data points). Furthermore, the eXtreme Gradient Boosting algorithm was employed, which successfully predicted the conversion efficiency of HCHO with an R-square value of 0.81. Shapley additive analysis suggested Pt/MnO2 and Ag/Ce-Co3O4 exhibited excellent catalytic performance of HCHO oxidation based on the analysis of the entire database. Validated by experimental tests and theoretical simulations, the key descriptor identified by ML, i.e., the first promoter, was further described as metal-support interactions. This study highlights ML as a useful tool for database establishment and rational catalyst design, based on importance analysis linking experimental descriptors to the performance of complex catalytic systems.

Subjects

Air Pollution, Indoor , Formaldehyde , Machine Learning , Oxidation-Reduction , Formaldehyde/chemistry , Catalysis

11.

Elucidating Key Characteristics of PFAS Binding to Human Peroxisome Proliferator-Activated Receptor Alpha: An Explainable Machine Learning Approach.

Maeda, Kazuhiro; Hirano, Masashi; Hayashi, Taka; Iida, Midori; Kurata, Hiroyuki; Ishibashi, Hiroshi.

Environ Sci Technol ; 58(1): 488-497, 2024 Jan 09.

Article in English | MEDLINE | ID: mdl-38134352

ABSTRACT

Per- and polyfluoroalkyl substances (PFAS) are widely employed anthropogenic fluorinated chemicals known to disrupt hepatic lipid metabolism by binding to human peroxisome proliferator-activated receptor alpha (PPARα). Therefore, screening for PFAS that bind to PPARα is of critical importance. Machine learning approaches are promising techniques for rapid screening of PFAS. However, traditional machine learning approaches lack interpretability, posing challenges in investigating the relationship between molecular descriptors and PPARα binding. In this study, we aimed to develop a novel, explainable machine learning approach to rapidly screen for PFAS that bind to PPARα. We calculated the PPARα-PFAS binding score and 206 molecular descriptors for PFAS. Through systematic and objective selection of important molecular descriptors, we developed a machine learning model with good predictive performance using only three descriptors. The molecular size (b_single) and electrostatic properties (BCUT_PEOE_3 and PEOE_VSA_PPOS) are important for PPARα-PFAS binding. Alternative PFAS are considered safer than their legacy predecessors. However, we found that alternative PFAS with many carbon atoms and ether groups exhibited a higher affinity for PPARα. Therefore, confirming the toxicity of these alternative PFAS compounds with such characteristics through biological experiments is important.

Subjects

Fluorocarbons , PPAR alpha , Humans , PPAR alpha/metabolism , Liver/metabolism

12.

Accurately Predicting Spatiotemporal Variations of Near-Surface Nitrous Acid (HONO) Based on a Deep Learning Approach.

Li, Xuan; Ye, Can; Lu, Keding; Xue, Chaoyang; Li, Xin; Zhang, Yuanhang.

Environ Sci Technol ; 58(29): 13035-13046, 2024 Jul 23.

Article in English | MEDLINE | ID: mdl-38982681

ABSTRACT

Gaseous nitrous acid (HONO) is identified as a critical precursor of hydroxyl radicals (OH), influencing atmospheric oxidation capacity and the formation of secondary pollutants. However, large uncertainties persist regarding its formation and elimination mechanisms, impeding accurate simulation of HONO levels using chemical models. In this study, a deep neural network (DNN) model was established based on routine air quality data (O3, NO2, CO, and PM2.5) and meteorological parameters (temperature, relative humidity, solar zenith angle, and season) collected from four typical megacity clusters in China. The model exhibited robust performance on both the training sets [slope = 1.0, r2 = 0.94, root mean squared error (RMSE) = 0.29 ppbv] and two independent test sets (slope = 1.0, r2 = 0.79, and RMSE = 0.39 ppbv), demonstrated excellent capability in reproducing the spatiotemporal variations of HONO, and outperformed an observation-constrained box model incorporating newly proposed HONO formation mechanisms. Nitrogen dioxide (NO2) was identified as the most impactful feature for HONO prediction using the SHapley Additive exPlanation (SHAP) approach, highlighting the importance of NO2 conversion in HONO formation. The DNN model was further employed to predict the future change of HONO levels in different NOx abatement scenarios; HONO is expected to decrease by 27-44% in summer as the result of a 30-50% NOx reduction. These results suggest a dual effect brought by abatement of NOx emissions, leading to not only a reduction of O3 and nitrate precursors but also a decrease in HONO levels and hence primary radical production rates (PROx). In summary, this study demonstrates the feasibility of using a deep learning approach to predict HONO concentrations, offering a promising supplement to traditional chemical models. Additionally, stringent NOx abatement would be beneficial for collaborative alleviation of O3 and secondary PM2.5.

Subjects

Air Pollutants , Deep Learning , Nitrous Acid , Nitrous Acid/chemistry , Air Pollutants/analysis , China , Environmental Monitoring/methods , Air Pollution

13.

Machine Learning for Polymer Design to Enhance Pervaporation-Based Organic Recovery.

Yang, Meiqi; Zhu, Jun-Jie; McGaughey, Allyson L; Priestley, Rodney D; Hoek, Eric M V; Jassby, David; Ren, Zhiyong Jason.

Environ Sci Technol ; 58(23): 10128-10139, 2024 Jun 11.

Article in English | MEDLINE | ID: mdl-38743597

ABSTRACT

Pervaporation (PV) is an effective membrane separation process for organic dehydration, recovery, and upgrading. However, it is crucial to improve membrane materials beyond the current permeability-selectivity trade-off. In this research, we introduce machine learning (ML) models to identify high-potential polymers, greatly improving efficiency and reducing cost compared with the conventional trial-and-error approach. We utilized the largest PV data set to date and incorporated polymer fingerprints and features, including membrane structure, operating conditions, and solute properties. Dimensionality reduction, missing data treatment, seed randomness, and data leakage management were employed to ensure model robustness. The optimized LightGBM models achieved RMSE of 0.447 and 0.360 for separation factor and total flux, respectively (logarithmic scale). Screening approximately 1 million hypothetical polymers with ML models resulted in identifying polymers with a predicted permeation separation index >30 and synthetic accessibility score <3.7 for acetic acid extraction. This study demonstrates the promise of ML to accelerate tailored membrane designs.

Subjects

Machine Learning , Polymers , Polymers/chemistry , Membranes, Artificial , Permeability

14.

Unraveling the Influence of Satellite-Observed Land Surface Temperature on High-Resolution Mapping of Ground-Level Ozone Using Interpretable Machine Learning.

He, Qingqing; Cao, Jingru; Saide, Pablo E; Ye, Tong; Wang, Weihang.

Environ Sci Technol ; 2024 Aug 27.

Article in English | MEDLINE | ID: mdl-39192575

ABSTRACT

Accurately mapping ground-level ozone concentrations at high spatiotemporal resolution (daily, 1 km) is essential for evaluating human exposure and conducting public health assessments. This requires identifying and understanding a proxy that is well correlated with ground-level ozone variation and available at high spatiotemporal resolution. This study introduces a high-resolution ozone modeling method utilizing the XGBoost algorithm with satellite-derived land surface temperature (LST) as the primary predictor. Focusing on China in 2019, our model achieved a cross-validation R2 of 0.91 and a root-mean-square error (RMSE) of 13.51 µg/m3. We provide detailed maps highlighting ground-level ozone concentrations in urban areas, uncovering spatial variations previously unresolved, along with time series aligning with established understandings of ozone dynamics. Our local interpretation of the machine learning model underscores the significant contribution of LST to spatiotemporal ozone variations, surpassing other meteorological, pollutant, and geographical predictors in its influence. Validation results indicate that model performance decreases as spatial resolution becomes coarser, with R2 decreasing from 0.91 for the 1 km model to 0.85 for the 25 km model. The methodology and data sets generated by this study offer new insights into ground-level ozone variability and mapping and can significantly aid in exposure assessment and epidemiological research related to this critical environmental challenge.

15.

Comparative performance analysis of Boruta, SHAP, and Borutashap for disease diagnosis: A study with multiple machine learning algorithms.

Ejiyi, Chukwuebuka Joseph; Qin, Zhen; Ukwuoma, Chiagoziem Chima; Nneji, Grace Ugochi; Monday, Happy Nkanta; Ejiyi, Makuachukwu Bennedith; Ejiyi, Thomas Ugochukwu; Okechukwu, Uchenna; Bamisile, Olusola O.

Network ; : 1-38, 2024 Mar 21.

Article in English | MEDLINE | ID: mdl-38511557

ABSTRACT

Interpretable machine learning models are instrumental in disease diagnosis and clinical decision-making, shedding light on relevant features. Notably, Boruta, SHAP (SHapley Additive exPlanations), and BorutaShap were employed for feature selection, each contributing to the identification of crucial features. These selected features were then utilized to train six machine learning algorithms, including LR, SVM, ETC, AdaBoost, RF, and LR, using diverse medical datasets obtained from public sources after rigorous preprocessing. The performance of each feature selection technique was evaluated across multiple ML models, assessing accuracy, precision, recall, and F1-score metrics. Among these, SHAP showcased superior performance, achieving average accuracies of 80.17%, 85.13%, 90.00%, and 99.55% across diabetes, cardiovascular, statlog, and thyroid disease datasets, respectively. Notably, the LGBM emerged as the most effective algorithm, boasting an average accuracy of 91.00% for most disease states. Moreover, SHAP enhanced the interpretability of the models, providing valuable insights into the underlying mechanisms driving disease diagnosis. This comprehensive study contributes significant insights into feature selection techniques and machine learning algorithms for disease diagnosis, benefiting researchers and practitioners in the medical field. Further exploration of feature selection methods and algorithms holds promise for advancing disease diagnosis methodologies, paving the way for more accurate and interpretable diagnostic models.

16.

Development and validation of a model predicting adrenal lipid-poor adenoma based on the minimum attenuation value from non-contrast CT: a dual-center retrospective study.

Zhu, Hanlin; Wu, Mengwei; Feng, Bo; Zhang, Haifeng; Hu, Chunfeng; Zhang, Tong; Han, Zhijiang.

BMC Med Imaging ; 24(1): 210, 2024 Aug 12.

Article in English | MEDLINE | ID: mdl-39134939

ABSTRACT

OBJECTIVE: The early differentiation of adrenal lipid-poor adenomas from non-adenomas is a crucial step in reducing excessive examinations and treatments. This study seeks to construct an eXtreme Gradient Boosting (XGBoost) predictive model utilizing the minimum attenuation values (minAVs) from non-contrast CT (NCCT) scans to identify lipid-poor adenomas. MATERIALS AND METHODS: Retrospective analysis encompassed clinical data, minAVs, CT histogram (CTh), mean attenuation values (meanAVs), and lesion diameter from patients with pathologically or clinically confirmed adrenal lipid-poor adenomas across two medical institutions, compared with non-adenomas. Variable selection was performed in Institution A (training set), with XGBoost models established based on minAVs and CTh separately. Institution B (validation set) was used to confirm the diagnostic efficacy of the two models. Receiver operating characteristic (ROC) curve analysis, calibration curves, and Brier scores were used to assess the diagnostic performance and calibration of the models, and the DeLong test was used to compare the area under the curve (AUC) between models. SHapley Additive exPlanations (SHAP) values were used to explain and visualize the models. RESULTS: The training set comprised 136 adrenal lipid-poor adenomas and 126 non-adenomas, while the validation set included 46 and 40 instances, respectively. In the training set, there were substantial inter-group differences in minAVs, CTh, meanAVs, diameter, and body mass index (BMI) (p < 0.05 for all). The AUCs for the minAV and CTh models were 0.912 (95% confidence interval [CI]: 0.866-0.957) and 0.916 (95% CI: 0.873-0.958), respectively. Both models exhibited good calibration, with Brier scores of 0.141 and 0.136. In the validation set, the AUCs were 0.871 (95% CI: 0.792-0.951) and 0.878 (95% CI: 0.794-0.962), with Brier scores of 0.156 and 0.165, respectively. The DeLong test revealed no statistically significant differences in AUC between the models (p > 0.05 for both). SHAP value analysis for the minAV model suggested that minAVs had the highest absolute weight (AW) and negative contribution. CONCLUSION: The XGBoost predictive model based on minAVs demonstrates effective discrimination between adrenal lipid-poor adenomas and non-adenomas. The minAV variable is easily obtainable, and its diagnostic performance is comparable to that of the CTh model. This provides a basis for patient diagnosis and treatment plan selection.
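
The discrimination and calibration measures reported in this abstract (AUC and Brier score) can be illustrated with a short Python sketch. It assumes binary labels (1 = lipid-poor adenoma, 0 = non-adenoma) and predicted probabilities; the six example cases are made up for illustration, and the abstract does not describe the software the authors used.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def roc_auc(y_true, y_prob):
    """AUC as the probability that a random positive case is scored
    higher than a random negative case (ties count as one half)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels and model outputs, for illustration only.
y_true = [1, 1, 1, 0, 0, 0]
y_prob = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]

auc = roc_auc(y_true, y_prob)        # 8 of 9 positive/negative pairs are ordered correctly
brier = brier_score(y_true, y_prob)  # lower is better; 0 is a perfect score
```

The pairwise definition used here is quadratic in the number of cases, which is adequate for cohorts of a few hundred patients; rank-based formulas give the same value more efficiently.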

Subjects

Adrenal Gland Neoplasms; Tomography, X-Ray Computed; Humans; Retrospective Studies; Female; Male; Middle Aged; Tomography, X-Ray Computed/methods; Adrenal Gland Neoplasms/diagnostic imaging; Adenoma/diagnostic imaging; Adult; Aged; Lipids; ROC Curve

17.

An Explainable Artificial Intelligence Model to Predict Malignant Cerebral Edema after Acute Anterior Circulating Large-Hemisphere Infarction.

Cao, Liping; Ma, Xiaoming; Huang, Wendie; Xu, Geman; Wang, Yumei; Liu, Meng; Sheng, Shiying; Mao, Keshi.

Eur Neurol ; 87(2): 54-66, 2024.

Article in English | MEDLINE | ID: mdl-38565087

ABSTRACT

INTRODUCTION: Malignant cerebral edema (MCE) is a serious complication and the main cause of poor prognosis in patients with large-hemisphere infarction (LHI). Therefore, the rapid and accurate identification of potential patients with MCE is essential for timely therapy. This study utilized an artificial intelligence-based machine learning approach to establish an interpretable model for predicting MCE in patients with LHI. METHODS: This study included 314 patients with LHI not undergoing recanalization therapy. The patients were divided into MCE and non-MCE groups, and the eXtreme Gradient Boosting (XGBoost) model was developed. A confusion matrix was used to measure the prediction performance of the XGBoost model. We also utilized the SHapley Additive exPlanations (SHAP) method to explain the XGBoost model. Decision curve and receiver operating characteristic curve analyses were performed to evaluate the net benefits of the model. RESULTS: MCE was observed in 121 (38.5%) of the 314 patients with LHI. The model showed excellent predictive performance, with an area under the curve of 0.916. The SHAP importance ranking identified the top 10 predictive variables for MCE: ASPECTS score, NIHSS score, CS score, APACHE II score, HbA1c, AF, NLR, PLT, GCS, and age. CONCLUSION: An interpretable predictive model can increase transparency and help doctors accurately predict the occurrence of MCE in LHI patients not undergoing recanalization therapy within 48 h of onset, providing patients with better treatment strategies and enabling optimal resource allocation.
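
As a hedged illustration of how a confusion matrix summarizes a binary classifier such as the one described here, the Python sketch below derives the usual summary metrics from true and predicted labels. The ten example cases are invented (1 = MCE, 0 = non-MCE) and do not reproduce the study's results, which are not given at this level of detail in the abstract.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels coded as 1 and 0."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 0)
    return tp, fp, fn, tn


def summary_metrics(tp, fp, fn, tn):
    """Common metrics derived from the four confusion-matrix cells."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # also called sensitivity
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "f1": 2 * precision * recall / (precision + recall),
    }


# Hypothetical labels: 4 patients with MCE, 6 without.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
metrics = summary_metrics(tp, fp, fn, tn)
```

The sketch assumes at least one predicted positive and one actual case of each class; otherwise some of the ratios are undefined.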

Subjects

Artificial Intelligence; Brain Edema; Humans; Male; Female; Aged; Brain Edema/etiology; Middle Aged; Machine Learning; Cerebral Infarction/etiology; Cerebral Infarction/diagnostic imaging; Retrospective Studies; Prognosis; Aged, 80 and over

18.

Sex and population differences in the cardiometabolic continuum: a machine learning study using the UK Biobank and ELSA-Brasil cohorts.

Paula, Daniela Polessa; Camacho, Marina; Barbosa, Odaleia; Marques, Larissa; Harter Griep, Rosane; da Fonseca, Maria Jesus Mendes; Barreto, Sandhi; Lekadir, Karim.

BMC Public Health ; 24(1): 2131, 2024 Aug 06.

Article in English | MEDLINE | ID: mdl-39107721

ABSTRACT

BACKGROUND: The temporal relationships across cardiometabolic diseases (CMDs) were recently conceptualized as the cardiometabolic continuum (CMC), a sequence of cardiovascular events that stem from gene-environment interactions, unhealthy lifestyle influences, and metabolic diseases such as diabetes and hypertension. While the physiological pathways linking metabolic and cardiovascular diseases have been investigated, sex and population differences in the CMC have not yet been described. METHODS: We present a machine learning approach to model the CMC and investigate sex and population differences in two distinct cohorts: the UK Biobank (17,700 participants) and the Brazilian Longitudinal Study of Adult Health (ELSA-Brasil) (7162 participants). We consider the following CMDs: hypertension (Hyp), diabetes (DM), heart diseases (HD: angina, myocardial infarction, or heart failure), and stroke (STK). For the identification of the CMC patterns, individual trajectories with the time of disease occurrence were clustered using k-means. Based on clinical, sociodemographic, and lifestyle characteristics, we built multiclass random forest classifiers and used the SHAP methodology to evaluate feature importance. RESULTS: Five CMC patterns were identified across both sexes and cohorts: EarlyHyp, FirstDM, FirstHD, Healthy, and LateHyp, named according to prevalence and disease occurrence time that depicted around 95%, 78%, 75%, 88% and 99% of individuals, respectively. Within the UK Biobank, more women were classified in the Healthy cluster and more men in all others. In the EarlyHyp and LateHyp clusters, isolated hypertension occurred earlier among women. Smoking habits and education had high importance and clear directionality for both sexes. For ELSA-Brasil, more men were classified in the Healthy cluster and more women in the FirstDM cluster. The diabetes occurrence time, when followed by hypertension, was lower among women. Education and ethnicity had high importance and clear directionality for women, while for men these features were smoking, alcohol, and coffee consumption. CONCLUSIONS: There are clear sex differences in the CMC that varied across the UK and Brazilian cohorts. In particular, disadvantages for women regarding incidence and the time to onset of diseases were more pronounced in Brazil. The results show the need to strengthen public health policies to prevent and control the time course of CMD, with an emphasis on women.
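
The clustering step described in this abstract (k-means over individual disease-onset trajectories) can be sketched in plain Python as follows. The encoding is an assumption made for illustration: each hypothetical individual is a pair (age at hypertension onset, age at diabetes onset), and the initial centroids are supplied by the caller so that the run is deterministic. The abstract does not state how the authors encoded trajectories or handled diseases that never occurred.

```python
import math


def kmeans(points, initial_centroids, n_iter=20):
    """Lloyd's algorithm: assign each point to its nearest centroid,
    then move each centroid to the mean of its assigned points."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(n_iter):
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        updated = []
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                updated.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                updated.append(centroids[k])  # keep an empty cluster's centroid
        if updated == centroids:
            break  # assignments are stable
        centroids = updated
    return labels, centroids


# Hypothetical trajectories: (age at hypertension onset, age at diabetes onset).
trajectories = [(40, 60), (42, 62), (41, 58), (65, 45), (66, 47), (63, 44)]
labels, centroids = kmeans(trajectories, initial_centroids=[(40, 60), (65, 45)])
```

With these toy data the first three individuals form a hypertension-first cluster and the last three a diabetes-first cluster, loosely mirroring pattern names such as EarlyHyp and FirstDM.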

Subjects

Cardiovascular Diseases; Machine Learning; Adult; Aged; Female; Humans; Male; Middle Aged; Brazil/epidemiology; Cardiometabolic Risk Factors; Cardiovascular Diseases/epidemiology; Cohort Studies; Longitudinal Studies; Sex Factors; UK Biobank; United Kingdom/epidemiology

19.

Prediction of proliferative diabetic retinopathy using machine learning in Latino and non-Hispanic black cohorts with routine blood and urine testing.

Goldstein, Ayelet; Ding, Kun; Carasquillo, Onelys; Levine, Barton; Hasan, Aisha; Levine, Jonathan.

Ophthalmic Physiol Opt ; 2024 Jul 12.

Article in English | MEDLINE | ID: mdl-38993175

ABSTRACT

PURPOSE: The objective was to predict proliferative diabetic retinopathy (PDR) in non-Hispanic Black (NHB) and Latino (LA) patients by applying machine learning algorithms to routinely collected blood and urine laboratory results. METHODS: Electronic medical records of 1124 type 2 diabetes patients treated at the Bronxcare Hospital eye clinic between January and December 2019 were analysed. Data collected included demographic information (ethnicity, age and sex), blood (fasting glucose, haemoglobin A1C [HbA1c], high-density lipoprotein [HDL], low-density lipoprotein [LDL], serum creatinine and estimated glomerular filtration rate [eGFR]) and urine (albumin-to-creatinine ratio [ACR]) test results and the outcome measure of retinopathy status. The efficacy of different machine learning models was assessed and compared. SHapley Additive exPlanations (SHAP) analysis was employed to evaluate the contribution of each feature to the model's predictions. RESULTS: The balanced random forest model surpassed other models in predicting PDR for both NHB and LA cohorts, achieving an AUC (area under the curve) of 83%. Regarding sex, the model exhibited remarkable performance for the female LA demographic, with an AUC of 87%. The SHAP analysis revealed that PDR-related factors influenced NHB and LA patients differently, with a more pronounced disparity between sexes. Furthermore, the optimal cut-off values for these factors showed variations based on sex and ethnicity. CONCLUSIONS: This study demonstrates the potential of machine learning in identifying individuals at higher risk for PDR by leveraging routine blood and urine test results. It allows clinicians to prioritise at-risk individuals for timely evaluations. Furthermore, the findings emphasise the importance of accounting for both ethnicity and sex when analysing risk factors for PDR in type 2 diabetes individuals.

20.

Machine Learning for Predicting Risk and Prognosis of Acute Kidney Disease in Critically Ill Elderly Patients During Hospitalization: Internet-Based and Interpretable Model Study.

Li, Mingxia; Han, Shuzhe; Liang, Fang; Hu, Chenghuan; Zhang, Buyao; Hou, Qinlan; Zhao, Shuangping.

J Med Internet Res ; 26: e51354, 2024 May 01.

Article in English | MEDLINE | ID: mdl-38691403

ABSTRACT

BACKGROUND: Acute kidney disease (AKD) affects more than half of critically ill elderly patients with acute kidney injury (AKI), which leads to worse short-term outcomes. OBJECTIVE: We aimed to establish 2 machine learning models to predict the risk and prognosis of AKD in the elderly and to deploy the models as online apps. METHODS: Data on elderly patients with AKI (n=3542) and AKD (n=2661) from the Medical Information Mart for Intensive Care IV (MIMIC-IV) database were used to develop 2 models for predicting the AKD risk and in-hospital mortality, respectively. Data collected from Xiangya Hospital of Central South University were used for external validation. A bootstrap method was used for internal validation to obtain relatively stable results. We extracted the indicators within 24 hours of the first diagnosis of AKI and the fluctuation range of some indicators, namely delta (day 3 after AKI minus day 1), as features. Six machine learning algorithms were used for modeling; the area under the receiver operating characteristic curve (AUROC), decision curve analysis, and calibration curve for evaluating; Shapley additive explanation (SHAP) analysis for visually interpreting; and the Heroku platform for deploying the best-performing models as web-based apps. RESULTS: For the model of predicting the risk of AKD in elderly patients with AKI during hospitalization, the Light Gradient Boosting Machine (LightGBM) showed the best overall performance in the training (AUROC=0.844, 95% CI 0.831-0.857), internal validation (AUROC=0.853, 95% CI 0.841-0.865), and external (AUROC=0.755, 95% CI 0.699-0.811) cohorts. In addition, LightGBM performed well for the AKD prognostic prediction in the training (AUROC=0.861, 95% CI 0.843-0.878), internal validation (AUROC=0.868, 95% CI 0.851-0.885), and external (AUROC=0.746, 95% CI 0.673-0.820) cohorts. The models deployed as online prediction apps allowed users to obtain predictions and to submit new data as feedback for model iteration. In the importance ranking and correlation visualization of the model's top 10 influencing factors conducted based on the SHAP value, partial dependence plots revealed the optimal cutoff of some indicators amenable to intervention. The top 5 factors predicting the risk of AKD were creatinine on day 3, sepsis, delta blood urea nitrogen (BUN), diastolic blood pressure (DBP), and heart rate, while the top 5 factors determining in-hospital mortality were age, BUN on day 1, vasopressor use, BUN on day 3, and partial pressure of carbon dioxide (PaCO2). CONCLUSIONS: We developed and validated 2 online apps for predicting the risk of AKD and its prognostic mortality in elderly patients, respectively. The top 10 factors that influenced the AKD risk and mortality during hospitalization were identified and explained visually, which might provide useful applications for intelligent management and suggestions for future prospective research.
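
The abstract reports AUROC values with 95% confidence intervals and mentions a bootstrap method, but it does not say how the bootstrap was configured. The Python sketch below shows one common use of the bootstrap, a percentile confidence interval for the AUROC, under that stated assumption. The eight labelled scores, the number of resamples, and the seed are all hypothetical.

```python
import random


def auroc(y_true, y_score):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auroc_ci(y_true, y_score, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap interval: resample cases with replacement,
    recompute the AUROC each time, and take the empirical quantiles."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        labels = [y_true[i] for i in idx]
        if len(set(labels)) < 2:
            continue  # AUROC is undefined when a resample has one class only
        stats.append(auroc(labels, [y_score[i] for i in idx]))
    stats.sort()
    low = stats[int((alpha / 2) * len(stats))]
    high = stats[min(len(stats) - 1, int((1 - alpha / 2) * len(stats)))]
    return low, high


# Hypothetical outcomes (1 = event) and model scores.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_score = [0.9, 0.7, 0.6, 0.3, 0.8, 0.4, 0.2, 0.1]

point_estimate = auroc(y_true, y_score)
ci_low, ci_high = bootstrap_auroc_ci(y_true, y_score)
```

With only eight cases the interval is very wide; cohorts of several thousand patients, as in the study, give much narrower intervals.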

Subjects

Acute Kidney Injury; Critical Illness; Hospitalization; Internet; Machine Learning; Humans; Aged; Critical Illness/mortality; Prognosis; Acute Kidney Injury/mortality; Acute Kidney Injury/diagnosis; Female; Male; Hospitalization/statistics & numerical data; Aged, 80 and over; Hospital Mortality; Risk Assessment/methods
