ABSTRACT
Whole-genome sequencing (WGS) is the gold standard for fully characterizing genetic variation but is still prohibitively expensive for large samples. To reduce costs, many studies sequence only a subset of individuals or genomic regions, and genotype imputation is used to infer genotypes for the remaining individuals or regions without sequencing data. However, not all variants can be well imputed, and the current state-of-the-art imputation quality metric, denoted as standard Rsq, is poorly calibrated for lower-frequency variants. Here, we propose MagicalRsq, a machine-learning-based method that integrates variant-level imputation and population genetics statistics to provide a better-calibrated imputation quality metric. Leveraging WGS data from the Cystic Fibrosis Genome Project (CFGP) and whole-exome sequencing data from the UK Biobank (UKB), we performed comprehensive experiments to evaluate the performance of MagicalRsq compared to standard Rsq for partially sequenced studies. We found that MagicalRsq aligns better with true R2 than standard Rsq in almost every situation evaluated, for both European and African ancestry samples. For example, when applying models trained on 1,992 CFGP sequenced samples to an independent set of 3,103 samples with no sequencing but with TOPMed imputation from array genotypes, MagicalRsq, compared to standard Rsq, achieved net gains of 1.4 million rare, 117k low-frequency, and 18k common variants, where net gain is the number of additional variants correctly distinguished by MagicalRsq over standard Rsq. MagicalRsq can serve as an improved post-imputation quality metric and will benefit downstream analysis by better distinguishing well-imputed variants from poorly imputed ones. MagicalRsq is freely available on GitHub.
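To make the comparison against "true R2" concrete, the sketch below computes a per-variant true R2 under the assumption that it is the squared Pearson correlation between sequenced genotypes and imputed dosages. The abstract does not define the quantity, so this definition, the function name `squared_pearson`, and the toy data are illustrative assumptions rather than the MagicalRsq implementation.

```python
def squared_pearson(x, y):
    """Squared Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # No variation in one input (e.g. a monomorphic variant): R2 is undefined.
        return float("nan")
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return (sxy * sxy) / (sxx * syy)


# Hypothetical toy data for a single variant (not taken from the study):
true_genotypes = [0, 0, 1, 2, 0, 1]               # alt-allele counts from sequencing
imputed_dosages = [0.1, 0.0, 0.9, 1.8, 0.2, 1.1]  # expected alt-allele counts from imputation
true_r2 = squared_pearson(true_genotypes, imputed_dosages)
```

A quality metric such as standard Rsq or MagicalRsq is estimated without access to the sequenced genotypes; a value like `true_r2` is what it would be judged against in samples where sequencing is available.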
Subjects
Genome-Wide Association Study, Single Nucleotide Polymorphism, Humans, Genome-Wide Association Study/methods, Single Nucleotide Polymorphism/genetics, Calibration, Genotype, Machine Learning
ABSTRACT
In the drug development process, approximately 30% of failures are attributed to drug safety issues. In particular, the first-in-human (FIH) trial of a new drug represents one of the highest safety risks, and initial dose selection is crucial for ensuring safety in clinical trials. With traditional dose estimation methods, which extrapolate data from animals to humans, catastrophic events have occurred during Phase I clinical trials due to interspecies differences in compound sensitivity and unknown molecular mechanisms. To address this issue, this study proposes a CrossFuse-extreme gradient boosting (XGBoost) method that can directly predict the maximum recommended daily dose of a compound based on existing human research data, providing a reference for FIH dose selection. This method not only integrates multiple features, including molecular representations, physicochemical properties and compound-protein interactions, but also improves feature selection based on cross-validation. The results demonstrate that the CrossFuse-XGBoost method not only improves prediction accuracy compared to that of existing local weighted methods [k-nearest neighbor (k-NN) and variable k-NN (v-NN)] but also solves the low prediction coverage issue of v-NN, achieving full coverage of the external validation set and enabling more reliable predictions. Furthermore, this study offers a high level of interpretability by identifying the importance of different features in model construction. The 241 features with the most significant impact on the maximum recommended daily dose were selected, providing references for optimizing the structure of new compounds and guiding experimental research. The datasets and source code are freely available at https://github.com/cqmu-lq/CrossFuse-XGBoost.
Subjects
Research Design, Software, Animals, Humans, Cluster Analysis
ABSTRACT
Lysine β-hydroxybutyrylation is an important post-translational modification (PTM) involved in various physiological and biological processes. In this research, we introduce a novel predictor, KbhbXG, which utilizes XGBoost to identify β-hydroxybutyrylation modification sites based on protein sequence information. Traditional experimental methods for identifying β-hydroxybutyrylated sites using proteomic techniques are both costly and time-consuming. Thus, computational methods and predictors can play a crucial role in facilitating the rapid identification of β-hydroxybutyrylation sites. Our proposed KbhbXG model first utilizes the machine learning algorithm XGBoost to predict β-hydroxybutyrylation modification sites. On the independent test set, KbhbXG achieves an accuracy of 0.7457, a specificity of 0.7771, and an area under the curve (AUC) of 0.8172. The high AUC achieved by our method demonstrates its potential for effectively identifying novel β-hydroxybutyrylation sites, thereby facilitating further research into the β-hydroxybutyrylation process. In addition, functional analyses revealed that different organisms preferentially engage in distinct biological processes and pathways, which can provide valuable insights into the mechanism of β-hydroxybutyrylation and guide experimental verification. To promote transparency and reproducibility, we have made both the code and the dataset of KbhbXG publicly available. Researchers interested in using our proposed model can access these resources at https://github.com/Lab-Xu/KbhbXG.
Subjects
Lysine, Machine Learning, Post-Translational Protein Processing, Lysine/metabolism, Lysine/chemistry, Computational Biology/methods, Humans, Algorithms, Software, Proteomics/methods
ABSTRACT
BACKGROUND: Extensive clinical and biomedical research has highlighted the significance of the human microbiome for human health. Identifying microbes associated with diseases is crucial for early disease diagnosis and for advancing precision medicine. RESULTS: Because information about changes in microbial quantities under fine-grained disease states gives a more complete picture of the overall data distribution, this study introduces MSignVGAE, a framework for predicting microbe-disease sign associations using signed message propagation. MSignVGAE employs a graph variational autoencoder to model noisy signed association data and extends the multi-scale concept to enhance representation capabilities. A novel strategy for propagating signed messages in signed networks addresses heterogeneity and consistency among nodes connected by signed edges. Additionally, we use the idea of a denoising autoencoder to handle the noise in similarity feature information, which helps overcome biases in the fused similarity data. MSignVGAE represents microbe-disease associations as a heterogeneous graph using similarity information as node features. The multi-class classifier XGBoost is used to predict sign associations between diseases and microbes. CONCLUSIONS: MSignVGAE achieves AUROC and AUPR values of 0.9742 and 0.9601, respectively. Case studies on three diseases demonstrate that MSignVGAE can effectively capture a comprehensive distribution of associations by leveraging signed information.
Subjects
Microbiota, Humans, Computational Biology/methods, Algorithms, Disease
ABSTRACT
BACKGROUND: Metabolic pathway prediction is one possible approach to the systems biology problem of reconstructing an organism's metabolic network from its genome sequence. Recent studies of machine learning-based pathway prediction methods have concluded that these approaches perform similarly to the most widely used method, PathoLogic, which is rule-based. One issue is that previous studies evaluated PathoLogic without taxonomic pruning, which decreases its performance. RESULTS: In this study, we update the evaluation results from previous studies to demonstrate that PathoLogic with taxonomic pruning outperforms previous machine learning-based approaches and that further improvements in performance are needed for them to be competitive. Furthermore, we introduce mlXGPR, an XGBoost-based metabolic pathway prediction method built on the multi-label classification pathway prediction framework introduced by mlLGPR. We also improve on this multi-label framework by exploiting correlations between labels using classifier chains. We propose a ranking method that determines the order of the chain so that lower-performing classifiers are placed later in the chain, where they can make more use of the correlations between labels. We evaluate mlXGPR with and without classifier chains on single-organism and multi-organism benchmarks. Our results indicate that mlXGPR outperforms previous pathway prediction methods, including PathoLogic with taxonomic pruning, in terms of Hamming loss, precision, and F1 score on single-organism benchmarks. CONCLUSIONS: The results of our study indicate that the performance of machine learning-based pathway prediction methods can be substantially improved and can even exceed that of PathoLogic with taxonomic pruning.
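The chain-ordering idea can be sketched as follows. This is an illustration, not the mlXGPR code: it assumes that labels are ranked by a per-label validation score (F1 is used here as a stand-in; the abstract does not state the ranking metric), and the classifiers, label names, and scores are hypothetical.

```python
def chain_order(label_scores):
    """Order labels so that lower-scoring classifiers come later in the chain."""
    return sorted(label_scores, key=lambda label: label_scores[label], reverse=True)


def predict_chain(features, order, classifiers):
    """Predict labels in chain order, feeding earlier predictions to later classifiers."""
    inputs = list(features)
    predictions = {}
    for label in order:
        y = classifiers[label](inputs)   # each classifier is any callable returning 0 or 1
        predictions[label] = y
        inputs = inputs + [y]            # later links see all earlier predictions
    return predictions


# Hypothetical per-label validation F1 scores for three pathway labels:
scores = {"pathway_A": 0.91, "pathway_B": 0.62, "pathway_C": 0.78}
order = chain_order(scores)

# Toy stand-in classifiers; real ones would be trained models (e.g. one per label).
toy_classifiers = {
    "pathway_A": lambda x: int(x[0] > 0.5),
    "pathway_C": lambda x: int(x[1] > 0.5),
    "pathway_B": lambda x: int(x[-1] == 1 and x[-2] == 1),  # relies on earlier predictions
}
result = predict_chain([0.9, 0.7], order, toy_classifiers)
```

Placing the weakest classifier last gives it the largest set of previously predicted labels as extra inputs, which is the motivation the abstract gives for the ranking.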
Subjects
Machine Learning, Metabolic Networks and Pathways, Biology, Genome
ABSTRACT
Lipidomics is emerging as a promising research field with the potential to support personalized risk stratification and to improve our understanding of the functional role of individual lipid species in the metabolic perturbations occurring in coronary artery disease (CAD). This study aimed to use a machine learning approach to derive a lipid panel able to identify patients with obstructive CAD. In this post hoc analysis of the prospective CorLipid trial, we investigated the lipid profiles of 146 patients with suspected CAD, divided into two categories based on the presence of obstructive CAD. In total, 517 lipid species were identified, of which 288 were finally quantified, including glycerophospholipids, glycerolipids, and sphingolipids. Univariate and multivariate statistical analyses showed significant discrimination between the serum lipidomes of patients with and without obstructive CAD. Finally, the XGBoost algorithm identified a panel of 17 serum biomarkers (5 sphingolipids, 7 glycerophospholipids, a triacylglycerol, galectin-3, glucose, LDL, and LDH) that was highly sensitive for the prediction of obstructive CAD (100% sensitivity, 62.1% specificity, 100% negative predictive value). Our findings shed light on the role of dysregulated lipid metabolism in CAD, supporting existing evidence and suggesting promise for novel therapies and improved risk stratification.
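The three reported performance measures follow from a 2x2 confusion table. The sketch below uses the standard definitions with hypothetical counts (chosen for illustration only; they are not the study's data) to show why a panel with no false negatives reaches 100% sensitivity and 100% negative predictive value even when specificity is moderate.

```python
def ratio(numerator, denominator):
    """Return numerator/denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def diagnostic_metrics(tp, fp, tn, fn):
    """Standard sensitivity, specificity and negative predictive value."""
    return {
        "sensitivity": ratio(tp, tp + fn),  # diseased patients correctly flagged
        "specificity": ratio(tn, tn + fp),  # non-diseased patients correctly cleared
        "npv": ratio(tn, tn + fn),          # negative calls that are truly negative
    }


# Hypothetical counts, not taken from the CorLipid analysis:
metrics = diagnostic_metrics(tp=40, fp=15, tn=25, fn=0)
```

With zero false negatives, every negative call is correct, so such a panel is suited to ruling disease out, at the cost of false positives that lower specificity.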
Subjects
Algorithms, Biomarkers, Coronary Artery Disease, Lipidomics, Humans, Coronary Artery Disease/blood, Lipidomics/methods, Male, Female, Biomarkers/blood, Middle Aged, Aged, Machine Learning, Lipids/blood, Lipid Metabolism, Sphingolipids/blood, Prospective Studies
ABSTRACT
This study aimed to identify genes shared by metabolic dysfunction-associated steatohepatitis (MASH) and diabetic nephropathy (DN) and the effect of extracellular matrix (ECM)-receptor interaction genes on them. MASH and DN datasets were downloaded from the Gene Expression Omnibus (GEO) database. Pearson's coefficients were used to assess the correlation between ECM-receptor interaction genes and cross-talk genes. The co-expression network of co-expression pair (CP) genes was integrated with its protein-protein interaction (PPI) network, and machine learning was employed to identify essential disease-representing genes. Finally, immune infiltration analysis was performed on the MASH and DN gene datasets using the CIBERSORT algorithm to evaluate the plausibility of these genes in the diseases. We found 19 key CP genes. Fos proto-oncogene (FOS), belonging to the IL-17 signalling pathway, showed greater centrality in the PPI network; Hyaluronan Mediated Motility Receptor (HMMR), belonging to the ECM-receptor interaction genes, was the most critical gene in the co-expression network map of the 19 CP genes; Forkhead Box C1 (FOXC1), like FOS, showed a high ability to predict disease in the XGBoost analysis. Immune infiltration analysis further showed a clear positive correlation between FOS/FOXC1 and mast cells, which secrete IL-17 during inflammation. Combining these results with those of previous studies, we suggest that a FOS/FOXC1/HMMR regulatory axis in MASH and DN may be associated with mast cells acting in the IL-17 signalling pathway. Extracellular HMMR may regulate the IL-17 pathway represented by FOS through the Mitogen-Activated Protein Kinase 1 (ERK) or PI3K-Akt-mTOR pathway. HMMR may serve as a signalling carrier between MASH and DN and could be targeted for therapeutic development.
Subjects
Diabetic Nephropathies, Interleukin-17, Humans, Phosphatidylinositol 3-Kinases, Computational Biology, Machine Learning
ABSTRACT
The present study was designed to test the potential utility of regional cerebral oxygen saturation (rcSO2) in detecting term infants with brain injury. The study also examined whether quantitative rcSO2 features are associated with the grade of hypoxic ischaemic encephalopathy (HIE). We analysed 58 term infants with HIE (>36 weeks of gestational age) enrolled in a prospective observational study. All newborn infants had a period of continuous rcSO2 monitoring and magnetic resonance imaging (MRI) assessment during the first week of life. rcSO2 signals were pre-processed and quantitative features were extracted. Machine-learning and deep-learning models were developed to detect adverse outcome (brain injury on MRI or death in the first week) using a leave-one-out cross-validation approach and to assess the association between rcSO2 and HIE grade (modified Sarnat at 1 h). The machine-learning model (rcSO2 excluding prolonged relative desaturations) significantly detected infant MRI outcome or death in the first week of life [area under the curve (AUC) = 0.73, confidence interval (CI) = 0.59-0.86, Matthews correlation coefficient = 0.35]. In agreement, deep-learning models detected adverse outcome with an AUC = 0.64, CI = 0.50-0.79. We also report a significant association between rcSO2 features and HIE grade using a machine-learning approach (AUC = 0.81, CI = 0.73-0.90). We conclude that automated analysis of rcSO2 using machine-learning methods in term infants with HIE was able to identify, with modest accuracy, infants with adverse outcome. De novo approaches to the analysis of near-infrared spectroscopy (NIRS) signals hold promise to aid clinical decision making in the future.
KEY POINTS: Hypoxic-induced neonatal brain injury contributes to both short- and long-term functional deficits. Non-invasive continuous monitoring of brain oxygenation using near-infrared spectroscopy (NIRS) offers potential new insight into the development of serious injury. In this study, characteristics of the NIRS signal were summarised using either predefined features or data-driven feature extraction; both were combined with a machine-learning approach to predict short-term brain injury. Using data from a cohort of term infants with hypoxic ischaemic encephalopathy, the present study illustrates that automated analysis of regional cerebral oxygen saturation (rcSO2), using either machine-learning or deep-learning methods, was able to identify infants with adverse outcome.
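Two of the evaluation tools named in the abstract, leave-one-out cross-validation and the Matthews correlation coefficient, can be sketched in a few lines. This is a generic illustration under standard definitions, not the study's pipeline; the function names are chosen here for clarity.

```python
from math import sqrt


def leave_one_out(n_samples):
    """Yield (train_indices, test_index) pairs, holding out one sample at a time."""
    for held_out in range(n_samples):
        train = [i for i in range(n_samples) if i != held_out]
        yield train, held_out


def matthews_corrcoef(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-table counts.

    Returns 0.0 when any marginal total is zero, a common convention.
    """
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# With 58 infants, leave-one-out yields 58 train/test splits of size 57/1.
splits = list(leave_one_out(58))
```

The coefficient ranges from -1 to 1 and stays informative when the two outcome classes are unbalanced, which is one reason it is often reported next to AUC in small clinical cohorts.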
ABSTRACT
Intravenous ganciclovir and oral valganciclovir display significant variability in ganciclovir pharmacokinetics, particularly in children. Therapeutic drug monitoring currently relies on the area under the concentration-time curve (AUC). Machine-learning (ML) algorithms represent an interesting alternative to maximum a posteriori Bayesian estimators for AUC estimation. The goal of our study was to develop and validate an ML-based limited sampling strategy (LSS) approach to determine ganciclovir AUC0-24 after administration of either intravenous ganciclovir or oral valganciclovir in children. Pharmacokinetic parameters from four published population pharmacokinetic models, together with the World Health Organization growth curves for children, were used in the mrgsolve R package to simulate 10,800 pharmacokinetic profiles of children. Different ML algorithms were trained to predict AUC0-24 from different combinations of two or three samples. Performance was evaluated in a simulated test set and in an external data set of real patients. The best estimation performance in the test set was obtained with the XGBoost algorithm using an LSS of 2 and 6 hours post-dose for oral valganciclovir (relative mean prediction error [rMPE] = 0.4% and relative root mean square error [rRMSE] = 5.7%) and an LSS of 0 and 2 hours post-dose for intravenous ganciclovir (rMPE = 0.9% and rRMSE = 12.4%). In the external data set, the performance of these two-sample LSSs was acceptable: rMPE = 0.2% and rRMSE = 16.5% for valganciclovir, and rMPE = -9.7% and rRMSE = 17.2% for intravenous ganciclovir. The XGBoost algorithm developed provided clinically relevant individual estimates using only two blood samples. This will improve the implementation of AUC-targeted ganciclovir therapeutic drug monitoring in children.
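The two error measures reported above can be computed as sketched below. The abstract does not spell out the formulas, so this assumes the commonly used definitions: each prediction error is divided by the reference AUC, then averaged (rMPE) or root-mean-squared (rRMSE), and expressed as a percentage. The AUC values are hypothetical.

```python
from math import sqrt


def relative_errors(predicted, reference):
    """Return (rMPE, rRMSE) in percent for paired predicted and reference values."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    relative = [(p - r) / r for p, r in zip(predicted, reference)]
    rmpe = 100 * sum(relative) / len(relative)                       # signed bias
    rrmse = 100 * sqrt(sum(e * e for e in relative) / len(relative))  # imprecision
    return rmpe, rrmse


# Hypothetical AUC0-24 values (mg*h/L), not taken from the study:
reference_auc = [100.0, 100.0]
predicted_auc = [110.0, 90.0]
rmpe, rrmse = relative_errors(predicted_auc, reference_auc)
```

In this toy case the over- and under-prediction cancel, so the bias (rMPE) is zero while the imprecision (rRMSE) is 10%, which is why the two measures are usually reported together.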
Subjects
Antiviral Agents, Area Under Curve, Drug Monitoring, Ganciclovir, Machine Learning, Valganciclovir, Humans, Ganciclovir/pharmacokinetics, Ganciclovir/analogs & derivatives, Valganciclovir/pharmacokinetics, Child, Antiviral Agents/pharmacokinetics, Antiviral Agents/administration & dosage, Drug Monitoring/methods, Preschool Child, Bayes Theorem, Algorithms, Oral Administration, Male, Female, Cytomegalovirus Infections/drug therapy, Infant, Intravenous Administration, Adolescent
ABSTRACT
Daptomycin is a concentration-dependent lipopeptide antibiotic for which exposure/effect relationships have been shown. Machine learning (ML) algorithms developed to predict individual drug exposure have shown very good performance in comparison to maximum a posteriori Bayesian estimation (MAP-BE). The aim of this work was to predict the area under the blood concentration curve (AUC) of daptomycin from two samples and a few covariates using the XGBoost ML algorithm trained on Monte Carlo simulations. Five thousand one hundred fifty patients were simulated from two population pharmacokinetic models from the literature. Data from the first model were split into a training set (75%) and a testing set (25%). Four ML algorithms were built to learn AUC based on daptomycin blood concentration samples at pre-dose and 1 h post-dose. The XGBoost model (the best ML algorithm), which had the lowest root mean square error (RMSE) in a 10-fold cross-validation experiment, was evaluated in both the test set and the simulations from the second population pharmacokinetic model (validation). The ML model based on the two concentrations, the difference between these concentrations, and five other covariates (sex, weight, daptomycin dose, creatinine clearance, and body temperature) yielded very good AUC estimation in the test (relative bias/RMSE = 0.43/7.69%) and validation sets (relative bias/RMSE = 4.61/6.63%). The XGBoost ML model developed allowed accurate estimation of daptomycin AUC using C0, C1h, and a few covariates and could be used for exposure estimation and dose adjustment. This ML approach can facilitate the conduct of future therapeutic drug monitoring (TDM) studies.
Subjects
Anti-Bacterial Agents, Area Under Curve, Bayes Theorem, Daptomycin, Machine Learning, Monte Carlo Method, Daptomycin/pharmacokinetics, Daptomycin/blood, Humans, Anti-Bacterial Agents/pharmacokinetics, Anti-Bacterial Agents/blood, Male, Female, Algorithms, Middle Aged, Adult, Aged
ABSTRACT
The identification of relevant biomarkers from high-dimensional cancer data remains a significant challenge due to the complexity and heterogeneity inherent in various cancer types. Conventional feature selection methods often struggle to effectively navigate the vast solution space while maintaining high predictive accuracy. In response to these challenges, we introduce a novel feature selection approach that integrates Random Drift Optimization (RDO) with XGBoost, specifically designed to enhance the performance of cancer classification tasks. Our proposed framework not only improves classification accuracy but also offers valuable insights into the underlying biological mechanisms driving cancer progression. Through comprehensive experiments conducted on real-world cancer datasets, including Central Nervous System (CNS), Leukemia, Breast, and Ovarian cancers, we demonstrate the efficacy of our method in identifying a smaller subset of unique and relevant genes. This selection results in significantly improved classification efficiency and accuracy. When compared with popular classifiers such as Support Vector Machine, K-Nearest Neighbor, and Naive Bayes, our approach consistently outperforms these models in terms of both accuracy and F-measure metrics. For instance, our framework achieved an accuracy of 97.24% in the CNS dataset, 99.14% in Leukemia, 95.21% in Ovarian, and 87.62% in Breast cancer, showcasing its robustness and effectiveness across different types of cancer data. These results underline the potential of our RDO-XGBoost framework as a promising solution for feature selection in cancer data analysis, offering enhanced predictive performance and valuable biological insights.
Subjects
Neoplasms, Humans, Neoplasms/classification, Algorithms, Support Vector Machine, Tumor Biomarkers/genetics, Bayes Theorem, Computational Biology/methods, Female
ABSTRACT
Post-translational modification (PTM) refers to the covalent and enzymatic modification of proteins after protein biosynthesis, which orchestrates a variety of biological processes. Detecting PTM sites at the proteome scale is one of the key steps towards an in-depth understanding of their regulatory mechanisms. In this study, we present an integrated method based on eXtreme Gradient Boosting (XGBoost), called iRice-MS, to identify 2-hydroxyisobutyrylation, crotonylation, malonylation, ubiquitination, succinylation and acetylation in rice. For each PTM-specific model, we adopted eight feature encoding schemes, including sequence-based features, physicochemical property-based features and spatial mapping information-based features. The optimal feature set was identified for each encoding, and the respective models were established. Extensive experimental results show that iRice-MS consistently displays excellent performance in 5-fold cross-validation and on the independent dataset test. In addition, our approach is superior to other existing tools in terms of AUC value. Based on the proposed model, a web server named iRice-MS was established and is freely accessible at http://lin-group.cn/server/iRice-MS.
Subjects
Oryza, Post-Translational Protein Processing, Acetylation, Computational Biology, Biological Models, Oryza/metabolism, Post-Translational Protein Processing/physiology, Proteome/metabolism, Ubiquitination
ABSTRACT
BACKGROUND: Patients with alpha-fetoprotein (AFP)-positive hepatocellular carcinoma (HCC) have aggressive biological behavior and poor prognosis. Therefore, survival time is one of the greatest concerns for patients with AFP-positive HCC. This study aimed to demonstrate the utilization of six machine learning (ML)-based prognostic models to predict overall survival of patients with AFP-positive HCC. METHODS: Data on patients with AFP-positive HCC were extracted from the Surveillance, Epidemiology, and End Results database. Six ML algorithms (extreme gradient boosting [XGBoost], logistic regression [LR], support vector machine [SVM], random forest [RF], K-nearest neighbor [KNN], and decision tree [ID3]) were used to develop the prognostic models of patients with AFP-positive HCC at one year, three years, and five years. Area under the receiver operating characteristic curve (AUC), confusion matrix, calibration curves, and decision curve analysis (DCA) were used to evaluate the model. RESULTS: A total of 2,038 patients with AFP-positive HCC were included for analysis. The 1-, 3-, and 5-year overall survival rates were 60.7%, 28.9%, and 14.3%, respectively. Seventeen features regarding demographics and clinicopathology were included in six ML algorithms to generate a prognostic model. The XGBoost model showed the best performance in predicting survival at 1-year (train set: AUC = 0.771; test set: AUC = 0.782), 3-year (train set: AUC = 0.763; test set: AUC = 0.749) and 5-year (train set: AUC = 0.807; test set: AUC = 0.740). Furthermore, for 1-, 3-, and 5-year survival prediction, the accuracy in the training and test sets was 0.709 and 0.726, 0.721 and 0.726, and 0.778 and 0.784 for the XGBoost model, respectively. Calibration curves and DCA exhibited good predictive performance as well. 
CONCLUSIONS: The XGBoost model exhibited good predictive performance, which may provide physicians with an effective tool for early medical intervention and improve the survival of patients.
Subjects
Hepatocellular Carcinoma, Liver Neoplasms, Machine Learning, alpha-Fetoproteins, Female, Humans, Male, Algorithms, alpha-Fetoproteins/metabolism, Area Under Curve, Calibration, Hepatocellular Carcinoma/blood, Hepatocellular Carcinoma/diagnosis, Hepatocellular Carcinoma/pathology, Hepatocellular Carcinoma/mortality, Liver Neoplasms/blood, Liver Neoplasms/diagnosis, Liver Neoplasms/pathology, Liver Neoplasms/mortality, Prognosis, ROC Curve
ABSTRACT
The H1N1pdm09 virus has been a persistent threat to public health since the 2009 pandemic. Particularly, since the relaxation of COVID-19 pandemic mitigation measures, the influenza virus and SARS-CoV-2 have been concurrently prevalent worldwide. To determine the antigenic evolution pattern of H1N1pdm09 and develop preventive countermeasures, we collected influenza sequence data and immunological data to establish a new antigenic evolution analysis framework. A machine learning model (XGBoost, accuracy = 0.86, area under the receiver operating characteristic curve = 0.89) was constructed using epitopes, physicochemical properties, receptor binding sites, and glycosylation sites as features to predict the antigenic similarity relationships between influenza strains. An antigenic correlation network was constructed, and the Markov clustering algorithm was used to identify antigenic clusters. Subsequently, the antigenic evolution pattern of H1N1pdm09 was analyzed at the global and regional scales across three continents. We found that H1N1pdm09 evolved into around five antigenic clusters between 2009 and 2023 and that their antigenic evolution trajectories were characterized by cocirculation of multiple clusters, low-level persistence of former dominant clusters, and local heterogeneity of cluster circulations. Furthermore, compared with the seasonal H1N1 virus, the potential cluster-transition determining sites of H1N1pdm09 were restricted to epitopes Sa and Sb. This study demonstrated the effectiveness of machine learning methods for characterizing antigenic evolution of viruses, developed a specific model to rapidly identify H1N1pdm09 antigenic variants, and elucidated their evolutionary patterns. Our findings may provide valuable support for the implementation of effective surveillance strategies and targeted prevention efforts to mitigate the impact of H1N1pdm09.
Subjects
Viral Antigens, Influenza A Virus H1N1 Subtype, Human Influenza, Influenza A Virus H1N1 Subtype/genetics, Influenza A Virus H1N1 Subtype/immunology, Humans, Human Influenza/epidemiology, Human Influenza/prevention & control, Human Influenza/virology, Human Influenza/immunology, Viral Antigens/genetics, Viral Antigens/immunology, Machine Learning, Molecular Evolution, Epitopes/genetics, Epitopes/immunology, COVID-19/epidemiology, COVID-19/prevention & control, COVID-19/virology, COVID-19/immunology, Pandemics/prevention & control, Influenza Virus Hemagglutinin Glycoproteins/genetics, Influenza Virus Hemagglutinin Glycoproteins/immunology, SARS-CoV-2/genetics, SARS-CoV-2/immunology
ABSTRACT
Fuchs uveitis syndrome (FUS) is a commonly misdiagnosed uveitis syndrome that often presents as an asymptomatic mild inflammatory condition until complications arise. The diagnosis of this disease remains clinical because of the lack of specific laboratory tests. The aqueous humor (AH) is a complex fluid containing nutrients and metabolic wastes from the eye. Changes in AH proteins provide important information for diagnosing intraocular diseases. This study aimed to analyze the proteomic profile of AH in individuals diagnosed with FUS and to identify potential biomarkers of the disease. We used liquid chromatography-tandem mass spectrometry-based proteomic methods to evaluate the AH protein profiles of 37 samples, comprising 15 patients with FUS, six patients with Posner-Schlossman syndrome (PSS), and 16 patients with age-related cataract. A total of 538 proteins were identified from a comprehensive spectral library of 634 proteins. Subsequent differential expression analysis, enrichment analysis, and construction of key sub-networks revealed that the inflammatory response, complement activation and hypoxia might be crucial in mediating the process of FUS. Hypoxia-inducible factor-1 may serve as a key regulator and therapeutic target. Additionally, the innate and adaptive immune responses are considered dominant in patients with FUS. A diagnostic model was constructed using a machine-learning algorithm to classify FUS, PSS, and normal controls. Two proteins, complement C1q subcomponent subunit B and secretogranin-1, received the highest scores from Extreme Gradient Boosting, suggesting their potential utility as a biomarker panel. Furthermore, these two proteins were validated as biomarkers in a cohort of 18 patients using high-resolution multiple reaction monitoring assays. Therefore, this study advances current knowledge of FUS pathogenesis and promotes the development of effective diagnostic strategies.
Subjects
Open-Angle Glaucoma, Uveitis, Humans, Aqueous Humor/metabolism, Proteomics, Uveitis/metabolism, Open-Angle Glaucoma/metabolism, Biomarkers/metabolism, Hypoxia/metabolism
ABSTRACT
BACKGROUND: The incidence of graft failure following liver transplantation (LTx) is consistent. While traditional risk scores for LTx have limited accuracy, the potential of machine learning (ML) in this area remains uncertain, despite its promise in other transplant domains. This study aims to determine ML's predictive limitations in LTx by replicating methods used in previous heart transplant research. METHODS: This study used the UNOS STAR database, selecting 64,384 adult patients who underwent LTx between 2010 and 2020. Gradient boosting models (XGBoost and LightGBM) were used to predict 14-, 30-, and 90-day graft failure and were compared with a conventional logistic regression model. Models were evaluated using both shuffled and rolling cross-validation (CV) methodologies. Model performance was assessed using the AUC across validation iterations. RESULTS: In the comparison of predictive models for 14-day, 30-day and 90-day graft survival, LightGBM consistently outperformed the other models, achieving the highest AUCs of 0.740, 0.722, and 0.700, respectively, with shuffled CV. However, with rolling CV, accuracy declined for every ML algorithm. The analysis revealed factors that were influential for graft survival prediction across all models, including total bilirubin, medical condition, recipient age, and donor AST, among others. Several features, such as donor age and recipient diabetes history, were important in two out of three models. CONCLUSIONS: LightGBM enhances short-term graft survival predictions post-LTx. However, because medical practices and selection criteria change over time, continuous model evaluation is essential. Future studies should focus on temporal variations and clinical implications, and should ensure model transparency for broader medical utility.
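The difference between the two validation schemes can be sketched as index generators. This is an illustration under an assumption: "rolling CV" is read here as forward-in-time validation in which each transplant year is predicted only from earlier years; the abstract does not give the exact scheme, and the function names and toy years are hypothetical.

```python
import random


def shuffled_folds(n_samples, k, seed=0):
    """K-fold CV on randomly shuffled indices; folds mix all time periods."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, folds[i]


def rolling_folds(years, min_train_years=1):
    """Forward-in-time CV: train on all earlier years, test on the next year."""
    unique_years = sorted(set(years))
    for position in range(min_train_years, len(unique_years)):
        test_year = unique_years[position]
        train = [i for i, y in enumerate(years) if y < test_year]
        test = [i for i, y in enumerate(years) if y == test_year]
        yield train, test


# Hypothetical transplant years for five patients:
years = [2010, 2010, 2011, 2012, 2011]
rolling = list(rolling_folds(years))
```

Shuffled folds let a model train on patients transplanted after those it is tested on, so a drop in AUC under the rolling scheme is consistent with the abstract's point that changing practice and selection criteria require continuous re-evaluation.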
Subjects
Liver Transplantation, Adult, Humans, Liver Transplantation/adverse effects, Research Design, Algorithms, Bilirubin, Machine Learning
ABSTRACT
Necrophagous flies, particularly blowflies, serve as vital indicators in forensic entomology and ecological studies, contributing to minimum postmortem interval estimations and environmental monitoring. The study investigates variations in the predominant cuticular hydrocarbons (CHCs) viz. n-C25, n-C27, n-C28, and n-C29 of empty puparia of Calliphora vicina Robineau-Desvoidy, 1830, (Diptera: Calliphoridae) across diverse environmental conditions, including burial, above-ground and indoor settings, over 90 days. Notable trends include a significant decrease in n-C25 concentrations in buried and above-ground conditions over time, while n-C27 concentrations decline in buried and above-ground conditions but remain stable indoors. Burial conditions show significant declines in n-C27 and n-C29 concentrations over time, indicating environmental influences. Conversely, above-ground conditions exhibit uniform declines in all hydrocarbons. Indoor conditions remain relatively stable, with weak correlations between weathering time and CHC concentrations. Additionally, machine learning techniques, specifically Extreme Gradient Boosting (XGBoost), are employed for age estimation of empty puparia, yielding accurate predictions across different outdoor and indoor conditions. These findings highlight the subtle responses of CHC profiles to environmental stimuli, underscoring the importance of considering environmental factors in forensic entomology and ecological research. The study advances the understanding of insect remnant degradation processes and their forensic implications. Furthermore, integrating machine learning with entomological expertise offers standardized methodologies for age determination, enhancing the reliability of entomological evidence in legal contexts and paving the way for future research and development.
Subjects
Calliphoridae, Forensic Entomology, Hydrocarbons, Postmortem Changes, Pupa, Animals, Hydrocarbons/analysis, Pupa/chemistry, Burial, Machine Learning, Feeding Behavior, Gas Chromatography-Mass Spectrometry
ABSTRACT
BACKGROUND: There are significant geographic inequities in COVID-19 case fatality rates (CFRs), and a comprehensive understanding of their country-level determinants from a global perspective is necessary. This study aims to quantify the country-specific risk of COVID-19 CFR and propose tailored response strategies, including vaccination strategies, in 156 countries. METHODS: Cross-temporal and cross-country variations in COVID-19 CFR were identified using extreme gradient boosting (XGBoost), including 35 factors from seven dimensions in 156 countries from 28 January 2020 to 31 January 2022. SHapley Additive exPlanations (SHAP) was used to further clarify the clustering of countries by the key factors driving CFR and the effect of concurrent risk factors for each country. Increases in vaccination rates were simulated to illustrate the reduction of CFR in different classes of countries. FINDINGS: Overall COVID-19 CFRs varied across countries from 28 January 2020 to 31 January 2022, ranging from 68 to 6373 per 100,000 population. During the COVID-19 pandemic, the determinants of CFRs first changed from health conditions to universal health coverage, and then to a multifactorial mixed effect dominated by vaccination. In the Omicron period, countries were divided into five classes according to risk determinants. The low vaccination-driven class (70 countries) was mainly distributed in sub-Saharan Africa and Latin America and included the majority of low-income countries (95.7%), with many concurrent risk factors. The aging-driven class (26 countries) was mainly distributed in high-income European countries. The high disease burden-driven class (32 countries) was mainly distributed in Asia and North America. The low GDP-driven class (14 countries) was scattered across continents. Simulating a 5% increase in vaccination rate resulted in CFR reductions of 31.2% and 15.0% for the low vaccination-driven class and the high disease burden-driven class, respectively, with greater CFR reductions for countries with high overall risk (SHAP value > 0.1), but only 3.1% for the aging-driven class. CONCLUSIONS: Evidence from this study suggests that geographic inequities in COVID-19 CFR are jointly determined by key and concurrent risks, and that reducing COVID-19 CFR requires not only increased vaccination coverage but also targeted intervention strategies based on country-specific risks.
Subjects
COVID-19, Global Health, Machine Learning, SARS-CoV-2, Humans, COVID-19/mortality, Risk Factors, Pandemics, COVID-19 Vaccines, Vaccination
ABSTRACT
Accurately mapping ground-level ozone concentrations at high spatiotemporal resolution (daily, 1 km) is essential for evaluating human exposure and conducting public health assessments. This requires identifying and understanding a proxy that is well correlated with ground-level ozone variation and available at high spatiotemporal resolution. This study introduces a high-resolution ozone modeling method utilizing the XGBoost algorithm with satellite-derived land surface temperature (LST) as the primary predictor. Focusing on China in 2019, our model achieved a cross-validation R2 of 0.91 and a root-mean-square error (RMSE) of 13.51 µg/m3. We provide detailed maps highlighting ground-level ozone concentrations in urban areas, uncovering previously unresolved spatial variations, along with time series that align with the established understanding of ozone dynamics. Our local interpretation of the machine learning model underscores the significant contribution of LST to spatiotemporal ozone variations, surpassing other meteorological, pollutant, and geographical predictors in its influence. Validation results indicate that model performance decreases as spatial resolution becomes coarser, with R2 decreasing from 0.91 for the 1 km model to 0.85 for the 25 km model. The methodology and data sets generated by this study offer new insights into ground-level ozone variability and mapping and can significantly aid exposure assessment and epidemiological research related to this critical environmental challenge.
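For readers unfamiliar with the two validation metrics quoted above, the following minimal sketch shows how RMSE and R2 are commonly computed from held-out observations and predictions. It assumes R2 denotes the coefficient of determination (the abstract does not say whether it is this or a squared correlation), and the sample values are invented for illustration only; they are not data from the study.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error, in the same units as the observations."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(observed))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical ozone values in µg/m3, made up for this example.
    obs = [80.0, 95.0, 110.0, 60.0, 130.0]
    pred = [85.0, 90.0, 105.0, 70.0, 120.0]
    print("RMSE:", rmse(obs, pred))
    print("R2:", r_squared(obs, pred))
```

RMSE carries the units of the target variable, while R2 is unitless, so the two are usually reported together, as in the abstract.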
Subjects
Machine Learning, Ozone, Temperature, Ozone/analysis, Environmental Monitoring/methods, China, Air Pollutants, Humans
ABSTRACT
Rheumatoid arthritis (RA) is a chronic autoimmune disorder characterized by inflammation and pain in the joints, which can lead to joint damage and disability over time. Nanotechnology in RA treatment involves using nano-scale materials to improve drug delivery efficiency, specifically targeting inflamed tissues and minimizing side effects. The study's primary objective is to develop and optimize a new class of layered nanomaterials that are both eco-friendly and highly effective in the targeted delivery of medications for treating RA. By employing a combination of Adaptive Neuro-Fuzzy Inference System (ANFIS) and Extreme Gradient Boosting (XGBoost) machine learning models, the study also aims to precisely control nanomaterial synthesis, structural characteristics, and release mechanisms, ensuring delivery of anti-inflammatory drugs directly to the affected joints with minimal side effects. The in vitro evaluations demonstrated sustained and controlled drug release, with an Encapsulation Efficiency (EE) of 85% and a Loading Capacity (LC) of 10%. In vivo studies in a murine arthritis model showed a 60% reduction in inflammation markers and a 50% improvement in mobility, with no significant toxicity observed in major organs. The machine learning models exhibited high predictive accuracy, with a Root Mean Square Error (RMSE) of 0.667, a correlation coefficient (r) of 0.867, and an R2 value of 0.934. The nanomaterials also demonstrated a specificity rate of 87.443%, effectively targeting inflamed tissues with minimal off-target effects. These findings highlight the potential of this novel approach to significantly enhance RA treatment by improving drug delivery precision and minimizing systemic side effects.