ABSTRACT
Common Variable Immunodeficiency (CVID) is a primary immunodeficiency characterized by reduced levels of specific immunoglobulins, resulting in frequent infections, autoimmune disorders, increased cancer risk, and diminished antibody production despite an adequate B cell count. Because its clinical manifestations are highly variable, CVID classification, including the widely recognized Freiburg classification, relies primarily on clinical symptoms and genetic variations. Our study aims to refine the classification of CVID by analyzing transcriptomic data to identify distinct disease subtypes. We used the GSE51405 dataset, examining transcriptomic profiles from 30 CVID patients without complications. Employing a combination of clustering techniques (KMeans, hierarchical agglomerative clustering, spectral clustering, and Gaussian Mixture Models) and differential gene expression analysis with R's limma package, we integrated molecular findings with demographic data (age and gender) through correlation analysis and identified genes common to the clusters. Three distinct clusters of CVID patients were identified using KMeans, agglomerative clustering, and Gaussian Mixture Models, highlighting the disease's heterogeneity. Differential expression analysis revealed 31 genes with variable expression levels across these clusters. Notably, nine genes (EIF5A, RPL21, ANP32A, DTX3L, NCF2, CDC42EP3, CHP1, FOLR3, and DEFA4) were consistently differentially expressed across all clusters, independent of demographic factors. We recommend categorizing patients by four of these genes (NCF2, CHP1, FOLR3, and DEFA4), as they may assist in prognostic prediction. In summary, transcriptomic analysis of CVID patients identified three distinct clusters based on gene expression, independent of age and gender, and nine genes differentially expressed across these clusters that are potential biomarkers for CVID subtype classification.
These findings highlight the genetic heterogeneity of CVID and provide novel insights into disease classification and potential personalized treatment approaches.
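The clustering step can be illustrated with a minimal k-means (Lloyd's algorithm) sketch. It runs on a made-up matrix in which each row stands for one patient and each column for the expression of one gene; it is not the GSE51405 data or the authors' pipeline. The deterministic farthest-point initialization is an assumption made here for reproducibility (scikit-learn's `KMeans`, for example, defaults to k-means++ initialization).

```python
import math


def kmeans(points, k, n_iter=100):
    """Lloyd's k-means on lists of floats; returns (labels, centroids)."""
    # Deterministic farthest-point initialization (an assumption for this sketch).
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(n_iter):
        # Assignment step: each patient goes to its nearest centroid (Euclidean).
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean profile of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Made-up expression profiles (rows = patients, columns = two genes).
samples = [
    [1.0, 1.1], [0.9, 1.0], [1.1, 0.9],
    [5.0, 5.1], [5.2, 4.9], [4.9, 5.0],
    [9.0, 1.0], [9.1, 1.2], [8.9, 0.8],
]
labels, centroids = kmeans(samples, k=3)
```

In practice, expression values are normalized (and often reduced in dimension) before clustering, and agreement between different methods, as with the three algorithms above, is one way to judge whether the clusters are stable.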
Subject(s)
Common Variable Immunodeficiency , Gene Expression Profiling , Transcriptome , Humans , Common Variable Immunodeficiency/genetics , Common Variable Immunodeficiency/classification , Female , Male , Adult , Cluster Analysis , Middle Aged , Young Adult , Adolescent , NADPH Oxidases
ABSTRACT
Grapes are a crucial fruit crop with extensive uses worldwide. Cluster compactness is an undesirable trait in Yaghooti grape: it lowers productivity and reduces the fruit's desirability among consumers. RNA-Seq data from three stages of cluster development were analyzed with the FEELnc software, identifying 849 lncRNAs, of which 183 were differentially expressed across the cluster development stages. GO and KEGG enrichment analyses of these lncRNAs showed that they target 1,814 genes, including CKX, PG, PME, NPC2, and UGT. The analysis linked these lncRNAs to key processes such as grape growth and development, secondary metabolite synthesis, and resistance to both biotic and abiotic stresses. These findings, combined with molecular experiments on lncRNA interactions with other regulatory factors, could help improve Yaghooti grape quality and reduce cluster compactness.
ABSTRACT
In the context of early disease detection, machine learning (ML) has emerged as a vital tool. Feature selection (FS) algorithms play a crucial role in ensuring the accuracy of predictive models by identifying the most influential variables. This study, focusing on a retrospective cohort of 4778 COVID-19 patients from Iran, explores the performance of various FS methods, including filter, embedded, and hybrid approaches, in predicting mortality outcomes. The researchers leveraged 115 routine clinical, laboratory, and demographic features and employed 13 ML models to assess the effectiveness of these FS methods based on classification accuracy, predictive accuracy, and statistical tests. The results indicate that a Hybrid Boruta-VI model combined with the Random Forest algorithm demonstrated superior performance, achieving an accuracy of 0.89, an F1 score of 0.76, and an AUC value of 0.95 on test data. Key variables identified as important predictors of adverse outcomes include age, oxygen saturation levels, albumin levels, neutrophil counts, platelet levels, and markers of kidney function. These findings highlight the potential of advanced FS techniques and ML models in enhancing early disease detection and informing clinical decision-making.
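Accuracy and F1 score can diverge when one outcome is much rarer than the other, because F1 ignores true negatives and summarizes precision and recall for the positive class only. The short sketch below computes both from true and predicted labels; the label vectors are invented for illustration and do not come from the study.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1 for the positive class) from paired label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


# Invented labels: 1 = died, 0 = survived.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
acc, f1 = accuracy_and_f1(y_true, y_pred)
```

Here accuracy is 0.8 while F1 is about 0.67: six of the eight correct predictions are true negatives, which F1 does not reward.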
Subject(s)
COVID-19 , Machine Learning , Humans , COVID-19/diagnosis , COVID-19/epidemiology , COVID-19/mortality , COVID-19/virology , Retrospective Studies , Male , Female , Middle Aged , SARS-CoV-2/isolation & purification , Algorithms , Iran/epidemiology , Aged , Adult
ABSTRACT
Metagenomics has opened new avenues for exploring the genetic potential of uncultured microorganisms, which may serve as promising sources of enzymes and natural products for industrial applications. Identifying enzymes with improved catalytic properties from the vast amount of available metagenomic data poses a significant challenge that demands the development of novel computational and functional screening tools. The catalytic properties of all enzymes are primarily dictated by their structures, which are predominantly determined by their amino acid sequences. However, this aspect has not been fully considered in enzyme bioprospecting. With the growing number of available enzyme sequences and the increasing demand for novel biocatalysts, structural and functional modeling can be employed to identify potential enzymes with novel catalytic properties. Recent efforts to discover new polysaccharide-degrading enzymes from rumen metagenome data using homology-based searches and machine learning-based models have shown significant promise. Here, we explore various computational approaches that can be employed to screen and shortlist metagenome-derived enzymes as potential biocatalyst candidates, in conjunction with the wet-lab analytical methods traditionally used for enzyme characterization.
ABSTRACT
Despite desirable characteristics such as early ripening, the Sistan Yaghooti grape variety is vulnerable to cluster rot because of its small berries and dense clusters. Alternative splicing (AS) may serve as a regulatory mechanism during developmental processes and in response to environmental signals. RNA-Seq analysis was performed to measure gene expression and the extent of AS events across the cluster growth and development stages of Sistan Yaghooti grape. The number of AS events increased across stages, suggesting that AS contributes to the grapevine's adaptability to various stresses. In addition, differentially expressed genes (DEGs) and differentially alternatively spliced (DAS) genes showed little overlap across cluster growth stages. Functional analysis of 19,194 DAS gene sets showed that the VIT_06s0004g06670 gene is involved in activating calcium (Ca2+) channels through five PLC biosynthetic pathways. Among the 27,229 DEG sets, the VIT_07s0005g05320 gene showed higher expression. Interestingly, this gene is involved in the synthesis of an EF-hand domain-containing protein capable of binding Ca2+ by activating four biochemical pathways. These genes increase cytosolic Ca2+ concentration, enhancing plant stress tolerance and resistance to cracking. These results show that AS can respond independently to different types of stress. Among the other DAS genes, the GA2ox gene (VvGA2ox) showed an increase in AS events during cluster development. This gene is critical for initiating gibberellin (GA) degradation and plays a crucial role in different stages of seed development. Therefore, it is very likely that this gene is one of the main factors responsible for the cluster density and seedlessness of Sistan Yaghooti grape.
Subject(s)
Vitis , Vitis/genetics , Alternative Splicing , Gene Expression Profiling , RNA-Seq , Fruit , Growth and Development , Gene Expression Regulation, Plant
ABSTRACT
The selection of an appropriate amylase for hydrolyzing poultry feed is crucial for achieving improved digestibility and high-quality feed. Cellulose nanocrystals (CNCs), which are known for their high surface area, provide an excellent platform for enzyme immobilization. Immobilization greatly enhances the operational stability of α-amylases and the efficiency of starch bioconversion in poultry feeds. In this study, we immobilized two metagenome-derived α-amylases, PersiAmy2 and PersiAmy3, on CNCs and employed computational methods to characterize and compare the degradation efficiencies of these enzymes for poultry feed hydrolysis. Experimental in vitro bioconversion assessments were performed to validate the computational outcomes. Molecular docking studies revealed the superior hydrolysis performance of PersiAmy3, which displayed stronger electrostatic interactions with CNCs. Experimental characterization demonstrated the improved performance of both α-amylases after immobilization at high temperatures (80 °C). A similar trend was observed under alkaline conditions, with α-amylase activity reaching 88% within a pH range of 8.0 to 9.0. Both immobilized α-amylases exhibited halotolerance at NaCl concentrations up to 3 M and retained over 50% of their initial activity after 13 use cycles. Notably, PersiAmy3 showed greater improvements than PersiAmy2 following immobilization, including a significant increase in activity from 65% to 80.73% at 80 °C, an increase in activity to 156.48% at a high salinity of 3 M NaCl, and a longer half-life, indicating greater thermal stability within the range of 60 to 80 °C. These findings were substantiated by the in vitro hydrolysis of poultry feed, in which PersiAmy3 generated 53.53 g/L reducing sugars.
This comprehensive comparison underscores the utility of computational methods as a faster and more efficient approach for selecting optimal enzymes for poultry feed hydrolysis, thereby providing valuable insights into enhancing feed digestibility and quality.
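Thermal half-life is commonly estimated by assuming first-order inactivation, A(t) = A0·exp(-kt), so that t1/2 = ln 2 / k. The abstract does not state how the half-lives were derived, so the sketch below only illustrates that standard model, fitted by least squares through the origin on synthetic residual-activity values generated with k = 0.01 per minute.

```python
import math


def first_order_half_life(times, residual):
    """Fit ln(residual) = -k * t through the origin; return (k, half-life).

    `residual` is the fraction of initial activity remaining at each time point.
    """
    num = sum(t * math.log(r) for t, r in zip(times, residual))
    den = sum(t * t for t in times)
    k = -num / den
    return k, math.log(2) / k


times = [0, 30, 60, 90, 120]  # minutes at the test temperature (synthetic)
residual = [math.exp(-0.01 * t) for t in times]  # generated with k = 0.01 per minute
k, t_half = first_order_half_life(times, residual)
```

With real measurements the residuals are noisy, and a clearly non-linear ln(activity) plot would suggest that a single first-order model is not adequate.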
Subject(s)
Nanoparticles , alpha-Amylases , Animals , alpha-Amylases/chemistry , alpha-Amylases/metabolism , Hydrolysis , Cellulose/chemistry , Molecular Docking Simulation , Poultry/metabolism , Sodium Chloride
ABSTRACT
The current inefficient plastic recycling process raises significant environmental and health concerns. This study presents the first integrated reference catalog of plastic-contaminated environments, obtained using an in silico workflow that could play a significant role in discovering new plastizymes. Here, we combined 66 whole-metagenome datasets from plastic-contaminated environmental samples, drawn from four previously collected metagenomic datasets, with our new sample. From these, we constructed integrated gene, protein, taxon, and plastic-degrading enzyme catalogs (PDEC) for plastic-contaminated environments. These catalogs contain 53,300,583 non-redundant genes and proteins, 691 metagenome-assembled genomes, and 136,654 plastizymes. Based on KEGG and eggNOG annotations, 42% of the recognized genes lack annotations, indicating that their functions remain elusive and warrant further investigation. The PDEC catalog also highlights hydrolases, peroxidases, and cutinases as the prevailing plastizymes. Finally, after multiple validation procedures, we pinpointed the enzymes most similar to the introduced plastizymes in both amino acid sequence and three-dimensional structure. The resulting catalog is expected to improve the resolution of future multi-omics studies and provide new insights into plastic-pollution-related research.
Subject(s)
Bacterial Proteins , Metagenome , Bacterial Proteins/metabolism , Metagenomics/methods
ABSTRACT
BACKGROUND: Estrogen receptor-positive (ER+) tumors make up the largest group of breast cancers. The estrogen receptor acts as a transcription factor and triggers cell proliferation and differentiation. Hence, investigating the genomic regions of ER-DNA interaction can help identify genes directly regulated by ER and clarify the mechanism of ER action in cancer progression. METHODS: In the present study, we used a workflow to perform a meta-analysis of ChIP-seq data from ER+ cell lines stimulated with 10 nM and 100 nM E2. All publicly available datasets were re-analyzed with the same platform, and known and unknown batch effects were then removed. Finally, the meta-analysis was performed to obtain meta-differentially bound sites in estrogen-treated MCF7 cell lines compared with vehicle controls. The meta-analysis results were also compared with results from T47D cell lines for greater precision. Enrichment analyses were employed to determine the functional importance of the meta-differentially bound sites and associated genes common to both cell lines. RESULTS: Remarkably, the transcription factors POU5F1B, ZNF662, ZNF442, KIN, ZNF410, and SGSM2 were recognized in the meta-analysis but not in individual studies. Enrichment of the meta-differentially bound sites highlighted candidate pathways not previously reported in breast cancer. The transcription factors PCGF2, HNF1B, and ZBED6 were also predicted through enrichment analysis of the associated genes. In addition, comparing the meta-analysis results of the ChIP-seq and RNA-seq data showed that many transcription factors affected by ER were up-regulated. CONCLUSION: The meta-analysis of ChIP-seq data from the estrogen-treated MCF7 cell line identified new ER binding sites that have not been previously reported.
Also, enrichment of the meta-differentially bound sites and their associated genes revealed new terms and pathways involved in the development of breast cancer which should be examined in future in vitro and in vivo studies.
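Combining evidence across independent studies is the core of such a meta-analysis. The abstract does not specify the combination statistic, so the sketch below uses Fisher's method, one standard way to merge per-study p-values for the same binding site. The inputs are assumed to be independent p-values in (0, 1], and batch-effect correction is not shown.

```python
import math


def fisher_combined_p(pvalues):
    """Combine independent p-values with Fisher's method.

    X = -2 * sum(ln p_i) follows a chi-square distribution with 2k degrees of
    freedom under the null; for even degrees of freedom its upper tail is
    exp(-x/2) * sum_{i<k} (x/2)**i / i!.
    """
    k = len(pvalues)
    half_x = -sum(math.log(p) for p in pvalues)  # X / 2
    term, tail = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i  # (x/2)**i / i!, built incrementally
        tail += term
    return math.exp(-half_x) * tail


# Three hypothetical per-study p-values for one candidate binding site.
meta_p = fisher_combined_p([0.04, 0.10, 0.03])
```

A meta p-value computed this way for every site would still need multiple-testing correction across sites before calling sites meta-differentially bound.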
Subject(s)
Breast Neoplasms , Estrogen Receptor alpha , Humans , Female , Estrogen Receptor alpha/genetics , Breast Neoplasms/genetics , Receptors, Estrogen , Chromatin Immunoprecipitation Sequencing , Transcriptome , Genomics , Estrogens
ABSTRACT
BACKGROUND: MicroRNAs (miRNAs) play a crucial role in regulating adaptive and maladaptive responses in cardiovascular diseases, making them attractive candidate biomarkers. However, their potential as novel biomarkers for diagnosing cardiovascular diseases requires systematic evaluation. METHODS: In this study, we aimed to identify a key set of miRNA biomarkers using integrated bioinformatics and machine learning analysis. We combined and analyzed three gene expression datasets from the Gene Expression Omnibus (GEO) database, which contain peripheral blood mononuclear cell (PBMC) samples from individuals with myocardial infarction (MI), individuals with stable coronary artery disease (CAD), and healthy individuals. Additionally, we selected a set of miRNAs based on their area under the receiver operating characteristic curve (AUC-ROC) for separating CAD and MI samples. We designed a two-layer architecture for sample classification, in which the first layer separates healthy from unhealthy samples and the second layer classifies stable CAD and MI samples. We trained different machine learning models using both biomarker sets and evaluated their performance on a test set. RESULTS: We identified hsa-miR-21-3p, hsa-miR-186-5p, and hsa-miR-32-3p as the differentially expressed miRNAs, and a set comprising hsa-miR-186-5p, hsa-miR-21-3p, hsa-miR-197-5p, hsa-miR-29a-5p, and hsa-miR-296-5p as the optimal set of miRNAs selected by AUC-ROC. Both biomarker sets distinguished healthy from unhealthy samples with complete accuracy. The best performance for classifying CAD and MI was achieved with an SVM model trained on the biomarker set selected by AUC-ROC, with an AUC-ROC of 0.96 and an accuracy of 0.94 on the test data. CONCLUSIONS: Our study demonstrated that miRNA signatures derived from PBMCs could serve as valuable novel biomarkers for cardiovascular diseases.
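Selecting miRNAs by their AUC-ROC for separating CAD from MI can be sketched by scoring each feature with the rank-based (Mann-Whitney) form of the AUC: the probability that a randomly chosen positive sample has a higher value than a randomly chosen negative one. The expression values and miRNA names below are hypothetical, and treating a down-regulated marker as equally informative (max(AUC, 1 - AUC)) is an assumption of this sketch, not necessarily the authors' rule.

```python
def auc(pos, neg):
    """Rank-based AUC: P(random positive > random negative), ties count one half."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def rank_by_auc(expression, labels, positive):
    """Score every feature by max(AUC, 1 - AUC) and sort best first."""
    scores = {}
    for name, values in expression.items():
        pos = [v for v, y in zip(values, labels) if y == positive]
        neg = [v for v, y in zip(values, labels) if y != positive]
        a = auc(pos, neg)
        scores[name] = max(a, 1.0 - a)  # a strongly down-regulated marker separates just as well
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical normalized expression values for three miRNAs in six samples.
labels = ["MI", "MI", "MI", "CAD", "CAD", "CAD"]
expression = {
    "miR_x": [5.1, 6.0, 5.8, 2.0, 2.5, 3.1],
    "miR_y": [2.0, 4.0, 3.0, 3.5, 1.0, 2.5],
    "miR_z": [1.0, 1.5, 1.2, 4.0, 5.0, 4.5],
}
auc_ranking = rank_by_auc(expression, labels, positive="MI")
```

To avoid optimistic estimates, such per-feature AUCs should be computed on training data only, before the selected set is evaluated on the held-out test set.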
Subject(s)
Coronary Artery Disease , MicroRNAs , Myocardial Infarction , Humans , Leukocytes, Mononuclear , MicroRNAs/genetics , Myocardial Infarction/diagnosis , Myocardial Infarction/genetics , Coronary Artery Disease/diagnosis , Coronary Artery Disease/genetics , Biomarkers , Machine Learning
ABSTRACT
Type 2 diabetes mellitus (T2DM) is a challenging, progressive metabolic disease caused by insulin resistance. Skeletal muscle is the major insulin-sensitive tissue and plays a pivotal role in blood glucose homeostasis. Dysfunctional muscle metabolism is implicated in disturbed glucose homeostasis, the development of insulin resistance, and T2DM. Understanding metabolic reprogramming in newly diagnosed patients provides opportunities for early diagnosis and treatment of T2DM, a disease that is difficult to manage. Here, we applied a systems biology approach to investigate metabolic dysregulation associated with the early stage of T2DM. We first reconstructed a human muscle-specific metabolic model and then used it for personalized metabolic modeling and analysis of newly diagnosed patients. We found that several pathways and metabolites, mainly involved in amino acid and lipid metabolism, were dysregulated. Our results indicated the significance of perturbations in pathways involved in building the membrane and extracellular matrix (ECM). Dysfunctional metabolism in these pathways may disrupt signaling and contribute to insulin resistance. We also applied a machine learning method to predict potential metabolite markers of insulin resistance in skeletal muscle. Thirteen exchange metabolites were predicted as potential markers, and their efficiency in discriminating insulin-resistant muscle was successfully validated.
Subject(s)
Diabetes Mellitus, Type 2 , Insulin Resistance , Humans , Insulin/metabolism , Blood Glucose/metabolism , Muscle, Skeletal/metabolism
ABSTRACT
A large amount of lignocellulosic waste is generated every day worldwide, and its accumulation in agroecosystems, integration into soil, or incineration for energy production causes severe environmental pollution. Using enzymes as biocatalysts for the biodegradation of lignocellulosic materials, especially under harsh processing conditions, is a practical step towards green energy and environmental biosafety. Hence, the current study focuses on an enzyme computationally screened from camel rumen metagenomic data, since this specialized microbiota can degrade lignocellulose-rich, recalcitrant materials. The novel hyperthermostable xylanase, named PersiXyn10, performed well under extreme conditions: it was active over broad temperature (30 to 100 °C) and pH (4.0 to 11.0) ranges, with maximum xylanolytic activity under alkaline, high-temperature conditions (pH 8.0 and 90 °C). The enzyme was also highly resistant to metals, surfactants, and organic solvents under optimal conditions. PersiXyn10 showed unique thermal stability, maintaining over 82% of its activity after 15 days of incubation at 90 °C. Considering the crucial role of hyperthermostable xylanases in the paper industry, PersiXyn10 was applied to the biodegradation of paper pulp. Its performance on paper pulp was confirmed by structural analysis (SEM and FTIR), and it produced 31.64 g/L of reducing sugar after 144 h of hydrolysis. These results demonstrate the applicability of this hyperthermostable xylanase in biobleaching and saccharification of lignocellulosic biomass to reduce environmental hazards.
Subject(s)
Endo-1,4-beta Xylanases , Microbiota , Animals , Endo-1,4-beta Xylanases/chemistry , Endo-1,4-beta Xylanases/metabolism , Lignin/metabolism , Temperature , Hydrolysis
ABSTRACT
Discharging tannery wastewater into the environment is a serious challenge worldwide because it releases highly recalcitrant pollutants such as oil compounds and organic materials. Biological treatment through enzymatic hydrolysis is a cheap and eco-friendly method for eliminating fatty substances from wastewater, and lipases can be used for the biotreatment of wastewater in multifaceted industrial applications. To overcome the limitations in removing pollutants from the effluent, we aimed to identify a novel, robust, and stable lipase (PersiLipase1) from metagenomic data of tannery wastewater for effective biodegradation of oily wastewater pollution. The lipase displayed remarkable thermostability, maintaining over 81% of its activity at 60 °C. After prolonged incubation for 35 days at 60 °C, PersiLipase1 still retained 53.9% of its activity. The enzyme also retained over 67% of its activity over a wide pH range (4.0 to 9.0). In addition, PersiLipase1 showed considerable tolerance toward metal ions and organic solvents (e.g., retaining >70% activity after the addition of 100 mM of chemicals). Hydrolysis of olive oil and sheep fat by this enzyme showed 100% efficiency. Furthermore, PersiLipase1 proved efficient for the biotreatment of oil and grease from tannery wastewater, with a hydrolysis efficiency of 90.76 ± 0.88%. These results demonstrate that the metagenome-derived PersiLipase1 from tannery wastewater has promising potential for the biodegradation and management of oily wastewater pollution.
Subject(s)
Lipase , Wastewater , Animals , Sheep , Lipase/chemistry , Hydrolysis , Detergents , Solvents/chemistry , Hydrogen-Ion Concentration , Temperature
ABSTRACT
Finding the causal relation between a gene and a disease using experimental approaches is time-consuming and expensive. Computational approaches, however, are cost-efficient methods for identifying candidate genes. This article proposes a new heterogeneous biological network embedding approach, named NetEM, to identify disease-associated genes. To evaluate NetEM, we examine six complex diseases: peroxisomal disorders, sarcoma, Graves' disease, lysosomal storage diseases, blood coagulation disorders, and hypertrophic cardiomyopathy. Our experiments indicate that NetEM outperforms three well-known state-of-the-art algorithms, Cardigan, DIAMOnD, and GeneWanderer, in identifying disease genes. As further case studies on real data, we examine TCGA data on invasive lobular breast cancer and CPTAC data on human glioblastoma; this evaluation also supports the validity of the method. The source code of NetEM and the data are available in the supplementary material of this article.
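The abstract does not describe how NetEM builds its embedding, so the following sketch shows a different, widely used network-propagation technique instead: random walk with restart (RWR), which ranks candidate genes by their proximity to known disease genes in an interaction network. The toy graph, the gene names (G1 to G6), and the restart probability of 0.3 are all hypothetical.

```python
def rwr(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Rank non-seed nodes by random walk with restart from the seed set.

    `adj` maps each node to its neighbours (undirected graph, listed both ways).
    """
    nodes = sorted(adj)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # With probability `restart`, jump back to the seed genes ...
        nxt = {n: restart * p0[n] for n in nodes}
        # ... otherwise step to a uniformly chosen neighbour.
        for u in nodes:
            if adj[u]:
                share = (1 - restart) * p[u] / len(adj[u])
                for v in adj[u]:
                    nxt[v] += share
        converged = sum(abs(nxt[n] - p[n]) for n in nodes) < tol
        p = nxt
        if converged:
            break
    candidates = [(n, s) for n, s in p.items() if n not in seeds]
    return sorted(candidates, key=lambda kv: kv[1], reverse=True)


# Hypothetical interaction network; G1 and G2 play the role of known disease genes.
network = {
    "G1": ["G2", "G3"],
    "G2": ["G1", "G3"],
    "G3": ["G1", "G2", "G4"],
    "G4": ["G3", "G5"],
    "G5": ["G4", "G6"],
    "G6": ["G5"],
}
gene_ranking = rwr(network, seeds={"G1", "G2"})
```

Embedding-based methods such as NetEM instead learn vector representations of nodes in a heterogeneous network; a propagation baseline like this one is useful mainly as a point of comparison.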
Subject(s)
Glioblastoma , Sarcoma , Humans , Algorithms , Computational Biology
ABSTRACT
BACKGROUND: Although short tandem repeats (STRs), also known as microsatellites, are highly abundant across vertebrate genomes and have significant biological implications, their relevance to speciation remains largely elusive and is mostly attributed to random coincidence. Here, we collected data on the whole-genome abundance of mono-, di-, and trinucleotide STRs in nine species of rodents and primates: rat, mouse, olive baboon, gelada, macaque, gorilla, chimpanzee, bonobo, and human. These data were used for hierarchical clustering of STR abundances in the selected species. RESULTS: We found massive differences in STR abundance between the rodent and primate orders. In addition, while numerous STRs had random abundance across the nine selected species, global abundance conformed to three consistent clusters, <rat, mouse>, <gelada, macaque, olive baboon>, and <gorilla, chimpanzee, bonobo, human>, which coincided with the phylogenetic distances of the selected species (p < 4E-05). Exceptionally, in the trinucleotide STR compartment, human was significantly distant from all other species. CONCLUSION: Based on hierarchical clustering, we propose that the global abundance of STRs is non-random in rodents and primates and probably had a determining impact on the speciation of the two orders. We also propose the STRs and STR lengths that predominantly conformed to the phylogeny of the selected species, exemplified by (t)10, (ct)6, and (taa)4. Phylogenetic and experimental platforms are warranted to further examine the observed patterns and the biological mechanisms associated with these STRs.
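STR abundance of the kind summarized here can be tallied per repeat unit and repeat count. The sketch below counts maximal perfect (uninterrupted) runs of a given unit with a regular expression on a made-up sequence; imperfect repeats, the reverse strand, and genome-scale input are out of scope, and `min_repeats` is an assumed threshold.

```python
import re
from collections import Counter


def str_length_histogram(seq, unit, min_repeats=2):
    """Count maximal perfect runs of `unit`, keyed by number of repeat units."""
    unit = unit.upper()
    # e.g. unit "CT" with min_repeats 2 gives the pattern (?:CT){2,}
    pattern = re.compile(f"(?:{re.escape(unit)}){{{min_repeats},}}")
    hist = Counter()
    for match in pattern.finditer(seq.upper()):
        hist[len(match.group()) // len(unit)] += 1
    return hist


# Made-up sequence containing (ct)6, (t)10, (taa)4 and (ct)3.
seq = "GG" + "CT" * 6 + "A" + "T" * 10 + "G" + "TAA" * 4 + "C" + "CT" * 3 + "GG"
ct_hist = str_length_histogram(seq, "ct")
```

In this notation, the (ct)6 abundance of a genome would be the count stored under key 6 of the histogram for unit `ct`; per-species vectors of such counts are what a hierarchical clustering would then compare.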
Subject(s)
Gorilla gorilla , Rodentia , Humans , Mice , Rats , Animals , Rodentia/genetics , Gorilla gorilla/genetics , Pan troglodytes/genetics , Phylogeny , Pan paniscus , Primates/genetics , Microsatellite Repeats/genetics , Macaca
ABSTRACT
OBJECTIVES: All patients with cirrhosis should be periodically examined for esophageal varices (EV); however, a large percentage of patients undergoing screening do not have EV, or have only mild EV without high-risk characteristics. Therefore, a highly accurate non-invasive method for predicting EV in patients with liver cirrhosis would be useful. In the present research, we compared the performance of several machine learning (ML) methods for predicting EV from laboratory and clinical data in order to choose the best model. METHODS: A total of 490 records from the Liver and Gastroenterology Research Center of Shahid Beheshti University of Medical Sciences, covering 2014 to 2021, were analyzed using random forest (RF), artificial neural network (ANN), support vector machine (SVM), and logistic regression models. RESULTS: Overall, RF and SVM gave the best results across all grades of EV. RF performed markedly better and had the highest area under the curve (AUC), followed by SVM and ANN, each with an AUC of 98%. For grade 3, the SVM algorithm had the highest AUC after RF (89%). CONCLUSIONS: These findings may help predict EV with high precision and accuracy and reduce the burden of frequent visits to endoscopic centers. They may also help practitioners manage cirrhosis by predicting EV at lower cost.
Subject(s)
Esophageal and Gastric Varices , Humans , Esophageal and Gastric Varices/diagnosis , Liver Cirrhosis/complications , Liver Cirrhosis/diagnosis , Area Under Curve , Machine Learning
ABSTRACT
Chronic myeloid leukemia (CML) is a model of leukemogenesis in which the exact molecular mechanisms underlying blast crisis remain unexplored. The current study identified multiple common and rare important findings in myeloid blast crisis CML (MBC-CML) using integrated genomic sequencing, covering all classes of genes implicated in the leukemogenesis model. Integrated genomic sequencing via whole exome sequencing (WES), Chromosome-seq, and RNA sequencing was conducted on peripheral blood samples from three CML patients in myeloid blast crisis. An in-house filtering pipeline was applied to assess important variants in cancer-related genes. Standard variant interpretation guidelines were used to interpret potentially important findings (PIFs) and potentially actionable findings (PAFs). Single nucleotide variant (SNV) and small InDel analysis by WES detected sixteen PIFs affecting all five known classes of leukemogenic genes in myeloid malignancies: signaling pathway components (ABL1, PIK3CB, PTPN11), transcription factors (GATA2, PHF6, IKZF1, WT1), epigenetic regulators (ASXL1), tumor suppressor and DNA repair genes (BRCA2, ATM, CHEK2), and spliceosome components (PRPF8). These variants affect genes involved in leukemia stem cell proliferation, self-renewal, and differentiation. Patients No. 1 and No. 2 both had actionable known missense variants in ABL1 (p.Y272H, p.F359V) and frameshift variants in ASXL1 (p.A627Gfs*8, p.G646Wfs*12). GATA2-L359S in patient No. 1 and the PTPN11-G503V and IKZF1-R208Q variants in patient No. 3 were also PAFs. RNA sequencing was used to confirm all of the identified variants. In patient No. 3, chromosome sequencing revealed multiple pathogenic deletions in the short and long arms of chromosome 7, affecting at least three critical leukemogenic genes (IKZF1, EZH2, and CUX1). A large deletion on the short arm of chromosome 17 in patient No. 2 also removed the TP53 gene.
Integrated genomic sequencing combined with RNA sequencing can successfully discover and confirm a wide range of variants, from SNVs to CNVs. This strategy may be an effective method for identifying actionable findings and understanding the pathophysiological mechanisms underlying MBC-CML, as well as for providing further insights into the genetic basis of MBC-CML and its future management.
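The in-house filtering pipeline is not described in detail, so the sketch below only illustrates the general shape of such a filter. The gene panel, the consequence classes, and the read-depth, variant-allele-frequency (VAF), and population-frequency thresholds are hypothetical values, and the variant records are made up rather than taken from the patients.

```python
from dataclasses import dataclass

# Hypothetical gene panel and consequence classes (illustration only).
CANCER_GENES = {"ABL1", "ASXL1", "GATA2", "PTPN11", "IKZF1", "TP53"}
PROTEIN_ALTERING = {"missense", "frameshift", "nonsense", "splice_site"}


@dataclass
class Variant:
    gene: str
    consequence: str
    depth: int     # read depth at the site
    vaf: float     # variant allele frequency in the sample
    pop_af: float  # allele frequency in a population database


def passes_filter(v, min_depth=20, min_vaf=0.05, max_pop_af=0.01):
    """Keep rare, well-covered, protein-altering variants in panel genes (assumed thresholds)."""
    return (
        v.gene in CANCER_GENES
        and v.consequence in PROTEIN_ALTERING
        and v.depth >= min_depth
        and v.vaf >= min_vaf
        and v.pop_af < max_pop_af
    )


# Made-up records, not patient data.
variants = [
    Variant("ABL1", "missense", 150, 0.42, 0.0),
    Variant("ASXL1", "frameshift", 80, 0.35, 0.0),
    Variant("TP53", "synonymous", 120, 0.50, 0.20),  # not protein-altering, common
    Variant("GATA2", "missense", 8, 0.40, 0.0),      # too few reads
    Variant("OR4F5", "missense", 200, 0.48, 0.0),    # outside the panel
]
kept = [v.gene for v in variants if passes_filter(v)]
```

Variants that pass an automated filter like this still require manual interpretation under standard guidelines, which is where labels such as PIF and PAF are assigned.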
Subject(s)
Blast Crisis , Leukemia, Myelogenous, Chronic, BCR-ABL Positive , Blast Crisis/genetics , Chromosome Deletion , Fusion Proteins, bcr-abl/genetics , Genomics , Humans , Leukemia, Myelogenous, Chronic, BCR-ABL Positive/genetics , Leukemia, Myelogenous, Chronic, BCR-ABL Positive/pathology , RNA
ABSTRACT
BACKGROUND: While the evolutionary divergence of cis-regulatory sequences affects translation initiation sites (TISs), the role of tandem repeats (TRs) in TIS selection remains largely elusive. Here, we used the concept of TIS homology to study a possible link between TISs and TRs of all core lengths and repeat counts. METHODS: Human, as the reference, and 83 other species were selected, and data on all annotated protein-coding genes (n = 1,611,368) and transcripts (n = 2,730,515) of those species were extracted from Ensembl 102. Following TIS identification, two different weighting vectors were used to assign TIS homology, and the co-occurrence of TISs with upstream flanking TRs was studied in the selected species. The results were assessed by 10-fold cross-validation. RESULTS: On average, every TIS was flanked by 1.19 TRs of various categories within its 120 bp upstream sequence, per species. We detected statistically significant enrichment of non-homologous human TISs co-occurring with human-specific TRs. In contrast, homologous human TISs co-occurred significantly with non-human-specific TRs. A total of 2,991 human genes had at least one transcript whose TIS was flanked by a human-specific TR. Text mining of several of the identified genes, such as CACNA1A, EIF5AL1, FOXK1, GABRB2, MYH2, SLC6A8, and TTN, showed predominant expression and functions in the human brain and/or skeletal muscle. CONCLUSION: We conclude that TRs ubiquitously flank TISs and contribute to TIS selection at the trans-species level. Future functional analyses, such as a combination of genome editing strategies and in vitro protein synthesis, may be employed to further investigate the impact of TRs on TIS selection.
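The co-occurrence analysis starts by asking whether a tandem repeat lies within the 120 bp upstream of a TIS. The sketch below does this for a single TIS position on a made-up transcript, using a regular expression with a backreference to find perfect repeats. The core length limit of 1 to 6 nt, the minimum of three copies, and the input sequence are assumptions, and the TIS homology weighting is not modeled.

```python
import re

# Perfect tandem repeat: a 1-6 nt core followed by at least two more copies (assumed limits).
TR_PATTERN = re.compile(r"([ACGT]{1,6}?)\1{2,}")


def upstream_trs(seq, tis, window=120):
    """Return (start, repeat) for tandem repeats in the `window` bp upstream of 0-based TIS index `tis`."""
    start = max(0, tis - window)
    region = seq[start:tis].upper()
    return [(start + m.start(), m.group()) for m in TR_PATTERN.finditer(region)]


# Made-up transcript: a (CAG)5 repeat lies upstream of the ATG start codon.
transcript = "ACGTTGCAAC" + "CAG" * 5 + "TGACTGGATC" + "ATG" + "CCGATTAGC"
tis = transcript.index("ATG")
hits = upstream_trs(transcript, tis)
```

The lazy core (`{1,6}?`) makes the pattern report the shortest repeating unit, so a CAG run is reported as (CAG)n rather than as a longer composite core.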
Subject(s)
Tandem Repeat Sequences
ABSTRACT
OBJECTIVES: The present study aimed to improve the performance of predictive methods by identifying the factors with the greatest influence on predicting esophageal varices (EV) grades among patients with cirrhosis. METHODS: Ensemble learning methods, including CatBoost and the XGB classifier, were used to select the most potent predictors of EV grades based solely on routine laboratory and clinical data from a dataset of 490 patients with cirrhosis. To increase the validity of the results, five-fold cross-validation was applied. The models were implemented in Python on the open-source Anaconda platform. The TRIPOD checklist for prediction model development was completed. RESULTS: The CatBoost model predicted all targets correctly, with 100% precision. However, the XGB classifier performed best for predicting grades 0 and 1, with an overall accuracy of 91.02%. According to the best-performing model, CatBoost, the most significant variables were Child score, white blood cell count (WBC), vitamin K (K), and international normalized ratio (INR). CONCLUSIONS: Machine learning models, especially ensemble learning models, can markedly increase prediction performance. These models allow practitioners to predict EV risk at any clinical visit and to avoid unneeded esophagogastroduodenoscopy (EGD), thereby reducing the morbidity, mortality, and cost of long-term follow-up for patients with cirrhosis.
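Five-fold cross-validation splits the 490 samples into five folds, each used once for testing while the other four are used for training. The sketch below generates such index splits; the shuffling seed is an arbitrary assumption, and libraries such as scikit-learn provide the same functionality (`KFold`, or `StratifiedKFold` to keep class proportions in every fold).

```python
import random


def kfold_indices(n_samples, k=5, seed=42):
    """Yield (train, test) index lists; each sample lands in exactly one test fold."""
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]  # round-robin keeps fold sizes within one of each other
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(j for f in folds[:i] + folds[i + 1:] for j in f)
        yield train, test


splits = list(kfold_indices(490, k=5))
```

Because EV grades may not be evenly distributed, stratified folds are often a safer choice for multiclass grading, so that every fold contains examples of each grade.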
Subject(s)
Esophageal and Gastric Varices , Varicose Veins , Humans , Endoscopy, Digestive System , Esophageal and Gastric Varices/diagnosis , Liver Cirrhosis/complications , Liver Cirrhosis/diagnosis , Machine Learning , Predictive Value of Tests
ABSTRACT
OBJECTIVES: The aim of the study was to implement a non-invasive model to predict ascites grades among patients with cirrhosis. METHODS: We used modern machine learning (ML) methods to develop a scoring system based solely on routine laboratory and clinical data to help physicians accurately diagnose and predict different degrees of ascites. We used ANACONDA3-5.2.0 (64-bit), a free and open-source distribution of the Python programming language with numerous modules, packages, and libraries that provide various methods for classification problems. Using 10-fold cross-validation, we applied three common learning models to our dataset: k-nearest neighbors (KNN), support vector machine (SVM), and neural network classification algorithms. RESULTS: Three types of data analysis were performed on the data received from the research institute. The algorithms used to predict ascites were KNN, cross-validation (CV), and multilayer perceptron neural networks (MLPNN), which achieved average accuracies of 94%, 91%, and 90%, respectively, so KNN had the highest average accuracy. CONCLUSIONS: We applied well-known ML approaches to predict ascites, and they performed strongly compared with classical statistical approaches. This ML-based approach can help avoid unnecessary risks and costs for patients in acute stages of the disease.
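KNN, the most accurate model here, classifies a patient by majority vote among the k most similar training patients. The sketch below shows that rule with Euclidean distance; the two feature columns, the grade labels, and k = 3 are hypothetical, and in practice features would be standardized first because KNN is distance-based.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training samples closest to x (Euclidean distance)."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical standardized lab features and ascites grades (0, 1, 2).
train_X = [
    [0.1, 0.2], [0.2, 0.1], [0.0, 0.0],
    [1.0, 1.1], [1.1, 0.9], [0.9, 1.0],
    [2.0, 2.1], [2.1, 1.9], [1.9, 2.0],
]
train_y = [0, 0, 0, 1, 1, 1, 2, 2, 2]
grade = knn_predict(train_X, train_y, [1.0, 1.0])
```

Within 10-fold cross-validation, this prediction step would be repeated for every held-out patient using only the other nine folds as `train_X`.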