Search | BVS Regional Portal

1.

DICEP: An integrative approach to augmenting genomic island detection.

De, Ronika; Jani, Mehul; Azad, Rajeev K.

J Biotechnol ; 2024 Apr 17.

Article in English | MEDLINE | ID: mdl-38641137

ABSTRACT

Mobilization of clusters of genes called genomic islands (GIs) across bacterial lineages facilitates the dissemination of traits such as antibiotic resistance, virulence or hypervirulence, and versatile metabolic capabilities. Robust delineation of GIs is critical to understanding bacterial evolution, which has a vast impact on different life forms. Methods for identifying GIs exploit different evolutionary features or signals encoded within bacterial genomes; however, the current state of the art in GI detection still leaves much to be desired. Here, we have taken a combinatorial approach that accounts for GI-specific features such as compositional bias, aberrant phyletic pattern, and marker gene enrichment within an integrative framework to delineate GIs in bacterial genomes. Our GI prediction tool, DICEP, was assessed on simulated genomes and well-characterized bacterial genomes. DICEP compared favorably with current GI detection tools on real and synthetic datasets.

2.

Benchmarking RNA-Seq Aligners at Base-Level and Junction Base-Level Resolution Using the Arabidopsis thaliana Genome.

Coxe, Tallon; Burks, David J; Singh, Utkarsh; Mittler, Ron; Azad, Rajeev K.

Plants (Basel) ; 13(5)2024 Feb 21.

Article in English | MEDLINE | ID: mdl-38475429

ABSTRACT

The primary goal in selecting RNA-Seq alignment software is to obtain accurate alignments with a robust algorithm capable of handling the various intricacies of read mapping and beyond. Most alignment tools are pre-tuned with human or prokaryotic data and therefore may not be suitable for other organisms, such as plants. The rapidly growing plant RNA-Seq databases call for assessing alignment tools on curated plant data, which will aid in calibrating these tools for plant transcriptomic data. We therefore benchmarked RNA-Seq read alignment tools using simulated data derived from the model organism Arabidopsis thaliana. We assessed the performance of five popular, currently available RNA-Seq alignment tools, selected based on their usage (citation count). By introducing annotated single nucleotide polymorphisms (SNPs) from The Arabidopsis Information Resource (TAIR), we recorded alignment accuracy at both base-level and junction base-level resolution for each tool. In addition to assessing the tools at their default settings, we also recorded accuracies while varying numerous parameters, including the confidence threshold and the level of SNP introduction. The aligners performed consistently across testing conditions in base-level accuracy; however, the junction base-level assessment produced varying results depending on the algorithm applied. In the read base-level assessment, the overall performance of STAR was superior to that of the other aligners, with overall accuracy exceeding 90% under different test conditions. In the junction base-level assessment, on the other hand, SubRead emerged as the most promising aligner, with overall accuracy above 80% under most test conditions.
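The abstract does not spell out how base-level accuracy is computed. A minimal sketch, assuming it is the fraction of simulated read bases whose reported genomic coordinate matches the coordinate recorded at simulation time; the function name and input layout are hypothetical, and this is not the benchmark's own scoring code.

```python
from typing import Dict, List, Optional

def base_level_accuracy(
    truth: Dict[str, List[int]],
    aligned: Dict[str, List[Optional[int]]],
) -> float:
    """Fraction of read bases placed at their true reference coordinate.

    truth:   read ID -> true genomic coordinate of each base (known from simulation)
    aligned: read ID -> coordinate reported by the aligner for each base,
             or None for bases the aligner left unplaced (e.g. soft-clipped)
    """
    correct = total = 0
    for read_id, true_coords in truth.items():
        # Reads the aligner did not report at all contribute only misses.
        predicted = aligned.get(read_id, [])
        for i, true_pos in enumerate(true_coords):
            total += 1
            if i < len(predicted) and predicted[i] == true_pos:
                correct += 1
    return correct / total if total else 0.0

# A spliced read spans an intron, so its true coordinates jump (501 -> 2000);
# an aligner that misses the junction reports contiguous positions instead.
truth = {"r1": [100, 101, 102, 103], "r2": [500, 501, 2000, 2001]}
aligned = {"r1": [100, 101, 102, 103], "r2": [500, 501, 502, 503]}
print(base_level_accuracy(truth, aligned))  # 0.75
```

Restricting the same count to bases that flank annotated splice junctions would give one possible junction-level variant of this hypothetical metric.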

3.

DiseaseNet: a transfer learning approach to noncommunicable disease classification.

Gore, Steven; Meche, Bailey; Shao, Danyang; Ginnett, Benjamin; Zhou, Kelly; Azad, Rajeev K.

BMC Bioinformatics ; 25(1): 107, 2024 Mar 11.

Article in English | MEDLINE | ID: mdl-38468193

ABSTRACT

As noncommunicable diseases (NCDs) pose a significant global health burden, identifying effective diagnostic and predictive markers for these diseases is of paramount importance. Epigenetic modifications, such as DNA methylation, have emerged as potential indicators of NCDs. These have previously been exploited in other contexts within the framework of neural network models that capture complex relationships within the data. Neural networks have led to significant breakthroughs in various biological and biomedical fields, but they have not yet been effectively applied to NCD modeling. This is, in part, due to limited datasets that are not amenable to building robust neural network models. In this work, we leveraged a neural network trained on one class of NCDs, cancer, as the basis for a transfer learning approach to non-cancer NCD modeling. Our results demonstrate promising performance of the model in predicting three NCDs, namely arthritis, asthma, and schizophrenia, from the respective blood samples, with an overall accuracy (F-measure) of 94.5%. Furthermore, a concept-based explanation method called Testing with Concept Activation Vectors (TCAV) was used to investigate the importance of the sample sources and to understand how future training datasets for multiple NCD models may be improved. Our findings highlight the effectiveness of transfer learning in developing accurate diagnostic and predictive models for NCDs.
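The F-measure used as the overall accuracy figure is, in its standard balanced form, the harmonic mean of precision P and recall R, computed from true positives (TP), false positives (FP), and false negatives (FN). The abstract does not state how the per-disease scores were averaged into one number, so only the per-class definition is shown:

$$
P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}, \qquad F_1 = \frac{2PR}{P + R} = \frac{2\,TP}{2\,TP + FP + FN}
$$

For instance, with hypothetical counts TP = 8, FP = 2, and FN = 2, both P and R equal 0.8, and F_1 = 16/20 = 0.8.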

Subject(s)

Noncommunicable Diseases , Humans , Neural Networks, Computer , Machine Learning

4.

The OsTIL1 lipocalin protects cell membranes from reactive oxygen species damage and maintains the 18:3-containing glycerolipid biosynthesis under cold stress in rice.

Ji, Lingxiao; Zhang, Zhengfeng; Liu, Shuang; Zhao, Liyan; Li, Qiang; Xiao, Benze; Suzuki, Nobuhiro; Burks, David J; Azad, Rajeev K; Xie, Guosheng.

Plant J ; 117(1): 72-91, 2024 Jan.

Article in English | MEDLINE | ID: mdl-37753661

ABSTRACT

Lipocalins constitute a conserved protein family that binds and transports a variety of lipids, while fatty acid desaturases (FADs) are required to maintain cell membrane fluidity under cold stress. Nevertheless, it remains unclear whether plant lipocalins promote FADs to maintain cell membrane integrity under cold stress. Here, we identified the role of the OsTIL1 lipocalin in FAD-mediated glycerolipid remodeling under cold stress. Overexpression and CRISPR/Cas9-mediated gene editing experiments demonstrated that OsTIL1 positively regulated cold stress tolerance by protecting cell membrane integrity from reactive oxygen species damage and enhancing the activities of peroxidase and ascorbate peroxidase, which was confirmed by combining cold stress with the membrane rigidifier dimethyl sulfoxide or the H2O2 scavenger dimethyl thiourea. Under cold stress, OsTIL1 overexpression induced a higher 18:3 content and higher 18:3/18:2 and (18:2 + 18:3)/18:1 ratios than in the wild type, whereas the gene-edited mutant showed the opposite. Furthermore, lipidomic analysis showed that OsTIL1 overexpression led to higher contents of 18:3-containing glycerolipids under cold stress, including galactolipids (monogalactosyldiacylglycerol and digalactosyldiacylglycerol) and phospholipids (phosphatidylglycerol, phosphatidylcholine, phosphatidylethanolamine, phosphatidylserine, and phosphatidylinositol). RNA-seq and enzyme-linked immunosorbent assay analyses indicated that OsTIL1 overexpression enhanced the transcription and enzyme abundance of four ω-3 FADs (OsFAD3-1/3-2, 7, and 8) under cold stress. These results reveal an important role of OsTIL1 in protecting cell membrane integrity from oxidative damage under cold stress, making it a good candidate gene for improving cold tolerance in rice.

Subject(s)

Cold-Shock Response , Oryza , Reactive Oxygen Species/metabolism , Oryza/metabolism , Oxidative Stress , Cell Membrane/metabolism , Cold Temperature , Gene Expression Regulation, Plant , Plant Proteins/genetics , Plant Proteins/metabolism , Plants, Genetically Modified/genetics

5.

Using Machine Learning to Predict Genes Underlying Differentiation of Multipartite and Unipartite Traits in Bacteria.

Almalki, Fatemah; Sunuwar, Janak; Azad, Rajeev K.

Microorganisms ; 11(11)2023 Nov 13.

Article in English | MEDLINE | ID: mdl-38004767

ABSTRACT

Since the discovery of a second chromosome in Rhodobacter sphaeroides 2.4.1 by Suwanto and Kaplan in 1989 and the subsequent revelation of gene sequences, multipartite genomes have been reported in over three hundred bacterial species across nine different phyla. This phenomenon shattered the dogma of a unipartite genome (a single circular chromosome) in bacteria. Recently, artificial intelligence (AI), machine learning (ML), and deep learning (DL) have emerged as powerful tools for investigating big data in a plethora of disciplines to decipher complex patterns, including in the large-scale analysis and interpretation of genomic data. An important question in bacteriology pertains to the genetic factors that underlie the structural evolution of multipartite and unipartite bacterial species. Toward this goal, we attempted to leverage machine learning to identify the genetic factors that, in general, underlie the differentiation of bacteria with multipartite genomes from bacteria with unipartite genomes. In this study, deploying ML algorithms yielded two gene lists of interest: one containing 46 discriminatory genes obtained from an assessment of all gene sets, and another containing 35 discriminatory genes obtained from an investigation of genes that are differentially present (or absent) in the genomes of the multipartite bacteria and their respective close relatives. Our study revealed a small pool of genes that discriminate bacteria with multipartite genomes from their close relatives with single-chromosome genomes. Machine learning thus aided in uncovering the genetic factors that underlie the differentiation of bacterial multipartite and unipartite traits.

6.

Silicon versus Superbug: Assessing Machine Learning's Role in the Fight against Antimicrobial Resistance.

Coxe, Tallon; Azad, Rajeev K.

Antibiotics (Basel) ; 12(11)2023 Nov 08.

Article in English | MEDLINE | ID: mdl-37998806

ABSTRACT

In his 1945 Nobel Prize acceptance speech, Sir Alexander Fleming warned of antimicrobial resistance (AMR) if the necessary precautions were not taken diligently. As the growing threat of AMR continues to loom over humanity, we must look to alternative diagnostic tools and preventive measures to thwart economic collapse and untold mortality worldwide. Integrating machine learning (ML) methodologies into such tools and pipelines presents a promising avenue, offering unprecedented insights into the underlying mechanisms of resistance and enabling the development of more targeted and effective treatments. This paper explores the applications of ML in predicting and understanding AMR, highlighting its potential to revolutionize healthcare practices. From supervised learning approaches that analyze genetic signatures of antibiotic resistance to the development of tools and databases such as the Comprehensive Antibiotic Resistance Database (CARD), ML is actively shaping the future of AMR research. However, the successful implementation of ML in this domain is not without challenges. Dependence on high-quality data, the risk of overfitting, model selection, and potential bias in training data are issues that must be systematically addressed. Despite these challenges, the synergy between ML and biomedical research shows great promise in combating the growing menace of antibiotic resistance.

7.

A gene network-driven approach to infer novel pathogenicity-associated genes: application to Pseudomonas aeruginosa PAO1.

De, Ronika; Whiteley, Marvin; Azad, Rajeev K.

mSystems ; 8(6): e0047323, 2023 Dec 21.

Article in English | MEDLINE | ID: mdl-37921470

ABSTRACT

IMPORTANCE: We present here a new systems-level approach to decipher genetic factors and biological pathways associated with the virulence of bacterial pathogens and/or their response to antibiotic treatment. The power of this approach was demonstrated by applying it to the well-studied pathogen Pseudomonas aeruginosa PAO1. Our gene co-expression network-based approach revealed known and previously unknown genes, and their networks, associated with pathogenicity in P. aeruginosa PAO1. This systems-level investigation helped identify putative pathogenicity- and resistance-associated genetic factors that could not otherwise be detected by conventional differential gene expression analysis. The network-based analysis uncovered modules harboring genes not previously reported by several original studies on P. aeruginosa virulence and resistance. These genes could potentially act as molecular determinants of P. aeruginosa PAO1 pathogenicity and responses to antibiotics.

Subject(s)

Pseudomonas Infections , Pseudomonas aeruginosa , Humans , Pseudomonas aeruginosa/genetics , Virulence/genetics , Gene Regulatory Networks/genetics , Virulence Factors/genetics , Pseudomonas Infections/drug therapy

8.

Benchmarking Metagenomic Classifiers on Simulated Ancient and Modern Metagenomic Data.

Pusadkar, Vaidehi; Azad, Rajeev K.

Microorganisms ; 11(10)2023 Oct 02.

Article in English | MEDLINE | ID: mdl-37894136

ABSTRACT

Taxonomic profiling of ancient metagenomic samples is challenging due to the accumulation of specific damage patterns in DNA over time. Although a number of metagenome profiling methods have been developed, most have been assessed on modern metagenomes or on simulated metagenomes mimicking modern ones. Further, a comparative assessment of metagenome profilers on simulated metagenomes representing a spectrum of degradation depth, from the extreme of ancient (most degraded) to current or modern (not degraded) metagenomes, has not yet been performed. To understand the strengths and weaknesses of different metagenome profilers, we performed a comprehensive evaluation on simulated metagenomes representing the human dental calculus microbiome, with the level of DNA damage successively raised to mimic modern to ancient metagenomes. All classes of profilers, namely DNA-to-DNA, DNA-to-protein, and DNA-to-marker comparison-based profilers, were evaluated on metagenomes with varying levels of damage simulating deamination, fragmentation, and contamination. Our results revealed that, compared with deamination and fragmentation, human and environmental contamination of ancient DNA (with modern DNA) has the most pronounced effect on the performance of each profiler. Further, the DNA-to-DNA (e.g., Kraken2, Bracken) and DNA-to-marker (e.g., MetaPhlAn4) profiling approaches showed complementary strengths, which can be leveraged to advance the state of the art in ancient metagenome profiling.
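As a rough illustration of what simulated deamination damage looks like, the sketch below applies the C→T substitutions that cytosine deamination typically produces toward the 5′ end of ancient DNA reads, together with the complementary G→A pattern toward the 3′ end seen in double-stranded libraries. The function name, the per-position rate formula, and its parameters are hypothetical; the abstract does not describe the simulator that was used.

```python
import random
from typing import Optional

def simulate_deamination(
    read: str,
    max_rate: float = 0.3,
    decay: float = 0.5,
    rng: Optional[random.Random] = None,
) -> str:
    """Return a copy of `read` with terminal deamination-like substitutions.

    Hypothetical damage model: the substitution probability at a base is
    max_rate * decay**d, where d is the distance to the relevant read end.
    C->T uses the distance from the 5' end, G->A the distance from the 3' end.
    Raising max_rate (or moving decay toward 1) mimics more heavily damaged DNA.
    """
    rng = rng or random.Random()
    bases = list(read)
    n = len(bases)
    for i, base in enumerate(bases):
        if base == "C" and rng.random() < max_rate * decay ** i:
            bases[i] = "T"
        elif base == "G" and rng.random() < max_rate * decay ** (n - 1 - i):
            bases[i] = "A"
    return "".join(bases)

# A seeded generator makes each simulated damage level reproducible.
rng = random.Random(42)
for rate in (0.0, 0.3, 0.9):
    print(rate, simulate_deamination("CCGATCGATCGG", max_rate=rate, rng=rng))
```

Fragmentation and contamination would be separate steps (shortening reads, and mixing in reads from modern sources) that this sketch does not cover.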

9.

POSMM: an efficient alignment-free metagenomic profiler that complements alignment-based profiling.

Burks, David J; Pusadkar, Vaidehi; Azad, Rajeev K.

Environ Microbiome ; 18(1): 16, 2023 Mar 08.

Article in English | MEDLINE | ID: mdl-36890583

ABSTRACT

We present here POSMM (pronounced 'Possum'), the Python-Optimized Standard Markov Model classifier, a new incarnation of the Markov model approach to metagenomic sequence analysis. Built on top of SMM, a rapid Markov model-based classification algorithm, POSMM reintroduces the high sensitivity associated with alignment-free taxonomic classifiers to probe whole-genome or metagenome datasets of increasingly prohibitive sizes. Logistic regression models, generated and optimized using the Python sklearn library, transform Markov model probabilities into scores suitable for thresholding. POSMM features a dynamic, database-free approach: models are generated directly from genome FASTA files at each run, making POSMM a valuable accompaniment to many other programs. Combining POSMM with ultrafast classifiers such as Kraken2 leverages their complementary strengths to achieve higher overall accuracy in metagenomic sequence classification than either achieves as a standalone classifier. POSMM is a user-friendly and highly adaptable tool designed for broad use by the metagenome scientific community.
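A minimal sketch of the general idea behind Markov model classification, assuming a fixed order k, plain ACGT strings in place of FASTA parsing, and add-one pseudocounts: one model is trained per reference genome, and a read is assigned to the genome whose model gives it the highest log-likelihood. All function names are hypothetical; this is not POSMM's or SMM's code, and the logistic-regression step that POSMM uses to turn these scores into thresholdable values is omitted.

```python
import math
from collections import defaultdict
from typing import Dict

ALPHABET = "ACGT"
Model = Dict[str, Dict[str, float]]

def train_markov(genome: str, k: int) -> Model:
    """Estimate P(next base | previous k bases) from one genome sequence.

    Add-one pseudocounts keep every transition probability nonzero.
    Windows containing non-ACGT symbols (e.g. N) are skipped.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for i in range(k, len(genome)):
        window = genome[i - k:i + 1]
        if all(c in ALPHABET for c in window):
            counts[window[:-1]][window[-1]] += 1
    model: Model = {}
    for context, nexts in counts.items():
        total = sum(nexts.values()) + len(ALPHABET)
        model[context] = {b: (nexts[b] + 1) / total for b in ALPHABET}
    return model

def log_likelihood(read: str, model: Model, k: int) -> float:
    """Sum of log transition probabilities of the read under one model."""
    score = 0.0
    for i in range(k, len(read)):
        context, base = read[i - k:i], read[i]
        if base not in ALPHABET:
            continue
        # Contexts never seen in training fall back to a uniform distribution.
        probs = model.get(context)
        score += math.log(probs[base] if probs else 1 / len(ALPHABET))
    return score

def classify(read: str, models: Dict[str, Model], k: int) -> str:
    """Assign the read to the reference whose model scores it highest."""
    return max(models, key=lambda name: log_likelihood(read, models[name], k))

models = {"genome_A": train_markov("AT" * 200, 2), "genome_B": train_markov("GC" * 200, 2)}
print(classify("ATATATATAT", models, 2))  # genome_A
```

Higher k captures more sequence context but leaves more contexts unseen in small training genomes, which is why the pseudocounts and the uniform fallback matter in this sketch.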

10.

Leveraging comparative genomics to uncover alien genes in bacterial genomes.

Sengupta, Soham; Azad, Rajeev K.

Microb Genom ; 9(1)2023 01.

Article in English | MEDLINE | ID: mdl-36748570

ABSTRACT

A significant challenge in bacterial genomics is to catalogue genes acquired through the evolutionary process of horizontal gene transfer (HGT). Both comparative genomics and sequence composition-based methods have often been invoked to quantify horizontally acquired genes in bacterial genomes. Comparative genomics methods rely on completely sequenced genomes, so confidence in their predictions increases as databases become richer in completely sequenced genomes. Recent developments, including those in microbial genome sequencing, call for a reassessment of alien genes based on the information-rich resources currently available. We revisited the comparative genomics approach and developed a new algorithm for alien gene detection. Our algorithm compared favourably with existing comparative genomics-based methods and is capable of detecting both recent and ancient transfers. It can be used as a standalone tool or in concert with other complementary algorithms for comprehensively cataloguing alien genes in bacterial genomes.

Subject(s)

Genome, Bacterial , Genomics , Genomics/methods , Algorithms , Biological Evolution

11.

Phytochrome B regulates reactive oxygen signaling during abiotic and biotic stress in plants.

Fichman, Yosef; Xiong, Haiyan; Sengupta, Soham; Morrow, Johanna; Loog, Hailey; Azad, Rajeev K; Hibberd, Julian M; Liscum, Emmanuel; Mittler, Ron.

New Phytol ; 237(5): 1711-1727, 2023 03.

Article in English | MEDLINE | ID: mdl-36401805

ABSTRACT

Reactive oxygen species (ROS) and the photoreceptor protein phytochrome B (phyB) play a key role in plant acclimation to stress. However, it is unknown how phyB, which functions primarily in the nucleus, affects ROS signaling mediated by the plasma membrane-localized respiratory burst oxidase homolog (RBOH) proteins during stress. Arabidopsis thaliana and Oryza sativa mutants, RNA-Seq, bioinformatics, biochemistry, molecular biology, and whole-plant ROS imaging were used to address this question. Here, we reveal that phyB and RBOHs function as part of a key regulatory module that controls apoplastic ROS production, stress-response transcript expression, and plant acclimation to excess light stress. We further show that phyB can regulate ROS production during stress even when it is restricted to the cytosol, and that phyB, respiratory burst oxidase protein D (RBOHD), and respiratory burst oxidase protein F (RBOHF) coregulate thousands of transcripts in response to light stress. Surprisingly, we found that phyB is also required for ROS accumulation in response to heat, wounding, cold, and bacterial infection. Our findings reveal that phyB plays a canonical role in plant responses to biotic and abiotic stresses, regulating apoplastic ROS production, possibly while in the cytosol, and that phyB and RBOHD/RBOHF function in the same regulatory pathway.

Subject(s)

Arabidopsis Proteins , Arabidopsis , Arabidopsis Proteins/metabolism , Phytochrome B/genetics , Phytochrome B/metabolism , Oxygen/metabolism , Reactive Oxygen Species/metabolism , Arabidopsis/metabolism , Stress, Physiological , Gene Expression Regulation, Plant

12.

Molecular signatures in the progression of COVID-19 severity.

De, Ronika; Azad, Rajeev K.

Sci Rep ; 12(1): 22058, 2022 12 21.

Article in English | MEDLINE | ID: mdl-36543855

ABSTRACT

SARS-CoV-2, the causative agent of COVID-19, has infected over 642 million people and killed over 6.6 million around the globe. Underlying the wide range of clinical manifestations of this disease, from moderate to extremely severe systemic conditions, could be genes or pathways differentially expressed in the hosts. It is therefore important to gain insights into the pathways involved in COVID-19 pathogenesis and host defense, and thus to understand the host response to this pathogen at the physiological and molecular levels. To uncover genes and pathways involved in the differential clinical manifestations of this disease, we developed a novel gene co-expression network-based pipeline that uses gene expression data obtained from different SARS-CoV-2-infected human tissues. We leveraged the network to identify novel genes or pathways that are likely differentially expressed and could be physiologically significant in COVID-19 pathogenesis and progression, but were deemed statistically non-significant and therefore not further investigated in the original studies. Our network-based approach aided in identifying co-expression modules enriched in differentially expressed genes (DEGs) during different stages of COVID-19 and enabled the discovery of novel genes involved in COVID-19 pathogenesis, by virtue of their transcript abundance and their association with differentially expressed genes in DEG-enriched modules. We further prioritized genes by considering only those enriched modules in which most genes are differentially expressed, as inferred by the original studies or this study, and document here 7 novel genes potentially involved in moderate, 2 in severe, and 48 in extremely severe COVID-19, as well as 96 novel genes involved in the progression of COVID-19 from severe to extremely severe conditions. Our study sheds new light on genes and their networks (modules) that drive the progression of COVID-19 from moderate to extremely severe conditions. These findings could aid the development of new therapeutics to combat COVID-19.
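The general idea behind a gene co-expression network can be sketched as follows, assuming an unsigned network built by thresholding the absolute Pearson correlation between genes' expression profiles, with connected components taken as modules. The threshold, function names, and module definition are hypothetical simplifications; published pipelines typically use more elaborate module detection, and this is not the authors' pipeline. The last helper computes the fraction of a module's genes that are differentially expressed, the kind of quantity used to call a module DEG-enriched.

```python
import math
from itertools import combinations
from typing import Dict, List, Set

def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    # A constant profile has no variance, so treat it as uncorrelated.
    return cov / (sx * sy) if sx and sy else 0.0

def coexpression_modules(expr: Dict[str, List[float]], threshold: float = 0.8) -> List[Set[str]]:
    """Link genes whose |correlation| >= threshold; return connected components."""
    adjacency: Dict[str, Set[str]] = {gene: set() for gene in expr}
    for g1, g2 in combinations(expr, 2):
        if abs(pearson(expr[g1], expr[g2])) >= threshold:
            adjacency[g1].add(g2)
            adjacency[g2].add(g1)
    seen: Set[str] = set()
    modules: List[Set[str]] = []
    for start in expr:
        if start in seen:
            continue
        # Iterative depth-first search collects one connected component.
        stack, module = [start], set()
        while stack:
            gene = stack.pop()
            if gene in seen:
                continue
            seen.add(gene)
            module.add(gene)
            stack.extend(adjacency[gene] - seen)
        modules.append(module)
    return modules

def module_deg_fraction(module: Set[str], degs: Set[str]) -> float:
    """Share of a module's genes that are differentially expressed."""
    return len(module & degs) / len(module)

# Toy expression matrix: 4 genes x 4 samples (hypothetical values).
expr = {
    "g1": [1, 2, 3, 4],
    "g2": [2, 4, 6, 8],   # perfectly correlated with g1
    "g3": [4, 3, 2, 1],   # perfectly anti-correlated with g1
    "g4": [1, 3, 1, 3],   # only weakly correlated with the others
}
for module in coexpression_modules(expr):
    print(sorted(module), module_deg_fraction(module, degs={"g1", "g2"}))
```

In this toy setup, g3 joins the g1/g2 module only because the network is unsigned; a signed network would keep anti-correlated genes apart.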

Subject(s)

COVID-19 , Humans , COVID-19/genetics , SARS-CoV-2/genetics , Gene Regulatory Networks

13.

Analysis of multipartite bacterial genomes using alignment free and alignment-based pipelines.

Almalki, Fatemah; Choudhary, Madhusudan; Azad, Rajeev K.

Arch Microbiol ; 205(1): 25, 2022 Dec 14.

Article in English | MEDLINE | ID: mdl-36515719

ABSTRACT

Since the discovery of a second chromosome in Rhodobacter sphaeroides 2.4.1 in 1989, multipartite genomes have been reported in over three hundred bacterial species across nine different phyla. This has shattered the unipartite (single-chromosome) genome dogma in bacteria. Since then, many questions on various aspects of multipartite genomes in bacteria have been addressed. However, our understanding of how multipartite genomes emerge and evolve remains limited. In particular, knowledge of the genetic factors underlying the differences between multipartite and single-chromosome genomes is lacking. In this work, we performed comparative evolutionary and functional genomics analyses to identify molecular factors that discriminate multipartite from unipartite bacteria, with the goal of deciphering the taxon-specific factors, and those prevalent across taxa, that underlie these traits. We assessed the roles of evolutionary mechanisms, specifically gene gain, in driving the divergence of bacteria with single and multiple chromosomes. In addition, we performed functional genomic analysis to garner support for our findings from the comparative evolutionary analysis. We found genes, such as those encoding conserved hypothetical proteins in Deinococcus radiodurans R1 and putative phage phi-C31 gp36 major capsid-like and hypothetical proteins in Rhodobacter sphaeroides 2.4.1, that are located on the accessory chromosomes of these bacteria but were not found in the inferred ancestral sequences or on the primary chromosomes, nor in their closest single-chromosome relatives within the same clade. Our study sheds new light on the potential roles of secondary chromosomes in helping bacteria with multipartite genomes adapt to specialized environments or growth conditions.

Subject(s)

Genome, Bacterial , Rhodobacter sphaeroides , Genomics , Biological Evolution , Rhodobacter sphaeroides/genetics , Evolution, Molecular , Chromosomes, Bacterial/genetics

14.

Reconstructing horizontal gene flow network to understand prokaryotic evolution.

Sengupta, Soham; Azad, Rajeev K.

Open Biol ; 12(11): 220169, 2022 11.

Article in English | MEDLINE | ID: mdl-36446404

ABSTRACT

Horizontal gene transfer (HGT) is a major source of phenotypic innovation and a mechanism of niche adaptation in prokaryotes. Quantifying HGT is critical to deciphering its myriad roles in microbial evolution and adaptation. Advances in genome sequencing and bioinformatics have augmented our ability to understand the microbial world, particularly the direct or indirect influence of HGT on diverse life forms. Methods for detecting HGT can be classified into phylogenetic and parametric (composition-based) approaches. Here, we exploited the complementary strengths of both approaches to construct a high-confidence horizontal gene flow network. Our network is unique in its ability to detect the transfer of a genome's native genes to genomes from other taxa, thus establishing donor and recipient organisms (taxa) directly rather than through a post hoc analysis, as is the practice with several other approaches. The scale-free horizontal gene flow network presented here provides new insights into modes of transfer for the exchange of genetic information and also illuminates differential gene flow across phyla.

Subject(s)

Gene Flow , Prokaryotic Cells , Phylogeny , Gene Regulatory Networks , Computational Biology

15.

Identification of Novel Antimicrobial Resistance Genes Using Machine Learning, Homology Modeling, and Molecular Docking.

Sunuwar, Janak; Azad, Rajeev K.

Microorganisms ; 10(11)2022 Oct 23.

Article in English | MEDLINE | ID: mdl-36363694

ABSTRACT

Antimicrobial resistance (AMR) threatens healthcare systems worldwide with the rise of emerging drug-resistant infectious agents. AMR may render current therapeutics ineffective or diminish their efficacy, and its rapid dissemination can have unmitigated health and socioeconomic consequences. As with many other health problems, recent computational advances, including developments in machine learning or artificial intelligence, hold prodigious promise for deciphering the genetic factors underlying the emergence and dissemination of AMR and for aiding the development of therapeutics for more efficient AMR solutions. Current machine learning frameworks focus mainly on known AMR genes and are therefore prone to missing genes that have not yet been implicated in resistance, including many uncharacterized genes whose functions have not yet been elucidated. Furthermore, new resistance traits may evolve from these genes, leading to the rise of superbugs; these genes therefore need to be characterized. To infer novel resistance genes, we used the complete gene sets of several bacterial strains known to be susceptible or resistant to specific drugs, together with the associated phenotypic information, within a machine learning framework that enabled prioritization of genes potentially involved in resistance. Further, homology modeling of proteins encoded by the prioritized genes and subsequent molecular docking studies indicated stable interactions between these proteins and the antimicrobials to which the strains containing them are known to be resistant. Our study highlights the capability of a machine learning framework to uncover novel genes that have not yet been implicated in resistance to any antimicrobials and thus could spur further studies aimed at neutralizing AMR.

16.

Mapping Strengths and Weaknesses of Different Clustering Approaches to Deciphering Bacterial Chimerism.

Burks, David J; Azad, Rajeev K.

OMICS ; 26(8): 422-439, 2022 08.

Article in English | MEDLINE | ID: mdl-35925817

ABSTRACT

Bacterial genomes are chimeras of DNA of different ancestries. Deconstructing chimeric genomes is central to understanding the evolutionary trajectories of their disparate components, and thus of the organisms as a whole, in light of their evolutionary contexts. Of specific interest is delineating and quantifying the native (vertically inherited) and alien (horizontally acquired) components of bacterial genomes, and also specifying the genomic fractions that represent different donor sources. An agglomerative clustering procedure that prioritizes grouping of proximal, similar genomic segments has previously been invoked for this purpose in conjunction with a recursive segmentation procedure. Surprisingly, however, the relative strengths and weaknesses of different clustering approaches to deciphering bacterial chimerism have not yet been investigated, despite the need to robustly interpret the tens of thousands of completely sequenced bacterial genomes and nearly complete genome assemblies available in public databases. To bridge this knowledge gap and develop more robust approaches, we assessed different clustering methods, including segment order-based (proximal) clustering, hierarchical clustering, affinity propagation clustering, and a novel network clustering approach, on chimeric genomes modeled after bacterial genomes representing a broad spectrum of compositional complexity. Although segment order-based clustering and network clustering compared favorably with the other approaches in discriminating between native and alien DNA at genome-optimized settings, network clustering consistently did better than the other methods at parametric settings optimized on all test genomes together. Segment order-based clustering and hierarchical clustering outperformed the other methods in alien DNA identification while preserving donor identity in the genomes. Our study highlights the strengths and weaknesses of different approaches and suggests how these can be leveraged to achieve a more robust deconstruction of bacterial chimerism.

Subject(s)

Chimerism , Genome, Bacterial , Bacteria/genetics , Cluster Analysis , Genome, Bacterial/genetics , Genomics/methods

17.

Factors That Influence the Choice of Markov Model Order in Discriminating DNA Sequences from Different Sources.

Pandey, Ravi S; Azad, Rajeev K.

OMICS ; 26(6): 348-355, 2022 06.

Article in English | MEDLINE | ID: mdl-35648077

ABSTRACT

Markov models have frequently been used in genetic sequence analysis. The number of parameters of a Markov model increases exponentially with model order, so it is often recommended that the order be chosen based on the size of the data being modeled: lower orders for small datasets and higher orders for large ones. Approaches based on model selection criteria have also been proposed. An important problem in microbiology and evolutionary biology is to decipher the chimeric genomes of microbes, in particular to identify segments of distinct ancestries within genomes and to reconstruct the plausible evolutionary scenarios that might have shaped chimeric genomes in the microbial world. In this study, we assessed a Markov model-based segmentation method for its ability to detect compositionally disparate segments in chimeric sequence constructs as a function of model order, sequence length, and phylogenetic divergence. Our results show that the choice of Markov model order depends on both sequence size and composition. Higher-order Markov models were more effective in delineating sequence segments arising from closely related organisms in longer constructs; lower-order Markov models, on the other hand, were more appropriate for delineating sequence segments arising from distantly related organisms in shorter constructs. These findings are important and timely, with broad implications in fields such as epidemiology, which must deal with the emergence of novel pathogenic chimeras arising through foreign DNA acquisition, and ecology, where chimeric structures may arise in various ecosystems, necessitating more robust approaches for their deconstruction and interpretation.
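To make the exponential growth concrete: over the DNA alphabet of size 4, a k-th order model conditions on 4^k possible preceding contexts, and each context has 4 transition probabilities constrained to sum to 1, leaving 3 free parameters per context:

$$
N_{\text{params}}(k) = (4 - 1)\cdot 4^{k} = 3\cdot 4^{k}
$$

So k = 2 requires 48 parameters, k = 5 requires 3,072, and k = 8 requires 196,608, all of which must be estimated from the segment being modeled. This is the standard count for a fixed-order (homogeneous) Markov chain; the abstract does not state the exact parameterization used in the study.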

Subject(s)

Ecosystem; Models, Genetic; Algorithms; Base Sequence; Markov Chains; Phylogeny; Sequence Analysis, DNA
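
The abstract above notes that the number of Markov model parameters grows exponentially with model order. The Python sketch below makes this concrete, assuming a DNA alphabet of A, C, G, T: an order-k chain has 4^k preceding contexts, each with 3 free transition probabilities. It also shows one simple, hypothetical form of Markov model-based segmentation, a single cut chosen to maximize the log-likelihood gain of modeling the two sides separately. This is an illustration only, not the segmentation method assessed in the study; the function names and the pseudocount of 1 are assumptions.

```python
import math
from collections import Counter

def num_parameters(order: int) -> int:
    """Free parameters of an order-k Markov chain over {A, C, G, T}:
    4**k preceding contexts, each with 3 free transition probabilities."""
    return 3 * 4 ** order

def log_likelihood(seq: str, order: int, pseudocount: float = 1.0) -> float:
    """Log-likelihood of seq under an order-k Markov model estimated from seq
    itself, with add-pseudocount smoothing of the transition probabilities."""
    ctx_counts = Counter()
    pair_counts = Counter()
    for i in range(order, len(seq)):
        ctx = seq[i - order:i]          # the k bases preceding position i
        ctx_counts[ctx] += 1
        pair_counts[(ctx, seq[i])] += 1
    ll = 0.0
    for (ctx, _base), n in pair_counts.items():
        p = (n + pseudocount) / (ctx_counts[ctx] + 4 * pseudocount)
        ll += n * math.log(p)
    return ll

def best_breakpoint(seq: str, order: int, min_len: int = 50):
    """Single cut that maximizes the log-likelihood gain of modeling the two
    sides with separate order-k models instead of one model for the whole."""
    whole = log_likelihood(seq, order)
    best_pos, best_gain = None, float("-inf")
    for pos in range(min_len, len(seq) - min_len + 1):
        gain = log_likelihood(seq[:pos], order) + log_likelihood(seq[pos:], order) - whole
        if gain > best_gain:
            best_pos, best_gain = pos, gain
    return best_pos, best_gain

print(num_parameters(1), num_parameters(3))               # 12 192
print(best_breakpoint("A" * 200 + "C" * 200, order=0)[0])  # 200
```

Because each segment must estimate all 3·4^k transition probabilities from its own bases, short segments leave high-order models poorly estimated, which is the data-size trade-off the abstract describes. The exhaustive scan refits both models at every candidate cut, so it is quadratic in sequence length and meant only to illustrate the idea.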

18.

CancerNet: a unified deep learning network for pan-cancer diagnostics.

Gore, Steven; Azad, Rajeev K.

BMC Bioinformatics ; 23(1): 229, 2022 Jun 13.

Article in English | MEDLINE | ID: mdl-35698059

ABSTRACT

BACKGROUND: Despite remarkable advances in cancer research, cancer remains one of the leading causes of death worldwide. Early detection of cancer and localization of its tissue of origin are key to effective treatment. Here, we leverage advances in machine learning and artificial intelligence to design a novel framework for cancer diagnostics. Our framework detects cancers and their tissues of origin using a unified model encompassing the 33 cancers represented in The Cancer Genome Atlas (TCGA). The model exploits learned features of different cancers reflected in their dysregulated epigenomes, which arise early in carcinogenesis and differ markedly between cancer types or subtypes, and therefore hold great promise for early cancer detection. RESULTS: Our comprehensive assessment of the model on the 33 different tissues of origin demonstrates its ability to detect and classify cancers with high accuracy (> 99% overall F-measure). Furthermore, the model distinguishes cancers at stages ranging from pre-cancerous lesions to metastatic tumors, and discriminates between hypomethylation changes due to age-related epigenetic drift and those due to true cancer. CONCLUSIONS: Beyond detecting primary cancers, the model also robustly detects the tissues of origin of secondary cancers, including metastatic cancers, second primary cancers, and cancers of unknown primary. Our assessment revealed the model's ability to characterize pre-cancer samples, a significant step forward in early cancer detection. Deployed broadly, this model can deliver accurate diagnoses to a greatly expanded target patient population.

Subject(s)

Deep Learning; Neoplasms; Artificial Intelligence; Humans; Machine Learning; Neoplasms/diagnosis; Neoplasms/genetics; Neoplasms/pathology; Neural Networks, Computer
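
The > 99% overall F-measure reported in the abstract above summarizes precision and recall across many tissue-of-origin classes. The abstract does not say how per-class scores were averaged, so the Python sketch below, an assumption rather than the authors' evaluation code, computes per-class F1 together with both macro-averaged and micro-averaged F1 for multi-class predictions. The function name and the example labels are hypothetical.

```python
from collections import Counter

def f_measures(y_true, y_pred):
    """Per-class F1 plus macro- and micro-averaged F1 for single-label,
    multi-class predictions (e.g. predicted tissue of origin per sample)."""
    classes = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1   # predicted class gains a false positive
            fn[t] += 1   # true class gains a false negative
    per_class = {}
    for c in classes:
        denom = 2 * tp[c] + fp[c] + fn[c]
        per_class[c] = 2 * tp[c] / denom if denom else 0.0
    macro = sum(per_class.values()) / len(classes)
    total_tp = sum(tp.values())
    micro_denom = 2 * total_tp + sum(fp.values()) + sum(fn.values())
    micro = 2 * total_tp / micro_denom if micro_denom else 0.0
    return per_class, macro, micro

# Hypothetical labels for five samples.
truth = ["BRCA", "BRCA", "LUAD", "LUAD", "normal"]
preds = ["BRCA", "LUAD", "LUAD", "LUAD", "normal"]
per_class, macro, micro = f_measures(truth, preds)
print(round(macro, 4), micro)   # 0.8222 0.8
```

Macro averaging weights every class equally, while micro averaging pools all predictions; for single-label multi-class data, micro-averaged F1 equals accuracy.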

19.

The Arabidopsis gene co-expression network.

Burks, David J; Sengupta, Soham; De, Ronika; Mittler, Ron; Azad, Rajeev K.

Plant Direct ; 6(4): e396, 2022 Apr.

Article in English | MEDLINE | ID: mdl-35492683

ABSTRACT

Identifying genes that interact to confer a biological function on an organism is one of the main goals of functional genomics. High-throughput technologies for assessing and quantifying genome-wide gene expression patterns have enabled systems-level analyses that infer pathways or networks of genes involved in different functions under many different conditions. Here, we leveraged the publicly available, information-rich RNA-Seq datasets of the model plant Arabidopsis thaliana to construct a gene co-expression network, which was partitioned into clusters, or modules, of genes with correlated expression. Gene ontology and pathway enrichment analyses were performed to identify functional terms and pathways enriched within the different gene modules. By interrogating the co-expression network for genes in different modules that associate with a gene of interest, diverse functional roles of that gene can be deciphered. By mapping genes differentially expressed under a given condition in Arabidopsis onto the co-expression network, we demonstrate the network's ability to uncover novel genes that are likely transcriptionally active but prone to being missed by standard statistical approaches because they fall outside the confidence zone of detection. To our knowledge, this is the first A. thaliana co-expression network constructed using all of the mRNA-Seq datasets (>20,000) available in the NCBI SRA database. The network can serve as a useful resource for the Arabidopsis research community to interrogate specific genes of interest, retrieve their interactomes, identify gene modules that are transcriptionally altered under a certain condition or at a certain stage, and gain understanding of gene functions.
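
The co-expression network described above links genes whose expression profiles are correlated and partitions them into modules. The Python sketch below assumes NumPy and a genes-by-samples expression matrix; it builds such a network from Pearson correlations with a hard threshold (0.8 here, an arbitrary choice) and takes connected components as modules. Thresholding plus connected components is a simplification chosen for illustration, since the abstract does not describe the study's correlation measure or module-detection method. The function names and toy data are hypothetical.

```python
import numpy as np

def coexpression_modules(expr, genes, threshold=0.8):
    """Build a thresholded co-expression network and return its modules.

    expr: 2-D array of shape (n_genes, n_samples); genes: list of gene IDs.
    Genes are linked when |Pearson r| >= threshold; modules are the
    connected components of the resulting graph.
    """
    corr = np.corrcoef(expr)                 # gene-by-gene Pearson correlation matrix
    adj = np.abs(corr) >= threshold          # boolean adjacency matrix
    np.fill_diagonal(adj, False)             # no self-links
    seen, modules = set(), []
    for start in range(len(genes)):
        if start in seen:
            continue
        # Depth-first search collects one connected component.
        stack, component = [start], []
        seen.add(start)
        while stack:
            i = stack.pop()
            component.append(genes[i])
            for j in np.flatnonzero(adj[i]):
                j = int(j)
                if j not in seen:
                    seen.add(j)
                    stack.append(j)
        modules.append(sorted(component))
    return corr, adj, modules

def neighbors(gene, genes, adj):
    """Genes directly linked to `gene` in the co-expression network."""
    i = genes.index(gene)
    return [genes[int(j)] for j in np.flatnonzero(adj[i])]

genes = ["g1", "g2", "g3", "g4", "g5"]
expr = np.array([
    [1, 2, 3, 4, 5, 6],
    [3, 5, 7, 9, 11, 13],    # 2 * g1 + 1
    [3, 1, 4, 1, 5, 9],
    [9, 3, 12, 3, 15, 27],   # 3 * g3
    [2, 2, 5, 5, 2, 2],
], dtype=float)
corr, adj, modules = coexpression_modules(expr, genes)
print(modules)                       # [['g1', 'g2'], ['g3', 'g4'], ['g5']]
print(neighbors("g1", genes, adj))   # ['g2']
```

`neighbors` corresponds to the kind of query the abstract describes for retrieving the genes associated with a gene of interest.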

20.

Analysis of transcribed sequences from young and mature zebrafish thrombocytes.

Fallatah, Weam; De, Ronika; Burks, David; Azad, Rajeev K; Jagadeeswaran, Pudur.

PLoS One ; 17(3): e0264776, 2022.

Article in English | MEDLINE | ID: mdl-35320267

ABSTRACT

The zebrafish is an excellent model system for studying thrombocyte function and development. Because young and mature thrombocytes are difficult to separate, comparative transcriptomics between these two cell types has not been performed. Studying these differences is important for understanding the mechanism of thrombocyte maturation. Here, we performed single-cell RNA sequencing of young and mature zebrafish thrombocytes and compared the transcripts in the two datasets. We found a total of 9143 genes expressed cumulatively in young and mature thrombocytes; of these, 72% have human orthologs according to the Ensembl human genome annotation. We also found 397 genes uniquely expressed in young thrombocytes and 2153 uniquely expressed in mature thrombocytes, of which 272 and 1620, respectively, corresponded to human orthologous genes. Of all genes expressed in young and mature thrombocytes, 4224 have been reported to be expressed in human megakaryocytes and 1603 were found in platelets. Among these orthologs, 156 transcription factor transcripts in thrombocytes were found in megakaryocytes and 60 in platelets, including a few known factors such as Nfe2 and Nfe2l2a (related to Nfe2) that are present in both megakaryocytes and platelets. These results indicate that thrombocytes have more megakaryocyte-like features and, since platelets are megakaryocyte fragments, platelets also appear to be thrombocyte equivalents. In conclusion, our study delineates the differential gene expression patterns of young and mature thrombocytes, highlighting the processes regulating thrombocyte maturation. Future knockdown studies of these young and mature thrombocyte-specific genes are feasible and will provide a basis for understanding megakaryocyte maturation.

Subject(s)

Blood Platelets; Zebrafish; Animals; Blood Platelets/metabolism; Platelet Function Tests; Transcription Factors/metabolism; Zebrafish/genetics; Zebrafish/metabolism; Zebrafish Proteins/genetics; Zebrafish Proteins/metabolism
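
The gene counts in the abstract above (genes expressed cumulatively, and genes uniquely expressed in young or mature thrombocytes, with their human orthologs) amount to set union, set difference, and set intersection over the genes detected in each population. The Python sketch below assumes each population has already been reduced to a set of expressed gene IDs and that a set of zebrafish genes with human orthologs is available. Treating "expressed cumulatively" as the union of the two sets is an interpretation of the abstract, and the function name and toy gene IDs are hypothetical.

```python
def compare_expressed(young, mature, human_orthologs):
    """Partition genes detected in two cell populations and count human orthologs."""
    all_genes = young | mature      # expressed cumulatively in either population
    young_only = young - mature     # uniquely expressed in young thrombocytes
    mature_only = mature - young    # uniquely expressed in mature thrombocytes
    return {
        "total": len(all_genes),
        "young_only": len(young_only),
        "mature_only": len(mature_only),
        "young_only_orthologs": len(young_only & human_orthologs),
        "mature_only_orthologs": len(mature_only & human_orthologs),
        "ortholog_fraction": (len(all_genes & human_orthologs) / len(all_genes)
                              if all_genes else 0.0),
    }

# Toy gene sets (hypothetical IDs).
summary = compare_expressed({"a", "b", "c"}, {"b", "c", "d", "e"}, {"a", "c", "d"})
print(summary["total"], summary["young_only"], summary["mature_only"])  # 5 1 2
print(summary["ortholog_fraction"])                                   # 0.6
```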
