Search | VHL Regional Portal

1.

Escherichia coli non-coding regulatory regions are highly conserved.

Lamoureux, Cameron R; Phaneuf, Patrick V; Palsson, Bernhard O; Zielinski, Daniel C.

NAR Genom Bioinform ; 6(2): lqae041, 2024 Jun.

Article in English | MEDLINE | ID: mdl-38774514

ABSTRACT

Microbial genome sequences are rapidly accumulating, enabling large-scale studies of sequence variation. Existing studies primarily focus on coding regions to study amino acid substitution patterns in proteins. However, non-coding regulatory regions also play a distinct role in determining physiologic responses. To investigate intergenic sequence variation on a large scale, we identified non-coding regulatory region alleles across 2350 Escherichia coli strains. This 'alleleome' consists of 117 781 unique alleles for 1169 reference regulatory regions (transcribing 1975 genes) at single base-pair resolution. We find that 64% of nucleotide positions are invariant, and variant positions vary in a median of just 0.6% of strains. Additionally, non-coding alleles are sufficient to recover E. coli phylogroups. We find that core promoter elements and transcription factor binding sites are significantly conserved, especially those located upstream of essential or highly-expressed genes. However, variability in conservation of transcription factor binding sites is significant both within and across regulons. Finally, we contrast mutations acquired during adaptive laboratory evolution with wild-type variation, finding that the former preferentially alter positions that the latter conserves. Overall, this analysis elucidates the wealth of information found in E. coli non-coding sequence variation and expands pangenomic studies to non-coding regulatory regions at single-nucleotide resolution.
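The two summary statistics quoted in this abstract (the share of invariant positions, and how often variant positions differ) can be illustrated with a minimal sketch. It assumes the alleles of one regulatory region are already aligned to equal length, with one sequence per strain. The function names and the toy sequences are hypothetical and are not taken from the paper or its software.

```python
from collections import Counter


def invariant_fraction(alleles):
    """Fraction of alignment columns in which every sequence carries the same base."""
    if not alleles or not alleles[0]:
        raise ValueError("need at least one non-empty allele")
    length = len(alleles[0])
    if any(len(a) != length for a in alleles):
        raise ValueError("alleles must be aligned to the same length")
    invariant = sum(1 for column in zip(*alleles) if len(set(column)) == 1)
    return invariant / length


def variant_frequencies(alleles):
    """For each variant column, the fraction of sequences that differ from the most common base."""
    freqs = []
    for column in zip(*alleles):
        counts = Counter(column)
        if len(counts) > 1:
            major_count = counts.most_common(1)[0][1]
            freqs.append(1 - major_count / len(column))
    return freqs


# Toy alignment (hypothetical): four strains, six positions.
toy = ["ACGTAC", "ACGTAC", "ACGAAC", "ACGTAT"]
print(invariant_fraction(toy))   # 4 of 6 columns are invariant
print(variant_frequencies(toy))  # each variant column differs in 1 of 4 strains
```

A real analysis would also need to handle gaps and insertions, which this sketch ignores.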

2.

Bottom-up parameterization of enzyme rate constants: Reconciling inconsistent data.

Zielinski, Daniel C; Matos, Marta R A; de Bree, James E; Glass, Kevin; Sonnenschein, Nikolaus; Palsson, Bernhard O.

Metab Eng Commun ; 18: e00234, 2024 Jun.

Article in English | MEDLINE | ID: mdl-38711578

ABSTRACT

Kinetic models of metabolism are promising platforms for studying complex metabolic systems and designing production strains. Given the availability of enzyme kinetic data from historical experiments and machine learning estimation tools, a straightforward modeling approach is to assemble kinetic data enzyme by enzyme until a desired scale is reached. However, this type of 'bottom up' parameterization of kinetic models has been difficult due to a number of issues including gaps in kinetic parameters, the complexity of enzyme mechanisms, inconsistencies between parameters obtained from different sources, and in vitro-in vivo differences. Here, we present a computational workflow for the robust estimation of kinetic parameters for detailed mass action enzyme models while taking into account parameter uncertainty. The resulting software package, termed MASSef (the Mass Action Stoichiometry Simulation Enzyme Fitting package), can handle standard 'macroscopic' kinetic parameters, including Km, kcat, Ki, Keq, and nh, as well as diverse reaction mechanisms defined in terms of mass action reactions and 'microscopic' rate constants. We provide three enzyme case studies demonstrating that this approach can identify and reconcile inconsistent data either within in vitro experiments or between in vitro and in vivo enzyme function. We further demonstrate how parameterized enzyme modules can be used to assemble pathway-scale kinetic models consistent with in vivo behavior. This work builds on the legacy of knowledge on kinetic behavior of enzymes by enabling robust parameterization of enzyme kinetic models at scale utilizing the abundance of historical literature data and machine learning parameter estimates.
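To make the distinction between 'macroscopic' parameters and 'microscopic' rate constants concrete, consider the simplest irreversible one-substrate mechanism. This mechanism and the symbols k_1, k_{-1}, and k_2 are illustrative assumptions, not a case study from the paper. Under the quasi-steady-state assumption for the enzyme-substrate complex, the macroscopic parameters follow from the mass action rate constants as:

```latex
\begin{align}
  &\mathrm{E} + \mathrm{S}
    \underset{k_{-1}}{\overset{k_{1}}{\rightleftharpoons}}
    \mathrm{ES}
    \overset{k_{2}}{\longrightarrow}
    \mathrm{E} + \mathrm{P} \\
  &v = \frac{k_{\mathrm{cat}}\,[\mathrm{E}]_{\mathrm{tot}}\,[\mathrm{S}]}{K_{m} + [\mathrm{S}]},
  \qquad
  k_{\mathrm{cat}} = k_{2},
  \qquad
  K_{m} = \frac{k_{-1} + k_{2}}{k_{1}}
\end{align}
```

Two measured values (Km and kcat) constrain three rate constants here, which is one reason fitting microscopic constants to macroscopic data needs additional measurements or an explicit treatment of parameter uncertainty.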

3.

BGCFlow: systematic pangenome workflow for the analysis of biosynthetic gene clusters across large genomic datasets.

Nuhamunada, Matin; Mohite, Omkar S; Phaneuf, Patrick V; Palsson, Bernhard O; Weber, Tilmann.

Nucleic Acids Res ; 2024 Apr 30.

Article in English | MEDLINE | ID: mdl-38686794

ABSTRACT

Genome mining is revolutionizing natural products discovery efforts. The rapid increase in available genomes demands comprehensive computational platforms to effectively extract biosynthetic knowledge encoded across bacterial pangenomes. Here, we present BGCFlow, a novel systematic workflow integrating analytics for large-scale genome mining of bacterial pangenomes. BGCFlow incorporates several genome analytics and mining tools grouped into five common stages of analysis: (i) data selection, (ii) functional annotation, (iii) phylogenetic analysis, (iv) genome mining, and (v) comparative analysis. Furthermore, BGCFlow provides easy configuration of different projects, parallel distribution, scheduled job monitoring, an interactive database to visualize tables, exploratory Jupyter Notebooks, and customized reports. Here, we demonstrate the application of BGCFlow by investigating the phylogenetic distribution of various biosynthetic gene clusters detected across 42 genomes of the Saccharopolyspora genus, known to produce industrially important secondary/specialized metabolites. The BGCFlow-guided analysis predicted more accurate dereplication of BGCs and guided the targeted comparative analysis of selected RiPPs. The scalable, interoperable, adaptable, re-entrant, and reproducible nature of BGCFlow will provide an effective novel way to extract the biosynthetic knowledge from the ever-growing genomic datasets of biotechnologically relevant bacterial species.

4.

Advancing the scale of synthetic biology via cross-species transfer of cellular functions enabled by iModulon engraftment.

Choe, Donghui; Olson, Connor A; Szubin, Richard; Yang, Hannah; Sung, Jaemin; Feist, Adam M; Palsson, Bernhard O.

Nat Commun ; 15(1): 2356, 2024 Mar 15.

Article in English | MEDLINE | ID: mdl-38490991

ABSTRACT

Machine learning applied to large compendia of transcriptomic data has enabled the decomposition of bacterial transcriptomes to identify independently modulated sets of genes; such iModulons represent specific cellular functions. The identification of iModulons enables accurate identification of genes necessary and sufficient for cross-species transfer of cellular functions. We demonstrate cross-species transfer of: 1) the biotransformation of vanillate to protocatechuate, 2) a malonate catabolic pathway, 3) a catabolic pathway for 2,3-butanediol, and 4) an antimicrobial resistance to ampicillin found in multiple Pseudomonas species to Escherichia coli. iModulon-based engineering is a transformative strategy as it includes all genes comprising the transferred cellular function, including genes without functional annotation. Adaptive laboratory evolution was deployed to optimize the cellular function transferred, revealing mutations in the host. Combining big data analytics and laboratory evolution thus enhances the level of understanding of systems biology, and synthetic biology for strain design and development.

Subject(s)

Escherichia coli , Synthetic Biology , Escherichia coli/genetics , Escherichia coli/metabolism , Genes, Bacterial , Pseudomonas/genetics

5.

CRISPR-aided genome engineering for secondary metabolite biosynthesis in Streptomyces.

Lee, Yongjae; Hwang, Soonkyu; Kim, Woori; Kim, Ji Hun; Palsson, Bernhard O; Cho, Byung-Kwan.

J Ind Microbiol Biotechnol ; 51, 2024 Jan 09.

Article in English | MEDLINE | ID: mdl-38439699

ABSTRACT

The demand for discovering novel microbial secondary metabolites is growing to address the limitations in bioactivities such as antibacterial, antifungal, anticancer, anthelmintic, and immunosuppressive functions. Among microbes, the genus Streptomyces holds particular significance for secondary metabolite discovery. Each Streptomyces species typically encodes approximately 30 secondary metabolite biosynthetic gene clusters (smBGCs) within its genome, which are mostly uncharacterized in terms of their products and bioactivities. The development of next-generation sequencing has enabled the identification of a large number of potent smBGCs for novel secondary metabolites that are imbalanced in number compared with discovered secondary metabolites. The clustered regularly interspaced short palindromic repeat (CRISPR)/CRISPR-associated (Cas) system has revolutionized the translation of enormous genomic potential into the discovery of secondary metabolites as the most efficient genetic engineering tool for Streptomyces. In this review, the current status of CRISPR/Cas applications in Streptomyces is summarized, with particular focus on the identification of secondary metabolite biosynthesis gene clusters and their potential applications. ONE-SENTENCE SUMMARY: This review summarizes the broad range of CRISPR/Cas applications in Streptomyces for natural product discovery and production.

Subject(s)

Biological Products , Streptomyces , Streptomyces/genetics , Streptomyces/metabolism , CRISPR-Cas Systems , Genetic Engineering , Genome, Bacterial , Biological Products/metabolism , Gene Editing

6.

StressME: Unified computing framework of Escherichia coli metabolism, gene expression, and stress responses.

Zhao, Jiao; Chen, Ke; Palsson, Bernhard O; Yang, Laurence.

PLoS Comput Biol ; 20(2): e1011865, 2024 Feb.

Article in English | MEDLINE | ID: mdl-38346086

ABSTRACT

Generalist microbes have adapted to a multitude of environmental stresses through their integrated stress response system. Individual stress responses have been quantified by E. coli metabolism and expression (ME) models under thermal, oxidative and acid stress, respectively. However, the systematic quantification of cross-stress and cross-talk among these stress responses remains lacking. Here, we present StressME: the unified stress response model of E. coli combining thermal (FoldME), oxidative (OxidizeME) and acid (AcidifyME) stress responses. StressME is the most up-to-date ME model for E. coli and it reproduces all published single-stress ME models. Additionally, it includes refined rate constants to improve prediction accuracy for wild-type and stress-evolved strains. StressME revealed certain optimal proteome allocation strategies associated with cross-stress and cross-talk responses. These stress-optimal proteomes were shaped by trade-offs between protective vs. metabolic enzymes; cytoplasmic vs. periplasmic chaperones; and expression of stress-specific proteins. As StressME is tuned to compute metabolic and gene expression responses under mild acid, oxidative, and thermal stresses, it is useful for engineering and health applications. The modular design of our open-source package also facilitates model expansion (e.g., to new stress mechanisms) by the computational biology community.

Subject(s)

Escherichia coli Proteins , Escherichia coli , Escherichia coli/metabolism , Escherichia coli Proteins/genetics , Escherichia coli Proteins/metabolism , Stress, Physiological/genetics , Oxidation-Reduction , Heat-Shock Proteins/metabolism , Acids/metabolism , Gene Expression

7.

Machine learning analysis of RB-TnSeq fitness data predicts functional gene modules in Pseudomonas putida KT2440.

Borchert, Andrew J; Bleem, Alissa C; Lim, Hyun Gyu; Rychel, Kevin; Dooley, Keven D; Kellermyer, Zoe A; Hodges, Tracy L; Palsson, Bernhard O; Beckham, Gregg T.

mSystems ; 9(3): e0094223, 2024 Mar 19.

Article in English | MEDLINE | ID: mdl-38323821

ABSTRACT

There is growing interest in engineering Pseudomonas putida KT2440 as a microbial chassis for the conversion of renewable and waste-based feedstocks, and metabolic engineering of P. putida relies on the understanding of the functional relationships between genes. In this work, independent component analysis (ICA) was applied to a compendium of existing fitness data from randomly barcoded transposon insertion sequencing (RB-TnSeq) of P. putida KT2440 grown in 179 unique experimental conditions. ICA identified 84 independent groups of genes, which we call fModules ("functional modules"), where gene members displayed shared functional influence in a specific cellular process. This machine learning-based approach both successfully recapitulated previously characterized functional relationships and established hitherto unknown associations between genes. Selected gene members from fModules for hydroxycinnamate metabolism and stress resistance, acetyl coenzyme A assimilation, and nitrogen metabolism were validated with engineered mutants of P. putida. Additionally, functional gene clusters from ICA of RB-TnSeq data sets were compared with regulatory gene clusters from prior ICA of RNAseq data sets to draw connections between gene regulation and function. Because ICA profiles the functional role of several distinct gene networks simultaneously, it can reduce the time required to annotate gene function relative to manual curation of RB-TnSeq data sets. IMPORTANCE: This study demonstrates a rapid, automated approach for elucidating functional modules within complex genetic networks. While Pseudomonas putida randomly barcoded transposon insertion sequencing data were used as a proof of concept, this approach is applicable to any organism with existing functional genomics data sets and may serve as a useful tool for many valuable applications, such as guiding metabolic engineering efforts in other microbes or understanding functional relationships between virulence-associated genes in pathogenic microbes. Furthermore, this work demonstrates that comparison of data obtained from independent component analysis of transcriptomics and gene fitness datasets can elucidate regulatory-functional relationships between genes, which may have utility in a variety of applications, such as metabolic modeling, strain engineering, or identification of antimicrobial drug targets.

Subject(s)

Pseudomonas putida , Pseudomonas putida/genetics , Gene Regulatory Networks , Genomics

8.

Reconstructing the transcriptional regulatory network of probiotic L. reuteri is enabled by transcriptomics and machine learning.

Josephs-Spaulding, Jonathan; Rajput, Akanksha; Hefner, Ying; Szubin, Richard; Balasubramanian, Archana; Li, Gaoyuan; Zielinski, Daniel C; Jahn, Leonie; Sommer, Morten; Phaneuf, Patrick; Palsson, Bernhard O.

mSystems ; 9(3): e0125723, 2024 Mar 19.

Article in English | MEDLINE | ID: mdl-38349131

ABSTRACT

Limosilactobacillus reuteri, a probiotic microbe instrumental to human health and sustainable food production, adapts to diverse environmental shifts via dynamic gene expression. We applied the independent component analysis (ICA) to 117 RNA-seq data sets to decode its transcriptional regulatory network (TRN), identifying 35 distinct signals that modulate specific gene sets. Our findings indicate that the ICA provides a qualitative advancement and captures nuanced relationships within gene clusters that other methods may miss. This study uncovers the fundamental properties of L. reuteri's TRN and deepens our understanding of its arginine metabolism and the co-regulation of riboflavin metabolism and fatty acid conversion. It also sheds light on conditions that regulate genes within a specific biosynthetic gene cluster and allows for the speculation of the potential role of isoprenoid biosynthesis in L. reuteri's adaptive response to environmental changes. By integrating transcriptomics and machine learning, we provide a system-level understanding of L. reuteri's response mechanism to environmental fluctuations, thus setting the stage for modeling the probiotic transcriptome for applications in microbial food production. IMPORTANCE: We have studied Limosilactobacillus reuteri, a beneficial probiotic microbe that plays a significant role in our health and production of sustainable foods, a type of foods that are nutritionally dense and healthier and have low-carbon emissions compared to traditional foods. Similar to how humans adapt their lifestyles to different environments, this microbe adjusts its behavior by modulating the expression of genes. We applied machine learning to analyze large-scale data sets on how these genes behave across diverse conditions. From this, we identified 35 unique patterns demonstrating how L. reuteri adjusts its genes based on 50 unique environmental conditions (such as various sugars, salts, microbial cocultures, human milk, and fruit juice). This research helps us understand better how L. reuteri functions, especially in processes like breaking down certain nutrients and adapting to stressful changes. More importantly, with our findings, we become closer to using this knowledge to improve how we produce more sustainable and healthier foods with the help of microbes.

Subject(s)

Limosilactobacillus reuteri , Probiotics , Humans , Limosilactobacillus reuteri/genetics , Gene Expression Profiling , Transcriptome/genetics , Machine Learning

9.

Independent component analysis reveals 49 independently modulated gene sets within the global transcriptional regulatory architecture of multidrug-resistant Acinetobacter baumannii.

Menon, Nitasha D; Poudel, Saugat; Sastry, Anand V; Rychel, Kevin; Szubin, Richard; Dillon, Nicholas; Tsunemoto, Hannah; Hirose, Yujiro; Nair, Bipin G; Kumar, Geetha B; Palsson, Bernhard O; Nizet, Victor.

mSystems ; 9(2): e0060623, 2024 Feb 20.

Article in English | MEDLINE | ID: mdl-38189271

ABSTRACT

Acinetobacter baumannii causes severe infections in humans, resists multiple antibiotics, and survives in stressful environmental conditions due to modulations of its complex transcriptional regulatory network (TRN). Unfortunately, our global understanding of the TRN in this emerging opportunistic pathogen is limited. Here, we apply independent component analysis, an unsupervised machine learning method, to a compendium of 139 RNA-seq data sets of three multidrug-resistant A. baumannii international clonal complex I strains (AB5075, AYE, and AB0057). This analysis allows us to define 49 independently modulated gene sets, which we call iModulons. Analysis of the identified A. baumannii iModulons reveals validating parallels to previously defined biological operons/regulons and provides a framework for defining unknown regulons. By utilizing the iModulons, we uncover potential mechanisms for a RpoS-independent general stress response, define global stress-virulence trade-offs, and identify conditions that may induce plasmid-borne multidrug resistance. The iModulons provide a model of the TRN that emphasizes the importance of transcriptional regulation of virulence phenotypes in A. baumannii. Furthermore, they suggest the possibility of future interventions to guide gene expression toward diminished pathogenic potential. IMPORTANCE: The rise in hospital outbreaks of multidrug-resistant Acinetobacter baumannii infections underscores the urgent need for alternatives to traditional broad-spectrum antibiotic therapies. The success of A. baumannii as a significant nosocomial pathogen is largely attributed to its ability to resist antibiotics and survive environmental stressors. However, there is limited literature available on the global, complex regulatory circuitry that shapes these phenotypes. Computational tools that can assist in the elucidation of A. baumannii's transcriptional regulatory network architecture can provide much-needed context for a comprehensive understanding of pathogenesis and virulence, as well as for the development of targeted therapies that modulate these pathways.

Subject(s)

Acinetobacter Infections , Acinetobacter baumannii , Humans , Acinetobacter baumannii/genetics , Acinetobacter Infections/drug therapy , Virulence/genetics , Gene Expression Regulation , Anti-Bacterial Agents/pharmacology

10.

A data-driven approach for timescale decomposition of biochemical reaction networks.

Akbari, Amir; Haiman, Zachary B; Palsson, Bernhard O.

mSystems ; 9(2): e0100123, 2024 Feb 20.

Article in English | MEDLINE | ID: mdl-38259168

ABSTRACT

Understanding the dynamics of biological systems in evolving environments is a challenge due to their scale and complexity. Here, we present a computational framework for the timescale decomposition of biochemical reaction networks to distill essential patterns from their intricate dynamics. This approach identifies timescale hierarchies, concentration pools, and coherent structures from time-series data, providing a system-level description of reaction networks at physiologically important timescales. We apply this technique to kinetic models of hypothetical and biological pathways, validating it by reproducing analytically characterized or previously known concentration pools of these pathways. Moreover, by analyzing the timescale hierarchy of the glycolytic pathway, we elucidate the connections between the stoichiometric and dissipative structures of reaction networks and the temporal organization of coherent structures. Specifically, we show that glycolysis is a cofactor-driven pathway, the slowest dynamics of which are described by a balance between high-energy phosphate bond and redox trafficking. Overall, this approach provides more biologically interpretable characterizations of network dynamics than large-scale kinetic models, thus facilitating model reduction and personalized medicine applications. IMPORTANCE: Complex interactions within interconnected biochemical reaction networks enable cellular responses to a wide range of unpredictable environmental perturbations. Understanding how biological functions arise from these intricate interactions has been a long-standing problem in biology. Here, we introduce a computational approach to dissect complex biological systems' dynamics in evolving environments. This approach characterizes the timescale hierarchies of complex reaction networks, offering a system-level understanding at physiologically relevant timescales. Analyzing various hypothetical and biological pathways, we show how stoichiometric properties shape the way energy is dissipated throughout reaction networks. Notably, we establish that glycolysis operates as a cofactor-driven pathway, where the slowest dynamics are governed by a balance between high-energy phosphate bonds and redox trafficking. This approach enhances our understanding of network dynamics and facilitates the development of reduced-order kinetic models with biologically interpretable components.

Subject(s)

Cell Physiological Phenomena , Glycolysis , Kinetics , Phosphates

11.

Inferred regulons are consistent with regulator binding sequences in E. coli.

Qiu, Sizhe; Wan, Xinlong; Liang, Yueshan; Lamoureux, Cameron R; Akbari, Amir; Palsson, Bernhard O; Zielinski, Daniel C.

PLoS Comput Biol ; 20(1): e1011824, 2024 Jan.

Article in English | MEDLINE | ID: mdl-38252668

ABSTRACT

The transcriptional regulatory network (TRN) of E. coli consists of thousands of interactions between regulators and DNA sequences. Regulons are typically determined either from resource-intensive experimental measurement of functional binding sites, or inferred from analysis of high-throughput gene expression datasets. Recently, independent component analysis (ICA) of RNA-seq compendia has been shown to be a powerful method for inferring bacterial regulons. However, it remains unclear to what extent regulons predicted by ICA structure have a biochemical basis in promoter sequences. Here, we address this question by developing machine learning models that predict inferred regulon structures in E. coli based on promoter sequence features. Models were constructed successfully (cross-validation AUROC ≥ 0.8) for 85% (40/47) of ICA-inferred E. coli regulons. We found that: 1) the presence of a high scoring regulator motif in the promoter region was sufficient to specify regulatory activity in 40% (19/47) of the regulons; 2) additional features, such as DNA shape and extended motifs that can account for regulator multimeric binding, helped to specify regulon structure for the remaining 60% of regulons (28/47); 3) investigating regulons where initial machine learning models failed revealed new regulator-specific sequence features that improved model accuracy. Finally, we found that strong regulatory binding sequences underlie both the genes shared between ICA-inferred and experimental regulons as well as genes in the E. coli core pan-regulon of Fur. This work demonstrates that the structure of ICA-inferred regulons largely can be understood through the strength of regulator binding sites in promoter regions, reinforcing the utility of top-down inference for regulon discovery.

Subject(s)

Escherichia coli , Regulon , Regulon/genetics , Escherichia coli/genetics , Escherichia coli/metabolism , Bacteria/genetics , Binding Sites/genetics , Promoter Regions, Genetic/genetics , Gene Expression Regulation, Bacterial/genetics , Bacterial Proteins/metabolism
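The success criterion quoted in this abstract (AUROC of at least 0.8) can be illustrated with a minimal sketch. AUROC is the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one, with ties counting as half. The labels, scores, and function name below are hypothetical. The sketch computes AUROC on a single set of scores and does not reproduce the paper's cross-validation procedure or models.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positive and negative scores."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical example: 1 = gene belongs to the regulon, 0 = it does not.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
value = auroc(labels, scores)
print(value, value >= 0.8)
```

The pairwise form is quadratic in the number of examples. It is fine for illustration, though a rank-based computation scales better on large gene sets.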

12.

Biological and Genetic Determinants of Glycolysis: Phosphofructokinase Isoforms Boost Energy Status of Stored Red Blood Cells and Transfusion Outcomes.

Nemkov, Travis; Stephenson, Daniel; Earley, Eric J; Keele, Gregory R; Hay, Ariel; Key, Alicia; Haiman, Zachary; Erickson, Christopher; Dzieciatkowska, Monika; Reisz, Julie A; Moore, Amy; Stone, Mars; Deng, Xutao; Kleinman, Steven; Spitalnik, Steven L; Hod, Eldad A; Hudson, Krystalyn E; Hansen, Kirk C; Palsson, Bernhard O; Churchill, Gary A; Roubinian, Nareg; Norris, Philip J; Busch, Michael P; Zimring, James C; Page, Grier P; D'Alessandro, Angelo.

bioRxiv ; 2024 Apr 17.

Article in English | MEDLINE | ID: mdl-38260479

ABSTRACT

Mature red blood cells (RBCs) lack mitochondria, and thus exclusively rely on glycolysis to generate adenosine triphosphate (ATP) during aging in vivo and during storage in vitro in the blood bank. Here we identify an association between blood donor age, sex, ethnicity and end-of-storage levels of glycolytic metabolites in 13,029 volunteers from the Recipient Epidemiology and Donor Evaluation Study. Associations were also observed to ancestry-specific genetic polymorphisms in regions encoding phosphofructokinase 1, platelet (which we detected in mature RBCs), hexokinase 1, and ADP-ribosyl cyclase 1 and 2 (CD38/BST1). Gene-metabolite associations were validated in fresh and stored RBCs from 525 Diversity Outbred mice, and via multi-omics characterization of 1,929 samples from 643 human RBC units during storage. ATP levels, breakdown, and deamination into hypoxanthine were associated with hemolysis in vitro and in vivo, both in healthy autologous transfusion recipients and in 5,816 critically ill patients receiving heterologous transfusions. Highlights: Blood donor age and sex affect glycolysis in stored RBCs from 13,029 volunteers; Ancestry, genetic polymorphisms in PFKP, HK1, CD38/BST1 influence RBC glycolysis; RBC PFKP boosts glycolytic fluxes when ATP is low, such as in stored RBCs; ATP and hypoxanthine are biomarkers of hemolysis in vitro and in vivo.

13.

Global pathogenomic analysis identifies known and candidate genetic antimicrobial resistance determinants in twelve species.

Hyun, Jason C; Monk, Jonathan M; Szubin, Richard; Hefner, Ying; Palsson, Bernhard O.

Nat Commun ; 14(1): 7690, 2023 Nov 24.

Article in English | MEDLINE | ID: mdl-38001096

ABSTRACT

Surveillance programs for managing antimicrobial resistance (AMR) have yielded thousands of genomes suited for data-driven mechanism discovery. We present a workflow integrating pangenomics, gene annotation, and machine learning to identify AMR genes at scale. When applied to 12 species, 27,155 genomes, and 69 drugs, we 1) find AMR gene transfer mostly confined within related species, with 925 genes in multiple species but just eight in multiple phylogenetic classes, 2) demonstrate that discovery-oriented support vector machines outperform contemporary methods at recovering known AMR genes, recovering 263 genes compared to 145 by Pyseer, and 3) identify 142 AMR gene candidates. Validation of two candidates in E. coli BW25113 reveals cases of conditional resistance: ΔcycA confers ciprofloxacin resistance in minimal media with D-serine, and frdD V111D confers ampicillin resistance in the presence of ampC by modifying the overlapping promoter. We expect this approach to be adaptable to other species and phenotypes.

Subject(s)

Anti-Bacterial Agents , Escherichia coli , Anti-Bacterial Agents/pharmacology , Escherichia coli/genetics , Drug Resistance, Bacterial/genetics , Phylogeny , Ciprofloxacin/pharmacology

14.

Functional annotation of enzyme-encoding genes using deep learning with transformer layers.

Kim, Gi Bae; Kim, Ji Yeon; Lee, Jong An; Norsigian, Charles J; Palsson, Bernhard O; Lee, Sang Yup.

Nat Commun ; 14(1): 7370, 2023 Nov 14.

Article in English | MEDLINE | ID: mdl-37963869

ABSTRACT

Functional annotation of open reading frames in microbial genomes remains substantially incomplete. Enzymes constitute the most prevalent functional gene class in microbial genomes and can be described by their specific catalytic functions using the Enzyme Commission (EC) number. Consequently, the ability to predict EC numbers could substantially reduce the number of un-annotated genes. Here we present a deep learning model, DeepECtransformer, which utilizes transformer layers as a neural network architecture to predict EC numbers. Using the extensively studied Escherichia coli K-12 MG1655 genome, DeepECtransformer predicted EC numbers for 464 un-annotated genes. We experimentally validated the enzymatic activities predicted for three proteins (YgfF, YciO, and YjdM). Further examination of the neural network's reasoning process revealed that the trained neural network relies on functional motifs of enzymes to predict EC numbers. Thus, DeepECtransformer is a method that facilitates the functional annotation of uncharacterized genes.

Subject(s)

Deep Learning , Escherichia coli K12 , Escherichia coli K12/genetics , Proteins/genetics , Genome , Escherichia coli/genetics , Molecular Sequence Annotation , Open Reading Frames

15.

Differential Expression Analysis Utilizing Condition-Specific Metabolic Pathways.

Mattei, Gianluca; Gan, Zhuohui; Ramazzotti, Matteo; Palsson, Bernhard O; Zielinski, Daniel C.

Metabolites ; 13(11), 2023 Nov 03.

Article in English | MEDLINE | ID: mdl-37999223

ABSTRACT

Pathway analysis is ubiquitous in biological data analysis due to the ability to integrate small simultaneous changes in functionally related components. While pathways are often defined based on either manual curation or network topological properties, an attractive alternative is to generate pathways around specific functions, in which metabolism can be defined as the production and consumption of specific metabolites. In this work, we present an algorithm, termed MetPath, that calculates pathways for condition-specific production and consumption of specific metabolites. We demonstrate that these pathways have several useful properties. Pathways calculated in this manner (1) take into account the condition-specific metabolic role of a gene product, (2) are localized around defined metabolic functions, and (3) quantitatively weigh the importance of expression to a function based on the flux contribution of the gene product. We demonstrate how these pathways elucidate network interactions between genes across different growth conditions and between cell types. Furthermore, the calculated pathways compare favorably to manually curated pathways in predicting the expression correlation between genes. To facilitate the use of these pathways, we have generated a large compendium of pathways under different growth conditions for E. coli. The MetPath algorithm provides a useful tool for metabolic network-based statistical analyses of high-throughput data.

16.

Modeling Red Blood Cell Metabolism in the Omics Era.

Key, Alicia; Haiman, Zachary; Palsson, Bernhard O; D'Alessandro, Angelo.

Metabolites ; 13(11), 2023 Nov 11.

Article in English | MEDLINE | ID: mdl-37999241

ABSTRACT

Red blood cells (RBCs) are abundant (more than 80% of the total cells in the human body), yet relatively simple, as they lack nuclei and organelles, including mitochondria. Since the earliest days of biochemistry, the accessibility of blood and RBCs made them an ideal matrix for the characterization of metabolism. Because of this, investigations into RBC metabolism are of extreme relevance for research and diagnostic purposes in scientific and clinical endeavors. The relative simplicity of RBCs has made them an eligible model for the development of reconstruction maps of eukaryotic cell metabolism since the early days of systems biology. Computational models hold the potential to deepen knowledge of RBC metabolism, but also and foremost to predict in silico RBC metabolic behaviors in response to environmental stimuli. Here, we review now classic concepts on RBC metabolism, prior work in systems biology of unicellular organisms, and how this work paved the way for the development of reconstruction models of RBC metabolism. Translationally, we discuss how the fields of metabolomics and systems biology have generated evidence to advance our understanding of the RBC storage lesion, a process of decline in storage quality that impacts over a hundred million blood units transfused every year.

17.

A data-driven approach for timescale decomposition of biochemical reaction networks.

Akbari, Amir; Haiman, Zachary B; Palsson, Bernhard O.

bioRxiv ; 2023 Aug 22.

Article in English | MEDLINE | ID: mdl-37662221

ABSTRACT

Understanding the dynamics of biological systems in evolving environments is a challenge due to their scale and complexity. Here, we present a computational framework for timescale decomposition of biochemical reaction networks to distill essential patterns from their intricate dynamics. This approach identifies timescale hierarchies, concentration pools, and coherent structures from time-series data, providing a system-level description of reaction networks at physiologically important timescales. We apply this technique to kinetic models of hypothetical and biological pathways, validating it by reproducing analytically characterized or previously known concentration pools of these pathways. Moreover, by analyzing the timescale hierarchy of the glycolytic pathway, we elucidate the connections between the stoichiometric and dissipative structures of reaction networks and the temporal organization of coherent structures. Specifically, we show that glycolysis is a cofactor driven pathway, the slowest dynamics of which are described by a balance between high-energy phosphate bond and redox trafficking. Overall, this approach provides more biologically interpretable characterizations of network dynamics than large-scale kinetic models, thus facilitating model reduction and personalized medicine applications.

18.

Laboratory evolution, transcriptomics, and modeling reveal mechanisms of paraquat tolerance.

Rychel, Kevin; Tan, Justin; Patel, Arjun; Lamoureux, Cameron; Hefner, Ying; Szubin, Richard; Johnsen, Josefin; Mohamed, Elsayed Tharwat Tolba; Phaneuf, Patrick V; Anand, Amitesh; Olson, Connor A; Park, Joon Ho; Sastry, Anand V; Yang, Laurence; Feist, Adam M; Palsson, Bernhard O.

Cell Rep ; 42(9): 113105, 2023 09 26.

Article in English | MEDLINE | ID: mdl-37713311

ABSTRACT

Relationships between the genome, transcriptome, and metabolome underlie all evolved phenotypes. However, it has proved difficult to elucidate these relationships because of the high number of variables measured. A recently developed data analytic method for characterizing the transcriptome can simplify interpretation by grouping genes into independently modulated sets (iModulons). Here, we demonstrate how iModulons reveal deep understanding of the effects of causal mutations and metabolic rewiring. We use adaptive laboratory evolution to generate E. coli strains that tolerate high levels of the redox cycling compound paraquat, which produces reactive oxygen species (ROS). We combine resequencing, iModulons, and metabolic models to elucidate six interacting stress-tolerance mechanisms: (1) modification of transport, (2) activation of ROS stress responses, (3) use of ROS-sensitive iron regulation, (4) motility, (5) broad transcriptional reallocation toward growth, and (6) metabolic rewiring to decrease NADH production. This work thus demonstrates the power of iModulon knowledge mapping for evolution analysis.

Subject(s)

Escherichia coli , Paraquat , Paraquat/pharmacology , Reactive Oxygen Species/metabolism , Escherichia coli/metabolism , Transcriptome/genetics , Gene Expression Profiling

19.

A multi-scale expression and regulation knowledge base for Escherichia coli.

Lamoureux, Cameron R; Decker, Katherine T; Sastry, Anand V; Rychel, Kevin; Gao, Ye; McConn, John Luke; Zielinski, Daniel C; Palsson, Bernhard O.

Nucleic Acids Res ; 51(19): 10176-10193, 2023 10 27.

Article in English | MEDLINE | ID: mdl-37713610

ABSTRACT

Transcriptomic data is accumulating rapidly; thus, scalable methods for extracting knowledge from this data are critical. Here, we assembled a top-down expression and regulation knowledge base for Escherichia coli. The expression component is a 1035-sample, high-quality RNA-seq compendium consisting of data generated in our lab using a single experimental protocol. The compendium contains diverse growth conditions, including: 9 media; 39 supplements, including antibiotics; 42 heterologous proteins; and 76 gene knockouts. Using this resource, we elucidated global expression patterns. We used machine learning to extract 201 modules that account for 86% of known regulatory interactions, creating the regulatory component. With these modules, we identified two novel regulons and quantified systems-level regulatory responses. We also integrated 1675 curated, publicly-available transcriptomes into the resource. We demonstrated workflows for analyzing new data against this knowledge base via deconstruction of regulation during aerobic transition. This resource illuminates the E. coli transcriptome at scale and provides a blueprint for top-down transcriptomic analysis of non-model organisms.

Subject(s)

Escherichia coli , Knowledge Bases , Escherichia coli/genetics , Escherichia coli/metabolism , Escherichia coli Proteins/genetics , Escherichia coli Proteins/metabolism , Gene Expression Profiling , Gene Expression Regulation, Bacterial , Transcriptome

20.

Pangenome analysis reveals the genetic basis for taxonomic classification of the Lactobacillaceae family.

Rajput, Akanksha; Chauhan, Siddharth M; Mohite, Omkar S; Hyun, Jason C; Ardalani, Omid; Jahn, Leonie J; Sommer, Morten Oa; Palsson, Bernhard O.

Food Microbiol ; 115: 104334, 2023 Oct.

Article in English | MEDLINE | ID: mdl-37567624

ABSTRACT

Lactobacillaceae represent a large family of important microbes that are foundational to the food industry. Many genome sequences of Lactobacillaceae strains are now available, enabling us to conduct a comprehensive pangenome analysis of this family. We collected 3591 high-quality genomes from public sources and found that: 1) they contained enough genomes for 26 species to perform a pangenomic analysis, 2) the normalized Heap's coefficient λ (a measure of pangenome openness) was found to have an average value of 0.27 (ranging from 0.07 to 0.37), 3) the pangenome openness was correlated with the abundance and genomic location of transposons and mobilomes, 4) the pangenome for each species was divided into core, accessory, and rare genomes, that highlight the species-specific properties (such as motility and restriction-modification systems), 5) the pangenome of Lactiplantibacillus plantarum (which contained the highest number of genomes found amongst the 26 species studied) contained nine distinct phylogroups, and 6) genome mining revealed a richness of detected biosynthetic gene clusters, with functions ranging from antimicrobial and probiotic to food preservation, but ~93% were of unknown function. This study provides the first in-depth comparative pangenomics analysis of the Lactobacillaceae family.

Subject(s)

Genomics , Lactobacillaceae , Phylogeny
