ABSTRACT
Current machine learning techniques enable robust association of biological signals with measured phenotypes, but these approaches are incapable of identifying causal relationships. Here, we develop an integrated "white-box" biochemical screening, network modeling, and machine learning approach for revealing causal mechanisms and apply this approach to understanding antibiotic efficacy. We counter-screen diverse metabolites against bactericidal antibiotics in Escherichia coli and simulate their corresponding metabolic states using a genome-scale metabolic network model. Regression of the measured screening data on model simulations reveals that purine biosynthesis participates in antibiotic lethality, which we validate experimentally. We show that antibiotic-induced adenine limitation increases ATP demand, which elevates central carbon metabolism activity and oxygen consumption, enhancing the killing effects of antibiotics. This work demonstrates how prospective network modeling can couple with machine learning to identify complex causal mechanisms underlying drug efficacy.
Subjects
Anti-Bacterial Agents/metabolism , Anti-Bacterial Agents/pharmacology , Metabolic Networks and Pathways/drug effects , Adenine/metabolism , Computational Biology/methods , Drug Evaluation, Preclinical/methods , Escherichia coli/metabolism , Machine Learning , Metabolic Networks and Pathways/immunology , Models, Theoretical , Purines/metabolism
ABSTRACT
Purine biosynthesis and metabolism, conserved in all living organisms, is essential for cellular energy homeostasis and nucleic acid synthesis. The de novo synthesis of purine precursors is under tight negative feedback regulation mediated by adenosine and guanine nucleotides. We describe a distinct early-onset neurodegenerative condition resulting from mutations in the adenosine monophosphate deaminase 2 gene (AMPD2). Patients have characteristic brain imaging features of pontocerebellar hypoplasia (PCH) due to loss of brainstem and cerebellar parenchyma. We found that AMPD2 plays an evolutionarily conserved role in the maintenance of cellular guanine nucleotide pools by regulating the feedback inhibition of adenosine derivatives on de novo purine synthesis. AMPD2 deficiency results in defective GTP-dependent initiation of protein translation, which can be rescued by administration of purine precursors. These data suggest AMPD2-related PCH as a potentially treatable early-onset neurodegenerative disease.
Subjects
AMP Deaminase/metabolism , Olivopontocerebellar Atrophies/metabolism , Purines/biosynthesis , AMP Deaminase/chemistry , AMP Deaminase/genetics , Animals , Brain Stem/pathology , Cerebellum/pathology , Child , Female , Guanosine Triphosphate/metabolism , Humans , Male , Mice , Mice, Knockout , Mutation , Neural Stem Cells/metabolism , Olivopontocerebellar Atrophies/genetics , Olivopontocerebellar Atrophies/pathology , Protein Biosynthesis , Saccharomyces cerevisiae/enzymology , Saccharomyces cerevisiae/metabolism
ABSTRACT
A biochemical pathway consists of a series of interconnected biochemical reactions that accomplish specific life activities. The participating reactants and resultant products of a pathway, including gene fragments, proteins, and small molecules, coalesce to form a complex reaction network. Biochemical pathways play a critical role in the biochemical domain because they reveal the flow of biochemical reactions in living organisms, making them essential for understanding life processes. Existing studies of biochemical pathway networks rely mainly on experimentation and pathway database analysis, both of which are constrained by substantial cost. Inspired by the success of representation learning approaches in biomedicine, we develop the biochemical pathway prediction (BPP) platform, an automatic platform for identifying potential links or attributes within biochemical pathway networks. Our BPP platform incorporates a variety of representation learning models, including recent hypergraph neural networks, to model biochemical reactions in pathways. In particular, BPP contains the latest biochemical pathway-based datasets and enables the prediction of potential participants or products of biochemical reactions in biochemical pathways. Additionally, BPP is equipped with a SHAP explainer to explain the predicted results and to calculate the contribution of each participating element. We conduct extensive experiments on our collected biochemical pathway dataset to benchmark the effectiveness of all models available on BPP. Furthermore, our detailed case studies based on the chronological pattern of our dataset demonstrate the effectiveness of our platform. Our BPP web portal, source code and datasets are freely accessible at https://github.com/Glasgow-AI4BioMed/BPP.
Subjects
Computational Biology , Neural Networks, Computer , Computational Biology/methods , Metabolic Networks and Pathways , Software , Algorithms , Humans
ABSTRACT
Eukaryotic cells compartmentalize biochemical processes in different organelles, often relying on metabolic cycles to shuttle reducing equivalents across intracellular membranes. NADPH serves as the electron carrier for the maintenance of redox homeostasis and reductive biosynthesis, with separate cytosolic and mitochondrial pools providing reducing power in each respective location. This cellular organization is critical for numerous functions but complicates analysis of metabolic pathways using available methods. Here we develop an approach to resolve NADP(H)-dependent pathways present within both the cytosol and the mitochondria. By tracing hydrogen in compartmentalized reactions that use NADPH as a cofactor, including the production of 2-hydroxyglutarate by mutant isocitrate dehydrogenase enzymes, we can observe metabolic pathway activity in these distinct cellular compartments. Using this system we determine the direction of serine/glycine interconversion within the mitochondria and cytosol, highlighting the ability of this approach to resolve compartmentalized reactions in intact cells.
Subjects
Cytosol/metabolism , Mitochondria/metabolism , NADP/metabolism , Cell Line, Tumor , Glucose/metabolism , Glycine/metabolism , Humans , Isocitrate Dehydrogenase/metabolism , Metabolic Flux Analysis , Serine/metabolism
ABSTRACT
Technological advances in high-resolution mass spectrometry (MS) have vastly increased the number of samples that can be processed in a life science experiment, as well as the volume and complexity of the generated data. To address the bottleneck of high-throughput data processing, we present SmartPeak (https://github.com/AutoFlowResearch/SmartPeak), an application that encapsulates advanced algorithms to enable fast, accurate, and automated processing of capillary electrophoresis-, gas chromatography-, and liquid chromatography (LC)-MS(/MS) data and high-pressure LC data for targeted and semitargeted metabolomics, lipidomics, and fluxomics experiments. The application allows for an approximately 100-fold reduction in data processing time compared to manual processing while enhancing the quality and reproducibility of the results.
Subjects
Electronic Data Processing/methods , Metabolomics/methods , Automation , Chromatography, Liquid , Electrophoresis, Capillary , Tandem Mass Spectrometry , Time Factors
ABSTRACT
Growth rate and yield are fundamental features of microbial growth. However, we lack a mechanistic and quantitative understanding of the rate-yield relationship. Studies pairing computational predictions with experiments have shown the importance of maintenance energy and proteome allocation in explaining rate-yield tradeoffs and overflow metabolism. Recently, adaptive evolution experiments of Escherichia coli have revealed a phenotypic diversity beyond what has been explained using simple models of growth rate versus yield. Here, we identify a two-dimensional rate-yield tradeoff in adapted E. coli strains where the dimensions are (A) a tradeoff between growth rate and yield and (B) a tradeoff between substrate (glucose) uptake rate and growth yield. We employ a multi-scale modeling approach, combining a previously reported coarse-grained small-scale proteome allocation model with a fine-grained genome-scale model of metabolism and gene expression (ME-model), to develop a quantitative description of the full rate-yield relationship for E. coli K-12 MG1655. The multi-scale analysis resolves the complexity of the ME-model, which hindered its practical use in proteome complexity analysis, and provides a mechanistic explanation of the two-dimensional tradeoff. Further, the analysis identifies modifications to the P/O ratio and the flux allocation between glycolysis and the pentose phosphate pathway (PPP) as potential mechanisms that enable the tradeoff between glucose uptake rate and growth yield. Thus, the rate-yield tradeoffs that govern microbial adaptation to new environments are more complex than previously reported, and they can be understood in mechanistic detail using a multi-scale modeling approach.
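The three quantities in this abstract (growth rate, substrate uptake rate, and growth yield) are tied together by a standard mass-balance identity: yield equals growth rate divided by uptake rate. The sketch below only illustrates that arithmetic. The function name and the strain values are hypothetical and are not taken from the study, and it does not reproduce the proteome allocation model or the ME-model.

```python
def growth_yield(growth_rate_per_h: float, uptake_rate_mmol_per_gdw_h: float) -> float:
    """Biomass yield in gDW per mmol substrate, computed as Y = mu / q_s."""
    if uptake_rate_mmol_per_gdw_h <= 0:
        raise ValueError("substrate uptake rate must be positive")
    return growth_rate_per_h / uptake_rate_mmol_per_gdw_h


# Hypothetical strains: (growth rate in 1/h, glucose uptake in mmol/gDW/h).
# These numbers are illustrative assumptions, not measured data.
strains = {"strain_a": (0.7, 10.0), "strain_b": (0.9, 15.0)}

yields = {name: growth_yield(mu, qs) for name, (mu, qs) in strains.items()}

for name, y in sorted(yields.items()):
    print(f"{name}: yield = {y:.3f} gDW/mmol glucose")
```

In this toy case the faster-growing strain has the lower yield because its uptake rate rises proportionally more than its growth rate, which is the kind of tradeoff the abstract describes along its second dimension.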
Subjects
Bacterial Proteins/metabolism , Escherichia coli/metabolism , Evolution, Molecular , Bacterial Proteins/genetics , Escherichia coli/genetics , Genome, Bacterial/genetics , Models, Biological , Proteome/genetics , Proteome/metabolism , Systems Biology
ABSTRACT
Fast metabolite quantification methods are required for high-throughput screening of microbial strains obtained by combinatorial or evolutionary engineering approaches. In this study, a rapid RIP-LC-MS/MS (RapidRIP) method for high-throughput quantitative metabolomics was developed and validated that was capable of quantifying 102 metabolites from central, amino acid, energy, nucleotide, and cofactor metabolism in less than 5 minutes. The method was shown to have sensitivity and resolving capability comparable to those of a full-length RIP-LC-MS/MS method (FullRIP). The RapidRIP method was used to quantify the metabolome of seven industrial strains of E. coli, revealing significant differences in glycolytic, pentose phosphate, TCA cycle, amino acid, and energy and cofactor metabolites. These differences translated to statistically and biologically significant differences in the thermodynamics of biochemical reactions between strains, which could have implications when choosing a host for bioprocessing.
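The link from metabolite concentrations to reaction thermodynamics can be illustrated with the textbook relation ΔG' = ΔG'° + RT ln Q, where Q is the mass-action ratio of products to substrates. The sketch below is a minimal illustration of that relation, not the analysis used in the study. The function name, the standard Gibbs energy, and the concentrations are hypothetical placeholders.

```python
import math

R_KJ_PER_MOL_K = 8.314462618e-3  # molar gas constant in kJ/(mol*K)


def reaction_gibbs_energy(dg0_prime_kj_mol, substrates, products, temperature_k=298.15):
    """Return dG' = dG'0 + R*T*ln(Q) in kJ/mol.

    substrates and products are lists of (concentration in mol/L,
    stoichiometric coefficient) pairs; Q is the mass-action ratio.
    """
    ln_q = sum(n * math.log(c) for c, n in products)
    ln_q -= sum(n * math.log(c) for c, n in substrates)
    return dg0_prime_kj_mol + R_KJ_PER_MOL_K * temperature_k * ln_q


# Hypothetical single-substrate, single-product reaction (assumed values):
# 10 mM substrate, 1 mM product, and a standard Gibbs energy of 0 kJ/mol.
dg = reaction_gibbs_energy(0.0, substrates=[(1e-2, 1)], products=[(1e-3, 1)])
print(f"dG' = {dg:.2f} kJ/mol")
```

A tenfold difference in the product-to-substrate ratio shifts ΔG' by about 5.7 kJ/mol at 25 °C, so strain-to-strain concentration differences of that size can change how far a reaction sits from equilibrium.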
Subjects
Escherichia coli/metabolism , Metabolome , Metabolomics/methods , Chromatography, Liquid/methods , Escherichia coli/genetics , Mass Spectrometry/methods , Species Specificity
ABSTRACT
Aromatic metabolites provide the backbone for numerous industrial and pharmaceutical compounds of high value. The Phosphotransferase System (PTS) is common to many bacteria and is the primary mechanism for glucose uptake by Escherichia coli. The PTS was removed to conserve phosphoenolpyruvate (PEP), a precursor for aromatic metabolites that is consumed by the PTS, for aromatic metabolite production. Replicate adaptive laboratory evolution (ALE) of the PTS-deficient strain, together with the detailed omics data sets collected, revealed that the PTS bridged the gap between respiration and fermentation, leading to distinct high fermentative and high respiratory rate phenotypes. It was also found that while all strains retained high levels of aromatic amino acid (AAA) biosynthetic precursors, only one replicate from the high glycolytic clade retained high levels of intracellular AAAs. The fast growth and high AAA precursor phenotypes could provide a starting host for cell factories targeting the overproduction of aromatic metabolites.
Subjects
Amino Acids, Aromatic , Directed Molecular Evolution , Energy Metabolism , Escherichia coli , Oxygen Consumption , Phosphoenolpyruvate Sugar Phosphotransferase System/genetics , Amino Acids, Aromatic/biosynthesis , Amino Acids, Aromatic/genetics , Escherichia coli/genetics , Escherichia coli/metabolism
ABSTRACT
Methylglyoxal is a highly toxic metabolite that can be produced in all living organisms. Methylglyoxal was artificially elevated by removal of the tpiA gene from a growth optimized Escherichia coli strain. The initial response to elevated methylglyoxal and its toxicity was characterized, and detoxification mechanisms were studied using adaptive laboratory evolution. We found that: 1) Multi-omics analysis revealed biological consequences of methylglyoxal toxicity, which included attack on macromolecules including DNA and RNA and perturbation of nucleotide levels; 2) Counter-intuitive cross-talk between carbon starvation and inorganic phosphate signalling was revealed in the tpiA deletion strain that required mutations in inorganic phosphate signalling mechanisms to alleviate; and 3) The split flux through lower glycolysis depleted glycolytic intermediates requiring a host of synchronized and coordinated mutations in non-intuitive network locations in order to re-adjust the metabolic flux map to achieve optimal growth. Such mutations included a systematic inactivation of the Phosphotransferase System (PTS) and alterations in cell wall biosynthesis enzyme activity. This study demonstrated that deletion of major metabolic genes followed by ALE was a productive approach to gain novel insight into the systems biology underlying optimal phenotypic states.
Subjects
Escherichia coli Proteins , Escherichia coli , Gene Deletion , Glycolysis/genetics , Pyruvaldehyde/metabolism , Triose-Phosphate Isomerase/genetics , Adaptation, Physiological/genetics , Escherichia coli/genetics , Escherichia coli/metabolism , Escherichia coli Proteins/genetics , Escherichia coli Proteins/metabolism
ABSTRACT
A mechanistic understanding of how new phenotypes develop to overcome the loss of a gene product provides valuable insight into both the metabolic and regulatory functions of the lost gene. The pgi gene, whose product catalyzes the second step in glycolysis, was deleted in a growth-optimized Escherichia coli K-12 MG1655 strain. The initial knockout (KO) strain exhibited an 80% drop in growth rate that was largely recovered in eight replicate, but phenotypically distinct, cultures after undergoing adaptive laboratory evolution (ALE). Multi-omic data sets showed that the loss of pgi substantially shifted pathway usage, leading to a redox and sugar phosphate stress response. These stress responses were overcome by unique combinations of innovative mutations selected for by ALE. Thus, the coordinated mechanisms from genome to metabolome that lead to multiple optimal phenotypes after the loss of a major gene product were revealed. IMPORTANCE: How microbes overcome the loss of a gene through regulatory and metabolic changes is not well understood. Eight independent adaptive laboratory evolution (ALE) experiments with pgi knockout strains resulted in eight phenotypically distinct endpoints that were able to overcome the gene loss. Utilizing multi-omics analysis, the coordinated mechanisms from genome to metabolome that lead to multiple optimal phenotypes after the loss of a major gene product were revealed.
Subjects
Escherichia coli K12/enzymology , Escherichia coli K12/genetics , Escherichia coli Proteins/genetics , Glucose-6-Phosphate Isomerase/genetics , Escherichia coli K12/metabolism , Escherichia coli Proteins/metabolism , Gene Knockout Techniques , Glucose-6-Phosphate Isomerase/metabolism , Glycolysis , Mutation , Oxidation-Reduction , Phenotype
ABSTRACT
Absolute quantification of free intracellular metabolites is a valuable tool in both pathway discovery and metabolic engineering. In this study, we conducted a comprehensive examination of different hot and cold combined quenching/extraction approaches to extract and quantify intracellular metabolites of Pseudomonas taiwanensis (P. taiwanensis) VLB120 to provide a useful reference data set of absolute intracellular metabolite concentrations. The suitability of commonly used metabolomics tools including a pressure driven fast filtration system followed by combined quenching/extraction techniques (such as cold methanol/acetonitrile/water, hot water, and boiling ethanol/water, as well as cold ethanol/water) were tested and evaluated for P. taiwanensis VLB120 metabolome analysis. In total 94 out of 107 detected intracellular metabolites were quantified using an isotope-ratio-based approach. The quantified metabolites include amino acids, nucleotides, central carbon metabolism intermediates, redox cofactors, and others. The acquired data demonstrate that the pressure driven fast filtration approach followed by boiling ethanol quenching/extraction is the most adequate technique for P. taiwanensis VLB120 metabolome analysis based on quenching efficiency, extraction yields of metabolites, and experimental reproducibility.
Subjects
Metabolome , Metabolomics/methods , Pseudomonas/chemistry , Solid Phase Extraction/methods , Acetonitriles/chemistry , Cold Temperature , Ethanol/chemistry , Hot Temperature , Methanol/chemistry , Pseudomonas/physiology , Solvents/chemistry , Water/chemistry
ABSTRACT
Metabolic flux analysis (MFA) is considered to be the gold standard for determining the intracellular flux distribution of biological systems. The majority of work using MFA has been limited to core models of metabolism due to challenges in implementing genome-scale MFA and the undesirable trade-off between increased scope and decreased precision in flux estimations. This work presents a tunable workflow for expanding the scope of MFA to the genome-scale without trade-offs in flux precision. The genome-scale MFA model presented here, iDM2014, accounts for 537 net reactions, which include the core pathways of traditional MFA models and also cover the additional pathways of purine, pyrimidine, isoprenoid, methionine, riboflavin, coenzyme A, and folate, as well as other biosynthetic pathways. When iDM2014 was evaluated using a set of measured intracellular intermediate and cofactor mass isotopomer distributions (MIDs), it was found that a total of 232 net fluxes of central and peripheral metabolism could be resolved in the E. coli network. The increase in scope was shown to cover the full biosynthetic route to an expanded set of bioproduction pathways, which should facilitate applications such as the design of more complex bioprocessing strains and aid in identifying new antimicrobials. Importantly, it was found that there was no loss in precision of core fluxes when compared to a traditional core model, and additionally there was an overall increase in precision when considering all observable reactions.
Subjects
Escherichia coli/genetics , Escherichia coli/metabolism , Genome, Bacterial , Metabolic Flux Analysis , Models, Biological , Carbon Isotopes
ABSTRACT
The analytical challenges of acquiring accurate isotopic data of intracellular metabolic intermediates for stationary, nonstationary, and dynamic metabolic flux analysis (MFA) are numerous. This work presents MID Max, a novel LC-MS/MS workflow, acquisition, and isotopomer deconvolution method for MFA that takes advantage of additional scan types to maximize the number of mass isotopomer distributions (MIDs) that can be acquired in a given experiment. The analytical method was found to measure the MIDs of 97 metabolites, corresponding to 74 unique metabolite-fragment pairs (32 precursor spectra and 42 product spectra) with accuracy and precision. The compounds measured included metabolic intermediates in central carbohydrate metabolism and cofactors of peripheral metabolism (e.g., ATP). Using only a subset of the acquired MIDs, the method was found to improve the precision of flux estimations and the number of resolved exchange fluxes for wild-type E. coli compared to traditional methods and previously published data sets.
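For readers unfamiliar with the term, a mass isotopomer distribution is the set of fractional abundances of the M+0, M+1, ..., M+n isotopologue peaks of a metabolite or fragment. The sketch below shows only that basic normalization and the derived fractional labeling. It is not the MID Max deconvolution method, it applies no natural-abundance correction, and the function names and peak intensities are hypothetical.

```python
def mass_isotopomer_distribution(intensities):
    """Normalize raw M+0..M+n peak intensities so the fractions sum to 1."""
    total = sum(intensities)
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return [value / total for value in intensities]


def fractional_enrichment(mid):
    """Average labeled fraction per position: sum(i * m_i) / n."""
    n = len(mid) - 1
    if n < 1:
        raise ValueError("need at least M+0 and M+1")
    return sum(i * fraction for i, fraction in enumerate(mid)) / n


# Hypothetical intensities for a two-carbon fragment (M+0, M+1, M+2).
# These values are illustrative assumptions, not measured data.
mid = mass_isotopomer_distribution([600.0, 300.0, 100.0])
enrichment = fractional_enrichment(mid)
print([round(f, 3) for f in mid], round(enrichment, 3))
```

Flux estimation software fits simulated MIDs of this form to the measured ones, which is why acquiring more MIDs per experiment can tighten the resulting flux estimates.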
Subjects
Adenosine Triphosphate/analysis , Carbohydrate Metabolism , Metabolic Flux Analysis/methods , Tandem Mass Spectrometry/methods , Adenosine Triphosphate/metabolism , Chromatography, High Pressure Liquid , Escherichia coli/chemistry , Escherichia coli/metabolism , Isotope Labeling , Molecular Structure
ABSTRACT
The genome-scale model (GEM) of metabolism in the bacterium Escherichia coli K-12 has been in development for over a decade and is now in wide use. GEM-enabled studies of E. coli have been primarily focused on six applications: (1) metabolic engineering, (2) model-driven discovery, (3) prediction of cellular phenotypes, (4) analysis of biological network properties, (5) studies of evolutionary processes, and (6) models of interspecies interactions. In this review, we provide an overview of these applications along with a critical assessment of their successes and limitations, and a perspective on likely future developments in the field. Taken together, the studies performed over the past decade have established a genome-scale mechanistic understanding of genotype-phenotype relationships in E. coli metabolism that forms the basis for similar efforts for other microbial species. Future challenges include the expansion of GEMs by integrating additional cellular processes beyond metabolism, the identification of key constraints based on emerging data types, and the development of computational methods able to handle such large-scale network models with sufficient accuracy.
Subjects
Escherichia coli K12/genetics , Escherichia coli Proteins/genetics , Gene Expression Regulation, Bacterial , Genome, Bacterial , Metabolic Networks and Pathways/genetics , Models, Genetic , Biological Evolution , Computer Simulation , Escherichia coli K12/metabolism , Escherichia coli Proteins/metabolism , Genetic Association Studies , Genotype , Metabolic Engineering/methods , Phenotype
ABSTRACT
The advent of model-enabled workflows in systems biology allows for the integration of experimental data types with genome-scale models to discover new features of biology. This work demonstrates such a workflow, aimed at establishing a metabolomics platform applied to study the differences in metabolomes between anaerobic and aerobic growth of Escherichia coli. Constraint-based modeling was utilized to deduce a target list of compounds for downstream method development. An analytical and experimental methodology was developed and tailored to the compound chemistry and growth conditions of interest. This included the construction of a rapid sampling apparatus for use with anaerobic cultures. The resulting genome-scale data sets for anaerobic and aerobic growth were validated by comparison to previous small-scale studies comparing growth of E. coli under the same conditions. The metabolomics data were then integrated with the E. coli genome-scale metabolic model (GEM) via a sensitivity analysis that utilized reaction thermodynamics to reconcile simulated growth rates and reaction directionalities. This analysis highlighted several optimal network usage inconsistencies, including the incorrect use of the beta-oxidation pathway for synthesis of fatty acids. This analysis also identified enzyme promiscuity for the pykA gene, which is critical for anaerobic growth and has not previously been incorporated into metabolic models of E. coli.
Subjects
Escherichia coli/metabolism , Metabolomics/methods , Models, Biological , Aerobiosis/physiology , Anaerobiosis/physiology , Bioengineering , Bioreactors/microbiology , Metabolic Networks and Pathways/physiology , Thermodynamics
ABSTRACT
Motivation: INCA is a powerful tool for metabolic flux analysis; however, import and export of data and results can be tedious and limit the use of INCA in automated workflows. Results: The INCAWrapper enables the use of INCA purely through Python, which allows the use of INCA in common data science workflows. Availability and implementation: The INCAWrapper is implemented in Python and can be found at https://github.com/biosustain/incawrapper. It is freely available under an MIT License. To run INCA, the user needs their own MATLAB and INCA licenses. INCA is freely available for noncommercial use at mfa.vueinnovations.com.
ABSTRACT
Generative modeling and representation learning of tandem mass spectrometry data aim to learn an interpretable and instrument-agnostic digital representation of metabolites directly from MS/MS spectra. Interpretable and instrument-agnostic digital representations would facilitate comparisons of MS/MS spectra between instrument vendors and enable better and more accurate queries of large MS/MS spectra databases for metabolite identification. In this study, we apply generative modeling and representation learning using variational autoencoders to understand the extent to which tandem mass spectra can be disentangled into their factors of generation (e.g., collision energy, ionization mode, instrument type, etc.) with minimal prior knowledge of the factors. We find that variational autoencoders can disentangle tandem mass spectra data with the proper choice of hyperparameters into meaningful latent representations aligned with known factors of variation. We develop a two-step approach to facilitate the selection of models that are disentangled, which could be applied to other complex and high-dimensional data sets.
Subjects
Learning , Tandem Mass Spectrometry , Databases, Factual
ABSTRACT
Automation is playing an increasingly significant role in synthetic biology. Groundbreaking technologies, developed over the past 20 years, have enormously accelerated the construction of efficient microbial cell factories. Integrating state-of-the-art tools (e.g. for genome engineering and analytical techniques) into the design-build-test-learn cycle (DBTLc) will shift the metabolic engineering paradigm from an almost artisanal labor towards a fully automated workflow. Here, we provide a perspective on how a fully automated DBTLc could be harnessed to construct the next-generation bacterial cell factories in a fast, high-throughput fashion. Innovative toolsets and approaches that pushed the boundaries in each segment of the cycle are reviewed to this end. We also present the most recent efforts on automation of the DBTLc, which heralds a fully autonomous pipeline for synthetic biology in the near future.
Subjects
Metabolic Engineering , Synthetic Biology , Metabolic Engineering/methods
ABSTRACT
Multi-omics datasets are becoming of key importance for driving discovery in fundamental research as well as for generating knowledge for applied biotechnology. However, the construction of such large datasets is usually time-consuming and expensive. Automation might overcome these issues by streamlining workflows from sample generation to data analysis. Here, we describe the construction of a complex workflow for the generation of high-throughput microbial multi-omics datasets. The workflow comprises a custom-built platform for automated cultivation and sampling of microbes, sample preparation protocols, analytical methods for sample analysis and automated scripts for raw data processing. We demonstrate the possibilities and limitations of such a workflow in generating data for three biotechnologically relevant model organisms, namely Escherichia coli, Saccharomyces cerevisiae, and Pseudomonas putida.