ABSTRACT
Metabolism is deeply intertwined with aging. Effects of metabolic interventions on aging have been explained with intracellular metabolism, growth control, and signaling. Studying chronological aging in yeast, we reveal a so far overlooked metabolic property that influences aging via the exchange of metabolites. We observed that metabolites exported by young cells are re-imported by chronologically aging cells, resulting in cross-generational metabolic interactions. Then, we used self-establishing metabolically cooperating communities (SeMeCo) as a tool to increase metabolite exchange and observed significant lifespan extensions. The longevity of the SeMeCo was attributable to metabolic reconfigurations in methionine consumer cells. These obtained a more glycolytic metabolism and increased the export of protective metabolites that in turn extended the lifespan of cells that supplied them with methionine. Our results establish metabolite exchange interactions as a determinant of cellular aging and show that metabolically cooperating cells can shape the metabolic environment to extend their lifespan.
Subject(s)
Longevity , Saccharomyces cerevisiae , Saccharomyces cerevisiae/metabolism , Methionine/metabolism , Signal Transduction
ABSTRACT
Genome-metabolism interactions enable cell growth. To probe the extent of these interactions and delineate their functional contributions, we quantified the Saccharomyces amino acid metabolome and its response to systematic gene deletion. Over one-third of coding genes, in particular those important for chromatin dynamics, translation, and transport, contribute to biosynthetic metabolism. Specific amino acid signatures characterize genes of similar function. This enabled us to exploit functional metabolomics to connect metabolic regulators to their effectors, as exemplified by TORC1, whose inhibition in exponentially growing cells is shown to match an interruption in endomembrane transport. Providing orthogonal information compared to physical and genetic interaction networks, metabolomic signatures cluster more than half of the so far uncharacterized yeast genes and provide functional annotation for them. A major part of coding genes is therefore participating in gene-metabolism interactions that expose the metabolism regulatory network and enable access to an underexplored space in gene function.
Subject(s)
Amino Acids/biosynthesis , Metabolome , Saccharomyces cerevisiae Proteins/metabolism , Saccharomyces cerevisiae/metabolism , Transcription Factors/metabolism , Amino Acids/genetics , Chromatin/metabolism , Gene Deletion , Gene Expression Regulation, Fungal , Gene Regulatory Networks , Metabolome/genetics , Metabolomics/methods , Multigene Family , Phosphatidylinositol 3-Kinases/genetics , Phosphatidylinositol 3-Kinases/metabolism , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae Proteins/genetics , Transcription Factors/genetics , Transcription, Genetic
ABSTRACT
Metagenomic analyses of microbial communities have revealed a large degree of interspecies and intraspecies genetic diversity through the reconstruction of metagenome assembled genomes (MAGs). Yet, metabolic modeling efforts mainly rely on reference genomes as the starting point for reconstruction and simulation of genome scale metabolic models (GEMs), neglecting the immense intra- and inter-species diversity present in microbial communities. Here, we present metaGEM (https://github.com/franciscozorrilla/metaGEM), an end-to-end pipeline enabling metabolic modeling of multi-species communities directly from metagenomes. The pipeline automates all steps from the extraction of context-specific prokaryotic GEMs from MAGs to community level flux balance analysis (FBA) simulations. To demonstrate the capabilities of metaGEM, we analyzed 483 samples spanning lab culture, human gut, plant-associated, soil, and ocean metagenomes, reconstructing over 14,000 GEMs. We show that GEMs reconstructed from metagenomes have fully represented metabolism comparable to isolated genomes. We demonstrate that metagenomic GEMs capture intraspecies metabolic diversity and identify potential differences in the progression of type 2 diabetes at the level of gut bacterial metabolic exchanges. Overall, metaGEM enables FBA-ready metabolic model reconstruction directly from metagenomes, provides a resource of metabolic models, and showcases community-level modeling of microbiomes associated with disease conditions allowing generation of mechanistic hypotheses.
Subject(s)
Databases, Genetic , Gastrointestinal Microbiome/genetics , Metagenome , Plants/genetics , Humans , Soil Microbiology
ABSTRACT
Protein quantification via label-free mass spectrometry (MS) has become an increasingly popular method for predicting genome-wide absolute protein abundances. A known caveat of this approach, however, is the poor technical reproducibility, that is, how consistent predictions are when the same sample is measured repeatedly. Here, we measured proteomics data for Saccharomyces cerevisiae with both biological and inter-batch technical triplicates, to analyze both accuracy and precision of protein quantification via MS. Moreover, we analyzed how these metrics vary when applying different methods for converting MS intensities to absolute protein abundances. We demonstrate that our simple normalization and rescaling approach can perform as accurately, yet more precisely, than methods which rely on external standards. Additionally, we show that inter-batch reproducibility is worse than biological reproducibility for all evaluated methods. These results offer a new benchmark for assessing MS data quality for protein quantification, while also underscoring current limitations in this approach.
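The abstract does not spell out the normalization and rescaling procedure, so the following is only a minimal sketch of one common reading: total-sum scaling of intensities to a known total protein amount, with the coefficient of variation across replicates as a precision measure. The function names, the `total_protein_fg` parameter, and the choice of total-sum scaling are assumptions for illustration, not the authors' method.

```python
from statistics import mean, stdev


def rescale_to_abundance(intensities, total_protein_fg):
    """Scale raw MS intensities so they sum to a known total protein amount.

    Assumption: each protein's share of the summed intensity approximates
    its share of total protein mass (hypothetical simplification).
    """
    total = sum(intensities.values())
    if total <= 0:
        raise ValueError("intensities must sum to a positive value")
    return {protein: value / total * total_protein_fg
            for protein, value in intensities.items()}


def coefficient_of_variation(values):
    """Precision across replicate measurements: sample SD divided by mean."""
    return stdev(values) / mean(values)
```

A lower coefficient of variation across technical replicates would indicate better precision; accuracy would need a separate comparison against reference abundances.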
Subject(s)
Benchmarking , Saccharomyces cerevisiae , Proteome , Proteomics , Reproducibility of Results
ABSTRACT
OBJECTIVE: The first antiseizure medication (ASM) is ineffective or intolerable in 50% of epilepsy cases. Selection between more than 25 available ASMs is guided by epilepsy factors, but also age and comorbidities. Randomized evidence for particular patient subgroups is seldom available. We asked whether register data could be used for retention rate calculations based on demographics, comorbidities, and ASM history, and quantified the potential improvement in retention rates of the first ASM in several large epilepsy cohorts. We also describe retention rates in patients with epilepsy after traumatic brain injury and dementia, patient groups with little available evidence. METHODS: We used medical, demographic, and drug prescription data from epilepsy cohorts from comprehensive Swedish registers, containing 6380 observations. By analyzing 381 840 prescriptions, we studied retention rates of first- and second-line ASMs for patients with epilepsy in multiple sclerosis (MS), brain infection, dementia, traumatic brain injury, or stroke. The rank of retention rates of ASMs was validated by comparison to published randomized controlled trials. We identified the optimal stratification for each brain disease, and quantified the potential improvement if all patients had received the optimal ASM. RESULTS: Using optimal stratification for each brain disease, the potential improvement in retention rate (percentage points) was MS, 20%; brain infection, 21%; dementia, 14%; trauma, 21%; and stroke, 14%. In epilepsy after trauma, levetiracetam had the highest retention rate at 80% (95% confidence interval [CI] = 65-89), exceeding that of the most commonly prescribed ASM, carbamazepine (p = .04). In epilepsy after dementia, lamotrigine (77%, 95% CI = 68-84) and levetiracetam (74%, 95% CI = 68-79) had higher retention rates than carbamazepine (p = .006 and p = .01, respectively).
SIGNIFICANCE: We conclude that personalized ASM selection could improve retention rates and that national registers have potential as big data sources for personalized medicine in epilepsy.
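The "potential improvement" figure can be illustrated with a minimal sketch, assuming it is the gap between the observed retention rate and the rate that would result if every patient in a stratum had received that stratum's best-retained ASM. The record layout `(stratum, drug, retained)` and both function names are hypothetical; the study's actual definitions, stratification search, and handling of small groups are not given in the abstract.

```python
from collections import defaultdict


def retention_by_drug(records):
    """Retention rate per (stratum, drug) from (stratum, drug, retained) records."""
    counts = defaultdict(lambda: [0, 0])  # (stratum, drug) -> [retained, total]
    for stratum, drug, retained in records:
        entry = counts[(stratum, drug)]
        entry[0] += int(retained)
        entry[1] += 1
    return {key: kept / total for key, (kept, total) in counts.items()}


def potential_improvement(records):
    """Percentage-point gain if each patient had received the best ASM of their stratum."""
    records = list(records)
    rates = retention_by_drug(records)
    observed = sum(int(retained) for _, _, retained in records) / len(records)
    best = {}
    for (stratum, _drug), rate in rates.items():
        best[stratum] = max(best.get(stratum, 0.0), rate)
    optimal = sum(best[stratum] for stratum, _, _ in records) / len(records)
    return (optimal - observed) * 100.0
```

With small strata, picking the best observed drug tends to overestimate the achievable gain, so a real analysis would need confidence intervals or a held-out validation step.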
Subject(s)
Brain Injuries, Traumatic , Dementia , Epilepsy , Stroke , Anticonvulsants/therapeutic use , Brain Injuries, Traumatic/drug therapy , Carbamazepine/therapeutic use , Epilepsy/drug therapy , Epilepsy/epidemiology , Humans , Levetiracetam/therapeutic use , Registries , Stroke/drug therapy
ABSTRACT
Microbial communities populate most environments on earth and play a critical role in ecology and human health. Their composition is thought to be largely shaped by interspecies competition for the available resources, but cooperative interactions, such as metabolite exchanges, have also been implicated in community assembly. The prevalence of metabolic interactions in microbial communities, however, has remained largely unknown. Here, we systematically survey, by using a genome-scale metabolic modeling approach, the extent of resource competition and metabolic exchanges in over 800 communities. We find that, despite marked resource competition at the level of whole assemblies, microbial communities harbor metabolically interdependent groups that recur across diverse habitats. By enumerating flux-balanced metabolic exchanges in these co-occurring subcommunities we also predict the likely exchanged metabolites, such as amino acids and sugars, that can promote group survival under nutritionally challenging conditions. Our results highlight metabolic dependencies as a major driver of species co-occurrence and hint at cooperative groups as recurring modules of microbial community architecture.
Subject(s)
Metabolic Networks and Pathways/physiology , Microbial Consortia/physiology , Microbial Interactions/physiology , Models, Biological , Symbiosis , Microbial Consortia/genetics , Phylogeny , Species Specificity , Statistics, Nonparametric
ABSTRACT
One of the primary mechanisms through which a cell exerts control over its metabolic state is by modulating expression levels of its enzyme-coding genes. However, the changes at the level of enzyme expression allow only indirect control over metabolite levels, for two main reasons. First, at the level of individual reactions, metabolite levels are non-linearly dependent on enzyme abundances as per the reaction kinetics mechanisms. Secondly, specific metabolite pools are tightly interlinked with the rest of the metabolic network through their production and consumption reactions. While the role of reaction kinetics in metabolite concentration control is well studied at the level of individual reactions, the contribution of network connectivity has remained relatively unclear. Here we report a modeling framework that integrates both reaction kinetics and network connectivity constraints for describing the interplay between metabolite concentrations and mRNA levels. We used this framework to investigate correlations between the gene expression and the metabolite concentration changes in Saccharomyces cerevisiae during its metabolic cycle, as well as in response to three fundamentally different biological perturbations, namely gene knockout, nutrient shock and nutrient change. While the kinetic constraints applied at the level of individual reactions were found to be poor descriptors of the mRNA-metabolite relationship, their use in the context of the network enabled us to correlate changes in the expression of enzyme-coding genes to the alterations in metabolite levels. Our results highlight the key contribution of metabolic network connectivity in mediating cellular control over metabolite levels, and have implications towards bridging the gap between genotype and metabolic phenotype.
Subject(s)
Gene Expression , Metabolic Networks and Pathways/genetics , Kinetics
ABSTRACT
The main forces driving protein complex evolution are currently not well understood, especially in homomers, where quaternary structure might frequently evolve neutrally. Here we examine the factors determining oligomerisation by analysing the evolution of enzymes in circumstances where homomers rarely evolve. We show that 1) in extracellular environments, most enzymes with known structure are monomers, while in the cytoplasm most are homomers, indicating that the evolution of oligomers is cellular environment dependent; 2) the evolution of quaternary structure within protein orthogroups is more consistent with the predictions of constructive neutral evolution than with an adaptive process: quaternary structure is gained more easily than it is lost, and most extracellular monomers evolved from proteins that were monomers also in their ancestral state, without the loss of interfaces. Our results indicate that oligomerisation is context-dependent, and even when adaptive, in many cases it is probably not driven by the intrinsic properties of enzymes, like their biochemical function, but rather by the properties of the environment where the enzyme is active. These factors might be macromolecular crowding and excluded volume effects facilitating the evolution of interfaces, and the maintenance of cellular homeostasis through shaping cytoplasm fluidity, protein degradation, or diffusion rates.
Subject(s)
Cytoplasm , Enzymes , Evolution, Molecular , Protein Structure, Quaternary , Enzymes/chemistry , Enzymes/metabolism , Enzymes/genetics , Cytoplasm/metabolism , Protein Multimerization
ABSTRACT
In recent years, generative protein sequence models have been developed to sample novel sequences. However, predicting whether generated proteins will fold and function remains challenging. We evaluate a set of 20 diverse computational metrics to assess the quality of enzyme sequences produced by three contrasting generative models: ancestral sequence reconstruction, a generative adversarial network and a protein language model. Focusing on two enzyme families, we expressed and purified over 500 natural and generated sequences with 70-90% identity to the most similar natural sequences to benchmark computational metrics for predicting in vitro enzyme activity. Over three rounds of experiments, we developed a computational filter that improved the rate of experimental success by 50-150%. The proposed metrics and models will drive protein engineering research by serving as a benchmark for generative protein sequence models and helping to select active variants for experimental testing.
ABSTRACT
Despite the current wealth of sequencing data, one-third of all biochemically characterized metabolic enzymes lack a corresponding gene or protein sequence, and as such can be considered orphan enzymes. They represent a major gap between our molecular and biochemical knowledge, and consequently are not amenable to modern systemic analyses. As 555 of these orphan enzymes have metabolic pathway neighbours, we developed a global framework that utilizes the pathway and (meta)genomic neighbour information to assign candidate sequences to orphan enzymes. For 131 orphan enzymes (37% of those for which (meta)genomic neighbours are available), we associate sequences to them using scoring parameters with an estimated accuracy of 70%, implying functional annotation of 16,345 gene sequences in numerous (meta)genomes. As a case in point, two of these candidate sequences were experimentally validated to encode the predicted activity. In addition, we augmented the currently available genome-scale metabolic models with these new sequence-function associations and were able to expand the models by on average 8%, with a considerable change in the flux connectivity patterns and improved essentiality prediction.
Subject(s)
Enzymes/genetics , Metagenome/genetics , Metagenomics/methods , Chromosome Mapping , Databases, Genetic , Enzymes/metabolism , Humans , Metabolic Networks and Pathways , Models, Biological , Sequence Analysis, DNA , Systems Biology
ABSTRACT
CONTEXT: Humans respond profoundly to changes in diet, while nutrition and environment have a great impact on population health. It is therefore important to deeply characterize the human nutritional responses. OBJECTIVE: Endocrine parameters and the metabolome of human plasma are rapidly responding to acute nutritional interventions such as caloric restriction or a glucose challenge. It is less well understood whether the plasma proteome would be equally dynamic, and whether it could be a source of corresponding biomarkers. METHODS: We used high-throughput mass spectrometry to determine changes in the plasma proteome of i) 10 healthy, young, male individuals in response to 2 days of acute caloric restriction followed by refeeding; ii) 200 individuals of the Ely epidemiological study before and after a glucose tolerance test at 4 time points (0, 30, 60, 120 minutes); and iii) 200 random individuals from the Generation Scotland study. We compared the proteomic changes detected with metabolome data and endocrine parameters. RESULTS: Both caloric restriction and the glucose challenge substantially impacted the plasma proteome. Proteins responded across individuals or in an individual-specific manner. We identified nutrient-responsive plasma proteins that correlate with changes in the metabolome, as well as with endocrine parameters. In particular, our study highlights the role of apolipoprotein C1 (APOC1), a small, understudied apolipoprotein that was affected by caloric restriction and dominated the response to glucose consumption and differed in abundance between individuals with and without type 2 diabetes. CONCLUSION: Our study identifies APOC1 as a dominant nutritional responder in humans and highlights the interdependency of acute nutritional response proteins and the endocrine system.
Subject(s)
Diabetes Mellitus, Type 2 , Proteome , Humans , Male , Proteomics , Glucose , Caloric Restriction
ABSTRACT
Advanced machine learning (ML) algorithms produce highly accurate models of gene expression, uncovering novel regulatory features in nucleotide sequences involving multiple cis-regulatory regions across whole genes and structural properties. These broaden our understanding of gene regulation and point to new principles to test and adopt in the field of plant science.
Subject(s)
Gene Expression Regulation, Plant , Genes, Plant , Gene Expression Regulation, Plant/genetics , Machine Learning , Algorithms , Regulatory Sequences, Nucleic Acid
ABSTRACT
Temperature is a fundamental environmental factor that shapes the evolution of organisms. Learning thermal determinants of protein sequences in evolution thus has profound significance for basic biology, drug discovery, and protein engineering. Here, we use a data set of over 3 million BRENDA enzymes labeled with optimal growth temperatures (OGTs) of their source organisms to train a deep neural network model (DeepET). The protein-temperature representations learned by DeepET provide a temperature-related statistical summary of protein sequences and capture structural properties that affect thermal stability. For prediction of enzyme optimal catalytic temperatures and protein melting temperatures via a transfer learning approach, our DeepET model outperforms classical regression models trained on rationally designed features and other deep-learning-based representations. DeepET thus holds promise for understanding enzyme thermal adaptation and guiding the engineering of thermostable enzymes.
Subject(s)
Protein Engineering , Proteins , Enzyme Stability , Proteins/chemistry , Amino Acid Sequence , Temperature
ABSTRACT
The use of renewable plant biomass, lignocellulose, to produce biofuels and biochemicals using microbial cell factories plays a fundamental role in the future bioeconomy. The development of cell factories capable of efficiently fermenting complex biomass streams will improve the cost-effectiveness of microbial conversion processes. At present, inhibitory compounds found in hydrolysates of lignocellulosic biomass substantially influence the performance of a cell factory and the economic feasibility of lignocellulosic biofuels and chemicals. Here, we present and statistically analyze data on Saccharomyces cerevisiae mutants engineered for altered tolerance towards the most common inhibitors found in lignocellulosic hydrolysates: acetic acid, formic acid, furans, and phenolic compounds. We collected data from 7971 experiments including single overexpression or deletion of 3955 unique genes. The mutants included in the analysis had been shown to display increased or decreased tolerance to individual inhibitors or combinations of inhibitors found in lignocellulosic hydrolysates. Moreover, the data included mutants grown on synthetic hydrolysates, in which inhibitors were added at concentrations that mimicked those of lignocellulosic hydrolysates. Genetic engineering aimed at improving inhibitor or hydrolysate tolerance was shown to alter the specific growth rate or length of the lag phase, cell viability, and vitality, block fermentation, and decrease product yield. Different aspects of strain engineering aimed at improving hydrolysate tolerance, such as choice of strain and experimental set-up are discussed and put in relation to their biological relevance. While successful genetic engineering is often strain and condition dependent, we highlight the conserved role of regulators, transporters, and detoxifying enzymes in inhibitor tolerance. 
The compiled meta-analysis can guide future engineering attempts and aid the development of more efficient cell factories for the conversion of lignocellulosic biomass.
Subject(s)
Biofuels , Saccharomyces cerevisiae , Biomass , Data Mining , Fermentation , Lignin/metabolism , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism
ABSTRACT
Design of de novo synthetic regulatory DNA is a promising avenue to control gene expression in biotechnology and medicine. Using mutagenesis typically requires screening sizable random DNA libraries, which limits the designs to span merely a short section of the promoter and restricts their control of gene expression. Here, we prototype a deep learning strategy based on generative adversarial networks (GAN) by learning directly from genomic and transcriptomic data. Our ExpressionGAN can traverse the entire regulatory sequence-expression landscape in a gene-specific manner, generating regulatory DNA with prespecified target mRNA levels spanning the whole gene regulatory structure including coding and adjacent non-coding regions. Despite high sequence divergence from natural DNA, in vivo measurements show that 57% of the highly-expressed synthetic sequences surpass the expression levels of highly-expressed natural controls. This demonstrates the applicability and relevance of deep generative design to expand our knowledge and control of gene expression regulation in any desired organism, condition or tissue.
Subject(s)
Genome , Genomics , DNA/genetics , Gene Expression , Gene Expression Regulation
ABSTRACT
Recombinant protein production can cause severe stress on cellular metabolism, resulting in limited titer and product quality. To investigate cellular and metabolic characteristics associated with these limitations, we compare HEK293 clones producing either erythropoietin (EPO) (secretory) or GFP (non-secretory) protein at different rates. Transcriptomic and functional analyses indicate significantly higher metabolism and oxidative phosphorylation in EPO producers compared with parental and GFP cells. In addition, ribosomal genes exhibit specific expression patterns depending on the recombinant protein and the production rate. In a clone displaying a dramatically increased EPO secretion, we detect higher gene expression related to negative regulation of endoplasmic reticulum (ER) stress, including upregulation of ATF6B, which aids EPO production in a subset of clones by overexpression or small interfering RNA (siRNA) knockdown. Our results offer potential target pathways and genes for further development of the secretory power in mammalian cell factories.
Subject(s)
Endoplasmic Reticulum Stress , Erythropoietin , Animals , Endoplasmic Reticulum Stress/physiology , Erythropoietin/genetics , Erythropoietin/metabolism , HEK293 Cells/metabolism , Humans , Mammals/metabolism , Protein Transport , Recombinant Proteins/metabolism
ABSTRACT
Global healthcare systems are challenged by the COVID-19 pandemic. There is a need to optimize allocation of treatment and resources in intensive care, as clinically established risk assessments such as SOFA and APACHE II scores show only limited performance for predicting the survival of severely ill COVID-19 patients. Additional tools are also needed to monitor treatment, including experimental therapies in clinical trials. Comprehensively capturing human physiology, we speculated that proteomics in combination with new data-driven analysis strategies could produce a new generation of prognostic discriminators. We studied two independent cohorts of patients with severe COVID-19 who required intensive care and invasive mechanical ventilation. SOFA score, Charlson comorbidity index, and APACHE II score showed limited performance in predicting the COVID-19 outcome. Instead, the quantification of 321 plasma protein groups at 349 timepoints in 50 critically ill patients receiving invasive mechanical ventilation revealed 14 proteins that showed trajectories different between survivors and non-survivors. A predictor trained on proteomic measurements obtained at the first time point at maximum treatment level (i.e. WHO grade 7), which was weeks before the outcome, achieved accurate classification of survivors (AUROC 0.81). We tested the established predictor on an independent validation cohort (AUROC 1.0). The majority of proteins with high relevance in the prediction model belong to the coagulation system and complement cascade. Our study demonstrates that plasma proteomics can give rise to prognostic predictors substantially outperforming current prognostic markers in intensive care.
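For readers unfamiliar with the AUROC values quoted above, the following minimal sketch computes the statistic from its pairwise definition: the probability that a randomly chosen positive case receives a higher predictor score than a randomly chosen negative case, with ties counted as one half. The function name and the score lists are hypothetical; this is not the study's evaluation code.

```python
def auroc(positive_scores, negative_scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form)."""
    if not positive_scores or not negative_scores:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for pos in positive_scores:
        for neg in negative_scores:
            if pos > neg:
                wins += 1.0   # positive case ranked above negative case
            elif pos == neg:
                wins += 0.5   # ties count as half a win
    return wins / (len(positive_scores) * len(negative_scores))
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two outcome groups; with small validation cohorts, a value of 1.0 still carries wide uncertainty.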
ABSTRACT
Type 2 diabetes mellitus (T2DM) is a disorder characterized by both insulin resistance and impaired insulin secretion. Recent transcriptomics studies related to T2DM have revealed changes in expression of a large number of metabolic genes in a variety of tissues. Identification of the molecular mechanisms underlying these transcriptional changes and their impact on the cellular metabolic phenotype is a challenging task due to the complexity of transcriptional regulation and the highly interconnected nature of the metabolic network. In this study we integrate skeletal muscle gene expression datasets with human metabolic network reconstructions to identify key metabolic regulatory features of T2DM. These features include reporter metabolites--metabolites with significant collective transcriptional response in the associated enzyme-coding genes, and transcription factors with significant enrichment of binding sites in the promoter regions of these genes. In addition to metabolites from TCA cycle, oxidative phosphorylation, and lipid metabolism (known to be associated with T2DM), we identified several reporter metabolites representing novel biomarker candidates. For example, the highly connected metabolites NAD+/NADH and ATP/ADP were also identified as reporter metabolites that are potentially contributing to the widespread gene expression changes observed in T2DM. An algorithm based on the analysis of the promoter regions of the genes associated with reporter metabolites revealed a transcription factor regulatory network connecting several parts of metabolism. The identified transcription factors include members of the CREB, NRF1 and PPAR family, among others, and represent regulatory targets for further experimental analysis. Overall, our results provide a holistic picture of key metabolic and regulatory nodes potentially involved in the pathogenesis of T2DM.
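The idea of a "collective transcriptional response" around a metabolite can be sketched as an aggregated Z-score: each neighbouring enzyme-coding gene's differential-expression p-value is converted to a Z-score with the inverse normal CDF, and the scores are summed and divided by the square root of their count. This is a simplified sketch under stated assumptions; the function name is hypothetical, and the background correction against randomly sampled gene sets of the same size is omitted.

```python
from math import sqrt
from statistics import NormalDist


def reporter_score(p_values):
    """Aggregate the p-values of a metabolite's neighbouring genes into one Z-score.

    Assumption: p-values lie strictly between 0 and 1; values of exactly
    0 or 1 make the inverse normal CDF undefined and raise an error.
    """
    if not p_values:
        raise ValueError("at least one p-value is required")
    standard_normal = NormalDist()
    z_scores = [standard_normal.inv_cdf(1.0 - p) for p in p_values]
    return sum(z_scores) / sqrt(len(z_scores))
```

Metabolites whose neighbouring genes consistently show small p-values receive high scores, which is why highly connected metabolites can emerge as reporters only when many of their associated genes change together.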
Subject(s)
Diabetes Mellitus, Type 2/genetics , Diabetes Mellitus, Type 2/metabolism , Gene Expression Regulation , Metabolic Networks and Pathways , Algorithms , Cluster Analysis , Databases, Genetic , Gene Expression Profiling , Gene Regulatory Networks , Humans , Male , Metabolome , Models, Biological , Muscle, Skeletal , Oligonucleotide Array Sequence Analysis , Regulatory Elements, Transcriptional , Transcription, Genetic
ABSTRACT
Data-driven machine learning is the method of choice for predicting molecular phenotypes from nucleotide sequence, modeling gene expression events including protein-DNA binding, chromatin states as well as mRNA and protein levels. Deep neural networks automatically learn informative sequence representations and interpreting them enables us to improve our understanding of the regulatory code governing gene expression. Here, we review the latest developments that apply shallow or deep learning to quantify molecular phenotypes and decode the cis-regulatory grammar from prokaryotic and eukaryotic sequencing data. Our approach is to build from the ground up, first focusing on the initiating protein-DNA interactions, then specific coding and non-coding regions, and finally on advances that combine multiple parts of the gene and mRNA regulatory structures, achieving unprecedented performance. We thus provide a quantitative view of gene expression regulation from nucleotide sequence, concluding with an information-centric overview of the central dogma of molecular biology.