ABSTRACT
Intragenic translational heterogeneity describes the variation in translation among the transcripts of an individual gene. One factor that contributes to this source of variation is mRNA structure. Both the composition of the thermodynamic ensemble, i.e., the stationary distribution of mRNA structures, and the switching dynamics between these structures play a role. The effect of the switching dynamics on intragenic translational heterogeneity remains poorly understood. We present a stochastic translation model that accounts for mRNA structure switching and is derived from a Markov model via approximate stochastic filtering. We assess the approximation on various timescales and provide a method to quantify how mRNA structure dynamics contribute to translational heterogeneity. Our approach allows quantitative information on mRNA switching, obtained from biophysical experiments or coarse-grained molecular dynamics simulations of mRNA structures, to be included in gene regulatory chemical reaction network models without increasing the number of species. Thereby, our model bridges a gap between mRNA structure kinetics and gene expression models, which we hope will further improve our understanding of gene regulatory networks and facilitate genetic circuit design.
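To illustrate why the switching dynamics matter beyond the stationary ensemble, here is a minimal sketch under an assumed toy model (not the filtered model of the paper): a single mRNA switches between a hypothetical "closed" and "open" structure with rates a and b, protein is produced at a structure-dependent rate, and each protein is degraded at rate gamma. Treating production as a Poisson process with a randomly switching rate gives a stationary variance equal to the mean plus an excess term that depends on the switching speed a + b. All class and parameter names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class TwoStateTranslation:
    """Hypothetical toy model of translation from one mRNA that switches
    between two structures; not the model derived in the paper."""
    a: float         # switching rate closed -> open
    b: float         # switching rate open -> closed
    k_closed: float  # translation rate while closed
    k_open: float    # translation rate while open
    gamma: float     # protein degradation rate per molecule

    def p_open(self) -> float:
        # Stationary probability of the open structure (the "ensemble")
        return self.a / (self.a + self.b)

    def mean(self) -> float:
        p = self.p_open()
        return ((1 - p) * self.k_closed + p * self.k_open) / self.gamma

    def variance(self) -> float:
        # Poisson part (= mean) plus an excess from structure switching;
        # the excess shrinks as switching (a + b) becomes fast relative to gamma.
        p = self.p_open()
        excess = (self.k_open - self.k_closed) ** 2 * p * (1 - p) / (
            self.gamma * (self.gamma + self.a + self.b))
        return self.mean() + excess


fast = TwoStateTranslation(a=10.0, b=10.0, k_closed=0.0, k_open=20.0, gamma=1.0)
slow = TwoStateTranslation(a=0.1, b=0.1, k_closed=0.0, k_open=20.0, gamma=1.0)
print(fast.mean(), slow.mean())          # same ensemble, same mean: 10.0 10.0
print(fast.variance(), slow.variance())  # about 14.76 versus about 93.33
```

Both settings share the same stationary ensemble (p_open = 0.5) and the same mean, yet slow switching produces a much broader protein distribution.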
Subjects
Gene Regulatory Networks, Genetic Models, Messenger RNA/genetics, Stochastic Processes
ABSTRACT
Molecular reactions within a cell are inherently stochastic, and cells often differ in morphological properties or interact with a heterogeneous environment. Consequently, cell populations exhibit heterogeneity due to both these intrinsic and extrinsic causes. Although state-of-the-art studies that aim to dissect this heterogeneity use single-cell measurements, bulk data, which show only mean expression levels, are still in routine use. The fingerprint of the heterogeneity is also present in bulk data, even though it is hidden from direct measurement. In particular, this heterogeneity can affect mean expression levels via bimolecular interactions with low-abundance environment species. We make this statement rigorous for the class of linear reaction systems that are embedded in a discrete-state Markov environment. The analytic expression that we provide for the stationary mean depends on the reaction rate constants of the linear subsystem, as well as on the generator and stationary distribution of the Markov environment. We demonstrate the effect of the environment on the stationary mean; namely, we show how the heterogeneous case deviates from the quasi-steady-state (QSS) case when the embedded system is fast compared to the environment.
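A minimal sketch of how such a stationary mean can be computed, assuming a toy instance of the setting: one species X is produced at rate birth[z] and degraded at rate death[z]·x while the environment Z, a finite Markov chain with generator Q, is in state z. The partial means m[z] = E[X; Z = z] then solve the linear system (Q^T - diag(death)) m = -(birth · pi), and E[X] is their sum. This derivation and all function names are our own illustration, not the paper's general formula.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def stationary(Q):
    """Stationary distribution pi of generator Q (pi Q = 0, sum(pi) = 1)."""
    n = len(Q)
    A = [[Q[j][i] for j in range(n)] for i in range(n)]  # A = Q^T
    A[-1] = [1.0] * n                                    # normalisation row
    return solve(A, [0.0] * (n - 1) + [1.0])


def stationary_mean(Q, birth, death):
    """E[X] for production at rate birth[z] and degradation at rate death[z] * x."""
    n = len(Q)
    pi = stationary(Q)
    A = [[Q[j][i] - (death[i] if i == j else 0.0) for j in range(n)] for i in range(n)]
    m = solve(A, [-birth[z] * pi[z] for z in range(n)])
    return sum(m)


def Q(s):
    # Symmetric two-state environment switching at rate s
    return [[-s, s], [s, -s]]


for s in (1e6, 1.0, 1e-6):
    print(s, stationary_mean(Q(s), [10.0, 10.0], [1.0, 3.0]))
```

With these numbers, fast switching approaches the rate-averaged value 10 / 2 = 5, while slow switching approaches the average of the per-state means, 0.5 · 10 + 0.5 · 10 / 3 ≈ 6.67; intermediate switching (s = 1) gives 40 / 7.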
Subjects
Stochastic Processes, Cells
ABSTRACT
Analyses of the structural dynamics of biomolecules hold great promise to deepen the understanding of, and the ability to construct, complex molecular systems. To this end, both experimental and computational means are available, such as fluorescence quenching experiments and molecular dynamics simulations, respectively. We argue that, while seemingly disparate, both fields of study have to deal with the same type of data about the same underlying phenomenon of conformational switching. Two central challenges typically arise in both contexts: (i) the amount of obtained data is large, and (ii) it is often unknown how many distinct molecular states underlie these data. In this study, we build on the established idea of Markov state modeling and propose a generative, Bayesian nonparametric hidden Markov state model that addresses these challenges. Utilizing hierarchical Dirichlet processes, we treat different meta-stable molecule conformations as distinct Markov states, whose number we then do not have to set a priori. In contrast to existing approaches for both experimental and simulation data that are based on the same idea, we leverage a mean-field variational inference approach, enabling scalable inference on large amounts of data. Furthermore, we specify the model for the important case of angular data, which, however, proves to be computationally intractable. To address this issue, we propose a computationally tractable approximation to the angular model. We demonstrate the method on synthetic ground-truth data and apply it to known benchmark problems as well as to electrophysiological experimental data from a conformation-switching ion channel to highlight its practical utility.
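The hierarchical Dirichlet process prior is what lets the number of states stay open. As a rough sketch of that prior only (not the variational inference of the paper), the snippet below samples a truncated HDP-HMM transition matrix: shared state weights from a truncated stick-breaking (GEM) construction, and each transition row from a Dirichlet distribution centred on those shared weights. The truncation level and all names are assumptions for illustration.

```python
import random


def stick_breaking(gamma, truncation, rng):
    """Truncated GEM(gamma) weights (the shared state weights beta of an HDP).
    The last stick takes the remaining mass so the weights sum to one."""
    weights, remaining = [], 1.0
    for _ in range(truncation - 1):
        v = rng.betavariate(1.0, gamma)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)
    return weights


def dirichlet(alphas, rng):
    """Dirichlet draw via normalised Gamma variates; tiny concentrations are
    floored so that gammavariate receives a strictly positive shape."""
    draws = [rng.gammavariate(max(a, 1e-12), 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def sample_hdp_hmm_prior(gamma, alpha, truncation, seed=0):
    """Truncated HDP-HMM prior: beta ~ GEM(gamma), each row ~ Dir(alpha * beta),
    so all rows put most of their mass on the same few states."""
    rng = random.Random(seed)
    beta = stick_breaking(gamma, truncation, rng)
    rows = [dirichlet([alpha * b for b in beta], rng) for _ in range(truncation)]
    return beta, rows


beta, rows = sample_hdp_hmm_prior(gamma=2.0, alpha=5.0, truncation=10, seed=1)
print([round(b, 3) for b in beta])
```

Only a handful of the truncated states usually receive appreciable weight, which is how the model adapts the effective number of conformations to the data.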
Subjects
Molecular Dynamics Simulation, Bayes Theorem, Molecular Conformation
ABSTRACT
We propose an approach to modeling large-scale multi-agent dynamical systems allowing interactions among more than just pairs of agents using the theory of mean field games and the notion of hypergraphons, which are obtained as limits of large hypergraphs. To the best of our knowledge, ours is the first work on mean field games on hypergraphs. Together with an extension to a multi-layer setup, we obtain limiting descriptions for large systems of non-linear, weakly interacting dynamical agents. On the theoretical side, we prove the well-foundedness of the resulting hypergraphon mean field game, showing both existence and approximate Nash properties. On the applied side, we extend numerical and learning algorithms to compute the hypergraphon mean field equilibria. To verify our approach empirically, we consider a social rumor spreading model, where we give agents intrinsic motivation to spread rumors to unaware agents, and an epidemic control problem.
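Hypergraphons are limits of large hypergraphs; conversely, a hypergraphon can generate finite hypergraphs. The sketch below assumes a simplified 3-uniform hypergraphon W: [0,1]^3 -> [0,1] that depends only on latent vertex labels (more general definitions carry additional coordinates). Each agent receives a uniform label, and every triple of agents becomes a hyperedge independently with probability W. Function names are hypothetical.

```python
import itertools
import random


def sample_w_random_hypergraph(n, W, seed=0):
    """Sample a 3-uniform hypergraph on n vertices from a simplified
    hypergraphon W(x, y, z): latent labels u_i ~ U(0, 1), and each triple
    (i, j, k) is included independently with probability W(u_i, u_j, u_k)."""
    rng = random.Random(seed)
    labels = [rng.random() for _ in range(n)]
    edges = []
    for i, j, k in itertools.combinations(range(n), 3):
        if rng.random() < W(labels[i], labels[j], labels[k]):
            edges.append((i, j, k))
    return labels, edges


def hyperdegree(edges, v):
    """Number of hyperedges (3-way interactions) that vertex v takes part in."""
    return sum(v in e for e in edges)


labels, edges = sample_w_random_hypergraph(30, lambda x, y, z: x * y * z, seed=2)
print(len(edges), hyperdegree(edges, 0))
```

In a mean field limit, an agent's interaction with its hyperedges is replaced by an integral of W over the labels of its potential partners, which is why W rather than the finite hypergraph appears in the limiting game.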
ABSTRACT
In this work, we perform Bayesian inference tasks for the chemical master equation in the tensor-train format. The tensor-train approximation has been shown to be very efficient in representing the high-dimensional data arising from the explicit representation of the chemical master equation solution. An additional advantage of representing the probability mass function in the tensor-train format is that parametric dependency can easily be incorporated by introducing a tensor-product basis expansion in the parameter space. Time is treated as an additional dimension of the tensor, and a linear system is derived to solve the chemical master equation in time. We demonstrate the tensor-train method on inference tasks such as smoothing and parameter inference. A very high compression ratio is observed for storing the probability mass function of the solution. Since all linear algebra operations are performed in the tensor-train format, a significant reduction in computational time is observed as well.
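A small sketch of where the compression comes from, assuming nothing beyond the standard tensor-train layout: a d-way tensor with mode sizes n_k is stored as cores of shape r_(k-1) x n_k x r_k, so storage grows with the ranks rather than with the product of all mode sizes. As a concrete case, the joint PMF of independent Poisson-distributed species is exactly a rank-1 tensor train. The rank values and helper names are illustrative, not taken from the paper.

```python
import math


def full_storage(mode_sizes):
    """Entries of the full tensor, e.g. a truncated CME state space."""
    return math.prod(mode_sizes)


def tt_storage(mode_sizes, ranks):
    """Entries of a tensor train whose k-th core has shape r_k x n_k x r_(k+1)."""
    assert len(ranks) == len(mode_sizes) + 1 and ranks[0] == ranks[-1] == 1
    return sum(ranks[k] * n * ranks[k + 1] for k, n in enumerate(mode_sizes))


def tt_entry(cores, index):
    """Evaluate one entry; cores[k][i] is an r_k x r_(k+1) matrix (nested lists)."""
    vec = [1.0]
    for core, i in zip(cores, index):
        mat = core[i]
        vec = [sum(vec[a] * mat[a][b] for a in range(len(vec)))
               for b in range(len(mat[0]))]
    return vec[0]


def poisson_pmf(n, lam):
    return math.exp(-lam) * lam ** n / math.factorial(n)


# Four species truncated at 64 copies each, with assumed TT ranks of 5
sizes = [64] * 4
print(full_storage(sizes), tt_storage(sizes, [1, 5, 5, 5, 1]))  # 16777216 3840

# Independent Poisson species form a rank-1 tensor train (1 x 1 cores)
cores = [[[[poisson_pmf(i, lam)]] for i in range(64)] for lam in (1.0, 2.0)]
print(tt_entry(cores, (2, 3)))
```

Adding time or parameters as further modes, as the text describes, adds one core each, which keeps the storage growth linear in the number of modes for fixed ranks.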
ABSTRACT
The robust and precise on and off switching of one or more genes of interest, followed by their expression or repression, is essential for many biological circuits as well as for industrial applications. However, many regulated systems published to date influence the viability of the host cell, show high basal expression, or enable only the overexpression of the target gene without the possibility of fine regulation. Herein, we describe an AND gate designed to overcome these limitations by combining the advantages of three well-established systems, namely the scaffold RNA CRISPR/dCas9 platform, controlled by Gal10 as a natural transcription factor and by LexA-ER-AD as a heterologous transcription factor. We thereby developed a predictable, modular and versatile expression control system. A reporter gene setup that links a gene of interest (GOI) to a fluorophore via the ribosomal-skipping T2A sequence allows the system to be adapted to any gene of interest without losing reporter function. To better understand the underlying principles and the functioning of our system, we backed our experimental findings with the development of a mathematical model and single-cell analysis.
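The abstract does not give the form of the mathematical model, so the following is only a generic, hypothetical sketch of what "AND gate" means quantitatively: a steady-state reporter level that is close to a low basal value unless both inputs are high, modelled here as a product of two activating Hill functions. Thresholds, cooperativities and the basal level are assumed values.

```python
def hill(u, K, n):
    """Activating Hill function with threshold K and cooperativity n."""
    return u ** n / (K ** n + u ** n)


def and_gate_output(u1, u2, basal=0.01, vmax=1.0, K1=1.0, K2=1.0, n1=2.0, n2=2.0):
    """Hypothetical steady-state reporter level of a two-input AND gate:
    appreciable output only when both activators (inputs u1 and u2) are present."""
    return basal + vmax * hill(u1, K1, n1) * hill(u2, K2, n2)


low, high = 0.0, 10.0
for u1 in (low, high):
    for u2 in (low, high):
        print(u1, u2, round(and_gate_output(u1, u2), 3))
```

A low basal term and a multiplicative dependence on both inputs are the two properties the text asks of such a gate: little leakage, and expression only in the on/on condition.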
Subjects
Saccharomyces cerevisiae Proteins/genetics, Saccharomyces cerevisiae/genetics, Trans-Activators/genetics, Genetic Transcription, CRISPR-Cas Systems/genetics, Gene Expression Regulation/genetics, Reporter Genes/genetics, Theoretical Models, Single-Cell Analysis, Transcriptional Activation/genetics
ABSTRACT
MOTIVATION: Genome-scale gene networks contain regulatory genes called hubs that have many interaction partners. These genes usually play an essential role in gene regulation and cellular processes. Despite recent advancements in high-throughput technology, inferring gene networks with hub genes from high-dimensional data remains a challenging problem. Novel statistical network inference methods are needed for the efficient and accurate reconstruction of hub networks from high-dimensional data. RESULTS: To address this challenge, we propose DW-Lasso, a degree-weighted Lasso (least absolute shrinkage and selection operator) method that efficiently infers gene networks with hubs in the low-sample-size setting. Our network reconstruction approach is formulated as a two-stage procedure: first, the degree of networks is estimated iteratively, and second, the gene regulatory network is reconstructed using the degree information. A useful property of the proposed method is that it naturally favors the accumulation of neighbors around hub genes and thereby helps in accurately modeling high-throughput data under the assumption that the underlying network exhibits hub structure. In a simulation study, we demonstrate good predictive performance of the proposed method in comparison to traditional Lasso-type methods in inferring hub and scale-free graphs. We show the effectiveness of our method in an application to microarray data of Escherichia coli and RNA sequencing data of Kidney Clear Cell Carcinoma from The Cancer Genome Atlas datasets. AVAILABILITY AND IMPLEMENTATION: Under the GNU General Public Licence at https://cran.r-project.org/package=DWLasso. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
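The second stage, reconstruction with degree information, can be read as a Lasso with a separate penalty weight per candidate regulator. The sketch below is a generic weighted Lasso solved by coordinate descent; the rule that would turn estimated degrees into weights (for example, smaller weights for likely hubs) is a hypothetical placeholder, not the DWLasso package's actual scheme.

```python
def soft_threshold(z, t):
    """Proximal operator of t * |b|."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def weighted_lasso(X, y, lam, weights, n_iter=500):
    """Coordinate descent for
        (1 / (2n)) * ||y - X b||^2 + lam * sum_j weights[j] * |b_j|.
    X is a list of n rows with p (centred) columns; a smaller weights[j]
    makes it easier for predictor j (e.g. a suspected hub) to enter."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of column j with the partial residual (b_j added back)
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * b[j]
            new_bj = soft_threshold(rho, lam * weights[j]) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                b[j] = new_bj
    return b


X = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
y = [2.0, 1.0, -2.0, -1.0]
print(weighted_lasso(X, y, 0.2, [1.0, 1.0]))  # both predictors kept, shrunk
print(weighted_lasso(X, y, 0.2, [1.0, 3.0]))  # heavier penalty removes the second
```

For network inference, one such regression would be run per gene against all other genes, and the nonzero coefficients define that gene's estimated neighbours.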
Subjects
Algorithms, Gene Regulatory Networks, Genome
ABSTRACT
It remains unclear whether causal, rather than merely correlational, relationships in molecular networks can be inferred in complex biological settings. Here we describe the HPN-DREAM network inference challenge, which focused on learning causal influences in signaling networks. We used phosphoprotein data from cancer cell lines as well as in silico data from a nonlinear dynamical model. Using the phosphoprotein data, we scored more than 2,000 networks submitted by challenge participants. The networks spanned 32 biological contexts and were scored in terms of causal validity with respect to unseen interventional data. A number of approaches were effective, and incorporating known biology was generally advantageous. Additional sub-challenges considered time-course prediction and visualization. Our results suggest that learning causal relationships may be feasible in complex settings such as disease states. Furthermore, our scoring approach provides a practical way to empirically assess inferred molecular networks in a causal sense.
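One common way to score a predicted network against held-out interventional data is to derive, for each inhibited node, the set of nodes that actually responded, and then to ask how well the predicted edge or path scores rank those responders above non-responders. The area under the ROC curve (AUROC) captures this ranking quality. The sketch below assumes that setup purely for illustration; the challenge's exact scoring protocol is described in the paper.

```python
def auroc(scores, labels):
    """Area under the ROC curve: probability that a randomly chosen positive
    (e.g. a node that responded to the intervention) is scored above a randomly
    chosen negative; ties count one half."""
    pos = [s for s, lab in zip(scores, labels) if lab]
    neg = [s for s, lab in zip(scores, labels) if not lab]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))


# Hypothetical predicted causal scores for four downstream nodes
print(auroc([0.9, 0.8, 0.1, 0.2], [1, 1, 0, 0]))  # perfect ranking: 1.0
```

An AUROC of 0.5 corresponds to random ranking, which gives a natural baseline when comparing thousands of submitted networks.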
Subjects
Causality, Gene Regulatory Networks, Neoplasms/genetics, Protein Interaction Mapping/methods, Software, Systems Biology, Algorithms, Computational Biology, Computer Simulation, Gene Expression Profiling, Humans, Biological Models, Signal Transduction, Cultured Tumor Cells
ABSTRACT
The paper outlines a general approach to deriving quasi-steady-state approximations (QSSAs) of the stochastic reaction networks describing the Michaelis-Menten enzyme kinetics. In particular, it explains how different sets of assumptions about chemical species abundance and reaction rates lead to the standard QSSA, the total QSSA, and the reverse QSSA. These three QSSAs have been widely studied in the literature in deterministic ordinary differential equation settings, and several sets of conditions for their validity have been proposed. With the help of the multiscaling techniques introduced in Ball et al. (Ann Appl Probab 16(4):1925-1961, 2006), Kang and Kurtz (Ann Appl Probab 23(2):529-583, 2013), it is seen that the conditions for deterministic QSSAs largely agree (with some exceptions) with the ones for stochastic QSSAs in the large-volume limits. The paper also illustrates how the stochastic QSSA approach may be extended to more complex stochastic kinetic networks like, for instance, the enzyme-substrate-inhibitor system.
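For orientation, the deterministic rate laws behind the standard and total QSSA for E + S <-> C -> E + P (rate constants k1, k-1, k2) can be written down directly. The sketch below shows only these deterministic limits, not the stochastic multiscale derivation of the paper. The sQSSA expresses the product formation rate in free substrate S with K_M = (k-1 + k2) / k1. The tQSSA uses total substrate S_T = S + C, with C the smaller root of C^2 - (E_T + K_M + S_T) C + E_T S_T = 0. Function names are ours.

```python
import math


def michaelis_constant(k1, km1, k2):
    """K_M = (k_-1 + k_2) / k_1."""
    return (km1 + k2) / k1


def rate_sqssa(S, ET, KM, k2):
    """Standard QSSA: product formation rate in terms of free substrate S."""
    return k2 * ET * S / (KM + S)


def rate_tqssa(ST, ET, KM, k2):
    """Total QSSA: complex level C is the smaller root of
    C^2 - (ET + KM + ST) C + ET * ST = 0, with ST = S + C the total substrate."""
    s = ET + KM + ST
    C = (s - math.sqrt(s * s - 4.0 * ET * ST)) / 2.0
    return k2 * C


# Low enzyme: both approximations agree (S is close to ST)
print(rate_sqssa(5.0, 1e-3, 1.0, 1.0), rate_tqssa(5.0, 1e-3, 1.0, 1.0))
# High enzyme: sQSSA (with S = ST) exceeds k2 * ST, which is impossible; tQSSA does not
print(rate_sqssa(1.0, 10.0, 1.0, 1.0), rate_tqssa(1.0, 10.0, 1.0, 1.0))
```

The second comparison illustrates why different abundance regimes call for different reductions, which is the distinction the paper carries over to the stochastic setting.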
Subjects
Enzymes/metabolism, Biological Models, Biocatalysis, Enzyme Inhibitors/metabolism, Kinetics, Mathematical Concepts, Metabolic Networks and Pathways, Stochastic Processes, Substrate Specificity
ABSTRACT
Synthetic biology aims at designing modular genetic circuits that can be assembled according to the desired function. When embedded in a cell, a circuit module becomes a small subnetwork within a larger environmental network, and its dynamics are therefore affected by potentially unknown interactions with the environment. It is well known that the presence of the environment causes not only extrinsic noise but also memory effects, which means that the dynamics of the subnetwork are affected by its past states via a memory function that is characteristic of the environment. We study several generic scenarios for the coupling between a small module and a larger environment, with the environment consisting of a chain of mono-molecular reactions. By mapping the dynamics of this coupled system onto random walks, we are able to give exact analytical expressions for the arising memory functions. Hence, our results give insights into the possible types of memory functions and thereby help to better predict subnetwork dynamics.
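The simplest instance shows how a memory function arises. Assume (for illustration only) an environment chain of length one at the level of mean dynamics: the module species X converts to an environment species Y at rate k_1, Y converts back at rate k_2 and is degraded at rate d, and y(0) = 0. Eliminating y gives a delayed, convolution-type equation for x alone:

```latex
\begin{aligned}
\frac{dy}{dt} &= k_1 x(t) - (k_2 + d)\,y(t), \quad y(0) = 0
\;\Longrightarrow\;
y(t) = \int_0^t k_1 e^{-(k_2 + d)(t - s)}\, x(s)\, ds, \\
\frac{dx}{dt} &= -k_1 x(t) + k_2 y(t)
= -k_1 x(t) + \int_0^t K(t - s)\, x(s)\, ds,
\qquad K(\tau) = k_1 k_2\, e^{-(k_2 + d)\tau}.
\end{aligned}
```

Here the memory kernel K is a single exponential; longer environment chains produce richer kernels, which is the kind of structure the random-walk mapping in the text makes explicit.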
Subjects
Synthetic Biology, Gene Regulatory Networks, Biological Models, Probability
ABSTRACT
Approximate solutions of the chemical master equation and the chemical Fokker-Planck equation are an important tool in the analysis of biomolecular reaction networks. Previous studies have highlighted a number of problems with the moment-closure approach used to obtain such approximations, calling it an ad hoc method. In this article, we give a new variational derivation of moment-closure equations which provides us with an intuitive understanding of their properties and failure modes and allows us to correct some of these problems. We use mixtures of product-Poisson distributions to obtain a flexible parametric family which solves the commonly observed problem of divergences at low system sizes. We also extend the recently introduced entropic matching approach to arbitrary ansatz distributions and Markov processes, demonstrating that it is a special case of variational moment closure. This provides us with a particularly principled approximation method. Finally, we extend the above approaches to cover the approximation of multi-time joint distributions, resulting in a viable alternative to process-level approximations which are often intractable.
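To make the moment-closure idea concrete, here is a sketch of the simplest product-Poisson closure (a single Poisson component, not the mixtures or the variational derivation of the text). Take birth of X at rate lam and annihilation 2X -> 0 with propensity c·x·(x - 1). The exact mean equation d mu/dt = lam - 2c E[X(X-1)] is not closed, but if X is assumed Poisson, then E[X(X-1)] = mu^2. The rate values and the Euler integration are illustrative choices.

```python
import math


def poisson_closure_mean(lam, c, mu0=0.0, dt=1e-3, t_end=50.0):
    """Mean of X under birth (rate lam) and 2X -> 0 (propensity c * x * (x - 1)),
    closed by assuming X is Poisson so that E[X(X-1)] = mu**2.
    Integrates d mu / dt = lam - 2 c mu**2 with forward Euler."""
    mu = mu0
    for _ in range(int(t_end / dt)):
        mu += dt * (lam - 2.0 * c * mu * mu)
    return mu


lam, c = 10.0, 0.05
print(poisson_closure_mean(lam, c), math.sqrt(lam / (2.0 * c)))  # both close to 10
```

The closed equation predicts a stationary mean of sqrt(lam / (2c)); comparing such predictions with exact results at low copy numbers is exactly where the failure modes discussed in the text become visible.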
ABSTRACT
Mathematical methods combined with measurements of single-cell dynamics provide a means to reconstruct intracellular processes that are only partly or indirectly accessible experimentally. To obtain reliable reconstructions, the pooling of measurements from several cells of a clonal population is mandatory. However, cell-to-cell variability originating from diverse sources poses computational challenges for such process reconstruction. We introduce a scalable Bayesian inference framework that properly accounts for population heterogeneity. The method allows inference of inaccessible molecular states and kinetic parameters; computation of Bayes factors for model selection; and dissection of intrinsic, extrinsic and technical noise. We show how additional single-cell readouts such as morphological features can be included in the analysis. We use the method to reconstruct the expression dynamics of a gene under an inducible promoter in yeast from time-lapse microscopy data.
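The separation of intrinsic, extrinsic and technical noise rests on the law of total variance. As an illustration under an assumed, much simpler model than the paper's: each cell has its own production rate k (extrinsic variability), copy numbers given k are Poisson with mean k / gamma (intrinsic noise), and each measurement adds Gaussian error with standard deviation sigma_tech (technical noise). The three contributions then simply add.

```python
from dataclasses import dataclass


@dataclass
class NoiseBudget:
    intrinsic: float
    extrinsic: float
    technical: float

    @property
    def total(self) -> float:
        return self.intrinsic + self.extrinsic + self.technical


def decompose(mean_k, var_k, gamma, sigma_tech):
    """Law of total variance for a hypothetical constitutive-expression model:
    per-cell rate k (population mean mean_k, variance var_k), Poisson copy
    number with mean k / gamma given k, plus additive Gaussian measurement noise."""
    return NoiseBudget(
        intrinsic=mean_k / gamma,      # E[Var(X | k)] = E[k] / gamma
        extrinsic=var_k / gamma ** 2,  # Var(E[X | k]) = Var(k) / gamma^2
        technical=sigma_tech ** 2,     # variance of the measurement error
    )


budget = decompose(mean_k=20.0, var_k=16.0, gamma=2.0, sigma_tech=1.0)
print(budget, budget.total)  # intrinsic 10, extrinsic 4, technical 1, total 15
```

In a Bayesian framework like the one described, the cell-specific quantities are inferred rather than given, but the resulting posterior can be summarised in the same three terms.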
Subjects
Bayes Theorem, Cell Physiological Phenomena, Galactokinase/metabolism, Luminescent Proteins/metabolism, Saccharomyces cerevisiae Proteins/metabolism, Saccharomyces cerevisiae/metabolism, Algorithms, Computer Simulation, Galactokinase/genetics, Computer-Assisted Image Processing, Kinetics, Fluorescence Microscopy, Biological Models, Monte Carlo Method, Saccharomyces cerevisiae/genetics, Saccharomyces cerevisiae Proteins/genetics, Signal Transduction, Stochastic Processes
ABSTRACT
Determining the sensitivity of certain system states or outputs to variations in parameters facilitates our understanding of the inner workings of that system and is an essential design tool for the de novo construction of robust systems. In cell biology, the output of interest is often the response of a certain reaction network to some input (e.g., stressors or nutrients), and one aims to quantify the sensitivity of this response in the presence of parameter heterogeneity. We argue that for such applications, parametric sensitivities in their standard form do not paint a complete picture of a system's robustness, since they assume that all cells in the population have the same parameters and are perturbed in the same way. Here, we consider stochastic reaction networks in which the parameters are randomly distributed over the population and propose a new sensitivity index that captures the robustness of system outputs upon changes in the characteristics of the parameter distribution, rather than in the parameters themselves. Subsequently, we make use of Girsanov's likelihood ratio method to construct a Monte Carlo estimator of this sensitivity index. However, it turns out that this estimator has an exceedingly large variance. To overcome this problem, we propose a novel estimation algorithm that makes use of a marginalization of the path distribution of stochastic reaction networks and leads to Rao-Blackwellized estimators with reduced variance.
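The key idea, differentiating with respect to a characteristic of the parameter distribution, can be shown with the plain score-function (likelihood-ratio) estimator. In this sketch the whole stochastic network is replaced by a hypothetical output function f, and the cell-to-cell parameter theta is assumed normal with mean m and standard deviation s. Then d/dm E[f(theta)] = E[f(theta) (theta - m) / s^2]. The paper's estimator additionally handles the reaction paths via Girsanov's theorem, which is not reproduced here.

```python
import random


def lr_sensitivity_wrt_mean(f, m, s, n_samples=200_000, seed=0):
    """Score-function (likelihood-ratio) Monte Carlo estimate of
    d/dm E[f(theta)] for theta ~ Normal(m, s^2):
        E[f(theta) * (theta - m) / s^2]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        theta = rng.gauss(m, s)
        total += f(theta) * (theta - m) / s ** 2
    return total / n_samples


# E[theta^2] = m^2 + s^2, so the exact sensitivity at m = 1 is 2
print(lr_sensitivity_wrt_mean(lambda th: th ** 2, m=1.0, s=1.0))
```

Even in this toy case the per-sample terms fluctuate strongly, which hints at why the text turns to marginalization and Rao-Blackwellization to reduce variance.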
ABSTRACT
The dynamics of stochastic reaction networks within cells are inevitably modulated by factors considered extrinsic to the network such as, for instance, the fluctuations in ribosome copy numbers for a gene regulatory network. While several recent studies demonstrate the importance of accounting for such extrinsic components, the resulting models are typically hard to analyze. In this work we develop a general mathematical framework that allows one to uncouple the network from its dynamic environment by incorporating only the environment's effect on the network into a new model. More technically, we show how such fluctuating extrinsic components (e.g., chemical species) can be marginalized in order to obtain this decoupled model. We derive its corresponding process and master equations and show how stochastic simulations can be performed. Using several case studies, we demonstrate the significance of the approach.
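A minimal sketch of what marginalizing an extrinsic component looks like in the simplest case we can state with certainty: a production reaction whose rate switches between k0 and k1 with a two-state environment Z (rates a: 0 -> 1, b: 1 -> 0). In the decoupled model, the propensity becomes the conditional expectation of the rate given the network's own history. This requires tracking p = P(Z = 1 | past production events), which follows a standard point-process filter. The example and its names are our own illustration, not the general framework of the paper.

```python
def _flow(p, duration, a, b, k0, k1, dt):
    """Euler integration of the between-event filter equation
    dp/dt = a (1 - p) - b p - p (k1 - lam_bar), lam_bar = (1 - p) k0 + p k1."""
    steps = max(1, int(round(duration / dt)))
    h = duration / steps
    for _ in range(steps):
        lam_bar = (1.0 - p) * k0 + p * k1
        p += h * (a * (1.0 - p) - b * p - p * (k1 - lam_bar))
    return p


def filter_state1_probability(event_times, t_end, a, b, k0, k1, dt=1e-3):
    """P(Z_t_end = 1 | production events in [0, t_end]), starting from the
    stationary environment distribution."""
    p = a / (a + b)
    t = 0.0
    for s in sorted(event_times):
        if s > t_end:
            break
        p = _flow(p, s - t, a, b, k0, k1, dt)
        lam_bar = (1.0 - p) * k0 + p * k1
        p = p * k1 / lam_bar  # Bayes update at a production event
        t = s
    return _flow(p, t_end - t, a, b, k0, k1, dt)


def effective_rate(p, k0, k1):
    """Marginal (decoupled) production propensity given the filter state."""
    return (1.0 - p) * k0 + p * k1


quiet = filter_state1_probability([], 2.0, 1.0, 1.0, 1.0, 10.0)
busy = filter_state1_probability([0.1, 0.2, 0.3, 0.4, 0.5], 0.5, 1.0, 1.0, 1.0, 10.0)
print(quiet, effective_rate(quiet, 1.0, 10.0))
print(busy, effective_rate(busy, 1.0, 10.0))
```

A quiet history lowers the belief in the high-rate state and hence the effective propensity, while a burst of events raises it: the marginalized network acquires history dependence in place of the removed environment species.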
Subjects
Computational Biology/methods, Biological Models, Stochastic Processes, Algorithms, Gene Regulatory Networks/genetics, Ribosomes/genetics, Ribosomes/metabolism
ABSTRACT
Recent computational studies indicate that the molecular noise of a cellular process may be a rich source of information about process dynamics and parameters. However, accessing this source requires stochastic models that are usually difficult to analyze. Therefore, parameter estimation for stochastic systems using distribution measurements, as provided for instance by flow cytometry, currently remains limited to very small and simple systems. Here we propose a new method that makes use of low-order moments of the measured distribution and thereby keeps the essential parts of the provided information, while still being applicable to systems of realistic size. We show how cell-to-cell variability can be incorporated into the analysis, obviating the need for the ubiquitous assumption that the measurements stem from a homogeneous cell population. We demonstrate the method on a simple example of gene expression using synthetic data generated by stochastic simulation. Subsequently, we use time-lapse flow cytometry data for the osmo-stress-induced transcriptional response in budding yeast to calibrate a stochastic model, which is then used as a basis for predictions. Our results show that measurements of the mean and the variance can be enough to determine the model parameters, even if the measured distributions are not well characterized by low-order moments alone, e.g., if they are bimodal.
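Why the variance adds information can be seen in a textbook case that is not the paper's model: for bursty gene expression with geometrically distributed bursts, the stationary copy number is negative binomial with mean a·b and variance a·b·(1 + b). Here a is the burst frequency divided by the degradation rate, and b is the mean burst size. The mean alone cannot separate a from b, but mean and variance together can, by a simple method of moments.

```python
def bursty_moments_fit(mean, variance):
    """Method-of-moments fit of a hypothetical bursty expression model with
    stationary mean a*b and variance a*b*(1 + b):
    a = burst frequency / degradation rate, b = mean burst size."""
    if variance <= mean:
        raise ValueError("the bursty model needs a Fano factor above 1")
    b = variance / mean - 1.0
    a = mean / b
    return a, b


print(bursty_moments_fit(20.0, 100.0))  # (5.0, 4.0)
```

The paper's approach goes further by fitting time-resolved low-order moments of a larger model, but the identifiability argument is the same in spirit.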
Subjects
Fungal Gene Expression Regulation/physiology, Glycerol/metabolism, Genetic Models, Saccharomyces cerevisiae/genetics, Physiological Stress/genetics, Water-Electrolyte Balance/genetics, Computer Simulation, Flow Cytometry, Mitogen-Activated Protein Kinases/genetics, Saccharomyces cerevisiae Proteins/genetics, Signal Transduction/genetics, Stochastic Processes
ABSTRACT
MOTIVATION: More than a decade after microarrays were first used to predict the phenotype of biological samples, real-life applications for disease screening and for identifying patients who would best benefit from treatment are still emerging. The interest of the scientific community in identifying the best approaches to develop such prediction models was reaffirmed in a competition-style international collaboration called the IMPROVER Diagnostic Signature Challenge, whose results we describe herein. RESULTS: Fifty-four teams used public data to develop prediction models in four disease areas, namely multiple sclerosis, lung cancer, psoriasis and chronic obstructive pulmonary disease, and made predictions on blinded new data that we generated. Teams were scored using three metrics that captured various aspects of the quality of predictions, and the best performers were awarded. This article presents the challenge results and introduces to the community the approaches of the three best overall performers, as well as an R package that implements the approach of the best overall team. The analyses of model performance data submitted in the challenge, as well as additional simulations that we performed, revealed that (i) the quality of predictions depends more on the disease endpoint than on the particular approaches used in the challenge; (ii) the most important modeling factor (e.g. data preprocessing, feature selection and classifier type) is problem dependent; and (iii) for optimal results, datasets and methods have to be carefully matched. Biomedical factors such as disease severity and confidence in the diagnosis were found to be associated with the misclassification rates across the different teams. AVAILABILITY: The lung cancer dataset is available from Gene Expression Omnibus (accession, GSE43580). The maPredictDSC R package implementing the approach of the best overall team is available at www.bioconductor.org or http://bioinformaticsprb.med.wayne.edu/.
Subjects
Gene Expression Profiling/methods, Molecular Diagnostic Techniques, Oligonucleotide Array Sequence Analysis/methods, Phenotype, Disease/genetics, Humans, Lung Neoplasms/diagnosis, Lung Neoplasms/genetics, Multiple Sclerosis/diagnosis, Multiple Sclerosis/genetics, Psoriasis/diagnosis, Psoriasis/genetics, Chronic Obstructive Pulmonary Disease/diagnosis, Chronic Obstructive Pulmonary Disease/genetics
ABSTRACT
Reaction networks are commonly used to model the dynamics of populations subject to transformations that follow an imposed stoichiometry. This paper focuses on the efficient characterisation of dynamical properties of Discrete Reaction Networks (DRNs). DRNs can be seen as modeling the underlying discrete nondeterministic transitions of stochastic models of reaction networks. In that sense, a proof of non-reachability in a given DRN has immediate implications for any concrete stochastic model based on that DRN, independent of the choice of kinetic laws and constants. Moreover, if we assume that stochastic kinetic rates are given by the mass-action law (or any other kinetic law that gives non-vanishing probability to each reaction if the required number of interacting substrates is present), then reachability properties are equivalent in the two settings. The analysis of two types of global dynamical properties of DRNs is addressed: irreducibility, i.e., the ability to reach any discrete state from any other state; and recurrence, i.e., the ability to return to any initial state. Our results consider both the verification of such properties when species are present in a large copy number, and in the general case. The necessary and sufficient conditions obtained involve algebraic conditions on the network reactions which in most cases can be verified using linear programming. Finally, the relationship of DRN irreducibility and recurrence with dynamical properties of stochastic and continuous models of reaction networks is discussed.
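As a brute-force companion to the algebraic conditions described above, the reachability relation of a discrete reaction network can be explored directly on a bounded region. No kinetic rates are needed, only whether each reaction can fire. A reaction fires from state x if x contains at least its reactants. This sketch only checks irreducibility on a finite, truncated set of states; it does not implement the paper's linear-programming conditions, which also cover unbounded state spaces. Names are illustrative.

```python
from collections import deque


def reachable(start, reactions, bound):
    """States reachable from `start`, exploring only states with all counts
    <= bound. Each reaction is a pair (reactants, products) of count vectors
    and can fire from x iff x >= reactants componentwise."""
    start = tuple(start)
    seen = {start}
    queue = deque([start])
    while queue:
        x = queue.popleft()
        for reactants, products in reactions:
            if all(xi >= ri for xi, ri in zip(x, reactants)):
                y = tuple(xi - ri + pi for xi, ri, pi in zip(x, reactants, products))
                if max(y) <= bound and y not in seen:
                    seen.add(y)
                    queue.append(y)
    return seen


def irreducible_on(states, reactions, bound):
    """True if every state in `states` can reach every other one (within bound)."""
    states = set(states)
    return all(states <= reachable(s, reactions, bound) for s in states)


# A <-> B conserves A + B: all states with the same total communicate
conversion = [((1, 0), (0, 1)), ((0, 1), (1, 0))]
print(sorted(reachable((3, 0), conversion, bound=10)))
```

Because the check ignores rate constants, a negative answer (a state that cannot be reached) holds for every stochastic model built on the same network, in line with the argument in the text.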
Subjects
Kinetics, Biological Models, Chemical Models, Stochastic Processes, Circadian Clocks, Phosphorylation
ABSTRACT
We consider a continuous-time Markov chain (CTMC) whose state space is partitioned into aggregates, and each aggregate is assigned a probability measure. A sufficient condition for defining a CTMC over the aggregates is presented as a variant of weak lumpability, which also characterizes when the measure over the original process can be recovered from that of the aggregated one. We show how the applicability of de-aggregation depends on the initial distribution. The application section is devoted to illustrating how the developed theory aids in reducing CTMC models of biochemical systems, particularly in connection with protein-protein interactions. We assume that the model is written by a biologist in the form of site-graph-rewrite rules. Site-graph-rewrite rules compactly express that, often, only a local context of a protein (instead of a full molecular species) needs to be in a certain configuration in order to trigger a reaction event. This observation leads to suitable aggregate Markov chains with smaller state spaces, thereby providing a substantial reduction in computational complexity. This is further exemplified in two case studies: simple unbounded polymerization and early EGFR/insulin crosstalk.
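A small sketch of the kind of condition involved, derived here under our own assumptions rather than copied from the paper. Each aggregate A carries a distribution alpha_A over its states, and aggregate rates are Qhat[A][B] = sum over i in A of alpha_A(i) times sum over j in B of Q[i][j]. Requiring that a distribution of the form p_j = Phat(B) alpha_B(j) stays of that form, for every initial aggregate distribution, leads to sum over i in A of alpha_A(i) Q[i][j] = alpha_B(j) Qhat[A][B] for all aggregates A, B and states j in B. The paper's variant of weak lumpability may be stated differently.

```python
def aggregate_generator(Q, blocks, alpha):
    """Qhat[A][B] = sum_{i in A} alpha[i] * sum_{j in B} Q[i][j].
    blocks: list of lists of state indices; alpha: state -> weight in its block."""
    return [[sum(alpha[i] * sum(Q[i][j] for j in B) for i in A) for B in blocks]
            for A in blocks]


def preserves_alpha(Q, blocks, alpha, tol=1e-12):
    """Check sum_{i in A} alpha[i] Q[i][j] == alpha[j] * Qhat[A][B] for all
    blocks A, B and j in B; if it holds, an initial distribution of the form
    Phat(B) * alpha[j] keeps that form, so de-aggregation is exact."""
    Qhat = aggregate_generator(Q, blocks, alpha)
    for a_idx, A in enumerate(blocks):
        for b_idx, B in enumerate(blocks):
            for j in B:
                lhs = sum(alpha[i] * Q[i][j] for i in A)
                if abs(lhs - alpha[j] * Qhat[a_idx][b_idx]) > tol:
                    return False
    return True


# States 1 and 2 behave symmetrically, so they can be lumped with uniform alpha
Q = [[-2.0, 1.0, 1.0], [1.0, -1.0, 0.0], [1.0, 0.0, -1.0]]
blocks = [[0], [1, 2]]
uniform = {0: 1.0, 1: 0.5, 2: 0.5}
print(preserves_alpha(Q, blocks, uniform), aggregate_generator(Q, blocks, uniform))
```

With a skewed alpha (for example 0.7 and 0.3 on states 1 and 2) the check fails, echoing the text's point that whether de-aggregation applies depends on the initial distribution within aggregates.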
Subjects
Markov Chains, Biological Models, Probability, Proteins/metabolism, Signal Transduction/physiology, Epidermal Growth Factor/physiology, Insulin/physiology, Polymerization
ABSTRACT
Energy and its dissipation are fundamental to all living systems, including cells. Insufficient abundance of energy carriers, as caused by the additional burden of artificial genetic circuits, shifts a cell's priority to survival and also impairs the functionality of the genetic circuit. Moreover, recent works have shown the importance of energy expenditure in information transmission. Although living organisms are non-equilibrium systems, non-equilibrium models capable of accounting for energy dissipation and non-equilibrium response curves are not yet employed in genetic design automation (GDA) software. To this end, we introduce Energy Aware Technology Mapping, the automated design of genetic logic circuits with respect to energy efficiency and functionality. The basis for this is an energy aware non-equilibrium steady state (NESS) model of gene expression, capturing characteristics such as energy dissipation (which we link to the entropy production rate) and transcriptional bursting, relevant to eukaryotes as well as prokaryotes. Our evaluation shows that a genetic logic circuit's functional performance and energy efficiency are disjoint optimization goals. For our benchmark, energy efficiency improves by 37.2% on average compared to functionally optimized variants. We discover a linear increase in energy expenditure and overall protein expression with circuit size, where Energy Aware Technology Mapping allows for designing genetic logic circuits with the energy efficiency of circuits that are one to two gates smaller. Structural variants improve this further, while results show Pareto dominance among structures of a single Boolean function. By incorporating energy demand into the design, Energy Aware Technology Mapping enables energy efficiency by design. This extends current GDA tools and complements approaches coping with burden in vivo.
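The link between dissipation and the entropy production rate can be illustrated with the standard steady-state formula for a Markov jump process: sigma = 1/2 * sum over i != j of (J_ij - J_ji) ln(J_ij / J_ji), with J_ij = p_i k_ij. The three-state "promoter cycle" below is a hypothetical example, not the NESS gene expression model of the paper. Rates that satisfy detailed balance give zero entropy production, while a driven cycle dissipates.

```python
import math


def stationary(rates, n_iter=20000):
    """Stationary distribution of a CTMC with off-diagonal rates[i][j] (i -> j),
    via power iteration on the uniformised chain P = I + Q / Lambda."""
    n = len(rates)
    exit_rates = [sum(rates[i][j] for j in range(n) if j != i) for i in range(n)]
    lam = 1.1 * max(exit_rates)
    p = [1.0 / n] * n
    for _ in range(n_iter):
        new = [p[j] * (1.0 - exit_rates[j] / lam) for j in range(n)]
        for i in range(n):
            for j in range(n):
                if i != j:
                    new[j] += p[i] * rates[i][j] / lam
        p = new
    return p


def entropy_production_rate(rates):
    """Steady-state entropy production rate in units of k_B per unit time."""
    n = len(rates)
    p = stationary(rates)
    sigma = 0.0
    for i in range(n):
        for j in range(n):
            if i == j or (rates[i][j] == 0 and rates[j][i] == 0):
                continue
            if rates[i][j] == 0 or rates[j][i] == 0:
                raise ValueError("one-way transitions give infinite entropy production")
            jij, jji = p[i] * rates[i][j], p[j] * rates[j][i]
            sigma += 0.5 * (jij - jji) * math.log(jij / jji)
    return sigma


# Hypothetical 3-state promoter cycle, driven clockwise (rate 10) vs back (rate 1)
driven = [[0, 10, 1], [1, 0, 10], [10, 1, 0]]
print(entropy_production_rate(driven), 9 * math.log(10))  # both about 20.72
```

Multiplying sigma by k_B T gives a dissipated power, which is the kind of quantity an energy-aware design objective can weigh against functional performance.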
ABSTRACT
Energy and its dissipation are fundamental to all living systems, including cells. Insufficient abundance of energy carriers, as caused by the additional burden of artificial genetic circuits, shifts a cell's priority to survival and also impairs the functionality of the genetic circuit. Moreover, recent works have shown the importance of energy expenditure in information transmission. Although living organisms are non-equilibrium systems, non-equilibrium models capable of accounting for energy dissipation and non-equilibrium response curves are not yet employed in genetic design automation (GDA) software. To this end, we introduce Energy Aware Technology Mapping, the automated design of genetic logic circuits with respect to energy efficiency and functionality. The basis for this is an energy aware non-equilibrium steady state model of gene expression, capturing characteristics such as energy dissipation (which we link to the entropy production rate) and transcriptional bursting, relevant to eukaryotes as well as prokaryotes. Our evaluation shows that a genetic logic circuit's functional performance and energy efficiency are disjoint optimization goals. For our benchmark, energy efficiency improves by 37.2% on average compared to functionally optimized variants. We discover a linear increase in energy expenditure and overall protein expression with circuit size, where Energy Aware Technology Mapping allows for designing genetic logic circuits with the energetic costs of circuits that are one to two gates smaller. Structural variants improve this further, while results show Pareto dominance among structures of a single Boolean function. By incorporating energy demand into the design, Energy Aware Technology Mapping enables energy efficiency by design. This extends current GDA tools and complements approaches coping with burden in vivo.