ABSTRACT
Consider the problem of estimating the branch lengths in a symmetric 2-state substitution model with a known topology and a general, clock-like or star-shaped tree with three leaves. We show that the maximum likelihood estimates are analytically tractable and can be obtained from pairwise sequence comparisons. Furthermore, we demonstrate that this property does not generalize to larger state spaces, more complex models or larger trees. Our arguments are based on an enumeration of the free parameters of the model and the dimension of the minimal sufficient data vector. Our interest in this problem arose from discussions with our former colleague Freddy Bugge Christiansen.
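A minimal sketch of how branch lengths on a three-leaf tree can be computed from pairwise sequence comparisons. It uses the textbook distance for the symmetric 2-state model, where two sequences separated by total length d differ at a site with probability (1 - e^(-2d))/2, and then solves the three pairwise sums for the individual branches. The function names are illustrative, and the sketch assumes the general (unconstrained) tree and that the resulting values are non-negative; it is not taken from the paper.

```python
import math

def cfn_distance(p):
    """Distance in expected substitutions per site for the symmetric
    2-state model, given the proportion p of differing sites."""
    if not 0.0 <= p < 0.5:
        raise ValueError("p must lie in [0, 0.5)")
    return -0.5 * math.log(1.0 - 2.0 * p)

def three_leaf_branch_lengths(p12, p13, p23):
    """Solve t1 + t2 = d12, t1 + t3 = d13, t2 + t3 = d23 for (t1, t2, t3)."""
    d12, d13, d23 = (cfn_distance(p) for p in (p12, p13, p23))
    t1 = 0.5 * (d12 + d13 - d23)
    t2 = 0.5 * (d12 + d23 - d13)
    t3 = 0.5 * (d13 + d23 - d12)
    return t1, t2, t3
```

A negative value from this sketch indicates that the pairwise proportions are not compatible with an interior solution, in which case a constrained estimate would be needed.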
Subject(s)
Evolution, Molecular , Models, Genetic , Likelihood Functions , Phylogeny
ABSTRACT
A coalescent model of a sample of size n is derived from a birth-death process that originates at a random time in the past from a single founder individual. Over time, the descendants of the founder evolve into a population of large (infinite) size from which a sample of size n is taken. The parameters and time of the birth-death process are scaled in N0, the size of the present-day population, while letting N0 → ∞, similarly to how the standard Kingman coalescent process arises from the Wright-Fisher model. The model is named the Limit Birth-Death (LBD) coalescent model. Simulations from the LBD coalescent model with sample size n are computationally slow compared to standard coalescent models. Therefore, we suggest different approximations to the LBD coalescent model assuming the population size is a deterministic function of time rather than a stochastic process. Furthermore, we introduce a hybrid LBD coalescent model that combines the exactness of the LBD coalescent model with the speed of the approximations.
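For comparison, the standard Kingman coalescent mentioned above is cheap to simulate: while k lineages remain, the waiting time to the next coalescence is exponential with rate k(k-1)/2 in coalescent time units. The sketch below covers only this standard case with constant population size, not the LBD model; the function names and the seed are illustrative assumptions.

```python
import random

def kingman_waiting_times(n, rng):
    """Waiting times while k = n, n-1, ..., 2 lineages remain
    (coalescent time units, constant population size)."""
    return [rng.expovariate(k * (k - 1) / 2.0) for k in range(n, 1, -1)]

def expected_tmrca(n):
    """Expected time to the most recent common ancestor: 2 * (1 - 1/n)."""
    return 2.0 * (1.0 - 1.0 / n)

# Monte Carlo check of the expected tree height for a sample of size 10.
rng = random.Random(1)
n = 10
sims = [sum(kingman_waiting_times(n, rng)) for _ in range(20000)]
mean_tmrca = sum(sims) / len(sims)
```

The simulated mean should lie close to `expected_tmrca(n)`, which gives a quick sanity check before replacing the constant rate with a time-dependent one.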
Subject(s)
Genetics, Population , Models, Genetic , Population Density , Sample Size , Stochastic Processes
ABSTRACT
We provide a general mathematical framework based on the theory of graphical models to study admixture graphs. Admixture graphs are used to describe the ancestral relationships between past and present populations, allowing for population merges and migration events, by means of gene flow. We give various mathematical properties of admixture graphs, with particular focus on properties of the so-called F-statistics. The Wright-Fisher model is also studied, and a general expression for the loss of heterozygosity is derived.
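As a reminder of what the F-statistics measure, the sketch below computes plug-in versions of F2, F3 and F4 from per-locus allele frequencies, using the usual definitions as averages of products of frequency differences. The function names are illustrative, and the sketch ignores the finite-sample bias corrections that would be needed with real data.

```python
def f2(a, b):
    """F2(A, B): mean squared allele-frequency difference."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def f3(c, a, b):
    """F3(C; A, B): mean of (c - a) * (c - b) over loci."""
    return sum((z - x) * (z - y) for z, x, y in zip(c, a, b)) / len(c)

def f4(a, b, c, d):
    """F4(A, B; C, D): mean of (a - b) * (c - d) over loci."""
    return sum((w - x) * (y - z) for w, x, y, z in zip(a, b, c, d)) / len(a)
```

Both F3 and F4 can be written in terms of F2, for example F3(C; A, B) = (F2(C, A) + F2(C, B) - F2(A, B)) / 2, which is a convenient consistency check on any implementation.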
Subject(s)
Gene Flow , Genetics, Population , Stochastic Processes , Genetics, Population/statistics & numerical data , Heterozygote , Humans , Models, Theoretical
ABSTRACT
In many areas of genetics it is of relevance to consider a population of individuals that is founded by a single individual in the past. One model for such a scenario is the conditioned reconstructed process with Bernoulli sampling that describes the evolution of a population of individuals that originates from a single individual. Several aspects of this reconstructed process are studied, in particular the Markov structure of the process. It is shown that at any given time in the past, the conditioned reconstructed process behaves as the original conditioned reconstructed process after a suitable time-dependent change of the sampling probability. Additionally, it is discussed how mutations accumulate in a sample of particles. It is shown that random sampling of particles at the present time has the effect of making the mutation rate look time-dependent. Conditions are given under which this sampling effect is negligible. A possible extension of the reconstructed process that allows for multiple founding particles is discussed.
Subject(s)
Binomial Distribution , Genetics, Population , Models, Genetic , Probability , Algorithms , Birth Rate , Genealogy and Heraldry , Humans , Markov Chains , Mortality , Mutation
ABSTRACT
Mathematical modelling has become an established tool for studying the dynamics of biological systems. Current applications range from building models that reproduce quantitative data to identifying systems with predefined qualitative features, such as switching behaviour, bistability or oscillations. Mathematically, the latter question amounts to identifying parameter values associated with a given qualitative feature. We introduce a procedure to partition the parameter space of a parameterized system of ordinary differential equations into regions for which the system has a unique equilibrium or multiple equilibria. The procedure is based on the computation of the Brouwer degree, and it creates a multivariate polynomial with parameter-dependent coefficients. The signs of the coefficients determine parameter regions with and without multistationarity. A particular strength of the procedure is the avoidance of numerical analysis and parameter sampling. The procedure consists of a number of steps, each of which can be addressed either algorithmically, using available software, or manually. We demonstrate our procedure on several models of gene transcription and cell signalling, and show that in many cases we obtain a complete partitioning of the parameter space with respect to multistationarity.
Subject(s)
Algorithms , Data Interpretation, Statistical , Models, Biological , Models, Statistical , Multivariate Analysis , Computer Simulation
ABSTRACT
Calcium ions (Ca(2+)) have an important role as secondary messengers in numerous signal transduction processes, and cells invest much energy in controlling and maintaining a steep gradient between intracellular (~0.1-micromolar) and extracellular (~2-millimolar) Ca(2+) concentrations. Calmodulin-stimulated calcium pumps, which include the plasma-membrane Ca(2+)-ATPases (PMCAs), are key regulators of intracellular Ca(2+) in eukaryotes. They contain a unique amino- or carboxy-terminal regulatory domain responsible for autoinhibition, and binding of calcium-loaded calmodulin to this domain releases autoinhibition and activates the pump. However, the structural basis for the activation mechanism is unknown and a key remaining question is how calmodulin-mediated PMCA regulation can cover both basal Ca(2+) levels in the nanomolar range as well as micromolar-range Ca(2+) transients generated by cell stimulation. Here we present an integrated study combining the determination of the high-resolution crystal structure of a PMCA regulatory-domain/calmodulin complex with in vivo characterization and biochemical, biophysical and bioinformatics data that provide mechanistic insights into a two-step PMCA activation mechanism mediated by calcium-loaded calmodulin. The structure shows the entire PMCA regulatory domain and reveals an unexpected 2:1 stoichiometry with two calcium-loaded calmodulin molecules binding to different sites on a long helix. A multifaceted characterization of the role of both sites leads to a general structural model for calmodulin-mediated regulation of PMCAs that allows stringent, highly responsive control of intracellular calcium in eukaryotes, making it possible to maintain a stable, basal level at a threshold Ca(2+) concentration, where steep activation occurs.
Subject(s)
Arabidopsis Proteins/chemistry , Arabidopsis Proteins/metabolism , Arabidopsis/metabolism , Calcium-Transporting ATPases/chemistry , Calcium-Transporting ATPases/metabolism , Calcium/metabolism , Calmodulin/chemistry , Eukaryota/metabolism , Amino Acid Sequence , Arabidopsis/chemistry , Arabidopsis/enzymology , Arabidopsis Proteins/genetics , Binding Sites , Calcium-Transporting ATPases/genetics , Calmodulin/metabolism , Enzyme Activation , Intracellular Space/chemistry , Intracellular Space/metabolism , Models, Molecular , Molecular Sequence Data , Protein Binding , Protein Structure, Tertiary , Sequence Alignment
ABSTRACT
In many areas of science it is customary to perform many tests, potentially millions, simultaneously. To gain statistical power it is common to group tests based on a priori criteria such as predefined regions or by sliding windows. However, it is not straightforward to choose grouping criteria and the results might depend on the chosen criteria. Methods that summarize, or aggregate, test statistics or p-values, without relying on a priori criteria, are therefore desirable. We present a simple method to aggregate a sequence of stochastic variables, such as test statistics or p-values, into fewer variables without assuming a priori defined groups. We provide different ways to evaluate the significance of the aggregated variables based on theoretical considerations and resampling techniques, and show that under certain assumptions the family-wise error rate (FWER) is controlled in the strong sense. The validity of the method was demonstrated using simulations and real data analyses. Our method may be a useful supplement to standard procedures that rely on evaluating test statistics individually. Moreover, by being agnostic and not relying on predefined regions, it might be a practical alternative to conventionally used methods of aggregating p-values over regions. The method is implemented in Python and freely available online (through GitHub, see the Supplementary information).
Subject(s)
Models, Theoretical , Software , Algorithms , Computer Simulation , Data Interpretation, Statistical , Internet , Reproducibility of Results
ABSTRACT
Known graphical conditions for the generic and global convergence to equilibria of the dynamical system arising from a reaction network are shown to be invariant under the so-called successive removal of intermediates, a systematic procedure to simplify the network, making the graphical conditions considerably easier to check.
Subject(s)
Genetic Variation , Models, Theoretical , Humans
ABSTRACT
The quasi-steady state approximation and time-scale separation are commonly applied methods to simplify models of biochemical reaction networks based on ordinary differential equations (ODEs). The concentrations of the "fast" species are assumed to be effectively at steady state with respect to the "slow" species. Under this assumption the steady state equations can be used to eliminate the "fast" variables, and a new ODE system with only the slow species can be obtained. We interpret a reduced system obtained by time-scale separation as the ODE system arising from a unique reaction network, by identification of a set of reactions and the corresponding rate functions. The procedure is graphically based and can easily be worked out by hand for small networks. For larger networks, we provide a pseudo-algorithm. We study properties of the reduced network, its kinetics and conservation laws, and show that the kinetics of the reduced network fulfil realistic assumptions, provided the original network does. We illustrate our results using biological examples such as substrate mechanisms, post-translational modification systems and networks with intermediate (transient) steps.
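The classical single-substrate mechanism E + S <-> ES -> E + P gives a compact illustration of this kind of reduction: treating the complex ES as the fast species and setting its rate of change to zero yields the Michaelis-Menten rate for the reduced reaction S -> P. The sketch below is this textbook case under mass-action kinetics, not the graphical procedure of the paper; the function and parameter names are illustrative.

```python
def qss_complex(s, e_total, k_on, k_off, k_cat):
    """Quasi-steady-state concentration of ES, obtained by solving
    k_on * (e_total - es) * s = (k_off + k_cat) * es for es."""
    k_m = (k_off + k_cat) / k_on
    return e_total * s / (k_m + s)

def reduced_rate(s, e_total, k_on, k_off, k_cat):
    """Rate of the reduced reaction S -> P after eliminating ES."""
    return k_cat * qss_complex(s, e_total, k_on, k_off, k_cat)
```

At a substrate concentration equal to (k_off + k_cat) / k_on the reduced rate is half of its maximum k_cat * e_total, which is the familiar interpretation of the Michaelis constant.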
Subject(s)
Biochemical Phenomena/physiology , Models, Biological , Algorithms , Kinetics , Protein Processing, Post-Translational/physiology
ABSTRACT
For dynamical systems arising from chemical reaction networks, persistence is the property that each species concentration remains positively bounded away from zero, as long as species concentrations were all positive in the beginning. We describe two graphical procedures for simplifying reaction networks without breaking known necessary or sufficient conditions for persistence, by iteratively removing so-called intermediates and catalysts from the network. The procedures are easy to apply and, in many cases, lead to highly simplified network structures, such as monomolecular networks. For specific classes of reaction networks, we show that these conditions for persistence are equivalent to one another. Furthermore, they can also be characterized by easily checkable strong connectivity properties of a related graph. In particular, this is the case for (conservative) monomolecular networks, as well as cascades of a large class of post-translational modification systems (of which the MAPK cascade and the n-site futile cycle are prominent examples). Since one of the aforementioned sufficient conditions for persistence precludes the existence of boundary steady states, our method also provides a graphical tool to check for that.
Subject(s)
Biochemical Phenomena/physiology , Chemistry Techniques, Analytical/methods , MAP Kinase Signaling System/physiology , Protein Processing, Post-Translational/physiology
ABSTRACT
We consider the relationship between stationary distributions for stochastic models of reaction systems and Lyapunov functions for their deterministic counterparts. Specifically, we derive the well-known Lyapunov function of reaction network theory as a scaling limit of the non-equilibrium potential of the stationary distribution of stochastically modeled complex balanced systems. We extend this result to general birth-death models and demonstrate via example that similar scaling limits can yield Lyapunov functions even for models that are not complex or detailed balanced, and may even have multiple equilibria.
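The simplest instance of such a scaling limit can be checked numerically. For the one-species network 0 <-> A with production rate proportional to the volume V and linear degradation, the stationary distribution of the stochastic model is Poisson with mean V c, where c is the deterministic equilibrium concentration. The sketch below evaluates -(1/V) ln pi_V(V x) and compares it with the function x (ln x - ln c - 1) + c. The example network, function names and parameter values are assumptions chosen for illustration.

```python
import math

def scaled_potential(x, c, volume):
    """-(1/V) * ln pi_V(V x), where pi_V is the Poisson(V c) distribution.
    V * x is rounded to the nearest integer molecule count."""
    n = round(volume * x)
    mean = volume * c
    log_pmf = -mean + n * math.log(mean) - math.lgamma(n + 1)
    return -log_pmf / volume

def lyapunov(x, c):
    """Candidate limit: x * (ln x - ln c - 1) + c, which vanishes at x = c."""
    return x * (math.log(x) - math.log(c) - 1.0) + c
```

By Stirling's formula the gap between the two functions is of order ln(V)/V, so the agreement improves as the volume grows.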
Subject(s)
Models, Biological , Kinetics , Markov Chains , Mathematical Concepts , Metabolic Networks and Pathways , Population Dynamics/statistics & numerical data , Stochastic Processes
ABSTRACT
MOTIVATION: Modeling and analysis of complex systems are important aspects of understanding systemic behavior. In the absence of detailed knowledge about a system, we often choose modeling equations out of convenience and search the (high-dimensional) parameter space randomly to learn about model properties. Qualitative modeling sidesteps the issue of choosing specific modeling equations and frees the inference from specific properties of the equations. We consider classes of ordinary differential equation (ODE) models arising from interactions of species/entities, such as (bio)chemical reaction networks or ecosystems. A class is defined by imposing mild assumptions on the interaction rates. In this framework, we investigate whether there can be multiple positive steady states in some ODE models in a given class. RESULTS: We have developed and implemented a method to decide whether no ODE model in a given class can have multiple steady states. The method runs efficiently on models of moderate size. We tested the method on a large set of models for gene silencing by sRNA interference and on two publicly available databases of biological models, KEGG and Biomodels. We recommend that this method be used (i) as a pre-screening step for selecting an appropriate model and (ii) for investigating the robustness of the non-existence of multiple steady states for a given ODE model with respect to variation in interaction rates. AVAILABILITY AND IMPLEMENTATION: Scripts and examples in Maple are available in the Supplementary Information. CONTACT: wiuf@math.ku.dk SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
Models, Biological , Systems Biology/methods , Algorithms , Computational Biology/methods , Phosphorylation , RNA Interference
ABSTRACT
Achieving a complete understanding of cellular signal transduction requires deciphering the relation between structural and biochemical features of a signaling system and the shape of the signal-response relationship it embeds. Using explicit analytical expressions and numerical simulations, we present here this relation for four-layered phosphorelays, which are signaling systems that are ubiquitous in prokaryotes and also found in lower eukaryotes and plants. We derive an analytical expression that relates the shape of the signal-response relationship in a relay to the kinetic rates of forward and reverse phosphorylation and hydrolysis reactions. This reveals a set of mathematical conditions which, when satisfied, dictate the shape of the signal-response relationship. We find that a specific topology also observed in nature can satisfy these conditions in such a way as to allow plasticity among hyperbolic and sigmoidal signal-response relationships. In particular, the shape of the signal-response relationship of this relay topology can be tuned by altering kinetic rates and total protein levels at different parts of the relay. These findings provide an important step towards predicting response dynamics of phosphorelays, and the nature of subsequent physiological responses that they mediate, solely from topological features and a few composite measurements; measuring the ratio of reverse and forward phosphorylation rate constants could be sufficient to determine the shape of the signal-response relationship the relay exhibits. Furthermore, they highlight the potential ways in which selective pressures on signal processing could have played a role in the evolution of the observed structural and biochemical characteristics of phosphorelays.
Subject(s)
Models, Biological , Phosphorylation/physiology , Signal Transduction/physiology , Computational Biology , Hydrolysis
ABSTRACT
Many biological, physical, and social interactions have a particular dependence on where they take place; e.g., in living cells, protein movement between the nucleus and cytoplasm affects cellular responses (i.e., proteins must be present in the nucleus to regulate their target genes). Here we use recent developments from dynamical systems and chemical reaction network theory to identify and characterize the key role of the spatial organization of eukaryotic cells in cellular information processing. In particular, the existence of distinct compartments plays a pivotal role in whether a system is capable of multistationarity (multiple response states), and is thus directly linked to the amount of information that the signaling molecules can represent in the nucleus. Multistationarity provides a mechanism for switching between different response states in cell signaling systems and enables multiple outcomes for cellular decision-making. We combine different mathematical techniques to provide a heuristic procedure to determine if a system has the capacity for multiple steady states, and find conditions that ensure that multiple steady states cannot occur. Notably, we find that introducing species localization can alter the capacity for multistationarity, and we mathematically demonstrate that shuttling confers flexibility for and greater control of the emergence of an all-or-nothing response of a cell.
Subject(s)
Cell Compartmentation , Signal Transduction , Cellular Structures/metabolism , Information Theory , Models, Biological
ABSTRACT
Common sequence variants have recently joined rare structural polymorphisms as genetic factors with strong evidence for association with schizophrenia. Here we extend our previous genome-wide association study and meta-analysis (totalling 7 946 cases and 19 036 controls) by examining an expanded set of variants using an enlarged follow-up sample (up to 10 260 cases and 23 500 controls). In addition to previously reported alleles in the major histocompatibility complex region, near neurogranin (NRGN) and in an intron of transcription factor 4 (TCF4), we find two novel variants showing genome-wide significant association: rs2312147[C], upstream of vaccinia-related kinase 2 (VRK2) [odds ratio (OR) = 1.09, P = 1.9 × 10^(-9)] and rs4309482[A], between coiled-coil domain containing 68 (CCDC68) and TCF4, about 400 kb from the previously described risk allele, but not accounted for by its association (OR = 1.09, P = 7.8 × 10^(-9)).
Subject(s)
Basic Helix-Loop-Helix Leucine Zipper Transcription Factors/genetics , Polymorphism, Single Nucleotide , Protein Serine-Threonine Kinases/genetics , Schizophrenia/genetics , Transcription Factors/genetics , Alleles , Genetic Predisposition to Disease , Genome-Wide Association Study , Genotype , Humans , Risk , Transcription Factor 4
ABSTRACT
We define a subclass of chemical reaction networks called post-translational modification systems. Important biological examples of such systems include MAPK cascades and two-component systems which are well-studied experimentally as well as theoretically. The steady states of such a system are solutions to a system of polynomial equations. Even for small systems the task of finding the solutions is daunting. We develop a mathematical framework based on the notion of a cut (a particular subset of species in the system), which provides a linear elimination procedure to reduce the number of variables in the system to a set of core variables. The steady states are parameterized algebraically by the core variables, and graphical conditions for when steady states with positive core variables imply positivity of all variables are given. Further, minimal cuts are the connected components of the species graph and provide conservation laws. A criterion for when a (maximal) set of independent conservation laws can be derived from cuts is given.
Subject(s)
Models, Biological , Protein Processing, Post-Translational , Signal Transduction , Kinetics , Linear Models , MAP Kinase Signaling System , Mathematical Concepts , Metabolic Networks and Pathways
ABSTRACT
With a view towards artificial cells, molecular communication systems, molecular multiagent systems and federated learning, we propose a novel reaction network scheme (termed the Baum-Welch (BW) reaction network) that learns parameters for hidden Markov models (HMMs). All variables including inputs and outputs are encoded by separate species. Each reaction in the scheme changes only one molecule of one species to one molecule of another. The reverse change is also accessible but via a different set of enzymes, in a design reminiscent of futile cycles in biochemical pathways. We show that every positive fixed point of the BW algorithm for HMMs is a fixed point of the reaction network scheme, and vice versa. Furthermore, we prove that the 'expectation' step and the 'maximization' step of the reaction network separately converge exponentially fast and compute the same values as the E-step and the M-step of the BW algorithm. We simulate example sequences, and show that our reaction network learns the same parameters for the HMM as the BW algorithm, and that the log-likelihood increases continuously along the trajectory of the reaction network.
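For reference, the classical Baum-Welch algorithm that the reaction network is compared against can be sketched in a few lines. The code below is a plain-Python version for a discrete-output HMM with a scaled forward-backward pass; it illustrates the E-step and M-step only and is not the reaction network scheme itself. All names are illustrative, and it assumes strictly positive parameters so that no normalising sum is zero.

```python
import math

def forward_backward(obs, pi, A, B):
    """Scaled forward-backward pass; returns (alpha, beta, scales)."""
    n, T = len(pi), len(obs)
    alpha, scales = [], []
    prev = [pi[i] * B[i][obs[0]] for i in range(n)]
    for t in range(T):
        if t > 0:
            prev = [sum(alpha[-1][i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                    for j in range(n)]
        c = sum(prev)  # probability of obs[t] given obs[0..t-1]
        scales.append(c)
        alpha.append([v / c for v in prev])
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(n))
                   / scales[t + 1] for i in range(n)]
    return alpha, beta, scales

def log_likelihood(obs, pi, A, B):
    """Log-probability of the observations: sum of the log scaling factors."""
    return sum(math.log(c) for c in forward_backward(obs, pi, A, B)[2])

def baum_welch_step(obs, pi, A, B):
    """One E-step followed by one M-step; returns updated (pi, A, B)."""
    n, m, T = len(pi), len(B[0]), len(obs)
    alpha, beta, scales = forward_backward(obs, pi, A, B)
    # E-step: posterior state probabilities (gamma) and summed
    # posterior transition probabilities (xi_sum).
    gamma = [[alpha[t][i] * beta[t][i] for i in range(n)] for t in range(T)]
    xi_sum = [[sum(alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                   / scales[t + 1] for t in range(T - 1))
               for j in range(n)] for i in range(n)]
    # M-step: normalise the expected counts.
    new_pi = gamma[0][:]
    new_A = [[xi_sum[i][j] / sum(xi_sum[i]) for j in range(n)] for i in range(n)]
    new_B = [[sum(gamma[t][i] for t in range(T) if obs[t] == k)
              / sum(gamma[t][i] for t in range(T)) for k in range(m)]
             for i in range(n)]
    return new_pi, new_A, new_B
```

Iterating `baum_welch_step` and tracking `log_likelihood` gives a non-decreasing sequence, which is the discrete-time counterpart of the increase in log-likelihood reported along the trajectory of the reaction network.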
Subject(s)
Algorithms , Markov Chains
ABSTRACT
The genome of recently admixed individuals or hybrids has characteristic genetic patterns that can be used to learn about their recent admixture history. Among these are patterns of interancestry heterozygosity, which can be inferred from SNP data, using either called genotypes or genotype likelihoods, without the need for information on genomic location. This makes them applicable to a wide range of data that are often used in evolutionary and conservation genomic studies, such as low-depth sequencing mapped to scaffolds and reduced representation sequencing. Here we implement maximum likelihood estimation of interancestry heterozygosity patterns using two complementary models. We furthermore develop apoh (Admixture Pedigrees of Hybrids), software that uses estimates of paired ancestry proportions to detect recently admixed individuals or hybrids, and to suggest possible admixture pedigrees. It also calculates several hybrid indices that make it easier to identify and rank possible admixture pedigrees that could give rise to the estimated patterns. We implemented apoh both as a command line tool and as a Graphical User Interface that allows the user to automatically and interactively explore, rank and visualize compatible recent admixture pedigrees, and calculate the different summary indices. We validate the performance of the method using admixed family trios from the 1000 Genomes Project. In addition, we show its applicability to identifying recent hybrids from RAD-seq data of Grant's gazelle (Nanger granti and Nanger petersii) and whole genome low-depth data of waterbuck (Kobus ellipsiprymnus), which shows complex admixture of up to four populations.
Subject(s)
Genetics, Population , Genome , Humans , Pedigree , Genome/genetics , Genotype , Software
ABSTRACT
Principal component analysis (PCA) is commonly used in genetics to infer and visualize population structure and admixture between populations. PCA is often interpreted in a way similar to inferred admixture proportions, where it is assumed that individuals belong to one of several possible populations or are admixed between these populations. We propose a new method to assess the statistical fit of PCA (interpreted as a model spanned by the top principal components) and show that violations of the PCA assumptions affect the fit. Our method uses the chosen top principal components to predict the genotypes. By assessing the covariance (and the correlation) of the residuals (the differences between observed and predicted genotypes), we are able to detect violations of the model assumptions. Based on simulations and genome-wide human data, we show that our assessment of fit can be used to guide the interpretation of the data and to pinpoint individuals that are not well represented by the chosen principal components. Our method works equally well on other similar models, such as the admixture model, where the mean of the data is represented by a linear matrix decomposition.