Results 1 - 15 of 15
1.
J Mol Evol ; 88(2): 136-150, 2020 03.
Article in English | MEDLINE | ID: mdl-31781936

ABSTRACT

The underlying structure of the canonical amino acid substitution matrix (aaSM) is examined by considering stepwise improvements in the differential recognition of amino acids according to their chemical properties during the branching history of the two aminoacyl-tRNA synthetase (aaRS) superfamilies. The evolutionary expansion of the genetic code is described by a simple parameterization of the aaSM, in which (i) the number of distinguishable amino acid types, (ii) the matrix dimension and (iii) the number of parameters, each increases by one for each bifurcation in an aaRS phylogeny. Parameterized matrices corresponding to trees in which the size of an amino acid sidechain is the only discernible property behind its categorization as a substrate, exclusively for a Class I or II aaRS, provide a significantly better fit to empirically determined aaSM than trees with random bifurcation patterns. A second split between polar and nonpolar amino acids in each Class effects a vastly greater further improvement. The earliest Class-separated epochs in the phylogenies of the aaRS reflect these enzymes' capability to distinguish tRNAs through the recognition of acceptor stem identity elements via the minor (Class I) and major (Class II) helical grooves, which is how the ancient operational code functioned. The advent of tRNA recognition using the anticodon loop supports the evolution of the optimal map of amino acid chemistry found in the later genetic code, an essentially digital categorization, in which polarity is the major functional property, compensating for the unrefined, haphazard differentiation of amino acids achieved by the operational code.


Subjects
Amino Acid Substitution , Amino Acyl-tRNA Synthetases/genetics , Genetic Code , Phylogeny , Amino Acids/genetics , Anticodon , Molecular Evolution , Genetic Models
2.
J Math Biol ; 81(2): 549-573, 2020 08.
Article in English | MEDLINE | ID: mdl-32710155

ABSTRACT

A matrix Lie algebra is a linear space of matrices closed under the operation [Formula: see text]. The "Lie closure" of a set of matrices is the smallest matrix Lie algebra which contains the set. In the context of Markov chain theory, if a set of rate matrices forms a Lie algebra, their corresponding Markov matrices are closed under matrix multiplication; this has been found to be a useful property in phylogenetics. Inspired by previous research involving Lie closures of DNA models, it was hypothesised that finding the Lie closure of a codon model could help to solve the problem of mis-estimation of the non-synonymous/synonymous rate ratio, [Formula: see text]. We propose two different methods of finding a linear space from a model: the first is the linear closure, which is the smallest linear space which contains the model, and the second is the linear version, which changes multiplicative constraints in the model to additive ones. We then find the Lie closure of each of these linear spaces. Under both methods, it was found that closed codon models would require thousands of parameters, and that any partial solution to this problem that was of a reasonable size violated stochasticity. Investigation of toy models indicated that finding the Lie closure of matrix linear spaces which deviated only slightly from a simple model resulted in a Lie closure that was close to having the maximum number of parameters possible. Given that Lie closures are not practical, we propose further consideration of the two variants of linearly closed models.
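The notion of a Lie closure described in this abstract can be sketched for small matrices. The block below is a minimal illustration, not the authors' procedure for codon models: it assumes the bracket is the usual matrix commutator AB - BA (the abstract elides the formula), and the function names and the two toy examples are hypothetical. It repeatedly adds commutators until no new linearly independent matrix appears.

```python
from fractions import Fraction

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def commutator(a, b):
    # Assumed bracket: [A, B] = AB - BA.
    ab, ba = matmul(a, b), matmul(b, a)
    n = len(a)
    return [[ab[i][j] - ba[i][j] for j in range(n)] for i in range(n)]

def reduce_against(vec, basis):
    # Subtract multiples of the stored basis vectors (each has a 1 at its pivot).
    v = list(vec)
    for pivot, b in basis:
        c = v[pivot]
        if c != 0:
            v = [x - c * y for x, y in zip(v, b)]
    return v

def lie_closure(mats):
    """Return matrices spanning the smallest commutator-closed space containing mats."""
    basis, members, queue = [], [], list(mats)
    while queue:
        m = queue.pop()
        v = reduce_against([Fraction(x) for row in m for x in row], basis)
        pivot = next((i for i, x in enumerate(v) if x != 0), None)
        if pivot is None:
            continue  # already in the current span
        basis.append((pivot, [x / v[pivot] for x in v]))
        # A new independent element: its brackets with earlier members must be checked.
        queue.extend(commutator(m, other) for other in members)
        members.append(m)
    return members

# Toy example 1: generators of the general 2-state rate model (columns sum to zero).
L1 = [[-1, 0], [1, 0]]
L2 = [[0, 1], [0, -1]]
# Toy example 2: two matrices whose closure picks up a third dimension.
A = [[0, 1], [0, 0]]
B = [[0, 0], [1, 0]]

print(len(lie_closure([L1, L2])), len(lie_closure([A, B])))
```

In the first toy example the span is already closed, so nothing is added; in the second the closure grows by one dimension, a small-scale version of the growth the abstract reports for codon models.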


Subjects
Codon , DNA , Biological Models , Markov Chains , Phylogeny
3.
Syst Biol ; 67(5): 905-915, 2018 09 01.
Article in English | MEDLINE | ID: mdl-29788496

ABSTRACT

We give a non-technical introduction to convergence-divergence models, a new modeling approach for phylogenetic data that allows for the usual divergence of lineages after lineage-splitting but also allows for taxa to converge, i.e. become more similar over time. By examining the 3-taxon case in some detail, we illustrate that phylogeneticists have been "spoiled" in the sense of not having to think about the structural parameters in their models by virtue of the strong assumption that evolution is tree-like. We show that there are not always good statistical reasons to prefer the usual class of tree-like models over more general convergence-divergence models. Specifically, we show many 3-taxon data sets can be equally well explained by supposing violation of the molecular clock due to change in the rate of evolution along different edges, or by keeping the assumption of a constant rate of evolution but instead assuming that evolution is not a purely divergent process. Given the abundance of evidence that evolution is not strictly tree-like, our discussion is an illustration that as phylogeneticists we need to think clearly about the structural form of the models we use. For cases with four taxa, we show that there will be far greater ability to distinguish models with convergence from non-clock-like tree models. [Akaike information criterion; convergence-divergence models; distinguishability; identifiability; likelihood; molecular clock; phylogeny.]
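The keyword list names the Akaike information criterion, so a small sketch may help show what "equally well explained" means in practice. The log-likelihoods and parameter counts below are made-up placeholders, not results from the paper; only the formula AIC = 2k - 2 ln L is standard.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: lower values indicate a preferred model."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical fits to one 3-taxon data set: (maximised log-likelihood, free parameters).
fits = {
    "clock-like tree": (-1250.0, 2),
    "non-clock tree": (-1243.5, 3),
    "convergence-divergence": (-1243.5, 3),
}

scores = {name: aic(lnl, k) for name, (lnl, k) in fits.items()}
for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: AIC = {score}")
```

With these placeholder numbers the last two models tie exactly: same likelihood, same parameter count, so the criterion cannot separate them. That is the kind of indistinguishability the abstract describes for three taxa.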


Subjects
Molecular Evolution , Genetic Models , Phylogeny , Biological Evolution
4.
Bull Math Biol ; 81(2): 361-383, 2019 02.
Article in English | MEDLINE | ID: mdl-30073568

ABSTRACT

We present and explore a general method for deriving a Lie-Markov model from a finite semigroup. If the degree of the semigroup is k, the resulting model is a continuous-time Markov chain on k states and, as a consequence of the product rule in the semigroup, satisfies the property of multiplicative closure. This means that the product of any two probability substitution matrices taken from the model produces another substitution matrix also in the model. We show that our construction is a natural generalization of the concept of group-based models.
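Multiplicative closure can be checked directly on a familiar example. The sketch below does not implement the semigroup construction; it assumes the Kimura two-parameter pattern (state order A, G, C, T, one transition probability b and one transversion probability c) and uses exact fractions to confirm that the product of two such matrices has the same pattern. The helper names are hypothetical.

```python
from fractions import Fraction as F

def k2p(b, c):
    """Kimura two-parameter Markov matrix; state order A, G, C, T (an assumption)."""
    a = 1 - b - 2 * c
    return [[a, b, c, c],
            [b, a, c, c],
            [c, c, a, b],
            [c, c, b, a]]

def matmul(x, y):
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def is_k2p(m):
    # Rebuild the pattern from two entries and compare every cell exactly.
    b, c = m[0][1], m[0][2]
    return b >= 0 and c >= 0 and m == k2p(b, c)

M1 = k2p(F(1, 10), F(1, 20))
M2 = k2p(F(1, 5), F(1, 25))
P = matmul(M1, M2)

print(is_k2p(P), P == matmul(M2, M1))
```

The second check shows that these matrices also commute, which is specific to this example and is not implied by multiplicative closure in general.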


Subjects
Markov Chains , Phylogeny , Computational Biology , Molecular Evolution , Mathematical Concepts , Genetic Models , Statistical Models , Stochastic Processes
5.
Bull Math Biol ; 79(3): 619-634, 2017 03.
Article in English | MEDLINE | ID: mdl-28188429

ABSTRACT

We present a method of dimensional reduction for the general Markov model of sequence evolution on a phylogenetic tree. We show that taking certain linear combinations of the associated random variables (site pattern counts) reduces the dimensionality of the model from exponential in the number of extant taxa, to quadratic in the number of taxa, while retaining the ability to statistically identify phylogenetic divergence events. A key feature is the identification of an invariant subspace which depends only bilinearly on the model parameters, in contrast to the usual multi-linear dependence in the full space. We discuss potential applications including the computation of split (edge) weights on phylogenetic trees from observed sequence data.


Subjects
Genetic Models , Phylogeny , Biological Evolution , Markov Chains , Mathematical Concepts
6.
J Math Biol ; 75(6-7): 1619-1654, 2017 12.
Article in English | MEDLINE | ID: mdl-28434023

ABSTRACT

Recently there has been renewed interest in phylogenetic inference methods based on phylogenetic invariants, alongside the related Markov invariants. Broadly speaking, both these approaches give rise to polynomial functions of sequence site patterns that, in expectation value, either vanish for particular evolutionary trees (in the case of phylogenetic invariants) or have well understood transformation properties (in the case of Markov invariants). While both approaches have been valued for their intrinsic mathematical interest, it is not clear how they relate to each other, and to what extent they can be used as practical tools for inference of phylogenetic trees. In this paper, by focusing on the special case of binary sequence data and quartets of taxa, we are able to view these two different polynomial-based approaches within a common framework. To motivate the discussion, we present three desirable statistical properties that we argue any invariant-based phylogenetic method should satisfy: (1) sensible behaviour under reordering of input sequences; (2) stability as the taxa evolve independently according to a Markov process; and (3) explicit dependence on the assumption of a continuous-time process. Motivated by these statistical properties, we develop and explore several new phylogenetic inference methods. In particular, we develop a statistically bias-corrected version of the Markov invariants approach which satisfies all three properties. We also extend previous work by showing that the phylogenetic invariants can be implemented in such a way as to satisfy property (3). A simulation study shows that, in comparison to other methods, our new proposed approach based on bias-corrected Markov invariants is extremely powerful for phylogenetic inference. The binary case is of particular theoretical interest as, in this case only, the Markov invariants can be expressed as linear combinations of the phylogenetic invariants. A wider implication of this is that, for models with more than two states (for example, DNA sequence alignments with four-state models), we find that methods which rely on phylogenetic invariants are incapable of satisfying all three of the stated statistical properties. This is because in these cases the relevant Markov invariants belong to a class of polynomials independent from the phylogenetic invariants.


Subjects
Phylogeny , Biostatistics/methods , Computer Simulation , DNA/genetics , Molecular Evolution , Markov Chains , Mathematical Concepts , Genetic Models , Sequence Alignment
7.
Syst Biol ; 64(4): 638-50, 2015 Jul.
Article in English | MEDLINE | ID: mdl-25858352

ABSTRACT

When the process underlying DNA substitutions varies across evolutionary history, some standard Markov models underlying phylogenetic methods are mathematically inconsistent. The most prominent example is the general time-reversible model (GTR) together with some, but not all, of its submodels. To rectify this deficiency, nonhomogeneous Lie Markov models have been identified as the class of models that are consistent in the face of a changing process of DNA substitutions regardless of taxon sampling. Some well-known models in popular use are within this class, but are either overly simplistic (e.g., the Kimura two-parameter model) or overly complex (the general Markov model). On a diverse set of biological data sets, we test a hierarchy of Lie Markov models spanning the full range of parameter richness. Compared against the benchmark of the ever-popular GTR model, we find that as a whole the Lie Markov models perform well, with the best performing models having 8-10 parameters and the ability to recognize the distinction between purines and pyrimidines.
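One way to get a feel for the inconsistency mentioned for time-reversible models is a discrete-time toy example. The sketch below is an illustration under stated assumptions, not the analysis in the paper: it uses three states rather than four, hypothetical symmetric Markov matrices (each reversible with respect to the uniform distribution), and checks detailed balance exactly. The product of the two matrices still has the uniform distribution as its stationary distribution but no longer satisfies detailed balance.

```python
from fractions import Fraction as F

def matmul(x, y):
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def is_stationary(p, pi):
    n = len(p)
    return all(sum(pi[i] * p[i][j] for i in range(n)) == pi[j] for j in range(n))

def is_reversible(p, pi):
    # Detailed balance: pi_i * p_ij == pi_j * p_ji for every pair of states.
    n = len(p)
    return all(pi[i] * p[i][j] == pi[j] * p[j][i]
               for i in range(n) for j in range(n))

uniform = [F(1, 3)] * 3
# Hypothetical symmetric Markov matrices for two different epochs (rows sum to 1).
M1 = [[F(x, 10) for x in row] for row in [[8, 2, 0], [2, 7, 1], [0, 1, 9]]]
M2 = [[F(x, 10) for x in row] for row in [[7, 0, 3], [0, 9, 1], [3, 1, 6]]]
P = matmul(M1, M2)  # process M1 followed by process M2

print(is_reversible(M1, uniform), is_reversible(M2, uniform))
print(is_stationary(P, uniform), is_reversible(P, uniform))
```

The composite process therefore falls outside the reversible class even though each epoch lies inside it. Whether these particular matrices arise from continuous-time rate matrices is not addressed here.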


Subjects
Classification/methods , Biological Models , Phylogeny , Animals , DNA/chemistry , DNA/genetics , Mitochondrial DNA/chemistry , Mitochondrial DNA/genetics , Humans , Nucleotides/genetics , Nucleotides/metabolism , Plants/genetics
8.
J Math Biol ; 73(2): 259-82, 2016 08.
Article in English | MEDLINE | ID: mdl-26660305

ABSTRACT

We consider the continuous-time presentation of the strand symmetric phylogenetic substitution model (in which rate parameters are unchanged under nucleotide permutations given by Watson-Crick base conjugation). Algebraic analysis of the model's underlying structure as a matrix group leads to a change of basis where the rate generator matrix is given by a two-part block decomposition. We apply representation theoretic techniques and, for any (fixed) number of phylogenetic taxa L and polynomial degree D of interest, provide the means to classify and enumerate the associated Markov invariants. In particular, in the quadratic and cubic cases we prove there are precisely [Formula: see text] and [Formula: see text] linearly independent Markov invariants, respectively. Additionally, we give the explicit polynomial forms of the Markov invariants for (i) the quadratic case with any number of taxa L, and (ii) the cubic case in the special case of a three-taxon phylogenetic tree. We close by showing our results are of practical interest since the quadratic Markov invariants provide independent estimates of phylogenetic distances based on (i) substitution rates within Watson-Crick conjugate pairs, and (ii) substitution rates across conjugate base pairs.


Subjects
Classification/methods , Genetic Models , Phylogeny , Algorithms
10.
J Math Biol ; 70(4): 855-91, 2015 Mar.
Article in English | MEDLINE | ID: mdl-24723068

ABSTRACT

Continuous-time Markov chains are a standard tool in phylogenetic inference. If homogeneity is assumed, the chain is formulated by specifying time-independent rates of substitutions between states in the chain. In applications, there are usually extra constraints on the rates, depending on the situation. If a model is formulated in this way, it is possible to generalise it and allow for an inhomogeneous process, with time-dependent rates satisfying the same constraints. It is then useful to require that, under some time restrictions, there exists a homogeneous average of this inhomogeneous process within the same model. This leads to the definition of "Lie Markov models" which, as we will show, are precisely the class of models where such an average exists. These models form Lie algebras and hence concepts from Lie group theory are central to their derivation. In this paper, we concentrate on applications to phylogenetics and nucleotide evolution, and derive the complete hierarchy of Lie Markov models that respect the grouping of nucleotides into purines and pyrimidines, that is, models with purine/pyrimidine symmetry. We also discuss how to handle the subtleties of applying Lie group methods, most naturally defined over the complex field, to the stochastic case of a Markov process, where parameter values are restricted to be real and positive. In particular, we explore the geometric embedding of the cone of stochastic rate matrices within the ambient space of the associated complex Lie algebra.


Subjects
Genetic Models , Purine Nucleotides/genetics , Pyrimidine Nucleotides/genetics , Animals , DNA/genetics , Molecular Evolution , Humans , Markov Chains , Mathematical Concepts , Phylogeny , Stochastic Processes
11.
BMC Evol Biol ; 14: 236, 2014 Dec 04.
Article in English | MEDLINE | ID: mdl-25472897

ABSTRACT

BACKGROUND: Hadamard conjugation is part of the standard mathematical armoury in the analysis of molecular phylogenetic methods. For group-based models, the approach provides a one-to-one correspondence between the so-called "edge length" and "sequence" spectrum on a phylogenetic tree. The Hadamard conjugation has been used in diverse phylogenetic applications not only for inference but also as an important conceptual tool for thinking about molecular data leading to generalizations beyond strictly tree-like evolutionary modelling. RESULTS: For general group-based models of phylogenetic branching processes, we reformulate the problem of constructing a one-to-one correspondence between pattern probabilities and edge parameters. This takes a classic result previously shown through use of Fourier analysis and presents it in the language of tensors and group representation theory. This derivation makes it clear why the inversion is possible, because, under their usual definition, group-based models are defined for abelian groups only. CONCLUSION: We provide an inversion of group-based phylogenetic models that can be implemented using matrix multiplication between rectangular matrices indexed by ordered-partitions of varying sizes. Our approach provides additional context for the construction of phylogenetic probability distributions on network structures, and highlights the potential limitations of restricting to group-based models in this setting.
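The classic two-state form of the Hadamard conjugation can be sketched in a few lines. This is not the tensor formulation of the paper; it assumes the symmetric two-state model, splits indexed by bitmask subsets of all taxa except the last, an edge spectrum q whose first entry is minus the sum of the others, and edge lengths measured as expected numbers of changes. Function names are hypothetical.

```python
import math

def hadamard(m):
    """Sylvester-type Hadamard matrix of size 2**m: entry (-1)**popcount(i & j)."""
    size = 1 << m
    return [[-1 if bin(i & j).count("1") % 2 else 1 for j in range(size)]
            for i in range(size)]

def apply(h, v):
    return [sum(hij * vj for hij, vj in zip(row, v)) for row in h]

def edge_to_sequence(q):
    """Edge-length spectrum -> site-pattern probabilities: s = H^-1 exp(H q)."""
    size = len(q)
    h = hadamard(size.bit_length() - 1)
    rho = [math.exp(r) for r in apply(h, q)]
    return [x / size for x in apply(h, rho)]  # H^-1 = H / size

def sequence_to_edge(s):
    """Inverse map: q = H^-1 ln(H s); requires every entry of H s to be positive."""
    size = len(s)
    h = hadamard(size.bit_length() - 1)
    r = [math.log(x) for x in apply(h, s)]
    return [x / size for x in apply(h, r)]

# Three taxa, star tree: edges for splits {1}, {2}, {1,2} with assumed lengths.
q = [-0.6, 0.1, 0.2, 0.3]
s = edge_to_sequence(q)
q_back = sequence_to_edge(s)
print([round(x, 4) for x in s])
print([round(x, 4) for x in q_back])
```

For two taxa the same functions reproduce the familiar probability of a difference, (1 - exp(-2t))/2, which is a quick sanity check on the sign and indexing conventions assumed here.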


Subjects
Genetic Models , Phylogeny , Biological Evolution , Markov Chains
12.
Syst Biol ; 62(1): 78-92, 2013 Jan 01.
Article in English | MEDLINE | ID: mdl-22914976

ABSTRACT

In their 2008 and 2009 articles, Sumner and colleagues introduced the "squangles", a small set of Markov invariants for phylogenetic quartets. The squangles are consistent with the general Markov (GM) model and can be used to infer quartets without the need to explicitly estimate all parameters. As the GM model is inhomogeneous and hence nonstationary, the squangles are expected to perform well compared with standard approaches when there are changes in base composition among species. However, the GM model assumes constant rates across sites, so the squangles should be confounded by data generated with invariant sites or other forms of rate-variation across sites. Here we implement the squangles in a least-squares setting that returns quartets weighted by either confidence or internal edge lengths, and we show how these weighted quartets can be used as input into a variety of supertree and supernetwork methods. For the first time, we quantitatively investigate the robustness of the squangles to breaking of the constant rates-across-sites assumption on both simulated and real data sets; and we suggest a modification that improves the performance of the squangles in the presence of invariant sites. Our conclusion is that the squangles provide a novel tool for phylogenetic estimation that is complementary to methods that explicitly account for rate-variation across sites, but rely on homogeneous, and hence stationary, models.


Subjects
Classification/methods , Genetic Models , Phylogeny , Animals , Computer Simulation , Least-Squares Analysis , Mammals/classification , Mammals/genetics , Markov Chains , Reproducibility of Results
14.
J Theor Biol ; 327: 88-90, 2013 Jun 21.
Article in English | MEDLINE | ID: mdl-23402954
15.
Article in English | MEDLINE | ID: mdl-22331860

ABSTRACT

We consider novel phylogenetic models with rate matrices that arise via the embedding of a progenitor model on a small number of character states, into a target model on a larger number of character states. Adapting representation-theoretic results from recent investigations of Markov invariants for the general rate matrix model, we give a prescription for identifying and counting Markov invariants for such "symmetric embedded" models, and we provide enumerations of these for the first few cases with a small number of character states. The simplest example is a target model on three states, constructed from a general 2-state model; the "2 --> 3" embedding. We show that for 2 taxa, there exist two invariants of quadratic degree that can be used to directly infer pairwise distances from observed sequences under this model. A simple simulation study verifies their theoretical expected values, and suggests that, given the appropriateness of the model class, they have statistical properties superior to those of the standard (log) Det invariant (which is of cubic degree for this case).
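For orientation, the standard (log) Det quantity mentioned at the end of this abstract can be sketched as follows. The block does not implement the quadratic invariants of the paper; it assumes 3-state Markov matrices with hypothetical entries and uses the unnormalised form d = -ln det(M) (normalisation conventions vary). It illustrates why the quantity behaves like a distance: determinants multiply, so the values add along a path.

```python
import math

def det3(m):
    # Cofactor expansion along the first row; cubic in the matrix entries.
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def matmul(x, y):
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def logdet_distance(m):
    """Unnormalised log-Det quantity; assumes det(m) > 0."""
    return -math.log(det3(m))

# Hypothetical Markov matrices for two consecutive edges (rows sum to 1).
M1 = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
M2 = [[0.9, 0.05, 0.05], [0.1, 0.85, 0.05], [0.05, 0.15, 0.8]]

d_path = logdet_distance(matmul(M1, M2))
d_sum = logdet_distance(M1) + logdet_distance(M2)
print(d_path, d_sum)
```

The determinant of a 3-by-3 matrix is a degree-three polynomial in its entries, which matches the abstract's remark that this invariant is of cubic degree in the three-state case.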


Subjects
Genetic Models , Phylogeny , Markov Chains