1.
Bull Math Biol ; 81(2): 598-617, 2019 02.
Article in English | MEDLINE | ID: mdl-29589255

ABSTRACT

Given a collection [Formula: see text] of subsets of a finite set X, we say that [Formula: see text] is phylogenetically flexible if, for any collection R of rooted phylogenetic trees whose leaf sets comprise the collection [Formula: see text], R is compatible (i.e. there is a rooted phylogenetic X-tree that displays each tree in R). We show that [Formula: see text] is phylogenetically flexible if and only if it satisfies a Hall-type inequality condition of being 'slim'. Using submodularity arguments, we show that there is a polynomial-time algorithm for determining whether or not [Formula: see text] is slim. This 'slim' condition reduces to a simpler inequality in the case where all of the sets in [Formula: see text] have size 3, a property we call 'thin'. Thin sets were recently shown to be equivalent to the existence of an (unrooted) tree for which the median function provides an injective mapping to its vertex set; we show here that the unrooted tree in this representation can always be chosen to be a caterpillar tree. We also characterise when a collection [Formula: see text] of subsets of size 2 is thin (in terms of the flexibility of total orders rather than phylogenies) and show that this holds if and only if an associated bipartite graph is a forest. The significance of our results for phylogenetics is in providing precise and efficiently verifiable conditions under which supertree methods that require consistent inputs of trees can be applied to any input trees on given subsets of species.


Subjects
Genetic Models , Phylogeny , Algorithms , Computational Biology , Molecular Evolution , Genomics/statistics & numerical data , Mathematical Concepts , Statistical Models
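
The final characterisation in the abstract above (a collection of 2-element subsets is thin exactly when an associated bipartite graph is a forest) can be illustrated with a small acyclicity test. This is a minimal sketch, not the authors' procedure: the abstract does not define the bipartite graph, so the code assumes it is the incidence graph joining each element of X to the sets that contain it. The function name `is_forest_collection` is hypothetical.

```python
from typing import Hashable, Iterable, Tuple


def is_forest_collection(pairs: Iterable[Tuple[Hashable, Hashable]]) -> bool:
    """Return True if the incidence graph (elements vs. sets) has no cycle.

    Assumption (not stated in the abstract): each 2-set {a, b} becomes a
    set-node joined by one edge to element a and one edge to element b.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for index, (a, b) in enumerate(pairs):
        set_node = ("set", index)
        for element in (("elem", a), ("elem", b)):
            root_s, root_e = find(set_node), find(element)
            if root_s == root_e:
                return False  # this edge would close a cycle
            parent[root_s] = root_e
    return True
```

For example, the path-like collection `[(1, 2), (2, 3), (3, 4)]` passes the test, while the triangle `[(1, 2), (2, 3), (1, 3)]` does not.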
2.
J Theor Biol ; 437: 222-224, 2018 01 21.
Article in English | MEDLINE | ID: mdl-29080779

ABSTRACT

A variety of evolutionary processes in biology can be viewed as settings where organisms 'catalyse' the formation of new types of organisms. One example, relevant to the origin of life, is where transient biological colonies (e.g. prokaryotes or protocells) give rise to new colonies via lateral gene transfer. In this short note, we describe and analyse a simple random process which models such settings. By applying theory from general birth-death processes, we describe how the survival of a population under catalytic diversification depends on the interplay of the catalysis rate and the initial population size. We also note how such a process can be viewed within the framework of 'self-sustaining autocatalytic networks'.


Subjects
Algorithms , Artificial Cells/metabolism , Horizontal Gene Transfer , Genetic Models , Prokaryotic Cells/metabolism , Computer Simulation , Molecular Evolution , Bacterial Genome/genetics , Markov Chains , Origin of Life
3.
J Theor Biol ; 420: 174-179, 2017 05 07.
Article in English | MEDLINE | ID: mdl-28263815

ABSTRACT

The reconstruction of phylogenetic trees from discrete character data typically relies on models that assume the characters evolve under a continuous-time Markov process operating at some overall rate λ. When λ is too high or too low, it becomes difficult to distinguish a short interior edge from a polytomy (the tree that results from collapsing the edge). In this note, we investigate the rate that maximizes the expected log-likelihood ratio (i.e. the Kullback-Leibler separation) between the four-leaf unresolved (star) tree and a four-leaf binary tree with interior edge length ϵ. For a simple two-state model, we show that as ϵ converges to 0 the optimal rate also converges to zero when the four pendant edges have equal length. However, when the four pendant branches have unequal length, two local optima can arise, and it is possible for the globally optimal rate to converge to a non-zero constant as ϵ→0. Moreover, in the setting where the four pendant branches have equal lengths and either (i) we replace the two-state model by an infinite-state model or (ii) we retain the two-state model and replace the Kullback-Leibler separation by Euclidean distance as the maximization goal, then the optimal rate also converges to a non-zero constant.


Subjects
Theoretical Models , Phylogeny , Molecular Evolution , Markov Chains , Genetic Models
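
The quantity studied in the abstract above can be computed numerically for small cases. The sketch below builds the site-pattern distribution of a four-leaf tree under a symmetric two-state model and evaluates a Kullback-Leibler divergence between the star tree and a resolved tree with interior edge length ϵ. Several choices here are assumptions rather than details taken from the abstract: the direction of the divergence (star tree as reference), the uniform root distribution, the convention that an edge of length t at rate λ has change probability ½(1 − e^(−2λt)), and all function names.

```python
import itertools
import math


def change_prob(length: float) -> float:
    """Probability that the two ends of an edge differ (symmetric two-state model)."""
    return 0.5 * (1.0 - math.exp(-2.0 * length))


def pattern_probs(pendant, interior, rate):
    """Site-pattern distribution for the four-leaf tree with split 12|34.

    pendant: four pendant edge lengths; interior: interior edge length;
    rate: overall rate multiplying every edge length.
    """
    p = [change_prob(rate * t) for t in pendant]
    q = change_prob(rate * interior)
    probs = {}
    for leaves in itertools.product((0, 1), repeat=4):
        total = 0.0
        # Sum over the states (u, v) of the two interior vertices.
        for u, v in itertools.product((0, 1), repeat=2):
            term = 0.5 * (q if u != v else 1.0 - q)
            for i, state in enumerate(leaves):
                ancestor = u if i < 2 else v
                term *= p[i] if state != ancestor else 1.0 - p[i]
            total += term
        probs[leaves] = total
    return probs


def kl_separation(pendant, eps, rate):
    """KL divergence from the star tree (interior length 0) to the resolved tree."""
    star = pattern_probs(pendant, 0.0, rate)
    resolved = pattern_probs(pendant, eps, rate)
    return sum(s * math.log(s / resolved[k]) for k, s in star.items())
```

Scanning `kl_separation(pendant, eps, rate)` over a grid of positive `rate` values gives a rough picture of where the separation peaks. The rate must be strictly positive, since at rate 0 some patterns have probability zero and the logarithm is undefined.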
4.
J Math Biol ; 74(5): 1107-1138, 2017 04.
Article in English | MEDLINE | ID: mdl-27604275

ABSTRACT

The reconstruction of phylogenetic trees from molecular sequence data relies on modelling site substitutions by a Markov process, or a mixture of such processes. In general, allowing mixed processes can result in different tree topologies becoming indistinguishable from the data, even for infinitely long sequences. However, when the underlying Markov process supports linear phylogenetic invariants, then provided these are sufficiently informative, the identifiability of the tree topology can be restored. In this paper, we investigate a class of processes that support linear invariants once the stationary distribution is fixed, the 'equal input model'. This model generalizes the 'Felsenstein 1981' model (and thereby the Jukes-Cantor model) from four states to an arbitrary number of states (finite or infinite), and it can also be described by a 'random cluster' process. We describe the structure and dimension of the vector spaces of phylogenetic mixtures and of linear invariants for any fixed phylogenetic tree (and for all trees, the so-called 'model invariants'), on any number n of leaves. We also provide a precise description of the space of mixtures and linear invariants for the special case of [Formula: see text] leaves. By combining techniques from discrete random processes and (multi-) linear algebra, our results build on a classic result that was first established by James Lake (Mol Biol Evol 4:167-191, 1987).


Subjects
Biological Models , Phylogeny , Markov Chains , Molecular Sequence Data
5.
J Theor Biol ; 374: 54-9, 2015 Jun 07.
Article in English | MEDLINE | ID: mdl-25843219

ABSTRACT

The evolution of aligned DNA sequence sites is generally modeled by a Markov process operating along the edges of a phylogenetic tree. It is well known that the probability distribution on the site patterns at the tips of the tree determines the tree topology and its branch lengths. However, the number of patterns is typically much larger than the number of edges, suggesting considerable redundancy in the branch length estimation. In this paper we ask whether the probabilities of just the 'edge-specific' patterns (the ones that correspond to a change of state on a single edge) suffice to recover the branch lengths of the tree, under a symmetric 2-state Markov process. We first show that this holds provided the branch lengths are sufficiently short, by applying the inverse function theorem. We then consider whether this restriction to short branch lengths is necessary. We show that for trees with up to four leaves it can be lifted. This leaves open the interesting question of whether this holds in general. Our results also extend to certain Markov processes on more than two states, such as the Jukes-Cantor model.


Subjects
Molecular Evolution , Biological Models , Phylogeny , Algorithms , Markov Chains , Probability , Software
6.
Theor Popul Biol ; 101: 61-6, 2015 May.
Article in English | MEDLINE | ID: mdl-25772708

ABSTRACT

A wide range of stochastic processes that model the growth and decline of populations exhibit a curious dichotomy: with certainty either the population goes extinct or its size tends to infinity. There is an elegant and classical theorem that explains why this dichotomy must hold under certain assumptions concerning the process. In this note, I explore how these assumptions might be relaxed further in order to obtain the same, or a similar, conclusion, and obtain both positive and negative results.


Subjects
Biological Extinction , Biological Models , Population Dynamics , Humans , Markov Chains , Stochastic Processes
7.
Math Biosci ; 227(2): 125-35, 2010 Oct.
Article in English | MEDLINE | ID: mdl-20627110

ABSTRACT

Statistical consistency in phylogenetics has traditionally referred to the accuracy of estimating phylogenetic parameters for a fixed number of species as we increase the number of characters. However, it is also useful to consider a dual type of statistical consistency where we increase the number of species, rather than characters. This raises some basic questions: what can we learn about the evolutionary process as we increase the number of species? In particular, does having more species allow us to infer the ancestral state of characters accurately? This question is particularly important when sequence evolution varies in a complex way from character to character, as methods applicable for i.i.d. models may no longer be valid. In this paper, we assemble a collection of results to analyse various approaches for inferring ancestral information with increasing accuracy as the number of taxa increases.


Subjects
Molecular Evolution , Genetic Models , Phylogeny , Algorithms , Computer Simulation , Likelihood Functions , Markov Chains , Statistical Models , Probability
8.
Article in English | MEDLINE | ID: mdl-19179706

ABSTRACT

Ancestral maximum likelihood (AML) is a method that simultaneously reconstructs a phylogenetic tree and ancestral sequences from extant data (sequences at the leaves). The tree and ancestral sequences maximize the probability of observing the given data under a Markov model of sequence evolution, in which branch lengths are also optimized but constrained to take the same value on any edge across all sequence sites. AML differs from the more usual form of maximum likelihood (ML) in phylogenetics because ML averages over all possible ancestral sequences. ML has long been known to be statistically consistent; that is, it converges on the correct tree with probability approaching 1 as the sequence length grows. However, the statistical consistency of AML has not been formally determined, despite informal remarks in a literature that dates back 20 years. In this short note we prove a general result that implies that AML is statistically inconsistent. In particular we show that AML can 'shrink' short edges in a tree, resulting in a tree that has no internal resolution as the sequence length grows. Our results apply to any number of taxa.


Subjects
Computational Biology/methods , Statistical Models , Phylogeny , Cluster Analysis , Markov Chains
9.
J Theor Biol ; 256(2): 247-52, 2009 Jan 21.
Article in English | MEDLINE | ID: mdl-18955066

ABSTRACT

In evolutionary biology, genetic sequences carry with them a trace of the underlying tree that describes their evolution from a common ancestral sequence. The question of how many sequence sites are required to recover this evolutionary relationship accurately depends on the model of sequence evolution, the substitution rate, divergence times and the method used to infer phylogenetic history. A particularly challenging problem for phylogenetic methods arises when a rapid divergence event occurred in the distant past. We analyse an idealised form of this problem in which the terminal edges of a symmetric four-taxon tree are some factor (λ) times the length of the interior edge. We determine an order λ² lower bound on the growth rate for the sequence length required to resolve the tree (independent of any particular branch length). We also show that this rate of sequence length growth can be achieved by existing methods (including the simple 'maximum parsimony' method), and compare these order λ² bounds with an order λ growth rate for a model that describes low-homoplasy evolution. In the final section, we provide a generic bound on the sequence length requirement for a more general class of Markov processes.


Subjects
Base Sequence , Molecular Evolution , Genetic Models , Phylogeny , Animals , Markov Chains
10.
J Theor Biol ; 256(3): 467-72, 2009 Feb 07.
Article in English | MEDLINE | ID: mdl-19000697

ABSTRACT

Distance-based methods in phylogenetics, such as Neighbor-Joining, are fast and popular approaches for building trees. These methods take pairs of sequences, and from them construct a value that, in expectation, is additive under a stochastic model of site substitution. Most models assume a distribution of rates across sites, often based on a gamma distribution. Provided the (shape) parameter of this distribution is known, the method can correctly reconstruct the tree. However, if the shape parameter is not known then we show that topologically different trees, with different shape parameters and associated positive branch lengths, can lead to exactly matching distributions on pairwise site patterns between all pairs of taxa. Thus, one could not distinguish between the two trees using pairs of sequences without some prior knowledge of the shape parameter. More surprisingly, this can happen for any choice of distinct shape parameters on the two trees, and thus the result is not peculiar to a particular or contrived selection of the shape parameters. On a positive note, we point out known conditions where identifiability can be restored (namely, when the branch lengths are clocklike, or if methods such as maximum likelihood are used).


Subjects
Computer Simulation , Genetic Models , Phylogeny , Amino Acid Sequence Homology , Animals , Genetic Variation , Markov Chains
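
The dependence of corrected distances on the gamma shape parameter, which underlies the abstract above, can be seen in a small worked example. The sketch assumes the Jukes-Cantor model with gamma-distributed rates of mean 1; the abstract does not name a substitution model, so that choice and the function names are illustrative assumptions. Under this model a true distance d yields an expected proportion of differing sites p = ¾(1 − (1 + 4d/(3α))^(−α)), and inverting that relation gives the corrected distance.

```python
def expected_p_distance(d: float, alpha: float) -> float:
    """Expected proportion of differing sites at distance d (JC model, gamma shape alpha)."""
    return 0.75 * (1.0 - (1.0 + 4.0 * d / (3.0 * alpha)) ** (-alpha))


def gamma_jc_distance(p: float, alpha: float) -> float:
    """Corrected distance from an observed proportion p of differing sites.

    Inverts expected_p_distance; only defined for 0 <= p < 0.75.
    """
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the correction to exist")
    return 0.75 * alpha * ((1.0 - 4.0 * p / 3.0) ** (-1.0 / alpha) - 1.0)
```

Correcting with the shape parameter that generated the data recovers d, whereas correcting with a different shape parameter returns a different value. That is why a wrong or unknown shape parameter can distort the apparent additivity of pairwise distances.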
11.
Syst Biol ; 55(4): 644-51, 2006 Aug.
Article in English | MEDLINE | ID: mdl-16969940

ABSTRACT

The Noah's Ark Problem (NAP) is a comprehensive cost-effectiveness methodology for biodiversity conservation that was introduced by Weitzman (1998) and utilizes the phylogenetic tree containing the taxa of interest to assess biodiversity. Given a set of taxa, each of which has a particular survival probability that can be increased at some cost, the NAP seeks to allocate limited funds to conserving these taxa so that the future expected biodiversity is maximized. Finding optimal solutions using this framework is a computationally difficult problem for which a simple and efficient "greedy" algorithm has been proposed in the literature and applied to conservation problems. We show that, although algorithms of this type cannot produce optimal solutions for the general NAP, there are two restricted scenarios of the NAP for which a greedy algorithm is guaranteed to produce optimal solutions. The first scenario requires the taxa to have equal conservation cost; the second scenario requires an ultrametric tree. The NAP assumes a linear relationship between the funding allocated to conservation of a taxon and the increased survival probability of that taxon. This relationship is briefly investigated and one variation is suggested that can also be solved using a greedy algorithm.


Subjects
Algorithms , Biodiversity , Conservation of Natural Resources/methods , Theoretical Models , Phylogeny , Computer Simulation , Conservation of Natural Resources/economics
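
The objective that the NAP maximizes, as described in the abstract above, can be sketched as an expected phylogenetic diversity calculation. The sketch assumes that biodiversity is measured as the total length of edges with at least one surviving descendant and that taxa survive independently; the nested-tuple tree encoding and the name `expected_pd` are hypothetical choices made for this illustration.

```python
def expected_pd(node, survival):
    """Return (expected diversity, probability that every leaf below is lost).

    node is (edge_length, leaf_name) for a leaf, or
    (edge_length, [child, child, ...]) for an interior vertex.
    survival maps each leaf name to its survival probability.
    An edge contributes its length times the probability that at least
    one leaf below it survives (independent survival is assumed).
    """
    length, content = node
    if isinstance(content, str):  # leaf
        all_lost = 1.0 - survival[content]
        below = 0.0
    else:
        all_lost, below = 1.0, 0.0
        for child in content:
            child_pd, child_lost = expected_pd(child, survival)
            below += child_pd
            all_lost *= child_lost
    return below + length * (1.0 - all_lost), all_lost
```

For the tree `(0.0, [(1.0, [(1.0, "A"), (1.0, "B")]), (2.0, "C")])` with survival probabilities 0.5, 0.5 and 1.0 for A, B and C, the expected diversity is 0.5 + 0.5 + 0.75 + 2 = 3.75. A greedy procedure of the kind discussed in the abstract would repeatedly re-evaluate this quantity after a candidate increase in one taxon's survival probability.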
12.
Syst Biol ; 53(2): 327-32, 2004 Apr.
Article in English | MEDLINE | ID: mdl-15205056

ABSTRACT

Given a collection of discrete characters (e.g., aligned DNA sites, gene adjacencies), a common measure of distance between taxa is the proportion of characters for which taxa have different character states. Tree reconstruction based on these (uncorrected) distances can be statistically inconsistent and can lead to trees different from those obtained using character-based methods such as maximum likelihood or maximum parsimony. However, in these cases the distance data often reveal their unreliability by some deviation from additivity, as indicated by conflicting support for more than one tree. We describe two results that show how uncorrected (and miscorrected) distance data can be simultaneously perfectly additive and misleading. First, multistate character data can be perfectly compatible and define one tree, and yet the uncorrected distances derived from these characters are perfectly treelike (and obey a molecular clock), only for a completely different tree. Second, under a Markov model of character evolution a similar phenomenon can occur; not only is there statistical inconsistency using uncorrected distances, but there is no evidence of this inconsistency because the distances look perfectly treelike (this does not occur in the classic two-parameter Felsenstein zone). We characterize precisely when uncorrected distances are additive on the true (and on a false) tree for four taxa. We also extend this result to a more general setting that applies to distances corrected according to an incorrect model.


Subjects
Classification/methods , Molecular Evolution , Genetic Models , Phylogeny , Statistical Data Interpretation , Markov Chains
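
The two notions at the centre of the abstract above, uncorrected distances and additivity on four taxa, can be made concrete as follows. This is a minimal sketch: the uncorrected distance is the proportion of differing characters as defined in the abstract, and additivity is checked with the standard four-point condition (the two largest of the three pairwise sums are equal). The function names and the tolerance parameter are assumptions.

```python
def p_distance(x: str, y: str) -> float:
    """Uncorrected distance: proportion of aligned characters that differ."""
    if len(x) != len(y) or not x:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(a != b for a, b in zip(x, y)) / len(x)


def four_point_holds(d, taxa, tol=1e-9):
    """Check the four-point condition for four taxa.

    d maps frozenset({x, y}) to the distance between taxa x and y.
    """
    a, b, c, e = taxa

    def dist(x, y):
        return d[frozenset((x, y))]

    sums = sorted([
        dist(a, b) + dist(c, e),
        dist(a, c) + dist(b, e),
        dist(a, e) + dist(b, c),
    ])
    # Additive on some four-taxon tree iff the two largest sums coincide.
    return abs(sums[2] - sums[1]) <= tol
```

Passing this check shows only that the distances fit some tree perfectly. As the abstract stresses, that tree need not be the true one.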
13.
Math Biosci ; 187(2): 189-203, 2004 Feb.
Article in English | MEDLINE | ID: mdl-14739084

ABSTRACT

We investigate a simple model that generates random partitions of the leaf set of a tree. Of particular interest is the reconstruction question: what number k of independent samples (partitions) is required to correctly reconstruct the underlying tree (with high probability)? We demonstrate a phase transition for k as a function of the mutation rate, from logarithmic to polynomial dependence on the size of the tree. We also describe a simple polynomial-time tree reconstruction algorithm that applies in the logarithmic region. This model and the associated reconstruction questions are motivated by a Markov model for genomic evolution in molecular biology.


Subjects
Molecular Evolution , Genetic Models , Phylogeny , Cluster Analysis , Markov Chains