Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 19 de 19
Filtrar
Mais filtros










Base de dados
Intervalo de ano de publicação
1.
Ann Comb ; 28(1): 1-32, 2024.
Artigo em Inglês | MEDLINE | ID: mdl-38433929

RESUMO

An equidistant X-cactus is a type of rooted, arc-weighted, directed acyclic graph with leaf set X, that is used in biology to represent the evolutionary history of a set X of species. In this paper, we introduce and investigate the space of equidistant X-cactuses. This space contains, as a subset, the space of ultrametric trees on X that was introduced by Gavryushkin and Drummond. We show that equidistant-cactus space is a CAT(0)-metric space which implies, for example, that there are unique geodesic paths between points. As a key step to proving this, we present a combinatorial result concerning ranked rooted X-cactuses. In particular, we show that such graphs can be encoded in terms of a pairwise compatibility condition arising from a poset of collections of pairs of subsets of X that satisfy certain set-theoretic properties. As a corollary, we also obtain an encoding of ranked, rooted X-trees in terms of partitions of X, which provides an alternative proof that the space of ultrametric trees on X is CAT(0). We expect that our results will provide the basis for novel ways to perform statistical analyses on collections of equidistant X-cactuses, as well as new directions for defining and understanding spaces of more general, arc-weighted phylogenetic networks.

2.
Syst Biol ; 70(6): 1163-1180, 2021 10 13.
Artigo em Inglês | MEDLINE | ID: mdl-33560427

RESUMO

Popular optimality criteria for phylogenetic trees focus on sequences of characters that are applicable to all the taxa. As studies grow in breadth, it can be the case that some characters are applicable for a portion of the taxa and inapplicable for others. Past work has explored the limitations of treating inapplicable characters as missing data, noting that this strategy may favor trees where internal nodes are assigned impossible states, where the arrangement of taxa within subclades is unduly influenced by variation in distant parts of the tree, and/or where taxa that otherwise share most primary characters are grouped distantly. Approaches that avoid the first two problems have recently been proposed. Here, we propose an alternative approach which avoids all three problems. We focus on data matrices that use reductive coding of traits, that is, explicitly incorporate the innate hierarchy induced by inapplicability, and as such our approach extend to hierarchical characters, in general. In the spirit of maximum parsimony, the proposed criterion seeks the phylogenetic tree with the minimal changes across any tree branch, but where changes are defined in terms of dissimilarity metrics that weigh the effects of inapplicable characters. The approach can accommodate binary, multistate, ordered, unordered, and polymorphic characters. We give a polynomial-time algorithm, inspired by Fitch's algorithm, to score trees under a family of dissimilarity metrics, and prove its correctness. We show that the resulting optimality criteria is computationally hard, by reduction to the NP-hardness of the maximum parsimony optimality criteria. We demonstrate our approach using synthetic and empirical data sets and compare the results with other recently proposed methods for choosing optimal phylogenetic trees when the data includes hierarchical characters. [Character optimization, dissimilarity metrics, hierarchical characters, inapplicable data, phylogenetic tree search.].


Assuntos
Algoritmos , Fenótipo , Filogenia
3.
IEEE/ACM Trans Comput Biol Bioinform ; 18(6): 2823-2827, 2021.
Artigo em Inglês | MEDLINE | ID: mdl-33242309

RESUMO

Tree-based phylogenetic networks, which may be roughly defined as leaf-labeled networks built by adding arcs only between the original tree edges, have elegant properties for modeling evolutionary histories. We answer an open question of Francis, Semple, and Steel about the complexity of determining how far a phylogenetic network is from being tree-based, including non-binary phylogenetic networks. We show that finding a phylogenetic tree covering the maximum number of nodes in a phylogenetic network can be computed in polynomial time via an encoding into a minimum-cost flow problem.


Assuntos
Algoritmos , Biologia Computacional/métodos , Evolução Molecular , Filogenia , Modelos Genéticos
4.
Bull Math Biol ; 82(1): 10, 2020 01 14.
Artigo em Inglês | MEDLINE | ID: mdl-31932987

RESUMO

Maximum likelihood estimators are a popular method for scoring phylogenetic trees to best explain the evolutionary histories of biomolecular sequences. In 1994, Steel showed that, given an incompatible set of binary characters and a fixed tree topology, there exist multiple sets of branch lengths that are optima of the maximum average likelihood estimator. Since parsimony techniques-another popular method of scoring evolutionary trees-tend to exhibit favorable behavior on data compatible with the tree, Steel asked if the same is true for likelihood estimators, or if multiple optima can occur for compatible sequences. We show that, despite exhibiting behavior similar to parsimony, multiple local optima can occur for compatible characters for the most parsimonious likelihood estimator. We caution that thorough understanding of likelihood criteria is necessary before they are used to analyze biological data.


Assuntos
Evolução Biológica , Modelos Biológicos , Filogenia , Algoritmos , Animais , Simulação por Computador , Evolução Molecular , Humanos , Funções Verossimilhança , Conceitos Matemáticos , Modelos Genéticos , Software
5.
Proc Biol Sci ; 285(1892)2018 11 28.
Artigo em Inglês | MEDLINE | ID: mdl-30487309

RESUMO

The use of discrete character data for disparity analyses has become more popular, partially due to the recognition that character data describe variation at large taxonomic scales, as well as the increasing availability of both character matrices co-opted from phylogenetic analysis and software tools. As taxonomic scope increases, the need to describe variation leads to some characters that may describe traits not found across all the taxa. In such situations, it is common practice to treat inapplicable characters as missing data when calculating dissimilarity matrices for disparity studies. For commonly used dissimilarity metrics like Wills's GED and Gower's coefficient, this can lead to the reranking of pairwise dissimilarities, resulting in taxa that share more primary character states being assigned larger dissimilarity values than taxa that share fewer. We introduce a family of metrics that proportionally weight primary characters according to the secondary characters that describe them, effectively eliminating this problem, and compare their performance to common dissimilarity metrics and previously proposed weighting schemes. When applied to empirical datasets, we confirm that choice of dissimilarity metric frequently affects the rank order of pairwise distances, differentially influencing downstream macroevolutionary inferences.


Assuntos
Evolução Biológica , Classificação/métodos , Fenótipo , Modelos Biológicos , Filogenia
6.
Syst Biol ; 66(1): e83-e94, 2017 01 01.
Artigo em Inglês | MEDLINE | ID: mdl-28173538

RESUMO

Trees are a canonical structure for representing evolutionary histories. Many popular criteria used to infer optimal trees are computationally hard, and the number of possible tree shapes grows super-exponentially in the number of taxa. The underlying structure of the spaces of trees yields rich insights that can improve the search for optimal trees, both in accuracy and in running time, and the analysis and visualization of results. We review the past work on analyzing and comparing trees by their shape as well as recent work that incorporates trees with weighted branch lengths.


Assuntos
Classificação , Filogenia , Reprodutibilidade dos Testes
7.
Bull Math Biol ; 78(5): 1058-75, 2016 05.
Artigo em Inglês | MEDLINE | ID: mdl-27234257

RESUMO

Finding the best phylogenetic tree under the maximum parsimony optimality criterion is computationally difficult. We quantify the occurrence of such optima for well-behaved sets of data. When nearest neighbor interchange operations are used, multiple local optima can occur even for "perfect" sequence data, which results in hill-climbing searches that never reach a global optimum. In contrast, we show that when neighbors are defined via the subtree prune and regraft metric, there is a single local optimum for perfect sequence data, and thus, every such search finds a global optimum quickly. We further characterize conditions for which sequences simulated under the Cavender-Farris-Neyman and Jukes-Cantor models of evolution yield well-behaved search spaces.


Assuntos
Filogenia , Algoritmos , Simulação por Computador , Evolução Molecular , Funções Verossimilhança , Conceitos Matemáticos , Modelos Genéticos , Saccharomyces/classificação , Saccharomyces/genética
8.
Bull Math Biol ; 78(5): 961-9, 2016 05.
Artigo em Inglês | MEDLINE | ID: mdl-27125655

RESUMO

We address an open question of Francis and Steel about phylogenetic networks and trees. They give a polynomial time algorithm to decide if a phylogenetic network, N, is tree-based and pose the problem: given a fixed tree T and network N, is N based on T? We show that it is [Formula: see text]-hard to decide, by reduction from 3-Dimensional Matching (3DM) and further that the problem is fixed-parameter tractable.


Assuntos
Filogenia , Algoritmos , Evolução Molecular , Conceitos Matemáticos , Modelos Genéticos
9.
Syst Biol ; 64(1): 56-65, 2015 Jan.
Artigo em Inglês | MEDLINE | ID: mdl-25164916

RESUMO

Finding the optimal evolutionary history for a set of taxa is a challenging computational problem, even when restricting possible solutions to be "tree-like" and focusing on the maximum-parsimony optimality criterion. This has led to much work on using heuristic tree searches to find approximate solutions. We present an approach for finding exact optimal solutions that employs and complements the current heuristic methods for finding optimal trees. Given a set of taxa and a set of aligned sequences of characters, there may be subsets of characters that are compatible, and for each such subset there is an associated (possibly partially resolved) phylogeny with edges corresponding to each character state change. These perfect phylogenies serve as anchor trees for our constrained search space. We show that, for sequences with compatible sites, the parsimony score of any tree [Formula: see text] is at least the parsimony score of the anchor trees plus the number of inferred changes between [Formula: see text] and the anchor trees. As the maximum-parsimony optimality score is additive, the sum of the lower bounds on compatible character partitions provides a lower bound on the complete alignment of characters. This yields a region in the space of trees within which the best tree is guaranteed to be found; limiting the search for the optimal tree to this region can significantly reduce the number of trees that must be examined in a search of the space of trees. We analyze this method empirically using four different biological data sets as well as surveying 400 data sets from the TreeBASE repository, demonstrating the effectiveness of our technique in reducing the number of steps in exact heuristic searches for trees under the maximum-parsimony optimality criterion.


Assuntos
Algoritmos , Classificação/métodos , Filogenia , Animais , Fungos/classificação , Magnoliopsida/classificação
10.
Artigo em Inglês | MEDLINE | ID: mdl-24334399

RESUMO

We answer Bryant's combinatorial challenge on minimal walks of phylogenetic treespace under the nearest-neighbor interchange (NNI) metric. We show that the shortest path through the NNI-treespace of n-leaf trees is Hamiltonian for all n. That is, there is a minimal path that visits all binary trees exactly once, under NNI moves.


Assuntos
Biologia Computacional/métodos , Modelos Genéticos , Filogenia , Algoritmos
11.
Artigo em Inglês | MEDLINE | ID: mdl-23702562

RESUMO

A nearest-neighbor-interchange (NNI)-walk is a sequence of unrooted phylogenetic trees, T1, T2, . . . , T(k) where each consecutive pair of trees differs by a single NNI move. We give tight bounds on the length of the shortest NNI-walks that visit all trees in a subtree-prune-and-regraft (SPR) neighborhood of a given tree. For any unrooted, binary tree, T, on n leaves, the shortest walk takes Θ(n²) additional steps more than the number of trees in the SPR neighborhood. This answers Bryant's Second Combinatorial Challenge from the Phylogenetics Challenges List, the Isaac Newton Institute, 2011, and the Penny Ante Problem List, 2009.


Assuntos
Biologia Computacional/métodos , Modelos Genéticos , Filogenia , Algoritmos , Evolução Molecular
12.
Artigo em Inglês | MEDLINE | ID: mdl-21788676

RESUMO

We show that two important problems that have applications in computational biology are ASP-complete, which implies that, given a solution to a problem, it is NP-complete to decide if another solution exists. We show first that a variation of BETWEENNESS, which is the underlying problem of questions related to radiation hybrid mapping, is ASP-complete. Subsequently, we use that result to show that QUARTET COMPATIBILITY, a fundamental problem in phylogenetics that asks whether a set of quartets can be represented by a parent tree, is also ASP-complete. The latter result shows that Steel's QUARTET CHALLENGE, which asks whether a solution to QUARTET COMPATIBILITY is unique, is coNP-complete.


Assuntos
Biologia Computacional , Filogenia , Algoritmos
13.
Artigo em Inglês | MEDLINE | ID: mdl-20671326

RESUMO

We show that subtree prune and regraft (uSPR) distance on unrooted trees is fixed parameter tractable with respect to the distance. We also make progress on a conjecture of Steel on the preservation of uSPR distance under chain reduction, improving on lower bounds of Hickey et al.


Assuntos
Biologia Computacional/métodos , Filogenia , Algoritmos , Evolução Molecular , Modelos Genéticos , Modelos Teóricos
14.
Artigo em Inglês | MEDLINE | ID: mdl-20530818

RESUMO

A tanglegram is a pair of trees on the same set of leaves with matching leaves in the two trees joined by an edge. Tanglegrams are widely used in biology--to compare evolutionary histories of host and parasite species and to analyze genes of species in the same geographical area. We consider optimization problems in tanglegram drawings. We show a linear time algorithm to decide if a tanglegram admits a planar embedding by a reduction to the planar graph drawing problem. This problem was also studied by Fernau et al. A similar reduction to a graph crossing problem also helps to solve an open problem they posed, showing a fixed-parameter tractable algorithm for minimizing the number of crossings over all d-ary trees. For the case where one tree is fixed, we show an O(n log n) algorithm to determine the drawing of the second tree that minimizes the number of crossings. This improves the bound from earlier methods. We introduce a new optimization criterion using Spearman's footrule distance and give an O(n²) algorithm. We also show integer programming formulations to quickly obtain tanglegram drawings that minimize the two optimization measures discussed. We prove lower bounds on the maximum gap between the optimal solution and the heuristic of Dwyer and Schreiber to minimize crossings.


Assuntos
Algoritmos , Filogenia , Evolução Molecular
15.
Evol Bioinform Online ; 3: 86-98, 2007 May 30.
Artigo em Inglês | MEDLINE | ID: mdl-19461978

RESUMO

Hybridization is an important evolutionary process for many groups of species. Thus, conflicting signals in a data set may not be the result of sampling or modeling errors, but due to the fact that hybridization has played a significant role in the evolutionary history of the species under consideration. Assuming that the initial set of gene trees is correct, a basic problem for biologists is to compute this minimum number of hybridization events to explain this set. In this paper, we describe a new reduction-based algorithm for computing the minimum number, when the initial data set consists of two trees. Although the two-tree problem is NP-hard, our algorithm always gives the exact solution and runs efficiently on many real biological problems. Previous algorithms for the two-tree problem either solve a restricted version of the problem or give an answer with no guarantee of the closeness to the exact solution. We illustrate our algorithm on a grass data set. This new algorithm is freely available for application at either http://www.bi.uni-duesseldorf.de/approximately linz or http://www.math.canterbury.ac.nz/approximately cas83.

16.
J Comput Biol ; 13(8): 1419-34, 2006 Oct.
Artigo em Inglês | MEDLINE | ID: mdl-17061919

RESUMO

We give a 5-approximation algorithm to the rooted Subtree-Prune-and-Regraft (rSPR) distance between two phylogenies, which was recently shown to be NP-complete. This paper presents the first approximation result for this important tree distance. The algorithm follows a standard format for tree distances. The novel ideas are in the analysis. In the analysis, the cost of the algorithm uses a "cascading" scheme that accounts for possible wrong moves. This accounting is missing from previous analysis of tree distance approximation algorithms. Further, we show how all algorithms of this type can be implemented in linear time and give experimental results.


Assuntos
Algoritmos , Biologia Computacional/métodos , Filogenia , Animais , Modelos Biológicos
17.
J Comput Biol ; 12(6): 796-811, 2005.
Artigo em Inglês | MEDLINE | ID: mdl-16108717

RESUMO

We present new methods for reconstructing reticulate evolution of species due to events such as horizontal transfer or hybrid speciation; both methods are based upon extensions of Wayne Maddison's approach in his seminal 1997 paper. Our first method is a polynomial time algorithm for constructing phylogenetic networks from two gene trees contained inside the network. We allow the network to have an arbitrary number of reticulations, but we limit the reticulation in the network so that the cycles in the network are node-disjoint ("galled"). Our second method is a polynomial time algorithm for constructing networks with one reticulation, where we allow for errors in the estimated gene trees. Using simulations, we demonstrate improved performance of this method over both NeighborNet and Maddison's method.


Assuntos
Evolução Molecular , Modelos Genéticos , Filogenia , Simulação por Computador , Frequência do Gene , Variação Genética , Modelos Estatísticos , Mutação , Seleção Genética
18.
Syst Biol ; 54(3): 471-82, 2005 Jun.
Artigo em Inglês | MEDLINE | ID: mdl-16012112

RESUMO

We explored the use of multidimensional scaling (MDS) of tree-to-tree pairwise distances to visualize the relationships among sets of phylogenetic trees. We found the technique to be useful for exploring "tree islands" (sets of topologically related trees among larger sets of near-optimal trees), for comparing sets of trees obtained from bootstrapping and Bayesian sampling, for comparing trees obtained from the analysis of several different genes, and for comparing multiple Bayesian analyses. The technique was also useful as a teaching aid for illustrating the progress of a Bayesian analysis and as an exploratory tool for examining large sets of phylogenetic trees. We also identified some limitations to the method, including distortions of the multidimensional tree space into two dimensions through the MDS technique, and the definition of the MDS-defined space based on a limited sample of trees. Nonetheless, the technique is a useful approach for the analysis of large sets of phylogenetic trees.


Assuntos
Classificação/métodos , Interpretação Estatística de Dados , Filogenia , Software , Teorema de Bayes
19.
Pac Symp Biocomput ; : 211-22, 2002.
Artigo em Inglês | MEDLINE | ID: mdl-11928477

RESUMO

Whole-genome phylogenetic studies require various sources of phylogenetic signals to produce an accurate picture of the evolutionary history of a group of genomes. In particular, sequence-based reconstruction will play an important role, especially in resolving more recent events. But using sequences at the level of whole genomes means working with very large amounts of data--large numbers of sequences--as well as large phylogenetic distances, so that reconstruction methods must be both fast and robust as well as accurate. We study the accuracy, convergence rate, and speed of several fast reconstruction methods: neighbor-joining, Weighbor (a weighted version of neighbor-joining), greedy parsimony, and a new phylogenetic reconstruction method based on disk-covering and parsimony search (DCM-NJ + MP). Our study uses extensive simulations based on random birth-death trees, with controlled deviations from ultrametricity. We find that Weighbor, thanks to its sophisticated handling of probabilities, outperforms other methods for short sequences, while our new method is the best choice for sequence lengths above 100. For very large sequence lengths, all four methods have similar accuracy, so that the speed of neighbor-joining and greedy parsimony makes them the two methods of choice.


Assuntos
Bases de Dados Factuais , Filogenia , Sequência de Aminoácidos , Sequência de Bases , DNA/genética , Modelos Genéticos , RNA/genética , Reprodutibilidade dos Testes
SELEÇÃO DE REFERÊNCIAS
DETALHE DA PESQUISA
...