Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 20 de 20
Filtrar
1.
Syst Biol ; 68(5): 814-827, 2019 09 01.
Artigo em Inglês | MEDLINE | ID: mdl-30865279

RESUMO

Perfect phylogenies are fundamental in the study of evolutionary trees because they capture the situation when each evolutionary trait emerges only once in history; if such events are believed to be rare, then by Occam's Razor such parsimonious trees are preferable as a hypothesis of evolution. A classical result states that 2-state characters permit a perfect phylogeny precisely if each subset of 2 characters permits one. More recently, it was shown that for 3-state characters the same property holds but for size-3 subsets. A long-standing open problem asked whether such a constant exists for each number of states. More precisely, it has been conjectured that for any fixed number of states $r$ there exists a constant $f(r)$ such that a set of $r$-state characters $C$ has a perfect phylogeny if and only if every subset of at most $f(r)$ characters has a perfect phylogeny. Informally, the conjecture states that checking fixed-size subsets of characters is enough to correctly determine whether input data permits a perfect phylogeny, irrespective of the number of characters in the input. In this article, we show that this conjecture is false. In particular, we show that for any constant $t$, there exists a set $C$ of $8$-state characters such that $C$ has no perfect phylogeny, but there exists a perfect phylogeny for every subset of at most $t$ characters. Moreover, there already exists a perfect phylogeny when ignoring just one of the characters, independent of which character you ignore. This negative result complements the two negative results ("strikes") of Bodlaender et al. (1992,2000). We reflect on the consequences of this third strike, pointing out that while it does close off some routes for efficient algorithm development, many others remain open.


Assuntos
Classificação/métodos , Filogenia , Algoritmos
2.
J Math Biol ; 78(1-2): 527-547, 2019 01.
Artigo em Inglês | MEDLINE | ID: mdl-30121824

RESUMO

Phylogenetic networks are often constructed by merging multiple conflicting phylogenetic signals into a directed acyclic graph. It is interesting to explore whether a network constructed in this way induces biologically-relevant phylogenetic signals that were not present in the input. Here we show that, given a multiple alignment A for a set of taxa X and a rooted phylogenetic network N whose leaves are labelled by X, it is NP-hard to locate a most parsimonious phylogenetic tree displayed by N (with respect to A) even when the level of N-the maximum number of reticulation nodes within a biconnected component-is 1 and A contains only 2 distinct states. (If, additionally, gaps are allowed the problem becomes APX-hard.) We also show that under the same conditions, and assuming a simple binary symmetric model of character evolution, finding a most likely tree displayed by the network is NP-hard. These negative results contrast with earlier work on parsimony in which it is shown that if A consists of a single column the problem is fixed parameter tractable in the level. We conclude with a discussion of why, despite the NP-hardness, both the parsimony and likelihood problem can likely be well-solved in practice.


Assuntos
Modelos Genéticos , Filogenia , Algoritmos , Animais , Biologia Computacional , Evolução Molecular , Especiação Genética , Humanos , Conceitos Matemáticos
3.
Trends Genet ; 29(8): 439-41, 2013 Aug.
Artigo em Inglês | MEDLINE | ID: mdl-23764187

RESUMO

Networks allow the investigation of evolutionary relationships that do not fit a tree model. They are becoming a leading tool for describing the evolutionary relationships between organisms, given the comparative complexities among genomes.


Assuntos
Evolução Biológica , Modelos Genéticos , Filogenia , Animais , Genoma , Saccharomyces/genética
4.
Bull Math Biol ; 78(9): 1773-1795, 2016 09.
Artigo em Inglês | MEDLINE | ID: mdl-27659024

RESUMO

Phylogenetic networks are increasingly used in evolutionary biology to represent the history of species that have undergone reticulate events such as horizontal gene transfer, hybrid speciation and recombination. One of the most fundamental questions that arise in this context is whether the evolution of a gene with one copy in all species can be explained by a given network. In mathematical terms, this is often translated in the following way: is a given phylogenetic tree contained in a given phylogenetic network? Recently this tree containment problem has been widely investigated from a computational perspective, but most studies have only focused on the topology of the phylogenies, ignoring a piece of information that, in the case of phylogenetic trees, is routinely inferred by evolutionary analyses: branch lengths. These measure the amount of change (e.g., nucleotide substitutions) that has occurred along each branch of the phylogeny. Here, we study a number of versions of the tree containment problem that explicitly account for branch lengths. We show that, although length information has the potential to locate more precisely a tree within a network, the problem is computationally hard in its most general form. On a positive note, for a number of special cases of biological relevance, we provide algorithms that solve this problem efficiently. This includes the case of networks of limited complexity, for which it is possible to recover, among the trees contained by the network with the same topology as the input tree, the closest one in terms of branch lengths.


Assuntos
Modelos Genéticos , Filogenia , Algoritmos , Evolução Molecular , Conceitos Matemáticos
5.
BMC Bioinformatics ; 15: 127, 2014 May 05.
Artigo em Inglês | MEDLINE | ID: mdl-24884964

RESUMO

BACKGROUND: Reticulate events play an important role in determining evolutionary relationships. The problem of computing the minimum number of such events to explain discordance between two phylogenetic trees is a hard computational problem. Even for binary trees, exact solvers struggle to solve instances with reticulation number larger than 40-50. RESULTS: Here we present CycleKiller and NonbinaryCycleKiller, the first methods to produce solutions verifiably close to optimality for instances with hundreds or even thousands of reticulations. CONCLUSIONS: Using simulations, we demonstrate that these algorithms run quickly for large and difficult instances, producing solutions that are very close to optimality. As a spin-off from our simulations we also present TerminusEst, which is the fastest exact method currently available that can handle nonbinary trees: this is used to measure the accuracy of the NonbinaryCycleKiller algorithm. All three methods are based on extensions of previous theoretical work (SIDMA 26(4):1635-1656, TCBB 10(1):18-25, SIDMA 28(1):49-66) and are publicly available. We also apply our methods to real data.


Assuntos
Algoritmos , Filogenia , Hibridização Genética , Software
6.
Bioinformatics ; 26(12): i124-31, 2010 Jun 15.
Artigo em Inglês | MEDLINE | ID: mdl-20529896

RESUMO

UNLABELLED: Phylogenetic trees are widely used to display estimates of how groups of species are evolved. Each phylogenetic tree can be seen as a collection of clusters, subgroups of the species that evolved from a common ancestor. When phylogenetic trees are obtained for several datasets (e.g. for different genes), then their clusters are often contradicting. Consequently, the set of all clusters of such a dataset cannot be combined into a single phylogenetic tree. Phylogenetic networks are a generalization of phylogenetic trees that can be used to display more complex evolutionary histories, including reticulate events, such as hybridizations, recombinations and horizontal gene transfers. Here, we present the new Cass algorithm that can combine any set of clusters into a phylogenetic network. We show that the networks constructed by Cass are usually simpler than networks constructed by other available methods. Moreover, we show that Cass is guaranteed to produce a network with at most two reticulations per biconnected component, whenever such a network exists. We have implemented Cass and integrated it into the freely available Dendroscope software. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Assuntos
Filogenia , Algoritmos , Análise por Conglomerados , Evolução Molecular
7.
J Theor Biol ; 269(1): 245-55, 2011 Jan 21.
Artigo em Inglês | MEDLINE | ID: mdl-21044634

RESUMO

Rooted phylogenetic networks are used to model non-treelike evolutionary histories. Such networks are often constructed by combining trees, clusters, triplets or characters into a single network that in some well-defined sense simultaneously represents them all. We review these four models and investigate how they are related. Motivated by the parsimony principle, one often aims to construct a network that contains as few reticulations (non-treelike evolutionary events) as possible. In general, the model chosen influences the minimum number of reticulation events required. However, when one obtains the input data from two binary (i.e. fully resolved) trees, we show that the minimum number of reticulations is independent of the model. The number of reticulations necessary to represent the trees, triplets, clusters (in the softwired sense) and characters (with unrestricted multiple crossover recombination) are all equal. Furthermore, we show that these results also hold when not the number of reticulations but the level of the constructed network is minimised. We use these unification results to settle several computational complexity questions that have been open in the field for some time. We also give explicit examples to show that already for data obtained from three binary trees the models begin to diverge.


Assuntos
Filogenia , Análise por Conglomerados , Modelos Genéticos , Recombinação Genética/genética
8.
BMC Ecol Evol ; 21(1): 220, 2021 12 07.
Artigo em Inglês | MEDLINE | ID: mdl-34876022

RESUMO

BACKGROUND: Rooted phylogenetic networks are used to display complex evolutionary history involving so-called reticulation events, such as genetic recombination. Various methods have been developed to construct such networks, using for example a multiple sequence alignment or multiple phylogenetic trees as input data. Coronaviruses are known to recombine frequently, but rooted phylogenetic networks have not yet been used extensively to describe their evolutionary history. Here, we created a workflow to compare the evolutionary history of SARS-CoV-2 with other SARS-like viruses using several rooted phylogenetic network inference algorithms. This workflow includes filtering noise from sets of phylogenetic trees by contracting edges based on branch length and bootstrap support, followed by resolution of multifurcations. We explored the running times of the network inference algorithms, the impact of filtering on the properties of the produced networks, and attempted to derive biological insights regarding the evolution of SARS-CoV-2 from them. RESULTS: The network inference algorithms are capable of constructing rooted phylogenetic networks for coronavirus data, although running-time limitations require restricting such datasets to a relatively small number of taxa. Filtering generally reduces the number of reticulations in the produced networks and increases their temporal consistency. Taxon bat-SL-CoVZC45 emerges as a major and structural source of discordance in the dataset. The tested algorithms often indicate that SARS-CoV-2/RaTG13 is a tree-like clade, with possibly some reticulate activity further back in their history. A smaller number of constructed networks posit SARS-CoV-2 as a possible recombinant, although this might be a methodological artefact arising from the interaction of bat-SL-CoVZC45 discordance and the optimization criteria used. CONCLUSION: Our results demonstrate that as part of a wider workflow and with careful attention paid to running time, rooted phylogenetic network algorithms are capable of producing plausible networks from coronavirus data. These networks partly corroborate existing theories about SARS-CoV-2, and partly produce new avenues for exploration regarding the location and significance of reticulate activity within the wider group of SARS-like viruses. Our workflow may serve as a model for pipelines in which phylogenetic network algorithms can be used to analyse different datasets and test different hypotheses.


Assuntos
COVID-19 , SARS-CoV-2 , Algoritmos , Humanos , Filogenia
9.
J Bioinform Comput Biol ; 7(4): 597-623, 2009 Aug.
Artigo em Inglês | MEDLINE | ID: mdl-19634194

RESUMO

Phylogenetic networks provide a way to describe and visualize evolutionary histories that have undergone so-called reticulate evolutionary events such as recombination, hybridization or horizontal gene transfer. The level k of a network determines how non-treelike the evolution can be, with level-0 networks being trees. We study the problem of constructing level-k phylogenetic networks from triplets, i.e. phylogenetic trees for three leaves (taxa). We give, for each k, a level-k network that is uniquely defined by its triplets. We demonstrate the applicability of this result by using it to prove that (1) for all k > or = 1 it is NP-hard to construct a level-k network consistent with all input triplets, and (2) for all k > or = 0 it is NP-hard to construct a level-k network consistent with a maximum number of input triplets, even when the input is dense. As a response to this intractability, we give an exact algorithm for constructing level-1 networks consistent with a maximum number of input triplets.


Assuntos
Algoritmos , Transferência Genética Horizontal/genética , Genética Populacional , Modelos Genéticos , Linhagem , Filogenia , Animais , Simulação por Computador , Humanos
10.
Artigo em Inglês | MEDLINE | ID: mdl-18451439

RESUMO

The problem Parsimony Haplotyping (PH) asks for the smallest set of haplotypes which can explain a given set of genotypes, and the problem Minimum Perfect Phylogeny Haplotyping (MPPH) asks for the smallest such set which also allows the haplotypes to be embedded in a perfect phylogeny, an evolutionary tree with biologically-motivated restrictions. For PH, we extend recent work by further mapping the interface between ;;easy'' and ;;hard'' instances, within the framework of (k,l)-bounded instances where the number of 2's per column and row of the input matrix is restricted. By exploring, in the same way, the tractability frontier of MPPH we provide the first concrete, positive results for this problem. In addition, we construct for both PH and MPPH polynomial time approximation algorithms, based on properties of the columns of the input matrix.


Assuntos
Algoritmos , Haplótipos , Filogenia , Evolução Biológica , Biologia Computacional , Genótipo , Modelos Genéticos , Modelos Estatísticos
11.
Algorithmica ; 80(11): 2993-3022, 2018.
Artigo em Inglês | MEDLINE | ID: mdl-30956378

RESUMO

The hybridization number problem requires us to embed a set of binary rooted phylogenetic trees into a binary rooted phylogenetic network such that the number of nodes with indegree two is minimized. However, from a biological point of view accurately inferring the root location in a phylogenetic tree is notoriously difficult and poor root placement can artificially inflate the hybridization number. To this end we study a number of relaxed variants of this problem. We start by showing that the fundamental problem of determining whether an unrooted phylogenetic network displays (i.e. embeds) an unrooted phylogenetic tree, is NP-hard. On the positive side we show that this problem is FPT in reticulation number. In the rooted case the corresponding FPT result is trivial, but here we require more subtle argumentation. Next we show that the hybridization number problem for unrooted networks (when given two unrooted trees) is equivalent to the problem of computing the tree bisection and reconnect distance of the two unrooted trees. In the third part of the paper we consider the "root uncertain" variant of hybridization number. Here we are free to choose the root location in each of a set of unrooted input trees such that the hybridization number of the resulting rooted trees is minimized. On the negative side we show that this problem is APX-hard. On the positive side, we show that the problem is FPT in the hybridization number, via kernelization, for any number of input trees.

12.
Artigo em Inglês | MEDLINE | ID: mdl-27008672

RESUMO

Given two phylogenetic trees on the same set of taxa X , the maximum parsimony distance dMP is defined as the maximum, ranging over all characters χ on X, of the absolute difference in parsimony score induced by χ on the two trees. In this note, we prove that for binary trees there exists a character achieving this maximum that is convex on one of the trees (i.e., the parsimony score induced on that tree is equal to the number of states in the character minus 1) and such that the number of states in the character is at most 7dMP-5 . This is the first non-trivial bound on the number of states required by optimal characters, convex or otherwise. The result potentially has algorithmic significance because, unlike general characters, convex characters with a bounded number of states can be enumerated in polynomial time.


Assuntos
Algoritmos , Filogenia , Biologia Computacional , Modelos Lineares , Modelos Genéticos
13.
Artigo em Inglês | MEDLINE | ID: mdl-26887004

RESUMO

Evolutionary data has been traditionally modeled via phylogenetic trees; however, branching alone cannot model conflicting phylogenetic signals, so networks are used instead. Ancestral recombination graphs (ARGs) are used to model the evolution of incompatible sets of SNP data, allowing each site to mutate only once. The model often aims to minimize the number of recombinations. Similarly, incompatible cluster data can be represented by a reticulation network that minimizes reticulation events. The ARG literature has traditionally been disjoint from the reticulation network literature. By building on results from the reticulation network literature, we resolve an open question of interest to the ARG community. We explicitly prove that the History Bound, a lower bound on the number of recombinations in an ARG for a binary matrix, which was previously only defined procedurally, is equal to the minimum number of reticulation nodes in a network for the corresponding cluster data. To facilitate the proof, we give an algorithm that constructs this network using intermediate values from the procedural History Bound definition. We then develop a top-down algorithm for computing the History Bound, which has the same worst-case runtime as the known dynamic program, and show that it is likely to run faster in typical cases.


Assuntos
Algoritmos , Biologia Computacional/métodos , Evolução Molecular , Filogenia , Recombinação Genética/genética , Análise por Conglomerados , Modelos Genéticos , Polimorfismo de Nucleotídeo Único
14.
Artigo em Inglês | MEDLINE | ID: mdl-23702540

RESUMO

Here, we present a new fixed parameter tractable algorithm to compute the hybridization number r of two rooted, not necessarily binary phylogenetic trees on taxon set Χ in time (6(r)r!) · poly(n), where n = |Χ|. The novelty of this approach is its use of terminals, which are maximal elements of a natural partial order on Χ, and several insights from the softwired clusters literature. This yields a surprisingly simple and practical bounded-search algorithm and offers an alternative perspective on the underlying combinatorial structure of the hybridization number problem.


Assuntos
Algoritmos , Biologia Computacional/métodos , Hibridização Genética , Análise por Conglomerados , Filogenia
15.
PLoS One ; 8(8): e71148, 2013.
Artigo em Inglês | MEDLINE | ID: mdl-23940707

RESUMO

Over the past two decades, several fungal outbreaks have occurred, including the high-profile 'Vancouver Island' and 'Pacific Northwest' outbreaks, caused by Cryptococcus gattii, which has affected hundreds of otherwise healthy humans and animals. Over the same time period, C. gattii was the cause of several additional case clusters at localities outside of the tropical and subtropical climate zones where the species normally occurs. In every case, the causative agent belongs to a previously rare genotype of C. gattii called AFLP6/VGII, but the origin of the outbreak clades remains enigmatic. Here we used phylogenetic and recombination analyses, based on AFLP and multiple MLST datasets, and coalescence gene genealogy to demonstrate that these outbreaks have arisen from a highly-recombining C. gattii population in the native rainforest of Northern Brazil. Thus the modern virulent C. gattii AFLP6/VGII outbreak lineages derived from mating events in South America and then dispersed to temperate regions where they cause serious infections in humans and animals.


Assuntos
Criptococose/microbiologia , Cryptococcus gattii/genética , Animais , Brasil , Colúmbia Britânica/epidemiologia , Células Cultivadas , Criptococose/epidemiologia , Cryptococcus gattii/classificação , Cryptococcus gattii/patogenicidade , Surtos de Doenças , Genes Fúngicos , Humanos , Funções Verossimilhança , Macrófagos/microbiologia , Camundongos , Camundongos Endogâmicos BALB C , Dados de Sequência Molecular , Tipagem de Sequências Multilocus , Técnicas de Tipagem Micológica , Noroeste dos Estados Unidos/epidemiologia , Filogenia , Polimorfismo de Fragmento de Restrição , Árvores , Clima Tropical , Virulência
16.
Artigo em Inglês | MEDLINE | ID: mdl-21968961

RESUMO

Rooted phylogenetic networks are often used to represent conflicting phylogenetic signals. Given a set of clusters, a network is said to represent these clusters in the softwired sense if, for each cluster, at least one tree embedded in the network contains it. Motivated by parsimony we might wish to construct such a network using as few reticulations as possible, or minimizing the level of the network, i.e. the maximum number of reticulations used in any "tangled" region of the network. Although these are NP-hard problems, here we prove that, for every fixed k ≥ 0, it is polynomial-time solvable to construct a phylogenetic network with level equal to k representing a cluster set, or to determine that no such network exists. However, this algorithm does not lend itself to a practical implementation. We also prove that the comparatively efficient CASS algorithm correctly solves this problem (and also minimizes the reticulation number) when input clusters are obtained from two not necessarily binary gene trees on the same set of taxa but does not always minimize level for general cluster sets. Finally, we describe a new algorithm which generates in polynomial-time all binary phylogenetic networks with exactly r reticulations representing a set of input clusters (for every fixed r ≥ 0).


Assuntos
Biologia Computacional/métodos , Modelos Genéticos , Filogenia , Algoritmos , Análise por Conglomerados , Evolução Molecular
17.
Sci Rep ; 2: 580, 2012.
Artigo em Inglês | MEDLINE | ID: mdl-22896812

RESUMO

The metabolism of organisms can be studied with comprehensive stoichiometric models of their metabolic networks. Flux balance analysis (FBA) calculates optimal metabolic performance of stoichiometric models. However, detailed biological interpretation of FBA is limited because, in general, a huge number of flux patterns give rise to the same optimal performance. The complete description of the resulting optimal solution spaces was thus far a computationally intractable problem. Here we present CoPE-FBA: Comprehensive Polyhedra Enumeration Flux Balance Analysis, a computational method that solves this problem. CoPE-FBA indicates that the thousands to millions of optimal flux patterns result from a combinatorial explosion of flux patterns in just a few metabolic sub-networks. The entire optimal solution space can now be compactly described in terms of the topology of these sub-networks. CoPE-FBA simplifies the biological interpretation of stoichiometric models of metabolism, and provides a profound understanding of metabolic flexibility in optimal states.


Assuntos
Biologia Computacional/métodos , Redes e Vias Metabólicas , Metabolômica/métodos , Modelos Biológicos , Simulação por Computador , Escherichia coli/metabolismo , Glucose/metabolismo , Software
18.
Artigo em Inglês | MEDLINE | ID: mdl-21393651

RESUMO

Recently, much attention has been devoted to the construction of phylogenetic networks which generalize phylogenetic trees in order to accommodate complex evolutionary processes. Here, we present an efficient, practical algorithm for reconstructing level-1 phylogenetic networks--a type of network slightly more general than a phylogenetic tree--from triplets. Our algorithm has been made publicly available as the program LEV1ATHAN. It combines ideas from several known theoretical algorithms for phylogenetic tree and network reconstruction with two novel subroutines. Namely, an exponential-time exact and a greedy algorithm both of which are of independent theoretical interest. Most importantly, LEV1ATHAN runs in polynomial time and always constructs a level-1 network. If the data are consistent with a phylogenetic tree, then the algorithm constructs such a tree. Moreover, if the input triplet set is dense and, in addition, is fully consistent with some level-1 network, it will find such a network. The potential of LEV1ATHAN is explored by means of an extensive simulation study and a biological data set. One of our conclusions is that LEV1ATHAN is able to construct networks consistent with a high percentage of input triplets, even when these input triplets are affected by a low to moderate level of noise.


Assuntos
Algoritmos , Biologia Computacional/métodos , Modelos Genéticos , Filogenia , Análise por Conglomerados , Simulação por Computador , HIV-1/genética , Leveduras/genética
19.
Artigo em Inglês | MEDLINE | ID: mdl-21358008

RESUMO

The genetic code is known to have a high level of error robustness and has been shown to be very error robust compared to randomly selected codes, but to be significantly less error robust than a certain code found by a heuristic algorithm. We formulate this optimization problem as a Quadratic Assignment Problem and use this to formally verify that the code found by the heuristic algorithm is the global optimum. We also argue that it is strongly misleading to compare the genetic code only with codes sampled from the fixed block model, because the real code space is orders of magnitude larger. We thus enlarge the space from which random codes can be sampled from approximately 2.433 × 10(18) codes to approximately 5.908 × 10(45) codes. We do this by leaving the fixed block model, and using the wobble rules to formulate the characteristics acceptable for a genetic code. By relaxing more constraints, three larger spaces are also constructed. Using a modified error function, the genetic code is found to be more error robust compared to a background of randomly generated codes with increasing space size. We point out that these results do not necessarily imply that the code was optimized during evolution for error minimization, but that other mechanisms could be the reason for this error robustness.


Assuntos
Algoritmos , Biologia Computacional/métodos , Código Genético , Modelos Genéticos , Evolução Molecular
20.
Artigo em Inglês | MEDLINE | ID: mdl-19875864

RESUMO

Jansson and Sung showed that, given a dense set of input triplets T (representing hypotheses about the local evolutionary relationships of triplets of taxa), it is possible to determine in polynomial time whether there exists a level-1 network consistent with T, and if so, to construct such a network [24]. Here, we extend this work by showing that this problem is even polynomial time solvable for the construction of level-2 networks. This shows that, assuming density, it is tractable to construct plausible evolutionary histories from input triplets even when such histories are heavily nontree-like. This further strengthens the case for the use of triplet-based methods in the construction of phylogenetic networks. We also implemented the algorithm and applied it to yeast data.


Assuntos
Biologia Computacional/métodos , Algoritmos , Teorema de Bayes , Evolução Biológica , Simulação por Computador , Interpretação Estatística de Dados , Funções Verossimilhança , Modelos Estatísticos , Filogenia , Alinhamento de Sequência , Análise de Sequência de DNA , Software , Fatores de Tempo
SELEÇÃO DE REFERÊNCIAS
DETALHE DA PESQUISA