Results 1 - 15 of 15
1.
Bull Math Biol ; 86(9): 114, 2024 Aug 05.
Article in English | MEDLINE | ID: mdl-39101994

ABSTRACT

Bayesian phylogenetic inference is powerful but computationally intensive. Researchers may find themselves with two phylogenetic posteriors on overlapping data sets and may wish to approximate a combined result without having to re-run potentially expensive Markov chains on the combined data set. This raises the question: given overlapping subsets of a set of taxa (e.g. species or virus samples), and given posterior distributions on phylogenetic tree topologies for each of these taxon sets, how can we optimize a probability distribution on phylogenetic tree topologies for the entire taxon set? In this paper we develop a variational approach to this problem and demonstrate its effectiveness. Specifically, we develop an algorithm to find a suitable support of the variational tree topology distribution on the entire taxon set, as well as a gradient-descent algorithm to minimize the divergence from the restrictions of the variational distribution to each of the given per-subset probability distributions, in an effort to approximate the posterior distribution on the entire taxon set.
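To make the objective concrete, here is a minimal Python sketch under strong simplifying assumptions: the support of the variational distribution is already fixed, topologies are opaque labels, and the restriction of each full-set topology to each taxon subset is supplied as a hand-written lookup table (the names `RESTRICT`, `POSTERIORS` and `fit`, and all numbers, are hypothetical toy data). It assumes the divergence is KL(p_i || q restricted to subset i), summed over subsets, and minimizes it by plain gradient descent on softmax logits. It is not the authors' algorithm and omits their support-finding step.

```python
import math

# Hypothetical toy inputs. Full-taxon-set topologies are opaque labels; each
# restriction map sends a full topology to its induced topology on one subset.
RESTRICT = {
    "X": {"t1": "a", "t2": "a", "t3": "b", "t4": "b"},
    "Y": {"t1": "c", "t2": "d", "t3": "c", "t4": "d"},
}
# Hypothetical per-subset posteriors (e.g. topology frequencies from two MCMC runs).
POSTERIORS = {"X": {"a": 0.7, "b": 0.3}, "Y": {"c": 0.6, "d": 0.4}}


def softmax(theta):
    m = max(theta.values())  # subtract the max for numerical stability
    exps = {t: math.exp(v - m) for t, v in theta.items()}
    z = sum(exps.values())
    return {t: e / z for t, e in exps.items()}


def restrict(q, rmap):
    """Push q through a restriction map: q_r(s) = sum of q(t) over t with rmap[t] == s."""
    out = {}
    for t, p in q.items():
        out[rmap[t]] = out.get(rmap[t], 0.0) + p
    return out


def loss(q):
    """Assumed objective: sum over subsets of KL(p_i || restriction of q to subset i)."""
    total = 0.0
    for name, p in POSTERIORS.items():
        qr = restrict(q, RESTRICT[name])
        total += sum(ps * math.log(ps / qr[s]) for s, ps in p.items() if ps > 0)
    return total


def fit(steps=5000, lr=0.5):
    """Plain gradient descent on softmax logits over a fixed support."""
    theta = {t: 0.0 for t in RESTRICT["X"]}  # uniform start on the support
    for _ in range(steps):
        q = softmax(theta)
        grad = dict.fromkeys(theta, 0.0)
        for name, p in POSTERIORS.items():
            rmap = RESTRICT[name]
            qr = restrict(q, rmap)
            for t in theta:
                # Gradient of KL(p || q_r) with respect to the logit of topology t.
                grad[t] += q[t] * (1.0 - p.get(rmap[t], 0.0) / qr[rmap[t]])
        for t in theta:
            theta[t] -= lr * grad[t]
    return softmax(theta)


q = fit()
print({t: round(v, 3) for t, v in q.items()})
print(restrict(q, RESTRICT["X"]), restrict(q, RESTRICT["Y"]))
```

In this toy example the two subset posteriors constrain only marginals, so many distributions over the four topologies reach zero loss, and gradient descent returns one of them.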


Subjects
Algorithms , Bayes Theorem , Markov Chains , Mathematical Concepts , Models, Genetic , Phylogeny , Computer Simulation , Probability
2.
Acta Biotheor ; 71(4): 22, 2023 Nov 03.
Article in English | MEDLINE | ID: mdl-37922001

ABSTRACT

The fundamental Hennigian principle, grouping solely on synapomorphy, is seldom used in modern phylogenetics. In this paper, we apply this principle in reanalyzing five datasets comprising 197 complete plastid genomes (plastomes). We focused on plastomes because plastome-based DNA sequence data have gained dramatic popularity in molecular systematics during the last decade. We show that pattern-cladistic analyses based on complete plastid genome sequences can successfully resolve affinities between plant taxa, simultaneously simplifying both the genomic and analytical frameworks of phylogenetic studies. We developed "Matrix to Newick" (M2N), a program that directly represents a standard molecular alignment of plastid genomes as trees, or relationships. Thus, massive plastome-based DNA sequence data can be successfully represented in a relational form rather than as a standard molecular alignment. Applying median supertree construction methods (the Average Consensus method is used as an example in this study) or Maximum Parsimony analysis to relational representations of plastome sequence data may help systematists avoid the complicated assumption-based frameworks of Maximum Likelihood or Bayesian phylogenetics that are most widely used today in massive plastid sequence data analyses. We also found that significant amounts of the pure genomic information that is typically included in the majority of current plastid phylogenomic studies can be effectively dropped by systematists if they focus on pattern-cladistic or relational analyses of plastome-based molecular data. The proposed pattern-cladistic approach is a powerful and straightforward heuristic alternative to modern plastome-based phylogenetics.
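The abstract does not describe how M2N encodes alignment columns. As a purely hypothetical illustration of recoding an alignment as relations, the Python sketch below polarizes each column against a designated outgroup, treats any state that differs from the outgroup's as derived, and writes taxa sharing a derived state as a clade in a Newick string; uninformative columns are skipped. The function names, the outgroup polarization and the gap handling are assumptions, not M2N's documented behavior.

```python
def column_to_newick(column, taxa, outgroup):
    """Recode one alignment column as a rooted relation (Newick string).

    Assumption: a state differing from the outgroup's is treated as derived.
    Taxa sharing the same derived state form a clade; all other taxa stay in
    a basal polytomy. Returns None if no derived state is shared by 2+ taxa.
    """
    ref = column[taxa.index(outgroup)]
    groups = {}
    for taxon, state in zip(taxa, column):
        if taxon != outgroup and state != ref and state not in "-?N":
            groups.setdefault(state, []).append(taxon)
    clades = [g for _, g in sorted(groups.items()) if len(g) >= 2]
    if not clades:
        return None  # uninformative column: no shared derived state
    grouped = {t for clade in clades for t in clade}
    basal = [t for t in taxa if t not in grouped]
    parts = basal + ["(" + ",".join(clade) + ")" for clade in clades]
    return "(" + ",".join(parts) + ");"


def alignment_to_forest(alignment, outgroup):
    """Map every informative column of {taxon: sequence} to one Newick tree."""
    taxa = list(alignment)
    forest = []
    for i in range(len(alignment[outgroup])):
        column = [alignment[t][i] for t in taxa]
        tree = column_to_newick(column, taxa, outgroup)
        if tree is not None:
            forest.append(tree)
    return forest


# Hypothetical toy alignment.
alignment = {"Out": "AAAA", "A": "AGGA", "B": "AGGT", "C": "AATT", "D": "AAAT"}
for tree in alignment_to_forest(alignment, "Out"):
    print(tree)
```

Each resulting tree is one "relation" that could then be fed to a supertree or parsimony method; the first column is dropped because every taxon shares the outgroup state.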


Subjects
Genome, Plastid , Phylogeny , Bayes Theorem , Plastids/genetics , Genomics , Evolution, Molecular
3.
J Math Biol ; 78(7): 2015-2057, 2019 06.
Article in English | MEDLINE | ID: mdl-30968198

ABSTRACT

Best match graphs arise naturally as the first processing intermediate in algorithms for orthology detection. Let T be a phylogenetic (gene) tree and σ an assignment of the leaves of T to species. The best match graph (G(T, σ), σ) is a digraph that contains an arc from x to y if the genes x and y reside in different species and y is one of possibly many (evolutionary) closest relatives of x compared to all other genes contained in the species σ(y). Here, we characterize best match graphs and show that it can be decided in cubic time and quadratic space whether a given graph (G, σ) is derived from a tree in this manner. If the answer is affirmative, there is a unique least resolved tree that explains (G, σ), which can also be constructed in cubic time.
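A brute-force Python sketch of the definition itself (not the cubic-time recognition algorithm from the paper): the rooted gene tree is given as a hypothetical child-to-parent dictionary, `sigma` maps genes (leaves) to species, and y is a best match of x when no gene of y's species has a deeper last common ancestor (lca) with x than y does. Because every lca(x, ·) lies on the path from x to the root, comparing depths is enough.

```python
def ancestors(node, parent):
    """Path from node up to the root, node first."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def lca_depth(x, y, parent, depth):
    """Depth of the last common ancestor of x and y (the root has depth 0)."""
    above_x = set(ancestors(x, parent))
    for v in ancestors(y, parent):
        if v in above_x:
            return depth[v]
    raise ValueError("nodes are not in the same tree")


def best_match_graph(parent, sigma):
    """Arcs (x, y) with sigma[x] != sigma[y] such that no gene of species
    sigma[y] has a strictly deeper lca with x than y has."""
    nodes = set(parent) | set(parent.values())
    depth = {v: len(ancestors(v, parent)) - 1 for v in nodes}
    arcs = set()
    for x in sigma:
        for y in sigma:
            if sigma[x] == sigma[y]:
                continue
            d = lca_depth(x, y, parent, depth)
            rivals = (z for z in sigma if sigma[z] == sigma[y])
            if all(lca_depth(x, z, parent, depth) <= d for z in rivals):
                arcs.add((x, y))
    return arcs


# Hypothetical example: root r has children u and b2; u has children a1 and b1.
parent = {"u": "r", "b2": "r", "a1": "u", "b1": "u"}
sigma = {"a1": "A", "b1": "B", "b2": "B"}
print(sorted(best_match_graph(parent, sigma)))
# [('a1', 'b1'), ('b1', 'a1'), ('b2', 'a1')]
```

Note that b2 points to a1 (its only candidate in species A), while a1 prefers b1 over b2, so the graph is not symmetric.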


Subjects
Algorithms , Biological Evolution , Computer Graphics , Genes/genetics , Models, Genetic , Humans , Phylogeny
4.
Mol Phylogenet Evol ; 116: 69-77, 2017 11.
Article in English | MEDLINE | ID: mdl-28797692

ABSTRACT

Recent developments in phylogenetic methods and data acquisition have allowed the construction of large and comprehensive phylogenies. Published phylogenies represent an enormous resource that not only facilitates the resolution of questions related to comparative biology, but also provides a resource on which to gauge the development of concordance across the tree of life. From the Open Tree of Life, we gathered 290 avian phylogenies representing all major groups that have been published over the last few decades and analyzed how concordance and conflict develop among these trees through time. Nine large-scale phylogenetic hypotheses (including a new synthetic tree from this study) were used for comparisons. We found that conflicts were over-represented both along the backbone (higher-level neoavian relationships) and within the oscine Passeriformes. Importantly, although we have made major strides in the resolution of major clades, recently published comprehensive trees, as well as trees of individual clades, continue to contribute significantly to the resolution of relationships throughout the avian phylogeny. Our analyses highlight the need for continued research into the resolution of avian relationships.
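The study's pipeline is not described in the abstract. A minimal Python sketch of the basic comparison such an analysis relies on, assuming rooted trees given as lists of clusters (sets of taxa, root cluster included): each clade of a reference tree, such as a synthetic tree, is restricted to the taxa it shares with an input tree and classified as supported (present), conflicted (overlapping an input clade without nesting), or neither (left unresolved). Function names and the toy trees are hypothetical.

```python
def conflicts(c1, c2):
    """Two rooted clusters conflict if they overlap but neither contains the other."""
    return bool(c1 & c2) and not (c1 <= c2 or c2 <= c1)


def compare_clades(reference, source):
    """Classify the clades of `reference` against `source` on their shared taxa.

    Each tree is a list of frozensets (its clusters), including the root
    cluster, so the union of a tree's clusters is its leaf set.
    """
    shared = frozenset().union(*reference) & frozenset().union(*source)
    src = {c & shared for c in source}
    tally = {"supported": 0, "conflicted": 0, "neither": 0}
    for clade in reference:
        c = clade & shared
        if len(c) < 2 or c == shared:
            continue  # trivial once restricted to the shared taxa
        if c in src:
            tally["supported"] += 1
        elif any(conflicts(c, s) for s in src):
            tally["conflicted"] += 1
        else:
            tally["neither"] += 1
    return tally


# Hypothetical trees: ((A,B),C),(D,E) versus ((A,C),B),D.
synthetic = [frozenset(s) for s in ("AB", "ABC", "DE", "ABCDE")]
study = [frozenset(s) for s in ("AC", "ABC", "ABCD")]
print(compare_clades(synthetic, study))
```

Summing such tallies over many published trees, ordered by publication date, is one simple way to track how concordance and conflict accumulate through time.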


Subjects
Birds/classification , Animals , Consensus , Models, Biological , Phylogeny
5.
BMC Bioinformatics ; 17(1): 436, 2016 Oct 28.
Article in English | MEDLINE | ID: mdl-27793083

ABSTRACT

BACKGROUND: Molecular evolution studies involve many different hard computational problems that are solved, in most cases, with heuristic algorithms that provide nearly optimal solutions. Hence, diverse software tools exist for the different stages of a molecular evolution workflow. RESULTS: We present MEvoLib, the first molecular evolution library for Python, providing a framework to work with the different tools and methods involved in the common tasks of molecular evolution workflows. In contrast with existing bioinformatics libraries, MEvoLib focuses on the stages involved in molecular evolution studies, grouping tools that share a common purpose behind a single high-level interface with quick access to their most frequent parameterizations. Gene clustering from partial or complete sequences has been improved with a new method that integrates accessible external information (e.g. GenBank's feature data). Moreover, MEvoLib adjusts the fetching process from NCBI databases to optimize download bandwidth usage. In addition, it has been implemented using parallelization techniques to cope with very large datasets. CONCLUSIONS: MEvoLib is the first Python library designed to facilitate molecular evolution research for both expert and novice users. Its single interface for each common task comprises several tools with their most used parameterizations. It also includes a method that takes advantage of biological knowledge to improve the gene partitioning of sequence datasets. Additionally, its implementation incorporates parallelization techniques to reduce computational costs when handling very large input datasets.


Subjects
Evolution, Molecular , Gene Library , Software , Algorithms , Base Sequence , Computational Biology/methods , DNA, Mitochondrial/genetics , Genes , Humans
6.
J Comput Biol ; 31(4): 312-327, 2024 04.
Article in English | MEDLINE | ID: mdl-38634854

ABSTRACT

Phylogenetic inference and reconstruction methods generate hypotheses on evolutionary history. Competing inference methods are frequently used, and the generated hypotheses are evaluated using tree comparison costs. The Robinson-Foulds (RF) distance is a widely used cost for comparing the topologies of two trees, but it is sensitive to tree error and can overestimate tree differences. To overcome this limitation, a refined version of the RF distance, the Cluster Affinity (CA) distance, was introduced. However, CA distances are symmetric and cannot compare different types of trees. Such asymmetric comparisons occur when gene trees are compared with species trees, when disparate datasets are integrated into a supertree, or when tree comparison measures are used to infer a phylogenetic network. In this study, we introduce a relaxation of the original Cluster Affinity distance, called the asymmetric CA cost, to compare heterogeneous trees. We also develop a biologically interpretable cost, the Cluster Support cost, which normalizes by cluster size across gene trees. The characteristics of these costs are similar to those of the symmetric CA cost. We describe efficient algorithms, derive the exact diameters, and use these to standardize the costs so that they are applicable in practice. These costs provide objective, fine-scale, and biologically interpretable values that can assess differences and similarities between phylogenetic trees.
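To illustrate the contrast between the two kinds of cost, here is a Python sketch on rooted trees written as nested tuples. `rf_distance` counts clusters present in one tree but not the other; `one_way_affinity` is a simplified, hypothetical one-directional score in the spirit of cluster affinity that sums, for each cluster of the first tree, its smallest symmetric difference to any cluster of the second. It is not the paper's exact asymmetric CA or Cluster Support cost and omits their standardization by diameters.

```python
def clusters(tree):
    """All clusters (leaf sets) of a rooted tree written as nested tuples."""
    found = []

    def walk(node):
        if isinstance(node, str):
            leaf = frozenset([node])
            found.append(leaf)
            return leaf
        cluster = frozenset().union(*(walk(child) for child in node))
        found.append(cluster)
        return cluster

    walk(tree)
    return found


def rf_distance(t1, t2):
    """Robinson-Foulds distance for rooted trees: size of the symmetric
    difference of the two cluster sets (trivial clusters cancel out)."""
    return len(set(clusters(t1)) ^ set(clusters(t2)))


def one_way_affinity(t1, t2):
    """For each cluster of t1, the smallest symmetric difference to any cluster
    of t2, summed. Not symmetric in general: swap the arguments to reverse it."""
    c2 = clusters(t2)
    return sum(min(len(c ^ d) for d in c2) for c in clusters(t1))


t1 = (("A", "B"), ("C", "D"))
t2 = (("A", "C"), ("B", "D"))
print(rf_distance(t1, t2), one_way_affinity(t1, t2))
# 4 2
```

Here RF counts every internal cluster as a full mismatch, while the affinity-style score registers that each cluster of t1 is only one taxon away from a cluster of t2, which is the finer-grained behavior the abstract attributes to CA costs.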


Subjects
Algorithms , Phylogeny , Cluster Analysis , Models, Genetic , Computational Biology/methods , Evolution, Molecular
7.
Algorithms Mol Biol ; 18(1): 19, 2023 Dec 01.
Article in English | MEDLINE | ID: mdl-38041123

ABSTRACT

Cancer progression and treatment can be informed by reconstructing a tumor's evolutionary history from its cells. Although many methods exist to estimate evolutionary trees (called phylogenies) from molecular sequences, traditional approaches assume that the input data are error-free and that the output tree is fully resolved. These assumptions are challenged in tumor phylogenetics because single-cell sequencing produces sparse, error-ridden data and because tumors evolve clonally. Here, we study the theoretical utility of methods based on quartets (four-leaf, unrooted phylogenetic trees) in light of these barriers. We consider a popular tumor phylogenetics model, in which mutations arise on a (highly unresolved) tree and then (unbiased) errors and missing values are introduced. Quartets are then implied by mutations present in two cells and absent from two cells. Our main result is that the most probable quartet identifies the unrooted model tree on four cells. This motivates seeking a tree that maximizes the number of quartets shared between it and the input mutations. We prove that an optimal solution to this problem is a consistent estimator of the unrooted cell lineage tree; this guarantee includes the case where the model tree is highly unresolved, with error defined as the number of false negative branches. Lastly, we outline how quartet-based methods might be employed when there are copy number aberrations and other challenges specific to tumor phylogenetics.
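The quartet-encoding step can be sketched in Python as follows, assuming a hypothetical cell-by-mutation matrix of '1' (present), '0' (absent) and '?' (missing) strings: for every set of four cells, each mutation present in exactly two of them and absent from the other two votes for the quartet separating those pairs, and the most frequent quartet is reported. This is an empirical vote count only; it does not implement the paper's estimator or its consistency analysis, and it enumerates all four-cell subsets, which is practical only for small inputs.

```python
from collections import Counter
from itertools import combinations


def quartet_votes(matrix, cells):
    """For each 4-subset of cells, count the quartets implied by mutations that
    are present ('1') in exactly two of the four cells and absent ('0') in the
    other two. Mutations with a missing value ('?') in the subset are skipped."""
    n_mutations = len(matrix[cells[0]])
    votes = {}
    for four in combinations(cells, 4):
        counter = Counter()
        for j in range(n_mutations):
            states = [matrix[c][j] for c in four]
            if "?" in states:
                continue
            present = tuple(c for c, s in zip(four, states) if s == "1")
            if len(present) == 2:
                absent = tuple(c for c in four if c not in present)
                # ab|cd and cd|ab are the same unrooted quartet: store one form.
                counter[tuple(sorted((present, absent)))] += 1
        votes[four] = counter
    return votes


def most_probable_quartets(votes):
    """Most frequently implied quartet for every 4-subset that has any votes."""
    return {four: c.most_common(1)[0][0] for four, c in votes.items() if c}


cells = ["c1", "c2", "c3", "c4"]
matrix = {"c1": "1101", "c2": "1100", "c3": "0011", "c4": "0010"}
votes = quartet_votes(matrix, cells)
print(most_probable_quartets(votes))
```

In the toy matrix, three mutations imply c1c2|c3c4 and one (a possible error) implies c1c3|c2c4, so the majority quartet is c1c2|c3c4.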

8.
Algorithms Mol Biol ; 16(1): 12, 2021 Jun 28.
Article in English | MEDLINE | ID: mdl-34183037

ABSTRACT

One of the Grand Challenges in Science is the construction of the Tree of Life, an evolutionary tree containing several million species, spanning all life on earth. However, constructing the Tree of Life is enormously challenging computationally, as the most accurate current methods are all either heuristics for NP-hard optimization problems or Bayesian MCMC methods that sample from tree space. One of the most promising approaches for improving the scalability and accuracy of phylogeny estimation uses divide-and-conquer: a set of species is divided into overlapping subsets, trees are constructed on the subsets, and the trees are then merged using a "supertree method". Here, we present Exact-RFS-2, the first polynomial-time algorithm to find an optimal supertree of two trees under the Robinson-Foulds Supertree (RFS) criterion (a major approach in supertree estimation that is related to maximum likelihood supertrees), and we prove that finding the RFS of three input trees is NP-hard. Exact-RFS-2 is available in open source form on GitHub (https://github.com/yuxilin51/GreedyRFS).
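The RFS criterion scores a candidate supertree T by summing, over the input trees t_i, the RF distance between T restricted to the leaf set of t_i and t_i itself. A minimal Python sketch of that scoring function, assuming unrooted trees encoded as sets of nontrivial splits (each split stored as one of its sides); it only evaluates a given candidate and does not reproduce Exact-RFS-2's search. All names are hypothetical.

```python
def canonical(side, leaves):
    """Canonical side of the split side | (leaves - side): the side without min(leaves)."""
    return frozenset(leaves - side) if min(leaves) in side else frozenset(side)


def restrict_splits(splits, full_leaves, subset):
    """Nontrivial splits induced on `subset` by a tree on `full_leaves`."""
    induced = set()
    for side in splits:
        a = side & subset
        b = (full_leaves - side) & subset
        if len(a) >= 2 and len(b) >= 2:  # splits with a singleton side are trivial
            induced.add(canonical(a, subset))
    return induced


def rfs_score(super_splits, super_leaves, inputs):
    """RFS criterion: sum over inputs (splits, leaves) of RF(supertree | leaves, input)."""
    total = 0
    for splits, leaves in inputs:
        induced = restrict_splits(super_splits, super_leaves, leaves)
        own = {canonical(s, leaves) for s in splits}
        total += len(induced ^ own)
    return total


# Hypothetical candidate supertree ((A,B),C,(D,E)), stored by one side of each split.
all_leaves = frozenset("ABCDE")
supertree = {frozenset("AB"), frozenset("DE")}
inputs = [
    ({frozenset("AB")}, frozenset("ABCD")),  # AB|CD, agrees with the supertree
    ({frozenset("AC")}, frozenset("ACDE")),  # AC|DE, agrees with the supertree
    ({frozenset("AE")}, frozenset("ABCE")),  # AE|BC, conflicts with it
]
print(rfs_score(supertree, all_leaves, inputs[:2]), rfs_score(supertree, all_leaves, inputs))
# 0 2
```

An RFS is any tree on the full leaf set that minimizes this score; the abstract's point is that this minimization is polynomial for two input trees but NP-hard for three.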

9.
Zootaxa ; 4567(2): zootaxa.4567.2.11, 2019 Mar 15.
Article in English | MEDLINE | ID: mdl-31715904

ABSTRACT

The most common methods for combining different phylogenetic trees with uneven but overlapping taxon sampling are Matrix Representation with Parsimony (MRP) and consensus tree methods. Although MRP is straightforward, some of its steps are time-consuming and error-prone when performed manually, especially preparing the matrix representations of the original topologies and creating the single matrix that contains all the information of the individual trees. Here we present Building MRP-Matrices (BuM), a free online tool that generates a combined matrix, following the Baum and Ragan coding scheme, from files containing phylogenetic trees in parenthetical format.
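A minimal Python sketch of Baum/Ragan-style coding, assuming rooted input trees already parsed into nested tuples (BuM itself reads parenthetical tree files; parsing is omitted here): each clade of each input tree becomes one binary character, scored 1 for taxa inside the clade, 0 for the other taxa of that tree, and '?' for taxa the tree does not contain. Adding an all-zero hypothetical outgroup row, here named `MRP_outgroup`, is a common convention for rooting the characters; the function names are hypothetical.

```python
def clades(tree):
    """Leaf set and non-root internal clusters of a nested-tuple tree."""
    found = []

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        cluster = frozenset().union(*(walk(child) for child in node))
        found.append(cluster)
        return cluster

    root = walk(tree)
    return root, [c for c in found if c != root]


def mrp_matrix(trees, outgroup="MRP_outgroup"):
    """One binary character per clade of each input tree: 1 inside the clade,
    0 elsewhere in that tree, '?' for taxa the tree does not sample.
    An all-zero hypothetical outgroup row roots the characters."""
    parsed = [clades(t) for t in trees]
    taxa = sorted(set().union(*(leaves for leaves, _ in parsed)))
    rows = {taxon: [] for taxon in taxa}
    rows[outgroup] = []
    for leaves, tree_clades in parsed:
        for clade in tree_clades:
            for taxon in taxa:
                if taxon not in leaves:
                    rows[taxon].append("?")
                else:
                    rows[taxon].append("1" if taxon in clade else "0")
            rows[outgroup].append("0")
    return {taxon: "".join(states) for taxon, states in rows.items()}


# Two hypothetical source trees with overlapping taxon sets.
trees = [(("A", "B"), "C"), (("B", "C"), "D")]
for taxon, row in mrp_matrix(trees).items():
    print(taxon, row)
```

The combined matrix would then be analyzed with a parsimony program, which is the step MRP is named for.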


Subjects
Phylogeny , Animals , Internet
10.
Algorithms Mol Biol ; 12: 7, 2017.
Article in English | MEDLINE | ID: mdl-28331536

ABSTRACT

BACKGROUND: Semi-labeled trees generalize ordinary phylogenetic trees, allowing internal nodes to be labeled by higher-order taxa. Taxonomies are examples of semi-labeled trees. Suppose we are given a collection [Formula: see text] of semi-labeled trees over various subsets of a set of taxa. The ancestral compatibility problem asks whether there is a semi-labeled tree that respects the clusterings and the ancestor/descendant relationships implied by the trees in [Formula: see text]. The running time and space usage of the best previous algorithm for testing ancestral compatibility depend on the degrees of the nodes in the trees in [Formula: see text]. RESULTS: We give an algorithm for the ancestral compatibility problem that runs in [Formula: see text] time and uses [Formula: see text] space, where [Formula: see text] is the total number of nodes and edges in the trees in [Formula: see text]. CONCLUSIONS: Taxonomies enable researchers to greatly expand the taxonomic coverage of their phylogenetic analyses. The running time of our method does not depend on the degrees of the nodes in the trees in [Formula: see text]. This characteristic is important when taxonomies, which can have nodes of high degree, are used.
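The paper's algorithm handles semi-labeled trees and is not reproduced here. For orientation, the Python sketch below implements the classic BUILD algorithm of Aho et al. for the simpler case of ordinary rooted trees labeled only at the leaves: each input tree is broken into the rooted triplets xy|z it displays, and BUILD recursively splits the leaf set into the connected components of the graph that joins x and y for every triplet xy|z; a single component means the input is incompatible. Trees are nested tuples and all names are illustrative.

```python
from itertools import combinations


def tree_clusters(tree):
    """Internal clusters of a rooted nested-tuple tree (the root cluster comes last)."""
    found = []

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        cluster = frozenset().union(*(walk(child) for child in node))
        found.append(cluster)
        return cluster

    walk(tree)
    return found


def triplets(tree):
    """Rooted triplets xy|z displayed by the tree: some cluster holds x, y but not z."""
    cs = tree_clusters(tree)
    out = set()
    for a, b, c in combinations(sorted(cs[-1]), 3):
        for x, y, z in ((a, b, c), (a, c, b), (b, c, a)):
            if any(x in k and y in k and z not in k for k in cs):
                out.add((frozenset((x, y)), z))
    return out


def build(leaves, trips):
    """BUILD (Aho et al.): a nested-tuple tree displaying all triplets, or None."""
    leaves = set(leaves)
    if len(leaves) == 1:
        return next(iter(leaves))
    parent = {v: v for v in leaves}  # union-find over the current leaf set

    def find(v):
        while parent[v] != v:
            v = parent[v]
        return v

    for pair, z in trips:
        x, y = tuple(pair)
        if x in leaves and y in leaves and z in leaves:
            parent[find(x)] = find(y)  # xy|z forces x and y into the same subtree
    comps = {}
    for v in leaves:
        comps.setdefault(find(v), set()).add(v)
    if len(comps) == 1:
        return None  # the leaves cannot be split: the triplets are incompatible
    children = []
    for comp in sorted(comps.values(), key=sorted):
        sub = build(comp, trips)
        if sub is None:
            return None
        children.append(sub)
    return tuple(children)


t1 = (("A", "B"), "C")
t2 = (("B", "C"), "D")
t3 = (("A", "C"), "B")
print(build({"A", "B", "C", "D"}, triplets(t1) | triplets(t2)))  # compatible
print(build({"A", "B", "C"}, triplets(t1) | triplets(t3)))  # incompatible: None
```

This naive version's cost grows with the number of triplets, which is part of why faster, degree-independent algorithms such as the one in the abstract matter for large taxonomies.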

11.
Adv Genet ; 100: 211-266, 2017.
Article in English | MEDLINE | ID: mdl-29153401

ABSTRACT

Fungi are possibly the most diverse eukaryotic kingdom, with over a million member species and an evolutionary history dating back a billion years. Fungi have been at the forefront of eukaryotic genomics, and owing to initiatives like the 1000 Fungal Genomes Project, the amount of fungal genomic data has increased considerably over the last 5 years, enabling large-scale comparative genomics of species across the kingdom. In this chapter, we first review fungal evolution and the history of fungal genomics. We then review in detail seven phylogenomic methods and reconstruct the phylogeny of 84 fungal species from 8 phyla using each method. Six methods have seen extensive use in previous fungal studies, while a Bayesian supertree method is novel to fungal phylogenomics. We find that both established and novel phylogenomic methods can accurately reconstruct the fungal kingdom. Finally, we discuss the accuracy and suitability of each phylogenomic method used.


Subjects
Fungi/genetics , Genome, Fungal , Genomics , Phylogeny , Evolution, Molecular , Models, Genetic
12.
mSphere ; 2(2)2017.
Article in English | MEDLINE | ID: mdl-28435885

ABSTRACT

The oomycetes are a class of microscopic, filamentous eukaryotes within the Stramenopiles-Alveolata-Rhizaria (SAR) supergroup; they include ecologically significant animal and plant pathogens, most infamously Phytophthora infestans, the causative agent of potato blight. Single-gene and concatenated phylogenetic studies of both individual oomycete genera and members of the larger class have resulted in conflicting conclusions concerning species phylogenies within the oomycetes, particularly for the large Phytophthora genus. Genome-scale phylogenetic studies have successfully resolved many eukaryotic relationships by using supertree methods, which combine large numbers of potentially disparate trees to determine evolutionary relationships that cannot be inferred from individual phylogenies alone. With a sufficient amount of genomic data now available, we have undertaken the first whole-genome phylogenetic analysis of the oomycetes using data from 37 oomycete species and 6 SAR species. In our analysis, we used established supertree methods to generate phylogenies from 8,355 homologous oomycete and SAR gene families and complemented those analyses with both phylogenomic network and concatenated supermatrix analyses. Our results show that a genome-scale approach to oomycete phylogeny resolves oomycete classes and individual clades within the problematic Phytophthora genus. Support for the inferred relationships between individual Phytophthora clades varies depending on the methodology used. Our analysis represents an important first step in large-scale phylogenomic analysis of the oomycetes.
IMPORTANCE: The oomycetes are a class of eukaryotes and include ecologically significant animal and plant pathogens. Single-gene and multigene phylogenetic studies of individual oomycete genera and of members of the larger classes have resulted in conflicting conclusions concerning interspecies relationships, particularly for the Phytophthora genus. The onset of next-generation sequencing techniques now means that a wealth of oomycete genomic data is available. For the first time, we have used genome-scale phylogenetic methods to resolve oomycete phylogenetic relationships. We used supertree methods to generate single-gene and multigene species phylogenies. Overall, our supertree analyses used phylogenetic data from 8,355 oomycete gene families. We also complemented our analyses with superalignment phylogenies derived from 131 single-copy ubiquitous gene families. Our results show that a genome-scale approach to oomycete phylogeny resolves oomycete classes and clades. Our analysis represents an important first step in large-scale phylogenomic analysis of the oomycetes.

13.
PeerJ ; 5: e3578, 2017.
Article in English | MEDLINE | ID: mdl-28740753

ABSTRACT

BACKGROUND: This paper is a comment on the idea of matrix-free Cladistics. Demonstrating this idea's efficiency is a major goal of the study. Within the proposed framework, the ordinary (phenetic) matrix is necessary only as a "source" of Hennigian trees, not as a primary subject of the analysis. Switching from matrix-based thinking to the matrix-free Cladistic approach clearly reveals that optimizations of character-state changes are related not to real processes, but to the form of the data representation. METHODS: We focused our study on binary data. We wrote a simple Ruby script, FORESTER version 1.0, that represents a binary matrix as an array of rooted trees (a "Hennigian forest"). The binary representations of the genomic (DNA) data were produced with script 1001. The Average Consensus method as well as the standard Maximum Parsimony (MP) approach were used to analyze the data. PRINCIPAL FINDINGS: A binary matrix can easily be rewritten as a set of rooted trees (maximal relationships), which can then be analyzed with the Average Consensus method. Paradoxically, this method, if applied to Hennigian forests, can in principle help identify clades despite the absence of direct evidence in the primary data. Our approach can handle clock-like or non-clock-like matrices, as well as hypothetical, molecular or morphological data. DISCUSSION: Our proposal clearly differs from the numerous phenetic alignment-free techniques for constructing phylogenetic trees. Dealing with relations rather than with the actual "data" also distinguishes our approach from all optimization-based methods, if optimization is defined as a way to reconstruct the sequences of character-state changes on a tree, whether by standard alignment-based techniques or by the "direct" alignment-free procedure.
We do not view our framework as an alternative to three-taxon statement analysis (3TA), but there are two major differences between our proposal and 3TA as originally designed and implemented: (1) 3TA deals with three-taxon statements, or minimal relationships, and according to the logic of 3TA the set of minimal trees must be encoded as a binary matrix and used as input for a parsimony program, whereas in this paper we operate directly with maximal relationships written as trees, not as binary matrices; and (2) we use the Average Consensus method instead of MP analysis. Groups supported solely by reversals can always be found by our method without separately scoring the putative reversals before analysis.
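The Average Consensus method starts from the average path-length distances between taxa across the input trees and then seeks a tree whose path lengths fit that average by least squares. The Python sketch below covers only the first step, under stated assumptions: rooted trees are nested tuples, path length is the number of edges between two leaves, and each pair is averaged over the trees that contain both taxa. The least-squares tree search is omitted, and this is neither FORESTER nor the authors' workflow; all names are hypothetical.

```python
from itertools import combinations


def leaf_paths(tree, prefix=()):
    """Map each leaf of a nested-tuple tree to its root-to-leaf child-index path."""
    if isinstance(tree, str):
        return {tree: prefix}
    paths = {}
    for i, child in enumerate(tree):
        paths.update(leaf_paths(child, prefix + (i,)))
    return paths


def path_length(p, q):
    """Number of edges between two leaves, given their root-to-leaf paths."""
    common = 0
    for a, b in zip(p, q):
        if a != b:
            break
        common += 1
    return len(p) + len(q) - 2 * common


def average_path_lengths(trees):
    """Average each pairwise path length over the trees containing both taxa."""
    sums, counts = {}, {}
    for tree in trees:
        paths = leaf_paths(tree)
        for a, b in combinations(sorted(paths), 2):
            sums[(a, b)] = sums.get((a, b), 0) + path_length(paths[a], paths[b])
            counts[(a, b)] = counts.get((a, b), 0) + 1
    return {pair: sums[pair] / counts[pair] for pair in sums}


# Two hypothetical "Hennigian" trees on the same four taxa.
forest = [(("A", "B"), "C", "D"), (("A", "B", "C"), "D")]
for pair, value in sorted(average_path_lengths(forest).items()):
    print(pair, value)
```

In the toy forest, A and B are always sisters and keep the smallest average distance (2.0), while C sits between the two configurations (2.5 to each of A, B and D).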

14.
Philos Trans R Soc Lond B Biol Sci ; 370(1678): 20140337, 2015 09 26.
Article in English | MEDLINE | ID: mdl-26323767

ABSTRACT

The origin of the eukaryotic cell is considered one of the major evolutionary transitions in the history of life. Current evidence strongly supports a scenario of eukaryotic origin in which two prokaryotes, an archaebacterial host and an α-proteobacterium (the free-living ancestor of the mitochondrion), entered a stable symbiotic relationship. The establishment of this relationship was associated with a process of chimerization, whereby a large number of genes from the α-proteobacterial symbiont were transferred to the host nucleus. A general framework allowing the conceptualization of eukaryogenesis from a genomic perspective has long been lacking. Recent studies suggest that the origins of several archaebacterial phyla were coincident with massive imports of eubacterial genes. Although this does not indicate that these phyla originated through the same process that led to the origin of Eukaryota, it suggests that Archaebacteria might have had a general propensity to integrate into their genomes large amounts of eubacterial DNA. We suggest that this propensity provides a framework in which eukaryogenesis can be understood and studied in the light of archaebacterial ecology. We applied a recently developed supertree method to a genomic dataset composed of 392 eubacterial and 51 archaebacterial genera to test whether large numbers of genes flowing from Eubacteria are indeed coincident with the origin of major archaebacterial clades. In addition, we identified two potential large-scale transfers of uncertain directionality at the base of the archaebacterial tree. Our results are consistent with previous findings and seem to indicate that eubacterial gene imports (particularly from δ-Proteobacteria, Clostridia and Actinobacteria) were an important factor in archaebacterial history. 
Archaebacteria seem to have long relied on Eubacteria as a source of genetic diversity, and while the precise mechanism that allowed these imports is unknown, we suggest that our results support the view that processes comparable to those through which eukaryotes emerged might have been common in archaebacterial history.


Subjects
Bacteria/genetics , Biological Evolution , Gene Flow , Genome, Bacterial , Models, Genetic
15.
Algorithms Mol Biol ; 9: 13, 2014.
Article in English | MEDLINE | ID: mdl-24742332

ABSTRACT

BACKGROUND: Deciding whether there is a single tree (a supertree) that summarizes the evolutionary information in a collection of unrooted trees is a fundamental problem in phylogenetics. We consider two versions of this question: agreement and compatibility. In the first, the supertree is required to reflect precisely the relationships among the species exhibited by the input trees. In the second, the supertree can be more refined than the input trees. Testing for compatibility is an NP-complete problem; however, the problem is solvable in polynomial time when the number of input trees is fixed. Testing for agreement is also NP-complete, but it is not known whether it is fixed-parameter tractable. Compatibility can be characterized in terms of the existence of a specific kind of triangulation in a structure known as the display graph. Alternatively, it can be characterized as a chordal graph sandwich problem in a structure known as the edge label intersection graph. No characterization of agreement was known. RESULTS: We present a simple and natural characterization of compatibility in terms of minimal cuts in the display graph, which is closely related to compatibility of splits. We then derive a characterization for agreement. CONCLUSIONS: Explicit characterizations of tree compatibility and agreement are essential to finding practical algorithms for these problems. The simplicity of the characterizations presented here could help to achieve this goal.
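For reference, here is a Python sketch of the two objects the characterization is built on, assuming unrooted trees given as nested tuples whose top-level node is an ordinary internal vertex: the display graph (the union of the input trees with equally labeled leaves identified) and the standard four-intersection test for whether two splits of the same leaf set are compatible. It does not compute minimal cuts or decide compatibility of a whole profile; the function names are hypothetical.

```python
def _add_tree(graph, tree, index):
    """Add one tree; internal vertices are labeled (index, k), leaves by taxon name."""
    counter = 0

    def walk(node):
        nonlocal counter
        if isinstance(node, str):
            graph.setdefault(node, set())
            return node  # a leaf label shared with another tree is the same vertex
        counter += 1
        me = (index, counter)
        graph.setdefault(me, set())
        for child in node:
            other = walk(child)
            graph[me].add(other)
            graph[other].add(me)
        return me

    walk(tree)


def display_graph(trees):
    """Display graph of a profile of unrooted trees, as adjacency sets."""
    graph = {}
    for index, tree in enumerate(trees):
        _add_tree(graph, tree, index)
    return graph


def splits_compatible(side1, side2, leaves):
    """A|B and C|D on the same leaf set are compatible iff A∩C, A∩D, B∩C or B∩D is empty."""
    a, b = side1, leaves - side1
    c, d = side2, leaves - side2
    return not (a & c and a & d and b & c and b & d)


t1 = (("A", "B"), "C", "D")  # unrooted quartet AB|CD
t2 = (("A", "C"), "B", "E")  # unrooted quartet AC|BE
g = display_graph([t1, t2])
print(len(g), sum(len(nbrs) for nbrs in g.values()) // 2)  # vertices, edges
# 9 10
```

The two quartets share leaves A, B and C, so the display graph connects their internal vertices through those shared leaf vertices; restricted to the common leaves {A, B, C}, each quartet induces only trivial splits, so no conflict is visible from splits alone here.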
