Your browser doesn't support javascript.
Mostrar: 20 | 50 | 100
Resultados 1 - 6 de 6
Mais filtros

Base de dados
Intervalo de ano de publicação
Artigo em Inglês | MEDLINE | ID: mdl-24407302


Phylogenetic inference is a computationally difficult problem, and constructing high-quality phylogenies that can build upon existing phylogenetic knowledge and synthesize insights from new data remains a major challenge. We introduce knowledge-enhanced phylogenetic problems for both supertree and supermatrix phylogenetic analyses. These problems seek an optimal phylogenetic tree that can only be assembled from a user-supplied set of, possibly incompatible, phylogenetic relationships. We describe exact polynomial time algorithms for the knowledge-enhanced versions of the NP-hard Robinson Foulds, gene duplication, duplication and loss, and deep coalescence supertree problems. Further, we demonstrate that our algorithms can rapidly improve upon results of local search heuristics for these problems. Finally, we introduce a knowledge-enhanced search heuristic that can be applied to any discrete character data set using the maximum parsimony (MP) phylogenetic problem. Although this approach is not guaranteed to find exact solutions, we show that it also can improve upon solutions from commonly used MP heuristics.

Algoritmos , Biologia Computacional/métodos , Filogenia , Inteligência Artificial , Análise por Conglomerados , Evolução Molecular , Duplicação Gênica , Software
Syst Biol ; 60(2): 117-25, 2011 Mar.
Artigo em Inglês | MEDLINE | ID: mdl-21186249


Phylogenetic analyses using genome-scale data sets must confront incongruence among gene trees, which in plants is exacerbated by frequent gene duplications and losses. Gene tree parsimony (GTP) is a phylogenetic optimization criterion in which a species tree that minimizes the number of gene duplications induced among a set of gene trees is selected. The run time performance of previous implementations has limited its use on large-scale data sets. We used new software that incorporates recent algorithmic advances to examine the performance of GTP on a plant data set consisting of 18,896 gene trees containing 510,922 protein sequences from 136 plant taxa (giving a combined alignment length of >2.9 million characters). The relationships inferred from the GTP analysis were largely consistent with previous large-scale studies of backbone plant phylogeny and resolved some controversial nodes. The placement of taxa that were present in few gene trees generally varied the most among GTP bootstrap replicates. Excluding these taxa either before or after the GTP analysis revealed high levels of phylogenetic support across plants. The analyses supported magnoliids sister to a eudicot + monocot clade and did not support the eurosid I and II clades. This study presents a nuclear genomic perspective on the broad-scale phylogenic relationships among plants, and it demonstrates that nuclear genes with a history of duplication and loss can be phylogenetically informative for resolving the plant tree of life.

Classificação/métodos , Filogenia , Plantas/classificação , Plantas/genética , Algoritmos , Etiquetas de Sequências Expressas , Genômica
BMC Bioinformatics ; 11: 574, 2010 Nov 23.
Artigo em Inglês | MEDLINE | ID: mdl-21092314


BACKGROUND: The ever-increasing wealth of genomic sequence information provides an unprecedented opportunity for large-scale phylogenetic analysis. However, species phylogeny inference is obfuscated by incongruence among gene trees due to evolutionary events such as gene duplication and loss, incomplete lineage sorting (deep coalescence), and horizontal gene transfer. Gene tree parsimony (GTP) addresses this issue by seeking a species tree that requires the minimum number of evolutionary events to reconcile a given set of incongruent gene trees. Despite its promise, the use of gene tree parsimony has been limited by the fact that existing software is either not fast enough to tackle large data sets or is restricted in the range of evolutionary events it can handle. RESULTS: We introduce iGTP, a platform-independent software program that implements state-of-the-art algorithms that greatly speed up species tree inference under the duplication, duplication-loss, and deep coalescence reconciliation costs. iGTP significantly extends and improves the functionality and performance of existing gene tree parsimony software and offers advanced features such as building effective initial trees using stepwise leaf addition and the ability to have unrooted gene trees in the input. Moreover, iGTP provides a user-friendly graphical interface with integrated tree visualization software to facilitate analysis of the results. CONCLUSIONS: iGTP enables, for the first time, gene tree parsimony analyses of thousands of genes from hundreds of taxa using the duplication, duplication-loss, and deep coalescence reconciliation costs, all from within a convenient graphical user interface.

Genômica/métodos , Filogenia , Software , Algoritmos , Bases de Dados Genéticas , Evolução Molecular , Duplicação Gênica , Genoma
Artigo em Inglês | MEDLINE | ID: mdl-19407347


The gene-duplication problem is to infer a species supertree from a collection of gene trees that are confounded by complex histories of gene-duplication events. This problem is NP-complete and thus requires efficient and effective heuristics. Existing heuristics perform a stepwise search of the tree space, where each step is guided by an exact solution to an instance of a local search problem. A classical local search problem is the {\tt NNI} search problem, which is based on the nearest neighbor interchange operation. In this work, we 1) provide a novel near-linear time algorithm for the {\tt NNI} search problem, 2) introduce extensions that significantly enlarge the search space of the {\tt NNI} search problem, and 3) present algorithms for these extended versions that are asymptotically just as efficient as our algorithm for the {\tt NNI} search problem. The exceptional speedup achieved in the extended {\tt NNI} search problems makes the gene-duplication problem more tractable for large-scale phylogenetic analyses. We verify the performance of our algorithms in a comparison study using sets of large randomly generated gene trees.

Algoritmos , Duplicação Gênica , Modelos Genéticos , Animais , Biologia Computacional , Filogenia , Análise de Sequência de DNA
Syst Biol ; 57(3): 335-46, 2008 Jun.
Artigo em Inglês | MEDLINE | ID: mdl-18570030


As an archive of sequence data for over 165,000 species, GenBank is an indispensable resource for phylogenetic inference. Here we describe an informatics processing pipeline and online database, the PhyLoTA Browser (, which offers a view of GenBank tailored for molecular phylogenetics. The first release of the Browser is computed from 2.6 million sequences representing the taxonomically enriched subset of GenBank sequences for eukaryotes (excluding most genome survey sequences, ESTs, and other high-throughput data). In addition to summarizing sequence diversity and species diversity across nodes in the NCBI taxonomy, it reports 87,000 potentially phylogenetically informative clusters of homologous sequences, which can be viewed or downloaded, along with provisional alignments and coarse phylogenetic trees. At each node in the NCBI hierarchy, the user can display a "data availability matrix" of all available sequences for entries in a subtaxa-by-clusters matrix. This matrix provides a guidepost for subsequent assembly of multigene data sets or supertrees. The database allows for comparison of results from previous GenBank releases, highlighting recent additions of either sequences or taxa to GenBank and letting investigators track progress on data availability worldwide. Although the reported alignments and trees are extremely approximate, the database reports several statistics correlated with alignment quality to help users choose from alternative data sources.

Bases de Dados de Ácidos Nucleicos , Filogenia , Software , Análise por Conglomerados , Biologia Computacional/métodos , Internet
Bioinformatics ; 24(13): 1540-1, 2008 Jul 01.
Artigo em Inglês | MEDLINE | ID: mdl-18474508


UNLABELLED: DupTree is a new software program for inferring rooted species trees from collections of gene trees using the gene tree parsimony approach. The program implements a novel algorithm that significantly improves upon the run time of standard search heuristics for gene tree parsimony, and enables the first truly genome-scale phylogenetic analyses. In addition, DupTree allows users to examine alternate rootings and to weight the reconciliation costs for gene trees. DupTree is an open source project written in C++. AVAILABILITY: DupTree for Mac OS X, Windows, and Linux along with a sample dataset and an on-line manual are available at

Algoritmos , Mapeamento Cromossômico/métodos , Análise Mutacional de DNA/métodos , Evolução Molecular , Modelos Genéticos , Análise de Sequência de DNA/métodos , Software , Sequência de Bases , Simulação por Computador , Dados de Sequência Molecular , Filogenia