Results 1 - 11 of 11
1.
Syst Biol ; 65(1): 161-76, 2016 Jan.
Article in English | MEDLINE | ID: mdl-26231183

ABSTRACT

Sampling tree space is the most challenging aspect of Bayesian phylogenetic inference. The sheer number of alternative topologies is problematic by itself. In addition, the complex dependency between branch lengths and topology increases the difficulty of moving efficiently among topologies. Current tree proposals are fast but sample new trees using primitive transformations or re-mappings of old branch lengths. This reduces acceptance rates and presumably slows down convergence and mixing. Here, we explore branch proposals that do not rely on old branch lengths but instead are based on approximations of the conditional posterior. Using a diverse set of empirical data sets, we show that most conditional branch posteriors can be accurately approximated via a [Formula: see text] distribution. We empirically determine the relationship between the logarithmic conditional posterior density, its derivatives, and the characteristics of the branch posterior. We use these relationships to derive an independence sampler for proposing branches with an acceptance ratio of ~90% on most data sets. This proposal samples branches between 2× and 3× more efficiently than traditional proposals with respect to the effective sample size per unit of runtime. We also compare the performance of standard topology proposals with hybrid proposals that use the new independence sampler to update those branches that are most affected by the topological change. Our results show that hybrid proposals can sometimes noticeably decrease the number of generations necessary for topological convergence. Inconsistent performance gains indicate that branch updates are not the limiting factor in improving topological convergence for the currently employed set of proposals. However, our independence sampler might be essential for the construction of novel tree proposals that apply more radical topology changes.
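The independence sampler described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the source elides the approximating distribution ("[Formula: see text]"), so a gamma proposal is used here purely as a stand-in, and the function names, the toy target, and all parameter values are hypothetical. The sketch shows only the general technique, a Metropolis-Hastings step whose proposal ignores the current branch length.

```python
import math
import random


def gamma_logpdf(x, shape, scale):
    """Log density of a Gamma(shape, scale) distribution at x > 0."""
    return ((shape - 1.0) * math.log(x) - x / scale
            - math.lgamma(shape) - shape * math.log(scale))


def independence_sampler(log_target, shape, scale, n_steps, x0, rng):
    """Metropolis-Hastings with a fixed gamma proposal (assumed stand-in).

    log_target is an unnormalized log density over positive branch lengths.
    Returns the chain and the fraction of accepted proposals.
    """
    x = x0
    accepted = 0
    samples = []
    for _ in range(n_steps):
        # The proposal is drawn independently of the current value x.
        y = rng.gammavariate(shape, scale)
        # log of [p(y) q(x)] / [p(x) q(y)]
        log_ratio = (log_target(y) - log_target(x)
                     + gamma_logpdf(x, shape, scale)
                     - gamma_logpdf(y, shape, scale))
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) <= log_ratio:
            x = y
            accepted += 1
        samples.append(x)
    return samples, accepted / n_steps


if __name__ == "__main__":
    # Hypothetical conditional posterior of one branch length (unnormalized).
    toy_log_posterior = lambda t: 3.0 * math.log(t) - 40.0 * t
    chain, rate = independence_sampler(
        toy_log_posterior, shape=3.0, scale=0.03,
        n_steps=2000, x0=0.1, rng=random.Random(1))
    print(f"acceptance rate: {rate:.2f}")
```

The closer the proposal density is to the target, the closer the acceptance rate gets to 1, which is why an accurate approximation of the conditional posterior matters for this kind of proposal.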


Subject(s)
Classification/methods , Models, Theoretical , Phylogeny , Algorithms , Bayes Theorem , Computer Simulation , Markov Chains , Monte Carlo Method
2.
Bioinformatics ; 31(15): 2577-9, 2015 Aug 01.
Article in English | MEDLINE | ID: mdl-25819675

ABSTRACT

MOTIVATION: Phylogenies are increasingly used in all fields of medical and biological research. Because of the next-generation sequencing revolution, datasets used for conducting phylogenetic analyses grow at an unprecedented pace. We present ExaML version 3, a dedicated production-level code for inferring phylogenies on whole-transcriptome and whole-genome alignments using supercomputers. RESULTS: We introduce several improvements and extensions to ExaML: extensions of substitution models and supported data types, the integration of a novel load balance algorithm as well as a parallel I/O optimization that significantly improve parallel efficiency, and a production-level implementation for Intel MIC-based hardware platforms.


Subject(s)
Algorithms , Computers , Genome, Human , High-Throughput Nucleotide Sequencing/methods , Phylogeny , Sequence Analysis, DNA/methods , Software , Computer Simulation , Humans , User-Computer Interface
3.
Gigascience ; 4: 4, 2015.
Article in English | MEDLINE | ID: mdl-25741440

ABSTRACT

BACKGROUND: Determining the evolutionary relationships among the major lineages of extant birds has been one of the biggest challenges in systematic biology. To address this challenge, we assembled or collected the genomes of 48 avian species spanning most orders of birds, including all Neognathae and two of the five Palaeognathae orders. We used these genomes to construct a genome-scale avian phylogenetic tree and perform comparative genomic analyses. FINDINGS: Here we present the datasets associated with the phylogenomic analyses, which include sequence alignment files consisting of nucleotides, amino acids, indels, and transposable elements, as well as tree files containing gene trees and species trees. Inferring an accurate phylogeny required generating: 1) A well annotated data set across species based on genome synteny; 2) Alignments with unaligned or incorrectly overaligned sequences filtered out; and 3) Diverse data sets, including genes and their inferred trees, indels, and transposable elements. Our total evidence nucleotide tree (TENT) data set (consisting of exons, introns, and UCEs) gave what we consider our most reliable species tree when using the concatenation-based ExaML algorithm or when using statistical binning with the coalescence-based MP-EST algorithm (which we refer to as MP-EST*). Other data sets, such as the coding sequence of some exons, revealed other properties of genome evolution, namely convergence. CONCLUSIONS: The Avian Phylogenomics Project is the largest vertebrate phylogenomics project to date that we are aware of. The sequence, alignment, and tree data are expected to accelerate analyses in phylogenomics and other related areas.


Subject(s)
Birds/genetics , Phylogeny , Animals , Birds/classification , Classification/methods , DNA/chemistry , DNA Transposable Elements , Genome , Genomics , Sequence Alignment
4.
Science ; 346(6215): 1320-31, 2014 Dec 12.
Article in English | MEDLINE | ID: mdl-25504713

ABSTRACT

To better determine the history of modern birds, we performed a genome-scale phylogenetic analysis of 48 species representing all orders of Neoaves using phylogenomic methods created to handle genome-scale data. We recovered a highly resolved tree that confirms previously controversial sister or close relationships. We identified the first divergence in Neoaves, two groups we named Passerea and Columbea, representing independent lineages of diverse and convergently evolved land and water bird species. Among Passerea, we infer the common ancestor of core landbirds to have been an apex predator and confirm independent gains of vocal learning. Among Columbea, we identify pigeons and flamingoes as belonging to sister clades. Even with whole genomes, some of the earliest branches in Neoaves proved challenging to resolve, which was best explained by massive protein-coding sequence convergence and high levels of incomplete lineage sorting that occurred during a rapid radiation after the Cretaceous-Paleogene mass extinction event about 66 million years ago.


Subject(s)
Birds/genetics , Genome , Phylogeny , Animals , Avian Proteins/genetics , Base Sequence , Biological Evolution , Birds/classification , DNA Transposable Elements , Genes , Genetic Speciation , INDEL Mutation , Introns , Sequence Analysis, DNA
5.
Science ; 346(6210): 763-7, 2014 Nov 07.
Article in English | MEDLINE | ID: mdl-25378627

ABSTRACT

Insects are the most speciose group of animals, but the phylogenetic relationships of many major lineages remain unresolved. We inferred the phylogeny of insects from 1478 protein-coding genes. Phylogenomic analyses of nucleotide and amino acid sequences, with site-specific nucleotide or domain-specific amino acid substitution models, produced statistically robust and congruent results resolving previously controversial phylogenetic relationships. We dated the origin of insects to the Early Ordovician [~479 million years ago (Ma)], of insect flight to the Early Devonian (~406 Ma), of major extant lineages to the Mississippian (~345 Ma), and the major diversification of holometabolous insects to the Early Cretaceous. Our phylogenomic study provides a comprehensive reliable scaffold for future comparative analyses of evolutionary innovations among insects.


Subject(s)
Insect Proteins/classification , Insects/classification , Phylogeny , Animals , Genetic Code , Genome, Insect , Genomics , Insect Proteins/genetics , Insects/genetics , Time Factors
6.
Mol Biol Evol ; 31(10): 2553-6, 2014 Oct.
Article in English | MEDLINE | ID: mdl-25135941

ABSTRACT

Modern sequencing technology now allows biologists to collect the entirety of molecular evidence for reconstructing evolutionary trees. We introduce a novel, user-friendly software package engineered for conducting state-of-the-art Bayesian tree inferences on data sets of arbitrary size. Our software introduces a nonblocking parallelization of Metropolis-coupled chains, modifications for efficient analyses of data sets comprising thousands of partitions and memory saving techniques. We report on first experiences with Bayesian inferences at the whole-genome level using the SuperMUC supercomputer and simulated data.


Subject(s)
Computational Biology/methods , Genome , Software , Bayes Theorem , Models, Genetic , Phylogeny , Sequence Alignment
7.
Mol Biol Evol ; 31(1): 239-49, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24140757

ABSTRACT

Phylogenetic relationships of the primarily wingless insects are still considered unresolved. Even the most comprehensive phylogenomic studies that addressed this question did not yield congruent results. To get a grip on these problems, we here analyzed the sources of incongruence in these phylogenomic studies by using an extended transcriptome data set. Our analyses showed that unevenly distributed missing data can be severely misleading by inflating node support despite the absence of phylogenetic signal. In consequence, only decisive data sets should be used which exclusively comprise data blocks containing all taxa whose relationships are addressed. Additionally, we used Four-cluster Likelihood Mapping (FcLM) to measure the degree of congruence among genes of a data set, as a measure of support alternative to bootstrap. FcLM showed incongruent signal among genes, which in our case is correlated neither with functional class assignment of these genes nor with model misspecification due to unpartitioned analyses. The herein analyzed data set is the currently largest data set covering primarily wingless insects, but failed to elucidate their interordinal phylogenetic relationships. Although this is unsatisfying from a phylogenetic perspective, we try to show that the analyses of structure and signal within phylogenomic data can protect us from biased phylogenetic inferences due to analytical artifacts.


Subject(s)
Databases, Factual , Evolution, Molecular , Insects/classification , Insects/genetics , Phylogeny , Animals , Chromosome Mapping , Genomics , Genotyping Techniques/methods , Models, Genetic , Sequence Alignment , Transcriptome
8.
BMC Bioinformatics ; 14: 216, 2013 Jul 09.
Article in English | MEDLINE | ID: mdl-23834340

ABSTRACT

BACKGROUND: In population genetics, simulation is a fundamental tool for analyzing how basic evolutionary forces such as natural selection, recombination, and mutation shape the genetic landscape of a population. Forward simulation represents the most powerful, but, at the same time, most compute-intensive approach for simulating the genetic material of a population. RESULTS: We introduce AnA-FiTS, a highly optimized forward simulation software, that is up to two orders of magnitude faster than current state-of-the-art software. In addition, we present a novel algorithm that further improves runtimes by up to an additional order of magnitude, for simulations where a fraction of the mutations is neutral (e.g., only 10% of mutations have an effect on fitness). Apart from simulated sequences, our tool also generates a graph structure that depicts the complete observable history of neutral mutations. CONCLUSIONS: The substantial performance improvements allow for conducting forward simulations at the chromosome and genome level. The graph structure generated by our algorithm can give rise to novel approaches for visualizing and analyzing the output of forward simulations.
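To make the idea of forward simulation concrete, the following is a minimal sketch of a haploid Wright-Fisher model with selection and mutation. It is not the AnA-FiTS algorithm and omits recombination. The function name, the parameters, and the representation of individuals as sets of mutation ids are all assumptions made for illustration.

```python
import random


def forward_simulate(pop_size, generations, mutation_rate,
                     selected_fraction, effect, rng):
    """Simulate a haploid population forward in time, one generation at a time.

    Each individual is a frozenset of mutation ids. A new mutation is
    non-neutral with probability selected_fraction and then has the given
    fitness effect (which must be greater than -1); otherwise it is neutral.
    """
    population = [frozenset() for _ in range(pop_size)]
    effects = {}  # mutation id -> fitness effect (0.0 for neutral mutations)
    next_id = 0
    for _ in range(generations):
        # Multiplicative fitness over the mutations an individual carries.
        weights = []
        for individual in population:
            fitness = 1.0
            for mutation in individual:
                fitness *= 1.0 + effects[mutation]
            weights.append(fitness)
        # Parents are sampled with probability proportional to fitness.
        parents = rng.choices(population, weights=weights, k=pop_size)
        offspring = []
        for parent in parents:
            child = parent
            if rng.random() < mutation_rate:
                is_selected = rng.random() < selected_fraction
                effects[next_id] = effect if is_selected else 0.0
                child = parent | {next_id}
                next_id += 1
            offspring.append(child)
        population = offspring
    return population, effects


if __name__ == "__main__":
    # Hypothetical settings: 10% of mutations affect fitness.
    final_population, mutation_effects = forward_simulate(
        pop_size=200, generations=100, mutation_rate=0.05,
        selected_fraction=0.1, effect=0.02, rng=random.Random(42))
    print(len(final_population), "individuals,",
          len(mutation_effects), "mutations generated")
```

In this naive form every mutation, neutral or not, is carried through each generation, which illustrates why treating neutral mutations separately can save work.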


Subject(s)
Chromosome Breakpoints , Computer Simulation , DNA Breaks, Double-Stranded , Genetics, Population , Genome , Mutation , Algorithms , Base Sequence/genetics , Biological Evolution , Recombination, Genetic , Selection, Genetic , Software , Time
9.
Syst Biol ; 62(1): 162-6, 2013 Jan 01.
Article in English | MEDLINE | ID: mdl-22962004

ABSTRACT

The presence of rogue taxa (rogues) in a set of trees can frequently have a negative impact on the results of a bootstrap analysis (e.g., the overall support in consensus trees). We introduce an efficient graph-based algorithm for rogue taxon identification as well as an interactive webservice implementing this algorithm. Compared with our previous method, the new algorithm is up to 4 orders of magnitude faster, while returning qualitatively identical results. Because of this significant improvement in scalability, the new algorithm can now identify substantially more complex and compute-intensive rogue taxon constellations. On a large and diverse collection of real-world data sets, we show that our method yields better supported reduced/pruned consensus trees than any competing rogue taxon identification method. Using the parallel version of our open-source code, we successfully identified rogue taxa in a set of 100 trees with 116 334 taxa each. For simulated data sets, we show that when removing/pruning rogue taxa with our method from a tree set, we consistently obtain bootstrap consensus trees as well as maximum-likelihood trees that are topologically closer to the respective true trees.


Subject(s)
Algorithms , Classification/methods , Internet , Phylogeny , Software , Computer Simulation , Reproducibility of Results
10.
Mol Biol Evol ; 29(11): 3601-11, 2012 Nov.
Article in English | MEDLINE | ID: mdl-22723303

ABSTRACT

We used next-generation sequencing to characterize the genomes of nine species of Orobanchaceae of known phylogenetic relationships, different life forms, and including a polyploid species. The study species are the autotrophic, nonparasitic Lindenbergia philippensis, the hemiparasitic Schwalbea americana, and seven nonphotosynthetic parasitic species of Orobanche (Orobanche crenata, Orobanche cumana, Orobanche gracilis (tetraploid), and Orobanche pancicii) and Phelipanche (Phelipanche lavandulacea, Phelipanche purpurea, and Phelipanche ramosa). Ty3/Gypsy elements comprise 1.93%-28.34% of the nine genomes and Ty1/Copia elements comprise 8.09%-22.83%. When compared with L. philippensis and S. americana, the nonphotosynthetic species contain higher proportions of repetitive DNA sequences, perhaps reflecting relaxed selection on genome size in parasitic organisms. Among the parasitic species, those in the genus Orobanche have smaller genomes but higher proportions of repetitive DNA than those in Phelipanche, mostly due to a diversification of repeats and an accumulation of Ty3/Gypsy elements. Genome downsizing in the tetraploid O. gracilis probably led to sequence loss across most repeat types.


Subject(s)
DNA, Plant/genetics , Genome, Plant/genetics , Orobanchaceae/genetics , Phylogeny , Repetitive Sequences, Nucleic Acid/genetics , Sequence Analysis, DNA/methods , Cluster Analysis , Genome Size/genetics , High-Throughput Nucleotide Sequencing , Molecular Sequence Data , Species Specificity
11.
Article in English | MEDLINE | ID: mdl-21301032

ABSTRACT

Many of the steps in phylogenetic reconstruction can be confounded by "rogue" taxa: taxa that cannot be placed with assurance anywhere within the tree, indeed, whose location within the tree varies with almost any choice of algorithm or parameters. Phylogenetic consensus methods, in particular, are known to suffer from this problem. In this paper, we provide a novel framework to define and identify rogue taxa. In this framework, we formulate a bicriterion optimization problem, the relative information criterion, that models the net increase in useful information present in the consensus tree when certain taxa are removed from the input data. We also provide an effective greedy heuristic to identify a subset of rogue taxa and use this heuristic in a series of experiments, with both pathological examples from the literature and a collection of large biological data sets. As the presence of rogue taxa in a set of bootstrap replicates can lead to deceivingly poor support values, we propose a procedure to recompute support values in light of the rogue taxa identified by our algorithm; applying this procedure to our biological data sets caused a large number of edges to move from "unsupported" to "supported" status, indicating that many existing phylogenies should be recomputed and reevaluated to reduce any inaccuracies introduced by rogue taxa. We also discuss the implementation issues encountered while integrating our algorithm into RAxML v7.2.7, particularly those dealing with scaling up the analyses. This integration enables practitioners to benefit from our algorithm in the analysis of very large data sets (up to 2,500 taxa and 10,000 trees, although we present the results of even larger analyses).


Subject(s)
Algorithms , Computational Biology/methods , Models, Genetic , Phylogeny , Cluster Analysis , Consensus Sequence , Databases, Genetic