Search | BVS Colombia Search Portal

1.

OpenTree: A Python Package for Accessing and Analyzing Data from the Open Tree of Life.

McTavish, Emily Jane; Sánchez-Reyes, Luna Luisa; Holder, Mark T.

Syst Biol ; 70(6): 1295-1301, 2021 Oct 13.

Article in English | MEDLINE | ID: mdl-33970279

ABSTRACT

The Open Tree of Life project constructs a comprehensive, dynamic, and digitally available tree of life by synthesizing published phylogenetic trees along with taxonomic data. Open Tree of Life provides web-service application programming interfaces (APIs) to make the tree estimate, unified taxonomy, and input phylogenetic data available to anyone. Here, we describe the Python package opentree, which provides a user-friendly Python wrapper for these APIs and a set of scripts and tutorials for straightforward downstream data analyses. We demonstrate the utility of these tools by generating an estimate of the phylogenetic relationships of all bird families, and by capturing a phylogenetic estimate for all taxa observed at the University of California Merced Vernal Pools and Grassland Reserve. [Evolution; open science; phylogenetics; Python; taxonomy.]

Subject(s)

Data Analysis , Software , Humans , Phylogeny
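
The abstract above mentions web-service APIs that expose the synthetic tree. As a rough illustration of what a wrapper has to do underneath, the sketch below builds a raw HTTP request with the standard library only. It does not use the opentree package itself. The endpoint URL and the `ott_ids` payload field are assumptions based on the public Open Tree v3 API, and the function names are hypothetical.

```python
import json
import urllib.request

# Assumed endpoint of the Open Tree of Life v3 web service (not stated in the abstract).
API_URL = "https://api.opentreeoflife.org/v3/tree_of_life/induced_subtree"


def build_induced_subtree_request(ott_ids):
    """Build (but do not send) a POST request asking for the subtree induced by some taxa.

    ott_ids: iterable of integer Open Tree Taxonomy identifiers.
    """
    payload = json.dumps({"ott_ids": list(ott_ids)}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_induced_subtree(ott_ids, timeout=30):
    """Send the request and decode the JSON reply (needs network access)."""
    request = build_induced_subtree_request(ott_ids)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Separating request construction from the network call keeps the first function easy to check offline. The identifiers passed in would normally come from a taxonomic name-resolution step, which is not shown here.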

2.

Incorporating the speciation process into species delimitation.

Sukumaran, Jeet; Holder, Mark T; Knowles, L Lacey.

PLoS Comput Biol ; 17(5): e1008924, 2021 May.

Article in English | MEDLINE | ID: mdl-33983918

ABSTRACT

The "multispecies" coalescent (MSC) model that underlies many genomic species-delimitation approaches is problematic because it does not distinguish between genetic structure associated with species versus that of populations within species. Consequently, as both the genomic and spatial resolution of data increases, a proliferation of artifactual species results as within-species population lineages, detected due to restrictions in gene flow, are identified as distinct species. The toll of this extends beyond systematic studies, getting magnified across the many disciplines that rely upon an accurate framework of identified species. Here we present the first of a new class of approaches that addresses this issue by incorporating an extended speciation process for species delimitation. We model the formation of population lineages and their subsequent development into independent species as separate processes and provide for a way to incorporate current understanding of the species boundaries in the system through specification of species identities of a subset of population lineages. As a result, species boundaries and within-species lineages boundaries can be discriminated across the entire system, and species identities can be assigned to the remaining lineages of unknown affinities with quantified probabilities. In addition to the identification of species units in nature, the primary goal of species delimitation, the incorporation of a speciation model also allows us insights into the links between population and species-level processes. By explicitly accounting for restrictions in gene flow not only between, but also within, species, we also address the limits of genetic data for delimiting species. Specifically, while genetic data alone is not sufficient for accurate delimitation, when considered in conjunction with other information we are able to not only learn about species boundaries, but also about the tempo of the speciation process itself.

Subject(s)

Genetic Speciation , Genetic Models , Algorithms , Animals , Computational Biology , Computer Simulation , Gene Flow , Population Genetics , Statistical Models , Phylogeny , Software , Species Specificity , Time Factors

3.

Genome-wide genotyping estimates mating system parameters and paternity in the island species Tolpis succulenta.

Gibson, Matthew J S; Crawford, Daniel J; Holder, Mark T; Mort, Mark E; Kerbs, Benjamin; de Sequeira, Miguel Menezes; Kelly, John K.

Am J Bot ; 107(8): 1189-1197, 2020 Aug.

Article in English | MEDLINE | ID: mdl-32864742

ABSTRACT

PREMISE: The mating system has profound consequences, not only for ecology and evolution, but also for the conservation of threatened or endangered species. Unfortunately, small populations are difficult to study owing to limits on sample size and genetic marker diversity. Here, we estimated mating system parameters in three small populations of an island plant using genomic genotyping. Although self-incompatible (SI) species are known to often set some self-seed, little is known about how "leaky SI" affects selfing rates in nature or the role that multiple paternity plays in small populations. METHODS: We generalized the BORICE mating system program to determine the siring pattern within maternal families. We applied this algorithm to maternal families from three populations of Tolpis succulenta from Madeira Island and genotyped the progeny using RADseq. We applied BORICE to estimate each individual offspring as outcrossed or selfed, the paternity of each outcrossed offspring, and the level of inbreeding of each maternal plant. RESULTS: Despite a functional self-incompatibility system, these data establish T. succulenta as a pseudo-self-compatible (PSC) species. Two of 75 offspring were strongly indicated as products of self-fertilization. Despite selfing, all adult maternal plants were fully outbred. There was high differentiation among and low variation within populations, consistent with a history of genetic isolation of these small populations. There were generally multiple sires per maternal family. Twenty-two percent of sib contrasts (between outcrossed offspring within maternal families) shared the same sire. CONCLUSIONS: Genome-wide genotyping, combined with appropriate analytical methods, enables estimation of mating system and multiple paternity in small populations. These data address questions about the evolution of reproductive traits and the conservation of threatened populations.

Subject(s)

Paternity , Self-Fertilization , Genotype , Islands , Portugal , Reproduction

4.

Phylesystem: a git-based data store for community-curated phylogenetic estimates.

McTavish, Emily Jane; Hinchliff, Cody E; Allman, James F; Brown, Joseph W; Cranston, Karen A; Holder, Mark T; Rees, Jonathan A; Smith, Stephen A.

Bioinformatics ; 31(17): 2794-800, 2015 Sep 01.

Article in English | MEDLINE | ID: mdl-25940563

ABSTRACT

MOTIVATION: Phylogenetic estimates from published studies can be archived using general platforms like Dryad (Vision, 2010) or TreeBASE (Sanderson et al., 1994). Such services fulfill a crucial role in ensuring transparency and reproducibility in phylogenetic research. However, digital tree data files often require some editing (e.g. rerooting) to improve the accuracy and reusability of the phylogenetic statements. Furthermore, establishing the mapping between tip labels used in a tree and taxa in a single common taxonomy dramatically improves the ability of other researchers to reuse phylogenetic estimates. As the process of curating a published phylogenetic estimate is not error-free, retaining a full record of the provenance of edits to a tree is crucial for openness, allowing editors to receive credit for their work and making errors introduced during curation easier to correct. RESULTS: Here, we report the development of software infrastructure to support the open curation of phylogenetic data by the community of biologists. The backend of the system provides an interface for the standard database operations of creating, reading, updating and deleting records by making commits to a git repository. The record of the history of edits to a tree is preserved by git's version control features. Hosting this data store on GitHub (http://github.com/) provides open access to the data store using tools familiar to many developers. We have deployed a server running the 'phylesystem-api', which wraps the interactions with git and GitHub. The Open Tree of Life project has also developed and deployed a JavaScript application that uses the phylesystem-api and other web services to enable input and curation of published phylogenetic statements. AVAILABILITY AND IMPLEMENTATION: Source code for the web service layer is available at https://github.com/OpenTreeOfLife/phylesystem-api. The data store can be cloned from: https://github.com/OpenTreeOfLife/phylesystem. 
A web application that uses the phylesystem web services is deployed at http://tree.opentreeoflife.org/curator. Code for that tool is available from https://github.com/OpenTreeOfLife/opentree. CONTACT: mtholder@gmail.com.

Subject(s)

Computational Biology/methods , Factual Databases , Information Storage and Retrieval , Phylogeny , Software , Humans , Internet , Programming Languages , Reproducibility of Results , User-Computer Interface

5.

Phycas: software for Bayesian phylogenetic analysis.

Lewis, Paul O; Holder, Mark T; Swofford, David L.

Syst Biol ; 64(3): 525-31, 2015 May.

Article in English | MEDLINE | ID: mdl-25577605

ABSTRACT

Phycas is open source, freely available Bayesian phylogenetics software written primarily in C++ but with a Python interface. Phycas specializes in Bayesian model selection for nucleotide sequence data, particularly the estimation of marginal likelihoods, central to computing Bayes Factors. Marginal likelihoods can be estimated using newer methods (Thermodynamic Integration and Generalized Steppingstone) that are more accurate than the widely used Harmonic Mean estimator. In addition, Phycas supports two posterior predictive approaches to model selection: Gelfand-Ghosh and Conditional Predictive Ordinates. The General Time Reversible family of substitution models, as well as a codon model, are available, and data can be partitioned with all parameters unlinked except tree topology and edge lengths. Phycas provides for analyses in which the prior on tree topologies allows polytomous trees as well as fully resolved trees, and provides for several choices for edge length priors, including a hierarchical model as well as the recently described compound Dirichlet prior, which helps avoid overly informative induced priors on tree length.

Subject(s)

Classification/methods , Phylogeny , Software , Algorithms , Bayes Theorem , Chlorophyta/classification , Chlorophyta/genetics

6.

Twisted trees and inconsistency of tree estimation when gaps are treated as missing data - The impact of model mis-specification in distance corrections.

McTavish, Emily Jane; Steel, Mike; Holder, Mark T.

Mol Phylogenet Evol ; 93: 289-95, 2015 Dec.

Article in English | MEDLINE | ID: mdl-26256643

ABSTRACT

Statistically consistent estimation of phylogenetic trees or gene trees is possible if pairwise sequence dissimilarities can be converted to a set of distances that are proportional to the true evolutionary distances. Susko et al. (2004) reported some strikingly broad results about the forms of inconsistency in tree estimation that can arise if corrected distances are not proportional to the true distances. They showed that if the corrected distance is a concave function of the true distance, then inconsistency due to long branch attraction will occur. If these functions are convex, then two "long branch repulsion" trees will be preferred over the true tree, though these two incorrect trees are expected to be tied as the preferred tree. Here we extend their results, and demonstrate the existence of a tree shape (which we refer to as a "twisted Farris-zone" tree) for which a single incorrect tree topology will be guaranteed to be preferred if the corrected distance function is convex. We also report that the standard practice of treating gaps in sequence alignments as missing data is sufficient to produce non-linear corrected distance functions if the substitution process is not independent of the insertion/deletion process. Taken together, these results imply inconsistent tree inference under mild conditions. For example, if some positions in a sequence are constrained to be free of substitutions and insertion/deletion events while the remaining sites evolve with independent substitutions and insertion/deletion events, then the distances obtained by treating gaps as missing data can support an incorrect tree topology even given an unlimited amount of data.

Subject(s)

Genetic Models , Algorithms , Biological Evolution , INDEL Mutation , Phylogeny , Sequence Alignment
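
To make the idea of a "corrected distance" in the abstract above concrete, here is a minimal sketch that computes the observed dissimilarity between two aligned sequences, skipping gapped columns (that is, treating gaps as missing data), and then applies the standard Jukes-Cantor (JC69) correction. The choice of JC69 and the function names are assumptions made for illustration; the abstract does not say which correction the authors analyzed.

```python
import math


def p_distance(seq_a, seq_b, gap_chars="-?"):
    """Proportion of differing sites, ignoring columns where either sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in gap_chars or b in gap_chars:
            continue  # gap treated as missing data: the column is simply dropped
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def jc69_distance(p):
    """Jukes-Cantor correction: expected substitutions per site given dissimilarity p."""
    if not 0 <= p < 0.75:
        raise ValueError("JC69 correction is undefined for p >= 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)
```

The correction is only proportional to the true distance when the model behind it matches the process that generated the data. Dropping gapped columns changes which sites contribute to `p_distance`, which is the kind of mismatch the abstract is concerned with.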

7.

Speeding up iterative applications of the BUILD supertree algorithm.

Redelings, Benjamin D; Holder, Mark T.

PeerJ ; 12: e16624, 2024.

Article in English | MEDLINE | ID: mdl-38188165

ABSTRACT

The Open Tree of Life (OToL) project produces a supertree that summarizes phylogenetic knowledge from tree estimates published in the primary literature. The supertree construction algorithm iteratively calls Aho's Build algorithm thousands of times in order to assess the compatibility of different phylogenetic groupings. We describe an incrementalized version of the Build algorithm that is able to share work between successive calls to Build. We provide details that allow a programmer to implement the incremental algorithm BuildInc, including pseudo-code and a description of data structures. We assess the effect of BuildInc on our supertree algorithm by analyzing simulated data and by analyzing a supertree problem taken from the OpenTree 13.4 synthesis tree. We find that BuildInc provides up to 550-fold speedup for our supertree algorithm.

Subject(s)

Algorithms , Knowledge , Phylogeny
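
For readers unfamiliar with the Build algorithm named in the abstract above, the following is a minimal sketch of the classic, non-incremental version working on rooted triples. It is not the BuildInc algorithm from the paper, and the tuple-based tree representation is an assumption chosen for brevity.

```python
def build(leaves, triples):
    """Return a nested-tuple tree displaying every rooted triple, or None if none exists.

    Each triple (a, b, c) states that a and b are more closely related to each
    other than either is to c. Leaves are expected to be strings.
    """
    leaves = set(leaves)
    if len(leaves) == 1:
        return next(iter(leaves))

    # Only triples whose three leaves are all present constrain this subproblem.
    relevant = [t for t in triples if leaves.issuperset(t)]

    # Union-find over the leaves: join a and b for every triple ab|c.
    parent = {leaf: leaf for leaf in leaves}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, _ in relevant:
        parent[find(a)] = find(b)

    components = {}
    for leaf in leaves:
        components.setdefault(find(leaf), set()).add(leaf)

    # A single component means the leaves cannot be split at the root.
    if len(components) == 1:
        return None

    subtrees = []
    for component in sorted(components.values(), key=sorted):
        subtree = build(component, relevant)
        if subtree is None:
            return None
        subtrees.append(subtree)
    return tuple(subtrees)


compatible = build(["a", "b", "c", "d"], [("a", "b", "c"), ("c", "d", "a")])
conflicting = build(["a", "b", "c"], [("a", "b", "c"), ("a", "c", "b")])
```

Here `compatible` is `(("a", "b"), ("c", "d"))` and `conflicting` is `None`. Each call recomputes the connected components from scratch, which is the repeated work an incremental variant would aim to avoid.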

8.

A Dirichlet process prior for estimating lineage-specific substitution rates.

Heath, Tracy A; Holder, Mark T; Huelsenbeck, John P.

Mol Biol Evol ; 29(3): 939-55, 2012 Mar.

Article in English | MEDLINE | ID: mdl-22049064

ABSTRACT

We introduce a new model for relaxing the assumption of a strict molecular clock for use as a prior in Bayesian methods for divergence time estimation. Lineage-specific rates of substitution are modeled using a Dirichlet process prior (DPP), a type of stochastic process that assumes lineages of a phylogenetic tree are distributed into distinct rate classes. Under the Dirichlet process, the number of rate classes, assignment of branches to rate classes, and the rate value associated with each class are treated as random variables. The performance of this model was evaluated by conducting analyses on data sets simulated under a range of different models. We compared the Dirichlet process model with two alternative models for rate variation: the strict molecular clock and the independent rates model. Our results show that divergence time estimation under the DPP provides robust estimates of node ages and branch rates without significantly reducing power. Further analyses were conducted on a biological data set, and we provide examples of ways to summarize Markov chain Monte Carlo samples under this model.

Subject(s)

Molecular Evolution , Genetic Models , Mutation Rate , Phylogeny , Bayes Theorem , Computer Simulation , Markov Chains , Monte Carlo Method , Stochastic Processes
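
The abstract above describes branches being distributed into rate classes whose number, membership, and rate values are all random. A common way to draw such a partition is the Chinese restaurant process, sketched below. The gamma base distribution, its parameters, and the function names are assumptions for illustration and are not taken from the paper.

```python
import random


def sample_rate_classes(n_branches, alpha, rng):
    """Assign each branch to a rate class using the Chinese restaurant process.

    Branch i joins existing class k with probability size_k / (i + alpha)
    and opens a new class with probability alpha / (i + alpha).
    """
    assignments = []
    class_sizes = []
    for _ in range(n_branches):
        weights = class_sizes + [alpha]  # last slot stands for "new class"
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(class_sizes):
            class_sizes.append(1)
        else:
            class_sizes[k] += 1
        assignments.append(k)
    return assignments


def sample_branch_rates(n_branches, alpha, rng, shape=2.0, scale=0.5):
    """Draw a class assignment, then one rate per class from a gamma base distribution."""
    assignments = sample_rate_classes(n_branches, alpha, rng)
    class_rates = [rng.gammavariate(shape, scale) for _ in range(max(assignments) + 1)]
    return assignments, [class_rates[k] for k in assignments]


assignments, rates = sample_branch_rates(10, alpha=1.0, rng=random.Random(1))
```

Small values of `alpha` favor few classes (approaching a strict clock), while large values favor many classes (approaching independent rates per branch).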

9.

SATe-II: very fast and accurate simultaneous estimation of multiple sequence alignments and phylogenetic trees.

Liu, Kevin; Warnow, Tandy J; Holder, Mark T; Nelesen, Serita M; Yu, Jiaye; Stamatakis, Alexandros P; Linder, C Randal.

Syst Biol ; 61(1): 90-106, 2012 Jan.

Article in English | MEDLINE | ID: mdl-22139466

ABSTRACT

Highly accurate estimation of phylogenetic trees for large data sets is difficult, in part because multiple sequence alignments must be accurate for phylogeny estimation methods to be accurate. Coestimation of alignments and trees has been attempted but currently only SATé estimates reasonably accurate trees and alignments for large data sets in practical time frames (Liu K., Raghavan S., Nelesen S., Linder C.R., Warnow T. 2009b. Rapid and accurate large-scale coestimation of sequence alignments and phylogenetic trees. Science. 324:1561-1564). Here, we present a modification to the original SATé algorithm that improves upon SATé (which we now call SATé-I) in terms of speed and of phylogenetic and alignment accuracy. SATé-II uses a different divide-and-conquer strategy than SATé-I and so produces smaller more closely related subsets than SATé-I; as a result, SATé-II produces more accurate alignments and trees, can analyze larger data sets, and runs more efficiently than SATé-I. Generally, SATé is a metamethod that takes an existing multiple sequence alignment method as an input parameter and boosts the quality of that alignment method. SATé-II-boosted alignment methods are significantly more accurate than their unboosted versions, and trees based upon these improved alignments are more accurate than trees based upon the original alignments. Because SATé-I used maximum likelihood (ML) methods that treat gaps as missing data to estimate trees and because we found a correlation between the quality of tree/alignment pairs and ML scores, we explored the degree to which SATé's performance depends on using ML with gaps treated as missing data to determine the best tree/alignment pair. We present two lines of evidence that using ML with gaps treated as missing data to optimize the alignment and tree produces very poor results. 
First, we show that the optimization problem where a set of unaligned DNA sequences is given and the output is the tree and alignment of those sequences that maximize likelihood under the Jukes-Cantor model is uninformative in the worst possible sense. For all inputs, all trees optimize the likelihood score. Second, we show that a greedy heuristic that uses GTR+Gamma ML to optimize the alignment and the tree can produce very poor alignments and trees. Therefore, the excellent performance of SATé-II and SATé-I is not because ML is used as an optimization criterion for choosing the best tree/alignment pair but rather due to the particular divide-and-conquer realignment techniques employed.

Subject(s)

Phylogeny , Sequence Alignment/methods , Software , Algorithms , Automation , Computer Simulation , DNA , Molecular Evolution , Likelihood Functions

10.

BEAGLE: an application programming interface and high-performance computing library for statistical phylogenetics.

Ayres, Daniel L; Darling, Aaron; Zwickl, Derrick J; Beerli, Peter; Holder, Mark T; Lewis, Paul O; Huelsenbeck, John P; Ronquist, Fredrik; Swofford, David L; Cummings, Michael P; Rambaut, Andrew; Suchard, Marc A.

Syst Biol ; 61(1): 170-3, 2012 Jan.

Article in English | MEDLINE | ID: mdl-21963610

ABSTRACT

Phylogenetic inference is fundamental to our understanding of most aspects of the origin and evolution of life, and in recent years, there has been a concentration of interest in statistical approaches such as Bayesian inference and maximum likelihood estimation. Yet, for large data sets and realistic or interesting models of evolution, these approaches remain computationally demanding. High-throughput sequencing can yield data for thousands of taxa, but scaling to such problems using serial computing often necessitates the use of nonstatistical or approximate approaches. The recent emergence of graphics processing units (GPUs) provides an opportunity to leverage their excellent floating-point computational performance to accelerate statistical phylogenetic inference. A specialized library for phylogenetic calculation would allow existing software packages to make more effective use of available computer hardware, including GPUs. Adoption of a common library would also make it easier for other emerging computing architectures, such as field programmable gate arrays, to be used in the future. We present BEAGLE, an application programming interface (API) and library for high-performance statistical phylogenetic inference. The API provides a uniform interface for performing phylogenetic likelihood calculations on a variety of compute hardware platforms. The library includes a set of efficient implementations and can currently exploit hardware including GPUs using NVIDIA CUDA, central processing units (CPUs) with Streaming SIMD Extensions and related processor supplementary instruction sets, and multicore CPUs via OpenMP. To demonstrate the advantages of a common API, we have incorporated the library into several popular phylogenetic software packages. The BEAGLE library is free open source software licensed under the Lesser GPL and available from http://beagle-lib.googlecode.com. An example client program is available as public domain software.

Subject(s)

Computational Biology/methods , Phylogeny , Software , Algorithms , Computing Methodologies , Molecular Evolution , Genome

11.

NeXML: rich, extensible, and verifiable representation of comparative data and metadata.

Vos, Rutger A; Balhoff, James P; Caravas, Jason A; Holder, Mark T; Lapp, Hilmar; Maddison, Wayne P; Midford, Peter E; Priyam, Anurag; Sukumaran, Jeet; Xia, Xuhua; Stoltzfus, Arlin.

Syst Biol ; 61(4): 675-89, 2012 Jul.

Article in English | MEDLINE | ID: mdl-22357728

ABSTRACT

In scientific research, integration and synthesis require a common understanding of where data come from, how much they can be trusted, and what they may be used for. To make such an understanding computer-accessible requires standards for exchanging richly annotated data. The challenges of conveying reusable data are particularly acute in regard to evolutionary comparative analysis, which comprises an ever-expanding list of data types, methods, research aims, and subdisciplines. To facilitate interoperability in evolutionary comparative analysis, we present NeXML, an XML standard (inspired by the current standard, NEXUS) that supports exchange of richly annotated comparative data. NeXML defines syntax for operational taxonomic units, character-state matrices, and phylogenetic trees and networks. Documents can be validated unambiguously. Importantly, any data element can be annotated, to an arbitrary degree of richness, using a system that is both flexible and rigorous. We describe how the use of NeXML by the TreeBASE and Phenoscape projects satisfies user needs that cannot be satisfied with other available file formats. By relying on XML Schema Definition, the design of NeXML facilitates the development and deployment of software for processing, transforming, and querying documents. The adoption of NeXML for practical use is facilitated by the availability of (1) an online manual with code samples and a reference to all defined elements and attributes, (2) programming toolkits in most of the languages used commonly in evolutionary informatics, and (3) input-output support in several widely used software applications. An active, open, community-based development process enables future revision and expansion of NeXML.

Subject(s)

Biological Evolution , Computational Biology/standards , Programming Languages , Biodiversity , Classification , Informatics , Biological Models , Phylogeny , Software

12.

What's in a likelihood? Simple models of protein evolution and the contribution of structurally viable reconstructions to the likelihood.

Lakner, Clemens; Holder, Mark T; Goldman, Nick; Naylor, Gavin J P.

Syst Biol ; 60(2): 161-74, 2011 Mar.

Article in English | MEDLINE | ID: mdl-21233085

ABSTRACT

Most phylogenetic models of protein evolution assume that sites are independent and identically distributed. Interactions between sites are ignored, and the likelihood can be conveniently calculated as the product of the individual site likelihoods. The calculation considers all possible transition paths (also called substitution histories or mappings) that are consistent with the observed states at the terminals, and the probability density of any particular reconstruction depends on the substitution model. The likelihood is the integral of the probability density of each substitution history taken over all possible histories that are consistent with the observed data. We investigated the extent to which transition paths that are incompatible with a protein's three-dimensional structure contribute to the likelihood. Several empirical amino acid models were tested for sequence pairs of different degrees of divergence. When simulating substitutional histories starting from a real sequence, the structural integrity of the simulated sequences quickly disintegrated. This result indicates that simple models are clearly unable to capture the constraints on sequence evolution. However, when we sampled transition paths between real sequences from the posterior probability distribution according to these same models, we found that the sampled histories were largely consistent with the tertiary structure. This suggests that simple empirical substitution models may be adequate for interpolating changes between observed sequences during phylogenetic inference despite the fact that the models cannot predict the effects of structural constraints from first principles. This study is significant because it provides a quantitative assessment of the biological realism of substitution models from the perspective of protein structure, and it provides insight on the prospects for improving models of protein sequence evolution.

Subject(s)

Molecular Evolution , Proteins/chemistry , Proteins/genetics , Animals , Humans , Likelihood Functions , Phylogeny , Probability

13.

The phylogenetic position of Myxozoa: exploring conflicting signals in phylogenomic and ribosomal data sets.

Evans, Nathaniel M; Holder, Mark T; Barbeitos, Marcos S; Okamura, Beth; Cartwright, Paulyn.

Mol Biol Evol ; 27(12): 2733-46, 2010 Dec.

Article in English | MEDLINE | ID: mdl-20576761

ABSTRACT

Myxozoans are a diverse group of microscopic endoparasites that have been the focus of much controversy regarding their phylogenetic position. Two dramatically different hypotheses have been put forward regarding the placement of Myxozoa within Metazoa. One hypothesis, supported by ribosomal DNA (rDNA) data, places Myxozoa as a sister taxon to Bilateria. The alternative hypothesis, supported by phylogenomic data and morphology, places Myxozoa within Cnidaria. Here, we investigate these conflicting hypotheses and explore the effects of missing data, model choice, and inference methods, all of which can have an effect in placing highly divergent taxa. In addition, we identify subsets of the data that most influence the placement of Myxozoa and explore their effects by removing them from the data sets. Assembling the largest taxonomic sampling of myxozoans and cnidarians to date, with a comprehensive sampling of other metazoans for 18S and 28S nuclear rDNA sequences, we recover a well-supported placement of Myxozoa as an early diverging clade of Bilateria. By conducting parametric bootstrapping, we find that the bilaterian placement of Buddenbrockia could not alone be explained by long-branch attraction. After trimming a published phylogenomic data set, to circumvent problems of missing data, we recover the myxozoan Buddenbrockia plumatellae as a medusozoan cnidarian. In further explorations of these data sets, we find that removal of just a few identified sites under a maximum likelihood criterion employing the Whelan and Goldman amino acid substitution model changes the placement of Buddenbrockia from within Cnidaria to the alternative hypothesis at the base of Bilateria. Under a Bayesian criterion employing the CAT model, the cnidarian placement is more resilient to data removal, but under one test, a well-supported early diverging bilaterian position for Buddenbrockia is recovered. Our results confirm the existence of two relatively stable placements for myxozoans and demonstrate that conflicting signal exists not only between the two types of data but also within the phylogenomic data set. These analyses underscore the importance of careful model selection, taxon and data sampling, and in-depth data exploration when investigating the phylogenetic placement of highly divergent taxa.

Subject(s)

Genetic Databases , Myxozoa/classification , Phylogeny , 18S Ribosomal RNA/genetics , 28S Ribosomal RNA/genetics , Animals , Base Sequence , Cnidaria/classification , Cnidaria/genetics , Ribosomal DNA/genetics , Myxozoa/genetics , Ribosomes/genetics

14.

DendroPy: a Python library for phylogenetic computing.

Sukumaran, Jeet; Holder, Mark T.

Bioinformatics ; 26(12): 1569-71, 2010 Jun 15.

Article in English | MEDLINE | ID: mdl-20421198

ABSTRACT

UNLABELLED: DendroPy is a cross-platform library for the Python programming language that provides for object-oriented reading, writing, simulation and manipulation of phylogenetic data, with an emphasis on phylogenetic tree operations. DendroPy uses a splits-hash mapping to perform rapid calculations of tree distances, similarities and shape under various metrics. It contains rich simulation routines to generate trees under a number of different phylogenetic and coalescent models. DendroPy's data simulation and manipulation facilities, in conjunction with its support of a broad range of phylogenetic data formats (NEXUS, Newick, PHYLIP, FASTA, NeXML, etc.), allow it to serve a useful role in various phyloinformatics and phylogeographic pipelines. AVAILABILITY: The stable release of the library is available for download and automated installation through the Python Package Index site (http://pypi.python.org/pypi/DendroPy), while the active development source code repository is available to the public from GitHub (http://github.com/jeetsukumaran/DendroPy).

Subject(s)

Computational Biology/methods , Phylogeny , Software , Algorithms , Factual Databases , Programming Languages
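
The abstract above credits a splits-hash mapping for fast tree comparisons. The sketch below illustrates the general idea in plain Python: each bipartition is encoded as an integer bitmask over the leaf set, so comparing two trees reduces to set operations on integers. This is not DendroPy's implementation or API; the nested-tuple tree format and function names are assumptions.

```python
def splits(tree, index):
    """Return the non-trivial bipartitions of a nested-tuple tree as integer bitmasks.

    index maps each leaf name to a bit position. A split and its complement
    describe the same bipartition, so masks are normalized to leave bit 0 unset.
    """
    full = (1 << len(index)) - 1
    found = set()

    def visit(node):
        if isinstance(node, str):
            return 1 << index[node]
        mask = 0
        for child in node:
            mask |= visit(child)
        normalized = full ^ mask if mask & 1 else mask
        # Keep only splits with at least two leaves on each side.
        if bin(normalized).count("1") >= 2 and bin(full ^ normalized).count("1") >= 2:
            found.add(normalized)
        return mask

    visit(tree)
    return found


def rf_distance(tree_a, tree_b, leaves):
    """Unweighted Robinson-Foulds distance: number of splits found in only one tree."""
    index = {name: i for i, name in enumerate(sorted(leaves))}
    return len(splits(tree_a, index) ^ splits(tree_b, index))


t1 = ((("a", "b"), "c"), ("d", "e"))
t2 = ((("a", "c"), "b"), ("d", "e"))
distance = rf_distance(t1, t2, "abcde")
```

For these two trees `distance` is 2: they share the split separating d and e from the rest, and each has one split the other lacks.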

15.

Estimating phylogenetic trees from pairwise likelihoods and posterior probabilities of substitution counts.

Holder, Mark T; Steel, Mike.

J Theor Biol ; 280(1): 159-66, 2011 Jul 07.

Article in English | MEDLINE | ID: mdl-21540039

ABSTRACT

The field of phylogenetic tree estimation has been dominated by three broad classes of methods: distance-based approaches, parsimony and likelihood-based methods (including maximum likelihood (ML) and Bayesian approaches). Here we introduce two new approaches to tree inference: pairwise likelihood estimation and a distance-based method that estimates the number of substitutions along the paths through the tree. Our results include the derivation of the formulae for the probability that two leaves will be identical at a site given a number of substitutions along the path connecting them. We also derive the posterior probability of the number of substitutions along a path between two sequences. The calculations for the posterior probabilities are exact for group-based, symmetric models of character evolution, but are only approximate for more general models.

Subject(s)

Molecular Evolution , Genetic Models , Phylogeny
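
The abstract above refers to the probability that two leaves are identical at a site given a number of substitutions on the path between them. As an illustration only, the sketch below works this out for the simplest symmetric case: k states, with every substitution moving to one of the other k - 1 states with equal probability. That model choice is an assumption, and the paper's own formulae may be more general. Under it, the probability p_n satisfies p_0 = 1 and p_(n+1) = (1 - p_n) / (k - 1), which has the closed form 1/k + (1 - 1/k) * (-1/(k - 1))^n.

```python
def prob_identical(n_subs, n_states=4):
    """Closed form for P(same state at both ends | n_subs substitutions on the path)."""
    k = n_states
    return 1 / k + (1 - 1 / k) * (-1 / (k - 1)) ** n_subs


def prob_identical_recursive(n_subs, n_states=4):
    """Same quantity computed step by step from the recurrence, as a cross-check."""
    p = 1.0
    for _ in range(n_subs):
        # To be identical after one more change, the states must currently differ
        # and the change must hit the one matching state out of k - 1 choices.
        p = (1 - p) / (n_states - 1)
    return p
```

With four states, one substitution makes identity impossible, two give probability 1/3, and the value oscillates toward 1/4 as the count grows.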

16.

Estimating trees from filtered data: identifiability of models for morphological phylogenetics.

Allman, Elizabeth S; Holder, Mark T; Rhodes, John A.

J Theor Biol ; 263(1): 108-19, 2010 Mar 07.

Article in English | MEDLINE | ID: mdl-20004210

ABSTRACT

As an alternative to parsimony analyses, stochastic models have been proposed (Lewis, 2001; Nylander et al., 2004) for morphological characters, so that maximum likelihood or Bayesian analyses may be used for phylogenetic inference. A key feature of these models is that they account for ascertainment bias, in that only varying, or parsimony-informative characters are observed. However, statistical consistency of such model-based inference requires that the model parameters be identifiable from the joint distribution they entail, and this issue has not been addressed. Here we prove that parameters for several such models, with finite state spaces of arbitrary size, are identifiable, provided the tree has at least eight leaves. If the tree topology is already known, then seven leaves suffice for identifiability of the numerical parameters. The method of proof involves first inferring a full distribution of both parsimony-informative and non-informative pattern joint probabilities from the parsimony-informative ones, using phylogenetic invariants. The failure of identifiability of the tree parameter for four-taxon trees is also investigated.

Subject(s)

Computational Biology/methods, Phylogeny, Algorithms, Bayes Theorem, Classification, Likelihood Functions, Markov Chains, Genetic Models, Statistical Models, Theoretical Models, Probability, Terminology as Topic

17.

A supertree pipeline for summarizing phylogenetic and taxonomic information for millions of species.

Redelings, Benjamin D; Holder, Mark T.

PeerJ ; 5: e3058, 2017.

Article in English | MEDLINE | ID: mdl-28265520

ABSTRACT

We present a new supertree method that enables rapid estimation of a summary tree on the scale of millions of leaves. This supertree method summarizes a collection of input phylogenies and an input taxonomy. We introduce formal goals and criteria for such a supertree to satisfy in order to transparently and justifiably represent the input trees. In addition to producing a supertree, our method computes annotations that describe which groupings in the input trees support and conflict with each group in the supertree. We compare our supertree construction method to a previously published supertree construction method by assessing their performance on input trees used to construct the Open Tree of Life version 4, and find that our method increases the number of displayed input splits from 35,518 to 39,639 and decreases the number of conflicting input splits from 2,760 to 1,357. The new supertree method also improves on the previous supertree construction method in that it produces no unsupported branches and avoids unnecessary polytomies. This pipeline is currently used by the Open Tree of Life project to produce all versions of the project's "synthetic tree" starting at version 5. This software pipeline is called "propinquity". It relies heavily on "otcetera", a set of C++ tools that perform most of the steps of the pipeline. All of the components are free software and are available on GitHub.

18.

The Akaike information criterion will not choose the no common mechanism model.

Holder, Mark T; Lewis, Paul O; Swofford, David L.

Syst Biol ; 59(4): 477-85, 2010 Jul.

Article in English | MEDLINE | ID: mdl-20547783

Subject(s)

Biological Evolution, Genetic Models, Statistical Models

19.

A justification for reporting the majority-rule consensus tree in Bayesian phylogenetics.

Holder, Mark T; Sukumaran, Jeet; Lewis, Paul O.

Syst Biol ; 57(5): 814-21, 2008 Oct.

Article in English | MEDLINE | ID: mdl-18853367

Subject(s)

Genetic Variation, Phylogeny, Bayes Theorem, Decision Theory, Molecular Evolution, Genetic Models

20.

Evidence for climate-driven diversification? A caution for interpreting ABC inferences of simultaneous historical events.

Oaks, Jamie R; Sukumaran, Jeet; Esselstyn, Jacob A; Linkem, Charles W; Siler, Cameron D; Holder, Mark T; Brown, Rafe M.

Evolution ; 67(4): 991-1010, 2013 Apr.

Article in English | MEDLINE | ID: mdl-23550751

ABSTRACT

Approximate Bayesian computation (ABC) is rapidly gaining popularity in population genetics. One example, msBayes, infers the distribution of divergence times among pairs of taxa, allowing phylogeographers to test hypotheses about historical causes of diversification in co-distributed groups of organisms. Using msBayes, we infer the distribution of divergence times among 22 pairs of populations of vertebrates distributed across the Philippine Archipelago. Our objective was to test whether sea-level oscillations during the Pleistocene caused diversification across the islands. To guide interpretation of our results, we perform a suite of simulation-based power analyses. Our empirical results strongly support a recent simultaneous divergence event for all 22 taxon pairs, consistent with the prediction of the Pleistocene-driven diversification hypothesis. However, our empirical estimates are sensitive to changes in prior distributions, and our simulations reveal low power of the method to detect random variation in divergence times and bias toward supporting clustered divergences. Our results demonstrate that analyses exploring power and prior sensitivity should accompany ABC model selection inferences. The problems we identify are potentially mitigable with uniform priors over divergence models (rather than classes of models) and more flexible prior distributions on demographic and divergence-time parameters.

Subject(s)

Biological Evolution, Climate, Biological Models, Animals, Genetic Speciation, Geological Phenomena, Islands, Phylogeny
