1.
Syst Biol ; 69(3): 579-592, 2020 05 01.
Article in English | MEDLINE | ID: mdl-31747023

ABSTRACT

Studies have demonstrated that pervasive gene tree conflict underlies several important phylogenetic relationships where different species tree methods produce conflicting results. Here, we present a means of dissecting the phylogenetic signal for alternative resolutions within a data set in order to resolve recalcitrant relationships and, importantly, identify what the data set is unable to resolve. These procedures extend upon methods for isolating conflict and concordance involving specific candidate relationships and can be used to identify systematic error and disambiguate sources of conflict among species tree inference methods. We demonstrate these on a large phylogenomic plant data set. Our results support the placement of Amborella as sister to the remaining extant angiosperms, Gnetales as sister to pines, and the monophyly of extant gymnosperms. Several other contentious relationships, including the resolution of relationships within the bryophytes and the eudicots, remain uncertain given the low number of supporting gene trees. To address whether concatenation of filtered genes amplified phylogenetic signal for relationships, we implemented a combinatorial heuristic to test combinability of genes. We found that nested conflicts limited the ability of data filtering methods to fully ameliorate conflicting signal amongst gene trees. These analyses confirmed that the underlying conflicting signal does not support broad concatenation of genes. Our approach provides a means of dissecting a specific data set to address deep phylogenetic relationships while also identifying the inferential boundaries of the data set. [Angiosperms; coalescent; gene-tree conflict; genomics; phylogenetics; phylogenomics.].


Subjects
Classification/methods, Phylogeny, Plants/classification, Plant Genes/genetics, Plants/genetics
2.
Mol Biol Evol ; 36(1): 112-126, 2019 01 01.
Article in English | MEDLINE | ID: mdl-30371871

ABSTRACT

Several plant lineages have evolved adaptations that allow survival in extreme and harsh environments including many families within the plant clade Portulacineae (Caryophyllales) such as the Cactaceae, Didiereaceae, and Montiaceae. Here, using newly generated transcriptomic data, we reconstructed the phylogeny of Portulacineae and examined potential correlates between molecular evolution and adaptation to harsh environments. Our phylogenetic results were largely congruent with previous analyses, but we identified several early diverging nodes characterized by extensive gene tree conflict. For particularly contentious nodes, we present detailed information about the phylogenetic signal for alternative relationships. We also analyzed the frequency of gene duplications, confirmed previously identified whole genome duplications (WGD), and proposed a previously unidentified WGD event within the Didiereaceae. We found that the WGD events were typically associated with shifts in climatic niche but did not find a direct association with WGDs and diversification rate shifts. Diversification shifts occurred within the Portulacaceae, Cactaceae, and Anacampserotaceae, and whereas these did not experience WGDs, the Cactaceae experienced extensive gene duplications. We examined gene family expansion and molecular evolutionary patterns with a focus on genes associated with environmental stress responses and found evidence for significant gene family expansion in genes with stress adaptation and clades found in extreme environments. These results provide important directions for further and deeper examination of the potential links between molecular evolutionary patterns and adaptation to harsh environments.


Subjects
Biological Adaptation, Biological Evolution, Caryophyllales/genetics, Cold Temperature, Droughts, Multigene Family, Polyploidy
3.
PLoS Comput Biol ; 15(2): e1006493, 2019 02.
Article in English | MEDLINE | ID: mdl-30768597

ABSTRACT

Phylogenomic research is accelerating the publication of landmark studies that aim to resolve deep divergences of major organismal groups. Meanwhile, systems for identifying and integrating the products of phylogenomic inference, such as newly supported clade concepts, have not kept pace. However, the ability to verbalize node concept congruence and conflict across multiple, in effect simultaneously endorsed phylogenomic hypotheses, is a prerequisite for building synthetic data environments for biological systematics and other domains impacted by these conflicting inferences. Here we develop a novel solution to the conflict verbalization challenge, based on a logic representation and reasoning approach that utilizes the language of Region Connection Calculus (RCC-5) to produce consistent alignments of node concepts endorsed by incongruent phylogenomic studies. The approach employs clade concept labels to individuate concepts used by each source, even if these carry identical names. Indirect RCC-5 modeling of intensional (property-based) node concept definitions, facilitated by the local relaxation of coverage constraints, allows parent concepts to attain congruence in spite of their differentially sampled children. To demonstrate the feasibility of this approach, we align two recent phylogenomic reconstructions of higher-level avian groups that entail strong conflict in the "neoavian explosion" region. According to our representations, this conflict is constituted by 26 instances of input "whole concept" overlap. These instances are further resolvable in the output labeling schemes and visualizations as "split concepts", which provide the labels and relations needed to build truly synthetic phylogenomic data environments. Because the RCC-5 alignments fundamentally reflect the trained, logic-enabled judgments of systematic experts, future designs for such environments need to promote a culture where experts routinely assess the intensionalities of node concepts published by our peers, even and especially when we are not in agreement with each other.


Subjects
Computational Biology/methods, Genomics/methods, Phylogeny, Animals, Birds/genetics, Computer Simulation, Humans, Language
4.
Syst Biol ; 67(2): 340-353, 2018 Mar 01.
Article in English | MEDLINE | ID: mdl-28945912

ABSTRACT

Divergence time estimation, the calibration of a phylogeny to geological time, is an integral first step in modeling the tempo of biological evolution (traits and lineages). However, despite increasingly sophisticated methods to infer divergence times from molecular genetic sequences, the estimated ages of many nodes across the tree of life contrast significantly and consistently with timeframes conveyed by the fossil record. This is perhaps best exemplified by crown angiosperms, where molecular clock (Triassic) estimates predate the oldest (Early Cretaceous) undisputed angiosperm fossils by tens of millions of years or more. While the incompleteness of the fossil record is a common concern, issues of data limitation and model inadequacy are viable (if underexplored) alternative explanations. In this vein, Beaulieu et al. (2015) convincingly demonstrated how methods of divergence time inference can be misled by both (i) extreme state-dependent molecular substitution rate heterogeneity and (ii) biased sampling of representative major lineages. These results demonstrate the impact of (potentially common) model violations. Here, we suggest another potential challenge: that the configuration of the statistical inference problem (i.e., the parameters, their relationships, and associated priors) alone may preclude the reconstruction of the paleontological timeframe for the crown age of angiosperms. We demonstrate, through sampling from the joint prior (formed by combining the tree (diversification) prior with the calibration densities specified for fossil-calibrated nodes), that with no data present at all, an Early Cretaceous age for crown angiosperms is rejected (i.e., has essentially zero probability). More worrisome, however, is that for the 24 nodes calibrated by fossils, almost all have indistinguishable marginal prior and posterior age distributions when employing routine lognormal fossil calibration priors. These results indicate that there is inadequate information in the data to over-rule the joint prior. Given that these calibrated nodes are strategically placed in disparate regions of the tree, they act to anchor the tree scaffold, and so the posterior inference for the tree as a whole is largely determined by the pseudodata present in the (often arbitrary) calibration densities. We recommend, as for any Bayesian analysis, that marginal prior and posterior distributions be carefully compared to determine whether signal is coming from the data or prior belief, especially for parameters of direct interest. This recommendation is not novel. However, given how rarely such checks are carried out in evolutionary biology, it bears repeating. Our results demonstrate the fundamental importance of prior/posterior comparisons in any Bayesian analysis, and we hope that they further encourage both researchers and journals to consistently adopt this crucial step as standard practice. Finally, we note that the results presented here do not refute the biological modeling concerns identified by Beaulieu et al. (2015). Both sets of issues remain apposite to the goals of accurate divergence time estimation, and only by considering them in tandem can we move forward more confidently.
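
The prior/posterior comparison recommended in this abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the histogram overlap statistic, the function name `overlap_coefficient`, and the simulated node ages are all assumptions made for illustration. An overlap near 1 would suggest that the posterior for a node age is essentially the joint prior.

```python
import random


def overlap_coefficient(prior, posterior, bins=20):
    """Histogram overlap of two samples: 0 means disjoint, 1 means identical."""
    lo = min(min(prior), min(posterior))
    hi = max(max(prior), max(posterior))
    width = (hi - lo) / bins or 1.0  # guard against a zero-width range

    def hist(sample):
        # Normalized bin frequencies on the shared [lo, hi] grid
        counts = [0] * bins
        for x in sample:
            idx = min(int((x - lo) / width), bins - 1)
            counts[idx] += 1
        return [c / len(sample) for c in counts]

    p, q = hist(prior), hist(posterior)
    return sum(min(a, b) for a, b in zip(p, q))


# Hypothetical node ages (Ma) sampled without data (prior) and with data (posterior)
random.seed(0)
prior_ages = [random.gauss(140.0, 10.0) for _ in range(2000)]
posterior_ages = [random.gauss(141.0, 10.0) for _ in range(2000)]
print(round(overlap_coefficient(prior_ages, posterior_ages), 2))
```

In practice the two samples would come from MCMC logs of the same analysis run with and without sequence data; the statistic here is only one of several reasonable ways to quantify how far the data moved the prior.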


Subjects
Biological Models, Phylogeny, Bayes Theorem, Biological Evolution, Fossils, Magnoliopsida/classification, Magnoliopsida/genetics, Time
5.
Syst Biol ; 67(5): 916-924, 2018 09 01.
Article in English | MEDLINE | ID: mdl-29893968

ABSTRACT

Recent studies have demonstrated that conflict is common among gene trees in phylogenomic studies, and that less than one percent of genes may ultimately drive species tree inference in supermatrix analyses. Herein, we examined two data sets where supermatrix and coalescent-based species trees conflict. We identified two highly influential "outlier" genes in each data set. When removed from each data set, the inferred supermatrix trees matched the topologies obtained from coalescent analyses. We also demonstrate that, while the outlier genes in the vertebrate data set have been shown in a previous study to be the result of errors in orthology detection, the outlier genes from a plant data set did not exhibit any obvious systematic error, and therefore, may be the result of some biological process yet to be determined. While topological comparisons among a small set of alternate topologies can be helpful in discovering outlier genes, they can be limited in several ways, such as assuming all genes share the same topology. Coalescent species tree methods relax this assumption but do not explicitly facilitate the examination of specific edges. Coalescent methods often also assume that conflict is the result of incomplete lineage sorting. Herein, we explored a framework that allows for quickly examining alternative edges and support for large phylogenomic data sets that does not assume a single topology for all genes. For both data sets, these analyses provided detailed results confirming the support for coalescent-based topologies. This framework suggests that we can improve our understanding of the underlying signal in phylogenomic data sets by asking more targeted edge-based questions.
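
As a rough illustration of how a single outlier gene can drive a supermatrix result, the sketch below sums per-gene log-likelihood differences between two alternative topologies and reports genes whose removal alone flips the favored topology. This is an assumption-laden toy, not the framework described in the abstract: the gene names, likelihood values, and function names are hypothetical.

```python
def delta_lnl(gene_lnl):
    """Per-gene log-likelihood difference: topology A minus topology B."""
    return {gene: a - b for gene, (a, b) in gene_lnl.items()}


def influential_genes(gene_lnl):
    """Genes whose removal alone flips which topology the summed lnL favors."""
    deltas = delta_lnl(gene_lnl)
    total = sum(deltas.values())
    flipped = []
    for gene, d in deltas.items():
        remaining = total - d
        if total * remaining < 0:  # sign change means the preferred topology flips
            flipped.append(gene)
    return sorted(flipped)


# Hypothetical per-gene log-likelihoods under (topology A, topology B)
gene_lnl = {
    "g1": (-1000.0, -1001.0),
    "g2": (-2000.0, -2000.5),
    "g3": (-1500.0, -1502.0),
    "g4": (-3000.0, -2960.0),  # one gene strongly favoring B
}
print(influential_genes(gene_lnl))
```

Here three genes weakly favor topology A while one strongly favors B, so the summed signal depends entirely on that one gene, which is the pattern the abstract describes for supermatrix analyses.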


Subjects
Caryophyllales/classification, Genomics, Phylogeny, Vertebrates/classification, Animals, Caryophyllales/genetics, Genetic Models, Vertebrates/genetics
6.
Bioinformatics ; 33(12): 1886-1888, 2017 Jun 15.
Article in English | MEDLINE | ID: mdl-28174903

ABSTRACT

SUMMARY: The ease with which phylogenomic data can be generated has drastically escalated the computational burden for even routine phylogenetic investigations. To address this, we present phyx: a collection of programs written in C++ to explore, manipulate, analyze and simulate phylogenetic objects (alignments, trees and MCMC logs). Modelled after Unix/GNU/Linux command line tools, individual programs perform a single task and operate on standard I/O streams that can be piped to quickly and easily form complex analytical pipelines. Because of the stream-centric paradigm, memory requirements are minimized (often only a single tree or sequence in memory at any instant), and hence phyx is capable of efficiently processing very large datasets. AVAILABILITY AND IMPLEMENTATION: phyx runs on POSIX-compliant operating systems. Source code, installation instructions, documentation and example files are freely available under the GNU General Public License at https://github.com/FePhyFoFum/phyx. CONTACT: eebsmith@umich.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
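
The stream-centric design described above can be sketched in a few lines. The example below is not phyx code (phyx is written in C++) and does not reproduce any phyx program or option; `read_fasta` and `upper_filter` are hypothetical names. It only shows the general idea of a filter that holds one sequence in memory at a time.

```python
import sys


def read_fasta(stream):
    """Yield (name, sequence) pairs one at a time from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in stream:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)  # emit the finished record
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def upper_filter(instream=sys.stdin, outstream=sys.stdout):
    """One pipeline stage: read FASTA records, write them back upper-cased."""
    for name, seq in read_fasta(instream):
        outstream.write(f">{name}\n{seq.upper()}\n")
```

Because `read_fasta` is a generator, a stage such as `upper_filter` can be chained with other stages through standard input and output without loading the whole alignment, which is the property the abstract attributes to the stream-centric paradigm.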


Subjects
Genomics/methods, Phylogeny, Software
7.
New Phytol ; 217(2): 836-854, 2018 01.
Article in English | MEDLINE | ID: mdl-28892163

ABSTRACT

The role played by whole genome duplication (WGD) in plant evolution is actively debated. WGDs have been associated with advantages such as superior colonization, various adaptations, and increased effective population size. However, the lack of a comprehensive mapping of WGDs within a major plant clade has led to uncertainty regarding the potential association of WGDs and higher diversification rates. Using seven chloroplast and nuclear ribosomal genes, we constructed a phylogeny of 5036 species of Caryophyllales, representing nearly half of the extant species. We phylogenetically mapped putative WGDs as identified from analyses on transcriptomic and genomic data and analyzed these in conjunction with shifts in climatic occupancy and lineage diversification rate. Thirteen putative WGDs and 27 diversification shifts could be mapped onto the phylogeny. Of these, four WGDs were concurrent with diversification shifts, with other diversification shifts occurring at more recent nodes than WGDs. Five WGDs were associated with shifts to colder climatic occupancy. While we find that many diversification shifts occur after WGDs, it is difficult to consider diversification and duplication to be tightly correlated. Our findings suggest that duplications may often occur along with shifts in either diversification rate, climatic occupancy, or rate of evolution.


Subjects
Caryophyllales/genetics, Gene Duplication, Genetic Variation, Caryophyllales/classification, Climate, Plant Genome, Phylogeny
8.
Am J Bot ; 105(3): 302-314, 2018 03.
Article in English | MEDLINE | ID: mdl-29746720

ABSTRACT

PREMISE OF THE STUDY: Large phylogenies can help shed light on macroevolutionary patterns that inform our understanding of fundamental processes that shape the tree of life. These phylogenies also serve as tools that facilitate other systematic, evolutionary, and ecological analyses. Here we combine genetic data from public repositories (GenBank) with phylogenetic data (Open Tree of Life project) to construct a dated phylogeny for seed plants. METHODS: We conducted a hierarchical clustering analysis of publicly available molecular data for major clades within the Spermatophyta. We constructed phylogenies of major clades, estimated divergence times, and incorporated data from the Open Tree of Life project, resulting in a seed plant phylogeny. We estimated diversification rates, excluding those taxa without molecular data. We also summarized topological uncertainty and data overlap for each major clade. KEY RESULTS: The trees constructed for Spermatophyta consisted of 79,881 and 353,185 terminal taxa; the latter included the Open Tree of Life taxa for which we could not include molecular data from GenBank. The diversification analyses demonstrated nested patterns of rate shifts throughout the phylogeny. Data overlap and inference uncertainty show significant variation throughout and demonstrate the continued need for data collection across seed plants. CONCLUSIONS: This study demonstrates a means for combining available resources to construct a dated phylogeny for plants. However, this approach is an early step and more developments are needed to add data, better incorporate underlying uncertainty, and improve resolution. The methods discussed here can also be applied to other major clades in the tree of life.


Subjects
Biological Evolution, Phylogeny, Plants/genetics, Seeds, Classification, Cluster Analysis, Ecology
9.
Am J Bot ; 105(3): 385-403, 2018 03.
Article in English | MEDLINE | ID: mdl-29746719

ABSTRACT

PREMISE OF THE STUDY: Phylogenetic support has been difficult to evaluate within the green plant tree of life partly due to a lack of specificity between conflicted versus poorly informed branches. As data sets continue to expand in both breadth and depth, new support measures are needed that are more efficient and informative. METHODS: We describe the Quartet Sampling (QS) method, a quartet-based evaluation system that synthesizes several phylogenetic and genomic analytical approaches. QS characterizes discordance in large-sparse and genome-wide data sets, overcoming issues of alignment sparsity and distinguishing strong conflict from weak support. We tested QS with simulations and recent plant phylogenies inferred from variously sized data sets. KEY RESULTS: QS scores demonstrated convergence with increasing replicates and were not strongly affected by branch depth. Patterns of QS support from different phylogenies led to a coherent understanding of ancestral branches defining key disagreements, including the relationships of Ginkgo to cycads, magnoliids to monocots and eudicots, and mosses to liverworts. The relationships of ANA-grade angiosperms (Amborella, Nymphaeales, Austrobaileyales), major monocot groups, bryophytes, and fern families are likely highly discordant in their evolutionary histories, rather than poorly informed. QS can also detect discordance due to introgression in phylogenomic data. CONCLUSIONS: Quartet Sampling is an efficient synthesis of phylogenetic tests that offers more comprehensive and specific information on branch support than conventional measures. The QS method corroborates growing evidence that phylogenomic investigations that incorporate discordance testing are warranted when reconstructing complex evolutionary histories, in particular those surrounding ANA-grade, monocots, and nonvascular plants.


Subjects
Biological Evolution, Plant DNA/analysis, Plant Genome, Genomics/methods, Phylogeny, Viridiplantae/genetics, Bryophyta/genetics, Computer Simulation, Cycadopsida/genetics, Ferns/genetics, Ginkgo biloba/genetics, Hepatophyta/genetics, Magnoliopsida/genetics, Reproducibility of Results
10.
Proc Biol Sci ; 284(1864), 2017 10 11.
Article in English | MEDLINE | ID: mdl-29021179

ABSTRACT

Puttick et al. (2017 Proc. R. Soc. B 284, 20162290 (doi:10.1098/rspb.2016.2290)) performed a simulation study to compare accuracy among methods of inferring phylogeny from discrete morphological characters. They report that a Bayesian implementation of the Mk model (Lewis 2001 Syst. Biol. 50, 913-925 (doi:10.1080/106351501753462876)) was most accurate (but with low resolution), while a maximum-likelihood (ML) implementation of the same model was least accurate. They conclude by strongly advocating that Bayesian implementations of the Mk model should be the default method of analysis for such data. While we appreciate the authors' attempt to investigate the accuracy of alternative methods of analysis, their conclusion is based on an inappropriate comparison of the ML point estimate, which does not consider confidence, with the Bayesian consensus, which incorporates estimation credibility into the summary tree. Using simulation, we demonstrate that ML and Bayesian estimates are concordant when confidence and credibility are comparably reflected in summary trees, a result expected from statistical theory. We therefore disagree with the conclusions of Puttick et al. and consider their prescription of any default method to be poorly founded. Instead, we recommend caution and thoughtful consideration of the model or method being applied to a morphological dataset.


Subjects
Bayes Theorem, Phylogeny, Likelihood Functions, Phenotype, Uncertainty
11.
Mol Phylogenet Evol ; 116: 69-77, 2017 11.
Article in English | MEDLINE | ID: mdl-28797692

ABSTRACT

Recent developments in phylogenetic methods and data acquisition have allowed for the construction of large and comprehensive phylogenetic relationships. Published phylogenies represent an enormous resource that not only facilitates the resolution of questions related to comparative biology, but also provides a resource on which to gauge the development of concordance across the tree of life. From the Open Tree of Life, we gathered 290 avian phylogenies representing all major groups that have been published over the last few decades and analyzed how concordance and conflict develop among these trees through time. Nine large scale phylogenetic hypotheses (including a new synthetic tree from this study) were used for comparisons. We found that conflicts were over-represented both along the backbone (higher-level neoavian relationships) and within the oscine Passeriformes. Importantly, although we have made major strides in the resolution of major clades, recent published comprehensive trees, as well as trees of individual clades, continue to contribute significantly to the resolution of relationships throughout the avian phylogeny. Our analyses highlight the need for continued research into the resolution of avian relationships.


Subjects
Birds/classification, Animals, Consensus, Biological Models, Phylogeny
12.
Bioinformatics ; 31(17): 2794-800, 2015 Sep 01.
Article in English | MEDLINE | ID: mdl-25940563

ABSTRACT

MOTIVATION: Phylogenetic estimates from published studies can be archived using general platforms like Dryad (Vision, 2010) or TreeBASE (Sanderson et al., 1994). Such services fulfill a crucial role in ensuring transparency and reproducibility in phylogenetic research. However, digital tree data files often require some editing (e.g. rerooting) to improve the accuracy and reusability of the phylogenetic statements. Furthermore, establishing the mapping between tip labels used in a tree and taxa in a single common taxonomy dramatically improves the ability of other researchers to reuse phylogenetic estimates. As the process of curating a published phylogenetic estimate is not error-free, retaining a full record of the provenance of edits to a tree is crucial for openness, allowing editors to receive credit for their work and making errors introduced during curation easier to correct. RESULTS: Here, we report the development of software infrastructure to support the open curation of phylogenetic data by the community of biologists. The backend of the system provides an interface for the standard database operations of creating, reading, updating and deleting records by making commits to a git repository. The record of the history of edits to a tree is preserved by git's version control features. Hosting this data store on GitHub (http://github.com/) provides open access to the data store using tools familiar to many developers. We have deployed a server running the 'phylesystem-api', which wraps the interactions with git and GitHub. The Open Tree of Life project has also developed and deployed a JavaScript application that uses the phylesystem-api and other web services to enable input and curation of published phylogenetic statements. AVAILABILITY AND IMPLEMENTATION: Source code for the web service layer is available at https://github.com/OpenTreeOfLife/phylesystem-api. The data store can be cloned from: https://github.com/OpenTreeOfLife/phylesystem. 
A web application that uses the phylesystem web services is deployed at http://tree.opentreeoflife.org/curator. Code for that tool is available from https://github.com/OpenTreeOfLife/opentree. CONTACT: mtholder@gmail.com.


Subjects
Computational Biology/methods, Factual Databases, Information Storage and Retrieval, Phylogeny, Software, Humans, Internet, Programming Languages, Reproducibility of Results, User-Computer Interface
13.
Mol Phylogenet Evol ; 105: 193-199, 2016 12.
Article in English | MEDLINE | ID: mdl-27601346

ABSTRACT

New World Vultures are large-bodied carrion-feeding birds in the family Cathartidae, currently consisting of seven species from five genera with geographic distributions in North and South America. No study to date has included all cathartid species in a single phylogenetic analysis. In this study, we investigated the phylogenetic relationships among all cathartid species using five nuclear (nuc; 4060 bp) and two mitochondrial (mt; 2165 bp) DNA loci with fossil-calibrated gene tree (27 outgroup taxa) and coalescent-based species tree (2 outgroup taxa) analyses. We also included an additional four nuclear loci (2578 bp) for the species tree analysis to explore changes in nodal support values. Although the stem lineage is inferred to have originated ∼69 million years ago (Ma; 74.5-64.9 credible interval), a more recent basal split within Cathartidae was recovered at ∼14 Ma (17.1-11.1 credible interval). Two primary clades were identified: (1) Black Vulture (Coragyps atratus) together with the three Cathartes species (Lesser C. burrovianus and Greater C. melambrotus Yellow-headed Vultures, and Turkey Vulture C. aura), and (2) King Vulture (Sarcoramphus papa), California (Gymnogyps californianus) and Andean (Vultur gryphus) Condors. Support for taxon relationships within the two basal clades was inconsistent between analyses, with the exception of Black Vulture as sister to a monophyletic Cathartes clade. Increased support for a yellow-headed vulture clade was recovered in the species tree analysis using the four additional nuclear loci. Overall, these results are in agreement with cathartid life history (e.g. olfaction ability and behavior) and contrasting habitat affinities among sister taxa with overlapping geographic distributions. More research is needed using additional molecular loci to further resolve the phylogenetic relationships within the two basal cathartid clades, as speciation appears to have occurred in a relatively short period of time.


Subjects
Birds/classification, Animals, Birds/genetics, California, DNA, Mitochondrial DNA/genetics, Phylogeny, DNA Sequence Analysis, South America
14.
BMC Evol Biol ; 15: 150, 2015 Aug 05.
Article in English | MEDLINE | ID: mdl-26239519

ABSTRACT

BACKGROUND: The use of transcriptomic and genomic datasets for phylogenetic reconstruction has become increasingly common as researchers attempt to resolve recalcitrant nodes with increasing amounts of data. The large size and complexity of these datasets introduce significant phylogenetic noise and conflict into subsequent analyses. The sources of conflict may include hybridization, incomplete lineage sorting, or horizontal gene transfer, and may vary across the phylogeny. For phylogenetic analysis, this noise and conflict has been accommodated in one of several ways: by binning gene regions into subsets to isolate consistent phylogenetic signal; by using gene-tree methods for reconstruction, where conflict is presumed to be explained by incomplete lineage sorting (ILS); or through concatenation, where noise is presumed to be the dominant source of conflict. The results provided herein emphasize that analysis of individual homologous gene regions can greatly improve our understanding of the underlying conflict within these datasets. RESULTS: Here we examined two published transcriptomic datasets, the angiosperm group Caryophyllales and the aculeate Hymenoptera, for the presence of conflict, concordance, and gene duplications in individual homologs across the phylogeny. We found significant conflict throughout the phylogeny in both datasets and in particular along the backbone. While some nodes in each phylogeny showed patterns of conflict similar to what might be expected with ILS alone, the backbone nodes also exhibited low levels of phylogenetic signal. In addition, certain nodes, especially in the Caryophyllales, had highly elevated levels of strongly supported conflict that cannot be explained by ILS alone. CONCLUSION: This study demonstrates that phylogenetic signal is highly variable in phylogenomic data sampled across related species and poses challenges when conducting species tree analyses on large genomic and transcriptomic datasets. Further insight into the conflict and processes underlying these complex datasets is necessary to improve and develop adequate models for sequence analysis and downstream applications. To aid this effort, we developed the open source software phyparts (https://bitbucket.org/blackrim/phyparts), which calculates unique, conflicting, and concordant bipartitions, maps gene duplications, and outputs summary statistics such as internode certainty (ICA) scores and node-specific counts of gene duplications.
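
The idea of counting concordant and conflicting gene trees for a given species-tree node can be sketched as follows. This is not phyparts code and makes simplifying assumptions: trees are rooted and represented only as sets of clades (each clade a frozenset of tip labels), support values are ignored, and the function names and example trees are hypothetical.

```python
def classify(species_clade, gene_clade, gene_taxa):
    """Compare one species-tree clade with one gene-tree clade (rooted trees)."""
    sp = species_clade & gene_taxa  # restrict to taxa sampled in the gene tree
    if len(sp) < 2 or sp == gene_taxa:
        return "uninformative"
    if sp == gene_clade:
        return "concordant"
    shared = sp & gene_clade
    # Conflict: the clades overlap, but neither is nested inside the other
    if shared and not (sp <= gene_clade or gene_clade <= sp):
        return "conflicting"
    return "compatible"


def summarize(species_clade, gene_trees):
    """Count gene trees that support, conflict with, or are neutral on a clade."""
    counts = {"concordant": 0, "conflicting": 0, "neutral": 0}
    for taxa, clades in gene_trees:
        results = {classify(species_clade, c, taxa) for c in clades}
        if "concordant" in results:
            counts["concordant"] += 1
        elif "conflicting" in results:
            counts["conflicting"] += 1
        else:
            counts["neutral"] += 1
    return counts


fs = frozenset
# Hypothetical gene trees: (sampled taxa, list of clades)
gene_trees = [
    (fs("ABCD"), [fs("AB"), fs("ABC")]),  # supports (A,B)
    (fs("ABCD"), [fs("AC"), fs("ACD")]),  # groups A with C instead
    (fs("ACD"), [fs("AC")]),              # B not sampled
    (fs("ABCD"), [fs("CD")]),             # unresolved with respect to (A,B)
]
print(summarize(fs("AB"), gene_trees))
```

Restricting the species-tree clade to the taxa present in each gene tree is what allows partially sampled homologs to be scored at all; a real analysis would also filter gene-tree clades by support before counting conflict.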


Subjects
Gene Duplication, Horizontal Gene Transfer, Magnoliopsida/genetics, Animals, Genomics, Magnoliopsida/physiology, Phylogeny, Software, Wasps/classification, Wasps/genetics
15.
Bioinformatics ; 30(15): 2216-8, 2014 Aug 01.
Article in English | MEDLINE | ID: mdl-24728855

ABSTRACT

SUMMARY: Phylogenetic comparative methods are essential for addressing evolutionary hypotheses with interspecific data. The scale and scope of such data have increased dramatically in the past few years. Many existing approaches are either computationally infeasible or inappropriate for data of this size. To address both of these problems, we present geiger v2.0, a complete overhaul of the popular R package geiger. We have reimplemented existing methods with more efficient algorithms and have developed several new approaches for accommodating heterogeneous models and data types. AVAILABILITY AND IMPLEMENTATION: This R package is available on the CRAN repository http://cran.r-project.org/web/packages/geiger/. All source code is also available on GitHub http://github.com/mwpennell/geiger-v2. geiger v2.0 depends on the ape package. CONTACT: mwpennell@gmail.com. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Biological Evolution, Computational Biology/methods, Biological Models, Phylogeny, Programming Languages, Algorithms, Bayes Theorem, Likelihood Functions
16.
New Phytol ; 207(2): 454-467, 2015 Jul.
Article in English | MEDLINE | ID: mdl-26053261

ABSTRACT

Our growing understanding of the plant tree of life provides a novel opportunity to uncover the major drivers of angiosperm diversity. Using a time-calibrated phylogeny, we characterized hot and cold spots of lineage diversification across the angiosperm tree of life by modeling evolutionary diversification using stepwise AIC (MEDUSA). We also tested the whole-genome duplication (WGD) radiation lag-time model, which postulates that increases in diversification tend to lag behind established WGD events. Diversification rates have been incredibly heterogeneous throughout the evolutionary history of angiosperms and reveal a pattern of 'nested radiations' - increases in net diversification nested within other radiations. This pattern in turn generates a negative relationship between clade age and diversity across both families and orders. We suggest that stochastically changing diversification rates across the phylogeny explain these patterns. Finally, we demonstrate significant statistical support for the WGD radiation lag-time model. Across angiosperms, nested shifts in diversification led to an overall increasing rate of net diversification and declining relative extinction rates through time. These diversification shifts are only rarely perfectly associated with WGD events, but commonly follow them after a lag period.
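
The stepwise AIC logic that the MEDUSA name refers to can be illustrated with a small sketch of the stopping rule only. It does not compute birth-death likelihoods and is not the MEDUSA implementation: the log-likelihood values, the parameter counts per model, and the `threshold` argument are hypothetical assumptions. The only formula used is the standard AIC = 2k - 2 lnL.

```python
def aic(log_lik, n_params):
    """Akaike information criterion: 2k - 2 lnL (lower is better)."""
    return 2 * n_params - 2 * log_lik


def stepwise_select(models, threshold=0.0):
    """Forward selection over models ordered by increasing number of rate shifts.

    models: list of (n_params, log_lik). Returns the index of the last model
    whose AIC improved on the previous accepted model by more than threshold.
    """
    best = 0
    best_aic = aic(models[0][1], models[0][0])
    for i in range(1, len(models)):
        n_params, log_lik = models[i]
        candidate = aic(log_lik, n_params)
        if best_aic - candidate > threshold:
            best, best_aic = i, candidate
        else:
            break  # stop at the first shift that does not pay for its parameters
    return best


# Hypothetical fits with 0, 1, 2 and 3 rate shifts (assumed 3 extra parameters per shift)
models = [(2, -500.0), (5, -480.0), (8, -476.0), (11, -475.5)]
print(stepwise_select(models))
```

With these made-up numbers the first two shifts lower the AIC and the third does not, so selection stops at the two-shift model; raising `threshold` makes the procedure more conservative about adding shifts.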


Subjects
Biodiversity , Biological Evolution , Plant Genome , Magnoliopsida/genetics , Phylogeny , Molecular Evolution , Genetic Models
17.
Mol Phylogenet Evol ; 92: 155-64, 2015 Nov.
Article in English | MEDLINE | ID: mdl-26140861

ABSTRACT

The phylogeny of Galliformes (landfowl) has been studied extensively; however, the associated chronologies have been criticized recently due to misplaced or misidentified fossil calibrations. As a consequence, it is unclear whether any crown-group lineages arose in the Cretaceous and survived the Cretaceous-Paleogene (K-Pg; 65.5 Ma) mass extinction. Using Bayesian phylogenetic inference on an alignment spanning 14,539 bp of mitochondrial and nuclear DNA sequence data, four fossil calibrations, and a combination of uncorrelated lognormally distributed relaxed-clock and strict-clock models, we inferred a time-calibrated molecular phylogeny for 225 of the 291 extant galliform taxa. These analyses suggest that crown Galliformes diversified in the Cretaceous and that three stem lineages survived the K-Pg mass extinction. Ideally, characterizing the tempo and mode of diversification involves a taxonomically complete phylogenetic hypothesis. We used simple constraint structures to incorporate 66 data-deficient taxa and inferred the first taxon-complete phylogenetic hypothesis for the Galliformes. Diversification analyses conducted on 10,000 timetrees sampled from the posterior distribution of candidate trees show that the evolutionary history of the Galliformes is best explained by a rate-shift model including 1-3 clade-specific increases in diversification rate. We further show that the tempo and mode of diversification in the Galliformes conform to a three-pulse model, with three stem lineages arising in the Cretaceous and inter- and intrafamilial diversification occurring after the K-Pg mass extinction, in the Paleocene-Eocene (65.5-33.9 Ma) or in association with the Eocene-Oligocene transition (33.9 Ma).


Subjects
Galliformes/genetics , Phylogeny , Animals , Bayes Theorem , Calibration , Fossils , Time Factors
18.
New Phytol ; 202(4): 1382-1397, 2014 Jun.
Article in English | MEDLINE | ID: mdl-24611540

ABSTRACT

Succulent plants are widely distributed, reaching their highest diversity in arid and semi-arid regions. Their origin and diversification are thought to be associated with a global expansion of aridity. We test this hypothesis by investigating the tempo and pattern of Cactaceae diversification. Our results contribute to the understanding of the evolution of New World Succulent Biomes. We use the most taxonomically complete dataset currently available for Cactaceae. We estimate divergence times and utilize Bayesian and maximum likelihood methods that account for nonrandom taxonomic sampling, possible extinction scenarios and phylogenetic uncertainty to analyze diversification rates, and evolution of growth form and pollination syndrome. Cactaceae originated shortly after the Eocene-Oligocene global drop in CO2, and radiation of its richest genera coincided with the expansion of aridity in North America during the late Miocene. A significant correlation between growth form and pollination syndrome was found, as well as a clear state dependence between diversification rate, and pollination and growth-form evolution. This study suggests a complex picture underlying the diversification of Cactaceae. It not only responded to the availability of new niches resulting from aridification, but also to the correlated evolution of novel growth forms and reproductive strategies.


Subjects
Cactaceae/genetics , Biodiversity , Biological Evolution , Cactaceae/physiology , Phylogeny
19.
PLoS Comput Biol ; 9(9): e1003223, 2013.
Article in English | MEDLINE | ID: mdl-24086118

ABSTRACT

Phylogenetic trees are used to analyze and visualize evolution. However, trees can be imperfect datatypes when summarizing multiple trees. This is especially problematic when accommodating biological phenomena such as horizontal gene transfer, incomplete lineage sorting, and hybridization, as well as topological conflict between datasets. Additionally, researchers may want to combine information from sets of trees that have partially overlapping taxon sets. To address the problem of analyzing sets of trees with conflicting relationships and partially overlapping taxon sets, we introduce methods for aligning, synthesizing and analyzing rooted phylogenetic trees within a graph, called a tree alignment graph (TAG). The TAG can be queried and analyzed to explore uncertainty and conflict. It can also be synthesized to construct trees, presenting an alternative to supertree approaches. We demonstrate these methods with two empirical datasets. In order to explore uncertainty, we constructed a TAG of the bootstrap trees from the Angiosperm Tree of Life project. Analysis of the resulting graph demonstrates that areas of the dataset that are unresolved in majority-rule consensus tree analyses can be understood in more detail within the context of a graph structure, using measures incorporating node degree and adjacency support. As an exercise in synthesis (i.e., summarization of a TAG constructed from the alignment trees), we also construct a TAG consisting of the taxonomy and source trees from a recent comprehensive bird study. We synthesized this graph into a tree that can be reconstructed in a repeatable fashion and where the underlying source information can be updated. The methods presented here are tractable for large-scale analyses and serve as a basis for an alternative to consensus tree and supertree methods. Furthermore, the exploration of these graphs can expose structures and patterns within the dataset that are otherwise difficult to observe.
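As a rough illustration of mapping several rooted trees into one graph, the sketch below identifies each node by the set of tips beneath it and counts how many input trees support each child-to-parent edge. It is a simplified stand-in and not the TAG method of the abstract: it assumes every tree has the same taxon set (so it does not handle partially overlapping taxa), and the nested-tuple tree encoding and all function names are hypothetical.

```python
from collections import Counter


def tip_set(tree):
    """Return the frozenset of tip labels below a node.

    A tree is a nested tuple of children; a tip is a string label.
    """
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(tip_set(child) for child in tree))


def add_tree(graph, tree):
    """Map one rooted tree into the graph, counting child -> parent edges."""
    if isinstance(tree, str):
        return
    parent = tip_set(tree)
    for child in tree:
        graph[(tip_set(child), parent)] += 1
        add_tree(graph, child)


def build_graph(trees):
    """Combine all trees into one edge-count graph (simplifying assumption:
    identical taxon sets across trees)."""
    graph = Counter()
    for tree in trees:
        add_tree(graph, tree)
    return graph


# Three hypothetical rooted trees; the second conflicts with the others.
trees = [
    ((("A", "B"), "C"), "D"),
    ((("A", "C"), "B"), "D"),
    ((("A", "B"), "C"), "D"),
]
graph = build_graph(trees)
ab = frozenset("AB")
abc = frozenset("ABC")
support_ab = graph[(ab, abc)]  # number of trees grouping A with B inside ABC
```

Edges that appear in only some trees, such as the competing A+B and A+C groupings here, are where conflict shows up in the graph, which is the kind of structure a consensus tree would collapse.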


Subjects
Phylogeny , Nucleic Acid Sequence Homology
20.
BMC Bioinformatics ; 14: 158, 2013 May 13.
Article in English | MEDLINE | ID: mdl-23668630

ABSTRACT

BACKGROUND: Scientists rarely reuse expert knowledge of phylogeny, in spite of years of effort to assemble a great "Tree of Life" (ToL). A notable exception involves the use of Phylomatic, which provides tools to generate custom phylogenies from a large, pre-computed, expert phylogeny of plant taxa. This suggests great potential for a more generalized system that, starting with a query consisting of a list of any known species, would rectify non-standard names, identify expert phylogenies containing the implicated taxa, prune away unneeded parts, and supply branch lengths and annotations, resulting in a custom phylogeny suited to the user's needs. Such a system could become a sustainable community resource if implemented as a distributed system of loosely coupled parts that interact through clearly defined interfaces. RESULTS: With the aim of building such a "phylotastic" system, the NESCent Hackathons, Interoperability, Phylogenies (HIP) working group recruited 2 dozen scientist-programmers to a weeklong programming hackathon in June 2012. During the hackathon (and a three-month follow-up period), 5 teams produced designs, implementations, documentation, presentations, and tests including: (1) a generalized scheme for integrating components; (2) proof-of-concept pruners and controllers; (3) a meta-API for taxonomic name resolution services; (4) a system for storing, finding, and retrieving phylogenies using semantic web technologies for data exchange, storage, and querying; (5) an innovative new service, DateLife.org, which synthesizes pre-computed, time-calibrated phylogenies to assign ages to nodes; and (6) demonstration projects. These outcomes are accessible via a public code repository (GitHub.com), a website (http://www.phylotastic.org), and a server image. 
CONCLUSIONS: Approximately 9 person-months of effort (centered on a software development hackathon) resulted in the design and implementation of proof-of-concept software for 4 core phylotastic components, 3 controllers, and 3 end-user demonstration tools. While these products have substantial limitations, they suggest considerable potential for a distributed system that makes phylogenetic knowledge readily accessible in computable form. Widespread use of phylotastic systems will create an electronic marketplace for sharing phylogenetic knowledge that will spur innovation in other areas of the ToL enterprise, such as annotation of sources and methods and third-party methods of quality assessment.
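The pruning step described in the abstract (cutting a large reference phylogeny down to a user's species list) can be sketched as follows. This is an illustration under stated assumptions and not the code of any phylotastic component: the nested-tuple tree encoding, the function name `prune`, and the example taxa are hypothetical, and branch lengths and name resolution are left out.

```python
def prune(tree, keep):
    """Prune a rooted tree to the tips listed in `keep`.

    A tree is a nested tuple of children; a tip is a string label.
    Returns None if no requested tip occurs below this node, and
    collapses internal nodes that are left with a single child.
    """
    if isinstance(tree, str):
        return tree if tree in keep else None
    kept = [p for p in (prune(child, keep) for child in tree) if p is not None]
    if not kept:
        return None
    if len(kept) == 1:
        return kept[0]  # drop the now-redundant internal node
    return tuple(kept)


# Hypothetical reference tree and query list
reference = ((("Homo", "Pan"), "Gorilla"), ("Mus", "Rattus"))
custom = prune(reference, {"Homo", "Gorilla", "Mus"})
```

Collapsing single-child nodes keeps the pruned tree strictly branching, which is usually what a downstream comparative analysis expects.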


Subjects
Phylogeny , Software , Internet