1.
Syst Biol ; 2023 Dec 12.
Article in English | MEDLINE | ID: mdl-38085256

ABSTRACT

Time-scaled phylogenetic trees are an ultimate goal of evolutionary biology and a necessary ingredient in comparative studies. The accumulation of genomic data has resolved the tree of life to a great extent, yet timing evolutionary events remains challenging if not impossible without external information such as fossil ages and morphological characters. Methods for incorporating morphology in tree estimation have lagged behind their molecular counterparts, especially in the case of continuous characters. Despite recent advances, such tools are still direly needed as we approach the limits of what molecules can teach us. Here, we implement a suite of state-of-the-art methods for leveraging continuous morphology in phylogenetics, and by conducting extensive simulation studies we thoroughly validate and explore our methods' properties. While retaining model generality and scalability, we make it possible to estimate absolute and relative divergence times from multiple continuous characters while accounting for uncertainty. We compile and analyze one of the most data-type diverse data sets to date, composed of contemporaneous and ancient molecular sequences, and discrete and continuous characters from living and extinct Carnivora taxa. We conclude by synthesizing lessons about our methods' behavior, and suggest future research avenues.

2.
PLoS Comput Biol ; 19(7): e1011226, 2023 07.
Article in English | MEDLINE | ID: mdl-37463154

ABSTRACT

Phylogenetic models have become increasingly complex, and phylogenetic data sets have expanded in both size and richness. However, current inference tools lack a model specification language that can concisely describe a complete phylogenetic analysis while remaining independent of implementation details. We introduce a new lightweight and concise model specification language, 'LPhy', which is designed to be both human and machine-readable. A graphical user interface accompanies 'LPhy', allowing users to build models, simulate data, and create natural language narratives describing the models. These narratives can serve as the foundation for manuscript method sections. Additionally, we present a command-line interface for converting LPhy-specified models into analysis specification files (in XML format) compatible with the BEAST2 software platform. Collectively, these tools aim to enhance the clarity of descriptions and reporting of probabilistic models in phylogenetic studies, ultimately promoting reproducibility of results.


Subject(s)
Language , Software , Humans , Phylogeny , Reproducibility of Results , Models, Statistical , User-Computer Interface
3.
Bioinformatics ; 36(22-23): 5516-5518, 2021 Apr 01.
Article in English | MEDLINE | ID: mdl-33325502

ABSTRACT

MOTIVATION: Genome sequencing projects have revealed frequent gains and losses of genes between species. Previous versions of our software, Computational Analysis of gene Family Evolution (CAFE), have allowed researchers to estimate parameters of gene gain and loss across a phylogenetic tree. However, the underlying model assumed that all gene families had the same rate of evolution, despite evidence suggesting a large amount of variation in rates among families. RESULTS: Here, we present CAFE 5, a completely re-written software package with numerous performance and user-interface enhancements over previous versions. These include improved support for multithreading, the explicit modeling of rate variation among families using gamma-distributed rate categories, and command-line arguments that preclude the use of accessory scripts. AVAILABILITY AND IMPLEMENTATION: CAFE 5 source code, documentation, test data and a detailed manual with examples are freely available at https://github.com/hahnlab/CAFE5/releases. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

4.
Syst Biol ; 71(1): 208-220, 2021 12 16.
Article in English | MEDLINE | ID: mdl-34228807

ABSTRACT

Evolutionary models account for either population- or species-level processes but usually not both. We introduce a new model, the FBD-MSC, which makes it possible for the first time to integrate both the genealogical and fossilization phenomena, by means of the multispecies coalescent (MSC) and the fossilized birth-death (FBD) processes. Using this model, we reconstruct the phylogeny representing all extant and many fossil Caninae, recovering both the relative and absolute time of speciation events. We quantify known inaccuracy issues with divergence time estimates using the popular strategy of concatenating molecular alignments and show that the FBD-MSC solves them. Our new integrative method and empirical results advance the paradigm and practice of probabilistic total evidence analyses in evolutionary biology. [Caninae; fossilized birth-death; molecular clock; multispecies coalescent; phylogenetics; species trees.]


Subject(s)
Genetic Speciation , Models, Biological , Biological Evolution , Fossils , Phylogeny
5.
PLoS Comput Biol ; 15(4): e1006650, 2019 04.
Article in English | MEDLINE | ID: mdl-30958812

ABSTRACT

Elaboration of Bayesian phylogenetic inference methods has continued at pace in recent years with major new advances in nearly all aspects of the joint modelling of evolutionary data. It is increasingly appreciated that some evolutionary questions can only be adequately answered by combining evidence from multiple independent sources of data, including genome sequences, sampling dates, phenotypic data, radiocarbon dates, fossil occurrences, and biogeographic range information among others. Including all relevant data into a single joint model is very challenging both conceptually and computationally. Advanced computational software packages that allow robust development of compatible (sub-)models which can be composed into a full model hierarchy have played a key role in these developments. Developing such software frameworks is increasingly a major scientific activity in its own right, and comes with specific challenges, from practical software design, development and engineering challenges to statistical and conceptual modelling challenges. BEAST 2 is one such computational software platform, and was first announced over 4 years ago. Here we describe a series of major new developments in the BEAST 2 core platform and model hierarchy that have occurred since the first release of the software, culminating in the recent 2.5 release.


Subject(s)
Bayes Theorem , Biological Evolution , Phylogeny , Software , Animals , Computational Biology , Computer Simulation , Evolution, Molecular , Humans , Markov Chains , Models, Genetic , Monte Carlo Method
6.
Syst Biol ; 67(1): 158-169, 2018 Jan 01.
Article in English | MEDLINE | ID: mdl-28973673

ABSTRACT

Genome-scale sequencing has been of great benefit in recovering species trees but has not provided final answers. Despite the rapid accumulation of molecular sequences, resolving short and deep branches of the tree of life has remained a challenge and has prompted the development of new strategies that can make the best use of available data. One such strategy, the concatenation of gene alignments, can be successful when coupled with many tree estimation methods, but has also been shown to fail when there are high levels of incomplete lineage sorting. Here, we focus on the failure of likelihood-based methods in retrieving a rooted, asymmetric four-taxon species tree from concatenated data when the species tree is in or near the anomaly zone, a region of parameter space where the most common gene tree does not match the species tree because of incomplete lineage sorting. First, we use coalescent theory to prove that most informative sites will support the species tree in the anomaly zone, and that as a consequence maximum-parsimony succeeds in recovering the species tree from concatenated data. We further show that maximum-likelihood tree estimation from concatenated data fails both inside and outside the anomaly zone, and that this failure cannot be easily predicted from the topology of the most common gene tree. We demonstrate that likelihood-based methods often fail in a region partially overlapping the anomaly zone, likely because of the lower relative cost of substitutions on discordant gene tree branches that are absent from the species tree. Our results confirm and extend previous reports on the performance of these methods applied to concatenated data from a rooted, asymmetric four-taxon species tree, and highlight avenues for future work improving the performance of methods aimed at recovering species trees.


Subject(s)
Classification/methods , Phylogeny , Biological Evolution , Genetic Speciation , Likelihood Functions , Models, Genetic , Reproducibility of Results , Sequence Analysis, DNA
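
The anomaly zone mentioned in entry 6 can be made concrete with a short sketch. It assumes the boundary function usually attributed to Degnan and Rosenberg (2006) for a rooted, asymmetric four-taxon species tree, where x is the length of the deeper internal branch and y the length of the shallower one, both in coalescent units. The function names are hypothetical and this is not code from the paper.

```python
import math


def anomaly_zone_boundary(x: float) -> float:
    """Boundary a(x) for a rooted asymmetric four-taxon species tree.

    Assumed form: a(x) = log(2/3 + (3e^{2x} - 2) / (18(e^{3x} - e^{2x}))),
    with x the deeper internal branch length in coalescent units.
    """
    if x <= 0:
        raise ValueError("x must be positive")
    e2 = math.exp(2 * x)
    e3 = math.exp(3 * x)
    return math.log(2 / 3 + (3 * e2 - 2) / (18 * (e3 - e2)))


def in_anomaly_zone(x: float, y: float) -> bool:
    """True when the shallower internal branch y falls below a(x)."""
    return y < anomaly_zone_boundary(x)


if __name__ == "__main__":
    # Two short internal branches versus a long deeper branch.
    for x, y in [(0.1, 0.05), (1.0, 0.05)]:
        print(x, y, in_anomaly_zone(x, y))
```

Under this assumed formula the boundary becomes negative once x is long enough, so no positive y can place the tree in the anomaly zone.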
7.
Mol Biol Evol ; 33(12): 3299-3307, 2016 12.
Article in English | MEDLINE | ID: mdl-27634870

ABSTRACT

Phenotypic convergence is an exciting outcome of adaptive evolution, occurring when different species find similar solutions to the same problem. Unraveling the molecular basis of convergence provides a way to link genotype to adaptive phenotypes, but can also shed light on the extent to which molecular evolution is repeatable and predictable. Many recent genome-wide studies have uncovered a striking pattern of diminishing convergence over time, ascribing this pattern to the presence of intramolecular epistatic interactions. Here, we consider gene tree discordance as an alternative cause of changes in convergence levels over time in a primate dataset. We demonstrate that gene tree discordance can produce patterns of diminishing convergence by itself, and that controlling for discordance as a cause of apparent convergence makes the pattern disappear. We also show that synonymous substitutions, where neither selection nor epistasis should be prevalent, have the same diminishing pattern of molecular convergence in primates. Finally, we demonstrate that even in situations where biological discordance is not possible, discordance due to errors in species tree inference can drive similar patterns. Though intramolecular epistasis could in principle create a pattern of declining convergence over time, our results suggest a possible alternative explanation for this widespread pattern. These results contribute to a growing appreciation not just of the presence of gene tree discordance, but of the unpredictable effects this discordance can have on analyses of molecular evolution.


Subject(s)
Evolution, Molecular , Genetic Association Studies/methods , Genetic Variation , Animals , Biological Evolution , Epistasis, Genetic , Genetic Speciation , Genome , Genotype , Models, Genetic , Phylogeny , Primates/genetics
8.
Mol Ecol ; 26(8): 2317-2330, 2017 Apr.
Article in English | MEDLINE | ID: mdl-28141906

ABSTRACT

Gene flow can impede the evolution of reproductive isolating barriers between species. Reinforcement is the process by which prezygotic reproductive isolation evolves in sympatry due to selection to decrease costly hybridization. It is known that reinforcement can be prevented by too much gene flow, but we still do not know how often prezygotic barriers have evolved in the presence of gene flow or how much gene flow can occur during reinforcement. Flower colour divergence in the native Texas wildflower, Phlox drummondii, is one of the best-studied cases of reinforcement. Here we use genomic analyses to infer gene flow between P. drummondii and a closely related sympatric species, Phlox cuspidata. We de novo assemble transcriptomes of four Phlox species to determine the phylogenetic relationships between these species and find extensive discordance among gene tree topologies across genes. We find evidence of introgression between sympatric P. drummondii and P. cuspidata using the D-statistic, and use phylogenetic analyses to infer the predominant direction of introgression. We investigate geographic variation in gene flow by comparing the relative divergence of genes displaying discordant gene trees between an allopatric and sympatric sample. These analyses support the hypothesis that sympatric P. drummondii has experienced gene flow with P. cuspidata. We find that gene flow between these species is asymmetrical, which could explain why reinforcement caused divergence in only one of the sympatric species. Given the previous research in this system, we suggest strong selection can explain how reinforcement successfully evolved in this system despite gene flow in sympatry.


Subject(s)
Biological Evolution , Gene Flow , Genome, Plant , Magnoliopsida/genetics , Sympatry , Flowers/genetics , Gene Regulatory Networks , Hybridization, Genetic , Models, Genetic , Phylogeny , Texas , Transcriptome
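
Entry 8 uses the D-statistic to test for introgression. As a minimal sketch of how that statistic is commonly computed, assuming four aligned sequences in the order P1, P2, P3, outgroup and the usual definition D = (ABBA - BABA) / (ABBA + BABA), the example below counts site patterns and returns D. The function names and the toy alignment are hypothetical and are not taken from the study.

```python
def count_abba_baba(p1: str, p2: str, p3: str, outgroup: str) -> tuple[int, int]:
    """Count ABBA and BABA site patterns in four aligned sequences."""
    if not (len(p1) == len(p2) == len(p3) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")
    abba = baba = 0
    for a, b, c, o in zip(p1, p2, p3, outgroup):
        site = (a, b, c, o)
        # Skip gaps or ambiguity codes and keep biallelic sites only.
        if any(base not in "ACGT" for base in site) or len(set(site)) != 2:
            continue
        if a == o and b == c:
            abba += 1  # P2 and P3 share the derived allele
        elif b == o and a == c:
            baba += 1  # P1 and P3 share the derived allele
    return abba, baba


def d_statistic(abba: int, baba: int) -> float:
    """D = (ABBA - BABA) / (ABBA + BABA)."""
    total = abba + baba
    if total == 0:
        raise ValueError("no informative ABBA or BABA sites")
    return (abba - baba) / total


if __name__ == "__main__":
    counts = count_abba_baba("ACGA", "GCTA", "GCGA", "ACTA")
    print(counts, d_statistic(*counts))
```

A D near zero is what incomplete lineage sorting alone would be expected to produce, while a consistent excess of one pattern suggests gene flow. Real analyses also need a significance test, such as a block jackknife, which this sketch omits.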
9.
Syst Biol ; 65(4): 711-21, 2016 Jul.
Article in English | MEDLINE | ID: mdl-26927960

ABSTRACT

Substitution rates are known to be variable among genes, chromosomes, species, and lineages due to multifarious biological processes. Here, we consider another source of substitution rate variation due to a technical bias associated with gene tree discordance. Discordance has been found to be rampant in genome-wide data sets, often due to incomplete lineage sorting (ILS). This apparent substitution rate variation is caused when substitutions that occur on discordant gene trees are analyzed in the context of a single, fixed species tree. Such substitutions have to be resolved by proposing multiple substitutions on the species tree, and we therefore refer to this phenomenon as Substitutions Produced by ILS (SPILS). We use simulations to demonstrate that SPILS has a larger effect with increasing levels of ILS, and on trees with larger numbers of taxa. Specific branches of the species trees are consistently, but erroneously, inferred to be longer or shorter, and we show that these branches can be predicted based on discordant tree topologies. Moreover, we observe that fixing a species tree topology when performing tests of positive selection increases the false positive rate, particularly for genes whose discordant topologies are most affected by SPILS. Finally, we use data from multiple Drosophila species to show that SPILS can be detected in nature. Although the effects of SPILS are modest per gene, it has the potential to affect substitution rate variation whenever high levels of ILS are present, particularly in rapid radiations. The problems outlined here have implications for character mapping of any type of trait, and for any biological process that causes discordance. We discuss possible solutions to these problems, and areas in which they are likely to have caused faulty inferences of convergence and accelerated evolution.


Subject(s)
Evolution, Molecular , Genome , Models, Genetic , Amino Acid Substitution , Animals , Drosophila/genetics , Phylogeny
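
To illustrate how the level of ILS discussed in entry 9 depends on branch length, the sketch below uses the standard multispecies coalescent result for a rooted three-taxon species tree: with an internal branch of length T in coalescent units, a gene tree is discordant with probability (2/3)e^{-T}. This is a textbook illustration under that assumption, not the simulation procedure used in the paper, and the function name is hypothetical.

```python
import math


def discordance_probability(t: float) -> float:
    """Probability that a gene tree disagrees with a rooted three-taxon
    species tree whose internal branch has length t (coalescent units)."""
    if t < 0:
        raise ValueError("branch length must be non-negative")
    return (2.0 / 3.0) * math.exp(-t)


if __name__ == "__main__":
    # Shorter internal branches give more incomplete lineage sorting.
    for t in (0.1, 0.5, 1.0, 2.0, 5.0):
        print(f"T = {t:>4}: P(discordant) = {discordance_probability(t):.3f}")
```

Short internal branches, as in rapid radiations, push the discordance probability toward its maximum of 2/3, which is the regime where the abstract expects SPILS to matter most.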
10.
bioRxiv ; 2023 Dec 16.
Article in English | MEDLINE | ID: mdl-38168278

ABSTRACT

We introduce PhyloJunction, a computational framework designed to facilitate the prototyping, testing, and characterization of evolutionary models. PhyloJunction is distributed as an open-source Python library that can be used to implement a variety of models, through its flexible graphical modeling architecture and dedicated model specification language. Model design and use are exposed to users via command-line and graphical interfaces, which integrate the steps of simulating, summarizing, and visualizing data. This paper describes the features of PhyloJunction - which include, but are not limited to, a general implementation of a popular family of phylogenetic diversification models - and, moving forward, how it may be expanded to not only include new models, but to also become a platform for conducting and teaching statistical learning.

11.
Virus Evol ; 7(2): veab052, 2021.
Article in English | MEDLINE | ID: mdl-34527282

ABSTRACT

New Zealand, Australia, Iceland, and Taiwan all saw success in controlling their first waves of Coronavirus Disease 2019 (COVID-19). As islands, they make excellent case studies for exploring the effects of international travel and human movement on the spread of COVID-19. We employed a range of robust phylodynamic methods and genome subsampling strategies to infer the epidemiological history of Severe acute respiratory syndrome coronavirus 2 in these four countries. We compared these results to transmission clusters identified by the New Zealand Ministry of Health by contact tracing strategies. We estimated the effective reproduction number of COVID-19 as 1-1.4 during early stages of the pandemic and show that it declined below 1 as human movement was restricted. We also showed that this disease was introduced many times into each country and that introductions slowed down markedly following the reduction of international travel in mid-March 2020. Finally, we confirmed that New Zealand transmission clusters identified via standard health surveillance strategies largely agree with those defined by genomic data. We have demonstrated how the use of genomic data and computational biology methods can assist health officials in characterising the epidemiology of viral epidemics and for contact tracing.

12.
Nat Commun ; 11(1): 6351, 2020 12 11.
Article in English | MEDLINE | ID: mdl-33311501

ABSTRACT

New Zealand, a geographically remote Pacific island with easily sealable borders, implemented a nationwide 'lockdown' of all non-essential services to curb the spread of COVID-19. Here, we generate 649 SARS-CoV-2 genome sequences from infected patients in New Zealand with samples collected during the 'first wave', representing 56% of all confirmed cases in this time period. Despite its remoteness, the viruses imported into New Zealand represented nearly all of the genomic diversity sequenced from the global virus population. These data helped to quantify the effectiveness of public health interventions. For example, the effective reproductive number, Re, of New Zealand's largest cluster decreased from 7 to 0.2 within the first week of lockdown. Similarly, only 19% of virus introductions into New Zealand resulted in ongoing transmission of more than one additional case. Overall, these results demonstrate the utility of genomic pathogen surveillance to inform public health and disease mitigation.


Subject(s)
COVID-19/epidemiology , Genome, Viral/genetics , Genomics/methods , SARS-CoV-2/genetics , Adolescent , Adult , Aged , Aged, 80 and over , COVID-19/virology , Child , Child, Preschool , Female , Geography , Humans , Infant , Infant, Newborn , Male , Middle Aged , New Zealand/epidemiology , Pandemics , Phylogeny , SARS-CoV-2/classification , SARS-CoV-2/physiology , Whole Genome Sequencing/methods , Young Adult
13.
Philos Trans R Soc Lond B Biol Sci ; 374(1777): 20180244, 2019 07 22.
Article in English | MEDLINE | ID: mdl-31154973

ABSTRACT

Accurate inferences of convergence require that the appropriate tree topology be used. If there is a mismatch between the tree a trait has evolved along and the tree used for analysis, then false inferences of convergence ('hemiplasy') can occur. To avoid problems of hemiplasy when there are high levels of gene tree discordance with the species tree, researchers have begun to construct tree topologies from individual loci. However, due to intralocus recombination, even locus-specific trees may contain multiple topologies within them. This implies that the use of individual tree topologies discordant with the species tree can still lead to incorrect inferences about molecular convergence. Here, we examine the frequency with which single exons and single protein-coding genes contain multiple underlying tree topologies, in primates and Drosophila, and quantify the effects of hemiplasy when using trees inferred from individual loci. In both clades, we find that there are most often multiple diagnosable topologies within single exons and whole genes, with 91% of Drosophila protein-coding genes containing multiple topologies. Because of this underlying topological heterogeneity, even using trees inferred from individual protein-coding genes results in 25% and 38% of substitutions falsely labelled as convergent in primates and Drosophila, respectively. While constructing local trees can reduce the problem of hemiplasy, our results suggest that it will be difficult to completely avoid false inferences of convergence. We conclude by suggesting several ways forward in the analysis of convergent evolution, for both molecular and morphological characters. This article is part of the theme issue 'Convergent evolution in the genomics era: new insights and directions'.


Subject(s)
Drosophila/genetics , Evolution, Molecular , Primates/genetics , Recombination, Genetic , Animals , Drosophila/classification , Exons , Phylogeny , Primates/classification , Proteins/genetics
14.
Elife ; 7, 2018 08 14.
Article in English | MEDLINE | ID: mdl-29969096

ABSTRACT

We present a multispecies coalescent model for quantitative traits that allows for evolutionary inferences at micro- and macroevolutionary scales. A major advantage of this model is its ability to incorporate genealogical discordance underlying a quantitative trait. We show that discordance causes a decrease in the expected trait covariance between more closely related species relative to more distantly related species. If unaccounted for, this outcome can lead to an overestimation of a trait's evolutionary rate, to a decrease in its phylogenetic signal, and to errors when examining shifts in mean trait values. The number of loci controlling a quantitative trait appears to be irrelevant to all trends reported, and discordance also affected discrete, threshold traits. Our model and analyses point to the conditions under which different methods should fare better or worse, in addition to indicating current and future approaches that can mitigate the effects of discordance.


Subject(s)
Models, Genetic , Quantitative Trait Loci , Quantitative Trait, Heritable , Animals , Biological Evolution , Humans , Phenotype , Phylogeny , Species Specificity
15.
Genetics ; 200(1): 267-84, 2015 May.
Article in English | MEDLINE | ID: mdl-25716978

ABSTRACT

Characterizing the nature of the adaptive process at the genetic level is a central goal for population genetics. In particular, we know little about the sources of adaptive substitution or about the number of adaptive variants currently segregating in nature. Historically, population geneticists have focused attention on the hard-sweep model of adaptation in which a de novo beneficial mutation arises and rapidly fixes in a population. Recently more attention has been given to soft-sweep models, in which alleles that were previously neutral, or nearly so, drift until such a time as the environment shifts and their selection coefficient changes to become beneficial. It remains an active and difficult problem, however, to tease apart the telltale signatures of hard vs. soft sweeps in genomic polymorphism data. Through extensive simulations of hard- and soft-sweep models, here we show that indeed the two might not be separable through the use of simple summary statistics. In particular, it seems that recombination in regions linked to, but distant from, sites of hard sweeps can create patterns of polymorphism that closely mirror what is expected to be found near soft sweeps. We find that a very similar situation arises when using haplotype-based statistics that are aimed at detecting partial or ongoing selective sweeps, such that it is difficult to distinguish the shoulder of a hard sweep from the center of a partial sweep. While knowing the location of the selected site mitigates this problem slightly, we show that stochasticity in signatures of natural selection will frequently cause the signal to reach its zenith far from this site and that this effect is more severe for soft sweeps; thus inferences of the target as well as the mode of positive selection may be inaccurate. In addition, both the time since a sweep ends and biologically realistic levels of allelic gene conversion lead to errors in the classification and identification of selective sweeps. 
This general problem of "soft shoulders" underscores the difficulty in differentiating soft and partial sweeps from hard-sweep scenarios in molecular population genomics data. The soft-shoulder effect also implies that the more common hard sweeps have been in recent evolutionary history, the more prevalent spurious signatures of soft or partial sweeps may appear in some genome-wide scans.


Subject(s)
Models, Genetic , Selection, Genetic , Evolution, Molecular , Linkage Disequilibrium , Polymorphism, Genetic