Search | BVS Research Portal

1.

Efficient Bayesian inference under the multispecies coalescent with migration.

Flouri, Tomás; Jiao, Xiyun; Huang, Jun; Rannala, Bruce; Yang, Ziheng.

Proc Natl Acad Sci U S A ; 120(44): e2310708120, 2023 Oct 31.

Article in English | MEDLINE | ID: mdl-37871206

ABSTRACT

Analyses of genome sequence data have revealed pervasive interspecific gene flow and enriched our understanding of the role of gene flow in speciation and adaptation. Inference of gene flow using genomic data requires powerful statistical methods. Yet current likelihood-based methods involve heavy computation and are feasible for small datasets only. Here, we implement the multispecies-coalescent-with-migration model in the Bayesian program bpp, which can be used to test for gene flow and estimate migration rates, as well as species divergence times and population sizes. We develop Markov chain Monte Carlo algorithms for efficient sampling from the posterior, enabling the analysis of genome-scale datasets with thousands of loci. Implementation of both introgression and migration models in the same program allows us to test whether gene flow occurred continuously over time or in pulses. Analyses of genomic data from Anopheles mosquitoes demonstrate rich information in typical genomic datasets about the mode and rate of gene flow.

Subjects

Algorithms, Gene Flow, Animals, Phylogeny, Computer Simulation, Bayes Theorem, Likelihood Functions, Genetic Models

2.

Bayesian Inference Under the Multispecies Coalescent with Ancient DNA Sequences.

Nagel, Anna A; Flouri, Tomás; Yang, Ziheng; Rannala, Bruce.

Syst Biol ; 2024 Jul 30.

Article in English | MEDLINE | ID: mdl-39078610

ABSTRACT

Ancient DNA (aDNA) is increasingly being used to investigate questions such as the phylogenetic relationships and divergence times of extant and extinct species. If aDNA samples are sufficiently old, expected branch lengths (in units of nucleotide substitutions) are reduced relative to contemporary samples. This can be accounted for by incorporating sample ages into phylogenetic analyses. Existing methods that use tip (sample) dates infer gene trees rather than species trees, which can lead to incorrect or biased inferences of the species tree. Methods using a multispecies coalescent (MSC) model overcome these issues. We developed an MSC model with tip dates and implemented it in the program bpp. The method performed well for a range of biologically realistic scenarios, estimating calibrated divergence times and mutation rates precisely. Simulations suggest that estimation precision can be best improved by prioritizing sampling of many loci and more ancient samples. Incorrectly treating ancient samples as contemporary in analyzing simulated data, mimicking a common practice of empirical analyses, led to large systematic biases in model parameters, including divergence times. Two genomic datasets of mammoths and elephants were analyzed, demonstrating the method's empirical utility.
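The abstract's point that ancient samples have shorter expected branch lengths follows from the geometry of a dated tree. A minimal sketch, assuming a strict molecular clock with a known rate in substitutions per site per year and a single coalescence between one contemporary and one ancient sequence (the function name is hypothetical and not part of bpp):

```python
def expected_pairwise_distance(coalescence_age: float, sample_age: float, rate: float) -> float:
    """Expected substitutions per site separating a contemporary sequence and an
    ancient one, assuming a strict clock (an assumption of this sketch).

    coalescence_age: years since the two lineages coalesced
    sample_age: age of the ancient sample in years (0 for a contemporary sample)
    rate: substitutions per site per year
    """
    if not 0 <= sample_age <= coalescence_age:
        raise ValueError("need 0 <= sample_age <= coalescence_age")
    contemporary_branch = coalescence_age * rate
    # The ancient lineage stops accumulating substitutions once it is sampled.
    ancient_branch = (coalescence_age - sample_age) * rate
    return contemporary_branch + ancient_branch
```

Treating the ancient sample as contemporary corresponds to setting sample_age to 0, which raises the expected distance by sample_age * rate relative to the dated model.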

3.

Inference of Phylogenetic Networks from Sequence Data using Composite Likelihood.

Kong, Sungsik; Swofford, David L; Kubatko, Laura S.

Syst Biol ; 2024 Oct 10.

Article in English | MEDLINE | ID: mdl-39387633

ABSTRACT

While phylogenies have been essential in understanding how species evolve, they do not adequately describe some evolutionary processes. For instance, hybridization, a common phenomenon where interbreeding between two species leads to formation of a new species, must be depicted by a phylogenetic network, a structure that modifies a phylogenetic tree by allowing two branches to merge into one, resulting in reticulation. However, existing methods for estimating networks become computationally expensive as the dataset size and/or topological complexity increase. The lack of methods for scalable inference hampers phylogenetic networks from being widely used in practice, despite accumulating evidence that hybridization occurs frequently in nature. Here, we propose a novel method, PhyNEST (Phylogenetic Network Estimation using SiTe patterns), that estimates binary, level-1 phylogenetic networks with a fixed, user-specified number of reticulations directly from sequence data. By using the composite likelihood as the basis for inference, PhyNEST is able to use the full genomic data in a computationally tractable manner, eliminating the need to summarize the data as a set of gene trees prior to network estimation. To search network space, PhyNEST implements both hill climbing and simulated annealing algorithms. PhyNEST assumes that the data are composed of coalescent independent sites that evolve according to the Jukes-Cantor substitution model and that the network has a constant effective population size. Simulation studies demonstrate that PhyNEST is often more accurate than two existing composite likelihood summary methods (SNaQ and PhyloNet) and that it is robust to at least one form of model misspecification (assuming a less complex nucleotide substitution model than the true generating model). 
We applied PhyNEST to reconstruct the evolutionary relationships among Heliconius butterflies and Papionini primates, characterized by hybrid speciation and widespread introgression, respectively. PhyNEST is implemented in an open-source Julia package and is publicly available at https://github.com/sungsik-kong/PhyNEST.jl.
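PhyNEST assumes that sites evolve under the Jukes-Cantor (JC69) model. For readers less familiar with that model, a minimal Python sketch of its standard transition probabilities and the matching distance correction follows; it illustrates the substitution model only and is not PhyNEST's composite-likelihood code, which is written in Julia.

```python
import math


def jc69_transition_probs(d: float) -> tuple[float, float]:
    """Return (P_same, P_to_one_specific_other_base) under JC69.

    d is the expected number of substitutions per site along the branch.
    """
    if d < 0:
        raise ValueError("branch length must be non-negative")
    e = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * e
    p_diff = 0.25 - 0.25 * e  # each of the three other bases is equally likely
    return p_same, p_diff


def jc69_distance(p: float) -> float:
    """JC69 distance from the observed proportion p of differing sites (p < 0.75)."""
    if not 0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

Because all substitutions are equally likely under JC69, one branch-length parameter fully determines the site-pattern probabilities, which is part of what keeps a site-pattern likelihood tractable.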

4.

Hierarchical Heuristic Species Delimitation under the Multispecies Coalescent Model with Migration.

Kornai, Daniel; Jiao, Xiyun; Ji, Jiayi; Flouri, Tomás; Yang, Ziheng.

Syst Biol ; 2024 Aug 24.

Article in English | MEDLINE | ID: mdl-39180155

ABSTRACT

The multispecies coalescent (MSC) model accommodates genealogical fluctuations across the genome and provides a natural framework for comparative analysis of genomic sequence data from closely related species to infer the history of species divergence and gene flow. Given a set of populations, hypotheses of species delimitation (and species phylogeny) may be formulated as instances of MSC models (e.g., MSC for one species versus MSC for two species) and compared using Bayesian model selection. This approach, implemented in the program bpp, has been found to be prone to over-splitting. Alternatively, heuristic criteria based on population parameters (such as population split times, population sizes, and migration rates) estimated from genomic data may be used to delimit species. Here we develop hierarchical merge and split algorithms for heuristic species delimitation based on the genealogical divergence index (gdi) and implement them in a Python pipeline called hhsd. We characterize the behavior of the gdi under a few simple scenarios of gene flow. We apply the new approaches to a dataset simulated under a model of isolation by distance as well as three empirical datasets. Our tests suggest that the new approaches produced sensible results and were less prone to over-splitting. We discuss possible strategies for accommodating paraphyletic species in the hierarchical algorithm, as well as the challenges of species delimitation based on heuristic criteria.
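The abstract does not define the gdi. As an illustration only, a commonly quoted closed form for the case without gene flow, assumed here, is gdi = 1 - exp(-2τ/θ): the probability that two sequences from population A coalesce before reaching the ancestor A shares with B. No simple formula is assumed for the migration case, and the hhsd pipeline itself is not reproduced. The 0.2 and 0.7 cut-offs in the hypothetical classify helper are rule-of-thumb values from the gdi literature, also assumed; they are not the hierarchical merge and split algorithm described above.

```python
import math


def gdi_no_migration(tau: float, theta: float) -> float:
    """gdi for population A under a two-population MSC model with no gene flow.

    Assumes tau (the A-B split time) and theta (4*N*mu for A) are both in
    expected substitutions per site, so 2*tau/theta is time in coalescent units.
    """
    if tau < 0 or theta <= 0:
        raise ValueError("need tau >= 0 and theta > 0")
    return 1.0 - math.exp(-2.0 * tau / theta)


def classify(gdi: float, lower: float = 0.2, upper: float = 0.7) -> str:
    """Rule-of-thumb reading of a gdi value (thresholds are assumptions)."""
    if gdi < lower:
        return "populations"
    if gdi > upper:
        return "species"
    return "ambiguous"
```

For example, a split time equal to θ gives gdi = 1 - exp(-2), about 0.86, which this rule would read as "species".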

5.

Benefits and Limits of Phasing Alleles for Network Inference of Allopolyploid Complexes.

Tiley, George P; Crowl, Andrew A; Manos, Paul S; Sessa, Emily B; Solís-Lemus, Claudia; Yoder, Anne D; Burleigh, J Gordon.

Syst Biol ; 73(4): 666-682, 2024 Oct 25.

Article in English | MEDLINE | ID: mdl-38733563

ABSTRACT

Accurately reconstructing the reticulate histories of polyploids remains a central challenge for understanding plant evolution. Although phylogenetic networks can provide insights into relationships among polyploid lineages, inferring networks may be hindered by the complexities of homology determination in polyploid taxa. We use simulations to show that phasing alleles from allopolyploid individuals can improve phylogenetic network inference under the multispecies coalescent by obtaining the true network with fewer loci compared with haplotype consensus sequences or sequences with heterozygous bases represented as ambiguity codes. Phased allelic data can also improve divergence time estimates for networks, which is helpful for evaluating allopolyploid speciation hypotheses and proposing mechanisms of speciation. To achieve these outcomes in empirical data, we present a novel pipeline that leverages a recently developed phasing algorithm to reliably phase alleles from polyploids. This pipeline is especially appropriate for target enrichment data, where the depth of coverage is typically high enough to phase entire loci. We provide an empirical example in the North American Dryopteris fern complex that demonstrates insights from phased data as well as the challenges of network inference. We establish that our pipeline (PATÉ: Phased Alleles from Target Enrichment data) is capable of recovering a high proportion of phased loci from both diploids and polyploids. These data may improve network estimates compared with using haplotype consensus assemblies by accurately inferring the direction of gene flow, but statistical nonidentifiability of phylogenetic networks poses a barrier to inferring the evolutionary history of reticulate complexes.

Subjects

Alleles, Phylogeny, Polyploidy, Classification/methods, Ferns/genetics, Ferns/classification, Computer Simulation, Algorithms, Genetic Models

6.

Likelihood-Based Tests of Species Tree Hypotheses.

Adams, Richard; DeGiorgio, Michael.

Mol Biol Evol ; 40(7)2023 07 05.

Article in English | MEDLINE | ID: mdl-37440530

ABSTRACT

Likelihood-based tests of phylogenetic trees are a foundation of modern systematics. Over the past decade, an enormous wealth and diversity of model-based approaches have been developed for phylogenetic inference of both gene trees and species trees. However, while many techniques exist for conducting formal likelihood-based tests of gene trees, such frameworks are comparatively underdeveloped and underutilized for testing species tree hypotheses. To date, widely used tests of tree topology are designed to assess the fit of classical models of molecular sequence data and individual gene trees and thus are not readily applicable to the problem of species tree inference. To address this issue, we derive several analogous likelihood-based approaches for testing topologies using modern species tree models and heuristic algorithms that use gene tree topologies as input for maximum likelihood estimation under the multispecies coalescent. For the purpose of comparing support for species trees, these tests leverage the statistical procedures of their original gene tree-based counterparts that have an extended history for testing phylogenetic hypotheses at a single locus. We discuss and demonstrate a number of applications, limitations, and important considerations of these tests using simulated and empirical phylogenomic data sets that include both bifurcating topologies and reticulate network models of species relationships. Finally, we introduce the open-source R package SpeciesTopoTestR (SpeciesTopology Tests in R) that includes a suite of functions for conducting formal likelihood-based tests of species topologies given a set of input gene tree topologies.

Subjects

Algorithms, Genetic Models, Phylogeny, Likelihood Functions
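The abstract does not name the gene-tree tests that were adapted. Assuming a Kishino-Hasegawa (KH)-style paired test is among them, a minimal normal-approximation sketch on per-locus log-likelihood differences between two candidate species trees follows. The inputs are hypothetical, and the function is not part of SpeciesTopoTestR.

```python
import math
from statistics import NormalDist, variance


def kh_test(per_locus_delta: list[float]) -> tuple[float, float]:
    """Normal-approximation KH-style paired test.

    per_locus_delta[i] = log L_i(tree 1) - log L_i(tree 2) for locus i
    (hypothetical inputs). Returns (z statistic, two-sided p-value).
    """
    n = len(per_locus_delta)
    if n < 2:
        raise ValueError("need at least two loci")
    total = sum(per_locus_delta)
    # Variance of the summed difference estimated as n times the sample variance.
    se = math.sqrt(n * variance(per_locus_delta))
    if se == 0:
        raise ValueError("per-locus differences have zero variance")
    z = total / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p
```

The classical KH test assumes that both trees were specified in advance rather than one being the maximum-likelihood estimate. The Shimodaira-Hasegawa and approximately unbiased tests were developed for that situation.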

7.

Inferring the Direction of Introgression Using Genomic Sequence Data.

Thawornwattana, Yuttapong; Huang, Jun; Flouri, Tomás; Mallet, James; Yang, Ziheng.

Mol Biol Evol ; 40(8)2023 08 03.

Article in English | MEDLINE | ID: mdl-37552932

ABSTRACT

Genomic data are informative about the history of species divergence and interspecific gene flow, including the direction, timing, and strength of gene flow. However, gene flow in opposite directions generates similar patterns in multilocus sequence data, such as reduced sequence divergence between the hybridizing species. As a result, inference of the direction of gene flow is challenging. Here, we investigate the information about the direction of gene flow present in genomic sequence data using likelihood-based methods under the multispecies-coalescent-with-introgression model. We analyze the case of two species, and use simulation to examine cases with three or four species. We find that it is easier to infer gene flow from a small population to a large one than in the opposite direction, and easier to infer inflow (gene flow from outgroup species to an ingroup species) than outflow (gene flow from an ingroup species to an outgroup species). It is also easier to infer gene flow if there is a longer time of separate evolution between the initial divergence and subsequent introgression. When introgression is assumed to occur in the wrong direction, the time of introgression tends to be correctly estimated and the Bayesian test of gene flow is often significant, while estimates of introgression probability can be even greater than the true probability. We analyze genomic sequences from Heliconius butterflies to demonstrate that typical genomic datasets are informative about the direction of interspecific gene flow, as well as its timing and strength.

Subjects

Butterflies, Animals, Likelihood Functions, Bayes Theorem, Butterflies/genetics, Genome, Genomics, Gene Flow, Phylogeny, Genetic Hybridization

8.

Identifiability of speciation times under the multispecies coalescent.

Kubatko, Laura; Leonard, Alexander; Chifman, Julia.

J Theor Biol ; 595: 111927, 2024 Aug 30.

Article in English | MEDLINE | ID: mdl-39216590

ABSTRACT

The advent of rapid and inexpensive sequencing technologies has necessitated the development of computationally efficient methods for analyzing sequence data for many genes simultaneously in a phylogenetic framework. The coalescent process is the most commonly used model for linking the underlying genealogies of individual genes with the global species-level phylogeny, but inference under the coalescent model is computationally daunting in the typical inference frameworks (e.g., the likelihood and Bayesian frameworks) due to the dimensionality of the space of both gene trees and species trees. Here we consider estimation of the branch lengths in fixed species trees with three or four taxa, and show that these branch lengths are identifiable. We also show that for three and four taxa simple estimators for the branch lengths can be derived based on observed site pattern frequencies. Properties of these estimators, such as their asymptotic variances and large-sample distributions, are examined, and performance of the estimators is assessed using simulation. Finally, we use these estimators to develop a hypothesis test that can be used to delimit species under the coalescent model for three or four putative taxa.

9.

Lineage diversification and rampant hybridization among subspecies explain taxonomic confusion in the endemic Hawaiian fern Polypodium pellucidum.

Mendez-Reneau, Jonas I; Richards, Joseph L; Hobbie, Julia; Bollich, Emily; Kooyers, Nicholas J; Sigel, Erin M.

Am J Bot ; : e16379, 2024 Jul 30.

Article in English | MEDLINE | ID: mdl-39081002

ABSTRACT

PREMISE: Polypodium pellucidum, a fern endemic to the Hawaiian Islands, encompasses five ecologically and morphologically variable subspecies, suggesting a complex history involving both rapid divergence and rampant hybridization. METHODS: We employed a large target-capture data set to investigate the evolution of genetic, morphological, and ecological variation in P. pellucidum. With a broad sampling across five Hawaiian Islands, we deciphered the evolutionary history of P. pellucidum, identified nonhybrid lineages and intraspecific hybrids, and inferred the relative influence of geography and ecology on their distributions. RESULTS: Polypodium pellucidum is monophyletic, dispersing to the Hawaiian archipelago 11.53-7.77 Ma and diversifying into extant clades between 5.66 and 4.73 Ma. We identified four nonhybrid clades with unique morphologies, ecological niches, and distributions. Additionally, we elucidated several intraspecific hybrid combinations and evidence for undiscovered or extinct "ghost" lineages contributing to extant hybrid populations. CONCLUSIONS: We provide a foundation for revising the taxonomy of P. pellucidum to account for cryptic lineages and intraspecific hybrids. Geologic succession of the Hawaiian Islands through cycles of volcanism, vegetative succession, and erosion has determined the available habitats and distribution of ecologically specific, divergent clades within P. pellucidum. Intraspecific hybrids have likely arisen due to ecological and/or geological transitions, often persisting after the local extinction of their progenitors. This research contributes to our understanding of the evolution of Hawai'i's diverse fern flora and illuminates cryptic taxa, allowing better-informed conservation efforts.

10.

Estimation of Cross-Species Introgression Rates Using Genomic Data Despite Model Unidentifiability.

Yang, Ziheng; Flouri, Tomás.

Mol Biol Evol ; 39(5)2022 05 03.

Article in English | MEDLINE | ID: mdl-35417543

ABSTRACT

Full-likelihood implementations of the multispecies coalescent with introgression (MSci) model treat genealogical fluctuations across the genome as a major source of information to infer the history of species divergence and gene flow using multilocus sequence data. However, MSci models are known to have unidentifiability issues, whereby different models or parameters make the same predictions about the data and cannot be distinguished by the data. Previous studies of unidentifiability have focused on heuristic methods based on gene trees and do not make efficient use of the information in the data. Here we study the unidentifiability of MSci models under the full-likelihood methods. We characterize the unidentifiability of the bidirectional introgression (BDI) model, which assumes that gene flow occurs in both directions. We derive simple rules for arbitrary BDI models, which create unidentifiability of the label-switching type. In general, an MSci model with k BDI events has 2^k unidentifiable modes or towers in the posterior, with each BDI event between sister species creating within-model parameter unidentifiability and each BDI event between nonsister species creating between-model unidentifiability. We develop novel algorithms for processing Markov chain Monte Carlo samples to remove label-switching problems and implement them in the bpp program. We analyze real and synthetic data to illustrate the utility of the BDI models and the new algorithms. We discuss the unidentifiability of heuristic methods and provide guidelines for the use of MSci models to infer gene flow using genomic data.

Subjects

Gene Flow, Genomics, Algorithms, Genomics/methods, Genetic Models, Phylogeny
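The 2^k count follows because each BDI event contributes an independent two-way labeling choice. A minimal sketch that enumerates those combinations, assuming exactly two indistinguishable labelings per event; the event names are hypothetical placeholders, and this is not bpp's MCMC post-processing algorithm.

```python
from itertools import product


def enumerate_modes(bdi_events: list[str]) -> list[dict[str, str]]:
    """List every combination of per-event labelings.

    Each BDI event is assumed to admit exactly two indistinguishable
    labelings ("as-labeled" and "swapped"), so k events give 2**k modes.
    """
    choices = ("as-labeled", "swapped")
    return [
        dict(zip(bdi_events, combo))
        for combo in product(choices, repeat=len(bdi_events))
    ]
```

With three hypothetical events such as "A<->B", "C<->D", and "E<->F", the function returns 2^3 = 8 label combinations. A relabeling step of the kind the abstract describes would map each MCMC sample onto one chosen combination.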

11.

Bayesian Phylogenetic Inference using Relaxed-clocks and the Multispecies Coalescent.

Flouri, Tomás; Huang, Jun; Jiao, Xiyun; Kapli, Paschalia; Rannala, Bruce; Yang, Ziheng.

Mol Biol Evol ; 39(8)2022 08 03.

Article in English | MEDLINE | ID: mdl-35907248

ABSTRACT

The multispecies coalescent (MSC) model accommodates both species divergences and within-species coalescence and provides a natural framework for phylogenetic analysis of genomic data when the gene trees vary across the genome. The MSC model implemented in the program bpp assumes a molecular clock and the Jukes-Cantor model, and is suitable for analyzing genomic data from closely related species. Here we extend our implementation to more general substitution models and relaxed clocks to allow the rate to vary among species. The MSC-with-relaxed-clock model allows the estimation of species divergence times and ancestral population sizes using genomic sequences sampled from contemporary species when the strict clock assumption is violated, and provides a simulation framework for evaluating species tree estimation methods. We conducted simulations and analyzed two real datasets to evaluate the utility of the new models. We confirm that the clock-JC model is adequate for inference of shallow trees with closely related species, but it is important to account for clock violation for distant species. Our simulation suggests that there is valuable phylogenetic information in the gene-tree branch lengths even if the molecular clock assumption is seriously violated, and the relaxed-clock models implemented in bpp are able to extract such information. Our Markov chain Monte Carlo algorithms suffer from mixing problems when used for species tree estimation under the relaxed clock and we discuss possible improvements. We conclude that the new models are currently most effective for estimating population parameters such as species divergence times when the species tree is fixed.

Subjects

Genetic Models, Bayes Theorem, Computer Simulation, Markov Chains, Monte Carlo Method, Phylogeny

12.

Inference of Gene Flow between Species under Misspecified Models.

Huang, Jun; Thawornwattana, Yuttapong; Flouri, Tomás; Mallet, James; Yang, Ziheng.

Mol Biol Evol ; 39(12)2022 12 05.

Article in English | MEDLINE | ID: mdl-36317198

ABSTRACT

Genomic sequence data provide a rich source of information about the history of species divergence and interspecific hybridization or introgression. Despite recent advances in genomics and statistical methods, it remains challenging to infer gene flow, and as a result, one may have to estimate introgression rates and times under misspecified models. Here we use mathematical analysis and computer simulation to examine estimation bias and issues of interpretation when the model of gene flow is misspecified in analysis of genomic datasets, for example, if introgression is assigned to the wrong lineages. In the case of two species, we establish a correspondence between the migration rate in the continuous migration model and the introgression probability in the introgression model. When gene flow occurs continuously through time but in the analysis is assumed to occur at a fixed time point, common evolutionary parameters such as species divergence times are surprisingly well estimated. However, the time of introgression tends to be estimated towards the recent end of the period of continuous gene flow. When introgression events are assigned incorrectly to the parental or daughter lineages, introgression times tend to collapse onto species divergence times, with introgression probabilities underestimated. Overall, our analyses suggest that the simple introgression model is useful for extracting information concerning interspecific gene flow and divergence even when the model may be misspecified. However, for reliable inference of gene flow it is important to include multiple samples per species, in particular, from hybridizing species.

Subjects

Gene Flow, Genomics, Computer Simulation

13.

Molecular Clocks without Rocks: New Solutions for Old Problems.

Tiley, George P; Poelstra, Jelmer W; Dos Reis, Mario; Yang, Ziheng; Yoder, Anne D.

Trends Genet ; 36(11): 845-856, 2020 11.

Article in English | MEDLINE | ID: mdl-32709458

ABSTRACT

Molecular data have been used to date species divergences ever since they were described as documents of evolutionary history in the 1960s. Yet, an inadequate fossil record and discordance between gene trees and species trees are persistently problematic. We examine how, by accommodating gene tree discordance and by scaling branch lengths to absolute time using mutation rate and generation time, multispecies coalescent (MSC) methods can potentially overcome these challenges. We find that time estimates can differ, in some cases substantially, depending on whether MSC methods or traditional phylogenetic methods that apply concatenation are used, and whether the tree is calibrated with pedigree-based mutation rates or with fossils. We discuss the advantages and shortcomings of both approaches and provide practical guidance for data analysis when using these methods.

Subjects

Biological Evolution, Fossils, Mammals/classification, Mammals/genetics, Theoretical Models, Mutation Rate, Phylogeny, Animals, Gene Flow, Genetic Models
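The rescaling mentioned in the abstract, from MSC parameters in substitutions per site to absolute time and population size, can be sketched as below. The sketch assumes a per-site, per-generation mutation rate (for example a pedigree-based estimate), a fixed generation time, and θ = 4Nμ for diploids. The function names and numbers are illustrative and do not come from the article.

```python
def divergence_time_years(tau: float, mu_per_generation: float,
                          generation_time_years: float) -> float:
    """Convert an MSC divergence time tau (expected substitutions per site)
    into years, assuming a per-site, per-generation mutation rate and a
    fixed generation time."""
    if mu_per_generation <= 0 or generation_time_years <= 0:
        raise ValueError("rate and generation time must be positive")
    generations = tau / mu_per_generation
    return generations * generation_time_years


def effective_population_size(theta: float, mu_per_generation: float) -> float:
    """Convert theta = 4*N*mu (diploid) into an effective population size N."""
    if mu_per_generation <= 0:
        raise ValueError("rate must be positive")
    return theta / (4.0 * mu_per_generation)
```

For example, τ = 0.005 with μ = 1e-8 and a 10-year generation time gives about 5 million years, and θ = 0.004 gives N of about 100,000. Swapping a fossil-based rate for a pedigree-based one changes only μ, so the estimated times scale inversely with the assumed rate.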

14.

Gene flow assessment helps to distinguish strong genomic structure from speciation in an Iberian ant-eating spider.

Ortiz, David; Pekár, Stano; Bryjová, Anna.

Mol Phylogenet Evol ; 180: 107682, 2023 03.

Article in English | MEDLINE | ID: mdl-36574825

ABSTRACT

Although genomic data is boosting our understanding of evolution, we still lack a solid framework to perform reliable genome-based species delineation. This problem is especially critical in the case of phylogeographically structured organisms, with allopatric populations showing similar divergence patterns as species. Here, we assess the species limits and phylogeography of Zodarion alacre, an ant-eating spider widely distributed across the Iberian Peninsula. We first performed species delimitation based on genome-wide data and then validated these results using additional evidence. A commonly employed species delimitation strategy detected four distinct lineages with almost no admixture, which present allopatric distributions. These lineages showed ecological differentiation but no clear morphological differentiation, and evidence of introgression in a mitochondrial barcode. Phylogenomic networks found evidence of substantial gene flow between lineages. Finally, phylogeographic methods highlighted remarkable isolation by distance and detected evidence of range expansion from south-central Portugal to central-north Spain. We conclude that despite their deep genomic differentiation, the lineages of Z. alacre do not show evidence of complete speciation. Our results likely shed light on why Zodarion is among the most diversified spider genera despite its limited distribution and support the use of gene flow evidence to inform species boundaries.

Subjects

Gene Flow, Spiders, Animals, Phylogeny, Genetic Speciation, Spiders/genetics, DNA Sequence Analysis, Phylogeography, Genomics, Mitochondrial DNA/genetics

15.

Comparing inference under the multispecies coalescent with and without recombination.

Yan, Zhi; Ogilvie, Huw A; Nakhleh, Luay.

Mol Phylogenet Evol ; 181: 107724, 2023 04.

Article in English | MEDLINE | ID: mdl-36720421

ABSTRACT

Accurate inference of population parameters plays a pivotal role in unravelling evolutionary histories. While recombination has been universally accepted as a fundamental process in the evolution of sexually reproducing organisms, it remains challenging to model it exactly. Thus, existing coalescent-based approaches make different assumptions or approximations to facilitate phylogenetic inference, which can potentially bring about biases in estimates of evolutionary parameters when recombination is present. In this article, we evaluate the performance of population parameter estimation using three methods (StarBEAST2, SNAPP, and diCal2) that represent three different types of inference. We performed whole-genome simulations in which recombination rates, mutation rates, and levels of incomplete lineage sorting were varied. We show that StarBEAST2 using short or medium-sized loci is robust to realistic rates of recombination, which is in agreement with previous studies. SNAPP, as expected, is generally unaffected by recombination events. Most surprisingly, diCal2, a method that is designed to explicitly account for recombination, performs considerably worse than other methods under comparison.

Subjects

Genome, Mutation Rate, Phylogeny, Genetic Recombination, Genetic Models, Computer Simulation

16.

Interspecific gene flow obscures phylogenetic relationships in an important insect pest species complex.

San Jose, Michael; Doorenweerd, Camiel; Geib, Scott; Barr, Norman; Dupuis, Julian R; Leblanc, Luc; Kauwe, Angela; Morris, Kimberley Y; Rubinoff, Daniel.

Mol Phylogenet Evol ; 188: 107892, 2023 11.

Article in English | MEDLINE | ID: mdl-37524217

ABSTRACT

As genomic data proliferates, the prevalence of post-speciation gene flow is making species boundaries and relationships increasingly ambiguous. Although current approaches inferring fully bifurcating phylogenies based on concatenated datasets provide simple and robust answers to many species relationships, they may be inaccurate because the models ignore inter-specific gene flow and incomplete lineage sorting. To examine the potential error resulting from ignoring gene flow, we generated both a RAD-seq and a 500 protein-coding loci highly multiplexed amplicon (HiMAP) dataset for a monophyletic group of 12 species defined as the Bactrocera dorsalis sensu lato clade. With some of the world's worst agricultural pests, the taxonomy of the B. dorsalis s.l. clade is important for trade and quarantines. However, taxonomic confusion confounds resolution due to intra- and interspecific phenotypic variation and convergence, mitochondrial introgression across half of the species, and viable hybrids. We compared the topological convergence of our datasets using concatenated phylogenetic and various multispecies coalescent approaches, some of which account for gene flow. All analyses agreed on species delimitation, but there was incongruence between species relationships. Under concatenation, both datasets suggest identical species relationships with mostly high statistical support. However, multispecies coalescent and multispecies network approaches suggest markedly different hypotheses and detected significant gene flow. We suggest that the network approaches are likely more accurate because gene flow violates the assumptions of the concatenated phylogenetic analyses, but the data-reductive requirements of network approaches resulted in reduced statistical support and could not unambiguously resolve gene flow directions. 
Our study highlights the importance of testing for gene flow, particularly with phylogenomic datasets, even when concatenated approaches receive high statistical support.

Subjects

Gene Flow, Genomics, Animals, Phylogeny, Genome, Insects/genetics

17.

Complexity of the simplest species tree problem.

Zhu, Tianqi; Yang, Ziheng.

Mol Biol Evol ; 38(9): 3993-4009, 2021 08 23.

Article in English | MEDLINE | ID: mdl-33492385

ABSTRACT

The multispecies coalescent model provides a natural framework for species tree estimation accounting for gene-tree conflicts. Although a number of species tree methods under the multispecies coalescent have been suggested and evaluated using simulation, their statistical properties remain poorly understood. Here, we use mathematical analysis aided by computer simulation to examine the identifiability, consistency, and efficiency of different species tree methods in the case of three species and three sequences under the molecular clock. We consider four major species-tree methods including concatenation, two-step, independent-sites maximum likelihood, and maximum likelihood. We develop approximations that predict that the probit transform of the species tree estimation error decreases linearly with the square root of the number of loci. Even in this simplest case, major differences exist among the methods. Full-likelihood methods are considerably more efficient than summary methods such as concatenation and two-step. They also provide estimates of important parameters such as species divergence times and ancestral population sizes, whereas these parameters are not identifiable by summary methods. Our results highlight the need to improve the statistical efficiency of summary methods and the computational efficiency of full likelihood methods of species tree estimation.

Subjects

Genetic Models, Computer Simulation, Phylogeny, Population Density, Probability
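The approximation stated in the abstract, that the probit of the species-tree error falls linearly with the square root of the number of loci L, can be written as Φ^{-1}(P_err) ≈ a - b·√L. A minimal sketch for fitting such a line to one's own simulation results by ordinary least squares, and then reading off predicted errors, follows. The coefficients a and b are placeholders to be estimated and are not values from the paper. The helper names are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def fit_probit_sqrt_loci(loci: list[int], errors: list[float]) -> tuple[float, float]:
    """Least-squares fit of probit(error) = a - b * sqrt(L); returns (a, b).

    Each error must lie strictly between 0 and 1 for the probit to exist.
    """
    xs = [math.sqrt(n) for n in loci]
    ys = [_STD_NORMAL.inv_cdf(e) for e in errors]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, -slope  # intercept a, and b = -slope


def predicted_error(num_loci: int, a: float, b: float) -> float:
    """Error probability implied by the fitted line at a given number of loci."""
    return _STD_NORMAL.cdf(a - b * math.sqrt(num_loci))
```

Working on the probit scale turns a curved error-versus-loci relationship into a straight line. This makes it simple to compare methods by their slopes b, where a steeper slope means the error shrinks faster as loci are added.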

18.

A simulation study to examine the impact of recombination on phylogenomic inferences under the multispecies coalescent model.

Zhu, Tianqi; Flouri, Tomás; Yang, Ziheng.

Mol Ecol ; 31(10): 2814-2829, 2022 05.

Article in English | MEDLINE | ID: mdl-35313033

Abstract

Phylogenomic analyses under the multispecies coalescent model assume no recombination within a locus and free recombination among loci. Yet, in real data sets intralocus recombination causes different sites of the same locus to have different genealogical histories so that the model is misspecified. The impact of recombination on various coalescent-based phylogenomic analyses has not been systematically examined. Here, we conduct a computer simulation to examine the impact of recombination on several Bayesian analyses of multilocus sequence data, including species tree estimation, species delimitation (by Bayesian selection of delimitation models) and estimation of evolutionary parameters such as species divergence and introgression times, population sizes for modern and extinct species, and cross-species introgression probabilities. We found that recombination, at rates comparable to estimates for humans, has little impact on coalescent-based species tree estimation, species delimitation and estimation of population parameters. At rates 10 times higher than the human rate, recombination may affect parameter estimation, causing positive biases in introgression times and ancestral population sizes, although species divergence times and cross-species introgression probabilities are estimated with little bias. Overall, the simulation suggests that phylogenomic inferences under the multispecies coalescent model are robust to realistic amounts of intralocus recombination.

Subjects

Genetic Models , Genetic Recombination , Bayes Theorem , Computer Simulation , Humans , Phylogeny , Genetic Recombination/genetics

19.

Gauging ages of tiger swallowtail butterflies using alternate SNP analyses.

Vernygora, Oksana V; Campbell, Erin O; Grishin, Nick V; Sperling, Felix A H; Dupuis, Julian R.

Mol Phylogenet Evol ; 171: 107465, 2022 06.

Article in English | MEDLINE | ID: mdl-35351633

Abstract

Divergence times underpin diverse evolutionary hypotheses, but conflicting age estimates across studies diminish the validity of such hypotheses. These conflicts have continued to grow as large genomics datasets become commonplace and analytical approaches proliferate. To provide more stable temporal intervals, age estimations should be interpreted in the context of both the type of data and analysis being used. Here, we use multispecies coalescent (MSC), concatenation-based, and categorical data transformation approaches on genome-wide SNP data to infer divergence ages within the Papilio glaucus group of tiger swallowtail butterflies in North America. While the SNP data supported previously recognized relationships within the group (P. multicaudata, ((P. eurymedon, P. rutulus), (P. appalachiensis, P. canadensis, P. glaucus))), estimated ages of divergence between the major lineages varied substantially among analyses. MSC produced wide credibility intervals, particularly for deeper nodes, reflecting uncertainty in the coalescence times as a possible result of conflicting signal across gene trees. Concatenation, in contrast, gave narrower and better-defined posterior distributions for the node ages; however, the higher precision of these time estimates is likely an artefact of the more simplistic underlying assumptions of this approach, which do not account for conflict among gene trees. Transformed categorical data analysis gave the least precise and the most variable results, with its simple substitution model coupled with a relaxed clock tending to produce spurious results from large genome-wide datasets. While median node ages differed considerably between analyses (~2 Mya between MSC and concatenation-based results), their corresponding credibility intervals nonetheless highlight common temporal patterns for deeper divergences in the group as well as finer-scale phylogeography.
Age distributions across analyses support an origin of the group during the warm period of the early to mid-Pliocene. Late Pliocene climate aridification and cooling drove divergence between eastern and western groups that further diversified during the period of repeated Pleistocene glaciations. Our results provide a structured comparative assessment of divergence time estimates and evolutionary relationships in a well-studied group of butterflies, and support better understanding of analytical biases in divergence time estimation.

Subjects

Butterflies , Animals , Biological Evolution , Butterflies/genetics , Genome , Phylogeny , Phylogeography

20.

Phylogenetics in space: How continuous spatial structure impacts tree inference.

Hancock, Zachary B; Lehmberg, Emma S; Blackmon, Heath.

Mol Phylogenet Evol ; 173: 107505, 2022 08.

Article in English | MEDLINE | ID: mdl-35577296

Abstract

The tendency to discretize biology permeates taxonomy and systematics, leading to models that simplify the often continuous nature of populations. Even when the assumption of panmixia is relaxed, most models still assume some degree of discrete structure. The multispecies coalescent has emerged as a powerful model in phylogenetics, but in its common implementation it is entirely space-independent, a gap we call the "missing z-axis". In this article, we review the many lines of evidence for how continuous spatial structure can impact phylogenetic inference. We illustrate and expand on these by using complex continuous-space demographic models that include distinct modes of speciation. We find that the impact of spatial structure permeates all aspects of phylogenetic inference, including gene tree stoichiometry, topological and branch-length variance, network estimation, and species delimitation. We conclude by using our results to suggest how researchers can identify spatial structure in phylogenetic datasets.

Subjects

Genetic Models , Phylogeny
