Results 1 - 20 of 312
1.
ISME J ; 2024 Jul 13.
Article in English | MEDLINE | ID: mdl-39001714

ABSTRACT

In recent years, phylogenetic reconciliation has emerged as a promising approach for studying microbial ecology and evolution. The core idea is to model how gene trees evolve along a species tree, and to explain differences between them via evolutionary events including gene duplications, transfers, and losses. Here, we describe how phylogenetic reconciliation provides a natural framework for studying genome evolution, and highlight recent applications including ancestral gene content inference, the rooting of species trees, and the insights into metabolic evolution and ecological transitions they yield. Reconciliation analyses have elucidated the evolution of diverse microbial lineages, from Chlamydiae to Asgard archaea, shedding light on ecological adaptation, host-microbe interactions, and symbiotic relationships. However, there are many opportunities for broader application of the approach in microbiology. Continuing improvements to make reconciliation models more realistic and scalable, and integration of ecological metadata such as habitat, pH, temperature and oxygen use, offer enormous potential for understanding the rich tapestry of microbial life.
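
The core idea of reconciliation described above can be illustrated with a minimal sketch. It is not the method of the reviewed work: it assumes the simplest duplication-only setting (no transfers, losses not counted), uses the classic last-common-ancestor (LCA) mapping, and represents trees as nested tuples whose leaves are species names. The function names `clades`, `lca_clade`, and `count_duplications` are hypothetical.

```python
from typing import FrozenSet, List, Tuple, Union

# A tree is either a leaf (species name) or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def clades(tree: Tree) -> List[FrozenSet[str]]:
    """List every clade (set of leaf names) of a tree; the last entry is the root clade."""
    if isinstance(tree, str):
        return [frozenset([tree])]
    left, right = tree
    left_clades, right_clades = clades(left), clades(right)
    return left_clades + right_clades + [left_clades[-1] | right_clades[-1]]


def lca_clade(species_clades: List[FrozenSet[str]], species: FrozenSet[str]) -> FrozenSet[str]:
    """Smallest species-tree clade containing all given species (their LCA)."""
    return min((c for c in species_clades if species <= c), key=len)


def count_duplications(gene_tree: Tree, species_tree: Tree) -> int:
    """Count gene-tree nodes that the LCA mapping labels as duplications.

    A gene node is a duplication if it maps to the same species-tree node
    as at least one of its children; otherwise it is a speciation.
    """
    species_clades = clades(species_tree)
    duplications = 0

    def visit(node: Tree) -> FrozenSet[str]:
        nonlocal duplications
        if isinstance(node, str):
            return lca_clade(species_clades, frozenset([node]))
        left_map, right_map = visit(node[0]), visit(node[1])
        node_map = lca_clade(species_clades, left_map | right_map)
        if node_map in (left_map, right_map):
            duplications += 1
        return node_map

    visit(gene_tree)
    return duplications


species_tree = (("A", "B"), "C")
congruent_gene_tree = (("A", "B"), "C")
discordant_gene_tree = (("A", "C"), "B")
print(count_duplications(congruent_gene_tree, species_tree))
print(count_duplications(discordant_gene_tree, species_tree))
```

In this restricted model a discordant gene tree can only be explained by invoking a duplication; models that also allow transfers, as discussed in the abstract, can explain the same discordance differently.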

2.
Syst Biol ; 2024 Jun 28.
Article in English | MEDLINE | ID: mdl-38940001

ABSTRACT

Maximum likelihood (ML) phylogenetic inference is widely used in phylogenomics. As heuristic searches most likely find suboptimal trees, it is recommended to conduct multiple (e.g., ten) tree searches in phylogenetic analyses. However, beyond its positive role, how and to what extent multiple tree searches aid ML phylogenetic inference remains poorly explored. Here, we found that a random starting tree was not as effective as the BioNJ and parsimony starting trees in inferring ML gene trees and that RAxML-NG and PhyML were less sensitive to different starting trees than IQ-TREE. We then examined the effect of the number of tree searches on ML tree inference with IQ-TREE and RAxML-NG, by running 100 tree searches on 19,414 gene alignments from 15 animal, plant, and fungal phylogenomic datasets. We found that the number of tree searches substantially impacted the recovery of the best-of-100 ML gene tree topology among 100 searches for a given ML program. In addition, all of the concatenation-based trees were topologically identical if the number of tree searches was ≥ 10. Quartet-based ASTRAL trees inferred from 1 to 80 tree searches differed topologically from those inferred from 100 tree searches for 6/15 phylogenomic datasets. Lastly, our simulations showed that gene alignments with lower difficulty scores had a higher chance of finding the best-of-100 gene tree topology and were more likely to yield the correct trees.
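
The notion of recovering the best-of-N topology can be made concrete with a small sketch. It is not the authors' pipeline: it assumes each search result is already available as a (log-likelihood, topology) pair, with the topology encoded as a frozenset of bipartition labels, and it adds the simplifying assumption that searches succeed independently. The names `recovery_rate` and `chance_of_recovery` are hypothetical.

```python
from typing import FrozenSet, List, Tuple

Topology = FrozenSet[str]            # set of bipartition labels, e.g. {"AB|CDE", "ABC|DE"}
SearchResult = Tuple[float, Topology]


def recovery_rate(searches: List[SearchResult]) -> Tuple[Topology, float]:
    """Return the topology of the highest-likelihood search and the
    fraction of all searches that found that same topology."""
    _, best_topology = max(searches, key=lambda result: result[0])
    hits = sum(1 for _, topology in searches if topology == best_topology)
    return best_topology, hits / len(searches)


def chance_of_recovery(rate: float, k: int) -> float:
    """Probability that at least one of k searches finds the best topology,
    assuming (for illustration only) that searches are independent."""
    return 1.0 - (1.0 - rate) ** k


t1 = frozenset({"AB|CDE", "ABC|DE"})
t2 = frozenset({"AB|CDE", "ABD|CE"})
t3 = frozenset({"AC|BDE", "ACD|BE"})
searches = [(-100.0, t1), (-101.5, t2), (-100.0, t1), (-103.0, t3)]

best, rate = recovery_rate(searches)
print(rate, chance_of_recovery(rate, 2))
```

Under the independence assumption, a low per-search recovery rate translates into a need for many searches, which is one way to read the dependence on search number reported in the abstract.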

3.
bioRxiv ; 2024 Jun 02.
Article in English | MEDLINE | ID: mdl-38854132

ABSTRACT

Ciliates are single-celled microbial eukaryotes that diverged from other eukaryotic lineages over a billion years ago. The extensive evolutionary timespan of ciliates has led to enormous genetic and phenotypic changes, contributing significantly to their high level of diversity. Recent analyses based on molecular data have revealed numerous cases of cryptic species complexes in different ciliate lineages, demonstrating the need for a robust approach to delimit species boundaries and elucidate phylogenetic relationships. Heterotrich ciliate species of the genus Spirostomum are abundant in freshwater and brackish environments and are commonly used as biological indicators for assessing water quality. However, some Spirostomum species are difficult to identify due to a lack of distinguishable morphological characteristics, and the existence of cryptic species in this genus remains largely unexplored. Previous phylogenetic studies have focused on only a few loci, namely the ribosomal RNA genes, alpha-tubulin, and mitochondrial CO1. In this study, we obtained single-cell transcriptomes of 25 Spirostomum species populations (representing six morphospecies) sampled from South Korea and the USA, and used concatenation- and coalescent-based methods for species tree inference and delimitation. Phylogenomic analysis of 37 Spirostomum populations and 265 protein-coding genes provided robust insight into the evolutionary relationships among Spirostomum species and confirmed that species with moniliform and compact macronucleus each form a distinct monophyletic lineage. Furthermore, the multispecies coalescent (MSC) model suggests that there are at least nine cryptic species in the Spirostomum genus: three in S. minus and two each in S. ambiguum, S. subtilis, and S. teres. Overall, our fine sampling of closely related Spirostomum populations and wide scRNA-seq allowed us to demonstrate the hidden crypticity of species within the genus Spirostomum, and to resolve and provide much stronger support than hitherto to the phylogeny of this important ciliate genus.

4.
Mol Biol Evol ; 2024 Jun 08.
Article in English | MEDLINE | ID: mdl-38850168

ABSTRACT

We developed phyloBARCODER (https://github.com/jun-inoue/phyloBARCODER), a new web tool that can identify short DNA sequences to the species level using metabarcoding. phyloBARCODER estimates phylogenetic trees based on uploaded anonymous DNA sequences and reference sequences from databases. Without such phylogenetic contexts, alternative, similarity-based methods independently identify species names and anonymous sequences of the same group by pairwise comparisons between queries and database sequences, with the caveat that they must match exactly or very closely. By putting metabarcoding sequences into a phylogenetic context, phyloBARCODER accurately identifies (1) species or classification of query sequences and (2) anonymous sequences associated with the same species or even with populations of query sequences, with clear and accurate explanations. Version 1 of phyloBARCODER stores a database comprising all eukaryotic mitochondrial gene sequences. Moreover, by uploading their own databases, phyloBARCODER users can conduct species identification specialized for sequences obtained from a local geographic region or those of non-mitochondrial genes, e.g., ITS or rbcL.

5.
Mol Phylogenet Evol ; 197: 108111, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38801965

ABSTRACT

Swallows (Hirundinidae) are a globally distributed family of passerine birds that exhibit remarkable similarity in body shape but tremendous variation in plumage, sociality, nesting behavior, and migratory strategies. As a result, swallow species have become models for empirical behavioral ecology and evolutionary studies, and variation across the Hirundinidae presents an excellent opportunity for comparative analyses of trait evolution. Exploiting this potential requires a comprehensive and well-resolved phylogenetic tree of the family. To address this need, we estimated swallow phylogeny using genetic data from thousands of ultraconserved element (UCE) loci sampled from nearly all recognized swallow species. Maximum likelihood, coalescent-based, and Bayesian approaches yielded a well-resolved phylogenetic tree to the generic level, with minor disagreement among inferences at the species level, which likely reflects ongoing population genetic processes. The UCE data were particularly useful in helping to resolve deep nodes, which previously confounded phylogenetic reconstruction efforts. Divergence time estimates from the improved swallow tree support a Miocene origin of the family, roughly 13 million years ago, with subsequent diversification of major groups in the late Miocene and Pliocene. Our estimates of historical biogeography support the hypothesis that swallows originated in the Afrotropics and have subsequently expanded across the globe, with major in situ diversification in Africa and a secondary major radiation following colonization of the Neotropics. Initial examination of nesting and sociality indicates that the origin of mud nesting - a relatively rare nest construction phenotype in birds - was a major innovation coincident with the origin of a clade giving rise to over 40% of extant swallow diversity. In contrast, transitions between social and solitary nesting appear less important for explaining patterns of diversification among swallows.


Subject(s)
Bayes Theorem , Phylogeny , Phylogeography , Swallows , Animals , Swallows/genetics , Swallows/classification , Likelihood Functions , Models, Genetic , Sequence Analysis, DNA , Evolution, Molecular
6.
Mol Phylogenet Evol ; 195: 108057, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38471598

ABSTRACT

Previous efforts to reconstruct the evolutionary history of Palearctic ground squirrels within the genus Spermophilus have primarily relied on a single mitochondrial marker for phylogenetic data. In this study, we present the first phylogeny with comprehensive taxon sampling of Spermophilus via a conventional multilocus approach utilizing five mitochondrial and five nuclear markers. Through application of the multispecies coalescent model, we constructed a species tree revealing four distinct clades that diverged during the Late Miocene. These clades are 1) S. alaschanicus and S. dauricus from East Asia; 2) S. musicus and S. pygmaeus from East Europe and northwestern Central Asia; 3) the subgenus Colobotis found across Central Asia and its adjacent regions and encompassing S. brevicauda, S. erythrogenys, S. fulvus, S. major, S. pallidicauda, S. ralli, S. relictus, S. selevini, and S. vorontsovi sp. nov.; and 4) a Central/Eastern Europe and Asia Minor clade comprising S. citellus, S. taurensis, S. xanthoprymnus, S. suslicus, and S. odessanus. The latter clade lacked strong support owing to uncertainty of taxonomic placement of S. odessanus and S. suslicus. Resolving relationships within the subgenus Colobotis, which radiated rapidly, remains challenging, likely because of incomplete lineage sorting and introgressive hybridization. Most modern Spermophilus species diversified during the Early-Middle Pleistocene (2.2-1.0 million years ago). We propose a revised taxonomic classification for the genus Spermophilus by recognizing 18 species including a newly identified one (S. vorontsovi sp. nov.), which is found only in a limited area in the southeast of West Siberia. Employing genome-wide single-nucleotide polymorphism genotyping, we substantiated the role of the Ob River as a major barrier ensuring robust isolation of this taxon from S. erythrogenys. Despite its inherent limitations, the traditional multilocus approach remains a valuable tool for resolving relationships and can provide important insights into otherwise poorly understood groups. It is imperative to recognize that additional efforts are needed to definitively determine phylogenetic relationships between certain species of Palearctic ground squirrels.


Subject(s)
Genetic Introgression , Sciuridae , Animals , Siberia , Phylogeny , Sciuridae/genetics , Asia
7.
Syst Biol ; 2024 Feb 08.
Article in English | MEDLINE | ID: mdl-38330161

ABSTRACT

The evolution of gene families is complex, involving gene-level evolutionary events such as gene duplication, horizontal gene transfer, and gene loss (DTL), and other processes such as incomplete lineage sorting (ILS). Because of this, topological differences often exist between gene trees and species trees. A number of models have been recently developed to explain these discrepancies, the most realistic of which attempt to consider both gene-level events and ILS. When unified in a single model, the interaction between ILS and gene-level events can cause polymorphism in gene copy number, which we refer to as copy number hemiplasy (CNH). In this paper we extend the Wright-Fisher process to include duplications and losses over several species, and show that the probability of CNH for this process can be significant. We study how well two unified models - MLMSC (MultiLocus MultiSpecies Coalescent), which models CNH, and DLCoal (Duplication, Loss, and Coalescence), which does not - approximate the Wright-Fisher process with duplication and loss. We then study the effect of CNH on gene family evolution by comparing MLMSC and DLCoal. We generate comparable gene trees under both models, showing significant differences in various summary statistics; most importantly, CNH reduces the number of gene copies greatly. If this is not taken into account, the traditional method of estimating duplication rates (by counting the number of gene copies) becomes inaccurate. The simulated gene trees are also used for species tree inference with the summary methods ASTRAL and ASTRAL-Pro, demonstrating that their accuracy, based on CNH-unaware simulations calibrated on real data, may have been overestimated.

8.
Stat Appl Genet Mol Biol ; 23(1)2024 Jan 01.
Article in English | MEDLINE | ID: mdl-38366619

ABSTRACT

Methods based on the multi-species coalescent have been widely used in phylogenetic tree estimation using genome-scale DNA sequence data to understand the underlying evolutionary relationship between the sampled species. Evolutionary processes such as hybridization, which creates new species through interbreeding between two different species, necessitate inferring a species network instead of a species tree. A species tree is strictly bifurcating and thus fails to incorporate hybridization events which require an internal node of degree three. Hence, it is crucial to decide whether a tree or network analysis should be performed given a DNA sequence data set, a decision that is based on the presence of hybrid species in the sampled species. Although many methods have been proposed for hybridization detection, it is rare to find a technique that does so globally while considering a data generation mechanism that allows both hybridization and incomplete lineage sorting. In this paper, we consider hybridization and coalescence in a unified framework and propose a new test that can detect whether there are any hybrid species in a set of species of arbitrary size. Based on this global test of hybridization, one can decide whether a tree or network analysis is appropriate for a given data set.


Subject(s)
Biological Evolution , Hybridization, Genetic , Phylogeny , Models, Genetic
9.
Algorithms Mol Biol ; 19(1): 7, 2024 Feb 14.
Article in English | MEDLINE | ID: mdl-38355611

ABSTRACT

We present a novel problem, called MetaEC, which aims to infer gene-species assignments in a collection of partially leaf-labeled gene trees by minimizing the size of duplication episode clustering (EC). This problem is particularly relevant in metagenomics, where incomplete data often poses a challenge in the accurate reconstruction of gene histories. To solve MetaEC, we propose a polynomial time dynamic programming (DP) formulation that verifies the existence of a set of duplication episodes from a predefined set of episode candidates. In addition, we design a method to infer distributions of gene-species mappings. We then demonstrate how to use DP to design an algorithm that solves MetaEC. Although the algorithm is exponential in the worst case, we introduce a heuristic modification of the algorithm that provides a solution with the knowledge that it is exact. To evaluate our method, we perform two computational experiments on simulated and empirical data containing whole genome duplication events, showing that our algorithm is able to accurately infer the corresponding events.

10.
Ann Bot ; 133(5-6): 725-742, 2024 May 10.
Article in English | MEDLINE | ID: mdl-38365451

ABSTRACT

BACKGROUND AND AIMS: The grass genus Urochloa (Brachiaria) sensu lato includes forage crops that are important for beef and dairy industries in tropical and sub-tropical Africa, South America and Oceania/Australia. Economically important species include U. brizantha, U. decumbens, U. humidicola, U. mutica, U. arrecta, U. trichopus, U. mosambicensis and Megathyrsus maximus, all native to the African continent. Perennial growth habits, large, fast growing palatable leaves, intra- and interspecific morphological variability, apomictic reproductive systems and frequent polyploidy are widely shared within the genus. The combination of these traits probably favoured the selection for forage domestication and weediness, but trait emergence across Urochloa cannot be modelled, as a robust phylogenetic assessment of the genus has not been conducted. We aim to produce a phylogeny for Urochloa that includes all important forage species, and identify their closest wild relatives (crop wild relatives). Finally, we will use our phylogeny and available trait data to infer the ancestral states of important forage traits across Urochloa s.l. and model the evolution of forage syndromes across the genus. METHODS: Using a target enrichment sequencing approach (Angiosperm 353), we inferred a species-level phylogeny for Urochloa s.l., encompassing 54 species (~40% of the genus) and outgroups. Phylogenies were inferred using a multispecies coalescent model and maximum likelihood method. We determined the phylogenetic placement of agriculturally important species and identified their closest wild relatives, or crop wild relatives, based on well-supported monophyly. Further, we mapped key traits associated with Urochloa forage crops to the species tree and estimated ancestral states for forage traits along branch lengths for continuous traits and at ancestral nodes in discrete traits. KEY RESULTS: Agricultural species belong to five independent clades, including U. brizantha and U. decumbens lying in a previously defined species complex. Crop wild relatives were identified for these clades supporting previous sub-generic groupings in Urochloa based on morphology. Using ancestral trait estimation models, we find that five morphological traits that correlate with forage potential (perennial growth habits, culm height, leaf size, a winged rachis and large seeds) independently evolved in forage clades. CONCLUSIONS: Urochloa s.l. is a highly diverse genus that contains numerous species with agricultural potential, including crop wild relatives that are currently underexploited. All forage species and their crop wild relatives naturally occur on the African continent and their conservation across their native distributions is essential. Genomic and phenotypic diversity in forage clade species and their wild relatives need to be better assessed both to develop conservation strategies and to exploit the diversity in the genus for improved sustainability in Urochloa cultivar production.


Subject(s)
Phylogeny , Brachiaria/genetics , Brachiaria/anatomy & histology , Brachiaria/growth & development , Africa , Biological Evolution , Poaceae/genetics , Poaceae/anatomy & histology , Genome, Plant
11.
J Math Biol ; 88(3): 29, 2024 Feb 19.
Article in English | MEDLINE | ID: mdl-38372830

ABSTRACT

Reticulations in a phylogenetic network represent processes such as gene flow, admixture, recombination and hybrid speciation. Extending definitions from the tree setting, an anomalous network is one in which some unrooted tree topology displayed in the network appears in gene trees with a lower frequency than a tree not displayed in the network. We investigate anomalous networks under the Network Multispecies Coalescent Model with possible correlated inheritance at reticulations. Focusing on subsets of 4 taxa, we describe a new algorithm to calculate quartet concordance factors on networks of any level, faster than previous algorithms because of its focus on 4 taxa. We then study topological properties required for a 4-taxon network to be anomalous, uncovering the key role of [Formula: see text]-cycles: cycles of 3 edges parent to a sister group of 2 taxa. Under the model of common inheritance, that is, when each gene tree coalesces within a species tree displayed in the network, we prove that 4-taxon networks are never anomalous. Under independent and various levels of correlated inheritance, we use simulations under realistic parameters to quantify the prevalence of anomalous 4-taxon networks, finding that truly anomalous networks are rare. At the same time, however, we find a significant fraction of networks close enough to the anomaly zone to appear anomalous, when considering the quartet concordance factors observed from a few hundred genes. These apparent anomalies may challenge network inference methods.
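
The definition of an anomalous network given above can be illustrated on observed data with a short sketch. It is not the algorithm described in the abstract, which computes expected concordance factors on a network; this version only tallies empirical quartet frequencies from gene trees that have already been reduced to one of the three unrooted topologies on taxa a, b, c, d. The names `concordance_factors` and `appears_anomalous` are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Set

# The three possible unrooted topologies on four taxa a, b, c, d.
QUARTETS = ("ab|cd", "ac|bd", "ad|bc")


def concordance_factors(gene_quartets: Iterable[str]) -> Dict[str, float]:
    """Empirical quartet concordance factors: the proportion of gene trees
    showing each of the three unrooted 4-taxon topologies."""
    counts = Counter(gene_quartets)
    total = sum(counts[q] for q in QUARTETS)
    if total == 0:
        raise ValueError("no recognised quartet topologies supplied")
    return {q: counts[q] / total for q in QUARTETS}


def appears_anomalous(cfs: Dict[str, float], displayed: Set[str]) -> bool:
    """True if some topology displayed in the network is less frequent
    than some topology that the network does not display."""
    not_displayed = [q for q in QUARTETS if q not in displayed]
    return any(cfs[d] < cfs[n] for d in displayed for n in not_displayed)


# Hypothetical tallies from 100 gene trees.
gene_quartets = ["ab|cd"] * 40 + ["ac|bd"] * 35 + ["ad|bc"] * 25
cfs = concordance_factors(gene_quartets)
print(cfs)
print(appears_anomalous(cfs, displayed={"ab|cd", "ad|bc"}))
```

Because the frequencies here are estimated from a finite sample of genes, a positive result corresponds to what the abstract calls an apparent anomaly and does not by itself show that the underlying network is truly anomalous.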


Subject(s)
Algorithms , Prevalence , Phylogeny
12.
Mol Phylogenet Evol ; 190: 107958, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37914032

ABSTRACT

Species delimitation is a powerful approach to assist taxonomic decisions in challenging taxa where species boundaries are hard to establish. European taxa of the blind mole rats (genus Nannospalax) display small morphological differences and complex chromosomal evolution at a shallow evolutionary divergence level. Previous analyses led to the recognition of 25 'forms' in their distribution area. We provide a comprehensive framework to improve knowledge on the evolutionary history and revise the taxonomy of European blind mole rats based on samples from all but three of the 25 forms. We sequenced two nuclear-encoded genetic regions and the whole mitochondrial cytochrome b gene for phylogenetic tree reconstructions using concatenation and coalescence-based species-tree estimations. The phylogenetic analyses confirmed that Aegean N. insularis belongs to N. superspecies xanthodon, and that it represents the second known species of this superspecies in Europe. Mainland taxa reached Europe from Asia Minor in two colonisation events corresponding to two superspecies-level taxa: N. superspecies monticola (taxon established herewith) reached Europe c. 2.1 million years ago (Mya) and was followed by N. superspecies leucodon (re-defined herewith) c. 1.5 Mya. Species delimitation allowed the clarification of the taxonomic contents of the above superspecies. N. superspecies monticola contains three species geographically confined to the western periphery of the distribution of blind mole rats, whereas N. superspecies leucodon is more speciose with six species and several additional subspecies. The observed geographic pattern hints at a robust peripatric speciation process and rapid chromosomal evolution. The present treatment is thus regarded as the minimum taxonomic content of each lineage, which can be further refined based on other sources of information such as karyological traits, crossbreeding experiments, etc. The species delimitation models also allowed the recognition of a hitherto unnamed blind mole rat taxon from Albania, described here as a new subspecies.


Subject(s)
Mammals , Mole Rats , Animals , Phylogeny , Mole Rats/genetics , Muridae , Asia
13.
Mol Phylogenet Evol ; 191: 107978, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38013068

ABSTRACT

The family Drosophilidae is one of the most important model systems in evolutionary biology. Thanks to advances in high-throughput sequencing technology, a number of molecular phylogenetic analyses have been undertaken by using large data sets of many genes and many species sampled across this family. In particular, recent analyses using genome sequences have depicted the family-wide skeleton phylogeny with high confidence. However, the taxon sampling is still insufficient for minor lineages and non-Drosophila genera. In this study, we carried out phylogenetic analyses using a large number of transcriptome-based nucleotide sequences, focusing on the largest, core tribe Drosophilini in the Drosophilidae. In our analyses, some noise factors against phylogenetic reconstruction were taken into account by removing putative paralogy from the datasets and examining the effects of missing data, i.e. gene occupancy and site coverage, and incomplete lineage sorting. The inferred phylogeny has newly resolved the following phylogenetic positions/relationships at the genomic scale: (i) the monophyly of the subgenus Siphlodora including Zaprionus flavofasciatus to be transferred therein; (ii) the paraphyly of the robusta and melanica species groups within a clade comprised of the robusta, melanica and quadrisetata groups and Z. flavofasciatus; (iii) Drosophila curviceps (representing the curviceps group), D. annulipes (the quadrilineata subgroup of the immigrans group) and D. maculinotata clustered into a clade sister to the Idiomyia + Scaptomyza clade, forming together the expanded Hawaiian drosophilid lineage; (iv) Dichaetophora tenuicauda (representing the lineage comprised of the Zygothrica genus group and Dichaetophora) placed as the sister to the clade of the expanded Hawaiian drosophilid lineage and Siphlodora; and (v) relationships of the subgenus Drosophila and the genus Zaprionus as follows: (Zaprionus, (the quadrilineata subgroup, ((D. sternopleuralis, the immigrans group proper), (the quinaria radiation, the tripunctata radiation)))). These results are to be incorporated into the so-far published phylogenomic tree as a backbone (constraint) tree for grafting many more species based on sequences of a limited number of genes. Such a comprehensive, highly confident phylogenetic tree with extensive and dense taxon sampling will provide an essential framework for comparative studies of the Drosophilidae.


Subject(s)
Drosophilidae , Animals , Drosophilidae/genetics , Phylogeny , Transcriptome , Drosophila/genetics , Biological Evolution , Skeleton
14.
Algorithms Mol Biol ; 18(1): 16, 2023 Nov 08.
Article in English | MEDLINE | ID: mdl-37940998

ABSTRACT

BACKGROUND: Evolutionary scenarios describing the evolution of a family of genes within a collection of species comprise the mapping of the vertices of a gene tree T to vertices and edges of a species tree S. The relative timing of the last common ancestors of two extant genes (leaves of T) and the last common ancestors of the two species (leaves of S) in which they reside is indicative of horizontal gene transfers (HGT) and ancient duplications. Orthologous gene pairs, on the other hand, require that their last common ancestors coincide with a corresponding speciation event. The relative timing information of gene and species divergences is captured by three colored graphs that have the extant genes as vertices and the species in which the genes are found as vertex colors: the equal-divergence-time (EDT) graph, the later-divergence-time (LDT) graph and the prior-divergence-time (PDT) graph, which together form an edge partition of the complete graph. RESULTS: Here we give a complete characterization in terms of informative and forbidden triples that can be read off the three graphs and provide a polynomial time algorithm for constructing an evolutionary scenario that explains the graphs, provided such a scenario exists. While both LDT and PDT graphs are cographs, this is not true for the EDT graph in general. We show that every EDT graph is perfect. While the information about LDT and PDT graphs is necessary to recognize EDT graphs in polynomial-time for general scenarios, this extra information can be dropped in the HGT-free case. However, recognition of EDT graphs without knowledge of putative LDT and PDT graphs is NP-complete for general scenarios. In contrast, PDT graphs can be recognized in polynomial-time. We finally connect the EDT graph to the alternative definitions of orthology that have been proposed for scenarios with horizontal gene transfer. With one exception, the corresponding graphs are shown to be colored cographs.

15.
Front Plant Sci ; 14: 1308126, 2023.
Article in English | MEDLINE | ID: mdl-38023848
16.
Mol Biol Evol ; 40(11)2023 Nov 03.
Article in English | MEDLINE | ID: mdl-37879113

ABSTRACT

In phylogenomics, incongruences between gene trees, resulting from both artifactual and biological reasons, can decrease the signal-to-noise ratio and complicate species tree inference. The amount of data handled today in classical phylogenomic analyses precludes manual error detection and removal. However, a simple and efficient way to automate the identification of outliers from a collection of gene trees is still missing. Here, we present PhylteR, a method that allows rapid and accurate detection of outlier sequences in phylogenomic datasets, i.e. species from individual gene trees that do not follow the general trend. PhylteR relies on DISTATIS, an extension of multidimensional scaling to 3 dimensions to compare multiple distance matrices at once. In PhylteR, these distance matrices extracted from individual gene phylogenies represent evolutionary distances between species according to each gene. On simulated datasets, we show that PhylteR identifies outliers with more sensitivity and precision than a comparable existing method. We also show that PhylteR is not sensitive to ILS-induced incongruences, which is a desirable feature. On a biological dataset of 14,463 genes for 53 species previously assembled for Carnivora phylogenomics, we show (i) that PhylteR identifies as outliers sequences that can be considered as such by other means, and (ii) that the removal of these sequences improves the concordance between the gene trees and the species tree. Thanks to the generation of numerous graphical outputs, PhylteR also allows for the rapid and easy visual characterization of the dataset at hand, thus aiding in the precise identification of errors. PhylteR is distributed as an R package on CRAN and as containerized versions (docker and singularity).


Subject(s)
Biological Evolution , Phylogeny
17.
J Comput Biol ; 30(11): 1146-1181, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37902986

ABSTRACT

We address the problem of rooting an unrooted species tree given a set of unrooted gene trees, under the assumption that gene trees evolve within the model species tree under the multispecies coalescent (MSC) model. Quintet Rooting (QR) is a polynomial time algorithm that was recently proposed for this problem, which is based on the theory developed by Allman, Degnan, and Rhodes that proves the identifiability of rooted 5-taxon trees from unrooted gene trees under the MSC. However, although QR had good accuracy in simulations, its statistical consistency was left as an open problem. We present QR-STAR, a variant of QR with an additional step and a different cost function, and prove that it is statistically consistent under the MSC. Moreover, we derive sample complexity bounds for QR-STAR and show that a particular variant of it based on "short quintets" has polynomial sample complexity. Finally, our simulation study under a variety of model conditions shows that QR-STAR matches or improves on the accuracy of QR. QR-STAR is available in open-source form on GitHub.


Subject(s)
Algorithms , Models, Genetic , Phylogeny , Computer Simulation
18.
Syst Biol ; 2023 Oct 06.
Article in English | MEDLINE | ID: mdl-37804132

ABSTRACT

Can knowledge about genome architecture inform biogeographic and phylogenetic inference? Selection, drift, recombination, and gene flow interact to produce a genomic landscape of divergence wherein patterns of differentiation and genealogy vary nonrandomly across the genomes of diverging populations. For instance, genealogical patterns that arise due to gene flow should be more likely to occur on smaller chromosomes, which experience high recombination, whereas those tracking histories of geographic isolation (reduced gene flow caused by a barrier) and divergence should be more likely to occur on larger and sex chromosomes. In Amazonia, populations of many bird species diverge and introgress across rivers, resulting in reticulated genomic signals. Herein, we used reduced representation genomic data to disentangle the evolutionary history of four populations of an Amazonian antbird, Thamnophilus aethiops, whose biogeographic history was associated with the dynamic evolution of the Madeira River Basin. Specifically, we evaluate whether a large river capture event ca. 200 Ka, gave rise to reticulated genealogies in the genome by making spatially explicit predictions about isolation and gene flow based on knowledge about genomic processes. We first estimated chromosome-level phylogenies and recovered two primary topologies across the genome. The first topology (T1) was most consistent with predictions about population divergence and was recovered for the Z chromosome. The second (T2), was consistent with predictions about gene flow upon secondary contact. To evaluate support for these topologies, we trained a convolutional neural network to classify our data into alternative diversification models and estimate demographic parameters. The best-fit model was concordant with T1 and included gene flow between non-sister taxa. 
Finally, we modeled levels of divergence and introgression as functions of chromosome length and found that smaller chromosomes experienced higher gene flow. Given that (1) gene trees supporting T2 were more likely to occur on smaller chromosomes and (2) we found lower levels of introgression on larger chromosomes (and especially the Z chromosome), we argue that T1 represents the history of population divergence across rivers and T2 the history of secondary contact due to barrier loss. Our results suggest that a significant portion of genomic heterogeneity arises due to extrinsic biogeographic processes such as river capture interacting with intrinsic processes associated with genome architecture. Future phylogeographic studies would benefit from accounting for genomic processes, as different parts of the genome reveal contrasting, albeit complementary, histories, all of which are relevant for disentangling the intricate geogenomic mechanisms of biotic diversification.

19.
Mol Phylogenet Evol ; 189: 107927, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37714443

ABSTRACT

Rapid divergence and subsequent reoccurring patterns of gene flow can complicate our ability to discern phylogenetic relationships among closely related species. To what degree such patterns may differ across the genome can provide an opportunity to extrapolate better how life history constraints may influence species boundaries. By exploring differences between autosomal and Z (or X) chromosomal-derived phylogenetic patterns, we can better identify factors that may limit introgression despite patterns of incomplete lineage sorting among closely related taxa. Here, using a whole-genome resequencing approach coupled with an exhaustive sampling of subspecies within the recently divergent prairie grouse complex (genus: Tympanuchus), including the extinct Heath Hen (T. cupido cupido), we show that their phylogenomic history differs depending on autosomal or Z-chromosome partitioned SNPs. Because the Heath Hen was allopatric relative to the other prairie grouse taxa, its phylogenetic signature should not be influenced by gene flow. In contrast, all the other extant prairie grouse taxa, except Attwater's Prairie-chicken (T. c. attwateri), possess overlapping contemporary geographic distributions and have been known to hybridize. After excluding samples that were likely translocated prairie grouse from the Midwest to the eastern coastal states or their resulting hybrids with mainland Heath Hens, species tree analyses based on autosomal SNPs consistently identified a paraphyletic relationship with regard to the Heath Hen with Lesser Prairie-chicken (T. pallidicinctus) sister to Greater Prairie-chicken (T. c. pinnatus) regardless of genic or intergenic partitions. In contrast, species trees based on the Z-chromosome were consistent with Heath Hen sister to a clade that included its conspecifics, Greater and Attwater's Prairie-chickens (T. c. attwateri). 
These results were further explained by historic gene flow, as shown with an excess of autosomal SNPs shared between Lesser and Greater Prairie-chickens but not with the Z-chromosome. Phylogenetic placement of Sharp-tailed Grouse (T. phasianellus), however, did not differ among analyses and was sister to a clade that included all other prairie grouse despite low levels of autosomal gene flow with Greater Prairie-chicken. These results, along with strong sexual selection (i.e., male hybrid behavioral isolation) and a lek breeding system (i.e., high variance in male mating success), are consistent with a pattern of female-biased introgression between prairie grouse taxa with overlapping geographic distributions. Additional study is warranted to explore how genomic components associated with the Z-chromosome influence the phenotype and thereby impact species limits among prairie grouse taxa despite ongoing contemporary gene flow.


Subject(s)
Chickens , Grassland , Animals , Female , Phylogeny
20.
bioRxiv ; 2023 Aug 21.
Article in English | MEDLINE | ID: mdl-37662314

ABSTRACT

Reticulations in a phylogenetic network represent processes such as gene flow, admixture, recombination and hybrid speciation. Extending definitions from the tree setting, an anomalous network is one in which some unrooted tree topology displayed in the network appears in gene trees with a lower frequency than a tree not displayed in the network. We investigate anomalous networks under the Network Multispecies Coalescent Model with possible correlated inheritance at reticulations. Focusing on subsets of 4 taxa, we describe a new algorithm to calculate quartet concordance factors on networks of any level, faster than previous algorithms because of its focus on 4 taxa. We then study topological properties required for a 4-taxon network to be anomalous, uncovering the key role of 3₂-cycles: cycles of 3 edges parent to a sister group of 2 taxa. Under the model of common inheritance, that is, when each gene tree coalesces within a species tree displayed in the network, we prove that 4-taxon networks are never anomalous. Under independent and various levels of correlated inheritance, we use simulations under realistic parameters to quantify the prevalence of anomalous 4-taxon networks, finding that truly anomalous networks are rare. At the same time, however, we find a significant fraction of networks close enough to the anomaly zone to appear anomalous, when considering the quartet concordance factors observed from a few hundred genes. These apparent anomalies may challenge network inference methods.
