Results 1 - 20 of 315

1.
Mol Biol Evol ; 41(8)2024 Aug 02.
Article in English | MEDLINE | ID: mdl-38850168

ABSTRACT

We developed phyloBARCODER (https://github.com/jun-inoue/phyloBARCODER), a new web tool that can identify short DNA sequences to the species level using metabarcoding. phyloBARCODER estimates phylogenetic trees based on the uploaded anonymous DNA sequences and reference sequences from databases. Without such phylogenetic contexts, alternative, similarity-based methods independently identify species names and anonymous sequences of the same group by pairwise comparisons between queries and database sequences, with the caveat that they must match exactly or very closely. By putting metabarcoding sequences into a phylogenetic context, phyloBARCODER accurately identifies (i) species or classification of query sequences and (ii) anonymous sequences associated with the same species or even with populations of query sequences, with clear and accurate explanations. Version 1 of phyloBARCODER stores a database comprising all eukaryotic mitochondrial gene sequences. Moreover, by uploading their own databases, phyloBARCODER users can conduct species identification specialized for sequences obtained from a local geographic region or those of nonmitochondrial genes, e.g. ITS or rbcL.
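For contrast with the phylogenetic approach, the similarity-based identification described above can be sketched in a few lines of Python. This is a minimal illustration, not phyloBARCODER code: it assumes the query and reference sequences are already aligned to equal length, and the function names, the 99% threshold, and the toy sequences are hypothetical. It shows why such methods need a close match: a query that falls below the threshold for every reference receives no species name at all.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" and y == "-":
            continue  # skip columns that are gaps in both sequences
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def best_hit(query, references, threshold=99.0):
    """Return (species, identity) of the closest reference, or (None, identity) below threshold.

    references: hypothetical dict mapping species name -> aligned reference sequence.
    """
    species, identity = max(
        ((name, percent_identity(query, seq)) for name, seq in references.items()),
        key=lambda pair: pair[1],
    )
    return (species if identity >= threshold else None), identity
```

With references `{"Species A": "ACGTACGTAC", "Species B": "ACGTTCGTAC"}`, an exact copy of Species A is assigned to it, while a query differing at a single site (90% identity) is left unassigned even though it is clearly closest to Species A.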


Subjects
DNA Barcoding, Taxonomic , Eukaryota , Phylogeny , Eukaryota/genetics , Eukaryota/classification , DNA Barcoding, Taxonomic/methods , Software , Databases, Genetic , Internet , Databases, Nucleic Acid
2.
Syst Biol ; 73(5): 807-822, 2024 Oct 30.
Article in English | MEDLINE | ID: mdl-38940001

ABSTRACT

Maximum likelihood (ML) phylogenetic inference is widely used in phylogenomics. As heuristic searches most likely find suboptimal trees, it is recommended to conduct multiple (e.g., 10) tree searches in phylogenetic analyses. However, beyond its positive role, how and to what extent multiple tree searches aid ML phylogenetic inference remains poorly explored. Here, we found that a random starting tree was not as effective as the BioNJ and parsimony starting trees in inferring the ML gene tree and that RAxML-NG and PhyML were less sensitive to different starting trees than IQ-TREE. We then examined the effect of the number of tree searches on ML tree inference with IQ-TREE and RAxML-NG, by running 100 tree searches on 19,414 gene alignments from 15 animal, plant, and fungal phylogenomic datasets. We found that the number of tree searches substantially impacted the recovery of the best-of-100 ML gene tree topology among 100 searches for a given ML program. In addition, all of the concatenation-based trees were topologically identical if the number of tree searches was ≥10. Quartet-based ASTRAL trees inferred from 1 to 80 tree searches differed topologically from those inferred from 100 tree searches for 6/15 phylogenomic datasets. Finally, our simulations showed that gene alignments with lower difficulty scores had a higher chance of finding the best-of-100 gene tree topology and were more likely to yield the correct trees.
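The value of repeated searches can be made concrete with a back-of-the-envelope calculation. Assuming, purely for illustration, that searches are independent and that each one reaches the best-of-100 topology with the same probability $p$ (the study measures recovery empirically and does not use this model), the chance that at least one of $k$ searches finds it is $1 - (1 - p)^k$. The function names below are hypothetical.

```python
def prob_best_found(p: float, k: int) -> float:
    """Chance that at least one of k independent searches reaches a tree found with per-search probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    return 1.0 - (1.0 - p) ** k


def searches_needed(p: float, target: float = 0.95) -> int:
    """Smallest number of searches k with prob_best_found(p, k) >= target."""
    if p <= 0.0:
        raise ValueError("p must be positive")
    k = 1
    while prob_best_found(p, k) < target:
        k += 1
    return k
```

Under this toy model, p = 0.5 needs 5 searches to exceed 0.95, whereas p = 0.2 needs 14, so extra searches matter most for alignments where a single search rarely succeeds.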


Subjects
Classification , Phylogeny , Classification/methods , Likelihood Functions , Animals , Genomics/methods , Plants/classification , Plants/genetics
3.
Syst Biol ; 73(2): 355-374, 2024 Jul 27.
Article in English | MEDLINE | ID: mdl-38330161

ABSTRACT

The evolution of gene families is complex, involving gene-level evolutionary events such as gene duplication, horizontal gene transfer, and gene loss, as well as other processes such as incomplete lineage sorting (ILS). Because of this, topological differences often exist between gene trees and species trees. A number of models have recently been developed to explain these discrepancies, the most realistic of which attempts to consider both gene-level events and ILS. When the two are unified in a single model, the interaction between ILS and gene-level events can cause polymorphism in gene copy number, which we refer to as copy number hemiplasy (CNH). In this paper, we extend the Wright-Fisher process to include duplications and losses over several species, and show that the probability of CNH for this process can be significant. We study how well two unified models approximate the Wright-Fisher process with duplication and loss: the multilocus multispecies coalescent (MLMSC), which models CNH, and the duplication, loss, and coalescence model (DLCoal), which does not. We then study the effect of CNH on gene family evolution by comparing MLMSC and DLCoal. We generate comparable gene trees under both models, showing significant differences in various summary statistics; most importantly, CNH greatly reduces the number of gene copies. If this is not taken into account, the traditional method of estimating duplication rates (by counting the number of gene copies) becomes inaccurate. The simulated gene trees are also used for species tree inference with the summary methods ASTRAL and ASTRAL-Pro, demonstrating that their accuracy, based on CNH-unaware simulations calibrated on real data, may have been overestimated.
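The Wright-Fisher ingredient of this model can be illustrated with a toy simulation. The sketch below is a deliberately simplified assumption, not the authors' extended process: a single haploid population of `n_copies` chromosomes, one new duplicate that drifts neutrally with no further duplications or losses, and a fixed number of generations before a speciation event. It estimates how often the duplicate is still polymorphic at that split, the kind of situation in which daughter lineages can inherit different copy numbers. All function names are hypothetical.

```python
import random


def duplicate_frequency_trajectory(n_copies, generations, rng):
    """Neutral Wright-Fisher drift of a duplicate that starts on one of n_copies chromosomes.

    Returns the number of chromosomes carrying the duplicate after each generation.
    """
    count = 1
    trajectory = [count]
    for _ in range(generations):
        p = count / n_copies
        # binomial resampling: each chromosome in the next generation copies a random parent
        count = sum(rng.random() < p for _ in range(n_copies))
        trajectory.append(count)
        if count in (0, n_copies):
            break  # duplicate lost or fixed; drift has nothing left to do
    return trajectory


def polymorphic_at_split(n_copies, generations, replicates, seed=1):
    """Fraction of replicates in which the duplicate is still polymorphic at the split."""
    rng = random.Random(seed)
    polymorphic = 0
    for _ in range(replicates):
        final = duplicate_frequency_trajectory(n_copies, generations, rng)[-1]
        if 0 < final < n_copies:
            polymorphic += 1
    return polymorphic / replicates
```

Short intervals between the duplication and the split leave the duplicate polymorphic more often, which is why closely spaced speciation events are where copy-number polymorphism is expected to matter.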


Subjects
Evolution, Molecular , Gene Dosage , Models, Genetic , Gene Duplication , Multigene Family , Phylogeny , Classification/methods , Computer Simulation
4.
Stat Appl Genet Mol Biol ; 23(1)2024 Jan 01.
Article in English | MEDLINE | ID: mdl-38366619

ABSTRACT

Methods based on the multi-species coalescent have been widely used in phylogenetic tree estimation using genome-scale DNA sequence data to understand the underlying evolutionary relationship between the sampled species. Evolutionary processes such as hybridization, which creates new species through interbreeding between two different species, necessitate inferring a species network instead of a species tree. A species tree is strictly bifurcating and thus fails to incorporate hybridization events which require an internal node of degree three. Hence, it is crucial to decide whether a tree or network analysis should be performed given a DNA sequence data set, a decision that is based on the presence of hybrid species in the sampled species. Although many methods have been proposed for hybridization detection, it is rare to find a technique that does so globally while considering a data generation mechanism that allows both hybridization and incomplete lineage sorting. In this paper, we consider hybridization and coalescence in a unified framework and propose a new test that can detect whether there are any hybrid species in a set of species of arbitrary size. Based on this global test of hybridization, one can decide whether a tree or network analysis is appropriate for a given data set.
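Many hybridization tests build on a basic multispecies coalescent result for three taxa: with rooted species tree ((A,B),C) and internal branch length $t$ in coalescent units, the matching gene tree has probability $1 - \tfrac{2}{3}e^{-t}$ and each of the two discordant gene trees has probability $\tfrac{1}{3}e^{-t}$. Incomplete lineage sorting alone therefore leaves the two discordant topologies equally frequent, while hybridization can break that symmetry. The sketch below is a swapped-in, local three-taxon illustration of that idea, not the global test proposed in this paper; the function names are hypothetical.

```python
import math


def msc_triplet_probs(t):
    """Rooted-triple gene tree probabilities under the MSC for species tree ((A,B),C).

    t: internal branch length in coalescent units.
    """
    minor = math.exp(-t) / 3.0
    return {"AB|C": 1.0 - 2.0 * minor, "AC|B": minor, "BC|A": minor}


def minor_symmetry_test(n_ac, n_bc):
    """Chi-square (1 df) test that the two discordant rooted triples are equally frequent."""
    total = n_ac + n_bc
    if total == 0:
        raise ValueError("no discordant gene trees to compare")
    stat = (n_ac - n_bc) ** 2 / total
    # survival function of a chi-square with 1 df: P(X > stat) = erfc(sqrt(stat / 2))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value
```

Equal counts (50 vs 50) give a statistic of 0 and a p-value of 1, whereas 80 vs 20 gives a statistic of 36 and a very small p-value, the kind of asymmetry that points toward a network rather than a tree.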


Subjects
Biological Evolution , Hybridization, Genetic , Phylogeny , Models, Genetic
5.
Mol Biol Evol ; 40(11)2023 Nov 03.
Article in English | MEDLINE | ID: mdl-37879113

ABSTRACT

In phylogenomics, incongruences between gene trees, resulting from both artifactual and biological reasons, can decrease the signal-to-noise ratio and complicate species tree inference. The amount of data handled today in classical phylogenomic analyses precludes manual error detection and removal. However, a simple and efficient way to automate the identification of outliers from a collection of gene trees is still missing. Here, we present PhylteR, a method that allows rapid and accurate detection of outlier sequences in phylogenomic datasets, i.e. species from individual gene trees that do not follow the general trend. PhylteR relies on DISTATIS, an extension of multidimensional scaling to 3 dimensions to compare multiple distance matrices at once. In PhylteR, these distance matrices extracted from individual gene phylogenies represent evolutionary distances between species according to each gene. On simulated datasets, we show that PhylteR identifies outliers with more sensitivity and precision than a comparable existing method. We also show that PhylteR is not sensitive to ILS-induced incongruences, which is a desirable feature. On a biological dataset of 14,463 genes for 53 species previously assembled for Carnivora phylogenomics, we show (i) that PhylteR identifies as outliers sequences that can be considered as such by other means, and (ii) that the removal of these sequences improves the concordance between the gene trees and the species tree. Thanks to the generation of numerous graphical outputs, PhylteR also allows for the rapid and easy visual characterization of the dataset at hand, thus aiding in the precise identification of errors. PhylteR is distributed as an R package on CRAN and as containerized versions (docker and singularity).
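The DISTATIS step that PhylteR relies on starts by turning each gene's species-by-species distance matrix into a cross-product matrix (the double-centering used in classical multidimensional scaling) and then compares genes through RV coefficients. The pure-Python sketch below shows only those two steps, under stated assumptions: complete, symmetric matrices over the same species, with distances squared before centering. DISTATIS's normalization and compromise computation and PhylteR's outlier rules are not reproduced, and the function names are hypothetical.

```python
def double_center(d):
    """Cross-product matrix S = -1/2 * C * D2 * C from a distance matrix (classical MDS step).

    d: square, symmetric list of lists of pairwise distances; they are squared here.
    """
    n = len(d)
    d2 = [[d[i][j] ** 2 for j in range(n)] for i in range(n)]
    row = [sum(r) / n for r in d2]
    grand = sum(row) / n
    # for a symmetric matrix the column means equal the row means
    return [[-0.5 * (d2[i][j] - row[i] - row[j] + grand) for j in range(n)] for i in range(n)]


def rv_coefficient(s1, s2):
    """RV coefficient between two symmetric cross-product matrices (1.0 = same configuration up to scale)."""
    def inner(a, b):
        return sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return inner(s1, s2) / (inner(s1, s1) * inner(s2, s2)) ** 0.5


def rv_matrix(distance_matrices):
    """Pairwise RV coefficients between genes, the first step of a DISTATIS-style analysis."""
    cross = [double_center(d) for d in distance_matrices]
    return [[rv_coefficient(a, b) for b in cross] for a in cross]
```

Two matrices that differ only by a scale factor (for example, a gene that simply evolves faster) give an RV of 1, while a gene whose RV coefficients with all other genes are low is a natural candidate for closer inspection.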


Subjects
Biological Evolution , Phylogeny
6.
Mol Phylogenet Evol ; 191: 107978, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38013068

ABSTRACT

The family Drosophilidae is one of the most important model systems in evolutionary biology. Thanks to advances in high-throughput sequencing technology, a number of molecular phylogenetic analyses have been undertaken using large data sets of many genes and many species sampled across this family. In particular, recent analyses using genome sequences have depicted the family-wide skeleton phylogeny with high confidence. However, taxon sampling is still insufficient for minor lineages and non-Drosophila genera. In this study, we carried out phylogenetic analyses using a large number of transcriptome-based nucleotide sequences, focusing on the largest, core tribe Drosophilini in the Drosophilidae. In our analyses, some noise factors affecting phylogenetic reconstruction were taken into account by removing putative paralogs from the datasets and by examining the effects of missing data (i.e. gene occupancy and site coverage) and of incomplete lineage sorting. The inferred phylogeny has newly resolved the following phylogenetic positions/relationships at the genomic scale: (i) the monophyly of the subgenus Siphlodora, including Zaprionus flavofasciatus, which is to be transferred therein; (ii) the paraphyly of the robusta and melanica species groups within a clade comprising the robusta, melanica and quadrisetata groups and Z. flavofasciatus; (iii) Drosophila curviceps (representing the curviceps group), D. annulipes (the quadrilineata subgroup of the immigrans group) and D. maculinotata clustered into a clade sister to the Idiomyia + Scaptomyza clade, together forming the expanded Hawaiian drosophilid lineage; (iv) Dichaetophora tenuicauda (representing the lineage comprising the Zygothrica genus group and Dichaetophora) placed as the sister to the clade of the expanded Hawaiian drosophilid lineage and Siphlodora; and (v) relationships of the subgenus Drosophila and the genus Zaprionus as follows: (Zaprionus, (the quadrilineata subgroup, ((D. sternopleuralis, the immigrans group proper), (the quinaria radiation, the tripunctata radiation)))). These results are to be incorporated into the previously published phylogenomic tree as a backbone (constraint) tree for grafting many more species based on sequences of a limited number of genes. Such a comprehensive, highly confident phylogenetic tree with extensive and dense taxon sampling will provide an essential framework for comparative studies of the Drosophilidae.
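The two missing-data measures mentioned above can be computed directly from the alignments. In this sketch, gene occupancy is the fraction of taxa present in a gene alignment and site coverage is the fraction of filled (non-gap, non-missing) cells; these are common working definitions assumed here, and the exact definitions, thresholds, and function names used in the study may differ.

```python
def gene_occupancy(alignment, taxa):
    """Fraction of taxa that have a sequence in this gene alignment (dict taxon -> sequence)."""
    return sum(1 for t in taxa if t in alignment) / len(taxa)


def site_coverage(alignment):
    """Fraction of filled cells across the alignment; '-', '?', and 'N' are treated as missing."""
    cells = "".join(alignment.values())
    if not cells:
        return 0.0
    filled = sum(1 for c in cells if c not in "-?Nn")
    return filled / len(cells)


def filter_genes(genes, taxa, min_occupancy=0.5, min_coverage=0.5):
    """Names of genes passing both thresholds; the default thresholds are illustrative only."""
    return [
        name
        for name, aln in genes.items()
        if gene_occupancy(aln, taxa) >= min_occupancy and site_coverage(aln) >= min_coverage
    ]
```

Raising either threshold trades the number of retained genes for a more complete matrix, which is the trade-off such missing-data analyses explore.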


Subjects
Drosophilidae , Animals , Drosophilidae/genetics , Phylogeny , Transcriptome , Drosophila/genetics , Biological Evolution , Skeleton
7.
Mol Phylogenet Evol ; 195: 108057, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38471598

ABSTRACT

Previous efforts to reconstruct the evolutionary history of Palearctic ground squirrels within the genus Spermophilus have primarily relied on a single mitochondrial marker for phylogenetic data. In this study, we present the first phylogeny with comprehensive taxon sampling of Spermophilus via a conventional multilocus approach utilizing five mitochondrial and five nuclear markers. Through application of the multispecies coalescent model, we constructed a species tree revealing four distinct clades that diverged during the Late Miocene. These clades are 1) S. alaschanicus and S. dauricus from East Asia; 2) S. musicus and S. pygmaeus from Eastern Europe and northwestern Central Asia; 3) the subgenus Colobotis, found across Central Asia and its adjacent regions and encompassing S. brevicauda, S. erythrogenys, S. fulvus, S. major, S. pallidicauda, S. ralli, S. relictus, S. selevini, and S. vorontsovi sp. nov.; and 4) a Central/Eastern Europe and Asia Minor clade comprising S. citellus, S. taurensis, S. xanthoprymnus, S. suslicus, and S. odessanus. The latter clade lacked strong support owing to uncertainty in the taxonomic placement of S. odessanus and S. suslicus. Resolving relationships within the subgenus Colobotis, which radiated rapidly, remains challenging, likely because of incomplete lineage sorting and introgressive hybridization. Most modern Spermophilus species diversified during the Early-Middle Pleistocene (2.2-1.0 million years ago). We propose a revised taxonomic classification for the genus Spermophilus, recognizing 18 species including a newly identified one (S. vorontsovi sp. nov.), which is found only in a limited area in the southeast of West Siberia. Employing genome-wide single-nucleotide polymorphism genotyping, we substantiated the role of the Ob River as a major barrier ensuring robust isolation of this taxon from S. erythrogenys.
Despite its inherent limitations, the traditional multilocus approach remains a valuable tool for resolving relationships and can provide important insights into otherwise poorly understood groups. It is imperative to recognize that additional efforts are needed to definitively determine phylogenetic relationships between certain species of Palearctic ground squirrels.


Subjects
Genetic Introgression , Sciuridae , Animals , Siberia , Phylogeny , Sciuridae/genetics , Asia
8.
Mol Phylogenet Evol ; 190: 107958, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37914032

ABSTRACT

Species delimitation is a powerful approach to assist taxonomic decisions in challenging taxa where species boundaries are hard to establish. European taxa of the blind mole rats (genus Nannospalax) display small morphological differences and complex chromosomal evolution at a shallow evolutionary divergence level. Previous analyses led to the recognition of 25 'forms' in their distribution area. We provide a comprehensive framework to improve knowledge on the evolutionary history and revise the taxonomy of European blind mole rats based on samples from all but three of the 25 forms. We sequenced two nuclear-encoded genetic regions and the whole mitochondrial cytochrome b gene for phylogenetic tree reconstructions using concatenation and coalescence-based species-tree estimations. The phylogenetic analyses confirmed that Aegean N. insularis belongs to N. superspecies xanthodon, and that it represents the second known species of this superspecies in Europe. Mainland taxa reached Europe from Asia Minor in two colonisation events corresponding to two superspecies-level taxa: N. superspecies monticola (taxon established herewith) reached Europe c. 2.1 million years ago (Mya) and was followed by N. superspecies leucodon (re-defined herewith) c. 1.5 Mya. Species delimitation allowed the clarification of the taxonomic contents of the above superspecies. N. superspecies monticola contains three species geographically confined to the western periphery of the distribution of blind mole rats, whereas N. superspecies leucodon is more speciose with six species and several additional subspecies. The observed geographic pattern hints at a robust peripatric speciation process and rapid chromosomal evolution. The present treatment is thus regarded as the minimum taxonomic content of each lineage, which can be further refined based on other sources of information such as karyological traits, crossbreeding experiments, etc. 
The species delimitation models also allowed the recognition of a hitherto unnamed blind mole rat taxon from Albania, described here as a new subspecies.


Subjects
Mammals , Mole Rats , Animals , Phylogeny , Mole Rats/genetics , Muridae , Asia
9.
Mol Phylogenet Evol ; 197: 108111, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38801965

ABSTRACT

Swallows (Hirundinidae) are a globally distributed family of passerine birds that exhibit remarkable similarity in body shape but tremendous variation in plumage, sociality, nesting behavior, and migratory strategies. As a result, swallow species have become models for empirical behavioral ecology and evolutionary studies, and variation across the Hirundinidae presents an excellent opportunity for comparative analyses of trait evolution. Exploiting this potential requires a comprehensive and well-resolved phylogenetic tree of the family. To address this need, we estimated swallow phylogeny using genetic data from thousands of ultraconserved element (UCE) loci sampled from nearly all recognized swallow species. Maximum likelihood, coalescent-based, and Bayesian approaches yielded a phylogenetic tree that was well resolved to the generic level, with minor disagreement among inferences at the species level, which likely reflects ongoing population genetic processes. The UCE data were particularly useful in helping to resolve deep nodes, which had previously confounded phylogenetic reconstruction efforts. Divergence time estimates from the improved swallow tree support a Miocene origin of the family, roughly 13 million years ago, with subsequent diversification of major groups in the late Miocene and Pliocene. Our estimates of historical biogeography support the hypothesis that swallows originated in the Afrotropics and subsequently expanded across the globe, with major in situ diversification in Africa and a secondary major radiation following colonization of the Neotropics. Initial examination of nesting and sociality indicates that the origin of mud nesting (a relatively rare nest construction phenotype in birds) was a major innovation coincident with the origin of a clade giving rise to over 40% of extant swallow diversity. In contrast, transitions between social and solitary nesting appear less important for explaining patterns of diversification among swallows.


Subjects
Bayes Theorem , Phylogeny , Phylogeography , Swallows , Animals , Swallows/genetics , Swallows/classification , Likelihood Functions , Models, Genetic , Sequence Analysis, DNA , Evolution, Molecular
10.
Mol Phylogenet Evol ; 201: 108197, 2024 Sep 11.
Article in English | MEDLINE | ID: mdl-39270765

ABSTRACT

Phylogenomics has made it increasingly clear that the Tree of Life can have network-like or reticulate structures among some taxa and genes. Two non-vertical modes of evolution, hybridization/introgression and horizontal gene transfer, deviate from a strictly bifurcating tree model, causing non-treelike patterns. However, these reticulate processes can produce patterns similar to those of incomplete lineage sorting or recombination, potentially leading to ambiguity. Here, we present a brief overview of a phylogenomic workflow for inferring organismal histories and compare methods for distinguishing modes of reticulate evolution. We discuss how the timing of coalescent events can help disentangle introgression from incomplete lineage sorting and how horizontal gene transfer events can help determine the relative timing of speciation events. In doing so, we identify pitfalls of certain methods and discuss how to extend their utility across the Tree of Life. Workflows, methods, and future directions discussed herein underscore the need to embrace reticulate evolutionary patterns for understanding the timing and rates of evolutionary events, providing a clearer view of life's history.

11.
Syst Biol ; 72(6): 1220-1232, 2023 Dec 30.
Article in English | MEDLINE | ID: mdl-37449764

ABSTRACT

Despite the economic, ecological, and scientific importance of the genera Salix L. (willows) and Populus L. (poplars, cottonwoods, and aspens) in the Salicaceae, we know little about the sources of differences in species diversity between the genera or about the phylogenetic conflict that often confounds the estimation of phylogenetic trees. Salix subgenera and sections, in particular, have been difficult to classify, with one recent attempt termed a "spectacular failure" due to a speculated radiation of the subgenera Vetrix and Chamaetia. Here, we use targeted sequence capture to understand the evolutionary history of this portion of the Salicaceae plant family. Our phylogenetic hypothesis was based on 787 gene regions and identified extensive phylogenetic conflict among genes. Our analysis supported some previously described subgeneric relationships and confirmed the polyphyly of others. Using an fbranch analysis, we identified several cases of hybridization in deep branches of the phylogeny, which likely contributed to discordance among gene trees. In addition, we identified a rapid increase in diversification rate near the origin of the Vetrix-Chamaetia clade in Salix. This region of the tree coincided with several nodes that lacked strong statistical support, indicating a possible increase in incomplete lineage sorting due to rapid diversification. The extraordinary level of both recent and ancient hybridization in both Salix and Populus has played an important role in the diversification and diversity of these two genera.


Subjects
Populus , Salix , Phylogeny , Salix/genetics , Populus/genetics , Biological Evolution , Hybridization, Genetic
12.
Syst Biol ; 72(5): 1171-1179, 2023 11 01.
Article in English | MEDLINE | ID: mdl-37254872

ABSTRACT

We consider the evolution of phylogenetic gene trees along phylogenetic species networks, according to the network multispecies coalescent process, and introduce a new network coalescent model with correlated inheritance of gene flow. This model generalizes two traditional versions of the network coalescent: with independent or common inheritance. At each reticulation, multiple lineages of a given locus are inherited from parental populations chosen at random, either independently across lineages or with positive correlation according to a Dirichlet process. This process may account for locus-specific probabilities of inheritance, for example. We implemented the simulation of gene trees under these network coalescent models in the Julia package PhyloCoalSimulations, which depends on PhyloNetworks and its powerful network manipulation tools. Input species phylogenies can be read in extended Newick format, either in numbers of generations or in coalescent units. Simulated gene trees can be written in Newick format, and in a way that preserves information about their embedding within the species network. This embedding can be used for downstream purposes, such as to simulate species-specific processes like rate variation across species, or for other scenarios as illustrated in this note. This package should be useful for simulation studies and simulation-based inference methods. The software is available open source with documentation and a tutorial at https://github.com/cecileane/PhyloCoalSimulations.jl.
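The correlated-inheritance idea can be sketched with a Chinese restaurant process, a standard sequential construction of a Dirichlet process. Lineages arriving at a reticulation form clusters; each cluster picks a parent population from the base distribution (parent 0 with inheritance probability `gamma`), and every lineage in a cluster follows it. This is a hypothetical Python illustration, not the PhyloCoalSimulations.jl API, and the package's own parametrization of the correlation may differ.

```python
import random


def assign_parents(n_lineages, gamma, alpha, rng):
    """Assign lineages at a reticulation to parent 0 or 1 via a Chinese restaurant process.

    gamma: probability of parent 0 under the base distribution (inheritance probability).
    alpha: Dirichlet-process concentration; alpha -> 0 approaches common inheritance,
    alpha -> infinity approaches independent inheritance.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    cluster_parent = []  # parent chosen by each cluster
    cluster_size = []
    parents = []
    for i in range(n_lineages):
        # lineage i opens a new cluster with probability alpha / (i + alpha)
        if rng.random() * (i + alpha) < alpha:
            cluster_parent.append(0 if rng.random() < gamma else 1)
            cluster_size.append(1)
            parents.append(cluster_parent[-1])
        else:
            # otherwise it joins an existing cluster with probability proportional to its size
            k = rng.choices(range(len(cluster_size)), weights=cluster_size)[0]
            cluster_size[k] += 1
            parents.append(cluster_parent[k])
    return parents
```

A very small `alpha` puts nearly all lineages in one cluster, so they share a parent as under common inheritance; a large `alpha` gives almost every lineage its own cluster, recovering independent draws with probability `gamma`.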


Subjects
Gene Flow , Software , Phylogeny , Computer Simulation , Probability , Models, Genetic
13.
Syst Biol ; 2023 Oct 06.
Article in English | MEDLINE | ID: mdl-37804132

ABSTRACT

Can knowledge about genome architecture inform biogeographic and phylogenetic inference? Selection, drift, recombination, and gene flow interact to produce a genomic landscape of divergence wherein patterns of differentiation and genealogy vary nonrandomly across the genomes of diverging populations. For instance, genealogical patterns that arise due to gene flow should be more likely to occur on smaller chromosomes, which experience high recombination, whereas those tracking histories of geographic isolation (reduced gene flow caused by a barrier) and divergence should be more likely to occur on larger and sex chromosomes. In Amazonia, populations of many bird species diverge and introgress across rivers, resulting in reticulated genomic signals. Herein, we used reduced representation genomic data to disentangle the evolutionary history of four populations of an Amazonian antbird, Thamnophilus aethiops, whose biogeographic history was associated with the dynamic evolution of the Madeira River Basin. Specifically, we evaluate whether a large river capture event ca. 200 ka gave rise to reticulated genealogies in the genome by making spatially explicit predictions about isolation and gene flow based on knowledge about genomic processes. We first estimated chromosome-level phylogenies and recovered two primary topologies across the genome. The first topology (T1) was most consistent with predictions about population divergence and was recovered for the Z chromosome. The second (T2) was consistent with predictions about gene flow upon secondary contact. To evaluate support for these topologies, we trained a convolutional neural network to classify our data into alternative diversification models and estimate demographic parameters. The best-fit model was concordant with T1 and included gene flow between non-sister taxa.
Finally, we modeled levels of divergence and introgression as functions of chromosome length and found that smaller chromosomes experienced higher gene flow. Given that (1) gene trees supporting T2 were more likely to occur on smaller chromosomes and (2) we found lower levels of introgression on larger chromosomes (and especially the Z chromosome), we argue that T1 represents the history of population divergence across rivers and T2 the history of secondary contact due to barrier loss. Our results suggest that a significant portion of genomic heterogeneity arises due to extrinsic biogeographic processes such as river capture interacting with intrinsic processes associated with genome architecture. Future phylogeographic studies would benefit from accounting for genomic processes, as different parts of the genome reveal contrasting, albeit complementary, histories, all of which are relevant for disentangling the intricate geogenomic mechanisms of biotic diversification.

14.
Ann Bot ; 133(5-6): 725-742, 2024 May 10.
Article in English | MEDLINE | ID: mdl-38365451

ABSTRACT

BACKGROUND AND AIMS: The grass genus Urochloa (Brachiaria) sensu lato includes forage crops that are important for beef and dairy industries in tropical and sub-tropical Africa, South America and Oceania/Australia. Economically important species include U. brizantha, U. decumbens, U. humidicola, U. mutica, U. arrecta, U. trichopus, U. mosambicensis and Megathyrsus maximus, all native to the African continent. Perennial growth habits, large, fast-growing, palatable leaves, intra- and interspecific morphological variability, apomictic reproductive systems and frequent polyploidy are widely shared within the genus. The combination of these traits probably favoured selection for forage domestication and weediness, but trait emergence across Urochloa cannot be modelled, as a robust phylogenetic assessment of the genus has not been conducted. We aim to produce a phylogeny for Urochloa that includes all important forage species, and to identify their closest wild relatives (crop wild relatives). Finally, we will use our phylogeny and available trait data to infer the ancestral states of important forage traits across Urochloa s.l. and model the evolution of forage syndromes across the genus. METHODS: Using a target enrichment sequencing approach (Angiosperm 353), we inferred a species-level phylogeny for Urochloa s.l., encompassing 54 species (~40% of the genus) and outgroups. Phylogenies were inferred using a multispecies coalescent model and a maximum likelihood method. We determined the phylogenetic placement of agriculturally important species and identified their closest wild relatives, or crop wild relatives, based on well-supported monophyly. Further, we mapped key traits associated with Urochloa forage crops onto the species tree and estimated ancestral states for forage traits along branch lengths for continuous traits and at ancestral nodes for discrete traits. KEY RESULTS: Agricultural species belong to five independent clades, including U. brizantha and U. decumbens lying in a previously defined species complex. Crop wild relatives were identified for these clades, supporting previous sub-generic groupings in Urochloa based on morphology. Using ancestral trait estimation models, we find that five morphological traits that correlate with forage potential (perennial growth habits, culm height, leaf size, a winged rachis and large seeds) independently evolved in forage clades. CONCLUSIONS: Urochloa s.l. is a highly diverse genus that contains numerous species with agricultural potential, including crop wild relatives that are currently underexploited. All forage species and their crop wild relatives naturally occur on the African continent and their conservation across their native distributions is essential. Genomic and phenotypic diversity in forage clade species and their wild relatives needs to be better assessed both to develop conservation strategies and to exploit the diversity in the genus for improved sustainability in Urochloa cultivar production.


Subjects
Phylogeny , Brachiaria/genetics , Brachiaria/anatomy & histology , Brachiaria/growth & development , Africa , Biological Evolution , Poaceae/genetics , Poaceae/anatomy & histology , Genome, Plant
15.
J Math Biol ; 88(3): 29, 2024 02 19.
Article in English | MEDLINE | ID: mdl-38372830

ABSTRACT

Reticulations in a phylogenetic network represent processes such as gene flow, admixture, recombination and hybrid speciation. Extending definitions from the tree setting, an anomalous network is one in which some unrooted tree topology displayed in the network appears in gene trees with a lower frequency than a tree not displayed in the network. We investigate anomalous networks under the Network Multispecies Coalescent Model with possible correlated inheritance at reticulations. Focusing on subsets of 4 taxa, we describe a new algorithm to calculate quartet concordance factors on networks of any level, faster than previous algorithms because of its focus on 4 taxa. We then study topological properties required for a 4-taxon network to be anomalous, uncovering the key role of $3_2$-cycles: cycles of 3 edges parent to a sister group of 2 taxa. Under the model of common inheritance, that is, when each gene tree coalesces within a species tree displayed in the network, we prove that 4-taxon networks are never anomalous. Under independent and various levels of correlated inheritance, we use simulations under realistic parameters to quantify the prevalence of anomalous 4-taxon networks, finding that truly anomalous networks are rare. At the same time, however, we find a significant fraction of networks close enough to the anomaly zone to appear anomalous, when considering the quartet concordance factors observed from a few hundred genes. These apparent anomalies may challenge network inference methods.
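For four taxa under common inheritance, each gene tree follows one displayed tree, so the quartet concordance factors are a mixture of the tree formulas (displayed split $1 - \tfrac{2}{3}e^{-t}$, each other split $\tfrac{1}{3}e^{-t}$). The sketch below computes that mixture and applies the anomaly definition from the abstract. It assumes the displayed trees, their weights, and their internal branch lengths are already known; it is not the paper's algorithm for networks of any level, and its function names are hypothetical. Because $1 - \tfrac{2}{3}e^{-t} \ge \tfrac{1}{3}e^{-t}$ for every $t \ge 0$, the mixture never ranks a non-displayed split above a displayed one, consistent with the result stated for common inheritance.

```python
import math


def tree_quartet_cf(t):
    """Quartet CFs on a 4-taxon tree under the MSC (internal branch t in coalescent units).

    Returns (CF of the displayed split, CF of each of the two other splits).
    """
    minor = math.exp(-t) / 3.0
    return 1.0 - 2.0 * minor, minor


def common_inheritance_cfs(displayed):
    """Mix quartet CFs over displayed trees under common inheritance.

    displayed: list of (weight, split, t), weights summing to 1; split is "ab|cd", "ac|bd", or "ad|bc".
    """
    splits = ("ab|cd", "ac|bd", "ad|bc")
    cf = dict.fromkeys(splits, 0.0)
    for weight, split, t in displayed:
        major, minor = tree_quartet_cf(t)
        for s in splits:
            cf[s] += weight * (major if s == split else minor)
    return cf


def is_anomalous(cf, displayed_splits):
    """True if some displayed split has a lower CF than some non-displayed split."""
    shown = [cf[s] for s in displayed_splits]
    hidden = [v for s, v in cf.items() if s not in displayed_splits]
    return bool(shown and hidden) and min(shown) < max(hidden)
```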


Subjects
Algorithms , Prevalence , Phylogeny
16.
Mol Biol Evol ; 39(8)2022 08 03.
Article in English | MEDLINE | ID: mdl-35907248

ABSTRACT

The multispecies coalescent (MSC) model accommodates both species divergences and within-species coalescence and provides a natural framework for phylogenetic analysis of genomic data when the gene trees vary across the genome. The MSC model implemented in the program bpp assumes a molecular clock and the Jukes-Cantor model, and is suitable for analyzing genomic data from closely related species. Here we extend our implementation to more general substitution models and relaxed clocks to allow the rate to vary among species. The MSC-with-relaxed-clock model allows the estimation of species divergence times and ancestral population sizes using genomic sequences sampled from contemporary species when the strict clock assumption is violated, and provides a simulation framework for evaluating species tree estimation methods. We conducted simulations and analyzed two real datasets to evaluate the utility of the new models. We confirm that the clock-JC model is adequate for inference of shallow trees with closely related species, but it is important to account for clock violation for distant species. Our simulation suggests that there is valuable phylogenetic information in the gene-tree branch lengths even if the molecular clock assumption is seriously violated, and the relaxed-clock models implemented in bpp are able to extract such information. Our Markov chain Monte Carlo algorithms suffer from mixing problems when used for species tree estimation under the relaxed clock and we discuss possible improvements. We conclude that the new models are currently most effective for estimating population parameters such as species divergence times when the species tree is fixed.
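A relaxed clock lets each branch have its own rate. The sketch below draws independent lognormal rates whose mean equals a chosen overall rate, using $\mu = \ln(\text{mean}) - \sigma^2/2$ so that $E[r] = e^{\mu + \sigma^2/2}$ equals the mean; branch lengths in substitutions per site are then duration times rate. This independent-rates lognormal setup is a generic illustration assumed here; the priors and parametrization actually implemented in bpp may differ, and the function names are hypothetical.

```python
import math
import random


def iid_lognormal_rates(n_branches, mean_rate, sigma, rng):
    """Draw branch rates i.i.d. from a lognormal with the given mean and log-scale s.d. sigma."""
    mu = math.log(mean_rate) - sigma ** 2 / 2.0  # ensures E[rate] = mean_rate
    return [rng.lognormvariate(mu, sigma) for _ in range(n_branches)]


def branch_lengths(times, rates):
    """Expected substitutions per site on each branch: duration multiplied by that branch's rate."""
    return [t * r for t, r in zip(times, rates)]
```

Setting `sigma = 0` collapses every rate to `mean_rate`, recovering a strict clock; larger `sigma` produces the among-lineage rate variation that the strict-clock model cannot absorb.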


Subjects
Models, Genetic, Bayes Theorem, Computer Simulation, Markov Chains, Monte Carlo Method, Phylogeny
17.
Mol Biol Evol ; 39(1)2022 Jan 07.
Article in English | MEDLINE | ID: mdl-34662403

ABSTRACT

Comparative genome-scale analyses of protein-coding gene sequences are employed to examine evidence for whole-genome duplication and horizontal gene transfer. For this purpose, orthogroups must be delineated to infer the evolutionary history of each gene, and the results of all orthogroup analyses must be integrated to infer a genome-scale history. An orthogroup is a set of genes descended from a single gene in the last common ancestor of all species under consideration. However, such analyses face two problems: 1) analytical pipelines that infer all gene histories by comparing species and gene trees are not fully developed, and 2) without detailed analyses within orthogroups, evolutionary events affecting paralogous genes in the same orthogroup cannot be distinguished when results from multiple orthogroup analyses are integrated genome-wide. Here I present an analytical pipeline, ORTHOSCOPE* (star), to infer the evolutionary histories of animal/plant genes from genome-scale data. ORTHOSCOPE* estimates a tree for a specified gene, detects speciation/gene duplication events that occurred at nodes belonging to only one lineage leading to a species of interest, and then integrates the results derived from gene trees estimated for all query genes in genome-wide data. Thus, ORTHOSCOPE* can be used to detect species nodes just after whole-genome duplications as a first step in comparative genomic analyses. Moreover, by examining the presence or absence of genes belonging to species lineages with dense taxon sampling available from the ORTHOSCOPE web version, ORTHOSCOPE* can detect genes lost in specific lineages and horizontal gene transfers. This pipeline is available at https://github.com/jun-inoue/ORTHOSCOPE_STAR.
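To illustrate how gene-tree nodes can be labelled as speciation or gene duplication, here is a minimal sketch using the simple species-overlap rule. This technique was swapped in for illustration. ORTHOSCOPE* compares gene trees with a species tree, and this sketch is not its method. The nested-tuple tree format and the "species|gene" labels (Hsap, Mmus, Dmel) are hypothetical.

```python
from typing import Optional, Union

# Rooted binary gene tree as nested 2-tuples; leaves are "species|gene" labels.
GeneTree = Union[str, tuple]


def leaf_species(tree: GeneTree) -> frozenset:
    """Set of species represented below a node."""
    if isinstance(tree, str):
        return frozenset([tree.split("|", 1)[0]])
    left, right = tree
    return leaf_species(left) | leaf_species(right)


def label_events(tree: GeneTree, out: Optional[list] = None) -> list:
    """Post-order list of (species set, event) for every internal node.

    Species-overlap rule: if the two child subtrees share a species,
    the node is a duplication; otherwise it is a speciation.
    """
    if out is None:
        out = []
    if isinstance(tree, str):
        return out
    left, right = tree
    label_events(left, out)
    label_events(right, out)
    shared = leaf_species(left) & leaf_species(right)
    out.append((leaf_species(tree), "duplication" if shared else "speciation"))
    return out


# Hypothetical gene family: one duplication before the Hsap/Mmus split.
tree = ((("Hsap|A1", "Mmus|A1"), ("Hsap|A2", "Mmus|A2")), "Dmel|A")
for species, event in label_events(tree):
    print(sorted(species), event)
```

The overlap rule needs no species tree, so it is easy to apply. However, it can mislabel nodes when gene trees contain topological errors, which is one reason to use reconciliation against a species tree instead.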


Subjects
Gene Duplication, Genome, Animals, Evolution, Molecular, Gene Transfer, Horizontal, Phylogeny
18.
Mol Biol Evol ; 39(2)2022 Feb 03.
Article in English | MEDLINE | ID: mdl-35021210

ABSTRACT

Species tree inference from gene family trees is becoming increasingly popular because it can account for discordance between the species tree and the corresponding gene family trees. In particular, methods that can handle multiple-copy gene families have the potential to leverage paralogy as an informative signal. At present, no widely adopted inference method exists for this purpose. Here, we present SpeciesRax, the first maximum likelihood method that can infer a rooted species tree from a set of gene family trees while accounting for gene duplication, loss, and transfer events. By explicitly modeling the events through which gene trees can depart from the species tree, SpeciesRax leverages the phylogenetic rooting signal in gene trees. SpeciesRax infers species tree branch lengths in units of expected substitutions per site, and branch support values via paralogy-aware quartets extracted from the gene family trees. Using both empirical and simulated data sets, we show that SpeciesRax is at least as accurate as the best competing methods while being one order of magnitude faster on large data sets. We used SpeciesRax to infer a biologically plausible rooted phylogeny of the vertebrates comprising 188 species from 31,612 gene families in 1 h using 40 cores. SpeciesRax is available under the GNU GPL at https://github.com/BenoitMorel/GeneRax and on BioConda.
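Quartet-based support starts from counting which split of four taxa each gene tree induces. Below is a minimal sketch for single-copy gene trees only, where each species appears once as a leaf. It is a simplified illustration of quartet frequencies, not SpeciesRax's paralogy-aware support computation. The function names and example trees are hypothetical. It relies on the fact that, in a rooted tree, any clade containing exactly two of the four taxa gives one side of the induced unrooted split.

```python
from collections import Counter


def clade_leaf_sets(tree, out: list) -> frozenset:
    """Append the leaf set of every node of a nested-tuple tree to `out`; return the root's."""
    if isinstance(tree, str):
        leaves = frozenset([tree])
    else:
        leaves = frozenset().union(*(clade_leaf_sets(child, out) for child in tree))
    out.append(leaves)
    return leaves


def induced_quartet(tree, quartet):
    """Induced unrooted split on four taxa, given as the side holding the smallest
    label (e.g. ('a', 'b') for ab|cd); None if a taxon is missing or unresolved."""
    four = frozenset(quartet)
    clades: list = []
    root = clade_leaf_sets(tree, clades)
    if not four <= root:
        return None
    for clade in clades:
        shared = clade & four
        if len(shared) == 2:
            return min(tuple(sorted(shared)), tuple(sorted(four - shared)))
    return None


def quartet_frequencies(gene_trees, quartet) -> dict:
    """Fraction of informative gene trees supporting each split of `quartet`."""
    counts = Counter(induced_quartet(t, quartet) for t in gene_trees)
    counts.pop(None, None)
    total = sum(counts.values())
    return {split: n / total for split, n in counts.items()} if total else {}


# Hypothetical single-copy gene trees over species a, b, c, d.
gene_trees = [
    ((("a", "b"), "c"), "d"),
    (("a", "b"), ("c", "d")),
    ((("a", "c"), "b"), "d"),
]
print(quartet_frequencies(gene_trees, ("a", "b", "c", "d")))
```

With multi-copy families, the same species label can appear on several leaves. In that case, the gene copies must first be related to speciation events before quartets can be read off, and that is the part a paralogy-aware method handles.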


Subjects
Algorithms, Gene Duplication, Models, Genetic, Pedigree, Phylogeny
19.
Mol Biol Evol ; 39(12)2022 Dec 05.
Article in English | MEDLINE | ID: mdl-36317198

ABSTRACT

Genomic sequence data provide a rich source of information about the history of species divergence and interspecific hybridization or introgression. Despite recent advances in genomics and statistical methods, it remains challenging to infer gene flow, and as a result, one may have to estimate introgression rates and times under misspecified models. Here we use mathematical analysis and computer simulation to examine estimation bias and issues of interpretation when the model of gene flow is misspecified in the analysis of genomic datasets, for example, when introgression is assigned to the wrong lineages. For the case of two species, we establish a correspondence between the migration rate in the continuous migration model and the introgression probability in the introgression model. When gene flow occurs continuously through time but is assumed in the analysis to occur at a fixed time point, common evolutionary parameters such as species divergence times are surprisingly well estimated. However, the estimated time of introgression tends to fall toward the recent end of the period of continuous gene flow. When introgression events are incorrectly assigned to the parental or daughter lineages, introgression times tend to collapse onto species divergence times, and introgression probabilities are underestimated. Overall, our analyses suggest that the simple introgression model is useful for extracting information about between-species gene flow and divergence even when the model may be misspecified. However, for reliable inference of gene flow it is important to include multiple samples per species, in particular, from hybridizing species.
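For intuition about why a continuous migration rate and a one-off introgression probability can be related at all, here is a heuristic sketch. It is not the correspondence derived in the paper, which the abstract does not state. The sketch ignores coalescence and back-migration and treats migration, traced backward in time, as a Poisson process with rate m per generation over a gene-flow period of T generations. Under these assumptions, the probability that a sampled lineage has been traced into the other species is 1 - e^(-mT). The parameter values are hypothetical.

```python
import math
import random


def migrated_probability(m: float, generations: float) -> float:
    """P(a lineage has migrated at least once within `generations`), assuming
    one-way migration at rate m per generation, no back-migration and no coalescence."""
    return 1.0 - math.exp(-m * generations)


def simulate_migrated_fraction(m: float, generations: float, n: int,
                               rng: random.Random) -> float:
    """Monte Carlo check: the first migration time is exponential with rate m."""
    hits = sum(rng.expovariate(m) <= generations for _ in range(n))
    return hits / n


m, T = 1e-4, 2000  # hypothetical migration rate and duration of gene flow
print(f"analytic:  {migrated_probability(m, T):.4f}")
print(f"simulated: {simulate_migrated_fraction(m, T, 100_000, random.Random(3)):.4f}")
```

Under this toy model, a pulse of introgression with probability near 1 - e^(-mT) moves the same expected fraction of lineages. However, it cannot capture when within the period those lineages crossed.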


Subjects
Gene Flow, Genomics, Computer Simulation
20.
New Phytol ; 237(4): 1405-1417, 2023 Feb.
Article in English | MEDLINE | ID: mdl-36349406

ABSTRACT

Ferns, and particularly homosporous ferns, have long been assumed to have experienced recurrent whole-genome duplication (WGD) events because of their substantially large genome sizes, surprisingly high chromosome numbers, and high degrees of polyploidy among many extant members. As the number of sequenced fern genomes is limited, recent studies have employed transcriptome data to find evidence for WGDs in ferns. However, they have reached conflicting conclusions about the occurrence of ancient polyploidy, for instance, in the lineage of leptosporangiate ferns. Because identifying WGDs in a phylogenetic context is the foremost step in studying the contribution of ancient polyploidy to evolution, we revisited previously identified WGDs in leptosporangiate ferns, mainly the core leptosporangiate ferns, by building KS-age distributions with substitution rate corrections and by conducting statistical gene tree-species tree reconciliation analyses. Our integrative analyses not only identified four ancient WGDs in the sampled core leptosporangiate ferns but also revealed false positives and false negatives in the WGD calls of earlier studies. In conclusion, we underscore the importance of substitution rate corrections and of uncertainties in gene tree-species tree reconciliations when calling WGD events, and we propose an exemplar workflow to overcome these often-overlooked issues.
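The abstract stresses substitution rate corrections for KS-age distributions but does not give a formula. Below is a minimal sketch under an explicit assumption: a relative-rate-test style correction with an outgroup O. This is a standard approach, used here only as an illustration. The ortholog KS between focal species A and species B is split into the part accumulated on each lineage, and the A-B divergence is then re-expressed on A's scale as twice A's share. This scale is what a paralog KS peak in A is compared against. All values and function names are hypothetical.

```python
def lineage_ks(k_ab: float, k_ao: float, k_bo: float) -> tuple[float, float]:
    """Split the ortholog KS between species A and B into the parts accumulated
    on each lineage, using an outgroup O (relative rate test)."""
    k_a = (k_ab + k_ao - k_bo) / 2.0
    k_b = (k_ab + k_bo - k_ao) / 2.0
    return k_a, k_b


def rate_adjusted_divergence(k_ab: float, k_ao: float, k_bo: float) -> float:
    """A-B divergence on focal species A's KS scale: twice A's contribution,
    since paralog KS in A also accumulates on two copies evolving at A's rate."""
    k_a, _ = lineage_ks(k_ab, k_ao, k_bo)
    return 2.0 * k_a


# Hypothetical ortholog KS values: A has evolved faster than B since they split.
k_ab, k_ao, k_bo = 1.0, 1.6, 1.2
paralog_peak_a = 1.2  # hypothetical WGD peak in A's paralog KS-age distribution

raw = k_ab
adjusted = rate_adjusted_divergence(k_ab, k_ao, k_bo)
print(f"raw divergence KS = {raw:.2f}, rate-adjusted on A's scale = {adjusted:.2f}")
# Against the raw value the peak looks older than the split (a shared WGD);
# against the adjusted value it is younger (specific to A's lineage).
print("WGD predates A-B split" if paralog_peak_a > adjusted else "WGD is specific to A's lineage")
```

The example shows how skipping the correction can turn a lineage-specific WGD into an apparently shared one, which is the kind of misplacement the abstract describes.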


Subjects
Ferns, Ferns/genetics, Phylogeny, Gene Duplication, Genome Size, Polyploidy, Evolution, Molecular, Genome, Plant