Results 1 - 20 of 52
1.
Syst Biol ; 72(6): 1357-1369, 2023 Dec 30.
Article in English | MEDLINE | ID: mdl-37698548

ABSTRACT

The evolutionary implications and frequency of hybridization and introgression are increasingly being recognized across the tree of life. To detect hybridization from multi-locus and genome-wide sequence data, a popular class of methods is based on summary statistics from subsets of 3 or 4 taxa. However, these methods often carry the assumption of a constant substitution rate across lineages and genes, which is commonly violated in many groups. In this work, we quantify the effects of rate variation on the D test (also known as the ABBA-BABA test), the D3 test, and HyDe. All 3 tests are used widely across a range of taxonomic groups, in part because they are very fast to compute. We consider rate variation across species lineages, across genes, their lineage-by-gene interaction, and rate variation across gene-tree edges. We simulated species networks according to a birth-death-hybridization process, so as to capture a range of realistic species phylogenies. For all 3 methods tested, we found a marked increase in the false discovery of reticulation (type-1 error rate) when there is rate variation across species lineages. The D3 test was the most sensitive, with around 80% type-1 error, such that D3 appears to be more sensitive to a departure from the clock than to the presence of reticulation. For all 3 tests, the power to detect hybridization events decreased as the number of hybridization events increased, indicating that multiple hybridization events can obscure one another if they occur within a small subset of taxa. Our study highlights the need to consider rate variation when using site-based summary statistics, and points to the advantages of methods that do not require assumptions on evolutionary rates across lineages or across genes.
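The D test (ABBA-BABA test) named in this abstract compares the counts of two discordant site patterns. Below is a minimal Python sketch, assuming four aligned sequences ordered P1, P2, P3, outgroup, biallelic sites only, and the usual definition D = (n_ABBA - n_BABA) / (n_ABBA + n_BABA). The function names are hypothetical, significance testing is omitted, and this is not the code used in the study.

```python
def count_patterns(alignment: list[str]) -> tuple[int, int]:
    """Count ABBA and BABA sites in four aligned sequences ordered P1, P2, P3, outgroup."""
    p1, p2, p3, out = alignment
    abba = baba = 0
    for a1, a2, a3, o in zip(p1, p2, p3, out):
        if len({a1, a2, a3, o}) != 2:
            continue  # keep only biallelic sites
        if a1 == o and a2 == a3 != o:
            abba += 1  # P1 carries the outgroup allele, P2 and P3 the derived one
        elif a2 == o and a1 == a3 != o:
            baba += 1  # P2 carries the outgroup allele, P1 and P3 the derived one
    return abba, baba


def d_statistic(n_abba: int, n_baba: int) -> float:
    """D = (ABBA - BABA) / (ABBA + BABA)."""
    total = n_abba + n_baba
    if total == 0:
        raise ValueError("no ABBA or BABA sites")
    return (n_abba - n_baba) / total


# Toy alignment: 3 ABBA sites, 1 BABA site, 1 invariant site, 1 BBAA site.
aln = ["AGACAC", "CTGAAC", "CTGCAA", "AGAAAA"]
n_abba, n_baba = count_patterns(aln)
print(n_abba, n_baba, d_statistic(n_abba, n_baba))  # 3 1 0.5
```

Under a strict clock with no gene flow, ABBA and BABA sites are expected in equal numbers, so D is near 0. The abstract reports that rate variation across lineages inflates false detections, so a nonzero D is not by itself evidence of reticulation.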


Subjects
Biological Evolution, Genetic Hybridization, Phylogeny, Genome
2.
Syst Biol ; 72(5): 1171-1179, 2023 11 01.
Article in English | MEDLINE | ID: mdl-37254872

ABSTRACT

We consider the evolution of phylogenetic gene trees along phylogenetic species networks, according to the network multispecies coalescent process, and introduce a new network coalescent model with correlated inheritance of gene flow. This model generalizes two traditional versions of the network coalescent: with independent or common inheritance. At each reticulation, multiple lineages of a given locus are inherited from parental populations chosen at random, either independently across lineages or with positive correlation according to a Dirichlet process. This process may account for locus-specific probabilities of inheritance, for example. We implemented the simulation of gene trees under these network coalescent models in the Julia package PhyloCoalSimulations, which depends on PhyloNetworks and its powerful network manipulation tools. Input species phylogenies can be read in extended Newick format, either in numbers of generations or in coalescent units. Simulated gene trees can be written in Newick format, and in a way that preserves information about their embedding within the species network. This embedding can be used for downstream purposes, such as to simulate species-specific processes like rate variation across species, or for other scenarios as illustrated in this note. This package should be useful for simulation studies and simulation-based inference methods. The software is available open source with documentation and a tutorial at https://github.com/cecileane/PhyloCoalSimulations.jl.
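The correlated-inheritance model above assigns the lineages present at a reticulation to parental populations through a Dirichlet process. One standard way to draw from a Dirichlet process is the Pólya-urn (Chinese-restaurant) scheme, sketched below in Python for a reticulation with two parents. The parameter names `gamma` (probability of inheriting from parent 0) and `alpha` (concentration) are our own assumptions; this is not the PhyloCoalSimulations API and may not match its parametrization.

```python
import math
import random


def assign_parents(n_lineages, gamma, alpha, rng=None):
    """Assign each lineage at a 2-parent reticulation to parent 0 or 1.

    Hypothetical Polya-urn sketch: the base distribution picks parent 0 with
    probability `gamma`; `alpha` is the concentration parameter.
    alpha = 0 gives common inheritance, alpha = inf independent inheritance.
    """
    rng = rng or random.Random()
    parents = []
    for i in range(n_lineages):
        if i == 0 or math.isinf(alpha):
            p_new = 1.0  # fresh draw from the base distribution
        else:
            p_new = alpha / (alpha + i)  # i earlier lineages already assigned
        if rng.random() < p_new:
            parents.append(0 if rng.random() < gamma else 1)
        else:
            parents.append(rng.choice(parents))  # copy an earlier lineage's parent
    return parents


rng = random.Random(42)
print(assign_parents(6, 0.3, 0.0, rng))           # all lineages share one parent
print(assign_parents(6, 0.3, float("inf"), rng))  # independent draws
print(assign_parents(6, 0.3, 1.0, rng))           # positively correlated
```

With `alpha = 0` every lineage copies the first draw (common inheritance); as `alpha` grows, fresh draws dominate and the assignment approaches independent inheritance.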


Subjects
Gene Flow, Software, Phylogeny, Computer Simulation, Probability, Genetic Models
3.
J Math Biol ; 88(3): 29, 2024 02 19.
Article in English | MEDLINE | ID: mdl-38372830

ABSTRACT

Reticulations in a phylogenetic network represent processes such as gene flow, admixture, recombination and hybrid speciation. Extending definitions from the tree setting, an anomalous network is one in which some unrooted tree topology displayed in the network appears in gene trees with a lower frequency than a tree not displayed in the network. We investigate anomalous networks under the Network Multispecies Coalescent Model with possible correlated inheritance at reticulations. Focusing on subsets of 4 taxa, we describe a new algorithm to calculate quartet concordance factors on networks of any level, faster than previous algorithms because of its focus on 4 taxa. We then study topological properties required for a 4-taxon network to be anomalous, uncovering the key role of $3_2$-cycles: cycles of 3 edges parent to a sister group of 2 taxa. Under the model of common inheritance, that is, when each gene tree coalesces within a species tree displayed in the network, we prove that 4-taxon networks are never anomalous. Under independent and various levels of correlated inheritance, we use simulations under realistic parameters to quantify the prevalence of anomalous 4-taxon networks, finding that truly anomalous networks are rare. At the same time, however, we find a significant fraction of networks close enough to the anomaly zone to appear anomalous, when considering the quartet concordance factors observed from a few hundred genes. These apparent anomalies may challenge network inference methods.
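To see why common inheritance rules out anomalies on 4 taxa, recall the standard multispecies-coalescent result for a 4-taxon species tree whose unrooted internal edge has length t (coalescent units): the matching unrooted gene tree has probability 1 - (2/3)e^{-t}, and each alternative has (1/3)e^{-t}. Under common inheritance the quartet concordance factors (CFs) are then a γ-weighted mixture over displayed trees. The Python sketch below assumes each displayed tree is summarized as a hypothetical tuple (γ, topology index, internal length); it is not the paper's algorithm for networks of any level.

```python
import math


def tree_quartet_cfs(split, t):
    """CFs of the 3 unrooted quartets (0: 12|34, 1: 13|24, 2: 14|23) for a
    4-taxon species tree with topology `split` and internal length `t`."""
    minor = math.exp(-t) / 3
    cfs = [minor, minor, minor]
    cfs[split] = 1 - 2 * minor
    return cfs


def common_inheritance_cfs(displayed):
    """Gamma-weighted mixture of displayed-tree CFs; `displayed` holds (gamma, split, t)."""
    total = [0.0, 0.0, 0.0]
    for gamma, split, t in displayed:
        for k, cf in enumerate(tree_quartet_cfs(split, t)):
            total[k] += gamma * cf
    return total


# Hypothetical network displaying 12|34 (gamma 0.7) and 13|24 (gamma 0.3).
cfs = common_inheritance_cfs([(0.7, 0, 0.5), (0.3, 1, 1.0)])
print([round(c, 3) for c in cfs])
```

In this mixture, the CF of a displayed topology exceeds that of a non-displayed one by the sum of γ(1 - e^{-t}) over the displayed trees with that topology, which is never negative; hence no displayed quartet is less frequent than a non-displayed one.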


Subjects
Algorithms, Prevalence, Phylogeny
4.
Proc Natl Acad Sci U S A ; 118(33)2021 08 17.
Article in English | MEDLINE | ID: mdl-34373325

ABSTRACT

Carnivorous plants consume animals for mineral nutrients that enhance growth and reproduction in nutrient-poor environments. Here, we report that Triantha occidentalis (Tofieldiaceae) represents a previously overlooked carnivorous lineage that captures insects on sticky inflorescences. Field experiments, isotopic data, and mixing models demonstrate significant N transfer from prey to Triantha, with an estimated 64% of leaf N obtained from prey capture in previous years, comparable to levels inferred for the cooccurring round-leaved sundew, a recognized carnivore. N obtained via carnivory is exported from the inflorescence and developing fruits and may ultimately be transferred to next year's leaves. Glandular hairs on flowering stems secrete phosphatase, as seen in all carnivorous plants that directly digest prey. Triantha is unique among carnivorous plants in capturing prey solely with sticky traps adjacent to its flowers, contrary to theory. However, its glandular hairs capture only small insects, unlike the large bees and butterflies that act as pollinators, which may minimize the conflict between carnivory and pollination.


Subjects
Alismatales/physiology, Carnivorous Plant/physiology, Inflorescence/physiology, Nitrogen Isotopes/metabolism, Animals, Drosophila/chemistry, Ecosystem, Nitrogen/metabolism, Nitrogen Isotopes/chemistry
5.
Bioinformatics ; 38(11): 3044-3050, 2022 05 26.
Article in English | MEDLINE | ID: mdl-35482481

ABSTRACT

MOTIVATION: Kinship estimation is necessary for evaluating violations of assumptions or testing certain hypotheses in many population genomic studies. However, kinship estimators are usually designed for diploid systems and cannot be used in populations with mixed haploid-diploid genetic systems. The only estimators for different ploidies require datasets free of population structure, limiting their usage. RESULTS: We present KIMGENS (Kinship Inference for Mixed GENetic Systems), a kinship estimator for individuals of various ploidies that is robust to population structure. This estimator is based on the popular KING-robust estimator but uses diploid relatives of the individuals of interest as references for heterozygosity, and extends its use to haploid-diploid and haploid pairs of individuals. We demonstrate that KIMGENS estimates kinship more accurately than previously developed estimators in simulated panmictic, structured and admixed populations, but has lower accuracy when the individual of interest is inbred. KIMGENS also outperforms other estimators in a honeybee dataset. Therefore, KIMGENS is a valuable addition to a population geneticist's toolbox. AVAILABILITY AND IMPLEMENTATION: KIMGENS and its associated simulation tool are implemented and available open-source at https://github.com/YenWenWang/HapDipKinship. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Diploidy, Software, Humans, Animals, Haploidy, Genomics/methods, Computer Simulation
6.
Bioinformatics ; 37(5): 634-641, 2021 05 05.
Article in English | MEDLINE | ID: mdl-33027508

ABSTRACT

MOTIVATION: With growing genome-wide molecular datasets from next-generation sequencing, phylogenetic networks can be estimated using a variety of approaches. These phylogenetic networks include events like hybridization, gene flow or horizontal gene transfer explicitly. However, the most accurate network inference methods are computationally heavy. Methods that scale to larger datasets do not calculate a full likelihood, such that traditional likelihood-based tools for model selection are not applicable to decide how many past hybridization events best fit the data. We propose here a goodness-of-fit test to quantify the fit between data observed from genome-wide multi-locus data, and patterns expected under the multi-species coalescent model on a candidate phylogenetic network. RESULTS: We identified weaknesses in the previously proposed TICR test, and proposed corrections. The performance of our new test was validated by simulations on real-world phylogenetic networks. Our test provides one of the first rigorous tools for model selection, to select the adequate network complexity for the data at hand. The test can also work for identifying poorly inferred areas on a network. AVAILABILITY AND IMPLEMENTATION: Software for the goodness-of-fit test is available as a Julia package at https://github.com/cecileane/QuartetNetworkGoodnessFit.jl. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Genome, Software, High-Throughput Nucleotide Sequencing, Likelihood Functions, Phylogeny
7.
J Math Biol ; 86(1): 12, 2022 12 08.
Article in English | MEDLINE | ID: mdl-36481927

ABSTRACT

Phylogenetic networks extend phylogenetic trees to model non-vertical inheritance, by which a lineage inherits material from multiple parents. The computational complexity of estimating phylogenetic networks from genome-wide data with likelihood-based methods limits the size of networks that can be handled. Methods based on pairwise distances could offer faster alternatives. We study here the information that average pairwise distances contain on the underlying phylogenetic network, by characterizing local and global features that can or cannot be identified. For general networks, we clarify that the root and edge lengths adjacent to reticulations are not identifiable, and then focus on the class of zipped-up semidirected networks. We provide a criterion to swap subgraphs locally, such as 3-cycles, resulting in indistinguishable networks. We propose the "distance split tree", which can be constructed from pairwise distances, and prove that it is a refinement of the network's tree of blobs, capturing the tree-like features of the network. For level-1 networks, this distance split tree is equal to the tree of blobs refined to separate polytomies from blobs, and we prove that the mixed representation of the network is identifiable. The information loss is localized around 4-cycles, for which the placement of the reticulation is unidentifiable. The mixed representation combines split edges for 4-cycles, regular tree and hybrid edges from the semidirected network, and edge parameters that encode all information identifiable from average pairwise distances.


Subjects
Phylogeny, Likelihood Functions
8.
Syst Biol ; 69(3): 593-601, 2020 05 01.
Article in English | MEDLINE | ID: mdl-31432090

ABSTRACT

Genomic data have had a profound impact on nearly every biological discipline. In systematics and phylogenetics, the thousands of loci that are now being sequenced can be analyzed under the multispecies coalescent model (MSC) to explicitly account for gene tree discordance due to incomplete lineage sorting (ILS). However, the MSC assumes no gene flow after divergence, calling for additional methods that can accommodate this limitation. Explicit phylogenetic network methods have emerged, which can simultaneously account for ILS and gene flow by representing evolutionary history as a directed acyclic graph. In this point of view, we highlight some of the strengths and limitations of phylogenetic networks and argue that tree-based inference should not be blindly abandoned in favor of networks simply because they represent more parameter-rich models. Attention should be given to model selection of reticulation complexity, and the most robust conclusions regarding evolutionary history are likely obtained when combining tree- and network-based inference.


Subjects
Classification/methods, Genome/genetics, Phylogeny
9.
Syst Biol ; 69(3): 462-478, 2020 05 01.
Article in English | MEDLINE | ID: mdl-31693158

ABSTRACT

Baobabs (Adansonia) are a cohesive group of tropical trees with a disjunct distribution in Australia, Madagascar, and continental Africa, and diverse flowers associated with two pollination modes. We used custom-targeted sequence capture in conjunction with new and existing phylogenetic comparative methods to explore the evolution of floral traits and pollination systems while allowing for reticulate evolution. Our analyses suggest that relationships in Adansonia are confounded by reticulation, with network inference methods supporting at least one reticulation event. The best supported hypothesis involves introgression between Adansonia rubrostipa and core Longitubae, both of which are hawkmoth pollinated with yellow/red flowers, but there is also some support for introgression between the African lineage and Malagasy Brevitubae, which are both mammal-pollinated with white flowers. New comparative methods for phylogenetic networks were developed that allow maximum-likelihood inference of ancestral states and were applied to study the apparent homoplasy in floral biology and pollination mode seen in Adansonia. This analysis supports a role for introgressive hybridization in morphological evolution even in a clade with highly divergent and geographically widespread species. Our new comparative methods for discrete traits on species networks are implemented in the software PhyloNetworks. [Comparative methods; Hyb-Seq; introgression; network inference; population trees; reticulate evolution; species tree inference; targeted sequence capture.].


Subjects
Adansonia/anatomy & histology, Adansonia/classification, Biological Evolution, Flowers/anatomy & histology, Pollination/physiology, Adansonia/genetics, Flowers/genetics, Species Specificity
10.
Syst Biol ; 67(4): 662-680, 2018 07 01.
Article in English | MEDLINE | ID: mdl-29385556

ABSTRACT

To study the evolution of several quantitative traits, the classical phylogenetic comparative framework consists of a multivariate random process running along the branches of a phylogenetic tree. The Ornstein-Uhlenbeck (OU) process is sometimes preferred to the simple Brownian motion (BM) as it models stabilizing selection toward an optimum. The optimum for each trait is likely to shift over the long periods of time spanned by large modern phylogenies. Our goal is to automatically detect the position of these shifts on a phylogenetic tree, while accounting for correlations between traits, which might exist because of structural or evolutionary constraints. We show that, in the presence of shifts, phylogenetic Principal Component Analysis fails to decorrelate traits efficiently, so that any method aiming at finding shifts needs to deal with correlation simultaneously. We introduce here a simplification of the full multivariate OU model, named scalar OU, which allows for noncausal correlations and is still computationally tractable. We extend the equivalence between the OU and a BM on a rescaled tree to our multivariate framework. We describe an Expectation-Maximization (EM) algorithm that allows for a maximum likelihood estimation of the shift positions, associated with a new model selection criterion, accounting for the identifiability issues for the shift localization on the tree. The method, freely available as an R package (PhylogeneticEM), is fast and can deal with missing values. We demonstrate its efficiency and accuracy compared to another state-of-the-art method (ℓ1ou) on a wide range of simulated scenarios and use this new framework to reanalyze recently gathered data sets on New World Monkeys and Anolis lizards.
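This abstract relies on the equivalence between an OU process and a BM on a rescaled tree. For a univariate OU with a fixed root state on an ultrametric tree of height T, a standard result gives Cov(Y_i, Y_j) = σ²/(2α) · e^{-2α(T - t_ij)} (1 - e^{-2α t_ij}), where t_ij is the time from the root to the most recent common ancestor of tips i and j. Writing this as σ² g(t_ij), with g increasing and g(0) = 0, shows it is a BM covariance on the tree whose node times are mapped through g. A minimal Python sketch under these assumptions (α > 0; function name hypothetical); it does not cover the multivariate scalar OU model or shifts, and it is not PhylogeneticEM code.

```python
import math


def ou_covariance(shared, height, alpha, sigma2):
    """Tip covariance of an OU process with fixed root on an ultrametric tree.

    shared[i][j]: time from the root to the MRCA of tips i and j (shared[i][i] == height).
    Returns sigma2 * g(shared[i][j]) with g(t) = exp(-2 a T) * (exp(2 a t) - 1) / (2 a),
    i.e. a BM covariance on the tree whose node times t are mapped to g(t).
    Requires alpha > 0.
    """
    def g(t):
        return math.exp(-2 * alpha * height) * math.expm1(2 * alpha * t) / (2 * alpha)

    return [[sigma2 * g(t) for t in row] for row in shared]


# Three tips, height 1: tips 0 and 1 split at time 0.4, tip 2 branches off at the root.
shared = [[1.0, 0.4, 0.0], [0.4, 1.0, 0.0], [0.0, 0.0, 1.0]]
cov = ou_covariance(shared, 1.0, alpha=1.0, sigma2=2.0)
```

As α approaches 0, g(t) approaches t and the covariance reduces to the BM one.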


Subjects
Biological Adaptation, Biological Evolution, Lizards, Phenotype, Platyrrhini, Algorithms, Animals, Phylogeny
11.
Syst Biol ; 67(5): 800-820, 2018 09 01.
Article in English | MEDLINE | ID: mdl-29701821

ABSTRACT

The goal of phylogenetic comparative methods (PCMs) is to study the distribution of quantitative traits among related species. The observed traits are often seen as the result of a Brownian Motion (BM) along the branches of a phylogenetic tree. Reticulation events such as hybridization, gene flow or horizontal gene transfer, can substantially affect a species' traits, but are not modeled by a tree. Phylogenetic networks have been designed to represent reticulate evolution. As they become available for downstream analyses, new models of trait evolution are needed, applicable to networks. We develop here an efficient recursive algorithm to compute the phylogenetic variance matrix of a trait on a network, in only one preorder traversal of the network. We then extend the standard PCM tools to this new framework, including phylogenetic regression with covariates (or phylogenetic ANOVA), ancestral trait reconstruction, and Pagel's $\lambda$ test of phylogenetic signal. The trait of a hybrid is sometimes outside of the range of its two parents, for instance because of hybrid vigor or hybrid depression. These two phenomena are rather commonly observed in present-day hybrids. Transgressive evolution can be modeled as a shift in the trait value following a reticulation point. We develop a general framework to handle such shifts and take advantage of the phylogenetic regression view of the problem to design statistical tests for ancestral transgressive evolution in the evolutionary history of a group of species. We study the power of these tests in several scenarios and show that recent events have indeed the strongest impact on the trait distribution of present-day taxa. We apply those methods to a data set of Xiphophorus fishes, to confirm and complete previous analysis in this group. All the methods developed here are available in the Julia package PhyloNetworks.
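The abstract describes computing the phylogenetic variance matrix of a trait on a network in one preorder traversal. The Python sketch below illustrates that idea under stated assumptions: BM with a fixed root value, and a hybrid node whose value is the γ-weighted average of its parents' values at the ends of the hybrid edges. The node-list format and function name are hypothetical and not the PhyloNetworks API.

```python
def network_bm_covariance(nodes, sigma2=1.0):
    """Covariances of a BM trait at all nodes of a rooted network, in one preorder pass.

    `nodes`: list of (name, parents) in preorder (parents listed before children).
    `parents`: [] for the root, [(p, length, 1.0)] for a tree node, and
    [(p1, l1, g1), (p2, l2, g2)] for a hybrid node (g1 + g2 == 1).
    Assumed model: the root value is fixed, and a hybrid's value is the
    gamma-weighted average of its parents' values at the end of the hybrid edges.
    """
    visited = []
    V = {}
    for name, parents in nodes:
        if not parents:
            V[(name, name)] = 0.0
        else:
            # covariance with earlier nodes: new edge noise is independent of them
            for other in visited:
                cov = sum(g * V[(p, other)] for p, _, g in parents)
                V[(name, other)] = V[(other, name)] = cov
            # variance: weighted parental covariances plus weighted edge noise
            var = 0.0
            for p1, l1, g1 in parents:
                for p2, _, g2 in parents:
                    var += g1 * g2 * V[(p1, p2)]
                var += g1 * g1 * sigma2 * l1
            V[(name, name)] = var
        visited.append(name)
    return V


example = [
    ("r", []),
    ("a", [("r", 1.0, 1.0)]),
    ("b", [("r", 2.0, 1.0)]),
    ("h", [("a", 0.5, 0.6), ("b", 0.5, 0.4)]),  # hybrid node
    ("A", [("a", 1.0, 1.0)]),
    ("H", [("h", 1.0, 1.0)]),
    ("B", [("b", 1.0, 1.0)]),
]
V = network_bm_covariance(example)
tips = ["A", "H", "B"]
print([[round(V[(i, j)], 3) for j in tips] for i in tips])
```

Each node needs only the covariances of its parents with already-visited nodes, which is why a single preorder pass suffices.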


Subjects
Cyprinodontiformes/genetics, Molecular Evolution, Gene Flow, Horizontal Gene Transfer, Genetic Hybridization, Phylogeny, Algorithms, Animals, Cyprinodontiformes/classification, Genetic Models, Phenotype
12.
PLoS Genet ; 12(3): e1005896, 2016 Mar.
Article in English | MEDLINE | ID: mdl-26950302

ABSTRACT

Phylogenetic networks are necessary to represent the tree of life expanded by edges to represent events such as horizontal gene transfers, hybridizations or gene flow. Not all species follow the paradigm of vertical inheritance of their genetic material. While a great deal of research has gone into the inference of phylogenetic trees, statistical methods to infer phylogenetic networks are still limited and under development. The main disadvantage of existing methods is a lack of scalability. Here, we present a statistical method to infer phylogenetic networks from multi-locus genetic data in a pseudolikelihood framework. Our model accounts for incomplete lineage sorting through the coalescent model, and for horizontal inheritance of genes through reticulation nodes in the network. Computation of the pseudolikelihood is fast and simple, and it avoids the burdensome calculation of the full likelihood, which can be intractable with many species. Moreover, estimation at the quartet level has the added computational benefit that it is easily parallelizable. Simulation studies comparing our method to a full likelihood approach show that our pseudolikelihood approach is much faster without compromising accuracy. We applied our method to reconstruct the evolutionary relationships among swordtails and platyfishes (Xiphophorus: Poeciliidae), a group characterized by widespread hybridization.
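The pseudolikelihood described above works at the quartet level. A minimal Python sketch, assuming each 4-taxon set contributes an independent multinomial term in which the observed gene-tree counts of its three unrooted resolutions are matched against the model's expected concordance factors; the dictionary format and function name are hypothetical, and the network-specific formulas for the expected factors are not reproduced here.

```python
import math


def log_pseudolikelihood(observed, expected):
    """Sum over 4-taxon sets of multinomial log-likelihoods (constants dropped).

    observed[q]: gene-tree counts of the 3 unrooted resolutions of quartet q.
    expected[q]: the model's concordance factors for the same resolutions.
    Quartets are treated as independent, which makes this a pseudolikelihood.
    """
    total = 0.0
    for q, counts in observed.items():
        for n, cf in zip(counts, expected[q]):
            if n > 0:
                total += n * math.log(cf)
    return total


obs = {("A", "B", "C", "D"): (80, 12, 8)}
good = {("A", "B", "C", "D"): (0.80, 0.11, 0.09)}
poor = {("A", "B", "C", "D"): (0.50, 0.25, 0.25)}
print(log_pseudolikelihood(obs, good) > log_pseudolikelihood(obs, poor))
```

Because each term involves only one quartet, the terms can be computed independently, which is what makes this kind of criterion easy to parallelize.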


Subjects
Molecular Evolution, Horizontal Gene Transfer, Phylogeny, Computer Simulation, Likelihood Functions, Genetic Models
13.
J Integr Plant Biol ; 61(1): 12-31, 2019 Jan.
Article in English | MEDLINE | ID: mdl-30474311

ABSTRACT

Previous research suggests that Gossypium has undergone a 5- to 6-fold multiplication following its divergence from Theobroma. However, the number of events, or where they occurred in the Malvaceae phylogeny, remains unknown. We analyzed transcriptomic and genomic data from representatives of eight of the nine Malvaceae subfamilies. Phylogenetic analysis of nuclear data placed Dombeya (Dombeyoideae) as sister to the rest of the Malvadendrina clade, but the plastid DNA tree strongly supported Durio (Helicteroideae) in this position. Intraspecific Ks plots indicated that all sampled taxa, except Theobroma (Byttnerioideae), Corchorus (Grewioideae), and Dombeya (Dombeyoideae), have experienced whole genome multiplications (WGMs). Quartet analysis suggested WGMs were shared by Malvoideae-Bombacoideae and Sterculioideae-Tilioideae, but did not resolve whether these are shared with each other or with Helicteroideae (Durio). Gene tree reconciliation and Bayesian concordance analysis suggested a complex history. Alternative hypotheses are suggested, each involving two independent autotetraploid and one allopolyploid event. They differ in that one entails an allopolyploid origin for the Durio lineage, whereas the other invokes an allopolyploid origin for Malvoideae-Bombacoideae. We highlight the need for more genomic information in the Malvaceae and improved methods to resolve complex evolutionary histories that may include allopolyploidy, incomplete lineage sorting, and variable rates of gene and genome evolution.


Subjects
Plant Genome/genetics, Malvaceae/genetics, Bayes Theorem, Genomics, Gossypium/genetics, Phylogeny
14.
Mol Biol Evol ; 34(12): 3292-3298, 2017 Dec 01.
Article in English | MEDLINE | ID: mdl-28961984

ABSTRACT

PhyloNetworks is a Julia package for the inference, manipulation, visualization, and use of phylogenetic networks in an interactive environment. Inference of phylogenetic networks is done with maximum pseudolikelihood from gene trees or multi-locus sequences (SNaQ), with possible bootstrap analysis. PhyloNetworks is the first software providing tools to summarize a set of networks (from a bootstrap or posterior sample) with measures of tree edge support, hybrid edge support, and hybrid node support. Networks can be used for phylogenetic comparative analysis of continuous traits, to estimate ancestral states or do a phylogenetic regression. The software is available in open source and with documentation at https://github.com/crsl4/PhyloNetworks.jl.


Subjects
Computational Biology/methods, Phylogeny, Algorithms, Molecular Evolution, Software
15.
Am J Bot ; 105(11): 1888-1910, 2018 11.
Article in English | MEDLINE | ID: mdl-30368769

ABSTRACT

PREMISE OF THE STUDY: We present the first plastome phylogeny encompassing all 77 monocot families, estimate branch support, and infer monocot-wide divergence times and rates of species diversification. METHODS: We conducted maximum likelihood analyses of phylogeny and BAMM studies of diversification rates based on 77 plastid genes across 545 monocots and 22 outgroups. We quantified how branch support and ascertainment vary with gene number, branch length, and branch depth. KEY RESULTS: Phylogenomic analyses shift the placement of 16 families in relation to earlier studies based on four plastid genes, add seven families, date the divergence between monocots and eudicots+Ceratophyllum at 136 Mya, successfully place all mycoheterotrophic taxa examined, and support recognizing Taccaceae and Thismiaceae as separate families and Arecales and Dasypogonales as separate orders. Only 45% of interfamilial divergences occurred after the Cretaceous. Net species diversification underwent four large-scale accelerations in PACMAD-BOP Poaceae, Asparagales sister to Doryanthaceae, Orchidoideae-Epidendroideae, and Araceae sister to Lemnoideae, each associated with specific ecological/morphological shifts. Branch ascertainment and support across monocots increase with gene number and branch length, and decrease with relative branch depth. Analysis of entire plastomes in Zingiberales quantifies the importance of non-coding regions in identifying and supporting short, deep branches. CONCLUSIONS: We provide the first resolved, well-supported monocot phylogeny and timeline spanning all families, and quantify the significant contribution of plastome-scale data to resolving short, deep branches. We outline a new functional model for the evolution of monocots and their diagnostic morphological traits from submersed aquatic ancestors, supported by convergent evolution of many of these traits in aquatic Hydatellaceae (Nymphaeales).


Subjects
Genetic Speciation, Plastid Genomes, Magnoliopsida/genetics, Phylogeny, Intergenic DNA, Zingiberales/genetics
16.
Syst Biol ; 65(5): 843-51, 2016 09.
Article in English | MEDLINE | ID: mdl-27151419

ABSTRACT

Coalescent-based methods are now broadly used to infer evolutionary relationships between groups of organisms under the assumption that incomplete lineage sorting (ILS) is the only source of gene tree discordance. Many of these methods are known to consistently estimate the species tree when all their assumptions are met. Nonetheless, little work has been done to test the robustness of such methods to violations of their assumptions. Here, we study the performance of two of the most efficient coalescent-based methods, ASTRAL and NJst, in the presence of gene flow. Gene flow violates the assumption that ILS is the sole source of gene tree conflict. We find anomalous gene trees on three-taxon rooted trees and on four-taxon unrooted trees. These anomalous trees do not exist under ILS only, but appear because of gene flow. Our simulations show that species tree methods (and concatenation) may reconstruct the wrong evolutionary history, even from a very large number of well-reconstructed gene trees. In other words, species tree methods can be inconsistent under gene flow. Our results underline the need for methods like PhyloNet, to account simultaneously for ILS and gene flow in a unified framework. Although much slower, PhyloNet had better accuracy and remained consistent at high levels of gene flow.


Subjects
Classification/methods, Gene Flow, Phylogeny, Biological Evolution, Computer Simulation, Genetic Speciation, Genetic Models
17.
J Math Biol ; 74(1-2): 355-385, 2017 01.
Article in English | MEDLINE | ID: mdl-27241727

ABSTRACT

Diffusion processes on trees are commonly used in evolutionary biology to model the joint distribution of continuous traits, such as body mass, across species. Estimating the parameters of such processes from tip values presents challenges because of the intrinsic correlation between the observations produced by the shared evolutionary history, thus violating the standard independence assumption of large-sample theory. For instance, Ho and Ané (Ann Stat 41:957-981, 2013) recently proved that the mean (also known in this context as selection optimum) of an Ornstein-Uhlenbeck process on a tree cannot be estimated consistently from an increasing number of tip observations if the tree height is bounded. Here, using a fruitful connection to the so-called reconstruction problem in probability theory, we study the convergence rate of parameter estimation in the unbounded height case. For the mean of the process, we provide a necessary and sufficient condition for the consistency of the maximum likelihood estimator (MLE) and establish a phase transition on its convergence rate in terms of the growth of the tree. In particular, we show that a loss of $\sqrt{n}$-consistency (i.e., the variance of the MLE becomes $\omega(n^{-1})$, where n is the number of tips) occurs when the tree growth is larger than a threshold related to the phase transition of the reconstruction problem. For the covariance parameters, we give a novel, efficient estimation method which achieves $\sqrt{n}$-consistency under natural assumptions on the tree. Our theoretical results provide practical suggestions for the design of comparative data collection.


Subjects
Biological Models, Phylogeny, Phenotype, Probability
18.
Syst Biol ; 64(5): 809-23, 2015 Sep.
Article in English | MEDLINE | ID: mdl-26117705

ABSTRACT

Genome sequence data contain abundant information about genealogical history, but methods for extracting and interpreting this information are not yet fully developed. We analyzed genome sequences for multiple accessions of the selfing plant, Arabidopsis thaliana, with the goal of better understanding its genealogical history. As expected from accessions of the same species, we found much discordance between nuclear gene trees. Nonetheless, we inferred the optimal population tree under the assumption that all discordance is due to incomplete lineage sorting. To cope with the size of the data (many genes and many taxa), our pipeline is based on parallel computing and divides the problem into four-taxon trees. However, just because a population tree can be estimated does not mean that the assumptions of the multispecies coalescent model hold. Therefore, we implemented a new, nonparametric test to evaluate whether a population tree adequately explains the observed quartet frequencies (the frequencies of gene trees with each resolution of each four-taxon set). This test also considers other models: panmixia and a partially resolved population tree, that is, a tree in which some nodes are collapsed into local panmixia. We found that a partially resolved population tree provides the best fit to the data, providing evidence for tree-like structure within A. thaliana, qualitatively similar to what might be expected between different, closely related species. Further, we show that the pattern of deviation from expectations can be used to identify instances of introgression and detect one clear case of reticulation among ecotypes that have come into contact in the United Kingdom. Our study illustrates how we can use genome sequence data to evaluate whether phylogenetic relationships are strictly tree-like or reticulating.


Subjects
Arabidopsis/classification, Arabidopsis/genetics, Classification/methods, Phylogeny, Plant Genome, Inbreeding
19.
Mol Biol Evol ; 31(3): 750-62, 2014 Mar.
Article in English | MEDLINE | ID: mdl-24361993

ABSTRACT

Whole genome duplications (WGDs) followed by massive gene loss occurred in the evolutionary history of many groups. WGDs are usually inferred from the age distribution of paralogs (Ks-based methods) or from gene collinearity data (synteny). However, Ks-based methods are restricted to detecting recent WGDs because of saturation effects and the difficulty of dating old duplicates, and synteny is difficult to reconstruct for distantly related species. Recently, Jiao et al. (Jiao Y, Wickett N, Ayyampalayam S, Chanderbali AS, Landherr L, Ralph PE, Tomsho LP, Hu Y, Liang H, Soltis PS, et al. 2011. Ancestral polyploidy in seed plants and angiosperms. Nature 473:97-100) introduced an empirical method that aims to detect a peak in duplication ages among nodes selected from a previous phylogenetic analysis. In this context, we present here two rigorous methods based on data from multiple gene families and on a new probabilistic model. Our model assumes that all gene lineages are instantaneously duplicated at the WGD event with a possible almost-immediate loss of some extra copies. Our reconciliation method relies on aligned molecular sequences, whereas our gene count method relies only on gene count data across species. We show, using extensive simulations, that both methods have good detection power. Surprisingly, the gene count method enjoys no loss of power compared with the reconciliation method, despite the fact that sequence information is not used. We finally illustrate the performance of our methods on a benchmark yeast data set. Both methods are able to detect the well-known WGD in the Saccharomyces cerevisiae clade and agree on a small retention rate at the WGD, as established by synteny-based methods.


Subjects
Gene Duplication/genetics, Fungal Genome/genetics, Phylogeny, Probability, Saccharomyces cerevisiae/genetics, Computer Simulation, Genetic Databases, Fungal Genes/genetics, Multigene Family, Species Specificity
20.
Syst Biol ; 63(3): 397-408, 2014 May.
Article in English | MEDLINE | ID: mdl-24500037

ABSTRACT

We developed a linear-time algorithm applicable to a large class of trait evolution models, for efficient likelihood calculations and parameter inference on very large trees. Our algorithm solves the traditional computational burden associated with two key terms, namely the determinant of the phylogenetic covariance matrix V and quadratic products involving the inverse of V. Applications include Gaussian models such as Brownian motion-derived models like Pagel's lambda, kappa, delta, and the early-burst model; Ornstein-Uhlenbeck models to account for natural selection with possibly varying selection parameters along the tree; as well as non-Gaussian models such as phylogenetic logistic regression, phylogenetic Poisson regression, and phylogenetic generalized linear mixed models. Outside of phylogenetic regression, our algorithm also applies to phylogenetic principal component analysis, phylogenetic discriminant analysis or phylogenetic prediction. The computational gain opens up new avenues for complex models or extensive resampling procedures on very large trees. We identify the class of models that our algorithm can handle as all models whose covariance matrix has a 3-point structure. We further show that this structure uniquely identifies a rooted tree whose branch lengths parametrize the trait covariance matrix, which acts as a similarity matrix. The new algorithm is implemented in the R package phylolm, including functions for phylogenetic linear regression and phylogenetic logistic regression.
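The abstract states that the algorithm computes the determinant of V and quadratic forms involving the inverse of V in linear time for covariances with a 3-point structure. The simplest instance is Brownian motion on a rooted binary tree, where one postorder (pruning) pass over Felsenstein-style contrasts gives both quantities without forming V. The Python sketch below assumes root state 0, positive branch lengths, and a dict-based tree format of our own; it illustrates that special case only and is not the phylolm implementation or the paper's general algorithm.

```python
import math


def bm_logdet_and_quad(tree, y):
    """Return (log det V, y' V^{-1} y) for BM on a rooted binary tree, root state 0.

    `tree` maps each internal node to its two (child, edge_length) pairs, with
    "root" as the root; tips are the keys of `y`. V[i][j] is the shared path
    length from the root. One postorder pass, no matrix inversion.
    """
    logdet = 0.0
    quad = 0.0

    def prune(node):
        nonlocal logdet, quad
        if node in y:
            return y[node], 0.0  # tip value, no extra variance
        (c1, l1), (c2, l2) = tree[node]
        x1, v1 = prune(c1)
        x2, v2 = prune(c2)
        v1 += l1
        v2 += l2
        # the contrast x1 - x2 has variance v1 + v2 and is independent of the rest
        logdet += math.log(v1 + v2)
        quad += (x1 - x2) ** 2 / (v1 + v2)
        # inverse-variance weighted ancestral estimate and its extra variance
        x = (x1 / v1 + x2 / v2) / (1 / v1 + 1 / v2)
        return x, v1 * v2 / (v1 + v2)

    x_root, v_root = prune("root")
    # the root estimate has variance v_root around the fixed root state 0
    logdet += math.log(v_root)
    quad += x_root ** 2 / v_root
    return logdet, quad


tree3 = {"root": [("n", 1.0), ("C", 2.0)], "n": [("A", 1.0), ("B", 1.0)]}
y3 = {"A": 1.0, "B": 3.0, "C": 2.0}
logdet, quad = bm_logdet_and_quad(tree3, y3)
```

For this three-tip tree, V = [[2, 1, 0], [1, 2, 0], [0, 0, 2]], so det V = 6 and the quadratic form equals 20/3, which the pruning pass reproduces.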


Subjects
Algorithms, Biological Evolution, Classification/methods, Software/standards, Computer Simulation