Results 1 - 20 of 48
1.
Mol Biol Evol ; 41(1), 2024 Jan 03.
Article in English | MEDLINE | ID: mdl-38124381

ABSTRACT

MOTIVATION: Simulating multiple sequence alignments (MSAs) using probabilistic models of sequence evolution plays an important role in the evaluation of phylogenetic inference tools and is crucial to the development of novel learning-based approaches for phylogenetic reconstruction, for instance, neural networks. These models and the resulting simulated data need to be as realistic as possible to be indicative of the performance of the developed tools on empirical data and to ensure that neural networks trained on simulations perform well on empirical data. Over the years, numerous models of evolution have been published with the goal of representing the sequence evolution process as faithfully as possible and thus simulating empirical-like data. In this study, we simulated DNA and protein MSAs under increasingly complex models of evolution with and without insertion/deletion (indel) events using a state-of-the-art sequence simulator. We assessed their realism by quantifying how accurately supervised learning methods are able to predict whether a given MSA is simulated or empirical. RESULTS: Our results show that we can distinguish between empirical and simulated MSAs with high accuracy using two distinct and independently developed classification approaches across all tested models of sequence evolution. Our findings suggest that the current state-of-the-art models fail to accurately replicate several aspects of empirical MSAs, including site-wise rates as well as amino acid and nucleotide composition.


Subject(s)
Neural Networks, Computer; Proteins; Phylogeny; Sequence Alignment; Proteins/genetics; DNA/genetics; Software
2.
Elife ; 12, 2023 06 06.
Article in English | MEDLINE | ID: mdl-37278068

ABSTRACT

The accidental endogenization of viral elements within eukaryotic genomes can occasionally provide significant evolutionary benefits, giving rise to their long-term retention, that is, to viral domestication. For instance, in some endoparasitoid wasps (whose immature stages develop inside their hosts), the membrane-fusion property of double-stranded DNA viruses has been repeatedly domesticated following ancestral endogenizations. The endogenized genes provide female wasps with a delivery tool to inject virulence factors that are essential to the developmental success of their offspring. Because all known cases of viral domestication involve endoparasitic wasps, we hypothesized that this lifestyle, relying on a close interaction between individuals, may have promoted the endogenization and domestication of viruses. By analyzing the composition of 124 Hymenoptera genomes, spread over the diversity of this clade and including free-living, ecto-, and endoparasitoid species, we tested this hypothesis. Our analysis first revealed that double-stranded DNA viruses, in comparison with other viral genomic structures (ssDNA, dsRNA, ssRNA), are more often endogenized and domesticated (that is, retained by selection) than expected from their estimated abundance in insect viral communities. Second, our analysis indicates that the rate at which dsDNA viruses are endogenized is higher in endoparasitoids than in ectoparasitoids or free-living hymenopterans, which also translates into more frequent events of domestication. Hence, these results are consistent with the hypothesis that the endoparasitoid lifestyle has facilitated the endogenization of dsDNA viruses, in turn increasing the opportunities for domestication events that now play a central role in the biology of many endoparasitoid lineages.


Subject(s)
Viruses; Wasps; Animals; Female; Biological Evolution; DNA; Domestication; Genome, Viral; Viruses/genetics; Wasps/genetics
3.
Mol Biol Evol ; 40(2), 2023 02 03.
Article in English | MEDLINE | ID: mdl-36510704

ABSTRACT

Identifying the footprints of selection in coding sequences can inform about the importance and function of individual sites. Analyses of the ratio of nonsynonymous to synonymous substitutions (dN/dS) have been widely used to pinpoint changes in the intensity of selection, but cannot distinguish them from changes in the direction of selection, that is, changes in the fitness of specific amino acids at a given position. A few methods that rely on amino-acid profiles to detect changes in directional selection have been designed, but their performances have not been well characterized. In this paper, we investigate the performance of six of these methods. We evaluate them on simulations along empirical phylogenies in which transition events have been annotated and compare their ability to detect sites that have undergone changes in the direction or intensity of selection to that of a widely used dN/dS approach, codeml's branch-site model A. We show that all methods have reduced performance in the presence of biased gene conversion but not CpG hypermutability. The best profile method, Pelican, a new implementation of Tamuri AU, Hay AJ, Goldstein RA. (2009. Identifying changes in selective constraints: host shifts in influenza. PLoS Comput Biol. 5(11):e1000564), performs as well as codeml in a range of conditions except for detecting relaxations of selection, and performs better when tree length increases, or in the presence of persistent positive selection. It is fast, enabling genome-scale searches for site-wise changes in the direction of selection associated with phenotypic changes.


Subject(s)
Evolution, Molecular; Selection, Genetic; Codon; Models, Genetic; Amino Acids/genetics; Phylogeny
4.
Commun Biol ; 5(1): 1115, 2022 10 21.
Article in English | MEDLINE | ID: mdl-36271143

ABSTRACT

Zika virus (ZIKV) infection can cause important developmental and neurological defects in humans. Type I/III interferon responses control ZIKV infection and pathological processes, yet the virus has evolved various mechanisms to defeat these host responses. Here, we established a pipeline to delineate at high resolution the genetic evolution of ZIKV in a controlled host cell environment. We uncovered that serially passaged ZIKV acquired increased infectivity and simultaneously developed a resistance to TLR3-induced restriction. We built a mathematical model that suggests that the increased infectivity is due to a reduced time-lag between infection and viral replication. We found that this adaptation is cell-type specific, suggesting that different cell environments may drive viral evolution along different routes. Deep-sequencing of ZIKV populations pinpointed mutations whose increased frequencies temporally coincide with the acquisition of the adapted phenotype. We functionally validated S455L, a substitution in ZIKV envelope (E) protein, recapitulating the adapted phenotype. Its positioning on the E structure suggests a putative function in protein refolding/stability. Taken together, our results uncovered ZIKV adaptations to the cellular environment leading to accelerated replication onset coupled with resistance to TLR3-induced antiviral response. Our work provides insights into Zika virus adaptation to host cells and immune escape mechanisms.


Subject(s)
Zika Virus Infection; Zika Virus; Humans; Zika Virus/genetics; Toll-Like Receptor 3; Interferons; Antiviral Agents
5.
Genome Biol Evol ; 14(1), 2022 01 04.
Article in English | MEDLINE | ID: mdl-34983052

ABSTRACT

Despite the importance of natural selection in species' evolutionary history, phylogenetic methods that take into account population-level processes typically ignore selection. The assumption of neutrality is often based on the idea that selection occurs at a minority of loci in the genome and is unlikely to compromise phylogenetic inferences significantly. However, genome-wide processes like GC-bias and some variation segregating at the coding regions are known to evolve in the nearly neutral range. As we are now using genome-wide data to estimate species trees, it is natural to ask whether weak but pervasive selection is likely to blur species tree inferences. We developed a polymorphism-aware phylogenetic model tailored for measuring signatures of nucleotide usage biases to test the impact of selection in the species tree. Our analyses indicate that although the inferred relationships among species are not significantly compromised, the genetic distances are systematically underestimated in a node-height-dependent manner: that is, the deeper nodes tend to be more underestimated than the shallow ones. Such biases have implications for molecular dating. We dated the evolutionary history of 30 worldwide fruit fly populations, and we found signatures of GC-bias considerably affecting the estimated divergence times (up to 23%) in the neutral model. Our findings call for the need to account for selection when quantifying divergence or dating species evolution.


Subject(s)
Codon Usage; Evolution, Molecular; Animals; Codon Usage/genetics; Drosophila; Nucleotides; Phylogeny; Selection, Genetic
6.
Syst Biol ; 71(4): 797-809, 2022 06 16.
Article in English | MEDLINE | ID: mdl-34668564

ABSTRACT

Dating the tree of life is central to understanding the evolution of life on Earth. Molecular clocks calibrated with fossils represent the state of the art for inferring the ages of major groups. Yet, other information on the timing of species diversification can be used to date the tree of life. For example, horizontal gene transfer events and ancient coevolutionary interactions such as (endo)symbioses occur between contemporaneous species and thus can imply temporal relationships between two nodes in a phylogeny. Temporal constraints from these alternative sources can be particularly helpful when the geological record is sparse, for example, for microorganisms, which represent the majority of extant and extinct biodiversity. Here, we present a new method to combine fossil calibrations and relative age constraints to estimate chronograms. We provide an implementation of relative age constraints in RevBayes that can be combined in a modular manner with the wide range of molecular dating methods available in the software. We use both realistic simulations and empirical datasets of 40 Cyanobacteria and 62 Archaea to evaluate our method. We show that the combination of relative age constraints with fossil calibrations significantly improves the estimation of node ages. [Archaea, Bayesian analysis, cyanobacteria, dating, endosymbiosis, lateral gene transfer, MCMC, molecular clock, phylogenetic dating, relaxed molecular clock, revbayes, tree of life.].
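
As a rough illustration of the idea in this abstract (not the RevBayes implementation), a relative age constraint can be read as a statement that one node of a chronogram must be older than another. The Python sketch below checks a set of node ages against such constraints. The node names, the ages, and the function `violated_constraints` are hypothetical and chosen only for the example.

```python
def violated_constraints(node_ages, constraints):
    """Return the relative age constraints that a chronogram violates.

    node_ages: dict mapping a node name to its age (time before present).
    constraints: list of (older, younger) pairs; each pair states that
    the first node must be strictly older than the second.
    """
    return [
        (older, younger)
        for older, younger in constraints
        if node_ages[older] <= node_ages[younger]
    ]


# Hypothetical node ages, in arbitrary time units before present.
node_ages = {"root": 3.5, "donor_parent": 2.1, "recipient": 2.4, "tip_clade": 0.8}

# Hypothetical constraints of the kind a gene transfer could imply:
# the node above the donor lineage must be older than the recipient node.
constraints = [("donor_parent", "recipient"), ("root", "tip_clade")]

violations = violated_constraints(node_ages, constraints)
print(violations)
```

In a dating analysis, a check of this kind could be used to rule out candidate chronograms that contradict a constraint; how the published method handles this internally is not stated in the abstract.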


Subject(s)
Fossils; Gene Transfer, Horizontal; Bayes Theorem; Evolution, Molecular; Phylogeny; Symbiosis
7.
Bioinformatics ; 36(18): 4822-4824, 2020 09 15.
Article in English | MEDLINE | ID: mdl-33085745

ABSTRACT

MOTIVATION: Gene and species tree reconciliation methods are used to interpret gene trees, root them and correct uncertainties that are due to scarcity of signal in multiple sequence alignments. So far, reconciliation tools have not been integrated in standard phylogenetic software and they either lack performance on certain functions, or usability for biologists. RESULTS: We present Treerecs, a phylogenetic software based on duplication-loss reconciliation. Treerecs is simple to install and to use. It is fast and versatile, has a graphic output, and can be used along with methods for phylogenetic inference on multiple alignments like PLL and Seaview. AVAILABILITY AND IMPLEMENTATION: Treerecs is open-source. Its source code (C++, AGPLv3) and manuals are available from https://project.inria.fr/treerecs/.


Subject(s)
Algorithms; Evolution, Molecular; Phylogeny; Sequence Alignment; Software
8.
Philos Trans R Soc Lond B Biol Sci ; 374(1777): 20180234, 2019 07 22.
Article in English | MEDLINE | ID: mdl-31154974

ABSTRACT

In evolutionary genomics, researchers have taken an interest in identifying substitutions that subtend convergent phenotypic adaptations. This is a difficult question that requires distinguishing foreground convergent substitutions that are involved in the convergent phenotype from background convergent substitutions. Those may be linked to other adaptations, may be neutral or may be the consequence of mutational biases. Furthermore, there is no generally accepted definition of convergent substitutions. Various methods that use different definitions have been proposed in the literature, resulting in different sets of candidate foreground convergent substitutions. In this article, we first describe the processes that can generate foreground convergent substitutions in coding sequences, separating adaptive from non-adaptive processes. Second, we review methods that have been proposed to detect foreground convergent substitutions in coding sequences and expose the assumptions that underlie them. Finally, we examine their power on simulations of convergent changes (including in the presence of a change in the efficacy of selection) and on empirical alignments. This article is part of the theme issue 'Convergent evolution in the genomics era: new insights and directions'.


Subject(s)
Amino Acids/genetics; Evolution, Molecular; Proteins/genetics; Amino Acids/metabolism; Animals; Genomics; Humans; Models, Genetic; Phylogeny; Proteins/metabolism
9.
Bioinformatics ; 35(13): 2199-2207, 2019 07 01.
Article in English | MEDLINE | ID: mdl-30452539

ABSTRACT

MOTIVATION: RNA sequencing (RNA-Seq) is a widely used approach to obtain transcript sequences in non-model organisms, notably for performing comparative analyses. However, current bioinformatic pipelines do not take full advantage of pre-existing reference data in related species for improving RNA-Seq assembly, annotation and gene family reconstruction. RESULTS: We built an automated pipeline named CAARS to combine novel data from RNA-Seq experiments with existing multi-species gene family alignments. RNA-Seq reads are assembled into transcripts by both de novo and assisted assemblies. Then, CAARS incorporates transcripts into gene families, builds gene alignments and trees and uses phylogenetic information to classify the genes as orthologs and paralogs of existing genes. We used CAARS to assemble and annotate RNA-Seq data in rodents and fishes using distantly related genomes as reference, a difficult case for this kind of analysis. We showed CAARS assemblies are more complete and accurate than those assembled by a standard pipeline consisting of de novo assembly coupled with annotation by sequence similarity on a guide species. In addition to annotated transcripts, CAARS provides gene family alignments and trees, annotated with orthology relationships, directly usable for downstream comparative analyses. AVAILABILITY AND IMPLEMENTATION: CAARS is implemented in Python and OCaml and is freely available at https://github.com/carinerey/caars. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Genome; Sequence Analysis, RNA; Molecular Sequence Annotation; Phylogeny; RNA; Software; Transcriptome
10.
Mol Biol Evol ; 35(9): 2296-2306, 2018 09 01.
Article in English | MEDLINE | ID: mdl-29986048

ABSTRACT

In the history of life, some phenotypes have been acquired several times independently, through convergent evolution. Recently, many genome-scale studies have been devoted to identifying nucleotides or amino acids that changed in a convergent manner when the convergent phenotypes evolved. These efforts have had mixed results, probably because of differences in the detection methods, and because of conceptual differences about the definition of a convergent substitution. Some methods contend that substitutions are convergent only if they occur on all branches where the phenotype changed toward the exact same state at a given nucleotide or amino acid position. Others are much looser in their requirements and define a convergent substitution as one that leads the site at which it occurs to prefer a phylogeny in which species with the convergent phenotype group together. Here, we suggest looking for convergent shifts in amino acid preferences instead of convergent substitutions to the exact same amino acid. We define as convergent shifts substitutions that occur on all branches where the phenotype changed and such that they correspond to a change in the type of amino acid preferred at this position. We implement the corresponding model into a method named PCOC. We show on simulations that PCOC better recovers convergent shifts than existing methods in terms of sensitivity and specificity. We test it on a plant protein alignment where convergent evolution has been studied in detail and find that our method recovers several previously identified convergent substitutions and proposes credible new candidates.


Subject(s)
Amino Acid Substitution; Evolution, Molecular; Genetic Techniques; Models, Genetic; Animals; Cyperaceae/genetics; Mammals/genetics
11.
Bioinformatics ; 34(21): 3646-3652, 2018 11 01.
Article in English | MEDLINE | ID: mdl-29762653

ABSTRACT

Motivation: A reconciliation is an annotation of the nodes of a gene tree with evolutionary events (for example, speciation, gene duplication, transfer, loss, etc.) along with a mapping onto a species tree. Many algorithms and software produce or use reconciliations but often using different reconciliation formats, regarding the type of events considered or whether the species tree is dated or not. This complicates the comparison and communication between different programs. Results: Here, we gather a consortium of software developers in gene tree species tree reconciliation to propose and endorse a format that aims to promote an integrative, albeit flexible, specification of phylogenetic reconciliations. This format, named recPhyloXML, is accompanied by several tools such as a reconciled tree visualizer and conversion utilities. Availability and implementation: http://phylariane.univ-lyon1.fr/recphyloxml/.


Subject(s)
Evolution, Molecular; Gene Duplication; Algorithms; Phylogeny; Software
12.
Nat Ecol Evol ; 2(5): 904-909, 2018 05.
Article in English | MEDLINE | ID: mdl-29610471

ABSTRACT

Biodiversity has always been predominantly microbial, and the scarcity of fossils from bacteria, archaea and microbial eukaryotes has prevented a comprehensive dating of the tree of life. Here, we show that patterns of lateral gene transfer deduced from an analysis of modern genomes encode a novel and abundant source of information about the temporal coexistence of lineages throughout the history of life. We use state-of-the-art species tree-aware phylogenetic methods to reconstruct the history of thousands of gene families and demonstrate that dates implied by gene transfers are consistent with estimates from relaxed molecular clocks in Bacteria, Archaea and Eukarya. We present the order of speciations according to lateral gene transfer data calibrated to geological time for three datasets comprising 40 genomes for Cyanobacteria, 60 genomes for Archaea and 60 genomes for Fungi. An inspection of discrepancies between transfers and clocks and a comparison with mammalian fossils show that gene transfer in microbes is potentially as informative for dating the tree of life as the geological record in macroorganisms.


Subject(s)
Evolution, Molecular; Gene Transfer, Horizontal; Genome, Archaeal; Genome, Bacterial; Genome, Fungal; Phylogeny; Cyanobacteria/genetics
13.
Proc Natl Acad Sci U S A ; 114(23): E4602-E4611, 2017 06 06.
Article in English | MEDLINE | ID: mdl-28533395

ABSTRACT

A root for the archaeal tree is essential for reconstructing the metabolism and ecology of early cells and for testing hypotheses that propose that the eukaryotic nuclear lineage originated from within the Archaea; however, published studies based on outgroup rooting disagree regarding the position of the archaeal root. Here we constructed a consensus unrooted archaeal topology using protein concatenation and a multigene supertree method based on 3,242 single gene trees, and then rooted this tree using a recently developed model of genome evolution. This model uses evidence from gene duplications, horizontal transfers, and gene losses contained in 31,236 archaeal gene families to identify the most likely root for the tree. Our analyses support the monophyly of DPANN (Diapherotrites, Parvarchaeota, Aenigmarchaeota, Nanoarchaeota, Nanohaloarchaea), a recently discovered cosmopolitan and genetically diverse lineage, and, in contrast to previous work, place the tree root between DPANN and all other Archaea. The sister group to DPANN comprises the Euryarchaeota and the TACK Archaea, including Lokiarchaeum, which our analyses suggest are monophyletic sister lineages. Metabolic reconstructions on the rooted tree suggest that early Archaea were anaerobes that may have had the ability to reduce CO2 to acetate via the Wood-Ljungdahl pathway. In contrast to proposals suggesting that genome reduction has been the predominant mode of archaeal evolution, our analyses infer a relatively small-genomed archaeal ancestor that subsequently increased in complexity via gene duplication and horizontal gene transfer.


Subject(s)
Archaea/genetics; Evolution, Molecular; Genome, Archaeal; Models, Genetic; Algorithms; Archaea/classification; Archaea/metabolism; Eukaryota/classification; Eukaryota/genetics; Gene Duplication; Gene Transfer, Horizontal; Metabolic Networks and Pathways/genetics; Multigene Family; Phylogeny; Temperature
14.
PLoS One ; 11(8): e0159559, 2016.
Article in English | MEDLINE | ID: mdl-27513924

ABSTRACT

MOTIVATIONS: Gene trees inferred solely from multiple alignments of homologous sequences often contain weakly supported and uncertain branches. Information for their full resolution may lie in the dependency between gene families and their genomic context. Integrative methods, using species tree information in addition to sequence information, often rely on a computationally intensive tree space search which forecloses an application to large genomic databases. RESULTS: We propose a new method, called ProfileNJ, that takes a gene tree with statistical supports on its branches, and corrects its weakly supported parts by using a combination of information from a species tree and a distance matrix. Its low running time enabled us to use it on the whole Ensembl Compara database, for which we propose an alternative, arguably more plausible set of gene trees. This allowed us to perform a genome-wide analysis of duplication and loss patterns on the history of 63 eukaryote species, and predict ancestral gene content and order for all ancestors along the phylogeny. AVAILABILITY: A web interface called RefineTree, including ProfileNJ as well as other gene tree correction methods, which we also test on the Ensembl gene families, is available at: http://www-ens.iro.umontreal.ca/~adbit/polytomysolver.html. The code of ProfileNJ as well as the set of gene trees corrected by ProfileNJ from Ensembl Compara version 73 families are also made available.


Subject(s)
Algorithms; Computational Biology/methods; Evolution, Molecular; Genes/genetics; Genome/genetics; Phylogeny; Animals; Humans; Sequence Analysis, DNA
15.
Syst Biol ; 65(4): 726-36, 2016 07.
Article in English | MEDLINE | ID: mdl-27235697

ABSTRACT

Programs for Bayesian inference of phylogeny currently implement a unique and fixed suite of models. Consequently, users of these software packages are simultaneously forced to use a number of programs for a given study, while also lacking the freedom to explore models that have not been implemented by the developers of those programs. We developed a new open-source software package, RevBayes, to address these problems. RevBayes is entirely based on probabilistic graphical models, a powerful generic framework for specifying and analyzing statistical models. Phylogenetic-graphical models can be specified interactively in RevBayes, piece by piece, using a new succinct and intuitive language called Rev. Rev is similar to the R language and the BUGS model-specification language, and should be easy to learn for most users. The strength of RevBayes is the simplicity with which one can design, specify, and implement new and complex models. Fortunately, this tremendous flexibility does not come at the cost of slower computation; as we demonstrate, RevBayes outperforms competing software for several standard analyses. Compared with other programs, RevBayes has fewer black-box elements. Users need to explicitly specify each part of the model and analysis. Although this explicitness may initially be unfamiliar, we are convinced that this transparency will improve understanding of phylogenetic models in our field. Moreover, it will motivate the search for improvements to existing methods by brazenly exposing the model choices that we make to critical scrutiny. RevBayes is freely available at http://www.RevBayes.com [Bayesian inference; Graphical models; MCMC; statistical phylogenetics.].


Subject(s)
Classification/methods; Models, Biological; Phylogeny; Software; Bayes Theorem
16.
Mol Biol Evol ; 33(2): 305-10, 2016 Feb.
Article in English | MEDLINE | ID: mdl-26541173

ABSTRACT

In a recent article, Nelson-Sathi et al. (NS) report that the origins of major archaeal lineages (MAL) correspond to massive group-specific gene acquisitions via HGT from bacteria (Nelson-Sathi et al. 2015. Origins of major archaeal clades correspond to gene acquisitions from bacteria. Nature 517(7532):77-80.). If correct, this would have fundamental implications for the process of diversification in microbes. However, a reexamination of these data and results shows that the methodology used by NS systematically inflates the number of genes acquired at the root of each MAL, and incorrectly assumes bacterial origins for these genes. A reanalysis of their data with appropriate phylogenetic models accounting for the dynamics of gene gain and loss between lineages supports the continuous acquisition of genes over long periods in the evolution of Archaea.


Subject(s)
Archaea/genetics; Bacteria/genetics; Evolution, Molecular; Gene Transfer, Horizontal; Genotype; Archaea/classification; Genes, Archaeal; Genes, Bacterial; Genomics; Phylogeny
17.
Science ; 350(6257): 171, 2015 Oct 09.
Article in English | MEDLINE | ID: mdl-26450204

ABSTRACT

Liu and Edwards argue against the use of weighted statistical binning within a species tree estimation pipeline. However, we show that their mathematical argument does not apply to weighted statistical binning. Furthermore, their simulation study does not follow the recommended statistical binning protocol and has data of unknown origin that bias the results against weighted statistical binning.


Subject(s)
Birds/classification; Birds/genetics; Genome; Phylogeny; Animals
18.
Philos Trans R Soc Lond B Biol Sci ; 370(1678): 20140335, 2015 09 26.
Article in English | MEDLINE | ID: mdl-26323765

ABSTRACT

Although the role of lateral gene transfer is well recognized in the evolution of bacteria, it is generally assumed that it has had less influence among eukaryotes. To explore this hypothesis, we compare the dynamics of genome evolution in two groups of organisms: cyanobacteria and fungi. Ancestral genomes are inferred in both clades using two types of methods: first, Count, a gene tree-unaware method that models gene duplications, gains and losses to explain the observed numbers of genes present in a genome; second, ALE, a more recent gene tree-aware method that reconciles gene trees with a species tree using a model of gene duplication, loss and transfer. We compare their merits and their ability to quantify the role of transfers, and assess the impact of taxonomic sampling on their inferences. We present what we believe is compelling evidence that gene transfer plays a significant role in the evolution of fungi.


Subject(s)
Fungi/genetics; Gene Transfer, Horizontal; Genome, Fungal; Phylogeny; Computer Simulation; Cyanobacteria/genetics; Genome, Bacterial; Models, Genetic
19.
PLoS One ; 10(6): e0129183, 2015.
Article in English | MEDLINE | ID: mdl-26086579

ABSTRACT

Because biological processes can result in different loci having different evolutionary histories, species tree estimation requires multiple loci from across multiple genomes. While many processes can result in discord between gene trees and species trees, incomplete lineage sorting (ILS), modeled by the multi-species coalescent, is considered to be a dominant cause for gene tree heterogeneity. Coalescent-based methods have been developed to estimate species trees, many of which operate by combining estimated gene trees, and so are called "summary methods". Because summary methods are generally fast (and much faster than more complicated coalescent-based methods that co-estimate gene trees and species trees), they have become very popular techniques for estimating species trees from multiple loci. However, recent studies have established that summary methods can have reduced accuracy in the presence of gene tree estimation error, and also that many biological datasets have substantial gene tree estimation error, so that summary methods may not be highly accurate in biologically realistic conditions. Mirarab et al. (Science 2014) presented the "statistical binning" technique to improve gene tree estimation in multi-locus analyses, and showed that it improved the accuracy of MP-EST, one of the most popular coalescent-based summary methods. Statistical binning, which uses a simple heuristic to evaluate "combinability" and then uses the larger sets of genes to re-calculate gene trees, has good empirical performance, but using statistical binning within a phylogenomic pipeline does not have the desirable property of being statistically consistent. We show that weighting the re-calculated gene trees by the bin sizes makes statistical binning statistically consistent under the multispecies coalescent, and maintains the good empirical performance. 
Thus, "weighted statistical binning" enables highly accurate genome-scale species tree estimation, and is also statistically consistent under the multi-species coalescent model. New data used in this study are available at DOI: http://dx.doi.org/10.6084/m9.figshare.1411146, and the software is available at https://github.com/smirarab/binning.
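
The weighting step described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code from the linked repository: it assumes that each bin already has one re-calculated tree, and it realizes the weighting by repeating that tree once per gene in the bin before the trees are handed to a summary method. The function name `weight_by_bin_size`, the gene names, and the Newick strings are hypothetical.

```python
def weight_by_bin_size(bins):
    """Repeat each bin's re-calculated tree once per gene in that bin.

    bins: list of (tree_newick, gene_names) pairs, one pair per bin.
    Returns a flat list of Newick strings in which a tree estimated from
    a bin of k genes appears k times.
    """
    weighted_trees = []
    for tree_newick, gene_names in bins:
        weighted_trees.extend([tree_newick] * len(gene_names))
    return weighted_trees


# Hypothetical bins: three genes were binned together, one gene stayed alone.
bins = [
    ("((A,B),(C,D));", ["gene1", "gene2", "gene3"]),
    ("((A,C),(B,D));", ["gene4"]),
]

weighted_trees = weight_by_bin_size(bins)
for tree in weighted_trees:
    print(tree)
```

Without the repetition, the two bins above would count equally in a summary method, although the first one stands for three loci.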


Subject(s)
Genomics/methods; Models, Theoretical; Phylogeny; Statistics as Topic
20.
Syst Biol ; 64(1): e42-62, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25070970

ABSTRACT

This article reviews the various models that have been used to describe the relationships between gene trees and species trees. Molecular phylogeny has focused mainly on improving models for the reconstruction of gene trees based on sequence alignments. Yet, most phylogeneticists seek to reveal the history of species. Although the histories of genes and species are tightly linked, they are seldom identical, because genes duplicate, are lost or horizontally transferred, and because alleles can coexist in populations for periods that may span several speciation events. Building models describing the relationship between gene and species trees can thus improve the reconstruction of gene trees when a species tree is known, and vice versa. Several approaches have been proposed to solve the problem in one direction or the other, but in general neither gene trees nor species trees are known. Only a few studies have attempted to jointly infer gene trees and species trees. These models account for gene duplication and loss, transfer or incomplete lineage sorting. Some of them consider several types of events together, but none exists currently that considers the full repertoire of processes that generate gene trees along the species tree. Simulations as well as empirical studies on genomic data show that combining gene tree-species tree models with models of sequence evolution improves gene tree reconstruction. In turn, these better gene trees provide a more reliable basis for studying genome evolution or reconstructing ancestral chromosomes and ancestral gene sequences. We predict that gene tree-species tree methods that can deal with genomic data sets will be instrumental to advancing our understanding of genomic evolution.


Subject(s)
Models, Biological; Phylogeny; Computer Simulation; Genetic Speciation; Genome/genetics; Mutation/genetics