Search | BVS IEC

1.

Simulations of Sequence Evolution: How (Un)realistic They Are and Why.

Trost, Johanna; Haag, Julia; Höhler, Dimitri; Jacob, Laurent; Stamatakis, Alexandros; Boussau, Bastien.

Mol Biol Evol ; 41(1)2024 Jan 03.

Article in English | MEDLINE | ID: mdl-38124381

ABSTRACT

MOTIVATION: Simulating multiple sequence alignments (MSAs) using probabilistic models of sequence evolution plays an important role in the evaluation of phylogenetic inference tools and is crucial to the development of novel learning-based approaches for phylogenetic reconstruction, for instance, neural networks. These models and the resulting simulated data need to be as realistic as possible to be indicative of the performance of the developed tools on empirical data and to ensure that neural networks trained on simulations perform well on empirical data. Over the years, numerous models of evolution have been published with the goal to represent as faithfully as possible the sequence evolution process and thus simulate empirical-like data. In this study, we simulated DNA and protein MSAs under increasingly complex models of evolution with and without insertion/deletion (indel) events using a state-of-the-art sequence simulator. We assessed their realism by quantifying how accurately supervised learning methods are able to predict whether a given MSA is simulated or empirical. RESULTS: Our results show that we can distinguish between empirical and simulated MSAs with high accuracy using two distinct and independently developed classification approaches across all tested models of sequence evolution. Our findings suggest that the current state-of-the-art models fail to accurately replicate several aspects of empirical MSAs, including site-wise rates as well as amino acid and nucleotide composition.

Subjects

Neural Networks, Computer; Proteins; Phylogeny; Sequence Alignment; Proteins/genetics; DNA/genetics; Software

2.

Evaluation of Methods to Detect Shifts in Directional Selection at the Genome Scale.

Duchemin, Louis; Lanore, Vincent; Veber, Philippe; Boussau, Bastien.

Mol Biol Evol ; 40(2)2023 02 03.

Article in English | MEDLINE | ID: mdl-36510704

ABSTRACT

Identifying the footprints of selection in coding sequences can inform about the importance and function of individual sites. Analyses of the ratio of nonsynonymous to synonymous substitutions (dN/dS) have been widely used to pinpoint changes in the intensity of selection, but cannot distinguish them from changes in the direction of selection, that is, changes in the fitness of specific amino acids at a given position. A few methods that rely on amino-acid profiles to detect changes in directional selection have been designed, but their performances have not been well characterized. In this paper, we investigate the performance of six of these methods. We evaluate them on simulations along empirical phylogenies in which transition events have been annotated and compare their ability to detect sites that have undergone changes in the direction or intensity of selection to that of a widely used dN/dS approach, codeml's branch-site model A. We show that all methods have reduced performance in the presence of biased gene conversion but not CpG hypermutability. The best profile method, Pelican, a new implementation of Tamuri AU, Hay AJ, Goldstein RA. (2009. Identifying changes in selective constraints: host shifts in influenza. PLoS Comput Biol. 5(11):e1000564), performs as well as codeml in a range of conditions except for detecting relaxations of selection, and performs better when tree length increases, or in the presence of persistent positive selection. It is fast, enabling genome-scale searches for site-wise changes in the direction of selection associated with phenotypic changes.
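As a rough illustration of the dN/dS ratio discussed in this abstract, the Python sketch below computes it from already-counted differences and sites, with a Jukes-Cantor correction in the style of simple counting methods. It is not codeml's branch-site model A, nor Pelican; the function names and the example counts are hypothetical.

```python
import math


def jc_distance(p):
    """Jukes-Cantor corrected distance from a proportion p of differing sites."""
    if not 0 <= p < 0.75:
        raise ValueError("proportion must be in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(nonsyn_diffs, nonsyn_sites, syn_diffs, syn_sites):
    """Return (dN, dS, omega) from counted differences and counted sites."""
    dn = jc_distance(nonsyn_diffs / nonsyn_sites)
    ds = jc_distance(syn_diffs / syn_sites)
    if ds == 0:
        raise ValueError("dS is zero, so the ratio is undefined")
    return dn, ds, dn / ds


# Hypothetical counts for one pair of coding sequences.
dn, ds, omega = dn_ds(nonsyn_diffs=6, nonsyn_sites=300, syn_diffs=12, syn_sites=100)
print(f"dN={dn:.4f} dS={ds:.4f} omega={omega:.3f}")
```

A ratio below 1 is usually read as purifying selection and a ratio above 1 as positive selection. As the abstract notes, such a ratio reflects the intensity of selection and says nothing about which amino acids are favored.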

Subjects

Evolution, Molecular; Selection, Genetic; Codon; Models, Genetic; Amino Acids/genetics; Phylogeny

3.

Relative Time Constraints Improve Molecular Dating.

Szöllősi, Gergely J; Höhna, Sebastian; Williams, Tom A; Schrempf, Dominik; Daubin, Vincent; Boussau, Bastien.

Syst Biol ; 71(4): 797-809, 2022 06 16.

Article in English | MEDLINE | ID: mdl-34668564

ABSTRACT

Dating the tree of life is central to understanding the evolution of life on Earth. Molecular clocks calibrated with fossils represent the state of the art for inferring the ages of major groups. Yet, other information on the timing of species diversification can be used to date the tree of life. For example, horizontal gene transfer events and ancient coevolutionary interactions such as (endo)symbioses occur between contemporaneous species and thus can imply temporal relationships between two nodes in a phylogeny. Temporal constraints from these alternative sources can be particularly helpful when the geological record is sparse, for example, for microorganisms, which represent the majority of extant and extinct biodiversity. Here, we present a new method to combine fossil calibrations and relative age constraints to estimate chronograms. We provide an implementation of relative age constraints in RevBayes that can be combined in a modular manner with the wide range of molecular dating methods available in the software. We use both realistic simulations and empirical datasets of 40 Cyanobacteria and 62 Archaea to evaluate our method. We show that the combination of relative age constraints with fossil calibrations significantly improves the estimation of node ages. [Archaea, Bayesian analysis, cyanobacteria, dating, endosymbiosis, lateral gene transfer, MCMC, molecular clock, phylogenetic dating, relaxed molecular clock, revbayes, tree of life.].
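The idea of combining fossil calibrations with relative age constraints can be sketched as a simple consistency check on a candidate set of node ages. The Python below is only an illustration under assumed data structures (plain dictionaries and tuples, with hypothetical node names); it is not the RevBayes implementation described in the abstract.

```python
def violated_constraints(node_ages, relative_constraints, calibrations=None):
    """List the constraints that a candidate set of node ages fails to satisfy.

    node_ages: dict mapping node name -> age (time before present)
    relative_constraints: iterable of (older_node, younger_node) pairs
    calibrations: optional dict mapping node name -> (min_age, max_age)
    """
    violations = []
    for older, younger in relative_constraints:
        # A relative constraint only orders two nodes; it gives no absolute age.
        if not node_ages[older] > node_ages[younger]:
            violations.append(("relative", older, younger))
    for node, (min_age, max_age) in (calibrations or {}).items():
        if not min_age <= node_ages[node] <= max_age:
            violations.append(("calibration", node, (min_age, max_age)))
    return violations


# Hypothetical ages (in billions of years) and constraints.
ages = {"A": 2.1, "B": 1.4, "C": 1.9}
relative = [("A", "B"), ("B", "C")]  # e.g. a transfer implies B predates C
fossils = {"A": (2.0, 2.5)}
print(violated_constraints(ages, relative, fossils))
```

In a Bayesian dating analysis, one plausible use of such a check is as a hard prior that gives zero probability to any chronogram with at least one violation.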

Subjects

Fossils; Gene Transfer, Horizontal; Bayes Theorem; Evolution, Molecular; Phylogeny; Symbiosis

4.

Treerecs: an integrated phylogenetic tool, from sequences to reconciliations.

Comte, Nicolas; Morel, Benoit; Hasic, Damir; Guéguen, Laurent; Boussau, Bastien; Daubin, Vincent; Penel, Simon; Scornavacca, Celine; Gouy, Manolo; Stamatakis, Alexandros; Tannier, Eric; Parsons, David P.

Bioinformatics ; 36(18): 4822-4824, 2020 09 15.

Article in English | MEDLINE | ID: mdl-33085745

ABSTRACT

MOTIVATION: Gene and species tree reconciliation methods are used to interpret gene trees, root them and correct uncertainties that are due to scarcity of signal in multiple sequence alignments. So far, reconciliation tools have not been integrated in standard phylogenetic software and they either lack performance on certain functions, or usability for biologists. RESULTS: We present Treerecs, a phylogenetic software based on duplication-loss reconciliation. Treerecs is simple to install and to use. It is fast and versatile, has a graphic output, and can be used along with methods for phylogenetic inference on multiple alignments like PLL and Seaview. AVAILABILITY AND IMPLEMENTATION: Treerecs is open-source. Its source code (C++, AGPLv3) and manuals are available from https://project.inria.fr/treerecs/.
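To make the notion of duplication-loss reconciliation concrete, here is a minimal Python sketch of the classic last-common-ancestor (LCA) mapping, which labels a gene-tree node as a duplication when it maps to the same species-tree node as one of its children. This is a textbook simplification and not Treerecs's own algorithm; trees are assumed to be binary nested tuples, and all names are hypothetical.

```python
def build_index(species_tree):
    """Map every species-tree node to its parent and its depth."""
    parent, depth = {}, {species_tree: 0}
    stack = [species_tree]
    while stack:
        node = stack.pop()
        if isinstance(node, tuple):
            for child in node:
                parent[child] = node
                depth[child] = depth[node] + 1
                stack.append(child)
    return parent, depth


def lca(a, b, parent, depth):
    """Last common ancestor of two species-tree nodes."""
    while depth[a] > depth[b]:
        a = parent[a]
    while depth[b] > depth[a]:
        b = parent[b]
    while a != b:
        a, b = parent[a], parent[b]
    return a


def count_duplications(gene_tree, species_of, parent, depth):
    """Return (species node the subtree maps to, number of duplications)."""
    if not isinstance(gene_tree, tuple):
        return species_of[gene_tree], 0
    left, right = gene_tree
    m_left, d_left = count_duplications(left, species_of, parent, depth)
    m_right, d_right = count_duplications(right, species_of, parent, depth)
    mapped = lca(m_left, m_right, parent, depth)
    # Duplication: the node maps to the same species node as one of its children.
    is_duplication = mapped == m_left or mapped == m_right
    return mapped, d_left + d_right + int(is_duplication)


# Hypothetical species tree and gene tree with two gene copies in A and B.
species_tree = (("A", "B"), "C")
gene_tree = ((("a1", "b1"), ("a2", "b2")), "c1")
species_of = {"a1": "A", "b1": "B", "a2": "A", "b2": "B", "c1": "C"}
parent, depth = build_index(species_tree)
root_map, dups = count_duplications(gene_tree, species_of, parent, depth)
print(root_map, dups)
```

In this example the node joining the two (A, B) gene copies maps to the same species node as both of its children, so it is the only node counted as a duplication.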

Subjects

Algorithms; Evolution, Molecular; Phylogeny; Sequence Alignment; Software

5.

CAARS: comparative assembly and annotation of RNA-Seq data.

Rey, Carine; Veber, Philippe; Boussau, Bastien; Sémon, Marie.

Bioinformatics ; 35(13): 2199-2207, 2019 07 01.

Article in English | MEDLINE | ID: mdl-30452539

ABSTRACT

MOTIVATION: RNA sequencing (RNA-Seq) is a widely used approach to obtain transcript sequences in non-model organisms, notably for performing comparative analyses. However, current bioinformatic pipelines do not take full advantage of pre-existing reference data in related species for improving RNA-Seq assembly, annotation and gene family reconstruction. RESULTS: We built an automated pipeline named CAARS to combine novel data from RNA-Seq experiments with existing multi-species gene family alignments. RNA-Seq reads are assembled into transcripts by both de novo and assisted assemblies. Then, CAARS incorporates transcripts into gene families, builds gene alignments and trees and uses phylogenetic information to classify the genes as orthologs and paralogs of existing genes. We used CAARS to assemble and annotate RNA-Seq data in rodents and fishes using distantly related genomes as reference, a difficult case for this kind of analysis. We showed CAARS assemblies are more complete and accurate than those assembled by a standard pipeline consisting of de novo assembly coupled with annotation by sequence similarity on a guide species. In addition to annotated transcripts, CAARS provides gene family alignments and trees, annotated with orthology relationships, directly usable for downstream comparative analyses. AVAILABILITY AND IMPLEMENTATION: CAARS is implemented in Python and Ocaml and is freely available at https://github.com/carinerey/caars. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Genome; Sequence Analysis, RNA; Molecular Sequence Annotation; Phylogeny; RNA; Software; Transcriptome

6.

Integrative modeling of gene and genome evolution roots the archaeal tree of life.

Williams, Tom A; Szöllősi, Gergely J; Spang, Anja; Foster, Peter G; Heaps, Sarah E; Boussau, Bastien; Ettema, Thijs J G; Embley, T Martin.

Proc Natl Acad Sci U S A ; 114(23): E4602-E4611, 2017 06 06.

Article in English | MEDLINE | ID: mdl-28533395

ABSTRACT

A root for the archaeal tree is essential for reconstructing the metabolism and ecology of early cells and for testing hypotheses that propose that the eukaryotic nuclear lineage originated from within the Archaea; however, published studies based on outgroup rooting disagree regarding the position of the archaeal root. Here we constructed a consensus unrooted archaeal topology using protein concatenation and a multigene supertree method based on 3,242 single gene trees, and then rooted this tree using a recently developed model of genome evolution. This model uses evidence from gene duplications, horizontal transfers, and gene losses contained in 31,236 archaeal gene families to identify the most likely root for the tree. Our analyses support the monophyly of DPANN (Diapherotrites, Parvarchaeota, Aenigmarchaeota, Nanoarchaeota, Nanohaloarchaea), a recently discovered cosmopolitan and genetically diverse lineage, and, in contrast to previous work, place the tree root between DPANN and all other Archaea. The sister group to DPANN comprises the Euryarchaeota and the TACK Archaea, including Lokiarchaeum, which our analyses suggest are monophyletic sister lineages. Metabolic reconstructions on the rooted tree suggest that early Archaea were anaerobes that may have had the ability to reduce CO2 to acetate via the Wood-Ljungdahl pathway. In contrast to proposals suggesting that genome reduction has been the predominant mode of archaeal evolution, our analyses infer a relatively small-genomed archaeal ancestor that subsequently increased in complexity via gene duplication and horizontal gene transfer.

Subjects

Archaea/genetics; Evolution, Molecular; Genome, Archaeal; Models, Genetic; Algorithms; Archaea/classification; Archaea/metabolism; Eukaryota/classification; Eukaryota/genetics; Gene Duplication; Gene Transfer, Horizontal; Metabolic Networks and Pathways/genetics; Multigene Family; Phylogeny; Temperature

7.

Accurate Detection of Convergent Amino-Acid Evolution with PCOC.

Rey, Carine; Guéguen, Laurent; Sémon, Marie; Boussau, Bastien.

Mol Biol Evol ; 35(9): 2296-2306, 2018 09 01.

Article in English | MEDLINE | ID: mdl-29986048

ABSTRACT

In the history of life, some phenotypes have been acquired several times independently, through convergent evolution. Recently, lots of genome-scale studies have been devoted to identify nucleotides or amino acids that changed in a convergent manner when the convergent phenotypes evolved. These efforts have had mixed results, probably because of differences in the detection methods, and because of conceptual differences about the definition of a convergent substitution. Some methods contend that substitutions are convergent only if they occur on all branches where the phenotype changed toward the exact same state at a given nucleotide or amino acid position. Others are much looser in their requirements and define a convergent substitution as one that leads the site at which they occur to prefer a phylogeny in which species with the convergent phenotype group together. Here, we suggest to look for convergent shifts in amino acid preferences instead of convergent substitutions to the exact same amino acid. We define as convergent shifts substitutions that occur on all branches where the phenotype changed and such that they correspond to a change in the type of amino acid preferred at this position. We implement the corresponding model into a method named PCOC. We show on simulations that PCOC better recovers convergent shifts than existing methods in terms of sensitivity and specificity. We test it on a plant protein alignment where convergent evolution has been studied in detail and find that our method recovers several previously identified convergent substitutions and proposes credible new candidates.

Subjects

Amino Acid Substitution; Evolution, Molecular; Genetic Techniques; Models, Genetic; Animals; Cyperaceae/genetics; Mammals/genetics

8.

RecPhyloXML: a format for reconciled gene trees.

Duchemin, Wandrille; Gence, Guillaume; Arigon Chifolleau, Anne-Muriel; Arvestad, Lars; Bansal, Mukul S; Berry, Vincent; Boussau, Bastien; Chevenet, François; Comte, Nicolas; Davín, Adrián A; Dessimoz, Christophe; Dylus, David; Hasic, Damir; Mallo, Diego; Planel, Rémi; Posada, David; Scornavacca, Celine; Szöllősi, Gergely; Zhang, Louxin; Tannier, Éric; Daubin, Vincent.

Bioinformatics ; 34(21): 3646-3652, 2018 11 01.

Article in English | MEDLINE | ID: mdl-29762653

ABSTRACT

Motivation: A reconciliation is an annotation of the nodes of a gene tree with evolutionary events (for example, speciation, gene duplication, transfer, loss, etc.) along with a mapping onto a species tree. Many algorithms and software produce or use reconciliations but often using different reconciliation formats, regarding the type of events considered or whether the species tree is dated or not. This complicates the comparison and communication between different programs. Results: Here, we gather a consortium of software developers in gene tree species tree reconciliation to propose and endorse a format that aims to promote an integrative, albeit flexible, specification of phylogenetic reconciliations. This format, named recPhyloXML, is accompanied by several tools such as a reconciled tree visualizer and conversion utilities. Availability and implementation: http://phylariane.univ-lyon1.fr/recphyloxml/.

Subjects

Evolution, Molecular; Gene Duplication; Algorithms; Phylogeny; Software

9.

Gene Acquisitions from Bacteria at the Origins of Major Archaeal Clades Are Vastly Overestimated.

Groussin, Mathieu; Boussau, Bastien; Szöllősi, Gergely; Eme, Laura; Gouy, Manolo; Brochier-Armanet, Céline; Daubin, Vincent.

Mol Biol Evol ; 33(2): 305-10, 2016 Feb.

Article in English | MEDLINE | ID: mdl-26541173

ABSTRACT

In a recent article, Nelson-Sathi et al. (NS) report that the origins of major archaeal lineages (MAL) correspond to massive group-specific gene acquisitions via HGT from bacteria (Nelson-Sathi et al. 2015. Origins of major archaeal clades correspond to gene acquisitions from bacteria. Nature 517(7532):77-80.). If correct, this would have fundamental implications for the process of diversification in microbes. However, a reexamination of these data and results shows that the methodology used by NS systematically inflates the number of genes acquired at the root of each MAL, and incorrectly assumes bacterial origins for these genes. A reanalysis of their data with appropriate phylogenetic models accounting for the dynamics of gene gain and loss between lineages supports the continuous acquisition of genes over long periods in the evolution of Archaea.

Subjects

Archaea/genetics; Bacteria/genetics; Evolution, Molecular; Gene Transfer, Horizontal; Genotype; Archaea/classification; Genes, Archaeal; Genes, Bacterial; Genomics; Phylogeny

10.

RevBayes: Bayesian Phylogenetic Inference Using Graphical Models and an Interactive Model-Specification Language.

Höhna, Sebastian; Landis, Michael J; Heath, Tracy A; Boussau, Bastien; Lartillot, Nicolas; Moore, Brian R; Huelsenbeck, John P; Ronquist, Fredrik.

Syst Biol ; 65(4): 726-36, 2016 07.

Article in English | MEDLINE | ID: mdl-27235697

ABSTRACT

Programs for Bayesian inference of phylogeny currently implement a unique and fixed suite of models. Consequently, users of these software packages are simultaneously forced to use a number of programs for a given study, while also lacking the freedom to explore models that have not been implemented by the developers of those programs. We developed a new open-source software package, RevBayes, to address these problems. RevBayes is entirely based on probabilistic graphical models, a powerful generic framework for specifying and analyzing statistical models. Phylogenetic-graphical models can be specified interactively in RevBayes, piece by piece, using a new succinct and intuitive language called Rev. Rev is similar to the R language and the BUGS model-specification language, and should be easy to learn for most users. The strength of RevBayes is the simplicity with which one can design, specify, and implement new and complex models. Fortunately, this tremendous flexibility does not come at the cost of slower computation; as we demonstrate, RevBayes outperforms competing software for several standard analyses. Compared with other programs, RevBayes has fewer black-box elements. Users need to explicitly specify each part of the model and analysis. Although this explicitness may initially be unfamiliar, we are convinced that this transparency will improve understanding of phylogenetic models in our field. Moreover, it will motivate the search for improvements to existing methods by brazenly exposing the model choices that we make to critical scrutiny. RevBayes is freely available at http://www.RevBayes.com [Bayesian inference; Graphical models; MCMC; statistical phylogenetics.].

Subjects

Classification/methods; Models, Biological; Phylogeny; Software; Bayes Theorem

11.

Genome-scale coestimation of species and gene trees.

Boussau, Bastien; Szöllősi, Gergely J; Duret, Laurent; Gouy, Manolo; Tannier, Eric; Daubin, Vincent.

Genome Res ; 23(2): 323-30, 2013 Feb.

Article in English | MEDLINE | ID: mdl-23132911

ABSTRACT

Comparisons of gene trees and species trees are key to understanding major processes of genome evolution such as gene duplication and loss. Because current methods to reconstruct phylogenies fail to model the two-way dependency between gene trees and the species tree, they often misrepresent gene and species histories. We present a new probabilistic model to jointly infer rooted species and gene trees for dozens of genomes and thousands of gene families. We use simulations to show that this method accurately infers the species tree and gene trees, is robust to misspecification of the models of sequence and gene family evolution, and provides a precise historic record of gene duplications and losses throughout genome evolution. We simultaneously reconstruct the history of mammalian species and their genes based on 36 completely sequenced genomes, and use the reconstructed gene trees to infer the gene content and organization of ancestral mammalian genomes. We show that our method yields a more accurate picture of ancestral genomes than the trees available in the authoritative database Ensembl.

Subjects

Genes; Genome; Models, Genetic; Phylogeny; Algorithms; Animals; Computational Biology/methods; Computer Simulation; Evolution, Molecular; Gene Deletion; Gene Duplication; Humans; Models, Statistical

12.

The inference of gene trees with species trees.

Szöllősi, Gergely J; Tannier, Eric; Daubin, Vincent; Boussau, Bastien.

Syst Biol ; 64(1): e42-62, 2015 Jan.

Article in English | MEDLINE | ID: mdl-25070970

ABSTRACT

This article reviews the various models that have been used to describe the relationships between gene trees and species trees. Molecular phylogeny has focused mainly on improving models for the reconstruction of gene trees based on sequence alignments. Yet, most phylogeneticists seek to reveal the history of species. Although the histories of genes and species are tightly linked, they are seldom identical, because genes duplicate, are lost or horizontally transferred, and because alleles can coexist in populations for periods that may span several speciation events. Building models describing the relationship between gene and species trees can thus improve the reconstruction of gene trees when a species tree is known, and vice versa. Several approaches have been proposed to solve the problem in one direction or the other, but in general neither gene trees nor species trees are known. Only a few studies have attempted to jointly infer gene trees and species trees. These models account for gene duplication and loss, transfer or incomplete lineage sorting. Some of them consider several types of events together, but none exists currently that considers the full repertoire of processes that generate gene trees along the species tree. Simulations as well as empirical studies on genomic data show that combining gene tree-species tree models with models of sequence evolution improves gene tree reconstruction. In turn, these better gene trees provide a more reliable basis for studying genome evolution or reconstructing ancestral chromosomes and ancestral gene sequences. We predict that gene tree-species tree methods that can deal with genomic data sets will be instrumental to advancing our understanding of genomic evolution.

Subjects

Models, Biological; Phylogeny; Computer Simulation; Genetic Speciation; Genome/genetics; Mutation/genetics

13.

Assessing approaches for inferring species trees from multi-copy genes.

Chaudhary, Ruchi; Boussau, Bastien; Burleigh, J Gordon; Fernández-Baca, David.

Syst Biol ; 64(2): 325-39, 2015 Mar.

Article in English | MEDLINE | ID: mdl-25540456

ABSTRACT

With the availability of genomic sequence data, there is increasing interest in using genes with a possible history of duplication and loss for species tree inference. Here we assess the performance of both nonprobabilistic and probabilistic species tree inference approaches using gene duplication and loss and coalescence simulations. We evaluated the performance of gene tree parsimony (GTP) based on duplication (Only-dup), duplication and loss (Dup-loss), and deep coalescence (Deep-c) costs, the NJst distance method, the MulRF supertree method, and PHYLDOG, which jointly estimates gene trees and species tree using a hierarchical probabilistic model. We examined the effects of gene tree and species sampling, gene tree error, and duplication and loss rates on the accuracy of phylogenetic estimates. In the 10-taxon duplication and loss simulation experiments, MulRF is more accurate than the other methods when the duplication and loss rates are low, and Dup-loss is generally the most accurate when the duplication and loss rates are high. PHYLDOG performs well in 10-taxon duplication and loss simulations, but its run time is prohibitively long on larger data sets. In the larger duplication and loss simulation experiments, MulRF outperforms all other methods in experiments with at most 100 taxa; however, in the larger simulation, Dup-loss generally performs best. In all duplication and loss simulation experiments with more than 10 taxa, all methods perform better with more gene trees and fewer missing sequences, and they are all affected by gene tree error. Our results also highlight high levels of error in estimates of duplications and losses from GTP methods and demonstrate the usefulness of methods based on generic tree distances for large analyses.

Subjects

Classification/methods; Phylogeny; Sequence Analysis, DNA/methods; Computer Simulation; Gene Deletion; Gene Duplication; Sequence Analysis, DNA/standards; Software/standards

14.

Probabilistic graphical model representation in phylogenetics.

Höhna, Sebastian; Heath, Tracy A; Boussau, Bastien; Landis, Michael J; Ronquist, Fredrik; Huelsenbeck, John P.

Syst Biol ; 63(5): 753-71, 2014 Sep.

Article in English | MEDLINE | ID: mdl-24951559

ABSTRACT

Recent years have seen a rapid expansion of the model space explored in statistical phylogenetics, emphasizing the need for new approaches to statistical model representation and software development. Clear communication and representation of the chosen model is crucial for: (i) reproducibility of an analysis, (ii) model development, and (iii) software design. Moreover, a unified, clear and understandable framework for model representation lowers the barrier for beginners and nonspecialists to grasp complex phylogenetic models, including their assumptions and parameter/variable dependencies. Graphical modeling is a unifying framework that has gained in popularity in the statistical literature in recent years. The core idea is to break complex models into conditionally independent distributions. The strength lies in the comprehensibility, flexibility, and adaptability of this formalism, and the large body of computational work based on it. Graphical models are well-suited to teach statistical models, to facilitate communication among phylogeneticists and in the development of generic software for simulation and statistical inference. Here, we provide an introduction to graphical models for phylogeneticists and extend the standard graphical model representation to the realm of phylogenetics. We introduce a new graphical model component, tree plates, to capture the changing structure of the subgraph corresponding to a phylogenetic tree. We describe a range of phylogenetic models using the graphical model framework and introduce modules to simplify the representation of standard components in large and complex models. Phylogenetic model graphs can be readily used in simulation, maximum likelihood inference, and Bayesian inference using, for example, Metropolis-Hastings or Gibbs sampling of the posterior distribution.

Subjects

Classification/methods; Models, Statistical; Phylogeny; Algorithms; Computer Simulation

15.

Phylogenetic modeling of lateral gene transfer reconstructs the pattern and relative timing of speciations.

Szöllősi, Gergely J; Boussau, Bastien; Abby, Sophie S; Tannier, Eric; Daubin, Vincent.

Proc Natl Acad Sci U S A ; 109(43): 17513-8, 2012 Oct 23.

Article in English | MEDLINE | ID: mdl-23043116

ABSTRACT

The timing of the evolution of microbial life has largely remained elusive due to the scarcity of prokaryotic fossil record and the confounding effects of the exchange of genes among possibly distant species. The history of gene transfer events, however, is not a series of individual oddities; it records which lineages were concurrent and thus provides information on the timing of species diversification. Here, we use a probabilistic model of genome evolution that accounts for differences between gene phylogenies and the species tree as series of duplication, transfer, and loss events to reconstruct chronologically ordered species phylogenies. Using simulations we show that we can robustly recover accurate chronologically ordered species phylogenies in the presence of gene tree reconstruction errors and realistic rates of duplication, transfer, and loss. Using genomic data we demonstrate that we can infer rooted species phylogenies using homologous gene families from complete genomes of 10 bacterial and archaeal groups. Focusing on cyanobacteria, distinguished among prokaryotes by a relative abundance of fossils, we infer the maximum likelihood chronologically ordered species phylogeny based on 36 genomes with 8,332 homologous gene families. We find the order of speciation events to be in full agreement with the fossil record and the inferred phylogeny of cyanobacteria to be consistent with the phylogeny recovered from established phylogenomics methods. Our results demonstrate that lateral gene transfers, detected by probabilistic models of genome evolution, can be used as a source of information on the timing of evolution, providing a valuable complement to the limited prokaryotic fossil record.

Subjects

Gene Transfer, Horizontal; Models, Genetic; Phylogeny; Species Specificity; Likelihood Functions

16.

Bio++: efficient extensible libraries and tools for computational molecular evolution.

Guéguen, Laurent; Gaillard, Sylvain; Boussau, Bastien; Gouy, Manolo; Groussin, Mathieu; Rochette, Nicolas C; Bigot, Thomas; Fournier, David; Pouyet, Fanny; Cahais, Vincent; Bernard, Aurélien; Scornavacca, Céline; Nabholz, Benoît; Haudry, Annabelle; Dachary, Loïc; Galtier, Nicolas; Belkhir, Khalid; Dutheil, Julien Y.

Mol Biol Evol ; 30(8): 1745-50, 2013 Aug.

Article in English | MEDLINE | ID: mdl-23699471

ABSTRACT

Efficient algorithms and programs for the analysis of the ever-growing amount of biological sequence data are strongly needed in the genomics era. The pace at which new data and methodologies are generated calls for the use of pre-existing, optimized, yet extensible, code, typically distributed as libraries or packages. This motivated the Bio++ project, aiming at developing a set of C++ libraries for sequence analysis, phylogenetics, population genetics, and molecular evolution. The main attractiveness of Bio++ is the extensibility and reusability of its components through its object-oriented design, without compromising the computer-efficiency of the underlying methods. We present here the second major release of the libraries, which provides an extended set of classes and methods. These extensions notably provide built-in access to sequence databases and new data structures for handling and manipulating sequences from the omics era, such as multiple genome alignments and sequencing reads libraries. More complex models of sequence evolution, such as mixture models and generic n-tuples alphabets, are also included.

Subjects

Computational Biology; Evolution, Molecular; Software; Algorithms; Computational Biology/methods; Genomics/methods; Humans; Internet

17.

Efficient exploration of the space of reconciled gene trees.

Szöllősi, Gergely J; Rosikiewicz, Wojciech; Boussau, Bastien; Tannier, Eric; Daubin, Vincent.

Syst Biol ; 62(6): 901-12, 2013 Nov.

Article in English | MEDLINE | ID: mdl-23925510

ABSTRACT

Gene trees record the combination of gene-level events, such as duplication, transfer and loss (DTL), and species-level events, such as speciation and extinction. Gene tree-species tree reconciliation methods model these processes by drawing gene trees into the species tree using a series of gene and species-level events. The reconstruction of gene trees based on sequence alone almost always involves choosing between statistically equivalent or weakly distinguishable relationships that could be much better resolved based on a putative species tree. To exploit this potential for accurate reconstruction of gene trees, the space of reconciled gene trees must be explored according to a joint model of sequence evolution and gene tree-species tree reconciliation. Here we present amalgamated likelihood estimation (ALE), a probabilistic approach to exhaustively explore all reconciled gene trees that can be amalgamated as a combination of clades observed in a sample of gene trees. We implement the ALE approach in the context of a reconciliation model (Szöllősi et al. 2013), which allows for the DTL of genes. We use ALE to efficiently approximate the sum of the joint likelihood over amalgamations and to find the reconciled gene tree that maximizes the joint likelihood among all such trees. We demonstrate using simulations that gene trees reconstructed using the joint likelihood are substantially more accurate than those reconstructed using sequence alone. Using realistic gene tree topologies, branch lengths, and alignment sizes, we demonstrate that ALE produces more accurate gene trees even if the model of sequence evolution is greatly simplified. Finally, examining 1099 gene families from 36 cyanobacterial genomes we find that joint likelihood-based inference results in a striking reduction in apparent phylogenetic discord, with, respectively, 24%, 59%, and 46% reductions in the mean numbers of duplications, transfers, and losses per gene family.
The open source implementation of ALE is available from https://github.com/ssolo/ALE.git.
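
The idea of amalgamating clades observed in a sample of gene trees can be illustrated with a small Python sketch. It is not the ALE implementation: it assumes rooted binary trees written as nested tuples, and it assumes that an amalgamated tree is scored by the product of conditional split frequencies (how often a clade splits into a given pair of child clades in the sample), a detail the abstract does not spell out. All function names are hypothetical, and the reconciliation model (duplication, transfer, loss) is left out entirely.

```python
from collections import Counter


def leaves(tree):
    """Return the set of leaf names under a rooted tree given as nested 2-tuples."""
    if isinstance(tree, str):
        return frozenset([tree])
    left, right = tree
    return leaves(left) | leaves(right)


def count_splits(trees):
    """Count each clade, and each way it splits into two child clades, over a sample."""
    clade_counts, split_counts = Counter(), Counter()

    def visit(node):
        if isinstance(node, str):
            return
        left, right = node
        clade = leaves(node)
        split = frozenset([leaves(left), leaves(right)])
        clade_counts[clade] += 1
        split_counts[(clade, split)] += 1
        visit(left)
        visit(right)

    for tree in trees:
        visit(tree)
    return clade_counts, split_counts


def amalgamated_probability(tree, clade_counts, split_counts):
    """Product of conditional split frequencies; 0.0 if any clade or split is unobserved."""
    if isinstance(tree, str):
        return 1.0
    left, right = tree
    clade = leaves(tree)
    if clade_counts[clade] == 0:
        return 0.0
    split = frozenset([leaves(left), leaves(right)])
    p = split_counts[(clade, split)] / clade_counts[clade]
    return (p
            * amalgamated_probability(left, clade_counts, split_counts)
            * amalgamated_probability(right, clade_counts, split_counts))


# Toy sample of two gene trees (illustrative data, not from the study).
sample = [
    ((("A", "B"), "C"), (("D", "E"), "F")),
    (("A", ("B", "C")), ("D", ("E", "F"))),
]
clade_counts, split_counts = count_splits(sample)

# This tree is absent from the sample but is built only from observed clades.
unseen = ((("A", "B"), "C"), ("D", ("E", "F")))
p_unseen = amalgamated_probability(unseen, clade_counts, split_counts)
print(p_unseen)
```

In this toy case the tree `unseen` receives a non-zero score although it never appears in the sample, because each of its clades and splits was observed in at least one sampled tree. A tree containing a clade that was never sampled scores zero.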

Subjects

Classification/methods , Phylogeny , Computer Simulation , Cyanobacteria/classification , Cyanobacteria/genetics , Reproducibility of Results , Sequence Analysis, DNA

18.

Parallel adaptations to high temperatures in the Archaean eon.

Boussau, Bastien; Blanquart, Samuel; Necsulea, Anamaria; Lartillot, Nicolas; Gouy, Manolo.

Nature ; 456(7224): 942-5, 2008 Dec 18.

Article in English | MEDLINE | ID: mdl-19037246

ABSTRACT

Fossils of organisms dating from the origin and diversification of cellular life are scant and difficult to interpret; for this reason, alternative means to investigate the ecology of the last universal common ancestor (LUCA) and of the ancestors of the three domains of life are of great scientific value. It was recently recognized that the effects of temperature on ancestral organisms left 'genetic footprints' that could be uncovered in extant genomes. Accordingly, analyses of resurrected proteins predicted that the bacterial ancestor was thermophilic and that Bacteria subsequently adapted to lower temperatures. As the archaeal ancestor is also thought to have been thermophilic, the LUCA was parsimoniously inferred as thermophilic too. However, an analysis of ribosomal RNAs supported the hypothesis of a non-hyperthermophilic LUCA. Here we show that both rRNA and protein sequences analysed with advanced, realistic models of molecular evolution provide independent support for two environmental-temperature-related phases during the evolutionary history of the tree of life. In the first period, thermotolerance increased from a mesophilic LUCA to thermophilic ancestors of Bacteria and of Archaea-Eukaryota; in the second period, it decreased. Therefore, the two lineages descending from the LUCA and leading to the ancestors of Bacteria and Archaea-Eukaryota convergently adapted to high temperatures, possibly in response to a climate change of the early Earth, and/or aided by the transition from an RNA genome in the LUCA to organisms with more thermostable DNA genomes. This analysis unifies apparently contradictory results into a coherent depiction of the evolution of an ecological trait over the entire tree of life.

Subjects

Adaptation, Physiological/physiology , Archaea/physiology , Hot Temperature , Adaptation, Physiological/genetics , Archaea/genetics , Evolution, Molecular , Genes, rRNA/genetics , Phylogeny

19.

Efficient selection of branch-specific models of sequence evolution.

Dutheil, Julien Y; Galtier, Nicolas; Romiguier, Jonathan; Douzery, Emmanuel J P; Ranwez, Vincent; Boussau, Bastien.

Mol Biol Evol ; 29(7): 1861-74, 2012 Jul.

Article in English | MEDLINE | ID: mdl-22319139

ABSTRACT

The analysis of extant sequences shows that molecular evolution has been heterogeneous through time and among lineages. However, for a given sequence alignment, it is often difficult to uncover what factors caused this heterogeneity. In fact, identifying and characterizing heterogeneous patterns of molecular evolution along a phylogenetic tree is very challenging, for lack of appropriate methods. Users either have to a priori define groups of branches along which they believe molecular evolution has been similar or have to allow each branch to have its own pattern of molecular evolution. The first approach assumes prior knowledge that is seldom available, and the second requires estimating an unreasonably large number of parameters. Here we propose a convenient and reliable approach where branches get clustered by their pattern of molecular evolution alone, with no need for prior knowledge about the data set under study. Model selection is achieved in a statistical framework and therefore avoids overparameterization. We rely on substitution mapping for efficiency and present two clustering approaches, depending on whether or not we expect neighbouring branches to share more similar patterns of sequence evolution than distant branches. We validate our method on simulations and test it on four previously published data sets. We find that our method correctly groups branches sharing similar equilibrium GC contents in a data set of ribosomal RNAs and recovers expected footprints of selection through dN/dS. Importantly, it also uncovers a new pattern of relaxed selection in a phylogeny of Mantellid frogs, which we are able to correlate to life-history traits. This shows that our programs should be very useful to study patterns of molecular evolution and reveal new correlations between sequence and species evolution. 
Our programs can run on DNA, RNA, codon, or amino acid sequences with a large set of possible models of substitutions and are available at http://biopp.univ-montp2.fr/forge/testnh.

Subjects

Algorithms , Evolution, Molecular , Models, Genetic , Animals , Biological Evolution , Cluster Analysis , Computer Simulation , Daphnia/genetics , Muramidase/genetics , Phylogeny , RNA, Ribosomal/genetics , Ranidae/genetics

20.

Evolution of gene neighborhoods within reconciled phylogenies.

Bérard, Sèverine; Gallien, Coralie; Boussau, Bastien; Szöllosi, Gergely J; Daubin, Vincent; Tannier, Eric.

Bioinformatics ; 28(18): i382-i388, 2012 Sep 15.

Article in English | MEDLINE | ID: mdl-22962456

ABSTRACT

MOTIVATION: Most models of genome evolution integrating gene duplications, losses and chromosomal rearrangements are computationally intractable, even when comparing only two genomes. This prevents large-scale studies that consider different types of genome structural variations. RESULTS: We define an 'adjacency phylogenetic tree' that describes the evolution of an adjacency, a neighborhood relation between two genes, by speciation, duplication or loss of one or both genes, and rearrangement. We describe an algorithm that, given a species tree and a set of gene trees where the leaves are connected by adjacencies, computes an adjacency forest that minimizes the number of gains and breakages of adjacencies (caused by rearrangements) and runs in polynomial time. We use this algorithm to reconstruct contiguous regions of mammalian and plant ancestral genomes in a few minutes for a dozen species and several thousand genes. We show that this method yields reduced conflict between ancestral adjacencies. We detect duplications involving several genes and compare the different modes of evolution between phyla and among lineages. AVAILABILITY: C++ implementation using BIO++ package, available upon request to Sèverine Bérard. CONTACT: Severine.Berard@cirad.fr or Eric.Tannier@inria.fr SUPPLEMENTARY INFORMATION: Supplementary material is available at Bioinformatics online.

Subjects

Algorithms , Evolution, Molecular , Genes , Phylogeny , Animals , Gene Duplication , Genome , Genome, Plant , Mammals/genetics , Models, Genetic
