Search | Virtual Health Library (Biblioteca Virtual en Salud)

1.

Divergent genomic trajectories predate the origin of animals and fungi.

Ocaña-Pallarès, Eduard; Williams, Tom A; López-Escardó, David; Arroyo, Alicia S; Pathmanathan, Jananan S; Bapteste, Eric; Tikhonenkov, Denis V; Keeling, Patrick J; Szöllosi, Gergely J; Ruiz-Trillo, Iñaki.

Nature ; 609(7928): 747-753, 2022 Sep.

Article in English | MEDLINE | ID: mdl-36002568

ABSTRACT

Animals and fungi have radically distinct morphologies, yet both evolved within the same eukaryotic supergroup: Opisthokonta1,2. Here we reconstructed the trajectory of genetic changes that accompanied the origin of Metazoa and Fungi since the divergence of Opisthokonta with a dataset that includes four novel genomes from crucial positions in the Opisthokonta phylogeny. We show that animals arose only after the accumulation of genes functionally important for their multicellularity, a tendency that began in the pre-metazoan ancestors and later accelerated in the metazoan root. By contrast, the pre-fungal ancestors experienced net losses of most functional categories, including those gained in the path to Metazoa. On a broad-scale functional level, fungal genomes contain a higher proportion of metabolic genes and diverged less from the last common ancestor of Opisthokonta than did the gene repertoires of Metazoa. Metazoa and Fungi also show differences regarding gene gain mechanisms. Gene fusions are more prevalent in Metazoa, whereas a larger fraction of gene gains were detected as horizontal gene transfers in Fungi and protists, in agreement with the long-standing idea that transfers would be less relevant in Metazoa due to germline isolation3-5. Together, our results indicate that animals and fungi evolved under two contrasting trajectories of genetic change that predated the origin of both groups. The gradual establishment of two clearly differentiated genomic contexts thus set the stage for the emergence of Metazoa and Fungi.

Subject(s)

Molecular Evolution, Fungi, Genome, Genomics, Phylogeny, Animals, Fungi/genetics, Horizontal Gene Transfer, Genes, Genome/genetics, Fungal Genome/genetics, Metabolism/genetics

2.

AleRax: a tool for gene and species tree co-estimation and reconciliation under a probabilistic model of gene duplication, transfer, and loss.

Morel, Benoit; Williams, Tom A; Stamatakis, Alexandros; Szöllosi, Gergely J.

Bioinformatics ; 40(4), 2024 Mar 29.

Article in English | MEDLINE | ID: mdl-38514421

ABSTRACT

MOTIVATION: Genomes are a rich source of information on the pattern and process of evolution across biological scales. How best to make use of that information is an active area of research in phylogenetics. Ideally, phylogenetic methods should not only model substitutions along gene trees, which explain differences between homologous gene sequences, but also the processes that generate the gene trees themselves along a shared species tree. To conduct accurate inferences, one needs to account for uncertainty at both levels, that is, in gene trees estimated from inherently short sequences and in their diverse evolutionary histories along a shared species tree. RESULTS: We present AleRax, a software that can infer reconciled gene trees together with a shared species tree using a simple, yet powerful, probabilistic model of gene duplication, transfer, and loss. A key feature of AleRax is its ability to account for uncertainty in the gene tree and its reconciliation by using an efficient approximation to calculate the joint phylogenetic-reconciliation likelihood and sample reconciled gene trees accordingly. Simulations and analyses of empirical data show that AleRax is one order of magnitude faster than competing gene tree inference tools while attaining the same accuracy. It is consistently more robust than species tree inference methods such as SpeciesRax and ASTRAL-Pro 2 under gene tree uncertainty. Finally, AleRax can process multiple gene families in parallel thereby allowing users to compare competing phylogenetic hypotheses and estimate model parameters, such as duplication, transfer, and loss probabilities for genome-scale datasets with hundreds of taxa. AVAILABILITY AND IMPLEMENTATION: GNU GPL at https://github.com/BenoitMorel/AleRax and data are made available at https://cme.h-its.org/exelixis/material/alerax_data.tar.gz.

Subject(s)

Algorithms, Gene Duplication, Phylogeny, Software, Statistical Models, Molecular Evolution

3.

Compositionally Constrained Sites Drive Long-Branch Attraction.

Szánthó, Lénárd L; Lartillot, Nicolas; Szöllosi, Gergely J; Schrempf, Dominik.

Syst Biol ; 72(4): 767-780, 2023 Aug 07.

Article in English | MEDLINE | ID: mdl-36946562

ABSTRACT

Accurate phylogenies are fundamental to our understanding of the pattern and process of evolution. Yet, phylogenies at deep evolutionary timescales, with correspondingly long branches, have been fraught with controversy resulting from conflicting estimates from models with varying complexity and goodness of fit. Analyses of historical as well as current empirical datasets, such as alignments including Microsporidia, Nematoda, or Platyhelminthes, have demonstrated that inadequate modeling of across-site compositional heterogeneity, which is the result of biochemical constraints that lead to varying patterns of accepted amino acids along sequences, can lead to erroneous topologies that are strongly supported. Unfortunately, models that adequately account for across-site compositional heterogeneity remain computationally challenging or intractable for an increasing fraction of contemporary datasets. Here, we introduce "compositional constraint analysis," a method to investigate the effect of site-specific constraints on amino acid composition on phylogenetic inference. We show that more constrained sites with lower diversity and less constrained sites with higher diversity exhibit ostensibly conflicting signals under models ignoring across-site compositional heterogeneity that lead to long-branch attraction artifacts and demonstrate that more complex models accounting for across-site compositional heterogeneity can ameliorate this bias. We present CAT-posterior mean site frequencies (PMSF), a pipeline for diagnosing and resolving phylogenetic bias resulting from inadequate modeling of across-site compositional heterogeneity based on the CAT model. CAT-PMSF is robust against long-branch attraction in all alignments we have examined. We suggest using CAT-PMSF when convergence of the CAT model cannot be assured. 
We find evidence that compositionally constrained sites are driving long-branch attraction in two metazoan datasets and recover evidence for Porifera as the sister group to all other animals. [Animal phylogeny; cross-site heterogeneity; long-branch attraction; phylogenomics.].

Subject(s)

Microsporidia, Animals, Phylogeny, Bias, Genetic Models

4.

Recoding Amino Acids to a Reduced Alphabet may Increase or Decrease Phylogenetic Accuracy.

Foster, Peter G; Schrempf, Dominik; Szöllosi, Gergely J; Williams, Tom A; Cox, Cymon J; Embley, T Martin.

Syst Biol ; 72(3): 723-737, 2023 Jun 17.

Article in English | MEDLINE | ID: mdl-35713492

ABSTRACT

Common molecular phylogenetic characteristics such as long branches and compositional heterogeneity can be problematic for phylogenetic reconstruction when using amino acid data. Recoding alignments to reduced alphabets before phylogenetic analysis has often been used both to explore and potentially decrease the effect of such problems. We tested the effectiveness of this strategy on topological accuracy using simulated data on four-taxon trees. We simulated alignments in phylogenetically challenging ways to test the phylogenetic accuracy of analyses using various recoding strategies together with commonly used homogeneous models. We tested three recoding methods based on amino acid exchangeability, and another recoding method based on lowering the compositional heterogeneity among alignment sequences as measured by the Chi-squared statistic. Our simulation results show that on trees with long branches where sequences approach saturation, accuracy was not greatly affected by exchangeability-based recodings, but Chi-squared-based recoding decreased accuracy. We then simulated sequences with different kinds of compositional heterogeneity over the tree. Recoding often increased accuracy on such alignments. Exchangeability-based recoding was rarely worse than not recoding, and often considerably better. Recoding based on lowering the Chi-squared value improved accuracy in some cases but not in others, suggesting that low compositional heterogeneity by itself is not sufficient to increase accuracy in the analysis of these alignments. We also simulated alignments using site-specific amino acid profiles, making sequences that had compositional heterogeneity over alignment sites. Exchangeability-based recoding coupled with site-homogeneous models had poor accuracy for these data sets but Chi-squared-based recoding on these alignments increased accuracy. We then simulated data sets that were compositionally both site- and tree-heterogeneous, like many real data sets. 
The effect on the accuracy of recoding such doubly problematic data sets varied widely, depending on the type of compositional tree heterogeneity and on the recoding scheme. Interestingly, analysis of unrecoded compositionally heterogeneous alignments with the NDCH or CAT models was generally more accurate than homogeneous analysis, whether recoded or not. Overall, our results suggest that making trees for recoded amino acid data sets can be useful, but they need to be interpreted cautiously as part of a more comprehensive analysis. The use of better-fitting models like NDCH and CAT, which directly account for the patterns in the data, may offer a more promising long-term solution for analyzing empirical data. [Compositional heterogeneity; models of evolution; phylogenetic methods; recoding amino acid data sets.].

Subject(s)

Amino Acids, Phylogeny, Computer Simulation
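
The abstract of entry 4 above contrasts exchangeability-based recoding with recoding that lowers a Chi-squared measure of compositional heterogeneity among sequences. The following is a minimal sketch of both ingredients, not the authors' code. It assumes a six-class grouping in the style of Dayhoff recoding (the abstract does not name the schemes that were tested) and uses a plain Pearson Chi-squared statistic on a sequence-by-state count table. The toy alignment and the function names `recode` and `composition_chi2` are hypothetical.

```python
from collections import Counter

# Assumed six-class grouping in the style of Dayhoff recoding; the abstract
# does not say which exchangeability-based schemes were actually tested.
DAYHOFF6 = {
    "0": "AGPST",
    "1": "DENQ",
    "2": "HKR",
    "3": "ILMV",
    "4": "FWY",
    "5": "C",
}
RECODE = {aa: state for state, group in DAYHOFF6.items() for aa in group}


def recode(sequence):
    """Map each amino acid to its class label; anything else becomes '-'."""
    return "".join(RECODE.get(aa, "-") for aa in sequence.upper())


def composition_chi2(sequences):
    """Pearson Chi-squared statistic of a sequence-by-state count table.

    Rows are sequences, columns are character states; '-' is ignored.
    Assumes every sequence contains at least one non-gap character.
    """
    counts = [Counter(c for c in seq if c != "-") for seq in sequences]
    states = sorted(set().union(*counts))
    row_totals = [sum(c.values()) for c in counts]
    col_totals = {s: sum(c[s] for c in counts) for s in states}
    grand_total = sum(row_totals)
    chi2 = 0.0
    for c, row_total in zip(counts, row_totals):
        for s in states:
            expected = row_total * col_totals[s] / grand_total
            chi2 += (c[s] - expected) ** 2 / expected
    return chi2


# Hypothetical toy alignment: different residues, same classes per column.
alignment = ["AGDEKILF", "STNQRVMY", "AGDEKILW"]
recoded = [recode(seq) for seq in alignment]
print(recoded)
print(composition_chi2(alignment), composition_chi2(recoded))
```

In this toy case the three sequences differ in amino acid composition but become identical after recoding, so the statistic drops to zero. The sketch only computes the statistic; searching for a grouping that minimizes it, as the Chi-squared-based method in the abstract does, is not shown.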

5.

SpeciesRax: A Tool for Maximum Likelihood Species Tree Inference from Gene Family Trees under Duplication, Transfer, and Loss.

Morel, Benoit; Schade, Paul; Lutteropp, Sarah; Williams, Tom A; Szöllosi, Gergely J; Stamatakis, Alexandros.

Mol Biol Evol ; 39(2), 2022 Feb 03.

Article in English | MEDLINE | ID: mdl-35021210

ABSTRACT

Species tree inference from gene family trees is becoming increasingly popular because it can account for discordance between the species tree and the corresponding gene family trees. In particular, methods that can account for multiple-copy gene families exhibit potential to leverage paralogy as informative signal. At present, there does not exist any widely adopted inference method for this purpose. Here, we present SpeciesRax, the first maximum likelihood method that can infer a rooted species tree from a set of gene family trees and can account for gene duplication, loss, and transfer events. By explicitly modeling events by which gene trees can depart from the species tree, SpeciesRax leverages the phylogenetic rooting signal in gene trees. SpeciesRax infers species tree branch lengths in units of expected substitutions per site and branch support values via paralogy-aware quartets extracted from the gene family trees. Using both empirical and simulated data sets we show that SpeciesRax is at least as accurate as the best competing methods while being one order of magnitude faster on large data sets at the same time. We used SpeciesRax to infer a biologically plausible rooted phylogeny of the vertebrates comprising 188 species from 31,612 gene families in 1 h using 40 cores. SpeciesRax is available under GNU GPL at https://github.com/BenoitMorel/GeneRax and on BioConda.

Subject(s)

Algorithms, Gene Duplication, Genetic Models, Pedigree, Phylogeny

6.

Relative Time Constraints Improve Molecular Dating.

Szöllősi, Gergely J; Höhna, Sebastian; Williams, Tom A; Schrempf, Dominik; Daubin, Vincent; Boussau, Bastien.

Syst Biol ; 71(4): 797-809, 2022 Jun 16.

Article in English | MEDLINE | ID: mdl-34668564

ABSTRACT

Dating the tree of life is central to understanding the evolution of life on Earth. Molecular clocks calibrated with fossils represent the state of the art for inferring the ages of major groups. Yet, other information on the timing of species diversification can be used to date the tree of life. For example, horizontal gene transfer events and ancient coevolutionary interactions such as (endo)symbioses occur between contemporaneous species and thus can imply temporal relationships between two nodes in a phylogeny. Temporal constraints from these alternative sources can be particularly helpful when the geological record is sparse, for example, for microorganisms, which represent the majority of extant and extinct biodiversity. Here, we present a new method to combine fossil calibrations and relative age constraints to estimate chronograms. We provide an implementation of relative age constraints in RevBayes that can be combined in a modular manner with the wide range of molecular dating methods available in the software. We use both realistic simulations and empirical datasets of 40 Cyanobacteria and 62 Archaea to evaluate our method. We show that the combination of relative age constraints with fossil calibrations significantly improves the estimation of node ages. [Archaea, Bayesian analysis, cyanobacteria, dating, endosymbiosis, lateral gene transfer, MCMC, molecular clock, phylogenetic dating, relaxed molecular clock, revbayes, tree of life.].

Subject(s)

Fossils, Horizontal Gene Transfer, Bayes Theorem, Molecular Evolution, Phylogeny, Symbiosis

7.

Distinguishing excess mutations and increased cell death based on variant allele frequencies.

Tibély, Gergely; Schrempf, Dominik; Derényi, Imre; Szöllosi, Gergely J.

PLoS Comput Biol ; 18(4): e1010048, 2022 Apr.

Article in English | MEDLINE | ID: mdl-35468135

ABSTRACT

Tumors often harbor orders of magnitude more mutations than healthy tissues. The increased number of mutations may be due to an elevated mutation rate or frequent cell death and correspondingly rapid cell turnover, or a combination of the two. It is difficult to disentangle these two mechanisms based on widely available bulk sequencing data, where sequences from individual cells are intermixed and, thus, the cell lineage tree of the tumor cannot be resolved. Here we present a method that can simultaneously estimate the cell turnover rate and the rate of mutations from bulk sequencing data. Our method works by simulating tumor growth and finding the parameters with which the observed data can be reproduced with maximum likelihood. Applying this method to a real tumor sample, we find that both the mutation rate and the frequency of death may be high.

Subject(s)

High-Throughput Nucleotide Sequencing, Neoplasms, Cell Death/genetics, Gene Frequency/genetics, Humans, Mutation/genetics, Neoplasms/genetics, Neoplasms/pathology

8.

A compartment size-dependent selective threshold limits mutation accumulation in hierarchical tissues.

Grajzel, Dániel; Derényi, Imre; Szöllosi, Gergely J.

Proc Natl Acad Sci U S A ; 117(3): 1606-1611, 2020 Jan 21.

Article in English | MEDLINE | ID: mdl-31907322

ABSTRACT

Cancer is a genetic disease fueled by somatic evolution. Hierarchical tissue organization can slow somatic evolution by two qualitatively different mechanisms: by cell differentiation along the hierarchy "washing out" harmful mutations and by limiting the number of cell divisions required to maintain a tissue. Here we explore the effects of compartment size on somatic evolution in hierarchical tissues by considering cell number regulation that acts on cell division rates such that the number of cells in the tissue has the tendency to return to its desired homeostatic value. Introducing mutants with a proliferative advantage, we demonstrate the existence of a third fundamental mechanism by which hierarchically organized tissues are able to slow down somatic evolution. We show that tissue size regulation leads to the emergence of a threshold proliferative advantage, below which mutants cannot persist. We find that the most significant determinant of the threshold selective advantage is compartment size, with the threshold being higher the smaller the compartment. Our results demonstrate that, in sufficiently small compartments, even mutations that confer substantial proliferative advantage cannot persist, but are expelled from the tissue by differentiation along the hierarchy. The resulting selective barrier can significantly slow down somatic evolution and reduce the risk of cancer by limiting the accumulation of mutations that increase the proliferation of cells.

Subject(s)

Neoplasm Genes/genetics, Genetic Models, Mutation Accumulation, Neoplasms/genetics, Cell Differentiation/genetics, Cell Division, Clonal Evolution/genetics, Computer Simulation, Cytoprotection/genetics, Humans, Mutation

9.

Resurrection of Ancestral Malate Dehydrogenases Reveals the Evolutionary History of Halobacterial Proteins: Deciphering Gene Trajectories and Changes in Biochemical Properties.

Blanquart, Samuel; Groussin, Mathieu; Le Roy, Aline; Szöllosi, Gergely J; Girard, Eric; Franzetti, Bruno; Gouy, Manolo; Madern, Dominique.

Mol Biol Evol ; 38(9): 3754-3774, 2021 Aug 23.

Article in English | MEDLINE | ID: mdl-33974066

ABSTRACT

Extreme halophilic Archaea thrive in high salt, where, through proteomic adaptation, they cope with the strong osmolarity and extreme ionic conditions of their environment. In spite of wide fundamental interest, however, studies providing insights into this adaptation are scarce, because of practical difficulties inherent to the purification and characterization of halophilic enzymes. In this work, we describe the evolutionary history of malate dehydrogenases (MalDH) within Halobacteria (a class of the Euryarchaeota phylum). We resurrected nine ancestors along the inferred halobacterial MalDH phylogeny, including the Last Common Ancestral MalDH of Halobacteria (LCAHa) and compared their biochemical properties with those of five modern halobacterial MalDHs. We monitored the stability of these various MalDHs, their oligomeric states and enzymatic properties, as a function of concentration for different salts in the solvent. We found that a variety of evolutionary processes, such as amino acid replacement, gene duplication, loss of MalDH gene and replacement owing to horizontal transfer resulted in significant differences in solubility, stability and catalytic properties between these enzymes in the three Halobacteriales, Haloferacales, and Natrialbales orders since the LCAHa MalDH. We also showed how a stability trade-off might favor the emergence of new properties during adaptation to diverse environmental conditions. Altogether, our results suggest a new view of halophilic protein adaptation in Archaea.

Subject(s)

Euryarchaeota, Halobacterium, Malates, Phylogeny, Proteomics

10.

GeneRax: A Tool for Species-Tree-Aware Maximum Likelihood-Based Gene Family Tree Inference under Gene Duplication, Transfer, and Loss.

Morel, Benoit; Kozlov, Alexey M; Stamatakis, Alexandros; Szöllosi, Gergely J.

Mol Biol Evol ; 37(9): 2763-2774, 2020 Sep 01.

Article in English | MEDLINE | ID: mdl-32502238

ABSTRACT

Inferring phylogenetic trees for individual homologous gene families is difficult because alignments are often too short, and thus contain insufficient signal, while substitution models inevitably fail to capture the complexity of the evolutionary processes. To overcome these challenges, species-tree-aware methods also leverage information from a putative species tree. However, only few methods are available that implement a full likelihood framework or account for horizontal gene transfers. Furthermore, these methods often require expensive data preprocessing (e.g., computing bootstrap trees) and rely on approximations and heuristics that limit the degree of tree space exploration. Here, we present GeneRax, the first maximum likelihood species-tree-aware phylogenetic inference software. It simultaneously accounts for substitutions at the sequence level as well as gene level events, such as duplication, transfer, and loss relying on established maximum likelihood optimization algorithms. GeneRax can infer rooted phylogenetic trees for multiple gene families, directly from the per-gene sequence alignments and a rooted, yet undated, species tree. We show that compared with competing tools, on simulated data GeneRax infers trees that are the closest to the true tree in 90% of the simulations in terms of relative Robinson-Foulds distance. On empirical data sets, GeneRax is the fastest among all tested methods when starting from aligned sequences, and it infers trees with the highest likelihood score, based on our model. GeneRax completed tree inferences and reconciliations for 1,099 Cyanobacteria families in 8 min on 512 CPU cores. Thus, its parallelization scheme enables large-scale analyses. GeneRax is available under GNU GPL at https://github.com/BenoitMorel/GeneRax (last accessed June 17, 2020).

Subject(s)

Gene Duplication, Genetic Techniques, Phylogeny, Software, Cyanobacteria/genetics, Gene Deletion, Horizontal Gene Transfer
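
The abstract of entry 10 above scores inferred gene trees by their relative Robinson-Foulds distance to the true tree. As a rough illustration of that metric (not GeneRax code), the sketch below represents trees as nested tuples of leaf names, collects their non-trivial splits, and divides the size of the symmetric difference by 2(n - 3). That normalizer is the maximum for two unrooted binary trees on n leaves, and treating the trees as unrooted is an assumption made here. The function names and example trees are hypothetical.

```python
def leaf_set(tree):
    """Return the leaf names under a tree given as nested tuples of strings."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def bipartitions(tree):
    """Collect the non-trivial splits of the tree, ignoring the root position."""
    all_leaves = leaf_set(tree)
    reference = min(all_leaves)  # orient every split away from one fixed leaf
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        # Store the side of the split that does not contain the reference leaf.
        side = all_leaves - clade if reference in clade else clade
        # Keep the split only if both sides hold at least two leaves.
        if 1 < len(side) < len(all_leaves) - 1:
            splits.add(side)
        return clade

    visit(tree)
    return splits


def relative_rf(tree_a, tree_b):
    """Robinson-Foulds distance divided by 2(n - 3), assuming binary trees."""
    leaves = leaf_set(tree_a)
    if leaves != leaf_set(tree_b) or len(leaves) < 4:
        raise ValueError("trees must share the same set of at least 4 leaves")
    rf = len(bipartitions(tree_a) ^ bipartitions(tree_b))
    return rf / (2 * (len(leaves) - 3))


# Hypothetical five-leaf trees.
true_tree = ((("A", "B"), "C"), ("D", "E"))
inferred_tree = ((("A", "C"), "B"), ("D", "E"))
print(relative_rf(true_tree, true_tree))      # 0.0
print(relative_rf(true_tree, inferred_tree))  # 0.5
```

A value of 0 means the two topologies agree on every split, and 1 means they share none.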

11.

Zombi: a phylogenetic simulator of trees, genomes and sequences that accounts for dead linages.

Davín, Adrián A; Tricou, Théo; Tannier, Eric; de Vienne, Damien M; Szöllosi, Gergely J.

Bioinformatics ; 36(4): 1286-1288, 2020 Feb 15.

Article in English | MEDLINE | ID: mdl-31566657

ABSTRACT

SUMMARY: Here we present Zombi, a tool to simulate the evolution of species, genomes and sequences in silico, that considers for the first time the evolution of genomes in extinct lineages. It also incorporates various features that have not to date been combined in a single simulator, such as the possibility of generating species trees with a pre-defined variation of speciation and extinction rates through time, explicitly simulating intergenic sequences of variable length and outputting gene tree-species tree reconciliations. AVAILABILITY AND IMPLEMENTATION: Source code and manual are freely available at https://github.com/AADavin/ZOMBI/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Genome, Software, Computer Simulation, Intergenic DNA, Phylogeny

12.

Integrative modeling of gene and genome evolution roots the archaeal tree of life.

Williams, Tom A; Szöllosi, Gergely J; Spang, Anja; Foster, Peter G; Heaps, Sarah E; Boussau, Bastien; Ettema, Thijs J G; Embley, T Martin.

Proc Natl Acad Sci U S A ; 114(23): E4602-E4611, 2017 Jun 06.

Article in English | MEDLINE | ID: mdl-28533395

ABSTRACT

A root for the archaeal tree is essential for reconstructing the metabolism and ecology of early cells and for testing hypotheses that propose that the eukaryotic nuclear lineage originated from within the Archaea; however, published studies based on outgroup rooting disagree regarding the position of the archaeal root. Here we constructed a consensus unrooted archaeal topology using protein concatenation and a multigene supertree method based on 3,242 single gene trees, and then rooted this tree using a recently developed model of genome evolution. This model uses evidence from gene duplications, horizontal transfers, and gene losses contained in 31,236 archaeal gene families to identify the most likely root for the tree. Our analyses support the monophyly of DPANN (Diapherotrites, Parvarchaeota, Aenigmarchaeota, Nanoarchaeota, Nanohaloarchaea), a recently discovered cosmopolitan and genetically diverse lineage, and, in contrast to previous work, place the tree root between DPANN and all other Archaea. The sister group to DPANN comprises the Euryarchaeota and the TACK Archaea, including Lokiarchaeum, which our analyses suggest are monophyletic sister lineages. Metabolic reconstructions on the rooted tree suggest that early Archaea were anaerobes that may have had the ability to reduce CO2 to acetate via the Wood-Ljungdahl pathway. In contrast to proposals suggesting that genome reduction has been the predominant mode of archaeal evolution, our analyses infer a relatively small-genomed archaeal ancestor that subsequently increased in complexity via gene duplication and horizontal gene transfer.

Subject(s)

Archaea/genetics, Molecular Evolution, Archaeal Genome, Genetic Models, Algorithms, Archaea/classification, Archaea/metabolism, Eukaryota/classification, Eukaryota/genetics, Gene Duplication, Horizontal Gene Transfer, Metabolic Networks and Pathways/genetics, Multigene Family, Phylogeny, Temperature

13.

How Long Does Wolbachia Remain on Board?

Bailly-Bechet, Marc; Martins-Simões, Patricia; Szöllosi, Gergely J; Mialdea, Gladys; Sagot, Marie-France; Charlat, Sylvain.

Mol Biol Evol ; 34(5): 1183-1193, 2017 May 01.

Article in English | MEDLINE | ID: mdl-28201740

ABSTRACT

Wolbachia bacteria infect about half of all arthropods, with diverse and extreme consequences ranging from sex-ratio distortion and mating incompatibilities to protection against viruses. These phenotypic effects, combined with efficient vertical transmission from mothers to offspring, satisfactorily explain the invasion dynamics of Wolbachia within species. However, beyond the species level, the lack of congruence between the host and symbiont phylogenetic trees indicates that Wolbachia horizontal transfers and extinctions do happen and underlie its global distribution. But how often do they occur? And has the Wolbachia pandemic reached its equilibrium? Here, we address these questions by inferring recent acquisition/loss events from the distribution of Wolbachia lineages across the mitochondrial DNA tree of 3,600 arthropod specimens, spanning 1,100 species from Tahiti and surrounding islands. We show that most events occurred within the last million years, but are likely attributable to individual level variation (e.g., imperfect maternal transmission) rather than population level variation (e.g., Wolbachia extinction). At the population level, we estimate that mitochondria typically accumulate 4.7% substitutions per site during an infected episode, and 7.1% substitutions per site during the uninfected phase. Using a Bayesian time calibration of the mitochondrial tree, these numbers translate into infected and uninfected phases of approximately 7 and 9 million years. Infected species thus lose Wolbachia slightly more often than uninfected species acquire it, supporting the view that its present incidence, estimated here slightly below 0.5, represents an epidemiological equilibrium.

Subject(s)

Wolbachia/genetics, Animals, Arthropods/genetics, Mitochondrial DNA/genetics, Molecular Evolution, Genetic Variation, Population Genetics, Haplotypes, Phylogeny, Symbiosis/genetics
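
The abstract of entry 13 above links the durations of infected and uninfected phases to an equilibrium incidence "slightly below 0.5". One way to see the connection, under the simplifying assumption of a two-state model with constant loss and acquisition rates (an assumption made here, not a description of the authors' inference), is that the stationary fraction of infected lineages equals the mean infected duration divided by the sum of the two mean durations. The function name below is hypothetical.

```python
def equilibrium_incidence(infected_duration, uninfected_duration):
    """Stationary fraction of infected lineages in a two-state model.

    Assumes constant rates, so each rate is the inverse of the mean time
    spent in the corresponding state (any consistent unit works).
    """
    loss_rate = 1.0 / infected_duration    # infected -> uninfected
    gain_rate = 1.0 / uninfected_duration  # uninfected -> infected
    return gain_rate / (gain_rate + loss_rate)


# Durations quoted in the abstract, in millions of years.
from_ages = equilibrium_incidence(7.0, 9.0)
# Durations quoted in the abstract, in percent substitutions per site.
from_substitutions = equilibrium_incidence(4.7, 7.1)

print(round(from_ages, 4))           # 0.4375
print(round(from_substitutions, 4))  # 0.3983
```

Both values fall below 0.5, in line with the abstract's statement that infected species lose Wolbachia somewhat more often than uninfected species acquire it. The two inputs give different numbers, and the abstract does not say which quantity, if either, underlies the reported incidence.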

14.

ecceTERA: comprehensive gene tree-species tree reconciliation using parsimony.

Jacox, Edwin; Chauve, Cedric; Szöllosi, Gergely J; Ponty, Yann; Scornavacca, Celine.

Bioinformatics ; 32(13): 2056-8, 2016 Jul 01.

Article in English | MEDLINE | ID: mdl-27153713

ABSTRACT

A gene tree-species tree reconciliation explains the evolution of a gene tree within the species tree given a model of gene-family evolution. We describe ecceTERA, a program that implements a generic parsimony reconciliation algorithm, which accounts for gene duplication, loss and transfer (DTL) as well as speciation, involving sampled and unsampled lineages, within undated, fully dated or partially dated species trees. The ecceTERA reconciliation model and algorithm generalize or improve upon most published DTL parsimony algorithms for binary species trees and binary gene trees. Moreover, ecceTERA can estimate accurate species-tree aware gene trees using amalgamation. AVAILABILITY AND IMPLEMENTATION: ecceTERA is freely available at http://mbb.univ-montp2.fr/MBB/download_sources/16__ecceTERA and can be run online at http://mbb.univ-montp2.fr/MBB/subsection/softExec.php?soft=eccetera. CONTACT: celine.scornavacca@umontpellier.fr. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Computational Biology/methods, Molecular Evolution, Gene Duplication, Multigene Family, Phylogeny, Algorithms, Theoretical Models

15.

Toward more accurate ancestral protein genotype-phenotype reconstructions with the use of species tree-aware gene trees.

Groussin, Mathieu; Hobbs, Joanne K; Szöllosi, Gergely J; Gribaldo, Simonetta; Arcus, Vickery L; Gouy, Manolo.

Mol Biol Evol ; 32(1): 13-22, 2015 Jan.

Article in English | MEDLINE | ID: mdl-25371435

ABSTRACT

The resurrection of ancestral proteins provides direct insight into how natural selection has shaped proteins found in nature. By tracing substitutions along a gene phylogeny, ancestral proteins can be reconstructed in silico and subsequently synthesized in vitro. This elegant strategy reveals the complex mechanisms responsible for the evolution of protein functions and structures. However, to date, all protein resurrection studies have used simplistic approaches for ancestral sequence reconstruction (ASR), including the assumption that a single sequence alignment alone is sufficient to accurately reconstruct the history of the gene family. The impact of such shortcuts on conclusions about ancestral functions has not been investigated. Here, we show with simulations that utilizing information on species history using a model that accounts for the duplication, horizontal transfer, and loss (DTL) of genes statistically increases ASR accuracy. This underscores the importance of the tree topology in the inference of putative ancestors. We validate our in silico predictions using in vitro resurrection of the LeuB enzyme for the ancestor of the Firmicutes, a major and ancient bacterial phylum. With this particular protein, our experimental results demonstrate that information on the species phylogeny results in a biochemically more realistic and kinetically more stable ancestral protein. Additional resurrection experiments with different proteins are necessary to statistically quantify the impact of using species tree-aware gene trees on ancestral protein phenotypes. Nonetheless, our results suggest the need for incorporating both sequence and DTL information in future studies of protein resurrections to accurately define the genotype-phenotype space in which proteins diversify.

Subject(s)

Computational Biology/methods, Proteins/genetics, Amino Acid Sequence, Bacterial Proteins/genetics, Computer Simulation, Molecular Evolution, Genotype, Gram-Positive Bacteria/enzymology, Gram-Positive Bacteria/genetics, Phenotype, Phylogeny

16.

Genome-scale coestimation of species and gene trees.

Boussau, Bastien; Szöllosi, Gergely J; Duret, Laurent; Gouy, Manolo; Tannier, Eric; Daubin, Vincent.

Genome Res ; 23(2): 323-30, 2013 Feb.

Article in English | MEDLINE | ID: mdl-23132911

ABSTRACT

Comparisons of gene trees and species trees are key to understanding major processes of genome evolution such as gene duplication and loss. Because current methods to reconstruct phylogenies fail to model the two-way dependency between gene trees and the species tree, they often misrepresent gene and species histories. We present a new probabilistic model to jointly infer rooted species and gene trees for dozens of genomes and thousands of gene families. We use simulations to show that this method accurately infers the species tree and gene trees, is robust to misspecification of the models of sequence and gene family evolution, and provides a precise historic record of gene duplications and losses throughout genome evolution. We simultaneously reconstruct the history of mammalian species and their genes based on 36 completely sequenced genomes, and use the reconstructed gene trees to infer the gene content and organization of ancestral mammalian genomes. We show that our method yields a more accurate picture of ancestral genomes than the trees available in the authoritative database Ensembl.

Subject(s)

Genes , Genome , Genetic Models , Phylogeny , Algorithms , Animals , Computational Biology/methods , Computer Simulation , Molecular Evolution , Gene Deletion , Gene Duplication , Humans , Statistical Models

17.

Joint amalgamation of most parsimonious reconciled gene trees.

Scornavacca, Celine; Jacox, Edwin; Szöllosi, Gergely J.

Bioinformatics ; 31(6): 841-8, 2015 Mar 15.

Article in English | MEDLINE | ID: mdl-25380957

ABSTRACT

MOTIVATION: Traditionally, gene phylogenies have been reconstructed solely on the basis of molecular sequences; this, however, often does not provide enough information to distinguish between statistically equivalent relationships. To address this problem, several recent methods have incorporated information on the species phylogeny in gene tree reconstruction, leading to dramatic improvements in accuracy. Probabilistic methods are able to estimate all model parameters but are computationally expensive, whereas parsimony methods, which are generally more computationally efficient, require a prior estimate of parameters and of the statistical support. RESULTS: Here, we present the Tree Estimation using Reconciliation (TERA) algorithm, a parsimony-based, species-tree-aware method for gene tree reconstruction based on a scoring scheme combining duplication, transfer and loss costs with an estimate of the sequence likelihood. TERA explores all reconciled gene trees that can be amalgamated from a sample of gene trees. Using a large-scale simulated dataset, we demonstrate that TERA achieves the same accuracy as the corresponding probabilistic method while being faster, and outperforms other parsimony-based methods in both accuracy and speed. Running TERA on a set of 1099 homologous gene families from complete cyanobacterial genomes, we find that incorporating knowledge of the species tree results in a two-thirds reduction in the number of apparent transfer events.

Subject(s)

Algorithms , Cyanobacteria/genetics , Molecular Evolution , Bacterial Genome , Phylogeny , Computer Simulation , Cyanobacteria/classification , Gene Duplication , Multigene Family

18.

The inference of gene trees with species trees.

Szöllosi, Gergely J; Tannier, Eric; Daubin, Vincent; Boussau, Bastien.

Syst Biol ; 64(1): e42-62, 2015 Jan.

Article in English | MEDLINE | ID: mdl-25070970

ABSTRACT

This article reviews the various models that have been used to describe the relationships between gene trees and species trees. Molecular phylogeny has focused mainly on improving models for the reconstruction of gene trees based on sequence alignments. Yet, most phylogeneticists seek to reveal the history of species. Although the histories of genes and species are tightly linked, they are seldom identical, because genes duplicate, are lost or horizontally transferred, and because alleles can coexist in populations for periods that may span several speciation events. Building models describing the relationship between gene and species trees can thus improve the reconstruction of gene trees when a species tree is known, and vice versa. Several approaches have been proposed to solve the problem in one direction or the other, but in general neither gene trees nor species trees are known. Only a few studies have attempted to jointly infer gene trees and species trees. These models account for gene duplication and loss, transfer or incomplete lineage sorting. Some of them consider several types of events together, but none exists currently that considers the full repertoire of processes that generate gene trees along the species tree. Simulations as well as empirical studies on genomic data show that combining gene tree-species tree models with models of sequence evolution improves gene tree reconstruction. In turn, these better gene trees provide a more reliable basis for studying genome evolution or reconstructing ancestral chromosomes and ancestral gene sequences. We predict that gene tree-species tree methods that can deal with genomic data sets will be instrumental to advancing our understanding of genomic evolution.

Subject(s)

Biological Models , Phylogeny , Computer Simulation , Genetic Speciation , Genome/genetics , Mutation/genetics

19.

Effective temperature of mutations.

Derényi, Imre; Szöllosi, Gergely J.

Phys Rev Lett ; 114(5): 058101, 2015 Feb 06.

Article in English | MEDLINE | ID: mdl-25699467

ABSTRACT

Biological macromolecules experience two seemingly very different types of noise acting on different time scales: (i) point mutations corresponding to changes in molecular sequence and (ii) thermal fluctuations. Examining the secondary structures of a large number of microRNA precursor sequences and model lattice proteins, we show that the effects of single point mutations are statistically indistinguishable from those of an increase in temperature by a few tens of kelvins. The existence of such an effective mutational temperature establishes a quantitative connection between robustness to genetic (mutational) and environmental (thermal) perturbations.

Subject(s)

MicroRNAs/genetics , Genetic Models , Point Mutation , MicroRNAs/chemistry , Nucleic Acid Conformation , Temperature

20.

Phylogenetic modeling of lateral gene transfer reconstructs the pattern and relative timing of speciations.

Szöllosi, Gergely J; Boussau, Bastien; Abby, Sophie S; Tannier, Eric; Daubin, Vincent.

Proc Natl Acad Sci U S A ; 109(43): 17513-8, 2012 Oct 23.

Article in English | MEDLINE | ID: mdl-23043116

ABSTRACT

The timing of the evolution of microbial life has largely remained elusive due to the scarcity of the prokaryotic fossil record and the confounding effects of the exchange of genes among possibly distant species. The history of gene transfer events, however, is not a series of individual oddities; it records which lineages were concurrent and thus provides information on the timing of species diversification. Here, we use a probabilistic model of genome evolution that accounts for differences between gene phylogenies and the species tree as a series of duplication, transfer, and loss events to reconstruct chronologically ordered species phylogenies. Using simulations we show that we can robustly recover accurate chronologically ordered species phylogenies in the presence of gene tree reconstruction errors and realistic rates of duplication, transfer, and loss. Using genomic data we demonstrate that we can infer rooted species phylogenies using homologous gene families from complete genomes of 10 bacterial and archaeal groups. Focusing on cyanobacteria, distinguished among prokaryotes by a relative abundance of fossils, we infer the maximum likelihood chronologically ordered species phylogeny based on 36 genomes with 8,332 homologous gene families. We find the order of speciation events to be in full agreement with the fossil record and the inferred phylogeny of cyanobacteria to be consistent with the phylogeny recovered from established phylogenomics methods. Our results demonstrate that lateral gene transfers, detected by probabilistic models of genome evolution, can be used as a source of information on the timing of evolution, providing a valuable complement to the limited prokaryotic fossil record.

Subject(s)

Horizontal Gene Transfer , Genetic Models , Phylogeny , Species Specificity , Likelihood Functions
