Results 1 - 17 of 17
1.
Nat Commun ; 14(1): 7456, 2023 11 17.
Article in English | MEDLINE | ID: mdl-37978174

ABSTRACT

The timing of early cellular evolution, from the divergence of Archaea and Bacteria to the origin of eukaryotes, is poorly constrained. The ATP synthase complex is thought to have originated prior to the Last Universal Common Ancestor (LUCA) and analyses of ATP synthase genes, together with ribosomes, have played a key role in inferring and rooting the tree of life. We reconstruct the evolutionary history of ATP synthases using an expanded taxon sampling set and develop a phylogenetic cross-bracing approach, constraining equivalent speciation nodes to be contemporaneous, based on the phylogenetic imprint of endosymbioses and ancient gene duplications. This approach results in a highly resolved, dated species tree and establishes an absolute timeline for ATP synthase evolution. Our analyses show that the divergence of ATP synthase into F- and A/V-type lineages was a very early event in cellular evolution dating back to more than 4 Ga, potentially predating the diversification of Archaea and Bacteria. Our cross-braced, dated tree of life also provides insight into more recent evolutionary transitions including eukaryogenesis, showing that the eukaryotic nuclear and mitochondrial lineages diverged from their closest archaeal (2.67-2.19 Ga) and bacterial (2.58-2.12 Ga) relatives at approximately the same time, with a slightly longer nuclear stem-lineage.


Subject(s)
Archaea, Bacteria, Phylogeny, Bacteria/genetics, Archaea/genetics, Mitochondria/genetics, Adenosine Triphosphate, Molecular Evolution, Eukaryota/genetics, Biological Evolution
2.
Syst Biol ; 72(4): 767-780, 2023 08 07.
Article in English | MEDLINE | ID: mdl-36946562

ABSTRACT

Accurate phylogenies are fundamental to our understanding of the pattern and process of evolution. Yet, phylogenies at deep evolutionary timescales, with correspondingly long branches, have been fraught with controversy resulting from conflicting estimates from models with varying complexity and goodness of fit. Analyses of historical as well as current empirical datasets, such as alignments including Microsporidia, Nematoda, or Platyhelminthes, have demonstrated that inadequate modeling of across-site compositional heterogeneity, which is the result of biochemical constraints that lead to varying patterns of accepted amino acids along sequences, can lead to erroneous topologies that are strongly supported. Unfortunately, models that adequately account for across-site compositional heterogeneity remain computationally challenging or intractable for an increasing fraction of contemporary datasets. Here, we introduce "compositional constraint analysis," a method to investigate the effect of site-specific constraints on amino acid composition on phylogenetic inference. We show that more constrained sites with lower diversity and less constrained sites with higher diversity exhibit ostensibly conflicting signals under models ignoring across-site compositional heterogeneity that lead to long-branch attraction artifacts and demonstrate that more complex models accounting for across-site compositional heterogeneity can ameliorate this bias. We present CAT-posterior mean site frequencies (PMSF), a pipeline for diagnosing and resolving phylogenetic bias resulting from inadequate modeling of across-site compositional heterogeneity based on the CAT model. CAT-PMSF is robust against long-branch attraction in all alignments we have examined. We suggest using CAT-PMSF when convergence of the CAT model cannot be assured. 
We find evidence that compositionally constrained sites are driving long-branch attraction in two metazoan datasets and recover evidence for Porifera as the sister group to all other animals. [Animal phylogeny; cross-site heterogeneity; long-branch attraction; phylogenomics.].
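
The idea of separating more constrained (low-diversity) sites from less constrained (high-diversity) sites can be illustrated with a minimal sketch. The abstract does not say how diversity is measured, so this example assumes the effective number of amino acids, exp(Shannon entropy), and a fixed threshold. The function names `site_diversity` and `split_sites_by_diversity` are hypothetical and are not part of CAT-PMSF.

```python
import math
from collections import Counter

def site_diversity(column):
    """Effective number of amino acids in one alignment column.

    Assumption: diversity = exp(Shannon entropy); gaps and unknowns are ignored.
    """
    residues = [c for c in column if c not in "-X?"]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    entropy = -sum((n / total) * math.log(n / total) for n in counts.values())
    return math.exp(entropy)

def split_sites_by_diversity(alignment, threshold):
    """Return (constrained, unconstrained) column indices for an alignment.

    `alignment` is a list of equal-length sequences; `threshold` is an
    illustrative cut-off, not a value taken from the paper.
    """
    constrained, unconstrained = [], []
    for i in range(len(alignment[0])):
        column = [seq[i] for seq in alignment]
        if site_diversity(column) <= threshold:
            constrained.append(i)
        else:
            unconstrained.append(i)
    return constrained, unconstrained

# Toy alignment: columns 0 and 1 are conserved or nearly so, 2 and 3 vary more.
toy_alignment = ["MKLA", "MRLG", "MKIS", "MRVT"]
low_diversity, high_diversity = split_sites_by_diversity(toy_alignment, threshold=2.5)
```

Each partition could then be analyzed separately to see whether the two site classes favor different topologies, which is the kind of conflict the abstract describes.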


Subject(s)
Microsporidia, Animals, Phylogeny, Bias, Genetic Models
3.
Syst Biol ; 72(3): 723-737, 2023 Jun 17.
Article in English | MEDLINE | ID: mdl-35713492

ABSTRACT

Common molecular phylogenetic characteristics such as long branches and compositional heterogeneity can be problematic for phylogenetic reconstruction when using amino acid data. Recoding alignments to reduced alphabets before phylogenetic analysis has often been used both to explore and potentially decrease the effect of such problems. We tested the effectiveness of this strategy on topological accuracy using simulated data on four-taxon trees. We simulated alignments in phylogenetically challenging ways to test the phylogenetic accuracy of analyses using various recoding strategies together with commonly used homogeneous models. We tested three recoding methods based on amino acid exchangeability, and another recoding method based on lowering the compositional heterogeneity among alignment sequences as measured by the Chi-squared statistic. Our simulation results show that on trees with long branches where sequences approach saturation, accuracy was not greatly affected by exchangeability-based recodings, but Chi-squared-based recoding decreased accuracy. We then simulated sequences with different kinds of compositional heterogeneity over the tree. Recoding often increased accuracy on such alignments. Exchangeability-based recoding was rarely worse than not recoding, and often considerably better. Recoding based on lowering the Chi-squared value improved accuracy in some cases but not in others, suggesting that low compositional heterogeneity by itself is not sufficient to increase accuracy in the analysis of these alignments. We also simulated alignments using site-specific amino acid profiles, making sequences that had compositional heterogeneity over alignment sites. Exchangeability-based recoding coupled with site-homogeneous models had poor accuracy for these data sets but Chi-squared-based recoding on these alignments increased accuracy. We then simulated data sets that were compositionally both site- and tree-heterogeneous, like many real data sets. 
The effect on the accuracy of recoding such doubly problematic data sets varied widely, depending on the type of compositional tree heterogeneity and on the recoding scheme. Interestingly, analysis of unrecoded compositionally heterogeneous alignments with the NDCH or CAT models was generally more accurate than homogeneous analysis, whether recoded or not. Overall, our results suggest that making trees for recoded amino acid data sets can be useful, but they need to be interpreted cautiously as part of a more comprehensive analysis. The use of better-fitting models like NDCH and CAT, which directly account for the patterns in the data, may offer a more promising long-term solution for analyzing empirical data. [Compositional heterogeneity; models of evolution; phylogenetic methods; recoding amino acid data sets.].
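
As an illustration of what recoding to a reduced alphabet involves, here is a minimal sketch. It assumes the widely used Dayhoff 6-state grouping; the abstract does not name the three exchangeability-based schemes it tested, so this choice is an assumption. The Chi-squared function is a plain test of compositional homogeneity across sequences and may differ in detail from the statistic used in the study. Both function names are hypothetical.

```python
# Dayhoff 6-state grouping (assumed here; the abstract does not name its schemes).
DAYHOFF6_GROUPS = ["AGPST", "DENQ", "HKR", "ILMV", "FWY", "C"]
RECODE = {aa: str(i) for i, group in enumerate(DAYHOFF6_GROUPS) for aa in group}

def recode(sequence):
    """Map each amino acid to its group digit; anything else becomes a gap."""
    return "".join(RECODE.get(aa, "-") for aa in sequence.upper())

def composition_chi2(sequences, alphabet):
    """Chi-squared statistic for compositional homogeneity among sequences.

    Observed counts form a sequences-by-states table; expected counts assume
    every sequence shares the pooled composition.
    """
    counts = [[seq.count(state) for state in alphabet] for seq in sequences]
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    grand_total = sum(row_totals)
    if grand_total == 0:
        return 0.0
    chi2 = 0.0
    for row, row_total in zip(counts, row_totals):
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / grand_total
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected
    return chi2

recoded = [recode(s) for s in ["MKTAYC", "LRSGFC"]]
heterogeneity = composition_chi2(recoded, "012345")
```

Comparing `composition_chi2` before and after recoding shows how much a given grouping lowers compositional heterogeneity, which by the abstract's own findings is not sufficient on its own to guarantee a more accurate tree.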


Subject(s)
Amino Acids, Phylogeny, Computer Simulation
4.
Methods Mol Biol ; 2569: 75-94, 2022.
Article in English | MEDLINE | ID: mdl-36083444

ABSTRACT

Many organisms are able to incorporate exogenous DNA into their genomes. This process, called lateral gene transfer (LGT), has the potential to benefit the recipient organism by providing useful coding sequences, such as antibiotic resistance genes or enzymes which expand the organism's metabolic niche. For evolutionary biologists, LGTs have often been considered a nuisance because they complicate the reconstruction of the underlying species tree that many analyses aim to recover. However, LGT events between distinct organisms harbor information on the relative divergence time of the donor and recipient lineages. As a result transfers provide a novel and as yet mostly unexplored source of information to determine the order of divergence of clades, with the potential for absolute dating if linked to the fossil record.


Subject(s)
Biological Evolution, Horizontal Gene Transfer, Molecular Evolution, Genome, Phylogeny
5.
Nat Ecol Evol ; 6(11): 1634-1643, 2022 11.
Article in English | MEDLINE | ID: mdl-36175544

ABSTRACT

The origin of plants and their colonization of land fundamentally transformed the terrestrial environment. Here we elucidate the basis of this formative episode in Earth history through patterns of lineage, gene and genome evolution. We use new fossil calibrations, a relative clade age calibration (informed by horizontal gene transfer) and new phylogenomic methods for mapping gene family origins. Distinct rooting strategies resolve tracheophytes (vascular plants) and bryophytes (non-vascular plants) as monophyletic sister groups that diverged during the Cambrian, 515-494 million years ago. The embryophyte stem is characterized by a burst of gene innovation, while bryophytes subsequently experienced an equally dramatic episode of reductive genome evolution in which they lost genes associated with the elaboration of vasculature and the stomatal complex. Overall, our analyses reveal that extant tracheophytes and bryophytes are both highly derived from a more complex ancestral land plant. Understanding the origin of land plants requires tracing character evolution across a diversity of modern lineages.


Subject(s)
Embryophyta, Tracheophyta, Biological Evolution, Embryophyta/genetics, Phylogeny, Plants/genetics, Fossils
6.
PLoS Comput Biol ; 18(4): e1010048, 2022 04.
Article in English | MEDLINE | ID: mdl-35468135

ABSTRACT

Tumors often harbor orders of magnitude more mutations than healthy tissues. The increased number of mutations may be due to an elevated mutation rate or frequent cell death and correspondingly rapid cell turnover, or a combination of the two. It is difficult to disentangle these two mechanisms based on widely available bulk sequencing data, where sequences from individual cells are intermixed and, thus, the cell lineage tree of the tumor cannot be resolved. Here we present a method that can simultaneously estimate the cell turnover rate and the rate of mutations from bulk sequencing data. Our method works by simulating tumor growth and finding the parameters with which the observed data can be reproduced with maximum likelihood. Applying this method to a real tumor sample, we find that both the mutation rate and the frequency of death may be high.
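
The simulation half of such an approach can be sketched as a simple birth-death process with mutation. This is an illustrative toy and not the authors' method: the model structure, the parameter names (`death_prob`, `mut_rate`) and the function names are all assumptions, and the likelihood-fitting step is omitted.

```python
import math
import random
from collections import Counter

def poisson(rng, lam):
    """Draw a Poisson(lam) variate with Knuth's multiplication method."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def simulate_tumor(final_size, death_prob, mut_rate, seed=0):
    """Grow a clone until it reaches `final_size` cells or goes extinct.

    At each event a random cell either dies (probability `death_prob`) or
    divides into two daughters, each gaining Poisson(`mut_rate`) new mutations.
    Each cell is represented by the frozenset of mutation ids it carries.
    """
    rng = random.Random(seed)
    cells = [frozenset()]
    next_id = 0
    while 0 < len(cells) < final_size:
        i = rng.randrange(len(cells))
        cell = cells[i]
        cells[i] = cells[-1]  # swap-remove the chosen cell
        cells.pop()
        if rng.random() < death_prob:
            continue  # the chosen cell dies without dividing
        for _ in range(2):
            n_new = poisson(rng, mut_rate)
            cells.append(cell | frozenset(range(next_id, next_id + n_new)))
            next_id += n_new
    return cells

def mutation_frequencies(cells):
    """Fraction of cells carrying each mutation, as bulk sequencing would see it."""
    if not cells:
        return {}
    counts = Counter(m for cell in cells for m in cell)
    return {m: n / len(cells) for m, n in counts.items()}

tumor = simulate_tumor(final_size=200, death_prob=0.2, mut_rate=1.0, seed=1)
frequencies = mutation_frequencies(tumor)
```

A higher `death_prob` means more divisions are needed to reach the same size, so more mutations accumulate per surviving lineage. That is why turnover and mutation rate are hard to separate from mutation counts alone, and why a fitting procedure would compare whole simulated frequency spectra with the observed one.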


Subject(s)
High-Throughput Nucleotide Sequencing, Neoplasms, Cell Death/genetics, Gene Frequency/genetics, Humans, Mutation/genetics, Neoplasms/genetics, Neoplasms/pathology
7.
Syst Biol ; 71(4): 797-809, 2022 06 16.
Article in English | MEDLINE | ID: mdl-34668564

ABSTRACT

Dating the tree of life is central to understanding the evolution of life on Earth. Molecular clocks calibrated with fossils represent the state of the art for inferring the ages of major groups. Yet, other information on the timing of species diversification can be used to date the tree of life. For example, horizontal gene transfer events and ancient coevolutionary interactions such as (endo)symbioses occur between contemporaneous species and thus can imply temporal relationships between two nodes in a phylogeny. Temporal constraints from these alternative sources can be particularly helpful when the geological record is sparse, for example, for microorganisms, which represent the majority of extant and extinct biodiversity. Here, we present a new method to combine fossil calibrations and relative age constraints to estimate chronograms. We provide an implementation of relative age constraints in RevBayes that can be combined in a modular manner with the wide range of molecular dating methods available in the software. We use both realistic simulations and empirical datasets of 40 Cyanobacteria and 62 Archaea to evaluate our method. We show that the combination of relative age constraints with fossil calibrations significantly improves the estimation of node ages. [Archaea, Bayesian analysis, cyanobacteria, dating, endosymbiosis, lateral gene transfer, MCMC, molecular clock, phylogenetic dating, relaxed molecular clock, revbayes, tree of life.].
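
A relative age constraint states that one node must be older than another, for example because a gene transfer requires the donor lineage to exist before the recipient diversifies. The sketch below shows one simple way such a constraint could enter a Bayesian dating analysis, as a prior term that rejects any chronogram violating it. It is not the RevBayes implementation; the function name, node labels and ages are hypothetical.

```python
import math

def relative_constraint_log_prior(node_ages, constraints):
    """Log-prior contribution of relative age constraints.

    `node_ages` maps node names to ages (older = larger number).
    `constraints` is a list of (older_node, younger_node) pairs.
    Returns 0.0 if every constraint holds and -inf otherwise, so an MCMC
    proposal that violates a constraint would always be rejected.
    """
    for older, younger in constraints:
        if node_ages[older] <= node_ages[younger]:
            return -math.inf
    return 0.0

# Hypothetical node ages in billions of years.
ages = {"donor_stem": 2.5, "recipient_crown": 1.8, "outgroup_crown": 2.0}
ok = relative_constraint_log_prior(ages, [("donor_stem", "recipient_crown")])
violated = relative_constraint_log_prior(ages, [("recipient_crown", "outgroup_crown")])
```

Because the term is either zero or negative infinity, it could be added to a fossil-calibrated prior without changing the relative weight of chronograms that satisfy all constraints.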


Subject(s)
Fossils, Horizontal Gene Transfer, Bayes Theorem, Molecular Evolution, Phylogeny, Symbiosis
8.
Genome Biol Evol ; 13(5), 2021 05 07.
Article in English | MEDLINE | ID: mdl-33772552

ABSTRACT

There is an expectation that analyses of molecular sequences might be able to distinguish between alternative hypotheses for ancient relationships, but the phylogenetic methods used and types of data analyzed are of critical importance in any attempt to recover historical signal. Here, we discuss some common issues that can influence the topology of trees obtained when using overly simple models to analyze molecular data that often display complicated patterns of sequence heterogeneity. To illustrate our discussion, we have used three examples of inferred relationships which have changed radically as models and methods of analysis have improved. In two of these examples, the sister-group relationship between thermophilic Thermus and mesophilic Deinococcus, and the position of long-branch Microsporidia among eukaryotes, we show that recovering what is now generally considered to be the correct tree is critically dependent on the fit between model and data. In the third example, the position of eukaryotes in the tree of life, the hypothesis that is currently supported by the best available methods is fundamentally different from the classical view of relationships between major cellular domains. Since heterogeneity appears to be pervasive and varied among all molecular sequence data, and even the best available models can still struggle to deal with some problems, the issues we discuss are generally relevant to phylogenetic analyses. It remains essential to maintain a critical attitude to all trees as hypotheses of relationship that may change with more data and better methods.


Subject(s)
Biological Evolution, Genetic Models, Phylogeny, Deinococcus/classification, Microsporidia/classification, Thermus/classification
9.
Mol Biol Evol ; 37(12): 3616-3631, 2020 12 16.
Article in English | MEDLINE | ID: mdl-32877529

ABSTRACT

Biochemical demands constrain the range of amino acids acceptable at specific sites resulting in across-site compositional heterogeneity of the amino acid replacement process. Phylogenetic models that disregard this heterogeneity are prone to systematic errors, which can lead to severe long-branch attraction artifacts. State-of-the-art models accounting for across-site compositional heterogeneity include the CAT model, which is computationally expensive, and empirical distribution mixture models estimated via maximum likelihood (C10-C60 models). Here, we present a new, scalable method EDCluster for finding empirical distribution mixture models involving a simple cluster analysis. The cluster analysis utilizes specific coordinate transformations which allow the detection of specialized amino acid distributions either from curated databases or from the alignment at hand. We apply EDCluster to the HOGENOM and HSSP databases in order to provide universal distribution mixture (UDM) models comprising up to 4,096 components. Detailed analyses of the UDM models demonstrate the removal of various long-branch attraction artifacts and improved performance compared with the C10-C60 models. Ready-to-use implementations of the UDM models are provided for three established software packages (IQ-TREE, Phylobayes, and RevBayes).


Subject(s)
Amino Acid Substitution, Genetic Techniques, Genetic Models, Phylogeny, Software, Cluster Analysis
11.
Mol Biol Evol ; 37(5): 1530-1534, 2020 05 01.
Article in English | MEDLINE | ID: mdl-32011700

ABSTRACT

IQ-TREE (http://www.iqtree.org, last accessed February 6, 2020) is a user-friendly and widely used software package for phylogenetic inference using maximum likelihood. Since the release of version 1 in 2014, we have continuously expanded IQ-TREE to integrate a plethora of new models of sequence evolution and efficient computational approaches of phylogenetic inference to deal with genomic data. Here, we describe notable features of IQ-TREE version 2 and highlight the key advantages over other software.


Subject(s)
Molecular Evolution, Genomics, Genetic Models, Phylogeny, Software
12.
Mol Biol Evol ; 36(6): 1294-1301, 2019 06 01.
Article in English | MEDLINE | ID: mdl-30825307

ABSTRACT

Molecular phylogenetics has neglected polymorphisms within present and ancestral populations for a long time. Recently, multispecies coalescent-based methods have increased in popularity; however, their application is limited to a small number of species and individuals. We introduced a polymorphism-aware phylogenetic model (PoMo), which overcomes this limitation and scales well with the increasing amount of sequence data while accounting for present and ancestral polymorphisms. PoMo circumvents handling of gene trees and directly infers species trees from allele frequency data. Here, we extend the PoMo implementation in IQ-TREE and integrate search for the statistically best-fit mutation model, the ability to infer mutation rate variation across sites, and assessment of branch support values. We exemplify an analysis of a hundred species with ten haploid individuals each, showing that PoMo can perform inference on large data sets. While PoMo is more accurate than standard substitution models applied to concatenated alignments, it is almost as fast. We also provide bmm-simulate, a software package that allows simulation of sequences evolving under PoMo. The new options consolidate the value of PoMo for phylogenetic analyses with population data.


Subject(s)
Genetic Models, Mutation Rate, Phylogeny, Genetic Polymorphism, Animals, Humans, Likelihood Functions, Software
13.
Sci Adv ; 5(1): eaau6947, 2019 01.
Article in English | MEDLINE | ID: mdl-30854422

ABSTRACT

Recent studies suggest that closely related species can accumulate substantial genetic and phenotypic differences despite ongoing gene flow, thus challenging traditional ideas regarding the genetics of speciation. Baboons (genus Papio) are Old World monkeys consisting of six readily distinguishable species. Baboon species hybridize in the wild, and prior data imply a complex history of differentiation and introgression. We produced a reference genome assembly for the olive baboon (Papio anubis) and whole-genome sequence data for all six extant species. We document multiple episodes of admixture and introgression during the radiation of Papio baboons, thus demonstrating their value as a model of complex evolutionary divergence, hybridization, and reticulation. These results help inform our understanding of similar cases, including modern humans, Neanderthals, Denisovans, and other ancient hominins.


Subject(s)
Biological Evolution, Genomics/methods, Papio/genetics, Animals, Base Sequence, Female, Gene Flow, Haplotypes/genetics, Humans, Genetic Hybridization, Male, Phylogeny, Genetic Polymorphism, Whole Genome Sequencing
14.
J Theor Biol ; 439: 166-180, 2018 02 14.
Article in English | MEDLINE | ID: mdl-29229523

ABSTRACT

A central aim of population genetics is the inference of the evolutionary history of a population. To this end, the underlying process can be represented by a model of the evolution of allele frequencies parametrized by e.g., the population size, mutation rates and selection coefficients. A large class of models use forward-in-time models, such as the discrete Wright-Fisher and Moran models and the continuous forward diffusion, to obtain distributions of population allele frequencies, conditional on an ancestral initial allele frequency distribution. Backward-in-time diffusion processes have been rarely used in the context of parameter inference. Here, we demonstrate how forward and backward diffusion processes can be combined to efficiently calculate the exact joint probability distribution of sample and population allele frequencies at all times in the past, for both discrete and continuous population genetics models. This procedure is analogous to the forward-backward algorithm of hidden Markov models. While the efficiency of discrete models is limited by the population size, for continuous models it suffices to expand the transition density in orthogonal polynomials of the order of the sample size to infer marginal likelihoods of population genetic parameters. Additionally, conditional allele trajectories and marginal likelihoods of samples from single populations or from multiple populations that split in the past can be obtained. The described approaches allow for efficient maximum likelihood inference of population genetic parameters in a wide variety of demographic scenarios.
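
The combination of forward and backward passes can be sketched for the discrete case. The example assumes a biallelic Moran model with mutation probabilities `u` (A to a) and `v` (a to A) and a binomially sampled present-day sample; these choices and all function names are assumptions made for illustration. The diffusion and orthogonal-polynomial machinery of the paper is not reproduced.

```python
from math import comb

def moran_matrix(N, u, v):
    """One-step transition matrix for the count of allele A among N individuals."""
    P = [[0.0] * (N + 1) for _ in range(N + 1)]
    for i in range(N + 1):
        x = i / N
        up = (1 - x) * (x * (1 - u) + (1 - x) * v)    # an 'a' dies, offspring is 'A'
        down = x * ((1 - x) * (1 - v) + x * u)        # an 'A' dies, offspring is 'a'
        if i < N:
            P[i][i + 1] = up
        if i > 0:
            P[i][i - 1] = down
        P[i][i] = 1.0 - up - down
    return P

def vec_mat(vec, mat):
    return [sum(vec[i] * mat[i][j] for i in range(len(vec))) for j in range(len(mat[0]))]

def mat_vec(mat, vec):
    return [sum(row[j] * vec[j] for j in range(len(vec))) for row in mat]

def forward_backward(N, u, v, f0, n, k, T):
    """Joint probability of the sample and the population state at steps 0..T.

    f0: ancestral distribution over allele counts; (n, k): sample size and
    number of A alleles observed at the present (step T).
    """
    P = moran_matrix(N, u, v)
    forward = [f0]
    for _ in range(T):
        forward.append(vec_mat(forward[-1], P))
    backward = [[comb(n, k) * (i / N) ** k * (1 - i / N) ** (n - k) for i in range(N + 1)]]
    for _ in range(T):
        backward.append(mat_vec(P, backward[-1]))
    backward.reverse()  # now ordered from step 0 to step T
    return [[f * b for f, b in zip(ft, bt)] for ft, bt in zip(forward, backward)]

start = [0.0] * 11
start[5] = 1.0  # ancestral population with 5 of 10 copies of allele A
joint = forward_backward(N=10, u=0.01, v=0.01, f0=start, n=4, k=2, T=20)
marginal_likelihoods = [sum(row) for row in joint]
```

Summing the joint distribution over population states gives the marginal likelihood of the sample, which is the same at every step. Normalizing a row by that sum gives the conditional distribution of the population allele count at that time, analogous to the smoothed state probabilities of a hidden Markov model.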


Subject(s)
Population Genetics/methods, Genetic Models, Algorithms, Biological Evolution, Gene Frequency, Likelihood Functions, Markov Chains, Methods, Population Density, Time
15.
Theor Popul Biol ; 114: 88-94, 2017 04.
Article in English | MEDLINE | ID: mdl-28041892

ABSTRACT

Recently, Burden and Tang (2016) provided an analytical expression for the stationary distribution of the multivariate neutral Wright-Fisher model with low mutation rates. In this paper we present a simple, alternative derivation that illustrates the approximation. Our proof is based on the discrete multivariate boundary mutation model which has three key ingredients. First, the decoupled Moran model is used to describe genetic drift. Second, low mutation rates are assumed by limiting mutations to monomorphic states. Third, the mutation rate matrix is separated into a time-reversible part and a flux part, as suggested by Burden and Tang (2016). An application of our result to data from several great apes reveals that the assumption of stationarity may be inadequate or that other evolutionary forces like selection or biased gene conversion are acting. Furthermore we find that the model with a reversible mutation rate matrix provides a reasonably good fit to the data compared to the one with a non-reversible mutation rate matrix.


Subject(s)
Biological Evolution, Gene Frequency, Genetic Drift, Mutation Rate, Population Genetics, Genetic Models, Mutation, Genetic Selection
16.
J Theor Biol ; 407: 362-370, 2016 10 21.
Article in English | MEDLINE | ID: mdl-27480613

ABSTRACT

We present a reversible Polymorphism-Aware Phylogenetic Model (revPoMo) for species tree estimation from genome-wide data. revPoMo enables the reconstruction of large-scale species trees for many within-species samples. It expands the alphabet of DNA substitution models to include polymorphic states, thereby naturally accounting for incomplete lineage sorting. We implemented revPoMo in the maximum likelihood software IQ-TREE. A simulation study and an application to great apes data show that the runtimes of our approach and standard substitution models are comparable but that revPoMo has much better accuracy in estimating trees, divergence times and mutation rates. The advantage of revPoMo is that an increase of sample size per species improves estimations but does not increase runtime. Therefore, revPoMo is a valuable tool with several applications, from speciation dating to species tree reconstruction.
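
To make the expanded alphabet concrete, the sketch below enumerates one possible state space: four monomorphic states plus biallelic polymorphic states at every intermediate frequency. It assumes a fixed "virtual population size" `n_virtual` and at most two alleles per state. Neither detail is stated in the abstract, and the function name is hypothetical.

```python
from itertools import combinations

def pomo_states(n_virtual=10, alleles="ACGT"):
    """Enumerate monomorphic and biallelic polymorphic states.

    Each state is a tuple of (allele, count) pairs whose counts sum to
    `n_virtual`, the assumed virtual population size.
    """
    states = [((allele, n_virtual),) for allele in alleles]  # fixed states
    for first, second in combinations(alleles, 2):
        for count in range(1, n_virtual):
            states.append(((first, count), (second, n_virtual - count)))
    return states

states = pomo_states(n_virtual=10)
n_states = len(states)  # 4 monomorphic + 6 allele pairs * 9 frequencies
```

The size of the state space depends only on `n_virtual` and not on how many individuals are sampled per species, which is consistent with the abstract's remark that larger samples improve estimates without increasing runtime.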


Subject(s)
Genetic Models, Phylogeny, Genetic Polymorphism, Animals, Computer Simulation, Diffusion, Hominidae/genetics, Species Specificity
17.
Syst Biol ; 64(6): 1018-31, 2015 Nov.
Article in English | MEDLINE | ID: mdl-26209413

ABSTRACT

Incomplete lineage sorting can cause incongruencies of the overall species-level phylogenetic tree with the phylogenetic trees for individual genes or genomic segments. If these incongruencies are not accounted for, it is possible to incur several biases in species tree estimation. Here, we present a simple maximum likelihood approach that accounts for ancestral variation and incomplete lineage sorting. We use a POlymorphisms-aware phylogenetic MOdel (PoMo) that we have recently shown to efficiently estimate mutation rates and fixation biases from within and between-species variation data. We extend this model to perform efficient estimation of species trees. We test the performance of PoMo in several different scenarios of incomplete lineage sorting using simulations and compare it with existing methods both in accuracy and computational speed. In contrast to other approaches, our model does not use coalescent theory but is allele frequency based. We show that PoMo is well suited for genome-wide species tree estimation and that on such data it is more accurate than previous approaches.


Subject(s)
Classification/methods, Computer Simulation, Gene Frequency, Phylogeny, Animals, Hominidae/classification, Hominidae/genetics, Mutation, Genetic Polymorphism
...