1.
Philos Trans R Soc Lond B Biol Sci ; 374(1777): 20180234, 2019 07 22.
Article in English | MEDLINE | ID: mdl-31154974

ABSTRACT

In evolutionary genomics, researchers have taken an interest in identifying substitutions that subtend convergent phenotypic adaptations. This is a difficult question that requires distinguishing foreground convergent substitutions that are involved in the convergent phenotype from background convergent substitutions. Those may be linked to other adaptations, may be neutral or may be the consequence of mutational biases. Furthermore, there is no generally accepted definition of convergent substitutions. Various methods that use different definitions have been proposed in the literature, resulting in different sets of candidate foreground convergent substitutions. In this article, we first describe the processes that can generate foreground convergent substitutions in coding sequences, separating adaptive from non-adaptive processes. Second, we review methods that have been proposed to detect foreground convergent substitutions in coding sequences and expose the assumptions that underlie them. Finally, we examine their power on simulations of convergent changes (including in the presence of a change in the efficacy of selection) and on empirical alignments. This article is part of the theme issue 'Convergent evolution in the genomics era: new insights and directions'.


Subjects
Amino Acids/genetics , Molecular Evolution , Proteins/genetics , Amino Acids/metabolism , Animals , Genomics , Humans , Genetic Models , Phylogeny , Proteins/metabolism
2.
Bioinformatics ; 35(13): 2199-2207, 2019 07 01.
Article in English | MEDLINE | ID: mdl-30452539

ABSTRACT

MOTIVATION: RNA sequencing (RNA-Seq) is a widely used approach to obtain transcript sequences in non-model organisms, notably for performing comparative analyses. However, current bioinformatic pipelines do not take full advantage of pre-existing reference data in related species for improving RNA-Seq assembly, annotation and gene family reconstruction. RESULTS: We built an automated pipeline named CAARS to combine novel data from RNA-Seq experiments with existing multi-species gene family alignments. RNA-Seq reads are assembled into transcripts by both de novo and assisted assemblies. Then, CAARS incorporates transcripts into gene families, builds gene alignments and trees and uses phylogenetic information to classify the genes as orthologs and paralogs of existing genes. We used CAARS to assemble and annotate RNA-Seq data in rodents and fishes using distantly related genomes as reference, a difficult case for this kind of analysis. We showed CAARS assemblies are more complete and accurate than those assembled by a standard pipeline consisting of de novo assembly coupled with annotation by sequence similarity on a guide species. In addition to annotated transcripts, CAARS provides gene family alignments and trees, annotated with orthology relationships, directly usable for downstream comparative analyses. AVAILABILITY AND IMPLEMENTATION: CAARS is implemented in Python and Ocaml and is freely available at https://github.com/carinerey/caars. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

3.
Mol Biol Evol ; 35(9): 2296-2306, 2018 09 01.
Article in English | MEDLINE | ID: mdl-29986048

ABSTRACT

In the history of life, some phenotypes have been acquired several times independently, through convergent evolution. Recently, lots of genome-scale studies have been devoted to identify nucleotides or amino acids that changed in a convergent manner when the convergent phenotypes evolved. These efforts have had mixed results, probably because of differences in the detection methods, and because of conceptual differences about the definition of a convergent substitution. Some methods contend that substitutions are convergent only if they occur on all branches where the phenotype changed toward the exact same state at a given nucleotide or amino acid position. Others are much looser in their requirements and define a convergent substitution as one that leads the site at which they occur to prefer a phylogeny in which species with the convergent phenotype group together. Here, we suggest to look for convergent shifts in amino acid preferences instead of convergent substitutions to the exact same amino acid. We define as convergent shifts substitutions that occur on all branches where the phenotype changed and such that they correspond to a change in the type of amino acid preferred at this position. We implement the corresponding model into a method named PCOC. We show on simulations that PCOC better recovers convergent shifts than existing methods in terms of sensitivity and specificity. We test it on a plant protein alignment where convergent evolution has been studied in detail and find that our method recovers several previously identified convergent substitutions and proposes credible new candidates.


Subjects
Amino Acid Substitution , Molecular Evolution , Genetic Techniques , Genetic Models , Animals , Cyperaceae/genetics , Mammals/genetics
4.
Bioinformatics ; 34(21): 3646-3652, 2018 11 01.
Article in English | MEDLINE | ID: mdl-29762653

ABSTRACT

Motivation: A reconciliation is an annotation of the nodes of a gene tree with evolutionary events (for example, speciation, gene duplication, transfer, loss, etc.) along with a mapping onto a species tree. Many algorithms and software packages produce or use reconciliations, but often in different reconciliation formats that differ in the types of events considered or in whether the species tree is dated. This complicates the comparison and communication between different programs. Results: Here, we gather a consortium of software developers in gene tree species tree reconciliation to propose and endorse a format that aims to promote an integrative, albeit flexible, specification of phylogenetic reconciliations. This format, named recPhyloXML, is accompanied by several tools such as a reconciled tree visualizer and conversion utilities. Availability and implementation: http://phylariane.univ-lyon1.fr/recphyloxml/.


Subjects
Molecular Evolution , Gene Duplication , Algorithms , Phylogeny , Software
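
The definition of a reconciliation given in the abstract of record 4 (gene tree nodes annotated with events and mapped onto a species tree) can be pictured with a small data structure. The following is a minimal Python sketch under stated assumptions: the names `Event`, `GeneNode` and `count_events` are hypothetical illustrations, and this is not the recPhyloXML schema or any tool from that project.

```python
from collections import Counter
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Event(Enum):
    """Event types named in the abstract, plus a marker for sampled genes."""
    SPECIATION = "speciation"
    DUPLICATION = "duplication"
    TRANSFER = "transfer"
    LOSS = "loss"
    LEAF = "leaf"


@dataclass
class GeneNode:
    """One gene tree node: its event and the species tree node it maps to."""
    event: Event
    species: str  # label of a species tree node (hypothetical labels below)
    children: List["GeneNode"] = field(default_factory=list)


def count_events(node: GeneNode) -> Counter:
    """Tally the events over the subtree rooted at `node`."""
    counts = Counter({node.event: 1})
    for child in node.children:
        counts += count_events(child)
    return counts


# Toy reconciliation: a duplication in ancestor "AB", then a speciation in
# each copy; one copy is lost in species "B".
toy = GeneNode(Event.DUPLICATION, "AB", [
    GeneNode(Event.SPECIATION, "AB", [
        GeneNode(Event.LEAF, "A"),
        GeneNode(Event.LEAF, "B"),
    ]),
    GeneNode(Event.SPECIATION, "AB", [
        GeneNode(Event.LEAF, "A"),
        GeneNode(Event.LOSS, "B"),
    ]),
])
```

Calling `count_events(toy)` tallies one duplication, two speciations, three leaves and one loss for this toy tree. A shared format matters precisely because different programs would otherwise encode the same annotations in incompatible ways.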
5.
Nat Ecol Evol ; 2(5): 904-909, 2018 05.
Article in English | MEDLINE | ID: mdl-29610471

ABSTRACT

Biodiversity has always been predominantly microbial, and the scarcity of fossils from bacteria, archaea and microbial eukaryotes has prevented a comprehensive dating of the tree of life. Here, we show that patterns of lateral gene transfer deduced from an analysis of modern genomes encode a novel and abundant source of information about the temporal coexistence of lineages throughout the history of life. We use state-of-the-art species tree-aware phylogenetic methods to reconstruct the history of thousands of gene families and demonstrate that dates implied by gene transfers are consistent with estimates from relaxed molecular clocks in Bacteria, Archaea and Eukarya. We present the order of speciations according to lateral gene transfer data calibrated to geological time for three datasets comprising 40 genomes for Cyanobacteria, 60 genomes for Archaea and 60 genomes for Fungi. An inspection of discrepancies between transfers and clocks and a comparison with mammalian fossils show that gene transfer in microbes is potentially as informative for dating the tree of life as the geological record in macroorganisms.


Subjects
Molecular Evolution , Horizontal Gene Transfer , Archaeal Genome , Bacterial Genome , Fungal Genome , Phylogeny , Cyanobacteria/genetics
6.
Proc Natl Acad Sci U S A ; 114(23): E4602-E4611, 2017 06 06.
Article in English | MEDLINE | ID: mdl-28533395

ABSTRACT

A root for the archaeal tree is essential for reconstructing the metabolism and ecology of early cells and for testing hypotheses that propose that the eukaryotic nuclear lineage originated from within the Archaea; however, published studies based on outgroup rooting disagree regarding the position of the archaeal root. Here we constructed a consensus unrooted archaeal topology using protein concatenation and a multigene supertree method based on 3,242 single gene trees, and then rooted this tree using a recently developed model of genome evolution. This model uses evidence from gene duplications, horizontal transfers, and gene losses contained in 31,236 archaeal gene families to identify the most likely root for the tree. Our analyses support the monophyly of DPANN (Diapherotrites, Parvarchaeota, Aenigmarchaeota, Nanoarchaeota, Nanohaloarchaea), a recently discovered cosmopolitan and genetically diverse lineage, and, in contrast to previous work, place the tree root between DPANN and all other Archaea. The sister group to DPANN comprises the Euryarchaeota and the TACK Archaea, including Lokiarchaeum, which our analyses suggest are monophyletic sister lineages. Metabolic reconstructions on the rooted tree suggest that early Archaea were anaerobes that may have had the ability to reduce CO2 to acetate via the Wood-Ljungdahl pathway. In contrast to proposals suggesting that genome reduction has been the predominant mode of archaeal evolution, our analyses infer a relatively small-genomed archaeal ancestor that subsequently increased in complexity via gene duplication and horizontal gene transfer.


Subjects
Archaea/genetics , Molecular Evolution , Archaeal Genome , Genetic Models , Algorithms , Archaea/classification , Archaea/metabolism , Eukaryota/classification , Eukaryota/genetics , Gene Duplication , Horizontal Gene Transfer , Metabolic Networks and Pathways/genetics , Multigene Family , Phylogeny , Temperature
7.
PLoS One ; 11(8): e0159559, 2016.
Article in English | MEDLINE | ID: mdl-27513924

ABSTRACT

MOTIVATIONS: Gene trees inferred solely from multiple alignments of homologous sequences often contain weakly supported and uncertain branches. Information for their full resolution may lie in the dependency between gene families and their genomic context. Integrative methods, using species tree information in addition to sequence information, often rely on a computationally intensive tree space search which forecloses an application to large genomic databases. RESULTS: We propose a new method, called ProfileNJ, that takes a gene tree with statistical supports on its branches, and corrects its weakly supported parts by using a combination of information from a species tree and a distance matrix. Its low running time enabled us to use it on the whole Ensembl Compara database, for which we propose an alternative, arguably more plausible set of gene trees. This allowed us to perform a genome-wide analysis of duplication and loss patterns on the history of 63 eukaryote species, and predict ancestral gene content and order for all ancestors along the phylogeny. AVAILABILITY: A web interface called RefineTree, including ProfileNJ as well as other gene tree correction methods, which we also test on the Ensembl gene families, is available at: http://www-ens.iro.umontreal.ca/~adbit/polytomysolver.html. The code of ProfileNJ as well as the set of gene trees corrected by ProfileNJ from Ensembl Compara version 73 families are also made available.


Subjects
Algorithms , Computational Biology/methods , Molecular Evolution , Genes/genetics , Genome/genetics , Phylogeny , Animals , Humans , DNA Sequence Analysis
8.
Syst Biol ; 65(4): 726-36, 2016 07.
Article in English | MEDLINE | ID: mdl-27235697

ABSTRACT

Programs for Bayesian inference of phylogeny currently implement a unique and fixed suite of models. Consequently, users of these software packages are simultaneously forced to use a number of programs for a given study, while also lacking the freedom to explore models that have not been implemented by the developers of those programs. We developed a new open-source software package, RevBayes, to address these problems. RevBayes is entirely based on probabilistic graphical models, a powerful generic framework for specifying and analyzing statistical models. Phylogenetic-graphical models can be specified interactively in RevBayes, piece by piece, using a new succinct and intuitive language called Rev. Rev is similar to the R language and the BUGS model-specification language, and should be easy to learn for most users. The strength of RevBayes is the simplicity with which one can design, specify, and implement new and complex models. Fortunately, this tremendous flexibility does not come at the cost of slower computation; as we demonstrate, RevBayes outperforms competing software for several standard analyses. Compared with other programs, RevBayes has fewer black-box elements. Users need to explicitly specify each part of the model and analysis. Although this explicitness may initially be unfamiliar, we are convinced that this transparency will improve understanding of phylogenetic models in our field. Moreover, it will motivate the search for improvements to existing methods by brazenly exposing the model choices that we make to critical scrutiny. RevBayes is freely available at http://www.RevBayes.com [Bayesian inference; Graphical models; MCMC; statistical phylogenetics.].


Subjects
Classification/methods , Biological Models , Phylogeny , Software , Bayes Theorem
9.
Mol Biol Evol ; 33(2): 305-10, 2016 Feb.
Article in English | MEDLINE | ID: mdl-26541173

ABSTRACT

In a recent article, Nelson-Sathi et al. (NS) report that the origins of major archaeal lineages (MAL) correspond to massive group-specific gene acquisitions via HGT from bacteria (Nelson-Sathi et al. 2015. Origins of major archaeal clades correspond to gene acquisitions from bacteria. Nature 517(7532):77-80.). If correct, this would have fundamental implications for the process of diversification in microbes. However, a reexamination of these data and results shows that the methodology used by NS systematically inflates the number of genes acquired at the root of each MAL, and incorrectly assumes bacterial origins for these genes. A reanalysis of their data with appropriate phylogenetic models accounting for the dynamics of gene gain and loss between lineages supports the continuous acquisition of genes over long periods in the evolution of Archaea.


Subjects
Archaea/genetics , Bacteria/genetics , Molecular Evolution , Horizontal Gene Transfer , Genotype , Archaea/classification , Archaeal Genes , Bacterial Genes , Genomics , Phylogeny
10.
Science ; 350(6257): 171, 2015 Oct 09.
Article in English | MEDLINE | ID: mdl-26450204

ABSTRACT

Liu and Edwards argue against the use of weighted statistical binning within a species tree estimation pipeline. However, we show that their mathematical argument does not apply to weighted statistical binning. Furthermore, their simulation study does not follow the recommended statistical binning protocol and has data of unknown origin that bias the results against weighted statistical binning.


Subjects
Birds/classification , Birds/genetics , Genome , Phylogeny , Animals
11.
Philos Trans R Soc Lond B Biol Sci ; 370(1678): 20140335, 2015 09 26.
Article in English | MEDLINE | ID: mdl-26323765

ABSTRACT

Although the role of lateral gene transfer is well recognized in the evolution of bacteria, it is generally assumed that it has had less influence among eukaryotes. To explore this hypothesis, we compare the dynamics of genome evolution in two groups of organisms: cyanobacteria and fungi. Ancestral genomes are inferred in both clades using two types of methods: first, Count, a gene tree unaware method that models gene duplications, gains and losses to explain the observed numbers of genes present in a genome; second, ALE, a more recent gene tree-aware method that reconciles gene trees with a species tree using a model of gene duplication, loss and transfer. We compare their merits and their ability to quantify the role of transfers, and assess the impact of taxonomic sampling on their inferences. We present what we believe is compelling evidence that gene transfer plays a significant role in the evolution of fungi.


Subjects
Fungi/genetics , Horizontal Gene Transfer , Fungal Genome , Phylogeny , Computer Simulation , Cyanobacteria/genetics , Bacterial Genome , Genetic Models
12.
PLoS One ; 10(6): e0129183, 2015.
Article in English | MEDLINE | ID: mdl-26086579

ABSTRACT

Because biological processes can result in different loci having different evolutionary histories, species tree estimation requires multiple loci from across multiple genomes. While many processes can result in discord between gene trees and species trees, incomplete lineage sorting (ILS), modeled by the multi-species coalescent, is considered to be a dominant cause for gene tree heterogeneity. Coalescent-based methods have been developed to estimate species trees, many of which operate by combining estimated gene trees, and so are called "summary methods". Because summary methods are generally fast (and much faster than more complicated coalescent-based methods that co-estimate gene trees and species trees), they have become very popular techniques for estimating species trees from multiple loci. However, recent studies have established that summary methods can have reduced accuracy in the presence of gene tree estimation error, and also that many biological datasets have substantial gene tree estimation error, so that summary methods may not be highly accurate in biologically realistic conditions. Mirarab et al. (Science 2014) presented the "statistical binning" technique to improve gene tree estimation in multi-locus analyses, and showed that it improved the accuracy of MP-EST, one of the most popular coalescent-based summary methods. Statistical binning, which uses a simple heuristic to evaluate "combinability" and then uses the larger sets of genes to re-calculate gene trees, has good empirical performance, but using statistical binning within a phylogenomic pipeline does not have the desirable property of being statistically consistent. We show that weighting the re-calculated gene trees by the bin sizes makes statistical binning statistically consistent under the multispecies coalescent, and maintains the good empirical performance. 
Thus, "weighted statistical binning" enables highly accurate genome-scale species tree estimation, and is also statistically consistent under the multi-species coalescent model. New data used in this study are available at DOI: http://dx.doi.org/10.6084/m9.figshare.1411146, and the software is available at https://github.com/smirarab/binning.


Subjects
Genomics/methods , Theoretical Models , Phylogeny , Statistics as Topic
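
The weighting step described in the abstract of record 12 (weighting the re-calculated gene trees by the bin sizes) can be illustrated with a short Python sketch. It assumes that weighting is realised by repeating each bin's tree once per gene in the bin before the trees are passed to a summary method; the function name `weight_by_bin_size`, the bin labels and the Newick strings are hypothetical, and this is not the code from the authors' repository.

```python
from typing import Dict, List


def weight_by_bin_size(bins: Dict[str, List[str]],
                       bin_trees: Dict[str, str]) -> List[str]:
    """Return one copy of each bin's tree per gene assigned to that bin.

    bins      maps a bin label to the genes placed in it.
    bin_trees maps the same label to the tree re-estimated on that bin
              (here a Newick string).
    """
    weighted: List[str] = []
    for bin_id, genes in bins.items():
        # A bin holding k genes contributes k identical trees.
        weighted.extend([bin_trees[bin_id]] * len(genes))
    return weighted


# Hypothetical input: three genes share one bin, a fourth sits alone.
example_bins = {"bin1": ["g1", "g2", "g3"], "bin2": ["g4"]}
example_trees = {"bin1": "((A,B),(C,D));", "bin2": "((A,C),(B,D));"}
summary_input = weight_by_bin_size(example_bins, example_trees)
```

With this input the summary method receives four trees, three of them identical, so the total number of trees equals the number of original genes. Without the weighting, each bin would count once regardless of how many loci it summarises.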
13.
Syst Biol ; 64(1): e42-62, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25070970

ABSTRACT

This article reviews the various models that have been used to describe the relationships between gene trees and species trees. Molecular phylogeny has focused mainly on improving models for the reconstruction of gene trees based on sequence alignments. Yet, most phylogeneticists seek to reveal the history of species. Although the histories of genes and species are tightly linked, they are seldom identical, because genes duplicate, are lost or horizontally transferred, and because alleles can coexist in populations for periods that may span several speciation events. Building models describing the relationship between gene and species trees can thus improve the reconstruction of gene trees when a species tree is known, and vice versa. Several approaches have been proposed to solve the problem in one direction or the other, but in general neither gene trees nor species trees are known. Only a few studies have attempted to jointly infer gene trees and species trees. These models account for gene duplication and loss, transfer or incomplete lineage sorting. Some of them consider several types of events together, but none exists currently that considers the full repertoire of processes that generate gene trees along the species tree. Simulations as well as empirical studies on genomic data show that combining gene tree-species tree models with models of sequence evolution improves gene tree reconstruction. In turn, these better gene trees provide a more reliable basis for studying genome evolution or reconstructing ancestral chromosomes and ancestral gene sequences. We predict that gene tree-species tree methods that can deal with genomic data sets will be instrumental to advancing our understanding of genomic evolution.


Subjects
Biological Models , Phylogeny , Computer Simulation , Genetic Speciation , Genome/genetics , Mutation/genetics
14.
Syst Biol ; 64(2): 325-39, 2015 Mar.
Article in English | MEDLINE | ID: mdl-25540456

ABSTRACT

With the availability of genomic sequence data, there is increasing interest in using genes with a possible history of duplication and loss for species tree inference. Here we assess the performance of both nonprobabilistic and probabilistic species tree inference approaches using gene duplication and loss and coalescence simulations. We evaluated the performance of gene tree parsimony (GTP) based on duplication (Only-dup), duplication and loss (Dup-loss), and deep coalescence (Deep-c) costs, the NJst distance method, the MulRF supertree method, and PHYLDOG, which jointly estimates gene trees and species tree using a hierarchical probabilistic model. We examined the effects of gene tree and species sampling, gene tree error, and duplication and loss rates on the accuracy of phylogenetic estimates. In the 10-taxon duplication and loss simulation experiments, MulRF is more accurate than the other methods when the duplication and loss rates are low, and Dup-loss is generally the most accurate when the duplication and loss rates are high. PHYLDOG performs well in 10-taxon duplication and loss simulations, but its run time is prohibitively long on larger data sets. In the larger duplication and loss simulation experiments, MulRF outperforms all other methods in experiments with at most 100 taxa; however, in the larger simulation, Dup-loss generally performs best. In all duplication and loss simulation experiments with more than 10 taxa, all methods perform better with more gene trees and fewer missing sequences, and they are all affected by gene tree error. Our results also highlight high levels of error in estimates of duplications and losses from GTP methods and demonstrate the usefulness of methods based on generic tree distances for large analyses.


Subjects
Classification/methods , Phylogeny , DNA Sequence Analysis/methods , Computer Simulation , Gene Deletion , Gene Duplication , DNA Sequence Analysis/standards , Software/standards
15.
Genome Biol ; 15(12): 549, 2014.
Article in English | MEDLINE | ID: mdl-25496599

ABSTRACT

BACKGROUND: While effective population size (Ne) and life history traits such as generation time are known to impact substitution rates, their potential effects on base composition evolution are less well understood. GC content increases with decreasing body mass in mammals, consistent with recombination-associated GC biased gene conversion (gBGC) more strongly impacting these lineages. However, shifts in chromosomal architecture and recombination landscapes between species may complicate the interpretation of these results. In birds, interchromosomal rearrangements are rare and the recombination landscape is conserved, suggesting that this group is well suited to assess the impact of life history on base composition. RESULTS: Employing data from 45 newly and 3 previously sequenced avian genomes covering a broad range of taxa, we found that lineages with large populations and short generations exhibit higher GC content. The effect extends to both coding and non-coding sites, indicating that it is not due to selection on codon usage. Consistent with recombination driving base composition, GC content and heterogeneity were positively correlated with the rate of recombination. Moreover, we observed ongoing increases in GC in the majority of lineages. CONCLUSIONS: Our results provide evidence that gBGC may drive patterns of nucleotide composition in avian genomes and are consistent with more effective gBGC in large populations and a greater number of meioses per unit time; that is, a shorter generation time. Thus, in accord with theoretical predictions, base composition evolution is substantially modulated by species life history.


Subjects
Base Composition , Birds/classification , Birds/genetics , Animals , Molecular Evolution , Gene Conversion , Genetic Heterogeneity , Genome , Phylogeny , Population Density , Genetic Selection , DNA Sequence Analysis , Species Specificity
16.
Science ; 346(6215): 1320-31, 2014 Dec 12.
Article in English | MEDLINE | ID: mdl-25504713

ABSTRACT

To better determine the history of modern birds, we performed a genome-scale phylogenetic analysis of 48 species representing all orders of Neoaves using phylogenomic methods created to handle genome-scale data. We recovered a highly resolved tree that confirms previously controversial sister or close relationships. We identified the first divergence in Neoaves, two groups we named Passerea and Columbea, representing independent lineages of diverse and convergently evolved land and water bird species. Among Passerea, we infer the common ancestor of core landbirds to have been an apex predator and confirm independent gains of vocal learning. Among Columbea, we identify pigeons and flamingoes as belonging to sister clades. Even with whole genomes, some of the earliest branches in Neoaves proved challenging to resolve, which was best explained by massive protein-coding sequence convergence and high levels of incomplete lineage sorting that occurred during a rapid radiation after the Cretaceous-Paleogene mass extinction event about 66 million years ago.


Subjects
Birds/genetics , Genome , Phylogeny , Animals , Avian Proteins/genetics , Base Sequence , Biological Evolution , Birds/classification , DNA Transposable Elements , Genes , Genetic Speciation , INDEL Mutation , Introns , DNA Sequence Analysis
17.
Science ; 346(6215): 1250463, 2014 Dec 12.
Article in English | MEDLINE | ID: mdl-25504728

ABSTRACT

Gene tree incongruence arising from incomplete lineage sorting (ILS) can reduce the accuracy of concatenation-based estimations of species trees. Although coalescent-based species tree estimation methods can have good accuracy in the presence of ILS, they are sensitive to gene tree estimation error. We propose a pipeline that uses bootstrapping to evaluate whether two genes are likely to have the same tree, then it groups genes into sets using a graph-theoretic optimization and estimates a tree on each subset using concatenation, and finally produces an estimated species tree from these trees using the preferred coalescent-based method. Statistical binning improves the accuracy of MP-EST, a popular coalescent-based method, and we use it to produce the first genome-scale coalescent-based avian tree of life.


Subjects
Birds/classification , Birds/genetics , Genome , Phylogeny , Animals , Computational Biology , Likelihood Functions , Mammals/classification , Mammals/genetics , Sequence Alignment , Vertebrates/classification , Vertebrates/genetics , Yeasts/genetics
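
The grouping step in the abstract of record 17 (deciding which genes may share a tree, then partitioning genes into sets with a graph-theoretic optimization) can be sketched as a graph colouring problem. The Python sketch below assumes that the bootstrap comparison has already produced a set of incompatible gene pairs, and it uses a simple greedy colouring that favours small bins; the function name `bin_genes` and the heuristic are illustrative assumptions, not the optimization used in the published pipeline.

```python
from typing import Dict, List, Set, Tuple


def bin_genes(genes: List[str],
              incompatible: Set[Tuple[str, str]]) -> List[List[str]]:
    """Greedily partition genes so that no bin holds an incompatible pair."""
    # Build the incompatibility graph as adjacency sets.
    conflicts: Dict[str, Set[str]] = {g: set() for g in genes}
    for a, b in incompatible:
        conflicts[a].add(b)
        conflicts[b].add(a)

    bins: List[List[str]] = []
    # Place the most constrained genes first; ties are broken by name.
    for gene in sorted(genes, key=lambda g: (-len(conflicts[g]), g)):
        candidates = [b for b in bins if not conflicts[gene] & set(b)]
        if candidates:
            # Prefer the smallest compatible bin to keep bin sizes balanced.
            min(candidates, key=len).append(gene)
        else:
            bins.append([gene])
    return bins


# Hypothetical input: g1 conflicts with g2 and g3; g4 conflicts with nothing.
example = bin_genes(["g1", "g2", "g3", "g4"], {("g1", "g2"), ("g1", "g3")})
```

For this input the sketch produces two bins of two genes each. In the pipeline described in the abstract, a tree would then be estimated on each bin by concatenation, and those trees passed to a coalescent-based summary method.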
18.
PLoS One ; 9(10): e107709, 2014.
Article in English | MEDLINE | ID: mdl-25272037

ABSTRACT

Insect phylogeny has recently been the focus of renewed interest as advances in sequencing techniques make it possible to rapidly generate large amounts of genomic or transcriptomic data for a species of interest. However, large numbers of markers are not sufficient to guarantee accurate phylogenetic reconstruction, and the choice of the model of sequence evolution as well as adequate taxonomic sampling are as important for phylogenomic studies as they are for single-gene phylogenies. Recently, the sequence of the genome of a strepsipteran has been published and used to place Strepsiptera as sister group to Coleoptera. However, this conclusion relied on a data set that did not include representatives of Neuropterida or of coleopteran lineages formerly proposed to be related to Strepsiptera. Furthermore, it did not use models that are robust against the long branch attraction artifact. Here we have sequenced the transcriptomes of seven key species to complete a data set comprising 36 species to study the higher level phylogeny of insects, with a particular focus on Neuropteroidea (Coleoptera, Strepsiptera, Neuropterida), especially on coleopteran taxa considered as potential close relatives of Strepsiptera. Using models robust against the long branch attraction artifact we find a highly resolved phylogeny that confirms the position of Strepsiptera as a sister group to Coleoptera, rather than as an internal clade of Coleoptera, and sheds new light onto the phylogeny of Neuropteroidea.


Subjects
Genomics , Insects/classification , Insects/genetics , Phylogeny , Animals , Female , Male , Genetic Models , Ribosomal RNA
19.
Syst Biol ; 63(5): 753-71, 2014 Sep.
Article in English | MEDLINE | ID: mdl-24951559

ABSTRACT

Recent years have seen a rapid expansion of the model space explored in statistical phylogenetics, emphasizing the need for new approaches to statistical model representation and software development. Clear communication and representation of the chosen model is crucial for: (i) reproducibility of an analysis, (ii) model development, and (iii) software design. Moreover, a unified, clear and understandable framework for model representation lowers the barrier for beginners and nonspecialists to grasp complex phylogenetic models, including their assumptions and parameter/variable dependencies. Graphical modeling is a unifying framework that has gained in popularity in the statistical literature in recent years. The core idea is to break complex models into conditionally independent distributions. The strength lies in the comprehensibility, flexibility, and adaptability of this formalism, and the large body of computational work based on it. Graphical models are well-suited to teach statistical models, to facilitate communication among phylogeneticists and in the development of generic software for simulation and statistical inference. Here, we provide an introduction to graphical models for phylogeneticists and extend the standard graphical model representation to the realm of phylogenetics. We introduce a new graphical model component, tree plates, to capture the changing structure of the subgraph corresponding to a phylogenetic tree. We describe a range of phylogenetic models using the graphical model framework and introduce modules to simplify the representation of standard components in large and complex models. Phylogenetic model graphs can be readily used in simulation, maximum likelihood inference, and Bayesian inference using, for example, Metropolis-Hastings or Gibbs sampling of the posterior distribution.


Subjects
Classification/methods , Statistical Models , Phylogeny , Algorithms , Computer Simulation
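
The core idea stated in the abstract of record 19 (breaking a complex model into conditionally independent distributions) means that the joint log-density of a graphical model is a sum of one term per node, each conditioned only on that node's parents. The Python sketch below shows this for a deliberately tiny three-node chain. The model (an exponential prior on a rate, an exponential branch length given that rate, and a Poisson substitution count given the branch length) and all function names are assumptions chosen for illustration; they are not taken from the article.

```python
import math


def log_exponential(x: float, rate: float) -> float:
    """Log-density of an Exponential(rate) distribution at x >= 0."""
    return math.log(rate) - rate * x


def log_poisson(k: int, mean: float) -> float:
    """Log-probability of k events under a Poisson(mean) distribution."""
    return k * math.log(mean) - mean - math.lgamma(k + 1)


def log_joint(rate: float, branch_length: float,
              n_subs: int, n_sites: int) -> float:
    """Joint log-density as a sum of one conditional term per node.

    Graph (parents -> child):
        rate -> branch_length -> n_subs
    """
    return (
        log_exponential(rate, 1.0)                      # p(rate)
        + log_exponential(branch_length, rate)          # p(branch_length | rate)
        + log_poisson(n_subs, branch_length * n_sites)  # p(n_subs | branch_length)
    )
```

Because each factor depends only on a node and its parents, a sampler that proposes a new value for one node needs to re-evaluate only the terms that mention that node, which is what makes the formalism convenient for generic inference software.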
20.
Biol Lett ; 9(5): 20130608, 2013 Oct 23.
Article in English | MEDLINE | ID: mdl-24046876

ABSTRACT

Several lines of evidence such as the basal location of thermophilic lineages in large-scale phylogenetic trees and the ancestral sequence reconstruction of single enzymes or large protein concatenations support the conclusion that the ancestors of the bacterial and archaeal domains were thermophilic organisms which were adapted to hot environments during the early stages of the Earth. A parsimonious reasoning would therefore suggest that the last universal common ancestor (LUCA) was also thermophilic. Various authors have used branch-wise non-homogeneous evolutionary models that better capture the variation of molecular compositions among lineages to accurately reconstruct the ancestral G + C contents of ribosomal RNAs and the ancestral amino acid composition of highly conserved proteins. They confirmed the thermophilic nature of the ancestors of Bacteria and Archaea but concluded that LUCA, their last common ancestor, was a mesophilic organism having a moderate optimal growth temperature. In this letter, we investigate the unknown nature of the phylogenetic signal that informs ancestral sequence reconstruction to support this non-parsimonious scenario. We find that rate variation across sites of molecular sequences provides information at different time scales by recording the oldest adaptation to temperature in slow-evolving regions and subsequent adaptations in fast-evolving ones.


Subjects
Physiological Adaptation , Cold Temperature , Planet Earth , Life , Theoretical Models , Phylogeny