ABSTRACT
Several indigenous cattle breeds in Sweden are endangered. Conserving their genetic diversity and characterizing their genomes are priorities. Whole-genome sequences (WGS) with a mean coverage of 25X (range 14X to 41X) were obtained for 30 individuals of the breeds Fjällko, Fjällnära, Bohuskulla, Rödkulla, Ringamåla, and Väneko. WGS-based genotyping revealed 22,548,028 variants in total, comprising 18,876,115 single nucleotide polymorphisms (SNPs) and 3,671,913 indels. Of these, 1,154,779 SNPs and 304,467 indels were novel. Population stratification based on roughly 19 million SNPs showed two major groups of breeds, corresponding to the northern and southern breeds. Overall, genetic diversity was higher in the southern breeds than in the northern breeds. While the population stratification was consistent with previous genome-wide SNP array-based analyses, the genealogy of the individuals inferred from WGS-based estimates turned out to be more complex than expected from previous SNP array-based estimates. Polymorphisms, and their predicted phenotypic consequences, were associated with differences in coat color between the northern and southern breeds. Notably, these high-consequence polymorphisms are not represented on the SNP arrays routinely used to genotype cattle breeds. This study is the first WGS-based population genetic analysis of Swedish native cattle breeds. The genetic diversity of the native breeds was found to be high. Whole-genome genotyping linked high-consequence polymorphisms with desirable phenotypes, which highlights the pressing need to intensify WGS-based characterization of the native breeds.
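The variant totals reported above can be cross-checked with a few lines of arithmetic: SNPs plus indels should equal the overall variant count, and the novel counts give the share of previously uncatalogued variants. The percentages printed here are derived for illustration only; they are not figures stated in the study.

```python
# Variant counts as reported in the abstract above.
snps, indels, total = 18_876_115, 3_671_913, 22_548_028
novel_snps, novel_indels = 1_154_779, 304_467

# SNPs and indels together should account for every called variant.
assert snps + indels == total

# Share of each variant class that was novel (derived here, not reported).
novel_snp_pct = 100 * novel_snps / snps
novel_indel_pct = 100 * novel_indels / indels
novel_total_pct = 100 * (novel_snps + novel_indels) / total

print(f"novel SNPs: {novel_snp_pct:.1f}%")      # novel SNPs: 6.1%
print(f"novel indels: {novel_indel_pct:.1f}%")  # novel indels: 8.3%
print(f"all novel: {novel_total_pct:.1f}%")     # all novel: 6.5%
```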
Subjects
Breeding, Single Nucleotide Polymorphism, Humans, Animals, Cattle/genetics, Sweden, Whole Genome Sequencing/veterinary, Genomics
ABSTRACT
Mitochondria are energy-producing organelles in eukaryotic cells that are considered to be of bacterial origin. The mitochondrial genome has evolved under selection for minimization of gene content, yet it is not known why some mitochondrial genes have not been transferred to the nuclear genome. Here, we predict that hydrophobic membrane proteins encoded by the mitochondrial genomes would be recognized by the signal recognition particle and targeted to the endoplasmic reticulum if they were nuclear-encoded and translated in the cytoplasm. Expression of the mitochondrially encoded proteins Cytochrome oxidase subunit 1, Apocytochrome b, and ATP synthase subunit 6 in the cytoplasm of HeLa cells confirms their export to the endoplasmic reticulum. To examine the extent to which the mitochondrial proteome is driven by selective constraints within the eukaryotic cell, we investigated the occurrence of mitochondrial protein domains in bacteria and eukaryotes. The accessory protein domains of the oxidative phosphorylation system are unique to mitochondria, indicating the evolution of new protein folds. Most of the identified domains in the accessory proteins of the ribosome are also found in eukaryotic proteins with other functions and locations. Overall, one-third of the protein domains identified in mitochondrial proteins are only rarely found in bacteria. We conclude that the mitochondrial genome has been maintained to ensure the correct localization of highly hydrophobic membrane proteins. Taken together, the results suggest that selective constraints on the eukaryotic cell have played a major role in modulating the evolution of the mitochondrial genome and proteome.
Subjects
Mitochondrial Genome/genetics, Mitochondrial Proteins/metabolism, Bacterial Proteins/metabolism, Cell Nucleus/genetics, Chloroplast Proteins/metabolism, Computational Biology, Cytochromes b/metabolism, Cytosol/metabolism, Endoplasmic Reticulum/metabolism, HeLa Cells/metabolism, Humans, Hydrophobic and Hydrophilic Interactions, Membrane Proteins/genetics, Mitochondrial Proton-Translocating ATPases/metabolism, Oxidative Phosphorylation, Phylogeny, Protein Folding, Tertiary Protein Structure, Signal Recognition Particle/metabolism, Thermodynamics
ABSTRACT
Lynn Sagan's conjecture (1967) that three of the fundamental organelles observed in eukaryote cells, specifically mitochondria, plastids and flagella, were once free-living primitive (prokaryotic) cells was accepted after considerable opposition. Even though the idea was swiftly refuted for the specific case of the origin of flagella in eukaryotes, the symbiosis model in general was accepted for decades as a realistic hypothesis for the endosymbiotic origins of eukaryotes. However, a systematic analysis of the origins of the mitochondrial proteome based on empirical genome evolution models now indicates that 97% of modern mitochondrial protein domains, as well as their homologues in bacteria and archaea, were present in the universal common ancestor (UCA) of the modern tree of life (ToL). These protein domains are universal modular building blocks of modern genes and genomes, each of which is identified by a unique tertiary structure, a specific biochemical function, and a characteristic sequence profile. Further, phylogeny reconstructed from genome-scale evolution models reveals that Eukaryotes and Akaryotes (archaea and bacteria) descend independently from the UCA. That is to say, Eukaryotes and Akaryotes are both primordial lineages that evolved in parallel. Finally, there is no indication of massive inter-lineage exchange of coding sequences during the descent of the two lineages. Accordingly, we suggest that the evolution of the mitochondrial proteome was autogenic (endogenic) and not endosymbiotic (exogenic).
Subjects
Biological Evolution, Mitochondria/genetics, Phylogeny, Proteome, Biological Coevolution, Mitochondria/chemistry, Symbiosis
ABSTRACT
The formulation and testing of hypotheses using 'big biology data' often lie at the interface of computational biology and structural biology. The Protein Data Bank (PDB), which was established about 50 years ago, catalogs the three-dimensional (3D) shapes of organic macromolecules and showcases a structural view of biology. The comparative analysis of the structures of homologs, particularly of proteins, from different species has significantly improved in-depth analyses of molecular and cell biological questions. In addition, computational tools that were developed to analyze the 'protein universe' are providing the means for efficient resolution of longstanding debates in cell and molecular evolution. In celebrating the golden jubilee of the PDB, much has been written about the transformative impact of the PDB on a broad range of fields of scientific inquiry and how structural biology transformed the study of the fundamental processes of life. Yet the transforming influence of the PDB on one field of inquiry of fundamental interest, the reconstruction of the distant biological past, has gone almost unnoticed. Here, I discuss recent advances to highlight how insights and tools of structural biology are bearing on the data required for the empirical resolution of vigorously debated and apparently contradictory hypotheses in evolutionary biology. Specifically, I show that evolutionary characters defined by protein structure are superior to conventional sequence characters for reliable, data-driven resolution of competing hypotheses about the origins of the major clades of life and the evolutionary relationships among those clades. Since the better-quality data unequivocally support two primary domains of life, it is imperative that the primary classification of life be revised accordingly.
ABSTRACT
The genus Limnospira includes cyanobacterial species used for the industrial production of dietary supplements and nutraceutical agents. The metagenome-assembled genome of Limnospira sp. strain BM01 from Big Momela Lake, Tanzania, was 6,228,312 bp long with a GC content of 44.8% and encoded 4,921 proteins and 52 RNA genes, including 6 rRNA genes.
ABSTRACT
Background: Locating the root node of the "tree of life" (ToL) is one of the hardest problems in phylogenetics. The root node, or universal common ancestor (UCA), divides its descendants into organismal domains. Two notable variants of the two-domains ToL (2D-ToL) have recently gained support. One 2D-ToL posits that eukaryotes (organisms with nuclei) and akaryotes (organisms without nuclei) are sister clades that diverged from the UCA, and that Asgard archaea are sister to other archaea; the other proposes that eukaryotes emerged within archaea and places Asgard archaea sister to eukaryotes. Williams et al. (Nature Ecol. Evol. 4: 138-147; 2020) re-evaluated the data and methods that support the competing two-domains proposals and concluded that eukaryotes are the closest relatives of Asgard archaea. Critique: We argue that important aspects of estimating evolutionary relatedness and assessing phylogenetic signal in empirical data were overlooked. We focus on the phylogenetic character reconstructions needed to describe the UCA or its closest descendants in the absence of reliable fossils. It is well known that different character types offer different perspectives on evolutionary history, relating to different phylogenetic depths. Which 2D-ToL is better supported depends on which kind of molecular feature (protein domains or their component amino acids) better resolves the common ancestors at the roots of clades. In practice, this involves reconstructing the character compositions of ancestral nodes all the way back to the UCA. We believe the criticisms of the 2D-ToL focus on superficial aspects of the data and reflect common misunderstandings of phylogenetic reconstructions using protein domains (folds). Clarifications: Models of protein domain evolution support more reliable phylogenetic reconstructions. In contrast, even the best available amino acid substitution models fail to resolve the archaeal radiation, despite employing thousands of genes. Therefore, the primary domains Eukaryotes and Akaryotes are better supported in a 2D-ToL.
Subjects
Archaea/classification, Biological Evolution, Eukaryota/classification, Phylogeny, Protein Domains, Sequence Alignment, Protein Sequence Analysis
ABSTRACT
The survey of components in living systems at different levels of organization enables an evolutionary exploration of patterns and processes in macromolecules, networks, and genomic repertoires. Here we discuss how phylogenetic strategies that generate intrinsically rooted phylogenies impact the evolutionary study of the RNA and protein components of the macromolecular machinery responsible for biological function. We used these methods to generate timelines of discovery of components in systems, such as substructures in RNA molecules, architectures in proteomes, domains in multi-domain proteins, enzymes in metabolic networks, and protein architectures in proteomes. These timelines revealed remarkable patterns of origin and evolution of molecules, repertoires and networks, showing episodes of both functional specialization (e.g., the rise of domains with specialized functions) and molecular simplification (e.g., reductive tendencies in molecules and proteomes). These observations have important evolutionary implications for the origins of translation, the genetic code, modules in the protein world, and the diversification of life, and suggest that the early evolution of modern biochemistry was driven by the recruitment of both RNA and protein catalysts in an ancient community of complex organisms.
Subjects
Biochemistry/trends, Molecular Evolution, Genome, Genomics/trends, Molecular Models, Directed Molecular Evolution/methods, Structural Models, RNA/chemistry, RNA/genetics
ABSTRACT
The recognition of the group Archaea as a major branch of the tree of life (ToL) prompted a new view of the evolution of biodiversity. The genomic representation of archaeal biodiversity has since increased significantly. In addition, advances in phylogenetic modeling of multi-locus datasets have resolved many recalcitrant branches of the ToL. Despite these technical advances and the expanded taxonomic representation, two important aspects of the origins and evolution of the Archaea remain controversial, even as we celebrate the 40th anniversary of this monumental discovery. These issues concern (i) the uniqueness (monophyly) of the Archaea, and (ii) the evolutionary relationships of the Archaea to the Bacteria and the Eukarya; both are relevant to the deep structure of the ToL. To explore the causes of this persistent ambiguity, I examine multiple datasets and different phylogenetic approaches that support contradictory conclusions. I find that the uncertainty is primarily due to a scarcity of information in standard datasets (universal core-gene datasets) to reliably resolve the conflicts. These conflicts can be resolved efficiently by comparing patterns of variation in the distribution of functional genomic signatures, which are less diffuse than patterns of primary sequence variation. The relatively lower heterogeneity of these distribution patterns minimizes uncertainty and supports statistically robust phylogenetic inferences, especially of the earliest divergences of life. This case study further highlights the limitations of primary sequence data in resolving difficult phylogenetic problems, and raises questions about evolutionary inferences drawn from analyses of sequence alignments of a small set of core genes. In particular, the findings of this study corroborate the growing consensus that reversible substitution mutations may not be optimal phylogenetic markers for resolving early divergences in the ToL, or for determining the polarity of evolutionary transitions across the ToL.
ABSTRACT
We recently analyzed the robustness of competing evolution models developed to identify the root of the Tree of Life: 1) an empirical Sankoff parsimony (ESP) model (Harish and Kurland, 2017), which is a nonstationary and directional evolution model; and 2) an a priori ancestor (APA) model (Kim and Caetano-Anollés, 2011), which is a stationary and reversible evolution model. Both Bayesian model selection tests and maximum parsimony analyses demonstrate that the ESP model is, overwhelmingly, the better model. Moreover, we showed not only that the APA model is sensitive to artifacts, but also that its underlying assumptions are neither empirically grounded nor biologically realistic.
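The ESP model takes its name from Sankoff parsimony, which scores a rooted tree using a cost matrix over character-state changes. The sketch below is a generic Sankoff dynamic program for a single presence/absence character, using a hypothetical asymmetric cost matrix in which gaining a domain costs more than losing it; these weights are assumptions chosen for illustration, not the empirical ESP costs. Because the costs are directional, the minimum score can depend on where the root is placed, which is the property that allows non-reversible models to root trees intrinsically.

```python
import math

# Character states for one protein-domain superfamily: 0 = absent, 1 = present.
STATES = (0, 1)

# Hypothetical directional cost matrix, COST[parent_state][child_state].
# A gain (0 -> 1) costs 2 and a loss (1 -> 0) costs 1; these are illustrative
# assumptions, not the empirical costs used by the ESP model.
COST = [[0.0, 2.0],
        [1.0, 0.0]]


def sankoff(tree, leaf_states):
    """Return [cost if node is 0, cost if node is 1] for a rooted subtree.

    `tree` is a leaf name (str) or a tuple of child subtrees.
    """
    if isinstance(tree, str):
        observed = leaf_states[tree]
        return [0.0 if s == observed else math.inf for s in STATES]
    child_costs = [sankoff(child, leaf_states) for child in tree]
    # For each parent state, add the cheapest way to reach each child's subtree.
    return [
        sum(min(COST[s][t] + c[t] for t in STATES) for c in child_costs)
        for s in STATES
    ]


def root_cost(tree, leaf_states):
    """Minimum parsimony cost of the rooted tree and the best root state."""
    costs = sankoff(tree, leaf_states)
    best = min(STATES, key=lambda s: costs[s])
    return costs[best], best


# Toy data: two rootings of the same unrooted four-taxon tree.
states = {"A": 1, "B": 1, "C": 0, "D": 0}
print(root_cost((("A", "B"), ("C", "D")), states))  # (1.0, 1)
print(root_cost(("C", ("D", ("A", "B"))), states))  # (2.0, 0)
```

Rooting between the {A, B} and {C, D} clades costs 1 (presence at the root followed by one loss), whereas rooting on C costs 2. With a symmetric matrix (gain and loss both costing 1), both rootings score 1, so the data alone could not distinguish them.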
Subjects
Molecular Evolution, Genome/genetics, Genetic Models, Bayes Theorem, Phylogeny
ABSTRACT
A reliable phylogenetic reconstruction of the evolutionary history of contemporary species depends on a robust identification of the universal common ancestor (UCA) at the root of the Tree of Life (ToL). That root polarizes the tree so that the evolutionary succession from ancestors to descendants is discernible. In effect, the root determines the branching order and the direction of character evolution. Typically, conventional phylogenetic analyses implement time-reversible models of evolution, under which character evolution is unpolarized. Such practices leave the root and the direction of character evolution undefined by the data used to construct the trees. In such cases, rooting relies on theoretical assumptions and/or the use of external data to interpret unrooted trees. The most common rooting method, the outgroup method, is clearly inapplicable to the ToL, which has no outgroup. Both here and in the accompanying paper (Harish and Kurland, 2017), we have explored the theoretical and technical issues related to several rooting methods. We demonstrate (1) that genome-level characters and evolution models are necessary for reconstructing species phylogenies; by the same token, standard practices exploiting sequence-based methods that implement gene-scale substitution models do not root species trees; (2) that modeling the evolution of complex genomic characters and processes that are non-reversible and non-stationary is required to reconstruct the polarized evolution of the ToL; (3) that rooting experiments and Bayesian model selection tests overwhelmingly support the earlier finding that akaryotes and eukaryotes are sister clades that descend independently from the UCA (Harish and Kurland, 2013); and (4) that consistent ancestral state reconstructions from independent genome samplings confirm the previous finding that the UCA features three-fourths of the unique protein domain superfamilies encoded by extant genomes.
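Why time-reversible models leave the root undefined can be seen from Felsenstein's "pulley principle": under a reversible model whose root state is drawn from the stationary distribution, the likelihood does not change as the root slides along the tree. The sketch below checks this numerically for the simplest case, a two-state Markov chain and two leaves (any two-state chain at stationarity is reversible), and shows that a non-stationary root distribution breaks the invariance. The rates, path length, and function names are illustrative assumptions; this is not the genome-scale model used in the study.

```python
import math


def transition(alpha, beta, t):
    """Transition probabilities P[i][j] after time t for a two-state
    continuous-time Markov chain with rates alpha (0 -> 1) and beta (1 -> 0)."""
    rate = alpha + beta
    pi0, pi1 = beta / rate, alpha / rate  # stationary distribution
    decay = math.exp(-rate * t)
    return [[pi0 + pi1 * decay, pi1 - pi1 * decay],
            [pi0 - pi0 * decay, pi1 + pi0 * decay]]


def two_leaf_likelihood(root_dist, alpha, beta, x, y, s, path_length):
    """Likelihood of leaf states x and y when the root lies at distance s
    from leaf x on the path of length path_length joining the two leaves."""
    px = transition(alpha, beta, s)
    py = transition(alpha, beta, path_length - s)
    return sum(root_dist[r] * px[r][x] * py[r][y] for r in (0, 1))


# Hypothetical rates and path length, chosen only for illustration.
alpha, beta, path_length = 0.3, 0.7, 1.0
stationary = [beta / (alpha + beta), alpha / (alpha + beta)]
fixed_root = [1.0, 0.0]  # non-stationary: root assumed to be in state 0
positions = (0.1, 0.5, 0.9)

stat_lik = [two_leaf_likelihood(stationary, alpha, beta, 0, 1, s, path_length)
            for s in positions]
fixed_lik = [two_leaf_likelihood(fixed_root, alpha, beta, 0, 1, s, path_length)
             for s in positions]

print("stationary root:", [round(v, 4) for v in stat_lik])  # identical values
print("fixed root:     ", [round(v, 4) for v in fixed_lik])  # values differ
```

With the stationary root, every root position yields the same likelihood, so the data carry no information about the root; once the root distribution is non-stationary, different root positions become distinguishable.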
Subjects
Molecular Evolution, Genome, Genetic Models, Phylogeny, Archaea/genetics, Bacteria/genetics, Bayes Theorem, Eukaryota/genetics
ABSTRACT
We reconstructed a global tree of life (ToL) with non-reversible and non-stationary models of genome evolution that root trees intrinsically. We implemented Bayesian model selection tests and compared the statistical support for four conflicting ToL hypotheses. We show that reconstructions obtained with a Bayesian implementation (Klopfstein et al., 2015) are consistent with reconstructions obtained with an empirical Sankoff parsimony (ESP) implementation (Harish et al., 2013). Both are based on the genome contents of coding sequences for protein domains (superfamilies) from hundreds of genomes. Thus, we conclude that the independent descent of Eukaryotes and Akaryotes (archaea and bacteria) from the universal common ancestor (UCA) is the most probable as well as the most parsimonious hypothesis for the evolutionary origins of extant genomes. Reconstructions of ancestral proteomes by both Bayesian and ESP methods suggest that at least 70% of unique domain-superfamilies known in extant species were present in the UCA. In addition, identification of a vast majority (96%) of the mitochondrial superfamilies in the UCA proteome precludes a symbiotic hypothesis for the origin of eukaryotes. Accordingly, neither the archaeal origin of eukaryotes nor the bacterial origin of mitochondria is supported by the data. The proteomic complexity of the UCA suggests that the evolution of cellular phenotypes in the two primordial lineages, Akaryotes and Eukaryotes, was driven largely by duplication of common superfamilies as well as by loss of unique superfamilies. Finally, innovation of novel superfamilies has played a surprisingly small role in the evolution of Akaryotes and only a marginal role in the evolution of Eukaryotes.
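Bayesian model selection of this kind usually compares hypotheses through Bayes factors, the ratio of their marginal likelihoods. Assuming the log marginal likelihoods have already been estimated (for example by stepping-stone sampling), the sketch below ranks hypotheses and maps 2 ln BF onto the verbal scale of Kass and Raftery (1995). The hypothesis labels and numbers are placeholders, not values from the study.

```python
def kass_raftery(two_ln_bf):
    """Verbal scale for 2*ln(BF) following Kass and Raftery (1995)."""
    if two_ln_bf < 2:
        return "not worth more than a bare mention"
    if two_ln_bf < 6:
        return "positive"
    if two_ln_bf < 10:
        return "strong"
    return "very strong"


def bayes_factor_support(log_marginals):
    """Compare hypotheses by their log marginal likelihoods.

    Returns the best-supported hypothesis and, for every hypothesis,
    2*ln(BF) of the best one against it, with its verbal interpretation.
    """
    best = max(log_marginals, key=log_marginals.get)
    report = {}
    for name, lml in log_marginals.items():
        two_ln_bf = 2.0 * (log_marginals[best] - lml)
        report[name] = (two_ln_bf, kass_raftery(two_ln_bf))
    return best, report


# Hypothetical log marginal likelihoods for four competing tree hypotheses;
# these numbers are placeholders, not results from the study.
placeholder = {"H1": -1000.0, "H2": -1003.5, "H3": -1012.0, "H4": -1030.0}
best, report = bayes_factor_support(placeholder)
print(best)  # H1
for name, (value, verdict) in report.items():
    print(f"{name}: 2lnBF = {value:.1f} ({verdict})")
```

Working on the log scale avoids underflow, since marginal likelihoods of genome-scale datasets are far too small to represent directly as floating-point numbers.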
Subjects
Molecular Evolution, Genome, Genetic Models, Phylogeny, Proteome, Archaea/genetics, Bacteria/genetics, Bayes Theorem, Eukaryota/genetics, Mitochondria
ABSTRACT
The evolutionary origins of viruses according to marker gene phylogenies, as well as their relationships to the ancestors of host cells, remain unclear. In a recent article, Nasir and Caetano-Anollés reported that their genome-scale phylogenetic analyses, based on the genomic composition of protein structural domains, identify an ancient origin of the "viral supergroup" (Nasir et al. 2015. A phylogenomic data-driven exploration of viral origins and evolution. Sci Adv. 1(8):e1500527.). Their analyses suggest that viruses and host cells evolved independently from a universal common ancestor. Examination of their data and phylogenetic methods indicates that systematic errors likely affected the results. Reanalysis of the data with additional tests shows that small-genome attraction artifacts distort their phylogenomic analyses, particularly the placement of the root of the tree of life, which is central to their conclusions. These new results indicate that their suggestion of a distinct ancestry of the viral supergroup is not well supported by the evidence.
Subjects
Molecular Evolution, Viruses/genetics, Archaea/genetics, Bacteria/genetics, Viral Genome, Genetic Models, Phylogeny, Viral Proteins/genetics, Viruses/classification
ABSTRACT
In this introductory retrospective, evolution as viewed through gene trees is inspected through a lens compounded from its founding operational assumptions. The four assumptions of the gene tree culture that are singularly important to evolutionary interpretations are: a. that protein-coding sequences are molecular fossils; b. that gene trees are equivalent to species trees; c. that the tree of life is rooted in a simple akaryote cell, implying that akaryotes are primitive; and d. that all or most incongruities between alignment-based gene trees are due to horizontal gene transfer (HGT), which includes the endosymbiotic models postulated for the origins of eukaryotes. What has been unusual about these particular assumptions is that, though each was taken on board explicitly, they are defended in the face of factual challenge by a stolid disregard for the conflicting observations. The factual challenges to the mainstream gene tree-inspired evolutionary view are numerous and are most convincingly summarized as: Genome trees tell a very different story. Phylogeny inferred from genomic assortments of homologous protein structural domains does not support any one of the four principal evolutionary interpretations of gene trees: a. 3D protein domain structures are the molecular fossils of evolution, while coding sequences are transients; b. species trees are very different from gene trees; c. the ToL is rooted in a surprisingly complex universal common ancestor (UCA) that is distinct from any specific modern descendant; and d. HGT, including endosymbiosis, is a negligible player in genome evolution from the UCA to the present.
Subjects
Molecular Evolution, Horizontal Gene Transfer, Genome, Genetic Models, Phylogeny, Proteome/chemistry, Amino Acid Sequence, Animals, Humans, Mutation, Protein Conformation, Protein Isoforms/chemistry, Protein Isoforms/genetics, Protein Isoforms/metabolism, Tertiary Protein Structure, Proteome/genetics, Proteome/metabolism
ABSTRACT
The traditional bacterial rooting of the three superkingdoms in sequence-based gene trees is inconsistent with new phylogenetic reconstructions based on the genome content of compact protein domains. We find that the genome contents of protein domains at the level of the SCOP superfamily (SF), analyzed with maximum parsimony, yield fully resolved rooted trees. Such genome content trees identify archaea and bacteria (akaryotes) as sister clades that diverge from an akaryote common ancestor, LACA. Several eukaryote sister clades diverge from a eukaryote common ancestor, LECA. LACA and LECA descend in parallel from the most recent universal common ancestor (MRUCA), which is not a bacterium. Rather, MRUCA possesses 75% of the unique SFs encoded by extant genomes of the three superkingdoms, each of which encodes a proteome that partially overlaps all the others. This alone implies that the common ancestor of the superkingdoms was very complex. Such ancestral complexity is confirmed by phylogenetic reconstructions. In addition, the divergence of proteomes from the complex ancestor in each superkingdom is both reductive in the number of unique SFs and cumulative in the abundance of surviving SFs. These data suggest that the common ancestor was not the first cell lineage and that modern global phylogeny is the crown of a "recently" re-rooted tree. We suggest that a bottlenecked survivor of an environmental collapse, which preceded the flourishing of the modern crown, seeded the current phylogenetic tree.
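Genome-content phylogenies of this kind start from a binary presence/absence matrix of SCOP superfamilies per genome, which is then analyzed with parsimony. The sketch below builds such a matrix from toy repertoires and measures how much of the pooled repertoire is shared by every group. The superfamily identifiers are hypothetical, and shared presence across extant groups is only a rough proxy; the ancestral estimates in the abstract come from reconstructing character states on a rooted tree.

```python
from itertools import chain


def presence_absence_matrix(proteomes):
    """Binary genome-content matrix: one row per proteome, one column per SF."""
    columns = sorted(set(chain.from_iterable(proteomes.values())))
    rows = {name: [1 if sf in sfs else 0 for sf in columns]
            for name, sfs in proteomes.items()}
    return columns, rows


def shared_fraction(groups):
    """Fraction of all observed SFs that occur in every group."""
    union = set().union(*groups.values())
    core = set.intersection(*(set(sfs) for sfs in groups.values()))
    return len(core) / len(union)


# Toy SF repertoires with hypothetical identifiers, not real SCOP data.
repertoires = {
    "Archaea": {"sf1", "sf2", "sf3"},
    "Bacteria": {"sf1", "sf2", "sf4"},
    "Eukaryotes": {"sf1", "sf2", "sf3", "sf5"},
}
columns, rows = presence_absence_matrix(repertoires)
print(columns)                       # ['sf1', 'sf2', 'sf3', 'sf4', 'sf5']
print(rows["Archaea"])               # [1, 1, 1, 0, 0]
print(shared_fraction(repertoires))  # 0.4
```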
Subjects
Archaea/classification, Bacteria/classification, Eukaryota/classification, Phylogeny, Archaea/genetics, Bacteria/genetics, Eukaryota/genetics, Molecular Evolution
ABSTRACT
The origin and evolution of the ribosome are central to our understanding of the cellular world. Most hypotheses posit that the ribosome originated in the peptidyl transferase center of the large ribosomal subunit. However, these proposals do not link protein synthesis to RNA recognition and do not use a phylogenetic comparative framework to study ribosomal evolution. Here we infer the evolution of the structural components of the ribosome. Phylogenetic methods widely used in morphometrics are applied directly to RNA structures of thousands of molecules and to a census of protein structures in hundreds of genomes. We find that components of the small subunit involved in ribosomal processivity evolved earlier than the catalytic peptidyl transferase center responsible for protein synthesis. Remarkably, subunit RNA and proteins coevolved, starting with interactions between the oldest proteins (S12 and S17) and the oldest substructure (the ribosomal ratchet) in the small subunit and ending with the rise of a modern multi-subunit ribosome. Ancestral ribonucleoprotein components show similarities to in vitro evolved RNA replicase ribozymes and to protein structures in extant replication machinery. Our study therefore provides important clues about the chicken-or-egg dilemma associated with the central dogma of molecular biology by showing that ribosomal history is driven by the gradual accretion of protein and RNA structures. Most importantly, the results suggest that functionally important and conserved regions of the ribosome were recruited and could be relics of an ancient ribonucleoprotein world.