ABSTRACT
The whole-genome duplication that occurred 80 million years ago in the common ancestor of salmonids (the salmonid-specific fourth vertebrate whole-genome duplication, Ss4R) provides unique opportunities to learn about the evolutionary fate of a duplicated vertebrate genome in 70 extant lineages. Here we present a high-quality genome assembly for Atlantic salmon (Salmo salar), and show that large genomic reorganizations, coinciding with bursts of transposon-mediated repeat expansions, were crucial for the post-Ss4R rediploidization process. Comparisons of duplicate gene expression patterns across a wide range of tissues with orthologous genes from a pre-Ss4R outgroup unexpectedly demonstrate far more instances of neofunctionalization than subfunctionalization. Surprisingly, we find that genes that were retained as duplicates after the teleost-specific whole-genome duplication 320 million years ago were not more likely to be retained after the Ss4R, and that duplicate retention was not strongly influenced by the nature of the predicted protein interactions of the gene products. Finally, we demonstrate that the Atlantic salmon assembly can serve as a reference sequence for the study of other salmonids for a range of purposes.
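The retention comparison in this abstract can be framed as a 2 × 2 contingency table: genes classified by whether they were retained as duplicates after the teleost-specific whole-genome duplication (abbreviated TGD here) and whether they were retained after the Ss4R. As a minimal sketch under that framing (the abstract does not say which statistical test the authors used), a one-sided Fisher's exact test can be computed from hypergeometric probabilities with the standard library. The function names and the counts in the example are hypothetical.

```python
from math import comb


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Sample odds ratio (a*d)/(b*c) for the 2x2 table [[a, b], [c, d]]."""
    if b * c == 0:
        return float("inf") if a * d else float("nan")
    return (a * d) / (b * c)


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test for the 2x2 table
        [[a, b],
         [c, d]]
    P-value for an odds ratio > 1: the probability, with all margins fixed,
    of a top-left count at least as large as the observed a."""
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)
    # Hypergeometric P(X = x) = C(row1, x) * C(n - row1, col1 - x) / C(n, col1)
    upper = min(row1, col1)
    return sum(comb(row1, x) * comb(n - row1, col1 - x)
               for x in range(a, upper + 1)) / total


if __name__ == "__main__":
    # Hypothetical counts (NOT data from the study):
    # rows    = retained as duplicates after the TGD (yes / no)
    # columns = retained as duplicates after the Ss4R (yes / no)
    a, b, c, d = 30, 70, 300, 700
    print("odds ratio:", odds_ratio(a, b, c, d))
    print("one-sided P (OR > 1):", fisher_exact_greater(a, b, c, d))
```

An odds ratio near 1 with a large P-value would be consistent with the abstract's finding that TGD-retained genes were not more likely to be retained after the Ss4R.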
Subject(s)
Diploidy , Evolution, Molecular , Gene Duplication/genetics , Genes, Duplicate/genetics , Genome/genetics , Salmo salar/genetics , Animals , DNA Transposable Elements/genetics , Female , Genomics , Male , Models, Genetic , Mutagenesis/genetics , Phylogeny , Reference Standards , Salmo salar/classification , Sequence Homology
ABSTRACT
With the large collections of gene and genome sequences now available, there is a need for curated comparative genomic databases that enable results to be interpreted in an evolutionary context. Such resources can facilitate an understanding of the co-evolution of genes in the context of a genome mapped onto a phylogeny, of a protein structure, and of interactions within a pathway. A phylogenetically indexed gene family database, The Adaptive Evolution Database (TAED), is presented that organizes gene families and their evolutionary histories in a species tree context. Gene families include alignments, phylogenetic trees, lineage-specific dN/dS ratios, reconciliation with the species tree to enable both the mapping and the identification of duplication events, mapping of gene families onto pathways, and mapping of amino acid substitutions onto protein structures. In addition to organizing the data, new phylogenetic visualization tools that aid in interpreting the data, including TreeThrasher and TAED Tree Viewer, have been developed and are also available. This new resource of gene families organized by species and taxonomic lineage promises to be a valuable comparative genomics database for molecular biologists, evolutionary biologists, and ecologists. The new visualization tools and database framework will be of interest to both evolutionary biologists and bioinformaticians.
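The species-tree reconciliation described in this abstract identifies duplication events by mapping each gene-tree node onto the species tree. A common way to do this is least-common-ancestor (LCA) mapping; the sketch below assumes that technique, rooted binary trees stored as nested tuples, and gene-tree leaves already labeled with their species. It is an illustration of the idea, not TAED's implementation, and all function names are hypothetical.

```python
from typing import FrozenSet, List, Tuple, Union

# Rooted binary tree as nested 2-tuples; leaves are species names.
# (Hypothetical representation, not TAED's storage format.)
Tree = Union[str, Tuple["Tree", "Tree"]]


def leaf_set(t: Tree) -> FrozenSet[str]:
    """Set of leaf labels below node t."""
    if isinstance(t, str):
        return frozenset([t])
    return leaf_set(t[0]) | leaf_set(t[1])


def species_clades(sp: Tree) -> List[FrozenSet[str]]:
    """Every clade of the species tree, represented by its leaf set."""
    clades = [leaf_set(sp)]
    if not isinstance(sp, str):
        clades += species_clades(sp[0]) + species_clades(sp[1])
    return clades


def lca(taxa: FrozenSet[str], clades: List[FrozenSet[str]]) -> FrozenSet[str]:
    """Smallest species-tree clade containing all taxa. Clades containing a
    given set are nested, so the smallest one is unique."""
    return min((c for c in clades if taxa <= c), key=len)


def count_duplications(gene: Tree, species: Tree) -> int:
    """Count gene-tree nodes inferred as duplications by LCA mapping: a node
    is a duplication if it maps to the same species clade as at least one of
    its children; otherwise it is treated as a speciation."""
    clades = species_clades(species)
    duplications = 0

    def mapping(t: Tree) -> FrozenSet[str]:
        nonlocal duplications
        if isinstance(t, str):
            return lca(frozenset([t]), clades)
        left, right = mapping(t[0]), mapping(t[1])
        here = lca(left | right, clades)
        if here == left or here == right:
            duplications += 1
        return here

    mapping(gene)
    return duplications


if __name__ == "__main__":
    species = (("human", "mouse"), "chicken")
    gene = ((("human", "mouse"), ("human", "mouse")), "chicken")
    print(count_duplications(gene, species))  # 1
    # A gene tree that conflicts with the species tree also yields a duplication:
    print(count_duplications((("human", "chicken"), "mouse"), species))  # 1
```

LCA mapping interprets every topological conflict with the species tree as a duplication, so errors in the gene tree can inflate duplication counts.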
Subject(s)
Chordata/genetics , Databases, Genetic , Evolution, Molecular , Genomics/methods , Multigene Family , Animals , Phylogeny , Sequence Analysis, DNA/methods , Software
ABSTRACT
Biochemical thought posits that rate-limiting steps (defined here as points of flux control) are strongly selected as points of pathway regulation and control and are thus expected to be evolutionarily conserved. Conversely, population genetic thought based upon the concepts of mutation-selection-drift balance at the pathway level might suggest variation in flux-controlling steps over evolutionary time. Glycolysis, as one of the most conserved and best characterized pathways, was studied to evaluate its evolutionary conservation. The flux-controlling step in glycolysis was found to vary over the tree of life. Further, phylogenetic analysis suggested at least 60 events of gene duplication and additional events of putative positive selection that might alter pathway kinetic properties. Together, these results suggest that, even with presumed largely negative selection on glycolytic pathway output, the underlying co-evolutionary process is dynamic.
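The abstract defines rate-limiting steps as points of flux control. In metabolic control analysis this is quantified by the flux control coefficient C_i = d ln J / d ln e_i, the relative change in pathway flux J per relative change in the activity e_i of enzyme i; for an unbranched pathway the coefficients sum to 1 (the summation theorem). The sketch below is a toy model, not the glycolysis analysis from the study: it assumes a linear chain of reversible steps with first-order kinetics v_i = e_i k_i (X_{i-1} - X_i / K_i), a fixed substrate S and product P, and hypothetical parameter values, and it estimates each C_i numerically.

```python
import math
from typing import List, Sequence


def steady_state_flux(e: Sequence[float], k: Sequence[float],
                      K: Sequence[float], S: float, P: float) -> float:
    """Steady-state flux J through an unbranched chain of reversible steps
    v_i = e_i * k_i * (X_{i-1} - X_i / K_i), with X_0 = S and X_n = P fixed
    (toy model, assumed). Each step contributes a 'resistance'
    r_i = 1 / (e_i * k_i * K_1 * ... * K_{i-1})."""
    resistances = []
    k_before = 1.0  # product of equilibrium constants upstream of step i
    for e_i, k_i, K_i in zip(e, k, K):
        resistances.append(1.0 / (e_i * k_i * k_before))
        k_before *= K_i
    # k_before now holds the overall equilibrium constant of the chain.
    return (S - P / k_before) / sum(resistances)


def flux_control_coefficients(e, k, K, S, P, h=1e-6) -> List[float]:
    """C_i = d ln J / d ln e_i, estimated by central differences in log space.
    Requires S > P / prod(K) so that J > 0."""
    coefficients = []
    for i in range(len(e)):
        up, down = list(e), list(e)
        up[i] *= math.exp(h)
        down[i] *= math.exp(-h)
        coefficients.append(
            (math.log(steady_state_flux(up, k, K, S, P))
             - math.log(steady_state_flux(down, k, K, S, P))) / (2 * h))
    return coefficients


if __name__ == "__main__":
    # Hypothetical three-step pathway; step 2 has the lowest enzyme level.
    e, k, K = [1.0, 0.1, 1.0], [1.0, 1.0, 1.0], [2.0, 2.0, 2.0]
    C = flux_control_coefficients(e, k, K, S=10.0, P=1.0)
    print([round(c, 3) for c in C], "sum =", round(sum(C), 6))
```

For this chain the coefficients also have a closed form, C_i = r_i / (r_1 + ... + r_n), so the example parameters give 0.16, 0.8 and 0.04. Control is concentrated at the step with the smallest enzyme level, but it also depends on the upstream equilibrium constants, which illustrates how the controlling step can shift when kinetic parameters change.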
Subject(s)
Biological Evolution , Glycolysis/genetics , Metabolic Networks and Pathways/genetics , Gene Duplication , Glycolysis/physiology , Kinetics , Life , Metabolic Networks and Pathways/physiology , Models, Chemical , Phylogeny , Systems Biology
ABSTRACT
BACKGROUND: The number of species with completed genomes, including those with evidence for recent whole genome duplication events, has exploded. The recently sequenced Atlantic salmon genome has been through two rounds of whole genome duplication since the divergence of teleost fish from the lineage that led to amniotes. This quadrupling of the number of potential genes has led to complex patterns of retention and loss among gene families. RESULTS: Methods have been developed to characterize the interplay of duplicate gene retention processes across both whole genome duplication events and additional smaller-scale duplication events. Further, gene expression divergence data have become available for Atlantic salmon and for the closely related pike, whose lineage diverged before the salmonid-specific whole genome duplication, and methods to describe expression divergence are also presented. We describe these methods for characterizing duplicate gene retention and gene expression divergence as they have been applied to salmon. CONCLUSIONS: With the growth in available genomic and functional data, the opportunities to extract functional inference from large-scale duplicates using comparative methods have expanded dramatically. Recently developed methods that further this inference for duplicated genes have been described.
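Expression divergence between duplicates is often described by comparing each copy with an ortholog from an unduplicated outgroup (pike would play that role here) as a proxy for the ancestral state. The sketch below is a simplified distance-threshold scheme, assumed for illustration and not necessarily the method described in this work: expression profiles across tissues are scaled to sum to 1, and each copy, as well as the summed expression of both copies, is compared with the outgroup by Euclidean distance against a user-chosen cutoff. Function names, the cutoff and the example profiles are hypothetical.

```python
import math
from typing import List, Sequence


def relative(profile: Sequence[float]) -> List[float]:
    """Scale an expression profile across tissues so it sums to 1."""
    total = sum(profile)
    if total <= 0:
        raise ValueError("profile has no expression")
    return [x / total for x in profile]


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two profiles of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify_duplicates(copy1: Sequence[float], copy2: Sequence[float],
                        outgroup: Sequence[float], cutoff: float) -> str:
    """Classify a duplicate pair against an outgroup ortholog used as a proxy
    for the ancestral profile (simplified scheme, assumed). Raw expression
    values of the two copies are summed, so they must be on the same scale."""
    anc = relative(outgroup)
    d1 = distance(relative(copy1), anc)
    d2 = distance(relative(copy2), anc)
    combined = [x + y for x, y in zip(copy1, copy2)]
    d12 = distance(relative(combined), anc)
    if d1 <= cutoff and d2 <= cutoff:
        return "conservation"
    if d1 > cutoff and d2 <= cutoff:
        return "neofunctionalization of copy 1"
    if d2 > cutoff and d1 <= cutoff:
        return "neofunctionalization of copy 2"
    if d12 <= cutoff:
        return "subfunctionalization"  # copies jointly recover the ancestral profile
    return "specialization"  # copies and their sum all diverge from the ancestor


if __name__ == "__main__":
    # Hypothetical expression in four tissues; outgroup expressed in tissues 1 and 2.
    outgroup = [10.0, 10.0, 0.0, 0.0]
    print(classify_duplicates([8.0, 0.0, 0.0, 0.0], [0.0, 12.0, 0.0, 0.0],
                              outgroup, cutoff=0.2))  # subfunctionalization
```

In this scheme, subfunctionalization means that neither copy resembles the ancestor alone but together they recover its profile, while neofunctionalization means that one copy keeps the ancestral profile and the other diverges. The outcome is sensitive to the cutoff, which would need to be calibrated on real data.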
ABSTRACT
BACKGROUND: Selection on proteins is typically measured with the assumption that each protein acts independently. However, selection more likely acts at higher levels of biological organization, requiring an integrative view of protein function. Here, we built a kinetic model for de novo pyrimidine biosynthesis in the yeast Saccharomyces cerevisiae to relate pathway function to selective pressures on individual protein-encoding genes. RESULTS: Gene families across yeast were constructed for each member of the pathway and the ratio of nonsynonymous to synonymous nucleotide substitution rates (dN/dS) was estimated for each enzyme from S. cerevisiae and closely related species. We found a positive relationship between the influence that each enzyme has on pathway function and its selective constraint. CONCLUSIONS: We expect this trend to be locally present for enzymes that have pathway control, but over longer evolutionary timescales we expect that mutation-selection balance may change the enzymes that have pathway control.
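The abstract reports a positive relationship between each enzyme's influence on pathway function and its selective constraint, so enzymes with more pathway control would be expected to show lower dN/dS. One way to quantify such a relationship is a rank correlation; the abstract does not say which statistic was used. The sketch below computes Spearman's rho (Pearson correlation of ranks, with tied values given their average rank); the control coefficients and dN/dS values are hypothetical.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical values for four enzymes (NOT data from the study).
    control = [0.60, 0.25, 0.10, 0.05]  # flux control coefficients
    dn_ds = [0.02, 0.05, 0.08, 0.12]    # per-enzyme dN/dS
    print("Spearman rho:", spearman(control, dn_ds))
```

A negative rho between control and dN/dS corresponds to the positive relationship between control and constraint described in the abstract.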
Subject(s)
Biosynthetic Pathways , Evolution, Molecular , Pyrimidines/biosynthesis , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae/enzymology , Saccharomyces cerevisiae/genetics , Mutation , Phylogeny , Saccharomyces cerevisiae/metabolism
ABSTRACT
BACKGROUND: Most fishes possess two paralogs for myostatin, a muscle growth inhibitor, while salmonids are presumed to have four: mstn1a, mstn1b, mstn2a and mstn2b, a pseudogene. The mechanisms responsible for preserving these duplicates, as well as the depth of mstn2b nonfunctionalization within the family, remain unknown. We therefore characterized several genomic clones in order to better define species and gene phylogenies. RESULTS: Gene organization and sequence conservation were particularly evident among paralog groupings and within salmonid subfamilies. All mstn2b sequences included in-frame stop codons, confirming its nonfunctionalization across taxa, although the indels and polymorphisms responsible often differed. For example, the specific indels within the Oncorhynchus tshawytscha and O. nerka genes were remarkably similar and differed equally from other mstn2b orthologs. A phylogenetic analysis weakly supported an mstn2b clade including only these species, which, coupled with a shared 51-base-pair deletion, might suggest a history involving hybridization or a shared phylogenetic history. Furthermore, mstn2 introns all lacked conserved splice site motifs, suggesting that the tissue-specific processing of mstn2a transcripts, but not those of mstn2b, is due to alternative cis regulation and is likely a common feature in salmonids. This also suggests that limited transcript processing may have contributed to mstn2b nonfunctionalization. CONCLUSIONS: Previous studies revealed divergence within gene promoters, while the current study provides evidence for relaxed or positive selection in some coding sequence lineages. Together, these results suggest that the salmonid myostatin gene family is a novel resource for investigating mechanisms that regulate duplicate gene fate, as paralog-specific differences in gene expression, transcript processing and protein structure are all suggestive of active divergence.
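Two checks mentioned in this abstract, in-frame stop codons in mstn2b coding sequences and splice site motifs at intron boundaries, are simple sequence scans. The sketch below assumes coding sequences that start in frame at position 0 and intron sequences already excised using annotated exon-intron boundaries; it uses only the canonical GT-AG dinucleotides as a stand-in for "conserved splice site motifs", which is a simplification. The sequences and function names are hypothetical.

```python
from typing import List

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def in_frame_stops(cds: str) -> List[int]:
    """0-based start positions of every stop codon in reading frame 1."""
    seq = cds.upper()
    return [i for i in range(0, len(seq) - 2, 3) if seq[i:i + 3] in STOP_CODONS]


def premature_stops(cds: str) -> List[int]:
    """In-frame stops other than one occupying the final complete codon,
    which would be the expected terminal stop of a functional gene."""
    last_codon = 3 * (len(cds) // 3 - 1)
    return [i for i in in_frame_stops(cds) if i != last_codon]


def has_canonical_splice_sites(intron: str) -> bool:
    """True if the intron begins with GT and ends with AG (the GT-AG rule).
    Real splice signals extend beyond these two dinucleotides."""
    s = intron.upper()
    return len(s) >= 4 and s.startswith("GT") and s.endswith("AG")


if __name__ == "__main__":
    # Hypothetical toy sequences, not salmonid mstn data.
    cds = "ATGGCTTAAGGCTGA"
    print("premature stops at:", premature_stops(cds))
    print("canonical splice sites:", has_canonical_splice_sites("GTAAGTCCTTTTAG"))
```

A nonempty list of premature stops flags a sequence as a likely pseudogene in this simple test; indel-induced frameshifts that create the stops would need to be identified separately from an alignment.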
Subject(s)
Fish Proteins/genetics , Multigene Family , Myostatin/genetics , Salmonidae/genetics , Amino Acid Sequence , Animals , Base Sequence , Cloning, Molecular , Fish Proteins/classification , Genes, Duplicate , Genetic Variation , Models, Genetic , Molecular Sequence Data , Myostatin/classification , Oncorhynchus mykiss/genetics , Phylogeny , Protein Isoforms/classification , Protein Isoforms/genetics , Salmonidae/classification , Sequence Analysis, DNA , Sequence Homology, Amino Acid , Species Specificity , Trout/genetics
ABSTRACT
Pyrenophora semeniperda (anamorph Drechslera campulata) is a necrotrophic fungal seed pathogen with a wide host range within the Poaceae. One of its hosts is cheatgrass (Bromus tectorum), a species exotic to the United States that has invaded natural ecosystems of the Intermountain West. As a natural pathogen of cheatgrass, P. semeniperda has potential as a biocontrol agent because it is effective at killing seeds within the seed bank; however, few genetic resources exist for the fungus. Here, the genome of a P. semeniperda isolate, assembled from 454 pyrosequencing reads, is presented. The total assembly is 32.5 Mb and includes 11,453 gene models encoding putative proteins larger than 24 amino acids. The models include a variety of putative genes involved in pathogenic pathways typically found in necrotrophic fungi. In addition, extensive rearrangements, including inter- and intrachromosomal rearrangements, were found when the P. semeniperda genome was compared with that of P. tritici-repentis, a related fungal species.
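The assembly figures reported here (total assembly size, and a gene model count restricted to predicted proteins longer than 24 amino acids) are straightforward to compute from FASTA files. The sketch below parses FASTA text, sums contig lengths and applies the protein length filter; it also computes N50, a standard contiguity statistic that the abstract itself does not report. File contents and function names are hypothetical.

```python
from typing import Dict, Iterable, List, Optional


def parse_fasta(text: str) -> Dict[str, str]:
    """Map record IDs (first word of each header) to their sequences."""
    records: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return {rid: "".join(parts) for rid, parts in records.items()}


def n50(lengths: Iterable[int]) -> int:
    """Largest length L such that contigs of length >= L cover at least
    half of the total assembly length."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("empty assembly")


def count_long_models(protein_lengths: Iterable[int], min_aa: int = 24) -> int:
    """Number of gene models whose predicted protein is longer than min_aa."""
    return sum(1 for n in protein_lengths if n > min_aa)


if __name__ == "__main__":
    # Hypothetical toy assembly and protein lengths.
    contigs = parse_fasta(">c1\nACGTACGT\nACGT\n>c2\nACGTAC\n>c3\nACG\n")
    lengths = [len(s) for s in contigs.values()]
    print("total length:", sum(lengths), "N50:", n50(lengths))
    print("models > 24 aa:", count_long_models([10, 24, 25, 300]))
```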
Subject(s)
Ascomycota/genetics , Bromus/microbiology , Genome Components/genetics , Genome, Fungal/genetics , Base Sequence , DNA, Complementary/genetics , Idaho , Molecular Sequence Data , Oligonucleotides/genetics , Sequence Analysis, DNA
ABSTRACT
Although eusociality evolved independently within several orders of insects, research into the molecular underpinnings of the transition towards social complexity has been confined primarily to Hymenoptera (for example, ants and bees). Here we sequence the genome and stage-specific transcriptomes of the dampwood termite Zootermopsis nevadensis (Blattodea) and compare them with similar data for eusocial Hymenoptera, to better identify commonalities and differences in achieving this significant transition. We show an expansion of genes related to male fertility, with upregulated gene expression in male reproductive individuals reflecting the profound differences in mating biology relative to the Hymenoptera. For several chemoreceptor families, we show divergent numbers of genes, which may correspond to the more claustral lifestyle of these termites. We also show similarities in the number and expression of genes related to caste determination mechanisms. Finally, patterns of DNA methylation and alternative splicing support a hypothesized epigenetic regulation of caste differentiation.
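In insects, CpG depletion within genes is widely used as a proxy for historical germline DNA methylation, because methylated cytosines in CpG dinucleotides tend to mutate to thymine by deamination. The abstract does not specify how methylation patterns were assessed; the sketch below only shows the standard observed/expected CpG calculation, (CpG count × length) / (C count × G count). The function name and example sequences are hypothetical.

```python
def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (CpG count * length) / (C count * G count).
    Returns NaN when the sequence contains no C or no G."""
    s = seq.upper()
    c, g = s.count("C"), s.count("G")
    if c == 0 or g == 0:
        return float("nan")
    # "CG" cannot overlap itself, so str.count gives the dinucleotide count.
    return s.count("CG") * len(s) / (c * g)


if __name__ == "__main__":
    # Hypothetical toy coding sequences, not Zootermopsis data.
    genes = {"geneA": "ATGCGCGTACGATCG", "geneB": "ATGCTGTTCAAGTGA"}
    for name, cds in genes.items():
        print(name, round(cpg_obs_exp(cds), 2))
```

Genes with low ratios would be candidates for historical germline methylation under this proxy; direct measurements such as bisulfite sequencing are needed to confirm methylation status.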
Subject(s)
Fertility/genetics , Gene Expression Regulation , Insect Proteins/genetics , Isoptera/genetics , Reproduction/genetics , Social Behavior , Alternative Splicing , Animals , DNA Methylation , Epigenesis, Genetic , Gene Expression Profiling , Genome , Insect Proteins/metabolism , Male
ABSTRACT
Next-gen sequencing technologies have revolutionized data collection in genetic studies and advanced genome biology to novel frontiers. However, to date, next-gen technologies have been used principally for whole genome sequencing and transcriptome sequencing. Yet many questions in population genetics and systematics rely on sequencing specific genes of known function or diversity levels. Here, we describe a targeted amplicon sequencing (TAS) approach that capitalizes on next-gen capacity to sequence large numbers of targeted gene regions from a large number of samples. Our TAS approach is easily scalable, simple in execution, neither time- nor labor-intensive, relatively inexpensive, and can be applied to a broad diversity of organisms and/or genes. It includes a bioinformatic application, BarcodeCrucher, that takes raw next-gen sequence reads, performs quality control checks, and converts the data into FASTA format organized by gene and sample, ready for phylogenetic analyses. We demonstrate our approach by sequencing targeted genes of known phylogenetic utility to estimate a phylogeny for the Pancrustacea. We generated data from 44 taxa using 68 different 10-bp multiplexing identifiers. The data produced were of robust overall quality and informative for phylogeny estimation. The capacity of this method to produce copious amounts of data from a single 454 plate (e.g., 325 taxa for 24 loci) substantially reduces sequencing costs relative to traditional Sanger sequencing. We further discuss the advantages and disadvantages of this method and offer suggestions to enhance the approach.
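The read-processing step described in this abstract (raw reads in, quality-checked FASTA organized by gene and sample out) can be sketched as a demultiplexing function. This is not BarcodeCrucher's code: it assumes each read begins with a 10-bp multiplexing identifier followed by a locus-specific primer, requires exact matches, and filters on insert length and mean Phred quality with arbitrary thresholds. All names, sequences and thresholds are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Read = Tuple[str, str, List[int]]  # (read ID, sequence, per-base Phred scores)
Bins = Dict[Tuple[str, str], List[Tuple[str, str]]]


def demultiplex(reads: List[Read], barcodes: Dict[str, str],
                primers: Dict[str, str], barcode_len: int = 10,
                min_mean_q: float = 20.0, min_len: int = 50) -> Bins:
    """Bin reads by (sample, gene). Assumed read layout:
    identifier (barcode_len bp) + locus-specific primer + amplicon insert."""
    bins: Bins = defaultdict(list)
    for read_id, seq, quals in reads:
        sample = barcodes.get(seq[:barcode_len])
        if sample is None:
            continue  # identifier not in the (hypothetical) sample sheet
        rest = seq[barcode_len:]
        gene = next((g for g, p in primers.items() if rest.startswith(p)), None)
        if gene is None:
            continue  # no known primer follows the identifier
        start = barcode_len + len(primers[gene])
        insert, insert_q = seq[start:], quals[start:]
        if (not insert_q or len(insert) < min_len
                or sum(insert_q) / len(insert_q) < min_mean_q):
            continue  # too short or low mean quality
        bins[(sample, gene)].append((read_id, insert))
    return dict(bins)


def fasta_by_gene(bins: Bins) -> Dict[str, str]:
    """One FASTA text per gene; each header is 'sample|readID'."""
    per_gene: Dict[str, List[str]] = defaultdict(list)
    for (sample, gene), records in sorted(bins.items()):
        for read_id, insert in records:
            per_gene[gene].append(f">{sample}|{read_id}\n{insert}\n")
    return {gene: "".join(lines) for gene, lines in per_gene.items()}


if __name__ == "__main__":
    barcodes = {"ACGTACGTAC": "taxonA"}  # hypothetical identifier -> sample
    primers = {"18S": "GGTT"}            # hypothetical gene -> primer
    reads = [("r1", "ACGTACGTAC" + "GGTT" + "A" * 60, [30] * 74)]
    print(fasta_by_gene(demultiplex(reads, barcodes, primers)))
```

A real pipeline would also need to tolerate sequencing errors within identifiers and primers; this sketch drops any read without exact matches.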