ABSTRACT
Endogenous retroviruses (ERVs) are inherited genomic remains of past germline retroviral infections. Research on human ERVs has focused on medical implications of their dysregulation on various diseases. However, recent studies incorporating wildlife are yielding remarkable perspectives on long-term retrovirus-host interactions. These initial forays into broader taxonomic analysis, including sequencing of multiple individuals per species, show the incredible plasticity and variation of ERVs within and among wildlife species. This demonstrates that stochastic processes govern much of the vertebrate genome. In this review, we elaborate on discoveries pertaining to wildlife ERV origins and evolution, genome colonization, and consequences for host biology.
Subject(s)
Endogenous Retroviruses , Animals , Humans , Animals, Wild/genetics , Vertebrates/genetics , Genomics , Evolution, Molecular , Phylogeny
ABSTRACT
Endogenous retroviruses (ERVs) make up a large fraction of mammalian genomes and are thought to contribute to human disease, including brain disorders. In the brain, aberrant activation of ERVs is a potential trigger for an inflammatory response, but mechanistic insight into this phenomenon remains lacking. Using CRISPR/Cas9-based gene disruption of the epigenetic co-repressor protein Trim28, we found a dynamic H3K9me3-dependent regulation of ERVs in proliferating neural progenitor cells (NPCs), but not in adult neurons. In vivo deletion of Trim28 in cortical NPCs during mouse brain development resulted in viable offspring expressing high levels of ERVs in excitatory neurons in the adult brain. Neuronal ERV expression was linked to activated microglia and the presence of ERV-derived proteins in aggregate-like structures. This study demonstrates that brain development is a critical period for the silencing of ERVs and provides causal in vivo evidence demonstrating that transcriptional activation of ERV in neurons results in an inflammatory response.
Subject(s)
Brain/growth & development , Encephalitis/genetics , Endogenous Retroviruses/genetics , Gene Deletion , Tripartite Motif-Containing Protein 28/genetics , Animals , Brain/immunology , Brain/virology , CRISPR-Cas Systems , Cells, Cultured , Encephalitis/immunology , Encephalitis/virology , Endogenous Retroviruses/immunology , Epigenesis, Genetic , Gene Expression Regulation , Histones/metabolism , Mice , Transcriptional Activation
ABSTRACT
Retroviruses have left their legacy in host genomes over millions of years as endogenous retroviruses (ERVs), and their structure, diversity, and prevalence provide insights into the historical dynamics of retrovirus-host interactions. In bioinformatic analyses of koala (Phascolarctos cinereus) whole-genome sequences, we identify a recently expanded ERV lineage (phaCin-β) that is related to the New World squirrel monkey retrovirus. This ERV expansion shares many parallels with the ongoing koala retrovirus (KoRV) invasion of the koala genome, including highly similar and mostly intact sequences, and polymorphic ERV loci in the sampled koala population. The recent phaCin-β ERV colonization of the koala genome appears to predate the current KoRV invasion, but polymorphic ERVs and divergence comparisons between these two lineages predict a currently uncharacterized, possibly still extant, phaCin-β retrovirus. The genomics approach to ERV-guided discovery of novel retroviruses in host species provides a strong incentive to search for phaCin-β retroviruses in the Australasian fauna.
Subject(s)
Betaretrovirus , Endogenous Retroviruses , Host Microbial Interactions , Phascolarctidae , Retroviridae Infections , Animals , Betaretrovirus/genetics , Endogenous Retroviruses/genetics , Evolution, Molecular , Genome , Genomics , Phascolarctidae/genetics , Phascolarctidae/virology , Retroviridae Infections/veterinary , Retroviridae Infections/virology
ABSTRACT
Atlantic halibut (Hippoglossus hippoglossus) has an X/Y genetic sex determination system, but the sex-determining factor is not known. We produced a high-quality genome assembly from a male and identified parts of chromosome 13 as the Y chromosome due to sequence divergence between sexes and segregation of sex genotypes in pedigrees. Linkage analysis revealed that all chromosomes exhibit heterochiasmy, i.e. male-only and female-only meiotic recombination regions (MRR/FRR). We show that FRR/MRR intervals differ in nucleotide diversity and repeat class content and that this is also true for other Pleuronectidae species. We further show that remnants of a Gypsy-like transposable element insertion on chr13 promote early male-specific expression of gonadal somatic cell derived factor (gsdf). Less than 4.5 MYA, this male-determining element evolved on an autosomal FRR segment featuring pre-existing male meiotic recombination barriers, thereby creating a Y chromosome. Our findings indicate that heterochiasmy may facilitate the evolution of genetic sex determination systems relying on linkage of sexually antagonistic loci to a sex-determining factor.
Subject(s)
Fish Proteins/genetics , Flounder/genetics , Recombination, Genetic , Sex Determination Processes , Animals , DNA Transposable Elements , Embryo, Nonmammalian , Female , Flounder/embryology , Gene Expression , Genome , Male , Meiosis , Promoter Regions, Genetic , Repetitive Sequences, Nucleic Acid , Sex Chromosomes , Y Chromosome
ABSTRACT
Although recent advances in sequencing and computational analyses have facilitated use of endogenous retroviruses (ERVs) for deciphering coevolution among retroviruses and their hosts, sampling effects from different host populations present major challenges. Here we utilize available whole-genome data from wild and domesticated European rabbit (Oryctolagus cuniculus sp.) populations, sequenced as DNA pools by paired-end Illumina technology, for identifying segregating reference as well as nonreference ERV loci, to reveal their variation along the host phylogeny and domestication history. To produce new viruses, retroviruses must insert a proviral DNA copy into the host nuclear DNA. Occasional proviral insertions into the host germline have been passed down through generations as inherited ERVs during millions of years. These ERVs represent retroviruses that were active at the time of infection and thus present a remarkable record of historical virus-host associations. To examine segregating ERVs in host populations, we apply a reference library search strategy for anchoring ERV-associated short-sequence read pairs from pooled whole-genome sequences to reference genome assembly positions. We show that most ERVs segregate along host phylogeny but also uncover radiation of some ERVs, identified as segregating loci among wild and domestic rabbits. The study targets pertinent issues regarding genome sampling when examining virus-host evolution from the genomic ERV record and offers improved scope regarding common strategies for single-nucleotide variant analyses in host population comparative genomics.
Subject(s)
Animals, Domestic/virology , Endogenous Retroviruses/genetics , Genome, Viral/genetics , Host Specificity/genetics , Animals , Comparative Genomic Hybridization/methods , DNA/genetics , Genome-Wide Association Study/methods , Genomics/methods , Phylogeny , Polymorphism, Single Nucleotide/genetics , Rabbits
ABSTRACT
Although extensive research has demonstrated host-retrovirus microevolutionary dynamics, it has been difficult to gain a deeper understanding of the macroevolutionary patterns of host-retrovirus interactions. Here we use recent technological advances to infer broad patterns in retroviral diversity, evolution, and host-virus relationships through a large-scale phylogenomic analysis of endogenous retroviruses (ERVs). Retroviruses insert a proviral DNA copy into the host cell genome to produce new viruses. ERVs are provirus insertions in germline cells that are inherited down the host lineage and consequently present a record of past host-viral associations. By mining ERVs from 65 host genomes sampled across vertebrate diversity, we uncover a great diversity of ERVs, indicating that retroviral sequences are much more prevalent and widespread across vertebrates than previously appreciated. The majority of ERV clades that we recover do not contain known retroviruses, implying either that retroviral lineages are highly transient over evolutionary time or that a considerable number of retroviruses remain to be identified. By characterizing the distribution of ERVs, we show that no major vertebrate lineage has escaped retroviral activity and that retroviruses are extreme host generalists, having an unprecedented ability for rampant host switching among distantly related vertebrates. In addition, we examine whether the distribution of ERVs can be explained by host factors predicted to influence viral transmission and find that internal fertilization has a pronounced effect on retroviral colonization of host genomes. By capturing the mode and pattern of retroviral evolution and contrasting ERV diversity with known retroviral diversity, our study provides a cohesive framework to better understand host-virus coevolution.
Subject(s)
Endogenous Retroviruses/genetics , Evolution, Molecular , Retroviridae/genetics , Vertebrates/genetics , Vertebrates/virology , Animals , Ecosystem , Endogenous Retroviruses/pathogenicity , Endogenous Retroviruses/physiology , Genetic Variation , Genome, Viral , Genomics , Host Specificity/genetics , Host-Pathogen Interactions/genetics , Humans , Phylogeny , Retroviridae/pathogenicity , Retroviridae/physiology
ABSTRACT
High-throughput genotyping and sequencing technologies facilitate studies of complex genetic traits and provide new research opportunities. The increasing popularity of genome-wide association studies (GWAS) leads to the discovery of new associated loci and a better understanding of the genetic architecture underlying not only diseases, but also other monogenic and complex phenotypes. Several software packages are available for performing GWAS analyses, the R environment being one of them. RESULTS: We present cgmisc, an R package that enables enhanced data analysis and visualization of results from GWAS. The package contains several utilities and modules that complement and enhance the functionality of the existing software. It also provides several tools for advanced visualization of genomic data and utilizes the power of the R language to aid in preparation of publication-quality figures. Some of the package functions are specific to domestic dog (Canis familiaris) data. AVAILABILITY AND IMPLEMENTATION: The package is operating system-independent and is available from: https://github.com/cgmisc-team/cgmisc CONTACT: marcin.kierczak@imbim.uu.se. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
Computer Graphics , Genome-Wide Association Study/methods , Genomics/methods , Software , Animals , Dogs , Genotype , Humans , Loss of Heterozygosity , Phenotype
ABSTRACT
Retroviruses are the only group of viruses known to have left a fossil record, in the form of endogenous proviruses, and approximately 8% of the human genome is made up of these elements. Although many other viruses, including non-retroviral RNA viruses, are known to generate DNA forms of their own genomes during replication, none has been found as DNA in the germline of animals. Bornaviruses, a genus of non-segmented, negative-sense RNA virus, are unique among RNA viruses in that they establish persistent infection in the cell nucleus. Here we show that elements homologous to the nucleoprotein (N) gene of bornavirus exist in the genomes of several mammalian species, including humans, non-human primates, rodents and elephants. These sequences have been designated endogenous Borna-like N (EBLN) elements. Some of the primate EBLNs contain an intact open reading frame (ORF) and are expressed as mRNA. Phylogenetic analyses showed that EBLNs seem to have been generated by different insertional events in each specific animal family. Furthermore, the EBLN of a ground squirrel was formed by a recent integration event, whereas those in primates must have been formed more than 40 million years ago. We also show that the N mRNA of a current mammalian bornavirus, Borna disease virus (BDV), can form EBLN-like elements in the genomes of persistently infected cultured cells. Our results provide the first evidence for endogenization of non-retroviral virus-derived elements in mammalian genomes and give novel insights not only into generation of endogenous elements, but also into a role of bornavirus as a source of genetic novelty in its host.
Subject(s)
Bornaviridae/genetics , Genes, Viral/genetics , Genome/genetics , Mammals/genetics , Mammals/virology , Virus Integration/genetics , Amino Acid Sequence , Animals , Borna disease virus/genetics , Borna disease virus/physiology , Bornaviridae/physiology , Cell Line , Conserved Sequence/genetics , Evolution, Molecular , Host-Pathogen Interactions/genetics , Humans , Models, Genetic , Molecular Sequence Data , Open Reading Frames/genetics , Phylogeny , Reverse Transcription , Time Factors
ABSTRACT
Genomic data provide an excellent resource to improve understanding of retrovirus evolution and the complex relationships among viruses and their hosts. In conjunction with broad-scale in silico screening of vertebrate genomes, this resource offers an opportunity to complement data on the evolution and frequency of past retroviral spread and so evaluate future risks and limitations for horizontal transmission between different host species. Here, we develop a methodology for extracting phylogenetic signal from large endogenous retrovirus (ERV) datasets by collapsing information to facilitate broad-scale phylogenomics across a wide sample of hosts. Starting with nearly 90,000 ERVs from 60 vertebrate host genomes, we construct phylogenetic hypotheses and draw inferences regarding the designation, host distribution, origin, and transmission of the Gammaretrovirus genus and associated class I ERVs. Our results uncover remarkable depths in retroviral sequence diversity, supported within a phylogenetic context. This finding suggests that current infectious exogenous retrovirus diversity may be underestimated, adding credence to the possibility that many additional exogenous retroviruses may remain to be discovered in vertebrate taxa. We demonstrate a history of frequent horizontal interorder transmissions from a rodent reservoir and suggest that rats may have acted as important overlooked facilitators of gammaretrovirus spread across diverse mammalian hosts. Together, these results demonstrate the promise of the methodology used here to analyze large ERV datasets and improve understanding of retroviral evolution and diversity for utilization in wider applications.
Subject(s)
Evolution, Molecular , Genetic Variation , Host-Pathogen Interactions/genetics , Phylogeny , Retroviridae/genetics , Vertebrates/genetics , Animals , Base Sequence , Disease Transmission, Infectious , Mice , Molecular Sequence Data , Rats , Sequence Alignment , Sequence Analysis, DNA , Species Specificity
ABSTRACT
Domestication of wild boar (Sus scrofa) and subsequent selection have resulted in dramatic phenotypic changes in domestic pigs for a number of traits, including behavior, body composition, reproduction, and coat color. Here we have used whole-genome resequencing to reveal some of the loci that underlie phenotypic evolution in European domestic pigs. Selective sweep analyses revealed strong signatures of selection at three loci harboring quantitative trait loci that explain a considerable part of one of the most characteristic morphological changes in the domestic pig--the elongation of the back and an increased number of vertebrae. The three loci were associated with the NR6A1, PLAG1, and LCORL genes. The latter two have repeatedly been associated with loci controlling stature in other domestic animals and in humans. Most European domestic pigs are homozygous for the same haplotype at these three loci. We found an excess of derived nonsynonymous substitutions in domestic pigs, most likely reflecting both positive selection and relaxed purifying selection after domestication. Our analysis of structural variation revealed four duplications at the KIT locus that were exclusively present in white or white-spotted pigs, carrying the Dominant white, Patch, or Belt alleles. This discovery illustrates how structural changes have contributed to rapid phenotypic evolution in domestic animals and how alleles in domestic animals may evolve by the accumulation of multiple causative mutations as a response to strong directional selection.
Subject(s)
Animals, Domestic/genetics , Genome , Selection, Genetic , Swine/genetics , Amino Acid Sequence , Animals , DNA Copy Number Variations , Homozygote , Molecular Sequence Data , Quantitative Trait Loci , Sequence Homology, Amino Acid
ABSTRACT
Koalas (Phascolarctos cinereus) have experienced a history of retroviral epidemics leaving their trace as heritable endogenous retroviruses (ERVs) in their genomes. A recently identified ERV lineage, named phaCin-β, shows a pattern of recent, possibly current, activity with high insertional polymorphism in the population. Here, we investigate geographic patterns of three focal ERV lineages of increasing estimated ages, from the koala retrovirus (KoRV) to phaCin-β and to phaCin-β-like, using the whole-genome sequencing of 430 koalas from the Koala Genome Survey. Thousands of ERV loci were found across the population, with contrasting patterns of polymorphism. Northern individuals had thousands of KoRV integrations and hundreds of phaCin-β ERVs. In contrast, southern individuals had higher phaCin-β frequencies, possibly reflecting more recent activity and a founder effect. Overall, our findings suggest high ERV burden in koalas, reflecting historic retrovirus-host interactions. Importantly, the ERV catalogue supplies improved markers for conservation genetics in this endangered species.
Subject(s)
Endogenous Retroviruses , Gammaretrovirus , Phascolarctidae , Retroviridae Infections , Humans , Animals , Endogenous Retroviruses/genetics , Phascolarctidae/genetics , Retroviridae Infections/genetics , Gammaretrovirus/genetics , Whole Genome Sequencing
ABSTRACT
BACKGROUND: Phenomena such as incomplete lineage sorting, horizontal gene transfer, gene duplication and subsequent sub- and neo-functionalisation can result in distinct local phylogenetic relationships that are discordant with species phylogeny. In order to assess the possible biological roles for these subdivisions, they must first be identified and characterised, preferably on a large scale and in an automated fashion. RESULTS: We developed Saguaro, a combination of a Hidden Markov Model (HMM) and a Self Organising Map (SOM), to characterise local phylogenetic relationships among aligned sequences using cacti, matrices of pair-wise distance measures. While the HMM determines the genomic boundaries from aligned sequences, the SOM hypothesises new cacti in an unsupervised and iterative fashion based on the regions that were modelled least well by existing cacti. After testing the software on simulated data, we demonstrate the utility of Saguaro by testing two different data sets: (i) 181 Dengue virus strains, and (ii) 5 primate genomes. Saguaro identifies regions under lineage-specific constraint for the first set, and genomic segments that we attribute to incomplete lineage sorting in the second dataset. Intriguingly for the primate data, Saguaro also classified an additional ~3% of the genome as most incompatible with the expected species phylogeny. A substantial fraction of these regions was found to overlap genes associated with both the innate and adaptive immune systems. CONCLUSIONS: Saguaro detects distinct cacti describing local phylogenetic relationships without requiring any a priori hypotheses. 
We have successfully demonstrated Saguaro's utility with two contrasting data sets, one containing many members with short sequences (Dengue viral strains: n = 181, genome size = 10,700 nt), and the other with few members but complex genomes (related primate species: n = 5, genome size = 3 Gb), suggesting that the software is applicable to a wide variety of experimental populations. Saguaro is written in C++, runs on the Linux operating system, and can be downloaded from http://saguarogw.sourceforge.net/.
Subject(s)
Genomics/methods , Algorithms , Animals , Dengue Virus/genetics , Disease Outbreaks , Humans , Immunity/genetics , Markov Chains , Models, Genetic , Phylogeny , Primates/genetics , Primates/immunology , Software , Species Specificity
ABSTRACT
Traumatic brain injury (TBI) is a leading cause of chronic brain impairment and results in a robust, but poorly understood, neuroinflammatory response that contributes to the long-term pathology. We used single-nuclei RNA sequencing (snRNA-seq) to study transcriptomic changes in different cell populations in human brain tissue obtained acutely after severe, life-threatening TBI. This revealed a unique transcriptional response in oligodendrocyte precursors and mature oligodendrocytes, including the activation of a robust innate immune response, indicating an important role for oligodendroglia in the initiation of neuroinflammation. The activation of an innate immune response correlated with transcriptional upregulation of endogenous retroviruses in oligodendroglia. This observation was causally linked in vitro using human glial progenitors, implicating these ancient viral sequences in human neuroinflammation. In summary, this work provides insight into the initiating events of the neuroinflammatory response in TBI, which has therapeutic implications.
Subject(s)
Brain Injuries, Traumatic , Brain Injuries , Endogenous Retroviruses , Humans , Animals , Mice , Endogenous Retroviruses/genetics , Neuroinflammatory Diseases , Transcriptome/genetics , Brain Injuries, Traumatic/pathology , Brain Injuries/pathology , Oligodendroglia/pathology , Inflammation/genetics , Inflammation/pathology , Mice, Inbred C57BL
ABSTRACT
Endogenous retroviruses (ERVs) are inherited remnants of retroviruses that colonized host germline over millions of years, providing a sampling of retroviral diversity across time. Here, we utilize the strength of Darwin's finches, a system synonymous with evolutionary studies, for investigating ERV history, revealing recent retrovirus-host interactions in natural populations. By mapping ERV variation across all species of Darwin's finches and comparing with outgroup species, we highlight geographical and historical patterns of retrovirus-host occurrence, utilizing the system for evaluating the extent and timing of retroviral activity in hosts undergoing adaptive radiation and colonization of new environments. We find shared ERVs among all samples indicating retrovirus-host associations pre-dating host speciation, as well as considerable ERV variation across populations of the entire Darwin's finches' radiation. Unexpected ERV variation in finch species on different islands suggests historical changes in gene flow and selection. Non-random distribution of ERVs along and between chromosomes, and across finch species, suggests association between ERV accumulation and the rapid speciation of Darwin's finches.
Subject(s)
Endogenous Retroviruses , Finches , Passeriformes , Animals , Biological Evolution , Ecuador , Endogenous Retroviruses/genetics , Finches/genetics , Gene Flow , Passeriformes/genetics , Phylogeny
ABSTRACT
The role of APOBEC3 (A3) protein family members in inhibiting retrovirus infection and mobile element retrotransposition is well established. However, the evolutionary effects these restriction factors may have had on active retroviruses such as HIV-1 are less well understood. An HIV-1 variant that has been highly G-to-A mutated is unlikely to be transmitted due to accumulation of deleterious mutations. However, G-to-A mutated hA3G target sequences within which the mutations are the least deleterious are more likely to survive selection pressure. Thus, among hA3G targets in HIV-1, the ratio of nonsynonymous to synonymous changes will increase with virus generations, leaving a footprint of past activity. To study such footprints in HIV-1 evolution, we developed an in silico model based on calculated hA3G target probabilities derived from G-to-A mutation sequence contexts in the literature. We simulated G-to-A changes iteratively in independent sequential HIV-1 infections until a stop codon was introduced into any gene. In addition to our simulation results, we observed higher ratios of nonsynonymous to synonymous mutation at hA3G targets in extant HIV-1 genomes than in their putative ancestral genomes, compared to random controls, implying that moderate levels of A3G-mediated G-to-A mutation have been a factor in HIV-1 evolution. Results from in vitro passaging experiments of HIV-1 modified to be highly susceptible to hA3G mutagenesis verified our simulation accuracy. We also used our simulation to examine the possible role of A3G-induced mutations in the origin of drug resistance. We found that hA3G activity could have been responsible for only a small increase in mutations at known drug resistance sites and propose that concerns for increased resistance to other antiviral drugs should not prevent Vif from being considered a suitable target for development of new drugs.
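The simulation described in this abstract (iterative G-to-A changes at context-dependent target probabilities until a stop codon appears) can be illustrated with a toy sketch. Everything specific here is an assumption made for illustration: the context weights are invented rather than taken from the literature-derived probabilities the authors used, only a single reading frame is checked, and one mutation is applied per step.

```python
import random

STOP_CODONS = {"TAA", "TAG", "TGA"}

# Hypothetical relative weights for a G, keyed by the base that follows it.
# These numbers are placeholders, not the probabilities used in the study.
CONTEXT_WEIGHTS = {"G": 1.0, "A": 0.3, "T": 0.1, "C": 0.05}


def has_stop(seq):
    """True if any codon in reading frame 0 is a stop codon."""
    return any(seq[i:i + 3] in STOP_CODONS for i in range(0, len(seq) - 2, 3))


def mutate_once(seq, rng):
    """Change one G to A, chosen by context weight; return None if no G remains."""
    positions = [i for i, base in enumerate(seq) if base == "G"]
    if not positions:
        return None
    weights = [
        CONTEXT_WEIGHTS.get(seq[i + 1] if i + 1 < len(seq) else "", 0.05)
        for i in positions
    ]
    i = rng.choices(positions, weights=weights, k=1)[0]
    return seq[:i] + "A" + seq[i + 1:]


def simulate_until_stop(seq, rng):
    """Apply G-to-A changes one at a time until a stop codon is introduced."""
    steps = 0
    while not has_stop(seq):
        mutated = mutate_once(seq, rng)
        if mutated is None:  # no G left to mutate
            break
        seq = mutated
        steps += 1
    return seq, steps
```

Calling `simulate_until_stop("ATGTGGGGAGGTTGGCAA", random.Random(0))` returns the mutated sequence and the number of G-to-A changes applied before the first in-frame stop codon; the TGG codons are the ones that can become TAG or TGA.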
Subject(s)
Cytidine Deaminase/metabolism , Drug Resistance, Viral , HIV-1/genetics , Mutation , APOBEC-3G Deaminase , Computer Simulation , DNA Footprinting , Evolution, Molecular , HIV-1/drug effects , HIV-1/physiology , Humans , Models, Genetic , Reproducibility of Results , vif Gene Products, Human Immunodeficiency Virus/genetics , vif Gene Products, Human Immunodeficiency Virus/metabolism
ABSTRACT
Retroviruses have infiltrated vertebrate germlines for millions of years as inherited endogenous retroviruses (ERVs). Mammalian genomes host large numbers of ERVs and transposable elements (TEs), including retrotransposons and DNA transposons, that contribute to genomic innovation and evolution as coopted genes and regulators of diverse functions. To explore features distinguishing coopted ERVs and TEs from other integrations, we focus on the potential role of ZBED6 and repeated ERV domestication as repurposed Syncytin genes. The placental mammal-specific ZBED6 is a DNA transposon-derived transcription regulator and we demonstrate that its binding motifs are associated with distinct Syncytins and that ZBED6 binding motifs are 2- to 3-fold more frequent in ERVs than in flanking DNA. Our observations suggest that ZBED6 could contribute an extended regulatory role of genomic expression, utilizing ERVs as platforms for genomic innovation and evolution.
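The comparison of binding-motif frequency in ERVs versus flanking DNA reported in this abstract amounts to a ratio of motif densities. A minimal sketch under stated assumptions follows: the motif string is passed in by the caller (the abstract does not give it), hits are counted on both strands with overlaps allowed, and density is expressed per kilobase. The function names are hypothetical.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def count_motif(seq, motif):
    """Count overlapping occurrences of a motif on both strands of `seq`."""
    reverse_complement = motif.translate(COMPLEMENT)[::-1]
    patterns = {motif, reverse_complement}  # a palindromic motif is counted once
    return sum(len(re.findall(f"(?={re.escape(p)})", seq)) for p in patterns)


def hits_per_kb(seqs, motif):
    """Motif hits per 1000 bp across a collection of sequences."""
    total_length = sum(len(s) for s in seqs)
    total_hits = sum(count_motif(s, motif) for s in seqs)
    return 1000.0 * total_hits / total_length


def fold_enrichment(erv_seqs, flank_seqs, motif):
    """Ratio of motif density in ERV sequences to density in flanking DNA.

    Raises ZeroDivisionError if the flanking set is empty or has no hits.
    """
    return hits_per_kb(erv_seqs, motif) / hits_per_kb(flank_seqs, motif)
```

A real analysis would also need length-matched flanks and a significance test; this sketch only shows the density ratio itself.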
ABSTRACT
It is a broadly observed pattern that the non-recombining regions of sex-limited chromosomes (Y and W) accumulate more repeats than the rest of the genome, even in species like birds with a low genome-wide repeat content. Here, we show that in birds with highly heteromorphic sex chromosomes, the W chromosome has a transposable element (TE) density of greater than 55% compared to the genome-wide density of less than 10%, and contains over half of all full-length (thus potentially active) endogenous retroviruses (ERVs) of the entire genome. Using RNA-seq and protein mass spectrometry data, we were able to detect signatures of female-specific ERV expression. We hypothesize that the avian W chromosome acts as a refugium for active ERVs, probably leading to female-biased mutational load that may influence female physiology similar to the 'toxic-Y' effect in Drosophila males. Furthermore, Haldane's rule predicts that the heterogametic sex has reduced fertility in hybrids. We propose that the excess of W-linked active ERVs over the rest of the genome may be an additional explanatory variable for Haldane's rule, with consequences for genetic incompatibilities between species through TE/repressor mismatches in hybrids. Together, our results suggest that the sequence content of female-specific W chromosomes can have effects far beyond sex determination and gene dosage. This article is part of the theme issue 'Challenging the paradigm in sex chromosome evolution: empirical and theoretical insights with a focus on vertebrates (Part II)'.
Subject(s)
Birds/genetics , Endogenous Retroviruses/physiology , Mutation Rate , Sex Chromosomes , Animals , Birds/virology , Female , Fertility , Male , Sex Factors , Species Specificity
ABSTRACT
The underlying molecular mechanisms that determine long day versus short day breeders remain unknown in any organism. Atlantic herring provides a unique opportunity to examine the molecular mechanisms involved in reproduction timing, because both spring and autumn spawners exist within the same species. Although our previous whole genome comparisons revealed a strong association of TSHR alleles with spawning seasons, the functional consequences of these variants remain unknown. Here we examined the functional significance of six candidate TSHR mutations strongly associated with herring reproductive seasonality. We show that the L471M missense mutation in the spring-allele causes enhanced cAMP signaling. The best candidate non-coding mutation is a 5.2 kb retrotransposon insertion upstream of the TSHR transcription start site, near an open chromatin region, which is likely to affect TSHR expression. The insertion occurred prior to the split between Pacific and Atlantic herring and was lost in the autumn-allele. Our study shows that strongly associated coding and non-coding variants at the TSHR locus may both contribute to the regulation of seasonal reproduction in herring.
Subject(s)
Fishes/physiology , Receptors, Thyrotropin/genetics , Alleles , Animals , Atlantic Ocean , Conserved Sequence , Haplotypes , Mutation , Receptors, Thyrotropin/physiology , Reproduction/physiology , Seasons , Signal Transduction , Thyrotropin, beta Subunit/genetics
ABSTRACT
The ability of human and murine APOBECs (specifically, APOBEC3) to inhibit infecting retroviruses and retrotransposition of some mobile elements is becoming established. Less clear is the effect that they have had on the establishment of the endogenous proviruses resident in the human and mouse genomes. We used the mouse genome sequence to study diversity and genetic traits of nonecotropic murine leukemia viruses (polytropic [Pmv], modified polytropic [Mpmv], and xenotropic [Xmv] subgroups), the best-characterized large set of recently integrated proviruses. We identified 49 proviruses. In phylogenetic analyses, Pmvs and Mpmvs were monophyletic, whereas Xmvs were divided into several clades, implying a greater number of replication cycles between the integration events. Four distinct primer binding site types (Pro, Gln1, Gln2 and Thr) were dispersed within the phylogeny, indicating frequent mispriming. We analyzed the frequency and context of G-to-A mutations for the role of mA3 in formation of these proviruses. In the Pmv and Mpmv (but not Xmv) groups, mutations attributable to mA3 constituted a large fraction of the total. A significant number of nonsense mutations suggests the absence of purifying selection following mutation. A strong bias of G-to-A relative to C-to-T changes was seen, implying a strand specificity that can only have occurred prior to integration. The optimal sequence context of G-to-A mutations, TTC, was consistent with mA3. At least in the Pmv group, a significant 5' to 3' gradient of G-to-A mutations was consistent with mA3 editing. Altogether, our results for the first time suggest mA3 editing immediately preceding the integration event that led to retroviral endogenization, contributing to inactivation of infectivity.
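The strand-bias and sequence-context analysis in this abstract can be sketched as a tally over an aligned reference/provirus pair. This is an illustrative sketch, not the authors' pipeline: it assumes a gap-free pairwise alignment of equal length, counts only G-to-A and C-to-T differences, and reports each G-to-A site's context as the reverse complement of the plus-strand trinucleotide starting at the mutated G, so that a plus-strand GAA is reported as the minus-strand context TTC.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def strand_bias(reference, provirus):
    """Tally G-to-A and C-to-T changes and minus-strand contexts of G-to-A sites.

    `reference` stands in for the inferred pre-mutation sequence
    (a hypothetical input for this sketch).
    """
    if len(reference) != len(provirus):
        raise ValueError("sequences must be aligned to equal length")
    counts = Counter()
    contexts = Counter()
    for i, (ref_base, obs_base) in enumerate(zip(reference, provirus)):
        if ref_base == "G" and obs_base == "A":
            counts["G>A"] += 1
            trinucleotide = reference[i:i + 3]
            if len(trinucleotide) == 3:  # skip sites too close to the 3' end
                contexts[reverse_complement(trinucleotide)] += 1
        elif ref_base == "C" and obs_base == "T":
            counts["C>T"] += 1
    return counts, contexts
```

An excess of `counts["G>A"]` over `counts["C>T"]`, together with an over-represented context in `contexts`, is the kind of signal the abstract describes; judging significance would require a null model that this sketch does not include.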
Subject(s)
Cytidine Deaminase/metabolism , Genetic Variation , Leukemia Virus, Murine/genetics , Adenine , Animals , Binding Sites , Genome , Guanine , Leukemia Virus, Murine/physiology , Mice , Mice, Inbred C57BL , Mutation/genetics , Phylogeny , Proviruses/genetics , Virus Replication
ABSTRACT
Eukaryotic genomes contain many endogenous retroviral sequences (ERVs). ERVs are often severely mutated and therefore difficult to detect. A platform-independent (Java) program package, RetroTector (ReTe), was constructed. It has three basic modules: (i) detection of candidate long terminal repeats (LTRs), (ii) detection of chains of conserved retroviral motifs fulfilling distance constraints and (iii) attempted reconstruction of original retroviral protein sequences, combining alignment, codon statistics and properties of protein ends. Other features are prediction of additional open reading frames, automated database collection, graphical presentation and automatic classification. ReTe favors elements >1000 bp long due to its dependence on the order of and distances between retroviral fragments. It detects single or low-copy-number elements. ReTe assigned a 'retroviral' score of 890-2827 to 10 exogenous retroviruses from seven genera, and accurately predicted their genes. In a simulated model, ReTe was robust against mutational decay. The human genome was analyzed in 1-2 days on a Linux cluster. Retroviral sequences were detected in divergent vertebrate genomes. Most ReTe-detected chains were coincident with RepeatMasker output and the HERVd database. ReTe did not report most of the evolutionarily old HERV-L-related and MaLR sequences, and is not yet tailored for single LTR detection. Nevertheless, ReTe rationally detects and annotates many retroviral sequences.