ABSTRACT
The ability to generate multiple RNA transcript isoforms from the same gene is a general phenomenon in eukaryotes. However, the complexity and diversity of alternative isoforms in natural populations remain largely unexplored. Using a newly developed full-length transcript enrichment protocol with 5' CAP selection, we sequenced full-length RNA transcripts of 48 individuals from outbred populations and subspecies of Mus musculus, and from the closely related sister species Mus spretus and Mus spicilegus as outgroups. The data set represents the most extensive full-length high-quality isoform catalog at the population level to date. In total, we reliably identify 117,728 distinct isoforms, of which only 51% were previously annotated. We show that the population-specific distribution pattern of isoforms is phylogenetically informative and reflects the segregating single nucleotide polymorphism (SNP) diversity between the populations. We find that ancient housekeeping genes are a major source of the overall isoform diversity, and that the generation of alternative first exons plays a major role in generating new isoforms. Given that our data allow us to distinguish between population-specific isoforms and isoforms that are conserved across multiple populations, it is possible to refine the annotation of the reference mouse genome to a set of about 40,000 isoforms that should be most relevant for comparative functional analysis across species.
ABSTRACT
Each generation, spontaneous mutations introduce heritable changes that tend to reduce fitness in populations of highly adapted living organisms. This erosion of fitness is countered by natural selection, which keeps deleterious mutations at low frequencies and ultimately removes most of them from the population. The classical way of studying the impact of spontaneous mutations is via mutation accumulation (MA) experiments, where lines of small effective population size are bred for many generations in conditions where natural selection is largely removed. Such experiments in microbes, invertebrates, and plants have generally demonstrated that fitness decays as a result of MA. However, the phenotypic consequences of MA in vertebrates are largely unknown, because no replicated MA experiment has previously been carried out. This gap in our knowledge is relevant for human populations, where societal changes have reduced the strength of natural selection, potentially allowing deleterious mutations to accumulate. Here, we study the impact of spontaneous MA on the mean and genetic variation for quantitative and fitness-related traits in the house mouse using the MA experimental design, with a cryopreserved control to account for environmental influences. We show that variation for morphological and life history traits accumulates at a sufficiently high rate to maintain genetic variation and selection response. Weight and tail length measures decrease significantly between 0.04% and 0.3% per generation with narrow confidence intervals. Fitness proxy measures (litter size and surviving offspring) decrease on average by about 0.2% per generation, but with confidence intervals overlapping zero. When extrapolated to humans, our results imply that the rate of fitness loss should not be of concern in the foreseeable future.
Subjects
Genetic Fitness, Mutation Accumulation, Genetic Selection, Animals, Mice, Female, Male, Genetic Variation, Mutation/genetics, Phenotype
ABSTRACT
The mouse serves as a mammalian model for understanding the nature of variation from new mutations, a question that has both evolutionary and medical significance. Previous studies suggest that the rate of single-nucleotide mutations (SNMs) in mice is ~50% of that in humans. However, information largely comes from studies involving the C57BL/6 strain, and there is little information from other mouse strains. Here, we study the mutations that accumulated in 59 mouse lines derived from four inbred strains that are commonly used in genetics and clinical research (BALB/cAnNRj, C57BL/6JRj, C3H/HeNRj, and FVB/NRj), maintained for eight to nine generations by brother-sister mating. By analyzing Illumina whole-genome sequencing data, we estimate that the average rate of new SNMs in mice is µ ≈ 6.7 × 10⁻⁹. However, there is substantial variation in the spectrum of SNMs among strains, so the burden from new mutations also varies among strains. For example, the FVB strain has a spectrum that is markedly skewed toward C→A transversions and is likely to experience a higher deleterious load than other strains, due to an increased frequency of nonsense mutations in glutamic acid codons. Finally, we observe substantial variation in the rate of new SNMs among DNA sequence contexts, with CpG sites and their adjacent nucleotides playing an important role.
Subjects
Inbred Mice, Animals, Mice, Inbred Mice/genetics, Mutation, Inbred C57BL Mice
ABSTRACT
Mammalian genomes include many maternally and paternally imprinted genes. Most of these are also expressed in the brain, and several have been implicated in regulating specific behavioral traits. Here, we have used a knockout approach to study the function of Peg13, a gene that codes for a fast-evolving lncRNA (long noncoding RNA) and is part of a complex of imprinted genes on chromosome 15 in mice and chromosome 8 in humans. Mice lacking the 3' half of the transcript look morphologically wild-type but show distinct behavioral differences. They lose interest in the opposite sex, instead displaying a preference for wild-type animals of the same sex. Further, they show a higher level of anxiety, lowered activity and curiosity, and a deficiency in pup retrieval behavior. Brain RNA expression analysis reveals that genes involved in the serotonergic system, formation of glutamatergic synapses, olfactory processing, and estrogen signaling, as well as more than half of the other known imprinted genes, show significant expression changes in Peg13-deficient mice. Intriguingly, these pathways are differentially affected in the sexes, resulting in male and female brains of Peg13-deficient mice differing more from each other than those of wild-type mice. We conclude that Peg13 is part of a developmental pathway that regulates the neurobiology of social and sexual interactions.
Subjects
Brain/metabolism, Genomic Imprinting, Animal Mating Preference, Long Noncoding RNA/metabolism, Transcriptome, Animals, Female, Male, Mice, Knockout Mice, Long Noncoding RNA/genetics
ABSTRACT
Gene retroposition is known to contribute to patterns of gene evolution and adaptations. However, possible negative effects of gene retroposition remain largely unexplored since most previous studies have focused on between-species comparisons where negatively selected copies are mostly not observed, as they are quickly lost from populations. Here, we show for natural house mouse populations that the primary rate of retroposition is orders of magnitude higher than the long-term rate. Comparisons with single-nucleotide polymorphism distribution patterns in the same populations show that most retroposition events are deleterious. Transcriptomic profiling analysis shows that new retroposed copies become easily subject to transcription and have an influence on the expression levels of their parental genes, especially when transcribed in the antisense direction. Our results imply that the impact of retroposition on the mutational load has been highly underestimated in natural populations. This has additional implications for strategies of disease allele detection in humans.
Subjects
Mutation/genetics, Retroelements/genetics, Animals, DNA Copy Number Variations/genetics, Gene Expression Regulation, Population Genetics, Geography, Mice, Single Nucleotide Polymorphism/genetics
ABSTRACT
Although the contribution of retrogenes to the evolution of genes and genomes has long been recognized, the evolutionary patterns of very recently derived retrocopies that are still polymorphic within natural populations have not been much studied so far. We use here a set of 2,025 such retrocopies in nine house mouse populations from three subspecies (Mus musculus domesticus, M. m. musculus, and M. m. castaneus) to trace their origin and evolutionary fate. We find that ancient house-keeping genes are significantly more likely to generate retrocopies than younger genes and that the propensity to generate a retrocopy depends on its level of expression in the germline. Although most retrocopies are detrimental and quickly purged, we focus here on the subset that appears to be neutral or even adaptive. We show that retrocopies from X-chromosomal parental genes have a higher likelihood to reach elevated frequencies in the populations, confirming the notion of adaptive effects for "out-of-X" retrogenes. Also, retrocopies in intergenic regions are more likely to reach higher population frequencies than those in introns of genes, implying a more detrimental effect when they land within transcribed regions. For a small subset of retrocopies, we find signatures of positive selection, indicating they were involved in a recent adaptation process. We show that the population-specific distribution pattern of retrocopies is phylogenetically informative and can be used to infer population history with a better resolution than with SNP markers.
Subjects
Molecular Evolution, Genome, Animals, Mice
ABSTRACT
Genic copy number differences can have phenotypic consequences, but so far this has not been studied in detail in natural populations. Here, we analysed the natural variation of two families of tandemly repeated regulatory small nucleolar RNAs (SNORD115 and SNORD116) in the house mouse (Mus musculus). They are encoded within the Prader-Willi Syndrome gene region, known to be involved in behavioural, metabolic, and osteogenic functions in mammals. We determined that the copy numbers of these SNORD RNAs show substantial natural variation, both in wild-derived mice as well as in an inbred mouse strain (C57BL/6J). We show that copy number differences are subject to change across generations, making them highly variable and resulting in individual differences. In transcriptome data from brain samples, we found SNORD copy-number correlated regulation of possible target genes, including Htr2c, a predicted target gene of SNORD115, as well as Ankrd11, a predicted target gene of SNORD116. Ankrd11 is a chromatin regulator, which has previously been implicated in regulating the development of the skull. Based on morphometric shape analysis of the skulls of individual mice of the inbred strain, we show that shape measures correlate with SNORD116 copy numbers in the respective individuals. Our results suggest that the variable dosage of regulatory RNAs can lead to phenotypic variation between individuals that would typically have been ascribed to environmentally induced variation, while it is actually encoded in individual differences of copy numbers of regulatory molecules.
Subjects
DNA Copy Number Variations, Prader-Willi Syndrome, Animals, Brain, DNA Copy Number Variations/genetics, Mice, Inbred C57BL Mice, Small Nucleolar RNA
ABSTRACT
For over a century, inbred mice have been used in many areas of genetics research to gain insight into the genetic variation underlying traits of interest. The generalizability of any genetic research study in inbred mice is dependent upon all individual mice being genetically identical, which in turn is dependent on the breeding designs of companies that supply inbred mice to researchers. Here, we compare whole-genome sequences from individuals of four commonly used inbred strains that were procured from either the colony nucleus or from a production colony (which can be as many as ten generations removed from the nucleus) of a large commercial breeder, in order to investigate the extent and nature of genetic variation within and between individuals. We found that individuals within strains are not isogenic, and there are differences in the levels of genetic variation that are explained by differences in the genetic distance from the colony nucleus. In addition, we employ a novel approach to mutation rate estimation based on the observed genetic variation and the expected site frequency spectrum at equilibrium, given a fully inbred breeding design. We find that it provides a reasonable per nucleotide mutation rate estimate when mice come from the colony nucleus (~7.9 × 10⁻⁹ in C3H/HeN), but substantially inflated estimates when mice come from production colonies.
Subjects
Mutation Rate, Nucleotides, Animals, Mice, Inbred C3H Mice
ABSTRACT
Systematic knockout studies in mice have shown that a large fraction of the gene replacements show no lethal or other overt phenotypes. This has led to the development of more refined analysis schemes, including physiological, behavioral, developmental and cytological tests. However, transcriptomic analyses have not yet been systematically evaluated for non-lethal knockouts. We conducted a power analysis to determine the experimental conditions under which even small changes in transcript levels can be reliably traced. We have applied this to two gene disruption lines of genes for which no function was known so far. Dedicated phenotyping tests informed by the tissues and stages of highest expression of the two genes show small effects on the tested phenotypes. For the transcriptome analysis of these stages and tissues, we used a prior power analysis to determine the number of biological replicates and the sequencing depth. We find that under these conditions, the knockouts have a significant impact on the transcriptional networks, with thousands of genes showing small transcriptional changes. GO analysis suggests that A930004D18Rik is involved in developmental processes through contributing to protein complexes, and A830005F24Rik in extracellular matrix functions. Subsampling analysis of the data reveals that increasing the number of biological replicates was more important than increasing the sequencing depth to arrive at these results. Hence, our proof-of-principle experiment suggests that transcriptomic analysis is indeed an option to study the functions of genes with weak or no traceable phenotypic effects, and it provides the boundary conditions under which this is possible.
Subjects
Gene Expression Profiling/methods, Gene Knockout Techniques, Genetic Association Studies/methods, Animals, Animal Behavior, Computational Biology, Extremities/anatomy & histology, Female, Gene Expression Profiling/statistics & numerical data, Genetic Association Studies/statistics & numerical data, Male, Mice, Inbred C57BL Mice, Knockout Mice, Genetic Models, Phenotype, Proof of Concept Study, RNA-Seq/statistics & numerical data, Transcriptome
ABSTRACT
BACKGROUND: Amylase gene clusters have been implicated in adaptive copy number changes in response to the amount of starch in the diet of humans and mammals. However, this interpretation has been questioned for humans, and for mammals there is a paucity of information from natural populations. RESULTS: Using optical mapping and genome read information, we show here that the amylase cluster in natural house mouse populations is indeed copy-number variable for Amy2b paralogous gene copies (called Amy2a1 - Amy2a5), but a direct connection to starch diet is not evident. However, we find that the amylase cluster was subject to introgression of haplotypes between Mus musculus sub-species. A very recent introgression can be traced in the Western European populations, and this also leads to the rescue of an Amy2b pseudogene. Some populations and inbred lines derived from the Western house mouse (Mus musculus domesticus) harbor a copy of the pancreatic amylase (Amy2b) with a stop codon in the first exon, making it non-functional. But populations in France harbor a haplotype introgressed from the Eastern house mouse (M. m. musculus) with an intact reading frame. Detailed analysis of phylogenetic patterns along the amylase cluster suggests an additional history of previous introgressions. CONCLUSIONS: Our results show that the amylase gene cluster is a hotspot of introgression in the mouse genome, making it an evolutionarily active region beyond the previously observed copy number changes.
Subjects
Amylases/genetics, Multigene Family, Pseudogenes, Amino Acid Substitution/genetics, Animals, Base Sequence, Genome, Haplotypes/genetics, Mice, Phylogeny, Sequence Alignment
ABSTRACT
Pupation site choice of Drosophila third-instar larvae is critical for the survival of individuals, as pupae are exposed to various biotic and abiotic dangers while immobilized during the 3-4 days of metamorphosis. This singular behavioural choice is sensitive to both environmental and genetic factors. Here, we developed a high-throughput phenotyping approach to assay the variation in pupation height in Drosophila melanogaster, while controlling for possibly confounding factors. We find substantial variation of mean pupation height among sampled natural stocks and we show that the Drosophila Genetic Reference Panel (DGRP) reflects this variation. Using the DGRP stocks for genome-wide association (GWA) mapping, 16 loci involved in determining pupation height could be resolved. The candidate genes in these loci are enriched for high expression in the larval central nervous system. A genetic network could be constructed from the candidate loci, which places scribble (scrib) at the centre, plus other genes known to be involved in nervous system development, such as Epidermal growth factor receptor (Egfr) and p53. Using gene disruption lines, we could functionally validate several of the initially identified loci, as well as additional loci predicted from network analysis. Our study shows that the combination of high-throughput phenotyping with a genetic analysis of variation captured from the wild can be used to approach the genetic dissection of an environmentally relevant behavioural phenotype.
Subjects
Animal Behavior/physiology, Drosophila melanogaster/genetics, Gene Regulatory Networks/genetics, Animals, Drosophila Proteins/genetics, Female, Genome-Wide Association Study/methods, Larva/genetics, Male, Phenotype, Single Nucleotide Polymorphism/genetics, Pupa/genetics, Quantitative Trait Loci/genetics
ABSTRACT
Not all genetic loci follow Mendel's rules, and the evolutionary consequences of this are not yet fully known. Genomic conflict involving multiple loci is a likely outcome, as restoration of Mendelian inheritance patterns will be selected for, and sexual conflict may also arise when sexes are differentially affected. Here, we investigate effects of the t haplotype, an autosomal male meiotic driver in house mice, on genome-wide gene expression patterns in males and females. We analysed gonads, liver and brain in adult same-sex sibling pairs differing in genotype, allowing us to identify t-associated differences in gene regulation. In testes, only 40% of differentially expressed genes mapped to the approximately 708 annotated genes comprising the t haplotype. Thus, much of the activity of the t haplotype occurs in trans, and as upregulation. Sperm maturation functions were enriched among both cis and trans acting t haplotype genes. Within the t haplotype, we observed more downregulation and differential exon usage. In ovaries, liver and brain, the majority of expression differences mapped to the t haplotype, and were largely independent of the differences seen in the testis. Overall, we found widespread transcriptional effects of this male meiotic driver in the house mouse genome.
Subjects
Gene Expression, Genome, Haplotypes, Mice/genetics, Transcriptome, Animals, Female, Male, Organ Specificity, Sex Factors
ABSTRACT
BACKGROUND: The adaptive immune system of vertebrates has an extraordinary potential to sense and neutralize foreign antigens entering the body. De novo evolution of genes implies that the genome itself expresses novel antigens from intergenic sequences, which could cause a problem with this immune system. Peptides from these novel proteins could be presented by the major histocompatibility complex (MHC) receptors to the cell surface and would be recognized as foreign. The respective cells would then be attacked and destroyed, or would cause inflammatory responses. Hence, de novo expressed peptides have to be introduced to the immune system as being self-peptides to avoid such autoimmune reactions. The regulation of the distinction between self and non-self starts during embryonic development, but continues late into adulthood. It is mostly mediated by specialized cells in the thymus, but can also be conveyed in peripheral tissues, such as the lymph nodes and the spleen. The self-antigens need to be exposed to the reactive T-cells, which requires the expression of the genes in the respective tissues. Since the initial activation of a promoter for new intergenic transcription of a de novo gene could occur in any tissue, we should expect that the evolutionary establishment of a de novo gene in animals with an adaptive immune system should also involve expression in at least one of the tissues that confer self-recognition. RESULTS: We have studied this question by analyzing the transcriptomes of multiple tissues from young mice in three closely related natural populations of the house mouse (M. m. domesticus). We find that new intergenic transcription indeed occurs mostly in only a single tissue. When a second tissue becomes involved, thymus and spleen are significantly overrepresented. CONCLUSIONS: We conclude that the inclusion of de novo transcripts in the processes for the induction of self-tolerance is indeed an important step in the evolution of functional de novo genes in vertebrates.
Subjects
Adaptive Immunity/genetics, Intergenic DNA/genetics, Molecular Evolution, Immune System/immunology, Animals, Base Sequence, Computer Simulation, Gene Expression Profiling, Gene Expression Regulation, Mice, Phylogeny, Messenger RNA/genetics, Messenger RNA/metabolism, Transcriptome/genetics
ABSTRACT
Phylostratigraphy is a computational framework for dating the emergence of DNA and protein sequences in a phylogeny. It has been extensively applied to make inferences on patterns of genome evolution, including patterns of disease gene evolution, ontogeny and de novo gene origination. Phylostratigraphy typically relies on BLAST searches along a species tree, but new simulation studies have raised concerns about the ability of BLAST to detect remote homologues and its impact on phylostratigraphic inferences. Here, we re-assessed these simulations. We found that, even with a possible overall BLAST false negative rate of 11-15%, the large majority of sequences assigned to a recent evolutionary origin by phylostratigraphy are unaffected by technical concerns about BLAST. Where the results of the simulations did cast doubt on previously reported findings, we repeated the original analyses but now excluded all questionable sequences. The originally described patterns remained essentially unchanged. These new analyses strongly support phylostratigraphic inferences, including: genes that emerged after the origin of eukaryotes are more likely to be expressed in the ectoderm than in the endoderm or mesoderm in Drosophila, and the de novo emergence of protein-coding genes from non-genic sequences occurs through proto-gene intermediates in yeast. We conclude that BLAST is an appropriate and sufficiently sensitive tool in phylostratigraphic analysis that does not appear to introduce significant biases into evolutionary pattern inferences.
Subjects
Computational Biology/methods, DNA Sequence Analysis/methods, Protein Sequence Analysis/methods, Animals, Bias, Biological Evolution, Computer Simulation, Drosophila, Molecular Evolution, Genome, Genetic Models, Phylogeny, Time Factors
ABSTRACT
Copy number variation represents a major source of genetic divergence, yet the evolutionary dynamics of genic copy number variation in natural populations during differentiation and adaptation remain unclear. We applied a read depth approach to genome resequencing data to detect copy number variants (CNVs) ≥1 kb in wild-caught mice belonging to four populations of Mus musculus domesticus. We complemented the bioinformatics analyses with experimental validation using droplet digital PCR. The specific focus of our analysis is CNVs that include complete genes, as these CNVs could be expected to contribute most directly to evolutionary divergence. In total, 1863 transcription units appear to be completely encompassed within CNVs in at least one individual when compared to the reference assembly. Further, 179 of these CNVs show population-specific copy number differences, and 325 are subject to complete deletion in multiple individuals. Among the most copy-number variable genes are three highly conserved genes that encode the splicing factor CWC22, the spindle protein SFI1, and the Holliday junction recognition protein HJURP. These genes exhibit population-specific expansion patterns that suggest involvement in local adaptations. We found that genes that overlap with large segmental duplications are generally more copy-number variable. These genes encode proteins that are relevant for environmental and behavioral interactions, such as vomeronasal and olfactory receptors, as well as major urinary proteins and several proteins of unknown function. The overall analysis shows that genic CNVs contribute more to population differentiation in mice than in humans and may promote and speed up population divergence.
Subjects
Cell Cycle Proteins/genetics, DNA Copy Number Variations, DNA-Binding Proteins/genetics, Mice/genetics, Nuclear Proteins/genetics, Biological Adaptation, Animals, Cell Cycle Proteins/metabolism, Conserved Sequence, DNA-Binding Proteins/metabolism, Molecular Evolution, Population Genetics, Genome, Genomics/methods, Mice/classification, Nuclear Proteins/metabolism, RNA-Binding Proteins, Genetic Selection
ABSTRACT
BACKGROUND: The MHC class I and II loci mediate the adaptive immune response and are among the most polymorphic loci in vertebrate genomes. In fact, the number of different alleles in a given species is often so large that it remains a challenge to provide an evolutionary model that can fully account for this. RESULTS: We provide here a general survey of MHC allele numbers in house mouse populations and two sub-species (M. m. domesticus and M. m. musculus) for H2 class I D and K, as well as class II A and E loci. Between 50 and 90% of the detected different sequences constitute new alleles, confirming that the discovery of new alleles is indeed far from complete. House mice live in separate demes with small effective population sizes, factors that were proposed to reduce, rather than enhance, the possibility for the maintenance of many different alleles. To specifically investigate the occurrence of alleles within demes, we focused on the class II H2-Aa and H2-Eb exon 2 alleles in nine demes of M. m. domesticus from two different geographic regions. We find on the one hand a group of alleles that occur in different sampling regions, and three quarters of these are also found in both sub-species. On the other hand, the larger group of different alleles (56%) occurs only in one of the regions and most of these (89%) only in single demes. We show that most of these region-specific alleles have apparently arisen through recombination and/or partial gene conversion from already existing alleles. CONCLUSIONS: Demes can act as sources of alleles that outnumber the set of alleles that are shared across the species range. These findings support the reservoir model proposed for human MHC diversity, which states that large pools of rare MHC allele variants are continuously generated by neutral mutational mechanisms. Given that these can become important in the defense against newly emerging pathogens, the reservoir model complements the selection-based models for MHC diversity and explains why the exceptional diversity exists.
ABSTRACT
The vertebrate cranium is a prime example of the high evolvability of complex traits. While evidence of genes and developmental pathways underlying craniofacial shape determination is accumulating, we are still far from understanding how such variation at the genetic level is translated into craniofacial shape variation. Here we used 3D geometric morphometrics to map genes involved in shape determination in a population of outbred mice (Carworth Farms White, or CFW). We defined shape traits via principal component analysis of 3D skull and mandible measurements. We mapped genetic loci associated with shape traits at ~80,000 candidate single nucleotide polymorphisms in ~700 male mice. We found that craniofacial shape and size are highly heritable, polygenic traits. Despite the polygenic nature of the traits, we identified 17 loci that explain variation in skull shape, and 8 loci associated with variation in mandible shape. Together, the associated variants account for 11.4% of skull and 4.4% of mandible shape variation; however, the total additive genetic variance associated with phenotypic variation was estimated at ~45%. Candidate genes within the associated loci have known roles in craniofacial development; this includes 6 transcription factors and several regulators of bone developmental pathways. One gene, Mn1, has an unusually large effect on shape variation in our study. A knockout of this gene was previously shown to negatively affect the development of membranous bones of the cranial skeleton, and evolutionary analysis shows that the gene arose at the base of the bony vertebrates (Euteleostomi), where the ossified head first appeared. Therefore, Mn1 emerges as a key gene for both skull formation and within-population shape variation. Our study shows that it is possible to identify important developmental genes through genome-wide mapping of high-dimensional shape features in an outbred population.
Subjects
Face/anatomy & histology, Developmental Gene Expression Regulation, Skull/anatomy & histology, Animals, Male, Mice, Mutant Mice, Single Nucleotide Polymorphism
ABSTRACT
BACKGROUND: Segmental duplications are an abundant source for novel gene functions and evolutionary adaptations. This mechanism of generating novelty was very active during the evolution of primates, particularly in the human lineage. Here, we characterize the evolution and function of the SPATA31 gene family (former designation FAM75A), which was previously shown to be among the gene families with the strongest signal of positive selection in hominoids. The mouse homologue for this gene family is a single copy gene expressed during spermatogenesis. RESULTS: We show that in primates, the SPATA31 gene duplicated into SPATA31A and SPATA31C types and broadened its expression into many tissues. Each type became further segmentally duplicated in the line towards humans, with the largest number of full-length copies found for SPATA31A in humans. Copy number estimates of SPATA31A based on digital PCR show an average of 7.5 with a range of 5-11 copies per diploid genome among human individuals. The primate SPATA31 genes also acquired new protein domains that suggest an involvement in UV response and DNA repair. We generated antibodies and show that the protein is re-localized from the nucleolus to the whole nucleus upon UV-irradiation, suggesting a UV damage response. We used CRISPR/Cas mediated mutagenesis to knock out copies of the gene in human primary fibroblast cells. We find that cell lines with reduced functional copies as well as naturally occurring low copy number HFF cells show enhanced sensitivity towards UV-irradiation. CONCLUSION: The acquisition of new SPATA31 protein functions and its broadening of expression may be related to the evolution of the diurnal lifestyle in primates that required a higher UV tolerance. The increased segmental duplications in hominoids as well as its fast evolution suggest the acquisition of further specific functions, particularly in humans.
Subjects
DNA Damage/radiation effects, Molecular Evolution, Multigene Family, Primates/genetics, Genomic Segmental Duplications, Ultraviolet Rays, Animals, Chromosome Mapping, DNA Copy Number Variations, Gene Duplication, Humans, Phylogeny, Protein Domains/genetics
ABSTRACT
Gene evolution has long been thought to be primarily driven by duplication and rearrangement mechanisms. However, every evolutionary lineage harbours orphan genes that lack homologues in other lineages and whose evolutionary origin is only poorly understood. Orphan genes might arise from duplication and rearrangement processes followed by fast divergence; however, de novo evolution out of non-coding genomic regions is emerging as an important additional mechanism. This process appears to provide raw material continuously for the evolution of new gene functions, which can become relevant for lineage-specific adaptations.