ABSTRACT
The Diversity Outbred (DO) mice and their inbred founders are widely used models of human disease. However, although the genetic diversity of these mice has been well documented, their epigenetic diversity has not. Epigenetic modifications, such as histone modifications and DNA methylation, are important regulators of gene expression and, as such, are a critical mechanistic link between genotype and phenotype. Therefore, creating a map of epigenetic modifications in the DO mice and their founders is an important step toward understanding mechanisms of gene regulation and the link to disease in this widely used resource. To this end, we performed a strain survey of epigenetic modifications in hepatocytes of the DO founders. We surveyed four histone modifications (H3K4me1, H3K4me3, H3K27me3, and H3K27ac), as well as DNA methylation. We used ChromHMM to identify 14 chromatin states, each of which represents a distinct combination of the four histone modifications. We found that the epigenetic landscape is highly variable across the DO founders and is associated with variation in gene expression across strains. We found that epigenetic state imputed into a population of DO mice recapitulated the association with gene expression seen in the founders, suggesting that both histone modifications and DNA methylation are highly heritable mechanisms of gene expression regulation. We illustrate how DO gene expression can be aligned with inbred epigenetic states to identify putative cis-regulatory regions. Finally, we provide a data resource that documents strain-specific variation in the chromatin state and DNA methylation in hepatocytes across nine widely used strains of laboratory mice.
Subjects
DNA Methylation; Histones; Humans; Mice; Animals; Histones/genetics; Histones/metabolism; Promoter Regions, Genetic; Chromatin/genetics; Epigenesis, Genetic; Histone Code; Mice, Inbred Strains; Gene Expression
ABSTRACT
Some imprinted genes exhibit parental origin specific expression bias rather than being transcribed exclusively from one copy. The physiological relevance of this remains poorly understood. In an analysis of brain-specific allele-biased expression, we identified that Trappc9, a cellular trafficking factor, was expressed predominantly (~70%) from the maternally inherited allele. Loss-of-function mutations in human TRAPPC9 cause a rare neurodevelopmental syndrome characterized by microcephaly and obesity. By studying Trappc9 null mice we discovered that homozygous mutant mice showed a reduction in brain size, exploratory activity and social memory, as well as a marked increase in body weight. A role for Trappc9 in energy balance was further supported by increased ad libitum food intake in a child with TRAPPC9 deficiency. Strikingly, heterozygous mice lacking the maternal allele (70% reduced expression) had pathology similar to homozygous mutants, whereas mice lacking the paternal allele (30% reduction) were phenotypically normal. Taken together, we conclude that Trappc9 deficient mice recapitulate key pathological features of TRAPPC9 mutations in humans and identify a role for Trappc9 and its imprinting in controlling brain development and metabolism.
Subjects
Intercellular Signaling Peptides and Proteins/deficiency; Microcephaly/genetics; Obesity/genetics; Animals; Child; Female; Gene Expression Regulation; Gene Frequency; Genomic Imprinting; Heterozygote; Homozygote; Humans; Intercellular Signaling Peptides and Proteins/genetics; Intercellular Signaling Peptides and Proteins/metabolism; Male; Maternal Inheritance; Mice; Mice, Inbred C57BL; Mice, Knockout; Microcephaly/metabolism; Mutation; Obesity/metabolism; Phenotype
ABSTRACT
Mouse embryonic stem cells (mESCs) cultured in the presence of LIF occupy a ground state with highly active pluripotency-associated transcriptional and epigenetic circuitry. However, ground state pluripotency in some inbred strain backgrounds is unstable in the absence of ERK1/2 and GSK3 inhibition. Using an unbiased genetic approach, we dissect the basis of this divergent response to extracellular cues by profiling gene expression and chromatin accessibility in 170 genetically heterogeneous mESCs. We map thousands of loci affecting chromatin accessibility and/or transcript abundance, including 10 QTL hotspots where genetic variation at a single locus coordinates the regulation of genes throughout the genome. For one hotspot, we identify a single enhancer variant ~10 kb upstream of Lifr associated with chromatin accessibility and mediating a cascade of molecular events affecting pluripotency. We validate causation through reciprocal allele swaps, demonstrating the functional consequences of noncoding variation in gene regulatory networks that stabilize pluripotent states in vitro.
Subjects
Chromatin; Pluripotent Stem Cells; Animals; Cell Differentiation; Chromatin/genetics; Gene Expression; Genetic Variation; Glycogen Synthase Kinase 3; Mice
ABSTRACT
Allele-specific expression (ASE) at single-cell resolution is a critical tool for understanding the stochastic and dynamic features of gene expression. However, low read coverage and high biological variability present challenges for analyzing ASE. We demonstrate that discarding multi-mapping reads leads to higher variability in estimates of allelic proportions, an increased frequency of sampling zeros, and can lead to spurious findings of dynamic and monoallelic gene expression. Here, we report a method for ASE analysis from single-cell RNA-Seq data that accurately classifies allelic expression states and improves estimation of allelic proportions by pooling information across cells. We further demonstrate that combining information across cells using a hierarchical mixture model reduces sampling variability without sacrificing cell-to-cell heterogeneity. We applied our approach to re-evaluate the statistical independence of allelic bursting and track changes in the allele-specific expression patterns of cells sampled over a developmental time course.
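The benefit of pooling information across cells can be illustrated with a deliberately simplified sketch. It is not the hierarchical mixture model reported in the abstract: it assumes a single Beta prior shared by all cells, and the function name and the prior parameters `prior_a` and `prior_b` are hypothetical. Each cell's maternal allelic proportion is pulled toward the prior mean, so low-coverage cells no longer yield hard 0 or 1 estimates, while well-covered cells change very little.

```python
def shrunken_allelic_proportion(maternal_reads, total_reads, prior_a=2.0, prior_b=2.0):
    """Posterior mean of the maternal allelic proportion under a Beta prior.

    Assumptions made for illustration only: read counts are binomial given
    the true proportion, and every cell shares the same Beta(prior_a, prior_b).
    """
    if total_reads < 0 or not 0 <= maternal_reads <= total_reads:
        raise ValueError("need 0 <= maternal_reads <= total_reads")
    return (maternal_reads + prior_a) / (total_reads + prior_a + prior_b)


# (maternal reads, total reads) for three hypothetical cells
cells = [(0, 2), (1, 1), (60, 100)]

# Raw per-cell proportions: the two low-coverage cells look monoallelic.
raw = [m / t for m, t in cells]

# Shrunken proportions: low-coverage cells move toward 0.5, the third barely moves.
pooled = [shrunken_allelic_proportion(m, t) for m, t in cells]
```

In a fuller treatment the prior itself would be estimated from all cells for a given gene, which is where the pooling actually happens; fixing it by hand here keeps the sketch short.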
Subjects
Gene Expression; Zygote/metabolism; Alleles; Animals; Female; Mice; Sequence Analysis, RNA; Single-Cell Analysis; Zygote/cytology; Zygote/growth & development
ABSTRACT
Systems genetic analysis of complex traits involves the integrated analysis of genetic, genomic, and disease-related measures. However, these data are often collected separately across multiple study populations, rendering direct correlation of molecular features to complex traits impossible. Recent transcriptome-wide association studies (TWAS) have harnessed gene expression quantitative trait loci (eQTL) to associate unmeasured gene expression with a complex trait in genotyped individuals, but this approach relies primarily on strong eQTL. We propose a simple and powerful alternative strategy for correlating independently obtained sets of complex traits and molecular features. In contrast to TWAS, our approach gains precision by correlating complex traits through a common set of continuous phenotypes instead of genetic predictors, and can identify transcript-trait correlations for which the regulation is not genetic. In our approach, a set of multiple quantitative "reference" traits is measured across all individuals, while measures of the complex trait of interest and transcriptional profiles are obtained in disjoint subsamples. A conventional multivariate statistical method, canonical correlation analysis, is used to relate the reference traits and traits of interest to identify gene expression correlates. We evaluate power and sample size requirements of this methodology, as well as performance relative to other methods, via extensive simulation and analysis of a behavioral genetics experiment in 258 Diversity Outbred mice involving two independent sets of anxiety-related behaviors and hippocampal gene expression. After splitting the data set and hiding one set of anxiety-related traits in half the samples, we identified transcripts correlated with the hidden traits using the other set of anxiety-related traits and exploiting the highest canonical correlation (R = 0.69) between the trait data sets. 
We demonstrate that this approach outperforms TWAS in identifying associated transcripts. Together, these results demonstrate the validity, reliability, and power of reference trait analysis for identifying relations between complex traits and their molecular substrates.
Subjects
Gene Expression Profiling/methods; Genome-Wide Association Study/methods; Quantitative Trait Loci; Transcriptome; Animals; Anxiety/genetics; Gene Expression Profiling/standards; Genome-Wide Association Study/standards; Humans; Reference Standards
ABSTRACT
Isogenic laboratory mouse strains enhance reproducibility because individual animals are genetically identical. For the most widely used isogenic strain, C57BL/6, there exists a wealth of genetic, phenotypic, and genomic data, including a high-quality reference genome (GRCm38.p6). Now 20 years after the first release of the mouse reference genome, C57BL/6J mice are at least 26 inbreeding generations removed from GRCm38 and the strain is now maintained with periodic reintroduction of cryorecovered mice derived from a single breeder pair, aptly named Adam and Eve. To provide an update to the mouse reference genome that more accurately represents the genome of today's C57BL/6J mice, we took advantage of long read, short read, and optical mapping technologies to generate a de novo assembly of the C57BL/6J Eve genome (B6Eve). Using these data, we have addressed recurring variants observed in previous mouse genomic studies. We have also identified structural variations, closed gaps in the mouse reference assembly, and revealed previously unannotated coding sequences. This B6Eve assembly explains discrepant observations that have been associated with GRCm38-based analyses, and will inform a reference genome that is more representative of the C57BL/6J mice that are in use today.
Subjects
Genome; Genomics; Animals; Computational Biology/methods; Female; Genomics/methods; Inbreeding; Male; Mice; Mice, Inbred C57BL; Pedigree; Phenotype; Polymorphism, Single Nucleotide
ABSTRACT
Motivation: Allele-specific expression (ASE) refers to the differential abundance of the allelic copies of a transcript. RNA sequencing (RNA-seq) can provide quantitative estimates of ASE for genes with transcribed polymorphisms. When short-read sequences are aligned to a diploid transcriptome, read-mapping ambiguities confound our ability to directly count reads. Multi-mapping reads aligning equally well to multiple genomic locations, isoforms or alleles can comprise the majority (>85%) of reads. Discarding them can result in biases and substantial loss of information. Methods have been developed that use weighted allocation of read counts but these methods treat the different types of multi-reads equivalently. We propose a hierarchical approach to allocation of read counts that first resolves ambiguities among genes, then among isoforms, and lastly between alleles. We have implemented our model in EMASE software (Expectation-Maximization for Allele Specific Expression) to estimate total gene expression, isoform usage and ASE based on this hierarchical allocation. Results: Methods that align RNA-seq reads to a diploid transcriptome incorporating known genetic variants improve estimates of ASE and total gene expression compared to methods that use reference genome alignments. Weighted allocation methods outperform methods that discard multi-reads. Hierarchical allocation of reads improves estimation of ASE even when data are simulated from a non-hierarchical model. Analysis of RNA-seq data from F1 hybrid mice using EMASE reveals widespread ASE associated with cis-acting polymorphisms and a small number of parent-of-origin effects. Availability and implementation: EMASE software is available at https://github.com/churchill-lab/emase. Supplementary information: Supplementary data are available at Bioinformatics online.
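The hierarchical allocation described above (genes first, then isoforms, then alleles) can be sketched for a single multi-mapping read as follows. This is one possible reading of the abstract, not code from the EMASE package: the function name, the three lookup tables, and the example identifiers are all hypothetical, and the tables stand in for the current estimates that an EM iteration would supply. All shares involved are assumed to be positive.

```python
from collections import defaultdict


def allocate_read(alignments, gene_abundance, isoform_share, allele_share):
    """Split one read among its alignments: by gene, then isoform, then allele.

    alignments     : set of (gene, isoform, allele) tuples the read maps to
    gene_abundance : {gene: relative abundance}
    isoform_share  : {(gene, isoform): share of the gene's expression}
    allele_share   : {(gene, isoform, allele): share of the isoform's expression}
    """
    # Group the alignments into a gene -> isoform -> [alleles] tree.
    by_gene = defaultdict(lambda: defaultdict(list))
    for gene, isoform, allele in alignments:
        by_gene[gene][isoform].append(allele)

    weights = {}
    gene_total = sum(gene_abundance[g] for g in by_gene)
    for gene, isoforms in by_gene.items():
        # Level 1: compete only among the genes this read aligns to.
        w_gene = gene_abundance[gene] / gene_total
        iso_total = sum(isoform_share[(gene, i)] for i in isoforms)
        for isoform, alleles in isoforms.items():
            # Level 2: compete among the aligned isoforms of this gene.
            w_iso = isoform_share[(gene, isoform)] / iso_total
            allele_total = sum(allele_share[(gene, isoform, a)] for a in alleles)
            for allele in alleles:
                # Level 3: compete among the aligned alleles of this isoform.
                w_allele = allele_share[(gene, isoform, allele)] / allele_total
                weights[(gene, isoform, allele)] = w_gene * w_iso * w_allele
    return weights


# Hypothetical read that hits both alleles of one G1 isoform and one allele of G2.
alignments = {("G1", "T1", "A"), ("G1", "T1", "B"), ("G2", "T3", "A")}
gene_abundance = {"G1": 3.0, "G2": 1.0}
isoform_share = {("G1", "T1"): 0.5, ("G2", "T3"): 1.0}
allele_share = {("G1", "T1", "A"): 0.8, ("G1", "T1", "B"): 0.2, ("G2", "T3", "A"): 0.5}

weights = allocate_read(alignments, gene_abundance, isoform_share, allele_share)
```

Because each level is normalized only over the targets the read actually aligns to, the weights for one read always sum to 1. A flat scheme would instead normalize once over all (gene, isoform, allele) triples, which treats the three kinds of ambiguity as equivalent.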
Subjects
Alleles; Alternative Splicing; Sequence Analysis, RNA/methods; Software; Transcriptome; Animals; Genomics/methods; Male; Mice
ABSTRACT
A variety of inherited animal models with different genetic causes and distinct genetic backgrounds are needed to help dissect the complex genetic etiology of glaucoma. The scarcity of such animal models has hampered progress in glaucoma research. Here, we introduce a new inherited glaucoma model: the inbred mouse strain YBR/EiJ (YBR). YBR mice develop a form of pigmentary glaucoma. They exhibit a progressive age-related pigment-dispersing iris disease characterized by iris stromal atrophy. Subsequently, these mice develop elevated intraocular pressure (IOP) and glaucoma. Genetic mapping studies utilizing YBR as a glaucoma-susceptible strain and C57BL/6J as a glaucoma-resistant strain were performed to identify genetic loci responsible for the iris disease and high IOP. A recessive locus linked to Tyrp1(b) on chromosome 4 contributes to iris stromal atrophy and high IOP. However, this is not the only important locus. A recessive locus on YBR chromosome 17 causes high IOP independent of the iris stromal atrophy. In specific eyes with high IOP caused by YBR chromosome 17, the drainage angle (through which ocular fluid leaves the eye) is largely open. The YBR alleles of genes on chromosomes 4 and 17 underlie the development of high IOP and glaucoma but do so through independent mechanisms. Together, these two loci act in an additive manner to increase the susceptibility of YBR mice to the development of high IOP. The chromosome 17 locus is important not only because it causes IOP elevation in mice with largely open drainage angles but also because it exacerbates IOP elevation and glaucoma induced by pigment dispersion. Therefore, YBR mice are a valuable resource for studying the genetic etiology of IOP elevation and glaucoma, as well as for testing new treatments.
Subjects
Chromosomes, Mammalian/genetics; Genetic Predisposition to Disease; Glaucoma/genetics; Glaucoma/pathology; Animals; Atrophy; Corneal Stroma/pathology; Disease Models, Animal; Glaucoma/physiopathology; Intraocular Pressure; Iris Diseases/genetics; Iris Diseases/pathology; Mice, Inbred C57BL; Nerve Degeneration/pathology; Pigmentation; Quantitative Trait Loci
ABSTRACT
Genetic variation modulates protein expression through both transcriptional and post-transcriptional mechanisms. To characterize the consequences of natural genetic diversity on the proteome, here we combine a multiplexed, mass spectrometry-based method for protein quantification with an emerging outbred mouse model containing extensive genetic variation from eight inbred founder strains. By measuring genome-wide transcript and protein expression in livers from 192 Diversity Outbred mice, we identify 2,866 protein quantitative trait loci (pQTL) with twice as many local as distant genetic variants. These data support distinct transcriptional and post-transcriptional models underlying the observed pQTL effects. Using a sensitive approach to mediation analysis, we often identified a second protein or transcript as the causal mediator of distant pQTL. Our analysis reveals an extensive network of direct protein-protein interactions. Finally, we show that local genotype can provide accurate predictions of protein abundance in an independent cohort of Collaborative Cross mice.
Subjects
Genetic Variation/genetics; Liver/metabolism; Proteome/analysis; Proteome/genetics; Proteomics; Animals; Female; Genome/genetics; Genotype; Male; Mass Spectrometry; Mice; Models, Genetic; Protein Interaction Maps; Proteome/biosynthesis; Quantitative Trait Loci/genetics; RNA, Messenger/analysis; RNA, Messenger/genetics; Transcriptome/genetics
ABSTRACT
RNA editing refers to post-transcriptional processes that alter the base sequence of RNA. Recently, hundreds of new RNA editing targets have been reported. However, the mechanisms that determine the specificity and degree of editing are not well understood. We examined quantitative variation of site-specific editing in a genetically diverse multiparent population, Diversity Outbred mice, and mapped polymorphic loci that alter editing ratios globally for C-to-U editing and at specific sites for A-to-I editing. An allelic series in the C-to-U editing enzyme Apobec1 influences the editing efficiency of Apob and 58 additional C-to-U editing targets. We identified 49 A-to-I editing sites with polymorphisms in the edited transcript that alter editing efficiency. In contrast to the shared genetic control of C-to-U editing, most of the variable A-to-I editing sites were determined by local nucleotide polymorphisms in proximity to the editing site in the RNA secondary structure. Our results indicate that RNA editing is a quantitative trait subject to genetic variation and that evolutionary constraints have given rise to distinct genetic architectures in the two canonical types of RNA editing.
Subjects
Genetic Variation; Multifactorial Inheritance; Quantitative Trait Loci; RNA Editing; APOBEC-1 Deaminase; Animals; Chromosome Mapping; Cytidine Deaminase/chemistry; Cytidine Deaminase/genetics; Female; Gene Expression Profiling; Genome; Male; Mice
ABSTRACT
Meiotic recombination generates new genetic variation and assures the proper segregation of chromosomes in gametes. PRDM9, a zinc finger protein with histone methyltransferase activity, initiates meiotic recombination by binding DNA at recombination hotspots and directing the position of DNA double-strand breaks (DSB). The DSB repair mechanism suggests that hotspots should eventually self-destruct, yet genome-wide recombination levels remain constant, a conundrum known as the hotspot paradox. To test if PRDM9 drives this evolutionary erosion, we measured activity of the Prdm9Cst allele in two Mus musculus subspecies, M.m. castaneus, in which Prdm9Cst arose, and M.m. domesticus, into which Prdm9Cst was introduced experimentally. Comparing these two strains, we find that haplotype differences at hotspots lead to qualitative and quantitative changes in PRDM9 binding and activity. Using Mus spretus as an outlier, we found most variants affecting PRDM9Cst binding arose and were fixed in M.m. castaneus, suppressing hotspot activity. Furthermore, M.m. castaneus×M.m. domesticus F1 hybrids exhibit novel hotspots, with large haplotype biases in both PRDM9 binding and chromatin modification. These novel hotspots represent sites of historic evolutionary erosion that become activated in hybrids due to crosstalk between one parent's Prdm9 allele and the opposite parent's chromosome. Together these data support a model where haplotype-specific PRDM9 binding directs biased gene conversion at hotspots, ultimately leading to hotspot erosion.
Subjects
DNA-Binding Proteins/genetics; Histone-Lysine N-Methyltransferase/genetics; Homologous Recombination; Meiosis/genetics; Animals; DNA Breaks, Double-Stranded; DNA Repair/genetics; Gene Conversion; Haplotypes; Mice; Nucleotide Motifs
ABSTRACT
Massively parallel RNA sequencing (RNA-seq) has yielded a wealth of new insights into transcriptional regulation. A first step in the analysis of RNA-seq data is the alignment of short sequence reads to a common reference genome or transcriptome. Genetic variants that distinguish individual genomes from the reference sequence can cause reads to be misaligned, resulting in biased estimates of transcript abundance. Fine-tuning of read alignment algorithms does not correct this problem. We have developed Seqnature software to construct individualized diploid genomes and transcriptomes for multiparent populations and have implemented a complete analysis pipeline that incorporates other existing software tools. We demonstrate in simulated and real data sets that alignment to individualized transcriptomes increases read mapping accuracy, improves estimation of transcript abundance, and enables the direct estimation of allele-specific expression. Moreover, when applied to expression QTL mapping we find that our individualized alignment strategy corrects false-positive linkage signals and unmasks hidden associations. We recommend the use of individualized diploid genomes over reference sequence alignment for all applications of high-throughput sequencing technology in genetically diverse populations.
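The core idea of an individualized diploid reference can be sketched in a few lines. This is a toy illustration, not Seqnature's code or interface: the function names and the input layout are hypothetical, and only single-nucleotide substitutions are handled, since insertions and deletions would shift coordinates and need extra bookkeeping.

```python
def apply_snps(reference, snps):
    """Return a copy of `reference` with SNP substitutions applied.

    snps: {0-based position: (expected reference base, alternate base)}
    The expected base is checked so that a coordinate mix-up fails loudly.
    """
    seq = list(reference)
    for pos, (ref_base, alt_base) in snps.items():
        if seq[pos] != ref_base:
            raise ValueError(f"reference mismatch at position {pos}")
        seq[pos] = alt_base
    return "".join(seq)


def diploid_sequences(reference, maternal_snps, paternal_snps):
    """Build the two haplotype sequences of one individual."""
    return apply_snps(reference, maternal_snps), apply_snps(reference, paternal_snps)


# Hypothetical 8-base reference with one SNP on each haplotype.
reference = "ACGTACGT"
maternal, paternal = diploid_sequences(
    reference,
    maternal_snps={1: ("C", "T")},
    paternal_snps={6: ("G", "A")},
)
```

Reads carrying either variant now match one haplotype exactly instead of aligning to the reference with a mismatch, and the haplotype a read matches indicates its allele of origin.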
Subjects
Sequence Alignment/methods; Sequence Analysis, RNA/methods; Software; Transcriptome; Animals; Female; Genome; Male; Mice; Quantitative Trait Loci
ABSTRACT
Identifying genomic regions that descended from a common ancestor is important for understanding the function and evolution of genomes. In distantly related genomes, clusters of homologous gene pairs are evidence of candidate homologous regions. Demonstrating the statistical significance of such "gene clusters" is an essential component of comparative genomic analyses. However, currently there are no practical statistical tests for gene clusters that model the influence of the number of homologs in each gene family on cluster significance. In this work, we demonstrate empirically that failure to incorporate gene family size in gene cluster statistics results in overestimation of significance, leading to incorrect conclusions. We further present novel analytical methods for estimating gene cluster significance that take gene family size into account. Our methods do not require complete genome data and are suitable for testing individual clusters found in local regions, such as contigs in an unfinished assembly. We consider pairs of regions drawn from the same genome (paralogous clusters), as well as regions drawn from two different genomes (orthologous clusters). Determining cluster significance under general models of gene family size is computationally intractable. By assuming that all gene families are of equal size, we obtain analytical expressions that allow fast approximation of cluster probabilities. We evaluate the accuracy of this approximation by comparing the resulting gene cluster probabilities with cluster probabilities obtained by simulating a realistic, power-law distributed model of gene family size, with parameters inferred from genomic data. Surprisingly, despite the simplicity of the underlying assumption, our method accurately approximates the true cluster probabilities. It slightly overestimates these probabilities, yielding a conservative test. 
We present additional simulation results indicating the best choice of parameter values for data analysis in genomes of various sizes and illustrate the utility of our methods by applying them to gene clusters recently reported in the literature. Mathematical code to compute cluster probabilities using our methods is available as supplementary material.
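For orientation, the kind of baseline that ignores gene family size can be written as a hypergeometric tail probability. The sketch below is not the method presented in the abstract. It assumes every gene has exactly one homolog and that the genes in the second region are a random draw from the genome, and the function and parameter names are hypothetical. Under the abstract's findings, a test of this kind would be expected to overstate significance when families contain multiple members.

```python
from math import comb


def cluster_tail_probability(n, m, r, h):
    """P(at least h of r window genes are homologs of m reference genes).

    n : genes in the genome (assumed one-to-one homology, family size 1)
    m : genes in the reference region
    r : genes in the window being tested
    h : observed number of homologous pairs shared by the two regions
    """
    if not (0 <= m <= n and 0 <= r <= n):
        raise ValueError("need 0 <= m <= n and 0 <= r <= n")
    lower = max(h, 0)
    upper = min(m, r)
    # Count the windows that contain exactly k of the m homologs, for k >= h.
    favourable = sum(comb(m, k) * comb(n - m, r - k) for k in range(lower, upper + 1))
    return favourable / comb(n, r)


# Toy numbers: 10-gene genome, 3-gene reference region, 4-gene window, 2 shared.
p_value = cluster_tail_probability(n=10, m=3, r=4, h=2)
```

A small tail probability suggests the shared genes are unlikely to co-occur by chance under this simple model; accounting for gene family size changes the null distribution and hence the probability.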
Subjects
Models, Statistical; Multigene Family/genetics; Arabidopsis/genetics; Evolution, Molecular; Genes, Plant; Genome/genetics; Raphanus/genetics; Sequence Homology, Nucleic Acid
ABSTRACT
Gene clusters that span three or more chromosomal regions are of increasing importance, yet statistical tests to validate such clusters are in their infancy. Current approaches either conduct several pairwise comparisons or consider only the number of genes that occur in all of the regions. In this paper, we provide statistical tests for clusters spanning exactly three regions based on genome models of typical comparative genomics problems, including analysis of conserved linkage within multiple species and identification of large-scale duplications. Our tests are the first to combine evidence from genes shared among all three regions and genes shared between pairs of regions. We show that our tests of clusters spanning three regions are more sensitive than existing approaches, and can thus be used to identify more diverged homologous regions.