ABSTRACT
DNA base damage is a major source of oncogenic mutations1. Such damage can produce strand-phased mutation patterns and multiallelic variation through the process of lesion segregation2. Here we exploited these properties to reveal how strand-asymmetric processes, such as replication and transcription, shape DNA damage and repair. Despite distinct mechanisms of leading- and lagging-strand replication3,4, we observe identical fidelity and damage tolerance for both strands. For small alkylation adducts of DNA, our results support a model in which the same translesion polymerase is recruited on-the-fly to both replication strands, in stark contrast to the strand-asymmetric tolerance of bulky UV-induced adducts5. The accumulation of multiple distinct mutations at the site of persistent lesions provides the means to quantify the relative efficiency of repair processes genome-wide and at single-base resolution. At multiple scales, we show that DNA damage-induced mutations are largely shaped by the influence of DNA accessibility on repair efficiency, rather than by gradients of DNA damage. Finally, we reveal specific genomic conditions that can actively drive oncogenic mutagenesis by corrupting the fidelity of nucleotide excision repair. These results provide insight into how strand-asymmetric mechanisms underlie the formation, tolerance and repair of DNA damage, thereby shaping cancer genome evolution.
Subject(s)
DNA Damage , DNA Repair , DNA-Directed DNA Polymerase , DNA , Mutagenesis , Mutation , Animals , Humans , Mice , Alkylation/radiation effects , Cell Line , DNA/chemistry , DNA/genetics , DNA/metabolism , DNA/radiation effects , DNA Adducts/chemistry , DNA Adducts/genetics , DNA Adducts/metabolism , DNA Adducts/radiation effects , DNA Damage/genetics , DNA Damage/radiation effects , DNA Repair/genetics , DNA Repair/physiology , DNA Replication , DNA-Directed DNA Polymerase/metabolism , Mutagenesis/genetics , Mutagenesis/radiation effects , Mutation/genetics , Mutation/radiation effects , Neoplasms/genetics , Transcription, Genetic , Ultraviolet Rays/adverse effects
ABSTRACT
Cancers arise through the acquisition of oncogenic mutations and grow by clonal expansion1,2. Here we reveal that most mutagenic DNA lesions are not resolved into a mutated DNA base pair within a single cell cycle. Instead, DNA lesions segregate, unrepaired, into daughter cells for multiple cell generations, resulting in the chromosome-scale phasing of subsequent mutations. We characterize this process in mutagen-induced mouse liver tumours and show that DNA replication across persisting lesions can produce multiple alternative alleles in successive cell divisions, thereby generating both multiallelic and combinatorial genetic diversity. The phasing of lesions enables accurate measurement of strand-biased repair processes, quantification of oncogenic selection and fine mapping of sister-chromatid-exchange events. Finally, we demonstrate that lesion segregation is a unifying property of exogenous mutagens, including UV light and chemotherapy agents in human cells and tumours, which has profound implications for the evolution and adaptation of cancer genomes.
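A toy segmentation can make chromosome-scale phasing concrete. This is a minimal Python sketch, not the authors' pipeline. It assumes each mutation has already been assigned to the strand inferred to carry the originating lesion; that label is hypothetical, with 'W' for Watson and 'C' for Crick. The sketch then collapses position-sorted calls into runs of the same label. A long run corresponds to a lesion-segregation segment, and a switch between adjacent runs marks a candidate breakpoint such as a sister-chromatid exchange.

```python
from itertools import groupby
from typing import List, Tuple


def strand_runs(mutations: List[Tuple[int, str]],
                min_len: int = 1) -> List[Tuple[str, int, int, int]]:
    """Collapse (position, strand) calls into runs of consecutive same-strand calls.

    `strand` is a hypothetical label for the strand inferred to carry the
    lesion ('W' or 'C'). Returns (strand, first_pos, last_pos, n_mutations)
    for every run containing at least `min_len` mutations.
    """
    ordered = sorted(mutations)  # order calls along the chromosome
    runs = []
    for strand, group in groupby(ordered, key=lambda m: m[1]):
        positions = [pos for pos, _ in group]
        if len(positions) >= min_len:
            runs.append((strand, positions[0], positions[-1], len(positions)))
    return runs


calls = [(120, "W"), (5_400, "W"), (9_800, "W"), (15_000, "C"), (22_000, "C")]
print(strand_runs(calls))  # [('W', 120, 9800, 3), ('C', 15000, 22000, 2)]
```

Real data would also need tolerance for isolated miscalled strands before a switch is treated as a breakpoint. This sketch omits that step.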
Subject(s)
Chromosome Segregation/genetics , Evolution, Molecular , Genome/genetics , Neoplasms/genetics , Alleles , Animals , DNA Repair , DNA Replication , ErbB Receptors/metabolism , Humans , Liver Neoplasms/genetics , Liver Neoplasms/pathology , Male , Mice , Mutation , Neoplasms/pathology , Selection, Genetic , Signal Transduction , Sister Chromatid Exchange , Transcription, Genetic , raf Kinases/metabolism , ras Proteins/metabolism
ABSTRACT
Mutation in the germline is the ultimate source of genetic variation, but little is known about the influence of germline chromatin structure on mutational processes. Using ATAC-seq, we profile the open chromatin landscape of human spermatogonia, the most proliferative cell type of the germline, identifying transcription factor binding sites (TFBSs) and PRDM9 binding sites, a subset of which will initiate meiotic recombination. We observe an increase in rare structural variant (SV) breakpoints at PRDM9-bound sites, implicating meiotic recombination in the generation of structural variation. Many germline TFBSs, such as those bound by NRF1, are also associated with increased rates of SV breakpoints, apparently independently of recombination. Singleton short insertions (≥5 bp) are highly enriched at TFBSs, particularly at sites bound by testis-active TFs, and their rates correlate with those of SV breakpoints. Short insertions often duplicate the TFBS motif, leading to clustering of motif sites near regulatory regions in this male-driven evolutionary process. Increased mutation loads at germline TFBSs disproportionately affect neural enhancers with activity in spermatogonia, potentially altering neurodevelopmental regulatory architecture. Local chromatin structure in spermatogonia thus plays a pervasive role in shaping both evolution and disease.
Subject(s)
Genome, Human , Spermatogonia , Binding Sites , Chromatin Immunoprecipitation Sequencing , Histone-Lysine N-Methyltransferase/genetics , Humans , Male , Mutation , Spermatogonia/metabolism
ABSTRACT
Human population isolates provide a snapshot of the impact of historical demographic processes on population genetics. Such data facilitate studies of the functional impact of rare sequence variants on biomedical phenotypes, as strong genetic drift can result in higher frequencies of variants that are otherwise rare. We present the first whole-genome sequencing (WGS) study of the VIKING cohort, a representative collection of samples from the isolated Shetland population in northern Scotland, and explore how its genetic characteristics compare to those of a mainland Scottish population. Our analyses reveal the strong contributions of the founder effect and genetic drift in shaping genomic variation in the VIKING cohort. About one tenth of all high-quality variants discovered are unique to the VIKING cohort or are seen at frequencies at least tenfold higher than in more cosmopolitan control populations. Multiple lines of evidence also suggest relaxation of purifying selection during the evolutionary history of the Shetland isolate. We demonstrate enrichment of ultra-rare VIKING variants in exonic regions and, for the first time, we also show that ultra-rare variants are enriched within regulatory regions, particularly promoters, suggesting that gene expression patterns may diverge relatively rapidly in human isolates.
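The enrichment criterion quoted above can be written as a simple filter: a variant counts if it is unique to the isolate or at least tenfold more frequent than in the control populations. This is an illustrative sketch only. The `Variant` record, its field names and the example frequencies are hypothetical. The study's actual variant filtering and its choice of control populations are not reproduced here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Variant:
    vid: str           # hypothetical variant identifier
    af_isolate: float  # allele frequency in the isolate cohort
    af_control: float  # allele frequency in the control population(s)


def isolate_enriched(v: Variant, fold: float = 10.0) -> bool:
    """True if the variant is seen only in the isolate, or at >= `fold` times its control frequency."""
    if v.af_isolate <= 0:
        return False  # not observed in the isolate at all
    if v.af_control == 0:
        return True   # unique to the isolate
    return v.af_isolate / v.af_control >= fold


def enriched_fraction(variants: List[Variant], fold: float = 10.0) -> float:
    """Fraction of variants passing the isolate-enrichment filter."""
    if not variants:
        return 0.0
    return sum(isolate_enriched(v, fold) for v in variants) / len(variants)


variants = [
    Variant("v1", 0.020, 0.0),    # unique to the isolate
    Variant("v2", 0.050, 0.004),  # about 12.5-fold higher
    Variant("v3", 0.300, 0.280),  # similar frequency
    Variant("v4", 0.010, 0.002),  # about 5-fold higher
]
print(enriched_fraction(variants))  # 0.5
```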
Subject(s)
Demography , Genetic Variation/genetics , Genetics, Population , Regulatory Sequences, Nucleic Acid/genetics , 5' Untranslated Regions/genetics , Alleles , Chromatin/genetics , Europe , Exons/genetics , Founder Effect , Gene Flow , Genome-Wide Association Study , Genomics , Humans , Phenotype , Polymorphism, Single Nucleotide/genetics , Promoter Regions, Genetic/genetics , Scotland , Whole Genome Sequencing
ABSTRACT
Regulated transcription controls the diversity, developmental pathways and spatial organization of the hundreds of cell types that make up a mammal. Using single-molecule cDNA sequencing, we mapped transcription start sites (TSSs) and their usage in human and mouse primary cells, cell lines and tissues to produce a comprehensive overview of mammalian gene expression across the human body. We find that few genes are truly 'housekeeping', whereas many mammalian promoters are composite entities composed of several closely spaced TSSs, with independent cell-type-specific expression profiles. TSSs specific to different cell types evolve at different rates, whereas promoters of broadly expressed genes are the most conserved. Promoter-based expression analysis reveals key transcription factors defining cell states and links them to binding-site motifs. The functions of identified novel transcripts can be predicted by coexpression and sample ontology enrichment analyses. The functional annotation of the mammalian genome 5 (FANTOM5) project provides comprehensive expression profiles and functional annotation of mammalian cell-type-specific transcriptomes with wide applications in biomedical research.
Subject(s)
Atlases as Topic , Molecular Sequence Annotation , Promoter Regions, Genetic/genetics , Transcriptome/genetics , Animals , Cell Line , Cells, Cultured , Cluster Analysis , Conserved Sequence/genetics , Gene Expression Regulation/genetics , Gene Regulatory Networks/genetics , Genes, Essential/genetics , Genome/genetics , Humans , Mice , Open Reading Frames/genetics , Organ Specificity , RNA, Messenger/analysis , RNA, Messenger/genetics , Transcription Factors/metabolism , Transcription Initiation Site , Transcription, Genetic/genetics
ABSTRACT
BACKGROUND: Disease relapse is the primary cause of death from ovarian carcinoma. Isolated lymph node relapse is a rare pattern of ovarian carcinoma recurrence, with a reported median postrelapse survival of 2.5 to 4 years. To date, investigations have not compared isolated lymph node relapse ovarian carcinoma directly to a matched extranodal relapse cohort or performed molecular characterization of cases that subsequently experience isolated lymph node relapse. OBJECTIVE: Here we seek to compare the clinical outcome, tumor-infiltrating lymphocyte burden, and frequency of known prognostic genomic events in isolated lymph node relapse ovarian carcinoma vs extranodal relapse ovarian carcinoma. STUDY DESIGN: Forty-nine isolated lymph node relapse ovarian carcinoma patients were identified and matched to 49 extranodal relapse cases using the Edinburgh Ovarian Cancer Database, from which the clinical data for identified patients were retrieved. Matching criteria were disease stage, histologic subtype and grade, extent of residual disease following surgical debulking, and age at diagnosis. Clinicopathologic factors and survival data were compared between the isolated lymph node relapse and extranodal relapse cohorts. Genomic characterization of tumor material from diagnosis was performed using panel-based high-throughput sequencing, and tumor-infiltrating T cell burden was assessed using immunohistochemistry for CD3+ and CD8+ cells. RESULTS: Isolated lymph node relapse cases demonstrated significantly prolonged postrelapse survival and overall survival vs extranodal relapse upon multivariable analysis (HRmulti = 0.52 [0.33-0.84] and 0.51 [0.31-0.84]). Diagnostic specimens from high-grade serous ovarian carcinomas that subsequently displayed isolated lymph node relapse harbored significantly greater CD3+ and CD8+ cell infiltration compared to extranodal relapse cases (P = .001 and P = .009, Bonferroni-adjusted P = .003 and P = .019).
Isolated lymph node relapse high-grade serous ovarian carcinoma cases did not show marked enrichment or depletion of cases with BRCA1/2 mutation or CCNE1 copy number gain when compared to their extranodal relapse counterparts (24.4% vs 19.4% and 18.2% vs 22.6%, P = .865 and P = .900). CONCLUSION: Isolated lymph node relapse ovarian carcinoma represents a distinct clinical entity with favorable outcome compared to extranodal relapse. There was no clear enrichment or depletion of BRCA1/2 mutation or CCNE1 gain in the isolated lymph node relapse ovarian carcinoma cohort compared with extranodal relapse cases, suggesting that these known prognostic genomically defined subtypes of disease do not display markedly altered propensity for isolated lymph node relapse. Diagnostic tumor material from isolated lymph node relapse patients demonstrated greater CD3+ and CD8+ cell infiltration, indicating stronger tumor engagement by T cell populations, which may contribute to the more indolent disease course of isolated lymph node relapse.
Subject(s)
Carcinoma/diagnosis , Carcinoma/pathology , Lymph Nodes/pathology , Ovarian Neoplasms/diagnosis , Ovarian Neoplasms/pathology , Adult , Aged , Aged, 80 and over , Biomarkers, Tumor/genetics , Carcinoma/genetics , Carcinoma/immunology , Case-Control Studies , Cyclin E/genetics , DNA Copy Number Variations , Databases, Factual , Female , Genes, BRCA1 , Genes, BRCA2 , Humans , Lymphatic Metastasis , Lymphocytes, Tumor-Infiltrating , Middle Aged , Mutation , Oncogene Proteins/genetics , Ovarian Neoplasms/genetics , Ovarian Neoplasms/immunology , Prognosis , Proportional Hazards Models
ABSTRACT
Disruption of gene regulation is known to play major roles in carcinogenesis and tumour progression. Here, we comprehensively characterize the mutational profiles of diverse transcription factor binding sites (TFBSs) across 1,574 completely sequenced cancer genomes encompassing 11 tumour types. We assess the relative rates and impact of the mutational burden at the binding sites of 81 transcription factors (TFs) by comparing the abundance and patterns of single base substitutions within putatively functional binding sites to control sites with matched sequence composition. There is a strong (1.43-fold) and significant excess of mutations at functional binding sites across TFs, and the mutations that accumulate in cancers are typically more disruptive than variants tolerated in extant human populations at the same sites. CTCF binding sites suffer an exceptionally high mutational load in cancer (3.31-fold excess) relative to control sites, and we demonstrate for the first time that this effect is seen in essentially all cancer types with sufficient data. The subset of CTCF sites involved in higher-order chromatin structures has the highest mutational burden, suggesting a widespread breakdown of chromatin organization. However, we find no evidence for selection driving these distinctive patterns of mutation. The mutational load at CTCF-binding sites is substantially determined by replication timing and the mutational signature of the tumour in question, suggesting that selectively neutral processes underlie the unusual mutation patterns. Pervasive hypermutation within transcription factor binding sites rewires the regulatory landscape of the cancer genome, but it is dominated by mutational processes rather than selection.
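A fold excess such as the 1.43-fold figure above is a ratio of per-base mutation rates: the rate at functional sites divided by the rate at composition-matched control sites. The sketch below assumes the control sites have already been matched and that counts are pooled across tumours. The counts shown are hypothetical. The published analysis also considers factors such as replication timing and mutational signature, which this sketch ignores.

```python
def fold_excess(mut_functional: int, bases_functional: int,
                mut_control: int, bases_control: int) -> float:
    """Per-base mutation rate at functional binding sites divided by the
    per-base rate at matched control sites (1.0 means no excess).

    Cross-multiplying the integer counts before a single division avoids
    rounding error in the intermediate per-base rates.
    """
    if bases_functional <= 0 or bases_control <= 0:
        raise ValueError("base counts must be positive")
    if mut_control == 0:
        raise ValueError("no mutations observed at control sites")
    return (mut_functional * bases_control) / (mut_control * bases_functional)


# Hypothetical counts, for illustration only
print(fold_excess(mut_functional=300, bases_functional=1_000_000,
                  mut_control=200, bases_control=1_000_000))  # 1.5
```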
Subject(s)
Neoplasms/genetics , Repressor Proteins/genetics , Transcription Factors/genetics , Binding Sites/genetics , CCCTC-Binding Factor , Carcinogenesis/genetics , Gene Expression Regulation, Neoplastic , Genome, Human , Humans , Mutation/genetics , Neoplasms/metabolism , Protein Binding , Regulatory Sequences, Nucleic Acid , Repressor Proteins/metabolism , Transcription Factors/metabolism
ABSTRACT
BACKGROUND: Approximately 10-15% of ovarian carcinomas (OC) are attributed to inherited susceptibility, the majority of which are due to mutations in BRCA1 or BRCA2 (BRCA1/2). These patients display superior clinical outcomes, including enhanced sensitivity to platinum-based chemotherapy. Here, we seek to investigate whether BRCA1/2 status influences the response rate to single-agent pegylated liposomal doxorubicin (PLD) in high-grade serous (HGS) OC. METHODS: One hundred and forty-eight patients treated with single-agent PLD were identified retrospectively from the Edinburgh Ovarian Cancer Database. DNA was extracted from formalin-fixed paraffin-embedded (FFPE) archival tumour material and sequenced using the Ion Ampliseq BRCA1 and BRCA2 panel. A minimum variant allele frequency threshold was applied to correct for sequencing artefacts associated with formalin fixation. RESULTS: A superior response rate to PLD was observed in patients with HGS OC who harboured variants likely to affect BRCA1 or BRCA2 function compared to the BRCA1/2 wild-type population (36%, 9 of 25 patients versus 12.1%, 7 of 58 patients; p = 0.016). An enhanced response rate was also seen in patients harbouring only the BRCA1 SNP rs1799950, predicted to be detrimental to BRCA1 function (50%, 3 of 6 patients versus 12.1%, 7 of 58 patients; p = 0.044). CONCLUSIONS: These data demonstrate that HGS OC patients with BRCA1/2 variants predicted to be damaging to protein function experience superior sensitivity to PLD, consistent with impaired DNA repair. Further characterisation of rs1799950 is now warranted in relation to chemosensitivity and susceptibility to developing ovarian carcinoma.
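The response rates above follow directly from the counts given (9 of 25 and 7 of 58). The abstract does not name the test behind p = 0.016. The sketch below therefore uses a two-sided Fisher's exact test, one common choice for a 2×2 table. This is an assumption, not a reproduction of the authors' analysis, and its p-value need not match the reported one.

```python
from fractions import Fraction
from math import comb


def response_rate(responders: int, total: int) -> float:
    """Fraction of patients who responded."""
    return responders / total


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one; Fractions keep
    that comparison exact.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> Fraction:
        # Probability that the top-left cell equals x, given fixed margins
        return Fraction(comb(row1, x) * comb(row2, col1 - x), denom)

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    return float(sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= observed))


# Each row is a patient group; columns are (responders, non-responders)
variant_row = (9, 25 - 9)   # BRCA1/2 variant carriers
wildtype_row = (7, 58 - 7)  # BRCA1/2 wild type
print(f"{response_rate(9, 25):.1%} vs {response_rate(7, 58):.1%}")  # 36.0% vs 12.1%
p = fisher_exact_two_sided(*variant_row, *wildtype_row)
print(f"two-sided Fisher's exact p = {p:.3f}")
```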
Subject(s)
Antibiotics, Antineoplastic/therapeutic use , BRCA1 Protein/genetics , BRCA2 Protein/genetics , Cystadenocarcinoma, Serous/drug therapy , Doxorubicin/analogs & derivatives , Mutation , Ovarian Neoplasms/drug therapy , Adult , Aged , Aged, 80 and over , Biomarkers, Tumor/genetics , Cystadenocarcinoma, Serous/genetics , Cystadenocarcinoma, Serous/pathology , Doxorubicin/therapeutic use , Female , Follow-Up Studies , Genetic Predisposition to Disease , Humans , Middle Aged , Neoplasm Grading , Ovarian Neoplasms/genetics , Ovarian Neoplasms/pathology , Polyethylene Glycols/therapeutic use , Polymorphism, Single Nucleotide , Retrospective Studies , Survival Rate
ABSTRACT
Homozygous loss-of-function (HLOF) variants provide a valuable window on gene function in humans, as well as an inventory of the human genes that are not essential for survival and reproduction. All humans carry at least a few HLOF variants, but the exact number of inactivated genes that can be tolerated is currently unknown, as are the phenotypic effects of losing function for most human genes. Here, we make use of 1432 whole exome sequences from five European populations to expand the catalogue of known human HLOF mutations; after stringent filtering of variants in our dataset, we identify a total of 173 HLOF mutations, 76 (44%) of which have not been observed previously. We find that population isolates are particularly well suited to surveys of novel HLOF genes because individuals in such populations carry extensive runs of homozygosity, which we show are enriched for novel, rare HLOF variants. Further, we make use of extensive phenotypic data to show that most HLOFs, ascertained in population-based samples, appear to have little detectable effect on the phenotype. Moreover, we document several genes directly implicated in disease that seem to tolerate HLOF variants. Overall, HLOF genes are enriched for olfactory receptor function and are expressed in testes more often than expected, consistent with reduced purifying selection and incipient pseudogenisation.
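Runs of homozygosity can be located by scanning genotypes along a chromosome for stretches without heterozygous calls. The sketch below is deliberately simplified. It assumes position-ordered SNPs with genotypes coded as 0, 1 or 2 copies of the alternate allele, and a hypothetical minimum run length counted in SNPs. Dedicated tools such as PLINK also use physical length and SNP density, and they tolerate occasional heterozygous or missing calls.

```python
from typing import List, Optional, Tuple


def homozygous_runs(genotypes: List[int], min_snps: int = 3) -> List[Tuple[int, int]]:
    """Return (start, end) SNP indices, end inclusive, of maximal runs of
    homozygous calls (0 or 2) spanning at least `min_snps` SNPs.

    Any other code (a heterozygote, 1, or a missing call) ends the current run.
    """
    runs: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, g in enumerate(genotypes + [1]):  # sentinel heterozygote closes the last run
        if g in (0, 2):
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_snps:
                runs.append((start, i - 1))
            start = None
    return runs


def in_roh(index: int, runs: List[Tuple[int, int]]) -> bool:
    """True if the SNP at `index` (e.g. a candidate HLOF site) lies inside a run."""
    return any(s <= index <= e for s, e in runs)


genos = [0, 2, 2, 0, 1, 0, 1, 2, 2, 2, 2, 0]
runs = homozygous_runs(genos)
print(runs)             # [(0, 3), (7, 11)]
print(in_roh(9, runs))  # True
```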
Subject(s)
Mutation , White People/genetics , Exome , Gene Frequency , Homozygote , Humans , Phenotype , Selection, Genetic
ABSTRACT
Human serum uric acid concentration (SUA) is a complex trait. A recent meta-analysis of multiple genome-wide association studies (GWAS) identified 28 loci associated with SUA jointly explaining only 7.7% of the SUA variance, with 3.4% explained by two major loci (SLC2A9 and ABCG2). Here we examined whether gene-gene interactions play a role in regulating SUA using two large GWAS cohorts included in the meta-analysis [the Atherosclerosis Risk in Communities study cohort (ARIC) and the Framingham Heart Study cohort (FHS)]. We found abundant genome-wide significant local interactions in ARIC in the 4p16.1 region, located mostly in an intergenic area near SLC2A9, that were not driven by linkage disequilibrium and were replicated in FHS. Using forward selection, we constructed a model of five SNPs with marginal effects and three epistatic SNP pairs in ARIC: three marginal SNPs were located within SLC2A9 and the remaining SNPs were all located in the nearby intergenic area. The full model explained 1.5% more SUA variance than that explained by the lead SNP alone, but only 0.3% was contributed by the marginal and epistatic effects of the SNPs in the intergenic area. Functional analysis revealed strong evidence that the epistatically interacting SNPs in the intergenic area were unusually enriched at enhancers active in ENCODE hepatic (HepG2, P = 4.7E-05) and precursor red blood (K562, P = 5.0E-06) cells, putatively regulating transcription of WDR1 and SLC2A9. These results suggest that exploring epistatic interactions is valuable in uncovering the complex functional mechanisms underlying the 4p16.1 region.
Subject(s)
Chromosomes, Human, Pair 4 , Epistasis, Genetic , Glucose Transport Proteins, Facilitative/genetics , Quantitative Trait, Heritable , Uric Acid/blood , Cell Line , Computational Biology , Enhancer Elements, Genetic , Female , Genome-Wide Association Study , Genomics , Glucose Transport Proteins, Facilitative/metabolism , Humans , Male , Models, Genetic , Models, Statistical , Polymorphism, Single Nucleotide , Quantitative Trait Loci
ABSTRACT
Mammalian chromosomes fold into arrays of megabase-sized topologically associating domains (TADs), which are arranged into compartments spanning multiple megabases of genomic DNA. TADs have internal substructures that are often cell type specific, but their higher-order organization remains elusive. Here, we investigate TAD higher-order interactions with Hi-C through neuronal differentiation and show that they form a hierarchy of domains-within-domains (metaTADs) extending across genomic scales up to the range of entire chromosomes. We find that TAD interactions are well captured by tree-like, hierarchical structures irrespective of cell type. metaTAD tree structures correlate with genetic, epigenomic and expression features, and structural tree rearrangements during differentiation are linked to transcriptional state changes. Using polymer modelling, we demonstrate that hierarchical folding promotes efficient chromatin packaging without the loss of contact specificity, highlighting a role far beyond the simple need for packing efficiency.
Subject(s)
Chromatin/chemistry , Chromosomes/chemistry , Mouse Embryonic Stem Cells/cytology , Neurons/cytology , Transcription, Genetic , Animals , Cell Differentiation , Cells, Cultured , Chromatin Assembly and Disassembly , Epigenesis, Genetic , Gene Expression Regulation , Mice
ABSTRACT
The immediate-early response mediates cell fate in response to a variety of extracellular stimuli and is dysregulated in many cancers. However, the specificity of the response across stimuli and cell types, and the roles of non-coding RNAs, are not well understood. Using a large collection of densely sampled time series expression data, we have examined the induction of the immediate-early response in unparalleled detail, across cell types and stimuli. We exploit cap analysis of gene expression (CAGE) time series datasets to directly measure promoter activities over time. Using a novel analysis method for time series data, we identify transcripts with expression patterns that closely resemble the dynamics of known immediate-early genes (IEGs), and this enables a comprehensive comparative study of these genes and their chromatin state. Surprisingly, these data suggest that the earliest transcriptional responses often involve promoters generating non-coding RNAs, many of which are produced in advance of canonical protein-coding IEGs. IEGs are known to be capable of induction without de novo protein synthesis. Consistent with this, we find that the response of both protein-coding and non-coding RNA IEGs can be explained by their transcriptionally poised, permissive chromatin state prior to stimulation. We also explore the function of non-coding RNAs in the attenuation of the immediate-early response in a small RNA sequencing dataset matched to the CAGE data: we identify a novel set of microRNAs responsible for the attenuation of the IEG response in an estrogen receptor-positive cancer cell line. Our computational statistical method is well suited to meta-analyses, as there is no requirement for transcripts to pass thresholds for significant differential expression between time points, and it is agnostic to the number of time points per dataset.
Subject(s)
Immediate-Early Proteins/genetics , RNA, Untranslated/genetics , Transcription, Genetic/genetics , Computational Biology , Humans , Immediate-Early Proteins/metabolism , Kinetics , MCF-7 Cells , MicroRNAs/genetics , MicroRNAs/metabolism , Models, Statistical , RNA, Untranslated/metabolism
ABSTRACT
We recently found that hnRNP A1, a protein implicated in many aspects of RNA processing, acts as an auxiliary factor for the Drosha-mediated processing of a microRNA precursor, pri-miR-18a. Here, we describe the mechanism by which hnRNP A1 regulates this event. We show that hnRNP A1 binds to the loop of pri-miR-18a and induces a relaxation at the stem, creating a more favorable cleavage site for Drosha. We found that approximately 14% of all pri-miRNAs have highly conserved loops, which we predict act as landing pads for trans-acting factors influencing miRNA processing. Consistent with this, we show that 2'-O-methyl oligonucleotides targeting conserved loops (LooptomiRs) abolish miRNA processing in vitro. Furthermore, we present evidence to support an essential role of conserved loops for pri-miRNA processing. Altogether, these data suggest the existence of auxiliary factors for the processing of specific miRNAs, revealing an additional level of complexity for the regulation of miRNA biogenesis.
Subject(s)
Conserved Sequence , MicroRNAs/genetics , RNA Processing, Post-Transcriptional/genetics , Base Sequence , Binding Sites , DNA Footprinting , Genome, Human , Heterogeneous Nuclear Ribonucleoprotein A1 , Heterogeneous-Nuclear Ribonucleoprotein Group A-B/chemistry , Heterogeneous-Nuclear Ribonucleoprotein Group A-B/genetics , Heterogeneous-Nuclear Ribonucleoprotein Group A-B/metabolism , Humans , MicroRNAs/chemistry , Models, Molecular , Molecular Sequence Data , Nucleic Acid Conformation , RNA Editing , RNA Interference , RNA, Messenger/genetics
ABSTRACT
Evolutionary change in gene expression is generally considered to be a major driver of phenotypic differences between species. We investigated innate immune diversification by analyzing interspecies differences in the transcriptional responses of primary human and mouse macrophages to the Toll-like receptor (TLR)-4 agonist lipopolysaccharide (LPS). By using a custom platform permitting cross-species interrogation coupled with deep sequencing of mRNA 5' ends, we identified extensive divergence in LPS-regulated orthologous gene expression between humans and mice (24% of orthologues were identified as "divergently regulated"). We further demonstrate concordant regulation of human-specific LPS target genes in primary pig macrophages. Divergently regulated orthologues were enriched for genes encoding cellular "inputs" such as cell surface receptors (e.g., TLR6, IL-7Rα) and functional "outputs" such as inflammatory cytokines/chemokines (e.g., CCL20, CXCL13). Conversely, intracellular signaling components linking inputs to outputs were typically concordantly regulated. Functional consequences of divergent gene regulation were confirmed by showing LPS pretreatment boosts subsequent TLR6 responses in mouse but not human macrophages, in keeping with mouse-specific TLR6 induction. Divergently regulated genes were associated with a large dynamic range of gene expression, and specific promoter architectural features (TATA box enrichment, CpG island depletion). Surprisingly, regulatory divergence was also associated with enhanced interspecies promoter conservation. Thus, the genes controlled by complex, highly conserved promoters that facilitate dynamic regulation are also the most susceptible to evolutionary change.
Subject(s)
Gene Expression Profiling , Genetic Variation , Macrophages/metabolism , Toll-Like Receptor 4/genetics , Animals , Cell Line , Cells, Cultured , Chemokine CCL20/genetics , Chemokine CXCL13/genetics , Evolution, Molecular , Female , Gene Expression Regulation/drug effects , Host-Pathogen Interactions , Humans , Lipopolysaccharides/pharmacology , Macrophages/drug effects , Macrophages/microbiology , Male , Mice , Mice, Inbred BALB C , Mice, Inbred C57BL , Mice, Knockout , Oligonucleotide Array Sequence Analysis , Reverse Transcriptase Polymerase Chain Reaction , Salmonella typhimurium/physiology , Species Specificity , Swine , Toll-Like Receptor 4/agonists
ABSTRACT
Mammalian promoters can be separated into two classes: conserved, TATA box-enriched promoters, which initiate at a well-defined site, and more plastic, broad and evolvable CpG-rich promoters. We have sequenced tags corresponding to several hundred thousand transcription start sites (TSSs) in the mouse and human genomes, allowing precise analysis of the sequence architecture and evolution of distinct promoter classes. Different tissues and families of genes differentially use distinct types of promoters. Our tagging methods allow quantitative analysis of promoter usage in different tissues and show that differentially regulated alternative TSSs are a common feature in protein-coding genes and commonly generate alternative N termini. Among the TSSs, we identified new start sites associated with the majority of exons and with 3' UTRs. These data permit genome-scale identification of tissue-specific promoters and analysis of the cis-acting elements associated with them.
Subject(s)
Evolution, Molecular , Promoter Regions, Genetic , 3' Untranslated Regions , Animals , Base Sequence , DNA , Genome , Proteome , TATA Box
ABSTRACT
In this study we investigated the strengths and modes of selection associated with nucleosome positioning in the human lineage through the comparison of interspecies and intraspecies rates of divergence. We identify significant evidence for both positive and negative selection linked to human nucleosome positioning for the first time, implicating a widespread and important role for DNA sequence in the location of well-positioned nucleosomes. Selection appears to be acting on particular base substitutions to maintain optimum GC compositions in core and linker regions, with, e.g., unexpectedly elevated rates of C→T substitutions during recent human evolution at linker regions 60-90 bp from the nucleosome dyad but significant depletion of the same substitutions within nucleosome core regions. These patterns are strikingly consistent with the known relationships between genomic sequence composition and nucleosome assembly. By stratifying nucleosomes according to the GC content of their genomic neighborhood, we also show that the strength and direction of selection detected are dictated by local GC content. Intriguingly, these signatures of selection are not restricted to nucleosomes in close proximity to exons, suggesting that the correct positioning of nucleosomes is important not only in and around coding regions. This analysis provides strong evidence that the genomic sequences associated with nucleosomes are not evolving neutrally, and suggests that underlying DNA sequence is an important factor in nucleosome positioning. Recent signatures of selection linked to genomic features as ubiquitous as the nucleosome have important implications for human genome evolution and disease.
Subject(s)
Nucleosomes/genetics , Nucleosomes/metabolism , Base Composition , Chromatin Assembly and Disassembly , Evolution, Molecular , Histones/metabolism , Humans , Polymorphism, Single Nucleotide , Selection, Genetic
ABSTRACT
Several recent studies have examined different aspects of mammalian higher-order chromatin structure (replication timing, lamina association and Hi-C inter-locus interactions) and have suggested that most of these features of genome organisation are conserved over evolution. However, the extent of evolutionary divergence in higher-order structure has not been rigorously measured across the mammalian genome, and until now little has been known about the characteristics of any divergent loci present. Here, we generate a dataset combining multiple measurements of chromatin structure and organisation over many embryonic cell types for both human and mouse that, for the first time, allows a comprehensive assessment of the extent of structural divergence between mammalian genomes. Comparison of orthologous regions confirms that all measurable facets of higher-order structure are conserved between human and mouse, across the vast majority of the detectably orthologous genome. This broad similarity is observed in spite of many loci possessing cell type-specific structures. However, we also identify hundreds of regions (from 100 kb to 2.7 Mb in size) showing consistent evidence of divergence between these species, constituting at least 10% of the orthologous mammalian genome and encompassing many hundreds of human and mouse genes. These regions show unusual shifts in human GC content, are unevenly distributed across both genomes, and are enriched in human subtelomeric regions. Divergent regions are also relatively enriched for genes showing divergent expression patterns between human and mouse ES cells, implying these regions cause divergent regulation. Particular divergent loci are strikingly enriched in genes implicated in vertebrate development, suggesting important roles for structural divergence in the evolution of mammalian developmental programmes.
These data suggest that, though relatively rare in the mammalian genome, divergence in higher order chromatin structure has played important roles during evolution.
Subject(s)
Chromatin/chemistry , Embryonic Stem Cells/cytology , Developmental Gene Expression Regulation , Animals , Cluster Analysis , Computational Biology/methods , Genetic Databases , Molecular Evolution , Genome , Human Genome , Humans , Mice , Telomere/ultrastructure
ABSTRACT
Genome-wide association studies (GWAS) have discovered many loci associated with common diseases and quantitative traits. However, most GWAS have not studied the gene-gene interactions (epistasis) that could be important in complex trait genetics. A major challenge in analysing epistasis in GWAS is the enormous computational demand of testing billions of SNP combinations. Several methods have been developed recently to address this, some using computers equipped with specific graphics processing units (GPUs), most restricted to binary disease traits, and all poorly suited to general use on the most widely used operating systems. We have developed the BiForce Toolbox to meet the demand for high-throughput analysis of pairwise epistasis in GWAS of quantitative and disease traits across all commonly used computer systems. BiForce Toolbox is a stand-alone Java program that integrates bitwise computing with multithreaded parallelization and thus allows rapid full pairwise genome scans via a graphical user interface or the command line. Furthermore, BiForce Toolbox incorporates additional tests of interactions involving SNPs with significant marginal effects, potentially increasing the power to detect epistasis. BiForce Toolbox is easy to use and has been applied in multiple studies of epistasis in large GWAS data sets, identifying interesting interaction signals and pathways.
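A minimal sketch of the bitwise idea, assuming a common encoding that is not necessarily BiForce's own: each SNP's genotypes (0, 1 or 2 minor-allele copies) are stored as three bitmasks over individuals, so every cell of a two-SNP genotype table becomes one AND plus a population count. Python integers stand in for fixed-width machine words, and the function names (`encode_snp`, `pair_table`, `scan_pairs`) are hypothetical. The interaction statistic itself and the multithreaded distribution of SNP pairs are omitted.

```python
from itertools import combinations


def encode_snp(genotypes):
    """Encode one SNP as three bitmasks: bit i of masks[g] is set when
    individual i carries genotype g (0, 1 or 2 minor-allele copies)."""
    masks = [0, 0, 0]
    for i, g in enumerate(genotypes):
        masks[g] |= 1 << i
    return masks


def popcount(x):
    # Number of set bits (int.bit_count() does the same on Python 3.10+).
    return bin(x).count("1")


def pair_table(masks_a, masks_b, subset=None):
    """3x3 genotype contingency table for two SNPs, counted with AND + popcount.
    `subset` is an optional bitmask restricting the count (e.g. to cases)."""
    table = [[0] * 3 for _ in range(3)]
    for ga in range(3):
        for gb in range(3):
            both = masks_a[ga] & masks_b[gb]
            if subset is not None:
                both &= subset
            table[ga][gb] = popcount(both)
    return table


def scan_pairs(genotype_matrix, is_case):
    """Yield (i, j, case_table, control_table) for every SNP pair.
    Hypothetical helper; a real tool would split pairs across threads."""
    n = len(is_case)
    all_mask = (1 << n) - 1
    case_mask = sum(1 << i for i, c in enumerate(is_case) if c)
    control_mask = all_mask & ~case_mask  # non-negative: bounded by all_mask
    encoded = [encode_snp(row) for row in genotype_matrix]
    for i, j in combinations(range(len(encoded)), 2):
        yield (i, j,
               pair_table(encoded[i], encoded[j], case_mask),
               pair_table(encoded[i], encoded[j], control_mask))
```

For example, `list(scan_pairs([[0, 1, 2, 1], [1, 1, 0, 2], [2, 0, 0, 1]], [True, False, True, False]))` yields one case table and one control table for each of the three SNP pairs; an epistasis test would then be computed from these counts.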
Subject(s)
Genetic Epistasis , Genome-Wide Association Study , Software , Genomics/methods , Internet , Single Nucleotide Polymorphism
ABSTRACT
Advances in protein structure determination and modeling allow us to study the structural context of human genetic variants on an unprecedented scale. Here, we analyze millions of cancer-associated missense mutations based on their structural locations and predicted perturbative effects. By considering the collective properties of mutations at the level of individual proteins, we identify distinct patterns associated with tumor suppressors and oncogenes. Tumor suppressors are enriched in structurally damaging mutations, consistent with loss-of-function mechanisms, while oncogene mutations tend to be structurally mild, reflecting selection for gain-of-function driver mutations and against loss-of-function mutations. Although oncogenes are difficult to distinguish from genes with no role in cancer using only structural damage, we find that the three-dimensional clustering of mutations is highly predictive. These observations allow us to identify candidate driver genes and speculate about their molecular roles, which we expect will have general utility in the analysis of cancer sequencing data.
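The abstract does not say how three-dimensional clustering was scored. One simple, generic way to quantify it, offered here only as an illustrative assumption, is to count pairs of mutated residues whose coordinates lie within a distance cutoff and compare that count with random placements of the same number of mutations on the structure. The 8 Å cutoff, the uniform null model and the function names are hypothetical choices, and recurrent mutations at the same residue are not modelled.

```python
import math
import random
from itertools import combinations


def close_pairs(coords, residues, threshold):
    """Count pairs of residues whose coordinates lie within `threshold` angstroms."""
    return sum(
        1
        for a, b in combinations(residues, 2)
        if math.dist(coords[a], coords[b]) <= threshold
    )


def clustering_pvalue(coords, mutated, threshold=8.0, n_perm=1000, seed=0):
    """Permutation test for spatial clustering of mutated residues (assumed null:
    the same number of distinct residues drawn uniformly from the structure).

    Returns (observed close-pair count, empirical p-value)."""
    rng = random.Random(seed)
    observed = close_pairs(coords, mutated, threshold)
    all_residues = list(range(len(coords)))
    hits = 0
    for _ in range(n_perm):
        placed = rng.sample(all_residues, len(mutated))
        if close_pairs(coords, placed, threshold) >= observed:
            hits += 1
    # Add-one correction so the p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)
```

On a real protein, `coords` would hold one position per residue (for example, C-alpha atoms taken from a structure file) and `mutated` the distinct mutated residue indices; a small p-value suggests that the mutations are spatially clustered rather than scattered over the fold.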
ABSTRACT
It is unclear how patterns of regional genetic differentiation in the UK and Ireland might affect the protein-coding fraction of the genome. We exploit UK Biobank (UKB) and Viking Genes whole-exome sequencing data to study regional genetic differentiation in protein-coding genes across the UK and Ireland, encompassing 44,696 unrelated individuals from 20 regions of origin. We demonstrate substantial exonic differentiation among Shetlanders, Orcadians, individuals with full or partial Ashkenazi Jewish ancestry and in several mainland regions (particularly north and south Wales, southeast Scotland and Ireland). With stringent filtering criteria, we find 67 regionally enriched (≥5-fold) variants likely to have adverse biomedical consequences in homozygous individuals. Here, we show that regional genetic variation across the UK and Ireland should be considered in the design of genetic studies and may inform effective genetic screening and counselling.
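The abstract reports variants that are at least 5-fold enriched in a region but does not define the comparison. A minimal sketch, assuming enrichment is the ratio of a variant's allele frequency in one region to its pooled frequency in all other regions, shows the arithmetic behind such a filter. The function name, region labels and allele counts are invented for illustration, and the study's actual frequency estimates and filtering criteria may differ.

```python
import math


def regional_fold_enrichment(counts, min_fold=5.0):
    """counts maps region -> (alt_allele_count, total_allele_count) for one variant.

    Returns {region: fold} for regions whose alternate-allele frequency is at
    least `min_fold` times the pooled frequency in all other regions
    (an assumed definition of regional enrichment)."""
    total_alt = sum(alt for alt, _ in counts.values())
    total_n = sum(n for _, n in counts.values())
    enriched = {}
    for region, (alt, n) in counts.items():
        rest_alt, rest_n = total_alt - alt, total_n - n
        if n == 0 or rest_n == 0:
            continue  # frequency undefined without alleles on both sides
        region_af = alt / n
        rest_af = rest_alt / rest_n
        if rest_af == 0:
            fold = math.inf if region_af > 0 else 0.0  # variant private to region
        else:
            fold = region_af / rest_af
        if fold >= min_fold:
            enriched[region] = fold
    return enriched
```

With the invented counts `{"A": (12, 400), "B": (3, 500), "C": (2, 1000), "D": (1, 1100)}`, only region `"A"` passes the 5-fold threshold, with a fold enrichment of about 13.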