ABSTRACT
The dichotomous model of "drivers" and "passengers" in cancer posits that only a few mutations in a tumor strongly affect its progression, with the remaining ones being inconsequential. Here, we leveraged the comprehensive variant dataset from the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) project to demonstrate that, in addition to the dichotomy of high- and low-impact variants, there is a third group of medium-impact putative passengers. Moreover, we found that molecular impact correlates with subclonal architecture (i.e., early versus late mutations), and that different signatures encode mutations with divergent impact. Furthermore, we adapted an additive-effects model from complex-trait studies to show that the aggregated effect of putative passengers, including undetected weak drivers, provides significant additional power (~12% additive variance) for predicting cancerous phenotypes, beyond PCAWG-identified driver mutations. Finally, this framework allowed us to estimate the frequency of potential weak-driver mutations in PCAWG samples lacking any well-characterized driver alterations.
Subject(s)
Genome, Human/genetics , Genomics/methods , Mutation/genetics , Neoplasms/genetics , DNA Mutational Analysis/methods , Disease Progression , Humans , Neoplasms/pathology , Whole Genome Sequencing
ABSTRACT
A dichotomous choice for metazoan cells is between proliferation and differentiation. Measuring tRNA pools in various cell types, we found two distinct subsets, one that is induced in proliferating cells, and repressed otherwise, and another with the opposite signature. Correspondingly, we found that genes serving cell-autonomous functions and genes involved in multicellularity obey distinct codon usage. Proliferation-induced and differentiation-induced tRNAs often carry anticodons that correspond to the codons enriched among the cell-autonomous and the multicellularity genes, respectively. Because mRNAs of cell-autonomous genes are induced in proliferation and cancer in particular, the concomitant induction of their codon-enriched tRNAs suggests coordination between transcription and translation. Histone modifications indeed change similarly in the vicinity of cell-autonomous genes and their corresponding tRNAs, and in multicellularity genes and their tRNAs, suggesting the existence of transcriptional programs coordinating tRNA supply and demand. Hence, we describe the existence of two distinct translation programs that operate during proliferation and differentiation.
Subject(s)
Cell Differentiation , Cell Proliferation , Protein Biosynthesis , RNA, Transfer/genetics , Anticodon , Cell Line, Tumor , Cell Transformation, Neoplastic , Codon , Histones/metabolism , Humans , Neoplasms/genetics , RNA, Messenger/metabolism , RNA, Transfer/chemistry , RNA, Transfer/metabolism , Transcriptome
ABSTRACT
Treatment resistance remains a major issue in aggressive prostate cancer (PC), and novel genomic biomarkers may guide better treatment selection. Circulating tumor DNA (ctDNA) can provide minimally invasive information about tumor genomes, but the genomic landscape of aggressive PC based on whole-genome sequencing (WGS) of ctDNA remains incompletely characterized. Thus, we here performed WGS of tumor tissue (n = 31) or plasma ctDNA (n = 10) from a total of 41 aggressive PC patients, including 11 hormone-naïve, 15 hormone-sensitive, and 15 castration-resistant patients. Across all variant types, we found progressively more altered tumor genomic profiles in later stages of aggressive PC. The potential driver genes most frequently affected by single-nucleotide variants or insertions/deletions included the known PC-related genes TP53, CDK12, and PTEN and the novel genes COL13A1, KCNH3, and SENP3. Etiologically, aggressive PC was associated with age-related and DNA repair-related mutational signatures. Copy number variants most frequently affected 14q11.2 and 8p21.2, where no well-recognized PC-related genes are located, and also frequently affected regions near the known PC-related genes MYC, AR, TP53, PTEN, and BRCA1. Structural variants most frequently involved not only the known PC-related genes TMPRSS2 and ERG but also the less extensively studied gene in this context, PTPRD. Finally, clinically actionable variants were detected throughout all stages of aggressive PC and in both plasma and tissue samples, emphasizing the potential clinical applicability of WGS of minimally invasive plasma samples. Overall, our study highlights the feasibility of using liquid biopsies for comprehensive genomic characterization as an alternative to tissue biopsies in advanced/aggressive PC.
Subject(s)
Biomarkers, Tumor , Circulating Tumor DNA , Prostatic Neoplasms , Whole Genome Sequencing , Humans , Male , Whole Genome Sequencing/methods , Prostatic Neoplasms/genetics , Prostatic Neoplasms/pathology , Aged , Liquid Biopsy/methods , Circulating Tumor DNA/genetics , Circulating Tumor DNA/blood , Middle Aged , Biomarkers, Tumor/genetics , DNA Copy Number Variations , Mutation , Aged, 80 and over , Genomics/methods
ABSTRACT
Circulating tumor DNA (ctDNA) is a promising biomarker, reflecting the presence of tumor cells. Sequencing-based detection of ctDNA at low tumor fractions is challenging due to the crude error rate of sequencing. To mitigate this challenge, we developed ultra-deep mutation-integrated sequencing (UMIseq), a fixed-panel deep targeted sequencing approach, which is universally applicable to all colorectal cancer (CRC) patients. UMIseq features UMI-mediated error correction, the exclusion of mutations related to clonal hematopoiesis, a panel of normal samples for error modeling, and signal integration from single-nucleotide variations, insertions, deletions, and phased mutations. UMIseq was trained and independently validated on pre-operative (pre-OP) plasma from CRC patients (n = 364) and healthy individuals (n = 61). UMIseq displayed an area under the curve surpassing 0.95 for allele frequencies (AFs) down to 0.05%. In the training cohort, the pre-OP detection rate reached 80% at 95% specificity, while it was 70% in the validation cohort. UMIseq enabled the detection of AFs down to 0.004%. To assess the potential for detection of residual disease, 26 post-operative plasma samples from stage III CRC patients were analyzed. From this we found that the detection of ctDNA was associated with recurrence. In conclusion, UMIseq demonstrated robust performance with high sensitivity and specificity, enabling the detection of ctDNA at low allele frequencies.
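The detection rates above are quoted at a fixed 95% specificity. As a minimal sketch of how such a figure can be computed, assume each plasma sample has already been reduced to a single ctDNA score: the cutoff is taken from the healthy controls, and the detection rate is the share of patient samples above it. The function names and all scores below are hypothetical and are not part of UMIseq.

```python
import math

def cutoff_at_specificity(control_scores, specificity):
    """Lowest cutoff at which at least `specificity` of controls score at or below it."""
    ordered = sorted(control_scores)
    # Number of controls that must be called negative; the small epsilon guards
    # against floating-point noise in the product (e.g. 0.95 * 20).
    k = max(1, math.ceil(specificity * len(ordered) - 1e-9))
    return ordered[k - 1]

def detection_rate(case_scores, cutoff):
    """Fraction of patient samples scoring strictly above the cutoff."""
    return sum(score > cutoff for score in case_scores) / len(case_scores)

# Hypothetical scores, for illustration only.
controls = [i / 100 for i in range(20)]   # 20 healthy individuals
cases = [0.05, 0.50, 0.30, 0.10, 0.90]    # 5 patients
cutoff = cutoff_at_specificity(controls, 0.95)
rate = detection_rate(cases, cutoff)
```

Fixing the cutoff on controls first, and only then scoring patients, is what makes detection rates comparable between a training and a validation cohort.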
Subject(s)
Biomarkers, Tumor , Circulating Tumor DNA , Colorectal Neoplasms , High-Throughput Nucleotide Sequencing , Mutation , Humans , Colorectal Neoplasms/genetics , Colorectal Neoplasms/blood , Colorectal Neoplasms/diagnosis , Circulating Tumor DNA/genetics , Circulating Tumor DNA/blood , High-Throughput Nucleotide Sequencing/methods , Male , Female , Biomarkers, Tumor/blood , Biomarkers, Tumor/genetics , Aged , Middle Aged , Adult , Gene Frequency , Aged, 80 and over , Cell-Free Nucleic Acids/genetics , Cell-Free Nucleic Acids/blood , Sensitivity and Specificity
ABSTRACT
Nucleotide excision repair (NER) is one of the main DNA repair pathways that protect cells against genomic damage. Disruption of this pathway can contribute to the development of cancer and accelerate aging. Mutational characteristics of NER-deficiency may reveal important diagnostic opportunities, as tumors deficient in NER are more sensitive to certain treatments. Here, we analyzed the genome-wide somatic mutational profiles of adult stem cells (ASCs) from NER-deficient Ercc1 -/Δ mice. Our results indicate that NER-deficiency increases the base substitution load twofold in liver but not in small intestinal ASCs, which coincides with the tissue-specific aging pathology observed in these mice. Moreover, NER-deficient ASCs of both tissues show an increased contribution of Signature 8 mutations, which is a mutational pattern with unknown etiology that is recurrently observed in various cancer types. The scattered genomic distribution of the base substitutions indicates that deficiency of global-genome NER (GG-NER) underlies the observed mutational consequences. In line with this, we observe increased Signature 8 mutations in a GG-NER-deficient human organoid culture, in which XPC was deleted using CRISPR-Cas9 gene-editing. Furthermore, genomes of NER-deficient breast tumors show an increased contribution of Signature 8 mutations compared with NER-proficient tumors. Elevated levels of Signature 8 mutations could therefore contribute to a predictor of NER-deficiency based on a patient's mutational profile.
Subject(s)
DNA Repair/genetics , Mutation , Neoplasms/genetics , Adult Stem Cells , Animals , Breast Neoplasms/genetics , Cohort Studies , DNA Mutational Analysis , DNA, Neoplasm , DNA-Binding Proteins/genetics , Endonucleases/genetics , Female , Humans , Mice , Organoids , Tissue Culture Techniques , Whole Genome Sequencing
ABSTRACT
47,XXX (triple X) and Turner syndrome (45,X) are sex chromosomal abnormalities with detrimental effects on health, including increased mortality and morbidity. In karyotypically normal females, X-chromosome inactivation balances gene expression between the sexes, and upregulation of the X chromosome in both sexes maintains stoichiometry with the autosomes. In 47,XXX and Turner syndrome, a gene dosage imbalance may ensue from increased or decreased expression of the genes that escape X inactivation, as well as from incomplete X-chromosome inactivation in 47,XXX. We aimed to study whether genome-wide DNA-methylation and RNA-expression changes can explain phenotypic traits in 47,XXX syndrome. To address this question, we compared genome-wide DNA-methylation and transcriptome data derived from white blood cells of seven women with 47,XXX syndrome, seven women with Turner syndrome (45,X), and seven karyotypically normal female controls (46,XX). Based on promoter methylation, we describe demethylation of six X-chromosomal genes (AMOT, HTR2C, IL1RAPL2, STAG2, TCEANC, ZNF673), increased methylation of GEMIN8, and four differentially methylated autosomal regions related to four genes (SPEG, MUC4, SP6, and ZNF492). We illustrate how these changes seem to be compensated at the transcriptome level, although several genes show differential exon usage. In conclusion, our results suggest an impact of the supernumerary X chromosome in 47,XXX syndrome on the methylation status of selected genes despite an overall comparable expression profile.
Subject(s)
DNA Methylation/genetics , Sex Chromosome Disorders of Sex Development/genetics , Transcriptome/genetics , Trisomy/genetics , Turner Syndrome/genetics , Angiomotins , Cell Cycle Proteins/genetics , Chromosomes, Human, X/genetics , Epigenesis, Genetic/genetics , Female , Gene Dosage/genetics , Gene Expression Regulation/genetics , Genes, X-Linked/genetics , Humans , Intercellular Signaling Peptides and Proteins/genetics , Interleukin-1 Receptor Accessory Protein/genetics , Male , Microfilament Proteins/genetics , Receptor, Serotonin, 5-HT2C/genetics , Sex Chromosome Aberrations , Sex Chromosome Disorders of Sex Development/pathology , Trisomy/pathology , Turner Syndrome/pathology , X Chromosome Inactivation/genetics
ABSTRACT
Motivation: Understanding the mutational processes that act during cancer development is a key topic of cancer biology. Nevertheless, much remains to be learned, as a complex interplay of processes with dependencies on a range of genomic features creates highly heterogeneous cancer genomes. Accurate driver detection relies on unbiased models of the mutation rate that also capture rate variation from uncharacterized sources. Results: Here, we analyse patterns of observed-to-expected mutation counts across 505 whole cancer genomes, and find that genomic features missing from our mutation-rate model likely operate on a megabase length scale. We extend our site-specific model of the mutation rate to include the additional variance from these sources, which leads to robust significance evaluation of candidate cancer drivers. We thus present ncdDetect v.2, with greatly improved cancer driver detection specificity. Finally, we show that ranking candidates by their posterior mean value of their effect sizes offers an equivalent and more computationally efficient alternative to ranking by their P-values. Availability and implementation: ncdDetect v.2 is implemented as an R-package and is freely available at http://github.com/TobiasMadsen/ncdDetect2. Supplementary information: Supplementary data are available at Bioinformatics online.
Subject(s)
Models, Genetic , Mutation Rate , Neoplasms/genetics , Computational Biology , Genomics , Humans , Software
ABSTRACT
DNA methylation and gene expression are interdependent and both implicated in cancer development and progression, with many individual biomarkers discovered. A joint analysis of the two data types can potentially lead to biological insights that are not discoverable with separate analyses. To optimally leverage the joint data for identifying perturbed genes and classifying clinical cancer samples, it is important to accurately model the interactions between the two data types. Here, we present EBADIMEX for jointly identifying differential expression and methylation and classifying samples. The moderated t-test widely used with empirical Bayes priors in current differential expression methods is generalised to a multivariate setting by developing: (1) a moderated Welch t-test for equality of means with unequal variances; (2) a moderated F-test for equality of variances; and (3) a multivariate test for equality of means with equal variances. This leads to parametric models with prior distributions for the parameters, which allow fast evaluation and robust analysis of small data sets. EBADIMEX is demonstrated on simulated data as well as a large breast cancer (BRCA) cohort from TCGA. We show that the use of empirical Bayes priors and moderated tests works particularly well on small data sets.
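To make the first of the three tests concrete, the following is a minimal sketch of an ordinary Welch t-statistic with Welch-Satterthwaite degrees of freedom, plus one common way of moderating a variance estimate by shrinking it toward a prior value. The shrinkage formula is the weighted average familiar from empirical Bayes differential-expression methods; it is an assumption here and is not taken from EBADIMEX, whose exact estimators the abstract does not give. All names and data are hypothetical.

```python
import math
from statistics import mean, variance

def welch_t(x, y):
    """Welch t-statistic and Welch-Satterthwaite degrees of freedom."""
    vx = variance(x) / len(x)   # squared standard error of mean(x)
    vy = variance(y) / len(y)   # squared standard error of mean(y)
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df

def moderated_variance(s2, d, prior_s2, prior_d):
    """Shrink a sample variance s2 (d degrees of freedom) toward a prior variance.

    Assumption: a degrees-of-freedom-weighted average, not EBADIMEX's own estimator.
    """
    return (prior_d * prior_s2 + d * s2) / (prior_d + d)

# Hypothetical expression values for one gene in two groups.
tumor = [1.0, 2.0, 3.0, 4.0, 5.0]
normal = [2.0, 4.0, 6.0, 8.0, 10.0]
t_stat, dof = welch_t(tumor, normal)
shrunk = moderated_variance(variance(tumor), len(tumor) - 1, prior_s2=1.0, prior_d=4)
```

With few samples per group the raw variances are noisy, which is why borrowing strength from a prior estimated across all genes tends to stabilise the test.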
Subject(s)
Bayes Theorem , Computational Biology/methods , DNA Methylation , Epigenomics/methods , Gene Expression Profiling/methods , Algorithms , Databases, Genetic , Gene Expression Regulation , Humans , Models, Statistical , Reproducibility of Results , Transcriptome
ABSTRACT
BACKGROUND: Detailed modelling of the neutral mutational process in cancer cells is crucial for identifying driver mutations and understanding the mutational mechanisms that act during cancer development. The neutral mutational process is very complex: whole-genome analyses have revealed that the mutation rate differs between cancer types, between patients and along the genome depending on the genetic and epigenetic context. Therefore, methods that predict the number of different types of mutations in regions or specific genomic elements must consider local genomic explanatory variables. A major drawback of most methods is the need to average the explanatory variables across the entire region or genomic element. This procedure is particularly problematic if the explanatory variable varies dramatically in the element under consideration. RESULTS: To take into account the fine scale of the explanatory variables, we model the probabilities of different types of mutations for each position in the genome by multinomial logistic regression. We analyse 505 cancer genomes from 14 different cancer types and compare the performance in predicting mutation rate for both regional based models and site-specific models. We show that for 1000 randomly selected genomic positions, the site-specific model predicts the mutation rate much better than regional based models. We use a forward selection procedure to identify the most important explanatory variables. The procedure identifies site-specific conservation (phyloP), replication timing, and expression level as the best predictors for the mutation rate. Finally, our model confirms and quantifies certain well-known mutational signatures. CONCLUSION: We find that our site-specific multinomial regression model outperforms the regional based models. The possibility of including genomic variables on different scales and patient specific variables makes it a versatile framework for studying different mutational mechanisms. 
Our model can serve as the neutral null model for the mutational process; regions that deviate from the null model are candidates for elements that drive cancer development.
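A site-specific multinomial logistic model of this kind can be sketched as a softmax over outcome classes, with "no mutation" as the reference class. The outcome classes, feature names, and coefficient values below are hypothetical placeholders chosen for illustration; they are not the fitted values or the exact parameterisation used in the study.

```python
import math

def mutation_probabilities(features, coefs):
    """Per-site outcome probabilities under a multinomial logistic model.

    `coefs` maps each mutation type to (intercept, {feature: weight}).
    The reference class "none" has a fixed linear score of 0.
    """
    scores = {"none": 0.0}
    for outcome, (intercept, weights) in coefs.items():
        scores[outcome] = intercept + sum(
            w * features[name] for name, w in weights.items()
        )
    top = max(scores.values())               # subtract max for numerical stability
    exps = {k: math.exp(v - top) for k, v in scores.items()}
    total = sum(exps.values())
    return {k: v / total for k, v in exps.items()}

# Hypothetical coefficients and one hypothetical genomic position.
COEFS = {
    "transition": (-6.0, {"phyloP": -0.3, "replication_timing": 0.8, "expression": -0.2}),
    "transversion": (-7.0, {"phyloP": -0.2, "replication_timing": 0.6, "expression": -0.1}),
}
site = {"phyloP": 1.2, "replication_timing": 0.4, "expression": 2.0}
probs = mutation_probabilities(site, COEFS)
```

Because the covariates enter per position, nothing has to be averaged over a region: summing the per-site probabilities over any element gives its expected mutation count under the null model.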
Subject(s)
Genome, Human , Models, Genetic , Mutation Rate , Mutation/genetics , Neoplasms/genetics , Databases, Genetic , Epigenomics , Humans , Polymorphism, Single Nucleotide/genetics , Regression Analysis
ABSTRACT
BACKGROUND: Factor graphs provide a flexible and general framework for specifying probability distributions. They can capture a range of popular and recent models for analysis of both genomics data and data from other scientific fields. Owing to the ever larger data sets encountered in genomics and the multiple-testing issues accompanying them, accurate significance evaluation is of great importance. We here address the problem of evaluating statistical significance of observations from factor graph models. RESULTS: Two novel numerical approximations for evaluation of statistical significance are presented: first, a method using importance sampling; second, a method based on the saddlepoint approximation. We develop algorithms to efficiently compute the approximations and compare them to naive sampling and the normal approximation. The individual merits of the methods are analysed both from a theoretical viewpoint and with simulations. A guideline for choosing between the normal approximation, saddlepoint approximation and importance sampling is also provided. Finally, the applicability of the methods is demonstrated with examples from cancer genomics, motif-analysis and phylogenetics. CONCLUSIONS: The applicability of saddlepoint approximation and importance sampling is demonstrated on known models in the factor graph framework. Using the two methods we can substantially improve computational cost without compromising accuracy. This contribution allows analyses of large datasets in the general factor graph framework.
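The idea behind importance sampling for small p-values can be illustrated on a deliberately simple model: a mutation count that is a sum of independent per-site Bernoulli variables. This is a sketch under that assumption only, not the factor-graph algorithm of the paper. Samples are drawn from an exponentially tilted distribution whose mean sits at the observed count, and each sample is reweighted by the likelihood ratio; an exact dynamic-programming answer is included for comparison. All names and probabilities are hypothetical.

```python
import math
import random

def exact_tail(ps, t):
    """Exact P(S >= t) for S = sum of independent Bernoulli(p_i), by dynamic programming."""
    dist = [1.0]                      # dist[k] = P(S = k) over sites processed so far
    for p in ps:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1.0 - p)
            nxt[k + 1] += mass * p
        dist = nxt
    return sum(dist[t:])

def tilt(ps, theta):
    """Exponentially tilted success probabilities."""
    return [p * math.exp(theta) / (1.0 - p + p * math.exp(theta)) for p in ps]

def find_theta(ps, t, lo=0.0, hi=20.0, iters=60):
    """Bisection for the tilt at which the expected count equals t."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(tilt(ps, mid)) < t:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def importance_tail(ps, t, n_samples, rng):
    """Importance-sampling estimate of P(S >= t)."""
    theta = find_theta(ps, t)
    qs = tilt(ps, theta)
    log_norm = sum(math.log(1.0 - p + p * math.exp(theta)) for p in ps)
    total = 0.0
    for _ in range(n_samples):
        s = sum(1 for q in qs if rng.random() < q)
        if s >= t:
            # Likelihood ratio P(x)/Q(x) = exp(-theta * s) * prod_i (1 - p_i + p_i * e^theta)
            total += math.exp(-theta * s + log_norm)
    return total / n_samples

# Hypothetical per-site mutation probabilities (expected count just under 1).
site_probs = [0.005 + 0.0001 * i for i in range(100)]
exact = exact_tail(site_probs, 8)
estimate = importance_tail(site_probs, 8, 20000, random.Random(1))
```

Naive sampling from the original distribution would almost never reach a count of 8 here, so nearly all of its samples would contribute nothing; tilting places about half the samples in the tail of interest.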
Subject(s)
Algorithms , Computational Biology/methods , Models, Theoretical , Amino Acid Sequence , CCCTC-Binding Factor , Genomics , Humans , MCF-7 Cells , Neoplasms/diagnosis , Neoplasms/genetics , Phylogeny , Probability , Protein Interaction Domains and Motifs , Repressor Proteins , Sequence Alignment
ABSTRACT
The first epigenomes from archaic hominins (AH) and ancient anatomically modern humans (AMH) have recently been characterized, based, however, on a limited number of samples. The extent to which ancient genome-wide epigenetic landscapes can be reconstructed thus remains contentious. Here, we present epiPALEOMIX, an open-source and user-friendly pipeline that exploits post-mortem DNA degradation patterns to reconstruct ancient methylomes and nucleosome maps from shotgun and/or capture-enrichment data. Applying epiPALEOMIX to the sequence data underlying 35 ancient genomes including AMH, AH, equids and aurochs, we investigate the temporal, geographical and preservation range of ancient epigenetic signatures. We first assess the quality of inferred ancient epigenetic signatures within well-characterized genomic regions. We find that tissue-specific methylation signatures can be obtained across a wider range of DNA preparation types than previously thought, including when no particular experimental procedures have been used to remove deaminated cytosines prior to sequencing. We identify a large subset of samples for which DNA associated with nucleosomes is protected from post-mortem degradation, and nucleosome positioning patterns can be reconstructed. Finally, we describe parameters and conditions such as DNA damage levels and sequencing depth that limit the preservation of epigenetic signatures in ancient samples. When such conditions are met, we propose that epigenetic profiles of CTCF binding regions can be used to help data authentication. Our work, including epiPALEOMIX, opens the way for further investigations of ancient epigenomes through time, especially those aimed at tracking possible epigenetic changes during major evolutionary, environmental, socioeconomic, and cultural shifts.
Subject(s)
DNA Methylation , DNA, Ancient/analysis , High-Throughput Nucleotide Sequencing/methods , Nucleosomes/genetics , Sequence Analysis, DNA/methods , Chromatin Assembly and Disassembly , Computer Simulation , Cytosine/metabolism , DNA/genetics , Epigenesis, Genetic , Genome , Humans , Software
ABSTRACT
Epigenetic information is available from contemporary organisms, but is difficult to track back in evolutionary time. Here, we show that genome-wide epigenetic information can be gathered directly from next-generation sequence reads of DNA isolated from ancient remains. Using the genome sequence data generated from hair shafts of a 4000-yr-old Paleo-Eskimo belonging to the Saqqaq culture, we generate the first ancient nucleosome map coupled with a genome-wide survey of cytosine methylation levels. The validity of both the nucleosome map and the methylation levels was confirmed by the recovery of the expected signals at promoter regions, exon/intron boundaries, and CTCF sites. The top-scoring nucleosome calls revealed distinct DNA positioning biases, attesting to nucleotide-level accuracy. The ancient methylation levels exhibited high conservation over time, clustering closely with modern hair tissues. Using ancient methylation information, we estimated the age at death of the Saqqaq individual and illustrate how epigenetic information can be used to infer ancient gene expression. Similar epigenetic signatures were found in other fossil material, such as 110,000- to 130,000-yr-old bones, supporting the contention that ancient epigenomic information can be reconstructed from a deep past. Our findings lay the foundation for extracting epigenomic information from ancient samples, allowing shifts in epialleles to be tracked through evolutionary time, as well as providing an original window into modern epigenomics.
Subject(s)
Cytosine/metabolism , DNA Methylation , Genome, Human , Inuit/genetics , Nucleosomes/genetics , Animals , Chromosome Mapping , Epigenesis, Genetic , Epigenomics , Evolution, Molecular , Gene Expression , Gene Expression Regulation , Humans , Phylogeny , Promoter Regions, Genetic , Sequence Analysis, DNA
ABSTRACT
MOTIVATION: Recently, new RNA secondary structure probing techniques have been developed, including Next Generation Sequencing based methods capable of probing transcriptome-wide. These techniques hold great promise for improving structure prediction accuracy. However, each new data type comes with its own signal properties and biases, which may even be experiment specific. There is therefore a growing need for RNA structure prediction methods that can be automatically trained on new data types and readily extended to integrate and fully exploit multiple types of data. RESULTS: Here, we develop and explore a modular probabilistic approach for integrating probing data in RNA structure prediction. It can be automatically trained given a set of known structures with probing data. The approach is demonstrated on SHAPE datasets, where we evaluate and selectively model specific correlations. The approach often makes superior use of the probing data signal compared to other methods. We illustrate the use of ProbFold on multiple data types using both simulations and a small set of structures with SHAPE, DMS and CMCT data. Technically, the approach combines stochastic context-free grammars (SCFGs) with probabilistic graphical models. This approach allows rapid adaptation and integration of new probing data types. AVAILABILITY AND IMPLEMENTATION: ProbFold is implemented in C++. Models are specified using simple textual formats. Data reformatting is done using separate C++ programs. Source code, statically compiled binaries for x86 Linux machines, C++ programs, example datasets and a tutorial are available from http://moma.ki.au.dk/prj/probfold/ CONTACT: jakob.skou@clin.au.dk SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
High-Throughput Nucleotide Sequencing , Models, Statistical , RNA , Algorithms , Nucleic Acid Conformation
ABSTRACT
The past decade has shown mammalian genomes to be pervasively transcribed and identified thousands of noncoding (nc) transcripts. It is currently unclear to what extent these transcripts are of functional importance, as experimental functional evidence exists for only a small fraction. Here, we characterize the expression and evolutionary conservation properties of 12,115 known and novel nc transcripts, including structural RNAs, long nc RNAs (lncRNAs), antisense RNAs, EvoFold predictions, ultraconserved elements, and expressed nc regions. Expression levels are evaluated across 12 human tissues using a custom-designed microarray, supplemented with RNAseq. Conservation levels are evaluated at both the base level and at the syntenic level. We combine these measures with epigenetic mark annotations to identify subsets of novel nc transcripts that show characteristics similar to known functional ncRNAs. Few novel nc transcripts show both high expression and conservation levels. However, overall, we observe a positive correlation between expression and both conservation and epigenetic annotations, suggesting that a subset of the expressed transcripts are under purifying selection and likely functional. The identified subsets of expressed and conserved novel nc transcripts may form the basis for further functional characterization.
Subject(s)
RNA, Untranslated/genetics , Transcriptome , Base Sequence , Chromatin/genetics , Conserved Sequence , Expressed Sequence Tags , Humans , Inverted Repeat Sequences , Molecular Sequence Annotation , Oligonucleotide Array Sequence Analysis , Open Reading Frames , Organ Specificity , RNA, Untranslated/metabolism
ABSTRACT
We report here the genome sequence of an ancient human. Obtained from approximately 4,000-year-old permafrost-preserved hair, the genome represents a male individual from the first known culture to settle in Greenland. Sequenced to an average depth of 20x, we recover 79% of the diploid genome, an amount close to the practical limit of current sequencing technologies. We identify 353,151 high-confidence single-nucleotide polymorphisms (SNPs), of which 6.8% have not been reported previously. We estimate raw read contamination to be no higher than 0.8%. We use functional SNP assessment to assign possible phenotypic characteristics of the individual that belonged to a culture whose location has yielded only trace human remains. We compare the high-confidence SNPs to those of contemporary populations to find the populations most closely related to the individual. This provides evidence for a migration from Siberia into the New World some 5,500 years ago, independent of that giving rise to the modern Native Americans and Inuit.
Subject(s)
Cryopreservation , Extinction, Biological , Genome, Human/genetics , Inuit/genetics , Emigration and Immigration/history , Genetics, Population , Genomics , Genotype , Greenland , Hair , History, Ancient , Humans , Male , Phenotype , Phylogeny , Polymorphism, Single Nucleotide/genetics , Sequence Analysis, DNA , Siberia/ethnology
ABSTRACT
Regulatory RNA structures are often members of families with multiple paralogous instances across the genome. Family members share functional and structural properties, which allow them to be studied as a whole, facilitating both bioinformatic and experimental characterization. We have developed a comparative method, EvoFam, for genome-wide identification of families of regulatory RNA structures, based on primary sequence and secondary structure similarity. We apply EvoFam to a 41-way genomic vertebrate alignment. Genome-wide, we identify 220 human, high-confidence families outside protein-coding regions comprising 725 individual structures, including 48 families with known structural RNA elements. Known families identified include both noncoding RNAs, e.g., miRNAs and the recently identified MALAT1/MEN β lincRNA family; and cis-regulatory structures, e.g., iron-responsive elements. We also identify tens of new families supported by strong evolutionary evidence and other statistical evidence, such as GO term enrichments. For some of these, detailed analysis has led to the formulation of specific functional hypotheses. Examples include two hypothesized auto-regulatory feedback mechanisms: one involving six long hairpins in the 3'-UTR of MAT2A, a key metabolic gene that produces the primary human methyl donor S-adenosylmethionine; the other involving a tRNA-like structure in the intron of the tRNA maturation gene POP1. We experimentally validate the predicted MAT2A structures. Finally, we identify potential new regulatory networks, including large families of short hairpins enriched in immunity-related genes, e.g., TNF, FOS, and CTLA4, which include known transcript destabilizing elements. Our findings exemplify the diversity of post-transcriptional regulation and provide a resource for further characterization of new regulatory mechanisms and families of noncoding RNAs.
Subject(s)
Genome , Genomics , RNA, Untranslated/chemistry , Regulatory Sequences, Ribonucleic Acid , Vertebrates/genetics , 3' Untranslated Regions , Animals , Base Sequence , Conserved Sequence , Gene Expression Regulation , Humans , Immunity/genetics , Methionine Adenosyltransferase/genetics , Molecular Sequence Data , Nucleic Acid Conformation , Phylogeny , Protein Biosynthesis , RNA Editing , RNA Precursors/metabolism , RNA Processing, Post-Transcriptional , RNA Stability , RNA, Messenger/metabolism , RNA, Transfer/chemistry , RNA, Transfer/metabolism , RNA, Untranslated/genetics , Sequence Alignment
ABSTRACT
Circular RNAs represent a class of endogenous RNAs that regulate gene expression and influence cell biological decisions with implications for the pathogenesis of several diseases. Here, we disclose a novel gene-regulatory role of circHIPK3 by combining analyses of large genomics datasets and mechanistic cell biological follow-up experiments. Using time-course depletion of circHIPK3 and specific candidate RNA-binding proteins, we identify several perturbed genes by RNA sequencing analyses. Expression-coupled motif analyses identify an 11-mer motif within circHIPK3, which also becomes enriched in genes that are downregulated upon circHIPK3 depletion. By mining eCLIP datasets and combined with RNA immunoprecipitation assays, we demonstrate that the 11-mer motif constitutes a strong binding site for IGF2BP2 in bladder cancer cell lines. Our results suggest that circHIPK3 can sequester IGF2BP2 as a competing endogenous RNA (ceRNA), leading to target mRNA stabilization. As an example of a circHIPK3-regulated gene, we focus on the STAT3 mRNA as a specific substrate of IGF2BP2 and validate that manipulation of circHIPK3 regulates IGF2BP2-STAT3 mRNA binding and, thereby, STAT3 mRNA levels. Surprisingly, absolute copy number quantifications demonstrate that IGF2BP2 outnumbers circHIPK3 by orders of magnitude, which is inconsistent with a simple 1:1 ceRNA hypothesis. Instead, we show that circHIPK3 can nucleate multiple copies of IGF2BP2, potentially via phase separation, to produce IGF2BP2 condensates. Our results support a model where a few cellular circHIPK3 molecules can induce IGF2BP2 condensation, thereby regulating key factors for cell proliferation.
Subject(s)
Circular RNA , RNA-Binding Proteins , Humans , RNA-Binding Proteins/metabolism , RNA-Binding Proteins/genetics , Circular RNA/genetics , Circular RNA/metabolism , Tumor Cell Line , STAT3 Transcription Factor/metabolism , STAT3 Transcription Factor/genetics , Intracellular Signaling Peptides and Proteins/metabolism , Intracellular Signaling Peptides and Proteins/genetics , Protein Binding , Messenger RNA/metabolism , Messenger RNA/genetics , Urinary Bladder Neoplasms/genetics , Urinary Bladder Neoplasms/metabolism , Competitive Endogenous RNA , Protein Serine-Threonine Kinases
ABSTRACT
Circular RNAs (circRNAs) represent a class of widespread endogenous RNAs that regulate gene expression and thereby influence cell biological decisions with implications for the pathogenesis of several diseases. Here, we disclose a novel gene-regulatory role of circHIPK3 by combining analyses of large genomics datasets and mechanistic cell biological follow-up experiments. Specifically, we use temporal depletion of circHIPK3 or specific RNA-binding proteins (RBPs) and identify several perturbed genes by RNA sequencing analyses. Using expression-coupled motif analyses of mRNA expression data from various knockdown experiments, we identify an 11-mer motif within circHIPK3, which is also enriched in genes that become downregulated upon circHIPK3 depletion. By mining eCLIP datasets, we find that the 11-mer motif constitutes a strong binding site for IGF2BP2 and validate this circHIPK3-IGF2BP2 interaction experimentally using RNA immunoprecipitation and competition assays in bladder cancer cell lines. Our results suggest that circHIPK3 and IGF2BP2 mRNA targets compete for binding. Since the identified 11-mer motif found in circHIPK3 is enriched in upregulated genes following IGF2BP2 knockdown, and since IGF2BP2 depletion conversely antagonizes, on a global scale, the effect of circHIPK3 knockdown on target genes, our results suggest that circHIPK3 can sequester IGF2BP2 as a competing endogenous RNA (ceRNA), leading to target mRNA stabilization. As an example of a circHIPK3-regulated gene, we focus on the STAT3 mRNA as a specific substrate of IGF2BP2 and validate that manipulation of circHIPK3 regulates IGF2BP2-STAT3 mRNA binding and thereby STAT3 mRNA levels. However, absolute copy number quantifications demonstrate that IGF2BP2 outnumbers circHIPK3 by orders of magnitude, which is inconsistent with a simple 1:1 ceRNA hypothesis. Instead, we show that circHIPK3 can nucleate multiple copies of IGF2BP2, potentially via phase separation, to produce IGF2BP2 condensates.
Finally, we show that circHIPK3 expression correlates with overall survival of patients with bladder cancer. Our results are consistent with a model where relatively few cellular circHIPK3 molecules function as inducers of IGF2BP2 condensation, thereby regulating STAT3 and other key factors for cell proliferation and, potentially, cancer progression.
ABSTRACT
BACKGROUND: Cancer mutations accumulate through replication errors and DNA damage coupled with incomplete repair. Individual mutational processes often show nucleotide sequence and functional region preferences. As a result, some sequence contexts mutate at much higher rates than others, with additional variation found between functional regions. Mutational hotspots, with recurrent mutations across cancer samples, represent genomic positions with elevated mutation rates, often caused by highly localized mutational processes. METHODS: We count the 11-mer genomic sequences across the genome, and using the PCAWG set of 2583 pan-cancer whole genomes, we associate 11-mers with mutational signatures, hotspots of single nucleotide variants, and specific genomic regions. We evaluate the mutation rates of individual and combined sets of 11-mers and derive mutational sequence motifs. RESULTS: We show that hotspots generally identify highly mutable sequence contexts. Using these, we show that some mutational signatures are enriched in hotspot sequence contexts, corresponding to well-defined sequence preferences for the underlying localized mutational processes. These include signature 17b (of unknown etiology) and signatures 62 (POLE deficiency), 7a (UV), and 72 (linked to lymphomas). In some cases, the mutation rate and sequence preference increase further when focusing on certain genomic regions, such as signature 62 in transcribed regions, where the mutation rate is increased up to 9-fold over the cancer type and mutational signature average. CONCLUSIONS: We summarize our findings in a catalog of localized mutational processes, their sequence preferences, and their estimated mutation rates.
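The counting step named in METHODS can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names `count_kmers` and `kmer_mutation_rates` are hypothetical, the mutated base is assumed to sit at the centre of its 11-mer, positions are assumed to be 0-based, and strand collapsing, signatures and genomic regions are ignored.

```python
from collections import Counter


def count_kmers(sequence: str, k: int = 11) -> Counter:
    """Count every overlapping k-mer that consists only of A, C, G and T."""
    seq = sequence.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Skip windows containing N or other ambiguity codes.
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def kmer_mutation_rates(sequence: str, mutated_positions, k: int = 11) -> dict:
    """Mutations per occurrence for each k-mer centred on a mutated position.

    Assumption (not from the abstract): the mutated base is the centre of
    the k-mer and `mutated_positions` holds 0-based coordinates.
    """
    seq = sequence.upper()
    half = k // 2
    occurrences = count_kmers(seq, k)
    mutated = Counter()
    for pos in mutated_positions:
        start = pos - half
        # Ignore mutations too close to either end for a full window.
        if start < 0 or start + k > len(seq):
            continue
        kmer = seq[start:start + k]
        if kmer in occurrences:
            mutated[kmer] += 1
    return {kmer: n / occurrences[kmer] for kmer, n in mutated.items()}
```

On a real genome the same idea would be applied per chromosome and the mutation counts pooled over samples; a dictionary of all observed 11-mers can be large, so a production implementation would likely use a dedicated k-mer counter rather than this toy loop.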
Subject(s)
Mutation Rate , Neoplasms , Humans , Mutation , Neoplasms/genetics , DNA Damage , Genomics
ABSTRACT
More than 80% of human cancers originate in epithelial tissues. Loss of epithelial cell characteristics is a hallmark of tumor development. Receptor-mediated endocytosis is a key function of absorptive epithelial cells with importance for cellular and organismal homeostasis. LRP2 (megalin) is the largest known endocytic membrane receptor and is essential for endocytosis of various ligands in specialized epithelia, including the proximal tubules of the kidney, the thyroid gland, and breast glandular epithelium. However, the role and regulation of LRP2 in cancers that arise from these tissues have not been delineated. Here, we examined the expression of LRP2 across 33 cancer types in The Cancer Genome Atlas. As expected, the highest levels of LRP2 were found in cancer types that arise from LRP2-expressing absorptive epithelial cells. However, in a subset of tumors from these cancer types, we observed epigenetic silencing of LRP2. LRP2 expression showed a strong inverse correlation with methylation of a specific CpG site (cg02361027) in the first intron of the LRP2 gene. Interestingly, low expression of LRP2 was associated with poor patient outcome in clear cell renal cell carcinoma, papillary renal cell carcinoma, mesothelioma, papillary thyroid carcinoma, and invasive breast carcinoma. Furthermore, loss of LRP2 expression was associated with dedifferentiated histological and molecular subtypes of these cancers. These observations motivate further studies on the functional role of LRP2 in tumors of epithelial origin and the potential use of LRP2 as a cancer biomarker.