ABSTRACT
A key attribute of some long noncoding RNAs (lncRNAs) is their ability to regulate expression of neighbouring genes in cis. However, such 'cis-lncRNAs' are presently defined using ad hoc criteria that, we show, are prone to false-positive predictions. The resulting lack of cis-lncRNA catalogues hinders our understanding of their extent, characteristics and mechanisms. Here, we introduce TransCistor, a framework for defining and identifying cis-lncRNAs based on enrichment of targets amongst proximal genes. TransCistor's simple and conservative statistical models are compatible with functionally defined target gene maps generated by existing and future technologies. Using transcriptome-wide perturbation experiments for 268 human and 134 mouse lncRNAs, we provide the first large-scale survey of cis-lncRNAs. Known cis-lncRNAs are correctly identified, including XIST, LINC00240 and UMLILO, and predictions are consistent across analysis methods, perturbation types and independent experiments. We detect cis-activity in a minority of lncRNAs, primarily involving activators over repressors. Cis-lncRNAs are detected by both RNA interference and antisense oligonucleotide perturbations. Mechanistically, cis-lncRNA transcripts are observed to physically associate with their target genes and are weakly enriched with enhancer elements. In summary, TransCistor establishes a quantitative foundation for cis-lncRNAs, opening a path to elucidating their molecular mechanisms and biological significance.
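The idea of testing for "enrichment of targets amongst proximal genes" can be illustrated with a generic one-sided hypergeometric test. This is only a minimal sketch under stated assumptions: the abstract does not specify TransCistor's statistical models, so the test used here is a stand-in, and the function names `hypergeom_upper_tail` and `cis_enrichment` and the toy gene identifiers are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, population, successes, draws):
    """Return P(X >= k) for a hypergeometric variable X.

    population: number of genes considered
    successes:  number of genes that are perturbation targets
    draws:      number of genes proximal to the lncRNA
    """
    upper = min(successes, draws)
    total = comb(population, draws)
    favourable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return favourable / total


def cis_enrichment(all_genes, target_genes, proximal_genes):
    """Count targets among proximal genes and test for their enrichment."""
    universe = set(all_genes)
    targets = set(target_genes) & universe
    proximal = set(proximal_genes) & universe
    overlap = len(targets & proximal)
    p_value = hypergeom_upper_tail(
        overlap, len(universe), len(targets), len(proximal)
    )
    return overlap, p_value


# Toy example (assumed data): 10 genes, 3 targets, all 3 of them proximal.
genes = [f"gene{i}" for i in range(10)]
overlap, p_value = cis_enrichment(genes, genes[:3], genes[:3])
print(overlap, p_value)
```

A small p-value would suggest that targets are over-represented near the lncRNA locus. How "proximal" is defined (for example, a distance window or a shared chromatin domain) is left open here, as it is in the abstract.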
Subjects
Computational Biology, Genetic Techniques, Long Noncoding RNA, Animals, Humans, Mice, Long Noncoding RNA/genetics, Long Noncoding RNA/isolation & purification, Transcription Factors/genetics, Transcriptome, Software/standards, Computational Biology/methods
ABSTRACT
Over recent years, long-range RNA structure has emerged as a factor that is fundamental to alternative splicing regulation. An increasing number of human disorders are now being associated with splicing defects; hence it is essential to develop methods that assess long-range RNA structure experimentally. RNA in situ conformation sequencing (RIC-seq) is a method that recapitulates RNA structure within physiological RNA-protein complexes. In this work, we juxtapose pairs of conserved complementary regions (PCCRs) that were predicted in silico with the results of RIC-seq experiments conducted in seven human cell lines. We show statistically that RIC-seq support of PCCRs correlates with their properties, such as equilibrium free energy, presence of compensatory substitutions, and occurrence of A-to-I RNA editing sites and forked eCLIP peaks. Exons enclosed in PCCRs that are supported by RIC-seq tend to have weaker splice sites and lower inclusion rates, which is indicative of post-transcriptional splicing regulation mediated by RNA structure. Based on these findings, we prioritize PCCRs according to their RIC-seq support and show, using antisense nucleotides and minigene mutagenesis, that PCCRs in two disease-associated human genes, PHF20L1 and CASK, and also PCCRs in their murine orthologs, impact alternative splicing. In sum, we demonstrate how RIC-seq experiments can be used to discover functional long-range RNA structures, and particularly those that regulate alternative splicing.
Subjects
Alternative Splicing, RNA Splicing, Humans, Animals, Mice, Base Sequence, RNA Sequence Analysis, RNA/genetics, RNA Splice Sites, Non-Histone Chromosomal Proteins/genetics
ABSTRACT
Eukaryotic gene expression is regulated post-transcriptionally by a mechanism called unproductive splicing, in which mRNA is triggered to degrade by the nonsense-mediated decay (NMD) pathway as a result of regulated alternative splicing (AS). Only a few dozen unproductive splicing events (USEs) are currently documented, and many more remain to be identified. Here, we analyzed RNA-seq experiments from the Genotype-Tissue Expression (GTEx) Consortium to identify USEs, in which an increase in the NMD isoform splicing rate is accompanied by tissue-specific down-regulation of the host gene. To characterize RNA-binding proteins (RBPs) that regulate USEs, we superimposed these results with RBP footprinting data and experiments on the response of the transcriptome to the perturbation of expression of a large panel of RBPs. Concordant tissue-specific changes between the expression of RBP and USE splicing rate revealed a high-confidence regulatory network including 27 tissue-specific USEs with strong evidence of RBP binding. Among them, we found previously unknown PTBP1-controlled events in the DCLK2 and IQGAP1 genes, for which we confirmed the regulatory effect using small interfering RNA (siRNA) knockdown experiments in the A549 cell line. In sum, we present a transcriptomic pipeline that allows the identification of tissue-specific USEs, potentially many more than were reported here using stringent filters.
Subjects
Alternative Splicing, RNA Splicing, Gene Expression Regulation, Nonsense-Mediated mRNA Decay, Protein Isoforms/genetics, Messenger RNA/metabolism, RNA-Binding Proteins/genetics, RNA-Binding Proteins/metabolism, Humans, Cell Line
ABSTRACT
We have produced RNA sequencing data for 53 primary cells from different locations in the human body. The clustering of these primary cells reveals that most cells in the human body share a few broad transcriptional programs, which define five major cell types: epithelial, endothelial, mesenchymal, neural, and blood cells. These act as basic components of many tissues and organs. Based on gene expression, these cell types redefine the basic histological types by which tissues have been traditionally classified. We identified genes whose expression is specific to these cell types, and from these genes, we estimated the contribution of the major cell types to the composition of human tissues. We found this cellular composition to be a characteristic signature of tissues and to reflect tissue morphological heterogeneity and histology. We identified changes in cellular composition in different tissues associated with age and sex, and found that departures from the normal cellular composition correlate with histological phenotypes associated with disease.
Subjects
Genetic Transcription, Cell Line, Endothelial Cells/metabolism, Epithelial Cells/metabolism, Female, Gene Expression Profiling, Gynecomastia/genetics, Gynecomastia/metabolism, Humans, Male, Mesoderm/cytology, Mesoderm/metabolism, Neoplasms/genetics, Organ Specificity, RNA Sequence Analysis
ABSTRACT
The mammalian Ate1 gene encodes an arginyl transferase enzyme with tumor suppressor function that depends on the inclusion of one of the two mutually exclusive exons (MXE), exons 7a and 7b. We report that the molecular mechanism underlying MXE splicing in Ate1 involves five conserved regulatory intronic elements R1-R5, of which R1 and R4 compete for base pairing with R3, while R2 and R5 form an ultra-long-range RNA structure spanning 30 kb. In minigenes, single and double mutations that disrupt base pairings in R1R3 and R3R4 lead to the loss of MXE splicing, while compensatory triple mutations that restore RNA structure revert splicing to that of the wild type. In the endogenous Ate1 pre-mRNA, blocking the competing base pairings by LNA/DNA mixmers complementary to R3 leads to the loss of MXE splicing, while the disruption of the R2R5 interaction changes the ratio of MXE. That is, Ate1 splicing is controlled by two independent, dynamically interacting, and functionally distinct RNA structure modules. Exon 7a becomes more included in response to RNA Pol II slowdown; however, it fails to do so when the ultra-long-range R2R5 interaction is disrupted, indicating that the exon 7a/7b ratio depends on co-transcriptional RNA folding. In sum, these results demonstrate that splicing is coordinated both in time and in space over very long distances, and that the interaction of these components is mediated by RNA structure.
Subjects
Alternative Splicing/genetics, Aminoacyltransferases/genetics, Nucleic Acid Conformation, Antisense Oligonucleotides/pharmacology, Oligonucleotides/pharmacology, RNA Folding, RNA Precursors/genetics, Messenger RNA/genetics, A549 Cells, Base Sequence, Tumor Cell Line, Conserved Sequence, Exons/genetics, Gene Expression Regulation/drug effects, Humans, Introns/genetics, Site-Directed Mutagenesis, Neoplasm Proteins/genetics, Oligonucleotides/genetics, Antisense Oligonucleotides/genetics, Organ Specificity, Messenger RNA/metabolism, Sequence Alignment, Nucleic Acid Sequence Homology, Genetic Transcription Elongation
ABSTRACT
Tandem alternative splice sites (TASS) are a special class of alternative splicing events that are characterized by a close tandem arrangement of splice sites. Most TASS lack functional characterization and are believed to arise from splicing noise. Based on the RNA-seq data from the Genotype Tissue Expression project, we present an extended catalogue of TASS in healthy human tissues and analyze their tissue-specific expression. The expression of TASS is usually dominated by one major splice site (maSS), while the expression of minor splice sites (miSS) is at least an order of magnitude lower. Among 46k miSS with sufficient read support, 9k (20%) are significantly expressed above the expected noise level, and among them 2.5k are expressed tissue-specifically. We found significant correlations between tissue-specific expression of RNA-binding proteins (RBP), tissue-specific expression of miSS, and miSS response to RBP inactivation by shRNA. In combination with RBP profiling by eCLIP, this allowed prediction of novel cases of tissue-specific splicing regulation including a miSS in QKI mRNA that is likely regulated by PTBP1. The analysis of human primary cell transcriptomes suggested that both tissue-specific and cell-type-specific factors contribute to the regulation of miSS expression. More than 20% of tissue-specific miSS affect structured protein regions and may adjust protein-protein interactions or modify the stability of the protein core. The significantly expressed miSS evolve under the same selection pressure as maSS, while other miSS lack signatures of evolutionary selection and conservation. Using mixture models, we estimated that not more than 15% of maSS and not more than 54% of tissue-specific miSS are noisy, while the proportion of noisy splice sites among non-significantly expressed miSS is above 63%.
Subjects
Alternative Splicing, Transcriptome, Humans, Messenger RNA/genetics
ABSTRACT
Nonsense-mediated decay (NMD) is a eukaryotic mRNA surveillance system that selectively degrades transcripts with premature termination codons (PTC). Many RNA-binding proteins (RBP) regulate their expression levels by a negative feedback loop, in which the RBP binds its own pre-mRNA and causes alternative splicing to introduce a PTC. We present a bioinformatic analysis integrating three data sources (eCLIP assays for a large RBP panel, shRNA inactivation of the NMD pathway, and shRNA depletion of RBPs followed by RNA-seq) to identify novel autoregulatory feedback loops of this kind. We show that RBPs frequently bind their own pre-mRNAs, that their exons respond prominently to NMD pathway disruption, and that the responding exons are enriched with nearby eCLIP peaks. We confirm previously proposed models of autoregulation in the SRSF7 and U2AF1 genes and present two novel models, in which (i) SFPQ binds its mRNA and promotes switching to an alternative distal 3'-UTR that is targeted by NMD, and (ii) RPS3 binding activates a poison 5'-splice site in its pre-mRNA that leads to a frame shift and degradation by NMD. We also suggest specific splicing events that could be implicated in autoregulatory feedback loops in the RBM39, HNRNPM, and U2AF2 genes. The results are available through a UCSC Genome Browser track hub.
Subjects
Nonsense Codon, Nonsense-Mediated mRNA Decay, RNA Splicing, Small Interfering RNA/metabolism, Transcriptome, 3' Untranslated Regions, Alternative Splicing, Computational Biology, Exons, Frameshift Mutation, Heterogeneous-Nuclear Ribonucleoprotein Group M/metabolism, Humans, Nuclear Proteins/metabolism, RNA Precursors/metabolism, Messenger RNA/metabolism, RNA-Binding Proteins/metabolism, Serine-Arginine Splicing Factors/metabolism, Spliceosomes, Splicing Factor U2AF/metabolism
ABSTRACT
The laboratory mouse shares the majority of its protein-coding genes with humans, making it the premier model organism in biomedical research, yet the two mammals differ in significant ways. To gain greater insights into both shared and species-specific transcriptional and cellular regulatory programs in the mouse, the Mouse ENCODE Consortium has mapped transcription, DNase I hypersensitivity, transcription factor binding, chromatin modifications and replication domains throughout the mouse genome in diverse cell and tissue types. By comparing with the human genome, we not only confirm substantial conservation in the newly annotated potential functional sequences, but also find a large degree of divergence of sequences involved in transcriptional regulation, chromatin state and higher order chromatin organization. Our results illuminate the wide range of evolutionary forces acting on genes and their regulatory regions, and provide a general resource for research into mammalian biology and mechanisms of human diseases.
Subjects
Genome/genetics, Genomics, Mice/genetics, Molecular Sequence Annotation, Animals, Cell Lineage/genetics, Chromatin/genetics, Chromatin/metabolism, Conserved Sequence/genetics, DNA Replication/genetics, Deoxyribonuclease I/metabolism, Gene Expression Regulation/genetics, Gene Regulatory Networks/genetics, Genome-Wide Association Study, Humans, RNA/genetics, Nucleic Acid Regulatory Sequences/genetics, Species Specificity, Transcription Factors/metabolism, Transcriptome/genetics
ABSTRACT
BACKGROUND: Surgical treatment for gastroschisis and congenital diaphragmatic hernia (CDH) commonly leads to abdominal compartment syndrome (ACS) associated with hypoxic renal injury. We hypothesized that measurement of urinary and serum concentrations of vascular endothelial growth factor (VEGF), π-glutathione S-transferase (π-GST), and monocyte chemoattractant protein-1 (MCP-1) may serve for noninvasive detection of hypoxic renal injury in such patients. METHODS: Intra-abdominal pressure (IAP), renal excretory function, and the biomarker levels were analyzed before surgery and 4 and 10 days after surgery. The association between the biomarker levels and renal histology was investigated using an original model of ACS in newborn rats. RESULTS: Four days after surgery, IAP increased, renal excretory function decreased, and the levels of VEGF, π-GST, and MCP-1 increased, indicating renal injury. Ten days after surgery, IAP partially decreased and renal excretory function was completely restored, but the biomarker levels remained elevated, suggesting ongoing kidney injury. In the model of ACS, an increase in the biomarker levels was associated with progressing morphological alteration of the kidney. CONCLUSION: Surgical treatment for gastroschisis and CDH is associated with prolonged hypoxic kidney injury despite complete restoration of renal excretory function. Follow-up measurement of VEGF, π-GST, and MCP-1 levels may provide a better tool for noninvasive assessment of renal parenchyma in newborns with ACS.
Subjects
Compartment Syndromes/pathology, Congenital Abnormalities/surgery, Gastroschisis/surgery, Congenital Diaphragmatic Hernias/surgery, Animals, Newborn Animals, Biomarkers/metabolism, Chemokine CCL2/metabolism, Compartment Syndromes/complications, Animal Disease Models, Female, Gastroschisis/metabolism, Glutathione Transferase/metabolism, Diaphragmatic Hernia/surgery, Humans, Hypoxia/physiopathology, Newborn Infant, Intra-Abdominal Hypertension, Kidney/pathology, Male, Pressure, Prospective Studies, Rats, Wistar Rats, Vascular Endothelial Growth Factor A/metabolism
ABSTRACT
IRBIS is a computational pipeline for detecting conserved complementary regions in unaligned orthologous sequences. Unlike other methods, it follows the "first-fold-then-align" principle in which all possible combinations of complementary k-mers are searched for simultaneous conservation. The novel trimming procedure reduces the size of the search space and improves the performance to the point where large-scale analyses of intra- and intermolecular RNA-RNA interactions become possible. In this article, I provide a rigorous description of the method, benchmarking on simulated and real data, and a set of stringent predictions of intramolecular RNA structure in placental mammals, drosophilids, and nematodes. I discuss two particular cases of long-range RNA structures that are likely to have a causal effect on single- and multiple-exon skipping, one in the mammalian gene Dystonin and the other in the insect gene Ca-α1D. In Dystonin, one of the two complementary boxes contains a binding site of the Rbfox protein similar to one recently described in the Enah gene. I also report that snoRNAs and long noncoding RNAs (lncRNAs) have a high capacity for base-pairing to introns of protein-coding genes, suggesting possible involvement of these transcripts in splicing regulation. I also find that conserved sequences that are equally likely to occur on either strand of DNA (e.g., transcription factor binding sites) contribute strongly to the false-discovery rate and, therefore, would confound every such analysis. IRBIS is open-source software available at http://genome.crg.es/~dmitri/irbis/.
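The "complementary k-mers" step can be pictured with a small single-sequence sketch. It is not the IRBIS implementation: it looks only for perfect Watson-Crick complementarity within one RNA sequence and omits the conservation requirement across orthologs and the trimming procedure that the abstract describes. The function names and the `min_gap` parameter are hypothetical.

```python
from collections import defaultdict

# Watson-Crick complement table for RNA (G-U wobble pairs are ignored here).
_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(seq):
    """Return the reverse complement of an RNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def complementary_kmer_pairs(seq, k, min_gap=0):
    """List (i, j) start positions of k-mers that can base-pair.

    The k-mer at i pairs with the k-mer at j when one is the reverse
    complement of the other; j must start at least min_gap nucleotides
    after the end of the k-mer at i.
    """
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)

    pairs = []
    for i in range(len(seq) - k + 1):
        partner = reverse_complement(seq[i:i + k])
        for j in positions.get(partner, []):
            if j >= i + k + min_gap:
                pairs.append((i, j))
    return pairs


# Toy example (assumed sequence): GGG at position 0 pairs with CCC at 7.
pairs = complementary_kmer_pairs("GGGAAAACCC", k=3)
print(pairs)
```

Indexing k-mers in a dictionary keeps the search close to linear in sequence length for small k. In a method of the kind the abstract describes, such candidate pairs would then be filtered by requiring the same complementarity in orthologous sequences.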
Subjects
Caenorhabditis elegans/genetics, Conserved Sequence/genetics, Drosophila melanogaster/genetics, Exons/genetics, Genes/genetics, Introns/genetics, Software, Animals, Base Sequence, Humans, Molecular Sequence Data, RNA Splicing/genetics, Small Nucleolar RNA/genetics, Nucleic Acid Sequence Homology
ABSTRACT
BACKGROUND: A vast amount of DNA variation is being identified by increasingly large-scale exome and genome sequencing projects. To be useful, variants require accurate functional annotation and a wide range of tools are available to this end. McCarthy et al. recently demonstrated the large differences in prediction of loss-of-function (LoF) variation when RefSeq and Ensembl transcripts are used for annotation, highlighting the importance of the reference transcripts on which variant functional annotation is based. RESULTS: We describe a detailed analysis of the similarities and differences between the gene and transcript annotation in the GENCODE and RefSeq genesets. We demonstrate that the GENCODE Comprehensive set is richer in alternative splicing, novel CDSs, novel exons and has higher genomic coverage than RefSeq, while the GENCODE Basic set is very similar to RefSeq. Using RNAseq data we show that exons and introns unique to one geneset are expressed at a similar level to those common to both. We present evidence that the differences in gene annotation lead to large differences in variant annotation where GENCODE and RefSeq are used as reference transcripts, although this is predominantly confined to non-coding transcripts and UTR sequence, with at most ~30% of LoF variants annotated discordantly. We also describe an investigation of dominant transcript expression, showing that it both supports the utility of the GENCODE Basic set in providing a smaller set of more highly expressed transcripts and provides a useful, biologically-relevant filter for further reducing the complexity of the transcriptome. CONCLUSIONS: The reference transcripts selected for variant functional annotation do have a large effect on the outcome. The GENCODE Comprehensive transcripts contain more exons, have greater genomic coverage and capture many more variants than RefSeq in both genome and exome datasets, while the GENCODE Basic set shows a higher degree of concordance with RefSeq and has fewer unique features. We propose that the GENCODE Comprehensive set has great utility for the discovery of new variants with functional potential, while the GENCODE Basic set is more suitable for applications demanding less complex interpretation of functional variants.
Subjects
Computational Biology, Human Genome, Molecular Sequence Annotation, Protein Isoforms/metabolism, Software, Alternative Splicing, Genetic Databases, Humans, Protein Isoforms/genetics, Transcriptome
ABSTRACT
Pre-mRNA structure impacts many cellular processes, including splicing in genes associated with disease. The contemporary paradigm of RNA structure prediction is biased toward secondary structures that occur within short ranges of pre-mRNA, although long-range base-pairings are known to be at least as important. Recently, we developed an efficient method for detecting conserved RNA structures on the genome-wide scale, one that does not require multiple sequence alignments and works equally well for the detection of local and long-range base-pairings. Using an enhanced method that detects base-pairings at all possible combinations of splice sites within each gene, we now report RNA structures that could be involved in the regulation of splicing in mammals. Statistically, we demonstrate strong association between the occurrence of conserved RNA structures and alternative splicing, where local RNA structures are generally more frequent at alternative donor splice sites, while long-range structures are more associated with weak alternative acceptor splice sites. As an example, we validated the RNA structure in the human SF1 gene using minigenes in the HEK293 cell line. Point mutations that disrupted the base-pairing of two complementary boxes between exons 9 and 10 of this gene altered the splicing pattern, while the compensatory mutations that reestablished the base-pairing reverted splicing to that of the wild-type. There is statistical evidence for a Dscam-like class of mammalian genes, in which mutually exclusive RNA structures control mutually exclusive alternative splicing. In sum, we propose that long-range base-pairings carry an important, yet unconsidered part of the splicing code, and that, even by modest estimates, there must be thousands of such potentially regulatory structures conserved throughout the evolutionary history of mammals.
Subjects
Alternative Splicing, RNA Precursors/chemistry, RNA Precursors/genetics, RNA Splicing, Animals, Base Sequence, Conserved Sequence, Doublecortin-Like Kinases, HEK293 Cells, Humans, Intracellular Signaling Peptides and Proteins/genetics, Molecular Sequence Data, Nucleic Acid Conformation, Protein Serine-Threonine Kinases/genetics, RNA Splice Sites, RNA Sequence Analysis
ABSTRACT
MOTIVATION: Novel technologies brought in unprecedented amounts of high-throughput sequencing data along with great challenges in their analysis and interpretation. The percent-spliced-in (PSI, Ψ) metric estimates the incidence of single-exon-skipping events and can be computed directly by counting reads that align to known or predicted splice junctions. However, the majority of human splicing events are more complex than single-exon skipping. RESULTS: In this short report, we present a framework that generalizes the Ψ metric to arbitrary classes of splicing events. We change the view from exon centric to intron centric and split the value of Ψ into two indices, ψ5 and ψ3, measuring the rate of splicing at the 5' and 3' end of the intron, respectively. The advantage of having two separate indices is that they deconvolute two distinct elementary acts of the splicing reaction. The completeness of splicing index is decomposed in a similar way. This framework is implemented as bam2ssj, a BAM-file-processing pipeline for strand-specific counting of reads that align to splice junctions or overlap with splice sites. It can be used as a consistent protocol for quantifying splice junctions from RNA-seq data because no such standard procedure currently exists. AVAILABILITY: The C code of bam2ssj is open source and is available at https://github.com/pervouchine/bam2ssj CONTACT: dp@crg.eu
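One plausible reading of the two intron-centric indices is sketched below: for each junction, the 5' index is its read count divided by all junction reads sharing the same donor site, and the 3' index is its read count divided by all junction reads sharing the same acceptor site. These formulas are an assumption made for illustration, since the abstract does not state them, and the code is unrelated to the bam2ssj C implementation. The function name `splice_indices` and the input layout are hypothetical.

```python
from collections import defaultdict


def splice_indices(junction_counts):
    """Compute per-junction 5' and 3' splicing indices.

    junction_counts maps (donor, acceptor) coordinates to the number of
    reads supporting that splice junction. Returns a dict mapping each
    junction to a (psi5, psi3) tuple.
    """
    donor_total = defaultdict(int)
    acceptor_total = defaultdict(int)
    for (donor, acceptor), reads in junction_counts.items():
        donor_total[donor] += reads
        acceptor_total[acceptor] += reads

    indices = {}
    for (donor, acceptor), reads in junction_counts.items():
        d_total = donor_total[donor]
        a_total = acceptor_total[acceptor]
        psi5 = reads / d_total if d_total else float("nan")
        psi3 = reads / a_total if a_total else float("nan")
        indices[(donor, acceptor)] = (psi5, psi3)
    return indices


# Toy example (assumed counts): one donor with two acceptors, and a
# second donor sharing the downstream acceptor.
counts = {(100, 200): 30, (100, 300): 10, (250, 300): 10}
result = splice_indices(counts)
print(result)
```

Keeping the two ratios separate means that a change in donor usage and a change in acceptor usage show up in different numbers, which matches the abstract's point about deconvoluting the two elementary acts of the splicing reaction.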
Subjects
Alternative Splicing, Introns, RNA Sequence Analysis, Software, Exons, High-Throughput Nucleotide Sequencing, Humans, RNA Splice Sites
ABSTRACT
The mammalian BRD2 and BRD3 genes encode structurally related proteins from the bromodomain and extraterminal domain protein family. The expression of BRD2 is regulated by unproductive splicing upon inclusion of exon 3b, which is located in the region encoding a bromodomain. Bioinformatic analysis indicated that BRD2 exon 3b inclusion is controlled by a pair of conserved complementary regions (PCCR) located in the flanking introns. Furthermore, we identified a highly conserved element encoding a cryptic poison exon 5b and a previously unknown PCCR in the intron between exons 5 and 6 of BRD3, which, however, lies outside of the region encoding the homologous bromodomain. Minigene mutagenesis and blockage of RNA structure by antisense oligonucleotides demonstrated that RNA structure controls the rate of inclusion of poison exons. The patterns of BRD2 and BRD3 expression and splicing show downregulation upon inclusion of poison exons, which become skipped in response to transcription elongation slowdown, further confirming a role of PCCRs in unproductive splicing regulation. We conclude that BRD2 and BRD3 independently acquired poison exons and RNA structures to dynamically control unproductive splicing. This study describes a convergent evolution of regulatory unproductive splicing mechanisms in these genes, providing implications for selective modulation of their expression in therapeutic applications.
ABSTRACT
Alternative splicing (AS) and alternative polyadenylation (APA) are two crucial steps in the post-transcriptional regulation of eukaryotic gene expression. Protocols capturing and sequencing RNA 3'-ends have uncovered widespread intronic polyadenylation (IPA) in normal and disease conditions, where it is currently attributed to stochastic variations in the pre-mRNA processing. Here, we took advantage of the massive amount of RNA-seq data generated by the Genotype Tissue Expression project (GTEx) to simultaneously identify and match tissue-specific expression of intronic polyadenylation sites with tissue-specific splicing. A combination of computational methods including the analysis of short reads with non-templated adenines revealed that APA events are more abundant in introns than in exons. While the rate of IPA in composite terminal exons and skipped terminal exons expectedly correlates with splicing, we observed a considerable fraction of IPA events that lack AS support and attributed them to spliced polyadenylated introns (SPI). We hypothesize that SPIs represent transient byproducts of a dynamic coupling between APA and AS, in which the spliceosome removes the intron while it is being cleaved and polyadenylated. These findings indicate that cotranscriptional pre-mRNA splicing could serve as a rescue mechanism to suppress premature transcription termination at intronic polyadenylation sites.
ABSTRACT
RNA structure has been increasingly recognized as a critical player in the biogenesis and turnover of many transcript classes. In eukaryotes, the prediction of RNA structure by thermodynamic modeling meets fundamental limitations due to the large sizes and complex, discontinuous organization of eukaryotic genes. Signatures of functional RNA structures can be found by detecting compensatory substitutions in homologous sequences, but a comparative approach is applicable only within conserved sequence blocks. Here, we developed a computational pipeline called PHRIC, which is not limited to conserved regions and relies on RNA contacts derived from RNA in situ conformation sequencing (RIC-seq) experiments. It extracts pairs of short RNA fragments surrounded by nested clusters of RNA contacts and predicts long, nearly perfect complementary base pairings formed between these fragments. In application to a panel of RIC-seq experiments in seven human cell lines, PHRIC predicted ~12,000 stable long-range RNA structures with equilibrium free energy below -15 kcal/mol, the vast majority of which fall outside of regions annotated as conserved among vertebrates. These structures, nevertheless, show some level of sequence conservation and remarkable compensatory substitution patterns in other clades. Furthermore, we found that introns have a higher propensity to form stable long-range RNA structures between each other, and moreover that RNA structures tend to concentrate within the same intron rather than connect adjacent introns. These results for the first time extend the application of proximity ligation assays to RNA structure prediction beyond conserved regions.
Subjects
RNA, Transcriptome, Animals, Humans, RNA/genetics, Base Sequence, Transcriptome/genetics, Introns, RNA Splicing
ABSTRACT
Significant alterations in signaling pathways and transcriptional regulatory programs together represent major hallmarks of many cancers. These include, among others, the reactivation of stemness, which is registered by the expression of pathways that are active in embryonic stem cells (ESCs). Here, we assembled gene sets that reflect the stemness and proliferation signatures and used them to analyze a large panel of RNA-seq data from The Cancer Genome Atlas (TCGA) Consortium in order to specifically assess the expression of stemness-related and proliferation-related genes across a collection of different tumor types. We introduced a metric that captures the collective similarity of the expression profile of a tumor to that of ESCs, which showed that stemness and proliferation signatures vary greatly between different tumor types. We also observed a high degree of intertumoral heterogeneity in the expression of stemness- and proliferation-related genes, which was associated with increased hazard ratios in a fraction of tumors and mirrored by high intratumoral heterogeneity and a remarkable stemness capacity in metastatic lesions across cancer cells in single cell RNA-seq datasets. Taken together, these results indicate that the expression of stemness signatures is highly heterogeneous and cannot be used as a universal determinant of cancer. This calls into question the universal validity of diagnostic tests that are based on stem cell markers.
Subjects
Gene Expression Profiling, Neoplasms, Cell Proliferation/genetics, Embryonic Stem Cells, Humans, Neoplasms/pathology, Neoplastic Stem Cells/pathology, Transcriptome, Exome Sequencing
ABSTRACT
Accurate and efficient recognition of splice sites during pre-mRNA splicing is essential for proper transcriptome expression. Splice site usage can be modulated by secondary structures, but it is unclear if this type of modulation is commonly used or occurs to a significant degree with secondary structures forming over long distances. Using phylogenetic comparisons of intronic sequences among 12 Drosophila genomes, we elucidated a group of 202 highly conserved pairs of sequences, each at least nine nucleotides long, capable of forming stable stem structures. This set was highly enriched in alternatively spliced introns, introns with weak acceptor sites, and long introns, and most pairs occurred over long distances (>150 nucleotides). Experimentally, we analyzed the splicing of several of these introns using mini-genes in Drosophila S2 cells. Wild-type splicing patterns were changed by mutations that opened the stem structure, and restored by compensatory mutations that re-established the base-pairing potential, demonstrating that these secondary structures were indeed implicated in the splice site choice. Mechanistically, the RNA structures masked splice sites, brought together distant splice sites and/or looped out introns. Thus, base-pairing interactions within introns, even those occurring over long distances, are more frequent modulators of alternative splicing than is currently assumed.
Subjects
Alternative Splicing , Drosophila melanogaster/genetics , Introns , RNA Precursors/chemistry , Messenger RNA/chemistry , Animals , Base Pairing , Base Sequence , Conserved Sequence , Molecular Sequence Data , RNA Splice Sites
ABSTRACT
The ability of nucleic acids to form double-stranded structures is essential for all living systems on Earth. Current knowledge on functional RNA structures is focused on locally occurring base pairs. However, crosslinking and proximity ligation experiments demonstrated that long-range RNA structures are highly abundant. Here, we present the most complete catalog to date of pairs of conserved complementary regions (PCCRs) in human protein-coding genes. PCCRs tend to occur within introns, suppress intervening exons, and obstruct cryptic and inactive splice sites. The double-stranded structure of PCCRs is supported by decreased icSHAPE nucleotide accessibility, high abundance of RNA editing sites, and frequent occurrence of forked eCLIP peaks. Introns with PCCRs show a distinct splicing pattern in response to RNAPII slowdown, suggesting that splicing is widely affected by co-transcriptional RNA folding. The enrichment of 3'-ends within PCCRs raises the intriguing hypothesis that coupling between RNA folding and splicing could mediate co-transcriptional suppression of premature pre-mRNA cleavage and polyadenylation.
Subjects
Base Pairing/physiology , Complementary DNA/genetics , RNA Precursors/metabolism , RNA Splicing/physiology , A549 Cells , Base Sequence/genetics , Conserved Sequence/physiology , Gene Library , Hep G2 Cells , Humans , Introns/genetics , Polyadenylation , RNA Folding/physiology , RNA Precursors/genetics , RNA-Seq
ABSTRACT
PURPOSE: To investigate the urinary levels of TGF-β1, VEGF, and MCP-1 as potential biomarkers of latent inflammation and fibrosis in the kidney before and 6 months after correction of vesicoureteral reflux (VUR) in children. METHODS: A total of 88 patients (mean age 26 months) with VUR were divided into three groups: group A: patients with grades II-III VUR, conservative treatment; group B: patients with grades III-V VUR, endoscopic correction of VUR; group C: patients with grades III-V VUR, ureteral reimplantation after failed endoscopic correction. The control group included 20 healthy children. Biomarker levels were measured by ELISA. 99mTc-DMSA scintigraphy and renal histology were performed if possible. RESULTS: At admission, TGF-β1 was close to control in all study groups, VEGF increased with severity of the disease, and MCP-1 increased in group C. Six months after correction of VUR, despite clinical and laboratory improvement, TGF-β1 and MCP-1 increased while VEGF decreased compared to the admission values in all groups; no amelioration of renal scarring was detected either by 99mTc-DMSA scintigraphy or renal histology. CONCLUSION: The results support our hypothesis that successful correction of VUR is not sufficient to stop or reduce the latent inflammatory and fibrotic processes that have already started in the kidney regardless of the reflux grade and treatment option. Measuring the urinary levels of TGF-β1, VEGF, and MCP-1 may aid in the development of a non-invasive, pathophysiologically relevant approach to diagnosis and monitoring of kidney injury and fibrosis in children with VUR.