Results 1 - 20 of 38
1.
Nucleic Acids Res ; 51(9): 4191-4207, 2023 05 22.
Article in English | MEDLINE | ID: mdl-37026479

ABSTRACT

Adenosine deaminase acting on RNA 1 (ADAR1) promotes A-to-I conversion in double-stranded and structured RNAs. ADAR1 has two isoforms transcribed from different promoters: cytoplasmic ADAR1p150 is interferon-inducible while ADAR1p110 is constitutively expressed and primarily localized in the nucleus. Mutations in ADAR1 cause Aicardi-Goutières syndrome (AGS), a severe autoinflammatory disease associated with aberrant IFN production. In mice, deletion of ADAR1 or the p150 isoform leads to embryonic lethality driven by overexpression of interferon-stimulated genes. This phenotype is rescued by deletion of the cytoplasmic dsRNA-sensor MDA5, indicating that the p150 isoform is indispensable and cannot be rescued by ADAR1p110. Nevertheless, editing sites uniquely targeted by ADAR1p150 remain elusive. Here, by transfection of ADAR1 isoforms into ADAR-less mouse cells we detect isoform-specific editing patterns. Using mutated ADAR variants, we test how intracellular localization and the presence of a Z-DNA binding domain-α affect editing preferences. These data show that ZBDα only minimally contributes to p150 editing-specificity while isoform-specific editing is primarily directed by the intracellular localization of ADAR1 isoforms. Our study is complemented by RIP-seq on human cells ectopically expressing tagged-ADAR1 isoforms. Both datasets reveal enrichment of intronic editing and binding by ADAR1p110 while ADAR1p150 preferentially binds and edits 3'UTRs.


Subject(s)
Adenosine Deaminase , Interferons , RNA Editing , RNA, Double-Stranded , Animals , Humans , Mice , Adenosine Deaminase/genetics , Adenosine Deaminase/metabolism , Cell Nucleus/metabolism , Cytoplasm/metabolism , Interferons/genetics , Protein Isoforms/genetics , Protein Isoforms/metabolism , RNA, Double-Stranded/genetics
2.
PLoS Genet ; 18(8): e1010376, 2022 08.
Article in English | MEDLINE | ID: mdl-35994477

ABSTRACT

The class I histone deacetylases are essential regulators of cell fate decisions in health and disease. While pan- and class-specific HDAC inhibitors are available, these drugs do not allow a comprehensive understanding of individual HDAC function, or the therapeutic potential of isoform-specific targeting. To systematically compare the impact of individual catalytic functions of HDAC1, HDAC2 and HDAC3, we generated human HAP1 cell lines expressing catalytically inactive HDAC enzymes. Using this genetic toolbox we compare the effect of individual HDAC inhibition with the effects of class I specific inhibitors on cell viability, protein acetylation and gene expression. Individual inactivation of HDAC1 or HDAC2 has only mild effects on cell viability, while HDAC3 inactivation or loss results in DNA damage and apoptosis. Inactivation of HDAC1/HDAC2 led to increased acetylation of components of the COREST co-repressor complex, reduced deacetylase activity associated with this complex and derepression of neuronal genes. HDAC3 controls the acetylation of nuclear hormone receptor associated proteins and the expression of nuclear hormone receptor regulated genes. Acetylation of specific histone acetyltransferases and HDACs is sensitive to inactivation of HDAC1/HDAC2. Over a wide range of assays, we determined that in particular HDAC1 or HDAC2 catalytic inactivation mimics class I specific HDAC inhibitors. Importantly, we further demonstrate that catalytic inactivation of HDAC1 or HDAC2 sensitizes cells to specific cancer drugs. In summary, our systematic study revealed isoform-specific roles of HDAC1/2/3 catalytic functions. We suggest that targeted genetic inactivation of particular isoforms effectively mimics pharmacological HDAC inhibition allowing the identification of relevant HDACs as targets for therapeutic intervention.


Subject(s)
Histone Deacetylase 1 , Histone Deacetylase Inhibitors , Acetylation , Histone Deacetylase 1/genetics , Histone Deacetylase 1/metabolism , Histone Deacetylase 2/genetics , Histone Deacetylase 2/metabolism , Histone Deacetylase Inhibitors/pharmacology , Histone Deacetylases/genetics , Histone Deacetylases/metabolism , Humans , Protein Isoforms/genetics , Protein Isoforms/metabolism
3.
Bioinformatics ; 37(15): 2126-2133, 2021 Aug 09.
Article in English | MEDLINE | ID: mdl-33538792

ABSTRACT

MOTIVATION: Predicting the folding dynamics of RNAs is a computationally difficult problem, first and foremost due to the combinatorial explosion of alternative structures in the folding space. Abstractions are therefore needed to simplify downstream analyses, and thus make them computationally tractable. This can be achieved by various structure sampling algorithms. However, current sampling methods are still time-consuming and frequently fail to represent key elements of the folding space. METHOD: We introduce RNAxplorer, a novel adaptive sampling method to efficiently explore the structure space of RNAs. RNAxplorer uses dynamic programming to perform an efficient Boltzmann sampling in the presence of guiding potentials, which are accumulated into pseudo-energy terms and reflect similarity to already well-sampled structures. This way, we effectively steer sampling toward underrepresented or unexplored regions of the structure space. RESULTS: We developed and applied different measures to benchmark our sampling method against its competitors. Most of the measures show that RNAxplorer produces more diverse structure samples, yields rare conformations that may be inaccessible to other sampling methods and is better at finding the most relevant kinetic traps in the landscape. Thus, it produces a more representative coarse graining of the landscape, which is well suited to subsequently compute better approximations of RNA folding kinetics. AVAILABILITY AND IMPLEMENTATION: https://github.com/ViennaRNA/RNAxplorer/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
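
The idea of Boltzmann sampling under accumulating guiding potentials, as described in the abstract above, can be illustrated with a toy sketch. This is not the RNAxplorer implementation: it assumes a small, explicitly enumerated list of states with made-up free energies instead of dynamic programming over a real folding space, and it penalizes only the sampled state itself rather than structures similar to it. The names boltzmann_probs, adaptive_sample and the bump parameter are hypothetical.

```python
import math
import random

KT = 0.6163  # approximate thermal energy in kcal/mol at 37 degrees C


def boltzmann_probs(energies, penalties, kT=KT):
    """Probabilities proportional to exp(-(E + penalty) / kT)."""
    weights = [math.exp(-(e + p) / kT) for e, p in zip(energies, penalties)]
    z = sum(weights)  # partition function over the toy state list
    return [w / z for w in weights]


def adaptive_sample(energies, rounds, bump=1.0, seed=0):
    """Sample states; each visit adds a pseudo-energy penalty to that state.

    The growing penalty lowers the probability of states that are already
    well sampled, steering later draws toward rarely visited states.
    """
    rng = random.Random(seed)
    penalties = [0.0] * len(energies)
    counts = [0] * len(energies)
    for _ in range(rounds):
        probs = boltzmann_probs(energies, penalties)
        i = rng.choices(range(len(energies)), weights=probs)[0]
        counts[i] += 1
        penalties[i] += bump  # accumulate the guiding potential
    return counts


if __name__ == "__main__":
    toy_energies = [-5.0, -3.0, -1.0]  # hypothetical free energies, kcal/mol
    print(boltzmann_probs(toy_energies, [0.0, 0.0, 0.0]))
    print(adaptive_sample(toy_energies, rounds=50))
```

Without penalties the lowest-energy state dominates the sample; with penalties the visit counts spread out over the higher-energy states as well, which is the qualitative behaviour the abstract describes.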

4.
Plant Physiol ; 180(1): 305-322, 2019 05.
Article in English | MEDLINE | ID: mdl-30760640

ABSTRACT

Cis-Natural Antisense Transcripts (cis-NATs), which overlap protein coding genes and are transcribed from the opposite DNA strand, constitute an important group of noncoding RNAs. Whereas several examples of cis-NATs regulating the expression of their cognate sense gene are known, most cis-NATs function by altering the steady-state level or structure of mRNA via changes in transcription, mRNA stability, or splicing, and very few cases involve the regulation of sense mRNA translation. This study was designed to systematically search for cis-NATs influencing cognate sense mRNA translation in Arabidopsis (Arabidopsis thaliana). Establishment of a pipeline relying on sequencing of total polyA+ and polysomal RNA from Arabidopsis grown under various conditions (i.e. nutrient deprivation and phytohormone treatments) allowed the identification of 14 cis-NATs whose expression correlated either positively or negatively with cognate sense mRNA translation. With use of a combination of cis-NAT stable over-expression in transgenic plants and transient expression in protoplasts, the impact of cis-NAT expression on mRNA translation was confirmed for 4 out of 5 tested cis-NAT:sense mRNA pairs. These results expand the number of cis-NATs known to regulate cognate sense mRNA translation and provide a foundation for future studies of their mode of action. Moreover, this study highlights the role of this class of noncoding RNAs in translation regulation.


Subject(s)
Arabidopsis/genetics , Protein Biosynthesis , RNA, Antisense/genetics , Arabidopsis Proteins/genetics , DNA-Binding Proteins/genetics , Gene Expression Regulation, Plant , Plants, Genetically Modified , RNA, Messenger/genetics , RNA, Plant , Reproducibility of Results , Sequence Analysis, RNA , Transcription Factors/genetics
5.
Methods ; 156: 32-39, 2019 03 01.
Article in English | MEDLINE | ID: mdl-30385321

ABSTRACT

Chemical modifications of RNA nucleotides change their identity and characteristics and thus alter genetic and structural information encoded in the genomic DNA. tRNA and rRNA are probably the most heavily modified genes, and often depend on derivatization or isomerization of their nucleobases in order to correctly fold into their functional structures. Recent RNomics studies, however, report transcriptome-wide RNA modification and suggest a more general regulation of structuredness of RNAs by this so-called epitranscriptome. Modification seems to require specific substrate structures, which in turn are stabilized or destabilized and thus promote or inhibit refolding events of regulatory RNA structures. In this review, we revisit RNA modifications and the related structures from a computational point of view. We discuss known substrate structures, their properties such as sub-motifs as well as consequences of modifications on base pairing patterns and possible refolding events. Given that efficient RNA structure prediction methods for canonical base pairs were established several decades ago, we review to what extent these methods allow the inclusion of modified nucleotides to model and study epitranscriptomic effects on RNA structures.


Subject(s)
Adenosine/metabolism , Inosine/metabolism , RNA Processing, Post-Transcriptional , Sequence Analysis, RNA/methods , Transcriptome , Animals , Base Pairing , Base Sequence , Humans , Methylation , MicroRNAs/genetics , MicroRNAs/metabolism , Nucleic Acid Conformation , RNA, Messenger/genetics , RNA, Messenger/metabolism , RNA, Ribosomal/genetics , RNA, Ribosomal/metabolism , RNA, Small Nuclear/genetics , RNA, Small Nuclear/metabolism , RNA, Transfer/genetics , RNA, Transfer/metabolism
6.
Genes (Basel) ; 9(8), 2018 Aug 01.
Article in English | MEDLINE | ID: mdl-30071678

ABSTRACT

In this work, we present a computational screen conducted for functional RNA structures, resulting in over 100,000 conserved RNA structure elements found in alignments of mouse (mm10) against 59 other vertebrates. We explicitly included masked repeat regions to explore the potential of transposable elements and low-complexity regions to give rise to regulatory RNA elements. In our analysis pipeline, we implemented a four-step procedure: (i) we screened genome-wide alignments for potential structure elements using RNAz-2, (ii) realigned and refined candidate loci with LocARNA-P, (iii) scored candidates again with RNAz-2 in structure alignment mode, and (iv) searched for additional homologous loci in the mouse genome that were not covered by genome alignments. The 3'-untranslated regions (3'-UTRs) of protein-coding genes and small noncoding RNAs are enriched for structures, while coding sequences are depleted. Repeat-associated loci make up about 95% of the homologous loci identified and are, as expected, predominantly found in intronic and intergenic regions. Nevertheless, we report the structure elements enriched in specific genome elements, such as 3'-UTRs and long noncoding RNAs (lncRNAs). We provide full access to our results via a custom UCSC genome browser trackhub freely available on our website (http://rna.tbi.univie.ac.at/trackhubs/#RNAz).

8.
Genome Biol ; 17(1): 220, 2016 10 25.
Article in English | MEDLINE | ID: mdl-27782844

ABSTRACT

BACKGROUND: Short interspersed elements (SINEs) represent the most abundant group of non-long-terminal repeat transposable elements in mammalian genomes. In primates, Alu elements are the most prominent and homogenous representatives of SINEs. Due to their frequent insertion within or close to coding regions, SINEs have been suggested to play a crucial role during genome evolution. Moreover, Alu elements within mRNAs have also been reported to control gene expression at different levels. RESULTS: Here, we undertake a genome-wide analysis of insertion patterns of human Alus within transcribed portions of the genome. Multiple, nearby insertions of SINEs within one transcript are more abundant in tandem orientation than in inverted orientation. Indeed, analysis of transcriptome-wide expression levels of 15 ENCODE cell lines suggests a cis-repressive effect of inverted Alu elements on gene expression. Using reporter assays, we show that the negative effect of inverted SINEs on gene expression is independent of known sensors of double-stranded RNAs. Instead, transcriptional elongation seems impaired, leading to reduced mRNA levels. CONCLUSIONS: Our study suggests that there is a bias against multiple SINE insertions that can promote intramolecular base pairing within a transcript. Moreover, at a genome-wide level, mRNAs harboring inverted SINEs are less expressed than mRNAs harboring single or tandemly arranged SINEs. Finally, we demonstrate a novel mechanism by which inverted SINEs can impact on gene expression by interfering with RNA polymerase II.


Subject(s)
RNA Polymerase II/genetics , Short Interspersed Nucleotide Elements/genetics , Transcription, Genetic , Transcriptome/genetics , Alu Elements/genetics , Cell Line , Evolution, Molecular , Gene Expression Regulation , Genome, Human , Humans , RNA, Double-Stranded/genetics , RNA, Messenger/genetics
9.
Sci Rep ; 6: 34589, 2016 10 07.
Article in English | MEDLINE | ID: mdl-27713552

ABSTRACT

The unprecedented outbreak of Ebola in West Africa resulted in over 28,000 cases and 11,000 deaths, underlining the need for a better understanding of the biology of this highly pathogenic virus to develop specific counter strategies. Two filoviruses, the Ebola and Marburg viruses, result in a severe and often fatal infection in humans. However, bats are natural hosts and survive filovirus infections without obvious symptoms. The molecular basis of this striking difference in the response to filovirus infections is not well understood. We report a systematic overview of differentially expressed genes, activity motifs and pathways in human and bat cells infected with the Ebola and Marburg viruses, and we demonstrate that the replication of filoviruses is more rapid in human cells than in bat cells. We also found that the most strongly regulated genes upon filovirus infection are chemokine ligands and transcription factors. We observed a strong induction of the JAK/STAT pathway, of several genes encoding inhibitors of MAP kinases (DUSP genes) and of PPP1R15A, which is involved in ER stress-induced cell death. We used comparative transcriptomics to provide a data resource that can be used to identify cellular responses that might allow bats to survive filovirus infections.


Subject(s)
Ebolavirus/metabolism , Gene Expression Regulation , Hemorrhagic Fever, Ebola/metabolism , Marburg Virus Disease/metabolism , Marburgvirus/metabolism , Signal Transduction , Transcription, Genetic , Animals , Cell Line, Tumor , Chiroptera , Humans
10.
Nat Commun ; 7: 12339, 2016 08 17.
Article in English | MEDLINE | ID: mdl-27531712

ABSTRACT

Long non-coding RNAs (lncRNAs) constitute a large, yet mostly uncharacterized fraction of the mammalian transcriptome. Such characterization requires a comprehensive, high-quality annotation of their gene structure and boundaries, which is currently lacking. Here we describe RACE-Seq, an experimental workflow designed to address this based on RACE (rapid amplification of cDNA ends) and long-read RNA sequencing. We apply RACE-Seq to 398 human lncRNA genes in seven tissues, leading to the discovery of 2,556 on-target, novel transcripts. About 60% of the targeted loci are extended in either 5' or 3', often reaching genomic hallmarks of gene boundaries. Analysis of the novel transcripts suggests that lncRNAs are as long, have as many exons and undergo as much alternative splicing as protein-coding genes, contrary to current assumptions. Overall, we show that RACE-Seq is an effective tool to annotate an organism's deep transcriptome, and compares favourably to other targeted sequencing techniques.


Subject(s)
High-Throughput Nucleotide Sequencing/methods , Polymerase Chain Reaction/methods , RNA, Long Noncoding/genetics , Sequence Analysis, RNA/methods , Exons/genetics , Genetic Loci , Humans , Molecular Sequence Annotation , Organ Specificity/genetics , Proof of Concept Study , Protein Isoforms/genetics , Protein Isoforms/metabolism , RNA Splice Sites/genetics , RNA, Long Noncoding/metabolism , RNA, Messenger/genetics , RNA, Messenger/metabolism , Transcriptome/genetics
11.
Mol Syst Biol ; 12(5): 868, 2016 05 13.
Article in English | MEDLINE | ID: mdl-27178967

ABSTRACT

Precise regulation of mRNA decay is fundamental for robust yet not exaggerated inflammatory responses to pathogens. However, a global model integrating regulation and functional consequences of inflammation-associated mRNA decay remains to be established. Using time-resolved high-resolution RNA binding analysis of the mRNA-destabilizing protein tristetraprolin (TTP), an inflammation-limiting factor, we qualitatively and quantitatively characterize TTP binding positions in the transcriptome of immunostimulated macrophages. We identify pervasive destabilizing and non-destabilizing TTP binding, including a robust intronic binding, showing that TTP binding is not sufficient for mRNA destabilization. A low degree of flanking RNA structuredness distinguishes occupied from silent binding motifs. By functionally relating TTP binding sites to mRNA stability and levels, we identify a TTP-controlled switch for the transition from inflammatory into the resolution phase of the macrophage immune response. Mapping of binding positions of the mRNA-stabilizing protein HuR reveals little target and functional overlap with TTP, implying a limited co-regulation of inflammatory mRNA decay by these proteins. Our study establishes a functionally annotated and navigable transcriptome-wide atlas (http://ttp-atlas.univie.ac.at) of cis-acting elements controlling mRNA decay in inflammation.


Subject(s)
Lipopolysaccharides/pharmacology , Macrophages/immunology , RNA, Messenger/chemistry , Tristetraprolin/metabolism , Animals , Binding Sites , Cells, Cultured , Gene Expression Profiling/methods , Gene Expression Regulation , HEK293 Cells , Humans , Macrophages/drug effects , Mice , RNA Stability , RNA, Messenger/metabolism , Sequence Analysis, RNA
12.
Methods ; 103: 86-98, 2016 07 01.
Article in English | MEDLINE | ID: mdl-27064083

ABSTRACT

RNA secondary structures have proven essential for understanding the regulatory functions performed by RNA such as microRNAs, bacterial small RNAs, or riboswitches. This success is in part due to the availability of efficient computational methods for predicting RNA secondary structures. Recent advances focus on dealing with the inherent uncertainty of prediction by considering the ensemble of possible structures rather than the single most stable one. Moreover, the advent of high-throughput structural probing has spurred the development of computational methods that incorporate such experimental data as auxiliary information.


Subject(s)
RNA/chemistry , Algorithms , Base Sequence , Computational Biology , Computer Simulation , Humans , Models, Molecular , RNA Folding , Sequence Analysis, RNA
13.
Nucleic Acids Res ; 44(D1): D90-5, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26602692

ABSTRACT

AREsite2 represents an update for AREsite, an on-line resource for the investigation of AU-rich elements (ARE) in human and mouse mRNA 3'UTR sequences. The new updated and enhanced version allows detailed investigation of AU, GU and U-rich elements (ARE, GRE, URE) in the transcriptome of Homo sapiens, Mus musculus, Danio rerio, Caenorhabditis elegans and Drosophila melanogaster. It contains information on genomic location, genic context, RNA secondary structure context and conservation of annotated motifs. Improvements include annotation of motifs not only in 3'UTRs but in the whole gene body including introns, additional genomes, and locally stable secondary structures from genome-wide scans. Furthermore, we include data from CLIP-Seq experiments in order to highlight motifs with validated protein interaction. Additionally, we provide a REST interface for experienced users to interact with the database in a semi-automated manner. The database is publicly available at: http://rna.tbi.univie.ac.at/AREsite.


Subject(s)
3' Untranslated Regions , Databases, Nucleic Acid , RNA/chemistry , Animals , Genomics , Humans , Mice , Molecular Sequence Annotation , Nucleic Acid Conformation , Nucleotide Motifs
14.
Nat Commun ; 6: 5903, 2015 Jan 13.
Article in English | MEDLINE | ID: mdl-25582907

ABSTRACT

Mice have been a long-standing model for human biology and disease. Here we characterize, by RNA sequencing, the transcriptional profiles of a large and heterogeneous collection of mouse tissues, augmenting the mouse transcriptome with thousands of novel transcript candidates. Comparison with transcriptome profiles in human cell lines reveals substantial conservation of transcriptional programmes, and uncovers a distinct class of genes with levels of expression that have been constrained early in vertebrate evolution. This core set of genes captures a substantial fraction of the transcriptional output of mammalian cells, and participates in basic functional and structural housekeeping processes common to all cell types. Perturbation of these constrained genes is associated with significant phenotypes including embryonic lethality and cancer. Evolutionary constraint in gene expression levels is not reflected in the conservation of the genomic sequences, but is associated with conserved epigenetic marking, as well as with a characteristic post-transcriptional regulatory programme, in which sub-cellular localization and alternative splicing play comparatively large roles.


Subject(s)
Evolution, Molecular , Gene Expression Regulation , Transcriptome , Alternative Splicing , Animals , Biological Evolution , Cell Line , Epigenesis, Genetic , Gene Expression Profiling , Gene Library , Genome , Histones/chemistry , Humans , Mice , Mice, Inbred C57BL , Models, Genetic , Oligonucleotides, Antisense , Phenotype , Sequence Analysis, RNA
15.
Nature ; 515(7527): 355-64, 2014 Nov 20.
Article in English | MEDLINE | ID: mdl-25409824

ABSTRACT

The laboratory mouse shares the majority of its protein-coding genes with humans, making it the premier model organism in biomedical research, yet the two mammals differ in significant ways. To gain greater insights into both shared and species-specific transcriptional and cellular regulatory programs in the mouse, the Mouse ENCODE Consortium has mapped transcription, DNase I hypersensitivity, transcription factor binding, chromatin modifications and replication domains throughout the mouse genome in diverse cell and tissue types. By comparing with the human genome, we not only confirm substantial conservation in the newly annotated potential functional sequences, but also find a large degree of divergence of sequences involved in transcriptional regulation, chromatin state and higher order chromatin organization. Our results illuminate the wide range of evolutionary forces acting on genes and their regulatory regions, and provide a general resource for research into mammalian biology and mechanisms of human diseases.


Subject(s)
Genome/genetics , Genomics , Mice/genetics , Molecular Sequence Annotation , Animals , Cell Lineage/genetics , Chromatin/genetics , Chromatin/metabolism , Conserved Sequence/genetics , DNA Replication/genetics , Deoxyribonuclease I/metabolism , Gene Expression Regulation/genetics , Gene Regulatory Networks/genetics , Genome-Wide Association Study , Humans , RNA/genetics , Regulatory Sequences, Nucleic Acid/genetics , Species Specificity , Transcription Factors/metabolism , Transcriptome/genetics
16.
Genome Biol ; 15(2): R34, 2014 Feb 10.
Article in English | MEDLINE | ID: mdl-24512684

ABSTRACT

Numerous high-throughput sequencing studies have focused on detecting conventionally spliced mRNAs in RNA-seq data. However, non-standard RNAs arising through gene fusion, circularization or trans-splicing are often neglected. We introduce a novel, unbiased algorithm to detect splice junctions from single-end cDNA sequences. In contrast to other methods, our approach accommodates multi-junction structures. Our method compares favorably with competing tools for conventionally spliced mRNAs and, with a gain of up to 40% of recall, systematically outperforms them on reads with multiple splits, trans-splicing and circular products. The algorithm is integrated into our mapping tool segemehl (http://www.bioinf.uni-leipzig.de/Software/segemehl/).


Subject(s)
Algorithms , RNA Splicing/genetics , RNA/genetics , Trans-Splicing/genetics , DNA, Complementary/genetics , High-Throughput Nucleotide Sequencing , RNA, Circular , RNA, Messenger/metabolism , Software
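
The non-standard junction types named in the abstract above (conventional splicing, circularization, trans-splicing or fusion) can be told apart, in the simplest case, from the relative genomic placement of two consecutive segments of a split read. The following is a toy sketch under stated assumptions and is not the segemehl algorithm: the Segment type and classify_junction function are hypothetical, coordinates are assumed 0-based and half-open, segments are given in 5' to 3' read order, and overlapping segments or distant same-strand trans-splicing events are not handled specially.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One aligned piece of a split read (hypothetical minimal record)."""
    chrom: str
    strand: str  # "+" or "-"
    start: int   # 0-based, inclusive
    end: int     # exclusive


def classify_junction(first: Segment, second: Segment) -> str:
    """Classify the junction between two consecutive read segments.

    Returns "trans" if the segments lie on different chromosomes or strands,
    "linear" if the second segment lies downstream of the first in
    transcript direction, and "backsplice" (a circular RNA candidate)
    otherwise.
    """
    if first.chrom != second.chrom or first.strand != second.strand:
        return "trans"
    if first.strand == "+":
        collinear = second.start >= first.end
    else:
        # On the minus strand, transcript order runs toward lower coordinates.
        collinear = second.end <= first.start
    return "linear" if collinear else "backsplice"


if __name__ == "__main__":
    a = Segment("chr1", "+", 100, 150)
    b = Segment("chr1", "+", 1000, 1050)
    print(classify_junction(a, b))  # second segment downstream of the first
    print(classify_junction(b, a))  # second segment upstream of the first
```

A read with more than two segments, the multi-junction case the abstract highlights, would be handled in this sketch by classifying each consecutive pair of segments in turn.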
17.
Article in English | MEDLINE | ID: mdl-24334379

ABSTRACT

G-quadruplexes are abundant locally stable structural elements in nucleic acids. The combinatorial theory of RNA structures and the dynamic programming algorithms for RNA secondary structure prediction are extended here to incorporate G-quadruplexes using a simple but plausible energy model. With preliminary energy parameters, we find that the overwhelming majority of putative quadruplex-forming sequences in the human genome are likely to fold into canonical secondary structures instead. Stable G-quadruplexes are strongly enriched, however, in the 5'UTR of protein coding mRNAs.


Subject(s)
G-Quadruplexes , Nucleic Acid Conformation , RNA, Messenger/chemistry , 5' Untranslated Regions , Base Sequence , Computational Biology , Humans , RNA Folding , RNA, Messenger/genetics , RNA, Messenger/metabolism , Sequence Alignment , Sequence Analysis, RNA , Thermodynamics
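
"Putative quadruplex-forming sequences", as mentioned in the G-quadruplex abstract above, are commonly located with a simple sequence pattern: four runs of at least three G's separated by short loops. The sketch below assumes the widely used loop length of 1 to 7 nucleotides; it is a sequence heuristic only and does not reproduce the energy model or the folding algorithm of the paper. The names G4_MOTIF and find_putative_g4 are hypothetical.

```python
import re

# Four G-runs (>= 3 G each) separated by loops of 1-7 nucleotides.
# The loop length limit is an assumption taken from common practice.
G4_MOTIF = re.compile(
    r"G{3,}[ACGU]{1,7}G{3,}[ACGU]{1,7}G{3,}[ACGU]{1,7}G{3,}"
)


def find_putative_g4(seq: str) -> list[tuple[int, int]]:
    """Return (start, end) spans of non-overlapping motif matches.

    Accepts DNA or RNA; T is converted to U before matching, so the
    returned coordinates still refer to positions in the input sequence.
    """
    rna = seq.upper().replace("T", "U")
    return [(m.start(), m.end()) for m in G4_MOTIF.finditer(rna)]


if __name__ == "__main__":
    print(find_putative_g4("AAGGGAGGGAGGGAGGGAA"))
    print(find_putative_g4("GGGAGGGAGGGAGG"))  # last run too short
```

A match only indicates that a quadruplex is possible; whether it is favoured over a canonical secondary structure is exactly the question the abstract addresses with its energy model.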
18.
Genome Res ; 22(9): 1698-710, 2012 Sep.
Article in English | MEDLINE | ID: mdl-22955982

ABSTRACT

Within the ENCODE Consortium, GENCODE aimed to accurately annotate all protein-coding genes, pseudogenes, and noncoding transcribed loci in the human genome through manual curation and computational methods. Annotated transcript structures were assessed, and less well-supported loci were systematically, experimentally validated. Predicted exon-exon junctions were evaluated by RT-PCR amplification followed by highly multiplexed sequencing readout, a method we called RT-PCR-seq. Seventy-nine percent of all assessed junctions are confirmed by this evaluation procedure, demonstrating the high quality of the GENCODE gene set. RT-PCR-seq was also efficient for screening gene models predicted using the Human Body Map (HBM) RNA-seq data. We validated 73% of these predictions, thus confirming 1168 novel genes, mostly noncoding, which will further complement the GENCODE annotation. Our novel experimental validation pipeline is extremely sensitive, far more than unbiased transcriptome profiling through RNA sequencing, which is becoming the norm. For example, exon-exon junctions unique to GENCODE annotated transcripts are five times more likely to be corroborated with our targeted approach than with extensive large human transcriptome profiling. Data sets such as the HBM and ENCODE RNA-seq data fail to sample low-expressed transcripts. Our RT-PCR-seq targeted approach also has the advantage of identifying novel exons of known genes, as we discovered unannotated exons in ~11% of assessed introns. We thus estimate that at least 18% of known loci have yet-unannotated exons. Our work demonstrates that the cataloging of all of the genic elements encoded in the human genome will necessitate a coordinated effort between unbiased and targeted approaches, like RNA-seq and RT-PCR-seq.


Subject(s)
Gene Expression Profiling/methods , Genome, Human , Transcriptome , Computational Biology/methods , Exons , High-Throughput Nucleotide Sequencing , Humans , Introns , Molecular Sequence Annotation , Open Reading Frames , RNA Isoforms , RNA, Messenger/chemistry , RNA, Messenger/genetics , Reproducibility of Results , Reverse Transcriptase Polymerase Chain Reaction , Sensitivity and Specificity
19.
Genome Res ; 22(9): 1760-74, 2012 Sep.
Article in English | MEDLINE | ID: mdl-22955987

ABSTRACT

The GENCODE Consortium aims to identify all gene features in the human genome using a combination of computational analysis, manual annotation, and experimental validation. Since the first public release of this annotation data set, few new protein-coding loci have been added, yet the number of alternative splicing transcripts annotated has steadily increased. The GENCODE 7 release contains 20,687 protein-coding and 9640 long noncoding RNA loci and has 33,977 coding transcripts not represented in UCSC genes and RefSeq. It also has the most comprehensive annotation of long noncoding RNA (lncRNA) loci publicly available with the predominant transcript form consisting of two exons. We have examined the completeness of the transcript annotation and found that 35% of transcriptional start sites are supported by CAGE clusters and 62% of protein-coding genes have annotated polyA sites. Over one-third of GENCODE protein-coding genes are supported by peptide hits derived from mass spectrometry spectra submitted to Peptide Atlas. New models derived from the Illumina Body Map 2.0 RNA-seq data identify 3689 new loci not currently in GENCODE, of which 3127 consist of two exon models indicating that they are possibly unannotated long noncoding loci. GENCODE 7 is publicly available from gencodegenes.org and via the Ensembl and UCSC Genome Browsers.


Subject(s)
Databases, Genetic , Genome, Human , Genomics/methods , Molecular Sequence Annotation , Animals , Computational Biology/methods , DNA, Complementary/chemistry , DNA, Complementary/genetics , Evolution, Molecular , Exons , Genetic Loci , Humans , Internet , Models, Molecular , Open Reading Frames , Pseudogenes , Quality Control , RNA Splice Sites , RNA, Long Noncoding , Reproducibility of Results , Untranslated Regions
20.
Genome Res ; 22(9): 1775-89, 2012 Sep.
Article in English | MEDLINE | ID: mdl-22955988

ABSTRACT

The human genome contains many thousands of long noncoding RNAs (lncRNAs). While several studies have demonstrated compelling biological and disease roles for individual examples, analytical and experimental approaches to investigate these genes have been hampered by the lack of comprehensive lncRNA annotation. Here, we present and analyze the most complete human lncRNA annotation to date, produced by the GENCODE consortium within the framework of the ENCODE project and comprising 9277 manually annotated genes producing 14,880 transcripts. Our analyses indicate that lncRNAs are generated through pathways similar to those of protein-coding genes, with similar histone-modification profiles, splicing signals, and exon/intron lengths. In contrast to protein-coding genes, however, lncRNAs display a striking bias toward two-exon transcripts, they are predominantly localized in the chromatin and nucleus, and a fraction appear to be preferentially processed into small RNAs. They are under stronger selective pressure than neutrally evolving sequences, particularly in their promoter regions, which display levels of selection comparable to protein-coding genes. Importantly, about one-third seem to have arisen within the primate lineage. Comprehensive analysis of their expression in multiple human organs and brain regions shows that lncRNAs are generally lower expressed than protein-coding genes, and display more tissue-specific expression patterns, with a large fraction of tissue-specific lncRNAs expressed in the brain. Expression correlation analysis indicates that lncRNAs show particularly striking positive correlation with the expression of antisense coding genes. This GENCODE annotation represents a valuable resource for future studies of lncRNAs.


Subject(s)
Databases, Genetic , RNA, Long Noncoding/genetics , Alternative Splicing , Animals , Cell Nucleus/genetics , Cell Nucleus/metabolism , Cluster Analysis , Evolution, Molecular , Exons , Gene Expression Profiling , Gene Expression Regulation , Histones/metabolism , Humans , Molecular Sequence Annotation , Open Reading Frames , Organ Specificity/genetics , Primates/genetics , RNA Processing, Post-Transcriptional , RNA Splice Sites , RNA, Messenger/genetics , Selection, Genetic , Transcription, Genetic