ABSTRACT
In eukaryotes, capped RNAs include long transcripts such as messenger RNAs and long noncoding RNAs, as well as shorter transcripts such as spliceosomal RNAs, small nucleolar RNAs, and enhancer RNAs. Long capped transcripts can be profiled using cap analysis gene expression (CAGE) sequencing and other methods. Here, we describe a sequencing library preparation protocol for short capped RNAs, apply it to a differentiation time course of the human cell line THP-1, and systematically compare the landscape of short capped RNAs to that of long capped RNAs. Transcription initiation peaks associated with genes in the sense direction have a strong preference to produce either long or short capped RNAs, with one out of six peaks detected in the short capped RNA libraries only. Gene-associated short capped RNAs have highly specific 3' ends, typically overlapping splice sites. Enhancers also preferentially generate either short or long capped RNAs, with 10% of enhancers observed in the short capped RNA libraries only. Enhancers producing either short or long capped RNAs show enrichment for GWAS-associated disease SNPs. We conclude that deep sequencing of short capped RNAs reveals new families of noncoding RNAs and elucidates the diversity of transcripts generated at known and novel promoters and enhancers.
ABSTRACT
Long-read RNA sequencing is essential to produce accurate and exhaustive annotation of eukaryotic genomes. Despite advancements in throughput and accuracy, achieving reliable end-to-end identification of RNA transcripts remains a challenge for long-read sequencing methods. To address this limitation, we develop CapTrap-seq, a cDNA library preparation method, which combines the Cap-trapping strategy with oligo(dT) priming to detect 5' capped, full-length transcripts. In our study, we evaluate the performance of CapTrap-seq alongside other widely used RNA-seq library preparation protocols in human and mouse tissues, employing both ONT and PacBio sequencing technologies. To explore the quantitative capabilities of CapTrap-seq and its accuracy in reconstructing full-length RNA molecules, we implement a capping strategy for synthetic RNA spike-in sequences that mimics the natural 5' cap formation. Our benchmarks, incorporating the Long-read RNA-seq Genome Annotation Assessment Project (LRGASP) data, demonstrate that CapTrap-seq is a competitive, platform-agnostic RNA library preparation method for generating full-length transcript sequences.
Subjects
Gene Library , Sequence Analysis, RNA , Animals , Humans , Mice , Sequence Analysis, RNA/methods , High-Throughput Nucleotide Sequencing/methods , RNA/genetics , RNA Caps/genetics
ABSTRACT
Long-read RNA sequencing is essential to produce accurate and exhaustive annotation of eukaryotic genomes. Despite advancements in throughput and accuracy, achieving reliable end-to-end identification of RNA transcripts remains a challenge for long-read sequencing methods. To address this limitation, we developed CapTrap-seq, a cDNA library preparation method that combines the Cap-trapping strategy with oligo(dT) priming to detect 5' capped, full-length transcripts, together with the data processing pipeline LyRic. We benchmarked CapTrap-seq and other popular RNA-seq library preparation protocols in a number of human tissues using both ONT and PacBio sequencing. To assess the accuracy of the transcript models produced, we introduced a capping strategy for synthetic RNA spike-in sequences that mimics natural 5' cap formation. We found that the vast majority (up to 90%) of transcript models that LyRic derives from CapTrap-seq reads are full-length. This makes it possible to produce highly accurate annotations with minimal human intervention.
ABSTRACT
MicroRNAs (miRNAs) have been demonstrated to be potent post-transcriptional modulators of protein expression. miRNA expression was profiled in the left and right dorsal hippocampal CA3 of mature rats by high-throughput deep sequencing. Among the sequenced and cross-mapped small RNAs, 88% belonged to the miRNAs annotated in the miRBase 15 database. Nearly half of the small RNAs belonged to the let-7 miRNA family. Seven percent of the sequenced small RNAs were not annotated in miRBase 15. Bioinformatic analysis of the unannotated small RNA sequences suggested seventeen novel miRNA candidates with relatively high expression levels (>100 tags per million). The left:right expression ratios were similar for all highly expressed miRNAs, differing by less than 10%. These results provide a basic idea of the relative expression strengths of known and unknown miRNAs in the dorsal hippocampal CA3.
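The expression threshold and the left:right comparison mentioned in this abstract can be illustrated with a minimal sketch. It assumes that "tags per million" means raw tag counts scaled by the library total, which is the usual convention but is not spelled out in the abstract. The function names and the example counts are hypothetical and are not taken from the study.

```python
def tags_per_million(counts):
    """Scale raw tag counts so that the library total equals one million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library contains no tags")
    return {name: count * 1_000_000 / total for name, count in counts.items()}


def highly_expressed(tpm, threshold=100.0):
    """Keep entries above the threshold (the abstract uses >100 tags per million)."""
    return {name: value for name, value in tpm.items() if value > threshold}


def left_right_ratio(left_tpm, right_tpm):
    """Left:right expression ratio for entries present in both libraries."""
    shared = left_tpm.keys() & right_tpm.keys()
    return {
        name: left_tpm[name] / right_tpm[name]
        for name in shared
        if right_tpm[name] > 0
    }


# Hypothetical raw counts for two small libraries (illustration only).
left = tags_per_million({"let-7a": 4800, "candidate-1": 2, "other": 5198})
right = tags_per_million({"let-7a": 4700, "candidate-1": 3, "other": 5297})
ratios = left_right_ratio(highly_expressed(left), highly_expressed(right))
```

Normalizing each library separately before taking the ratio keeps differences in sequencing depth between the left and right samples from being mistaken for differences in expression.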
Subjects
CA3 Region, Hippocampal/metabolism , MicroRNAs/genetics , Animals , Gene Expression Profiling , High-Throughput Nucleotide Sequencing , Male , Rats , Rats, Long-Evans , Sequence Analysis, RNA
ABSTRACT
Animal-specific gene families involved in cell-cell communication and developmental control comprise many subfamilies with distinct domain structures and functions. They diverged by subfamily-generating duplications and domain shufflings before the parazoan-eumetazoan split. Here, we have cloned 40 PTK cDNAs from choanoflagellates, Monosiga ovata, Stephanoeca diplocostata and Codosiga gracilis, the closest relatives to animals. A phylogeny-based analysis of PTKs revealed that 40 out of 47 subfamilies analyzed have unique domain structures and were possibly generated independently in the animal and choanoflagellate lineages by domain shufflings. Seven cytoplasmic subfamilies diverged before the animal-choanoflagellate split, originating by both duplications and shufflings.
Subjects
Chordata, Nonvertebrate/enzymology , Chordata, Nonvertebrate/genetics , Evolution, Molecular , Phylogeny , Protein-Tyrosine Kinases/genetics , Animals , DNA, Complementary/isolation & purification , Molecular Sequence Data , Protein Structure, Tertiary , Protein-Tyrosine Kinases/chemistry , Species Specificity
ABSTRACT
CAGE (cap analysis of gene expression) is a method for identifying transcription start sites by sequencing the first 20 or 21 nucleotides from the 5' end of capped transcripts, allowing genome-wide promoter analyses to be performed. The potential of CAGE as a form of expression profiling was previously limited by sequencing technology and the labor-intensive protocol. Here we describe an improved CAGE method for use with a next-generation sequencer. This modified method allows the identification of the RNA source of each CAGE tag within a pooled library by introducing DNA tags (barcodes). The method not only drastically improves the sequencing capacity, but also contributes to savings in both time and budget. Additionally, this pooled CAGE tag method enables dynamic changes in promoter usage and gene expression to be monitored.
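The idea of recovering the RNA source of each tag from a pooled library can be sketched as a simple demultiplexing step. The abstract does not give the barcode length or its position in the read, so this sketch assumes a fixed-length barcode at the start of each read and exact matching. The function name, the barcode length, and the example sequences are all hypothetical.

```python
def demultiplex(reads, barcodes, barcode_len=3):
    """Assign each read to a sample by its leading barcode (assumed layout).

    reads: iterable of read sequences, barcode first and CAGE tag after it.
    barcodes: dict mapping a barcode sequence to a sample name.
    Returns (tags grouped by sample, reads whose barcode was not recognized).
    """
    by_sample = {sample: [] for sample in barcodes.values()}
    unassigned = []
    for read in reads:
        prefix, tag = read[:barcode_len], read[barcode_len:]
        sample = barcodes.get(prefix)
        if sample is None:
            # Unknown barcode: keep the whole read for later inspection.
            unassigned.append(read)
        else:
            by_sample[sample].append(tag)
    return by_sample, unassigned


# Hypothetical barcodes and reads (illustration only).
sample_barcodes = {"ACG": "sample_1", "TGA": "sample_2"}
pooled_reads = ["ACGTTTT", "TGACCCC", "GGGAAAA"]
tags_by_sample, leftover = demultiplex(pooled_reads, sample_barcodes)
```

Exact matching is the simplest choice; a real pipeline would usually also need a policy for reads whose barcode contains a sequencing error.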
Subjects
Expressed Sequence Tags , Gene Expression Profiling , Promoter Regions, Genetic , Sequence Analysis, DNA/instrumentation , Transcription Initiation Site , Reverse Transcriptase Polymerase Chain Reaction
ABSTRACT
The cap analysis of gene expression (CAGE) technology has been established to detect transcription start sites (TSSs) and expression levels by utilizing 5' cDNA tags and PCR. It has been reported that the amount of templates is proportional to the amplification efficiency of PCR. CAGE has been used as a key technique for analyzing promoter activity and finding new transcripts, including alternatively spliced products and noncoding transcripts. Here, we introduce more powerful tools such as deepCAGE, which can be utilized with high-throughput next-generation sequencing technology. DeepCAGE can produce much deeper transcriptome datasets and can reveal more details of the regulatory network.
Subjects
Gene Expression Profiling , Polymerase Chain Reaction/methods , Base Sequence , DNA Primers , DNA, Complementary/genetics
ABSTRACT
It has been reported that relatively short RNAs of heterogeneous sizes are derived from sequences near the promoters of eukaryotic genes. In conjunction with the FANTOM4 project, we have identified tiny RNAs with a modal length of 18 nt that map within -60 to +120 nt of transcription start sites (TSSs) in human, chicken and Drosophila. These transcription initiation RNAs (tiRNAs) are derived from sequences on the same strand as the TSS and are preferentially associated with G+C-rich promoters. The 5' ends of tiRNAs show peak density 10-30 nt downstream of TSSs, indicating that they are processed. tiRNAs are generally, although not exclusively, associated with highly expressed transcripts and sites of RNA polymerase II binding. We suggest that tiRNAs may be a general feature of transcription in metazoa and possibly all eukaryotes.
Subjects
RNA/chemistry , Transcription Initiation Site , Animals , Chick Embryo , Chickens/metabolism , Drosophila/genetics , Drosophila/metabolism , Humans , Promoter Regions, Genetic , RNA/metabolism , Transcription, Genetic
ABSTRACT
Finding and characterizing mRNAs, their transcription start sites (TSS), and their associated promoters is a major focus in post-genome biology. Mammalian cells have at least 5-10 magnitudes more TSS than previously believed, and deeper sequencing is necessary to detect all active promoters in a given tissue. Here, we present a new method for high-throughput sequencing of 5' cDNA tags, DeepCAGE, which merges the Cap Analysis of Gene Expression method with ultra-high-throughput sequencing technology. We apply DeepCAGE to characterize 1.4 million sequenced TSS from mouse hippocampus and reveal a wealth of novel core promoters that are preferentially used in hippocampus. This is the most comprehensive promoter data set for any tissue to date. Using these data, we present evidence indicating a key role for the Arnt2 transcription factor in hippocampus gene regulation. DeepCAGE can also detect promoters used only in a small subset of cells within the complex tissue.