ABSTRACT
Genes specifying long non-coding RNAs (lncRNAs) occupy a large fraction of the genomes of complex organisms. The term 'lncRNAs' encompasses RNA polymerase I (Pol I), Pol II and Pol III transcribed RNAs, and RNAs from processed introns. The various functions of lncRNAs and their many isoforms and interleaved relationships with other genes make lncRNA classification and annotation difficult. Most lncRNAs evolve more rapidly than protein-coding sequences, are cell type specific and regulate many aspects of cell differentiation and development and other physiological processes. Many lncRNAs associate with chromatin-modifying complexes, are transcribed from enhancers and nucleate phase separation of nuclear condensates and domains, indicating an intimate link between lncRNA expression and the spatial control of gene expression during development. lncRNAs also have important roles in the cytoplasm and beyond, including in the regulation of translation, metabolism and signalling. lncRNAs often have a modular structure and are rich in repeats, which are increasingly being shown to be relevant to their function. In this Consensus Statement, we address the definition and nomenclature of lncRNAs and their conservation, expression, phenotypic visibility, structure and functions. We also discuss research challenges and provide recommendations to advance the understanding of the roles of lncRNAs in development, cell biology and disease.
Subject(s)
Long Noncoding RNA , Long Noncoding RNA/genetics , Cell Nucleus/genetics , Chromatin/genetics , Nucleic Acid Regulatory Sequences , RNA Polymerase II/genetics
ABSTRACT
Most plant roots have multiple cortex layers that make up the bulk of the organ and play key roles in physiology, such as flood tolerance and symbiosis. However, little is known about the formation of cortical layers outside of the highly reduced anatomy of Arabidopsis. Here, we used single-cell RNA sequencing to rapidly generate a cell-resolution map of the maize root, revealing an alternative configuration of the tissue formative transcription factor SHORT-ROOT (SHR) adjacent to an expanded cortex. We show that maize SHR protein is hypermobile, moving at least eight cell layers into the cortex. Higher-order SHR mutants in both maize and Setaria have reduced numbers of cortical layers, showing that the SHR pathway controls expansion of cortical tissue to elaborate anatomical complexity.
Subject(s)
Plant Proteins/metabolism , Plant Roots/cytology , Plant Roots/metabolism , Setaria (Plant)/metabolism , Transcription Factors/metabolism , Zea mays/metabolism , Flow Cytometry , Plant Genome , Plant Proteins/genetics , Plant Roots/genetics , RNA-Seq , Setaria (Plant)/cytology , Setaria (Plant)/genetics , Single-Cell Analysis , Transcription Factors/genetics , Genetic Transcription , Zea mays/cytology , Zea mays/genetics
ABSTRACT
As a means to understand human neuropsychiatric disorders from human brain samples, we compared the transcription patterns and histological features of postmortem brain to fresh human neocortex isolated immediately following surgical removal. Compared to a number of neuropsychiatric disease-associated postmortem transcriptomes, the fresh human brain transcriptome had an entirely unique transcriptional pattern. To understand this difference, we measured genome-wide transcription as a function of time after fresh tissue removal to mimic the postmortem interval. Within a few hours, a selective reduction in the number of neuronal activity-dependent transcripts occurred with relative preservation of housekeeping genes commonly used as a reference for RNA normalization. Gene clustering indicated a rapid reduction in neuronal gene expression with a reciprocal time-dependent increase in astroglial and microglial gene expression that continued to increase for at least 24 h after tissue resection. Predicted transcriptional changes were confirmed histologically on the same tissue, demonstrating that while neurons were degenerating, glial cells underwent an outgrowth of their processes. The rapid loss of neuronal genes and reciprocal expression of glial genes highlight highly dynamic transcriptional and cellular changes that occur during the postmortem interval. Understanding these time-dependent changes in gene expression in postmortem brain samples is critical for the interpretation of research studies on human brain disorders.
Subject(s)
Biomarkers , Brain/metabolism , Brain/pathology , Gene Expression , Autopsy , Computational Biology/methods , Gene Expression Profiling , Humans , Immunohistochemistry , Neurons/metabolism , Organ Specificity/genetics , Transcriptome
ABSTRACT
Crop productivity depends on activity of meristems that produce optimized plant architectures, including that of the maize ear. A comprehensive understanding of development requires insight into the full diversity of cell types and developmental domains and the gene networks required to specify them. Until now, these were identified primarily by morphology and insights from classical genetics, which are limited by genetic redundancy and pleiotropy. Here, we investigated the transcriptional profiles of 12,525 single cells from developing maize ears. The resulting developmental atlas provides a single-cell RNA sequencing (scRNA-seq) map of an inflorescence. We validated our results by mRNA in situ hybridization and by fluorescence-activated cell sorting (FACS) RNA-seq, and we show how these data may facilitate genetic studies by predicting genetic redundancy, integrating transcriptional networks, and identifying candidate genes associated with crop yield traits.
Subject(s)
Genetic Association Studies , Quantitative Trait Loci/genetics , RNA Sequence Analysis , Single-Cell Analysis , Zea mays/growth & development , Zea mays/genetics , Base Sequence , Developmental Gene Expression Regulation , Plant Gene Expression Regulation , Gene Regulatory Networks , Protoplasts/metabolism , Reproducibility of Results , Transcriptome/genetics
ABSTRACT
We have produced RNA sequencing data for 53 primary cells from different locations in the human body. The clustering of these primary cells reveals that most cells in the human body share a few broad transcriptional programs, which define five major cell types: epithelial, endothelial, mesenchymal, neural, and blood cells. These act as basic components of many tissues and organs. Based on gene expression, these cell types redefine the basic histological types by which tissues have been traditionally classified. We identified genes whose expression is specific to these cell types, and from these genes, we estimated the contribution of the major cell types to the composition of human tissues. We found this cellular composition to be a characteristic signature of tissues and to reflect tissue morphological heterogeneity and histology. We identified changes in cellular composition in different tissues associated with age and sex, and found that departures from the normal cellular composition correlate with histological phenotypes associated with disease.
Subject(s)
Genetic Transcription , Cell Line , Endothelial Cells/metabolism , Epithelial Cells/metabolism , Female , Gene Expression Profiling , Gynecomastia/genetics , Gynecomastia/metabolism , Humans , Male , Mesoderm/cytology , Mesoderm/metabolism , Neoplasms/genetics , Organ Specificity , RNA Sequence Analysis
ABSTRACT
The Encyclopedia of DNA Elements (ENCODE) Project launched in 2003 with the long-term goal of developing a comprehensive map of functional elements in the human genome. These included genes, biochemical regions associated with gene regulation (for example, transcription factor binding sites, open chromatin, and histone marks) and transcript isoforms. The marks serve as sites for candidate cis-regulatory elements (cCREs) that may serve functional roles in regulating gene expression [1]. The project has been extended to model organisms, particularly the mouse. In the third phase of ENCODE, nearly a million and more than 300,000 cCRE annotations have been generated for human and mouse, respectively, and these have provided a valuable resource for the scientific community.
Subject(s)
Genetic Databases , Genome/genetics , Genomics , Molecular Sequence Annotation , Animals , Binding Sites , Chromatin/genetics , Chromatin/metabolism , DNA Methylation , Genetic Databases/standards , Genetic Databases/trends , Gene Expression Regulation/genetics , Human Genome/genetics , Genomics/standards , Genomics/trends , Histones/metabolism , Humans , Mice , Molecular Sequence Annotation/standards , Quality Control , Nucleic Acid Regulatory Sequences/genetics , Transcription Factors/metabolism
ABSTRACT
Extracellular RNAs participate in intercellular communication, and are being studied as promising minimally invasive diagnostic markers. Several studies in recent years showed that tRNA halves and distinct Y RNA fragments are abundant in the extracellular space, including in biofluids. While their regulatory and diagnostic potential has gained a substantial amount of attention, the biogenesis of these extracellular RNA fragments remains largely unexplored. Here, we demonstrate that these fragments are produced by RNase 1, a highly active secreted nuclease. We use RNA sequencing to investigate the effect of a null mutation of RNase 1 on the levels of tRNA halves and Y RNA fragments in the extracellular environment of cultured human cells. We complement and extend our RNA sequencing results with northern blots, showing that tRNAs and Y RNAs in the non-vesicular extracellular compartment are released from cells as full-length precursors and are subsequently cleaved to distinct fragments. In support of these results, formation of tRNA halves is recapitulated by recombinant human RNase 1 in our in vitro assay. These findings assign a novel function for RNase 1, and position it as a strong candidate for generation of tRNA halves and Y RNA fragments in biofluids.
Subject(s)
Transfer RNA/metabolism , Untranslated RNA/metabolism , Ribonucleases/metabolism , Humans , K562 Cells , Mutation , RNA Cleavage , Post-Transcriptional RNA Processing , Transfer RNA/chemistry , Untranslated RNA/chemistry , RNA-Seq
ABSTRACT
MaizeCODE is a project aimed at identifying and analyzing functional elements in the maize genome. In its initial phase, MaizeCODE assayed up to five tissues from four maize strains (B73, NC350, W22, TIL11) by RNA-Seq, ChIP-seq, RAMPAGE, and small RNA sequencing. To facilitate reproducible science and provide both human and machine access to the MaizeCODE data, we enhanced SciApps, a cloud-based portal, for analysis and distribution of both raw data and analysis results. Based on the SciApps workflow platform, we generated new components to support the complete cycle of MaizeCODE data management. These include publicly accessible scientific workflows for the reproducible and shareable analysis of various functional data, a RESTful API for batch processing and distribution of data and metadata, a searchable data page that lists each MaizeCODE experiment as a reproducible workflow, and integrated JBrowse genome browser tracks linked with workflows and metadata. The SciApps portal is a flexible platform that allows the integration of new analysis tools, workflows, and genomic data from multiple projects. Through metadata and a ready-to-compute cloud-based platform, the portal experience improves access to the MaizeCODE data and facilitates its analysis.
ABSTRACT
MicroRNAs (miRNAs) play a critical role as posttranscriptional regulators of gene expression. The ENCODE Project profiled the expression of miRNAs in an extensive set of organs during a time-course of mouse embryonic development and captured the expression dynamics of 785 miRNAs. We found distinct organ-specific and developmental stage-specific miRNA expression clusters, with an overall pattern of increasing organ-specific expression as embryonic development proceeds. Comparative analysis of conserved miRNAs in mouse and human revealed stronger clustering of expression patterns by organ type rather than by species. An analysis of messenger RNA expression clusters compared with miRNA expression clusters identifies the potential role of specific miRNA expression clusters in suppressing the expression of mRNAs specific to other developmental programs in the organ in which these miRNAs are expressed during embryonic development. Our results provide the most comprehensive time-course of miRNA expression as part of an integrated ENCODE reference data set for mouse embryonic development.
Subject(s)
Embryonic Development/genetics , MicroRNAs/genetics , Animals , Female , Developmental Gene Expression Regulation , Mice , Pregnancy , Messenger RNA/genetics
ABSTRACT
Alu elements are one of the most successful families of transposons in the human genome. A portion of Alu elements is transcribed by RNA Pol III, whereas the remaining ones are part of Pol II transcripts. Because Alu elements are highly repetitive, it has been difficult to identify the Pol III-transcribed elements and quantify their expression levels. In this study, we generated high-resolution, long-genomic-span RAMPAGE data in 155 biosamples all with matching RNA-seq data and built an atlas of 17,249 Pol III-transcribed Alu elements. We further performed an integrative analysis on the ChIP-seq data of 10 histone marks and hundreds of transcription factors, whole-genome bisulfite sequencing data, ChIA-PET data, and functional data in several biosamples, and our results revealed that although the human-specific Alu elements are transcriptionally repressed, the older, expressed Alu elements may be exapted by the human host to function as cell-type-specific enhancers for their nearby protein-coding genes.
Subject(s)
Alu Elements , RNA Sequence Analysis/methods , Whole Genome Sequencing/methods , Computational Biology/methods , Genetic Enhancer Elements , Molecular Evolution , Gene Expression Regulation , Histones/genetics , Humans , Molecular Sequence Annotation , RNA Polymerase III/metabolism , Transcription Initiation Site
ABSTRACT
Long noncoding RNAs (lncRNAs) can regulate target gene expression by acting in cis (locally) or in trans (non-locally). Here, we performed genome-wide expression analysis of Toll-like receptor (TLR)-stimulated human macrophages to identify pairs of cis-acting lncRNAs and protein-coding genes involved in innate immunity. A total of 229 gene pairs were identified, many of which were commonly regulated by signaling through multiple TLRs and were involved in the cytokine responses to infection by group B Streptococcus. We focused on elucidating the function of one lncRNA, named lnc-MARCKS or ROCKI (Regulator of Cytokines and Inflammation), which was induced by multiple TLR stimuli and acted as a master regulator of inflammatory responses. ROCKI interacted with APEX1 (apurinic/apyrimidinic endodeoxyribonuclease 1) to form a ribonucleoprotein complex at the MARCKS promoter. In turn, ROCKI-APEX1 recruited the histone deacetylase HDAC1, which removed the H3K27ac modification from the promoter, thus reducing MARCKS transcription and subsequent Ca2+ signaling and inflammatory gene expression. Finally, genetic variants affecting ROCKI expression were linked to a reduced risk of certain inflammatory and infectious diseases in humans, including inflammatory bowel disease and tuberculosis. Collectively, these data highlight the importance of cis-acting lncRNAs in TLR signaling, innate immunity, and pathophysiological inflammation.
Subject(s)
Gene Expression Regulation , Innate Immunity/immunology , Inflammation/immunology , Macrophages/immunology , Long Noncoding RNA/metabolism , Streptococcal Infections/microbiology , Toll-Like Receptors/metabolism , Cultured Cells , Cytokines/metabolism , DNA-(Apurinic or Apyrimidinic Site) Lyase/genetics , DNA-(Apurinic or Apyrimidinic Site) Lyase/metabolism , Human Genome , Histone Deacetylase 1/genetics , Histone Deacetylase 1/metabolism , Humans , Inflammation/genetics , Inflammation/microbiology , Macrophages/metabolism , Macrophages/microbiology , Myristoylated Alanine-Rich C Kinase Substrate/genetics , Myristoylated Alanine-Rich C Kinase Substrate/metabolism , Genetic Promoter Regions , Long Noncoding RNA/genetics , Streptococcal Infections/immunology , Streptococcus agalactiae/isolation & purification , Toll-Like Receptors/genetics
ABSTRACT
Many tools are available for RNA-seq alignment and expression quantification, with comparative value being hard to establish. Benchmarking assessments often highlight methods' good performance, but are focused on either model data or fail to explain variation in performance. This leaves us to ask, what is the most meaningful way to assess different alignment choices? And importantly, where is there room for progress? In this work, we explore the answers to these two questions by performing an exhaustive assessment of the STAR aligner. We assess STAR's performance across a range of alignment parameters using common metrics, and then on biologically focused tasks. We find technical metrics such as fraction mapping or expression profile correlation to be uninformative, capturing properties unlikely to have any role in biological discovery. Surprisingly, we find that changes in alignment parameters within a wide range have little impact on both technical and biological performance. Yet, when performance finally does break, it happens in difficult regions, such as X-Y paralogs and MHC genes. We believe improved reporting by developers will help establish where results are likely to be robust or fragile, providing a better baseline to establish where methodological progress can still occur.
Subject(s)
Gene Expression , Sequence Alignment/methods , RNA Sequence Analysis/methods , Software , Algorithms , Human Y Chromosomes , Genetic Databases , Female , Humans , Male , Sex Factors
ABSTRACT
Multicellular development is driven by regulatory programs that orchestrate the transcription of protein-coding and noncoding genes. To decipher this genomic regulatory code, and to investigate the developmental relevance of noncoding transcription, we compared genome-wide promoter activity throughout embryogenesis in 5 Drosophila species. Core promoters, generally not thought to play a significant regulatory role, in fact impart restrictions on the developmental timing of gene expression on a global scale. We propose a hierarchical regulatory model in which core promoters define broad windows of opportunity for expression, by defining a range of transcription factors from which they can receive regulatory inputs. This two-tiered mechanism globally orchestrates developmental gene expression, including extremely widespread noncoding transcription. The sequence and expression specificity of noncoding RNA promoters are evolutionarily conserved, implying biological relevance. Overall, this work introduces a hierarchical model for developmental gene regulation, and reveals a major role for noncoding transcription in animal development.
Subject(s)
Drosophila/embryology , Developmental Gene Expression Regulation , Genetic Promoter Regions , Untranslated RNA/biosynthesis , Genetic Transcription , Animals , Biological Models
ABSTRACT
Accurate annotation of genes and their transcripts is a foundation of genomics, but currently no annotation technique combines throughput and accuracy. As a result, reference gene collections remain incomplete: many gene models are fragmentary, and thousands more remain uncataloged, particularly for long noncoding RNAs (lncRNAs). To accelerate lncRNA annotation, the GENCODE consortium has developed RNA Capture Long Seq (CLS), which combines targeted RNA capture with third-generation long-read sequencing. Here we present an experimental reannotation of the GENCODE intergenic lncRNA populations in matched human and mouse tissues that resulted in novel transcript models for 3,574 and 561 gene loci, respectively. CLS approximately doubled the annotated complexity of targeted loci, outperforming existing short-read techniques. Full-length transcript models produced by CLS enabled us to definitively characterize the genomic features of lncRNAs, including promoter and gene structure, and protein-coding potential. Thus, CLS removes a long-standing bottleneck in transcriptome annotation and generates manual-quality full-length transcript models at high-throughput scales.
Subject(s)
Computational Biology/methods , High-Throughput Nucleotide Sequencing/methods , Molecular Sequence Annotation/methods , Long Noncoding RNA/genetics , Animals , Gene Expression Profiling/methods , Genomics/methods , Humans , Mice , Open Reading Frames/genetics , Reproducibility of Results
ABSTRACT
Cross-species comparisons of genomes, transcriptomes and gene regulation are now feasible at unprecedented resolution and throughput, enabling the comparison of human and mouse biology at the molecular level. Insights have been gained into the degree of conservation between human and mouse at the level of not only gene expression but also epigenetics and inter-individual variation. However, a number of limitations exist, including incomplete transcriptome characterization and difficulties in identifying orthologous phenotypes and cell types, which are beginning to be addressed by emerging technologies. Ultimately, these comparisons will help to identify the conditions under which the mouse is a suitable model of human physiology and disease, and optimize the use of animal models.
Subject(s)
Animal Disease Models , Molecular Evolution , Gene Expression Regulation , Transcriptome , Animals , Conserved Sequence , Human Genome , Humans , Mice , Long Noncoding RNA/genetics
ABSTRACT
BACKGROUND: A comparison of transcriptional profiles derived from different tissues in a given species or among different species assumes that commonalities reflect evolutionarily conserved programs and that differences reflect species or tissue responses to environmental conditions or developmental program staging. Apparently conflicting results have been published regarding whether organ-specific transcriptional patterns dominate over species-specific patterns, or vice versa, making it unclear to what extent the biology of a given organism can be extrapolated to another. These studies have in common that they treat the transcriptomes monolithically, implicitly ignoring that each gene is likely to have a specific pattern of transcriptional variation across organs and species. RESULTS: We use linear models to quantify this pattern. We find a continuum in the spectrum of expression variation: the expression of some genes varies considerably across species and little across organs, and simply reflects evolutionary distance. At the other extreme are genes whose expression varies considerably across organs and little across species; these genes are much more likely to be associated with diseases than are genes whose expression varies predominantly across species. CONCLUSIONS: Whether transcriptomes, when considered globally, cluster preferentially according to one component or the other may not be a property of the transcriptomes, but rather a consequence of the dominant behavior of a subset of genes. Therefore, the values of the components of the variance of expression for each gene could become a useful resource when planning, interpreting, and extrapolating experimental data from mouse to humans.
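The per-gene decomposition described in this abstract can be illustrated with a minimal sketch. It assumes a balanced design with one expression value per organ-species pair and an additive linear model (expression = overall mean + organ effect + species effect + residual), which in a balanced layout reduces to the classical two-way sums of squares. The function name, the toy values, and the choice of an additive model without replicates are assumptions made for illustration, not the authors' pipeline.

```python
from statistics import mean


def variance_components(expr):
    """Split one gene's expression variance into organ, species and residual parts.

    expr: dict mapping organ -> dict mapping species -> expression value
          (balanced: every organ is measured in every species).
    Returns each part as a fraction of the total sum of squares.
    """
    organs = sorted(expr)
    species = sorted(next(iter(expr.values())))
    grand = mean(expr[o][s] for o in organs for s in species)
    organ_mean = {o: mean(expr[o][s] for s in species) for o in organs}
    species_mean = {s: mean(expr[o][s] for o in organs) for s in species}

    # Sums of squares of the additive two-way layout.
    ss_organ = len(species) * sum((organ_mean[o] - grand) ** 2 for o in organs)
    ss_species = len(organs) * sum((species_mean[s] - grand) ** 2 for s in species)
    ss_total = sum((expr[o][s] - grand) ** 2 for o in organs for s in species)
    ss_resid = ss_total - ss_organ - ss_species
    if ss_total == 0:
        return {"organ": 0.0, "species": 0.0, "residual": 0.0}
    return {
        "organ": ss_organ / ss_total,
        "species": ss_species / ss_total,
        "residual": ss_resid / ss_total,
    }


# Hypothetical log-expression values for two toy genes.
organ_driven = {
    "brain": {"human": 8.0, "mouse": 8.2},
    "liver": {"human": 2.0, "mouse": 2.1},
    "heart": {"human": 5.0, "mouse": 5.3},
}
species_driven = {
    "brain": {"human": 6.0, "mouse": 2.0},
    "liver": {"human": 6.1, "mouse": 2.2},
    "heart": {"human": 5.9, "mouse": 1.9},
}
print(variance_components(organ_driven))
print(variance_components(species_driven))
```

In this toy input the first gene's variance is almost entirely attributed to the organ term and the second gene's to the species term, which mirrors the two extremes of the continuum the abstract describes.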
Subject(s)
Molecular Evolution , Developmental Gene Expression Regulation/genetics , Organ Specificity/genetics , Transcriptome/genetics , Animals , Gene Expression Profiling , Humans , Mice , Oligonucleotide Array Sequence Analysis , RNA Sequence Analysis , Species Specificity
ABSTRACT
Recent advances in high-throughput sequencing technology have made it possible to probe cell transcriptomes by generating hundreds of millions of short reads that represent fragments of the transcribed RNA molecules. The first and most crucial task in RNA-seq data analysis is mapping the reads to the reference genome. STAR (Spliced Transcripts Alignment to a Reference) is an RNA-seq mapper that performs highly accurate spliced sequence alignment at an ultrafast speed. The STAR alignment algorithm can be controlled by many user-defined parameters. Here, we describe the most important STAR options and parameters, as well as best practices for achieving the maximum mapping accuracy and speed.
Subject(s)
Gene Expression Profiling/methods , RNA Sequence Analysis/methods , Algorithms , Computational Biology/methods , Humans , RNA Splicing , Sequence Alignment/methods , User-Computer Interface
ABSTRACT
Extracellular vesicles (EVs) have been proposed as a means to promote intercellular communication. We show that when human primary cells are exposed to cancer cell EVs, rapid cell death of the primary cells is observed, while cancer cells treated with primary or cancer cell EVs do not display this response. The active agents that trigger cell death are 29- to 31-nucleotide (nt) or 22- to 23-nt processed fragments of an 83-nt primary transcript of the human RNY5 gene that are highly likely to be formed within the EVs. Treating primary cells with cancer cell EVs, deproteinized total RNA from either primary or cancer cell EVs, or synthetic versions of the 31- and 23-nt fragments triggers rapid cell death in a dose-dependent manner. The transfer of processed RNY5 fragments through EVs may reflect a novel strategy used by cancer cells toward the establishment of a favorable microenvironment for their proliferation and invasion.
Subject(s)
Extracellular Vesicles/metabolism , Neoplasms/metabolism , RNA/metabolism , Cell Communication/physiology , Cell Death/physiology , Tumor Cell Line , Cell Proliferation/physiology , Humans , K562 Cells
ABSTRACT
Mapping of large sets of high-throughput sequencing reads to a reference genome is one of the foundational steps in RNA-seq data analysis. The STAR software package performs this task with high levels of accuracy and speed. In addition to detecting annotated and novel splice junctions, STAR is capable of discovering more complex RNA sequence arrangements, such as chimeric and circular RNA. STAR can align spliced sequences of any length with moderate error rates, providing scalability for emerging sequencing technologies. STAR generates output files that can be used for many downstream analyses such as transcript/gene expression quantification, differential gene expression, novel isoform reconstruction, and signal visualization. In this unit, we describe computational protocols that produce various output files, use different RNA-seq datatypes, and utilize different mapping strategies. STAR is open source software that can be run on Unix, Linux, or Mac OS X systems.
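As a rough illustration of the kind of protocol this unit describes, the sketch below assembles two STAR invocations: one that builds a genome index with annotated splice junctions and one that maps reads to a coordinate-sorted BAM with per-gene counts. The helper function names, file names, and thread count are hypothetical; the command-line options are standard STAR options, but their values should be checked against the STAR manual for the dataset at hand. The code only constructs the argument lists and does not run STAR.

```python
import shlex


def star_index_cmd(genome_dir, fasta, gtf, read_length, threads=8):
    """Build the argument list for generating a STAR genome index."""
    return [
        "STAR", "--runMode", "genomeGenerate",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--genomeFastaFiles", fasta,
        "--sjdbGTFfile", gtf,
        # The STAR manual suggests read length minus one for this option.
        "--sjdbOverhang", str(read_length - 1),
    ]


def star_align_cmd(genome_dir, reads, prefix, threads=8, gzipped=True):
    """Build the argument list for mapping single- or paired-end reads."""
    cmd = [
        "STAR", "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", *reads,
        "--outSAMtype", "BAM", "SortedByCoordinate",
        # Per-gene read counts; requires an index built with annotations.
        "--quantMode", "GeneCounts",
        "--outFileNamePrefix", prefix,
    ]
    if gzipped:
        # Decompress gzipped FASTQ files on the fly.
        cmd += ["--readFilesCommand", "zcat"]
    return cmd


# Hypothetical file names, for illustration only.
index_cmd = star_index_cmd("star_index", "genome.fa", "genes.gtf", read_length=100)
align_cmd = star_align_cmd(
    "star_index", ["sample_R1.fastq.gz", "sample_R2.fastq.gz"], "sample_"
)
print(shlex.join(index_cmd))
print(shlex.join(align_cmd))
```

On a machine where STAR is installed, either list could be passed to `subprocess.run(cmd, check=True)`; keeping the commands as lists avoids shell quoting problems with unusual file names.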