Results 1 - 20 of 23
1.
Cell ; 186(7): 1493-1511.e40, 2023 03 30.
Article in English | MEDLINE | ID: mdl-37001506

ABSTRACT

Understanding how genetic variants impact molecular phenotypes is a key goal of functional genomics, currently hindered by reliance on a single haploid reference genome. Here, we present the EN-TEx resource of 1,635 open-access datasets from four donors (∼30 tissues × ∼15 assays). The datasets are mapped to matched, diploid genomes with long-read phasing and structural variants, instantiating a catalog of >1 million allele-specific loci. These loci exhibit coordinated activity along haplotypes and are less conserved than corresponding, non-allele-specific ones. Surprisingly, a deep-learning transformer model can predict the allele-specific activity based only on local nucleotide-sequence context, highlighting the importance of transcription-factor-binding motifs particularly sensitive to variants. Furthermore, combining EN-TEx with existing genome annotations reveals strong associations between allele-specific and GWAS loci. It also enables models for transferring known eQTLs to difficult-to-profile tissues (e.g., from skin to heart). Overall, EN-TEx provides rich data and generalizable models for more accurate personal functional genomics.


Subject(s)
Epigenome , Quantitative Trait Loci , Genome-Wide Association Study , Genomics , Phenotype , Polymorphism, Single Nucleotide
2.
Genome Res ; 30(7): 1047-1059, 2020 07.
Article in English | MEDLINE | ID: mdl-32759341

ABSTRACT

We have produced RNA sequencing data for 53 primary cells from different locations in the human body. The clustering of these primary cells reveals that most cells in the human body share a few broad transcriptional programs, which define five major cell types: epithelial, endothelial, mesenchymal, neural, and blood cells. These act as basic components of many tissues and organs. Based on gene expression, these cell types redefine the basic histological types by which tissues have been traditionally classified. We identified genes whose expression is specific to these cell types, and from these genes, we estimated the contribution of the major cell types to the composition of human tissues. We found this cellular composition to be a characteristic signature of tissues and to reflect tissue morphological heterogeneity and histology. We identified changes in cellular composition in different tissues associated with age and sex, and found that departures from the normal cellular composition correlate with histological phenotypes associated with disease.


Subject(s)
Transcription, Genetic , Cell Line , Endothelial Cells/metabolism , Epithelial Cells/metabolism , Female , Gene Expression Profiling , Gynecomastia/genetics , Gynecomastia/metabolism , Humans , Male , Mesoderm/cytology , Mesoderm/metabolism , Neoplasms/genetics , Organ Specificity , Sequence Analysis, RNA
3.
Genome Res ; 29(11): 1900-1909, 2019 11.
Article in English | MEDLINE | ID: mdl-31645363

ABSTRACT

MicroRNAs (miRNAs) play a critical role as posttranscriptional regulators of gene expression. The ENCODE Project profiled the expression of miRNAs in an extensive set of organs during a time-course of mouse embryonic development and captured the expression dynamics of 785 miRNAs. We found distinct organ-specific and developmental stage-specific miRNA expression clusters, with an overall pattern of increasing organ-specific expression as embryonic development proceeds. Comparative analysis of conserved miRNAs in mouse and human revealed stronger clustering of expression patterns by organ type rather than by species. An analysis of messenger RNA expression clusters compared with miRNA expression clusters identifies the potential role of specific miRNA expression clusters in suppressing the expression of mRNAs specific to other developmental programs in the organ in which these miRNAs are expressed during embryonic development. Our results provide the most comprehensive time-course of miRNA expression as part of an integrated ENCODE reference data set for mouse embryonic development.


Subject(s)
Embryonic Development/genetics , MicroRNAs/genetics , Animals , Female , Gene Expression Regulation, Developmental , Mice , Pregnancy , RNA, Messenger/genetics
4.
Nature ; 515(7527): 355-64, 2014 Nov 20.
Article in English | MEDLINE | ID: mdl-25409824

ABSTRACT

The laboratory mouse shares the majority of its protein-coding genes with humans, making it the premier model organism in biomedical research, yet the two mammals differ in significant ways. To gain greater insights into both shared and species-specific transcriptional and cellular regulatory programs in the mouse, the Mouse ENCODE Consortium has mapped transcription, DNase I hypersensitivity, transcription factor binding, chromatin modifications and replication domains throughout the mouse genome in diverse cell and tissue types. By comparing with the human genome, we not only confirm substantial conservation in the newly annotated potential functional sequences, but also find a large degree of divergence of sequences involved in transcriptional regulation, chromatin state and higher order chromatin organization. Our results illuminate the wide range of evolutionary forces acting on genes and their regulatory regions, and provide a general resource for research into mammalian biology and mechanisms of human diseases.


Subject(s)
Genome/genetics , Genomics , Mice/genetics , Molecular Sequence Annotation , Animals , Cell Lineage/genetics , Chromatin/genetics , Chromatin/metabolism , Conserved Sequence/genetics , DNA Replication/genetics , Deoxyribonuclease I/metabolism , Gene Expression Regulation/genetics , Gene Regulatory Networks/genetics , Genome-Wide Association Study , Humans , RNA/genetics , Regulatory Sequences, Nucleic Acid/genetics , Species Specificity , Transcription Factors/metabolism , Transcriptome/genetics
5.
Nature ; 512(7515): 445-8, 2014 Aug 28.
Article in English | MEDLINE | ID: mdl-25164755

ABSTRACT

The transcriptome is the readout of the genome. Identifying common features in it across distant species can reveal fundamental principles. To this end, the ENCODE and modENCODE consortia have generated large amounts of matched RNA-sequencing data for human, worm and fly. Uniform processing and comprehensive annotation of these data allow comparison across metazoan phyla, extending beyond earlier within-phylum transcriptome comparisons and revealing ancient, conserved features. Specifically, we discover co-expression modules shared across animals, many of which are enriched in developmental genes. Moreover, we use expression patterns to align the stages in worm and fly development and find a novel pairing between worm embryo and fly pupae, in addition to the embryo-to-embryo and larvae-to-larvae pairings. Furthermore, we find that the extent of non-canonical, non-coding transcription is similar in each organism, per base pair. Finally, we find in all three organisms that the gene-expression levels, both coding and non-coding, can be quantitatively predicted from chromatin features at the promoter using a 'universal model' based on a single set of organism-independent parameters.


Subject(s)
Caenorhabditis elegans/genetics , Drosophila melanogaster/genetics , Gene Expression Profiling , Transcriptome/genetics , Animals , Caenorhabditis elegans/embryology , Caenorhabditis elegans/growth & development , Chromatin/genetics , Cluster Analysis , Drosophila melanogaster/growth & development , Gene Expression Regulation, Developmental/genetics , Histones/metabolism , Humans , Larva/genetics , Larva/growth & development , Models, Genetic , Molecular Sequence Annotation , Promoter Regions, Genetic/genetics , Pupa/genetics , Pupa/growth & development , RNA, Untranslated/genetics , Sequence Analysis, RNA
6.
Nature ; 489(7414): 101-8, 2012 Sep 06.
Article in English | MEDLINE | ID: mdl-22955620

ABSTRACT

Eukaryotic cells make many types of primary and processed RNAs that are found either in specific subcellular compartments or throughout the cells. A complete catalogue of these RNAs is not yet available and their characteristic subcellular localizations are also poorly understood. Because RNA represents the direct output of the genetic information encoded by genomes and a significant proportion of a cell's regulatory capabilities are focused on its synthesis, processing, transport, modification and translation, the generation of such a catalogue is crucial for understanding genome function. Here we report evidence that three-quarters of the human genome is capable of being transcribed, as well as observations about the range and levels of expression, localization, processing fates, regulatory regions and modifications of almost all currently annotated and thousands of previously unannotated RNAs. These observations, taken together, prompt a redefinition of the concept of a gene.


Subject(s)
DNA/genetics , Encyclopedias as Topic , Genome, Human/genetics , Molecular Sequence Annotation , Regulatory Sequences, Nucleic Acid/genetics , Transcription, Genetic/genetics , Transcriptome/genetics , Alleles , Cell Line , DNA, Intergenic/genetics , Enhancer Elements, Genetic , Exons/genetics , Gene Expression Profiling , Genes/genetics , Genomics , Humans , Polyadenylation/genetics , Protein Isoforms/genetics , RNA/biosynthesis , RNA/genetics , RNA Editing/genetics , RNA Splicing/genetics , Repetitive Sequences, Nucleic Acid/genetics , Sequence Analysis, RNA
7.
Bioinformatics ; 29(1): 15-21, 2013 Jan 01.
Article in English | MEDLINE | ID: mdl-23104886

ABSTRACT

MOTIVATION: Accurate alignment of high-throughput RNA-seq data is a challenging and yet unsolved problem because of the non-contiguous transcript structure, relatively short read lengths and constantly increasing throughput of the sequencing technologies. Currently available RNA-seq aligners suffer from high mapping error rates, low mapping speed, read length limitation and mapping biases. RESULTS: To align our large (>80 billion reads) ENCODE Transcriptome RNA-seq dataset, we developed the Spliced Transcripts Alignment to a Reference (STAR) software based on a previously undescribed RNA-seq alignment algorithm that uses sequential maximum mappable seed search in uncompressed suffix arrays followed by a seed clustering and stitching procedure. STAR outperforms other aligners by a factor of >50 in mapping speed, aligning to the human genome 550 million 2 × 76 bp paired-end reads per hour on a modest 12-core server, while at the same time improving alignment sensitivity and precision. In addition to unbiased de novo detection of canonical junctions, STAR can discover non-canonical splices and chimeric (fusion) transcripts, and is also capable of mapping full-length RNA sequences. Using Roche 454 sequencing of reverse transcription polymerase chain reaction amplicons, we experimentally validated 1960 novel intergenic splice junctions with an 80-90% success rate, corroborating the high precision of the STAR mapping strategy. AVAILABILITY AND IMPLEMENTATION: STAR is implemented as a standalone C++ code. STAR is free open source software distributed under the GPLv3 license and can be downloaded from http://code.google.com/p/rna-star/.


Subject(s)
Sequence Alignment/methods , Software , Algorithms , Cluster Analysis , Gene Expression Profiling , Genome, Human , Humans , RNA Splicing , Sequence Analysis, RNA/methods
8.
Nat Methods ; 5(7): 629-35, 2008 Jul.
Article in English | MEDLINE | ID: mdl-18500348

ABSTRACT

Rapid amplification of cDNA ends (RACE) is a widely used approach for transcript identification. Random clone selection from the RACE mixture, however, is an ineffective sampling strategy if the dynamic range of transcript abundances is large. To improve sampling efficiency of human transcripts, we hybridized the products of the RACE reaction onto tiling arrays and used the detected exons to delineate a series of reverse-transcriptase (RT)-PCRs, through which the original RACE transcript population was segregated into simpler transcript populations. We independently cloned the products and sequenced randomly selected clones. This approach, RACEarray, is superior to direct cloning and sequencing of RACE products because it specifically targets new transcripts and often results in overall normalization of transcript abundance. We show theoretically and experimentally that this strategy leads indeed to efficient sampling of new transcripts, and we investigated multiplexing the strategy by pooling RACE reactions from multiple interrogated loci before hybridization.


Subject(s)
DNA, Complementary/genetics , Gene Expression Profiling/methods , Gene Library , Nucleic Acid Amplification Techniques/methods , RNA/genetics , Alternative Splicing , Chromosomes, Human, Pair 21/genetics , Chromosomes, Human, Pair 22/genetics , Cloning, Molecular , Exons , Genome, Human , Humans , Molecular Sequence Data , Oligonucleotide Array Sequence Analysis/methods , Protein Isoforms/genetics , Reverse Transcriptase Polymerase Chain Reaction , Transcription, Genetic
9.
Dev Cell ; 56(4): 557-568.e6, 2021 02 22.
Article in English | MEDLINE | ID: mdl-33400914

ABSTRACT

Crop productivity depends on activity of meristems that produce optimized plant architectures, including that of the maize ear. A comprehensive understanding of development requires insight into the full diversity of cell types and developmental domains and the gene networks required to specify them. Until now, these were identified primarily by morphology and insights from classical genetics, which are limited by genetic redundancy and pleiotropy. Here, we investigated the transcriptional profiles of 12,525 single cells from developing maize ears. The resulting developmental atlas provides a single-cell RNA sequencing (scRNA-seq) map of an inflorescence. We validated our results by mRNA in situ hybridization and by fluorescence-activated cell sorting (FACS) RNA-seq, and we show how these data may facilitate genetic studies by predicting genetic redundancy, integrating transcriptional networks, and identifying candidate genes associated with crop yield traits.


Subject(s)
Genetic Association Studies , Quantitative Trait Loci/genetics , Sequence Analysis, RNA , Single-Cell Analysis , Zea mays/growth & development , Zea mays/genetics , Base Sequence , Gene Expression Regulation, Developmental , Gene Expression Regulation, Plant , Gene Regulatory Networks , Protoplasts/metabolism , Reproducibility of Results , Transcriptome/genetics
10.
J Exp Med ; 198(7): 987-97, 2003 Oct 06.
Article in English | MEDLINE | ID: mdl-14517275

ABSTRACT

Macrophages are activated from a resting state by a combination of cytokines and microbial products. Microbes are often sensed through Toll-like receptors signaling through MyD88. We used large-scale microarrays in multiple replicate experiments followed by stringent statistical analysis to compare gene expression in wild-type (WT) and MyD88-/- macrophages. We confirmed key results by quantitative reverse transcription polymerase chain reaction, Western blot, and enzyme-linked immunosorbent assay. Surprisingly, many genes, such as inducible nitric oxide synthase, IRG-1, IP-10, MIG, RANTES, and interleukin 6 were induced by interferon (IFN)-gamma from 5- to 100-fold less extensively in MyD88-/- macrophages than in WT macrophages. Thus, widespread, full-scale activation of macrophages by IFN-gamma requires MyD88. Analysis of the mechanism revealed that MyD88 mediates a process of self-priming by which resting macrophages produce a low level of tumor necrosis factor. This and other factors lead to basal activation of nuclear factor kappaB, which synergizes with IFN-gamma for gene induction. In contrast, infection by live, virulent Mycobacterium tuberculosis (Mtb) activated macrophages largely through MyD88-independent pathways, and macrophages did not need MyD88 to kill Mtb in vitro. Thus, MyD88 plays a dynamic role in resting macrophages that supports IFN-gamma-dependent activation, whereas macrophages can respond to a complex microbial stimulus, the tubercle bacillus, chiefly by other routes.


Subject(s)
Antigens, Differentiation/physiology , Interferon-gamma/pharmacology , Macrophage Activation , Macrophages/metabolism , Mycobacterium tuberculosis/physiology , Receptors, Immunologic/physiology , Adaptor Proteins, Signal Transducing , Animals , Gene Expression Regulation/drug effects , Interleukin-1/biosynthesis , Macrophage Activation/drug effects , Membrane Glycoproteins/physiology , Mice , Mice, Inbred C57BL , Myeloid Differentiation Factor 88 , NF-kappa B/physiology , Receptors, Cell Surface/physiology , Toll-Like Receptors , Transcriptional Activation , Tumor Necrosis Factor-alpha/physiology
11.
Front Plant Sci ; 11: 289, 2020.
Article in English | MEDLINE | ID: mdl-32296450

ABSTRACT

MaizeCODE is a project aimed at identifying and analyzing functional elements in the maize genome. In its initial phase, MaizeCODE assayed up to five tissues from four maize strains (B73, NC350, W22, TIL11) by RNA-Seq, ChIP-Seq, RAMPAGE, and small RNA sequencing. To facilitate reproducible science and provide both human and machine access to the MaizeCODE data, we enhanced SciApps, a cloud-based portal, for analysis and distribution of both raw data and analysis results. Based on the SciApps workflow platform, we generated new components to support the complete cycle of MaizeCODE data management. These include publicly accessible scientific workflows for the reproducible and shareable analysis of various functional data, a RESTful API for batch processing and distribution of data and metadata, a searchable data page that lists each MaizeCODE experiment as a reproducible workflow, and integrated JBrowse genome browser tracks linked with workflows and metadata. The SciApps portal is a flexible platform that allows the integration of new analysis tools, workflows, and genomic data from multiple projects. Through metadata and a ready-to-compute cloud-based platform, the portal experience improves access to the MaizeCODE data and facilitates its analysis.

12.
Hum Mutat ; 30(9): E866-79, 2009 Sep.
Article in English | MEDLINE | ID: mdl-19562714

ABSTRACT

The study of transcription using genomic tiling arrays has led to the identification of numerous additional exons. One example is the MECP2 gene on the X chromosome; using 5'RACE and RT-PCR in human tissues and cell lines, we have found more than 70 novel exons (RACEfrags) connecting to at least one annotated exon. We sequenced all MECP2-connected exons and flanking sequences in 3 groups: 46 patients with the Rett syndrome and without mutations in the currently annotated exons of the MECP2 and CDKL5 genes; 32 patients with the Rett syndrome and identified mutations in the MECP2 gene; 100 control individuals from the same geoethnic group. Approximately 13 kb were sequenced per sample (2.4 Mb of DNA resequencing). A total of 75 individuals had novel rare variants (mostly private variants) but no statistically significant difference was found among the 3 groups. These results suggest that variants in the newly discovered exons may not contribute to Rett syndrome. Interestingly, however, there are about twice as many variants in the novel exons as in the flanking sequences (44 vs. 21 for approximately 1.3 Mb sequenced for each class of sequences, p=0.0025). Thus the evolutionary forces that shape these novel exons may be different from those of neighboring sequences.


Subject(s)
Exons/genetics , Genetic Variation , Methyl-CpG-Binding Protein 2/genetics , Rett Syndrome/genetics , DNA Mutational Analysis , Female , Humans , Male , Methyl-CpG-Binding Protein 2/metabolism , Protein Serine-Threonine Kinases , Rett Syndrome/metabolism
13.
J Leukoc Biol ; 79(6): 1328-38, 2006 Jun.
Article in English | MEDLINE | ID: mdl-16614257

ABSTRACT

We report a novel mechanism, involving up-regulation of the interleukin (IL)-7 cytokine receptor, by which human immunodeficiency virus (HIV) enhances its own production in monocyte-derived macrophages (MDM) in vitro. HIV-1 infection or treatment of MDM cultures with exogenous HIV-1 Tat(86) protein up-regulates the IL-7 receptor (IL-7R) alpha-chain at the levels of steady-state RNA, protein, and functional IL-7R on the cell surface (as measured by ligand-induced receptor signaling). This IL-7R up-regulation is associated with increased amounts of HIV-1 virions in the supernatants of infected MDM cultures treated with exogenous IL-7 cytokine. The overall effect of IL-7 stimulation on HIV replication in MDM culture supernatants is typically in the range of one log and greater. The results are consistent with a model in which HIV infection produces the Tat protein, which in turn up-regulates IL-7R in a paracrine manner. This results in increased IL-7R signaling in response to the IL-7 cytokine, which ultimately promotes early events in HIV replication, including binding/entry and possibly other steps prior to reverse transcription. The results suggest that the effects of IL-7 on HIV replication in MDM should be considered when analyzing and designing clinical trials involving treatment of patients with IL-7 or Tat vaccines.


Subject(s)
Gene Products, tat/physiology , HIV-1/physiology , Interleukin-7/physiology , Macrophages/virology , Models, Biological , Virus Replication/physiology , Cells, Cultured/drug effects , Cells, Cultured/metabolism , Cells, Cultured/virology , Genes, tat , HIV Reverse Transcriptase/metabolism , Humans , Interleukin-7/adverse effects , Interleukin-7/pharmacology , Macrophages/drug effects , Macrophages/metabolism , Paracrine Communication , STAT3 Transcription Factor/metabolism , Virion , Virus Replication/drug effects , tat Gene Products, Human Immunodeficiency Virus
14.
Nat Commun ; 6: 5903, 2015 Jan 13.
Article in English | MEDLINE | ID: mdl-25582907

ABSTRACT

Mice have been a long-standing model for human biology and disease. Here we characterize, by RNA sequencing, the transcriptional profiles of a large and heterogeneous collection of mouse tissues, augmenting the mouse transcriptome with thousands of novel transcript candidates. Comparison with transcriptome profiles in human cell lines reveals substantial conservation of transcriptional programmes, and uncovers a distinct class of genes with levels of expression that have been constrained early in vertebrate evolution. This core set of genes captures a substantial fraction of the transcriptional output of mammalian cells, and participates in basic functional and structural housekeeping processes common to all cell types. Perturbation of these constrained genes is associated with significant phenotypes including embryonic lethality and cancer. Evolutionary constraint in gene expression levels is not reflected in the conservation of the genomic sequences, but is associated with conserved epigenetic marking, as well as with a characteristic post-transcriptional regulatory programme, in which sub-cellular localization and alternative splicing play comparatively large roles.


Subject(s)
Evolution, Molecular , Gene Expression Regulation , Transcriptome , Alternative Splicing , Animals , Biological Evolution , Cell Line , Epigenesis, Genetic , Gene Expression Profiling , Gene Library , Genome , Histones/chemistry , Humans , Mice , Mice, Inbred C57BL , Models, Genetic , Oligonucleotides, Antisense , Phenotype , Sequence Analysis, RNA
15.
PLoS One ; 7(1): e28213, 2012.
Article in English | MEDLINE | ID: mdl-22238572

ABSTRACT

The classic organization of a gene structure has followed the Jacob and Monod bacterial gene model proposed more than 50 years ago. Since then, empirical determinations of the complexity of the transcriptomes found in organisms from yeast to human have blurred the definition and physical boundaries of genes. Using multiple analysis approaches we have characterized individual gene boundaries mapping on human chromosomes 21 and 22. Analyses of the locations of the 5' and 3' transcriptional termini of 492 protein coding genes revealed that for 85% of these genes the boundaries extend beyond the current annotated termini, most often connecting with exons of transcripts from other well annotated genes. The biological and evolutionary importance of these chimeric transcripts is underscored by (1) the non-random interconnections of genes involved, (2) the greater phylogenetic depth of the genes involved in many chimeric interactions, (3) the coordination of the expression of connected genes and (4) the close in vivo and three dimensional proximity of the genomic regions being transcribed and contributing to parts of the chimeric RNAs. The non-random nature of the connection of the genes involved suggests that chimeric transcripts should not be studied in isolation, but together, as an RNA network.


Subject(s)
Cells/metabolism , Gene Regulatory Networks/physiology , RNA/physiology , Transcriptome/physiology , Algorithms , Chimerin Proteins/chemistry , Chimerin Proteins/genetics , Chromosomes, Human, Pair 1/genetics , Female , Gene Expression Profiling , Gene Regulatory Networks/genetics , Humans , Male , Microarray Analysis/methods , Models, Biological , Nucleic Acid Amplification Techniques/methods , RNA/genetics , RNA Isoforms/chemistry , RNA Isoforms/genetics , RNA Isoforms/metabolism , Transcription, Genetic/genetics , Validation Studies as Topic
16.
Genome Res ; 17(6): 746-59, 2007 Jun.
Article in English | MEDLINE | ID: mdl-17567994

ABSTRACT

This report presents systematic empirical annotation of transcript products from 399 annotated protein-coding loci across the 1% of the human genome targeted by the Encyclopedia of DNA elements (ENCODE) pilot project using a combination of 5' rapid amplification of cDNA ends (RACE) and high-density resolution tiling arrays. We identified previously unannotated and often tissue- or cell-line-specific transcribed fragments (RACEfrags), both 5' distal to the annotated 5' terminus and internal to the annotated gene bounds for the vast majority (81.5%) of the tested genes. Half of the distal RACEfrags span large segments of genomic sequences away from the main portion of the coding transcript and often overlap with the upstream-annotated gene(s). Notably, at least 20% of the resultant novel transcripts have changes in their open reading frames (ORFs), most of them fusing ORFs of adjacent transcripts. A significant fraction of distal RACEfrags show expression levels comparable to those of known exons of the same locus, suggesting that they are not part of very minority splice forms. These results have significant implications concerning (1) our current understanding of the architecture of protein-coding genes; (2) our views on locations of regulatory regions in the genome; and (3) the interpretation of sequence polymorphisms mapping to regions hitherto considered to be "noncoding," ultimately relating to the identification of disease-related sequence alterations.


Subject(s)
Chromosome Mapping , Exons , Genome, Human , Promoter Regions, Genetic , Quantitative Trait Loci , Transcription, Genetic/physiology , DNA, Complementary/genetics , Human Genome Project , Humans , Open Reading Frames
17.
Genome Res ; 17(6): 852-64, 2007 Jun.
Article in English | MEDLINE | ID: mdl-17568003

ABSTRACT

Functional RNA structures play an important role both in noncoding RNA transcripts and as regulatory elements in mRNAs. Here we present a computational study to detect functional RNA structures within the ENCODE regions of the human genome. Since structural RNAs in general lack characteristic signals in primary sequence, comparative approaches evaluating evolutionary conservation of structures are most promising. We have used three recently introduced programs based on either phylogenetic-stochastic context-free grammar (EvoFold) or energy directed folding (RNAz and AlifoldZ), yielding several thousand candidate structures (corresponding to approximately 2.7% of the ENCODE regions). EvoFold has its highest sensitivity in highly conserved and relatively AU-rich regions, while RNAz favors slightly GC-rich regions, resulting in a relatively small overlap between methods. Comparison with the GENCODE annotation points to functional RNAs in all genomic contexts, with a slightly increased density in 3'-UTRs. While we estimate a significant false discovery rate of approximately 50%-70%, many of the predictions can be further substantiated by additional criteria: 248 loci are predicted by both RNAz and EvoFold, and an additional 239 RNAz or EvoFold predictions are supported by the (more stringent) AlifoldZ algorithm. Five hundred seventy RNAz structure predictions fall into regions that show signs of selection pressure also on the sequence level (i.e., conserved elements). More than 700 predictions overlap with noncoding transcripts detected by oligonucleotide tiling arrays. One hundred seventy-five selected candidates were tested by RT-PCR in six tissues, and expression could be verified in 43 cases (24.6%).


Subject(s)
3' Untranslated Regions/genetics , GC Rich Sequence , Genome, Human , Quantitative Trait Loci , RNA, Untranslated/genetics , Transcription, Genetic , Base Sequence , Humans , Molecular Sequence Data , RNA, Messenger/genetics , Reverse Transcriptase Polymerase Chain Reaction
18.
Science ; 316(5830): 1484-8, 2007 Jun 08.
Article in English | MEDLINE | ID: mdl-17510325

ABSTRACT

Significant fractions of eukaryotic genomes give rise to RNA, much of which is unannotated and has reduced protein-coding potential. The genomic origins and the associations of human nuclear and cytosolic polyadenylated RNAs longer than 200 nucleotides (nt) and whole-cell RNAs less than 200 nt were investigated in this genome-wide study. Subcellular addresses for nucleotides present in detected RNAs were assigned, and their potential processing into short RNAs was investigated. Taken together, these observations suggest a novel role for some unannotated RNAs as primary transcripts for the production of short RNAs. Three potentially functional classes of RNAs have been identified, two of which are syntenically conserved and correlate with the expression state of protein-coding genes. These data support a highly interleaved organization of the human transcriptome.


Subject(s)
Genome, Human , RNA Precursors/genetics , RNA, Messenger/genetics , RNA, Messenger/metabolism , RNA/genetics , Transcription, Genetic , Animals , Cell Line, Tumor , Cell Nucleus/metabolism , Cytosol/metabolism , Exons , Gene Expression , Genome , HeLa Cells , Humans , Mice , Promoter Regions, Genetic , RNA/metabolism , RNA Precursors/metabolism , Synteny , Terminator Regions, Genetic
19.
Genome Res ; 15(7): 987-97, 2005 Jul.
Article in English | MEDLINE | ID: mdl-15998911

ABSTRACT

Recently, we mapped the sites of transcription across approximately 30% of the human genome and elucidated the structures of several hundred novel transcripts. In this report, we describe a novel combination of techniques including the rapid amplification of cDNA ends (RACE) and tiling array technologies that was used to further characterize transcripts in the human transcriptome. This technical approach allows for several important pieces of information to be gathered about each array-detected transcribed region, including strand of origin, start and termination positions, and the exonic structures of spliced and unspliced coding and noncoding RNAs. In this report, the structures of transcripts from 14 transcribed loci, representing both known genes and unannotated transcripts taken from the several hundred randomly selected unannotated transcripts described in our previous work are represented as examples of the complex organization of the human transcriptome. As a consequence of this complexity, it is not unusual that a single base pair can be part of an intricate network of multiple isoforms of overlapping sense and antisense transcripts, the majority of which are unannotated. Some of these transcripts follow the canonical splicing rules, whereas others combine the exons of different genes or represent other types of noncanonical transcripts. These results have important implications concerning the correlation of genotypes to phenotypes, the regulation of complex interlaced transcriptional patterns, and the definition of a gene.


Subject(s)
Nucleic Acid Amplification Techniques , Oligonucleotide Array Sequence Analysis , Transcription, Genetic , Cell Line , Gene Expression Profiling , Humans , Jurkat Cells , Models, Genetic , Molecular Sequence Data , Nucleic Acid Amplification Techniques/methods , Oligonucleotide Array Sequence Analysis/methods , Protein Isoforms/genetics , Tumor Cells, Cultured
20.
Science ; 308(5725): 1149-54, 2005 May 20.
Article in English | MEDLINE | ID: mdl-15790807

ABSTRACT

Sites of transcription of polyadenylated and nonpolyadenylated RNAs for 10 human chromosomes were mapped at 5-base pair resolution in eight cell lines. Unannotated, nonpolyadenylated transcripts comprise the major proportion of the transcriptional output of the human genome. Of all transcribed sequences, 19.4, 43.7, and 36.9% were observed to be polyadenylated, nonpolyadenylated, and bimorphic, respectively. Half of all transcribed sequences are found only in the nucleus and for the most part are unannotated. Overall, the transcribed portions of the human genome are predominantly composed of interlaced networks of both poly A+ and poly A- annotated transcripts and unannotated transcripts of unknown function. This organization has important implications for interpreting genotype-phenotype associations, regulation of gene expression, and the definition of a gene.


Subject(s)
Chromosomes, Human/genetics , Genome, Human , RNA, Messenger/analysis , Transcription, Genetic , Cell Line , Cell Line, Tumor , Cell Nucleus/metabolism , Chromosomes, Human, Pair 13/genetics , Chromosomes, Human, Pair 14/genetics , Chromosomes, Human, Pair 19/genetics , Chromosomes, Human, Pair 20/genetics , Chromosomes, Human, Pair 21/genetics , Chromosomes, Human, Pair 22/genetics , Chromosomes, Human, Pair 6/genetics , Chromosomes, Human, Pair 7/genetics , Chromosomes, Human, X/genetics , Chromosomes, Human, Y/genetics , Computational Biology , Cytosol/metabolism , DNA, Complementary , DNA, Intergenic , Exons , Female , Humans , Introns , Male , Molecular Sequence Data , Nucleic Acid Amplification Techniques , Oligonucleotide Array Sequence Analysis , Physical Chromosome Mapping , RNA Splicing