Results 1 - 20 of 23
1.
Cell ; 186(7): 1493-1511.e40, 2023 03 30.
Article in English | MEDLINE | ID: mdl-37001506

ABSTRACT

Understanding how genetic variants impact molecular phenotypes is a key goal of functional genomics, currently hindered by reliance on a single haploid reference genome. Here, we present the EN-TEx resource of 1,635 open-access datasets from four donors (∼30 tissues × ∼15 assays). The datasets are mapped to matched, diploid genomes with long-read phasing and structural variants, instantiating a catalog of >1 million allele-specific loci. These loci exhibit coordinated activity along haplotypes and are less conserved than corresponding, non-allele-specific ones. Surprisingly, a deep-learning transformer model can predict the allele-specific activity based only on local nucleotide-sequence context, highlighting the importance of transcription-factor-binding motifs particularly sensitive to variants. Furthermore, combining EN-TEx with existing genome annotations reveals strong associations between allele-specific and GWAS loci. It also enables models for transferring known eQTLs to difficult-to-profile tissues (e.g., from skin to heart). Overall, EN-TEx provides rich data and generalizable models for more accurate personal functional genomics.


Subjects
Epigenome, Quantitative Trait Loci, Genome-Wide Association Study, Genomics, Phenotype, Single Nucleotide Polymorphism
2.
Genome Res ; 30(7): 1047-1059, 2020 07.
Article in English | MEDLINE | ID: mdl-32759341

ABSTRACT

We have produced RNA sequencing data for 53 primary cells from different locations in the human body. The clustering of these primary cells reveals that most cells in the human body share a few broad transcriptional programs, which define five major cell types: epithelial, endothelial, mesenchymal, neural, and blood cells. These act as basic components of many tissues and organs. Based on gene expression, these cell types redefine the basic histological types by which tissues have been traditionally classified. We identified genes whose expression is specific to these cell types, and from these genes, we estimated the contribution of the major cell types to the composition of human tissues. We found this cellular composition to be a characteristic signature of tissues and to reflect tissue morphological heterogeneity and histology. We identified changes in cellular composition in different tissues associated with age and sex, and found that departures from the normal cellular composition correlate with histological phenotypes associated with disease.


Subjects
Genetic Transcription, Cell Line, Endothelial Cells/metabolism, Epithelial Cells/metabolism, Female, Gene Expression Profiling, Gynecomastia/genetics, Gynecomastia/metabolism, Humans, Male, Mesoderm/cytology, Mesoderm/metabolism, Neoplasms/genetics, Organ Specificity, RNA Sequence Analysis
3.
Genome Res ; 29(11): 1900-1909, 2019 11.
Article in English | MEDLINE | ID: mdl-31645363

ABSTRACT

MicroRNAs (miRNAs) play a critical role as posttranscriptional regulators of gene expression. The ENCODE Project profiled the expression of miRNAs in an extensive set of organs during a time-course of mouse embryonic development and captured the expression dynamics of 785 miRNAs. We found distinct organ-specific and developmental stage-specific miRNA expression clusters, with an overall pattern of increasing organ-specific expression as embryonic development proceeds. Comparative analysis of conserved miRNAs in mouse and human revealed stronger clustering of expression patterns by organ type rather than by species. An analysis of messenger RNA expression clusters compared with miRNA expression clusters identifies the potential role of specific miRNA expression clusters in suppressing the expression of mRNAs specific to other developmental programs in the organ in which these miRNAs are expressed during embryonic development. Our results provide the most comprehensive time-course of miRNA expression as part of an integrated ENCODE reference data set for mouse embryonic development.


Subjects
Embryonic Development/genetics, MicroRNAs/genetics, Animals, Female, Developmental Gene Expression Regulation, Mice, Pregnancy, Messenger RNA/genetics
4.
Nature ; 515(7527): 355-64, 2014 Nov 20.
Article in English | MEDLINE | ID: mdl-25409824

ABSTRACT

The laboratory mouse shares the majority of its protein-coding genes with humans, making it the premier model organism in biomedical research, yet the two mammals differ in significant ways. To gain greater insights into both shared and species-specific transcriptional and cellular regulatory programs in the mouse, the Mouse ENCODE Consortium has mapped transcription, DNase I hypersensitivity, transcription factor binding, chromatin modifications and replication domains throughout the mouse genome in diverse cell and tissue types. By comparing with the human genome, we not only confirm substantial conservation in the newly annotated potential functional sequences, but also find a large degree of divergence of sequences involved in transcriptional regulation, chromatin state and higher order chromatin organization. Our results illuminate the wide range of evolutionary forces acting on genes and their regulatory regions, and provide a general resource for research into mammalian biology and mechanisms of human diseases.


Subjects
Genome/genetics, Genomics, Mice/genetics, Molecular Sequence Annotation, Animals, Cell Lineage/genetics, Chromatin/genetics, Chromatin/metabolism, Conserved Sequence/genetics, DNA Replication/genetics, Deoxyribonuclease I/metabolism, Gene Expression Regulation/genetics, Gene Regulatory Networks/genetics, Genome-Wide Association Study, Humans, RNA/genetics, Nucleic Acid Regulatory Sequences/genetics, Species Specificity, Transcription Factors/metabolism, Transcriptome/genetics
5.
Nature ; 512(7515): 445-8, 2014 Aug 28.
Article in English | MEDLINE | ID: mdl-25164755

ABSTRACT

The transcriptome is the readout of the genome. Identifying common features in it across distant species can reveal fundamental principles. To this end, the ENCODE and modENCODE consortia have generated large amounts of matched RNA-sequencing data for human, worm and fly. Uniform processing and comprehensive annotation of these data allow comparison across metazoan phyla, extending beyond earlier within-phylum transcriptome comparisons and revealing ancient, conserved features. Specifically, we discover co-expression modules shared across animals, many of which are enriched in developmental genes. Moreover, we use expression patterns to align the stages in worm and fly development and find a novel pairing between worm embryo and fly pupae, in addition to the embryo-to-embryo and larvae-to-larvae pairings. Furthermore, we find that the extent of non-canonical, non-coding transcription is similar in each organism, per base pair. Finally, we find in all three organisms that the gene-expression levels, both coding and non-coding, can be quantitatively predicted from chromatin features at the promoter using a 'universal model' based on a single set of organism-independent parameters.


Subjects
Caenorhabditis elegans/genetics, Drosophila melanogaster/genetics, Gene Expression Profiling, Transcriptome/genetics, Animals, Caenorhabditis elegans/embryology, Caenorhabditis elegans/growth & development, Chromatin/genetics, Cluster Analysis, Drosophila melanogaster/growth & development, Developmental Gene Expression Regulation/genetics, Histones/metabolism, Humans, Larva/genetics, Larva/growth & development, Genetic Models, Molecular Sequence Annotation, Genetic Promoter Regions/genetics, Pupa/genetics, Pupa/growth & development, Untranslated RNA/genetics, RNA Sequence Analysis
6.
Nature ; 489(7414): 101-8, 2012 Sep 06.
Article in English | MEDLINE | ID: mdl-22955620

ABSTRACT

Eukaryotic cells make many types of primary and processed RNAs that are found either in specific subcellular compartments or throughout the cells. A complete catalogue of these RNAs is not yet available and their characteristic subcellular localizations are also poorly understood. Because RNA represents the direct output of the genetic information encoded by genomes and a significant proportion of a cell's regulatory capabilities are focused on its synthesis, processing, transport, modification and translation, the generation of such a catalogue is crucial for understanding genome function. Here we report evidence that three-quarters of the human genome is capable of being transcribed, as well as observations about the range and levels of expression, localization, processing fates, regulatory regions and modifications of almost all currently annotated and thousands of previously unannotated RNAs. These observations, taken together, prompt a redefinition of the concept of a gene.


Subjects
DNA/genetics, Encyclopedias as Topic, Human Genome/genetics, Molecular Sequence Annotation, Nucleic Acid Regulatory Sequences/genetics, Genetic Transcription/genetics, Transcriptome/genetics, Alleles, Cell Line, Intergenic DNA/genetics, Genetic Enhancer Elements, Exons/genetics, Gene Expression Profiling, Genes/genetics, Genomics, Humans, Polyadenylation/genetics, Protein Isoforms/genetics, RNA/biosynthesis, RNA/genetics, RNA Editing/genetics, RNA Splicing/genetics, Nucleic Acid Repetitive Sequences/genetics, RNA Sequence Analysis
7.
Bioinformatics ; 29(1): 15-21, 2013 Jan 01.
Article in English | MEDLINE | ID: mdl-23104886

ABSTRACT

MOTIVATION: Accurate alignment of high-throughput RNA-seq data is a challenging and yet unsolved problem because of the non-contiguous transcript structure, relatively short read lengths and constantly increasing throughput of the sequencing technologies. Currently available RNA-seq aligners suffer from high mapping error rates, low mapping speed, read length limitation and mapping biases. RESULTS: To align our large (>80 billion reads) ENCODE Transcriptome RNA-seq dataset, we developed the Spliced Transcripts Alignment to a Reference (STAR) software based on a previously undescribed RNA-seq alignment algorithm that uses sequential maximum mappable seed search in uncompressed suffix arrays followed by a seed clustering and stitching procedure. STAR outperforms other aligners by a factor of >50 in mapping speed, aligning to the human genome 550 million 2 × 76 bp paired-end reads per hour on a modest 12-core server, while at the same time improving alignment sensitivity and precision. In addition to unbiased de novo detection of canonical junctions, STAR can discover non-canonical splices and chimeric (fusion) transcripts, and is also capable of mapping full-length RNA sequences. Using Roche 454 sequencing of reverse transcription polymerase chain reaction amplicons, we experimentally validated 1960 novel intergenic splice junctions with an 80-90% success rate, corroborating the high precision of the STAR mapping strategy. AVAILABILITY AND IMPLEMENTATION: STAR is implemented as standalone C++ code. STAR is free open source software distributed under the GPLv3 license and can be downloaded from http://code.google.com/p/rna-star/.


Subjects
Sequence Alignment/methods, Software, Algorithms, Cluster Analysis, Gene Expression Profiling, Human Genome, Humans, RNA Splicing, RNA Sequence Analysis/methods
8.
Nat Methods ; 5(7): 629-35, 2008 Jul.
Article in English | MEDLINE | ID: mdl-18500348

ABSTRACT

Rapid amplification of cDNA ends (RACE) is a widely used approach for transcript identification. Random clone selection from the RACE mixture, however, is an ineffective sampling strategy if the dynamic range of transcript abundances is large. To improve sampling efficiency of human transcripts, we hybridized the products of the RACE reaction onto tiling arrays and used the detected exons to delineate a series of reverse-transcriptase (RT)-PCRs, through which the original RACE transcript population was segregated into simpler transcript populations. We independently cloned the products and sequenced randomly selected clones. This approach, RACEarray, is superior to direct cloning and sequencing of RACE products because it specifically targets new transcripts and often results in overall normalization of transcript abundance. We show theoretically and experimentally that this strategy leads indeed to efficient sampling of new transcripts, and we investigated multiplexing the strategy by pooling RACE reactions from multiple interrogated loci before hybridization.


Subjects
Complementary DNA/genetics, Gene Expression Profiling/methods, Gene Library, Nucleic Acid Amplification Techniques/methods, RNA/genetics, Alternative Splicing, Human Chromosomes Pair 21/genetics, Human Chromosomes Pair 22/genetics, Molecular Cloning, Exons, Human Genome, Humans, Molecular Sequence Data, Oligonucleotide Array Sequence Analysis/methods, Protein Isoforms/genetics, Reverse Transcriptase Polymerase Chain Reaction, Genetic Transcription
9.
Dev Cell ; 56(4): 557-568.e6, 2021 02 22.
Article in English | MEDLINE | ID: mdl-33400914

ABSTRACT

Crop productivity depends on activity of meristems that produce optimized plant architectures, including that of the maize ear. A comprehensive understanding of development requires insight into the full diversity of cell types and developmental domains and the gene networks required to specify them. Until now, these were identified primarily by morphology and insights from classical genetics, which are limited by genetic redundancy and pleiotropy. Here, we investigated the transcriptional profiles of 12,525 single cells from developing maize ears. The resulting developmental atlas provides a single-cell RNA sequencing (scRNA-seq) map of an inflorescence. We validated our results by mRNA in situ hybridization and by fluorescence-activated cell sorting (FACS) RNA-seq, and we show how these data may facilitate genetic studies by predicting genetic redundancy, integrating transcriptional networks, and identifying candidate genes associated with crop yield traits.


Subjects
Genetic Association Studies, Quantitative Trait Loci/genetics, RNA Sequence Analysis, Single-Cell Analysis, Zea mays/growth & development, Zea mays/genetics, Base Sequence, Developmental Gene Expression Regulation, Plant Gene Expression Regulation, Gene Regulatory Networks, Protoplasts/metabolism, Reproducibility of Results, Transcriptome/genetics
10.
J Exp Med ; 198(7): 987-97, 2003 Oct 06.
Article in English | MEDLINE | ID: mdl-14517275

ABSTRACT

Macrophages are activated from a resting state by a combination of cytokines and microbial products. Microbes are often sensed through Toll-like receptors signaling through MyD88. We used large-scale microarrays in multiple replicate experiments followed by stringent statistical analysis to compare gene expression in wild-type (WT) and MyD88-/- macrophages. We confirmed key results by quantitative reverse transcription polymerase chain reaction, Western blot, and enzyme-linked immunosorbent assay. Surprisingly, many genes, such as inducible nitric oxide synthase, IRG-1, IP-10, MIG, RANTES, and interleukin 6 were induced by interferon (IFN)-gamma from 5- to 100-fold less extensively in MyD88-/- macrophages than in WT macrophages. Thus, widespread, full-scale activation of macrophages by IFN-gamma requires MyD88. Analysis of the mechanism revealed that MyD88 mediates a process of self-priming by which resting macrophages produce a low level of tumor necrosis factor. This and other factors lead to basal activation of nuclear factor kappaB, which synergizes with IFN-gamma for gene induction. In contrast, infection by live, virulent Mycobacterium tuberculosis (Mtb) activated macrophages largely through MyD88-independent pathways, and macrophages did not need MyD88 to kill Mtb in vitro. Thus, MyD88 plays a dynamic role in resting macrophages that supports IFN-gamma-dependent activation, whereas macrophages can respond to a complex microbial stimulus, the tubercle bacillus, chiefly by other routes.


Subjects
Differentiation Antigens/physiology, Interferon-gamma/pharmacology, Macrophage Activation, Macrophages/metabolism, Mycobacterium tuberculosis/physiology, Immunologic Receptors/physiology, Signal Transducing Adaptor Proteins, Animals, Gene Expression Regulation/drug effects, Interleukin-1/biosynthesis, Macrophage Activation/drug effects, Membrane Glycoproteins/physiology, Mice, Inbred C57BL Mice, Myeloid Differentiation Factor 88, NF-kappa B/physiology, Cell Surface Receptors/physiology, Toll-Like Receptors, Transcriptional Activation, Tumor Necrosis Factor-alpha/physiology
11.
Front Plant Sci ; 11: 289, 2020.
Article in English | MEDLINE | ID: mdl-32296450

ABSTRACT

MaizeCODE is a project aimed at identifying and analyzing functional elements in the maize genome. In its initial phase, MaizeCODE assayed up to five tissues from four maize strains (B73, NC350, W22, TIL11) by RNA-Seq, ChIP-Seq, RAMPAGE, and small RNA sequencing. To facilitate reproducible science and provide both human and machine access to the MaizeCODE data, we enhanced SciApps, a cloud-based portal, for analysis and distribution of both raw data and analysis results. Based on the SciApps workflow platform, we generated new components to support the complete cycle of MaizeCODE data management. These include publicly accessible scientific workflows for the reproducible and shareable analysis of various functional data, a RESTful API for batch processing and distribution of data and metadata, a searchable data page that lists each MaizeCODE experiment as a reproducible workflow, and integrated JBrowse genome browser tracks linked with workflows and metadata. The SciApps portal is a flexible platform that allows the integration of new analysis tools, workflows, and genomic data from multiple projects. Through metadata and a ready-to-compute cloud-based platform, the portal experience improves access to the MaizeCODE data and facilitates its analysis.

12.
Hum Mutat ; 30(9): E866-79, 2009 Sep.
Article in English | MEDLINE | ID: mdl-19562714

ABSTRACT

The study of transcription using genomic tiling arrays has led to the identification of numerous additional exons. One example is the MECP2 gene on the X chromosome; using 5'RACE and RT-PCR in human tissues and cell lines, we have found more than 70 novel exons (RACEfrags) connecting to at least one annotated exon. We sequenced all MECP2-connected exons and flanking sequences in 3 groups: 46 patients with Rett syndrome and without mutations in the currently annotated exons of the MECP2 and CDKL5 genes; 32 patients with Rett syndrome and identified mutations in the MECP2 gene; 100 control individuals from the same geoethnic group. Approximately 13 kb were sequenced per sample (2.4 Mb of DNA resequencing). A total of 75 individuals had novel rare variants (mostly private variants) but no statistically significant difference was found among the 3 groups. These results suggest that variants in the newly discovered exons may not contribute to Rett syndrome. Interestingly, however, there are about twice as many variants in the novel exons as in the flanking sequences (44 vs. 21 for approximately 1.3 Mb sequenced for each class of sequences, p=0.0025). Thus the evolutionary forces that shape these novel exons may be different from those of neighboring sequences.


Subjects
Exons/genetics, Genetic Variation, Methyl-CpG-Binding Protein 2/genetics, Rett Syndrome/genetics, DNA Mutational Analysis, Female, Humans, Male, Methyl-CpG-Binding Protein 2/metabolism, Protein Serine-Threonine Kinases, Rett Syndrome/metabolism
13.
J Leukoc Biol ; 79(6): 1328-38, 2006 Jun.
Article in English | MEDLINE | ID: mdl-16614257

ABSTRACT

We report a novel mechanism, involving up-regulation of the interleukin (IL)-7 cytokine receptor, by which human immunodeficiency virus (HIV) enhances its own production in monocyte-derived macrophages (MDM) in vitro. HIV-1 infection or treatment of MDM cultures with exogenous HIV-1 Tat(86) protein up-regulates the IL-7 receptor (IL-7R) alpha-chain at the levels of steady-state RNA, protein, and functional IL-7R on the cell surface (as measured by ligand-induced receptor signaling). This IL-7R up-regulation is associated with increased amounts of HIV-1 virions in the supernatants of infected MDM cultures treated with exogenous IL-7 cytokine. The overall effect of IL-7 stimulation on HIV replication in MDM culture supernatants is typically in the range of one log and greater. The results are consistent with a model in which HIV infection produces the Tat protein, which in turn up-regulates IL-7R in a paracrine manner. This results in increased IL-7R signaling in response to the IL-7 cytokine, which ultimately promotes early events in HIV replication, including binding/entry and possibly other steps prior to reverse transcription. The results suggest that the effects of IL-7 on HIV replication in MDM should be considered when analyzing and designing clinical trials involving treatment of patients with IL-7 or Tat vaccines.


Subjects
tat Gene Products/physiology, HIV-1/physiology, Interleukin-7/physiology, Macrophages/virology, Biological Models, Virus Replication/physiology, Cultured Cells/drug effects, Cultured Cells/metabolism, Cultured Cells/virology, tat Genes, HIV Reverse Transcriptase/metabolism, Humans, Interleukin-7/adverse effects, Interleukin-7/pharmacology, Macrophages/drug effects, Macrophages/metabolism, Paracrine Communication, STAT3 Transcription Factor/metabolism, Virion, Virus Replication/drug effects, Human Immunodeficiency Virus tat Gene Products
14.
Nat Commun ; 6: 5903, 2015 Jan 13.
Article in English | MEDLINE | ID: mdl-25582907

ABSTRACT

Mice have been a long-standing model for human biology and disease. Here we characterize, by RNA sequencing, the transcriptional profiles of a large and heterogeneous collection of mouse tissues, augmenting the mouse transcriptome with thousands of novel transcript candidates. Comparison with transcriptome profiles in human cell lines reveals substantial conservation of transcriptional programmes, and uncovers a distinct class of genes with levels of expression that have been constrained early in vertebrate evolution. This core set of genes captures a substantial fraction of the transcriptional output of mammalian cells, and participates in basic functional and structural housekeeping processes common to all cell types. Perturbation of these constrained genes is associated with significant phenotypes including embryonic lethality and cancer. Evolutionary constraint in gene expression levels is not reflected in the conservation of the genomic sequences, but is associated with conserved epigenetic marking, as well as with a characteristic post-transcriptional regulatory programme, in which sub-cellular localization and alternative splicing play comparatively large roles.


Subjects
Molecular Evolution, Gene Expression Regulation, Transcriptome, Alternative Splicing, Animals, Biological Evolution, Cell Line, Genetic Epigenesis, Gene Expression Profiling, Gene Library, Genome, Histones/chemistry, Humans, Mice, Inbred C57BL Mice, Genetic Models, Antisense Oligonucleotides, Phenotype, RNA Sequence Analysis
15.
PLoS One ; 7(1): e28213, 2012.
Article in English | MEDLINE | ID: mdl-22238572

ABSTRACT

The classic organization of a gene structure has followed the Jacob and Monod bacterial gene model proposed more than 50 years ago. Since then, empirical determinations of the complexity of the transcriptomes found in organisms from yeast to human have blurred the definition and physical boundaries of genes. Using multiple analysis approaches we have characterized individual gene boundaries mapping on human chromosomes 21 and 22. Analyses of the locations of the 5' and 3' transcriptional termini of 492 protein coding genes revealed that for 85% of these genes the boundaries extend beyond the current annotated termini, most often connecting with exons of transcripts from other well annotated genes. The biological and evolutionary importance of these chimeric transcripts is underscored by (1) the non-random interconnections of genes involved, (2) the greater phylogenetic depth of the genes involved in many chimeric interactions, (3) the coordination of the expression of connected genes and (4) the close in vivo and three dimensional proximity of the genomic regions being transcribed and contributing to parts of the chimeric RNAs. The non-random nature of the connection of the genes involved suggests that chimeric transcripts should not be studied in isolation, but together, as an RNA network.


Subjects
Cells/metabolism, Gene Regulatory Networks/physiology, RNA/physiology, Transcriptome/physiology, Algorithms, Chimerin Proteins/chemistry, Chimerin Proteins/genetics, Human Chromosomes Pair 1/genetics, Female, Gene Expression Profiling, Gene Regulatory Networks/genetics, Humans, Male, Microarray Analysis/methods, Biological Models, Nucleic Acid Amplification Techniques/methods, RNA/genetics, RNA Isoforms/chemistry, RNA Isoforms/genetics, RNA Isoforms/metabolism, Genetic Transcription/genetics, Validation Studies as Topic
16.
Genome Res ; 17(6): 746-59, 2007 Jun.
Article in English | MEDLINE | ID: mdl-17567994

ABSTRACT

This report presents systematic empirical annotation of transcript products from 399 annotated protein-coding loci across the 1% of the human genome targeted by the Encyclopedia of DNA elements (ENCODE) pilot project using a combination of 5' rapid amplification of cDNA ends (RACE) and high-density resolution tiling arrays. We identified previously unannotated and often tissue- or cell-line-specific transcribed fragments (RACEfrags), both 5' distal to the annotated 5' terminus and internal to the annotated gene bounds for the vast majority (81.5%) of the tested genes. Half of the distal RACEfrags span large segments of genomic sequences away from the main portion of the coding transcript and often overlap with the upstream-annotated gene(s). Notably, at least 20% of the resultant novel transcripts have changes in their open reading frames (ORFs), most of them fusing ORFs of adjacent transcripts. A significant fraction of distal RACEfrags show expression levels comparable to those of known exons of the same locus, suggesting that they are not part of very minority splice forms. These results have significant implications concerning (1) our current understanding of the architecture of protein-coding genes; (2) our views on locations of regulatory regions in the genome; and (3) the interpretation of sequence polymorphisms mapping to regions hitherto considered to be "noncoding," ultimately relating to the identification of disease-related sequence alterations.


Subjects
Chromosome Mapping, Exons, Human Genome, Genetic Promoter Regions, Quantitative Trait Loci, Genetic Transcription/physiology, Complementary DNA/genetics, Human Genome Project, Humans, Open Reading Frames
17.
Genome Res ; 17(6): 852-64, 2007 Jun.
Article in English | MEDLINE | ID: mdl-17568003

ABSTRACT

Functional RNA structures play an important role both in the context of noncoding RNA transcripts as well as regulatory elements in mRNAs. Here we present a computational study to detect functional RNA structures within the ENCODE regions of the human genome. Since structural RNAs in general lack characteristic signals in primary sequence, comparative approaches evaluating evolutionary conservation of structures are most promising. We have used three recently introduced programs based on either phylogenetic-stochastic context-free grammar (EvoFold) or energy directed folding (RNAz and AlifoldZ), yielding several thousand candidate structures (corresponding to approximately 2.7% of the ENCODE regions). EvoFold has its highest sensitivity in highly conserved and relatively AU-rich regions, while RNAz favors slightly GC-rich regions, resulting in a relatively small overlap between methods. Comparison with the GENCODE annotation points to functional RNAs in all genomic contexts, with a slightly increased density in 3'-UTRs. While we estimate a significant false discovery rate of approximately 50%-70%, many of the predictions can be further substantiated by additional criteria: 248 loci are predicted by both RNAz and EvoFold, and an additional 239 RNAz or EvoFold predictions are supported by the (more stringent) AlifoldZ algorithm. Five hundred seventy RNAz structure predictions fall into regions that show signs of selection pressure also on the sequence level (i.e., conserved elements). More than 700 predictions overlap with noncoding transcripts detected by oligonucleotide tiling arrays. One hundred seventy-five selected candidates were tested by RT-PCR in six tissues, and expression could be verified in 43 cases (24.6%).


Subjects
3' Untranslated Regions/genetics, GC Rich Sequence, Human Genome, Quantitative Trait Loci, Untranslated RNA/genetics, Genetic Transcription, Base Sequence, Humans, Molecular Sequence Data, Messenger RNA/genetics, Reverse Transcriptase Polymerase Chain Reaction
18.
Science ; 316(5830): 1484-8, 2007 Jun 08.
Article in English | MEDLINE | ID: mdl-17510325

ABSTRACT

Significant fractions of eukaryotic genomes give rise to RNA, much of which is unannotated and has reduced protein-coding potential. The genomic origins and the associations of human nuclear and cytosolic polyadenylated RNAs longer than 200 nucleotides (nt) and whole-cell RNAs less than 200 nt were investigated in this genome-wide study. Subcellular addresses for nucleotides present in detected RNAs were assigned, and their potential processing into short RNAs was investigated. Taken together, these observations suggest a novel role for some unannotated RNAs as primary transcripts for the production of short RNAs. Three potentially functional classes of RNAs have been identified, two of which are syntenically conserved and correlate with the expression state of protein-coding genes. These data support a highly interleaved organization of the human transcriptome.


Subjects
Human Genome, RNA Precursors/genetics, Messenger RNA/genetics, Messenger RNA/metabolism, RNA/genetics, Genetic Transcription, Animals, Tumor Cell Line, Cell Nucleus/metabolism, Cytosol/metabolism, Exons, Gene Expression, Genome, HeLa Cells, Humans, Mice, Genetic Promoter Regions, RNA/metabolism, RNA Precursors/metabolism, Synteny, Genetic Terminator Regions
19.
Genome Res ; 15(7): 987-97, 2005 Jul.
Article in English | MEDLINE | ID: mdl-15998911

ABSTRACT

Recently, we mapped the sites of transcription across approximately 30% of the human genome and elucidated the structures of several hundred novel transcripts. In this report, we describe a novel combination of techniques including the rapid amplification of cDNA ends (RACE) and tiling array technologies that was used to further characterize transcripts in the human transcriptome. This technical approach allows for several important pieces of information to be gathered about each array-detected transcribed region, including strand of origin, start and termination positions, and the exonic structures of spliced and unspliced coding and noncoding RNAs. In this report, the structures of transcripts from 14 transcribed loci, representing both known genes and unannotated transcripts taken from the several hundred randomly selected unannotated transcripts described in our previous work are represented as examples of the complex organization of the human transcriptome. As a consequence of this complexity, it is not unusual that a single base pair can be part of an intricate network of multiple isoforms of overlapping sense and antisense transcripts, the majority of which are unannotated. Some of these transcripts follow the canonical splicing rules, whereas others combine the exons of different genes or represent other types of noncanonical transcripts. These results have important implications concerning the correlation of genotypes to phenotypes, the regulation of complex interlaced transcriptional patterns, and the definition of a gene.


Subjects
Nucleic Acid Amplification Techniques, Oligonucleotide Array Sequence Analysis, Genetic Transcription, Cell Line, Gene Expression Profiling, Humans, Jurkat Cells, Genetic Models, Molecular Sequence Data, Nucleic Acid Amplification Techniques/methods, Oligonucleotide Array Sequence Analysis/methods, Protein Isoforms/genetics, Cultured Tumor Cells
20.
Science ; 308(5725): 1149-54, 2005 May 20.
Article in English | MEDLINE | ID: mdl-15790807

ABSTRACT

Sites of transcription of polyadenylated and nonpolyadenylated RNAs for 10 human chromosomes were mapped at 5-base pair resolution in eight cell lines. Unannotated, nonpolyadenylated transcripts comprise the major proportion of the transcriptional output of the human genome. Of all transcribed sequences, 19.4, 43.7, and 36.9% were observed to be polyadenylated, nonpolyadenylated, and bimorphic, respectively. Half of all transcribed sequences are found only in the nucleus and for the most part are unannotated. Overall, the transcribed portions of the human genome are predominantly composed of interlaced networks of both poly A+ and poly A- annotated transcripts and unannotated transcripts of unknown function. This organization has important implications for interpreting genotype-phenotype associations, regulation of gene expression, and the definition of a gene.


Subjects
Human Chromosomes/genetics, Human Genome, Messenger RNA/analysis, Genetic Transcription, Cell Line, Tumor Cell Line, Cell Nucleus/metabolism, Human Chromosomes Pair 13/genetics, Human Chromosomes Pair 14/genetics, Human Chromosomes Pair 19/genetics, Human Chromosomes Pair 20/genetics, Human Chromosomes Pair 21/genetics, Human Chromosomes Pair 22/genetics, Human Chromosomes Pair 6/genetics, Human Chromosomes Pair 7/genetics, Human X Chromosomes/genetics, Human Y Chromosomes/genetics, Computational Biology, Cytosol/metabolism, Complementary DNA, Intergenic DNA, Exons, Female, Humans, Introns, Male, Molecular Sequence Data, Nucleic Acid Amplification Techniques, Oligonucleotide Array Sequence Analysis, Physical Chromosome Mapping, RNA Splicing