Results 1 - 20 of 31
1.
Cell ; 186(7): 1493-1511.e40, 2023 03 30.
Article in English | MEDLINE | ID: mdl-37001506

ABSTRACT

Understanding how genetic variants impact molecular phenotypes is a key goal of functional genomics, currently hindered by reliance on a single haploid reference genome. Here, we present the EN-TEx resource of 1,635 open-access datasets from four donors (∼30 tissues × ∼15 assays). The datasets are mapped to matched, diploid genomes with long-read phasing and structural variants, instantiating a catalog of >1 million allele-specific loci. These loci exhibit coordinated activity along haplotypes and are less conserved than corresponding, non-allele-specific ones. Surprisingly, a deep-learning transformer model can predict the allele-specific activity based only on local nucleotide-sequence context, highlighting the importance of transcription-factor-binding motifs particularly sensitive to variants. Furthermore, combining EN-TEx with existing genome annotations reveals strong associations between allele-specific and GWAS loci. It also enables models for transferring known eQTLs to difficult-to-profile tissues (e.g., from skin to heart). Overall, EN-TEx provides rich data and generalizable models for more accurate personal functional genomics.


Subjects
Epigenome, Quantitative Trait Loci, Genome-Wide Association Study, Genomics, Phenotype, Single Nucleotide Polymorphism
2.
Cell ; 157(2): 382-394, 2014 Apr 10.
Article in English | MEDLINE | ID: mdl-24725405

ABSTRACT

Missense mutations in the p53 tumor suppressor inactivate its antiproliferative properties but can also promote metastasis through a gain-of-function activity. We show that sustained expression of mutant p53 is required to maintain the prometastatic phenotype of a murine model of pancreatic cancer, a highly metastatic disease that frequently displays p53 mutations. Transcriptional profiling and functional screening identified the platelet-derived growth factor receptor b (PDGFRb) as both necessary and sufficient to mediate these effects. Mutant p53 induced PDGFRb through a cell-autonomous mechanism involving inhibition of a p73/NF-Y complex that represses PDGFRb expression in p53-deficient, noninvasive cells. Blocking PDGFRb signaling by RNA interference or by small molecule inhibitors prevented pancreatic cancer cell invasion in vitro and metastasis formation in vivo. Finally, high PDGFRb expression correlates with poor disease-free survival in pancreatic, colon, and ovarian cancer patients, implicating PDGFRb as a prognostic marker and possible target for attenuating metastasis in p53 mutant tumors.


Subjects
Pancreatic Ductal Carcinoma/metabolism, Neoplasm Metastasis, Pancreatic Neoplasms/metabolism, Platelet-Derived Growth Factor Receptor beta/metabolism, Tumor Suppressor Protein p53/metabolism, Animals, Pancreatic Ductal Carcinoma/pathology, Animal Disease Models, Gene Expression Profiling, Humans, Mice, Pancreatic Neoplasms/genetics, Pancreatic Neoplasms/pathology, Tumor Suppressor Protein p53/genetics
3.
Nature ; 583(7818): 699-710, 2020 07.
Article in English | MEDLINE | ID: mdl-32728249

ABSTRACT

The human and mouse genomes contain instructions that specify RNAs and proteins and govern the timing, magnitude, and cellular context of their production. To better delineate these elements, phase III of the Encyclopedia of DNA Elements (ENCODE) Project has expanded analysis of the cell and tissue repertoires of RNA transcription, chromatin structure and modification, DNA methylation, chromatin looping, and occupancy by transcription factors and RNA-binding proteins. Here we summarize these efforts, which have produced 5,992 new experimental datasets, including systematic determinations across mouse fetal development. All data are available through the ENCODE data portal (https://www.encodeproject.org), including phase II ENCODE and Roadmap Epigenomics data. We have developed a registry of 926,535 human and 339,815 mouse candidate cis-regulatory elements, covering 7.9 and 3.4% of their respective genomes, by integrating selected datatypes associated with gene regulation, and constructed a web-based server (SCREEN; http://screen.encodeproject.org) to provide flexible, user-defined access to this resource. Collectively, the ENCODE data and registry provide an expansive resource for the scientific community to build a better understanding of the organization and function of the human and mouse genomes.


Subjects
DNA/genetics, Genetic Databases, Genome/genetics, Genomics, Molecular Sequence Annotation, Registries, Nucleic Acid Regulatory Sequences/genetics, Animals, Chromatin/genetics, Chromatin/metabolism, DNA/chemistry, DNA Footprinting, DNA Methylation/genetics, DNA Replication Timing, Deoxyribonuclease I/metabolism, Human Genome, Histones/metabolism, Humans, Mice, Transgenic Mice, RNA-Binding Proteins/genetics, Genetic Transcription/genetics, Transposases/metabolism
4.
Genome Res ; 30(7): 1047-1059, 2020 07.
Article in English | MEDLINE | ID: mdl-32759341

ABSTRACT

We have produced RNA sequencing data for 53 primary cells from different locations in the human body. The clustering of these primary cells reveals that most cells in the human body share a few broad transcriptional programs, which define five major cell types: epithelial, endothelial, mesenchymal, neural, and blood cells. These act as basic components of many tissues and organs. Based on gene expression, these cell types redefine the basic histological types by which tissues have been traditionally classified. We identified genes whose expression is specific to these cell types, and from these genes, we estimated the contribution of the major cell types to the composition of human tissues. We found this cellular composition to be a characteristic signature of tissues and to reflect tissue morphological heterogeneity and histology. We identified changes in cellular composition in different tissues associated with age and sex, and found that departures from the normal cellular composition correlate with histological phenotypes associated with disease.


Subjects
Genetic Transcription, Cell Line, Endothelial Cells/metabolism, Epithelial Cells/metabolism, Female, Gene Expression Profiling, Gynecomastia/genetics, Gynecomastia/metabolism, Humans, Male, Mesoderm/cytology, Mesoderm/metabolism, Neoplasms/genetics, Organ Specificity, RNA Sequence Analysis
6.
Genome Res ; 29(11): 1900-1909, 2019 11.
Article in English | MEDLINE | ID: mdl-31645363

ABSTRACT

MicroRNAs (miRNAs) play a critical role as posttranscriptional regulators of gene expression. The ENCODE Project profiled the expression of miRNAs in an extensive set of organs during a time-course of mouse embryonic development and captured the expression dynamics of 785 miRNAs. We found distinct organ-specific and developmental stage-specific miRNA expression clusters, with an overall pattern of increasing organ-specific expression as embryonic development proceeds. Comparative analysis of conserved miRNAs in mouse and human revealed stronger clustering of expression patterns by organ type rather than by species. An analysis of messenger RNA expression clusters compared with miRNA expression clusters identifies the potential role of specific miRNA expression clusters in suppressing the expression of mRNAs specific to other developmental programs in the organ in which these miRNAs are expressed during embryonic development. Our results provide the most comprehensive time-course of miRNA expression as part of an integrated ENCODE reference data set for mouse embryonic development.


Subjects
Embryonic Development/genetics, MicroRNAs/genetics, Animals, Female, Developmental Gene Expression Regulation, Mice, Pregnancy, Messenger RNA/genetics
7.
Nature ; 512(7515): 393-9, 2014 Aug 28.
Article in English | MEDLINE | ID: mdl-24670639

ABSTRACT

Animal transcriptomes are dynamic, with each cell type, tissue and organ system expressing an ensemble of transcript isoforms that give rise to substantial diversity. Here we have identified new genes, transcripts and proteins using poly(A)+ RNA sequencing from Drosophila melanogaster in cultured cell lines, dissected organ systems and under environmental perturbations. We found that a small set of mostly neural-specific genes has the potential to encode thousands of transcripts each through extensive alternative promoter usage and RNA splicing. The magnitudes of splicing changes are larger between tissues than between developmental stages, and most sex-specific splicing is gonad-specific. Gonads express hundreds of previously unknown coding and long non-coding RNAs (lncRNAs), some of which are antisense to protein-coding genes and produce short regulatory RNAs. Furthermore, previously identified pervasive intergenic transcription occurs primarily within newly identified introns. The fly transcriptome is substantially more complex than previously recognized, with this complexity arising from combinatorial usage of promoters, splice sites and polyadenylation sites.


Subjects
Drosophila melanogaster/genetics, Gene Expression Profiling, Transcriptome/genetics, Alternative Splicing/genetics, Animals, Drosophila melanogaster/anatomy & histology, Drosophila melanogaster/cytology, Female, Male, Molecular Sequence Annotation, Nerve Tissue/metabolism, Organ Specificity, Poly A/genetics, Polyadenylation, Genetic Promoter Regions/genetics, Long Noncoding RNA/genetics, Messenger RNA/genetics, Messenger RNA/metabolism, Sex Characteristics, Physiological Stress/genetics
8.
Nature ; 512(7515): 445-8, 2014 Aug 28.
Article in English | MEDLINE | ID: mdl-25164755

ABSTRACT

The transcriptome is the readout of the genome. Identifying common features in it across distant species can reveal fundamental principles. To this end, the ENCODE and modENCODE consortia have generated large amounts of matched RNA-sequencing data for human, worm and fly. Uniform processing and comprehensive annotation of these data allow comparison across metazoan phyla, extending beyond earlier within-phylum transcriptome comparisons and revealing ancient, conserved features. Specifically, we discover co-expression modules shared across animals, many of which are enriched in developmental genes. Moreover, we use expression patterns to align the stages in worm and fly development and find a novel pairing between worm embryo and fly pupae, in addition to the embryo-to-embryo and larvae-to-larvae pairings. Furthermore, we find that the extent of non-canonical, non-coding transcription is similar in each organism, per base pair. Finally, we find in all three organisms that the gene-expression levels, both coding and non-coding, can be quantitatively predicted from chromatin features at the promoter using a 'universal model' based on a single set of organism-independent parameters.


Subjects
Caenorhabditis elegans/genetics, Drosophila melanogaster/genetics, Gene Expression Profiling, Transcriptome/genetics, Animals, Caenorhabditis elegans/embryology, Caenorhabditis elegans/growth & development, Chromatin/genetics, Cluster Analysis, Drosophila melanogaster/growth & development, Developmental Gene Expression Regulation/genetics, Histones/metabolism, Humans, Larva/genetics, Larva/growth & development, Genetic Models, Molecular Sequence Annotation, Genetic Promoter Regions/genetics, Pupa/genetics, Pupa/growth & development, Untranslated RNA/genetics, RNA Sequence Analysis
9.
Nucleic Acids Res ; 46(D1): D794-D801, 2018 01 04.
Article in English | MEDLINE | ID: mdl-29126249

ABSTRACT

The Encyclopedia of DNA Elements (ENCODE) Data Coordinating Center has developed the ENCODE Portal database and website as the source for the data and metadata generated by the ENCODE Consortium. Two principles have motivated the design. First, experimental protocols, analytical procedures and the data themselves should be made publicly accessible through a coherent, web-based search and download interface. Second, the same interface should serve carefully curated metadata that record the provenance of the data and justify its interpretation in biological terms. Since its initial release in 2013 and in response to recommendations from consortium members and the wider community of scientists who use the Portal to access ENCODE data, the Portal has been regularly updated to better reflect these design principles. Here we report on these updates, including results from new experiments, uniformly-processed data from other projects, new visualization tools and more comprehensive metadata to describe experiments and analyses. Additionally, the Portal is now home to meta(data) from related projects including Genomics of Gene Regulation, Roadmap Epigenome Project, Model organism ENCODE (modENCODE) and modERN. The Portal now makes available over 13000 datasets and their accompanying metadata and can be accessed at: https://www.encodeproject.org/.


Subjects
DNA/genetics, Genetic Databases, Gene Components, Genomics, High-Throughput Nucleotide Sequencing, Metadata, Animals, Caenorhabditis elegans/genetics, Data Display, Datasets as Topic, Drosophila melanogaster/genetics, Forecasting, Human Genome, Humans, Mice/genetics, User-Computer Interface
10.
Nature ; 489(7414): 101-8, 2012 Sep 06.
Article in English | MEDLINE | ID: mdl-22955620

ABSTRACT

Eukaryotic cells make many types of primary and processed RNAs that are found either in specific subcellular compartments or throughout the cells. A complete catalogue of these RNAs is not yet available and their characteristic subcellular localizations are also poorly understood. Because RNA represents the direct output of the genetic information encoded by genomes and a significant proportion of a cell's regulatory capabilities are focused on its synthesis, processing, transport, modification and translation, the generation of such a catalogue is crucial for understanding genome function. Here we report evidence that three-quarters of the human genome is capable of being transcribed, as well as observations about the range and levels of expression, localization, processing fates, regulatory regions and modifications of almost all currently annotated and thousands of previously unannotated RNAs. These observations, taken together, prompt a redefinition of the concept of a gene.


Subjects
DNA/genetics, Encyclopedias as Topic, Human Genome/genetics, Molecular Sequence Annotation, Nucleic Acid Regulatory Sequences/genetics, Genetic Transcription/genetics, Transcriptome/genetics, Alleles, Cell Line, Intergenic DNA/genetics, Genetic Enhancer Elements, Exons/genetics, Gene Expression Profiling, Genes/genetics, Genomics, Humans, Polyadenylation/genetics, Protein Isoforms/genetics, RNA/biosynthesis, RNA/genetics, RNA Editing/genetics, RNA Splicing/genetics, Nucleic Acid Repetitive Sequences/genetics, RNA Sequence Analysis
11.
Nature ; 471(7339): 473-9, 2011 Mar 24.
Article in English | MEDLINE | ID: mdl-21179090

ABSTRACT

Drosophila melanogaster is one of the most well studied genetic model organisms; nonetheless, its genome still contains unannotated coding and non-coding genes, transcripts, exons and RNA editing sites. Full discovery and annotation are pre-requisites for understanding how the regulation of transcription, splicing and RNA editing directs the development of this complex organism. Here we used RNA-Seq, tiling microarrays and cDNA sequencing to explore the transcriptome in 30 distinct developmental stages. We identified 111,195 new elements, including thousands of genes, coding and non-coding transcripts, exons, splicing and editing events, and inferred protein isoforms that previously eluded discovery using established experimental, prediction and conservation-based approaches. These data substantially expand the number of known transcribed elements in the Drosophila genome and provide a high-resolution view of transcriptome dynamics throughout development.


Subjects
Drosophila melanogaster/growth & development, Drosophila melanogaster/genetics, Gene Expression Profiling, Developmental Gene Expression Regulation/genetics, Genetic Transcription/genetics, Alternative Splicing/genetics, Animals, Base Sequence, Drosophila Proteins/genetics, Drosophila melanogaster/embryology, Exons/genetics, Female, Insect Genes/genetics, Insect Genome/genetics, Male, MicroRNAs/genetics, Oligonucleotide Array Sequence Analysis, Protein Isoforms/genetics, RNA Editing/genetics, Messenger RNA/analysis, Messenger RNA/genetics, Small Untranslated RNA/analysis, Small Untranslated RNA/genetics, Sequence Analysis, Sex Characteristics
12.
Proc Natl Acad Sci U S A ; 111(48): 17224-9, 2014 Dec 02.
Article in English | MEDLINE | ID: mdl-25413365

ABSTRACT

Although the similarities between humans and mice are typically highlighted, morphologically and genetically, there are many differences. To better understand these two species on a molecular level, we performed a comparison of the expression profiles of 15 tissues by deep RNA sequencing and examined the similarities and differences in the transcriptome for both protein-coding and -noncoding transcripts. Although commonalities are evident in the expression of tissue-specific genes between the two species, the expression for many sets of genes was found to be more similar in different tissues within the same species than between species. These findings were further corroborated by associated epigenetic histone mark analyses. We also find that many noncoding transcripts are expressed at a low level and are not detectable at appreciable levels across individuals. Moreover, the majority lack obvious sequence homologs between species, even when we restrict our attention to those which are most highly reproducible across biological replicates. Overall, our results indicate that there is considerable RNA expression diversity between humans and mice, well beyond what was described previously, likely reflecting the fundamental physiological differences between these two organisms.


Subjects
Intergenic DNA/genetics, Gene Expression Profiling/methods, Organ Specificity/genetics, Proteins/genetics, Animals, Epigenomics/methods, Molecular Evolution, High-Throughput Nucleotide Sequencing, Humans, Inbred C57BL Mice, RNA Sequence Analysis, Species Specificity, Transcriptome/genetics
13.
Vet Anaesth Analg ; 44(4): 727-737, 2017 Jul.
Article in English | MEDLINE | ID: mdl-28624496

ABSTRACT

OBJECTIVE: To determine the effect of fentanyl on the induction dose of propofol and minimum infusion rate required to prevent movement in response to noxious stimulation (MIRNM) in dogs. STUDY DESIGN: Crossover experimental design. ANIMALS: Six healthy, adult intact male Beagle dogs, mean±standard deviation 12.6±0.4 kg. METHODS: Dogs were administered 0.9% saline (treatment P), fentanyl (5 µg kg⁻¹) (treatment PLDF) or fentanyl (10 µg kg⁻¹) (treatment PHDF) intravenously over 5 minutes. Five minutes later, anesthesia was induced with propofol (2 mg kg⁻¹, followed by 1 mg kg⁻¹ every 15 seconds to achieve intubation) and maintained for 90 minutes by constant rate infusions (CRIs) of propofol alone or with fentanyl: P, propofol (0.5 mg kg⁻¹ minute⁻¹); PLDF, propofol (0.35 mg kg⁻¹ minute⁻¹) and fentanyl (0.1 µg kg⁻¹ minute⁻¹); PHDF, propofol (0.3 mg kg⁻¹ minute⁻¹) and fentanyl (0.2 µg kg⁻¹ minute⁻¹). Propofol CRI was increased or decreased based on the response to stimulation (50 V, 50 Hz, 10 mA), with 20 minutes between adjustments. Data were analyzed using a mixed-model ANOVA and presented as mean±standard error. RESULTS: Propofol induction doses were 6.16±0.31, 3.67±0.21 and 3.33±0.42 mg kg⁻¹ for P, PLDF and PHDF, respectively. Doses for PLDF and PHDF were significantly decreased from P (p<0.05) but not different between treatments. Propofol MIRNM was 0.60±0.04, 0.29±0.02 and 0.22±0.02 mg kg⁻¹ minute⁻¹ for P, PLDF and PHDF, respectively. MIRNM in PLDF and PHDF was significantly decreased from P. MIRNM in PLDF and PHDF were not different, but their respective percent decreases of 51±3 and 63±2% differed (p=0.035). CONCLUSIONS AND CLINICAL RELEVANCE: Fentanyl, at the doses studied, caused statistically significant and clinically important decreases in the propofol induction dose and MIRNM.


Subjects
Intravenous Anesthesia/veterinary, Intravenous Anesthetics, Fentanyl/pharmacology, Propofol, Intravenous Anesthesia/methods, Combined Anesthetics/administration & dosage, Combined Anesthetics/pharmacology, Intravenous Anesthetics/administration & dosage, Animals, Dogs, Intravenous Infusions/veterinary, Male, Movement/drug effects, Propofol/administration & dosage
14.
Genome Res ; 22(9): 1616-25, 2012 Sep.
Article in English | MEDLINE | ID: mdl-22955974

ABSTRACT

Splicing remains an incompletely understood process. Recent findings suggest that chromatin structure participates in its regulation. Here, we analyze the RNA from subcellular fractions obtained through RNA-seq in the cell line K562. We show that in the human genome, splicing occurs predominantly during transcription. We introduce the coSI measure, based on RNA-seq reads mapping to exon junctions and borders, to assess the degree of splicing completion around internal exons. We show that, as expected, splicing is almost fully completed in cytosolic polyA+ RNA. In chromatin-associated RNA (which includes the RNA that is being transcribed), for 5.6% of exons, the removal of the surrounding introns is fully completed, compared with 0.3% of exons for which no intron-removal has occurred. The remaining exons exist as a mixture of spliced and fewer unspliced molecules, with a median coSI of 0.75. Thus, most RNAs undergo splicing while being transcribed: "co-transcriptional splicing." Consistent with co-transcriptional spliceosome assembly and splicing, we have found significant enrichment of spliceosomal snRNAs in chromatin-associated RNA compared with other cellular RNA fractions and other nonspliceosomal snRNAs. CoSI scores decrease along the gene, pointing to a "first transcribed, first spliced" rule, yet more downstream exons carry other characteristics, favoring rapid, co-transcriptional intron removal. Exons with low coSI values, that is, in the process of being spliced, are enriched with chromatin marks, consistent with a role for chromatin in splicing during transcription. For alternative exons and long noncoding RNAs, splicing tends to occur later, and the latter might remain unspliced in some cases.


Subjects
Human Genome, RNA Splicing, Long Noncoding RNA/metabolism, Genetic Transcription, Chromatin/metabolism, Cluster Analysis, Computational Biology/methods, Exons, High-Throughput Nucleotide Sequencing, Humans, RNA/genetics, RNA/metabolism, RNA Sequence Analysis, Spliceosomes/genetics, Spliceosomes/metabolism, Subcellular Fractions/chemistry
15.
Genome Res ; 22(9): 1658-67, 2012 Sep.
Article in English | MEDLINE | ID: mdl-22955978

ABSTRACT

Statistical models have been used to quantify the relationship between gene expression and transcription factor (TF) binding signals. Here we apply the models to the large-scale data generated by the ENCODE project to study transcriptional regulation by TFs. Our results reveal a notable difference in the prediction accuracy of expression levels of transcription start sites (TSSs) captured by different technologies and RNA extraction protocols. In general, the expression levels of TSSs with high CpG content are more predictable than those with low CpG content. For genes with alternative TSSs, the expression levels of downstream TSSs are more predictable than those of the upstream ones. Different TF categories and specific TFs vary substantially in their contributions to predicting expression. Between two cell lines, the differential expression of TSS can be precisely reflected by the difference of TF-binding signals in a quantitative manner, arguing against the conventional on-and-off model of TF binding. Finally, we explore the relationships between TF-binding signals and other chromatin features such as histone modifications and DNase hypersensitivity for determining expression. The models imply that these features regulate transcription in a highly coordinated manner.


Subjects
Gene Expression Regulation, Genomics, Transcription Factors/metabolism, Genetic Transcription, Base Composition, Binding Sites/genetics, Cell Line, Chromatin/genetics, Chromatin/metabolism, Computational Biology/methods, Histones/genetics, Humans, Biological Models, Genetic Promoter Regions, Protein Binding/genetics, Transcription Initiation Site
16.
Genome Res ; 22(9): 1775-89, 2012 Sep.
Article in English | MEDLINE | ID: mdl-22955988

ABSTRACT

The human genome contains many thousands of long noncoding RNAs (lncRNAs). While several studies have demonstrated compelling biological and disease roles for individual examples, analytical and experimental approaches to investigate these genes have been hampered by the lack of comprehensive lncRNA annotation. Here, we present and analyze the most complete human lncRNA annotation to date, produced by the GENCODE consortium within the framework of the ENCODE project and comprising 9277 manually annotated genes producing 14,880 transcripts. Our analyses indicate that lncRNAs are generated through pathways similar to that of protein-coding genes, with similar histone-modification profiles, splicing signals, and exon/intron lengths. In contrast to protein-coding genes, however, lncRNAs display a striking bias toward two-exon transcripts, they are predominantly localized in the chromatin and nucleus, and a fraction appear to be preferentially processed into small RNAs. They are under stronger selective pressure than neutrally evolving sequences, particularly in their promoter regions, which display levels of selection comparable to protein-coding genes. Importantly, about one-third seem to have arisen within the primate lineage. Comprehensive analysis of their expression in multiple human organs and brain regions shows that lncRNAs are generally lower expressed than protein-coding genes, and display more tissue-specific expression patterns, with a large fraction of tissue-specific lncRNAs expressed in the brain. Expression correlation analysis indicates that lncRNAs show particularly striking positive correlation with the expression of antisense coding genes. This GENCODE annotation represents a valuable resource for future studies of lncRNAs.


Subjects
Genetic Databases, Long Noncoding RNA/genetics, Alternative Splicing, Animals, Cell Nucleus/genetics, Cell Nucleus/metabolism, Cluster Analysis, Molecular Evolution, Exons, Gene Expression Profiling, Gene Expression Regulation, Histones/metabolism, Humans, Molecular Sequence Annotation, Open Reading Frames, Organ Specificity/genetics, Primates/genetics, Post-Transcriptional RNA Processing, RNA Splice Sites, Messenger RNA/genetics, Genetic Selection, Genetic Transcription
17.
Genome Res ; 21(9): 1543-51, 2011 Sep.
Article in English | MEDLINE | ID: mdl-21816910

ABSTRACT

High-throughput sequencing of cDNA (RNA-seq) is a widely deployed transcriptome profiling and annotation technique, but questions about the performance of different protocols and platforms remain. We used a newly developed pool of 96 synthetic RNAs with various lengths, and GC content covering a 2^20 concentration range as spike-in controls to measure sensitivity, accuracy, and biases in RNA-seq experiments as well as to derive standard curves for quantifying the abundance of transcripts. We observed linearity between read density and RNA input over the entire detection range and excellent agreement between replicates, but we observed significantly larger imprecision than expected under pure Poisson sampling errors. We use the control RNAs to directly measure reproducible protocol-dependent biases due to GC content and transcript length as well as stereotypic heterogeneity in coverage across transcripts correlated with position relative to RNA termini and priming sequence bias. These effects lead to biased quantification for short transcripts and individual exons, which is a serious problem for measurements of isoform abundances, but that can partially be corrected using appropriate models of bias. By using the control RNAs, we derive limits for the discovery and detection of rare transcripts in RNA-seq experiments. By using data collected as part of the model organism and human Encyclopedia of DNA Elements projects (ENCODE and modENCODE), we demonstrate that external RNA controls are a useful resource for evaluating sensitivity and accuracy of RNA-seq experiments for transcriptome discovery and quantification. These quality metrics facilitate comparable analysis across different samples, protocols, and platforms.


Subjects
RNA/chemistry, RNA Sequence Analysis/standards, Animals, Bias, Gene Expression Profiling, Gene Library, High-Throughput Nucleotide Sequencing/standards, Humans, Quality Control, Reproducibility of Results, Sensitivity and Specificity
18.
Genome Res ; 21(2): 301-14, 2011 Feb.
Article in English | MEDLINE | ID: mdl-21177962

ABSTRACT

Drosophila melanogaster cell lines are important resources for cell biologists. Here, we catalog the expression of exons, genes, and unannotated transcriptional signals for 25 lines. Unannotated transcription is substantial (typically 19% of euchromatic signal). Conservatively, we identify 1405 novel transcribed regions; 684 of these appear to be new exons of neighboring, often distant, genes. Sixty-four percent of genes are expressed detectably in at least one line, but only 21% are detected in all lines. Each cell line expresses, on average, 5885 genes, including a common set of 3109. Expression levels vary over several orders of magnitude. Major signaling pathways are well represented: most differentiation pathways are "off" and survival/growth pathways "on." Roughly 50% of the genes expressed by each line are not part of the common set, and these show considerable individuality. Thirty-one percent are expressed at a higher level in at least one cell line than in any single developmental stage, suggesting that each line is enriched for genes characteristic of small sets of cells. Most remarkable is that imaginal disc-derived lines can generally be assigned, on the basis of expression, to small territories within developing discs. These mappings reveal unexpected stability of even fine-grained spatial determination. No two cell lines show identical transcription factor expression. We conclude that each line has retained features of an individual founder cell superimposed on a common "cell line" gene expression pattern.


Subjects
Drosophila melanogaster/genetics, Genetic Variation, Genetic Transcription, Animals, Cell Line, Cluster Analysis, Exons, Female, Gene Expression Profiling, Male, Molecular Sequence Data, Signal Transduction/genetics, Transcription Factors/genetics
19.
Bioinformatics ; 29(1): 15-21, 2013 Jan 01.
Article in English | MEDLINE | ID: mdl-23104886

ABSTRACT

MOTIVATION: Accurate alignment of high-throughput RNA-seq data is a challenging and yet unsolved problem because of the non-contiguous transcript structure, relatively short read lengths and constantly increasing throughput of the sequencing technologies. Currently available RNA-seq aligners suffer from high mapping error rates, low mapping speed, read length limitation and mapping biases. RESULTS: To align our large (>80 billion reads) ENCODE Transcriptome RNA-seq dataset, we developed the Spliced Transcripts Alignment to a Reference (STAR) software based on a previously undescribed RNA-seq alignment algorithm that uses sequential maximum mappable seed search in uncompressed suffix arrays followed by seed clustering and stitching procedure. STAR outperforms other aligners by a factor of >50 in mapping speed, aligning to the human genome 550 million 2 × 76 bp paired-end reads per hour on a modest 12-core server, while at the same time improving alignment sensitivity and precision. In addition to unbiased de novo detection of canonical junctions, STAR can discover non-canonical splices and chimeric (fusion) transcripts, and is also capable of mapping full-length RNA sequences. Using Roche 454 sequencing of reverse transcription polymerase chain reaction amplicons, we experimentally validated 1960 novel intergenic splice junctions with an 80-90% success rate, corroborating the high precision of the STAR mapping strategy. AVAILABILITY AND IMPLEMENTATION: STAR is implemented as a standalone C++ code. STAR is free open source software distributed under GPLv3 license and can be downloaded from http://code.google.com/p/rna-star/.


Subjects
Sequence Alignment/methods, Software, Algorithms, Cluster Analysis, Gene Expression Profiling, Human Genome, Humans, RNA Splicing, RNA Sequence Analysis/methods
20.
Nat Methods ; 7(7): 528-34, 2010 Jul.
Article in English | MEDLINE | ID: mdl-20543846

ABSTRACT

Large-scale sequencing projects have revealed an unexpected complexity in the origins, structures and functions of mammalian transcripts. Many loci are known to produce overlapping coding and noncoding RNAs with capped 5' ends that vary in size. Methods to identify the 5' ends of transcripts will facilitate the discovery of new promoters and 5' ends derived from secondary capping events. Such methods often require high input amounts of RNA not obtainable from highly refined samples such as tissue microdissections and subcellular fractions. Therefore, we developed nano-cap analysis of gene expression (nanoCAGE), a method that captures the 5' ends of transcripts from as little as 10 ng of total RNA, and CAGEscan, a mate-pair adaptation of nanoCAGE that captures the transcript 5' ends linked to a downstream region. Both of these methods allow further annotation-agnostic studies of the complex human transcriptome.


Subjects
Gene Expression Profiling, Gene Expression Regulation/physiology, Nanotechnology/methods, Genetic Promoter Regions/physiology, RNA/metabolism, Human Genome, Humans, RNA/genetics