Search | Biblioteca Virtual em Saúde Fiocruz (Fiocruz Virtual Health Library)

1.

Expanded encyclopaedias of DNA elements in the human and mouse genomes.

Moore, Jill E; Purcaro, Michael J; Pratt, Henry E; Epstein, Charles B; Shoresh, Noam; Adrian, Jessika; Kawli, Trupti; Davis, Carrie A; Dobin, Alexander; Kaul, Rajinder; Halow, Jessica; Van Nostrand, Eric L; Freese, Peter; Gorkin, David U; Shen, Yin; He, Yupeng; Mackiewicz, Mark; Pauli-Behn, Florencia; Williams, Brian A; Mortazavi, Ali; Keller, Cheryl A; Zhang, Xiao-Ou; Elhajjajy, Shaimae I; Huey, Jack; Dickel, Diane E; Snetkova, Valentina; Wei, Xintao; Wang, Xiaofeng; Rivera-Mulia, Juan Carlos; Rozowsky, Joel; Zhang, Jing; Chhetri, Surya B; Zhang, Jialing; Victorsen, Alec; White, Kevin P; Visel, Axel; Yeo, Gene W; Burge, Christopher B; Lécuyer, Eric; Gilbert, David M; Dekker, Job; Rinn, John; Mendenhall, Eric M; Ecker, Joseph R; Kellis, Manolis; Klein, Robert J; Noble, William S; Kundaje, Anshul; Guigó, Roderic; Farnham, Peggy J.

Nature ; 583(7818): 699-710, 2020 07.

Article in English | MEDLINE | ID: mdl-32728249

ABSTRACT

The human and mouse genomes contain instructions that specify RNAs and proteins and govern the timing, magnitude, and cellular context of their production. To better delineate these elements, phase III of the Encyclopedia of DNA Elements (ENCODE) Project has expanded analysis of the cell and tissue repertoires of RNA transcription, chromatin structure and modification, DNA methylation, chromatin looping, and occupancy by transcription factors and RNA-binding proteins. Here we summarize these efforts, which have produced 5,992 new experimental datasets, including systematic determinations across mouse fetal development. All data are available through the ENCODE data portal (https://www.encodeproject.org), including phase II ENCODE and Roadmap Epigenomics data. We have developed a registry of 926,535 human and 339,815 mouse candidate cis-regulatory elements, covering 7.9% and 3.4% of their respective genomes, by integrating selected datatypes associated with gene regulation, and constructed a web-based server (SCREEN; http://screen.encodeproject.org) to provide flexible, user-defined access to this resource. Collectively, the ENCODE data and registry provide an expansive resource for the scientific community to build a better understanding of the organization and function of the human and mouse genomes.

Subjects

DNA/genetics , Genetic Databases , Genome/genetics , Genomics , Molecular Sequence Annotation , Registries , Nucleic Acid Regulatory Sequences/genetics , Animals , Chromatin/genetics , Chromatin/metabolism , DNA/chemistry , DNA Footprinting , DNA Methylation/genetics , DNA Replication Timing , Deoxyribonuclease I/metabolism , Human Genome , Histones/metabolism , Humans , Mice , Transgenic Mice , RNA-Binding Proteins/genetics , Genetic Transcription/genetics , Transposases/metabolism

2.

Myo-differentiation reporter screen reveals NF-Y as an activator of PAX3-FOXO1 in rhabdomyosarcoma.

Sroka, Martyna W; Skopelitis, Damianos; Vermunt, Marit W; Preall, Jonathan B; El Demerdash, Osama; de Almeida, Larissa M N; Chang, Kenneth; Utama, Raditya; Gryder, Berkley; Caligiuri, Giuseppina; Ren, Diqiu; Nalbant, Benan; Milazzo, Joseph P; Tuveson, David A; Dobin, Alexander; Hiebert, Scott W; Stengel, Kristy R; Mantovani, Roberto; Khan, Javed; Kohli, Rahul M; Shi, Junwei; Blobel, Gerd A; Vakoc, Christopher R.

Proc Natl Acad Sci U S A ; 120(36): e2303859120, 2023 09 05.

Article in English | MEDLINE | ID: mdl-37639593

ABSTRACT

Recurrent chromosomal rearrangements found in rhabdomyosarcoma (RMS) produce the PAX3-FOXO1 fusion protein, which is an oncogenic driver and a dependency in this disease. One important function of PAX3-FOXO1 is to arrest myogenic differentiation, which is linked to the ability of RMS cells to gain an unlimited proliferation potential. Here, we developed a phenotypic screening strategy for identifying factors that collaborate with PAX3-FOXO1 to block myo-differentiation in RMS. Unlike most genes evaluated in our screen, we found that loss of any of the three subunits of the Nuclear Factor Y (NF-Y) complex leads to a myo-differentiation phenotype that resembles the effect of inactivating PAX3-FOXO1. While the transcriptomes of NF-Y- and PAX3-FOXO1-deficient RMS cells bear remarkable similarity to one another, we found that these two transcription factors occupy nonoverlapping sites along the genome: NF-Y preferentially occupies promoters, whereas PAX3-FOXO1 primarily binds to distal enhancers. By integrating multiple functional approaches, we map the PAX3 promoter as the point of intersection between these two regulators. We show that NF-Y occupies CCAAT motifs present upstream of PAX3 to function as a transcriptional activator of PAX3-FOXO1 expression in RMS. These findings reveal a critical upstream role of NF-Y in the oncogenic PAX3-FOXO1 pathway, highlighting how a broadly essential transcription factor can perform tumor-specific roles in governing cellular state.

Subjects

Rhabdomyosarcoma , CCAAT-Binding Factor/genetics , Cell Differentiation/genetics , Chromosome Aberrations , Rhabdomyosarcoma/genetics , Transcription Factors

3.

Pan-human consensus genome significantly improves the accuracy of RNA-seq analyses.

Kaminow, Benjamin; Ballouz, Sara; Gillis, Jesse; Dobin, Alexander.

Genome Res ; 32(4): 738-749, 2022 04.

Article in English | MEDLINE | ID: mdl-35256454

ABSTRACT

The Human Reference Genome serves as the foundation for modern genomic analyses. However, in its present form, it does not adequately represent the vast genetic diversity of the human population. In this study, we explored the consensus genome as a potential successor of the current reference genome and assessed its effect on the accuracy of RNA-seq read alignment. To find the best haploid genome representation, we constructed consensus genomes at the pan-human, superpopulation, and population levels, using variant information from The 1000 Genomes Project Consortium. Using personal haploid genomes as the ground truth, we compared mapping errors for real RNA-seq reads aligned to the consensus genomes versus the reference genome. For reads overlapping homozygous variants, we found that the mapping error decreased by a factor of approximately two to three when the reference was replaced with the pan-human consensus genome. We also found that using more population-specific consensuses resulted in little to no increase over using the pan-human consensus, suggesting a limit in the utility of incorporating a more specific genomic variation. Replacing the reference with consensus genomes impacts functional analyses, such as differential expressions of isoforms, genes, and splice junctions.
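The idea of a consensus genome described in this abstract can be illustrated with a minimal Python sketch. It assumes, for illustration only, that the consensus is formed by substituting the alternate allele wherever its population frequency exceeds 0.5, and it handles single-nucleotide variants only. The function name `build_consensus`, the tuple layout, and the toy sequence and frequencies are hypothetical and are not taken from the paper.

```python
def build_consensus(reference: str, variants: list[tuple[int, str, str, float]]) -> str:
    """Return a haploid consensus sequence (illustrative sketch only).

    Each variant is (0-based position, ref allele, alt allele, alt allele frequency).
    Assumption: the alt allele replaces the reference base when its frequency > 0.5.
    """
    seq = list(reference)
    for pos, ref, alt, alt_freq in variants:
        if len(ref) != 1 or len(alt) != 1:
            continue  # indels would shift coordinates; they are skipped in this sketch
        if seq[pos] != ref:
            raise ValueError(f"reference mismatch at position {pos}")
        if alt_freq > 0.5:
            seq[pos] = alt  # the major allele in the population wins
    return "".join(seq)


# Toy data (hypothetical): three SNVs, two of which are major alleles.
reference = "ACGTACGT"
variants = [(1, "C", "T", 0.82), (4, "A", "G", 0.10), (6, "G", "A", 0.51)]
consensus = build_consensus(reference, variants)
print(consensus)  # ATGTACAT
```

Restricting the frequency calculation to one superpopulation or population would give the more specific consensuses that the abstract compares against the pan-human one.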

Subjects

Human Genome , Genomics , Consensus , Genomics/methods , Humans , RNA-Seq , Exome Sequencing

4.

A mucus production programme promotes classical pancreatic ductal adenocarcinoma.

Tonelli, Claudia; Yordanov, Georgi N; Hao, Yuan; Deschênes, Astrid; Hinds, Juliene; Belleau, Pascal; Klingbeil, Olaf; Brosnan, Erin; Doshi, Abhishek; Park, Youngkyu; Hruban, Ralph H; Vakoc, Christopher R; Dobin, Alexander; Preall, Jonathan; Tuveson, David A.

Gut ; 73(6): 941-954, 2024 May 10.

Article in English | MEDLINE | ID: mdl-38262672

ABSTRACT

OBJECTIVE: The optimal therapeutic response in cancer patients is highly dependent upon the differentiation state of their tumours. Pancreatic ductal adenocarcinoma (PDA) is a lethal cancer that harbours distinct phenotypic subtypes with preferential sensitivities to standard therapies. This study aimed to investigate intratumour heterogeneity and plasticity of cancer cell states in PDA in order to reveal cell state-specific regulators. DESIGN: We analysed single-cell expression profiling of mouse PDAs, revealing intratumour heterogeneity and cell plasticity, and identified pathways activated in the different cell states. We performed comparative analysis of murine and human expression states and confirmed their phenotypic diversity in specimens by immunolabeling. We assessed the function of phenotypic regulators using mouse models of PDA, organoids, cell lines and orthotopically grafted tumour models. RESULTS: Our expression analysis and immunolabeling analysis show that a mucus production programme regulated by the transcription factor SPDEF is highly active in precancerous lesions and the classical subtype of PDA - the most common differentiation state. SPDEF maintains the classical differentiation and supports PDA transformation in vivo. The SPDEF tumour-promoting function is mediated by its target genes AGR2 and ERN2/IRE1β that regulate mucus production, and inactivation of the SPDEF programme impairs tumour growth and facilitates subtype interconversion from classical towards basal-like differentiation. CONCLUSIONS: Our findings expand our understanding of the transcriptional programmes active in precancerous lesions and PDAs of classical differentiation, determine the regulators of mucus production as specific vulnerabilities in these cell states and reveal phenotype switching as a response mechanism to inactivation of differentiation state determinants.

Subjects

Pancreatic Ductal Carcinoma , Pancreatic Neoplasms , Pancreatic Ductal Carcinoma/pathology , Pancreatic Ductal Carcinoma/genetics , Pancreatic Ductal Carcinoma/metabolism , Animals , Pancreatic Neoplasms/pathology , Pancreatic Neoplasms/genetics , Pancreatic Neoplasms/metabolism , Mice , Humans , Mucus/metabolism , Mucoproteins/metabolism , Mucoproteins/genetics , Tumor Cell Line , Cell Differentiation , Protein Serine-Threonine Kinases/metabolism , Protein Serine-Threonine Kinases/genetics , Proteins/metabolism , Proteins/genetics , Organoids/pathology , Organoids/metabolism , Cell Plasticity , Neoplastic Gene Expression Regulation , Animal Disease Models , Oncogene Proteins

5.

The long noncoding RNA ROCKI regulates inflammatory gene expression.

Zhang, Qiong; Chao, Ti-Chun; Patil, Veena S; Qin, Yue; Tiwari, Shashi Kant; Chiou, Joshua; Dobin, Alexander; Tsai, Chih-Ming; Li, Zhonghan; Dang, Jason; Gupta, Shagun; Urdahl, Kevin; Nizet, Victor; Gingeras, Thomas R; Gaulton, Kyle J; Rana, Tariq M.

EMBO J ; 38(8), 2019 04 15.

Article in English | MEDLINE | ID: mdl-30918008

ABSTRACT

Long noncoding RNAs (lncRNAs) can regulate target gene expression by acting in cis (locally) or in trans (non-locally). Here, we performed genome-wide expression analysis of Toll-like receptor (TLR)-stimulated human macrophages to identify pairs of cis-acting lncRNAs and protein-coding genes involved in innate immunity. A total of 229 gene pairs were identified, many of which were commonly regulated by signaling through multiple TLRs and were involved in the cytokine responses to infection by group B Streptococcus. We focused on elucidating the function of one lncRNA, named lnc-MARCKS or ROCKI (Regulator of Cytokines and Inflammation), which was induced by multiple TLR stimuli and acted as a master regulator of inflammatory responses. ROCKI interacted with APEX1 (apurinic/apyrimidinic endodeoxyribonuclease 1) to form a ribonucleoprotein complex at the MARCKS promoter. In turn, ROCKI-APEX1 recruited the histone deacetylase HDAC1, which removed the H3K27ac modification from the promoter, thus reducing MARCKS transcription and subsequent Ca2+ signaling and inflammatory gene expression. Finally, genetic variants affecting ROCKI expression were linked to a reduced risk of certain inflammatory and infectious diseases in humans, including inflammatory bowel disease and tuberculosis. Collectively, these data highlight the importance of cis-acting lncRNAs in TLR signaling, innate immunity, and pathophysiological inflammation.

Subjects

Gene Expression Regulation , Innate Immunity/immunology , Inflammation/immunology , Macrophages/immunology , Long Noncoding RNA/metabolism , Streptococcal Infections/microbiology , Toll-Like Receptors/metabolism , Cultured Cells , Cytokines/metabolism , DNA-(Apurinic or Apyrimidinic Site) Lyase/genetics , DNA-(Apurinic or Apyrimidinic Site) Lyase/metabolism , Human Genome , Histone Deacetylase 1/genetics , Histone Deacetylase 1/metabolism , Humans , Inflammation/genetics , Inflammation/microbiology , Macrophages/metabolism , Macrophages/microbiology , Myristoylated Alanine-Rich C Kinase Substrate/genetics , Myristoylated Alanine-Rich C Kinase Substrate/metabolism , Genetic Promoter Regions , Long Noncoding RNA/genetics , Streptococcal Infections/immunology , Streptococcus agalactiae/isolation & purification , Toll-Like Receptors/genetics

6.

A limited set of transcriptional programs define major cell types.

Breschi, Alessandra; Muñoz-Aguirre, Manuel; Wucher, Valentin; Davis, Carrie A; Garrido-Martín, Diego; Djebali, Sarah; Gillis, Jesse; Pervouchine, Dmitri D; Vlasova, Anna; Dobin, Alexander; Zaleski, Chris; Drenkow, Jorg; Danyko, Cassidy; Scavelli, Alexandra; Reverter, Ferran; Snyder, Michael P; Gingeras, Thomas R; Guigó, Roderic.

Genome Res ; 30(7): 1047-1059, 2020 07.

Article in English | MEDLINE | ID: mdl-32759341

ABSTRACT

We have produced RNA sequencing data for 53 primary cells from different locations in the human body. The clustering of these primary cells reveals that most cells in the human body share a few broad transcriptional programs, which define five major cell types: epithelial, endothelial, mesenchymal, neural, and blood cells. These act as basic components of many tissues and organs. Based on gene expression, these cell types redefine the basic histological types by which tissues have been traditionally classified. We identified genes whose expression is specific to these cell types, and from these genes, we estimated the contribution of the major cell types to the composition of human tissues. We found this cellular composition to be a characteristic signature of tissues and to reflect tissue morphological heterogeneity and histology. We identified changes in cellular composition in different tissues associated with age and sex, and found that departures from the normal cellular composition correlate with histological phenotypes associated with disease.

Subjects

Genetic Transcription , Cell Line , Endothelial Cells/metabolism , Epithelial Cells/metabolism , Female , Gene Expression Profiling , Gynecomastia/genetics , Gynecomastia/metabolism , Humans , Male , Mesoderm/cytology , Mesoderm/metabolism , Neoplasms/genetics , Organ Specificity , RNA Sequence Analysis

7.

Author Correction: Expanded encyclopaedias of DNA elements in the human and mouse genomes.

Moore, Jill E; Purcaro, Michael J; Pratt, Henry E; Epstein, Charles B; Shoresh, Noam; Adrian, Jessika; Kawli, Trupti; Davis, Carrie A; Dobin, Alexander; Kaul, Rajinder; Halow, Jessica; Van Nostrand, Eric L; Freese, Peter; Gorkin, David U; Shen, Yin; He, Yupeng; Mackiewicz, Mark; Pauli-Behn, Florencia; Williams, Brian A; Mortazavi, Ali; Keller, Cheryl A; Zhang, Xiao-Ou; Elhajjajy, Shaimae I; Huey, Jack; Dickel, Diane E; Snetkova, Valentina; Wei, Xintao; Wang, Xiaofeng; Rivera-Mulia, Juan Carlos; Rozowsky, Joel; Zhang, Jing; Chhetri, Surya B; Zhang, Jialing; Victorsen, Alec; White, Kevin P; Visel, Axel; Yeo, Gene W; Burge, Christopher B; Lécuyer, Eric; Gilbert, David M; Dekker, Job; Rinn, John; Mendenhall, Eric M; Ecker, Joseph R; Kellis, Manolis; Klein, Robert J; Noble, William S; Kundaje, Anshul; Guigó, Roderic; Farnham, Peggy J.

Nature ; 605(7909): E3, 2022 May.

Article in English | MEDLINE | ID: mdl-35474001

8.

The fractured landscape of RNA-seq alignment: the default in our STARs.

Ballouz, Sara; Dobin, Alexander; Gingeras, Thomas R; Gillis, Jesse.

Nucleic Acids Res ; 46(10): 5125-5138, 2018 06 01.

Article in English | MEDLINE | ID: mdl-29718481

ABSTRACT

Many tools are available for RNA-seq alignment and expression quantification, with comparative value being hard to establish. Benchmarking assessments often highlight methods' good performance, but are focused on either model data or fail to explain variation in performance. This leaves us to ask, what is the most meaningful way to assess different alignment choices? And importantly, where is there room for progress? In this work, we explore the answers to these two questions by performing an exhaustive assessment of the STAR aligner. We assess STAR's performance across a range of alignment parameters using common metrics, and then on biologically focused tasks. We find technical metrics such as fraction mapping or expression profile correlation to be uninformative, capturing properties unlikely to have any role in biological discovery. Surprisingly, we find that changes in alignment parameters within a wide range have little impact on both technical and biological performance. Yet, when performance finally does break, it happens in difficult regions, such as X-Y paralogs and MHC genes. We believe improved reporting by developers will help establish where results are likely to be robust or fragile, providing a better baseline to establish where methodological progress can still occur.

Subjects

Gene Expression , Sequence Alignment/methods , RNA Sequence Analysis/methods , Software , Algorithms , Human Y Chromosomes , Genetic Databases , Female , Humans , Male , Sex Factors

9.

Comparison of the transcriptional landscapes between human and mouse tissues.

Lin, Shin; Lin, Yiing; Nery, Joseph R; Urich, Mark A; Breschi, Alessandra; Davis, Carrie A; Dobin, Alexander; Zaleski, Christopher; Beer, Michael A; Chapman, William C; Gingeras, Thomas R; Ecker, Joseph R; Snyder, Michael P.

Proc Natl Acad Sci U S A ; 111(48): 17224-9, 2014 Dec 02.

Article in English | MEDLINE | ID: mdl-25413365

ABSTRACT

Although the similarities between humans and mice are typically highlighted, morphologically and genetically, there are many differences. To better understand these two species on a molecular level, we performed a comparison of the expression profiles of 15 tissues by deep RNA sequencing and examined the similarities and differences in the transcriptome for both protein-coding and -noncoding transcripts. Although commonalities are evident in the expression of tissue-specific genes between the two species, the expression for many sets of genes was found to be more similar in different tissues within the same species than between species. These findings were further corroborated by associated epigenetic histone mark analyses. We also find that many noncoding transcripts are expressed at a low level and are not detectable at appreciable levels across individuals. Moreover, the majority lack obvious sequence homologs between species, even when we restrict our attention to those which are most highly reproducible across biological replicates. Overall, our results indicate that there is considerable RNA expression diversity between humans and mice, well beyond what was described previously, likely reflecting the fundamental physiological differences between these two organisms.

Subjects

Intergenic DNA/genetics , Gene Expression Profiling/methods , Organ Specificity/genetics , Proteins/genetics , Animals , Epigenomics/methods , Molecular Evolution , High-Throughput Nucleotide Sequencing , Humans , Inbred C57BL Mice , RNA Sequence Analysis , Species Specificity , Transcriptome/genetics

10.

High-fidelity promoter profiling reveals widespread alternative promoter usage and transposon-driven developmental gene expression.

Batut, Philippe; Dobin, Alexander; Plessy, Charles; Carninci, Piero; Gingeras, Thomas R.

Genome Res ; 23(1): 169-80, 2013 Jan.

Article in English | MEDLINE | ID: mdl-22936248

ABSTRACT

Many eukaryotic genes possess multiple alternative promoters with distinct expression specificities. Therefore, comprehensively annotating promoters and deciphering their individual regulatory dynamics is critical for gene expression profiling applications and for our understanding of regulatory complexity. We introduce RAMPAGE, a novel promoter activity profiling approach that combines extremely specific 5'-complete cDNA sequencing with an integrated data analysis workflow, to address the limitations of current techniques. RAMPAGE features a streamlined protocol for fast and easy generation of highly multiplexed sequencing libraries, offers very high transcription start site specificity, generates accurate and reproducible promoter expression measurements, and yields extensive transcript connectivity information through paired-end cDNA sequencing. We used RAMPAGE in a genome-wide study of promoter activity throughout 36 stages of the life cycle of Drosophila melanogaster, and describe here a comprehensive data set that represents the first available developmental time-course of promoter usage. We found that >40% of developmentally expressed genes have at least two promoters and that alternative promoters generally implement distinct regulatory programs. Transposable elements, long proposed to play a central role in the evolution of their host genomes through their ability to regulate gene expression, contribute at least 1300 promoters shaping the developmental transcriptome of D. melanogaster. Hundreds of these promoters drive the expression of annotated genes, and transposons often impart their own expression specificity upon the genes they regulate. These observations provide support for the theory that transposons may drive regulatory innovation through the distribution of stereotyped cis-regulatory modules throughout their host genomes.

Subjects

DNA Transposable Elements , Developmental Gene Expression Regulation , Genetic Promoter Regions , Animals , Drosophila melanogaster/genetics , Drosophila melanogaster/growth & development , Gene Library , Developmental Genes , Insect Genes , Life Cycle Stages/genetics , DNA Sequence Analysis/methods , Transcription Initiation Site , Genetic Transcription , Transcriptome

11.

The transcriptional diversity of 25 Drosophila cell lines.

Cherbas, Lucy; Willingham, Aarron; Zhang, Dayu; Yang, Li; Zou, Yi; Eads, Brian D; Carlson, Joseph W; Landolin, Jane M; Kapranov, Philipp; Dumais, Jacqueline; Samsonova, Anastasia; Choi, Jeong-Hyeon; Roberts, Johnny; Davis, Carrie A; Tang, Haixu; van Baren, Marijke J; Ghosh, Srinka; Dobin, Alexander; Bell, Kim; Lin, Wei; Langton, Laura; Duff, Michael O; Tenney, Aaron E; Zaleski, Chris; Brent, Michael R; Hoskins, Roger A; Kaufman, Thomas C; Andrews, Justen; Graveley, Brenton R; Perrimon, Norbert; Celniker, Susan E; Gingeras, Thomas R; Cherbas, Peter.

Genome Res ; 21(2): 301-14, 2011 Feb.

Article in English | MEDLINE | ID: mdl-21177962

ABSTRACT

Drosophila melanogaster cell lines are important resources for cell biologists. Here, we catalog the expression of exons, genes, and unannotated transcriptional signals for 25 lines. Unannotated transcription is substantial (typically 19% of euchromatic signal). Conservatively, we identify 1405 novel transcribed regions; 684 of these appear to be new exons of neighboring, often distant, genes. Sixty-four percent of genes are expressed detectably in at least one line, but only 21% are detected in all lines. Each cell line expresses, on average, 5885 genes, including a common set of 3109. Expression levels vary over several orders of magnitude. Major signaling pathways are well represented: most differentiation pathways are "off" and survival/growth pathways "on." Roughly 50% of the genes expressed by each line are not part of the common set, and these show considerable individuality. Thirty-one percent are expressed at a higher level in at least one cell line than in any single developmental stage, suggesting that each line is enriched for genes characteristic of small sets of cells. Most remarkable is that imaginal disc-derived lines can generally be assigned, on the basis of expression, to small territories within developing discs. These mappings reveal unexpected stability of even fine-grained spatial determination. No two cell lines show identical transcription factor expression. We conclude that each line has retained features of an individual founder cell superimposed on a common "cell line" gene expression pattern.

Subjects

Drosophila melanogaster/genetics , Genetic Variation , Genetic Transcription , Animals , Cell Line , Cluster Analysis , Exons , Female , Gene Expression Profiling , Male , Molecular Sequence Data , Signal Transduction/genetics , Transcription Factors/genetics

12.

STAR: ultrafast universal RNA-seq aligner.

Dobin, Alexander; Davis, Carrie A; Schlesinger, Felix; Drenkow, Jorg; Zaleski, Chris; Jha, Sonali; Batut, Philippe; Chaisson, Mark; Gingeras, Thomas R.

Bioinformatics ; 29(1): 15-21, 2013 Jan 01.

Article in English | MEDLINE | ID: mdl-23104886

ABSTRACT

MOTIVATION: Accurate alignment of high-throughput RNA-seq data is a challenging and yet unsolved problem because of the non-contiguous transcript structure, relatively short read lengths and constantly increasing throughput of the sequencing technologies. Currently available RNA-seq aligners suffer from high mapping error rates, low mapping speed, read length limitation and mapping biases. RESULTS: To align our large (>80 billion reads) ENCODE Transcriptome RNA-seq dataset, we developed the Spliced Transcripts Alignment to a Reference (STAR) software based on a previously undescribed RNA-seq alignment algorithm that uses sequential maximum mappable seed search in uncompressed suffix arrays followed by a seed clustering and stitching procedure. STAR outperforms other aligners by a factor of >50 in mapping speed, aligning to the human genome 550 million 2 × 76 bp paired-end reads per hour on a modest 12-core server, while at the same time improving alignment sensitivity and precision. In addition to unbiased de novo detection of canonical junctions, STAR can discover non-canonical splices and chimeric (fusion) transcripts, and is also capable of mapping full-length RNA sequences. Using Roche 454 sequencing of reverse transcription polymerase chain reaction amplicons, we experimentally validated 1960 novel intergenic splice junctions with an 80-90% success rate, corroborating the high precision of the STAR mapping strategy. AVAILABILITY AND IMPLEMENTATION: STAR is implemented as a standalone C++ code. STAR is free open source software distributed under GPLv3 license and can be downloaded from http://code.google.com/p/rna-star/.
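The "sequential maximum mappable seed search" mentioned in this abstract can be sketched in a few lines of Python. This is a toy illustration, not STAR's implementation: it builds a naive suffix array by sorting suffixes, finds the longest prefix of the read that occurs exactly in the genome, then repeats the search on the unmapped remainder of the read. Mismatches, clustering, stitching and scoring are all omitted, and the names `build_index`, `longest_prefix_match` and `seed_search`, as well as the toy genome and read, are hypothetical.

```python
import bisect


def common_prefix_len(a: str, b: str) -> int:
    """Length of the shared prefix of two strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def build_index(genome: str) -> tuple[list[int], list[str]]:
    """Naive suffix array plus materialised suffixes (fine for toy inputs only)."""
    sa = sorted(range(len(genome)), key=lambda i: genome[i:])
    return sa, [genome[i:] for i in sa]


def longest_prefix_match(sa: list[int], suffixes: list[str], query: str) -> tuple[int, list[int]]:
    """Longest prefix of `query` found in the genome, with all its start positions."""
    idx = bisect.bisect_left(suffixes, query)
    # In sorted order, the best match is always adjacent to the insertion point.
    best = 0
    for j in (idx - 1, idx):
        if 0 <= j < len(suffixes):
            best = max(best, common_prefix_len(suffixes[j], query))
    if best == 0:
        return 0, []
    positions = []
    j = idx - 1
    while j >= 0 and common_prefix_len(suffixes[j], query) >= best:
        positions.append(sa[j])
        j -= 1
    j = idx
    while j < len(suffixes) and common_prefix_len(suffixes[j], query) >= best:
        positions.append(sa[j])
        j += 1
    return best, sorted(positions)


def seed_search(genome: str, read: str) -> list[tuple[int, int, list[int]]]:
    """Split a read into consecutive exact-match seeds: (read offset, length, genome positions)."""
    sa, suffixes = build_index(genome)
    seeds, start = [], 0
    while start < len(read):
        length, positions = longest_prefix_match(sa, suffixes, read[start:])
        if length == 0:
            start += 1  # base absent from the genome; skip it
            continue
        seeds.append((start, length, positions))
        start += length
    return seeds


# Toy example: two "exons" separated by a run of T's standing in for an intron.
genome = "GGACGTACCATTTTTTGATTACAGG"
read = "ACGTACC" + "GATTACA"
print(seed_search(genome, read))  # [(0, 7, [2]), (7, 7, [16])]
```

In this toy case the first seed ends where the read stops matching contiguous genome sequence, and the second seed resumes downstream, which is how a splice junction shows up as a gap between consecutive seeds.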

Subjects

Sequence Alignment/methods , Software , Algorithms , Cluster Analysis , Gene Expression Profiling , Human Genome , Humans , RNA Splicing , RNA Sequence Analysis/methods

13.

Genome-wide antisense transcription drives mRNA processing in bacteria.

Lasa, Iñigo; Toledo-Arana, Alejandro; Dobin, Alexander; Villanueva, Maite; de los Mozos, Igor Ruiz; Vergara-Irigaray, Marta; Segura, Víctor; Fagegaltier, Delphine; Penadés, José R; Valle, Jaione; Solano, Cristina; Gingeras, Thomas R.

Proc Natl Acad Sci U S A ; 108(50): 20172-7, 2011 Dec 13.

Article in English | MEDLINE | ID: mdl-22123973

ABSTRACT

RNA deep sequencing technologies are revealing unexpected levels of complexity in bacterial transcriptomes with the discovery of abundant noncoding RNAs, antisense RNAs, long 5' and 3' untranslated regions, and alternative operon structures. Here, by applying deep RNA sequencing to both the long and short RNA fractions (<50 nucleotides) obtained from the major human pathogen Staphylococcus aureus, we have detected a collection of short RNAs that is generated genome-wide through the digestion of overlapping sense/antisense transcripts by RNase III endoribonuclease. At least 75% of sense RNAs from annotated genes are subject to this mechanism of antisense processing. Removal of RNase III activity reduces the amount of short RNAs and is accompanied by the accumulation of discrete antisense transcripts. These results suggest the production of pervasive but hidden antisense transcription used to process sense transcripts by means of creating double-stranded substrates. This process of RNase III-mediated digestion of overlapping transcripts can be observed in several evolutionarily diverse Gram-positive bacteria and is capable of providing a unique genome-wide posttranscriptional mechanism to adjust mRNA levels.

Subjects

Bacterial Genome/genetics , Post-Transcriptional RNA Processing/genetics , Antisense RNA/genetics , Messenger RNA/genetics , Staphylococcus aureus/genetics , Genetic Transcription , Bacterial Gene Expression Regulation , Humans , Open Reading Frames/genetics , Antisense RNA/metabolism , Bacterial RNA/genetics , Double-Stranded RNA/genetics , Double-Stranded RNA/metabolism , Messenger RNA/metabolism , Ribonuclease III/metabolism , RNA Sequence Analysis , Species Specificity

14.

Identification of glioblastoma stem cell-associated lncRNAs using single-cell RNA-sequencing datasets.

Hazra, Rasmani; Utama, Raditya; Naik, Payal; Dobin, Alexander; Spector, David L.

bioRxiv ; 2023 Jan 20.

Article in English | MEDLINE | ID: mdl-36711961

ABSTRACT

Glioblastoma multiforme (GBM) is an aggressive, heterogeneous grade IV brain tumor. Glioblastoma stem cells (GSCs) initiate the tumor and are known culprits of therapy resistance. Mounting evidence has demonstrated a regulatory role of long non-coding RNAs (lncRNAs) in various biological processes, including pluripotency, differentiation, and tumorigenesis. A few studies have suggested that aberrant expression of lncRNAs is associated with GSCs. However, a comprehensive single-cell analysis of the GSC-associated lncRNA transcriptome has not been carried out. Here, we analyzed recently published single-cell RNA-sequencing datasets of adult human GBM tumors, GBM organoids, GSC-enriched GBM tumors, and developing human brains to identify lncRNAs highly expressed in GBM. To categorize GSC populations in the GBM tumors, we used the GSC marker genes SOX2, PROM1, FUT4, and L1CAM. We found three major GSC population clusters: radial glia, oligodendrocyte progenitor cells, and neurons. We found 10-100 lncRNAs significantly enriched in different GSC populations. We also validated the level of expression and localization of several GSC-enriched lncRNAs using qRT-PCR, single-molecule RNA FISH, and sub-cellular fractionation. We found that the radial glia GSC-enriched lncRNA PANTR1 is highly expressed in GSC lines and is localized to both the cytoplasmic and nuclear fractions. In contrast, the neuronal GSC-enriched lncRNAs LINC01563 and MALAT1 are highly enriched in the nuclear fraction of GSCs. Together, this study identified a panel of uncharacterized GSC-specific lncRNAs. These findings set the stage for future in-depth studies to examine their role in GBM pathology and their potential as biomarkers and/or therapeutic targets in GBM.

15.

Identification of glioblastoma stem cell-associated lncRNAs using single-cell RNA sequencing datasets.

Hazra, Rasmani; Utama, Raditya; Naik, Payal; Dobin, Alexander; Spector, David L.

Stem Cell Reports ; 18(11): 2056-2070, 2023 11 14.

Article in English | MEDLINE | ID: mdl-37922916

ABSTRACT

Glioblastoma multiforme (GBM) is an aggressive, heterogeneous brain tumor in which glioblastoma stem cells (GSCs) are known culprits of therapy resistance. Long non-coding RNAs (lncRNAs) have been shown to play a critical role in both cancer and normal biology. A few studies have suggested that aberrant expression of lncRNAs is associated with GSCs. However, a comprehensive single-cell analysis of the GSC-associated lncRNA transcriptome has not been carried out. Here, we analyzed recently published single-cell RNA sequencing datasets of adult GBM tumors, GBM organoids, GSC-enriched GBM tumors, and developing human brain samples to identify lncRNAs highly expressed in GSCs. We further revealed that the GSC-specific lncRNAs GIHCG and LINC01563 promote proliferation, migration, and stemness in the GSC population. Together, this study identified a panel of uncharacterized GSC-enriched lncRNAs and set the stage for future in-depth studies to examine their role in GBM pathology and their potential as biomarkers and/or therapeutic targets in GBM.

Subjects

Brain Neoplasms , Glioblastoma , Long Noncoding RNA , Adult , Humans , Glioblastoma/pathology , Long Noncoding RNA/genetics , Long Noncoding RNA/metabolism , Brain Neoplasms/pathology , Neoplastic Stem Cells/metabolism , RNA Sequence Analysis , Tumor Cell Line , Neoplastic Gene Expression Regulation

16.

Comparative single-cell transcriptomic analysis of primate brains highlights human-specific regulatory evolution.

Suresh, Hamsini; Crow, Megan; Jorstad, Nikolas; Hodge, Rebecca; Lein, Ed; Dobin, Alexander; Bakken, Trygve; Gillis, Jesse.

Nat Ecol Evol ; 7(11): 1930-1943, 2023 Nov.

Article in English | MEDLINE | ID: mdl-37667001

ABSTRACT

Enhanced cognitive function in humans is hypothesized to result from cortical expansion and increased cellular diversity. However, the mechanisms that drive these phenotypic innovations remain poorly understood, in part because of the lack of high-quality cellular resolution data in human and non-human primates. Here, we take advantage of single-cell expression data from the middle temporal gyrus of five primates (human, chimp, gorilla, macaque and marmoset) to identify 57 homologous cell types and generate cell type-specific gene co-expression networks for comparative analysis. Although orthologue expression patterns are generally well conserved, we find 24% of genes with extensive differences between human and non-human primates (3,383 out of 14,131), which are also associated with multiple brain disorders. To assess the functional significance of gene expression differences in an evolutionary context, we evaluate changes in network connectivity across meta-analytic co-expression networks from 19 animals. We find that a subset of these genes has deeply conserved co-expression across all non-human animals, and strongly divergent co-expression relationships in humans (139 out of 3,383, <1% of primate orthologues). Genes with human-specific cellular expression and co-expression profiles (such as NHEJ1, GTF2H2, C2 and BBS5) typically evolve under relaxed selective constraints and may drive rapid evolutionary change in brain function.

Subjects

Primates , Transcriptome , Animals , Humans , Brain/metabolism , Gene Regulatory Networks , Pan troglodytes/genetics , Cytoskeletal Proteins/genetics , Cytoskeletal Proteins/metabolism , Phosphate-Binding Proteins/genetics , Phosphate-Binding Proteins/metabolism

17.

Targeted in silico characterization of fusion transcripts in tumor and normal tissues via FusionInspector.

Haas, Brian J; Dobin, Alexander; Ghandi, Mahmoud; Van Arsdale, Anne; Tickle, Timothy; Robinson, James T; Gillani, Riaz; Kasif, Simon; Regev, Aviv.

Cell Rep Methods ; 3(5): 100467, 2023 05 22.

Article in English | MEDLINE | ID: mdl-37323575

ABSTRACT

Here, we present FusionInspector for in silico characterization and interpretation of candidate fusion transcripts from RNA sequencing (RNA-seq) and exploration of their sequence and expression characteristics. We applied FusionInspector to thousands of tumor and normal transcriptomes and identified statistical and experimental features enriched among biologically impactful fusions. Through clustering and machine learning, we identified large collections of fusions potentially relevant to tumor and normal biological processes. We show that biologically relevant fusions are enriched for relatively high expression of the fusion transcript, imbalanced fusion allelic ratios, and canonical splicing patterns, and are deficient in sequence microhomologies between partner genes. We demonstrate that FusionInspector accurately validates fusion transcripts in silico and helps characterize numerous understudied fusions in tumor and normal tissue samples. FusionInspector is freely available as open source for screening, characterization, and visualization of candidate fusions via RNA-seq, and facilitates transparent explanation and interpretation of machine-learning predictions and their experimental sources.

Subjects

High-Throughput Nucleotide Sequencing , Neoplasms , Humans , Neoplasms/genetics , RNA Sequence Analysis , Transcriptome/genetics

18.

Comparative transcriptomics reveals human-specific cortical features.

Jorstad, Nikolas L; Song, Janet H T; Exposito-Alonso, David; Suresh, Hamsini; Castro-Pacheco, Nathan; Krienen, Fenna M; Yanny, Anna Marie; Close, Jennie; Gelfand, Emily; Long, Brian; Seeman, Stephanie C; Travaglini, Kyle J; Basu, Soumyadeep; Beaudin, Marc; Bertagnolli, Darren; Crow, Megan; Ding, Song-Lin; Eggermont, Jeroen; Glandon, Alexandra; Goldy, Jeff; Kiick, Katelyn; Kroes, Thomas; McMillen, Delissa; Pham, Trangthanh; Rimorin, Christine; Siletti, Kimberly; Somasundaram, Saroja; Tieu, Michael; Torkelson, Amy; Feng, Guoping; Hopkins, William D; Höllt, Thomas; Keene, C Dirk; Linnarsson, Sten; McCarroll, Steven A; Lelieveldt, Boudewijn P; Sherwood, Chet C; Smith, Kimberly; Walsh, Christopher A; Dobin, Alexander; Gillis, Jesse; Lein, Ed S; Hodge, Rebecca D; Bakken, Trygve E.

Science ; 382(6667): eade9516, 2023 10 13.

Article in English | MEDLINE | ID: mdl-37824638

ABSTRACT

The cognitive abilities of humans are distinctive among primates, but their molecular and cellular substrates are poorly understood. We used comparative single-nucleus transcriptomics to analyze samples of the middle temporal gyrus (MTG) from adult humans, chimpanzees, gorillas, rhesus macaques, and common marmosets to understand human-specific features of the neocortex. Human, chimpanzee, and gorilla MTG showed highly similar cell-type composition and laminar organization as well as a large shift in proportions of deep-layer intratelencephalic-projecting neurons compared with macaque and marmoset MTG. Microglia, astrocytes, and oligodendrocytes had more-divergent expression across species compared with neurons or oligodendrocyte precursor cells, and neuronal expression diverged more rapidly on the human lineage. Only a few hundred genes showed human-specific patterning, suggesting that relatively few cellular and molecular changes distinctively define adult human cortical structure.

Subjects

Cognition , Hominidae , Neocortex , Temporal Lobe , Animals , Humans , Gene Expression Profiling , Gorilla gorilla/genetics , Hominidae/genetics , Hominidae/physiology , Macaca mulatta/genetics , Pan troglodytes/genetics , Phylogeny , Transcriptome , Neocortex/physiology , Species Specificity , Temporal Lobe/physiology

19.

Management, Analyses, and Distribution of the MaizeCODE Data on the Cloud.

Wang, Liya; Lu, Zhenyuan; delaBastide, Melissa; Van Buren, Peter; Wang, Xiaofei; Ghiban, Cornel; Regulski, Michael; Drenkow, Jorg; Xu, Xiaosa; Ortiz-Ramirez, Carlos; Marco, Cristina F; Goodwin, Sara; Dobin, Alexander; Birnbaum, Kenneth D; Jackson, David P; Martienssen, Robert A; McCombie, William R; Micklos, David A; Schatz, Michael C; Ware, Doreen H; Gingeras, Thomas R.

Front Plant Sci ; 11: 289, 2020.

Article in English | MEDLINE | ID: mdl-32296450

ABSTRACT

MaizeCODE is a project aimed at identifying and analyzing functional elements in the maize genome. In its initial phase, MaizeCODE assayed up to five tissues from four maize strains (B73, NC350, W22, TIL11) by RNA-Seq, ChIP-Seq, RAMPAGE, and small RNA sequencing. To facilitate reproducible science and provide both human and machine access to the MaizeCODE data, we enhanced SciApps, a cloud-based portal, for analysis and distribution of both raw data and analysis results. Based on the SciApps workflow platform, we generated new components to support the complete cycle of MaizeCODE data management. These include publicly accessible scientific workflows for the reproducible and shareable analysis of various functional data, a RESTful API for batch processing and distribution of data and metadata, a searchable data page that lists each MaizeCODE experiment as a reproducible workflow, and integrated JBrowse genome browser tracks linked with workflows and metadata. The SciApps portal is a flexible platform that allows the integration of new analysis tools, workflows, and genomic data from multiple projects. Through metadata and a ready-to-compute cloud-based platform, the portal experience improves access to the MaizeCODE data and facilitates its analysis.

20.

Is it time to change the reference genome?

Ballouz, Sara; Dobin, Alexander; Gillis, Jesse A.

Genome Biol ; 20(1): 159, 2019 08 09.

Article in English | MEDLINE | ID: mdl-31399121

ABSTRACT

The use of the human reference genome has shaped methods and data across modern genomics. This has offered many benefits while creating a few constraints. In the following opinion, we outline the history, properties, and pitfalls of the current human reference genome. In a few illustrative analyses, we focus on its use for variant-calling, highlighting its nearness to a 'type specimen'. We suggest that switching to a consensus reference would offer important advantages over the continued use of the current reference with few disadvantages.

Subjects

Genomics/standards , Human Genome , Humans , Reference Standards
