Search | BVS - Ministry of Health

The ENCODE4 long-read RNA-seq collection reveals distinct classes of transcript structure diversity.

Reese, Fairlie; Williams, Brian; Balderrama-Gutierrez, Gabriela; Wyman, Dana; Çelik, Muhammed Hasan; Rebboah, Elisabeth; Rezaie, Narges; Trout, Diane; Razavi-Mohseni, Milad; Jiang, Yunzhe; Borsari, Beatrice; Morabito, Samuel; Liang, Heidi Yahan; McGill, Cassandra J; Rahmanian, Sorena; Sakr, Jasmine; Jiang, Shan; Zeng, Weihua; Carvalho, Klebea; Weimer, Annika K; Dionne, Louise A; McShane, Ariel; Bedi, Karan; Elhajjajy, Shaimae I; Upchurch, Sean; Jou, Jennifer; Youngworth, Ingrid; Gabdank, Idan; Sud, Paul; Jolanki, Otto; Strattan, J Seth; Kagda, Meenakshi S; Snyder, Michael P; Hitz, Ben C; Moore, Jill E; Weng, Zhiping; Bennett, David; Reinholdt, Laura; Ljungman, Mats; Beer, Michael A; Gerstein, Mark B; Pachter, Lior; Guigó, Roderic; Wold, Barbara J; Mortazavi, Ali.

bioRxiv ; 2023 May 16.

Article in English | MEDLINE | ID: mdl-37292896

ABSTRACT

The majority of mammalian genes encode multiple transcript isoforms that result from differential promoter use, changes in exonic splicing, and alternative 3' end choice. Detecting and quantifying transcript isoforms across tissues, cell types, and species has been extremely challenging because transcripts are much longer than the short reads normally used for RNA-seq. By contrast, long-read RNA-seq (LR-RNA-seq) gives the complete structure of most transcripts. We sequenced 264 LR-RNA-seq PacBio libraries totaling over 1 billion circular consensus reads (CCS) for 81 unique human and mouse samples. We detect at least one full-length transcript from 87.7% of annotated human protein coding genes and a total of 200,000 full-length transcripts, 40% of which have novel exon junction chains. To capture and compute on the three sources of transcript structure diversity, we introduce a gene and transcript annotation framework that uses triplets representing the transcript start site, exon junction chain, and transcript end site of each transcript. Using triplets in a simplex representation demonstrates how promoter selection, splice pattern, and 3' processing are deployed across human tissues, with nearly half of multi-transcript protein coding genes showing a clear bias toward one of the three diversity mechanisms. Evaluated across samples, the predominantly expressed transcript changes for 74% of protein coding genes. In evolution, the human and mouse transcriptomes are globally similar in types of transcript structure diversity, yet among individual orthologous gene pairs, more than half (57.8%) show substantial differences in mechanism of diversification in matching tissues. This initial large-scale survey of human and mouse long-read transcriptomes provides a foundation for further analyses of alternative transcript usage, and is complemented by short-read and microRNA data on the same samples and by epigenome data elsewhere in the ENCODE4 collection.
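The triplet idea in the abstract above can be illustrated with a small sketch. It assumes each transcript is reduced to a (start site, exon junction chain, end site) triplet of identifiers, and it places a gene on a three-way simplex by normalizing the counts of distinct start sites, junction chains, and end sites so they sum to 1. This normalization is an illustrative assumption; the abstract does not give the authors' exact formula, and the names `Triplet` and `simplex_coordinates` are hypothetical.

```python
from collections import namedtuple

# Hypothetical representation: one triplet per transcript of a gene.
# tss = transcript start site ID, ec = exon junction chain ID,
# tes = transcript end site ID.
Triplet = namedtuple("Triplet", ["tss", "ec", "tes"])


def simplex_coordinates(triplets):
    """Return (tss, ec, tes) proportions that sum to 1.

    Assumption: each coordinate is the number of distinct features of
    that type divided by the total number of distinct features. This is
    a simple illustration, not necessarily the published method.
    """
    n_tss = len({t.tss for t in triplets})
    n_ec = len({t.ec for t in triplets})
    n_tes = len({t.tes for t in triplets})
    total = n_tss + n_ec + n_tes
    if total == 0:
        raise ValueError("at least one transcript is required")
    return (n_tss / total, n_ec / total, n_tes / total)


# Made-up gene with 2 start sites, 3 junction chains, and 1 end site.
gene = [
    Triplet(tss=1, ec=1, tes=1),
    Triplet(tss=1, ec=2, tes=1),
    Triplet(tss=1, ec=3, tes=1),
    Triplet(tss=2, ec=3, tes=1),
]
coords = simplex_coordinates(gene)
```

In this made-up example the junction-chain coordinate is the largest of the three, which under this sketch would read as a bias toward splicing as the main source of transcript diversity.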

RNAget: an API to securely retrieve RNA quantifications.

Upchurch, Sean; Palumbo, Emilio; Adams, Jeremy; Bujold, David; Bourque, Guillaume; Nedzel, Jared; Graham, Keenan; Kagda, Meenakshi S; Assis, Pedro; Hitz, Benjamin; Righi, Emilio; Guigó, Roderic; Wold, Barbara J.

Bioinformatics ; 39(4), 2023 Apr 03.

Article in English | MEDLINE | ID: mdl-36897015

ABSTRACT

SUMMARY: Large-scale sharing of genomic quantification data requires standardized access interfaces. In this Global Alliance for Genomics and Health project, we developed RNAget, an API for secure access to genomic quantification data in matrix form. RNAget provides for slicing matrices to extract desired subsets of data and is applicable to all expression matrix-format data, including RNA sequencing and microarrays. Further, it generalizes to quantification matrices of other sequence-based genomics such as ATAC-seq and ChIP-seq. AVAILABILITY AND IMPLEMENTATION: https://ga4gh-rnaseq.github.io/schema/docs/index.html.
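As a rough illustration of what slicing a quantification matrix means, the sketch below filters a small in-memory expression matrix by feature and sample identifiers. It does not call RNAget and does not reproduce its endpoints or parameters; the function `slice_matrix`, the nested-dictionary layout, and all identifiers and values are hypothetical.

```python
def slice_matrix(matrix, feature_ids=None, sample_ids=None):
    """Return the sub-matrix restricted to the requested features and samples.

    matrix maps feature ID -> {sample ID -> expression value}.
    Passing None for a filter keeps everything along that axis.
    IDs that are not present in the matrix are silently skipped.
    """
    features = list(matrix) if feature_ids is None else feature_ids
    result = {}
    for feature in features:
        if feature not in matrix:
            continue
        row = matrix[feature]
        samples = list(row) if sample_ids is None else sample_ids
        result[feature] = {s: row[s] for s in samples if s in row}
    return result


# Made-up expression values (features as rows, samples as columns).
expression = {
    "geneA": {"sample1": 10.0, "sample2": 0.5, "sample3": 3.2},
    "geneB": {"sample1": 0.0, "sample2": 7.1, "sample3": 1.4},
    "geneC": {"sample1": 2.2, "sample2": 2.3, "sample3": 2.1},
}

subset = slice_matrix(
    expression,
    feature_ids=["geneA", "geneC"],
    sample_ids=["sample1", "sample3"],
)
```

A server-side slice of this kind lets a client download only the rows and columns it needs instead of the full matrix.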

Subjects

RNA; Software; Genomics; Genome; Sequence Analysis, RNA

Structure and DNA-binding sites of the SWI1 AT-rich interaction domain (ARID) suggest determinants for sequence-specific DNA recognition.

Kim, Suhkmann; Zhang, Ziming; Upchurch, Sean; Isern, Nancy; Chen, Yuan.

J Biol Chem ; 279(16): 16670-6, 2004 Apr 16.

Article in English | MEDLINE | ID: mdl-14722072

ABSTRACT

ARID (AT-rich interaction domain) is a homologous family of DNA-binding domains that occur in DNA-binding proteins from a wide variety of species, ranging from yeast to nematodes, insects, mammals, and plants. SWI1, a member of the SWI/SNF protein complex that is involved in chromatin remodeling during transcription, contains the ARID motif. The ARID domain of human SWI1 (also known as p270) does not select for a specific DNA sequence from a random sequence pool. The lack of sequence specificity shown by the SWI1 ARID domain stands in contrast to the other characterized ARID domains, which recognize specific AT-rich sequences. We have solved the three-dimensional structure of human SWI1 ARID using solution NMR methods. In addition, we have characterized nonspecific DNA binding by the SWI1 ARID domain. Results from this study indicate that a flexible, long, internal loop in the ARID motif is likely to be important for sequence-specific DNA recognition. The structure of the human SWI1 ARID domain also represents a distinct structural subfamily. Studies of ARID indicate that the boundary of DNA binding structural and functional domains can extend beyond the sequence homologous region in a homologous family of proteins. Structural studies of homologous domains such as the ARID family of DNA-binding domains should provide information to better predict the boundary of structural and functional domains in structural genomic studies.

Subjects

DNA-Binding Proteins/metabolism; Transcription Factors/metabolism; Amino Acid Sequence; Animals; Binding Sites; DNA/genetics; DNA/metabolism; DNA-Binding Proteins/genetics; Humans; Models, Molecular; Molecular Sequence Data; Protein Binding; Protein Conformation; Protein Structure, Tertiary; Sequence Alignment; Transcription Factors/genetics
