Results 1 - 7 of 7
1.
bioRxiv ; 2023 May 16.
Article in English | MEDLINE | ID: mdl-37292896

ABSTRACT

The majority of mammalian genes encode multiple transcript isoforms that result from differential promoter use, changes in exonic splicing, and alternative 3' end choice. Detecting and quantifying transcript isoforms across tissues, cell types, and species has been extremely challenging because transcripts are much longer than the short reads normally used for RNA-seq. By contrast, long-read RNA-seq (LR-RNA-seq) gives the complete structure of most transcripts. We sequenced 264 LR-RNA-seq PacBio libraries totaling over 1 billion circular consensus reads (CCS) for 81 unique human and mouse samples. We detect at least one full-length transcript from 87.7% of annotated human protein coding genes and a total of 200,000 full-length transcripts, 40% of which have novel exon junction chains. To capture and compute on the three sources of transcript structure diversity, we introduce a gene and transcript annotation framework that uses triplets representing the transcript start site, exon junction chain, and transcript end site of each transcript. Using triplets in a simplex representation demonstrates how promoter selection, splice pattern, and 3' processing are deployed across human tissues, with nearly half of multi-transcript protein coding genes showing a clear bias toward one of the three diversity mechanisms. Evaluated across samples, the predominantly expressed transcript changes for 74% of protein coding genes. In evolution, the human and mouse transcriptomes are globally similar in types of transcript structure diversity, yet among individual orthologous gene pairs, more than half (57.8%) show substantial differences in mechanism of diversification in matching tissues. This initial large-scale survey of human and mouse long-read transcriptomes provides a foundation for further analyses of alternative transcript usage, and is complemented by short-read and microRNA data on the same samples and by epigenome data elsewhere in the ENCODE4 collection.

2.
Cell ; 186(7): 1493-1511.e40, 2023 03 30.
Article in English | MEDLINE | ID: mdl-37001506

ABSTRACT

Understanding how genetic variants impact molecular phenotypes is a key goal of functional genomics, currently hindered by reliance on a single haploid reference genome. Here, we present the EN-TEx resource of 1,635 open-access datasets from four donors (∼30 tissues × ∼15 assays). The datasets are mapped to matched, diploid genomes with long-read phasing and structural variants, instantiating a catalog of >1 million allele-specific loci. These loci exhibit coordinated activity along haplotypes and are less conserved than corresponding, non-allele-specific ones. Surprisingly, a deep-learning transformer model can predict the allele-specific activity based only on local nucleotide-sequence context, highlighting the importance of transcription-factor-binding motifs particularly sensitive to variants. Furthermore, combining EN-TEx with existing genome annotations reveals strong associations between allele-specific and GWAS loci. It also enables models for transferring known eQTLs to difficult-to-profile tissues (e.g., from skin to heart). Overall, EN-TEx provides rich data and generalizable models for more accurate personal functional genomics.


Subject(s)
Epigenome , Quantitative Trait Loci , Genome-Wide Association Study , Genomics , Phenotype , Polymorphism, Single Nucleotide
3.
Genome Res ; 31(8): 1325-1336, 2021 08.
Article in English | MEDLINE | ID: mdl-34290042

ABSTRACT

Tissue function and homeostasis reflect the gene expression signature by which the combination of ubiquitous and tissue-specific genes contributes to tissue maintenance and stimulus-responsive function. Enhancers are central to controlling this tissue-specific gene expression pattern. Here, we explore the correlation between the genomic location of enhancers and their role in tissue-specific gene expression. We find that enhancers showing tissue-specific activity are highly enriched in intronic regions and regulate the expression of genes involved in tissue-specific functions, whereas housekeeping genes are more often controlled by intergenic enhancers, common to many tissues. Notably, an intergenic-to-intronic continuum of active enhancers is observed in the transition from developmental to adult stages: the most differentiated tissues present higher rates of intronic enhancers, whereas the lowest rates are observed in embryonic stem cells. Altogether, our results suggest that the genomic location of active enhancers is key for the tissue-specific control of gene expression.


Subject(s)
Embryonic Stem Cells , Enhancer Elements, Genetic , Embryonic Stem Cells/metabolism , Genes, Essential , Introns/genetics
4.
Nat Commun ; 12(1): 727, 2021 02 01.
Article in English | MEDLINE | ID: mdl-33526779

ABSTRACT

Alternative splicing (AS) is a fundamental step in eukaryotic mRNA biogenesis. Here, we develop an efficient and reproducible pipeline for the discovery of genetic variants that affect AS (splicing QTLs, sQTLs). We use it to analyze the GTEx dataset, generating a comprehensive catalog of sQTLs in the human genome. Downstream analysis of this catalog provides insight into the mechanisms underlying splicing regulation. We report that a core set of sQTLs is shared across multiple tissues. sQTLs often target the global splicing pattern of genes, rather than individual splicing events. Many also affect the expression of the same or other genes, uncovering regulatory loci that act through different mechanisms. sQTLs tend to be located in post-transcriptionally spliced introns, which would function as hotspots for splicing regulation. While many variants affect splicing patterns by altering the sequence of splice sites, many more modify the binding sites of RNA-binding proteins. Genetic variants affecting splicing can have a stronger phenotypic impact than those affecting gene expression.


Subject(s)
Alternative Splicing , Genome, Human/genetics , Quantitative Trait Loci , RNA Splice Sites/genetics , Binding Sites/genetics , Datasets as Topic , Genome-Wide Association Study , Humans , Introns/genetics , Mutation , RNA-Binding Proteins/metabolism , RNA-Seq , Whole Genome Sequencing
5.
Genome Res ; 30(7): 1060-1072, 2020 07.
Article in English | MEDLINE | ID: mdl-32718982

ABSTRACT

Long noncoding RNAs (lncRNAs) constitute the majority of transcripts in the mammalian genomes, and yet, their functions remain largely unknown. As part of the FANTOM6 project, we systematically knocked down the expression of 285 lncRNAs in human dermal fibroblasts and quantified cellular growth, morphological changes, and transcriptomic responses using Capped Analysis of Gene Expression (CAGE). Antisense oligonucleotides targeting the same lncRNAs exhibited global concordance, and the molecular phenotype, measured by CAGE, recapitulated the observed cellular phenotypes while providing additional insights on the affected genes and pathways. Here, we disseminate the largest-to-date lncRNA knockdown data set with molecular phenotyping (over 1000 CAGE deep-sequencing libraries) for further exploration and highlight functional roles for ZNF213-AS1 and lnc-KHDC3L-2.


Subject(s)
RNA, Long Noncoding/physiology , Cell Growth Processes/genetics , Cell Movement/genetics , Fibroblasts/cytology , Fibroblasts/metabolism , Humans , KCNQ Potassium Channels/metabolism , Molecular Sequence Annotation , Oligonucleotides, Antisense , RNA, Long Noncoding/antagonists & inhibitors , RNA, Long Noncoding/metabolism , RNA, Small Interfering
6.
Nat Genet ; 52(7): 655-661, 2020 07.
Article in English | MEDLINE | ID: mdl-32514124

ABSTRACT

Three-dimensional organization of the genome is important for transcriptional regulation [1-7]. In mammals, CTCF and the cohesin complex create submegabase structures with elevated internal chromatin contact frequencies, called topologically associating domains (TADs) [8-12]. Although TADs can contribute to transcriptional regulation, ablation of TAD organization by disrupting CTCF or the cohesin complex causes modest gene expression changes [13-16]. In contrast, CTCF is required for cell cycle regulation [17], embryonic development and formation of various adult cell types [18]. To uncouple the role of CTCF in cell-state transitions and cell proliferation, we studied the effect of CTCF depletion during the conversion of human leukemic B cells into macrophages with minimal cell division. CTCF depletion disrupts TAD organization but not cell transdifferentiation. In contrast, CTCF depletion in induced macrophages impairs the full-blown upregulation of inflammatory genes after exposure to endotoxin. Our results demonstrate that CTCF-dependent genome topology is not strictly required for a functional cell-fate conversion but facilitates a rapid and efficient response to an external stimulus.


Subject(s)
B-Lymphocytes/physiology , CCCTC-Binding Factor/physiology , Macrophages/physiology , Myelopoiesis/physiology , Antigens, Differentiation/metabolism , CCCTC-Binding Factor/genetics , Cell Line, Tumor , Cell Proliferation/physiology , Chromatin/physiology , Gene Expression Regulation , Humans , Molecular Conformation , Myelopoiesis/genetics , Protein Conformation
7.
Nucleic Acids Res ; 47(10): 5293-5306, 2019 06 04.
Article in English | MEDLINE | ID: mdl-30916337

ABSTRACT

Nonsense-mediated decay (NMD) is a eukaryotic mRNA surveillance system that selectively degrades transcripts with premature termination codons (PTC). Many RNA-binding proteins (RBP) regulate their expression levels by a negative feedback loop, in which the RBP binds its own pre-mRNA and causes alternative splicing to introduce a PTC. We present a bioinformatic analysis integrating three data sources, eCLIP assays for a large RBP panel, shRNA inactivation of the NMD pathway, and shRNA-depletion of RBPs followed by RNA-seq, to identify novel autoregulatory feedback loops of this kind. We show that RBPs frequently bind their own pre-mRNAs, their exons respond prominently to NMD pathway disruption, and that the responding exons are enriched with nearby eCLIP peaks. We confirm previously proposed models of autoregulation in the SRSF7 and U2AF1 genes and present two novel models, in which (i) SFPQ binds its mRNA and promotes switching to an alternative distal 3'-UTR that is targeted by NMD, and (ii) RPS3 binding activates a poison 5'-splice site in its pre-mRNA that leads to a frame shift and degradation by NMD. We also suggest specific splicing events that could be implicated in autoregulatory feedback loops in the RBM39, HNRNPM, and U2AF2 genes. The results are available through a UCSC Genome Browser track hub.


Subject(s)
Codon, Nonsense , Nonsense Mediated mRNA Decay , RNA Splicing , RNA, Small Interfering/metabolism , Transcriptome , 3' Untranslated Regions , Alternative Splicing , Computational Biology , Exons , Frameshift Mutation , Heterogeneous-Nuclear Ribonucleoprotein Group M/metabolism , Humans , Nuclear Proteins/metabolism , RNA Precursors/metabolism , RNA, Messenger/metabolism , RNA-Binding Proteins/metabolism , Serine-Arginine Splicing Factors/metabolism , Spliceosomes , Splicing Factor U2AF/metabolism