ABSTRACT
Gene expression profiles in homologous tissues have been observed to differ between species. This may be due to differences between species in the gene expression program of each cell type, but may also reflect differences in the cell type composition of each tissue in different species. Here, we compare expression profiles in matching primary cells in human, mouse, rat, dog, and chicken using Cap Analysis of Gene Expression (CAGE) and short RNA (sRNA) sequencing data from FANTOM5. While we find that expression profiles of orthologous genes in different species are highly correlated across cell types, in each cell type many genes are differentially expressed between species. Expression of genes with products involved in transcription, RNA processing, and transcriptional regulation was more likely to be conserved, while expression of genes encoding proteins involved in intercellular communication was more likely to have diverged during evolution. Conservation of expression correlated positively with the evolutionary age of genes, suggesting that divergence in expression levels of genes critical for cell function was restricted during evolution. Motif activity analysis showed that both promoters and enhancers are activated by the same transcription factors in different species. An analysis of expression levels of mature miRNAs and of primary miRNAs identified by CAGE revealed that evolutionarily old miRNAs are more likely to have conserved expression patterns than young miRNAs. We conclude that key aspects of the regulatory network are conserved, while differential expression of genes involved in cell-to-cell communication may contribute greatly to phenotypic differences between species.
Subject(s)
Evolution, Molecular , Transcriptome , Animals , Chickens/genetics , Dogs , Gene Expression Profiling , Gene Regulatory Networks , Humans , Mice , MicroRNAs/metabolism , Nucleotide Motifs , Principal Component Analysis , Promoter Regions, Genetic , Rats , Species Specificity , Transcription Factors/metabolism
ABSTRACT
Long noncoding RNAs (lncRNAs) constitute the majority of transcripts in mammalian genomes, yet their functions remain largely unknown. As part of the FANTOM6 project, we systematically knocked down the expression of 285 lncRNAs in human dermal fibroblasts and quantified cellular growth, morphological changes, and transcriptomic responses using Cap Analysis of Gene Expression (CAGE). Antisense oligonucleotides targeting the same lncRNAs exhibited global concordance, and the molecular phenotype, measured by CAGE, recapitulated the observed cellular phenotypes while providing additional insights into the affected genes and pathways. Here, we disseminate the largest-to-date lncRNA knockdown data set with molecular phenotyping (over 1000 CAGE deep-sequencing libraries) for further exploration and highlight functional roles for ZNF213-AS1 and lnc-KHDC3L-2.
Subject(s)
RNA, Long Noncoding/physiology , Cell Growth Processes/genetics , Cell Movement/genetics , Fibroblasts/cytology , Fibroblasts/metabolism , Humans , KCNQ Potassium Channels/metabolism , Molecular Sequence Annotation , Oligonucleotides, Antisense , RNA, Long Noncoding/antagonists & inhibitors , RNA, Long Noncoding/metabolism , RNA, Small Interfering
ABSTRACT
Long non-coding RNAs (lncRNAs) are largely heterogeneous and functionally uncharacterized. Here, using FANTOM5 cap analysis of gene expression (CAGE) data, we integrate multiple transcript collections to generate a comprehensive atlas of 27,919 human lncRNA genes with high-confidence 5' ends and expression profiles across 1,829 samples from the major human primary cell types and tissues. Genomic and epigenomic classification of these lncRNAs reveals that most intergenic lncRNAs originate from enhancers rather than from promoters. Incorporating genetic and expression data, we show that lncRNAs overlapping trait-associated single nucleotide polymorphisms are specifically expressed in cell types relevant to the traits, implicating these lncRNAs in multiple diseases. We further demonstrate that lncRNAs overlapping expression quantitative trait loci (eQTL)-associated single nucleotide polymorphisms of messenger RNAs are co-expressed with the corresponding messenger RNAs, suggesting their potential roles in transcriptional regulation. Combining these findings with conservation data, we identify 19,175 potentially functional lncRNAs in the human genome.
Subject(s)
Databases, Genetic , RNA, Long Noncoding/chemistry , RNA, Long Noncoding/genetics , Transcriptome/genetics , Cells, Cultured , Conserved Sequence/genetics , Datasets as Topic , Enhancer Elements, Genetic/genetics , Epigenesis, Genetic , Gene Expression Profiling , Gene Expression Regulation , Genome, Human/genetics , Genome-Wide Association Study , Genomics , Humans , Internet , Molecular Sequence Annotation , Organ Specificity/genetics , Polymorphism, Single Nucleotide , Promoter Regions, Genetic/genetics , Quantitative Trait Loci/genetics , RNA Stability , RNA, Messenger/genetics
ABSTRACT
The Functional ANnoTation Of the Mammalian genome (FANTOM) Consortium has continued to provide extensive resources in the pursuit of understanding the transcriptome, and transcriptional regulation, of mammalian genomes for the last 20 years. To share these resources with the research community, the FANTOM web interfaces and databases are regularly updated, enhanced and expanded with new data types. In recent years, the FANTOM Consortium's efforts have focused mainly on creating new non-coding RNA datasets and resources. The existing FANTOM5 human and mouse miRNA atlas was supplemented with rat, dog, and chicken datasets. The sixth (latest) edition of the FANTOM project was launched to assess the function of human long non-coding RNAs (lncRNAs). From its creation until 2020, FANTOM6 has contributed to the research community a large dataset generated from the knockdown of 285 lncRNAs in human dermal fibroblasts, followed by extensive expression profiling and cellular phenotyping. Other updates to the FANTOM resource include the reprocessing of the miRNA and promoter atlases of human, mouse and chicken with the latest reference genome assemblies. To facilitate the use and accessibility of all the above resources, we further enhanced the FANTOM data viewers and web interfaces. The updated FANTOM web resource is publicly available at https://fantom.gsc.riken.jp/.
Subject(s)
Molecular Sequence Annotation , RNA, Long Noncoding/genetics , Transcriptome/genetics , Animals , Binding Sites , Chromatin/metabolism , Drosophila/genetics , Fibroblasts/cytology , Fibroblasts/metabolism , Genome , Humans , Metadata , Mice , MicroRNAs/genetics , MicroRNAs/metabolism , Promoter Regions, Genetic , RNA, Long Noncoding/metabolism , Transcription Factors/metabolism , User-Computer Interface
ABSTRACT
Organogenesis involves dynamic regulation of gene transcription and complex multipathway interactions. Despite our knowledge of key factors regulating various steps of heart morphogenesis, considerable challenges in understanding its mechanism remain because little is known about their downstream targets and interactive regulatory network. To better understand the transcriptional regulatory mechanisms driving heart development and the consequences of their disruption in vivo, we performed time-series analyses of the transcriptome and genome-wide chromatin accessibility in isolated cardiomyocytes (CMs) from wild-type zebrafish embryos at developmental stages corresponding to heart tube morphogenesis, looping, and maturation. We identified genetic regulatory modules driving crucial events of heart development that contain key cardiac transcription factors (TFs) and are associated with open chromatin regions enriched for DNA sequence motifs belonging to the family of the corresponding TFs. Loss of function of the cardiac TFs Gata5, Tbx5a, and Hand2 affected the cardiac regulatory networks and caused global changes in the chromatin accessibility profile, indicating their role in heart development. Among the regions with differential chromatin accessibility in mutants were highly conserved noncoding elements that represent putative enhancers driving heart development. The most prominent gene expression changes, which correlated with chromatin accessibility modifications within their proximal promoter regions, occurred between heart tube morphogenesis and looping, and were associated with a metabolic shift and a hematopoietic/cardiac fate switch during CM maturation. Our results reveal the dynamic regulatory landscape throughout heart development and identify interactive molecular networks driving key events of heart morphogenesis.
Subject(s)
Chromatin Assembly and Disassembly , Gene Expression Regulation, Developmental , Heart/growth & development , Myocytes, Cardiac/metabolism , Transcriptome , Animals , Cells, Cultured , Chromatin/genetics , Gene Regulatory Networks , Transcription Factors/genetics , Transcription Factors/metabolism , Zebrafish , Zebrafish Proteins/genetics , Zebrafish Proteins/metabolism
ABSTRACT
Upon the first publication of the fifth iteration of the Functional Annotation of Mammalian Genomes collaborative project, FANTOM5, we gathered a series of primary data and database systems into the FANTOM web resource (http://fantom.gsc.riken.jp) to help researchers explore transcriptional regulation and cellular states. In the course of the collaboration, primary data and analysis results have been expanded, and the functionalities of the database systems enhanced. We believe that our data and web systems are invaluable resources, and that the scientific community will benefit from this recent update to deepen its understanding of mammalian cellular organization. We introduce the contents of FANTOM5 here, report recent updates in the web resource and provide future perspectives.
Subject(s)
Databases, Genetic , Gene Expression Profiling/methods , Genomics/methods , Mammals/genetics , Software , Web Browser , Animals , Computational Biology , Humans , Search Engine
ABSTRACT
BACKGROUND: Children with problematic severe asthma have poor disease control despite high doses of inhaled corticosteroids and additional therapy, leading to personal suffering, early deterioration of lung function, and significant consumption of health care resources. If no exacerbating factors, such as smoking or allergies, are found after extensive investigation, these children are given a diagnosis of therapy-resistant (or therapy-refractory) asthma (SA). OBJECTIVE: We sought to deepen our understanding of childhood SA by analyzing gene expression and modeling the underlying regulatory transcription factor networks in peripheral blood leukocytes. METHODS: Gene expression was analyzed by using Cap Analysis of Gene Expression in children with SA (n = 13), children with controlled persistent asthma (n = 15), and age-matched healthy control subjects (n = 9). Cap Analysis of Gene Expression sequencing detects the transcription start sites of known and novel mRNAs and noncoding RNAs. RESULTS: Sample groups could be separated by hierarchical clustering on 1305 differentially expressed transcription start sites, including 816 known genes and several novel transcripts. Ten of 13 tested novel transcripts were validated by means of RT-PCR and Sanger sequencing. Expression of RAR-related orphan receptor A (RORA), which has been linked to asthma in genome-wide association studies, was significantly upregulated in patients with SA. Gene network modeling revealed decreased glucocorticoid receptor signaling and increased activity of the mitogen-activated protein kinase and Jun kinase cascades in patients with SA. CONCLUSION: Circulating leukocytes from children with controlled asthma and those with SA have distinct gene expression profiles, suggesting that specific molecular biomarkers could be developed and supporting the need for novel therapeutic approaches.
Subject(s)
Asthma/drug therapy , Asthma/genetics , Drug Resistance/genetics , Glucocorticoids/therapeutic use , RNA, Messenger/genetics , Transcriptome , Adolescent , Asthma/pathology , Case-Control Studies , Child , Child, Preschool , Female , Gene Expression Profiling , Genome-Wide Association Study , Humans , JNK Mitogen-Activated Protein Kinases/genetics , Male , Nuclear Receptor Subfamily 1, Group F, Member 1/genetics , Receptors, Glucocorticoid/genetics , Severity of Illness Index
ABSTRACT
The production of type 1 conventional dendritic cells (cDC1s) requires high expression of the transcription factor IRF8. Three enhancers at the Irf8 3' region function in a differentiation stage-specific manner. However, whether and how these enhancers interact physically and functionally remains unclear. Here, we show that the Irf8 3' enhancers directly interact with each other and contact the Irf8 gene body during cDC1 differentiation. The +56 kb enhancer, which functions from multipotent progenitor stages, activates the other 3' enhancers through an IRF8-dependent transcription factor program, that is, in trans. Then, the +32 kb enhancer, which operates in cDC1-committed cells, in turn acts in cis on the other 3' enhancers to maintain high expression of Irf8. Indeed, mice with compound heterozygous deletion of the +56 and +32 kb enhancers are unable to generate cDC1s. These results illustrate how multiple enhancers cooperate to induce a lineage-determining transcription factor gene during cell differentiation.
Subject(s)
Cell Differentiation , Dendritic Cells , Enhancer Elements, Genetic , Interferon Regulatory Factors , Interferon Regulatory Factors/metabolism , Interferon Regulatory Factors/genetics , Animals , Dendritic Cells/metabolism , Dendritic Cells/cytology , Enhancer Elements, Genetic/genetics , Mice , Mice, Inbred C57BL
ABSTRACT
The human genome is pervasively transcribed and produces a wide variety of long non-coding RNAs (lncRNAs), constituting the majority of transcripts across human cell types. Some specific nuclear lncRNAs have been shown to be important regulatory components acting locally. As RNA-chromatin interaction and Hi-C chromatin conformation data showed that chromatin interactions of nuclear lncRNAs are determined by the local chromatin 3D conformation, we used Hi-C data to identify potential target genes of lncRNAs. RNA-protein interaction data suggested that nuclear lncRNAs act as scaffolds to recruit regulatory proteins to target promoters and enhancers. Nuclear lncRNAs may therefore play a role in directing regulatory factors to locations spatially close to the lncRNA gene. We provide the analysis results through an interactive visualization web portal at https://fantom.gsc.riken.jp/zenbu/reports/#F6_3D_lncRNA.
Subject(s)
Chromatin , RNA, Long Noncoding , RNA, Long Noncoding/genetics , RNA, Long Noncoding/metabolism , Chromatin/metabolism , Chromatin/genetics , Humans , Molecular Sequence Annotation , Cell Nucleus/metabolism , Cell Nucleus/genetics , Genome, Human , Promoter Regions, Genetic
ABSTRACT
The medicinal and industrial properties of phytochemicals (e.g. glycyrrhizin) from the root of Glycyrrhiza uralensis (licorice plant) have made it an attractive, multimillion-dollar trade item. Bioengineering is one way to meet such high market demand while protecting the plants from extinction. Unfortunately, limited genomic information on medicinal plants restricts their research, and thus the biosynthetic mechanisms of many important phytochemicals are still poorly understood. In this work, we used de novo assembly (no reference genome sequence is available) of Illumina RNA-Seq data to study the transcriptome of the licorice plant. Our analysis is based on sequencing results of libraries constructed from samples belonging to different tissues (root and leaf), collected in different seasons and from two distinct strains (low and high glycyrrhizin producers). We provide functional annotations and the expression profiles of 43,882 assembled unigenes, which are suitable for various further studies. Here, we searched for G. uralensis-specific enzymes involved in isoflavonoid biosynthesis and identified putative cytochrome P450 enzymes and putative vacuolar saponin transporters involved in glycyrrhizin production in the licorice root. To disseminate the data and the analysis results, we constructed a publicly available G. uralensis database. This work will contribute to a better understanding of the biosynthetic pathways of secondary metabolites in licorice plants, and possibly in other medicinal plants, and will provide an important resource to further advance transcriptomic studies in legumes.
Subject(s)
Glycyrrhiza uralensis/genetics , Phytochemicals/metabolism , Transcriptome/genetics , Amino Acid Sequence , Databases as Topic , Gene Expression Profiling , Gene Library , Gene Ontology , Glycyrrhiza uralensis/enzymology , Glycyrrhizic Acid/chemistry , Glycyrrhizic Acid/metabolism , Membrane Transport Proteins/metabolism , Metabolic Networks and Pathways/genetics , Molecular Sequence Annotation , Molecular Sequence Data , Open Reading Frames/genetics , Phytochemicals/chemistry , Plant Proteins/chemistry , Plant Proteins/metabolism , Protein Transport , RNA, Plant/isolation & purification , Sequence Analysis, DNA , Subcellular Fractions/metabolism , Vacuoles/metabolism
ABSTRACT
In the genomic era, data dissemination and visualization are an integral part of scientific publications and research projects, whether they involve international consortia producing massive genome-wide data sets, intra-organizational collaborations, or individual labs. However, creating custom supporting websites is often impractical due to the required programming effort, web server infrastructure, and data storage facilities, as well as the long-term maintenance burden. ZENBU-Reports (https://fantom.gsc.riken.jp/zenbu/reports) is a web application for creating interactive scientific web portals through graphical interfaces, while providing storage and secure collaborative sharing for data uploaded by users. ZENBU-Reports provides the scientific visualization elements commonly used in supplementary websites, publications and presentations, offering a complete solution for the interactive display and dissemination of data and analysis results throughout the full lifespan of a scientific project, both during the active research phase and after publication of the results.
ABSTRACT
PURPOSE: Depending on its histological subtype, salivary gland carcinoma (SGC) may have a poor prognosis. Due to the scarcity of preclinical experimental models, its molecular biology has so far remained largely unknown, hampering the development of new treatment modalities for patients with these malignancies. The aim of this study was to generate experimental human SGC models of multiple histological subtypes using patient-derived xenograft (PDX) and organoid culture techniques. METHODS: Tumor specimens from surgically resected SGCs were processed for the preparation of PDXs and patient-derived organoids (PDOs). Specimens from SGC PDXs were also processed for PDX-derived organoid (PDXO) generation. In vivo tumorigenicity was assessed using orthotopic transplantation of SGC organoids. The pathological characteristics of each model were compared to those of the original tumors using immunohistochemistry. RNA-seq was used to analyze the genetic traits of our models. RESULTS: Three series of PDOs, PDXs and PDXOs of salivary duct carcinomas, one series of PDOs, PDXs and PDXOs of mucoepidermoid carcinomas, and PDXs of myoepithelial carcinomas were successfully generated. We found that PDXs and orthotopic transplants from PDOs/PDXOs showed histological features similar to those of the original tumors. Our models also retained the genetic traits of the corresponding histological subtypes, i.e., transcription profiles, genomic variants and fusion genes. CONCLUSION: We report the generation of SGC PDOs, PDXs and PDXOs of multiple histological subtypes, recapitulating the histological and genetic characteristics of the original tumors. These experimental SGC models may serve as a useful resource for the development of novel therapeutic strategies and for investigating the molecular mechanisms underlying the development of these malignancies.
Subject(s)
Salivary Gland Neoplasms , Animals , Humans , Transplantation, Heterologous , Disease Models, Animal , Phenotype , Salivary Gland Neoplasms/genetics , Salivary Gland Neoplasms/pathology , Organoids/pathology , Xenograft Model Antitumor Assays
ABSTRACT
The diffusion Monte Carlo (DMC) method is a widely used algorithm for computing both ground and excited states of many-particle systems; for states without nodes the algorithm is numerically exact. In the presence of nodes approximations must be introduced, for example, the fixed-node approximation. Recently we have developed a genetic algorithm (GA) based approach which allows the computation of nodal surfaces on-the-fly [Ramilowski and Farrelly, Phys. Chem. Chem. Phys., 2010, 12, 12450]. Here GA-DMC is applied to the computation of rovibrational states of CO-(4)He(N) complexes with N ≤ 10. These complexes have been the subject of recent high resolution microwave and millimeter-wave studies which traced the onset of microscopic superfluidity in a doped (4)He droplet, one atom at a time, up to N = 10 [Surin et al., Phys. Rev. Lett., 2008, 101, 233401; Raston et al., Phys. Chem. Chem. Phys., 2010, 12, 8260]. The frequencies of the a-type (microwave) series, which correlate with end-over-end rotation in the CO-(4)He dimer, decrease from N = 1 to 3 and then smoothly increase. This signifies the transition from a molecular complex to a quantum solvated system. The frequencies of the b-type (millimeter-wave) series, which evolves from free rotation of the rigid CO molecule, initially increase from N = 0 to N ∼ 6 before starting to decrease with increasing N. An interesting feature of the b-type series, originally observed in the high resolution infra-red (IR) experiments of Tang and McKellar [J. Chem. Phys., 2003, 119, 754] is that, for N = 7, two lines are observed. The GA-DMC algorithm is found to be in good agreement with experimental results and possibly detects the small (∼0.7 cm(-1)) splitting in the b-series line at N = 7. Advantages and disadvantages of GA-DMC are discussed.
ABSTRACT
Within the scope of the FANTOM6 consortium, we perform a large-scale knockdown of 200 long non-coding RNAs (lncRNAs) in human induced pluripotent stem cells (iPSCs) and systematically characterize their roles in self-renewal and pluripotency. We find 36 lncRNAs (18%) exhibiting cell growth inhibition. From the knockdown of 123 lncRNAs with transcriptome profiling, 36 lncRNAs (29.3%) show molecular phenotypes. Integrating the molecular phenotypes with chromatin-interaction assays further reveals cis- and trans-interacting partners as potential primary targets. Additionally, cell-type enrichment analysis identifies lncRNAs associated with pluripotency, while the knockdown of LINC02595, CATG00000090305.1, and RP11-148B6.2 modulates colony formation of iPSCs. We compare our results with previously published fibroblast phenotyping data and find that 2.9% of the lncRNAs exhibit a consistent cell growth phenotype, whereas we observe 58.3% agreement in molecular phenotypes. This highlights that molecular phenotyping is more comprehensive in revealing affected pathways.
Subject(s)
Induced Pluripotent Stem Cells , RNA, Long Noncoding , Humans , RNA, Long Noncoding/genetics , RNA, Long Noncoding/metabolism , Induced Pluripotent Stem Cells/metabolism , Oligonucleotides, Antisense , Gene Expression Profiling/methods , Embryonic Stem Cells/metabolism
ABSTRACT
Our understanding of how each hereditary kidney cancer adapts to its tissue microenvironment is incomplete. Here, we present single-cell transcriptomes of 108,342 cells from patient specimens, including specimens from six hereditary kidney cancers. The transcriptomes displayed distinct characteristics of the cell of origin and a unique tissue microenvironment for each hereditary kidney cancer. Of note, hereditary leiomyomatosis and renal cell carcinoma (HLRCC)-associated kidney cancer retained some characteristics of proximal tubules, which were completely lost in lymph node metastases, and presented as an avascular tumor with suppressed T cells and TREM2-high macrophages, leading to immune tolerance. Birt-Hogg-Dubé (BHD)-associated kidney cancer exhibited transcriptomic intratumor heterogeneity (tITH), with increased characteristics of intercalated cells of the collecting duct and upregulation of genes driven by FOXI1, a critical transcription factor for collecting duct differentiation. These findings facilitate our understanding of how hereditary kidney cancers adapt to their tissue microenvironment.
ABSTRACT
Cap Analysis of Gene Expression (CAGE) is a powerful method to identify the Transcription Start Sites (TSSs) of capped RNAs while simultaneously measuring transcript expression levels. CAGE allows mapping at single-nucleotide resolution at all active promoters and enhancers. Large CAGE datasets have been produced over the years by individual laboratories and consortia, including the Encyclopedia of DNA Elements (ENCODE) and Functional Annotation of the Mammalian Genome (FANTOM) consortia. These datasets constitute an open resource for TSS annotation and gene expression analysis. Here, we provide an experimental protocol for the most recent CAGE method, Low Quantity (LQ) single strand (ss) CAGE ("LQ-ssCAGE"), which enables cost-effective profiling of low-quantity RNA samples. LQ-ssCAGE is especially useful for samples derived from cells cultured in small volumes, from cellular compartments (e.g. nuclear RNAs), or from developmental stages. We demonstrate the reproducibility and effectiveness of the method by constructing 240 LQ-ssCAGE libraries from 50 ng of RNA extracted from THP-1 cells, and we discover lowly expressed novel enhancer- and promoter-derived lncRNAs.
Subject(s)
Computational Biology/methods , Enhancer Elements, Genetic , Promoter Regions, Genetic , RNA Caps , Transcription Initiation Site , Databases, Genetic , Gene Expression Regulation , Gene Library , High-Throughput Nucleotide Sequencing/methods , Molecular Sequence Annotation , Regulatory Sequences, Nucleic Acid , Reproducibility of Results , Workflow
ABSTRACT
BACKGROUND: The lymphatic and blood vasculatures are closely related systems that collaborate to ensure the organism's physiological function. Despite their common developmental origin, they present distinct functional fates in adulthood that rely on robust lineage-specific regulatory programs. The recent technological boost in sequencing approaches has unveiled long noncoding RNAs (lncRNAs) as prominent regulators of gene expression at various levels, acting in a cell-type-specific manner. RESULTS: To investigate the potential roles of lncRNAs in vascular biology, we performed antisense oligonucleotide (ASO) knockdowns of lncRNA candidates specifically expressed in either human lymphatic or blood vascular endothelial cells (LECs or BECs), followed by Cap Analysis of Gene Expression (CAGE-Seq). Here, we describe the quality control steps adopted in our analysis pipeline before determining the knockdown effects of three ASOs per lncRNA target on the LEC or BEC transcriptomes. In particular, we observed that the choice of negative control ASOs can dramatically affect the conclusions drawn from the analysis, depending on the cellular background. CONCLUSION: The comparison of negative control ASO effects on the targeted cell type transcriptomes highlights the need to select a proper control set of multiple negative control ASOs based on the investigated cell types.
Subject(s)
Gene Knockdown Techniques/methods , Oligonucleotides, Antisense/genetics , Organ Specificity/genetics , RNA, Long Noncoding/genetics , Adult , Endothelial Cells/metabolism , Gene Knockdown Techniques/standards , Humans , Lymphatic System/cytology , Lymphatic System/metabolism , Oligonucleotides, Antisense/standards , Transcriptome
ABSTRACT
Using the Cap Analysis of Gene Expression (CAGE) technology, the FANTOM5 consortium provided one of the most comprehensive maps of transcription start sites (TSSs) in several species. Strikingly, ~72% of them could not be assigned to a specific gene and initiate at unconventional regions, outside promoters or enhancers. Here, we probe these unassigned TSSs and show that, in all species studied, a significant fraction of CAGE peaks initiate at microsatellites, also called short tandem repeats (STRs). To confirm this transcription, we develop Cap Trap RNA-seq, a technology that combines cap trapping and long-read MinION sequencing. We train sequence-based deep learning models able to predict CAGE signal at STRs with high accuracy. These models unveil the importance of the sequences surrounding STRs not only for distinguishing STR classes, but also for predicting the level of transcription initiation. Importantly, genetic variants linked to human diseases are preferentially found at STRs with high transcription initiation levels, supporting the biological and clinical relevance of transcription initiation at STRs. Together, our results extend the repertoire of non-coding transcription associated with DNA tandem repeats and add a new layer of complexity to STR polymorphism.
Subject(s)
Microsatellite Repeats , Neural Networks, Computer , Neurodegenerative Diseases/genetics , Transcription Initiation Site , Transcription Initiation, Genetic , A549 Cells , Animals , Base Sequence , Computational Biology/methods , Deep Learning , Enhancer Elements, Genetic , Genome, Human , High-Throughput Nucleotide Sequencing , Humans , Mice , Neurodegenerative Diseases/diagnosis , Neurodegenerative Diseases/metabolism , Polymorphism, Genetic , Promoter Regions, Genetic
ABSTRACT
Aggressive therapy-resistant and refractory acute myeloid leukemia (AML) has an extremely poor outcome. By analyzing a large number of genetically complex and diverse, primary high-risk poor-outcome human AML samples, we identified specific pathways of therapeutic vulnerability. Through drug screens followed by extensive in vivo validation and genomic analyses, we found inhibition of cytosolic and mitochondrial anti-apoptotic proteins XIAP, BCL2 and MCL1, and a key regulator of mitosis, AURKB, as a vulnerability hub based on patient-specific genetic aberrations and transcriptional signatures. Combinatorial therapeutic inhibition of XIAP with an additional patient-specific vulnerability eliminated established AML in vivo in patient-derived xenografts (PDXs) bearing diverse genetic aberrations, with no signs of recurrence during off-treatment follow-up. By integrating genomic profiling and drug-sensitivity testing, this work provides a platform for a precision-medicine approach for treating aggressive AML with high unmet need.
Subject(s)
Leukemia, Myeloid, Acute , Proto-Oncogene Proteins c-bcl-2 , Apoptosis/genetics , Apoptosis Regulatory Proteins/therapeutic use , Humans , Leukemia, Myeloid, Acute/drug therapy , Proto-Oncogene Proteins c-bcl-2/genetics , X-Linked Inhibitor of Apoptosis Protein/genetics
ABSTRACT
Bulk carbon and boron form very different materials, and these differences are also reflected in their clusters. Small carbon clusters form linear structures, whereas boron clusters are planar. For example, it is known that the B(5)(-) cluster possesses a C(2v) planar structure and C(5)(-) is a linear chain. Here we study B/C mixed clusters containing five atoms, C(x)B(5-x)(-) (x = 1-5), which are expected to exhibit a planar-to-linear structural transition as a function of the C content. The C(x)B(5-x)(-) (x = 1-5) clusters were produced and studied by photoelectron spectroscopy, and their geometric and electronic structures were investigated using a variety of theoretical methods. We found that the planar-to-linear transition occurs between x = 2 and 3: the global minimum structures of the B-rich clusters, CB(4)(-) and C(2)B(3)(-), are planar, similar to B(5)(-), and those of the C-rich clusters, C(3)B(2)(-) and C(4)B(-), are linear, similar to C(5)(-).