ABSTRACT
Repeat expansions cause at least 50 hereditary disorders, including Friedreich ataxia and other diseases known to cause mitochondrial dysfunction. We identified a patient with NAXE-related mitochondrial encephalopathy and novel biallelic GGGCC repeat expansion as long as ~200 repeats in the NAXE promoter region using long-read sequencing. In addition to a marked reduction in the RNA and protein, we found a marked reduction in nascent RNA in the promoter using native elongating transcript-cap analysis of gene expression (NET-CAGE), suggesting transcriptional suppression. Accordingly, CpG hypermethylation was observed in the repeat region. Genetic analyses determined that homozygosity in the patient was due to maternal chromosome 1 uniparental disomy (UPD). We assessed short variants within NAXE including the repeat region in the undiagnosed mitochondrial encephalopathy cohort of 242 patients. This study identified the GGGCC repeat expansion causing a mitochondrial disease and suggests that UPD could significantly contribute to homozygosity for rare repeat-expanded alleles.
ABSTRACT
Non-biting midges (Chironomidae) are known to inhabit a wide range of environments, and certain species can tolerate extreme conditions in which other insects cannot survive. In particular, the sleeping chironomid Polypedilum vanderplanki is known for the remarkable ability of its larvae to withstand almost complete desiccation by entering a state called anhydrobiosis. Chromosome numbers in chironomids are higher than in other dipterans and this extra genomic resource might facilitate rapid adaptation to novel environments. We used improved sequencing strategies to assemble a chromosome-level genome sequence for P. vanderplanki for deep comparative analysis of the genomic location of genes associated with desiccation tolerance. Using whole genome-based cross-species and intra-species analysis, we provide evidence for the unique functional specialization of Chromosome 4 through extensive acquisition of novel genes. In contrast to other insect genomes, in the sleeping chironomid a uniquely high degree of subfunctionalization in paralogous anhydrobiosis genes occurs in this chromosome, as well as pseudogenization in a highly duplicated gene family. Our findings suggest that Chromosome 4 in Polypedilum is a site of high genetic turnover, allowing it to act as a 'sandbox' for evolutionary experiments, thus facilitating the rapid adaptation of midges to harsh environments.
ABSTRACT
The gross morphology of the circulatory system in the amphibious mudskipper, Boleophthalmus pectinirostris, conforms with the typical teleost configuration, in which gills and systemic vascular beds are connected in series. However, at the microscopic level, the vasculatures of the respiratory organs (the inner epithelium of the bucco-opercular cavity, the gills and the skin) all show specializations for aerial gas exchange. The epithelium of the bucco-opercular cavity is heavily vascularized by respiratory capillaries that are derived from systemic arteries of the head, mainly branches of the hyomandibular artery and the dorsal opercular artery. The respiratory circuit of the secondary lamellae of the gills consists of 15-17 channels running in parallel, unlike the lacuna-like blood space of aquatic fishes. The most notable specialization is found in the microcirculation of the respiratory papillae in the skin. Each respiratory papilla is supplied by an arteriole that is derived from a systemic artery, mainly the cranial artery in the head and the segmental artery in the trunk. The arteriole divides several times along its course to the apical region of a papilla, where the branches split into approximately 65 capillaries that radiate to the periphery of the papilla. The capillaries twist 5-10 times before they unite to form the venules that encircle at most half the circumference of a papilla. A variable number of venules merge into a vein, which progressively coalesces with veins from other papillae. There is no morphological specialization that separates oxygen-rich effluent blood of the epithelia of the bucco-opercular cavity and the respiratory papillae of the skin from the oxygen-poor systemic venous blood. The ecophysiological implications of these findings are discussed in relation to the environmental conditions that B. pectinirostris experiences during tidal cycles in the warm months and during overwintering.
Subject(s)
Perciformes; Animals; Arteries; Fishes; Gills; Veins
ABSTRACT
Using the Cap Analysis of Gene Expression (CAGE) technology, the FANTOM5 consortium provided one of the most comprehensive maps of transcription start sites (TSSs) in several species. Strikingly, ~72% of them could not be assigned to a specific gene and initiate at unconventional regions, outside promoters or enhancers. Here, we probe these unassigned TSSs and show that, in all species studied, a significant fraction of CAGE peaks initiate at microsatellites, also called short tandem repeats (STRs). To confirm this transcription, we develop Cap Trap RNA-seq, a technology which combines cap trapping and long read MinION sequencing. We train sequence-based deep learning models able to predict CAGE signal at STRs with high accuracy. These models unveil the importance of STR surrounding sequences not only to distinguish STR classes, but also to predict the level of transcription initiation. Importantly, genetic variants linked to human diseases are preferentially found at STRs with high transcription initiation level, supporting the biological and clinical relevance of transcription initiation at STRs. Together, our results extend the repertoire of non-coding transcription associated with DNA tandem repeats and complexify STR polymorphism.
Subject(s)
Microsatellite Repeats; Neural Networks, Computer; Neurodegenerative Diseases/genetics; Transcription Initiation Site; Transcription Initiation, Genetic; A549 Cells; Animals; Base Sequence; Computational Biology/methods; Deep Learning; Enhancer Elements, Genetic; Genome, Human; High-Throughput Nucleotide Sequencing; Humans; Mice; Neurodegenerative Diseases/diagnosis; Neurodegenerative Diseases/metabolism; Polymorphism, Genetic; Promoter Regions, Genetic
ABSTRACT
Gene expression is controlled at the transcriptional and post-transcriptional levels. The TACC2 gene is known to be associated with tumors, but the control of its expression is unclear. We have reported that activity of the intronic promoter p10 of TACC2 in primary lesions of endometrial cancer is indicative of lymph node metastasis among a low-risk patient group. Here, we analyze the intronic promoter-derived isoforms in JHUEM-1 endometrial cancer cells, and primary tissues of endometrial cancers and normal endometrium. Full-length cDNA amplicons are produced by long-range PCR and subjected to nanopore sequencing followed by computational error correction. We identify 16 stable, 4 variable, and 9 rare exons including 3 novel exons validated independently. All variable and rare exons reside N-terminally of the TACC domain and contribute to isoform variety. We identify 240 high-confidence isoforms, each supported by more than 20 reads. The large number of isoforms produced from one minor promoter indicates the post-transcriptional complexity coupled with transcription at the TACC2 locus in cancer and normal cells.
Subject(s)
Alternative Splicing; Carrier Proteins/genetics; Endometrial Neoplasms/pathology; Exons; Introns; Promoter Regions, Genetic; RNA, Messenger/metabolism; Tumor Suppressor Proteins/genetics; Endometrial Neoplasms/genetics; Endometrial Neoplasms/metabolism; Female; Humans; Protein Isoforms; RNA, Messenger/genetics; Tumor Cells, Cultured
ABSTRACT
RNA splicing, a highly conserved process in eukaryotic gene expression, is seen as a promising target for anticancer agents. Splicing is associated with other RNA processing steps, such as transcription and nuclear export; however, our understanding of the interaction between splicing and other RNA regulatory mechanisms remains incomplete. Moreover, the impact of chemical splicing inhibition on long non-coding RNAs (lncRNAs) remains poorly understood. Here, we demonstrate that spliceostatin A (SSA), a chemical splicing modulator that binds to the SF3B subcomplex of the U2 small nuclear ribonucleoprotein particle (snRNP), limits U1 snRNP availability in splicing, resulting in premature cleavage and polyadenylation of MALAT1, a nuclear lncRNA, as well as protein-coding mRNAs. The resulting truncated transcripts are exported into the cytoplasm and translated, producing aberrant protein products. Our work demonstrates that active recycling of the splicing machinery maintains homeostasis of RNA processing beyond intron excision.
Subject(s)
Phosphoproteins/antagonists & inhibitors; Pyrans/pharmacology; RNA Splicing Factors/antagonists & inhibitors; RNA, Long Noncoding/metabolism; Ribonucleoprotein, U1 Small Nuclear/antagonists & inhibitors; Spiro Compounds/pharmacology; Female; HeLa Cells; Humans; Phosphoproteins/chemistry; Phosphoproteins/metabolism; Polyadenylation/drug effects; Pyrans/chemistry; RNA Splicing/drug effects; RNA Splicing Factors/chemistry; RNA Splicing Factors/metabolism; Ribonucleoprotein, U1 Small Nuclear/chemistry; Ribonucleoprotein, U1 Small Nuclear/metabolism; Spiro Compounds/chemistry; Tumor Cells, Cultured
ABSTRACT
Gene expression profiles in homologous tissues differ between species, which may reflect species differences in the gene expression program of each cell type, but may also reflect differences in the cell type composition of each tissue. Here, we compare expression profiles in matching primary cells in human, mouse, rat, dog, and chicken using Cap Analysis Gene Expression (CAGE) and short RNA (sRNA) sequencing data from FANTOM5. While we find that expression profiles of orthologous genes in different species are highly correlated across cell types, in each cell type many genes were differentially expressed between species. Expression of genes with products involved in transcription, RNA processing, and transcriptional regulation was more likely to be conserved, while expression of genes encoding proteins involved in intercellular communication was more likely to have diverged during evolution. Conservation of expression correlated positively with the evolutionary age of genes, suggesting that divergence in expression levels of genes critical for cell function was restricted during evolution. Motif activity analysis showed that both promoters and enhancers are activated by the same transcription factors in different species. An analysis of expression levels of mature miRNAs and of primary miRNAs identified by CAGE revealed that evolutionarily old miRNAs are more likely to have conserved expression patterns than young miRNAs. We conclude that key aspects of the regulatory network are conserved, while differential expression of genes involved in cell-to-cell communication may contribute greatly to phenotypic differences between species.
Subject(s)
Evolution, Molecular; Transcriptome; Animals; Chickens/genetics; Dogs; Gene Expression Profiling; Gene Regulatory Networks; Humans; Mice; MicroRNAs/metabolism; Nucleotide Motifs; Principal Component Analysis; Promoter Regions, Genetic; Rats; Species Specificity; Transcription Factors/metabolism
ABSTRACT
In the FANTOM5 project, transcription initiation events across the human and mouse genomes were mapped at single base-pair resolution and their frequencies were monitored by CAGE (Cap Analysis of Gene Expression) coupled with single-molecule sequencing. Approximately three thousand samples, consisting of a variety of primary cells, tissues, cell lines, and time series samples during cell activation and development, were subjected to a uniform pipeline of CAGE data production. The analysis pipeline started by measuring RNA extracts to assess their quality, and continued to CAGE library production by using a robotic or a manual workflow, single-molecule sequencing, and computational processing to generate frequencies of transcription initiation. The resulting data represent the consequence of transcriptional regulation in each analyzed state of mammalian cells. Non-overlapping peaks over the CAGE profiles, approximately 200,000 and 150,000 peaks for the human and mouse genomes, respectively, were identified and annotated to provide the precise locations of known promoters as well as novel ones, and to quantify their activities.
Subject(s)
Gene Expression Profiling; Genome; Animals; Gene Expression Regulation; Humans; Mice; Promoter Regions, Genetic; Species Specificity
ABSTRACT
MicroRNAs (miRNAs) are short non-coding RNAs with key roles in cellular regulation. As part of the fifth edition of the Functional Annotation of Mammalian Genome (FANTOM5) project, we created an integrated expression atlas of miRNAs and their promoters by deep-sequencing 492 short RNA (sRNA) libraries, with matching Cap Analysis Gene Expression (CAGE) data, from 396 human and 47 mouse RNA samples. Promoters were identified for 1,357 human and 804 mouse miRNAs and showed strong sequence conservation between species. We also found that primary and mature miRNA expression levels were correlated, allowing us to use the primary miRNA measurements as a proxy for mature miRNA levels in a total of 1,829 human and 1,029 mouse CAGE libraries. We thus provide a broad atlas of miRNA expression and promoters in primary mammalian cells, establishing a foundation for detailed analysis of miRNA expression patterns and transcriptional control regions.
Subject(s)
Gene Expression Profiling/methods; MicroRNAs/genetics; Molecular Sequence Annotation; Promoter Regions, Genetic/genetics; Animals; Cells, Cultured; Gene Library; High-Throughput Nucleotide Sequencing; Humans; Mice; MicroRNAs/metabolism
ABSTRACT
Long non-coding RNAs (lncRNAs) are largely heterogeneous and functionally uncharacterized. Here, using FANTOM5 cap analysis of gene expression (CAGE) data, we integrate multiple transcript collections to generate a comprehensive atlas of 27,919 human lncRNA genes with high-confidence 5' ends and expression profiles across 1,829 samples from the major human primary cell types and tissues. Genomic and epigenomic classification of these lncRNAs reveals that most intergenic lncRNAs originate from enhancers rather than from promoters. Incorporating genetic and expression data, we show that lncRNAs overlapping trait-associated single nucleotide polymorphisms are specifically expressed in cell types relevant to the traits, implicating these lncRNAs in multiple diseases. We further demonstrate that lncRNAs overlapping expression quantitative trait loci (eQTL)-associated single nucleotide polymorphisms of messenger RNAs are co-expressed with the corresponding messenger RNAs, suggesting their potential roles in transcriptional regulation. Combining these findings with conservation data, we identify 19,175 potentially functional lncRNAs in the human genome.
Subject(s)
Databases, Genetic; RNA, Long Noncoding/chemistry; RNA, Long Noncoding/genetics; Transcriptome/genetics; Cells, Cultured; Conserved Sequence/genetics; Datasets as Topic; Enhancer Elements, Genetic/genetics; Epigenesis, Genetic; Gene Expression Profiling; Gene Expression Regulation; Genome, Human/genetics; Genome-Wide Association Study; Genomics; Humans; Internet; Molecular Sequence Annotation; Organ Specificity/genetics; Polymorphism, Single Nucleotide; Promoter Regions, Genetic/genetics; Quantitative Trait Loci/genetics; RNA Stability; RNA, Messenger/genetics
ABSTRACT
Spliceostatin A (SSA) is a methyl ketal derivative of FR901464, a potent antitumor compound isolated from a culture broth of Pseudomonas sp. No. 2663. These compounds selectively bind to the essential spliceosome component SF3b, a subcomplex of the U2 snRNP, to inhibit pre-mRNA splicing. However, the mechanism of SSA's antitumor activity is unknown. It is noteworthy that SSA causes accumulation of a truncated form of the CDK inhibitor protein p27 translated from CDKN1B pre-mRNA, which is involved in SSA-induced cell-cycle arrest. However, it is still unclear whether pre-mRNAs are uniformly exported from the nucleus following SSA treatment. We performed RNA-seq analysis on nuclear and cytoplasmic fractions of SSA-treated cells. Our statistical analyses showed that intron retention is the major consequence of SSA treatment, and a small number of intron-containing pre-mRNAs leak into the cytoplasm. Using a series of reporter plasmids to investigate the roles of intronic sequences in the pre-mRNA leakage, we showed that the strength of the 5' splice site affects pre-mRNA leakage. Additionally, we found that the level of pre-mRNA leakage is related to transcript length. These results suggest that the strength of the 5' splice site and the length of the transcripts are determinants of the pre-mRNA leakage induced by SF3b inhibitors.
Subject(s)
Cyclin-Dependent Kinase Inhibitor p27/genetics; Neoplasms/genetics; Pyrans/pharmacology; Sequence Analysis, RNA/methods; Spiro Compounds/pharmacology; Cell Nucleus/genetics; Cytoplasm/genetics; Gene Expression Regulation, Neoplastic/drug effects; HeLa Cells; Humans; RNA Precursors/genetics; RNA Splicing
ABSTRACT
CAGE (cap analysis gene expression) and RNA-seq are two major technologies used to identify transcript abundances as well as structures. They measure expression by sequencing from either the 5' end of capped molecules (CAGE) or tags randomly distributed along the length of a transcript (RNA-seq). Library protocols for clonally amplified (Illumina, SOLiD, 454 Life Sciences [Roche], Ion Torrent), second-generation sequencing platforms typically employ PCR preamplification prior to clonal amplification, while third-generation, single-molecule sequencers can sequence unamplified libraries. Although these transcriptome profiling platforms have been demonstrated to be individually reproducible, no systematic comparison has been carried out between them. Here we compare CAGE, using both second- and third-generation sequencers, and RNA-seq, using a second-generation sequencer based on a panel of RNA mixtures from two human cell lines to examine power in the discrimination of biological states, detection of differentially expressed genes, linearity of measurements, and quantification reproducibility. We found that the quantified levels of gene expression are largely comparable across platforms and conclude that CAGE and RNA-seq are complementary technologies that can be used to improve incomplete gene models. We also found systematic bias in the second- and third-generation platforms, which is likely due to steps such as linker ligation, cleavage by restriction enzymes, and PCR amplification. This study provides a perspective on the performance of these platforms, which will be a baseline in the design of further experiments to tackle complex transcriptomes uncovered in a wide range of cell types.
Subject(s)
High-Throughput Nucleotide Sequencing/methods; RNA/genetics; Transcriptome/genetics; Gene Expression Profiling; Humans; Sequence Analysis, RNA/methods
ABSTRACT
In DNA amplification, the initial step of copying a target sequence from the template DNA, the so-called intermediate product generation step, is very important. In examining the turn-back primer (TP)-dependent isothermal DNA amplification (TIA) method, we determined the actual time point of intermediate product generation by extrapolating dsDNA amplification curves. Our results indicate that intermediate product creation is the rate-limiting step in TIA, and good TP design is advantageous for improving the intermediate production process.
Subject(s)
DNA Primers/chemistry; Models, Genetic; Nucleic Acid Amplification Techniques/methods; Nucleic Acid Probes/chemistry; Algorithms; DNA Primers/genetics; Humans; Models, Molecular; Nucleic Acid Probes/genetics; Sequence Analysis, DNA/methods
ABSTRACT
To elucidate the mechanism controlling the biogenesis of the Golgi complex, we have studied whether the expression of a resident membrane protein p138 of the Golgi complex is dependent upon the cell cycle. The protein level of p138 in human KB cells was increased during thymidine block to synchronize the cells in the early-S phase, but changed little from S to G2 after release from the block. On the other hand, the mRNA level of the p138 gene was constant during the block. The change in mRNA level in the cells was small with a low peak at S to G2. Both p138 protein and mRNA levels decreased after cell division and then rose rapidly to the same level as those of log-phase cells in the next G1 to S. Thus, translation of p138 protein was upregulated in the cells at the early-S phase. However, we also found that the p138 protein level increased during an arrest at G2/M caused by etoposide. The kinetics of centrosome duplication apparently differ from those of p138 protein production. The duplication occurred mainly at S to G2 after the release from thymidine block, while the ratio of cells containing duplicated centrosomes increased gradually during the block. Taken together, these results show that both the translation and transcription of p138 protein are regulated independently of the cell cycle and dissociated from the duplication of the centrosome. Rather, the expression of p138 protein seems to be coupled with a change in cell size since both thymidine block and etoposide inhibition resulted in an apparent increase in cell size.