ABSTRACT
Retroposed protein-coding genes are commonly considered to be nonfunctional duplicates. However, they often gain transcriptional capability and have important roles. Amici et al. recently identified novel functions of a retroposed gene. HAPSTR2, a retrocopy of HAPSTR1, encodes a protein that stabilizes the HAPSTR1 protein and functionally buffers its loss.
Subject(s)
Evolution, Molecular , Retroelements , Retroelements/genetics
ABSTRACT
U7 snRNA is part of the U7 snRNP complex, required for the 3' end processing of replication-dependent histone pre-mRNAs in S phase of the cell cycle. Here, we show that U7 snRNA plays another function in inhibiting the expression of a subset of long terminal repeats of human endogenous retroviruses (HERV1/LTR12s) and LTR12-containing long intergenic noncoding RNAs (lincRNAs), both bearing sequence motifs that perfectly match the 5' end of U7 snRNA. We demonstrate that U7 snRNA inhibits LTR12 and lincRNA transcription and propose a mechanism in which U7 snRNA hampers the binding/activity of the NF-Y transcription factor to CCAAT motifs within LTR12 elements. Thereby, U7 snRNA plays a protective role in maintaining the silencing of deleterious genetic elements in selected types of cells.
Subject(s)
Endogenous Retroviruses , RNA, Long Noncoding , RNA, Small Nuclear , Terminal Repeat Sequences , Humans , RNA, Small Nuclear/metabolism , RNA, Small Nuclear/genetics , RNA, Long Noncoding/genetics , RNA, Long Noncoding/metabolism , Terminal Repeat Sequences/genetics , Endogenous Retroviruses/genetics , CCAAT-Binding Factor/metabolism , CCAAT-Binding Factor/genetics , Transcription, Genetic
ABSTRACT
As is well known, messenger RNA has many regulatory regions along its length. One of them is the 5' untranslated region (5'UTR), which itself contains many regulatory elements such as upstream ORFs (uORFs), internal ribosome entry sites (IRESs), microRNA binding sites, and structural components involved in the regulation of mRNA stability, pre-mRNA splicing, and translation initiation. Activation of an alternative, more upstream transcription start site extends the 5'UTR. One consequence of 5'UTR extension may be head-to-head gene overlap. This review describes elements in the 5'UTR of protein-coding transcripts and the functional significance of 5' overlap between protein-coding genes, with implications for transcription, translation, and disease.
Subject(s)
Gene Expression Regulation , Protein Biosynthesis , 5' Untranslated Regions , RNA, Messenger/genetics , Regulatory Sequences, Nucleic Acid
ABSTRACT
Head and neck squamous cell carcinoma is one of the most common and fatal cancers worldwide. Lack of appropriate preventive screening tests, late detection, and high heterogeneity of these tumors are the main reasons for the unsatisfactory effects of therapy and, consequently, unfavorable outcomes for patients. MicroRNAs (miRNAs), molecules with great potential both as biomarkers and as therapeutic targets, offer an opportunity to improve the diagnosis and treatment of this group of cancers. This review aims to present the characteristics of these short non-coding RNAs (ncRNAs) and summarize the current reports on their use in oncology, with a focus on medical strategies tailored to patients' needs.
ABSTRACT
SyntDB (http://syntdb.amu.edu.pl/) is a collection of data on long noncoding RNAs (lncRNAs) and their evolutionary relationships in twelve primate species, including humans. This is the first database dedicated to primate lncRNAs, thousands of which are uniquely stored in SyntDB. The lncRNAs were predicted with our computational pipeline using publicly available RNA-Seq data spanning diverse tissues and organs. Most of the species included in SyntDB still lack lncRNA annotations in public resources. In addition to providing users with unique sets of lncRNAs and their characteristics, SyntDB provides data on orthology relationships between the lncRNAs of humans and other primates, which are not available on this scale elsewhere. Keeping in mind that only a small fraction of currently known human lncRNAs have been functionally characterized and that lncRNA conservation is frequently used to identify the most relevant lncRNAs for functional studies, we believe that SyntDB will contribute to ongoing research aimed at deciphering the biological roles of lncRNAs.
Subject(s)
Databases, Nucleic Acid , Primates/genetics , RNA, Long Noncoding/metabolism , Animals , Humans , RNA, Long Noncoding/chemistry , RNA-Seq
ABSTRACT
BACKGROUND: Long noncoding RNAs represent a large class of transcripts with two common features: they exceed an arbitrary length threshold of 200 nt and are assumed to not encode proteins. Although a growing body of evidence indicates that the vast majority of lncRNAs are potentially nonfunctional, hundreds of them have already been revealed to perform essential gene regulatory functions or to be linked to a number of cellular processes, including those associated with the etiology of human diseases. To better understand the biology of lncRNAs, it is essential to perform a more in-depth study of their evolution. In contrast to protein-encoding transcripts, however, they do not show the strong sequence conservation that usually results from purifying selection; therefore, software that is typically used to resolve the evolutionary relationships of protein-encoding genes and transcripts is not applicable to the study of lncRNAs. RESULTS: To tackle this issue, we developed lncEvo, a computational pipeline that consists of three modules: (1) transcriptome assembly from RNA-Seq data, (2) prediction of lncRNAs, and (3) conservation study: a genome-wide comparison of lncRNA transcriptomes between two species of interest, including search for orthologs. Importantly, one can choose to apply lncEvo solely for transcriptome assembly or lncRNA prediction, without calling the conservation-related part. CONCLUSIONS: lncEvo is an all-in-one tool built with the Nextflow framework, utilizing state-of-the-art software and algorithms with customizable trade-offs between speed and sensitivity, ease of use and built-in reporting functionalities. The source code of the pipeline is freely available for academic and nonacademic use under the MIT license at https://gitlab.com/spirit678/lncrna_conservation_nf .
Subject(s)
Algorithms , Computational Biology , RNA, Long Noncoding , Software , Computational Biology/methods , Conserved Sequence , Genome , Humans , RNA, Long Noncoding/genetics , Transcriptome
ABSTRACT
A large portion of the human genome is transcribed into long noncoding RNAs that can range from 200 nucleotides to several kilobases in length. The number of identified lncRNAs is still growing, but only a handful of them have been functionally characterized. However, it is known that the functions of lncRNAs are closely related to their subcellular localization. Cytoplasmic lncRNAs can regulate mRNA stability, affect translation and act as miRNA sponges, while nuclear-retained long noncoding RNAs have been reported to be involved in transcriptional control, chromosome scaffolding, modulation of alternative splicing and chromatin remodelling. Through these processes, lncRNAs have diverse regulatory roles in cell biology and diseases. OIP5-AS1 (also known as Cyrano), a poorly characterized lncRNA expressed antisense to the OIP5 oncogene, is deregulated in multiple cancers. We showed that one of the OIP5-AS1 splicing forms (ENST00000501665.2) is retained in the cell nucleus where it associates with chromatin, thus narrowing down the spectrum of its possible mechanisms of action. Its knockdown with antisense LNA gapmeRs led to inhibited expression of a sense partner, OIP5, strongly suggesting a functional coupling between OIP5 and ENST00000501665.2. A subsequent bioinformatics analysis followed by RAP-MS and RNA Immunoprecipitation experiments suggested its possible mode of action; in particular, we found that ENST00000501665.2 directly binds to a number of nuclear proteins, including SMARCA4, a component of the SWI/SNF chromatin remodelling complex, whose binding motif is located in the promoter of the OIP5 oncogene.
Subject(s)
Alternative Splicing , Cell Cycle Proteins/metabolism , Chromatin/genetics , Chromosomal Proteins, Non-Histone/metabolism , Gene Expression Regulation, Neoplastic , Oncogenes , RNA, Long Noncoding/genetics , Cell Cycle Proteins/genetics , Cell Proliferation , Chromosomal Proteins, Non-Histone/genetics , HEK293 Cells , Humans , RNA, Long Noncoding/chemistry
ABSTRACT
Gene overlap plays various regulatory functions on transcriptional and post-transcriptional levels. Most current studies focus on protein-coding genes overlapping with non-protein-coding counterparts, the so-called natural antisense transcripts. Considerably less is known about the role of gene overlap in the case of two protein-coding genes. Here, we provide OverGeneDB, a database of human and mouse 5' end protein-coding overlapping genes. The database contains 582 human and 113 mouse gene pairs that are transcribed using overlapping promoters in at least one analyzed library. Gene pairs were identified based on the analysis of the transcription start site (TSS) coordinates in 73 human and 10 mouse organs, tissues and cell lines. Besides TSS data, resources for 26 human lung adenocarcinoma cell lines also contain RNA-Seq and ChIP-Seq data for seven histone modifications and RNA Polymerase II activity. The collected data revealed that the overlap region is rarely conserved between the studied species and tissues. In ~50% of the overlapping genes, transcription started explicitly in the overlap regions. In the remaining half of overlapping genes, transcription was initiated both from overlapping and non-overlapping TSSs. OverGeneDB is accessible at http://overgenedb.amu.edu.pl.
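The idea of identifying 5' end overlapping gene pairs from TSS coordinates can be illustrated with a minimal Python sketch. It is not the OverGeneDB procedure, whose details the abstract does not give. The `Gene` record, the 1-based inclusive coordinates, and the `head_to_head_overlap` function are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record; coordinates are 1-based and inclusive."""
    name: str
    chrom: str
    start: int   # leftmost genomic coordinate
    end: int     # rightmost genomic coordinate
    strand: str  # "+" or "-"

    @property
    def tss(self) -> int:
        # The TSS is the leftmost base on "+" and the rightmost base on "-".
        return self.start if self.strand == "+" else self.end


def head_to_head_overlap(a: Gene, b: Gene) -> int:
    """Return the length (bp) of the 5' end overlap, or 0 if there is none."""
    if a.chrom != b.chrom or a.strand == b.strand:
        return 0
    plus, minus = (a, b) if a.strand == "+" else (b, a)
    # Head-to-head: the plus-strand TSS lies at or left of the minus-strand
    # TSS, and each gene extends away from the other beyond the overlap.
    if minus.start < plus.tss <= minus.tss < plus.end:
        return minus.tss - plus.tss + 1
    return 0


if __name__ == "__main__":
    g1 = Gene("geneA", "chr1", 1000, 5000, "+")
    g2 = Gene("geneB", "chr1", 200, 1200, "-")
    print(head_to_head_overlap(g1, g2))
```

With several alternative TSSs per gene, the same check could be repeated per library, which would separate pairs that overlap in only some tissues from those that overlap in all.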
Subject(s)
Databases, Genetic , Genes, Overlapping , Animals , Gene Expression , Histone Code , Humans , Mice , Multigene Family , Open Reading Frames , Promoter Regions, Genetic , RNA Polymerase II/metabolism , Sequence Analysis, RNA , Transcription Factors/metabolism , Transcription Initiation Site
ABSTRACT
Proximal promoter regions (PPRs) are heavily transcribed, yielding different types of small RNAs. The act of transcription within PPRs might regulate downstream gene expression via transcriptional interference (TI). For analysis, we investigated capped and polyadenylated small RNA transcripts within PPRs of human RefSeq genes in eight different cell lines. Transcripts of our datasets overlapped with experimentally determined transcription factor binding sites (TFBSs). For TFBSs intersected by these small RNA transcripts, we established a negative correlation between sRNA expression levels and transcription factor (TF) DNA binding affinities, suggesting that the transcripts acted via TI. Accordingly, datasets were designated as TFbiTrs (TF-binding interfering transcripts). Expression of most TFbiTrs was restricted to certain cell lines. This facilitated the analysis of effects related to TFbiTr expression for the same RefSeq genes across cell lines. We consistently uncovered higher relative TF/DNA binding affinities and concomitantly higher expression levels for RefSeq genes in the absence of TFbiTrs. Analysis of corresponding chromatin landscapes supported these results. ChIA-PET revealed the participation of distal enhancers in TFbiTr transcription. Enhancers regulating TFbiTrs, in effect, act as repressors for corresponding downstream RefSeq genes. We demonstrate the significant impact of TI on gene expression using selected small RNA datasets.
Subject(s)
DNA/genetics , Promoter Regions, Genetic , RNA, Messenger/genetics , Transcription Factors/genetics , Transcription, Genetic , A549 Cells , Binding Sites , Cell Line , Chromatin/chemistry , Chromatin/metabolism , DNA/metabolism , Datasets as Topic , Enhancer Elements, Genetic , HeLa Cells , Human Embryonic Stem Cells/cytology , Human Embryonic Stem Cells/metabolism , Humans , K562 Cells , MCF-7 Cells , Neurons/cytology , Neurons/metabolism , Protein Binding , RNA, Messenger/metabolism , Transcription Factors/metabolism
ABSTRACT
Gene retroposition leads to considerable genetic variation between individuals. Recent studies revealed the presence of at least 208 retroduplication variations (RDVs), a class of polymorphisms in which a retrocopy is present or absent from individual genomes. Most of these RDVs resulted from recent retroduplications. In this study, we used the results of Phase 1 from the 1000 Genomes Project to investigate the variation in loss of ancestral (i.e. shared with other primates) retrocopies among different human populations. In addition, we examined retrocopy expression levels using RNA-Seq data derived from the Illumina BodyMap project, as well as data from lymphoblastoid cell lines provided by the Geuvadis Consortium. We also developed a new approach to detect novel retrocopies absent from the reference human genome. We experimentally confirmed the existence of the detected retrocopies and determined their presence or absence in the human genomes of 17 different populations. Altogether, we were able to detect 193 RDVs; the majority resulted from retrocopy deletion. Most of these RDVs had not been previously reported. We experimentally confirmed the expression of 11 ancestral retrogenes that underwent deletion in certain individuals. The frequency of their deletion, with the exception of one retrogene, is very low. The expression, conservation and low rate of deletion of the remaining 10 retrocopies may suggest some functionality. Aside from the presence or absence of expressed retrocopies, we also searched for differences in retrocopy expression levels between populations, finding 9 retrogenes that undergo statistically significant differential expression.
Subject(s)
Evolution, Molecular , Gene Duplication , Genome, Human , Polymorphism, Genetic , Animals , Gene Expression Regulation , High-Throughput Nucleotide Sequencing , Human Genome Project , Humans , Primates/genetics
ABSTRACT
shmiRs are pri-miRNA-based RNA interference triggers from which exogenous siRNAs are expressed in cells to silence target genes. These reagents are very promising tools in RNAi in vivo applications due to their good activity profile and lower toxicity than observed for other vector-based reagents such as shRNAs. In this study, using high-resolution northern blotting and small RNA sequencing, we investigated the precision with which RNases Drosha and Dicer process shmiRs. The fidelity of siRNA release from the commonly used pri-miRNA shuttles was found to depend on both the siRNA insert and the pri-miR scaffold. Then, we searched for specific factors that may affect the precision of siRNA release and found that both the structural features of shmiR hairpins and the nucleotide sequence at Drosha and Dicer processing sites contribute to cleavage site selection and cleavage precision. An analysis of multiple shRNA intermediates generated from several reagents revealed the complexity of shmiR processing by Drosha and demonstrated that Dicer selects substrates for further processing. Aside from providing new basic knowledge regarding the specificity of nucleases involved in miRNA biogenesis, our results facilitate the rational design of more efficient genetic reagents for RNAi technology.
Subject(s)
DEAD-box RNA Helicases/genetics , MicroRNAs/genetics , RNA Interference , Ribonuclease III/genetics , Base Sequence/genetics , DEAD-box RNA Helicases/metabolism , HEK293 Cells , Humans , MicroRNAs/biosynthesis , Nucleic Acid Conformation , RNA Processing, Post-Transcriptional/genetics , RNA, Small Interfering/genetics , Ribonuclease III/metabolism
ABSTRACT
RNA interference triggers such as short interfering RNA (siRNA) or genetically encoded short hairpin RNA (shRNA) and artificial miRNA (sh-miR) are widely used to silence the expression of specific genes. In addition to silencing selected targets, RNAi reagents may induce various side effects, including immune responses. To determine the molecular markers of immune response activation when using RNAi reagents, we analyzed the results of experiments gathered in the RNAimmuno (v 2.0) and GEO Profiles databases. To better characterize and compare cellular responses to various RNAi reagents in one experimental system, we designed a reagent series in corresponding siRNA, D-siRNA, shRNA and sh-miR forms. To exclude sequence-specific effects the reagents targeted 3 different transcripts (Luc, ATXN3 and HTT). We demonstrate that RNAi reagents induce a broad variety of sequence-non-specific effects, including the deregulation of cellular miRNA levels. Typical siRNAs are weak stimulators of interferon response but may saturate the miRNA biogenesis pathway, leading to the downregulation of highly expressed miRNAs, whereas plasmid-based reagents induce known markers of immune response and may alter miRNA levels and their isomiR composition.
Subject(s)
Immunity, Cellular/genetics , MicroRNAs/genetics , RNA Interference/immunology , RNA, Small Interfering/genetics , Gene Silencing , Interferons/genetics , MicroRNAs/immunology , RNA, Small Interfering/immunology
ABSTRACT
Long non-coding RNAs (lncRNAs) represent a class of potent regulators of gene expression that are found in a wide array of eukaryotes; however, our knowledge about these molecules in plants is still very limited. In particular, a number of model plant species still lack comprehensive data sets of lncRNAs and their annotations, and very little is known about their biological roles. To meet these shortcomings, we created an online database of lncRNAs in 10 model plant species. The lncRNAs were identified computationally using dozens of publicly available RNA sequencing (RNA-Seq) libraries. Expression values, coding potential, sequence alignments as well as other types of data provide annotation for the identified lncRNAs. In order to better characterize them, we investigated their potential roles in splicing modulation and deregulation of microRNA functions. The data are freely available for searching, browsing and downloading from an online database called CANTATAdb (http://cantata.amu.edu.pl, http://yeti.amu.edu.pl/CANTATA/).
Subject(s)
Databases, Nucleic Acid , Gene Expression Regulation, Plant , MicroRNAs/genetics , Plants/genetics , RNA, Long Noncoding/genetics , Internet , RNA, Plant/genetics , Sequence Analysis, RNA , User-Computer Interface
ABSTRACT
Ever-growing interest in microRNAs has greatly increased the number of resources and research papers devoted to the field and, as a result, finding miRNA data of interest has become increasingly demanding. To mitigate this problem, we created the miRNEST database (http://mirnest.amu.edu.pl), an integrative microRNA resource. In its updated version, named miRNEST 2.0, the database is complemented with our extensive miRNA predictions from deep sequencing libraries, data from plant degradome analyses, results of pre-miRNA classification with HuntMi and miRNA splice site information. We also added download and upload options and improved the user interface to make it easier to browse through miRNA records.
Subject(s)
Databases, Nucleic Acid , MicroRNAs/chemistry , RNA, Plant/chemistry , Animals , High-Throughput Nucleotide Sequencing , Internet , RNA Precursors/chemistry , RNA Splice Sites , Sequence Analysis, RNA
ABSTRACT
Retrocopies of protein-coding genes, that is, copies of mature RNA that were reverse transcribed and inserted into the genome, have commonly been categorized as pseudogenes with no biological importance. However, recent studies showed that they play an important role in genome evolution and in shaping interspecies differences. Here, we present RetrogeneDB, a database of retrocopies in 62 animal genomes. RetrogeneDB contains information about retrocopies, their genomic localization, parental genes, ORF conservation, and expression. To the best of our knowledge, this is the most complete retrocopy database, providing information for dozens of species never previously analyzed in the context of protein-coding gene retroposition. The database is available at http://retrogenedb.amu.edu.pl.
Subject(s)
Databases, Genetic , Pseudogenes , Retroelements , Animals , Evolution, Molecular , Genome , Humans
ABSTRACT
Gene duplicates generated via retroposition were long thought to be pseudogenized and consequently decayed. However, a significant number of these genes escaped their evolutionary destiny and evolved into functional genes. Despite multiple studies, the number of functional retrogenes in human and other genomes remains unclear. We performed a comparative analysis of human, chicken, and worm genomes to identify "orphan" retrogenes, that is, retrogenes that have replaced their progenitors. We located 25 such candidates in the human genome. All of these genes were previously known, and the majority have been intensively studied. Despite this, they have never been recognized as retrogenes. Analysis revealed that the phenomenon of replacing parental genes with their retrocopies has been taking place over the entire span of animal evolution. This process was often species specific and contributed to interspecies differences. Surprisingly, these retrogenes, which should evolve in a more relaxed mode, are subject to very strong purifying selection, which is, on average, two and a half times stronger than that acting on other human genes. Also, unlike typical retrogenes, they do not show an overall tendency toward testis-specific expression. Notably, seven of them are associated with human diseases. Recognizing them as "orphan" retrocopies, which have different regulatory machinery than their parents, is important for any disease studies in model organisms, especially when discoveries made in one species are transferred to humans.
Subject(s)
Genome, Human , Retroelements , Amino Acid Sequence , Animals , Cluster Analysis , Endosomal Sorting Complexes Required for Transport/chemistry , Endosomal Sorting Complexes Required for Transport/genetics , Gene Duplication , Gene Expression Profiling , Gene Order , Genetic Association Studies , Humans , Male , Mice , MicroRNAs/genetics , MicroRNAs/metabolism , Molecular Sequence Data , Paraplegia/genetics , Phylogeny , Pseudogenes , Sequence Alignment
ABSTRACT
Despite accumulating data on animal and plant microRNAs and their functions, existing public miRNA resources usually collect miRNAs from a very limited number of species. Many microRNAs, including those from model organisms, remain undiscovered. As a result, there is a continuous need to search for new microRNAs. We present miRNEST (http://mirnest.amu.edu.pl), a comprehensive database of animal, plant and virus microRNAs. The core part of the database is built from our miRNA predictions conducted on Expressed Sequence Tags of 225 animal and 202 plant species. The miRNA search was performed based on sequence similarity, and as many as 10,004 miRNA candidates in 221 animal and 199 plant species were discovered. Of these, only 299 have already been deposited in miRBase. Additionally, miRNEST has been integrated with external miRNA data from the literature and 13 databases, which include miRNA sequences, small RNA sequencing data, expression, polymorphism and target data, as well as links to external miRNA resources, whenever applicable. All this makes miRNEST a considerable miRNA resource in terms of the number of species (544), integrating scattered miRNA data into a uniform format with a user-friendly web interface.
Subject(s)
Databases, Nucleic Acid , MicroRNAs/chemistry , Molecular Sequence Annotation , RNA, Plant/chemistry , RNA, Viral/chemistry , Animals , Internet , MicroRNAs/metabolism , RNA, Plant/metabolism , RNA, Viral/metabolism , Systems Integration , User-Computer Interface
ABSTRACT
Retrotransposition is one of the main factors responsible for gene duplication and thus genome evolution. However, the sequences that undergo this process are not only an excellent source of biological diversity, but in certain cases also pose a threat to the integrity of the DNA. One of the mechanisms that protects against the incorporation of mobile elements is the HUSH complex, which is responsible for silencing long, intronless, transcriptionally active transposed sequences that are rich in adenine on the sense strand. In this study, broad sets of human and porcine retrocopies were analysed with respect to the above factors, taking into account the evolution of these molecules. Analysis of expression pattern, genomic structure, transcript length, and nucleotide substitution frequency showed a strong relationship between the expression level and exon length as well as the protective nature of introns. The results also showed that there is no direct correlation between the expression level and adenine content. However, protein-coding retrocopies, which have a lower adenine content, have a significantly higher expression level than the adenine-rich non-coding but expressed retrocopies. Therefore, although the mechanism of HUSH silencing may be an important part of the regulation of retrocopy expression, it is one component of a more complex molecular network that remains to be elucidated.
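The adenine content of the sense strand, one of the factors named above, can be computed as in the minimal Python sketch below. It assumes the sequence is given as the genomic plus strand together with the strand of the retrocopy. The function name `sense_adenine_fraction` and the choice to ignore ambiguous bases such as N are illustrative assumptions, not the authors' method.

```python
def sense_adenine_fraction(plus_strand_seq: str, strand: str) -> float:
    """Fraction of adenine on the sense strand of a retrocopy.

    plus_strand_seq: sequence as it appears on the genomic plus strand.
    strand: "+" or "-", the strand the retrocopy is transcribed from.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    seq = plus_strand_seq.upper()
    # For a minus-strand retrocopy the sense strand is the reverse
    # complement, so each T on the plus strand is an A on the sense strand.
    target = "A" if strand == "+" else "T"
    # Count only unambiguous bases (assumption: N and other codes are skipped).
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return seq.count(target) / counted


if __name__ == "__main__":
    print(sense_adenine_fraction("AAGT", "+"))
    print(sense_adenine_fraction("AAGT", "-"))
```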
Subject(s)
Evolution, Molecular , Gene Silencing , Retroelements , Retroelements/genetics , Animals , Humans , Swine/genetics , Introns , Exons
ABSTRACT
BACKGROUND: Machine learning techniques are known to be a powerful way of distinguishing microRNA hairpins from pseudo hairpins and have been applied in a number of recognised miRNA search tools. However, many current methods based on machine learning suffer from some drawbacks, including not addressing the class imbalance problem properly. This may lead to overlearning the majority class and/or incorrect assessment of classification performance. Moreover, those tools are effective for a narrow range of species, usually the model ones. This study aims at improving the performance of the miRNA classification procedure, extending its usability and reducing computational time. RESULTS: We present HuntMi, a stand-alone machine learning miRNA classification tool. We developed a novel method of dealing with the class imbalance problem called ROC-select, which is based on thresholding the score function produced by traditional classifiers. We also introduced new features to the data representation. Several classification algorithms in combination with ROC-select were tested and random forest was selected for the best balance between sensitivity and specificity. Reliable assessment of classification performance is guaranteed by using large, strongly imbalanced, and taxon-specific datasets in a 10-fold cross-validation procedure. As a result, HuntMi achieves a considerably better performance than any other miRNA classification tool and can be applied in miRNA search experiments in a wide range of species. CONCLUSIONS: Our results indicate that HuntMi represents an effective and flexible tool for identification of new microRNAs in animals, plants and viruses. The ROC-select strategy proves to be superior to other methods of dealing with the class imbalance problem and can possibly be used in other machine learning classification tasks. The HuntMi software as well as datasets used in the research are freely available at http://lemur.amu.edu.pl/share/HuntMi/.
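The general idea of thresholding a classifier's score function to handle class imbalance can be sketched in Python as below. This is not the HuntMi implementation of ROC-select: the abstract does not state the selection criterion, so the sketch assumes the threshold is chosen to maximise the geometric mean of sensitivity and specificity. The function name `select_threshold` is hypothetical.

```python
from math import sqrt
from typing import Sequence, Tuple


def select_threshold(scores: Sequence[float],
                     labels: Sequence[int]) -> Tuple[float, float, float]:
    """Pick a score cut-off from the points of the ROC curve.

    A sample is predicted positive when score >= threshold. The criterion
    (maximal sqrt(sensitivity * specificity)) is an assumption made for
    illustration. Returns (threshold, sensitivity, specificity).
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")

    best = None
    # Every distinct score is a candidate cut-off, i.e. one ROC point.
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
        sens = tp / n_pos
        spec = tn / n_neg
        g_mean = sqrt(sens * spec)
        if best is None or g_mean > best[0]:
            best = (g_mean, t, sens, spec)
    return best[1], best[2], best[3]


if __name__ == "__main__":
    # Imbalanced toy data: 2 positives, 4 negatives, all scores below 0.5.
    scores = [0.4, 0.3, 0.2, 0.1, 0.1, 0.05]
    labels = [1, 1, 0, 0, 0, 0]
    print(select_threshold(scores, labels))
```

In the toy data a fixed cut-off of 0.5 would label every sample negative, while a threshold chosen from the ROC points recovers both positives. That is the kind of majority-class bias a threshold-selection step is meant to counter.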
Subject(s)
Artificial Intelligence , MicroRNAs/classification , RNA Precursors/classification , Algorithms , Software
ABSTRACT
Splicing is one of the major contributors to observed spatiotemporal diversification of transcripts and proteins in metazoans. There are numerous factors that affect the process, but splice sites themselves along with the adjacent splicing signals are critical here. Unfortunately, there is still little known about splicing in plants and, consequently, further research in some fields of plant molecular biology will encounter difficulties. Keeping this in mind, we performed a large-scale analysis of splice sites in eight plant species, using novel algorithms and tools developed by us. The analyses included identification of orthologous splice sites, polypyrimidine tracts and branch sites. Additionally we identified putative intronic and exonic cis-regulatory motifs, U12 introns as well as splice sites in 45 microRNA genes in five plant species. We also provide experimental evidence for plant splice sites in the form of expressed sequence tag and RNA-Seq data. All the data are stored in a novel database called ERISdb and are freely available at http://lemur.amu.edu.pl/share/ERISdb/.