Results 1 - 20 of 61
1.
Cell ; 178(1): 107-121.e18, 2019 06 27.
Article in English | MEDLINE | ID: mdl-31251911

ABSTRACT

Increasing evidence suggests that transcriptional control and chromatin activities at large involve regulatory RNAs, which likely enlist specific RNA-binding proteins (RBPs). Although multiple RBPs have been implicated in transcription control, it has remained unclear how extensively RBPs directly act on chromatin. We embarked on a large-scale RBP ChIP-seq analysis, revealing widespread RBP presence in active chromatin regions in the human genome. Like transcription factors (TFs), RBPs also show strong preference for hotspots in the genome, particularly gene promoters, where their association is frequently linked to transcriptional output. Unsupervised clustering reveals extensive co-association between TFs and RBPs, as exemplified by YY1, a known RNA-dependent TF, and RBM25, an RBP involved in splicing regulation. Remarkably, RBM25 depletion attenuates all YY1-dependent activities, including chromatin binding, DNA looping, and transcription. We propose that various RBPs may enhance network interaction through harnessing regulatory RNAs to control transcription.


Subject(s)
Chromatin/metabolism , RNA-Binding Proteins/metabolism , RNA/metabolism , Transcription, Genetic/genetics , YY1 Transcription Factor/metabolism , Binding Sites , Gene Expression Regulation , Genome, Human/genetics , Hep G2 Cells , Humans , K562 Cells , Nuclear Proteins , Promoter Regions, Genetic/genetics , Protein Binding , RNA-Binding Proteins/genetics , RNA-Seq , Transcriptome , YY1 Transcription Factor/genetics
2.
Nucleic Acids Res ; 52(D1): D607-D621, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37757861

ABSTRACT

Liquid biopsy has emerged as a promising non-invasive approach for detecting, monitoring diseases, and predicting their recurrence. However, the effective utilization of liquid biopsy data to identify reliable biomarkers for various cancers and other diseases requires further exploration. Here, we present cfOmics, a web-accessible database (https://cfomics.ncRNAlab.org/) that integrates comprehensive multi-omics liquid biopsy data, including cfDNA, cfRNA based on next-generation sequencing, and proteome, metabolome based on mass-spectrometry data. As the first multi-omics database in the field, cfOmics encompasses a total of 17 distinct data types and 13 specimen variations across 69 disease conditions, with a collection of 11345 samples. Moreover, cfOmics includes reported potential biomarkers for reference. To facilitate effective analysis and visualization of multi-omics data, cfOmics offers powerful functionalities to its users. These functionalities include browsing, profile visualization, the Integrative Genomic Viewer, and correlation analysis, all centered around genes, microbes, or end-motifs. The primary objective of cfOmics is to assist researchers in the field of liquid biopsy by providing comprehensive multi-omics data. This enables them to explore cell-free data and extract profound insights that can significantly impact disease diagnosis, treatment monitoring, and management.


Subject(s)
Biomarkers , Databases, Factual , Disease , Multiomics , Neoplasms , Humans , Biomarkers/analysis , Genomics/methods , Neoplasms/chemistry , Neoplasms/genetics , Disease/genetics
3.
Bioinformatics ; 40(5)2024 May 02.
Article in English | MEDLINE | ID: mdl-38741230

ABSTRACT

MOTIVATION: Multi-omics data provide a comprehensive view of gene regulation at multiple levels, which is helpful in achieving accurate diagnosis of complex diseases like cancer. However, conventional integration methods rarely utilize prior biological knowledge and lack interpretability. RESULTS: To integrate various multi-omics data of tissue and liquid biopsies for disease diagnosis and prognosis, we developed a biological pathway informed Transformer, Pathformer. It embeds multi-omics input with a compacted multi-modal vector and a pathway-based sparse neural network. Pathformer also leverages criss-cross attention mechanism to capture the crosstalk between different pathways and modalities. We first benchmarked Pathformer with 18 comparable methods on multiple cancer datasets, where Pathformer outperformed all the other methods, with an average improvement of 6.3%-14.7% in F1 score for cancer survival prediction, 5.1%-12% for cancer stage prediction, and 8.1%-13.6% for cancer drug response prediction. Subsequently, for cancer prognosis prediction based on tissue multi-omics data, we used a case study to demonstrate the biological interpretability of Pathformer by identifying key pathways and their biological crosstalk. Then, for cancer early diagnosis based on liquid biopsy data, we used plasma and platelet datasets to demonstrate Pathformer's potential of clinical applications in cancer screening. Moreover, we revealed deregulation of interesting pathways (e.g. scavenger receptor pathway) and their crosstalk in cancer patients' blood, providing potential candidate targets for cancer microenvironment study. AVAILABILITY AND IMPLEMENTATION: Pathformer is implemented and freely available at https://github.com/lulab/Pathformer.


Subject(s)
Neoplasms , Humans , Prognosis , Neoplasms/metabolism , Neoplasms/diagnosis , Computational Biology/methods , Neural Networks, Computer , Algorithms , Multiomics
4.
Nucleic Acids Res ; 50(D1): D287-D294, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34403477

ABSTRACT

RNA-binding proteins (RBPs) play key roles in post-transcriptional regulation. Accurate identification of RBP binding sites in multiple cell lines and tissue types from diverse species is a fundamental endeavor towards understanding the regulatory mechanisms of RBPs under both physiological and pathological conditions. Our POSTAR annotation processes make use of publicly available large-scale CLIP-seq datasets and external functional genomic annotations to generate a comprehensive map of RBP binding sites and their association with other regulatory events as well as functional variants. Here, we present POSTAR3, an updated database with improvements in data collection, annotation infrastructure, and analysis that support the annotation of post-transcriptional regulation in multiple species including: we made a comprehensive update on the CLIP-seq and Ribo-seq datasets which cover more biological conditions, technologies, and species; we added RNA secondary structure profiling for RBP binding sites; we provided miRNA-mediated degradation events validated by degradome-seq; we included RBP binding sites at circRNA junction regions; we expanded the annotation of RBP binding sites, particularly using updated genomic variants and mutations associated with diseases. POSTAR3 is freely available at http://postar.ncrnalab.org.


Subject(s)
Databases, Genetic , MicroRNAs/genetics , RNA Processing, Post-Transcriptional , RNA, Circular/genetics , RNA-Binding Proteins/genetics , Software , Animals , Arabidopsis/genetics , Arabidopsis/metabolism , Binding Sites , Cell Line , Datasets as Topic , Humans , Internet , MicroRNAs/classification , MicroRNAs/metabolism , Molecular Sequence Annotation , Nucleic Acid Conformation , RNA, Circular/classification , RNA, Circular/metabolism , RNA-Binding Proteins/classification , RNA-Binding Proteins/metabolism , Sequence Analysis, RNA
5.
PLoS Genet ; 17(3): e1009355, 2021 03.
Article in English | MEDLINE | ID: mdl-33760820

ABSTRACT

Neurogenesis in the developing neocortex begins with the generation of the preplate, which consists of early-born neurons including Cajal-Retzius (CR) cells and subplate neurons. Here, utilizing the Ebf2-EGFP transgenic mouse in which EGFP initially labels the preplate neurons then persists in CR cells, we reveal the dynamic transcriptome profiles of early neurogenesis and CR cell differentiation. Genome-wide RNA-seq and ChIP-seq analyses at multiple early neurogenic stages have revealed the temporal gene expression dynamics of early neurogenesis and distinct histone modification patterns in early differentiating neurons. We have identified a new set of coding genes and lncRNAs involved in early neuronal differentiation and validated with functional assays in vitro and in vivo. In addition, at E15.5 when Ebf2-EGFP+ cells are mostly CR neurons, single-cell sequencing analysis of purified Ebf2-EGFP+ cells uncovers molecular heterogeneities in CR neurons, but without apparent clustering of cells with distinct regional origins. Along a pseudotemporal trajectory these cells are classified into three different developing states, revealing genetic cascades from early generic neuronal differentiation to late fate specification during the establishment of CR neuron identity and function. Our findings shed light on the molecular mechanisms governing the early differentiation steps during cortical development, especially CR neuron differentiation.


Subject(s)
Cell Differentiation , Genomics , Neurogenesis/genetics , Neurons/metabolism , Temporal Lobe/metabolism , Animals , Basic Helix-Loop-Helix Transcription Factors/metabolism , Biomarkers , Cell Differentiation/genetics , Cells, Cultured , Cerebral Cortex/metabolism , Gene Expression , Gene Expression Regulation , Genes, Reporter , Genetic Heterogeneity , Genomics/methods , Histones , Immunohistochemistry , Mice , Mice, Transgenic , Neurons/cytology , RNA, Long Noncoding/genetics , Single-Cell Analysis , Transcription Factors , Transcription Initiation Site
6.
Proc Natl Acad Sci U S A ; 117(32): 19487-19496, 2020 08 11.
Article in English | MEDLINE | ID: mdl-32723820

ABSTRACT

Alternative ribosome subunit proteins are prevalent in the genomes of diverse bacterial species, but their functional significance is controversial. Attempts to study microbial ribosomal heterogeneity have mostly relied on comparing wild-type strains with mutants in which subunits have been deleted, but this approach does not allow direct comparison of alternate ribosome isoforms isolated from identical cellular contexts. Here, by simultaneously purifying canonical and alternative RpsR ribosomes from Mycobacterium smegmatis, we show that alternative ribosomes have distinct translational features compared with their canonical counterparts. Both alternative and canonical ribosomes actively take part in protein synthesis, although they translate a subset of genes with differential efficiency as measured by ribosome profiling. We also show that alternative ribosomes have a relative defect in initiation complex formation. Furthermore, a strain of M. smegmatis in which the alternative ribosome protein operon is deleted grows poorly in iron-depleted medium, uncovering a role for alternative ribosomes in iron homeostasis. Our work confirms the distinct and nonredundant contribution of alternative bacterial ribosomes for adaptation to hostile environments.


Subject(s)
Bacterial Proteins/metabolism , Mycobacterium smegmatis/metabolism , Ribosomes/metabolism , Bacterial Proteins/genetics , Iron/metabolism , Mycobacterium smegmatis/genetics , Mycobacterium smegmatis/growth & development , Peptide Chain Initiation, Translational/genetics , Protein Biosynthesis , Ribosomal Proteins/genetics , Ribosomal Proteins/metabolism , Ribosome Subunits/metabolism
7.
Brief Bioinform ; 21(6): 2194-2205, 2020 12 01.
Article in English | MEDLINE | ID: mdl-31774912

ABSTRACT

The methodologies for evaluating similarities between gene expression profiles of different perturbagens are the key to understanding mechanisms of actions (MoAs) of unknown compounds and finding new indications for existing drugs. L1000-based next-generation Connectivity Map (CMap) data is more than a thousand-fold scale-up of the CMap pilot dataset. Although several systematic evaluations have been performed individually to assess the accuracy of the methodologies for the CMap pilot study, the performance of these methodologies needs to be re-evaluated for the L1000 data. Here, using the drug-drug similarities from the Drug Repurposing Hub database as a benchmark standard, we evaluated six popular published methods for the prediction performance of drug-drug relationships based on the partial area under the receiver operating characteristic (ROC) curve at false positive rates of 0.001, 0.005 and 0.01 (AUC0.001, AUC0.005 and AUC0.01). The similarity evaluating algorithm called ZhangScore was generally superior to other methods and exhibited the highest accuracy at the gene signature sizes ranging from 10 to 200. Further, we tested these methods with an experimentally derived gene signature related to estrogen in breast cancer cells, and the results confirmed that ZhangScore was more accurate than other methods. Moreover, based on scoring results of ZhangScore for the gene signature of TOP2A knockdown, in addition to well-known TOP2A inhibitors, we identified a number of potential inhibitors and at least two of them were the subject of previous investigation. Our studies provide potential guidelines for researchers to choose the suitable connectivity method. The six connectivity methods used in this report have been implemented in R package (https://github.com/Jasonlinchina/RCSM).


Subject(s)
Computational Biology , Drug Repositioning , Gene Expression Profiling , Algorithms , Computational Biology/methods , Databases, Factual , Gene Expression Profiling/methods , Pilot Projects , Transcriptome
8.
FASEB J ; 35(7): e21720, 2021 07.
Article in English | MEDLINE | ID: mdl-34110642

ABSTRACT

Methylation of circulating free DNA (CfDNA) has emerged as an efficient marker for tumor screening and prognostics. However, no efficient methylation marker has been developed for monitoring liver metastasis (LM) in colorectal cancer (CRC). In the present study, using methylome profiling and bisulfite sequencing polymerase chain reaction of paired primary and LM sites, we identified significantly increased methylation of TCHH during LM in CRC. MethyLight analysis of TCHH methylation in CfDNA showed promising power to discriminate between CRC with and without LM. In addition, a significant correlation between TCHH methylation and LM tumor volume was validated. Together, these results indicate the potential of TCHH methylation in CfDNA as a monitoring marker of LM in CRC.


Subject(s)
Antigens/genetics , Biomarkers, Tumor/genetics , Cell-Free Nucleic Acids/genetics , Colorectal Neoplasms/genetics , DNA Methylation/genetics , DNA, Neoplasm/genetics , Intermediate Filament Proteins/genetics , Liver Neoplasms/genetics , Colorectal Neoplasms/pathology , Epigenome/genetics , Humans , Liver Neoplasms/pathology , Prognosis
9.
Brief Bioinform ; 20(4): 1420-1433, 2019 07 19.
Article in English | MEDLINE | ID: mdl-29415187

ABSTRACT

Circular RNAs (circRNAs) have emerged in recent years as a new class of endogenous, regulatory noncoding RNAs. With the widespread application of RNA sequencing (RNA-seq) technology and bioinformatics prediction, large numbers of circRNAs have been identified. However, at present, we lack a comprehensive characterization of all these circRNAs in samples of interest. In this study, we integrated 87 935 circRNA sequences, which cover most of the circRNAs identified to date and represented in circBase, to design microarray probes targeting the back-splice site of each circRNA and profile the expression of those circRNAs. By comparing the circRNA detection efficiency of RNA-seq with that of this circRNA microarray, we revealed that the microarray is more efficient than RNA-seq for circRNA profiling. Then, we found that ∼80 000 circRNAs were expressed in cervical tumors and matched normal tissues, and ∼25 000 of them were differentially expressed. Notably, many of the circRNAs detected by this microarray can be validated by quantitative reverse transcription polymerase chain reaction (RT-qPCR) or RNA-seq. Strikingly, as many as ∼18 000 circRNAs could be robustly detected in cell-free plasma samples, and the expression of ∼2700 of them differed after surgery for tumor removal. Our findings provide a comprehensive, genome-wide characterization of circRNAs in paired normal tissues and tumors and in plasma samples from multiple individuals. In addition, we provide a rich resource of 41 microarray data sets and 10 RNA-seq data sets, along with strong evidence for circRNA expression in cervical cancer. In conclusion, circRNAs can be efficiently profiled with a circRNA microarray that targets their reported back-splice sites in samples of interest.


Subject(s)
Gene Expression Profiling/methods , Oligonucleotide Array Sequence Analysis/methods , RNA, Circular/genetics , Brain/metabolism , Computational Biology , Databases, Nucleic Acid/statistics & numerical data , Female , Gene Expression Profiling/statistics & numerical data , Humans , Neoplasms/blood , Neoplasms/genetics , Neoplasms/metabolism , Oligonucleotide Array Sequence Analysis/statistics & numerical data , RNA, Circular/blood , RNA, Circular/metabolism , RNA-Seq/methods , RNA-Seq/statistics & numerical data , Tissue Distribution , Uterine Cervical Neoplasms/blood , Uterine Cervical Neoplasms/genetics , Uterine Cervical Neoplasms/metabolism
10.
Nucleic Acids Res ; 47(D1): D203-D211, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30239819

ABSTRACT

Post-transcriptional regulation of RNAs is critical to a diverse range of cellular processes. The volume of functional genomic data focusing on post-transcriptional regulation logics has continued to grow in recent years. In the current database version, POSTAR2 (http://lulab.life.tsinghua.edu.cn/postar), we included the following new features and data: updated ∼500 CLIP-seq datasets (∼1200 CLIP-seq datasets in total) from six species, including human, mouse, fly, worm, Arabidopsis and yeast; added a new module 'Translatome', which is derived from Ribo-seq datasets and contains ∼36 million open reading frames (ORFs) in the genomes from the six species; updated and unified post-transcriptional regulation and variation data. Finally, we improved web interfaces for searching and visualizing protein-RNA interactions with multi-layer information. Meanwhile, we also merged our CLIPdb database into POSTAR2. POSTAR2 will help researchers investigate the post-transcriptional regulatory logics coordinated by RNA-binding proteins and the translational landscape of cellular RNAs.


Subject(s)
Computational Biology , Databases, Genetic , Gene Expression Regulation , RNA Processing, Post-Transcriptional , Animals , Binding Sites , Computational Biology/methods , Humans , Immunoprecipitation , Molecular Sequence Annotation , Open Reading Frames , Protein Binding , RNA-Binding Proteins/metabolism , Sequence Analysis, DNA , Web Browser
11.
J Exp Bot ; 71(19): 5837-5851, 2020 10 07.
Article in English | MEDLINE | ID: mdl-32969475

ABSTRACT

Signaling by the phytohormone abscisic acid (ABA) involves pre-mRNA splicing, a key process of post-transcriptional regulation of gene expression. However, the regulatory mechanism of alternative pre-mRNA splicing in ABA signaling remains largely unknown. We previously identified a pentatricopeptide repeat protein SOAR1 (suppressor of the ABAR-overexpressor 1) as a crucial player downstream of ABAR (putative ABA receptor) in ABA signaling. In this study, we identified a SOAR1 interaction partner USB1, which is an exoribonuclease catalyzing U6 production for spliceosome assembly. We reveal that together USB1 and SOAR1 negatively regulate ABA signaling in early seedling development. USB1 and SOAR1 are both required for the splicing of transcripts of numerous genes, including those involved in ABA signaling pathways, suggesting that USB1 and SOAR1 collaborate to regulate ABA signaling by affecting spliceosome assembly. These findings provide important new insights into the mechanistic control of alternative pre-mRNA splicing in the regulation of ABA-mediated plant responses to environmental cues.


Subject(s)
Arabidopsis Proteins , Arabidopsis , Abscisic Acid , Arabidopsis/genetics , Arabidopsis/metabolism , Arabidopsis Proteins/genetics , Arabidopsis Proteins/metabolism , Exoribonucleases/genetics , Gene Expression Regulation, Plant , Plant Growth Regulators
12.
Nucleic Acids Res ; 46(D1): D194-D201, 2018 01 04.
Article in English | MEDLINE | ID: mdl-29040625

ABSTRACT

We present RISE (http://rise.zhanglab.net), a database of RNA Interactome from Sequencing Experiments. RNA-RNA interactions (RRIs) are essential for RNA regulation and function. RISE provides a comprehensive collection of RRIs that mainly come from recent transcriptome-wide sequencing-based experiments like PARIS, SPLASH, LIGR-seq, and MARIO, as well as targeted studies like RIA-seq, RAP-RNA and CLASH. It also includes interactions aggregated from other primary databases and publications. The RISE database currently contains 328,811 RNA-RNA interactions mainly in human, mouse and yeast. While most existing RNA databases mainly contain interactions of miRNA targeting, notably, more than half of the RRIs in RISE are among mRNA and long non-coding RNAs. We compared different RRI datasets in RISE and found limited overlaps in interactions resolved by different techniques and in different cell lines. It may suggest technology preference and also dynamic natures of RRIs. We also analyzed the basic features of the human and mouse RRI networks and found that they tend to be scale-free, small-world, hierarchical and modular. The analysis may nominate important RNAs or RRIs for further investigation. Finally, RISE provides a Circos plot and several table views for integrative visualization, with extensive molecular and functional annotations to facilitate exploration of biological functions for any RRI of interest.


Subject(s)
Databases, Nucleic Acid , Animals , Gene Regulatory Networks , High-Throughput Nucleotide Sequencing , Humans , Mice , Molecular Sequence Annotation , Protein Interaction Maps , RNA/genetics , RNA/metabolism , Sequence Analysis, RNA , Transcriptome , User-Computer Interface
13.
Plant J ; 93(5): 814-827, 2018 03.
Article in English | MEDLINE | ID: mdl-29265542

ABSTRACT

Recently, long non-coding RNAs (lncRNAs) have been demonstrated to be involved in many biological processes of plants; however, a systematic study on transcriptional and, in particular, post-transcriptional regulation of stress-responsive lncRNAs in Oryza sativa (rice) is lacking. We sequenced three types of RNA libraries (poly(A)+, poly(A)- and nuclear RNAs) under four abiotic stresses (cold, heat, drought and salt). Based on an integrative bioinformatics approach and ~200 high-throughput data sets, ~170 of which have been published, we revealed over 7000 lncRNAs, nearly half of which were identified for the first time. Notably, we found that the majority of the ~500 poly(A) lncRNAs that were differentially expressed under stress were significantly downregulated, but approximately 25% were found to have upregulated non-poly(A) forms. Moreover, hundreds of lncRNAs with downregulated polyadenylation (DPA) tend to be highly conserved, show significant nuclear retention and are co-expressed with protein-coding genes that function under stress. Remarkably, these DPA lncRNAs are significantly enriched in quantitative trait loci (QTLs) for stress tolerance or development, suggesting their potential important roles in rice growth under various stresses. In particular, we observed substantially accumulated DPA lncRNAs in plants exposed to drought and salt, which is consistent with the severe reduction of RNA 3'-end processing factors under these conditions. Taken together, the results of this study reveal that polyadenylation and subcellular localization of many rice lncRNAs are likely to be regulated at the post-transcriptional level. Our findings strongly suggest that many upregulated/downregulated lncRNAs previously identified by traditional RNA-seq analyses need to be carefully reviewed to assess the influence of post-transcriptional modification.


Subject(s)
Gene Expression Regulation, Plant , Oryza/genetics , RNA, Long Noncoding/metabolism , Stress, Physiological/genetics , Base Sequence , Cell Nucleus/genetics , Conserved Sequence , Down-Regulation , Droughts , Oryza/physiology , Plant Proteins/genetics , Plant Proteins/metabolism , Poly A/genetics , Poly A/metabolism , Polyadenylation , Quantitative Trait Loci , RNA, Long Noncoding/genetics , RNA, Plant/metabolism
14.
Genome Res ; 26(9): 1233-44, 2016 09.
Article in English | MEDLINE | ID: mdl-27516619

ABSTRACT

Long noncoding RNAs (lncRNAs), a recently discovered class of cellular RNAs, play important roles in the regulation of many cellular developmental processes. Although lncRNAs have been systematically identified in various systems, most of them have not been functionally characterized in vivo in animal models. In this study, we identified 128 testis-specific Drosophila lncRNAs and knocked out 105 of them using an optimized three-component CRISPR/Cas9 system. Among the lncRNA knockouts, 33 (31%) exhibited a partial or complete loss of male fertility, accompanied by visual developmental defects in late spermatogenesis. In addition, six knockouts were fully or partially rescued by transgenes in a trans configuration, indicating that those lncRNAs primarily work in trans. Furthermore, gene expression profiles for five lncRNA mutants revealed that testis-specific lncRNAs regulate global gene expression, orchestrating late male germ cell differentiation. Compared with coding genes, the testis-specific lncRNAs evolved much faster. Moreover, lncRNAs of greater functional importance exhibited higher sequence conservation, suggesting that they are under constant evolutionary selection. Collectively, our results reveal critical functions of rapidly evolving testis-specific lncRNAs in late Drosophila spermatogenesis.


Subject(s)
Conserved Sequence/genetics , RNA, Long Noncoding/genetics , Spermatogenesis/genetics , Testis/growth & development , Animals , CRISPR-Cas Systems , Drosophila/genetics , Drosophila/growth & development , Gene Expression Regulation, Developmental , Germ Cells/growth & development , Infertility, Male/genetics , Infertility, Male/pathology , Male
15.
Nucleic Acids Res ; 45(1): e2, 2017 01 09.
Article in English | MEDLINE | ID: mdl-27608726

ABSTRACT

Recent genomic studies suggest that novel long non-coding RNAs (lncRNAs) are specifically expressed and far outnumber annotated lncRNA sequences. To identify and characterize novel lncRNAs in RNA sequencing data from new samples, we have developed COME, a coding potential calculation tool based on multiple features. It integrates multiple sequence-derived and experiment-based features using a decompose-compose method, which makes it more accurate and robust than other well-known tools. We also showed that COME was able to substantially improve the consistency of prediction results from other coding potential calculators. Moreover, COME annotates and characterizes each predicted lncRNA transcript with multiple lines of supporting evidence, which are not provided by other tools. Remarkably, we found that one subgroup of lncRNAs classified by such supporting features (i.e. conserved local RNA secondary structure) was highly enriched in a well-validated database (lncRNAdb). We further found that the conserved structural domains on lncRNAs had a better chance than other RNA regions of interacting with RNA binding proteins, based on the recent eCLIP-seq data in human, indicating their potential regulatory roles. Overall, we present COME as an accurate, robust and multiple-feature supported method for the identification and characterization of novel lncRNAs. The software implementation is available at https://github.com/lulab/COME.


Subject(s)
Molecular Sequence Annotation , RNA, Long Noncoding/genetics , RNA-Binding Proteins/genetics , Software , Animals , Arabidopsis/genetics , Base Sequence , Binding Sites , Caenorhabditis elegans/genetics , Computer Graphics , Databases, Nucleic Acid , Drosophila melanogaster/genetics , Genome , Humans , Internet , Mice , Nucleic Acid Conformation , Protein Binding , RNA, Long Noncoding/classification , RNA, Long Noncoding/metabolism , RNA-Binding Proteins/metabolism , Sequence Analysis, RNA
16.
Nucleic Acids Res ; 45(4): 1657-1672, 2017 02 28.
Article in English | MEDLINE | ID: mdl-27980097

ABSTRACT

Distinguishing cell states based only on gene expression data remains a challenging task. This is true even for analyses within a species. In cross-species comparisons, the results obtained by different groups have varied widely. Here, we integrate RNA-seq data from more than 40 cell and tissue types of four mammalian species to identify sets of associated genes as indicators for specific cell states in each species. We employ a statistical method, TROM, to identify both protein-coding and non-coding indicators. Next, we map the cell states within each species and also between species using these indicator genes. We recapitulate known phenotypic similarity between related cell and tissue types and reveal molecular basis for their similarity. We also report novel associations between several tissues and cell types with functional support. Moreover, our identified conserved associated genes are found to be a good resource for studying cell differentiation and reprogramming. Lastly, long non-coding RNAs can serve well as associated genes to indicate cell states. We further infer the biological functions of those non-coding associated genes based on their co-expressed protein-coding genes. This study demonstrates that combining statistical modeling with public RNA-seq data can be powerful for improving our understanding of cell identity control.


Subject(s)
Contig Mapping , Evolution, Molecular , Gene Expression Profiling , Gene Expression Regulation , Mammals/genetics , Transcriptome , Algorithms , Animals , Cluster Analysis , Computational Biology/methods , Gene Expression Regulation, Developmental , Gene Ontology , High-Throughput Nucleotide Sequencing , Humans , Mice , Molecular Sequence Annotation , Multigene Family , Organ Specificity
17.
Nucleic Acids Res ; 45(D1): D104-D114, 2017 01 04.
Article in English | MEDLINE | ID: mdl-28053162

ABSTRACT

We present POSTAR (http://POSTAR.ncrnalab.org), a resource of POST-trAnscriptional Regulation coordinated by RNA-binding proteins (RBPs). Precise characterization of post-transcriptional regulatory maps has accelerated dramatically in the past few years. Based on new studies and resources, POSTAR supplies the largest collection of experimentally probed (∼23 million) and computationally predicted (approximately 117 million) RBP binding sites in the human and mouse transcriptomes. POSTAR annotates every transcript and its RBP binding sites using extensive information regarding various molecular regulatory events (e.g., splicing, editing, and modification), RNA secondary structures, disease-associated variants, and gene expression and function. Moreover, POSTAR provides a friendly, multi-mode, integrated search interface, which helps users to connect multiple RBP binding sites with post-transcriptional regulatory events, phenotypes, and diseases. Based on our platform, we were able to obtain novel insights into post-transcriptional regulation, such as the putative association between CPSF6 binding, RNA structural domains, and Li-Fraumeni syndrome SNPs. In summary, POSTAR represents an early effort to systematically annotate post-transcriptional regulatory maps and explore the putative roles of RBPs in human diseases.


Subject(s)
Databases, Genetic , RNA Processing, Post-Transcriptional , RNA-Binding Proteins/metabolism , RNA/chemistry , RNA/metabolism , Alternative Splicing , Animals , Binding Sites , Disease/genetics , Gene Ontology , Humans , Mice , MicroRNAs/metabolism , Molecular Sequence Annotation , Nucleic Acid Conformation , Polymorphism, Single Nucleotide
18.
Brief Bioinform ; 17(6): 1032-1043, 2016 11.
Article in English | MEDLINE | ID: mdl-26655457

ABSTRACT

High-throughput sequencing has been used to study posttranscriptional regulations, where the identification of protein-RNA binding is a major and fast-developing sub-area, which is in turn benefited by the sequencing methods for whole-transcriptome probing of RNA secondary structures. In the study of RNA secondary structures using high-throughput sequencing, bases are modified or cleaved according to their structural features, which alter the resulting composition of sequencing reads. In the study of protein-RNA binding, methods have been proposed to immuno-precipitate (IP) protein-bound RNA transcripts in vitro or in vivo. By sequencing these transcripts, the protein-RNA interactions and the binding locations can be identified. For both types of data, read counts are affected by a combination of confounding factors, including expression levels of transcripts, sequence biases, mapping errors and the probing or IP efficiency of the experimental protocols. Careful processing of the sequencing data and proper extraction of important features are fundamentally important to a successful analysis. Here we review and compare different experimental methods for probing RNA secondary structures and binding sites of RNA-binding proteins (RBPs), and the computational methods proposed for analyzing the corresponding sequencing data. We suggest how these two types of data should be integrated to study the structural properties of RBP binding sites as a systematic way to better understand posttranscriptional regulations.


Subject(s)
RNA/chemistry , Nucleic Acid Conformation , Protein Binding , Protein Structure, Secondary , RNA-Binding Proteins , Transcriptome
19.
Nucleic Acids Res ; 44(W1): W294-301, 2016 07 08.
Article in English | MEDLINE | ID: mdl-27137891

ABSTRACT

Several high-throughput technologies have been developed to probe RNA base pairs and loops at the transcriptome level in multiple species. However, to obtain the final RNA secondary structure, extensive effort and considerable expertise are required to statistically process the probing data and combine them with free energy models. Therefore, we developed an RNA secondary structure prediction server that is enhanced by experimental data (RNAex). RNAex is a web interface that enables non-specialists to easily access cutting-edge structure-probing data and predict RNA secondary structures enhanced by in vivo and in vitro data. RNAex annotates the RNA editing, RNA modification and SNP sites on the predicted structures. It provides four structure-folding methods, restrained MaxExpect, SeqFold, RNAstructure (Fold) and RNAfold, that can be selected by the user. The performance of these four folding methods has been verified by previous publications on known structures. We re-mapped the raw sequencing data of the probing experiments to the whole genome for each species. RNAex thus enables users to predict secondary structures for both known and novel RNA transcripts in human, mouse, yeast and Arabidopsis. The RNAex web server is available at http://RNAex.ncrnalab.org/.


Subject(s)
Nucleic Acid Conformation , Polymorphism, Single Nucleotide , RNA/chemistry , Transcriptome , User-Computer Interface , Animals , Arabidopsis/genetics , Arabidopsis/metabolism , Base Pairing , Computer Graphics , High-Throughput Screening Assays , Humans , Internet , Mice , Molecular Sequence Annotation , RNA/genetics , RNA Editing , RNA Folding , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Thermodynamics
20.
Plant Physiol ; 171(4): 2841-53, 2016 08.
Article in English | MEDLINE | ID: mdl-27329222

ABSTRACT

Induction and secretion of acid phosphatases (APases) is an adaptive response that plants use to cope with P (Pi) deficiency in their environment. The molecular mechanism that regulates this response, however, is poorly understood. In this work, we identified an Arabidopsis (Arabidopsis thaliana) mutant, hps8, which exhibits enhanced APase activity on its root surface (also called root-associated APase activity). Our molecular and genetic analyses indicate that this altered Pi response results from a mutation in the AtTHO1 gene that encodes a subunit of the THO/TREX protein complex. The mutation in another subunit of this complex, AtTHO3, also enhances root-associated APase activity under Pi starvation. In Arabidopsis, the THO/TREX complex functions in mRNA export and miRNA biogenesis. When treated with Ag(+), an inhibitor of ethylene perception, the enhanced root-associated APase activity in hps8 is largely reversed. hpr1-5 is another mutant allele of AtTHO1 and shows similar phenotypes as hps8. ein2 is completely insensitive to ethylene. In the hpr1-5ein2 double mutant, the enhanced root-associated APase activity is also greatly suppressed. These results indicate that the THO/TREX complex in Arabidopsis negatively regulates root-associated APase activity induced by Pi starvation by inhibiting ethylene signaling. In addition, we found that the miRNA399-PHO2 pathway is also involved in the regulation of root-associated APase activity induced by Pi starvation. These results provide insight into the molecular mechanism underlying the adaptive response of plants to Pi starvation.


Subject(s)
Acid Phosphatase/metabolism , Arabidopsis Proteins/metabolism , Arabidopsis/genetics , MicroRNAs/biosynthesis , Multiprotein Complexes/metabolism , Phosphates/deficiency , Plant Roots/enzymology , Plant Roots/genetics , Acid Phosphatase/genetics , Arabidopsis/drug effects , Arabidopsis/enzymology , Arabidopsis Proteins/genetics , Cell Nucleus/drug effects , Cell Nucleus/metabolism , Ethylenes/metabolism , Gene Expression Profiling , Gene Expression Regulation, Plant/drug effects , Mutation/genetics , Phenotype , Phosphates/pharmacology , Plant Roots/anatomy & histology , Protein Subunits/metabolism , Real-Time Polymerase Chain Reaction , Signal Transduction/drug effects