Results 1 - 20 of 62
1.
Cell ; 178(1): 107-121.e18, 2019 06 27.
Article in English | MEDLINE | ID: mdl-31251911

ABSTRACT

Increasing evidence suggests that transcriptional control and chromatin activities at large involve regulatory RNAs, which likely enlist specific RNA-binding proteins (RBPs). Although multiple RBPs have been implicated in transcription control, it has remained unclear how extensively RBPs directly act on chromatin. We embarked on a large-scale RBP ChIP-seq analysis, revealing widespread RBP presence in active chromatin regions in the human genome. Like transcription factors (TFs), RBPs also show strong preference for hotspots in the genome, particularly gene promoters, where their association is frequently linked to transcriptional output. Unsupervised clustering reveals extensive co-association between TFs and RBPs, as exemplified by YY1, a known RNA-dependent TF, and RBM25, an RBP involved in splicing regulation. Remarkably, RBM25 depletion attenuates all YY1-dependent activities, including chromatin binding, DNA looping, and transcription. We propose that various RBPs may enhance network interaction through harnessing regulatory RNAs to control transcription.


Subjects
Chromatin/metabolism ; RNA-Binding Proteins/metabolism ; RNA/metabolism ; Transcription, Genetic/genetics ; YY1 Transcription Factor/metabolism ; Binding Sites ; Gene Expression Regulation ; Genome, Human/genetics ; Hep G2 Cells ; Humans ; K562 Cells ; Nuclear Proteins ; Promoter Regions, Genetic/genetics ; Protein Binding ; RNA-Binding Proteins/genetics ; RNA-Seq ; Transcriptome ; YY1 Transcription Factor/genetics
2.
Nucleic Acids Res ; 52(D1): D607-D621, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37757861

ABSTRACT

Liquid biopsy has emerged as a promising non-invasive approach for detecting, monitoring diseases, and predicting their recurrence. However, the effective utilization of liquid biopsy data to identify reliable biomarkers for various cancers and other diseases requires further exploration. Here, we present cfOmics, a web-accessible database (https://cfomics.ncRNAlab.org/) that integrates comprehensive multi-omics liquid biopsy data, including cfDNA, cfRNA based on next-generation sequencing, and proteome, metabolome based on mass-spectrometry data. As the first multi-omics database in the field, cfOmics encompasses a total of 17 distinct data types and 13 specimen variations across 69 disease conditions, with a collection of 11345 samples. Moreover, cfOmics includes reported potential biomarkers for reference. To facilitate effective analysis and visualization of multi-omics data, cfOmics offers powerful functionalities to its users. These functionalities include browsing, profile visualization, the Integrative Genomic Viewer, and correlation analysis, all centered around genes, microbes, or end-motifs. The primary objective of cfOmics is to assist researchers in the field of liquid biopsy by providing comprehensive multi-omics data. This enables them to explore cell-free data and extract profound insights that can significantly impact disease diagnosis, treatment monitoring, and management.


Subjects
Biomarkers ; Databases, Factual ; Disease ; Multiomics ; Neoplasms ; Humans ; Biomarkers/analysis ; Genomics/methods ; Neoplasms/chemistry ; Neoplasms/genetics ; Disease/genetics
3.
Bioinformatics ; 2024 Sep 25.
Article in English | MEDLINE | ID: mdl-39321261

ABSTRACT

MOTIVATION: RNA interference (RNAi) has become a widely used experimental approach for post-transcriptional regulation and is increasingly showing its potential as a basis for future targeted drugs. However, the prediction of highly efficient small interfering RNAs (siRNAs) is still hindered by dataset biases, the inadequacy of prediction methods, and the presence of off-target effects. To overcome these limitations, we propose an accurate and robust prediction method, OligoFormer, for siRNA design. RESULTS: OligoFormer comprises three modules: thermodynamic calculation, an RNA-FM module, and the Oligo encoder. The Oligo encoder is the core module and is based on the transformer encoder. Taking siRNA and mRNA sequences as input, OligoFormer obtains thermodynamic parameters, an RNA-FM embedding, and an Oligo embedding through these three modules, respectively. We carefully benchmarked OligoFormer against 6 comparable methods on siRNA efficacy datasets. OligoFormer outperforms all the other methods, with an average improvement of 9% in AUC, 6.6% in PRC, 9.8% in F1 score, and 5.1% in PCC compared to the best method among them in our inter-dataset validation. We also provide a comprehensive pipeline that predicts siRNA efficacy and off-target effects using the PITA score and the TargetScan score. The ablation study shows that the RNA-FM module and the thermodynamic parameters improved the performance and accelerated the convergence of OligoFormer. The saliency maps obtained by gradient backpropagation and the base preference maps show certain base preferences in the initial and terminal regions of siRNAs. AVAILABILITY AND IMPLEMENTATION: The source code of OligoFormer is freely available on GitHub at https://github.com/lulab/OligoFormer. A Docker image of OligoFormer is freely available on Docker Hub at https://hub.docker.com/r/yilanbai/oligoformer. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

4.
Bioinformatics ; 40(5)2024 05 02.
Article in English | MEDLINE | ID: mdl-38741230

ABSTRACT

MOTIVATION: Multi-omics data provide a comprehensive view of gene regulation at multiple levels, which is helpful in achieving accurate diagnosis of complex diseases like cancer. However, conventional integration methods rarely utilize prior biological knowledge and lack interpretability. RESULTS: To integrate various multi-omics data of tissue and liquid biopsies for disease diagnosis and prognosis, we developed a biological pathway informed Transformer, Pathformer. It embeds multi-omics input with a compacted multi-modal vector and a pathway-based sparse neural network. Pathformer also leverages criss-cross attention mechanism to capture the crosstalk between different pathways and modalities. We first benchmarked Pathformer with 18 comparable methods on multiple cancer datasets, where Pathformer outperformed all the other methods, with an average improvement of 6.3%-14.7% in F1 score for cancer survival prediction, 5.1%-12% for cancer stage prediction, and 8.1%-13.6% for cancer drug response prediction. Subsequently, for cancer prognosis prediction based on tissue multi-omics data, we used a case study to demonstrate the biological interpretability of Pathformer by identifying key pathways and their biological crosstalk. Then, for cancer early diagnosis based on liquid biopsy data, we used plasma and platelet datasets to demonstrate Pathformer's potential of clinical applications in cancer screening. Moreover, we revealed deregulation of interesting pathways (e.g. scavenger receptor pathway) and their crosstalk in cancer patients' blood, providing potential candidate targets for cancer microenvironment study. AVAILABILITY AND IMPLEMENTATION: Pathformer is implemented and freely available at https://github.com/lulab/Pathformer.
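
The idea of a pathway-based sparse network can be illustrated with a masked linear layer, in which each pathway unit receives input only from its member genes. The following is a minimal sketch under stated assumptions, not Pathformer's actual architecture: the gene names, pathway memberships, weights, and the helper functions `build_mask` and `masked_forward` are all hypothetical.

```python
# Hypothetical sketch of a pathway-masked (sparse) linear layer.
# Gene names, pathway memberships, and weights are made up for illustration.

def build_mask(genes, pathways):
    """Return one row per pathway; entry is 1 if the gene is a member, else 0."""
    return [[1 if gene in members else 0 for gene in genes]
            for members in pathways.values()]


def masked_forward(x, weights, mask):
    """Compute one score per pathway as a weighted sum over member genes only.

    Weights at masked-out positions are ignored, which is what makes the
    layer sparse: a gene cannot influence a pathway it does not belong to.
    """
    return [sum(w * m * xi for w, m, xi in zip(w_row, m_row, x))
            for w_row, m_row in zip(weights, mask)]


genes = ["G1", "G2", "G3"]                         # hypothetical gene IDs
pathways = {"P1": {"G1", "G2"}, "P2": {"G3"}}      # hypothetical memberships
mask = build_mask(genes, pathways)

expression = [1.0, 2.0, 3.0]                       # one sample, one value per gene
weights = [[0.5, 0.5, 9.0],                        # the 9.0 entries sit at masked
           [9.0, 9.0, 2.0]]                        # positions and have no effect
pathway_scores = masked_forward(expression, weights, mask)
```

In a trainable model the same mask would typically be applied to the weight matrix at every forward pass, so that gradients never flow through gene-pathway pairs that are absent from the prior knowledge.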


Subjects
Neoplasms ; Humans ; Prognosis ; Neoplasms/metabolism ; Neoplasms/diagnosis ; Computational Biology/methods ; Neural Networks, Computer ; Algorithms ; Multiomics
5.
Nucleic Acids Res ; 50(D1): D287-D294, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34403477

ABSTRACT

RNA-binding proteins (RBPs) play key roles in post-transcriptional regulation. Accurate identification of RBP binding sites in multiple cell lines and tissue types from diverse species is a fundamental endeavor towards understanding the regulatory mechanisms of RBPs under both physiological and pathological conditions. Our POSTAR annotation processes make use of publicly available large-scale CLIP-seq datasets and external functional genomic annotations to generate a comprehensive map of RBP binding sites and their association with other regulatory events as well as functional variants. Here, we present POSTAR3, an updated database with improvements in data collection, annotation infrastructure, and analysis that support the annotation of post-transcriptional regulation in multiple species including: we made a comprehensive update on the CLIP-seq and Ribo-seq datasets which cover more biological conditions, technologies, and species; we added RNA secondary structure profiling for RBP binding sites; we provided miRNA-mediated degradation events validated by degradome-seq; we included RBP binding sites at circRNA junction regions; we expanded the annotation of RBP binding sites, particularly using updated genomic variants and mutations associated with diseases. POSTAR3 is freely available at http://postar.ncrnalab.org.


Subjects
Databases, Genetic ; MicroRNAs/genetics ; RNA Processing, Post-Transcriptional ; RNA, Circular/genetics ; RNA-Binding Proteins/genetics ; Software ; Animals ; Arabidopsis/genetics ; Arabidopsis/metabolism ; Binding Sites ; Cell Line ; Datasets as Topic ; Humans ; Internet ; MicroRNAs/classification ; MicroRNAs/metabolism ; Molecular Sequence Annotation ; Nucleic Acid Conformation ; RNA, Circular/classification ; RNA, Circular/metabolism ; RNA-Binding Proteins/classification ; RNA-Binding Proteins/metabolism ; Sequence Analysis, RNA
6.
PLoS Genet ; 17(3): e1009355, 2021 03.
Article in English | MEDLINE | ID: mdl-33760820

ABSTRACT

Neurogenesis in the developing neocortex begins with the generation of the preplate, which consists of early-born neurons including Cajal-Retzius (CR) cells and subplate neurons. Here, utilizing the Ebf2-EGFP transgenic mouse in which EGFP initially labels the preplate neurons then persists in CR cells, we reveal the dynamic transcriptome profiles of early neurogenesis and CR cell differentiation. Genome-wide RNA-seq and ChIP-seq analyses at multiple early neurogenic stages have revealed the temporal gene expression dynamics of early neurogenesis and distinct histone modification patterns in early differentiating neurons. We have identified a new set of coding genes and lncRNAs involved in early neuronal differentiation and validated with functional assays in vitro and in vivo. In addition, at E15.5 when Ebf2-EGFP+ cells are mostly CR neurons, single-cell sequencing analysis of purified Ebf2-EGFP+ cells uncovers molecular heterogeneities in CR neurons, but without apparent clustering of cells with distinct regional origins. Along a pseudotemporal trajectory these cells are classified into three different developing states, revealing genetic cascades from early generic neuronal differentiation to late fate specification during the establishment of CR neuron identity and function. Our findings shed light on the molecular mechanisms governing the early differentiation steps during cortical development, especially CR neuron differentiation.


Subjects
Cell Differentiation ; Genomics ; Neurogenesis/genetics ; Neurons/metabolism ; Temporal Lobe/metabolism ; Animals ; Basic Helix-Loop-Helix Transcription Factors/metabolism ; Biomarkers ; Cell Differentiation/genetics ; Cells, Cultured ; Cerebral Cortex/metabolism ; Gene Expression ; Gene Expression Regulation ; Genes, Reporter ; Genetic Heterogeneity ; Genomics/methods ; Histones ; Immunohistochemistry ; Mice ; Mice, Transgenic ; Neurons/cytology ; RNA, Long Noncoding/genetics ; Single-Cell Analysis ; Transcription Factors ; Transcription Initiation Site
7.
Proc Natl Acad Sci U S A ; 117(32): 19487-19496, 2020 08 11.
Article in English | MEDLINE | ID: mdl-32723820

ABSTRACT

Alternative ribosome subunit proteins are prevalent in the genomes of diverse bacterial species, but their functional significance is controversial. Attempts to study microbial ribosomal heterogeneity have mostly relied on comparing wild-type strains with mutants in which subunits have been deleted, but this approach does not allow direct comparison of alternate ribosome isoforms isolated from identical cellular contexts. Here, by simultaneously purifying canonical and alternative RpsR ribosomes from Mycobacterium smegmatis, we show that alternative ribosomes have distinct translational features compared with their canonical counterparts. Both alternative and canonical ribosomes actively take part in protein synthesis, although they translate a subset of genes with differential efficiency as measured by ribosome profiling. We also show that alternative ribosomes have a relative defect in initiation complex formation. Furthermore, a strain of M. smegmatis in which the alternative ribosome protein operon is deleted grows poorly in iron-depleted medium, uncovering a role for alternative ribosomes in iron homeostasis. Our work confirms the distinct and nonredundant contribution of alternative bacterial ribosomes for adaptation to hostile environments.


Subjects
Bacterial Proteins/metabolism ; Mycobacterium smegmatis/metabolism ; Ribosomes/metabolism ; Bacterial Proteins/genetics ; Iron/metabolism ; Mycobacterium smegmatis/genetics ; Mycobacterium smegmatis/growth & development ; Peptide Chain Initiation, Translational/genetics ; Protein Biosynthesis ; Ribosomal Proteins/genetics ; Ribosomal Proteins/metabolism ; Ribosome Subunits/metabolism
8.
Brief Bioinform ; 21(6): 2194-2205, 2020 12 01.
Article in English | MEDLINE | ID: mdl-31774912

ABSTRACT

The methodologies for evaluating similarities between gene expression profiles of different perturbagens are the key to understanding mechanisms of actions (MoAs) of unknown compounds and finding new indications for existing drugs. L1000-based next-generation Connectivity Map (CMap) data is more than a thousand-fold scale-up of the CMap pilot dataset. Although several systematic evaluations have been performed individually to assess the accuracy of the methodologies for the CMap pilot study, the performance of these methodologies needs to be re-evaluated for the L1000 data. Here, using the drug-drug similarities from the Drug Repurposing Hub database as a benchmark standard, we evaluated six popular published methods for the prediction performance of drug-drug relationships based on the partial area under the receiver operating characteristic (ROC) curve at false positive rates of 0.001, 0.005 and 0.01 (AUC0.001, AUC0.005 and AUC0.01). The similarity evaluating algorithm called ZhangScore was generally superior to other methods and exhibited the highest accuracy at the gene signature sizes ranging from 10 to 200. Further, we tested these methods with an experimentally derived gene signature related to estrogen in breast cancer cells, and the results confirmed that ZhangScore was more accurate than other methods. Moreover, based on scoring results of ZhangScore for the gene signature of TOP2A knockdown, in addition to well-known TOP2A inhibitors, we identified a number of potential inhibitors and at least two of them were the subject of previous investigation. Our studies provide potential guidelines for researchers to choose the suitable connectivity method. The six connectivity methods used in this report have been implemented in R package (https://github.com/Jasonlinchina/RCSM).
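
The partial AUC used as the evaluation metric in this abstract restricts the area under the ROC curve to a low false positive rate range. The following is a minimal sketch, assuming an unnormalized area with linear interpolation at the cutoff; the function name `partial_auc` and the toy scores are illustrative and are not taken from the RCSM package, which may define or scale the metric differently.

```python
# Illustrative partial AUC: area under the ROC curve for FPR in [0, max_fpr].
# This is a generic sketch, not the implementation used in the study.

def partial_auc(scores, labels, max_fpr):
    """Return the unnormalized ROC area between FPR 0 and max_fpr.

    scores: higher means "more likely positive"
    labels: 1 for a true relationship, 0 otherwise
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")
    if not 0 < max_fpr <= 1:
        raise ValueError("max_fpr must be in (0, 1]")

    # Walk down the ranking; tied scores move the curve in a single step.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1]:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / n_neg, tp / n_pos))
        i = j

    # Trapezoidal rule, cutting the last segment at max_fpr.
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Toy example: a perfect ranking of two related and two unrelated drug pairs.
perfect = partial_auc([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0], max_fpr=0.5)
```

Because the area is not rescaled, its maximum possible value equals `max_fpr`; dividing by `max_fpr` is one simple way to compare cutoffs such as 0.001, 0.005 and 0.01 on a common scale.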


Subjects
Computational Biology ; Drug Repositioning ; Gene Expression Profiling ; Algorithms ; Computational Biology/methods ; Databases, Factual ; Gene Expression Profiling/methods ; Pilot Projects ; Transcriptome
9.
FASEB J ; 35(7): e21720, 2021 07.
Article in English | MEDLINE | ID: mdl-34110642

ABSTRACT

Methylation of circulating free DNA (cfDNA) has emerged as an efficient marker for tumor screening and prognostics. However, no efficient methylation marker has been developed for monitoring liver metastasis (LM) in colorectal cancer (CRC). Utilizing methylome profiling and bisulfite sequencing polymerase chain reaction of paired primary and LM sites, significantly increased methylation of TCHH was identified in the process of LM in CRC in the present study. MethyLight analysis of TCHH methylation in cfDNA displayed a promising discriminative power between CRC with and without LM. Besides, significant coefficient of TCHH methylation and LM tumor volume was also validated. Together, these results indicated the potential of TCHH methylation in cfDNA as a monitoring marker of LM in CRC.


Subjects
Antigens/genetics ; Biomarkers, Tumor/genetics ; Cell-Free Nucleic Acids/genetics ; Colorectal Neoplasms/genetics ; DNA Methylation/genetics ; DNA, Neoplasm/genetics ; Intermediate Filament Proteins/genetics ; Liver Neoplasms/genetics ; Colorectal Neoplasms/pathology ; Epigenome/genetics ; Humans ; Liver Neoplasms/pathology ; Prognosis
10.
Brief Bioinform ; 20(4): 1420-1433, 2019 07 19.
Article in English | MEDLINE | ID: mdl-29415187

ABSTRACT

Circular RNAs (circRNAs) have emerged in recent years as a new class of endogenous regulatory noncoding RNAs. With the widespread application of RNA sequencing (RNA-seq) technology and bioinformatics prediction, large numbers of circRNAs have been identified. However, at present, we lack a comprehensive characterization of all these circRNAs in samples of interest. In this study, we integrated 87 935 circRNA sequences, which cover most of the circRNAs identified to date and represented in circBase, to design microarray probes targeting the back-splice site of each circRNA and profile the expression of those circRNAs. By comparing the circRNA detection efficiency of RNA-seq with this circRNA microarray, we revealed that the microarray is more efficient than RNA-seq for circRNA profiling. Then, we found ∼80 000 circRNAs were expressed in cervical tumors and matched normal tissues, and ∼25 000 of them were differentially expressed. Notably, many of the circRNAs detected by this microarray can be validated by quantitative reverse transcription polymerase chain reaction (RT-qPCR) or RNA-seq. Strikingly, as many as ∼18 000 circRNAs could be robustly detected in cell-free plasma samples, and the expression of ∼2700 of them differed after surgery for tumor removal. Our findings provide a comprehensive and genome-wide characterization of circRNAs in paired normal tissues and tumors and plasma samples from multiple individuals. In addition, we also provide a rich resource with 41 microarray data sets and 10 RNA-seq data sets and strong evidence for circRNA expression in cervical cancer. In conclusion, circRNAs can be efficiently profiled in samples of interest by a circRNA microarray that targets their reported back-splice sites.


Subjects
Gene Expression Profiling/methods ; Oligonucleotide Array Sequence Analysis/methods ; RNA, Circular/genetics ; Brain/metabolism ; Computational Biology ; Databases, Nucleic Acid/statistics & numerical data ; Female ; Gene Expression Profiling/statistics & numerical data ; Humans ; Neoplasms/blood ; Neoplasms/genetics ; Neoplasms/metabolism ; Oligonucleotide Array Sequence Analysis/statistics & numerical data ; RNA, Circular/blood ; RNA, Circular/metabolism ; RNA-Seq/methods ; RNA-Seq/statistics & numerical data ; Tissue Distribution ; Uterine Cervical Neoplasms/blood ; Uterine Cervical Neoplasms/genetics ; Uterine Cervical Neoplasms/metabolism
11.
Nucleic Acids Res ; 47(D1): D203-D211, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30239819

ABSTRACT

Post-transcriptional regulation of RNAs is critical to the diverse range of cellular processes. The volume of functional genomic data focusing on post-transcriptional regulation logics continues to grow in recent years. In the current database version, POSTAR2 (http://lulab.life.tsinghua.edu.cn/postar), we included the following new features and data: updated ∼500 CLIP-seq datasets (∼1200 CLIP-seq datasets in total) from six species, including human, mouse, fly, worm, Arabidopsis and yeast; added a new module 'Translatome', which is derived from Ribo-seq datasets and contains ∼36 million open reading frames (ORFs) in the genomes from the six species; updated and unified post-transcriptional regulation and variation data. Finally, we improved web interfaces for searching and visualizing protein-RNA interactions with multi-layer information. Meanwhile, we also merged our CLIPdb database into POSTAR2. POSTAR2 will help researchers investigate the post-transcriptional regulatory logics coordinated by RNA-binding proteins and translational landscape of cellular RNAs.


Subjects
Computational Biology ; Databases, Genetic ; Gene Expression Regulation ; RNA Processing, Post-Transcriptional ; Animals ; Binding Sites ; Computational Biology/methods ; Humans ; Immunoprecipitation ; Molecular Sequence Annotation ; Open Reading Frames ; Protein Binding ; RNA-Binding Proteins/metabolism ; Sequence Analysis, DNA ; Web Browser
12.
J Exp Bot ; 71(19): 5837-5851, 2020 10 07.
Article in English | MEDLINE | ID: mdl-32969475

ABSTRACT

Signaling by the phytohormone abscisic acid (ABA) involves pre-mRNA splicing, a key process of post-transcriptional regulation of gene expression. However, the regulatory mechanism of alternative pre-mRNA splicing in ABA signaling remains largely unknown. We previously identified a pentatricopeptide repeat protein SOAR1 (suppressor of the ABAR-overexpressor 1) as a crucial player downstream of ABAR (putative ABA receptor) in ABA signaling. In this study, we identified a SOAR1 interaction partner USB1, which is an exoribonuclease catalyzing U6 production for spliceosome assembly. We reveal that together USB1 and SOAR1 negatively regulate ABA signaling in early seedling development. USB1 and SOAR1 are both required for the splicing of transcripts of numerous genes, including those involved in ABA signaling pathways, suggesting that USB1 and SOAR1 collaborate to regulate ABA signaling by affecting spliceosome assembly. These findings provide important new insights into the mechanistic control of alternative pre-mRNA splicing in the regulation of ABA-mediated plant responses to environmental cues.


Subjects
Arabidopsis Proteins ; Arabidopsis ; Abscisic Acid ; Arabidopsis/genetics ; Arabidopsis/metabolism ; Arabidopsis Proteins/genetics ; Arabidopsis Proteins/metabolism ; Exoribonucleases/genetics ; Gene Expression Regulation, Plant ; Plant Growth Regulators
13.
Nucleic Acids Res ; 46(D1): D194-D201, 2018 01 04.
Article in English | MEDLINE | ID: mdl-29040625

ABSTRACT

We present RISE (http://rise.zhanglab.net), a database of RNA Interactome from Sequencing Experiments. RNA-RNA interactions (RRIs) are essential for RNA regulation and function. RISE provides a comprehensive collection of RRIs that mainly come from recent transcriptome-wide sequencing-based experiments like PARIS, SPLASH, LIGR-seq, and MARIO, as well as targeted studies like RIA-seq, RAP-RNA and CLASH. It also includes interactions aggregated from other primary databases and publications. The RISE database currently contains 328,811 RNA-RNA interactions mainly in human, mouse and yeast. While most existing RNA databases mainly contain interactions of miRNA targeting, notably, more than half of the RRIs in RISE are among mRNA and long non-coding RNAs. We compared different RRI datasets in RISE and found limited overlaps in interactions resolved by different techniques and in different cell lines. It may suggest technology preference and also dynamic natures of RRIs. We also analyzed the basic features of the human and mouse RRI networks and found that they tend to be scale-free, small-world, hierarchical and modular. The analysis may nominate important RNAs or RRIs for further investigation. Finally, RISE provides a Circos plot and several table views for integrative visualization, with extensive molecular and functional annotations to facilitate exploration of biological functions for any RRI of interest.
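
One basic check behind a "scale-free" description of an interaction network is its degree distribution, which is heavy-tailed when a few hub RNAs carry many interactions. The following is a minimal sketch that only tabulates that distribution from an edge list; the RNA names and the functions `node_degrees` and `degree_distribution` are hypothetical, the toy edges are not RISE data, and no power-law fit is attempted.

```python
# Illustrative degree tabulation for an RNA-RNA interaction (RRI) network.
# The edge list below is invented; real input would come from a database export.
from collections import Counter


def node_degrees(edges):
    """Count distinct interaction partners per RNA.

    Duplicate pairs (in either orientation) and self-interactions are ignored.
    """
    unique_pairs = {frozenset((a, b)) for a, b in edges if a != b}
    degree = Counter()
    for pair in unique_pairs:
        for node in pair:
            degree[node] += 1
    return degree


def degree_distribution(degree):
    """Return {k: fraction of nodes with degree k}, ordered by k."""
    counts = Counter(degree.values())
    n_nodes = len(degree)
    return {k: counts[k] / n_nodes for k in sorted(counts)}


toy_edges = [
    ("hubRNA", "mRNA_a"),
    ("hubRNA", "mRNA_b"),
    ("hubRNA", "lncRNA_c"),
    ("mRNA_a", "hubRNA"),      # same interaction reported in reverse order
    ("lncRNA_c", "mRNA_d"),
]
degrees = node_degrees(toy_edges)
distribution = degree_distribution(degrees)
```

On a real network, plotting this distribution on log-log axes is a common first look before any formal test of a power-law fit.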


Subjects
Databases, Nucleic Acid ; Animals ; Gene Regulatory Networks ; High-Throughput Nucleotide Sequencing ; Humans ; Mice ; Molecular Sequence Annotation ; Protein Interaction Maps ; RNA/genetics ; RNA/metabolism ; Sequence Analysis, RNA ; Transcriptome ; User-Computer Interface
14.
Plant J ; 93(5): 814-827, 2018 03.
Article in English | MEDLINE | ID: mdl-29265542

ABSTRACT

Recently, long non-coding RNAs (lncRNAs) have been demonstrated to be involved in many biological processes of plants; however, a systematic study on transcriptional and, in particular, post-transcriptional regulation of stress-responsive lncRNAs in Oryza sativa (rice) is lacking. We sequenced three types of RNA libraries (poly(A)+, poly(A)- and nuclear RNAs) under four abiotic stresses (cold, heat, drought and salt). Based on an integrative bioinformatics approach and ~200 high-throughput data sets, ~170 of which have been published, we revealed over 7000 lncRNAs, nearly half of which were identified for the first time. Notably, we found that the majority of the ~500 poly(A) lncRNAs that were differentially expressed under stress were significantly downregulated, but approximately 25% were found to have upregulated non-poly(A) forms. Moreover, hundreds of lncRNAs with downregulated polyadenylation (DPA) tend to be highly conserved, show significant nuclear retention and are co-expressed with protein-coding genes that function under stress. Remarkably, these DPA lncRNAs are significantly enriched in quantitative trait loci (QTLs) for stress tolerance or development, suggesting their potential important roles in rice growth under various stresses. In particular, we observed substantially accumulated DPA lncRNAs in plants exposed to drought and salt, which is consistent with the severe reduction of RNA 3'-end processing factors under these conditions. Taken together, the results of this study reveal that polyadenylation and subcellular localization of many rice lncRNAs are likely to be regulated at the post-transcriptional level. Our findings strongly suggest that many upregulated/downregulated lncRNAs previously identified by traditional RNA-seq analyses need to be carefully reviewed to assess the influence of post-transcriptional modification.


Subjects
Gene Expression Regulation, Plant ; Oryza/genetics ; RNA, Long Noncoding/metabolism ; Stress, Physiological/genetics ; Base Sequence ; Cell Nucleus/genetics ; Conserved Sequence ; Down-Regulation ; Droughts ; Oryza/physiology ; Plant Proteins/genetics ; Plant Proteins/metabolism ; Poly A/genetics ; Poly A/metabolism ; Polyadenylation ; Quantitative Trait Loci ; RNA, Long Noncoding/genetics ; RNA, Plant/metabolism
15.
Genome Res ; 26(9): 1233-44, 2016 09.
Article in English | MEDLINE | ID: mdl-27516619

ABSTRACT

Long noncoding RNAs (lncRNAs), a recently discovered class of cellular RNAs, play important roles in the regulation of many cellular developmental processes. Although lncRNAs have been systematically identified in various systems, most of them have not been functionally characterized in vivo in animal models. In this study, we identified 128 testis-specific Drosophila lncRNAs and knocked out 105 of them using an optimized three-component CRISPR/Cas9 system. Among the lncRNA knockouts, 33 (31%) exhibited a partial or complete loss of male fertility, accompanied by visual developmental defects in late spermatogenesis. In addition, six knockouts were fully or partially rescued by transgenes in a trans configuration, indicating that those lncRNAs primarily work in trans. Furthermore, gene expression profiles for five lncRNA mutants revealed that testis-specific lncRNAs regulate global gene expression, orchestrating late male germ cell differentiation. Compared with coding genes, the testis-specific lncRNAs evolved much faster. Moreover, lncRNAs of greater functional importance exhibited higher sequence conservation, suggesting that they are under constant evolutionary selection. Collectively, our results reveal critical functions of rapidly evolving testis-specific lncRNAs in late Drosophila spermatogenesis.


Subjects
Conserved Sequence/genetics ; RNA, Long Noncoding/genetics ; Spermatogenesis/genetics ; Testis/growth & development ; Animals ; CRISPR-Cas Systems ; Drosophila/genetics ; Drosophila/growth & development ; Gene Expression Regulation, Developmental ; Germ Cells/growth & development ; Infertility, Male/genetics ; Infertility, Male/pathology ; Male
16.
Nucleic Acids Res ; 45(1): e2, 2017 01 09.
Article in English | MEDLINE | ID: mdl-27608726

ABSTRACT

Recent genomic studies suggest that novel long non-coding RNAs (lncRNAs) are specifically expressed and far outnumber annotated lncRNA sequences. To identify and characterize novel lncRNAs in RNA sequencing data from new samples, we have developed COME, a coding potential calculation tool based on multiple features. It integrates multiple sequence-derived and experiment-based features using a decompose-compose method, which makes it more accurate and robust than other well-known tools. We also showed that COME was able to substantially improve the consistency of prediction results from other coding potential calculators. Moreover, COME annotates and characterizes each predicted lncRNA transcript with multiple lines of supporting evidence, which are not provided by other tools. Remarkably, we found that one subgroup of lncRNAs classified by such supporting features (i.e. conserved local RNA secondary structure) was highly enriched in a well-validated database (lncRNAdb). We further found that the conserved structural domains on lncRNAs had a better chance than other RNA regions of interacting with RNA-binding proteins, based on recent eCLIP-seq data in human, indicating their potential regulatory roles. Overall, we present COME as an accurate, robust and multiple-feature supported method for the identification and characterization of novel lncRNAs. The software implementation is available at https://github.com/lulab/COME.


Subjects
Molecular Sequence Annotation ; RNA, Long Noncoding/genetics ; RNA-Binding Proteins/genetics ; Software ; Animals ; Arabidopsis/genetics ; Base Sequence ; Binding Sites ; Caenorhabditis elegans/genetics ; Computer Graphics ; Databases, Nucleic Acid ; Drosophila melanogaster/genetics ; Genome ; Humans ; Internet ; Mice ; Nucleic Acid Conformation ; Protein Binding ; RNA, Long Noncoding/classification ; RNA, Long Noncoding/metabolism ; RNA-Binding Proteins/metabolism ; Sequence Analysis, RNA
17.
Nucleic Acids Res ; 45(D1): D104-D114, 2017 01 04.
Article in English | MEDLINE | ID: mdl-28053162

ABSTRACT

We present POSTAR (http://POSTAR.ncrnalab.org), a resource of POST-trAnscriptional Regulation coordinated by RNA-binding proteins (RBPs). Precise characterization of post-transcriptional regulatory maps has accelerated dramatically in the past few years. Based on new studies and resources, POSTAR supplies the largest collection of experimentally probed (∼23 million) and computationally predicted (approximately 117 million) RBP binding sites in the human and mouse transcriptomes. POSTAR annotates every transcript and its RBP binding sites using extensive information regarding various molecular regulatory events (e.g., splicing, editing, and modification), RNA secondary structures, disease-associated variants, and gene expression and function. Moreover, POSTAR provides a friendly, multi-mode, integrated search interface, which helps users to connect multiple RBP binding sites with post-transcriptional regulatory events, phenotypes, and diseases. Based on our platform, we were able to obtain novel insights into post-transcriptional regulation, such as the putative association between CPSF6 binding, RNA structural domains, and Li-Fraumeni syndrome SNPs. In summary, POSTAR represents an early effort to systematically annotate post-transcriptional regulatory maps and explore the putative roles of RBPs in human diseases.


Subjects
Databases, Genetic , RNA Processing, Post-Transcriptional , RNA-Binding Proteins/metabolism , RNA/chemistry , RNA/metabolism , Alternative Splicing , Animals , Binding Sites , Disease/genetics , Gene Ontology , Humans , Mice , MicroRNAs/metabolism , Molecular Sequence Annotation , Nucleic Acid Conformation , Polymorphism, Single Nucleotide
18.
Nucleic Acids Res ; 45(4): 1657-1672, 2017 02 28.
Article in English | MEDLINE | ID: mdl-27980097

ABSTRACT

Distinguishing cell states based only on gene expression data remains a challenging task. This is true even for analyses within a species. In cross-species comparisons, the results obtained by different groups have varied widely. Here, we integrate RNA-seq data from more than 40 cell and tissue types of four mammalian species to identify sets of associated genes as indicators for specific cell states in each species. We employ a statistical method, TROM, to identify both protein-coding and non-coding indicators. Next, we map the cell states within each species and also between species using these indicator genes. We recapitulate known phenotypic similarity between related cell and tissue types and reveal the molecular basis for their similarity. We also report novel associations between several tissues and cell types with functional support. Moreover, our identified conserved associated genes are found to be a good resource for studying cell differentiation and reprogramming. Lastly, long non-coding RNAs can serve well as associated genes to indicate cell states. We further infer the biological functions of those non-coding associated genes based on their co-expressed protein-coding genes. This study demonstrates that combining statistical modeling with public RNA-seq data can be powerful for improving our understanding of cell identity control.


Subjects
Contig Mapping , Evolution, Molecular , Gene Expression Profiling , Gene Expression Regulation , Mammals/genetics , Transcriptome , Algorithms , Animals , Cluster Analysis , Computational Biology/methods , Gene Expression Regulation, Developmental , Gene Ontology , High-Throughput Nucleotide Sequencing , Humans , Mice , Molecular Sequence Annotation , Multigene Family , Organ Specificity
19.
Brief Bioinform ; 17(6): 1032-1043, 2016 11.
Article in English | MEDLINE | ID: mdl-26655457

ABSTRACT

High-throughput sequencing has been used to study posttranscriptional regulation, where the identification of protein-RNA binding is a major and fast-developing sub-area, which in turn benefits from sequencing methods for whole-transcriptome probing of RNA secondary structures. In the study of RNA secondary structures using high-throughput sequencing, bases are modified or cleaved according to their structural features, which alters the resulting composition of sequencing reads. In the study of protein-RNA binding, methods have been proposed to immuno-precipitate (IP) protein-bound RNA transcripts in vitro or in vivo. By sequencing these transcripts, the protein-RNA interactions and the binding locations can be identified. For both types of data, read counts are affected by a combination of confounding factors, including expression levels of transcripts, sequence biases, mapping errors and the probing or IP efficiency of the experimental protocols. Careful processing of the sequencing data and proper extraction of important features are fundamentally important to a successful analysis. Here we review and compare different experimental methods for probing RNA secondary structures and binding sites of RNA-binding proteins (RBPs), and the computational methods proposed for analyzing the corresponding sequencing data. We suggest how these two types of data should be integrated to study the structural properties of RBP binding sites as a systematic way to better understand posttranscriptional regulation.


Subjects
RNA/chemistry , Nucleic Acid Conformation , Protein Binding , Protein Structure, Secondary , RNA-Binding Proteins , Transcriptome
20.
Nucleic Acids Res ; 44(W1): W294-301, 2016 07 08.
Article in English | MEDLINE | ID: mdl-27137891

ABSTRACT

Several high-throughput technologies have been developed to probe RNA base pairs and loops at the transcriptome level in multiple species. However, to obtain the final RNA secondary structure, extensive effort and considerable expertise are required to statistically process the probing data and combine them with free energy models. Therefore, we developed an RNA secondary structure prediction server that is enhanced by experimental data (RNAex). RNAex is a web interface that enables non-specialists to easily access cutting-edge structure-probing data and predict RNA secondary structures enhanced by in vivo and in vitro data. RNAex annotates the RNA editing, RNA modification and SNP sites on the predicted structures. It provides four structure-folding methods, restrained MaxExpect, SeqFold, RNAstructure (Fold) and RNAfold, that can be selected by the user. The performance of these four folding methods has been verified by previous publications on known structures. We re-mapped the raw sequencing data of the probing experiments to the whole genome for each species. RNAex thus enables users to predict secondary structures for both known and novel RNA transcripts in human, mouse, yeast and Arabidopsis. The RNAex web server is available at http://RNAex.ncrnalab.org/.


Subjects
Nucleic Acid Conformation , Polymorphism, Single Nucleotide , RNA/chemistry , Transcriptome , User-Computer Interface , Animals , Arabidopsis/genetics , Arabidopsis/metabolism , Base Pairing , Computer Graphics , High-Throughput Screening Assays , Humans , Internet , Mice , Molecular Sequence Annotation , RNA/genetics , RNA Editing , RNA Folding , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Thermodynamics