1.
Genome Res ; 34(7): 1027-1035, 2024 Aug 20.
Article in English | MEDLINE | ID: mdl-38951026

ABSTRACT

mRNA-based vaccines and therapeutics are gaining popularity and usage across a wide range of conditions. One of the critical issues when designing such mRNAs is sequence optimization. Even small proteins or peptides can be encoded by an enormously large number of mRNAs. The actual mRNA sequence can have a large impact on several properties, including expression, stability, immunogenicity, and more. To enable the selection of an optimal sequence, we developed CodonBERT, a large language model (LLM) for mRNAs. Unlike prior models, CodonBERT uses codons as inputs, which enables it to learn better representations. CodonBERT was trained using more than 10 million mRNA sequences from a diverse set of organisms. The resulting model captures important biological concepts. CodonBERT can also be extended to perform prediction tasks for various mRNA properties. CodonBERT outperforms previous mRNA prediction methods, including on a new flu vaccine data set.
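The remark that even a short peptide can be encoded by a very large number of mRNAs, and the idea of using codons rather than single nucleotides as model inputs, can be illustrated with a minimal Python sketch. The names `split_codons` and `count_encodings` are hypothetical and are not part of CodonBERT; the degeneracy table is simply the standard genetic code.

```python
from math import prod

# Number of synonymous codons per amino acid in the standard genetic code.
DEGENERACY = {
    "L": 6, "S": 6, "R": 6,
    "A": 4, "G": 4, "P": 4, "T": 4, "V": 4,
    "I": 3,
    "F": 2, "Y": 2, "H": 2, "Q": 2, "N": 2,
    "K": 2, "D": 2, "E": 2, "C": 2,
    "M": 1, "W": 1,
}


def split_codons(mrna: str) -> list[str]:
    """Split a coding sequence into codon tokens (hypothetical helper)."""
    if len(mrna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return [mrna[i:i + 3] for i in range(0, len(mrna), 3)]


def count_encodings(peptide: str) -> int:
    """Count the distinct coding sequences that translate to the peptide."""
    return prod(DEGENERACY[aa] for aa in peptide)


codons = split_codons("AUGGCCUAA")   # three codon tokens
n_variants = count_encodings("MLS")  # 1 * 6 * 6 synonymous encodings
```

Because the count is a product over residues, it grows exponentially with peptide length, which is why exhaustive enumeration quickly becomes impractical.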


Subjects
RNA, Messenger; mRNA Vaccines; Humans; RNA, Messenger/genetics; Codon; Algorithms
2.
Nucleic Acids Res ; 52(1): e1, 2024 Jan 11.
Article in English | MEDLINE | ID: mdl-37962298

ABSTRACT

Enhanced crosslinking and immunoprecipitation (eCLIP) sequencing is a method for transcriptome-wide detection of binding sites of RNA-binding proteins (RBPs). However, identified crosslink sites can deviate from experimentally established functional elements of even well-studied RBPs. Current peak-calling strategies result in low replication and high false positive rates. Here, we present the R/Bioconductor package DEWSeq that makes use of replicate information and size-matched input controls. We benchmarked DEWSeq on 107 RBPs for which both eCLIP data and RNA sequence motifs are available and were able to more than double the number of motif-containing binding regions relative to standard eCLIP processing. The improvement not only relates to the number of binding sites (3.1-fold with known motifs for RBFOX2), but also their subcellular localization (1.9-fold of mitochondrial genes for FASTKD2) and structural targets (2.2-fold increase of stem-loop regions for SLBP). On several orthogonal CLIP-seq datasets, DEWSeq recovers a larger number of motif-containing binding sites (3.3-fold). DEWSeq is a well-documented R/Bioconductor package, scalable to adequate numbers of replicates, and tends to substantially increase the proportion and total number of RBP binding sites containing biologically relevant features.


Subjects
RNA-Binding Proteins; Software; Binding Sites; Immunoprecipitation; Protein Binding; RNA/chemistry; RNA-Binding Proteins/genetics; RNA-Binding Proteins/metabolism
3.
Nucleic Acids Res ; 48(W1): W287-W291, 2020 Jul 02.
Article in English | MEDLINE | ID: mdl-32392303

ABSTRACT

RNA molecules fold into complex structures as a result of intramolecular interactions between their nucleotides. The function of many non-coding RNAs and some cis-regulatory elements of messenger RNAs depends strongly on their fold. Single-nucleotide variants (SNVs) and other types of mutations can disrupt the native function of an RNA element by altering its base pairing pattern. Identifying the effect of a mutation on an RNA's structure is, therefore, a crucial step in evaluating the impact of mutations on the post-transcriptional regulation and function of RNAs within the cell. Even though a single nucleotide variation can have a striking impact on structure formation, interpreting and comparing the impact usually requires expertise and meticulous effort. Here, we present MutaRNA, a web server for visualization and interpretation of mutation-induced changes on the RNA structure in an intuitive and integrative fashion. To this end, probabilities of base pairing and position-wise unpaired probabilities of wildtype and mutated RNA sequences are computed and compared. Differential heatmap-like dot plot representations in combination with circular plots and arc diagrams help to identify local structure aberrations, which are otherwise hidden in standard outputs. Overall, MutaRNA provides a comprehensive and comparative overview of the mutation-induced changes in base pairing potentials and accessibility. The MutaRNA web server is freely available at http://rna.informatik.uni-freiburg.de/MutaRNA.
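The comparison of base-pairing and unpaired probabilities between wildtype and mutant can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not MutaRNA's implementation: the base-pair probability matrices are assumed to be symmetric and to come from an external folding tool, and the function names `unpaired_probabilities` and `differential` are hypothetical.

```python
def unpaired_probabilities(bpp: list[list[float]]) -> list[float]:
    """Position-wise unpaired probability: 1 minus the total probability
    that position i pairs with any other position j."""
    n = len(bpp)
    return [1.0 - sum(bpp[i][j] for j in range(n) if j != i) for i in range(n)]


def differential(wildtype: list[list[float]],
                 mutant: list[list[float]]) -> list[list[float]]:
    """Entry-wise difference (mutant minus wildtype) of two base-pair
    probability matrices, the data behind a differential dot plot."""
    return [[m - w for w, m in zip(row_w, row_m)]
            for row_w, row_m in zip(wildtype, mutant)]


# Toy 4-nt example: the mutation weakens the pair between positions 0 and 3.
wt = [[0.0, 0.0, 0.0, 0.8],
      [0.0, 0.0, 0.0, 0.0],
      [0.0, 0.0, 0.0, 0.0],
      [0.8, 0.0, 0.0, 0.0]]
mut = [[0.0, 0.0, 0.0, 0.3],
       [0.0, 0.0, 0.0, 0.0],
       [0.0, 0.0, 0.0, 0.0],
       [0.3, 0.0, 0.0, 0.0]]

delta_pairs = differential(wt, mut)
delta_unpaired = [m - w for w, m in zip(unpaired_probabilities(wt),
                                        unpaired_probabilities(mut))]
```

Negative entries in `delta_pairs` mark base pairs weakened by the mutation, and positive entries in `delta_unpaired` mark positions that become more accessible.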


Subjects
Mutation; RNA/chemistry; Software; 5' Untranslated Regions; Apoferritins/genetics; Base Pairing; Genes, ras; Iron/metabolism; Response Elements
4.
Bioinformatics ; 36(Suppl_1): i242-i250, 2020 Jul 01.
Article in English | MEDLINE | ID: mdl-32657398

ABSTRACT

MOTIVATION: Elucidating the functions of non-coding RNAs by homology has been strongly limited due to fundamental computational and modeling issues. While existing simultaneous alignment and folding (SA&F) algorithms successfully align homologous RNAs with precisely known boundaries (global SA&F), the more pressing problem of identifying new classes of homologous RNAs in the genome (local SA&F) is intrinsically more difficult and much less understood. Typically, the length of local alignments is strongly overestimated and alignment boundaries are dramatically mispredicted. We hypothesize that local SA&F approaches are compromised this way due to a score bias, which is caused by the contribution of RNA structure similarity to their overall alignment score. RESULTS: In the light of this hypothesis, we systematically study pairwise local SA&F for the first time, based on a novel local RNA alignment benchmark set and quality measure. First, we vary the relative influence of structure similarity compared to sequence similarity. Putting more emphasis on the structure component leads to overestimating the length of local alignments. This clearly shows the bias of current scores and strongly hints at the structure component as its origin. Second, we study the interplay of several important scoring parameters by learning parameters for local and global SA&F. The divergence of these optimized parameter sets underlines the fundamental obstacles for local SA&F. Third, by introducing a position-wise correction term in local SA&F, we constructively solve its principal issues. AVAILABILITY AND IMPLEMENTATION: The benchmark data, detailed results and scripts are available at https://github.com/BackofenLab/local_alignment. The RNA alignment tool LocARNA, including the modifications proposed in this work, is available at https://github.com/s-will/LocARNA/releases/tag/v2.0.0RC6. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms; RNA; Genome; RNA/genetics; Sequence Alignment; Sequence Analysis, RNA; Software
5.
RNA Biol ; 18(sup1): 268-277, 2021 Oct 15.
Article in English | MEDLINE | ID: mdl-34241565

ABSTRACT

MicroRNAs (miRNAs) can serve as activation signals for membrane receptors, a recently discovered function that is independent of the miRNAs' conventional role in post-transcriptional gene regulation. Here, we introduce a machine learning approach, BrainDead, to identify oligonucleotides that act as ligands for single-stranded RNA-detecting Toll-like receptors (TLR)7/8, thereby triggering an immune response. BrainDead was trained on activation data obtained from in vitro experiments on murine microglia, incorporating sequence and intra-molecular structure, as well as inter-molecular homo-dimerization potential of candidate RNAs. The method was applied to analyse all known human miRNAs regarding their potential to induce TLR7/8 signalling and microglia activation. We validated the predicted functional activity of subsets of high- and low-scoring miRNAs experimentally, of which a selection has been linked to Alzheimer's disease. High agreement between predictions and experiments confirms the robustness and power of BrainDead. The results provide new insight into the mechanisms of how miRNAs act as TLR ligands. Eventually, BrainDead implements a generic machine learning methodology for learning and predicting the functions of short RNAs in any context.


Subjects
Gene Expression Regulation; Machine Learning; MicroRNAs/metabolism; Microglia/metabolism; Oligonucleotides/metabolism; Toll-Like Receptor 7/metabolism; Toll-Like Receptor 8/metabolism; Animals; Humans; Ligands; Mice; Mice, Inbred C57BL; MicroRNAs/genetics; Oligonucleotides/chemistry; Oligonucleotides/genetics; Toll-Like Receptor 7/genetics; Toll-Like Receptor 8/genetics
6.
Bioinformatics ; 35(16): 2862-2864, 2019 Aug 15.
Article in English | MEDLINE | ID: mdl-30590479

ABSTRACT

SUMMARY: Experimental structure probing data has been shown to improve thermodynamics-based RNA secondary structure prediction. To this end, chemical reactivity information (as provided e.g. by SHAPE) is incorporated, which encodes whether or not individual nucleotides are involved in intra-molecular structure. Since inter-molecular RNA-RNA interactions are often confined to unpaired RNA regions, SHAPE data is even more promising to improve interaction prediction. Here, we show how such experimental data can be incorporated seamlessly into accessibility-based RNA-RNA interaction prediction approaches, as implemented in IntaRNA. This is possible via the computation and use of unpaired probabilities that incorporate the structure probing information. We show that experimental SHAPE data can significantly improve RNA-RNA interaction prediction. We evaluate our approach by investigating interactions of a spliceosomal U1 snRNA transcript with its target splice sites. When SHAPE data is incorporated, known target sites are predicted with increased precision and specificity. AVAILABILITY AND IMPLEMENTATION: https://github.com/BackofenLab/IntaRNA. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
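The role of unpaired probabilities in accessibility-based prediction can be sketched as follows. This is a minimal illustration of the general scheme, not IntaRNA's code: it assumes the commonly used conversion of an unpaired probability into an opening energy, ED = -RT ln(P_unpaired), and adds the opening energies of both interaction sites to the hybridization energy. The function names are hypothetical, and how SHAPE reactivities enter the unpaired probabilities is left to the upstream folding step.

```python
import math

GAS_CONSTANT = 1.98717e-3  # kcal / (mol * K)
TEMPERATURE = 310.15       # 37 degrees Celsius in Kelvin
RT = GAS_CONSTANT * TEMPERATURE


def opening_energy(p_unpaired: float) -> float:
    """Energy (kcal/mol) needed to make a region accessible, derived from
    the probability that the region is unpaired in the ensemble."""
    if not 0.0 < p_unpaired <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -RT * math.log(p_unpaired)


def total_interaction_energy(hybridization: float,
                             p_unpaired_query: float,
                             p_unpaired_target: float) -> float:
    """Hybridization energy plus the opening penalties of both sites."""
    return (hybridization
            + opening_energy(p_unpaired_query)
            + opening_energy(p_unpaired_target))


# A site that probing data marks as mostly unpaired incurs a small penalty;
# a site that is mostly paired incurs a large one.
accessible = total_interaction_energy(-12.0, 0.9, 0.8)
buried = total_interaction_energy(-12.0, 0.9, 0.01)
```

Under this scheme, any experimental evidence that shifts the unpaired probabilities changes the ranking of candidate interaction sites without altering the hybridization model itself.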


Subjects
RNA/genetics; Molecular Structure; Nucleic Acid Conformation; Nucleotides; Thermodynamics
7.
Bioinformatics ; 35(14): i354-i359, 2019 Jul 15.
Article in English | MEDLINE | ID: mdl-31510707

ABSTRACT

SUMMARY: SHAPE experiments are used to probe the structure of RNA molecules. We present ShaKer to predict SHAPE data for RNA using a graph-kernel-based machine learning approach that is trained on experimental SHAPE information. While other available methods require a manually curated reference structure, ShaKer predicts reactivity data based on sequence input only and by sampling the ensemble of possible structures. Thus, ShaKer is well placed to enable experiment-driven, transcriptome-wide SHAPE data prediction, to support the study of RNA structuredness, and to improve RNA structure and RNA-RNA interaction prediction. For performance evaluation, we use accuracy and accessibility compared with experimental SHAPE data and competing methods. We show that ShaKer outperforms its competitors and is able to predict high-quality SHAPE annotations even when no reference structure is provided. AVAILABILITY AND IMPLEMENTATION: ShaKer is freely available at https://github.com/BackofenLab/ShaKer.


Subjects
Algorithms; Software; Machine Learning; RNA; Transcriptome
8.
Nucleic Acids Res ; 46(W1): W25-W29, 2018 Jul 02.
Article in English | MEDLINE | ID: mdl-29788132

ABSTRACT

The Freiburg RNA tools webserver is a well-established online resource for RNA-focused research. It provides a unified user interface and comprehensive result visualization for efficient command line tools. The webserver includes RNA-RNA interaction prediction (IntaRNA, CopraRNA, metaMIR), sRNA homology search (GLASSgo), sequence-structure alignments (LocARNA, MARNA, CARNA, ExpaRNA), CRISPR repeat classification (CRISPRmap), sequence design (antaRNA, INFO-RNA, SECISDesign), structure aberration evaluation of point mutations (RaSE), RNA/protein-family model visualization (CMV), and other methods. Open education resources offer interactive visualizations of RNA structure and RNA-RNA interaction prediction as well as basic and advanced sequence alignment algorithms. The services are freely available at http://rna.informatik.uni-freiburg.de.


Subjects
Base Sequence/genetics; Internet; RNA/genetics; Software; Algorithms; Nucleic Acid Conformation; RNA/chemistry; Sequence Alignment/instrumentation; Sequence Analysis, RNA/instrumentation; Structure-Activity Relationship
9.
Bioinformatics ; 33(14): 2089-2096, 2017 Jul 15.
Article in English | MEDLINE | ID: mdl-28334186

ABSTRACT

MOTIVATION: Clustering RNA sequences with common secondary structure is an essential step towards studying RNA function. Whereas structural RNA alignment strategies typically identify common structure for orthologous structured RNAs, clustering seeks to group paralogous RNAs based on structural similarities. However, existing approaches for clustering paralogous RNAs do not take the compensatory base pair changes obtained from structure conservation in orthologous sequences into account. RESULTS: Here, we present RNAscClust, the implementation of a new algorithm to cluster a set of structured RNAs taking their respective structural conservation into account. For a set of multiple structural alignments of RNA sequences, each containing a paralog sequence included in a structural alignment of its orthologs, RNAscClust computes minimum free-energy structures for each sequence using conserved base pairs as prior information for the folding. The paralogs are then clustered using a graph kernel-based strategy, which identifies common structural features. We show that the clustering accuracy clearly benefits from an increasing degree of compensatory base pair changes in the alignments. AVAILABILITY AND IMPLEMENTATION: RNAscClust is available at http://www.bioinf.uni-freiburg.de/Software/RNAscClust. CONTACT: gorodkin@rth.dk or backofen@informatik.uni-freiburg.de. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
RNA/chemistry; Sequence Analysis, RNA/methods; Software; Algorithms; Cluster Analysis; Humans; Nucleic Acid Conformation
10.
Bioinformatics ; 31(15): 2489-96, 2015 Aug 01.
Article in English | MEDLINE | ID: mdl-25838465

ABSTRACT

MOTIVATION: RNA-Seq experiments have revealed a multitude of novel ncRNAs. The gold standard for their analysis based on simultaneous alignment and folding suffers from extreme time complexity of O(n^6). Subsequently, numerous faster 'Sankoff-style' approaches have been suggested. Commonly, the performance of such methods relies on sequence-based heuristics that restrict the search space to optimal or near-optimal sequence alignments; however, the accuracy of sequence-based methods breaks down for RNAs with sequence identities below 60%. Alignment approaches like LocARNA that do not require sequence-based heuristics have been limited to high complexity (≥ quartic time). RESULTS: Breaking this barrier, we introduce the novel Sankoff-style algorithm 'sparsified prediction and alignment of RNAs based on their structure ensembles (SPARSE)', which runs in quadratic time without sequence-based heuristics. To achieve this low complexity, on par with sequence alignment algorithms, SPARSE features strong sparsification based on structural properties of the RNA ensembles. Following PMcomp, SPARSE gains further speed-up from lightweight energy computation. Although all existing lightweight Sankoff-style methods restrict Sankoff's original model by disallowing loop deletions and insertions, SPARSE transfers the Sankoff algorithm to the lightweight energy model completely for the first time. Compared with LocARNA, SPARSE achieves similar alignment and better folding quality in significantly less time (speedup: 3.7). At similar run-time, it aligns low sequence identity instances substantially more accurately than RAF, which uses sequence-based heuristics.


Subjects
Algorithms; RNA Folding; Sequence Alignment/methods; Sequence Analysis, RNA/methods; Heuristics
11.
Methods Mol Biol ; 2726: 209-234, 2024.
Article in English | MEDLINE | ID: mdl-38780733

ABSTRACT

Computational prediction of RNA-RNA interactions (RRI) is a central methodology for the specific investigation of inter-molecular RNA interactions and regulatory effects of non-coding RNAs like eukaryotic microRNAs or prokaryotic small RNAs. Available methods can be classified according to their underlying prediction strategies, each implicating specific capabilities and restrictions often not transparent to the non-expert user. Within this work, we review seven classes of RRI prediction strategies and discuss the advantages and limitations of respective tools, since such knowledge is essential for selecting the right tool in the first place. Among the RRI prediction strategies, accessibility-based approaches have been shown to provide the most reliable predictions. Here, we describe how IntaRNA, as one of the state-of-the-art accessibility-based tools, can be applied in various use cases for the task of computational RRI prediction. Detailed hands-on examples for individual RRI predictions as well as large-scale target prediction scenarios are provided. We illustrate the flexibility and capabilities of IntaRNA through the examples. Each example is designed using real-life data from the literature and is accompanied by instructions on interpreting the respective results from IntaRNA output. Our use-case driven instructions enable non-expert users to comprehensively understand and utilize IntaRNA's features for effective RRI predictions.


Subjects
Computational Biology; Software; Computational Biology/methods; RNA/genetics; RNA/metabolism; Algorithms; Humans; MicroRNAs/genetics; MicroRNAs/metabolism
12.
NAR Genom Bioinform ; 6(3): lqae089, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39131818

ABSTRACT

RNA secondary structures play essential roles in the formation of the tertiary structure and function of a transcript. Recent genome-wide studies highlight significant potential for RNA structures in the mammalian genome. However, a major challenge is assigning functional roles to these structured RNAs. In this study, we conduct a guilt-by-association analysis of clusters of computationally predicted conserved RNA structure (CRSs) in human untranslated regions (UTRs) to associate them with gene functions. We filtered a broad pool of ∼500 000 human CRSs for UTR overlap, resulting in 4734 and 24 754 CRSs from the 5' and 3' UTR of protein-coding genes, respectively. We separately clustered these CRSs for both sets using RNAscClust, obtaining 793 and 2403 clusters, each containing an average of five CRSs per cluster. We identified overrepresented binding sites for 60 and 43 RNA-binding proteins co-localizing with the clustered CRSs. Furthermore, 104 and 441 clusters from the 5' and 3' UTRs, respectively, showed enrichment for various Gene Ontologies, including biological processes such as 'signal transduction', 'nervous system development', molecular functions like 'transferase activity' and the cellular components such as 'synapse' among others. Our study shows that significant functional insights can be gained by clustering RNA structures based on their structural characteristics.
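Testing whether a cluster is enriched for a Gene Ontology term is commonly done with a one-sided hypergeometric test. The abstract does not state which statistical test the study used, so the following is only a minimal sketch of that common approach; the function name and argument names are hypothetical.

```python
from math import comb


def enrichment_p_value(population: int, annotated: int,
                       sample: int, hits: int) -> float:
    """One-sided hypergeometric tail probability.

    population: number of genes in the background set
    annotated:  background genes carrying the GO term
    sample:     genes associated with the cluster
    hits:       cluster genes carrying the GO term
    Returns P(X >= hits) when drawing `sample` genes without replacement.
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(comb(annotated, k) * comb(population - annotated, sample - k)
               for k in range(hits, upper + 1))
    return tail / total


# Toy numbers: 10 background genes, 5 annotated, a cluster of 5 genes
# that are all annotated.
p = enrichment_p_value(population=10, annotated=5, sample=5, hits=5)
```

When many clusters and terms are tested, as in a screen over thousands of clusters, the resulting p-values would additionally need a multiple-testing correction.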

13.
bioRxiv ; 2021 Mar 25.
Article in English | MEDLINE | ID: mdl-33791701

ABSTRACT

The COVID-19 pandemic is the first global health crisis to occur in the age of big genomic data. Although data generation capacity is well established and sufficiently standardized, analytical capacity is not. To establish analytical capacity it is necessary to pull together global computational resources and deliver the best open source tools and analysis workflows within a ready to use, universally accessible resource. Such a resource should not be controlled by a single research group, institution, or country. Instead it should be maintained by a community of users and developers who ensure that the system remains operational and populated with current tools. A community is also essential for facilitating the types of discourse needed to establish best analytical practices. Bringing together public computational research infrastructure from the USA, Europe, and Australia, we developed a distributed data analysis platform that accomplishes these goals. It is immediately accessible to anyone in the world and is designed for the analysis of rapidly growing collections of deep sequencing datasets. We demonstrate its utility by detecting allelic variants in high-quality existing SARS-CoV-2 sequencing datasets and by continuous reanalysis of COG-UK data. All workflows, data, and documentation are available at https://covid19.galaxyproject.org.

14.
Mol Neurodegener ; 16(1): 80, 2021 Nov 27.
Article in English | MEDLINE | ID: mdl-34838071

ABSTRACT

BACKGROUND: MicroRNA (miRNA) expression in the brain is altered in neurodegenerative diseases. Recent studies demonstrated that selected miRNAs conventionally regulating gene expression at the post-transcriptional level can act extracellularly as signaling molecules. The identity of miRNA species serving as membrane receptor ligands involved in neuronal apoptosis in the central nervous system (CNS), as well as the miRNAs' sequence and structure required for this mode of action, remained largely unresolved. METHODS: Using a microarray-based screening approach we analyzed apoptotic cortical neurons of C57BL/6 mice and their supernatant with respect to alterations in miRNA expression/presence. HEK-Blue Toll-like receptor (TLR) 7/8 reporter cells, primary microglia and macrophages derived from human and mouse were employed to test the potential of the identified miRNAs released from apoptotic neurons to serve as signaling molecules for the RNA-sensing receptors. Biophysical and bioinformatical approaches, as well as immunoassays and sequential microscopy were used to analyze the interaction between candidate miRNA and TLR. Immunocytochemical and -histochemical analyses of murine CNS cultures and adult mice intrathecally injected with miRNAs, respectively, were performed to evaluate the impact of miRNA-induced TLR activation on neuronal survival and microglial activation. RESULTS: We identified a specific pattern of miRNAs released from apoptotic cortical neurons that activate TLR7 and/or TLR8, depending on sequence and species. Exposure of microglia and macrophages to certain miRNA classes released from apoptotic neurons resulted in the sequence-specific production of distinct cytokines/chemokines and increased phagocytic activity. Of those miRNAs, miR-100-5p and miR-298-5p, which have consistently been linked to neurodegenerative diseases, entered microglia, localized to their endosomes, and directly bound to human TLR8. The miRNA-TLR interaction required novel sequence features, but no specific structure formation of mature miRNA. As a consequence of miR-100-5p- and miR-298-5p-induced TLR activation, cortical neurons underwent cell-autonomous apoptosis. Presence of miR-100-5p and miR-298-5p in cerebrospinal fluid led to neurodegeneration and microglial accumulation in the murine cerebral cortex through TLR7 signaling. CONCLUSION: Our data demonstrate that specific miRNAs are released from apoptotic cortical neurons, serve as endogenous TLR7/8 ligands, and thereby trigger further neuronal apoptosis in the CNS. Our findings underline the recently discovered role of miRNAs as extracellular signaling molecules, particularly in the context of neurodegeneration.


Subjects
MicroRNAs; Toll-Like Receptor 7; Animals; Cerebral Cortex/metabolism; Ligands; Mice; MicroRNAs/genetics; MicroRNAs/metabolism; Neurons/metabolism; Toll-Like Receptor 7/genetics; Toll-Like Receptor 7/metabolism
15.
Algorithms Mol Biol ; 15(1): 19, 2020 Nov 13.
Article in English | MEDLINE | ID: mdl-33292340

ABSTRACT

MOTIVATION: Simultaneous alignment and folding (SA&F) of RNAs is the indispensable gold standard for inferring the structure of non-coding RNAs and their general analysis. The original algorithm, proposed by Sankoff, solves the theoretical problem exactly with a complexity of O(n^6) in the full energy model. Over the last two decades, several variants and improvements of the Sankoff algorithm have been proposed to reduce its extreme complexity by proposing simplified energy models or imposing restrictions on the predicted alignments. RESULTS: Here, we introduce a novel variant of Sankoff's algorithm that reconciles the simplifications of PMcomp, namely moving from the full energy model to a simpler base pair-based model, with the accuracy of the loop-based full energy model. Instead of estimating pseudo-energies from unconditional base pair probabilities, our model calculates energies from conditional base pair probabilities, which make it possible to accurately capture structure probabilities that obey a conditional dependency. This model gives rise to the fast and highly accurate novel algorithm Pankov (Probabilistic Sankoff-like simultaneous alignment and folding of RNAs inspired by Markov chains). CONCLUSIONS: Pankov benefits from the speed-up of excluding unreliable base-pairing without compromising the loop-based free energy model of Sankoff's algorithm. We show that Pankov outperforms its predecessors LocARNA and SPARSE in folding quality and is faster than LocARNA.

16.
Gigascience ; 9(10), 2020 Oct 17.
Article in English | MEDLINE | ID: mdl-33068114

ABSTRACT

BACKGROUND: Long-read sequencing can be applied to generate very long contigs and even completely assembled genomes at relatively low cost and with minimal sample preparation. As a result, long-read sequencing platforms are becoming more popular. In this respect, the Oxford Nanopore Technologies-based long-read sequencing "nanopore" platform is becoming a widely used tool with a broad range of applications and end-users. However, the need to explore and manipulate the complex data generated by long-read sequencing platforms necessitates accompanying specialized bioinformatics platforms and tools to process the long-read data correctly. Importantly, such tools should additionally help democratize bioinformatics analysis by enabling easy access and ease-of-use solutions for researchers. RESULTS: The Galaxy platform provides a user-friendly interface to computational command line-based tools, handles the software dependencies, and provides refined workflows. The users do not have to possess programming experience or extended computer skills. The interface enables researchers to perform powerful bioinformatics analysis, including the assembly and analysis of short- or long-read sequence data. The newly developed "NanoGalaxy" is a Galaxy-based toolkit for analysing long-read sequencing data, which is suitable for diverse applications, including de novo genome assembly from genomic, metagenomic, and plasmid sequence reads. CONCLUSIONS: A range of best-practice tools and workflows for long-read sequence genome assembly has been integrated into a NanoGalaxy platform to facilitate easy access and use of bioinformatics tools for researchers. NanoGalaxy is freely available at the European Galaxy server https://nanopore.usegalaxy.eu with supporting self-learning training material available at https://training.galaxyproject.org.


Subjects
Nanopore Sequencing; Nanopores; Data Analysis; High-Throughput Nucleotide Sequencing; Sequence Analysis, DNA; Software
17.
Gigascience ; 8(12), 2019 Dec 01.
Article in English | MEDLINE | ID: mdl-31808801

ABSTRACT

BACKGROUND: RNA plays essential roles in all known forms of life. Clustering RNA sequences with common sequence and structure is an essential step towards studying RNA function. With the advent of high-throughput sequencing techniques, experimental and genomic data are expanding to complement the predictive methods. However, the existing methods do not effectively utilize and cope with the immense amount of data becoming available. RESULTS: Hundreds of thousands of non-coding RNAs have been detected; however, their annotation is lagging behind. Here we present GraphClust2, a comprehensive approach for scalable clustering of RNAs based on sequence and structural similarities. GraphClust2 bridges the gap between high-throughput sequencing and structural RNA analysis and provides an integrative solution by incorporating diverse experimental and genomic data in an accessible manner via the Galaxy framework. GraphClust2 can efficiently cluster and annotate large datasets of RNAs and supports structure-probing data. We demonstrate that the annotation performance of clustering functional RNAs can be considerably improved. Furthermore, an off-the-shelf procedure is introduced for identifying locally conserved structure candidates in long RNAs. We suggest the presence and the sparseness of phylogenetically conserved local structures for a collection of long non-coding RNAs. CONCLUSIONS: By clustering data from 2 cross-linking immunoprecipitation experiments, we demonstrate the benefits of GraphClust2 for motif discovery in the presence of biological and methodological biases. Finally, we uncover prominent targets of double-stranded RNA binding protein Roquin-1, such as BCOR's 3' untranslated region that contains multiple binding stem-loops that are evolutionarily conserved.


Subjects
RNA, Untranslated/chemistry; RNA, Untranslated/genetics; Sequence Analysis, RNA/methods; Cluster Analysis; Computational Biology; High-Throughput Nucleotide Sequencing; Molecular Sequence Annotation; Nucleic Acid Conformation; Software
18.
Nat Commun ; 10(1): 2569, 2019 Jun 12.
Article in English | MEDLINE | ID: mdl-31189880

ABSTRACT

Synonymous mutations have been viewed as silent mutations, since they only affect the DNA and mRNA, but not the amino acid sequence of the resulting protein. Nonetheless, recent studies suggest their significant impact on splicing, RNA stability, RNA folding, translation or co-translational protein folding. Hence, we compile 659194 synonymous mutations found in human cancer and characterize their properties. We provide a user-friendly, comprehensive resource for synonymous mutations in cancer, SynMICdb (http://SynMICdb.dkfz.de), which also contains orthogonal information about gene annotation, recurrence, mutation loads, cancer association, conservation, alternative events, impact on mRNA structure and a SynMICdb score. Notably, synonymous and missense mutations are depleted at the 5'-end of the coding sequence as well as at the ends of internal exons independent of mutational signatures. For patient-derived synonymous mutations in the oncogene KRAS, we indicate that single point mutations can have a relevant impact on expression as well as on mRNA secondary structure.
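The definition of a synonymous mutation used above (the nucleotide changes, the encoded amino acid does not) can be checked mechanically. The following Python sketch assumes the standard genetic code and a coding sequence given in frame; the function name `is_synonymous` and its 0-based position convention are hypothetical and unrelated to how SynMICdb was built.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in the conventional TCAG ordering; '*' marks stops.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W"
               "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR"
               "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def is_synonymous(cds: str, position: int, alt: str) -> bool:
    """Return True if replacing the base at 0-based `position` of the
    in-frame coding sequence `cds` with `alt` keeps the amino acid."""
    offset = position % 3
    start = position - offset
    ref_codon = cds[start:start + 3]
    if len(ref_codon) != 3:
        raise ValueError("position lies in an incomplete codon")
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    return CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]


silent = is_synonymous("ATGGGT", 5, "C")    # GGT -> GGC, both glycine
missense = is_synonymous("ATGGGT", 3, "A")  # GGT -> AGT, glycine -> serine
```

A silent change at the protein level can still alter the mRNA itself, which is the kind of effect on structure and expression that the abstract reports for KRAS.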


Subjects
Databases, Nucleic Acid; Gene Expression Regulation, Neoplastic/genetics; Neoplasms/genetics; Silent Mutation/genetics; Datasets as Topic; Humans; Mutation, Missense/genetics; Point Mutation/genetics; Proto-Oncogene Proteins p21(ras)/genetics; RNA Folding/genetics; RNA Splicing/genetics; RNA, Messenger/chemistry; RNA, Messenger/genetics