Results 1 - 20 of 96
1.
Nat Methods ; 21(3): 435-443, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38238559

ABSTRACT

RNA engineering has immense potential to drive innovation in biotechnology and medicine. Despite its importance, a versatile platform for the automated design of functional RNA is still lacking. Here, we propose RNA family sequence generator (RfamGen), a deep generative model that designs RNA family sequences in a data-efficient manner by explicitly incorporating alignment and consensus secondary structure information. RfamGen can generate novel and functional RNA family sequences by sampling points from a semantically rich and continuous representation. We have experimentally demonstrated the versatility of RfamGen using diverse RNA families. Furthermore, we confirmed the high success rate of RfamGen in designing functional ribozymes through a quantitative massively parallel assay. Notably, RfamGen successfully generates artificial sequences with higher activity than natural sequences. Overall, RfamGen significantly improves our ability to design functional RNA and opens up new potential for generative RNA engineering in synthetic biology.


Subjects
Catalytic RNA , Humans , Catalytic RNA/genetics , Catalytic RNA/chemistry , RNA/genetics , Biotechnology , Synthetic Biology
2.
Brief Bioinform ; 24(4)2023 07 20.
Article in English | MEDLINE | ID: mdl-37232359

ABSTRACT

Computational analysis of RNA sequences constitutes a crucial step in the field of RNA biology. As in other domains of the life sciences, the incorporation of artificial intelligence and machine learning techniques into RNA sequence analysis has gained significant traction in recent years. Historically, thermodynamics-based methods were widely employed for the prediction of RNA secondary structures; however, machine learning-based approaches have demonstrated remarkable advancements in recent years, enabling more accurate predictions. Consequently, the precision of sequence analysis pertaining to RNA secondary structures, such as RNA-protein interactions, has also been enhanced, making a substantial contribution to the field of RNA biology. Additionally, artificial intelligence and machine learning are also introducing technical innovations in the analysis of RNA-small molecule interactions for RNA-targeted drug discovery and in the design of RNA aptamers, where RNA serves as its own ligand. This review will highlight recent trends in the prediction of RNA secondary structure, RNA aptamers and RNA drug discovery using machine learning, deep learning and related technologies, and will also discuss potential future avenues in the field of RNA informatics.


Subjects
Nucleotide Aptamers , Deep Learning , Artificial Intelligence , RNA/genetics , Machine Learning , Drug Discovery/methods , Informatics
3.
Nucleic Acids Res ; 51(15): 7820-7831, 2023 08 25.
Article in English | MEDLINE | ID: mdl-37463833

ABSTRACT

Phase-separated membraneless organelles often contain RNAs that exhibit unusual semi-extractability using the conventional RNA extraction method, and can be efficiently retrieved by needle shearing or heating during RNA extraction. Semi-extractable RNAs are promising resources for understanding RNA-centric phase separation. However, limited assessments have been performed to systematically identify and characterize semi-extractable RNAs. In this study, 1074 semi-extractable RNAs, including ASAP1, DANT2, EXT1, FTX, IGF1R, LIMS1, NEAT1, PHF21A, PVT1, SCMH1, STRG.3024.1, TBL1X, TCF7L2, TVP23C-CDRT4, UBE2E2, ZCCHC7, ZFAND3 and ZSWIM6, which exhibited consistent semi-extractability were identified across five human cell lines. By integrating publicly available datasets, we found that semi-extractable RNAs tend to be distributed in the nuclear compartments but are dissociated from the chromatin. Long and repeat-containing semi-extractable RNAs act as hubs to provide global RNA-RNA interactions. Semi-extractable RNAs were divided into four groups based on their k-mer content. The NEAT1 group preferred to interact with paraspeckle proteins, such as FUS and NONO, implying that RNAs in this group are potential candidates of architectural RNAs that constitute nuclear bodies.


Subjects
Long Noncoding RNA , RNA , Humans , Cell Line , Cell Nucleus/metabolism , Chromatin/metabolism , DNA-Binding Proteins/genetics , RNA/isolation & purification , Long Noncoding RNA/genetics , Long Noncoding RNA/metabolism
4.
Biochemistry ; 63(7): 906-912, 2024 Apr 02.
Article in English | MEDLINE | ID: mdl-38457656

ABSTRACT

Optimization of aptamers in length and chemistry is crucial for industrial applications. Here, we developed aptamers against the SARS-CoV-2 spike protein and achieved optimization with a deep-learning-based algorithm, RaptGen. We conducted a primer-less SELEX against the receptor binding domain (RBD) of the spike with an RNA/DNA hybrid library, and the resulting sequences were subjected to RaptGen analysis. Based on the sequence profiling by RaptGen, a short truncated aptamer of 26 nucleotides was obtained and further optimized by a chemical modification of relevant nucleotides. The resulting aptamer bound to the RBD not only of wild-type SARS-CoV-2 but also of its variants, SARS-CoV-1, and Middle East respiratory syndrome coronavirus (MERS-CoV). We concluded that RaptGen-assisted discovery is efficient for developing optimized aptamers.


Subjects
Nucleotide Aptamers , SARS-CoV-2 , Humans , COVID-19/prevention & control , DNA , SARS-CoV-2/chemistry , SARS-CoV-2/genetics , Coronavirus Spike Glycoprotein/genetics , Coronavirus Spike Glycoprotein/chemistry
5.
Nucleic Acids Res ; 50(19): 11229-11242, 2022 10 28.
Article in English | MEDLINE | ID: mdl-36259651

ABSTRACT

Non-coding RNAs (ncRNAs) ubiquitously exist in normal and cancer cells. Despite their prevalent distribution, the functions of most long ncRNAs remain uncharacterized. The fission yeast Schizosaccharomyces pombe expresses >1800 ncRNAs annotated to date, but most unconventional ncRNAs (excluding tRNA, rRNA, snRNA and snoRNA) remain uncharacterized. To discover functional ncRNAs, here we performed a combined screening of computational and biological tests. First, all S. pombe ncRNAs were screened in silico for those showing conservation in sequence as well as in secondary structure with ncRNAs in closely related species. Almost half of the 151 selected conserved ncRNA genes were uncharacterized. Twelve ncRNA genes that did not overlap with protein-coding sequences were next chosen for biological screening that examined defects in growth or sexual differentiation, as well as sensitivities to drugs and stresses. Finally, we highlighted an ncRNA transcribed from SPNCRNA.1669, which inhibited untimely initiation of sexual differentiation. A domain that was predicted as a conserved secondary structure by the computational analysis was essential for the ncRNA to function. Thus, this study demonstrates that in silico selection focusing on conservation of the secondary structure across species is a powerful method to pinpoint novel functional ncRNAs.


Subjects
Schizosaccharomyces , Schizosaccharomyces/genetics , Sex Differentiation , Untranslated RNA/genetics , Untranslated RNA/chemistry , Small Nucleolar RNA/genetics , Open Reading Frames
6.
Bioinformatics ; 37(Suppl_1): i16-i24, 2021 07 12.
Article in English | MEDLINE | ID: mdl-34252954

ABSTRACT

MOTIVATION: Accumulating evidence has highlighted the importance of microbial interaction networks. Methods have been developed for estimating microbial interaction networks, of which the generalized Lotka-Volterra equation (gLVE)-based method can estimate a directed interaction network. The previous gLVE-based method for estimating microbial interaction networks did not consider time-varying interactions. RESULTS: In this study, we developed the unsupervised learning-based microbial interaction inference method using Bayesian estimation (Umibato), a method for estimating time-varying microbial interactions. The Umibato algorithm comprises Gaussian process regression (GPR) and a new Bayesian probabilistic model, the continuous-time regression hidden Markov model (CTRHMM). Growth rates are estimated by GPR, and interaction networks are estimated by CTRHMM. CTRHMM can estimate time-varying interaction networks using interaction states, which are defined as hidden variables. Umibato outperformed the existing methods on synthetic datasets. In addition, it yielded reasonable estimations in experiments on a mouse gut microbiota dataset, thus providing novel insights into the relationship between consumed diets and the gut microbiota. AVAILABILITY AND IMPLEMENTATION: The C++ and Python source code of the Umibato software is available at https://github.com/shion-h/Umibato. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
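
For readers unfamiliar with the gLVE named in the abstract above, the textbook form is dx_i/dt = x_i (r_i + sum_j a_ij x_j), where x_i is the abundance of taxon i, r_i its intrinsic growth rate, and a_ij the effect of taxon j on taxon i. The sketch below integrates that equation with a plain forward-Euler step. It is an illustration of the generic, time-invariant gLVE only, not the Umibato algorithm (no GPR, no CTRHMM); the function name `simulate_glv` and all parameter values are assumptions made for this example.

```python
def simulate_glv(x0, growth, interactions, dt=0.01, steps=1000):
    """Forward-Euler integration of dx_i/dt = x_i * (r_i + sum_j a_ij * x_j).

    x0           -- initial abundances (hypothetical values)
    growth       -- intrinsic growth rates r_i
    interactions -- matrix a_ij as a list of rows (effect of taxon j on taxon i)
    Returns the trajectory as a list of abundance vectors.
    """
    x = [float(v) for v in x0]
    n = len(x)
    trajectory = [list(x)]
    for _ in range(steps):
        # Per-capita growth rate of each taxon at the current abundances.
        rates = [
            growth[i] + sum(interactions[i][j] * x[j] for j in range(n))
            for i in range(n)
        ]
        # Euler update; abundances are clipped at zero to stay non-negative.
        x = [max(x[i] + dt * x[i] * rates[i], 0.0) for i in range(n)]
        trajectory.append(list(x))
    return trajectory


if __name__ == "__main__":
    # Two taxa: taxon 0 is inhibited by taxon 1, taxon 1 is unaffected by taxon 0.
    traj = simulate_glv(
        x0=[0.1, 0.1],
        growth=[1.0, 0.5],
        interactions=[[-1.0, -0.5], [0.0, -1.0]],
        steps=3000,
    )
    print(traj[-1])
```

A time-varying model of the kind the abstract describes would let the interaction matrix change between hidden states over time; here the matrix is fixed for simplicity.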


Subjects
Algorithms , Software , Animals , Bayes Theorem , Mice , Microbial Interactions , Normal Distribution
7.
Bioinformatics ; 37(5): 589-595, 2021 05 05.
Article in English | MEDLINE | ID: mdl-32976553

ABSTRACT

MOTIVATION: Recent high-throughput long-read sequencers, such as PacBio and Oxford Nanopore sequencers, produce longer reads with more errors than short-read sequencers. In addition to the high error rates of reads, non-uniformity of errors leads to difficulties in various downstream analyses using long reads. Many useful simulators, which characterize long-read error patterns and simulate them, have been developed. However, there is still room for improvement in the simulation of the non-uniformity of errors. RESULTS: To capture characteristics of errors in reads for long-read sequencers, here, we introduce a generative model for quality scores, in which a hidden Markov model with a recent model selection method, called the factorized information criterion, is utilized. We evaluated our simulator from various perspectives, and the results indicate that it successfully simulates reads that are consistent with real reads. AVAILABILITY AND IMPLEMENTATION: The source code of PBSIM2 is freely available from https://github.com/yukiteruono/pbsim2. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
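
The idea of generating quality scores from a hidden Markov model can be sketched in a few lines: a hidden state (for example, a "good" or "noisy" stretch of a read) evolves along the read, and each state emits quality values from its own distribution, which yields non-uniform error patterns. The code below is a minimal, generic HMM sampler and is not taken from PBSIM2; the function name `sample_quality_scores`, the two-state layout, and every probability are assumptions for illustration, and the model selection step via the factorized information criterion is not shown.

```python
import random


def sample_quality_scores(length, start, transition, emission, seed=0):
    """Sample a sequence of quality scores from a discrete HMM.

    start      -- start[s] is the probability of beginning in state s
    transition -- transition[s][t] is the probability of moving from s to t
    emission   -- emission[s] maps a quality value to its probability in state s
    All parameters here are hypothetical, not fitted to real reads.
    """
    rng = random.Random(seed)
    states = list(range(len(start)))
    state = rng.choices(states, weights=start)[0]
    scores = []
    for _ in range(length):
        qualities = list(emission[state].keys())
        weights = list(emission[state].values())
        # Emit one quality value from the current hidden state.
        scores.append(rng.choices(qualities, weights=weights)[0])
        # Move to the next hidden state.
        state = rng.choices(states, weights=transition[state])[0]
    return scores


if __name__ == "__main__":
    start = [0.5, 0.5]
    transition = [[0.9, 0.1], [0.2, 0.8]]           # states tend to persist
    emission = [{10: 0.7, 12: 0.3}, {3: 0.6, 5: 0.4}]  # state 1 is "noisy"
    print(sample_quality_scores(30, start, transition, emission))
```

Because states tend to persist, low-quality values cluster in runs instead of being spread evenly along the read, which is the kind of non-uniformity the abstract refers to.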


Subjects
High-Throughput Nucleotide Sequencing , Nanopores , Computer Simulation , DNA Sequence Analysis , Software
8.
Pediatr Res ; 92(2): 378-387, 2022 08.
Article in English | MEDLINE | ID: mdl-35292727

ABSTRACT

BACKGROUND: Kawasaki disease (KD) is a systemic vasculitis that is currently the most common cause of acquired heart disease in children. However, its etiology remains unknown. Long non-coding RNAs (lncRNAs) contribute to the pathophysiology of various diseases. Few studies have reported the role of lncRNAs in KD inflammation; thus, we investigated the role of lncRNA in KD inflammation. METHODS: A total of 50 patients with KD (median age, 19 months; 29 males and 21 females) were enrolled. We conducted cap analysis gene expression sequencing to determine differentially expressed genes in monocytes of the peripheral blood of the subjects. RESULTS: About 21 candidate lncRNA transcripts were identified. The analyses of transcriptome and gene ontology revealed that the immune system was involved in KD. Among these genes, G0/G1 switch gene 2 (G0S2) and its antisense lncRNA, HSD11B1-AS1, were upregulated during the acute phase of KD (P < 0.0001 and <0.0001, respectively). Moreover, G0S2 increased when lipopolysaccharides induced inflammation in THP-1 monocytes, and silencing of G0S2 suppressed the expression of HSD11B1-AS1 and tumor necrosis factor-α. CONCLUSIONS: This study uncovered the crucial role of lncRNAs in innate immunity in acute KD. LncRNA may be a novel target for the diagnosis of KD. IMPACT: This study revealed the whole aspect of the gene expression profile of monocytes of patients with Kawasaki disease (KD) using cap analysis gene expression sequencing and identified KD-specific molecules: G0/G1 switch gene 2 (G0S2) and long non-coding RNA (lncRNA) HSD11B1-AS1. We demonstrated that G0S2 and its antisense HSD11B1-AS1 were associated with inflammation of innate immunity in KD. lncRNA may be a novel key target for the diagnosis of patients with KD.


Subjects
Mucocutaneous Lymph Node Syndrome , Long Noncoding RNA , 11-beta-Hydroxysteroid Dehydrogenase Type 1 , Cell Cycle Proteins , Child , Female , Humans , Innate Immunity , Infant , Inflammation , Male , Mucocutaneous Lymph Node Syndrome/genetics , Long Noncoding RNA/genetics , Long Noncoding RNA/metabolism , Tumor Necrosis Factor-alpha
9.
Nucleic Acids Res ; 48(14): e82, 2020 08 20.
Article in English | MEDLINE | ID: mdl-32537639

ABSTRACT

Aptamers are short single-stranded RNA/DNA molecules that bind to specific target molecules. Aptamers with high binding affinity and target specificity are identified using an in vitro procedure called high-throughput systematic evolution of ligands by exponential enrichment (HT-SELEX). However, the development of aptamer affinity reagents takes a considerable amount of time and is costly because HT-SELEX produces a large dataset of candidate sequences, some of which have insufficient binding affinity. Here, we present RNA aptamer Ranker (RaptRanker), a novel in silico method for identifying high binding-affinity aptamers from HT-SELEX data by scoring and ranking. RaptRanker analyzes HT-SELEX data by evaluating the nucleotide sequence and secondary structure simultaneously, and by ranking according to scores reflecting local structure and sequence frequencies. To evaluate the performance of RaptRanker, we performed two new HT-SELEX experiments, and evaluated the binding affinities of a subset of sequences that included aptamers with low binding affinity. In both datasets, the performance of RaptRanker was superior to Frequency, Enrichment and MPBind. We also confirmed that the consideration of secondary structures is effective in HT-SELEX data analysis, and that RaptRanker successfully predicted the essential subsequence motifs in each identified sequence.
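
The abstract above compares RaptRanker against two simple baselines, Frequency and Enrichment. As they are commonly defined for HT-SELEX data, frequency is the share of reads in a round that carry a given sequence, and enrichment is the ratio of a sequence's frequency in a later round to its frequency in an earlier round. The sketch below follows those common definitions, which may differ in detail from the ones used in the paper; it does not implement RaptRanker's structure-aware scoring. The function names and the toy reads are hypothetical.

```python
from collections import Counter


def frequencies(reads):
    """Return {sequence: fraction of reads in this round carrying it}."""
    counts = Counter(reads)
    total = len(reads)
    return {seq: n / total for seq, n in counts.items()}


def enrichment(earlier_reads, later_reads):
    """Return {sequence: later frequency / earlier frequency}.

    Only sequences observed in both rounds are scored, which avoids
    division by zero; a pseudocount would be an alternative choice.
    """
    early = frequencies(earlier_reads)
    late = frequencies(later_reads)
    return {seq: late[seq] / early[seq] for seq in late if seq in early}


def rank(scores):
    """Sort sequences by descending score (ties broken alphabetically)."""
    return sorted(scores, key=lambda seq: (-scores[seq], seq))


if __name__ == "__main__":
    round3 = ["AAA", "AAA", "CCC", "GGG"]   # toy reads, not real data
    round4 = ["AAA", "CCC", "CCC", "CCC"]
    print(rank(frequencies(round4)))
    print(rank(enrichment(round3, round4)))
```

Both baselines look at whole-sequence counts only, so they cannot reward a shared structural motif spread across many distinct sequences, which is the gap the abstract says secondary-structure information helps close.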


Subjects
Nucleotide Aptamers/chemistry , SELEX Aptamer Technique/methods , Nucleotide Aptamers/isolation & purification , Nucleotide Aptamers/metabolism , Base Sequence , Computer Simulation , High-Throughput Nucleotide Sequencing , Nucleic Acid Conformation , Nucleotide Motifs , ROC Curve
10.
BMC Bioinformatics ; 22(1): 554, 2021 Nov 15.
Article in English | MEDLINE | ID: mdl-34781902

ABSTRACT

BACKGROUND: Protein-RNA interactions play key roles in many processes regulating gene expression. To understand the underlying binding preference, ultraviolet cross-linking and immunoprecipitation (CLIP)-based methods have been used to identify the binding sites for hundreds of RNA-binding proteins (RBPs) in vivo. Using these large-scale experimental data to infer RNA binding preference and predict missing binding sites has become a great challenge. Some existing deep-learning models have demonstrated high prediction accuracy for individual RBPs. However, it remains difficult to avoid significant bias due to the experimental protocol. The DeepRiPe method was recently developed to solve this problem by introducing multi-task or multi-label learning into this field. However, this method has not reached an ideal level of prediction power due to its weak neural network architecture. RESULTS: Compared to the DeepRiPe approach, our Multi-resBind method demonstrated substantial improvements on the same large-scale PAR-CLIP dataset in both the area under the receiver operating characteristic curve and average precision. We conducted extensive experiments to evaluate the impact of various types of input data on the final prediction accuracy. The same approach was used to evaluate the effect of loss functions. Finally, a modified integrated gradient was employed to generate attribution maps. The patterns disentangled from relative contributions according to context offer biological insights into the underlying mechanism of protein-RNA interactions. CONCLUSIONS: Here, we propose Multi-resBind as a new multi-label deep-learning approach to infer protein-RNA binding preferences and predict novel interactions. The results clearly demonstrate that Multi-resBind is a promising tool to predict unknown binding sites in vivo and gain biological insights into why the neural network makes a given prediction.


Subjects
RNA-Binding Proteins , RNA , Binding Sites , Computer Neural Networks , Protein Binding , RNA/genetics , RNA/metabolism , RNA-Binding Proteins/genetics , RNA-Binding Proteins/metabolism
11.
BMC Genomics ; 22(1): 730, 2021 Oct 08.
Article in English | MEDLINE | ID: mdl-34625021

ABSTRACT

BACKGROUND: Differential expression (DE) analysis of RNA-seq data typically depends on gene annotations. Different sets of gene annotations are available for the human genome and are continually updated, a process complicated by the development and application of high-throughput sequencing technologies. However, the impact of the complexity of gene annotations on DE analysis remains unclear. RESULTS: Using "mappability", a metric of the complexity of gene annotation, we compared three distinct human gene annotations, GENCODE, RefSeq, and NONCODE, and evaluated how mappability affected DE analysis. We found that mappability was significantly different among the human gene annotations. We also found that increasing mappability improved the performance of DE analysis, and that the impact of mappability was mainly evident in the quantification step and propagated systematically downstream in DE analysis. CONCLUSIONS: We assessed how the complexity of gene annotations affects DE analysis using mappability. Our findings indicate that the growth and complexity of gene annotations negatively impact the performance of DE analysis, suggesting that an approach that excludes unnecessary gene models from gene annotations improves the performance of DE analysis.


Subjects
Human Genome , High-Throughput Nucleotide Sequencing , Humans , Molecular Sequence Annotation , RNA-Seq , Exome Sequencing
12.
Int J Mol Sci ; 22(8)2021 Apr 13.
Article in English | MEDLINE | ID: mdl-33924522

ABSTRACT

(1) Background: Acquired resistance to epidermal growth factor receptor-tyrosine kinase inhibitors (EGFR-TKIs) is an intractable problem for many clinical oncologists. The mechanisms of resistance to EGFR-TKIs are complex. Long non-coding RNAs (lncRNAs) may play an important role in cancer development and metastasis. However, the biological process between lncRNAs and drug resistance to EGFR-mutated lung cancer remains largely unknown. (2) Methods: Osimertinib- and afatinib-resistant EGFR-mutated lung cancer cells were established using a stepwise method. A microarray analysis of non-coding and coding RNAs was performed using parental and resistant EGFR-mutant non-small cell lung cancer (NSCLC) cells and evaluated by bioinformatics analysis through medical-industrial collaboration. (3) Results: Colorectal neoplasia differentially expressed (CRNDE) and DiGeorge syndrome critical region gene 5 (DGCR5) lncRNAs were highly expressed in EGFR-TKI-resistant cells by microarray analysis. RNA-protein binding analysis revealed eukaryotic translation initiation factor 4A3 (eIF4A3) bound in an overlapping manner to CRNDE and DGCR5. The CRNDE downregulates the expression of eIF4A3, mucin 1 (MUC1), and phospho-EGFR. Inhibition of CRNDE activated the eIF4A3/MUC1/EGFR signaling pathway and apoptotic activity, and restored sensitivity to EGFR-TKIs. (4) Conclusions: The results showed that CRNDE is associated with the development of resistance to EGFR-TKIs. CRNDE may be a novel therapeutic target to conquer EGFR-mutant NSCLC.


Subjects
DEAD-box RNA Helicases/metabolism , Neoplasm Drug Resistance/genetics , ErbB Receptors/genetics , Eukaryotic Initiation Factor-4A/metabolism , Lung Neoplasms/genetics , Mucin-1/metabolism , Mutation/genetics , Protein Kinase Inhibitors/pharmacology , Long Noncoding RNA/metabolism , Acrylamides/pharmacology , Acrylamides/therapeutic use , Adenocarcinoma of Lung/drug therapy , Adenocarcinoma of Lung/genetics , Adenocarcinoma of Lung/pathology , Aniline Compounds/pharmacology , Aniline Compounds/therapeutic use , Apoptosis/drug effects , Apoptosis/genetics , Non-Small-Cell Lung Carcinoma/drug therapy , Non-Small-Cell Lung Carcinoma/genetics , Non-Small-Cell Lung Carcinoma/pathology , Tumor Cell Line , Colorectal Neoplasms/genetics , Neoplasm Drug Resistance/drug effects , ErbB Receptors/antagonists & inhibitors , Gene Expression Profiling , Neoplastic Gene Expression Regulation/drug effects , Humans , Inhibitory Concentration 50 , Lung Neoplasms/drug therapy , Lung Neoplasms/pathology , Biological Models , Protein Kinase Inhibitors/therapeutic use , Long Noncoding RNA/genetics , Signal Transduction/drug effects
13.
BMC Bioinformatics ; 21(1): 103, 2020 Mar 14.
Article in English | MEDLINE | ID: mdl-32171255

ABSTRACT

BACKGROUND: Methylated RNA immunoprecipitation sequencing (MeRIP-Seq) is a popular sequencing method for studying RNA modifications and, in particular, N6-methyladenosine (m6A), the most abundant RNA methylation modification found in various species. The detection of enriched regions is a main challenge of MeRIP-Seq analysis; however, current tools either require a long time or do not fully utilize features of RNA sequencing such as strand information, which could cause ambiguous calling. In addition, with more attention on treatment experiments using MeRIP-Seq, biologists need an intuitive evaluation of the treatment effect from comparisons. Therefore, efficient and user-friendly software that can solve these tasks must be developed. RESULTS: We developed software named "model-based analysis and inference of MeRIP-Seq (MoAIMS)" to detect enriched regions of MeRIP-Seq and infer signal proportion based on a mixture negative-binomial model. MoAIMS is designed for transcriptome immunoprecipitation sequencing experiments; therefore, it is compatible with different RNA sequencing protocols. MoAIMS offers excellent processing speed and competitive performance when compared with other tools. When MoAIMS is applied to studies of m6A, the detected enriched regions contain known biological features of m6A. Furthermore, the signal proportion inferred by MoAIMS for m6A treatment datasets (perturbation of m6A methyltransferases) showed a decreasing trend that is consistent with experimental observations, suggesting that the signal proportion can be used as an intuitive indicator of treatment effect. CONCLUSIONS: MoAIMS is efficient and easy-to-use software implemented in R. MoAIMS can not only detect enriched regions of MeRIP-Seq efficiently but also provide an intuitive evaluation of the treatment effect for MeRIP-Seq treatment datasets.


Subjects
Immunoprecipitation/methods , RNA/metabolism , RNA Sequence Analysis/methods , Software , Adenosine/analogs & derivatives , Adenosine/metabolism , Gene Expression Profiling , Humans , Methylation , RNA/chemistry
14.
Bioinformatics ; 35(22): 4543-4552, 2019 11 01.
Article in English | MEDLINE | ID: mdl-30993319

ABSTRACT

MOTIVATION: A cancer genome includes many mutations derived from various mutagens and mutational processes, leading to specific mutation patterns. It is known that each mutational process leads to characteristic mutations, and the characteristic pattern of mutations preferred by a mutational process is called a 'mutation signature.' Identification of mutation signatures is an important task for elucidation of carcinogenic mechanisms. In previous studies, analyses with statistical approaches (e.g. non-negative matrix factorization and latent Dirichlet allocation) revealed a number of mutation signatures. Nonetheless, strictly speaking, these existing approaches employ an ad hoc method or incorrect approximation to estimate the number of mutation signatures, and the whole picture of mutation signatures is unclear. RESULTS: In this study, we present a novel method for estimating the number of mutation signatures, latent Dirichlet allocation with variational Bayes inference (VB-LDA), in which variational lower bounds are utilized for finding a plausible number of mutation patterns. In addition, we performed cluster analyses for estimated mutation signatures to extract novel mutation signatures that appear in multiple primary lesions. In a simulation with artificial data, we confirmed that our method estimated the correct number of mutation signatures. Furthermore, applying our method in combination with clustering procedures to real mutation data revealed many interesting mutation signatures that have not been previously reported. AVAILABILITY AND IMPLEMENTATION: All the predicted mutation signatures with clustering results are freely available at http://www.f.waseda.jp/mhamada/MS/index.html. All the C++ source code and Python scripts utilized in this study can be downloaded from https://github.com/qkirikigaku/MS_LDA. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Mutation , Software , Bayes Theorem , Cluster Analysis
15.
Biochem Biophys Res Commun ; 512(4): 641-646, 2019 05 14.
Article in English | MEDLINE | ID: mdl-30497775

ABSTRACT

Chemical safety screening requires the development of more efficient assays that do not involve testing in animals. In vitro cell-based assays are among the most appropriate alternatives to animal testing for screening of chemical toxicity. Most studies performed to date made use of mRNAs as biomarkers. Recent studies have, however, indicated the presence of many unannotated non-coding RNAs (ncRNAs) in the transcriptome that do not appear to encode proteins. In the present study, we performed whole-transcriptome sequencing analysis (RNA-Seq) to identify novel RNA biomarkers, including ncRNAs, that showed marked responses to the toxicity of nine chemicals. Chemical safety screening was performed in cell-based assays using mouse embryonic stem cell (mESC)-derived neural cells. Marked responses in the expression of some ncRNAs to the chemical compounds were observed. The results of the present study suggest that ncRNAs may be useful in chemical safety screening as novel RNA biomarkers.


Subjects
High-Throughput Nucleotide Sequencing/methods , Neurons/drug effects , RNA/genetics , Toxicity Tests/methods , Transcriptome/drug effects , Animal Testing Alternatives/methods , Animals , Cultured Cells , Chemical Safety , Gene Expression Profiling/methods , Mice , Inbred C57BL Mice , Mouse Embryonic Stem Cells/cytology , Mouse Embryonic Stem Cells/metabolism , Neurons/cytology , Neurons/metabolism , Phenol/toxicity , Untranslated RNA/genetics
16.
Bioinformatics ; 34(4): 576-584, 2018 02 15.
Article in English | MEDLINE | ID: mdl-29040374

ABSTRACT

Motivation: Pair Hidden Markov Models (PHMMs) are probabilistic models used for pairwise sequence alignment, a quintessential problem in bioinformatics. PHMMs include three types of hidden states: match, insertion and deletion. Most previous studies have used one or two hidden states for each PHMM state type. However, few studies have examined the number of states suitable for representing sequence data or improving alignment accuracy. Results: We developed a novel method to select superior models (including the number of hidden states) for PHMM. Our method selects models with the highest posterior probability using Factorized Information Criterion, which is widely utilized in model selection for probabilistic models with hidden variables. Our simulations indicated that this method has excellent model selection capabilities with slightly improved alignment accuracy. We applied our method to DNA datasets from 5 and 28 species, ultimately selecting more complex models than those used in previous studies. Availability and implementation: The software is available at https://github.com/bigsea-t/fab-phmm. Contact: mhamada@waseda.jp. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Computational Biology/methods , Sequence Alignment/methods , Software , Algorithms , Animals , Bayes Theorem , Humans , Statistical Models , DNA Sequence Analysis/methods , Protein Sequence Analysis/methods , RNA Sequence Analysis/methods
17.
BMC Bioinformatics ; 19(Suppl 19): 524, 2018 Dec 31.
Article in English | MEDLINE | ID: mdl-30598068

ABSTRACT

BACKGROUND: N6-methyladenosine (m6A) is a common and abundant RNA methylation modification found in various species. As a type of post-transcriptional methylation, m6A plays an important role in diverse RNA activities such as alternative splicing, interplay with microRNAs and translation efficiency. Although existing tools can predict m6A at single-base resolution, it is still challenging to extract the biological information surrounding m6A sites. RESULTS: We implemented a deep learning framework, named DeepM6ASeq, to predict m6A-containing sequences and characterize surrounding biological features based on miCLIP-Seq data, which detects m6A sites at single-base resolution. DeepM6ASeq showed better performance than other machine learning classifiers. Moreover, an independent test on m6A-Seq data, which identifies m6A-containing genomic regions, revealed that our model is competitive in predicting m6A-containing sequences. The learned motifs from DeepM6ASeq correspond to known m6A readers. Notably, DeepM6ASeq also identifies a newly recognized m6A reader: FMR1. In addition, we found that a saliency map in the deep learning model could be utilized to visualize locations of m6A sites. CONCLUSION: We developed a deep-learning-based framework to predict and characterize m6A-containing sequences and hope to help investigators gain more insights into m6A research. The source code is available at https://github.com/rreybeyb/DeepM6ASeq.


Subjects
Adenosine/analogs & derivatives , Alternative Splicing , Computational Biology/methods , Deep Learning , RNA/analysis , RNA Sequence Analysis/methods , Adenosine/chemistry , Adenosine/genetics , Animals , Brain/metabolism , Hepatocellular Carcinoma/genetics , Hepatocellular Carcinoma/pathology , Embryonic Stem Cells/cytology , Embryonic Stem Cells/metabolism , Humans , Liver Neoplasms/genetics , Liver Neoplasms/pathology , Methylation , Mice , RNA/genetics , Zebrafish
18.
BMC Genomics ; 19(Suppl 10): 906, 2018 Dec 31.
Article in English | MEDLINE | ID: mdl-30598103

ABSTRACT

BACKGROUND: With the increasing number of annotated long noncoding RNAs (lncRNAs) from the genome, researchers are continually updating their understanding of lncRNAs. Recently, thousands of lncRNAs have been reported to be associated with ribosomes in mammals. However, their biological functions or mechanisms are still unclear. RESULTS: In this study, we tried to investigate the sequence features involved in the ribosomal association of lncRNA. We have extracted ninety-nine sequence features corresponding to different biological mechanisms (i.e., RNA splicing, putative ORF, k-mer frequency, RNA modification, RNA secondary structure, and repeat element). An [Formula: see text]-regularized logistic regression model was applied to screen these features. Finally, we obtained fifteen and nine important features for the ribosomal association of human and mouse lncRNAs, respectively. CONCLUSION: To our knowledge, this is the first study to characterize ribosome-associated lncRNAs and ribosome-free lncRNAs from the perspective of sequence features. These sequence features that were identified in this study may shed light on the biological mechanism of the ribosomal association and provide important clues for functional analysis of lncRNAs.


Subjects
RNA, Long Noncoding/genetics , Ribosomes/genetics , Sequence Analysis, RNA , Open Reading Frames/genetics , Polymerization , RNA, Long Noncoding/chemistry , Repetitive Sequences, Nucleic Acid/genetics
19.
BMC Genomics ; 19(1): 414, 2018 May 29.
Article in English | MEDLINE | ID: mdl-29843593

ABSTRACT

BACKGROUND: Although the number of discovered long non-coding RNAs (lncRNAs) has increased dramatically, their biological roles have not been established. Many recent studies have used ribosome profiling data to assess the protein-coding capacity of lncRNAs. However, very little work has been done to identify ribosome-associated lncRNAs, here defined as lncRNAs interacting with ribosomes related to protein synthesis as well as other unclear biological functions. RESULTS: On average, 39.17% of expressed lncRNAs were observed to interact with ribosomes in human and 48.16% in mouse. We developed the ribosomal association index (RAI), which quantifies the evidence for ribosomal associability of lncRNAs over various tissues and cell types, to catalog 691 and 409 lncRNAs that are robustly associated with ribosomes in human and mouse, respectively. Moreover, we identified 78 and 42 lncRNAs with a high probability of coding peptides in human and mouse, respectively. Compared with ribosome-free lncRNAs, ribosome-associated lncRNAs were observed to be more likely to be located in the cytoplasm and more sensitive to nonsense-mediated decay. CONCLUSION: Our results suggest that RAI can be used as an integrative and evidence-based tool for distinguishing between ribosome-associated and free lncRNAs, providing a valuable resource for the study of lncRNA functions.


Subjects
RNA, Long Noncoding/genetics , Ribosomes/genetics , Sequence Analysis, RNA , Gene Expression Profiling , HeLa Cells , Humans
20.
Bioinformatics ; 33(17): 2666-2674, 2017 Sep 01.
Article in English | MEDLINE | ID: mdl-28459942

ABSTRACT

MOTIVATION: LncRNAs play important roles in various biological processes. Although more than 58 000 human lncRNA genes have been discovered, most known lncRNAs are still poorly characterized. One approach to understanding the functions of lncRNAs is to detect the interacting RNA targets of each lncRNA. Because comprehensive experimental detection of lncRNA-RNA interactions is difficult, computational prediction of lncRNA-RNA interactions is an indispensable technique. However, the high computational costs of existing RNA-RNA interaction prediction tools prevent their application to large-scale lncRNA datasets. RESULTS: Here, we present 'RIblast', an ultrafast RNA-RNA interaction prediction method based on the seed-and-extension approach. RIblast discovers seed regions using suffix arrays and subsequently extends seed regions based on an RNA secondary structure energy model. Computational experiments indicate that RIblast achieves a level of prediction accuracy similar to that of existing programs, while running over 64 times faster. AVAILABILITY AND IMPLEMENTATION: The source code of RIblast is freely available at https://github.com/fukunagatsu/RIblast . CONTACT: t.fukunaga@kurenai.waseda.jp or mhamada@waseda.jp. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
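
The seed-search idea in this abstract can be illustrated with a minimal Python sketch. It is not RIblast's implementation: it assumes exact Watson-Crick complementarity only (no G-U wobble pairs), uses a naive suffix array construction suited to short sequences, and omits the energy-based extension step entirely. The function names (`build_suffix_array`, `find_occurrences`, `find_seeds`) and the seed length `k` are hypothetical and chosen for illustration.

```python
# Illustrative sketch only; names and simplifications are assumptions,
# not taken from the RIblast source code.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the Watson-Crick reverse complement of an RNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def build_suffix_array(text):
    """Naive suffix array: suffix start positions in lexicographic order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Binary-search the suffix array for every start position of pattern."""
    k = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # lower bound: first suffix whose k-prefix >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + k] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    hi = len(sa)
    while lo < hi:  # upper bound: first suffix whose k-prefix > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + k] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


def find_seeds(query, target, k):
    """Return (query_pos, target_pos) pairs where a query k-mer can pair
    perfectly, in antiparallel orientation, with a target k-mer."""
    sa = build_suffix_array(target)
    seeds = []
    for i in range(len(query) - k + 1):
        partner = reverse_complement(query[i:i + k])
        for j in find_occurrences(target, sa, partner):
            seeds.append((i, j))
    return seeds


if __name__ == "__main__":
    print(find_seeds("GGGAAA", "CCUUUCCC", 4))
```

In a full seed-and-extension method, each seed found this way would then be extended in both directions and scored; according to the abstract, RIblast does this with an RNA secondary structure energy model, which this sketch does not attempt to reproduce.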


Subjects
Computational Biology/methods , Molecular Sequence Annotation/methods , RNA, Long Noncoding/metabolism , Software , Humans , RNA, Long Noncoding/genetics , RNA, Messenger/metabolism , Sequence Analysis, RNA/methods