Search | BVS - Ministry of Health

1.

DeepRaccess: high-speed RNA accessibility prediction using deep learning.

Hara, Kaisei; Iwano, Natsuki; Fukunaga, Tsukasa; Hamada, Michiaki.

Front Bioinform ; 3: 1275787, 2023.

Article in English | MEDLINE | ID: mdl-37881622

ABSTRACT

RNA accessibility is a useful RNA secondary structural feature for predicting RNA-RNA interactions and translation efficiency in prokaryotes. However, conventional accessibility calculation tools, such as Raccess, are computationally expensive and require considerable computational time to perform transcriptome-scale analysis. In this study, we developed DeepRaccess, which predicts RNA accessibility based on deep learning methods. DeepRaccess was trained to take artificial RNA sequences as input and to predict the accessibility of these sequences as calculated by Raccess. Simulation and empirical dataset analyses showed that the accessibility predicted by DeepRaccess was highly correlated with the accessibility calculated by Raccess. In addition, we confirmed that DeepRaccess could predict protein abundance in E. coli with moderate accuracy from the sequences around the start codon. We also demonstrated that DeepRaccess achieved tens to hundreds of times software speed-up in a GPU environment. The source codes and the trained models of DeepRaccess are freely available at https://github.com/hmdlab/DeepRaccess.

2.

Neat1 lncRNA organizes the inflammatory gene expressions in the dorsal root ganglion in neuropathic pain caused by nerve injury.

Maruyama, Motoyo; Sakai, Atsushi; Fukunaga, Tsukasa; Miyagawa, Yoshitaka; Okada, Takashi; Hamada, Michiaki; Suzuki, Hidenori.

Front Immunol ; 14: 1185322, 2023.

Article in English | MEDLINE | ID: mdl-37614230

ABSTRACT

Primary sensory neurons regulate inflammatory processes in innervated regions through neuro-immune communication. However, how their immune-modulating functions are regulated in concert remains largely unknown. Here, we show that Neat1 long non-coding RNA (lncRNA) organizes the proinflammatory gene expressions in the dorsal root ganglion (DRG) in chronic intractable neuropathic pain in rats. Neat1 was abundantly expressed in the DRG and was upregulated after peripheral nerve injury. Neat1 overexpression in primary sensory neurons caused mechanical and thermal hypersensitivity, whereas its knockdown alleviated neuropathic pain. Bioinformatics analysis of comprehensive transcriptome changes indicated the inflammatory response was the most relevant function of genes upregulated through Neat1. Consistent with this, upregulation of proinflammatory genes in the DRG following nerve injury was suppressed by Neat1 knockdown. Expression changes of these proinflammatory genes were regulated through Neat1-mRNA interaction-dependent and -independent mechanisms. Notably, Neat1 increased proinflammatory genes by stabilizing its interacting mRNAs in neuropathic pain. Finally, Neat1 in primary sensory neurons contributed to spinal inflammatory processes that mediated peripheral neuropathic pain. These findings demonstrate that Neat1 lncRNA is a key regulator of neuro-immune communication in neuropathic pain.

Subjects

Neuralgia, Long Noncoding RNA, Nervous System Trauma, Animals, Rats, Long Noncoding RNA/genetics, Spinal Ganglia, Neuralgia/genetics, Messenger RNA, Transcriptome

3.

Bioinformatics approaches for unveiling virus-host interactions.

Iuchi, Hitoshi; Kawasaki, Junna; Kubo, Kento; Fukunaga, Tsukasa; Hokao, Koki; Yokoyama, Gentaro; Ichinose, Akiko; Suga, Kanta; Hamada, Michiaki.

Comput Struct Biotechnol J ; 21: 1774-1784, 2023.

Article in English | MEDLINE | ID: mdl-36874163

ABSTRACT

The coronavirus disease-2019 (COVID-19) pandemic has elucidated major limitations in the capacity of medical and research institutions to appropriately manage emerging infectious diseases. We can improve our understanding of infectious diseases by unveiling virus-host interactions through host range prediction and protein-protein interaction prediction. Although many algorithms have been developed to predict virus-host interactions, numerous issues remain to be solved, and the entire network remains veiled. In this review, we comprehensively surveyed algorithms used to predict virus-host interactions. We also discuss the current challenges, such as dataset biases toward highly pathogenic viruses, and the potential solutions. The complete prediction of virus-host interactions remains difficult; however, bioinformatics can contribute to progress in research on infectious diseases and human health.

4.

Fast RNA-RNA Interaction Prediction Methods for Interaction Analysis of Transcriptome-Scale Large Datasets.

Fukunaga, Tsukasa; Hamada, Michiaki.

Methods Mol Biol ; 2586: 163-173, 2023.

Article in English | MEDLINE | ID: mdl-36705904

ABSTRACT

The computational prediction of RNA-RNA interactions has long been studied in RNA informatics. Most of the existing approaches focused on the interaction prediction of short RNAs in small datasets. However, in recent years, two fast prediction methods, RIsearch2 and RIblast, have been developed to predict transcriptome-scale interactions or long RNA interactions. The key idea of the software acceleration of these tools was the integration of a seed-and-extend method, which is used in fast sequence alignment tools, into RNA-RNA interaction prediction. As a result, the two software programs were ten to a thousand times faster than the existing tools; because of this acceleration, detection of genome-wide microRNA target sites or interaction partners of function-unknown long noncoding RNAs has become possible. In this review, we describe the basic concept of the algorithm, its applications, and the future perspectives of the fast RNA-RNA interaction prediction tools.

Subjects

MicroRNAs, Long Noncoding RNA, Transcriptome, Software, MicroRNAs/genetics, Algorithms, Long Noncoding RNA/genetics, Computational Biology/methods

5.

Web Services for RNA-RNA Interaction Prediction.

Fukunaga, Tsukasa; Iwakiri, Junichi; Hamada, Michiaki.

Methods Mol Biol ; 2586: 175-195, 2023.

Article in English | MEDLINE | ID: mdl-36705905

ABSTRACT

Non-coding RNAs have various biological functions such as translational regulation, and RNA-RNA interactions play essential roles in the mechanisms of action of these RNAs. Therefore, RNA-RNA interaction prediction is an important problem in bioinformatics, and many tools have been developed for the computational prediction of RNA-RNA interactions. In addition to the development of novel algorithms with high accuracy, the development and maintenance of web services is essential for enhancing usability by experimental biologists. In this review, we survey web services for RNA-RNA interaction predictions and introduce how to use primary web services. We present various prediction tools, including general interaction prediction tools, prediction tools for specific RNA classes, and RNA-RNA interaction-based RNA design tools. Additionally, we discuss the future perspectives of the development of RNA-RNA interaction prediction tools and the sustainability of web services.

Subjects

MicroRNAs, RNA, RNA/genetics, Algorithms, Computational Biology, MicroRNAs/genetics

6.

Bioinformatics Approaches for Determining the Functional Impact of Repetitive Elements on Non-coding RNAs.

Zeng, Chao; Takeda, Atsushi; Sekine, Kotaro; Osato, Naoki; Fukunaga, Tsukasa; Hamada, Michiaki.

Methods Mol Biol ; 2509: 315-340, 2022.

Article in English | MEDLINE | ID: mdl-35796972

ABSTRACT

With a large number of annotated non-coding RNAs (ncRNAs), repetitive sequences are found to constitute functional components (termed as repetitive elements) in ncRNAs that perform specific biological functions. Bioinformatics analysis is a powerful tool for improving our understanding of the role of repetitive elements in ncRNAs. This chapter summarizes recent findings that reveal the role of repetitive elements in ncRNAs. Furthermore, relevant bioinformatics approaches are systematically reviewed, which promises to provide valuable resources for studying the functional impact of repetitive elements on ncRNAs.

Subjects

Computational Biology, Untranslated RNA, Untranslated RNA/genetics, Repetitive Nucleic Acid Sequences/genetics

7.

Mirage 2.0: fast and memory-efficient reconstruction of gene-content evolution considering heterogeneous evolutionary patterns among gene families.

Fukunaga, Tsukasa; Iwasaki, Wataru.

Bioinformatics ; 38(16): 4039-4041, 2022 08 10.

Article in English | MEDLINE | ID: mdl-35771653

ABSTRACT

SUMMARY: We present Mirage 2.0, which accurately estimates gene-content evolutionary history by considering heterogeneous evolutionary patterns among gene families. Notably, we introduce a deterministic pattern mixture model, which makes Mirage substantially faster and more memory-efficient to be applicable to large datasets with thousands of genomes. AVAILABILITY AND IMPLEMENTATION: The source code is freely available at https://github.com/fukunagatsu/Mirage. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Genome, Software, Molecular Evolution, Biological Evolution, Dental Porcelain

8.

Inverse Potts model improves accuracy of phylogenetic profiling.

Fukunaga, Tsukasa; Iwasaki, Wataru.

Bioinformatics ; 38(7): 1794-1800, 2022 03 28.

Article in English | MEDLINE | ID: mdl-35060594

ABSTRACT

MOTIVATION: Phylogenetic profiling is a powerful computational method for revealing the functions of function-unknown genes. Although conventional similarity metrics in phylogenetic profiling achieved high prediction accuracy, they have two estimation biases: an evolutionary bias and a spurious correlation bias. While previous studies reduced the evolutionary bias by considering a phylogenetic tree, few studies have analyzed the spurious correlation bias. RESULTS: To reduce the spurious correlation bias, we developed metrics based on the inverse Potts model (IPM) for phylogenetic profiling. We also developed a metric based on both the IPM and a phylogenetic tree. In an empirical dataset analysis, we demonstrated that these IPM-based metrics improved the prediction performance of phylogenetic profiling. In addition, we found that the integration of several metrics, including the IPM-based metrics, had superior performance to a single metric. AVAILABILITY AND IMPLEMENTATION: The source code is freely available at https://github.com/fukunagatsu/Ipm. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Software, Phylogeny

9.

LinAliFold and CentroidLinAliFold: fast RNA consensus secondary structure prediction for aligned sequences using beam search methods.

Fukunaga, Tsukasa; Hamada, Michiaki.

Bioinform Adv ; 2(1): vbac078, 2022.

Article in English | MEDLINE | ID: mdl-36699418

ABSTRACT

Motivation: RNA consensus secondary structure prediction from aligned sequences is a powerful approach for improving the secondary structure prediction accuracy. However, because the computational complexities of conventional prediction tools scale with the cube of the alignment lengths, their application to long RNA sequences, such as viral RNAs or long non-coding RNAs, requires significant computational time. Results: In this study, we developed LinAliFold and CentroidLinAliFold, fast RNA consensus secondary structure prediction tools based on minimum free energy and maximum expected accuracy principles, respectively. We achieved software acceleration using beam search methods that were successfully used for fast secondary structure prediction from a single RNA sequence. Benchmark analyses showed that LinAliFold and CentroidLinAliFold were much faster than the existing methods while preserving the prediction accuracy. As an empirical application, we predicted the consensus secondary structure of coronaviruses with approximately 30 000 nt in 5 and 79 min by LinAliFold and CentroidLinAliFold, respectively. We confirmed that the predicted consensus secondary structure of coronaviruses was consistent with the experimental results. Availability and implementation: The source codes of LinAliFold and CentroidLinAliFold are freely available at https://github.com/fukunagatsu/LinAliFold-CentroidLinAliFold. Supplementary information: Supplementary data are available at Bioinformatics Advances online.

10.

Umibato: estimation of time-varying microbial interaction using continuous-time regression hidden Markov model.

Hosoda, Shion; Fukunaga, Tsukasa; Hamada, Michiaki.

Bioinformatics ; 37(Suppl_1): i16-i24, 2021 07 12.

Article in English | MEDLINE | ID: mdl-34252954

ABSTRACT

MOTIVATION: Accumulating evidence has highlighted the importance of microbial interaction networks. Methods have been developed for estimating microbial interaction networks, of which the generalized Lotka-Volterra equation (gLVE)-based method can estimate a directed interaction network. The previous gLVE-based method for estimating microbial interaction networks did not consider time-varying interactions. RESULTS: In this study, we developed unsupervised learning-based microbial interaction inference method using Bayesian estimation (Umibato), a method for estimating time-varying microbial interactions. The Umibato algorithm comprises Gaussian process regression (GPR) and a new Bayesian probabilistic model, the continuous-time regression hidden Markov model (CTRHMM). Growth rates are estimated by GPR, and interaction networks are estimated by CTRHMM. CTRHMM can estimate time-varying interaction networks using interaction states, which are defined as hidden variables. Umibato outperformed the existing methods on synthetic datasets. In addition, it yielded reasonable estimations in experiments on a mouse gut microbiota dataset, thus providing novel insights into the relationship between consumed diets and the gut microbiota. AVAILABILITY AND IMPLEMENTATION: The C++ and python source codes of the Umibato software are available at https://github.com/shion-h/Umibato. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Algorithms, Software, Animals, Bayes Theorem, Mice, Microbial Interactions, Normal Distribution

11.

Representation learning applications in biological sequence analysis.

Iuchi, Hitoshi; Matsutani, Taro; Yamada, Keisuke; Iwano, Natsuki; Sumi, Shunsuke; Hosoda, Shion; Zhao, Shitao; Fukunaga, Tsukasa; Hamada, Michiaki.

Comput Struct Biotechnol J ; 19: 3198-3208, 2021.

Article in English | MEDLINE | ID: mdl-34141139

ABSTRACT

Although remarkable advances have been reported in high-throughput sequencing, the ability to aptly analyze a substantial amount of rapidly generated biological (DNA/RNA/protein) sequencing data remains a critical hurdle. To tackle this issue, the application of natural language processing (NLP) to biological sequence analysis has received increased attention. In this method, biological sequences are regarded as sentences while the single nucleic acids/amino acids or k-mers in these sequences represent the words. Embedding is an essential step in NLP, which performs the conversion of these words into vectors. Specifically, representation learning is an approach used for this transformation process, which can be applied to biological sequences. Vectorized biological sequences can then be applied for function and structure estimation, or as input for other probabilistic models. Considering the importance and growing trend for the application of representation learning to biological research, in the present study, we have reviewed the existing knowledge in representation learning for biological sequence analysis.
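
The "sequences as sentences, k-mers as words" idea in the abstract above can be illustrated with a minimal Python sketch. It covers only the tokenization and vocabulary-lookup step that comes before any embedding is learned. The function names `kmer_tokens` and `build_vocabulary`, the default `k = 3`, and the use of overlapping k-mers are assumptions made for illustration and are not taken from the review.

```python
def kmer_tokens(sequence: str, k: int = 3) -> list[str]:
    """Split a biological sequence into overlapping k-mers (the "words")."""
    if k <= 0:
        raise ValueError("k must be positive")
    # A sequence shorter than k yields no tokens.
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def build_vocabulary(sequences: list[str], k: int = 3) -> dict[str, int]:
    """Map every distinct k-mer to an integer index, in order of first appearance."""
    vocab: dict[str, int] = {}
    for seq in sequences:
        for token in kmer_tokens(seq, k):
            vocab.setdefault(token, len(vocab))
    return vocab


if __name__ == "__main__":
    reads = ["ACGUACGU", "GGACGU"]  # hypothetical RNA sequences
    vocab = build_vocabulary(reads, k=3)
    # Each sequence becomes a list of integer ids that an embedding layer could consume.
    encoded = [[vocab[t] for t in kmer_tokens(r, 3)] for r in reads]
    print(vocab)
    print(encoded)
```

The integer ids produced here would typically be fed to a representation-learning model that maps each id to a vector; that learning step is outside the scope of this sketch.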

12.

Novel metric for hyperbolic phylogenetic tree embeddings.

Matsumoto, Hirotaka; Mimori, Takahiro; Fukunaga, Tsukasa.

Biol Methods Protoc ; 6(1): bpab006, 2021.

Article in English | MEDLINE | ID: mdl-33928190

ABSTRACT

Advances in experimental technologies, such as DNA sequencing, have opened up new avenues for the applications of phylogenetic methods to various fields beyond their traditional application in evolutionary investigations, extending to the fields of development, differentiation, cancer genomics, and immunogenomics. Thus, the importance of phylogenetic methods is increasingly being recognized, and the development of a novel phylogenetic approach can contribute to several areas of research. Recently, the use of hyperbolic geometry has attracted attention in artificial intelligence research. Hyperbolic space can better represent a hierarchical structure compared to Euclidean space, and can therefore be useful for describing and analyzing a phylogenetic tree. In this study, we developed a novel metric that considers the characteristics of a phylogenetic tree for representation in hyperbolic space. We compared the performance of the proposed hyperbolic embeddings, general hyperbolic embeddings, and Euclidean embeddings, and confirmed that our method could be used to more precisely reconstruct evolutionary distance. We also demonstrate that our approach is useful for predicting the nearest-neighbor node in a partial phylogenetic tree with missing nodes. Furthermore, we proposed a novel approach based on our metric to integrate multiple trees for analyzing tree nodes or imputing missing distances. This study highlights the utility of adopting a geometric approach for further advancing the applications of phylogenetic methods.
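
For readers unfamiliar with hyperbolic embeddings, the sketch below computes the standard distance in the Poincaré ball model, d(u, v) = arcosh(1 + 2‖u − v‖² / ((1 − ‖u‖²)(1 − ‖v‖²))). This is the general hyperbolic distance, not the novel metric proposed in the paper, which the abstract does not specify. The function name `poincare_distance` and the example coordinates are assumptions for illustration.

```python
import math
from typing import Sequence


def poincare_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Distance between two points inside the unit ball (Poincaré ball model)."""
    if len(u) != len(v):
        raise ValueError("points must have the same dimension")
    sq_u = sum(x * x for x in u)
    sq_v = sum(x * x for x in v)
    if sq_u >= 1.0 or sq_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))


if __name__ == "__main__":
    root = (0.0, 0.0)   # hypothetical embedding of a tree root
    leaf_a = (0.9, 0.0)  # hypothetical leaves placed near the boundary
    leaf_b = (0.0, 0.9)
    print(poincare_distance(root, leaf_a))
    print(poincare_distance(leaf_a, leaf_b))
```

Distances grow rapidly as points approach the boundary of the ball, which is why tree-like hierarchies tend to fit in few hyperbolic dimensions.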

13.

Mirage: estimation of ancestral gene-copy numbers by considering different evolutionary patterns among gene families.

Fukunaga, Tsukasa; Iwasaki, Wataru.

Bioinform Adv ; 1(1): vbab014, 2021.

Article in English | MEDLINE | ID: mdl-36700099

ABSTRACT

Motivation: Reconstruction of gene copy number evolution is an essential approach for understanding how complex biological systems have been organized. Although various models have been proposed for gene copy number evolution, existing evolutionary models have not appropriately addressed the fact that different gene families can have very different gene gain/loss rates. Results: In this study, we developed Mirage (MIxtuRe model for Ancestral Genome Estimation), which allows different gene families to have flexible gene gain/loss rates. Mirage can use three models for formulating heterogeneous evolution among gene families: the discretized Γ model, probability distribution-free model and pattern mixture (PM) model. Simulation analysis showed that Mirage can accurately estimate heterogeneous gene gain/loss rates and reconstruct gene-content evolutionary history. Application to empirical datasets demonstrated that the PM model fits genome data from various taxonomic groups better than the other heterogeneous models. Using Mirage, we revealed that metabolic function-related gene families displayed frequent gene gains and losses in all taxa investigated. Availability and implementation: The source code of Mirage is freely available at https://github.com/fukunagatsu/Mirage. Supplementary information: Supplementary data are available at Bioinformatics Advances online.

14.

Revealing the microbial assemblage structure in the human gut microbiome using latent Dirichlet allocation.

Hosoda, Shion; Nishijima, Suguru; Fukunaga, Tsukasa; Hattori, Masahira; Hamada, Michiaki.

Microbiome ; 8(1): 95, 2020 06 23.

Article in English | MEDLINE | ID: mdl-32576288

ABSTRACT

BACKGROUND: The human gut microbiome has been suggested to affect human health and thus has received considerable attention. To clarify the structure of the human gut microbiome, clustering methods are frequently applied to human gut taxonomic profiles. Enterotypes, i.e., clusters of individuals with similar microbiome composition, are well-studied and characterized. However, only a few detailed studies on assemblages, i.e., clusters of co-occurring bacterial taxa, have been conducted. Particularly, the relationship between the enterotype and assemblage is not well-understood. RESULTS: In this study, we detected gut microbiome assemblages using a latent Dirichlet allocation (LDA) method. We applied LDA to a large-scale human gut metagenome dataset and found that a 4-assemblage LDA model could represent relationships between enterotypes and assemblages with high interpretability. This model indicated that each individual tends to have several assemblages, three of which corresponded to the three classically recognized enterotypes. Conversely, the fourth assemblage corresponded to no enterotypes and emerged in all enterotypes. Interestingly, the dominant genera of this assemblage (Clostridium, Eubacterium, Faecalibacterium, Roseburia, Coprococcus, and Butyrivibrio) included butyrate-producing species such as Faecalibacterium prausnitzii. Indeed, the fourth assemblage significantly positively correlated with three butyrate-producing functions. CONCLUSIONS: We conducted an assemblage analysis on a large-scale human gut metagenome dataset using LDA. The present study revealed that there is an enterotype-independent assemblage. Video Abstract.

Subjects

Gastrointestinal Microbiome/genetics, Latent Class Analysis, Metagenome, Bacteria/classification, Bacteria/genetics, Bacteria/isolation & purification, Butyrates/metabolism, Datasets as Topic, Humans

15.

Logicome Profiler: Exhaustive detection of statistically significant logic relationships from comparative omics data.

Fukunaga, Tsukasa; Iwasaki, Wataru.

PLoS One ; 15(5): e0232106, 2020.

Article in English | MEDLINE | ID: mdl-32357172

ABSTRACT

Logic relationship analysis is a data mining method that comprehensively detects item triplets that satisfy logic relationships from a binary matrix dataset, such as an ortholog table in comparative genomics. Thanks to recent technological advancements, many binary matrix datasets are now being produced in genomics, transcriptomics, epigenomics, metagenomics, and many other fields for comparative purposes. However, despite the presumed interpretability and importance of logic relationships, existing data mining methods are not based on the framework of statistical hypothesis testing. That is, the type-1 and type-2 error rates are neither controlled nor estimated. Here, we developed Logicome Profiler, which exhaustively detects statistically significant triplet logic relationships from a binary matrix dataset (Logicome means ome of logics). To test all item triplets in a dataset while avoiding false positives, Logicome Profiler adjusts a significance level by the Bonferroni or Benjamini-Yekutieli method for the multiple testing correction. Its application to an ocean metagenomic dataset showed that Logicome Profiler can effectively detect statistically significant triplet logic relationships among environmental microbes and genes, which include those among urea transporter, urease, and photosynthesis-related genes. Beyond omics data analysis, Logicome Profiler is applicable to various binary matrix datasets in general for finding significant triplet logic relationships. The source code is available at https://github.com/fukunagatsu/LogicomeProfiler.

Subjects

Data Mining/methods, Datasets as Topic, Algorithms, Logic, Statistical Models
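
The two multiple-testing corrections named in the abstract above can be sketched generically in Python. This is a textbook illustration of the Bonferroni and Benjamini-Yekutieli procedures, not code from Logicome Profiler; the function names and the example p-values are assumptions for illustration.

```python
def bonferroni_reject(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Reject a hypothesis when its p-value is at most alpha / m (m = number of tests)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_yekutieli_reject(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Step-up procedure controlling the false discovery rate under arbitrary dependence."""
    m = len(p_values)
    if m == 0:
        return []
    c_m = sum(1.0 / i for i in range(1, m + 1))  # harmonic-number penalty
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose p-value falls under its threshold.
        if p_values[idx] <= rank * alpha / (m * c_m):
            cutoff_rank = rank
    reject = [False] * m
    for idx in order[:cutoff_rank]:
        reject[idx] = True
    return reject


if __name__ == "__main__":
    pvals = [0.001, 0.011, 0.017, 0.5]  # hypothetical p-values for four triplets
    print(bonferroni_reject(pvals))
    print(benjamini_yekutieli_reject(pvals))
```

Bonferroni controls the family-wise error rate and is the stricter of the two; Benjamini-Yekutieli controls the false discovery rate and usually retains more detections when many triplets are tested.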

16.

Targeting the TR4 nuclear receptor-mediated lncTASR/AXL signaling with tretinoin increases the sunitinib sensitivity to better suppress the RCC progression.

Shi, Hangchuan; Sun, Yin; He, Miao; Yang, Xiong; Hamada, Michiaki; Fukunaga, Tsukasa; Zhang, Xiaoping; Chang, Chawnshang.

Oncogene ; 39(3): 530-545, 2020 01.

Article in English | MEDLINE | ID: mdl-31501521

ABSTRACT

Renal cell carcinoma (RCC) is one of the most lethal urological tumors. Using sunitinib to improve the survival has become the first-line therapy for metastatic RCC patients. However, the occurrence of sunitinib resistance in the clinical application has curtailed its efficacy. Here we found TR4 nuclear receptor might alter the sunitinib resistance to RCC via altering the TR4/lncTASR/AXL signaling. Mechanism dissection revealed that TR4 could modulate lncTASR (ENST00000600671.1) expression via transcriptional regulation, which might then increase AXL protein expression via enhancing the stability of AXL mRNA to increase the sunitinib resistance in RCC. Human clinical surveys also linked the expression of TR4, lncTASR, and AXL to the RCC survival, and results from multiple RCC cell lines revealed that targeting this newly identified TR4-mediated signaling with small molecules, including tretinoin, metformin, or TR4-shRNAs, all led to increase the sunitinib sensitivity to better suppress the RCC progression, and our preclinical study using the in vivo mouse model further proved tretinoin had a better synergistic effect to increase sunitinib sensitivity to suppress RCC progression. Future successful clinical trials may help in the development of a novel therapy to better suppress the RCC progression.

Subjects

Antineoplastic Combined Chemotherapy Protocols/pharmacology, Renal Cell Carcinoma/drug therapy, Kidney Neoplasms/drug therapy, Proto-Oncogene Proteins/genetics, Long Noncoding RNA/genetics, Receptor Protein-Tyrosine Kinases/genetics, Steroid Receptors/metabolism, Thyroid Hormone Receptors/metabolism, Sunitinib/pharmacology, Animals, Antineoplastic Combined Chemotherapy Protocols/therapeutic use, Renal Cell Carcinoma/genetics, Renal Cell Carcinoma/pathology, Cell Hypoxia/genetics, Tumor Cell Line, Cell Movement/drug effects, Cell Movement/genetics, Disease Progression, Antineoplastic Drug Resistance/drug effects, Antineoplastic Drug Resistance/genetics, Neoplastic Gene Expression Regulation/drug effects, Humans, Kidney Neoplasms/genetics, Kidney Neoplasms/pathology, Metformin/pharmacology, Metformin/therapeutic use, Mice, Long Noncoding RNA/metabolism, Small Interfering RNA/metabolism, Steroid Receptors/genetics, Thyroid Hormone Receptors/genetics, Signal Transduction/drug effects, Signal Transduction/genetics, Sunitinib/therapeutic use, Tretinoin/pharmacology, Tretinoin/therapeutic use, Xenograft Model Antitumor Assays, Axl Receptor Tyrosine Kinase

17.

LncRRIsearch: A Web Server for lncRNA-RNA Interaction Prediction Integrated With Tissue-Specific Expression and Subcellular Localization Data.

Fukunaga, Tsukasa; Iwakiri, Junichi; Ono, Yukiteru; Hamada, Michiaki.

Front Genet ; 10: 462, 2019.

Article in English | MEDLINE | ID: mdl-31191601

ABSTRACT

Long non-coding RNAs (lncRNAs) play critical roles in various biological processes, but the function of the majority of lncRNAs is still unclear. One approach for estimating a function of a lncRNA is the identification of its interaction target because functions of lncRNAs are expressed through interaction with other biomolecules in quite a few cases. In this paper, we developed "LncRRIsearch," which is a web server for comprehensive prediction of human and mouse lncRNA-lncRNA and lncRNA-mRNA interaction. The prediction was conducted using RIblast, which is a fast and accurate RNA-RNA interaction prediction tool. Users can investigate interaction target RNAs of a particular lncRNA through a web interface. In addition, we integrated tissue-specific expression and subcellular localization data for the lncRNAs with the web server. These data enable users to examine tissue-specific or subcellular localized lncRNA interactions. LncRRIsearch is publicly accessible at http://rtools.cbrc.jp/LncRRIsearch/.

18.

Discovering novel mutation signatures by latent Dirichlet allocation with variational Bayes inference.

Matsutani, Taro; Ueno, Yuki; Fukunaga, Tsukasa; Hamada, Michiaki.

Bioinformatics ; 35(22): 4543-4552, 2019 11 01.

Article in English | MEDLINE | ID: mdl-30993319

ABSTRACT

MOTIVATION: A cancer genome includes many mutations derived from various mutagens and mutational processes, leading to specific mutation patterns. It is known that each mutational process leads to characteristic mutations, and when a mutational process has preferences for mutations, this situation is called a 'mutation signature.' Identification of mutation signatures is an important task for elucidation of carcinogenic mechanisms. In previous studies, analyses with statistical approaches (e.g. non-negative matrix factorization and latent Dirichlet allocation) revealed a number of mutation signatures. Nonetheless, strictly speaking, these existing approaches employ an ad hoc method or incorrect approximation to estimate the number of mutation signatures, and the whole picture of mutation signatures is unclear. RESULTS: In this study, we present a novel method for estimating the number of mutation signatures-latent Dirichlet allocation with variational Bayes inference (VB-LDA)-where variational lower bounds are utilized for finding a plausible number of mutation patterns. In addition, we performed cluster analyses for estimated mutation signatures to extract novel mutation signatures that appear in multiple primary lesions. In a simulation with artificial data, we confirmed that our method estimated the correct number of mutation signatures. Furthermore, applying our method in combination with clustering procedures for real mutation data revealed many interesting mutation signatures that have not been previously reported. AVAILABILITY AND IMPLEMENTATION: All the predicted mutation signatures with clustering results are freely available at http://www.f.waseda.jp/mhamada/MS/index.html. All the C++ source code and python scripts utilized in this study can be downloaded on the Internet (https://github.com/qkirikigaku/MS_LDA). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Mutation, Software, Bayes Theorem, Cluster Analysis

19.

A Novel Method for Assessing the Statistical Significance of RNA-RNA Interactions Between Two Long RNAs.

Fukunaga, Tsukasa; Hamada, Michiaki.

J Comput Biol ; 25(9): 976-986, 2018 09.

Article in English | MEDLINE | ID: mdl-29963900

ABSTRACT

RNA-RNA interactions are key mechanisms through which noncoding RNA (ncRNA) regions exert biological functions. Computational prediction of RNA-RNA interactions is an essential method for detecting novel RNA-RNA interactions because their comprehensive detection by biological experimentation is still quite difficult. Many RNA-RNA interaction prediction tools have been developed, but they tend to produce many false positives. Accordingly, assessment of the statistical significance of computationally predicted interactions is an important task. However, there is no method to evaluate the statistical significance of RNA-RNA interactions that is applicable to interactions between two long RNA sequences. We developed a method to calculate the p-value for the minimal interaction energy between two long RNA sequences. The developed method depends on the fact that minimum interaction energies of RNA-RNA interactions between long RNAs follow a Gumbel distribution when repeat sequences in RNAs are masked. To show the usefulness of the developed method, we applied it to whole human 5'-untranslated region (UTR) and 3'-UTR sequences to detect novel 5'-UTR-3'-UTR interactions. We thus identified two significant 5'-UTR-3'-UTR interactions. Specifically, the human small proline-rich repeat protein 3 shows conserved 5'-UTR-3'-UTR interactions with some nucleotide variations preserving base pairings among primates. Our developed method enables us to detect statistically significant RNA-RNA interactions between long RNAs such as long ncRNAs. Statistical significance estimates help in identification of interactions for experimental validation and provide novel insights into the function of ncRNA regions.

Subjects

3' Untranslated Regions, 5' Untranslated Regions, Algorithms, Cornified Envelope Proline-Rich Proteins/metabolism, Untranslated RNA/metabolism, Base Sequence, Cornified Envelope Proline-Rich Proteins/chemistry, Humans, Untranslated RNA/chemistry, Sequence Homology
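
The abstract above states that minimum interaction energies between long RNAs follow a Gumbel distribution. A minimal Python sketch of how a p-value could be read off such a distribution is given below. It assumes the Gumbel form for minima, with CDF F(x) = 1 − exp(−exp((x − μ)/β)), and takes the p-value to be the probability of observing an energy at least as low as the one measured. The parameterization, the function name `gumbel_min_p_value`, and the numeric values of the location μ and scale β are all assumptions for illustration; the paper's actual fitting procedure is not described in the abstract.

```python
import math


def gumbel_min_p_value(energy: float, location: float, scale: float) -> float:
    """P(E <= energy) for a Gumbel (minimum) distribution with the given parameters."""
    if scale <= 0.0:
        raise ValueError("scale must be positive")
    z = (energy - location) / scale
    if z > 700.0:
        return 1.0  # exp(z) would overflow; the CDF is 1 to machine precision here
    # 1 - exp(-exp(z)), written with expm1 to stay accurate for very negative z.
    return -math.expm1(-math.exp(z))


if __name__ == "__main__":
    mu, beta = -10.0, 2.0  # hypothetical fitted location and scale (kcal/mol)
    for observed in (-12.0, -20.0, -30.0):
        print(observed, gumbel_min_p_value(observed, mu, beta))
```

Lower (more favorable) interaction energies give smaller p-values, so a threshold on this value, after multiple-testing correction, would select candidate interactions for experimental validation.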

20.

Identification and analysis of ribosome-associated lncRNAs using ribosome profiling data.

Zeng, Chao; Fukunaga, Tsukasa; Hamada, Michiaki.

BMC Genomics ; 19(1): 414, 2018 May 29.

Article in English | MEDLINE | ID: mdl-29843593

ABSTRACT

BACKGROUND: Although the number of discovered long non-coding RNAs (lncRNAs) has increased dramatically, their biological roles have not been established. Many recent studies have used ribosome profiling data to assess the protein-coding capacity of lncRNAs. However, very little work has been done to identify ribosome-associated lncRNAs, here defined as lncRNAs interacting with ribosomes related to protein synthesis as well as other unclear biological functions. RESULTS: On average, 39.17% of expressed lncRNAs were observed to interact with ribosomes in human and 48.16% in mouse. We developed the ribosomal association index (RAI), which quantifies the evidence for ribosomal associability of lncRNAs over various tissues and cell types, to catalog 691 and 409 lncRNAs that are robustly associated with ribosomes in human and mouse, respectively. Moreover, we identified 78 and 42 lncRNAs with a high probability of coding peptides in human and mouse, respectively. Compared with ribosome-free lncRNAs, ribosome-associated lncRNAs were observed to be more likely to be located in the cytoplasm and more sensitive to nonsense-mediated decay. CONCLUSION: Our results suggest that RAI can be used as an integrative and evidence-based tool for distinguishing between ribosome-associated and free lncRNAs, providing a valuable resource for the study of lncRNA functions.

Subjects

Long Noncoding RNA/genetics, Ribosomes/genetics, RNA Sequence Analysis, Gene Expression Profiling, HeLa Cells, Humans
