Results 1 - 17 of 17
1.
BMC Bioinformatics ; 25(1): 13, 2024 Jan 09.
Article in English | MEDLINE | ID: mdl-38195423

ABSTRACT

BACKGROUND: MicroRNAs (miRNAs) are a class of non-coding RNAs that play a pivotal role as gene expression regulators. These miRNAs are typically approximately 20 to 25 nucleotides long. The maturation of miRNAs requires Dicer cleavage at specific sites within the precursor miRNAs (pre-miRNAs). Recent advances in machine learning-based approaches for cleavage site prediction, such as PHDcleav and LBSizeCleav, have been reported. ReCGBM, a gradient boosting-based model, demonstrates superior performance compared with existing methods. Nonetheless, ReCGBM operates solely as a binary classifier despite the presence of two cleavage sites in a typical pre-miRNA. Previous approaches have focused on utilizing only a fraction of the structural information in pre-miRNAs, often overlooking comprehensive secondary structure information. There is a compelling need for the development of a novel model to address these limitations. RESULTS: In this study, we developed a deep learning model for predicting the presence of a Dicer cleavage site within a pre-miRNA segment. This model was enhanced by an autoencoder that learned the secondary structure embeddings of pre-miRNA. Benchmarking experiments demonstrated that the performance of our model was comparable to that of ReCGBM in the binary classification tasks. In addition, our model excelled in multi-class classification tasks, making it a more versatile and practical solution than ReCGBM. CONCLUSIONS: Our proposed model exhibited superior performance compared with the current state-of-the-art model, underscoring the effectiveness of a deep learning approach in predicting Dicer cleavage sites. Furthermore, our model could be trained using only sequence and secondary structure information. Its capacity to accommodate multi-class classification tasks has enhanced the practical utility of our model.


Subjects
Deep Learning , MicroRNAs , Humans , Benchmarking , Machine Learning , Nucleotides
2.
BMC Bioinformatics ; 24(1): 252, 2023 Jun 15.
Article in English | MEDLINE | ID: mdl-37322439

ABSTRACT

BACKGROUND: Bioinformatics capability to analyze spatio-temporal dynamics of gene expression is essential in understanding animal development. Animal cells are spatially organized as functional tissues where cellular gene expression data contain information that governs morphogenesis during the developmental process. Although several computational tissue reconstruction methods using transcriptomics data have been proposed, those methods have been ineffective in arranging cells in their correct positions in tissues or organs unless spatial information is explicitly provided. RESULTS: This study demonstrates that stochastic self-organizing map clustering with Markov chain Monte Carlo calculations for optimizing informative genes effectively reconstructs any spatio-temporal topology of cells from their transcriptome profiles with only a coarse topological guideline. The method, eSPRESSO (enhanced SPatial REconstruction by Stochastic Self-Organizing Map), provides a powerful in silico spatio-temporal tissue reconstruction capability, as confirmed by using human embryonic heart and mouse embryo, brain, embryonic heart, and liver lobule with generally high reproducibility (average max. accuracy = 92.0%), while revealing topologically informative genes, or spatial discriminator genes. Furthermore, eSPRESSO was used for temporal analysis of human pancreatic organoids to infer rational developmental trajectories with several candidate 'temporal' discriminator genes responsible for various cell type differentiations. CONCLUSIONS: eSPRESSO provides a novel strategy for analyzing mechanisms underlying the spatio-temporal formation of cellular organizations.


Subjects
Gene Expression Profiling , Transcriptome , Humans , Animals , Mice , Reproducibility of Results , Brain , Cluster Analysis , Spatio-Temporal Analysis
3.
PLoS Comput Biol ; 18(1): e1009702, 2022 01.
Article in English | MEDLINE | ID: mdl-35030172

ABSTRACT

Boolean networks (BNs) have been developed to describe various biological processes, which requires analysis of attractors, the long-term stable states. While many methods have been proposed for the detection and enumeration of attractors, no method has been demonstrated to be theoretically better than the naive method and to be practically usable for large biological BNs. Here, we present a novel method to calculate attractors based on a priori information, which works much, and verifiably, faster than the naive method. We apply the method to two BNs which differ in size, modeling formalism, and biological scope. Despite these differences, the method presented here provides a powerful tool for the analysis of both networks. First, our analysis of a BN studying the effect of the microenvironment during angiogenesis shows that the previously defined microenvironments inducing the specialized phalanx behavior in endothelial cells (ECs) additionally induce stalk behavior. We obtain this result from an extended network version which was previously not analyzed. Second, we were able to heuristically detect attractors in a cell cycle control network formalized as a bipartite Boolean model (bBM) with 3158 nodes. These attractors are directly interpretable in terms of genotype-to-phenotype relationships, allowing network validation equivalent to an in silico mutagenesis screen. Our approach contributes to the development of scalable analysis methods required for whole-cell modeling efforts.


Subjects
Algorithms , Computational Biology/methods , Biological Models , Computer Simulation , Genetic Databases , Endothelial Cells/cytology , Endothelial Cells/metabolism , Mutagenesis/genetics
4.
Proc Natl Acad Sci U S A ; 117(12): 6469-6475, 2020 03 24.
Article in English | MEDLINE | ID: mdl-32144142

ABSTRACT

City-size distributions are known to be well approximated by power laws across a wide range of countries. But such distributions are also meaningful at other spatial scales, such as within certain regions of a country. Using data from China, France, Germany, India, Japan, and the United States, we first document that large cities are significantly more spaced out than would be expected by chance alone. We next construct spatial hierarchies for countries by first partitioning geographic space using a given number of their largest cities as cell centers and then continuing this partitioning procedure within each cell recursively. We find that city-size distributions in different parts of these spatial hierarchies exhibit power laws that are, again, far more similar than would be expected by chance alone, suggesting the existence of a spatial fractal structure.
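
As an illustration of what "approximated by a power law" means for city sizes, the following is a minimal sketch that estimates a power-law exponent from a log-log regression of rank on size. The function name `rank_size_exponent` and the synthetic data are assumptions made for illustration; the abstract does not state which estimator the authors used, and ordinary least squares on log-log data is only one simple (and known to be biased) choice.

```python
import math


def rank_size_exponent(sizes):
    """Estimate zeta in rank ~ C * size**(-zeta) by least squares on logs.

    Hypothetical helper for illustration; not the estimator of the paper.
    """
    # Rank 1 is the largest city.
    ordered = sorted(sizes, reverse=True)
    log_sizes = [math.log(s) for s in ordered]
    log_ranks = [math.log(r) for r in range(1, len(ordered) + 1)]

    mean_x = sum(log_sizes) / len(log_sizes)
    mean_y = sum(log_ranks) / len(log_ranks)

    # Slope of the regression of log(rank) on log(size).
    covariance = sum(
        (x - mean_x) * (y - mean_y) for x, y in zip(log_sizes, log_ranks)
    )
    variance = sum((x - mean_x) ** 2 for x in log_sizes)

    # The slope is -zeta, so flip the sign.
    return -covariance / variance


# Synthetic (made-up) sizes that follow size = 1e6 / rank exactly.
synthetic_sizes = [1_000_000 / rank for rank in range(1, 201)]
exponent = rank_size_exponent(synthetic_sizes)
print(f"estimated exponent: {exponent:.3f}")
```

Applying the same function to the cities inside each cell of a spatial partition, and comparing the resulting exponents, would be one way to probe the similarity across scales that the abstract describes.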

5.
Bioinformatics ; 33(2): 202-209, 2017 01 15.
Article in English | MEDLINE | ID: mdl-27663495

ABSTRACT

MOTIVATION: RNA-RNA interactions via base pairing play a vital role in the post-transcriptional regulation of gene expression. Efficient identification of targets for such regulatory RNAs needs not only discriminative power for positive and negative RNA-RNA interacting sequence data but also accurate prediction of interaction sites from positive data. Recently, a few studies have incorporated interaction site accessibility into their prediction methods, indicating the enhancement of predictive performance on limited positive data. RESULTS: Here we show the efficacy of our accessibility-based prediction model RactIPAce on newly compiled datasets. The first experiment in interaction site prediction shows that RactIPAce achieves the best predictive performance on the newly compiled dataset of experimentally verified interactions in the literature as compared with the state-of-the-art methods. In addition, the second experiment in discrimination between positive and negative interacting pairs reveals that the combination of accessibility-based methods including our approach can be effective to discern real interacting RNAs. Taking these into account, our prediction model can be effective to predict interaction sites after screening for real interacting RNAs, which will boost the functional analysis of regulatory RNAs. AVAILABILITY AND IMPLEMENTATION: The program RactIPAce along with data used in this work is available at https://github.com/satoken/ractip/releases/tag/v1.0.1 CONTACT: ykato@rna.med.osaka-u.ac.jp or shingo@i.kyoto-u.ac.jp SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Computational Biology/methods , Molecular Docking Simulation , RNA/metabolism , Software , Base Pairing , RNA/chemistry
6.
Methods Mol Biol ; 2586: 79-88, 2023.
Article in English | MEDLINE | ID: mdl-36705899

ABSTRACT

RNA secondary structure comparison is one of the important analyses for elucidating individual functions of RNAs since it is widely accepted that their functions and structures are strongly correlated. However, although RNA secondary structures with pseudoknots play important roles in vivo, it is difficult to deal with such structures in silico due to their structural complexity, which is a major obstacle to the analysis of RNA functions. Here, we introduce an algorithm and a metric for comparing pseudoknotted RNA secondary structures based on topological centroid identification and tree edit distance, and describe the usage protocol of software that runs the comparison. This software is publicly available and works on both Microsoft Windows and Apple macOS.


Subjects
Algorithms , RNA , RNA/genetics , RNA/chemistry , Nucleic Acid Conformation , Software , RNA Sequence Analysis/methods
7.
Comput Struct Biotechnol J ; 20: 2512-2520, 2022.
Article in English | MEDLINE | ID: mdl-35685366

ABSTRACT

The Boolean network (BN) is a mathematical model used to represent various biological processes such as gene regulatory networks. The state of a BN is determined from the previous state and eventually reaches a stable state called an attractor. Due to its significance for elucidating the whole system, extensive studies have been conducted on analysis of attractors. However, the problem of detecting an attractor from a given BN has been shown to be NP-hard, and for general BNs, the time complexity of most existing algorithms is not guaranteed to be less than $O(2^n)$. Therefore, the computational difficulty of attractor detection has been a big obstacle for analysis of BNs. This review highlights singleton/periodic attractor detection algorithms that have guaranteed computational complexities less than $O(2^n)$ time for particular classes of BNs under synchronous update in which the maximum indegree is limited to a constant, each Boolean function is AND or OR of literals, or each Boolean function is given as a nested canalyzing function. We also briefly review practically efficient algorithms for the problem.
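
To make the exponential baseline mentioned in this abstract concrete, the following is a minimal sketch of brute-force attractor enumeration for a BN under synchronous update. It visits all 2^n states, so it is the kind of exhaustive approach the reviewed algorithms aim to improve on, not one of those algorithms. The function name `find_attractors` and the two-node toy network are assumptions made for illustration.

```python
from itertools import product


def find_attractors(update_functions):
    """Enumerate all attractors of a synchronous BN by exhaustive search.

    update_functions[i] maps the current state (a tuple of 0/1 values)
    to the next value of node i. Hypothetical helper for illustration.
    """
    n = len(update_functions)

    def step(state):
        # Synchronous update: every node reads the same previous state.
        return tuple(int(f(state)) for f in update_functions)

    attractors = set()
    for start in product((0, 1), repeat=n):  # all 2**n initial states
        first_visit = {}
        trajectory = []
        state = start
        # Follow the trajectory until some state repeats.
        while state not in first_visit:
            first_visit[state] = len(trajectory)
            trajectory.append(state)
            state = step(state)
        # The states from the first visit of the repeated state form a cycle.
        cycle = trajectory[first_visit[state]:]
        # Rotate so the smallest state comes first, to deduplicate cycles.
        k = cycle.index(min(cycle))
        attractors.add(tuple(cycle[k:] + cycle[:k]))
    return attractors


# Toy network (an assumption): each node copies the other node.
toy_network = [
    lambda s: s[1],  # x0(t+1) = x1(t)
    lambda s: s[0],  # x1(t+1) = x0(t)
]
toy_attractors = find_attractors(toy_network)
for attractor in sorted(toy_attractors):
    kind = "singleton" if len(attractor) == 1 else f"period {len(attractor)}"
    print(kind, attractor)
```

In this toy network the states (0, 0) and (1, 1) are singleton attractors, while (0, 1) and (1, 0) alternate as a periodic attractor of length 2.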

8.
Nat Commun ; 13(1): 5972, 2022 10 14.
Article in English | MEDLINE | ID: mdl-36241645

ABSTRACT

Global alignment of complex pseudotime trajectories between different single-cell RNA-seq datasets is challenging, as existing tools mainly focus on linear alignment of single-cell trajectories. Here we present CAPITAL (comparative analysis of pseudotime trajectory inference with tree alignment), a method for comparing single-cell trajectories with tree alignment whereby branching trajectories can be automatically compared. Computational tests on synthetic datasets and authentic bone marrow cell datasets indicate that CAPITAL has achieved accurate and robust alignments of trajectory trees, revealing various gene expression dynamics including gene-gene correlation conservation between different species.


Subjects
Single-Cell Analysis , Algorithms , Single-Cell Analysis/methods
9.
J Inorg Biochem ; 230: 111770, 2022 05.
Article in English | MEDLINE | ID: mdl-35272237

ABSTRACT

Aldoxime dehydratase (Oxd) is a heme enzyme that catalyzes aldoxime dehydration to the corresponding nitriles. Unlike many other heme enzymes, Oxd has the unique feature that the substrate binds directly to the heme. Therefore, it is thought that structural differences around the bound heme directly relate to differences in substrate selection. However, sufficient structural information to discuss the substrate specificity has not been obtained. Oxd from Bacillus sp. OxB-1 (OxdB) shows unique substrate specificity and enantioselectivity compared to the Oxds whose crystal structures have already been reported. Here, we report the crystal structure of OxdB, which has not been reported previously. Although the crystallization of OxdB has been difficult, by adding a site-specific mutation to Glu85 located on the surface of the protein, we succeeded in crystallizing OxdB without reducing the enzyme activity. The catalytic triad essential for Oxd activity was structurally conserved in OxdB. In addition, the crystal structure of the Michaelis complex of OxdB and the diastereomerically pure substrate Z-2-(3-bromophenyl)-propanal oxime implied the importance of several hydrophobic residues for substrate specificity. Mutational analysis implicated Ala12 and Ala14 in the E/Z selectivity of bulky compounds. The N-terminal region of OxdB was shown to be shorter than those of Oxds from Pseudomonas chlororaphis and Rhodococcus sp. N-771, and to have high flexibility. These structural differences possibly result in distinct preferences for aldoxime substrates based on factors such as substrate size.


Subjects
Bacillus , Crystallization , Heme/chemistry , Hydro-Lyases , Oximes/chemistry , Substrate Specificity
10.
J Comput Biol ; 27(9): 1443-1451, 2020 09.
Article in English | MEDLINE | ID: mdl-32058802

ABSTRACT

Comparison of RNA structures is one of the most crucial analyses for elucidating their individual functions and promoting medical applications. Because it is widely accepted that their functions and structures are strongly correlated, various methods for RNA secondary structure analysis have been proposed owing to the difficulty in predicting RNA three-dimensional structure directly from its sequence. However, there are few methods dealing with RNA secondary structures that contain a specific and complex partial structure called a pseudoknot, despite its significance to biological processes, which is a big obstacle for analyzing their functions. In this study, we propose a novel tree representation of pseudoknotted RNA secondary structures by topological centroid identification and their comparison methods based on the tree edit distance. In the proposed method, a given graph representing an RNA secondary structure is transformed to a tree rooted at one of the vertices constituting the topological centroid that is identified by removing cycles with peeling processing for the graph. When comparing tree-represented RNA secondary structures collected from a public database using the tree edit distance and functional gene groups defined by Gene Ontology (GO), the proposed method showed better clustering results according to their GOs than canonical RNA sequence-based comparison. In addition, we also report a case in which the combination of the tree edit distance and the sequence edit distance shows a better classification of the pseudoknotted RNA secondary structures.


Subjects
Nucleic Acid Conformation , RNA/ultrastructure , Sequence Alignment , Algorithms , Base Sequence/genetics , Cluster Analysis , Gene Ontology , RNA/genetics
11.
J Control Release ; 323: 519-529, 2020 07 10.
Article in English | MEDLINE | ID: mdl-32360306

ABSTRACT

Tissue factor (TF), which is well known as a trigger molecule of extrinsic coagulation, is found not only in tumor cells but also in stromal cells in tumor tissues. Thus, TF is a candidate molecule to potentially enable targeting of both tumor cells and stromal cells for anti-cancer drug delivery. Herein, we prepared liposomes conjugated with the Fab' fragment of anti-TF antibody (TF Ab-Lip) and evaluated the capability for drug delivery to stroma-rich tumors for realizing a whole tumor tissue-targetable strategy. When the targetability of TF Ab-Lip to TF-expressing KLN205 squamous tumor cells and NIH3T3 fibroblast cells was examined, TF Ab-Lip was significantly taken up into both cells compared with non-targeted liposomes. Corresponding to this result, doxorubicin-encapsulated TF Ab-Lip (TF Ab-LipDOX) showed potent cytotoxicity against KLN205 cells. In vivo experiments using KLN205 solid tumor-bearing mice indicated that TF Ab-Lip became highly accumulated and distributed widely not only in the tumor cell region but also in the stromal region of the tumor. Treatment with TF Ab-LipDOX significantly suppressed the growth of KLN205 solid tumors. Furthermore, TF Ab-Lip capable of targeting both mouse and human TF (mhTF Ab-Lip) became distributed throughout stroma-rich human pancreatic BxPC3 tumors, and the treatment of the BxPC3 tumor-bearing mice with mhTF Ab-LipDOX showed the highest tumor-suppressive effect. These data suggest that TF Ab-Lip could achieve effective accumulation for stroma-rich tumor treatment.


Subjects
Liposomes , Thromboplastin , Animals , Tumor Cell Line , Doxorubicin , Drug Delivery Systems , Mice , NIH 3T3 Cells
12.
Sci Rep ; 9(1): 12597, 2019 08 29.
Article in English | MEDLINE | ID: mdl-31467377

ABSTRACT

Deciphering the key mechanisms of morphogenesis during embryonic development is crucial to understanding the guiding principles of the body plan and to promoting applications in biomedical research fields. Although several computational tissue reconstruction methods using cellular gene expression data have been proposed, those methods are insufficient with regard to arranging cells in their correct positions in tissues or organs unless spatial information is explicitly provided. Here, we report SPRESSO, a new in silico three-dimensional (3D) tissue reconstruction method using stochastic self-organizing map (stochastic-SOM) clustering, to estimate the spatial domains of cells in tissues or organs from only their gene expression profiles. With only five gene sets defined by Gene Ontology (GO), we successfully demonstrated the reconstruction of a four-domain structure of mid-gastrula mouse embryo (E7.0) with high reproducibility (success rate = 99%). Interestingly, the five GOs contain 20 genes, most of which are related to differentiation and morphogenesis, such as activin A receptor and Wnt family member genes. Further analysis indicated that Id2 is the most influential gene contributing to the reconstruction. SPRESSO may provide novel and better insights into the mechanisms of 3D structure formation of living tissues via informative genes playing a role as spatial discriminators.


Subjects
Computer Simulation , Gastrula/growth & development , Morphogenesis , Animals , Base Sequence , Gastrula/metabolism , Gene Expression Profiling , Gene Ontology , Mice , Biological Models , Stochastic Processes
13.
IEEE/ACM Trans Comput Biol Bioinform ; 13(6): 1107-1116, 2016.
Article in English | MEDLINE | ID: mdl-26661790

ABSTRACT

We study the number of samples required to uniquely determine the structure of a probabilistic Boolean network (PBN), where PBNs are probabilistic extensions of Boolean networks. We show via theoretical analysis and computational analysis that the structure of a PBN can be exactly identified with high probability from a relatively small number of samples for interesting classes of PBNs of bounded indegree. On the other hand, we also show that there exist classes of PBNs for which it is impossible to uniquely determine the structure of a PBN from samples.


Subjects
Statistical Data Interpretation , Gene Expression Regulation/physiology , Biological Models , Statistical Models , Proteome/metabolism , Signal Transduction/physiology , Computer Simulation , Sample Size
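
For readers unfamiliar with the probabilistic Boolean networks (PBNs) discussed in the abstract above, the following is a minimal sketch of how samples can be drawn from one, using the standard formulation in which each node has several candidate Boolean functions and one is chosen at random at every time step. The function name `pbn_step`, the two-node network, and its selection probabilities are assumptions made for illustration; the sketch does not implement the identification analysis of the paper.

```python
import random


def pbn_step(state, candidates, rng):
    """Advance a PBN by one synchronous step.

    candidates[i] is a pair (functions, probabilities) for node i.
    Hypothetical helper for illustration.
    """
    next_state = []
    for functions, probabilities in candidates:
        # Pick one candidate function for this node at random.
        chosen = rng.choices(functions, weights=probabilities, k=1)[0]
        next_state.append(int(chosen(state)))
    return tuple(next_state)


# Toy PBN (an assumption): node 0 always copies node 1, while node 1
# copies node 0 with probability 0.7 and negates it with probability 0.3.
toy_pbn = [
    ([lambda s: s[1]], [1.0]),
    ([lambda s: s[0], lambda s: 1 - s[0]], [0.7, 0.3]),
]

rng = random.Random(0)  # fixed seed so the run is repeatable
state = (0, 1)
samples = []
for _ in range(5):
    new_state = pbn_step(state, toy_pbn, rng)
    samples.append((state, new_state))  # one (input, output) sample
    state = new_state
print(samples)
```

Each recorded pair is one input/output sample of the kind whose required number the abstract studies.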
14.
BMC Syst Biol ; 9: 45, 2015 Aug 11.
Article in English | MEDLINE | ID: mdl-26259567

ABSTRACT

BACKGROUND: Cellular decision-making is governed by molecular networks that are highly complex. An integrative understanding of these networks on a genome-wide level is essential to understand cellular health and disease. In most cases, however, such an understanding is beyond human comprehension and requires computational modeling. Mathematical modeling of biological networks at the level of biochemical details has hitherto relied on state transition models. These are typically based on enumeration of all relevant model states, and hence become very complex unless severely, and often arbitrarily, reduced. Furthermore, the parameters required for genome-wide networks will remain underdetermined for the conceivable future. Alternatively, networks can be simulated by Boolean models, although these typically sacrifice molecular detail as well as distinction between different levels or modes of activity. However, the modeling community still lacks methods that can simulate genome-scale networks on the level of biochemical reaction detail in a quantitative or semi-quantitative manner. RESULTS: Here, we present a probabilistic bipartite Boolean modeling method that addresses these issues. The method is based on the reaction-contingency formalism, and enables fast simulation of large networks. We demonstrate its scalability by applying it to the yeast mitogen-activated protein kinase (MAPK) network consisting of 140 proteins and 608 nodes. CONCLUSION: The probabilistic Boolean model can be generated and parameterized automatically from a rxncon network description, using only two global parameters, and its qualitative behavior is robust against order of magnitude variation in these parameters. Our method can hence be used to simulate the outcome of large signal transduction network reconstruction, with little or no overhead in model creation or parameterization.


Subjects
Biological Models , Signal Transduction , Physiological Feedback , MAP Kinase Signaling System , Probability , Saccharomyces cerevisiae/cytology , Stochastic Processes , Systems Biology
15.
BMC Syst Biol ; 9: 14, 2015 Mar 13.
Article in English | MEDLINE | ID: mdl-25890175

ABSTRACT

BACKGROUND: As a result of recent advances in biotechnology, many findings related to intracellular systems have been published, e.g., transcription factor (TF) information. Although we can reproduce biological systems by incorporating such findings and describing their dynamics as mathematical equations, simulation results can be inconsistent with data from biological observations if there are inaccurate or unknown parts in the constructed system. For the completion of such systems, relationships among genes have been inferred through several computational approaches, which typically apply several abstractions, e.g., linearization, to handle the heavy computational cost in evaluating biological systems. However, since these approximations can generate false regulations, computational methods that can infer regulatory relationships based on less abstract models incorporating existing knowledge have been strongly required. RESULTS: We propose a new data assimilation algorithm that utilizes a simple nonlinear regulatory model and a state space representation to infer gene regulatory networks (GRNs) using time-course observation data. For the estimation of the hidden state variables and the parameter values, we developed a novel method termed a higher moment ensemble particle filter (HMEnPF) that can retain the first four moments of the conditional distributions through filtering steps. Starting from the original model, e.g., derived from the literature, the proposed algorithm can sequentially evaluate candidate models, which are generated by partially changing the current best model, to find the model that can best predict the data. For the performance evaluation, we generated six synthetic datasets based on two real biological networks and evaluated the effectiveness of the proposed algorithm by improving the networks inferred by previous methods. We then applied the algorithm to time-course observation data of rat skeletal muscle stimulated with corticosteroid. Since a corticosteroid pharmacogenomic pathway, its kinetics/dynamics and TF candidate genes have been partially elucidated, we incorporated these findings and inferred an extended pathway of rat pharmacogenomics. CONCLUSIONS: Through the simulation study, the proposed algorithm outperformed previous methods and successfully improved the regulatory structure inferred by the previous methods. Furthermore, the proposed algorithm could extend a corticosteroid-related pathway, which has been partially elucidated, by incorporating several information sources.


Subjects
Gene Regulatory Networks , Genomics/methods , Adrenal Cortex Hormones/pharmacology , Algorithms , Animals , Gene Regulatory Networks/drug effects , Genetic Models , Skeletal Muscle/drug effects , Skeletal Muscle/metabolism , Rats , Genetic Transcription/drug effects
16.
J Comput Biol ; 21(11): 785-98, 2014 Nov.
Article in English | MEDLINE | ID: mdl-25244077

ABSTRACT

Gene regulatory networks (GRNs) play a central role in sustaining complex biological systems in cells. Although we can construct GRNs by integrating biological interactions that have been recorded in the literature, they can include suspicious data and lack information. Therefore, there has been an urgent need for an approach by which the validity of constructed networks can be evaluated; simulation-based methods have been applied in which biological observational data are assimilated. However, these methods apply nonlinear models that require high computational power to evaluate even one network consisting of only several genes. Therefore, to explore candidate networks whose simulation models can better predict the data by modifying and extending literature-based GRNs, an efficient and versatile method is urgently required. We applied a combinatorial transcription model, which can represent combinatorial regulatory effects of genes, as a biological simulation model, to reproduce the dynamic behavior of gene expression within a state space model. Under the model, we applied the unscented Kalman filter to obtain the approximate posterior probability distribution of the hidden state to efficiently estimate parameter values maximizing prediction ability for observational data by the EM-algorithm. Utilizing the method, we propose a novel algorithm to modify GRNs reported in the literature so that their simulation models become consistent with observed data. The effectiveness of our approach was validated through comparative analysis against previous methods using synthetic networks. Finally, as an application example, a Kyoto Encyclopedia of Genes and Genomes (KEGG)-based yeast cell cycle network was extended with additional candidate genes to better predict the real mRNA expression data using the proposed method.


Subjects
Algorithms , Cell Cycle Proteins/genetics , Computational Biology/methods , Gene Regulatory Networks , Saccharomyces cerevisiae Proteins/genetics , Computer Simulation , Factual Databases , Gene Expression Profiling , Oligonucleotide Array Sequence Analysis , Saccharomyces cerevisiae , Time Factors
17.
J Comput Biol ; 19(10): 1089-104, 2012 Oct.
Article in English | MEDLINE | ID: mdl-23057820

ABSTRACT

Many kinds of tree-structured data, such as RNA secondary structures, have become available due to the progress of techniques in the field of molecular biology. To analyze the tree-structured data, various measures for computing the similarity between them have been developed and applied. Among them, tree edit distance is one of the most widely used measures. However, the tree edit distance problem for unordered trees is NP-hard. Therefore, it is required to develop efficient algorithms for the problem. Recently, a practical method called the clique-based algorithm has been proposed, but it is not fast for large trees. This article presents an improved clique-based method for the tree edit distance problem for unordered trees. The improved method is obtained by introducing a dynamic programming scheme and heuristic techniques to the previous clique-based method. To evaluate the efficiency of the improved method, we applied the method to comparison of real tree-structured data such as glycan structures. For large tree structures, the improved method is much faster than the previous method. In particular, for hard instances, the improved method achieved more than 100 times speed-up.


Subjects
Carbohydrate Conformation , Polysaccharides/chemistry , Polysaccharides/genetics , Software , Nucleic Acid Conformation , RNA/chemistry , RNA/genetics