Results 1 - 20 of 20
1.
Mol Inform ; 39(1-2): e1900130, 2020 01.
Article in English | MEDLINE | ID: mdl-31908150

ABSTRACT

Predicting compound-protein interactions from fingerprints has recently become an important challenge in pharmaceutical science for efficient drug discovery. We review two scalable methods for predicting drug-protein interactions from fingerprints; specifically, we introduce two techniques for learning statistical models using lossless and lossy data compression. The first uses a trie representation of fingerprints, which enables predictive models to be learned directly on the compressed format. The second uses a form of lossy data compression called feature maps (FMs). Many FMs for kernel approximation have recently been proposed, and minwise hashing, one such method, has been applied to the prediction of compound-protein interactions and shown to be effective. Overall, we show that learning statistical models on the compressed format is effective for predicting compound-protein interactions on a large scale.
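The trie idea in the first method can be illustrated with a minimal sketch. It assumes each fingerprint is stored as the sorted list of its "on" bit indices and that the predictive model is linear. The class `FingerprintTrie` and its methods are hypothetical illustrations, not the authors' implementation. They show only two things: shared prefixes reduce storage, and a linear score can be accumulated once per shared prefix rather than once per fingerprint.

```python
from typing import Dict, Iterable, List


class FingerprintTrie:
    """Prefix tree over sparse binary fingerprints (hypothetical sketch).

    Each fingerprint is stored as the sorted list of its "on" bit indices, so
    fingerprints that share leading on-bits share the same trie nodes.
    """

    def __init__(self) -> None:
        # children[n] maps a bit index to the id of a child node; node 0 is the root.
        self.children: List[Dict[int, int]] = [{}]
        # terminal[n] lists the ids of fingerprints whose last on-bit ends at node n.
        self.terminal: List[List[int]] = [[]]

    def insert(self, fp_id: int, on_bits: Iterable[int]) -> None:
        node = 0
        for bit in sorted(set(on_bits)):
            child = self.children[node].get(bit)
            if child is None:
                child = len(self.children)
                self.children[node][bit] = child
                self.children.append({})
                self.terminal.append([])
            node = child
        self.terminal[node].append(fp_id)

    def num_nodes(self) -> int:
        """Non-root nodes, i.e. bit occurrences stored after prefix sharing."""
        return len(self.children) - 1

    def linear_scores(self, weights: Dict[int, float]) -> Dict[int, float]:
        """Score every stored fingerprint with a linear model in one traversal.

        The partial sum along a shared prefix is computed once and reused by
        all fingerprints below it.
        """
        scores: Dict[int, float] = {}
        stack = [(0, 0.0)]  # (node id, sum of weights on the path to that node)
        while stack:
            node, partial = stack.pop()
            for fp_id in self.terminal[node]:
                scores[fp_id] = partial
            for bit, child in self.children[node].items():
                stack.append((child, partial + weights.get(bit, 0.0)))
        return scores


if __name__ == "__main__":
    trie = FingerprintTrie()
    for i, fp in enumerate([[1, 4, 9], [1, 4, 12], [1, 7]]):
        trie.insert(i, fp)
    print(trie.num_nodes())  # 5 nodes instead of 8 stored bits
    print(trie.linear_scores({1: 0.5, 4: 1.0, 7: -2.0, 12: 0.25}))
```

The saving grows with the amount of prefix overlap between fingerprints. Real fingerprint collections share many low-index substructure bits, which is the situation this representation targets.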


Subjects
Pharmaceutical Preparations/chemistry , Proteins/chemistry , Algorithms , Humans , Models, Statistical , Protein Interaction Maps
2.
Bioinformatics ; 35(14): i191-i199, 2019 07 15.
Article in English | MEDLINE | ID: mdl-31510663

ABSTRACT

MOTIVATION: Genome-wide identification of the transcriptomic responses of human cell lines to drug treatments is a challenging issue in medical and pharmaceutical research. However, drug-induced gene expression profiles remain unobserved for a large fraction of drug-cell line combinations, which is a serious obstacle to practical applications. RESULTS: Here, we developed a novel computational method to predict the unknown parts of drug-induced gene expression profiles for various human cell lines and to predict new therapeutic indications of drugs for a wide range of diseases. We proposed a tensor-train weighted optimization (TT-WOPT) algorithm to predict the missing values in tensor-structured gene expression data. Our results revealed that the proposed TT-WOPT algorithm can accurately reconstruct drug-induced gene expression data for a range of human cell lines in the Library of Integrated Network-based Cellular Signatures. The results also showed that, compared with the original gene expression profiles, the imputed gene expression profiles improved the accuracy of drug repositioning. We also performed a comprehensive prediction of drug indications for diseases with gene expression profiles, which suggested many potential drug indications that were not predicted by previous approaches. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Computational Biology , Transcriptome , Algorithms , Cell Line , Drug Repositioning , Humans
3.
BMC Syst Biol ; 13(Suppl 2): 39, 2019 04 05.
Article in English | MEDLINE | ID: mdl-30953486

ABSTRACT

BACKGROUND: Characterizing drug-protein interaction networks by their biological features has recently become an important challenge in pharmaceutical science, toward a better understanding of polypharmacology. RESULTS: We present a novel method for systematically analyzing the features underlying drug-protein interaction networks, which we call "drug-protein interaction signatures", by integrating large-scale heterogeneous data on drugs and proteins. We develop an efficient algorithm for extracting informative drug-protein interaction signatures from these integrated data, which is made possible by space-efficient representations of drug-protein pair fingerprints and sparsity-inducing classifiers. CONCLUSIONS: Our method infers a set of drug-protein interaction signatures consisting of associations among drug chemical substructures, adverse drug reactions, protein domains, biological pathways, and pathway modules. We argue that these signatures are biologically meaningful and useful for predicting unknown drug-protein interactions, and that they are expected to contribute to rational drug design.


Subjects
Computational Biology/methods , Pharmaceutical Preparations/metabolism , Proteins/metabolism , Logistic Models , Protein Binding
4.
Sci Rep ; 8(1): 156, 2018 01 09.
Article in English | MEDLINE | ID: mdl-29317676

ABSTRACT

Genome-wide identification of all target proteins of drug candidate compounds is a challenging issue in drug discovery. Moreover, emerging phenotypic effects, including therapeutic and adverse effects, are heavily dependent on the inhibition or activation of target proteins. Here we propose a novel computational method for predicting inhibitory and activatory targets of drug candidate compounds. Specifically, we integrated chemically induced and genetically perturbed gene expression profiles in human cell lines, which avoids dependence on the chemical structures of compounds or proteins. Predictive models for individual target proteins were constructed simultaneously by a joint learning algorithm based on transcriptomic changes in global gene expression patterns following chemical treatment and following protein knock-down and over-expression. This method discriminates between inhibitory and activatory targets and enables accurate identification of therapeutic effects. Herein, we comprehensively predicted drug-target-disease association networks for 1,124 drugs, 829 target proteins, and 365 human diseases, and validated some of these predictions in vitro. The proposed method is expected to facilitate the identification of new drug indications and potential adverse effects.


Subjects
Computational Biology/methods , Drug Design , Drug Discovery , Gene Expression Profiling , Gene Expression Regulation/drug effects , Transcriptome , Dose-Response Relationship, Drug , Drug Discovery/methods , Gene Regulatory Networks , Humans , Reproducibility of Results , Signal Transduction/drug effects
5.
Bioinformatics ; 32(12): i278-i287, 2016 06 15.
Article in English | MEDLINE | ID: mdl-27307627

ABSTRACT

MOTIVATION: Metabolic pathways are an important class of molecular networks consisting of compounds, enzymes and their interactions. Understanding global metabolic pathways is extremely important for various applications in ecology and pharmacology. However, large parts of metabolic pathways remain unknown, and most organism-specific pathways contain many missing enzymes. RESULTS: In this study we propose a novel method to predict the enzyme orthologs that catalyze putative reactions, in order to facilitate the de novo reconstruction of metabolic pathways from metabolome-scale compound sets. The algorithm detects the chemical transformation patterns of substrate-product pairs using chemical graph alignments. It then constructs a set of enzyme-specific classifiers in a joint learning framework to simultaneously predict all the enzyme orthologs that could catalyze the putative reactions of the substrate-product pairs. The originality of the method lies in its ability to make predictions for thousands of enzyme orthologs simultaneously, as well as in its extraction of enzyme-specific chemical transformation patterns of substrate-product pairs. We demonstrate the usefulness of the proposed method by applying it to tens of thousands of metabolic compounds. We also analyze the extracted chemical transformation patterns, which provide insights into the characteristics and specificities of enzymes. The proposed method will open the door to the study of both primary (central) and secondary metabolism in genomics research, increasing research productivity in tackling a wide variety of environmental and public health issues. CONTACT: maskot@bio.titech.ac.jp.


Subjects
Metabolic Networks and Pathways , Algorithms , Catalysis , Genomics , Metabolome
6.
Bioinformatics ; 31(12): i161-70, 2015 Jun 15.
Article in English | MEDLINE | ID: mdl-26072478

ABSTRACT

MOTIVATION: Recent advances in mass spectrometry and related metabolomics technologies have enabled the rapid and comprehensive analysis of numerous metabolites. However, biosynthetic and biodegradation pathways are known for only a small portion of metabolites, and most metabolic pathways remain uncharacterized. RESULTS: In this study, we developed a novel method for supervised de novo metabolic pathway reconstruction with an improved graph alignment-based approach in the reaction-filling framework. We proposed a novel chemical graph alignment algorithm, which we called PACHA (Pairwise Chemical Aligner), to detect the regioisomer-sensitive connectivities between the aligned substructures of two compounds. Unlike other existing graph alignment methods, PACHA can efficiently detect only one common subgraph between two compounds. Our results show that the proposed method outperforms previous descriptor-based methods and existing graph alignment-based methods in enzymatic reaction-likeness prediction for isomer-enriched reactions. It is also useful for reaction annotation that assigns potential reaction characteristics, such as EC (Enzyme Commission) numbers and PIERO (Enzymatic Reaction Ontology for Partial Information) terms, to substrate-product pairs. Finally, we conducted a comprehensive enzymatic reaction-likeness prediction for all possible uncharacterized compound pairs, suggesting potential metabolic pathways for newly predicted substrate-product pairs.


Subjects
Algorithms , Metabolic Networks and Pathways , Metabolomics/methods , Metabolome
7.
Bioinformatics ; 30(12): i165-74, 2014 Jun 15.
Article in English | MEDLINE | ID: mdl-24931980

ABSTRACT

MOTIVATION: Metabolic pathway analysis is crucial not only in metabolic engineering but also in rational drug design. However, biosynthetic/biodegradation pathways are known only for a small portion of metabolites, and a vast number of pathways remain uncharacterized. Therefore, an important challenge in metabolomics is the de novo reconstruction of potential reaction networks on a metabolome scale. RESULTS: In this article, we develop a novel method to predict multistep reaction sequences for the de novo reconstruction of metabolic pathways in the reaction-filling framework. We propose a supervised approach to learn what we refer to as 'multistep reaction sequence likeness', i.e. whether the two compounds of a pair can possibly be converted into each other by a sequence of enzymatic reactions. The algorithm uses a recursive procedure that applies step-specific classifiers to predict the intermediate compounds in multistep reaction sequences, based on chemical substructure fingerprints/descriptors of compounds. We further demonstrate the usefulness of our proposed method on the prediction of enzymatic reaction networks from a metabolome-scale compound set, and discuss characteristic features of the extracted chemical substructure transformation patterns in multistep reaction sequences. Our comprehensively predicted reaction networks help to fill metabolic gaps and to infer new reaction sequences in metabolic pathways. AVAILABILITY AND IMPLEMENTATION: Materials are freely available at http://web.kuicr.kyoto-u.ac.jp/supp/kot/ismb2014/


Subjects
Metabolic Networks and Pathways , Metabolome , Metabolomics/methods , Algorithms , Support Vector Machine
8.
Bioinformatics ; 29(13): i135-44, 2013 Jul 01.
Article in English | MEDLINE | ID: mdl-23812977

ABSTRACT

MOTIVATION: The metabolic pathway is an important biochemical reaction network involving enzymatic reactions among chemical compounds. However, it is assumed that a large number of metabolic pathways remain unknown, and many reactions are still missing even in known pathways. Therefore, the most important challenge in metabolomics is the automated de novo reconstruction of metabolic pathways, which includes the elucidation of previously unknown reactions to bridge the metabolic gaps. RESULTS: In this article, we develop a novel method to reconstruct metabolic pathways from a large compound set in the reaction-filling framework. Using chemical fingerprints, we define feature vectors that represent the chemical transformation patterns of compound-compound pairs in enzymatic reactions. We apply a sparsity-inducing classifier to learn what we refer to as 'enzymatic-reaction likeness', i.e. whether the compounds in a pair can possibly be converted into each other by enzymatic reactions. The originality of our method lies in three features. It searches for potential reactions among many compounds at a time, it extracts reaction-related chemical transformation patterns, and its computational efficiency makes it applicable at a large scale. We demonstrate the usefulness of our proposed method on the de novo reconstruction of 134 metabolic pathways in the Kyoto Encyclopedia of Genes and Genomes (KEGG). Our comprehensively predicted reaction networks of 15 698 compounds enable us to suggest many potential pathways and to increase research productivity in metabolomics. AVAILABILITY: Software is available on request. Supplementary material is available at http://web.kuicr.kyoto-u.ac.jp/supp/kot/ismb2013/.


Subjects
Metabolic Networks and Pathways , Metabolomics/methods , Algorithms , Enzymes/metabolism , Linear Models , Metabolome , Support Vector Machine
9.
BMC Syst Biol ; 7 Suppl 6: S2, 2013.
Article in English | MEDLINE | ID: mdl-24564846

ABSTRACT

BACKGROUND: To develop hypotheses about unknown metabolic pathways, biochemists frequently rely on literature that describes functional groups or substructures in free text. In computational chemistry or cheminformatics, molecules are typically represented by chemical descriptors, i.e., vectors that summarize information on their various properties. However, these chemical descriptors are difficult to interpret because they are not directly linked to the terminology of functional groups or substructures that biochemists use. METHODS: In this study, we used the KEGG Chemical Function (KCF) format to computationally describe biochemical substructures by seven attributes that resemble the way biochemists deal with substructures. RESULTS: We established the KCF-S (KCF-and-Substructures) format, which adds structural information to KCF. Applying KCF-S to various datasets of molecules revealed dataset-specific occurrences of substructures that characterize the respective datasets. Structure-based clustering of molecules using KCF-S produced clusters whose molecular weights and structures were less diverse than those of clusters obtained with conventional chemical fingerprints. We further applied KCF-S to find pairs of molecules that can possibly be converted into each other by enzymatic reactions, and KCF-S clearly improved predictive performance over that reported previously. CONCLUSIONS: KCF-S defines biochemical substructures while keeping them interpretable, suggesting its potential for wider application in chemical bioinformatics. KCF and KCF-S can be generated automatically from the Molfile format, so molecules from any data source can be handled.


Subjects
Computational Biology/methods , Cluster Analysis , Databases, Chemical , Enzymes/metabolism , Metabolic Networks and Pathways , Reproducibility of Results , Structure-Activity Relationship
10.
BMC Syst Biol ; 7 Suppl 6: S3, 2013.
Article in English | MEDLINE | ID: mdl-24564870

ABSTRACT

The identification of compound-protein interactions plays a key role in drug development, toward the discovery of new drug leads and new therapeutic protein targets. There is therefore a strong incentive to develop new, efficient methods for predicting compound-protein interactions on a genome-wide scale. In this paper we develop a novel chemogenomic method that uses minwise hashing to make scalable predictions of compound-protein interactions from heterogeneous biological data. The proposed method consists of two main steps: 1) construction of new compact fingerprints for compound-protein pairs by an improved minwise hashing algorithm, and 2) application of a sparsity-inducing classifier to the compact fingerprints. We test the ability of the proposed method to make large-scale predictions of compound-protein interactions from compound substructure fingerprints and protein domain fingerprints. It outperforms previous chemogenomic methods in prediction accuracy, computational efficiency, and interpretability of the predictive model. None of the previously developed methods is computationally feasible for the full dataset of about 200 million compound-protein pairs. The proposed method is expected to be useful for virtual screening of a huge number of compounds against many protein targets.
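Minwise hashing, the building block named in this abstract, can be sketched as follows. This is the textbook estimator, not the paper's improved variant. Each of k random hash functions keeps the minimum hash value over a set, and the fraction of agreeing minima between two sets estimates their Jaccard similarity. The hash family (a*x + b) mod p and the helper `pair_feature_ids` are assumptions for illustration. The helper encodes a compound-protein pair as a set of (substructure, domain) ids.

```python
import random
from typing import List, Sequence, Set, Tuple

PRIME = (1 << 31) - 1  # Mersenne prime used as the modulus of the hash family


def make_hash_params(k: int, seed: int = 0) -> List[Tuple[int, int]]:
    """Draw k hash functions h(x) = (a*x + b) mod PRIME with a != 0."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(PRIME)) for _ in range(k)]


def minhash_signature(features: Set[int], params: Sequence[Tuple[int, int]]) -> List[int]:
    """Minimum hash value of a non-empty feature set under each hash function.

    Feature ids must lie in [0, PRIME), so distinct ids get distinct hash values.
    """
    if not features:
        raise ValueError("minwise hashing needs a non-empty feature set")
    return [min((a * x + b) % PRIME for x in features) for a, b in params]


def estimate_jaccard(sig1: Sequence[int], sig2: Sequence[int]) -> float:
    """Fraction of agreeing minima, an estimate of |A & B| / |A | B|."""
    return sum(u == v for u, v in zip(sig1, sig2)) / len(sig1)


def pair_feature_ids(substructures: Set[int], domains: Set[int], n_domains: int) -> Set[int]:
    """Hypothetical encoding of a compound-protein pair as (substructure, domain) ids."""
    return {i * n_domains + j for i in substructures for j in domains}


if __name__ == "__main__":
    params = make_hash_params(256, seed=1)
    pair_a = pair_feature_ids({0, 3, 5}, {1, 2}, n_domains=10)
    pair_b = pair_feature_ids({0, 3, 7}, {1, 2}, n_domains=10)
    print(estimate_jaccard(minhash_signature(pair_a, params), minhash_signature(pair_b, params)))
```

The k-integer signature is the compact fingerprint. Its size is fixed regardless of how many substructure-domain combinations a pair contains, which is what makes very large pair sets tractable.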


Subjects
Algorithms , Computational Biology/methods , Drug Discovery/methods , Proteins/metabolism , Small Molecule Libraries/metabolism , Protein Binding , Proteins/chemistry , Small Molecule Libraries/chemistry , Support Vector Machine
11.
BMC Syst Biol ; 7 Suppl 6: S18, 2013.
Article in English | MEDLINE | ID: mdl-24565527

ABSTRACT

BACKGROUND: Most phenotypic effects of drugs involve interactions between drugs and their target proteins; however, our knowledge of the molecular mechanisms of drug-target interactions is very limited. One of the challenging issues in pharmaceutical science today is identifying the underlying molecular features that govern drug-target interactions. RESULTS: In this paper, we systematically analyze the correlations between drug side effects and protein domains, which we call "pharmacogenomic features", based on the drug-target interaction network. We detect drug side effects and protein domains that appear jointly in known drug-target interactions, which is made possible by using classifiers with sparse models. We show that the inferred pharmacogenomic features can be used to predict potential drug-target interactions. We also discuss the advantages and limitations of pharmacogenomic features compared with chemogenomic features, i.e., associations between drug chemical substructures and protein domains. CONCLUSION: The inferred side effect-domain association network is expected to be useful for estimating common drug side effects for different protein families and characteristic drug side effects for specific protein domains.


Subjects
Computational Biology/methods , Drug-Related Side Effects and Adverse Reactions/metabolism , Pharmaceutical Preparations/metabolism , Proteins/chemistry , Proteins/metabolism , Drug-Related Side Effects and Adverse Reactions/genetics , Molecular Targeted Therapy , Pharmacogenetics , Protein Binding , Protein Structure, Tertiary
12.
Bioinformatics ; 28(18): i487-i494, 2012 Sep 15.
Article in English | MEDLINE | ID: mdl-22962471

ABSTRACT

MOTIVATION: Drug effects are mainly caused by interactions between drug molecules and their target proteins, including primary targets and off-targets. Identifying the molecular mechanisms behind overall drug-target interactions is crucial in the drug design process. RESULTS: We develop a classifier-based approach to identify chemogenomic features (the underlying associations between drug chemical substructures and protein domains) that are involved in drug-target interaction networks. We propose a novel algorithm for extracting informative chemogenomic features by using L1-regularized classifiers over the tensor product space of possible drug-target pairs. We show that the proposed method can extract a very small number of chemogenomic features without losing performance in predicting drug-target interactions, and that the extracted features are biologically meaningful. The extracted substructure-domain association network enables us to suggest ligand chemical fragments specific to each protein domain and ligand core substructures important for a wide range of protein families. AVAILABILITY: Software is available at the supplementary website. CONTACT: yamanishi@bioreg.kyushu-u.ac.jp SUPPLEMENTARY INFORMATION: Datasets and all results are available at http://cbio.ensmp.fr/~yyamanishi/l1binary/
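A minimal sketch of the tensor-product construction with an L1-regularized classifier follows. It assumes binary fingerprints given as lists of "on" indices. A drug-target pair is represented by all (substructure, domain) combinations, i.e. the non-zero entries of the outer product of the two fingerprints. L1-regularized logistic regression, fitted here by proximal gradient descent (ISTA), keeps only a few non-zero pair weights, and those weights are the candidate substructure-domain associations. The logistic loss, the solver, and all function names are assumptions for illustration. The paper's classifier and training procedure may differ.

```python
import math
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[int, int]  # (drug substructure index, protein domain index)


def pair_features(substructures: Sequence[int], domains: Sequence[int]) -> List[Pair]:
    """Non-zero entries of the tensor (outer) product of two binary fingerprints."""
    return [(i, j) for i in substructures for j in domains]


def soft_threshold(v: float, t: float) -> float:
    """Proximal operator of t * |v|: shrinks v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def train_l1_logistic(examples: Sequence[List[Pair]], labels: Sequence[int], lam: float,
                      lr: float = 0.5, epochs: int = 300) -> Tuple[Dict[Pair, float], float]:
    """L1-regularized logistic regression fitted by proximal gradient descent (ISTA).

    Only non-zero weights are stored, so the result is a sparse set of
    substructure-domain associations. The intercept is not penalized.
    """
    n = len(examples)
    w: Dict[Pair, float] = {}
    bias = 0.0
    for _ in range(epochs):
        grad: Dict[Pair, float] = {}
        grad_bias = 0.0
        for feats, y in zip(examples, labels):
            z = bias + sum(w.get(f, 0.0) for f in feats)
            p = 1.0 / (1.0 + math.exp(-z))
            err = (p - y) / n  # derivative of the mean log-loss with respect to z
            grad_bias += err
            for f in feats:
                grad[f] = grad.get(f, 0.0) + err
        bias -= lr * grad_bias
        new_w: Dict[Pair, float] = {}
        for f in set(w) | set(grad):
            v = soft_threshold(w.get(f, 0.0) - lr * grad.get(f, 0.0), lr * lam)
            if v != 0.0:
                new_w[f] = v
        w = new_w
    return w, bias


def score(w: Dict[Pair, float], bias: float, substructures: Sequence[int], domains: Sequence[int]) -> float:
    """Predicted interaction probability for a drug-protein pair."""
    z = bias + sum(w.get(f, 0.0) for f in pair_features(substructures, domains))
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    examples = [pair_features([1, 2], [10]), pair_features([1, 3], [10]),
                pair_features([2, 3], [10]), pair_features([1, 2], [20])]
    labels = [1, 1, 0, 0]
    weights, b = train_l1_logistic(examples, labels, lam=0.05)
    print(sorted(weights.items(), key=lambda kv: -kv[1]))
```

The score is equivalent to x^T W y for a weight matrix W over substructures and domains. A larger `lam` empties more entries of W, which trades predictive fit for a shorter, more interpretable list of associations.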


Subjects
Algorithms , Drug Design , Pharmaceutical Preparations/chemistry , Protein Structure, Tertiary , Drug Delivery Systems , Humans , Ligands , Linear Models , Proteins/chemistry , Proteins/classification , Proteins/metabolism
13.
Proteins ; 80(3): 747-63, 2012 Mar.
Article in English | MEDLINE | ID: mdl-22113700

ABSTRACT

Computational investigation of protein functions is one of the most urgent and demanding tasks in structural bioinformatics. Exhaustive pairwise comparison of known and putative ligand-binding sites, across protein families and folds, is essential for elucidating the biological functions and evolutionary relationships of proteins. Given the vast amounts of data now available, existing 3D structural comparison methods are inadequate because of their computational time complexity. In this article, we propose a new bit-string representation of binding sites, called structural sketches, which is obtained by random projections of triplet descriptors. This representation allows the use of ultra-fast all-pairs similarity search methods for strings with strictly controlled error rates. Exhaustive comparison of 1.2 million known and putative binding sites finished in ∼30 h on a single core and yielded 88 million similar binding-site pairs. Careful investigation of 3.5 million pairs verified by TM-align revealed several notable analogous sites across distinct protein families or folds. In particular, we found highly plausible functions for several pockets via strong structural analogies. These results indicate that our method is a promising tool for the functional annotation of binding sites derived from structural genomics projects.


Subjects
Algorithms , Databases, Protein , Proteins/chemistry , Proteomics/methods , Binding Sites , Ligands , Models, Molecular
14.
Nucleic Acids Res ; 40(Database issue): D541-8, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22135290

ABSTRACT

Numerous potential ligand-binding sites are available today, along with hundreds of thousands of known binding sites observed in the PDB. Exhaustive similarity search over such a vast number of binding-site pairs is useful for predicting protein functions and for rapidly screening target proteins in drug design. Existing databases of ligand-binding sites are limited in scale; for example, SitesBase covers only ~33,000 known binding sites. Protein function inference and drug discovery, however, demand a much more comprehensive database that includes both known and putative binding sites. Using a novel algorithm, we conducted a large-scale all-pairs similarity search for 1.8 million known and potential binding sites in the PDB, and discovered over 14 million similar pairs of binding sites. Here, we present the results as a relational database, Pocket Similarity Search using Multiple-sketches (PoSSuM), which includes all discovered pairs with various types of annotations. PoSSuM enables rapid exploration of similar binding sites among structures with different global folds as well as among those with similar folds. Moreover, PoSSuM is useful for predicting the binding ligand for unbound structures, which provides important clues for characterizing protein structures with unclear functions. The PoSSuM database is freely available at http://possum.cbrc.jp/PoSSuM/.


Subjects
Databases, Protein , Ligands , Protein Conformation , Binding Sites , Drug Design , Molecular Sequence Annotation , User-Computer Interface
15.
Mol Inform ; 30(9): 801-7, 2011 Sep.
Article in English | MEDLINE | ID: mdl-27467412

ABSTRACT

Similarity networks of ligands are often reported to be useful for predicting chemical activities and target proteins. However, the naive method of computing all pairwise similarities of chemical fingerprints takes quadratic time, which is prohibitive for large-scale databases with millions of ligands. We propose a fast all-pairs similarity search method, called SketchSort, that maps chemical fingerprints to symbol strings with random projections and finds similar strings by multiple masked sorting. Because of the random projection, SketchSort misses a certain fraction of neighbors (i.e., false negatives). Nevertheless, the expected fraction of false negatives can be derived theoretically and kept very small. Experiments show that SketchSort is much faster than other similarity search methods and quickly produces a PubChem-scale similarity network.
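The two SketchSort steps named above can be approximated with a short sketch, under several assumptions. Fingerprints are real or binary vectors, and each sketch symbol is a single bit from a random Gaussian hyperplane (sign random projection). Under that scheme a bit differs with probability equal to the angle between the two vectors divided by π. Candidate pairs are found by masking blocks of the sketch and grouping the rest with a dictionary, which stands in for sorting. Function names are hypothetical, and SketchSort's actual symbol alphabet and false-negative bound are not reproduced here.

```python
import itertools
import math
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple


def make_projections(dim: int, n_bits: int, seed: int = 0) -> List[List[float]]:
    """n_bits random Gaussian hyperplanes in a dim-dimensional space."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def sketch(vec: Sequence[float], projections: Sequence[Sequence[float]]) -> Tuple[int, ...]:
    """One bit per hyperplane: which side of it the vector falls on."""
    return tuple(1 if sum(r * v for r, v in zip(row, vec)) >= 0.0 else 0 for row in projections)


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    return sum(x != y for x, y in zip(a, b))


def estimated_cosine(a: Sequence[int], b: Sequence[int]) -> float:
    """A bit differs with probability angle/pi, so this estimates cosine similarity."""
    return math.cos(math.pi * hamming(a, b) / len(a))


def similar_pairs(sketches: Sequence[Tuple[int, ...]], max_hamming: int,
                  n_blocks: int) -> Set[Tuple[int, int]]:
    """All index pairs (i < j) whose sketches differ in at most max_hamming bits.

    If two sketches differ in at most d bits, at most d blocks differ. So for
    some choice of d masked blocks the remaining blocks match exactly, and
    grouping by the unmasked blocks finds every such pair in sketch space.
    """
    length = len(sketches[0])
    step = math.ceil(length / n_blocks)
    bounds = [(s, min(s + step, length)) for s in range(0, length, step)]
    found: Set[Tuple[int, int]] = set()
    for masked in itertools.combinations(range(len(bounds)), min(max_hamming, len(bounds))):
        groups: Dict[tuple, List[int]] = defaultdict(list)
        for idx, sk in enumerate(sketches):
            key = tuple(sk[a:b] for k, (a, b) in enumerate(bounds) if k not in masked)
            groups[key].append(idx)
        for members in groups.values():
            for i, j in itertools.combinations(members, 2):
                if hamming(sketches[i], sketches[j]) <= max_hamming:
                    found.add((i, j))
    return found
```

Only pairs that land in the same group are compared bit by bit, so the full quadratic comparison is avoided whenever the groups are small. The approximation enters at the projection step, which can map two truly similar fingerprints to sketches more than `max_hamming` bits apart.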

16.
Bioinformatics ; 25(12): 1498-505, 2009 Jun 15.
Article in English | MEDLINE | ID: mdl-19376823

ABSTRACT

MOTIVATION: Non-coding RNAs (ncRNAs) show a unique evolutionary process in which substitutions of distant bases are correlated in order to conserve the secondary structure of the ncRNA molecule. Therefore, multiple alignment methods for detecting ncRNAs should take into account both the primary sequence and the secondary structure. Multiple alignment for ncRNA detection has recently been studied intensively; however, most proposed methods are designed for global multiple alignment. For this reason, these methods are not appropriate for identifying locally conserved ncRNAs among genomic sequences, and a more efficient local multiple alignment method for detecting ncRNAs is required. RESULTS: We propose a new local multiple alignment method for the detection of ncRNAs. This method uses a local multiple alignment construction procedure inspired by ProDA, a local multiple alignment program for protein sequences with repeated and shuffled elements. To align sequences based on secondary structure information, we propose a new alignment model that incorporates secondary structure features. We define the conditional probability of an alignment via a conditional random field and use a gamma-centroid estimator to align sequences. The locally aligned subsequences are clustered into blocks of approximately globally alignable subsequences between pairwise alignments. Finally, these blocks are multiply aligned via MXSCARNA. In benchmark experiments, we demonstrate the strong performance of the implemented software, SCARNA_LM, in local multiple alignment for ncRNA detection. AVAILABILITY: The C++ source code for SCARNA_LM and its experimental datasets are available at http://www.ncrna.org/software/scarna_lm/download. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
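The gamma-centroid decoding step can be sketched as a dynamic program. The sketch assumes the pairwise match probability matrix has already been computed; in the paper it comes from a conditional random field with secondary-structure features. The estimator picks a non-crossing set of aligned pairs that maximizes the sum of (γ + 1)p − 1, so only pairs with p > 1/(γ + 1) can contribute positively. The function name is hypothetical. The local block clustering and MXSCARNA steps are not covered.

```python
from typing import List, Sequence, Tuple


def gamma_centroid_alignment(match_prob: Sequence[Sequence[float]],
                             gamma: float) -> Tuple[List[Tuple[int, int]], float]:
    """Pairwise alignment maximizing the sum over aligned (i, j) of (gamma + 1) * P[i][j] - 1.

    match_prob[i][j] is the (assumed precomputed) posterior probability that
    position i of sequence 1 is aligned to position j of sequence 2. Gaps cost
    nothing, so a pair is worth aligning only if P[i][j] > 1 / (gamma + 1).
    """
    n = len(match_prob)
    m = len(match_prob[0]) if n else 0
    gain = [[(gamma + 1.0) * p - 1.0 for p in row] for row in match_prob]
    # F[i][j]: best total gain using the first i and first j positions.
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j], F[i][j - 1], F[i - 1][j - 1] + gain[i - 1][j - 1])
    # Trace back one optimal set of aligned pairs.
    pairs: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        if gain[i - 1][j - 1] > 0 and F[i][j] == F[i - 1][j - 1] + gain[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif F[i][j] == F[i - 1][j]:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return pairs, F[n][m]


if __name__ == "__main__":
    P = [[0.9, 0.05, 0.0],
         [0.1, 0.2, 0.8],
         [0.0, 0.7, 0.1]]
    print(gamma_centroid_alignment(P, gamma=1.0))
```

In this example, (2, 1) has probability 0.7, above the 0.5 threshold for γ = 1, but it crosses (1, 2), which has the higher probability of 0.8. The non-crossing constraint therefore excludes it. Raising γ lowers the threshold and favors more aligned pairs (sensitivity); lowering it favors fewer, more reliable pairs (precision).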


Subjects
Algorithms , Computational Biology/methods , RNA, Untranslated/chemistry , Sequence Alignment/methods , Sequence Analysis, RNA/methods , Base Sequence , Nucleic Acid Conformation
17.
Nucleic Acids Res ; 36(Web Server issue): W75-8, 2008 Jul 01.
Article in English | MEDLINE | ID: mdl-18440970

ABSTRACT

We present web servers for the analysis of non-coding RNA sequences on the basis of their secondary structures. Software tools for structural multiple sequence alignment, structural pairwise sequence alignment and structural motif finding are available from the integrated web server and from the individual stand-alone web servers. The servers are located at http://software.ncrna.org, along with information on evaluation and downloading. This website is freely available to all users and requires no login.


Subjects
RNA, Untranslated/chemistry , Sequence Alignment , Sequence Analysis, RNA , Software , Internet , Nucleic Acid Conformation
18.
BMC Bioinformatics ; 9: 33, 2008 Jan 23.
Article in English | MEDLINE | ID: mdl-18215258

ABSTRACT

BACKGROUND: Aligning multiple RNA sequences is essential for analyzing non-coding RNAs. Although many alignment methods for non-coding RNAs have been proposed, including Sankoff's algorithm for strict structural alignment, they are either inaccurate or computationally too expensive. Faster methods with reasonable accuracy are required for genome-scale analyses. RESULTS: We propose a fast algorithm for multiple structural alignment of RNA sequences that extends our pairwise structural alignment method (implemented in SCARNA). The accuracy of the implemented software, MXSCARNA, is at least as favorable as that of state-of-the-art algorithms that are much more expensive in computation time and memory. CONCLUSION: The proposed method for structural alignment of multiple RNA sequences is fast enough for large-scale analyses, with accuracy at least comparable to that of existing algorithms. The source code of MXSCARNA and its web server are available at http://mxscarna.ncrna.org.


Subjects
Algorithms , Chromosome Mapping/methods , Sequence Alignment/methods , Sequence Analysis, RNA/methods , Base Sequence , Molecular Sequence Data , Reproducibility of Results , Sensitivity and Specificity
19.
Bioinformatics ; 23(13): 1588-98, 2007 Jul 01.
Article in English | MEDLINE | ID: mdl-17459961

ABSTRACT

MOTIVATION: Structural RNA genes exhibit unique evolutionary patterns that conserve their secondary structures; these patterns should be taken into account when constructing accurate multiple alignments of RNA genes. The Sankoff algorithm is a natural alignment algorithm that includes the effect of base-pair covariation in the alignment model. However, its extremely high computational cost precludes its application to most RNA sequences. RESULTS: We propose an efficient algorithm for the multiple alignment of structural RNA sequences. Our algorithm is a variant of the Sankoff algorithm. It uses an efficient scoring system that considerably reduces the time and space requirements without compromising alignment quality. First, our algorithm computes the match probability matrix, which measures the alignability of each position pair between sequences, as well as the base-pairing probability matrix for each sequence. These probabilities are then combined to score the alignment using the Sankoff algorithm. By itself, our algorithm does not predict the consensus secondary structure of the alignment; it uses external programs for that prediction. Among the programs examined, ours gives both the highest alignment quality and the most accurate consensus secondary structure predictions from its alignments. We also demonstrate that our algorithm can align relatively long RNA sequences, such as eukaryotic-type signal recognition particle RNA, which is approximately 300 nt long; multiple alignment of such sequences has not been possible with other Sankoff-based algorithms. The algorithm is implemented in software named 'Murlet'. AVAILABILITY: The C++ source code of the Murlet software and the test dataset used in this study are available at http://www.ncrna.org/papers/Murlet/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms , RNA/genetics , Sequence Alignment/methods , Sequence Analysis, RNA/methods , Software , Base Sequence , Molecular Sequence Data
20.
Bioinformatics ; 22(14): 1723-9, 2006 Jul 15.
Article in English | MEDLINE | ID: mdl-16690634

ABSTRACT

MOTIVATION: The functions of non-coding RNAs are strongly related to their secondary structures, but secondary structure prediction from a single sequence is known to be unreliable. Therefore, to analyze a new non-coding RNA whose exact secondary structure is unknown, we must collect similar RNA sequences that share a common secondary structure. Accordingly, sequence comparison for finding similar RNAs should consider not only sequence similarity but also potential secondary structures. Sankoff's algorithm predicts the common secondary structures of the sequences, but it is computationally too expensive for large-scale analyses. Because we often want to compare a large number of cDNA sequences or to search for similar RNAs in whole-genome sequences, much faster algorithms are required. RESULTS: We propose a new method for comparing RNA sequences based on structural alignment of fixed-length fragments of stem candidates. The implemented software, SCARNA (Stem Candidate Aligner for RNAs), is fast enough to apply to long sequences in large-scale analyses. The accuracy of its alignments is better than or comparable to that of much slower existing algorithms. AVAILABILITY: The web server of SCARNA with a graphical structural alignment viewer is available at http://www.scarna.org/.


Subjects
Algorithms , Pattern Recognition, Automated/methods , RNA/genetics , Sequence Alignment/methods , Sequence Analysis, RNA/methods , Software , Artificial Intelligence , Base Sequence , Molecular Sequence Data , Sequence Homology, Nucleic Acid