Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 20 de 21
Filtrar
1.
Artigo em Inglês | MEDLINE | ID: mdl-32175871

RESUMO

Graph models often give us a deeper understanding of real-world networks. In the case of biological networks they help in predicting the evolution and history of biomolecule interactions, provided we map properly real networks into the corresponding graph models. In this paper, we show that for biological graph models many of the existing parameter estimation techniques overlook the critical property of graph symmetry (also known formally as graph automorphisms), thus the estimated parameters give statistically insignificant results concerning the observed network. To demonstrate it and to develop accurate estimation procedures, we focus on the biologically inspired duplication-divergence model, and the up-to-date data of protein-protein interactions of seven species including human and yeast. Using exact recurrence relations of some prominent graph statistics, we devise a parameter estimation technique that provides the right order of symmetries and uses phylogenetically old proteins as the choice of seed graph nodes. We also find that our results are consistent with the ones obtained from maximum likelihood estimation (MLE). However, the MLE approach is significantly slower than our methods in practice.


Assuntos
Biologia Computacional/métodos , Modelos Genéticos , Mapas de Interação de Proteínas/genética , Algoritmos , Animais , Evolução Molecular , Duplicação Gênica/genética , Humanos , Camundongos , Saccharomyces cerevisiae/genética
2.
J Pharm Sci ; 109(4): 1467-1472, 2020 04.
Artigo em Inglês | MEDLINE | ID: mdl-31978383

RESUMO

Methods that determine the relative purity of biopharmaceuticals represent the most widely used form of analysis for the pharmaceutical industry. The ability to rapidly assess method capability or the uncertainty of measurements under actual use conditions continues to present significant challenges. We have refined and applied the model of Uncertainty Based on Current Information to predict the precision of the purity measurements and compared the predicted precision to the measured variability for several different types of purity methods. The measured method variability was derived from the analysis of data sets ranging from hundreds to thousands of measurements for each different method type. The predicted precision was found to be in excellent agreement with the statistically obtained values with R2 = 0.94. This demonstration of concurrence between the predicted and actual precision provides an opportunity for streamlining laborious conventional (statically derived) assessment of method precision and leveraging the Uncertainty Based on Current Information model utilizing much smaller data sets or even a single experiment.


Assuntos
Incerteza , Cromatografia Líquida de Alta Pressão , Reprodutibilidade dos Testes
3.
IEEE Trans Inf Theory ; 66(8): 5003-5021, 2020.
Artigo em Inglês | MEDLINE | ID: mdl-33746243

RESUMO

The von Neumann entropy, named after John von Neumann, is an extension of the classical concept of entropy to the field of quantum mechanics. From a numerical perspective, von Neumann entropy can be computed simply by computing all eigenvalues of a density matrix, an operation that could be prohibitively expensive for large-scale density matrices. We present and analyze three randomized algorithms to approximate von Neumann entropy of real density matrices: our algorithms leverage recent developments in the Randomized Numerical Linear Algebra (RandNLA) literature, such as randomized trace estimators, provable bounds for the power method, and the use of random projections to approximate the eigenvalues of a matrix. All three algorithms come with provable accuracy guarantees and our experimental evaluations support our theoretical findings showing considerable speedup with small loss in accuracy.

4.
Sci Rep ; 9(1): 3057, 2019 02 28.
Artigo em Inglês | MEDLINE | ID: mdl-30816140

RESUMO

The problem of reverse-engineering the evolution of a dynamic network, known broadly as network archaeology, is one of profound importance in diverse application domains. In analysis of infection spread, it reveals the spatial and temporal processes underlying infection. In analysis of biomolecular interaction networks (e.g., protein interaction networks), it reveals early molecules that are known to be differentially implicated in diseases. In economic networks, it reveals flow of capital and associated actors. Beyond these recognized applications, it provides analytical substrates for novel studies - for instance, on the structural and functional evolution of the human brain connectome. In this paper, we model, formulate, and rigorously analyze the problem of inferring the arrival order of nodes in a dynamic network from a single snapshot. We derive limits on solutions to the problem, present methods that approach this limit, and demonstrate the methods on a range of applications, from inferring the evolution of the human brain connectome to conventional citation and social networks, where ground truth is known.


Assuntos
Algoritmos , Disseminação de Informação , Análise de Sistemas , Conectoma , Humanos , Redes Sociais Online , Mapas de Interação de Proteínas
5.
IEEE Trans Inf Theory ; 64(9): 6070-6080, 2018 Sep.
Artigo em Inglês | MEDLINE | ID: mdl-31537945

RESUMO

Compression schemes for advanced data structures have become a central modern challenge. Information theory has traditionally dealt with conventional data such as text, images, or video. In contrast, most data available today is multitype and context-dependent. To meet this challenge, we have recently initiated a systematic study of advanced data structures such as unlabeled graphs [8]. In this paper, we continue this program by considering trees with statistically correlated vertex names. Trees come in many forms, but here we deal with binary plane trees (where order of subtrees matters) and their non-plane version (where order of subtrees doesn't matter). Furthermore, we assume that each name is generated by a known memoryless source (horizontal independence), but a symbol of a vertex name depends in a Markovian sense on the corresponding symbol of the parent vertex name (vertical Markovian dependency). Such a model is closely connected to models of phylogenetic trees. While in general the problem of multimodal compression and associated analysis can be extremely complicated, we find that in this natural setting, both the entropy analysis and optimal compression are analytically tractable. We evaluate the entropy for both types of trees. For the plane case, with or without vertex names, we find that a simple two-stage compression scheme is both efficient and optimal. We then present efficient and optimal compression algorithms for the more complicated non-plane case.

6.
Proc IEEE Inst Electr Electron Eng ; 105(2): 286-305, 2017 Feb.
Artigo em Inglês | MEDLINE | ID: mdl-28943650

RESUMO

We rigorously study a channel that maps sequences from a finite alphabet to self-avoiding walks in the two-dimensional grid, inspired by a model of protein folding from statistical physics and studied empirically by biophysicists. This channel, which we call the Boltzmann sequence-structure channel, is characterized by a Boltzmann/Gibbs distribution with a free parameter corresponding to temperature. In our previous work, we verified empirically that the channel capacity appears to have a phase transition for small temperature and decays to zero for high temperature. In this paper, we make some progress toward theoretically explaining these phenomena. We first estimate the conditional entropy between the input sequence and the output fold, giving an upper bound which exhibits a phase transition with respect to temperature. Next, we formulate a class of parameter settings under which the dependence between walk energies is governed by their number of shared contacts. In this setting, we derive a lower bound on the conditional entropy. This lower bound allows us to conclude that the mutual information tends to zero in a nontrivial regime of high temperature, giving some support to the empirical fact regarding capacity. Finally, we construct an example setting of the parameters of the model for which the conditional entropy is exactly calculable and which does not exhibit a phase transition.

7.
IEEE Trans Mol Biol Multiscale Commun ; 2(1): 92-106, 2016 Jun.
Artigo em Inglês | MEDLINE | ID: mdl-28868335

RESUMO

Nanopore sequencers are emerging as promising new platforms for high-throughput sequencing. As with other technologies, sequencer errors pose a major challenge for their effective use. In this paper, we present a novel information theoretic analysis of the impact of insertion-deletion (indel) errors in nanopore sequencers. In particular, we consider the following problems: (i) for given indel error characteristics and rate, what is the probability of accurate reconstruction as a function of sequence length; (ii) using replicated extrusion (the process of passing a DNA strand through the nanopore), what is the number of replicas needed to accurately reconstruct the true sequence with high probability? Our results provide a number of important insights: (i) the probability of accurate reconstruction of a sequence from a single sample in the presence of indel errors tends quickly (i.e., exponentially) to zero as the length of the sequence increases; and (ii) replicated extrusion is an effective technique for accurate reconstruction. We show that for typical distributions of indel errors, the required number of replicas is a slow function (polylogarithmic) of sequence length - implying that through replicated extrusion, we can sequence large reads using nanopore sequencers. Moreover, we show that in certain cases, the required number of replicas can be related to information-theoretic parameters of the indel error distributions.

8.
Sci Rep ; 5: 8166, 2015 Feb 23.
Artigo em Inglês | MEDLINE | ID: mdl-25703447

RESUMO

Distributions of protein families and folds in genomes are highly skewed, having a small number of prevalent superfamiles/superfolds and a large number of families/folds of a small size. Why are the distributions of protein families and folds skewed? Why are there only a limited number of protein families? Here, we employ an information theoretic approach to investigate the protein sequence-structure relationship that leads to the skewed distributions. We consider that protein sequences and folds constitute an information theoretic channel and computed the most efficient distribution of sequences that code all protein folds. The identified distributions of sequences and folds are found to follow a power law, consistent with those observed for proteins in nature. Importantly, the skewed distributions of sequences and folds are suggested to have different origins: the skewed distribution of sequences is due to evolutionary pressure to achieve efficient coding of necessary folds, whereas that of folds is based on the thermodynamic stability of folds. The current study provides a new information theoretic framework for proteins that could be widely applied for understanding protein sequences, structures, functions, and interactions.


Assuntos
Modelos Teóricos , Proteínas/química , Algoritmos , Sequência de Aminoácidos , Dobramento de Proteína , Estrutura Terciária de Proteína , Proteínas/metabolismo , Termodinâmica
9.
Pharm Res ; 29(12): 3404-19, 2012 Dec.
Artigo em Inglês | MEDLINE | ID: mdl-22893253

RESUMO

PURPOSE: To predict precision and other performance characteristics of chromatographic purity methods, which represent the most widely used form of analysis in the biopharmaceutical industry. METHODS: We have conducted a comprehensive survey of purity methods, and show that all performance characteristics fall within narrow measurement ranges. This observation was used to develop a model called Uncertainty Based on Current Information (UBCI), which expresses these performance characteristics as a function of the signal and noise levels, hardware specifications, and software settings. RESULTS: We applied the UCBI model to assess the uncertainty of purity measurements, and compared the results to those from conventional qualification. We demonstrated that the UBCI model is suitable to dynamically assess method performance characteristics, based on information extracted from individual chromatograms. CONCLUSIONS: The model provides an opportunity for streamlining qualification and validation studies by implementing a "live validation" of test results utilizing UBCI as a concurrent assessment of measurement uncertainty. Therefore, UBCI can potentially mitigate the challenges associated with laborious conventional method validation and facilitates the introduction of more advanced analytical technologies during the method lifecycle.


Assuntos
Cromatografia/métodos , Incerteza , Modelos Químicos , Modelos Estatísticos , Software
10.
Genomics ; 93(4): 314-22, 2009 Apr.
Artigo em Inglês | MEDLINE | ID: mdl-19084590

RESUMO

We developed a computational model to explore the hypothesis that regulatory instructions are context dependent and conveyed through specific 'codes' in human genomic DNA. We provide examples of correlation of computational predictions to reported mapped DNase I hypersensitive segments in the HOXA locus in human chromosome 7. The examples show that statistically significant 9-mers from promoter regions may occur in sequences near and upstream of transcription initiation sites, in intronic regions, and within intergenic regions. Additionally, a subset of 9-mers from coding sequences appears frequently, as clusters, in regulatory regions dispersed in noncoding regions in genomic DNA. The results suggest that the computational model has the potential of decoding regulatory instructions to discover candidate transcription factor binding sites and to discover candidate epigenetic signals that appear in both coding and regulatory regions of genes.


Assuntos
Biologia Computacional/métodos , DNA/química , Sequências Reguladoras de Ácido Nucleico/genética , Sítios de Ligação , Cromossomos Humanos Par 7/genética , DNA/metabolismo , Genoma Humano , Proteínas de Homeodomínio/genética , Humanos , Regiões Promotoras Genéticas/genética
11.
Pac Symp Biocomput ; : 153-65, 2008.
Artigo em Inglês | MEDLINE | ID: mdl-18229683

RESUMO

Integrating molecular interaction data with existing knowledge of molecular function reveals mechanisms that underly cellular organization. We present NARADA, a software tool that implements a comprehensive analysis suite for functional annotation of pathways. NARADA takes as input a species-specific molecular interaction network and annotation of biomolecules in the network and provides the user with a set of pathways composed of functional attributes, which may be thought of pathway templates in the functional annotation space that recur in various contexts (different groups of specific molecules with similar functional annotation patterns) in the molecular interaction network. NARADA has its underpinnings in formal statistical measures of significance, and algorithmic bases for performance. Comprehensive evaluation on the E. coli transcriptional regulation and proteinprotein interaction data demonstrate NARADA's ability to detect known, as well as novel pathways.


Assuntos
Biologia Molecular/estatística & dados numéricos , Software , Algoritmos , Biologia Computacional , Escherichia coli/genética , Escherichia coli/metabolismo , Proteínas de Escherichia coli/metabolismo , Genes Bacterianos , Genômica/estatística & dados numéricos , Modelos Biológicos , Modelos Estatísticos , Mapeamento de Interação de Proteínas/estatística & dados numéricos , Transcrição Gênica , Interface Usuário-Computador
12.
Theor Comput Sci ; 395(2-3): 203-219, 2008 May 01.
Artigo em Inglês | MEDLINE | ID: mdl-19169438

RESUMO

We study the entropy rate of a hidden Markov process (HMP) defined by observing the output of a binary symmetric channel whose input is a first-order binary Markov process. Despite the simplicity of the models involved, the characterization of this entropy is a long standing open problem. By presenting the probability of a sequence under the model as a product of random matrices, one can see that the entropy rate sought is equal to a top Lyapunov exponent of the product. This offers an explanation for the elusiveness of explicit expressions for the HMP entropy rate, as Lyapunov exponents are notoriously difficult to compute. Consequently, we focus on asymptotic estimates, and apply the same product of random matrices to derive an explicit expression for a Taylor approximation of the entropy rate with respect to the parameter of the binary symmetric channel. The accuracy of the approximation is validated against empirical simulation results. We also extend our results to higher-order Markov processes and to Rényi entropies of any order.

13.
J Comput Biol ; 14(6): 747-64, 2007.
Artigo em Inglês | MEDLINE | ID: mdl-17691892

RESUMO

Comparative analyses of cellular interaction networks enable understanding of the cell's modular organization through identification of functional modules and complexes. These techniques often rely on topological features such as connectedness and density, based on the premise that functionally related proteins are likely to interact densely and that these interactions follow similar evolutionary trajectories. Significant recent work has focused on efficient algorithms for identification of such functional modules and their conservation. In spite of algorithmic advances, development of a comprehensive infrastructure for interaction databases is in relative infancy compared to corresponding sequence analysis tools. One critical, and as yet unresolved aspect of this infrastructure is a measure of the statistical significance of a match, or a dense subcomponent. In the absence of analytical measures, conventional methods rely on computationally expensive simulations based on ad-hoc models for quantifying significance. In this paper, we present techniques for analytically quantifying statistical significance of dense components in reference model graphs. We consider two reference models--a G(n, p) model in which each pair of nodes in a graph has an identical likelihood, p, of sharing an edge, and a two-level G(n, p) model, which accounts for high-degree hub nodes generally observed in interaction networks. Experiments performed on a rich collection of protein interaction (PPI) networks show that the proposed model provides a reliable means of evaluating statistical significance of dense patterns in these networks. We also adapt existing state-of-the-art network clustering algorithms by using our statistical significance measure as an optimization criterion. Comparison of the resulting module identification algorithm, SIDES, with existing methods shows that SIDES outperforms existing algorithms in terms of sensitivity and specificity of identified clusters with respect to available GO annotations.


Assuntos
Biologia Computacional , Mapeamento de Interação de Proteínas , Proteínas/química , Proteômica , Algoritmos , Sequência Conservada , Bases de Dados de Proteínas , Proteínas/metabolismo
14.
Bioinformatics ; 23(13): i377-86, 2007 Jul 01.
Artigo em Inglês | MEDLINE | ID: mdl-17646320

RESUMO

MOTIVATION: Standardized annotations of biomolecules in interaction networks (e.g. Gene Ontology) provide comprehensive understanding of the function of individual molecules. Extending such annotations to pathways is a critical component of functional characterization of cellular signaling at the systems level. RESULTS: We propose a framework for projecting gene regulatory networks onto the space of functional attributes using multigraph models, with the objective of deriving statistically significant pathway annotations. We first demonstrate that annotations of pairwise interactions do not generalize to indirect relationships between processes. Motivated by this result, we formalize the problem of identifying statistically overrepresented pathways of functional attributes. We establish the hardness of this problem by demonstrating the non-monotonicity of common statistical significance measures. We propose a statistical model that emphasizes the modularity of a pathway, evaluating its significance based on the coupling of its building blocks. We complement the statistical model by an efficient algorithm and software, Narada, for computing significant pathways in large regulatory networks. Comprehensive results from our methods applied to the Escherichia coli transcription network demonstrate that our approach is effective in identifying known, as well as novel biological pathway annotations. AVAILABILITY: Narada is implemented in Java and is available at http://www.cs.purdue.edu/homes/jpandey/narada/.


Assuntos
Algoritmos , Documentação/métodos , Regulação da Expressão Gênica/fisiologia , Modelos Biológicos , Proteoma/metabolismo , Transdução de Sinais/fisiologia , Simulação por Computador , Software
15.
Artigo em Inglês | MEDLINE | ID: mdl-18301721

RESUMO

Questions of understanding and quantifying the representation and amount of information in organisms have become a central part of biological research, as they potentially hold the key to fundamental advances. In this paper, we demonstrate the use of information-theoretic tools for the task of identifying segments of biomolecules (DNA or RNA) that are statistically correlated. We develop a precise and reliable methodology, based on the notion of mutual information, for finding and extracting statistical as well as structural dependencies. A simple threshold function is defined, and its use in quantifying the level of significance of dependencies between biological segments is explored. These tools are used in two specific applications. First, they are used for the identification of correlations between different parts of the maize zmSRp32 gene. There, we find significant dependencies between the 5' untranslated region in zmSRp32 and its alternatively spliced exons. This observation may indicate the presence of as-yet unknown alternative splicing mechanisms or structural scaffolds. Second, using data from the FBI's combined DNA index system (CODIS), we demonstrate that our approach is particularly well suited for the problem of discovering short tandem repeats-an application of importance in genetic profiling.

16.
J Comput Biol ; 13(7): 1299-322, 2006 Sep.
Artigo em Inglês | MEDLINE | ID: mdl-17037960

RESUMO

Molecular interaction data plays an important role in understanding biological processes at a modular level by providing a framework for understanding cellular organization, functional hierarchy, and evolutionary conservation. As the quality and quantity of network and interaction data increases rapidly, the problem of effectively analyzing this data becomes significant. Graph theoretic formalisms, commonly used for these analysis tasks, often lead to computationally hard problems due to their relation to subgraph isomorphism. This paper presents an innovative new algorithm, MULE, for detecting frequently occurring patterns and modules in biological networks. Using an innovative graph simplification technique based on ortholog contraction, which is ideally suited to biological networks, our algorithm renders these problems computationally tractable and scalable to large numbers of networks. We show, experimentally, that our algorithm can extract frequently occurring patterns in metabolic pathways and protein interaction networks from the KEGG, DIP, and BIND databases within seconds. When compared to existing approaches, our graph simplification technique can be viewed either as a pruning heuristic, or a closely related, but computationally simpler task. When used as a pruning heuristic, we show that our technique reduces effective graph sizes significantly, accelerating existing techniques by several orders of magnitude! Indeed, for most of the test cases, existing techniques could not even be applied without our pruning step. When used as a stand-alone analysis technique, MULE is shown to convey significant biological insights at near-interactive rates. The software, sample input graphs, and detailed results for comprehensive analysis of nine eukaryotic PPI networks are available at www.cs.purdue.edu/homes/koyuturk/mule.


Assuntos
Algoritmos , Evolução Molecular , Mapeamento de Interação de Proteínas/métodos , Animais , Bases de Dados de Proteínas , Humanos , Ligação Proteica , Homologia de Sequência
17.
J Comput Biol ; 13(2): 182-99, 2006 Mar.
Artigo em Inglês | MEDLINE | ID: mdl-16597234

RESUMO

With an ever-increasing amount of available data on protein-protein interaction (PPI) networks and research revealing that these networks evolve at a modular level, discovery of conserved patterns in these networks becomes an important problem. Although available data on protein-protein interactions is currently limited, recently developed algorithms have been shown to convey novel biological insights through employment of elegant mathematical models. The main challenge in aligning PPI networks is to define a graph theoretical measure of similarity between graph structures that captures underlying biological phenomena accurately. In this respect, modeling of conservation and divergence of interactions, as well as the interpretation of resulting alignments, are important design parameters. In this paper, we develop a framework for comprehensive alignment of PPI networks, which is inspired by duplication/divergence models that focus on understanding the evolution of protein interactions. We propose a mathematical model that extends the concepts of match, mismatch, and gap in sequence alignment to that of match, mismatch, and duplication in network alignment and evaluates similarity between graph structures through a scoring function that accounts for evolutionary events. By relying on evolutionary models, the proposed framework facilitates interpretation of resulting alignments in terms of not only conservation but also divergence of modularity in PPI networks. Furthermore, as in the case of sequence alignment, our model allows flexibility in adjusting parameters to quantify underlying evolutionary relationships. Based on the proposed model, we formulate PPI network alignment as an optimization problem and present fast algorithms to solve this problem. Detailed experimental results from an implementation of the proposed framework show that our algorithm is able to discover conserved interaction patterns very effectively, in terms of both accuracies and computational cost.


Assuntos
Algoritmos , Proteínas de Caenorhabditis elegans/metabolismo , Proteínas de Drosophila/metabolismo , Mapeamento de Interação de Proteínas , Proteoma/metabolismo , Proteínas de Saccharomyces cerevisiae/metabolismo , Animais , Caenorhabditis elegans/metabolismo , Simulação por Computador , Sequência Conservada , Drosophila melanogaster/metabolismo , Evolução Molecular , Saccharomyces cerevisiae/metabolismo , Alinhamento de Sequência , Transdução de Sinais/fisiologia
18.
Int J Bioinform Res Appl ; 1(1): 3-17, 2005.
Artigo em Inglês | MEDLINE | ID: mdl-18048118

RESUMO

The biological world is highly stochastic and inhomogeneous in its behaviour. There are regions in DNA with a high concentration of G or C bases; stretches of sequences with an abundance of CG dinucleotide (CpG islands); coding regions with strong periodicity-of-three pattern, and so forth. Transitions between these regions of DNA, known also as change points, carry important biological information. Computational methods used to identify these homogeneous regions are called segmentations. Viewing a DNA sequence as a non-stationary process, we apply recent novel techniques of universal source coding to discover stationary (possibly recurrent) segments. In particular, the Stein-Ziv lemma is adopted to find an asymptotically optimal discriminant function that determines whether two DNA segments are generated by the same source assuring exponentially small false positives. Next, we use the Minimum Description Length (MDL) principle to select parameters that lead to a linear-time segmentation algorithm. We apply our algorithm to human chromosome 9 and chromosome 20 to discover coding and noncoding regions, starting positions of genes, as well as the beginning of CpG islands.


Assuntos
Biologia Computacional/métodos , DNA/química , Análise de Sequência de DNA/métodos , Algoritmos , Sequência de Bases , Sítios de Ligação , Mapeamento Cromossômico , Cromossomos Humanos Par 20/genética , Cromossomos Humanos Par 9/genética , Ilhas de CpG , Feminino , Humanos , Masculino , Modelos Genéticos , Modelos Estatísticos , Dados de Sequência Molecular , Mutação , Ribossomos/química , Fatores Sexuais , Processos Estocásticos
19.
Genomics ; 84(6): 929-40, 2004 Dec.
Artigo em Inglês | MEDLINE | ID: mdl-15533710

RESUMO

Central to reconstruction of cis-regulatory networks is identification and classification of naturally occurring transcription factor-binding sites according to the genes that they control. We have examined salient characteristics of 9-mers that occur in various orders and combinations in the proximal promoters of human genes. In evaluations of a dataset derived with respect to experimentally defined transcription initiation sites, in some cases we observed a clear correspondence of highly ranked 9-mers with protein-binding sites in genomic DNA. Evaluations of the larger dataset, derived with respect to the 5' end of human ESTs, revealed that a subset of the highly ranked 9-mers corresponded to sites for several known transcription factor families (including CREB, ETS, EGR-1, SP1, KLF, MAZ, HIF-1, and STATs) that play important roles in the regulation of vertebrate genes. We identified several highly ranked CpG-containing 9-mers, defining sites for interactions with the CREB and ETS families of proteins, and identified potential target genes for these proteins. The results of the studies imply that the CpG-containing transcription factor-binding sites regulate the expression of genes with important roles in pathways leading to cell-type-specific gene expression and pathways controlled by the complex networks of signaling systems.


Assuntos
Biologia Computacional , Regulação da Expressão Gênica , Regiões Promotoras Genéticas/genética , Elementos de Resposta/genética , Sítio de Iniciação de Transcrição , Transcrição Gênica/genética , Regiões 5' não Traduzidas/genética , Sequência de Bases , Sítios de Ligação , Bases de Dados Genéticas , Humanos , Dados de Sequência Molecular , Ligação Proteica , Fatores de Transcrição/metabolismo
20.
Bioinformatics ; 20 Suppl 1: i200-7, 2004 Aug 04.
Artigo em Inglês | MEDLINE | ID: mdl-15262800

RESUMO

MOTIVATION: With rapidly increasing amount of network and interaction data in molecular biology, the problem of effectively analyzing this data is an important one. Graph theoretic formalisms, commonly used for these analysis tasks, often lead to computationally hard problems due to their relation with subgraph isomorphism. RESULTS: This paper presents an innovative new algorithm for detecting frequently occurring patterns and modules in biological networks. Using an innovative graph simplification technique, which is ideally suited to biological networks, our algorithm renders these problems computationally tractable. Indeed, we show experimentally that our algorithm can extract frequently occurring patterns in metabolic pathways extracted from the KEGG database within seconds. The proposed model and algorithm are applicable to a variety of biological networks either directly or with minor modifications. AVAILABILITY: Implementation of the proposed algorithms in the C programming language is available as open source at http://www.cs.purdue.edu/homes/koyuturk/pathway/


Assuntos
Algoritmos , Perfilação da Expressão Gênica/métodos , Expressão Gênica/fisiologia , Modelos Biológicos , Mapeamento de Interação de Proteínas/métodos , Proteoma/metabolismo , Transdução de Sinais/fisiologia , Simulação por Computador , Software
SELEÇÃO DE REFERÊNCIAS
DETALHE DA PESQUISA