Results 1 - 8 of 8
1.
BMC Bioinformatics ; 20(Suppl 12): 318, 2019 Jun 20.
Article in English | MEDLINE | ID: mdl-31216986

ABSTRACT

BACKGROUND: Identifying motifs (recurrent and statistically significant patterns) in biological networks is key to understanding the design principles and inferring the governing mechanisms of biological systems. This, however, is a computationally challenging task. The task is further complicated because biological interactions depend on limited resources; that is, a reaction takes place only if the reactant molecule concentrations are above a certain threshold level. This biochemical property implies that a network edge can participate in only a limited number of motifs simultaneously. Existing motif counting methods ignore this constraint. This simplification often leads to inaccurate motif counts (over- or under-estimates) and thus to wrong biological interpretations. RESULTS: In this paper, we develop a novel motif counting algorithm, Partially Overlapping MOtif Counting (POMOC), that takes the capacity level of every interaction into account when counting motifs. CONCLUSIONS: Our experiments on real and synthetic networks demonstrate that motif counts obtained with POMOC differ significantly from those of existing motif counting approaches, and that our method scales to large biological networks in practical time. Our results also show that our method makes it possible to characterize the impact of different stress factors on the organization of the cell's network. In this regard, analysis of an S. cerevisiae transcriptional regulatory network using our method shows that oxidative stress disrupts the organization and abundance of motifs in this network more than mutations of individual genes do. Our analysis also suggests that, by focusing on the edges that lead to variation in motif counts, our method can be used to find important genes and to reveal subtle topological and functional differences between biological networks under different cell states.
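To make the capacity constraint concrete, here is a minimal Python sketch. It is not POMOC: it assumes the motif is a three-node feed-forward loop (x->y, y->z, x->z), that each edge has an integer capacity (a hypothetical `capacity` dictionary, defaulting to 1), and it greedily accepts motif instances while every edge they use still has capacity left. Greedy acceptance yields a feasible count under the capacities, not necessarily the maximum one.

```python
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]
Motif = Tuple[str, str, str]


def feed_forward_loops(edges: Iterable[Edge]) -> List[Motif]:
    """Enumerate feed-forward loops x->y, y->z, x->z on three distinct nodes."""
    succ: Dict[str, Set[str]] = {}
    for u, v in edges:
        succ.setdefault(u, set()).add(v)
    loops = []
    for x, outs in succ.items():
        for y in outs:
            for z in succ.get(y, set()):
                if z in outs and len({x, y, z}) == 3:
                    loops.append((x, y, z))
    return sorted(loops)  # deterministic order for the greedy pass


def capacity_limited_count(edges: Iterable[Edge],
                           capacity: Dict[Edge, int],
                           default: int = 1) -> int:
    """Greedily count motif instances so that no edge is used more often
    than its capacity allows (hypothetical model, not POMOC)."""
    edges = list(edges)
    remaining = {e: capacity.get(e, default) for e in edges}
    count = 0
    for x, y, z in feed_forward_loops(edges):
        used = [(x, y), (y, z), (x, z)]
        if all(remaining[e] > 0 for e in used):
            for e in used:
                remaining[e] -= 1  # this instance consumes one unit per edge
            count += 1
    return count
```

For the edges a->b, b->c, a->c, b->d, a->d there are two feed-forward loops, but both use edge a->b, so with the default capacity of 1 only one instance is counted; an unconstrained counter would report two.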


Subjects
Gene Regulatory Networks/genetics; Saccharomyces cerevisiae/genetics; Algorithms; Databases, Genetic; Genes, Fungal; Models, Biological; Oxidative Stress/genetics
2.
Bioinformatics ; 30(12): i96-104, 2014 Jun 15.
Article in English | MEDLINE | ID: mdl-24932011

ABSTRACT

MOTIVATION: Major disorders, such as leukemia, have been shown to alter the transcription of genes. Understanding how gene regulation is affected by such aberrations is of utmost importance. One promising strategy toward this objective is to compute whether signals can reach the transcription factors through the transcription regulatory network (TRN). Due to the uncertainty of the regulatory interactions, this is a #P-complete problem, and thus solving it for very large TRNs remains a challenge. RESULTS: We develop a novel and scalable method to compute the probability that a signal originating at any given set of source genes can arrive at any given set of target genes (i.e., transcription factors) when the topology of the underlying signaling network is uncertain. Our method tackles this problem for large networks while providing a provably accurate result. It follows a divide-and-conquer strategy: we break the given network down into a sequence of non-overlapping subnetworks such that reachability can be computed autonomously and sequentially on each subnetwork. We represent each interaction using a small polynomial. The product of these polynomials expresses the different scenarios in which a signal can or cannot reach the target genes from the source genes. We introduce polynomial collapsing operators for each subnetwork. These operators dramatically reduce the size of the resulting polynomial and thus the computational complexity. We show that our method scales to entire human regulatory networks in only seconds, while existing methods fail beyond a few tens of genes and interactions. We demonstrate that our method can successfully characterize key reachability properties of the entire transcription regulatory networks of patients affected by eight different subtypes of leukemia, as well as those of healthy control samples.
AVAILABILITY: All the datasets and code used in this article are available at bioinformatics.cise.ufl.edu/PReach/scalable.htm.
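The quantity being computed can be stated exactly with a brute-force Python sketch that enumerates all 2^m present/absent combinations of m uncertain edges and sums the probabilities of those in which the signal arrives. This is only a reference definition for tiny networks (its exponential cost is what the divide-and-conquer method is designed to avoid). It assumes that edges are present independently with the given probabilities and that the signal "arrives" if it reaches at least one target gene.

```python
from collections import deque
from itertools import product
from typing import Iterable, List, Set, Tuple

Edge = Tuple[str, str]


def reaches(edges: Iterable[Edge], sources: Set[str], targets: Set[str]) -> bool:
    """Breadth-first search: is any target reachable from any source?"""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    seen = set(sources)
    queue = deque(sources)
    while queue:
        u = queue.popleft()
        if u in targets:
            return True
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return False


def reach_probability_bruteforce(prob_edges: List[Tuple[Edge, float]],
                                 sources: Set[str], targets: Set[str]) -> float:
    """Sum the probabilities of all edge subsets in which the signal arrives.
    Assumes independent edges; cost is O(2^m) in the number of edges m."""
    total = 0.0
    for mask in product((True, False), repeat=len(prob_edges)):
        weight = 1.0
        present = []
        for (edge, p), keep in zip(prob_edges, mask):
            weight *= p if keep else 1.0 - p
            if keep:
                present.append(edge)
        if reaches(present, sources, targets):
            total += weight
    return total
```

For a diamond s->a->t, s->b->t with every edge present with probability 0.5, each path survives with probability 0.25, so the result is 1 - (1 - 0.25)^2 = 0.4375.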


Subjects
Gene Regulatory Networks; Algorithms; Computational Biology/methods; Gene Expression Regulation; Humans; Leukemia/genetics; Leukemia/metabolism; Signal Transduction; Transcription Factors/genetics
3.
IEEE/ACM Trans Comput Biol Bioinform ; 20(5): 3093-3105, 2023.
Article in English | MEDLINE | ID: mdl-37276117

ABSTRACT

The plight of navigating high-dimensional transcription datasets remains a persistent problem. This problem is further amplified for complex disorders such as cancer, as these disorders are often multigenic traits in which multiple subsets of genes collectively affect the type, stage, and severity of the trait. We are often faced with a trade-off between reducing the dimensionality of our datasets and maintaining the integrity of our data. To accomplish both tasks simultaneously for the very high-dimensional transcriptomes of complex multigenic traits, we propose a new supervised technique, Class Separation Transformation (CST). CST accomplishes both tasks by significantly reducing the dimensionality of the input space into a one-dimensional transformed space that provides optimal separation between the differing classes. Furthermore, CST offers a means of explainable machine learning (ML), as it computes the relative importance of each feature for its contribution to class distinction, which can lead to deeper insights and discovery. We compare our method with existing state-of-the-art methods using both real and synthetic datasets, demonstrating that CST is more accurate, robust, scalable, and computationally advantageous than existing methods. Code used in this paper is available at https://github.com/richiebailey74/CST.
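The abstract does not spell out CST's objective. As a point of comparison only, the classical two-class Fisher linear discriminant also maps each sample to a single number chosen to separate the classes, and, when features are on comparable scales, the magnitude of each weight can be read as a rough per-feature importance. The pure-Python sketch below implements that well-known technique (not CST); the small `ridge` term is an added assumption that keeps the within-class scatter matrix invertible.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def solve(A: Matrix, b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x


def fisher_direction(X: Sequence[Sequence[float]], y: Sequence[int],
                     ridge: float = 1e-6) -> List[float]:
    """Two-class Fisher discriminant: w = (S_W + ridge * I)^-1 (mu_1 - mu_0)."""
    d = len(X[0])
    groups = {c: [list(x) for x, label in zip(X, y) if label == c] for c in (0, 1)}
    means = {c: [sum(col) / len(rows) for col in zip(*rows)] for c, rows in groups.items()}
    # Within-class scatter matrix, with a small ridge added to the diagonal.
    S = [[ridge if i == j else 0.0 for j in range(d)] for i in range(d)]
    for c, rows in groups.items():
        for x in rows:
            diff = [xi - mi for xi, mi in zip(x, means[c])]
            for i in range(d):
                for j in range(d):
                    S[i][j] += diff[i] * diff[j]
    return solve(S, [m1 - m0 for m1, m0 in zip(means[1], means[0])])


def project(X: Sequence[Sequence[float]], w: List[float]) -> List[float]:
    """Map each sample to one dimension: its dot product with w."""
    return [sum(xi * wi for xi, wi in zip(x, w)) for x in X]


def feature_importance(w: List[float]) -> List[float]:
    """Relative importance as normalized absolute weights."""
    total = sum(abs(v) for v in w)
    return [abs(v) / total for v in w]
```

In a toy dataset where only the first feature differs between classes and the second is within-class noise, the projected values of the two classes do not overlap and almost all of the importance falls on the first feature.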


Subjects
Transcriptome; Phenotype
4.
Article in English | MEDLINE | ID: mdl-34398763

ABSTRACT

MOTIVATION: In bioinformatics, modeling complex cellular systems and simulating their behavior to identify significant molecular interactions is considered a relevant problem. Traditional methods model such complex systems using a single, binary network. However, this model is inadequate to represent biological networks, as different sets of interactions can take place simultaneously under different interaction constraints (such as transcription regulation and protein interaction). Furthermore, biological systems may exhibit varying interaction topologies even for the same interaction type under different developmental stages or stress conditions. Therefore, models that consider a biological system as a single, solitary network of interactions are inaccurate, as they fail to capture the complex behavior of cellular interactions within organisms. Identifying and counting recurrent motifs within a network is one of the fundamental problems in biological network analysis. Existing motif counting methods for single network topologies are inadequate to capture patterns of molecular interactions that show significant changes in biological expression across similar organisms, or even across time-varying networks within the same organism. That is, they fail to identify recurrent interactions because they consider a single snapshot of a network among a set of multiple networks. Therefore, we need methods geared towards studying multiple network topologies and the pattern conservation among them. CONTRIBUTIONS: In this paper, we consider the problem of counting the number of instances of a user-supplied motif topology in a given multilayer network. We model interactions among a set of entities (e.g., genes) describing various conditions or temporal variation as multilayer networks. Each layer is thus a separate network that shows the connectivity of the nodes under a unique network state.
Existing motif counting and identification methods are limited to single network topologies and thus cannot be applied directly to multilayer networks. We apply our model and algorithm to study frequent patterns in cellular networks that are common across cellular states under different stress conditions, where the cellular network topology under each stress condition constitutes a unique network layer. RESULTS: We develop a methodology and a corresponding algorithm based on the proposed model for motif counting in multilayer networks. We performed experiments on both real and synthetic datasets. We generated the synthetic datasets under a wide spectrum of parameters, such as network size, density, and motif frequency. Results on synthetic datasets demonstrate that our algorithm finds motif embeddings with very high accuracy compared to existing state-of-the-art methods such as G-tries, ESU (FANMOD), and mfinder. Furthermore, we observe that our method runs from several times to several orders of magnitude faster than existing methods. For experiments on real data, we consider the Escherichia coli (E. coli) transcription regulatory network under different experimental conditions. We observe that the genes selected by our method conserve functional characteristics under various stress conditions with very low false discovery rates. Moreover, the method scales to real networks in terms of both network size and number of layers.
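A simplified, hypothetical version of the multilayer counting task can be sketched in Python: represent each layer as a set of directed edges over the same node set, enumerate feed-forward-loop instances (x->y, y->z, x->z) in every layer, and keep the instances (the same node triple) that occur in at least `min_layers` layers. This illustrates the problem setting only; it is not the authors' algorithm, and it assumes the motif of interest is the feed-forward loop.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]
Motif = Tuple[str, str, str]


def feed_forward_loops(edges: Iterable[Edge]) -> Set[Motif]:
    """All feed-forward loops x->y, y->z, x->z on three distinct nodes."""
    succ: Dict[str, Set[str]] = {}
    for u, v in edges:
        succ.setdefault(u, set()).add(v)
    return {
        (x, y, z)
        for x, outs in succ.items()
        for y in outs
        for z in succ.get(y, set())
        if z in outs and len({x, y, z}) == 3
    }


def conserved_motifs(layers: List[Set[Edge]], min_layers: int) -> List[Motif]:
    """Motif instances present in at least `min_layers` layers."""
    counts: Counter = Counter()
    for layer in layers:
        counts.update(feed_forward_loops(layer))  # each instance counted once per layer
    return sorted(m for m, c in counts.items() if c >= min_layers)
```

With three layers in which the loop a->b->c appears twice and b->c->d only once, requiring two layers keeps only (a, b, c), while requiring one layer keeps both instances.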


Subjects
Escherichia coli; Gene Regulatory Networks; Algorithms; Computational Biology/methods; Escherichia coli/genetics; Gene Regulatory Networks/genetics
5.
Article in English | MEDLINE | ID: mdl-26357078

ABSTRACT

Extra-cellular molecules trigger a response inside the cell by initiating a signal at special membrane receptors (i.e., sources), which is then transmitted to reporters (i.e., targets) through various chains of interactions among proteins. Understanding whether such a signal can reach from membrane receptors to reporters is essential in studying the cell's response to extra-cellular events. This problem is drastically complicated by the unreliability of the interaction data. In this paper, we develop a novel method, called PReach (Probabilistic Reachability), that precisely computes the probability that a signal can reach from a given collection of receptors to a given collection of reporters when the underlying signaling network is uncertain. This is a very difficult computational problem with no known polynomial-time solution. PReach represents each uncertain interaction as a bivariate polynomial and thereby transforms the reachability problem into a polynomial multiplication problem. We introduce novel polynomial collapsing operators that associate polynomial terms with possible paths between sources and targets, as well as with the cuts that separate sources from targets. These operators significantly shrink the number of polynomial terms and thus the running time. PReach has much better time complexity than recent solutions to this problem. Our experimental results on real data sets demonstrate that this improvement leads to an orders-of-magnitude reduction in running time over the most recent methods. AVAILABILITY: All the data sets used, the software implemented, and the alignments found in this paper are available at http://bioinformatics.cise.ufl.edu/PReach/.
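The polynomial view can be illustrated with a simplified Python sketch. Each uncertain edge e with probability p contributes a factor p*x_e + (1 - p)*y_e, where x_e marks "present" and y_e marks "absent"; expanding the product yields one term per network topology, and each term's coefficient is that topology's probability. The sketch multiplies the factors in one edge at a time and applies two simple collapsing rules in the spirit of the abstract: a term whose present edges already contain a source-to-target path is merged into a single "reached" bucket, and a term whose absent edges already form a cut is merged into a "blocked" bucket. These rules are an illustrative assumption, not PReach's actual operators; the sketch also assumes independent edges and that reaching any one target counts.

```python
from collections import deque
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Edge = Tuple[str, str]
Term = Tuple[FrozenSet[Edge], FrozenSet[Edge]]  # (present edges, absent edges)


def reaches(edges: Iterable[Edge], sources: Set[str], targets: Set[str]) -> bool:
    """Breadth-first search: is any target reachable from any source?"""
    adj: Dict[str, List[str]] = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    seen = set(sources)
    queue = deque(sources)
    while queue:
        u = queue.popleft()
        if u in targets:
            return True
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return False


def reach_probability(prob_edges: List[Tuple[Edge, float]],
                      sources: Set[str], targets: Set[str]) -> Tuple[float, float]:
    """Expand prod_e (p_e * x_e + (1 - p_e) * y_e) edge by edge, collapsing
    decided terms. Returns (P[signal arrives], P[signal blocked]), summing to 1."""
    if reaches((), sources, targets):
        return 1.0, 0.0  # a source is itself a target
    all_edges = frozenset(e for e, _ in prob_edges)
    terms: Dict[Term, float] = {(frozenset(), frozenset()): 1.0}
    reached = 0.0  # collapsed mass of terms that contain a source-target path
    blocked = 0.0  # collapsed mass of terms whose absent edges form a cut
    for edge, p in prob_edges:
        expanded: Dict[Term, float] = {}
        for (present, absent), coef in terms.items():
            # Multiply the term by p * x_e (edge present) and (1 - p) * y_e (absent).
            for new_present, new_absent, weight in (
                (present | {edge}, absent, p),
                (present, absent | {edge}, 1.0 - p),
            ):
                mass = coef * weight
                if reaches(new_present, sources, targets):
                    reached += mass  # path collapse: undecided edges no longer matter
                elif not reaches(all_edges - new_absent, sources, targets):
                    blocked += mass  # cut collapse: even all undecided edges cannot help
                else:
                    key = (new_present, new_absent)
                    expanded[key] = expanded.get(key, 0.0) + mass
        terms = expanded
    return reached, blocked
```

Once every edge has been multiplied in, each remaining term is decided, so all probability mass ends up in one of the two buckets; the collapsing simply decides many terms early, before the full product is expanded.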


Subjects
Computational Biology/methods; Models, Biological; Models, Statistical; Signal Transduction; Algorithms; Software
6.
Pac Symp Biocomput ; : 111-22, 2013.
Article in English | MEDLINE | ID: mdl-23424117

ABSTRACT

Discovering signaling pathways in protein interaction networks is a key ingredient in understanding how proteins carry out cellular functions. These interactions, however, can be uncertain events that may or may not take place depending on many factors, including internal factors, such as the size and abundance of the proteins, and external factors, such as mutations, disorders, and drug intake. In this paper, we consider the problem of finding causal orderings of nodes in such protein interaction networks to discover signaling pathways. We adopt the color-coding technique to address this problem. The color-coding method may fail with some probability; by allowing it to run for sufficient time, however, its confidence in the optimality of the result can converge close to 100%. Our key contribution in this paper is the elimination of the key conservative assumptions made by traditional color-coding methods when computing their success probability. We do this by carefully establishing the relationship between node colors, network topology, and success probability. As a result, our method converges to any confidence value much faster than the traditional methods. Thus, it scales to larger protein interaction networks and longer signaling pathways than existing methods. We demonstrate, both theoretically and experimentally, that our method outperforms existing methods.
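For reference, the traditional color-coding scheme that the abstract builds on can be sketched in Python: color every node uniformly at random with k colors, use dynamic programming over (node, set of used colors) to detect a colorful path of k nodes (which is necessarily simple), and repeat. A fixed simple k-node path becomes colorful with probability k!/k^k, so about ln(1 - confidence) / ln(1 - k!/k^k) independent trials reach the requested confidence. That is the conservative bound the paper improves on; the sketch does not implement the improved analysis, and it only tests whether a directed k-node path exists rather than scoring pathways by interaction probability.

```python
import math
import random
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Edge = Tuple[str, str]


def trials_needed(k: int, confidence: float) -> int:
    """Trials so that a fixed k-node path is colorful at least once w.p. >= confidence."""
    p = math.factorial(k) / k ** k
    if p >= 1.0:
        return 1
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))


def colorful_path_exists(adj: Dict[str, List[str]], k: int,
                         colors: Dict[str, int]) -> bool:
    """DP: sets[v] holds the color sets of colorful paths ending at v."""
    sets: Dict[str, Set[FrozenSet[int]]] = {v: {frozenset([colors[v]])} for v in adj}
    for _ in range(k - 1):  # each round extends every path by one edge
        nxt: Dict[str, Set[FrozenSet[int]]] = {v: set() for v in adj}
        for u in adj:
            for used in sets[u]:
                for v in adj[u]:
                    if colors[v] not in used:
                        nxt[v].add(used | {colors[v]})
        sets = nxt
    return any(sets[v] for v in adj)


def has_k_node_path(edges: Iterable[Edge], k: int,
                    confidence: float = 0.99, seed: int = 0) -> bool:
    """One-sided Monte Carlo test: True is always correct; False may be wrong
    with probability at most 1 - confidence."""
    adj: Dict[str, List[str]] = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, [])
    rng = random.Random(seed)
    for _ in range(trials_needed(k, confidence)):
        colors = {v: rng.randrange(k) for v in adj}
        if colorful_path_exists(adj, k, colors):
            return True
    return False
```

For k = 2 the per-trial success probability is 2/4 = 0.5, so 99% confidence needs 7 trials; for longer pathways the k!/k^k factor shrinks quickly, which is why a tighter success-probability analysis pays off.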


Subjects
Protein Interaction Mapping/statistics & numerical data; Protein Interaction Maps; Signal Transduction; Algorithms; Animals; Color; Computational Biology; Computer Graphics; Computer Simulation; Databases, Protein; Humans; Models, Biological; Models, Statistical; Probability; Rats; Uncertainty
7.
Article in English | MEDLINE | ID: mdl-23702548

ABSTRACT

Interactions between molecules are probabilistic events. An interaction may or may not happen with some probability, depending on a variety of factors such as the size, abundance, or proximity of the interacting molecules. In this paper, we consider the problem of aligning two biological networks. Unlike existing methods, we allow one of the two networks to contain probabilistic interactions. Allowing interaction probabilities makes the alignment more biologically relevant, at the expense of explosive growth in the number of alternative topologies that may arise from the different subsets of interactions that take place. We develop a novel method that efficiently and precisely characterizes this massive search space. We represent the topological similarity between pairs of aligned molecules (i.e., proteins) using random variables and compute their expected values. We validate our method by showing that, without sacrificing running time performance, it can produce novel alignments. Our results also demonstrate that our method identifies biologically meaningful mappings under a comprehensive set of criteria used in the literature, as well as under the statistical coherence measure that we developed to analyze the statistical significance of the functional similarity of the aligned protein pairs.
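One elementary way to compute an expected topological score without enumerating topologies is linearity of expectation. The hypothetical Python sketch below scores a candidate alignment (a node mapping from a deterministic network into a probabilistic one) by the expected number of conserved edges, which is simply the sum of the probabilities of the mapped edges. It is not the authors' similarity measure; it assumes undirected networks, and because it relies only on linearity of expectation it needs no independence assumption between interactions.

```python
from typing import Dict, FrozenSet, Iterable, Tuple

Edge = Tuple[str, str]


def expected_conserved_edges(det_edges: Iterable[Edge],
                             prob_edges: Dict[Edge, float],
                             mapping: Dict[str, str]) -> float:
    """E[# deterministic edges whose image exists in the probabilistic network]
    = sum over deterministic edges of the image edge's probability."""
    # Undirected: key each probabilistic edge by its unordered endpoint pair.
    prob: Dict[FrozenSet[str], float] = {frozenset(e): p for e, p in prob_edges.items()}
    total = 0.0
    for u, v in det_edges:
        if u in mapping and v in mapping:
            total += prob.get(frozenset((mapping[u], mapping[v])), 0.0)
    return total
```

Comparing candidate mappings by this score favors alignments that map edges onto likely interactions, while the cost stays linear in the number of edges rather than exponential in the number of uncertain interactions.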


Subjects
Computational Biology/methods; Gene Regulatory Networks; Metabolic Networks and Pathways; Models, Biological; Models, Statistical; Algorithms; Animals; Humans; Reproducibility of Results
8.
Article in English | MEDLINE | ID: mdl-24334390

ABSTRACT

UNLABELLED: Biological interactions are often uncertain events that may or may not take place with some probability. This uncertainty leads to a massive number of alternative interaction topologies for each such network. Existing studies analyze the degree distribution of biological networks by assuming that all the given interactions take place under all circumstances. This strong and often incorrect assumption can lead to misleading results. In this paper, we address this problem and develop a sound mathematical basis to characterize networks in the presence of uncertain interactions. Using our mathematical representation, we develop a method that can accurately describe the degree distribution of such networks. We also take one more step and extend our method to accurately compute the joint-degree distributions of node pairs connected by edges. The number of possible network topologies grows exponentially with the number of uncertain interactions. However, the mathematical model we develop allows us to compute these degree distributions in time polynomial in the number of interactions. Our method works quickly even for entire protein-protein interaction (PPI) networks. It also helps us find an adequate mathematical model using maximum likelihood estimation (MLE). We perform a comparative study of node-degree and joint-degree distributions in two types of biological networks: classical deterministic networks and the more flexible probabilistic networks. Our results confirm that power-law and log-normal models best describe degree distributions for both probabilistic and deterministic networks. Moreover, the inverse correlation of the degrees of neighboring nodes shows that, in probabilistic networks, nodes with a large number of interactions prefer to interact with nodes with a small number of interactions more frequently than expected. We also show that probabilistic networks are more robust than deterministic ones for node-degree distribution computation.
AVAILABILITY: All the data sets used, the software implemented, and the alignments found in this paper are available at http://bioinformatics.cise.ufl.edu/projects/probNet/.
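Assuming each uncertain interaction occurs independently, a node's degree is a sum of independent Bernoulli variables, so its distribution (a Poisson binomial distribution) can be computed by a simple dynamic program in O(d^2) time for a node with d candidate interactions, instead of enumerating 2^d topologies. The Python sketch below shows that standard computation and averages it over nodes to obtain a network-wide degree distribution (the expected fraction of nodes with each degree). It is one natural route to polynomial-time degree distributions, not necessarily the paper's formulation.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def degree_distribution(probs: List[float]) -> List[float]:
    """P[degree = k] for a node whose incident edges exist independently."""
    dist = [1.0]  # before any edge is considered, the degree is 0 with certainty
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1.0 - p)  # edge absent: degree unchanged
            nxt[k + 1] += mass * p      # edge present: degree grows by one
        dist = nxt
    return dist


def network_degree_distribution(prob_edges: Dict[Edge, float]) -> List[float]:
    """Average of the per-node degree distributions (undirected network)."""
    incident: Dict[str, List[float]] = defaultdict(list)
    for (u, v), p in prob_edges.items():
        incident[u].append(p)
        incident[v].append(p)
    max_degree = max(len(ps) for ps in incident.values())
    avg = [0.0] * (max_degree + 1)
    for ps in incident.values():
        for k, mass in enumerate(degree_distribution(ps)):
            avg[k] += mass / len(incident)
    return avg
```

For a node with two candidate interactions of probability 0.5 each, the degree is 0, 1, or 2 with probabilities 0.25, 0.5, and 0.25, whereas a deterministic reading of the same network would report degree 2 with certainty.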


Subjects
Computational Biology/methods; Metabolic Networks and Pathways; Models, Biological; Models, Statistical; Animals; Gene Regulatory Networks; Protein Interaction Maps; Signal Transduction