Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 9 de 9
Filtrar
Mais filtros

Base de dados
Tipo de documento
Intervalo de ano de publicação
1.
Brief Bioinform ; 24(2)2023 03 19.
Artigo em Inglês | MEDLINE | ID: mdl-36738256

RESUMO

Many problems in life sciences can be brought back to a comparison of graphs. Even though a multitude of such techniques exist, often, these assume prior knowledge about the partitioning or the number of clusters and fail to provide statistical significance of observed between-network heterogeneity. Addressing these issues, we developed an unsupervised workflow to identify groups of graphs from reliable network-based statistics. In particular, we first compute the similarity between networks via appropriate distance measures between graphs and use them in an unsupervised hierarchical algorithm to identify classes of similar networks. Then, to determine the optimal number of clusters, we recursively test for distances between two groups of networks. The test itself finds its inspiration in distance-wise ANOVA algorithms. Finally, we assess significance via the permutation of between-object distance matrices. Notably, the approach, which we will call netANOVA, is flexible since users can choose multiple options to adapt to specific contexts and network types. We demonstrate the benefits and pitfalls of our approach via extensive simulations and an application to two real-life datasets. NetANOVA achieved high performance in many simulation scenarios while controlling type I error. On non-synthetic data, comparison against state-of-the-art methods showed that netANOVA is often among the top performers. There are many application fields, including precision medicine, for which identifying disease subtypes via individual-level biological networks improves prevention programs, diagnosis and disease monitoring.


Assuntos
Algoritmos , Análise por Conglomerados , Simulação por Computador , Fluxo de Trabalho , Análise de Variância
2.
Algorithms Mol Biol ; 19(1): 17, 2024 Apr 29.
Artigo em Inglês | MEDLINE | ID: mdl-38679703

RESUMO

The graph traversal edit distance (GTED), introduced by Ebrahimpour Boroojeny et al. (2018), is an elegant distance measure defined as the minimum edit distance between strings reconstructed from Eulerian trails in two edge-labeled graphs. GTED can be used to infer evolutionary relationships between species by comparing de Bruijn graphs directly without the computationally costly and error-prone process of genome assembly. Ebrahimpour Boroojeny et al. (2018) propose two ILP formulations for GTED and claim that GTED is polynomially solvable because the linear programming relaxation of one of the ILPs always yields optimal integer solutions. The claim that GTED is polynomially solvable is contradictory to the complexity results of existing string-to-graph matching problems. We resolve this conflict in complexity results by proving that GTED is NP-complete and showing that the ILPs proposed by Ebrahimpour Boroojeny et al. do not solve GTED but instead solve for a lower bound of GTED and are not solvable in polynomial time. In addition, we provide the first two, correct ILP formulations of GTED and evaluate their empirical efficiency. These results provide solid algorithmic foundations for comparing genome graphs and point to the direction of heuristics. The source code to reproduce experimental results is available at https://github.com/Kingsford-Group/gtednewilp/ .

3.
Biomimetics (Basel) ; 7(1)2022 Feb 21.
Artigo em Inglês | MEDLINE | ID: mdl-35225919

RESUMO

Metabolic pathways provide key information for achieving a better understanding of life and all its processes; this is useful information for the improvement of medicine, agronomy, pharmacy, and other similar areas. The main analysis tool used to study these pathways is based on pathway comparison, using graph data structures. Metabolic pathway comparison has been defined as a computationally complex task. In a previous work, two new algorithms were introduced to treat the problem of metabolic pathway pairwise comparison. Here we provide an extended analysis with more data and a deeper analysis of metabolic pathway comparison as listed in the discussion and results section.

4.
Interface Focus ; 11(4): 20200066, 2021 Jun.
Artigo em Inglês | MEDLINE | ID: mdl-34123355

RESUMO

Alignments of discrete objects can be constructed in a very general setting as super-objects from which the constituent objects are recovered by means of projections. Here, we focus on contact maps, i.e. undirected graphs with an ordered set of vertices. These serve as natural discretizations of RNA and protein structures. In the general case, the alignment problem for vertex-ordered graphs is NP-complete. In the special case of RNA secondary structures, i.e. crossing-free matchings, however, the alignments have a recursive structure. The alignment problem then can be solved by a variant of the Sankoff algorithm in polynomial time. Moreover, the tree or forest alignments of RNA secondary structure can be understood as the alignments of ordered edge sets.

5.
Netw Neurosci ; 4(3): 507-527, 2020.
Artigo em Inglês | MEDLINE | ID: mdl-32885113

RESUMO

Graph theoretical approach has proved an effective tool to understand, characterize, and quantify the complex brain network. However, much less attention has been paid to methods that quantitatively compare two graphs, a crucial issue in the context of brain networks. Comparing brain networks is indeed mandatory in several network neuroscience applications. Here, we discuss the current state of the art, challenges, and a collection of analysis tools that have been developed in recent years to compare brain networks. We first introduce the graph similarity problem in brain network application. We then describe the methodological background of the available metrics and algorithms of comparing graphs, their strengths, and limitations. We also report results obtained in concrete applications from normal brain networks. More precisely, we show the potential use of brain network similarity to build a "network of networks" that may give new insights into the object categorization in the human brain. Additionally, we discuss future directions in terms of network similarity methods and applications.

6.
J Comput Biol ; 27(3): 436-439, 2020 03.
Artigo em Inglês | MEDLINE | ID: mdl-32160033

RESUMO

Graph Traversal Edit Distance (GTED) is a measure of distance (or dissimilarity) between two graphs introduced. This measure is based on the minimum edit distance between two strings formed by the edge labels of respective Eulerian traversals of the two graphs. GTED was motivated by and provides the first mathematical formalism for sequence coassembly and de novo variation detection in bioinformatics. Many problems in applied machine learning deal with graphs (also called networks), including social networks, security, web data mining, protein function prediction, and genome informatics. The kernel paradigm beautifully decouples the learning algorithm from the underlying geometric space, which renders graph kernels important for the aforementioned applications. In this article, we introduce a tool, PyGTED to compute GTED. It implements the algorithm based on the polynomial time algorithm devised for it by the authors. Informally, the GTED is the minimum edit distance between two strings formed by the edge labels of respective Eulerian traversals of the two graphs.


Assuntos
Biologia Computacional/métodos , Mineração de Dados , Aprendizado de Máquina , Programação Linear , Software
7.
J Comput Biol ; 27(3): 317-329, 2020 03.
Artigo em Inglês | MEDLINE | ID: mdl-32058803

RESUMO

Many problems in applied machine learning deal with graphs (also called networks), including social networks, security, web data mining, protein function prediction, and genome informatics. The kernel paradigm beautifully decouples the learning algorithm from the underlying geometric space, which renders graph kernels important for the aforementioned applications. In this article, we give a new graph kernel, which we call graph traversal edit distance (GTED). We introduce the GTED problem and give the first polynomial time algorithm for it. Informally, the GTED is the minimum edit distance between two strings formed by the edge labels of respective Eulerian traversals of the two graphs. Also, GTED is motivated by and provides the first mathematical formalism for sequence co-assembly and de novo variation detection in bioinformatics. We demonstrate that GTED admits a polynomial time algorithm using a linear program in the graph product space that is guaranteed to yield an integer solution. To the best of our knowledge, this is the first approach to this problem. We also give a linear programming relaxation algorithm for a lower bound on GTED. We use GTED as a graph kernel and evaluate it by computing the accuracy of a support vector machine (SVM) classifier on a few data sets in the literature. Our results suggest that our kernel outperforms many of the common graph kernels in the tested data sets. As a second set of experiments, we successfully cluster viral genomes using GTED on their assembly graphs obtained from de novo assembly of next-generation sequencing reads.


Assuntos
Biologia Computacional/métodos , Programação Linear , Algoritmos , Animais , Mineração de Dados , Sequenciamento de Nucleotídeos em Larga Escala , Humanos , Máquina de Vetores de Suporte
8.
Appl Netw Sci ; 3(1): 2, 2018.
Artigo em Inglês | MEDLINE | ID: mdl-30839726

RESUMO

Most real networks are too large or they are not available for real time analysis. Therefore, in practice, decisions are made based on partial information about the ground truth network. It is of great interest to have metrics to determine if an inferred network (the partial information network) is similar to the ground truth. In this paper we develop a test for similarity between the inferred and the true network. Our research utilizes a network visualization tool, which systematically discovers a network, producing a sequence of snapshots of the network. We introduce and test our metric on the consecutive snapshots of a network, and against the ground truth. To test the scalability of our metric we use a random matrix theory approach while discovering Erdös-Rényi graphs. This scaling analysis allows us to make predictions about the performance of the discovery process.

9.
Mol Inform ; 34(8): 550-8, 2015 08.
Artigo em Inglês | MEDLINE | ID: mdl-27490500

RESUMO

The comparison of protein binding sites is a prominent task in computational chemistry and has been studied in many different ways. For the automatic detection and comparison of putative binding cavities the Cavbase system has been developed which uses a coarse-grained set of pseudocenters to represent the physicochemical properties of a binding site and employs a graph-based procedure to calculate similarities between two binding sites. However, the comparison of two graphs is computationally quite demanding which makes large-scale studies such as the rapid screening of entire databases hardly feasible. In a recent work, we proposed the method Local Cliques (LC) for the efficient comparison of Cavbase binding sites. It employs a clique heuristic to detect the maximum common subgraph of two binding sites and an extended graph model to additionally compare the shape of individual surface patches. In this study, we present an alternative to further accelerate the LC method by partitioning the binding-site graphs into disjoint components prior to their comparisons. The pseudocenter sets are split with regard to their assigned phyiscochemical type, which leads to seven much smaller graphs than the original one. Applying this approach on the same test scenarios as in the former comprehensive way results in a significant speed-up without sacrificing accuracy.


Assuntos
Bases de Dados de Proteínas , Modelos Moleculares , Sítios de Ligação
SELEÇÃO DE REFERÊNCIAS
Detalhe da pesquisa