RESUMEN
BACKGROUND: Graphlets are useful for bioinformatics network analysis. Based on the structure of Hocevar and Demsar's ORCA algorithm, we have created an orbit counting algorithm, named Jesse. This algorithm, like ORCA, uses equations to count the orbits, but unlike ORCA it can count graphlets of any order. To do so, it generates the required internal structures and equations automatically. Many more redundant equations are generated, however, and Jesse's running time is highly dependent on which of these equations are used. Therefore, this paper aims to investigate which equations are most efficient, and which factors have an effect on this efficiency. RESULTS: With appropriate equation selection, Jesse's running time may be reduced by a factor of up to 2 in the best case, compared to using randomly selected equations. Which equations are most efficient depends on the density of the graph, but barely on the graph type. At low graph density, equations with terms in their right-hand side with few arguments are more efficient, whereas at high density, equations with terms with many arguments in the right-hand side are most efficient. At a density between 0.6 and 0.7, both types of equations are about equally efficient. CONCLUSIONS: Our Jesse algorithm became up to a factor 2 more efficient, by automatically selecting the best equations based on graph density. It was adapted into a Cytoscape App that is freely available from the Cytoscape App Store to ease application by bioinformaticians.
Asunto(s)
Algoritmos , Biología Computacional/métodos , Gráficos por Computador , Interpretación Estadística de Datos , Diabetes Mellitus/metabolismo , Mapeo de Interacción de Proteínas/métodos , Proteínas/metabolismo , Simulación por Computador , Humanos , Modelos BiológicosRESUMEN
Motivation: Graphlets are a useful tool to determine a graph's small-scale structure. Finding them is exponentially hard with respect to the number of nodes in each graphlet. Therefore, equations can be used to reduce the size of graphlets that need to be enumerated to calculate the number of each graphlet touching each node. Hocevar and Demsar first introduced such equations, which were derived manually, and an algorithm that uses them, but only graphlets with four or five nodes can be counted this way. Results: We present a new algorithm for orbit counting, which is applicable to graphlets of any order. This algorithm uses a tree structure to simplify finding orbits, and stabilizers and symmetry-breaking constraints to ensure correctness. This method gives a significant speedup compared to a brute force counting method and can count orbits beyond the capacity of other available tools. Availability and implementation: An implementation of the algorithm can be found at https://github.com/biointec/jesse. Contact: pieter.audenaert@ugent.be.
Asunto(s)
Algoritmos , Biología Computacional/métodos , Gráficos por ComputadorRESUMEN
Summary: We present a Cytoscape app for the ISMAGS algorithm, which can enumerate all instances of a motif in a graph, making optimal use of the motif's symmetries to make the search more efficient. The Cytoscape app provides a handy interface for this algorithm, which allows more efficient network analysis. Availability and Implementation: The Cytoscape app for ISMAGS can be freely downloaded from the Cytoscape App store http://apps.cytoscape.org/apps/ismags. Source code and documentation for ISMAGS are available at https://github.com/biointec/ismags. Source code and documentation for the Cytoscape app are available at https://gitlab.psb.ugent.be/thpar/ISMAGS_Cytoscape. Contacts: Pieter.Audenaert@intec.ugent.be or Yves.VanDePeer@psb.vib-ugent.be Supplementary information: Supplementary data are available at Bioinformatics online.
Asunto(s)
Biología Computacional/métodos , Programas Informáticos , Algoritmos , Gráficos por ComputadorRESUMEN
Graphlets are small subgraphs, usually containing up to five vertices, that can be found in a larger graph. Identification of the graphlets that a vertex in an explored graph touches can provide useful information about the local structure of the graph around that vertex. Actually finding all graphlets in a large graph can be time-consuming, however. As the graphlets grow in size, more different graphlets emerge and the time needed to find each graphlet also scales up. If it is not needed to find each instance of each graphlet, but knowing the number of graphlets touching each node of the graph suffices, the problem is less hard. Previous research shows a way to simplify counting the graphlets: instead of looking for the graphlets needed, smaller graphlets are searched, as well as the number of common neighbors of vertices. Solving a system of equations then gives the number of times a vertex is part of each graphlet of the desired size. However, until now, equations only exist to count graphlets with 4 or 5 nodes. In this paper, two new techniques are presented. The first allows to generate the equations needed in an automatic way. This eliminates the tedious work needed to do so manually each time an extra node is added to the graphlets. The technique is independent on the number of nodes in the graphlets and can thus be used to count larger graphlets than previously possible. The second technique gives all graphlets a unique ordering which is easily extended to name graphlets of any size. Both techniques were used to generate equations to count graphlets with 4, 5 and 6 vertices, which extends all previous results. Code can be found at https://github.com/IneMelckenbeeck/equation-generator and https://github.com/IneMelckenbeeck/graphlet-naming.