Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 12 de 12
Filtrar
1.
PLoS Comput Biol ; 12(11): e1005184, 2016 Nov.
Artigo em Inglês | MEDLINE | ID: mdl-27835644

RESUMO

The existence of over- and under-represented sequence motifs in genomes provides evidence of selective evolutionary pressures on biological mechanisms such as transcription, translation, ligand-substrate binding, and host immunity. In order to accurately identify motifs and other genome-scale patterns of interest, it is essential to be able to generate accurate null models that are appropriate for the sequences under study. While many tools have been developed to create random nucleotide sequences, protein coding sequences are subject to a unique set of constraints that complicates the process of generating appropriate null models. There are currently no tools available that allow users to create random coding sequences with specified amino acid composition and GC content for the purpose of hypothesis testing. Using the principle of maximum entropy, we developed a method that generates unbiased random sequences with pre-specified amino acid and GC content, which we have developed into a python package. Our method is the simplest way to obtain maximally unbiased random sequences that are subject to GC usage and primary amino acid sequence constraints. Furthermore, this approach can easily be expanded to create unbiased random sequences that incorporate more complicated constraints such as individual nucleotide usage or even di-nucleotide frequencies. The ability to generate correctly specified null models will allow researchers to accurately identify sequence motifs which will lead to a better understanding of biological processes as well as more effective engineering of biological systems.


Assuntos
Composição de Bases/genética , Engenharia de Proteínas/métodos , Proteínas/química , Proteínas/genética , Análise de Sequência de DNA/métodos , Análise de Sequência de Proteína/métodos , Software , Algoritmos
2.
Phys Rev E Stat Nonlin Soft Matter Phys ; 80(1 Pt 2): 016118, 2009 Jul.
Artigo em Inglês | MEDLINE | ID: mdl-19658785

RESUMO

Many complex networks display a mesoscopic structure with groups of nodes sharing many links with the other nodes in their group and comparatively few with nodes of different groups. This feature is known as community structure and encodes precious information about the organization and the function of the nodes. Many algorithms have been proposed but it is not yet clear how they should be tested. Recently we have proposed a general class of undirected and unweighted benchmark graphs, with heterogeneous distributions of node degree and community size. An increasing attention has been recently devoted to develop algorithms able to consider the direction and the weight of the links, which require suitable benchmark graphs for testing. In this paper we extend the basic ideas behind our previous benchmark to generate directed and weighted networks with built-in community structure. We also consider the possibility that nodes belong to more communities, a feature occurring in real systems, such as social networks. As a practical application, we show how modularity optimization performs on our benchmark.

3.
Phys Rev E Stat Nonlin Soft Matter Phys ; 78(4 Pt 2): 046110, 2008 Oct.
Artigo em Inglês | MEDLINE | ID: mdl-18999496

RESUMO

Community structure is one of the most important features of real networks and reveals the internal organization of the nodes. Many algorithms have been proposed but the crucial issue of testing, i.e., the question of how good an algorithm is, with respect to others, is still open. Standard tests include the analysis of simple artificial graphs with a built-in community structure, that the algorithm has to recover. However, the special graphs adopted in actual tests have a structure that does not reflect the real properties of nodes and communities found in real networks. Here we introduce a class of benchmark graphs, that account for the heterogeneity in the distributions of node degrees and of community sizes. We use this benchmark to test two popular methods of community detection, modularity optimization, and Potts model clustering. The results show that the benchmark poses a much more severe test to algorithms than standard benchmarks, revealing limits that may not be apparent at a first analysis.


Assuntos
Redes Neurais de Computação , Algoritmos , Análise por Conglomerados , Redes Comunitárias , Modelos Biológicos , Modelos Estatísticos , Modelos Teóricos
4.
Phys Rev E ; 93(3): 032309, 2016 Mar.
Artigo em Inglês | MEDLINE | ID: mdl-27078368

RESUMO

Community detection of network flows conventionally assumes one-step dynamics on the links. For sparse networks and interest in large-scale structures, longer timescales may be more appropriate. Oppositely, for large networks and interest in small-scale structures, shorter timescales may be better. However, current methods for analyzing networks at different timescales require expensive and often infeasible network reconstructions. To overcome this problem, we introduce a method that takes advantage of the inner workings of the map equation and evades the reconstruction step. This makes it possible to efficiently analyze large networks at different Markov times with no extra overhead cost. The method also evades the costly unipartite projection for identifying flow modules in bipartite networks.

5.
Nat Commun ; 5: 4630, 2014 Aug 11.
Artigo em Inglês | MEDLINE | ID: mdl-25109694

RESUMO

Random walks on networks is the standard tool for modelling spreading processes in social and biological systems. This first-order Markov approach is used in conventional community detection, ranking and spreading analysis, although it ignores a potentially important feature of the dynamics: where flow moves to may depend on where it comes from. Here we analyse pathways from different systems, and although we only observe marginal consequences for disease spreading, we show that ignoring the effects of second-order Markov dynamics has important consequences for community detection, ranking and information spreading. For example, capturing dynamics with a second-order Markov model allows us to reveal actual travel patterns in air traffic and to uncover multidisciplinary journals in scientific communication. These findings were achieved only by using more available data and making no additional assumptions, and therefore suggest that accounting for higher-order memory in network flows can help us better understand how real systems are organized and function.


Assuntos
Surtos de Doenças , Métodos Epidemiológicos , Teoria da Informação , Cadeias de Markov , Meios de Transporte , Algoritmos , Humanos , Disseminação de Informação , Modelos Estatísticos , Probabilidade , Estados Unidos
6.
Sci Rep ; 2: 336, 2012.
Artigo em Inglês | MEDLINE | ID: mdl-22468223

RESUMO

The community structure of complex networks reveals both their organization and hidden relationships among their constituents. Most community detection methods currently available are not deterministic, and their results typically depend on the specific random seeds, initial conditions and tie-break rules adopted for their execution. Consensus clustering is used in data analysis to generate stable results out of a set of partitions delivered by stochastic methods. Here we show that consensus clustering can be combined with any existing method in a self-consistent way, enhancing considerably both the stability and the accuracy of the resulting partitions. This framework is also particularly suitable to monitor the evolution of community structure in temporal networks. An application of consensus clustering to a large citation network of physics papers demonstrates its capability to keep track of the birth, death and diversification of topics.

7.
Phys Rev E Stat Nonlin Soft Matter Phys ; 84(6 Pt 2): 066122, 2011 Dec.
Artigo em Inglês | MEDLINE | ID: mdl-22304170

RESUMO

Modularity maximization is the most popular technique for the detection of community structure in graphs. The resolution limit of the method is supposedly solvable with the introduction of modified versions of the measure, with tunable resolution parameters. We show that multiresolution modularity suffers from two opposite coexisting problems: the tendency to merge small subgraphs, which dominates when the resolution is low; the tendency to split large subgraphs, which dominates when the resolution is high. In benchmark networks with heterogeneous distributions of cluster sizes, the simultaneous elimination of both biases is not possible and multiresolution modularity is not capable to recover the planted community structure, not even when it is pronounced and easily detectable by other methods, for any value of the resolution parameter. This holds for other multiresolution techniques and it is likely to be a general problem of methods based on global optimization.

8.
PLoS One ; 6(4): e18961, 2011 Apr 29.
Artigo em Inglês | MEDLINE | ID: mdl-21559480

RESUMO

Community structure is one of the main structural features of networks, revealing both their internal organization and the similarity of their elementary units. Despite the large variety of methods proposed to detect communities in graphs, there is a big need for multi-purpose techniques, able to handle different types of datasets and the subtleties of community structure. In this paper we present OSLOM (Order Statistics Local Optimization Method), the first method capable to detect clusters in networks accounting for edge directions, edge weights, overlapping communities, hierarchies and community dynamics. It is based on the local optimization of a fitness function expressing the statistical significance of clusters with respect to random fluctuations, which is estimated with tools of Extreme and Order Statistics. OSLOM can be used alone or as a refinement procedure of partitions/covers delivered by other techniques. We have also implemented sequential algorithms combining OSLOM with other fast techniques, so that the community structure of very large networks can be uncovered. Our method has a comparable performance as the best existing algorithms on artificial benchmark graphs. Several applications on real networks are shown as well. OSLOM is implemented in a freely available software (http://www.oslom.org), and we believe it will be a valuable tool in the analysis of networks.


Assuntos
Estatística como Assunto/métodos , Algoritmos , Benchmarking , Análise por Conglomerados , Humanos , Internet , Modelos Estatísticos , Modelos Teóricos , Redes Neurais de Computação , Software , Meios de Transporte
9.
PLoS One ; 5(8): e11976, 2010 Aug 12.
Artigo em Inglês | MEDLINE | ID: mdl-20711338

RESUMO

BACKGROUND: Community structure is one of the key properties of complex networks and plays a crucial role in their topology and function. While an impressive amount of work has been done on the issue of community detection, very little attention has been so far devoted to the investigation of communities in real networks. METHODOLOGY/PRINCIPAL FINDINGS: We present a systematic empirical analysis of the statistical properties of communities in large information, communication, technological, biological, and social networks. We find that the mesoscopic organization of networks of the same category is remarkably similar. This is reflected in several characteristics of community structure, which can be used as "fingerprints" of specific network categories. While community size distributions are always broad, certain categories of networks consist mainly of tree-like communities, while others have denser modules. Average path lengths within communities initially grow logarithmically with community size, but the growth saturates or slows down for communities larger than a characteristic size. This behaviour is related to the presence of hubs within communities, whose roles differ across categories. Also the community embeddedness of nodes, measured in terms of the fraction of links within their communities, has a characteristic distribution for each category. CONCLUSIONS/SIGNIFICANCE: Our findings, verified by the use of two fundamentally different community detection methods, allow for a classification of real networks and pave the way to a realistic modelling of networks' evolution.


Assuntos
Informática/métodos , Animais , Redes de Comunicação de Computadores , Drosophila melanogaster , Humanos , Serviços de Informação , Saccharomyces cerevisiae , Apoio Social
10.
Phys Rev E Stat Nonlin Soft Matter Phys ; 81(4 Pt 2): 046110, 2010 Apr.
Artigo em Inglês | MEDLINE | ID: mdl-20481789

RESUMO

Nodes in real-world networks are usually organized in local modules. These groups, called communities, are intuitively defined as subgraphs with a larger density of internal connections than of external links. In this work, we define a measure aimed at quantifying the statistical significance of single communities. Extreme and order statistics are used to predict the statistics associated with individual clusters in random graphs. These distributions allows us to define one community significance as the probability that a generic clustering algorithm finds such a group in a random graph. The method is successfully applied in the case of real-world networks for the evaluation of the significance of their communities.


Assuntos
Modelos Teóricos , Animais , Benchmarking , Probabilidade
11.
Phys Rev E Stat Nonlin Soft Matter Phys ; 82(2 Pt 2): 026102, 2010 Aug.
Artigo em Inglês | MEDLINE | ID: mdl-20866871

RESUMO

Communities are clusters of nodes with a higher than average density of internal connections. Their detection is of great relevance to better understand the structure and hierarchies present in a network. Modularity has become a standard tool in the area of community detection, providing at the same time a way to evaluate partitions and, by maximizing it, a method to find communities. In this work, we study the modularity from a combinatorial point of view. Our analysis (as the modularity definition) relies on the use of the configurational model, a technique that given a graph produces a series of randomized copies keeping the degree sequence invariant. We develop an approach that enumerates the null model partitions and can be used to calculate the probability distribution function of the modularity. Our theory allows for a deep inquiry of several interesting features characterizing modularity such as its resolution limit and the statistics of the partitions that maximize it. Additionally, the study of the probability of extremes of the modularity in the random graph partitions opens the way for a definition of the statistical significance of network partitions.

12.
Phys Rev E Stat Nonlin Soft Matter Phys ; 80(5 Pt 2): 056117, 2009 Nov.
Artigo em Inglês | MEDLINE | ID: mdl-20365053

RESUMO

Uncovering the community structure exhibited by real networks is a crucial step toward an understanding of complex systems that goes beyond the local organization of their constituents. Many algorithms have been proposed so far, but none of them has been subjected to strict tests to evaluate their performance. Most of the sporadic tests performed so far involved small networks with known community structure and/or artificial graphs with a simplified structure, which is very uncommon in real systems. Here we test several methods against a recently introduced class of benchmark graphs, with heterogeneous distributions of degree and community size. The methods are also tested against the benchmark by Girvan and Newman [Proc. Natl. Acad. Sci. U.S.A. 99, 7821 (2002)] and on random graphs. As a result of our analysis, three recent algorithms introduced by Rosvall and Bergstrom [Proc. Natl. Acad. Sci. U.S.A. 104, 7327 (2007); Proc. Natl. Acad. Sci. U.S.A. 105, 1118 (2008)], Blondel [J. Stat. Mech.: Theory Exp. (2008), P10008], and Ronhovde and Nussinov [Phys. Rev. E 80, 016109 (2009)] have an excellent performance, with the additional advantage of low computational complexity, which enables one to analyze large systems.


Assuntos
Redes Comunitárias , Modelos Biológicos , Redes Neurais de Computação , Algoritmos , Biofísica/métodos , Análise por Conglomerados , Modelos Estatísticos , Modelos Teóricos
SELEÇÃO DE REFERÊNCIAS
DETALHE DA PESQUISA