*Phys Rev Lett ; 123(12): 128301, 2019 Sep 20.*

**| MEDLINE**| ID: mdl-31633974

##### ABSTRACT

We present a scalable nonparametric Bayesian method to perform network reconstruction from observed functional behavior that at the same time infers the communities present in the network. We show that the joint reconstruction with community detection has a synergistic effect, where the edge correlations used to inform the existence of communities are also inherently used to improve the accuracy of the reconstruction which, in turn, can better inform the uncovering of communities. We illustrate the use of our method with observations arising from epidemic models and the Ising model, both on synthetic and empirical networks, as well as on data containing only functional information.

*Sci Adv ; 4(7): eaaq1360, 2018 07.*

**| MEDLINE**| ID: mdl-30035215

##### ABSTRACT

One of the main computational and scientific challenges in the modern age is to extract useful information from unstructured texts. Topic models are one popular machine-learning approach that infers the latent topical structure of a collection of documents. Despite their success, particularly that of the most widely used variant called latent Dirichlet allocation (LDA), and numerous applications in sociology, history, and linguistics, topic models are known to suffer from severe conceptual and practical problems, for example, a lack of justification for the Bayesian priors, discrepancies with statistical properties of real texts, and the inability to properly choose the number of topics. We obtain a fresh view of the problem of identifying topical structures by relating it to the problem of finding communities in complex networks. We achieve this by representing text corpora as bipartite networks of documents and words. By adapting existing community-detection methods (using a stochastic block model (SBM) with nonparametric priors), we obtain a more versatile and principled framework for topic modeling (for example, it automatically detects the number of topics and hierarchically clusters both the words and documents). The analysis of artificial and real corpora demonstrates that our SBM approach leads to better topic models than LDA in terms of statistical model selection. Our work shows how to formally relate methods from community detection and topic modeling, opening the possibility of cross-fertilization between these two fields.
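The bipartite representation mentioned in this abstract can be sketched in a few lines. This is only an illustration of the data structure, not the authors' pipeline: the function name `bipartite_edges`, the whitespace tokenizer, and the toy corpus are all assumptions made for the example.

```python
from collections import Counter

def bipartite_edges(documents):
    """Return a Counter mapping (doc_index, word) -> number of occurrences.

    Each key is one edge of a bipartite document-word multigraph and the
    count is the edge multiplicity. Tokenization is naive (lowercase,
    whitespace split), which is an assumption of this sketch.
    """
    edges = Counter()
    for doc_index, text in enumerate(documents):
        for word in text.lower().split():
            edges[(doc_index, word)] += 1
    return edges

corpus = ["the cat sat", "the dog sat on the mat"]  # toy corpus (made up)
edges = bipartite_edges(corpus)

# Degree of each word node = total number of times the word occurs.
word_degree = Counter()
for (_, word), count in edges.items():
    word_degree[word] += count
```

A community-detection method would then be run on this graph so that groups of word nodes play the role of topics; that inference step is not shown here.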

*Phys Rev E ; 97(6-1): 062316, 2018 Jun.*

**| MEDLINE**| ID: mdl-30011606

##### ABSTRACT

A principled approach to understand network structures is to formulate generative models. Given a collection of models, however, an outstanding key task is to determine which one provides a more accurate description of the network at hand, discounting statistical fluctuations. This problem can be approached using two principled criteria that at first may seem equivalent: selecting the most plausible model in terms of its posterior probability; or selecting the model with the highest predictive performance in terms of identifying missing links. Here we show that while these two approaches yield consistent results in most cases, there are also notable instances where they do not, that is, where the most plausible model is not the most predictive. We show that in the latter case the improvement of predictive performance can in fact lead to overfitting both in artificial and empirical settings. Furthermore, we show that, in general, the predictive performance is higher when we average over collections of models that are individually less plausible than when we consider only the single most plausible model.

*Phys Rev E ; 97(1-1): 012306, 2018 Jan.*

**| MEDLINE**| ID: mdl-29448436

##### ABSTRACT

We present a Bayesian formulation of weighted stochastic block models that can be used to infer the large-scale modular structure of weighted networks, including their hierarchical organization. Our method is nonparametric, and thus does not require the prior knowledge of the number of groups or other dimensions of the model, which are instead inferred from data. We give a comprehensive treatment of different kinds of edge weights (i.e., continuous or discrete, signed or unsigned, bounded or unbounded), as well as arbitrary weight transformations, and describe an unsupervised model selection approach to choose the best network description. We illustrate the application of our method to a variety of empirical weighted networks, such as global migrations, voting patterns in congress, and neural connections in the human brain.

*Nat Commun ; 8(1): 582, 2017 09 19.*

**| MEDLINE**| ID: mdl-28928409

##### ABSTRACT

In evolving complex systems such as air traffic and social organisations, collective effects emerge from their many components' dynamic interactions. While the dynamic interactions can be represented by temporal networks with nodes and links that change over time, they remain highly complex. It is therefore often necessary to use methods that extract the temporal networks' large-scale dynamic community structure. However, such methods are subject to overfitting or suffer from effects of arbitrary, a priori-imposed timescales, which should instead be extracted from data. Here we simultaneously address both problems and develop a principled data-driven method that determines relevant timescales and identifies patterns of dynamics that take place on networks, as well as shape the networks themselves. We base our method on an arbitrary-order Markov chain model with community structure, and develop a nonparametric Bayesian inference framework that identifies the simplest such model that can explain temporal interaction data.

The description of temporal networks is usually simplified in terms of their dynamic community structures, whose identification, however, relies on a priori assumptions. Here the authors present a data-driven method that determines relevant timescales for the dynamics and uses them to identify communities.
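To make the notion of an arbitrary-order Markov chain concrete, the sketch below counts transitions from a length-`order` history to the next token and turns them into maximum-likelihood probabilities. It deliberately omits the community structure and the Bayesian inference described in the abstract; the function names and the toy sequence are assumptions made for illustration.

```python
from collections import Counter, defaultdict

def fit_markov_chain(sequence, order):
    """Count transitions from each length-`order` history to the next token."""
    counts = defaultdict(Counter)
    for i in range(order, len(sequence)):
        history = tuple(sequence[i - order:i])
        counts[history][sequence[i]] += 1
    return counts

def transition_probability(counts, history, token):
    """Maximum-likelihood estimate of P(token | history); 0.0 if unseen."""
    total = sum(counts[history].values()) if history in counts else 0
    return counts[history][token] / total if total else 0.0

tokens = list("abcabcabd")  # toy sequence (made up)
first_order = fit_markov_chain(tokens, order=1)
second_order = fit_markov_chain(tokens, order=2)
```

Raising `order` lets the model capture longer memory at the cost of many more parameters, which is the overfitting risk that a nonparametric Bayesian treatment is meant to control.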

##### Subjects

Statistical Models , Algorithms , Bayes Theorem , Markov Chains , Residence Characteristics

*Phys Rev E ; 95(1-1): 012317, 2017 Jan.*

**| MEDLINE**| ID: mdl-28208453

##### ABSTRACT

A principled approach to characterize the hidden structure of networks is to formulate generative models and then infer their parameters from data. When the desired structure is composed of modules or "communities," a suitable choice for this task is the stochastic block model (SBM), where nodes are divided into groups, and the placement of edges is conditioned on the group memberships. Here, we present a nonparametric Bayesian method to infer the modular structure of empirical networks, including the number of modules and their hierarchical organization. We focus on a microcanonical variant of the SBM, where the structure is imposed via hard constraints, i.e., the generated networks are not allowed to violate the patterns imposed by the model. We show how this simple model variation allows simultaneously for two important improvements over more traditional inference approaches: (1) deeper Bayesian hierarchies, with noninformative priors replaced by sequences of priors and hyperpriors, which not only remove limitations that seriously degrade the inference on large networks but also reveal structures at multiple scales; (2) a very efficient inference algorithm that scales well not only for networks with a large number of nodes and edges but also with an unlimited number of modules. We show also how this approach can be used to sample modular hierarchies from the posterior distribution, as well as to perform model selection. We discuss and analyze the differences between sampling from the posterior and simply finding the single parameter estimate that maximizes it. Furthermore, we expose a direct equivalence between our microcanonical approach and alternative derivations based on the canonical SBM.

*Phys Rev E ; 95(1-2): 019904, 2017 Jan.*

**| MEDLINE**| ID: mdl-28212045

##### ABSTRACT

This corrects the article DOI: 10.1103/PhysRevE.95.012317.

*Phys Rev E Stat Nonlin Soft Matter Phys ; 92(4): 042807, 2015 Oct.*

**| MEDLINE**| ID: mdl-26565289

##### ABSTRACT

Many network systems are composed of interdependent but distinct types of interactions, which cannot be fully understood in isolation. These different types of interactions are often represented as layers, attributes on the edges, or as a time dependence of the network structure. Although they are crucial for a more comprehensive scientific understanding, these representations offer substantial challenges. Namely, it is an open problem how to precisely characterize the large or mesoscale structure of network systems in relation to these additional aspects. Furthermore, the direct incorporation of these features invariably increases the effective dimension of the network description, and hence aggravates the problem of overfitting, i.e., the use of overly complex characterizations that mistake purely random fluctuations for actual structure. In this work, we propose a robust and principled method to tackle these problems, by constructing generative models of modular network structure, incorporating layered, attributed and time-varying properties, as well as a nonparametric Bayesian methodology to infer the parameters from data and select the most appropriate model according to statistical evidence. We show that the method is capable of revealing hidden structure in layered, edge-valued, and time-varying networks, and that the most appropriate level of granularity with respect to the additional dimensions can be reliably identified. We illustrate our approach on a variety of empirical systems, including a social network of physicians, the voting correlations of deputies in the Brazilian national congress, the global airport network, and a proximity network of high-school students.

##### Subjects

Theoretical Models , Airports , Brazil , Humans , Physicians/psychology , Politics , Social Support , Students , Time

*Phys Rev Lett ; 115(18): 188701, 2015 Oct 30.*

**| MEDLINE**| ID: mdl-26565509

##### ABSTRACT

The statistical significance of network properties is conditioned on null models which satisfy specified properties but that are otherwise random. Exponential random graph models are a principled theoretical framework to generate such constrained ensembles, but they often fail in practice, either due to model inconsistency or because it is impossible to sample networks from them. These problems affect the important case of networks with prescribed clustering coefficient or number of small connected subgraphs (motifs). In this Letter we use the Wang-Landau method to obtain a multicanonical sampling that overcomes both these problems. We sample, in polynomial time, networks with arbitrary degree sequences from ensembles with imposed motif counts. Applying this method to social networks, we investigate the relation between transitivity and homophily, and we quantify the correlation between different types of motifs, finding that single motifs can explain up to 60% of the variation of motif profiles.
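As a rough illustration of the Wang-Landau idea named in this abstract, the sketch below estimates a density of states on a toy system (bit strings, with "energy" equal to the number of ones) rather than on networks with motif constraints. Two things are assumptions of the sketch and not taken from the Letter: the toy system itself, and the simplification of running a fixed number of steps per stage instead of testing histogram flatness before reducing the modification factor.

```python
import math
import random

def wang_landau_bits(n_bits, stages=14, steps_per_stage=20000, seed=0):
    """Estimate ln g(k), the log number of n-bit strings with k ones.

    Simplification (assumption): each stage runs a fixed number of steps
    instead of checking histogram flatness before halving ln f.
    """
    rng = random.Random(seed)
    state = [0] * n_bits
    energy = 0                      # number of ones in the current state
    log_g = [0.0] * (n_bits + 1)    # running estimate of ln g(k)
    log_f = 1.0                     # modification factor, halved per stage
    for _ in range(stages):
        for _ in range(steps_per_stage):
            i = rng.randrange(n_bits)
            new_energy = energy + (1 if state[i] == 0 else -1)
            # Accept the flip with probability min(1, g(E_old) / g(E_new)).
            log_ratio = log_g[energy] - log_g[new_energy]
            if rng.random() < math.exp(min(0.0, log_ratio)):
                state[i] ^= 1
                energy = new_energy
            log_g[energy] += log_f  # penalize the level just visited
        log_f /= 2.0
    return [value - log_g[0] for value in log_g]  # normalize: ln g(0) = 0

estimate = wang_landau_bits(8)
exact = [math.log(math.comb(8, k)) for k in range(9)]  # ln C(8, k)
```

Because visits to an energy level lower its acceptance weight, the walk spreads roughly evenly over all levels, which is what makes rare configurations reachable; comparing `estimate` with `exact` shows how close the toy run gets.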

*Phys Rev Lett ; 115(8): 088701, 2015 Aug 21.*

**| MEDLINE**| ID: mdl-26340218

##### ABSTRACT

A substantial volume of research is devoted to studies of community structure in networks, but communities are not the only possible form of large-scale network structure. Here, we describe a broad extension of community structure that encompasses traditional communities but includes a wide range of generalized structural patterns as well. We describe a principled method for detecting this generalized structure in empirical network data and demonstrate with real-world examples how it can be used to learn new things about the shape and meaning of networks.

*PLoS One ; 9(9): e108215, 2014.*

**| MEDLINE**| ID: mdl-25250565

##### ABSTRACT

We investigate the trade-off between the robustness against random and targeted removal of nodes from a network. To this end we utilize the stochastic block model to study ensembles of infinitely large networks with arbitrary large-scale structures. We present results from numerical two-objective optimization simulations for networks with various fixed mean degree and number of blocks. The results provide strong evidence that three different blocks are sufficient to realize the best trade-off between the two measures of robustness, i.e. to obtain the complete front of Pareto-optimal networks. For all values of the mean degree, a characteristic three-block structure emerges over large parts of the Pareto-optimal front. This structure can often be characterized as a core-periphery structure, composed of a group of core nodes with high degree connected among themselves and to a periphery of low-degree nodes, in addition to a third group of nodes which is disconnected from the periphery, and weakly connected to the core. Only at the two extremes of the Pareto-optimal front, corresponding to maximal robustness against random and targeted node removal, are a two-block core-periphery structure and a one-block fully random network found, respectively.
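The two kinds of node removal contrasted in this abstract can be illustrated on a small finite graph. The sketch below measures the fraction of nodes left in the largest connected component after a set of nodes is deleted. The paper works with infinitely large ensembles and percolation theory, so this finite toy graph, the function name, and the choice of removed nodes are assumptions made purely for illustration.

```python
from collections import deque

def largest_component_fraction(adjacency, removed):
    """Fraction of all nodes that lie in the largest connected component
    after deleting the nodes in `removed` (breadth-first search)."""
    removed = set(removed)
    seen = set()
    best = 0
    for start in adjacency:
        if start in removed or start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for neighbor in adjacency[node]:
                if neighbor not in removed and neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        best = max(best, size)
    return best / len(adjacency)

# Toy core-periphery graph (made up): core nodes 0 and 1 are linked,
# and each core node has three periphery neighbors.
graph = {0: [1, 2, 3, 4], 1: [0, 5, 6, 7],
         2: [0], 3: [0], 4: [0], 5: [1], 6: [1], 7: [1]}

# Targeted attack: delete the two highest-degree nodes.
targeted = sorted(graph, key=lambda v: len(graph[v]), reverse=True)[:2]
after_targeted = largest_component_fraction(graph, targeted)

# Deleting two periphery nodes instead, as a random failure typically would.
after_periphery = largest_component_fraction(graph, [2, 5])
```

In this toy case the targeted attack shatters the graph while the loss of periphery nodes barely matters, which is the tension between the two robustness measures that the optimization in the abstract trades off.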

##### Subjects

Algorithms , Computer Simulation , Cybernetics , Stochastic Processes

*Phys Rev E Stat Nonlin Soft Matter Phys ; 89(1): 012804, 2014 Jan.*

**| MEDLINE**| ID: mdl-24580278

##### ABSTRACT

We present an efficient algorithm for the inference of stochastic block models in large networks. The algorithm can be used as an optimized Markov chain Monte Carlo (MCMC) method, with a fast mixing time and a much reduced susceptibility to getting trapped in metastable states, or as a greedy agglomerative heuristic, with an almost linear $O(N \ln^2 N)$ complexity, where $N$ is the number of nodes in the network, independent of the number of blocks being inferred. We show that the heuristic is capable of delivering results which are indistinguishable from the more exact and numerically expensive MCMC method in many artificial and empirical networks, despite being much faster. The method is entirely unbiased towards any specific mixing pattern, and in particular it does not favor assortative community structures.

*PLoS One ; 8(12): e80303, 2013.*

**| MEDLINE**| ID: mdl-24324594

##### ABSTRACT

We introduce a model for the adaptive evolution of a network of company ownerships. In a recent work it has been shown that the empirical global network of corporate control is marked by a central, tightly connected "core" made of a small number of large companies which control a significant part of the global economy. Here we show how a simple, adaptive "rich get richer" dynamics can account for this characteristic, which incorporates the increased buying power of more influential companies, and in turn results in even higher control. We conclude that this kind of centralized structure can emerge without it being an explicit goal of these companies, or as a result of a well-organized strategy.

##### Subjects

Statistical Models , Ownership , Professional Corporations/statistics & numerical data , Humans , Organizational Culture , Professional Corporations/organization & administration

*Phys Rev Lett ; 111(9): 098701, 2013 Aug 30.*

**| MEDLINE**| ID: mdl-24033075

##### ABSTRACT

A large variety of dynamical processes that take place on networks can be expressed in terms of the spectral properties of some linear operator which reflects how the dynamical rules depend on the network topology. Often, such spectral features are theoretically obtained by considering only local node properties, such as degree distributions. Many networks, however, possess large-scale modular structures that can drastically influence their spectral characteristics and which are neglected in such simplified descriptions. Here, we obtain in a unified fashion the spectrum of a large family of operators, including the adjacency, Laplacian, and normalized Laplacian matrices, for networks with generic modular structure, in the limit of large degrees. We focus on the conditions necessary for the merging of the isolated eigenvalues with the continuous band of the spectrum, after which the planted modular structure can no longer be easily detected by spectral methods. This is a crucial transition point which determines when a modular structure is strong enough to affect a given dynamical process. We show that this transition happens in general at different points for the different matrices, and hence the detectability threshold can vary significantly, depending on the operator chosen. Equivalently, the sensitivity to the modular structure of the different dynamical processes associated with each matrix will be different, given the same large-scale structure present in the network. Furthermore, we show that, with the exception of the Laplacian matrix, the different transitions coalesce into the same point for the special case where the modules are homogeneous but separate otherwise.
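For readers unfamiliar with the operators named in this abstract, the sketch below builds the Laplacian $L = D - A$ and a normalized Laplacian $I - D^{-1/2} A D^{-1/2}$ from an adjacency matrix $A$ with degree matrix $D$. The symmetric normalization is one common convention and is an assumption here, since the abstract does not say which variant the authors use; the function name and the toy two-module graph are also illustrative, and the code assumes no node has degree zero.

```python
import math

def graph_operators(adjacency_matrix):
    """Return (laplacian, normalized_laplacian) as nested lists.

    laplacian  = D - A
    normalized = I - D^{-1/2} A D^{-1/2}   (symmetric convention, assumed)
    Requires every node to have degree > 0.
    """
    n = len(adjacency_matrix)
    degrees = [sum(row) for row in adjacency_matrix]
    laplacian = [[(degrees[i] if i == j else 0) - adjacency_matrix[i][j]
                  for j in range(n)] for i in range(n)]
    normalized = [[(1.0 if i == j else 0.0)
                   - adjacency_matrix[i][j] / math.sqrt(degrees[i] * degrees[j])
                   for j in range(n)] for i in range(n)]
    return laplacian, normalized

# Toy modular graph (made up): two triangles joined by the edge 2-3.
A = [[0, 1, 1, 0, 0, 0],
     [1, 0, 1, 0, 0, 0],
     [1, 1, 0, 1, 0, 0],
     [0, 0, 1, 0, 1, 1],
     [0, 0, 0, 1, 0, 1],
     [0, 0, 0, 1, 1, 0]]
laplacian, normalized = graph_operators(A)
```

The eigenvalues of these matrices (computed with any linear-algebra library) are the spectra whose isolated eigenvalues signal modular structure.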

*Phys Rev Lett ; 110(14): 148701, 2013 Apr 05.*

**| MEDLINE**| ID: mdl-25167049

##### ABSTRACT

We investigate the detectability of modules in large networks when the number of modules is not known in advance. We employ the minimum description length principle, which seeks to minimize the total amount of information required to describe the network, and avoid overfitting. According to this criterion, we obtain general bounds on the detectability of any prescribed block structure, given the number of nodes and edges in the sampled network. We also obtain that the maximum number of detectable blocks scales as $\sqrt{N}$, where $N$ is the number of nodes in the network, for a fixed average degree $\langle k \rangle$. We also show that the simplicity of the minimum description length approach yields an efficient multilevel Monte Carlo inference algorithm with a complexity of $O(\tau N \log N)$, if the number of blocks is unknown, and $O(\tau N)$ if it is known, where $\tau$ is the mixing time of the Markov chain. We illustrate the application of the method on a large network of actors and films with over $10^6$ edges, and a dissortative, bipartite block structure.

*Phys Rev Lett ; 108(21): 218702, 2012 May 25.*

**| MEDLINE**| ID: mdl-23003311

##### ABSTRACT

We investigate the dynamics of a trust game on a mixed population, where individuals with the role of buyers are forced to play against a predetermined number of sellers whom they choose dynamically. Agents with the role of sellers are also allowed to adapt the level of value for money of their products, based on payoff. The dynamics undergoes a transition at a specific value of the strategy update rate, above which an emergent cartel organization is observed, where sellers have similar values of below-optimal value for money. This cartel organization is not due to an explicit collusion among agents; instead, it arises spontaneously from the maximization of the individual payoffs. This dynamics is marked by large fluctuations and a high degree of unpredictability for most of the parameter space and serves as a plausible qualitative explanation for observed elevated levels and fluctuations of certain commodity prices.

*Phys Rev E Stat Nonlin Soft Matter Phys ; 85(5 Pt 2): 056122, 2012 May.*

**| MEDLINE**| ID: mdl-23004836

##### ABSTRACT

Stochastic blockmodels are generative network models where the vertices are separated into discrete groups, and the probability of an edge existing between two vertices is determined solely by their group membership. In this paper, we derive expressions for the entropy of stochastic blockmodel ensembles. We consider several ensemble variants, including the traditional model as well as the newly introduced degree-corrected version [Karrer et al., Phys. Rev. E 83, 016107 (2011)], which imposes a degree sequence on the vertices, in addition to the block structure. The imposed degree sequence is implemented both as "soft" constraints, where only the expected degrees are imposed, and as "hard" constraints, where they are required to be the same on all samples of the ensemble. We also consider generalizations to multigraphs and directed graphs. We illustrate one of many applications of this measure by directly deriving a log-likelihood function from the entropy expression, and using it to infer latent block structure in observed data. Due to the general nature of the ensembles considered, the method works well for ensembles with intrinsic degree correlations (i.e., with entropic origin) as well as extrinsic degree correlations, which go beyond the block structure.
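The definition in the first sentence of this abstract translates directly into a sampler, and the entropy of the simplest such ensemble can be written down exactly. The sketch below assumes an undirected simple graph with independent Bernoulli edges, for which the ensemble entropy is the sum of binary entropies over all node pairs. It does not reproduce the closed-form expressions derived in the paper (degree-corrected, multigraph, or directed variants), and all function names are illustrative.

```python
import math
import random

def sample_sbm(block_sizes, edge_prob, seed=0):
    """Sample an undirected simple graph: nodes i < j are linked with
    probability edge_prob[r][s], where r and s are their blocks."""
    rng = random.Random(seed)
    membership = [r for r, size in enumerate(block_sizes) for _ in range(size)]
    n = len(membership)
    edges = [(i, j) for i in range(n) for j in range(i + 1, n)
             if rng.random() < edge_prob[membership[i]][membership[j]]]
    return membership, edges

def binary_entropy(p):
    """Entropy (in nats) of a Bernoulli variable with success probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1 - p) * math.log(1 - p)

def sbm_ensemble_entropy(block_sizes, edge_prob):
    """Entropy of the Bernoulli SBM ensemble: one independent term per
    node pair, grouped by the pair's block labels."""
    total = 0.0
    for r, n_r in enumerate(block_sizes):
        total += n_r * (n_r - 1) / 2 * binary_entropy(edge_prob[r][r])
        for s in range(r + 1, len(block_sizes)):
            total += n_r * block_sizes[s] * binary_entropy(edge_prob[r][s])
    return total

sizes = [3, 2]
membership, edges = sample_sbm(sizes, [[1.0, 0.0], [0.0, 1.0]])
uniform_entropy = sbm_ensemble_entropy(sizes, [[0.5, 0.5], [0.5, 0.5]])
```

With every probability at 0.5 the ensemble is uniform over all simple graphs, so the entropy equals the number of node pairs times $\ln 2$; a more structured block matrix lowers it, which is why the entropy can serve as a (negative) log-likelihood for block inference.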

*Phys Rev Lett ; 109(11): 118703, 2012 Sep 14.*

**| MEDLINE**| ID: mdl-23005691

##### ABSTRACT

We model the robustness against random failure or an intentional attack of networks with an arbitrary large-scale structure. We construct a block-based model which incorporates, in a general fashion, both connectivity and interdependence links, as well as arbitrary degree distributions and block correlations. By optimizing the percolation properties of this general class of networks, we identify a simple core-periphery structure as the topology most robust against random failure. In such networks, a distinct and small "core" of nodes with higher degree is responsible for most of the connectivity, functioning as a central "backbone" of the system. This centralized topology remains the optimal structure when other constraints are imposed, such as a given fraction of interdependence links and fixed degree distributions. This distinguishes simple centralized topologies as the most likely to emerge, when robustness against failure is the dominant evolutionary force.

*Phys Rev E Stat Nonlin Soft Matter Phys ; 85(4 Pt 1): 041908, 2012 Apr.*

**| MEDLINE**| ID: mdl-22680499

##### ABSTRACT

We investigate the evolution of Boolean networks subject to a selective pressure which favors robustness against noise, as a model of evolved genetic regulatory systems. By mapping the evolutionary process into a statistical ensemble and minimizing its associated free energy, we find the structural properties which emerge as the selective pressure is increased and identify a phase transition from a random topology to a "segregated-core" structure, where a smaller and more densely connected subset of the nodes is responsible for most of the regulation in the network. This segregated structure is very similar qualitatively to what is found in gene regulatory networks, where only a much smaller subset of genes (those responsible for transcription factors) is responsible for global regulation. We obtain the full phase diagram of the evolutionary process as a function of selective pressure and the average number of inputs per node. We compare the theoretical predictions with Monte Carlo simulations of evolved networks and with empirical data for Saccharomyces cerevisiae and Escherichia coli.

##### Subjects

Gene Expression Regulation/genetics , Genetic Models , Statistical Models , Proteome/metabolism , Signal Transduction/physiology , Animals , Computer Simulation , Humans , Phase Transition

*PLoS One ; 6(4): e18384, 2011 Apr 05.*

**| MEDLINE**| ID: mdl-21483683

##### ABSTRACT

Non-centralized recommendation-based decision making is a central feature of several social and technological processes, such as market dynamics, peer-to-peer file-sharing and the web of trust of digital certification. We investigate the properties of trust propagation on networks, based on a simple metric of trust transitivity. We investigate analytically the percolation properties of trust transitivity in random networks with arbitrary in/out-degree distributions, and compare with numerical realizations. We find that the existence of a non-zero fraction of absolute trust (i.e. entirely confident trust) is a requirement for the viability of global trust propagation in large systems: The average pair-wise trust is marked by a discontinuous transition at a specific fraction of absolute trust, below which it vanishes. Furthermore, we perform an extensive analysis of the Pretty Good Privacy (PGP) web of trust, in view of the concepts introduced. We compare different scenarios of trust distribution: community- and authority-centered. We find that these scenarios lead to sharply different patterns of trust propagation, due to the segregation of authority hubs and densely-connected communities. While the authority-centered scenario is more efficient, and leads to higher average trust values, it favours weakly-connected "fringe" nodes, which are directly trusted by authorities. The community-centered scheme, on the other hand, favours nodes with intermediate in/out-degrees, to the detriment of the authorities and their "fringe" peers.