*Nat Ecol Evol ; 4(1): 40-45, 2020 Jan.*

**MEDLINE** | ID: mdl-31844189

##### Abstract

According to the competitive exclusion principle, species with low competitive abilities should be excluded by more efficient competitors; yet, they generally remain as rare species. Here, we describe the positive and negative spatial association networks of 326 disparate assemblages, showing a general organization pattern that simultaneously supports the primacy of competition and the persistence of rare species. Abundant species monopolize negative associations in about 90% of the assemblages. On the other hand, rare species are mostly involved in positive associations, forming small network modules. Simulations suggest that positive interactions among rare species and microhabitat preferences are the most probable mechanisms underpinning this pattern and rare species persistence. The consistent results across taxa and geography suggest a general explanation for the maintenance of biodiversity in competitive environments.

##### Subjects

Biodiversity; Ecology; Geography

*Phys Rev E ; 100(5-1): 052308, 2019 Nov.*

**MEDLINE** | ID: mdl-31869919

##### Abstract

To understand how a complex system is organized and functions, researchers often identify communities in the system's network of interactions. Because it is practically impossible to explore all solutions to guarantee the best one, many community-detection algorithms rely on multiple stochastic searches. But for a given combination of network and stochastic algorithms, how many searches are sufficient to find a solution that is good enough? The standard approach is to pick a reasonably large number of searches and select the network partition with the highest quality or derive a consensus solution based on all network partitions. However, if different partitions have similar qualities such that the solution landscape is degenerate, the single best partition may miss relevant information, and a consensus solution may blur complementary communities. Here we address this degeneracy problem with coarse-grained descriptions of the solution landscape. We cluster network partitions based on their similarity and suggest an approach to determine the minimum number of searches required to describe the solution landscape adequately. To make good use of all partitions, we also propose different ways to explore the solution landscape, including a significance clustering procedure. We test these approaches on synthetic networks and a real-world network using two contrasting community-detection algorithms: The algorithm that can identify more general structures requires more searches, and networks with clearer community structures require fewer searches. We also find that exploring the coarse-grained solution landscape can reveal complementary solutions and enable more reliable community detection.
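
The idea of clustering network partitions by similarity can be illustrated with a minimal sketch. It assumes a Jaccard distance over co-assigned node pairs and a greedy assignment to the first cluster center within `max_distance`. The distance measure, the threshold, and the function names are illustrative assumptions, not the procedure of the paper.

```python
from itertools import combinations


def co_pairs(partition):
    """Node pairs that share a module; `partition` maps node -> module label."""
    return {
        frozenset(pair)
        for pair in combinations(sorted(partition), 2)
        if partition[pair[0]] == partition[pair[1]]
    }


def partition_distance(a, b):
    """Jaccard distance between the co-assigned pair sets (assumed measure)."""
    pa, pb = co_pairs(a), co_pairs(b)
    union = pa | pb
    return 1.0 - len(pa & pb) / len(union) if union else 0.0


def cluster_partitions(partitions, max_distance):
    """Greedy clustering: join the first center within max_distance, else open a new cluster."""
    centers, clusters = [], []
    for part in partitions:
        for idx, center in enumerate(centers):
            if partition_distance(part, center) <= max_distance:
                clusters[idx].append(part)
                break
        else:
            centers.append(part)
            clusters.append([part])
    return clusters


# Hypothetical toy partitions of four nodes.
p1 = {0: "a", 1: "a", 2: "b", 3: "b"}
p2 = {0: "x", 1: "x", 2: "y", 3: "y"}  # same grouping as p1, different labels
p3 = {0: "a", 1: "b", 2: "a", 3: "b"}  # a different grouping
clusters = cluster_partitions([p1, p2, p3], max_distance=0.2)
```

If the input list is sorted from best to worst quality (an assumption of this sketch), each cluster center is the best partition in its cluster.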

*Ecol Lett ; 22(8): 1297-1305, 2019 Aug.*

**MEDLINE** | ID: mdl-31190431

##### Abstract

Zoogeographical regions, or zooregions, are areas of the Earth defined by species pools that reflect ecological, historical and evolutionary processes acting over millions of years. Consequently, researchers have assumed that zooregions are robust and unlikely to change on a human timescale. However, the increasing number of human-mediated introductions and extinctions can challenge this assumption. By delineating zooregions with a network-based algorithm, here we show that introductions and extinctions are altering the zooregions we know today. Introductions are homogenising the Eurasian and African mammal zooregions and also triggering less intuitive effects in birds and amphibians, such as dividing and redefining zooregions representing the Old and New World. Furthermore, these Old and New World amphibian zooregions are no longer detected when considering introductions plus extinctions of the most threatened species. Our findings highlight the profound and far-reaching impact of human activity and call for identifying and protecting the uniqueness of biotic assemblages.

##### Subjects

Amphibians; Birds; Endangered Species; Human Activities; Animals; Biodiversity; Conservation of Natural Resources; Extinction, Biological; Humans; Mammals

*Nat Phys ; 15(4): 313-320, 2019 Apr.*

**MEDLINE** | ID: mdl-30956684

##### Abstract

Rich data is revealing that complex dependencies between the nodes of a network may escape models based on pairwise interactions. Higher-order network models go beyond these limitations, offering new perspectives for understanding complex systems.

*Phys Rev E ; 97(6-1): 062312, 2018 Jun.*

**MEDLINE** | ID: mdl-30011557

##### Abstract

Many real-world networks represent dynamic systems with interactions that change over time, often in uncoordinated ways and at irregular intervals. For example, university students connect in intermittent groups that repeatedly form and dissolve based on multiple factors, including their lectures, interests, and friends. Such dynamic systems can be represented as multilayer networks where each layer represents a snapshot of the temporal network. In this representation, it is crucial that the links between layers accurately capture real dependencies between those layers. Often, however, these dependencies are unknown. Therefore, current methods connect layers based on simplistic assumptions that do not capture node-level layer dependencies. For example, connecting every node to itself in other layers with the same weight can wipe out dependencies between intermittent groups, making it difficult or even impossible to identify them. In this paper, we present a principled approach to estimating node-level layer dependencies based on the network structure within each layer. We implement our node-level coupling method in the community detection framework Infomap and demonstrate its performance compared to current methods on synthetic and real temporal networks. We show that our approach more effectively constrains information inside multilayer communities so that Infomap can better recover planted groups in multilayer benchmark networks that represent multiple modes with different groups and better identify intermittent communities in real temporal contact networks. These results suggest that node-level layer coupling can improve the modeling of information spreading in temporal networks and better capture intermittent community structure.

*Nat Commun ; 8(1): 582, 2017 Sep 19.*

**MEDLINE** | ID: mdl-28928409

##### Abstract

In evolving complex systems such as air traffic and social organisations, collective effects emerge from their many components' dynamic interactions. While the dynamic interactions can be represented by temporal networks with nodes and links that change over time, they remain highly complex. It is therefore often necessary to use methods that extract the temporal networks' large-scale dynamic community structure. However, such methods are subject to overfitting or suffer from effects of arbitrary, a priori-imposed timescales, which should instead be extracted from data. Here we simultaneously address both problems and develop a principled data-driven method that determines relevant timescales and identifies patterns of dynamics that take place on networks, as well as shape the networks themselves. We base our method on an arbitrary-order Markov chain model with community structure, and develop a nonparametric Bayesian inference framework that identifies the simplest such model that can explain temporal interaction data.

The description of temporal networks is usually simplified in terms of their dynamic community structures, whose identification, however, relies on a priori assumptions. Here the authors present a data-driven method that determines relevant timescales for the dynamics and uses it to identify communities.

##### Subjects

Models, Statistical; Algorithms; Bayes Theorem; Markov Chains; Residence Characteristics

*Syst Biol ; 66(2): 197-204, 2017 Mar 01.*

**MEDLINE** | ID: mdl-27694311

##### Abstract

Biogeographical regions (bioregions) reveal how different sets of species are spatially grouped and therefore are important units for conservation, historical biogeography, ecology, and evolution. Several methods have been developed to identify bioregions based on species distribution data rather than expert opinion. One approach successfully applies network theory to simplify and highlight the underlying structure in species distributions. However, this method lacks tools for simple and efficient analysis. Here, we present Infomap Bioregions, an interactive web application that inputs species distribution data and generates bioregion maps. Species distributions may be provided as georeferenced point occurrences or range maps, and can be of local, regional, or global scale. The application uses a novel adaptive resolution method to make best use of often incomplete species distribution data. The results can be downloaded as vector graphics, shapefiles, or in table format. We validate the tool by processing large data sets of publicly available species distribution data of the world's amphibians using species ranges, and mammals using point occurrences. We then calculate the fit between the inferred bioregions and WWF ecoregions. As examples of applications, researchers can reconstruct ancestral ranges in historical biogeography or identify indicator species for targeted conservation. [Biogeography; bioregionalization; conservation; mapping].

##### Subjects

Animal Distribution; Conservation of Natural Resources/methods; Ecology/methods; Phylogeography/methods; Amphibians/physiology; Animals; Internet; Mammals/physiology; Maps as Topic; Phylogeny; Software

*Appl Netw Sci ; 2(1): 4, 2017.*

**MEDLINE** | ID: mdl-30533512

##### Abstract

Community detection, the decomposition of a graph into essential building blocks, has been a core research topic in network science over the past years. Since a precise notion of what constitutes a community has remained evasive, community detection algorithms have often been compared on benchmark graphs with a particular form of assortative community structure and classified based on the mathematical techniques they employ. However, this comparison can be misleading because apparent similarities in their mathematical machinery can disguise different goals and reasons for why we want to employ community detection in the first place. Here we provide a focused review of these different motivations that underpin community detection. This problem-driven classification is useful in applied network science, where it is important to select an appropriate algorithm for the given purpose. Moreover, highlighting the different facets of community detection also delineates the many lines of research and points out open directions and avenues for future research.

*Phys Rev E ; 93(3): 032309, 2016 Mar.*

**MEDLINE** | ID: mdl-27078368

##### Abstract

Community detection of network flows conventionally assumes one-step dynamics on the links. For sparse networks and interest in large-scale structures, longer timescales may be more appropriate. Conversely, for large networks and interest in small-scale structures, shorter timescales may be better. However, current methods for analyzing networks at different timescales require expensive and often infeasible network reconstructions. To overcome this problem, we introduce a method that takes advantage of the inner workings of the map equation and evades the reconstruction step. This makes it possible to efficiently analyze large networks at different Markov times with no extra overhead cost. The method also evades the costly unipartite projection for identifying flow modules in bipartite networks.

*Phys Rev E Stat Nonlin Soft Matter Phys ; 91(1): 012809, 2015 Jan.*

**MEDLINE** | ID: mdl-25679659

##### Abstract

A community detection algorithm is considered to have a resolution limit if the scale of the smallest modules that can be resolved depends on the size of the analyzed subnetwork. The resolution limit is known to prevent some community detection algorithms from accurately identifying the modular structure of a network. In fact, any global objective function for measuring the quality of a two-level assignment of nodes into modules must have some sort of resolution limit or an external resolution parameter. However, it is yet unknown how the resolution limit affects the so-called map equation, which is known to be an efficient objective function for community detection. We derive an analytical estimate and conclude that the resolution limit of the map equation is set by the total number of links between modules instead of the total number of links in the full network as for modularity. This mechanism makes the resolution limit much less restrictive for the map equation than for modularity; in practice, it is orders of magnitude smaller. Furthermore, we argue that the effect of the resolution limit often results from shoehorning multilevel modular structures into two-level descriptions. As we show, the hierarchical map equation effectively eliminates the resolution limit for networks with nested multilevel modular structures.

*Nat Commun ; 5: 4630, 2014 Aug 11.*

**MEDLINE** | ID: mdl-25109694

##### Abstract

Random walks on networks are the standard tool for modelling spreading processes in social and biological systems. This first-order Markov approach is used in conventional community detection, ranking and spreading analysis, although it ignores a potentially important feature of the dynamics: where flow moves to may depend on where it comes from. Here we analyse pathways from different systems, and although we only observe marginal consequences for disease spreading, we show that ignoring the effects of second-order Markov dynamics has important consequences for community detection, ranking and information spreading. For example, capturing dynamics with a second-order Markov model allows us to reveal actual travel patterns in air traffic and to uncover multidisciplinary journals in scientific communication. These findings were achieved only by using more available data and making no additional assumptions, and therefore suggest that accounting for higher-order memory in network flows can help us better understand how real systems are organized and function.
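
The difference between first-order and second-order Markov dynamics can be made concrete with a small sketch that estimates transition probabilities from pathway data. The pathways below are hypothetical toy data and `transition_probs` is an illustrative helper, not code from the study.

```python
from collections import Counter, defaultdict


def transition_probs(paths, order):
    """Estimate P(next | previous `order` nodes) from observed pathways."""
    counts = defaultdict(Counter)
    for path in paths:
        for i in range(order, len(path)):
            state = tuple(path[i - order:i])  # memory of the last `order` nodes
            counts[state][path[i]] += 1
    return {
        state: {nxt: c / sum(cnt.values()) for nxt, c in cnt.items()}
        for state, cnt in counts.items()
    }


# Hypothetical pathways: travellers passing through hub B mostly return
# to where they came from.
paths = (
    [["A", "B", "A"]] * 9
    + [["A", "B", "C"]] * 1
    + [["C", "B", "C"]] * 9
    + [["C", "B", "A"]] * 1
)

first = transition_probs(paths, order=1)
second = transition_probs(paths, order=2)
```

In this toy example the first-order model sees flow leaving B split evenly between A and C, while the second-order model recovers the return pattern that depends on the node visited before B.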

##### Subjects

Disease Outbreaks; Epidemiologic Methods; Information Theory; Markov Chains; Transportation; Algorithms; Humans; Information Dissemination; Models, Statistical; Probability; United States

*PLoS One ; 9(7): e103006, 2014.*

**MEDLINE** | ID: mdl-25068302

##### Abstract

Although the understanding of and motivation behind individual trading behavior is an important puzzle in finance, little is known about the connection between an investor's portfolio structure and her trading behavior in practice. In this paper, we investigate the relation between what stocks investors hold, and what stocks they buy, and show that investors with similar portfolio structures to a great extent trade in a similar way. With data from the central register of shareholdings in Sweden, we model the market in a similarity network, by considering investors as nodes, connected with links representing portfolio similarity. From the network, we find investor groups that not only identify different investment strategies, but also represent individual investors trading in a similar way. These findings suggest that the stock portfolios of investors hold meaningful information, which could be used to gain a better understanding of stock market dynamics.

##### Subjects

Investments; Models, Theoretical

*PLoS One ; 8(1): e53943, 2013.*

**MEDLINE** | ID: mdl-23372677

##### Abstract

Community detection helps us simplify the complex configuration of networks, but communities are reliable only if they are statistically significant. To detect statistically significant communities, a common approach is to resample the original network and analyze the communities. But resampling assumes independence between samples, while the components of a network are inherently dependent. Therefore, we must understand how breaking dependencies between resampled components affects the results of the significance analysis. Here we use scientific communication as a model system to analyze this effect. Our dataset includes citations among articles published in journals in the years 1984-2010. We compare parametric resampling of citations with non-parametric article resampling. While citation resampling breaks link dependencies, article resampling maintains such dependencies. We find that citation resampling underestimates the variance of link weights. Moreover, this underestimation explains most of the differences in the significance analysis of ranking and clustering. Therefore, when only link weights are available and article resampling is not an option, we suggest a simple parametric resampling scheme that generates link-weight variances close to the link-weight variances of article resampling. Nevertheless, when we highlight and summarize important structural changes in science, the more dependencies we can maintain in the resampling scheme, the earlier we can predict structural change.

##### Subjects

Models, Statistical; Publishing/statistics & numerical data; Bibliometrics; Cluster Analysis; Humans

*PLoS One ; 7(6): e39461, 2012.*

**MEDLINE** | ID: mdl-22724019

##### Abstract

BACKGROUND: The additional clinical value of clustering cardiovascular risk factors to define the metabolic syndrome (MetS) is still under debate. However, it is unclear which cardiovascular risk factors tend to cluster predominately and how individual risk factor states change over time. METHODS AND RESULTS: We used data from 3,187 individuals aged 20-79 years from the population-based Study of Health in Pomerania for a network-based approach to visualize clustered MetS risk factor states and their change over a five-year follow-up period. MetS was defined by harmonized Adult Treatment Panel III criteria, and each individual's risk factor burden was classified according to the five MetS components at baseline and follow-up. We used the map generator to depict 32 (2^5) different states and highlight the most important transitions between the 1,024 (32^2) possible states in the weighted directed network. At baseline, we found the largest fraction (19.3%) of all individuals free of any MetS risk factors and identified hypertension (15.4%) and central obesity (6.3%), as well as their combination (19.0%), as the most common MetS risk factors. Analyzing risk factor flow over the five-year follow-up, we found that most individuals remained in their risk factor state and that low high-density lipoprotein cholesterol (HDL) (6.3%) was the most prominent additional risk factor beyond hypertension and central obesity. Also among individuals without any MetS risk factor at baseline, low HDL (3.5%), hypertension (2.1%), and central obesity (1.6%) were the first risk factors to manifest during follow-up. CONCLUSIONS: We identified hypertension and central obesity as the predominant MetS risk factor cluster and low HDL concentrations as the most prominent new onset risk factor.
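
The state counts in the abstract follow from five binary components: 2^5 = 32 risk factor states and 32^2 = 1,024 ordered baseline-to-follow-up pairs. The short sketch below enumerates them; the component names are assumed labels for illustration, since the abstract names only hypertension, central obesity, and low HDL.

```python
from itertools import product

# Assumed labels for the five MetS components (illustrative only).
components = [
    "central_obesity",
    "hypertension",
    "low_hdl",
    "high_triglycerides",
    "high_glucose",
]

# Each state records presence (1) or absence (0) of every component.
states = list(product([0, 1], repeat=len(components)))

# Each directed link pairs a baseline state with a follow-up state.
transitions = list(product(states, repeat=2))

n_states = len(states)
n_transitions = len(transitions)
```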

##### Subjects

Metabolic Syndrome/complications; Metabolic Syndrome/epidemiology; Adult; Aged; Cardiovascular Diseases/complications; Cohort Studies; Disease Progression; Female; Follow-Up Studies; Humans; Male; Middle Aged; Prevalence; Risk Factors

*PLoS One ; 7(3): e33721, 2012.*

**MEDLINE** | ID: mdl-22479433

##### Abstract

Researchers use community-detection algorithms to reveal large-scale organization in biological and social networks, but community detection is useful only if the communities are significant and not a result of noisy data. To assess the statistical significance of the network communities, or the robustness of the detected structure, one approach is to perturb the network structure by removing links and measure how much the communities change. However, perturbing sparse networks is challenging because they are inherently sensitive; they shatter easily if links are removed. Here we propose a simple method to perturb sparse networks and assess the significance of their communities. We generate resampled networks by adding extra links based on local information, then we aggregate the information from multiple resampled networks to find a coarse-grained description of significant clusters. In addition to testing our method on benchmark networks, we use our method on the sparse network of the European Court of Justice (ECJ) case law, to detect significant and insignificant areas of law. We use our significance analysis to draw a map of the ECJ case law network that reveals the relations between the areas of law.

##### Subjects

Algorithms; Models, Theoretical; Cluster Analysis; European Union; Legislation as Topic

*PLoS One ; 6(4): e18209, 2011 Apr 08.*

**MEDLINE** | ID: mdl-21494658

##### Abstract

To comprehend the hierarchical organization of large integrated systems, we introduce the hierarchical map equation, which reveals multilevel structures in networks. In this information-theoretic approach, we exploit the duality between compression and pattern detection; by compressing a description of a random walker as a proxy for real flow on a network, we find regularities in the network that induce this system-wide flow. Finding the shortest multilevel description of the random walker therefore gives us the best hierarchical clustering of the network (the optimal number of levels and modular partition at each level) with respect to the dynamics on the network. With a novel search algorithm, we extract and illustrate the rich multilevel organization of several large social and biological networks. For example, from the global air traffic network we uncover countries and continents, and from the pattern of scientific communication we reveal more than 100 scientific fields organized in four major disciplines: life sciences, physical sciences, ecology and earth sciences, and social sciences. In general, we find shallow hierarchical structures in globally interconnected systems, such as neural networks, and rich multilevel organizations in systems with highly separated regions, such as road networks.

##### Subjects

Cluster Analysis; Data Compression; Multilevel Analysis; Science

*PLoS One ; 5(1): e8694, 2010 Jan 27.*

**MEDLINE** | ID: mdl-20111700

##### Abstract

Change is a fundamental ingredient of interaction patterns in biology, technology, the economy, and science itself: Interactions within and between organisms change; transportation patterns by air, land, and sea all change; the global financial flow changes; and the frontiers of scientific research change. Networks and clustering methods have become important tools to comprehend instances of these large-scale structures, but without methods to distinguish between real trends and noisy data, these approaches are not useful for studying how networks change. Only if we can assign significance to the partitioning of single networks can we distinguish meaningful structural changes from random fluctuations. Here we show that bootstrap resampling accompanied by significance clustering provides a solution to this problem. To connect changing structures with the changing function of networks, we highlight and summarize the significant structural changes with alluvial diagrams and realize de Solla Price's vision of mapping change in science: studying the citation pattern between about 7000 scientific journals over the past decade, we find that neuroscience has transformed from an interdisciplinary specialty to a mature and stand-alone discipline.

##### Subjects

Models, Theoretical; Cluster Analysis

*Proc Natl Acad Sci U S A ; 105(4): 1118-23, 2008 Jan 29.*

**MEDLINE** | ID: mdl-18216267

##### Abstract

To comprehend the multipartite organization of large-scale biological and social systems, we introduce an information theoretic approach that reveals community structure in weighted and directed networks. We use the probability flow of random walks on a network as a proxy for information flows in the real system and decompose the network into modules by compressing a description of the probability flow. The result is a map that both simplifies and highlights the regularities in the structure and their relationships. We illustrate the method by making a map of scientific communication as captured in the citation patterns of >6,000 journals. We discover a multicentric organization with fields that vary dramatically in size and degree of integration into the network of science. Along the backbone of the network (including physics, chemistry, molecular biology, and medicine), information flows bidirectionally, but the map reveals a directional pattern of citation from the applied fields to the basic sciences.
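
Compressing a description of random-walk flow can be sketched with the two-level map equation, which gives the description length in bits for a given assignment of nodes to modules. The sketch below makes a simplifying assumption: the network is undirected, so node visit rates are proportional to node strength and no teleportation is needed. The directed case treated in the paper is not covered, and the function and variable names are illustrative.

```python
from collections import defaultdict
from math import log2


def plogp(x):
    """x * log2(x), with the convention 0 * log2(0) = 0."""
    return x * log2(x) if x > 0 else 0.0


def map_equation(edges, module_of):
    """Two-level description length (bits) for an undirected weighted network.

    edges: iterable of (u, v, weight); module_of: dict node -> module label.
    """
    strength = defaultdict(float)
    exit_weight = defaultdict(float)
    total = 0.0
    for u, v, w in edges:
        strength[u] += w
        strength[v] += w
        total += 2 * w
        if module_of[u] != module_of[v]:
            exit_weight[module_of[u]] += w
            exit_weight[module_of[v]] += w

    p = {n: s / total for n, s in strength.items()}   # node visit rates
    modules = {module_of[n] for n in p}
    q = {m: exit_weight[m] / total for m in modules}  # module exit rates
    p_in = defaultdict(float)                         # visit rate per module
    for n, pn in p.items():
        p_in[module_of[n]] += pn

    return (
        plogp(sum(q.values()))
        - 2 * sum(plogp(qm) for qm in q.values())
        - sum(plogp(pn) for pn in p.values())
        + sum(plogp(q[m] + p_in[m]) for m in modules)
    )


# Toy network: two triangles joined by a single bridge link.
edges = [(0, 1, 1), (1, 2, 1), (0, 2, 1), (3, 4, 1), (4, 5, 1), (3, 5, 1), (2, 3, 1)]
one_module = map_equation(edges, {n: "all" for n in range(6)})
two_modules = map_equation(edges, {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"})
```

For this toy network, splitting the nodes into the two triangles gives a shorter description than keeping all nodes in one module, which is the sense in which compression reveals modules.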

##### Subjects

Bibliometrics; Biomedical Research/methods; Information Theory; Periodicals as Topic; Science/methods; Biomedical Research/instrumentation; Biomedical Research/statistics & numerical data; Markov Chains; Science/instrumentation; Science/statistics & numerical data

*Proc Natl Acad Sci U S A ; 104(18): 7327-31, 2007 May 01.*

**MEDLINE** | ID: mdl-17452639

##### Abstract

To understand the structure of a large-scale biological, social, or technological network, it can be helpful to decompose the network into smaller subunits or modules. In this article, we develop an information-theoretic foundation for the concept of modularity in networks. We identify the modules of which the network is composed by finding an optimal compression of its topology, capitalizing on regularities in its structure. We explain the advantages of this approach and illustrate them by partitioning a number of real-world and model networks.

##### Subjects

Models, Biological; Computational Biology

*Phys Rev E Stat Nonlin Soft Matter Phys ; 74(3 Pt 2): 036119, 2006 Sep.*

**MEDLINE** | ID: mdl-17025720

##### Abstract

We generalize the degree-organizational view of real-world networks with broad degree distributions in a landscape analog with mountains (high-degree nodes) and valleys (low-degree nodes). For example, correlated degrees between adjacent nodes correspond to smooth landscapes (social networks), hierarchical networks to one-mountain landscapes (the Internet), and degree-disassortative networks without hierarchical features to rough landscapes with several mountains. To quantify the topology, we here measure the widths of the mountains and the separation between different mountains. We also generate ridge landscapes to model networks organized under constraints imposed by the space the networks are embedded in, associated with spatial localization or, in molecular networks, with functional localization.