Search | BVS Colombia Search Portal

Analysing Time-Stamped Co-Editing Networks in Software Development Teams using git2net.

Gote, Christoph; Scholtes, Ingo; Schweitzer, Frank.

Empir Softw Eng ; 26(4): 75, 2021.

Article in English | MEDLINE | ID: mdl-34720670

ABSTRACT

Data from software repositories have become an important foundation for the empirical study of software engineering processes. A recurring theme in the repository mining literature is the inference of developer networks capturing, e.g., collaboration, coordination, or communication from the commit history of projects. Many works in this area studied networks of co-authorship of software artefacts, neglecting detailed information on code changes and code ownership available in software repositories. To address this issue, we introduce git2net, a scalable Python tool that facilitates the extraction of fine-grained co-editing networks in large git repositories. It uses text mining techniques to analyse the detailed history of textual modifications within files. We apply our tool in two case studies using GitHub repositories of multiple Open Source projects as well as a proprietary software project. Specifically, we use data on more than 1.2 million commits and more than 25,000 developers to test a hypothesis on the relation between developer productivity and co-editing patterns in software teams. We argue that git2net opens up an important new source of high-resolution data on human collaboration patterns that can be used to advance theory in empirical software engineering, computational social science, and organisational studies. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at (10.1007/s10664-020-09928-2).
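The co-editing idea described above can be illustrated with a small sketch. It is not git2net's implementation or API: it assumes a hypothetical input format (a list of `(author, lines)` file versions in commit order), uses Python's standard `difflib.SequenceMatcher` as a stand-in for the text-mining step, and records an edge from an editor to the original author whenever the editor modifies or deletes lines that author wrote.

```python
import difflib


def co_editing_edges(versions):
    """Return (editor, original_author, version_index) co-editing edges.

    `versions` is a hypothetical format chosen for this sketch: a list of
    (author, lines) tuples, one per successive version of a single file.
    git2net itself mines git repositories directly.
    """
    edges = []
    if not versions:
        return edges
    first_author, prev_lines = versions[0]
    # owners[i] is the author who last wrote line i of the current version
    owners = [first_author] * len(prev_lines)
    for idx in range(1, len(versions)):
        author, lines = versions[idx]
        matcher = difflib.SequenceMatcher(a=prev_lines, b=lines, autojunk=False)
        new_owners = []
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                # unchanged lines keep their previous owners
                new_owners.extend(owners[i1:i2])
            else:
                # 'replace'/'delete': old lines i1:i2 were modified or removed
                for owner in owners[i1:i2]:
                    if owner != author:
                        edges.append((author, owner, idx))
                # 'replace'/'insert': new lines j1:j2 now belong to the editor
                new_owners.extend([author] * (j2 - j1))
        owners = new_owners
        prev_lines = lines
    return edges


versions = [
    ("alice", ["a = 1", "b = 2", "c = 3"]),
    ("bob", ["a = 1", "b = 20", "c = 3"]),             # bob edits alice's line
    ("alice", ["a = 1", "b = 20", "c = 30", "d = 4"]),  # alice edits her own line
]
print(co_editing_edges(versions))  # [('bob', 'alice', 1)]
```

Edits to one's own lines produce no edge here; whether and how such self-edits are counted is a modelling choice of this sketch.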

Betweenness preference: quantifying correlations in the topological dynamics of temporal networks.

Pfitzner, René; Scholtes, Ingo; Garas, Antonios; Tessone, Claudio J; Schweitzer, Frank.

Phys Rev Lett ; 110(19): 198701, 2013 May 10.

Article in English | MEDLINE | ID: mdl-23705746

ABSTRACT

We study correlations in temporal networks and introduce the notion of betweenness preference. It allows us to quantify to what extent paths, existing in time-aggregated representations of temporal networks, are actually realizable based on the sequence of interactions. We show that betweenness preference is present in empirical temporal network data and that it influences the length of the shortest time-respecting paths. Using four different data sets, we further argue that neglecting betweenness preference leads to wrong conclusions about dynamical processes on temporal networks.
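To illustrate the distinction the abstract draws between paths in a time-aggregated network and paths that the sequence of interactions actually realizes, the following sketch compares length-two paths in both. It is not the betweenness preference measure itself; the `(source, target, time)` event format and the maximum waiting time `delta` are assumptions made for this example.

```python
from itertools import product


def aggregate_two_paths(events):
    """All (a, b, c) with edges a->b and b->c in the time-aggregated network."""
    edges = {(u, v) for u, v, _ in events}
    return {(a, b, c) for (a, b) in edges for (b2, c) in edges if b == b2}


def time_respecting_two_paths(events, delta):
    """(a, b, c) realized by events (a, b, t1), (b, c, t2) with 0 < t2 - t1 <= delta."""
    realized = set()
    for (a, b, t1), (b2, c, t2) in product(events, repeat=2):
        if b == b2 and 0 < t2 - t1 <= delta:
            realized.add((a, b, c))
    return realized


# Hypothetical directed interactions (source, target, time)
events = [("a", "b", 1), ("b", "c", 2), ("d", "b", 3), ("b", "e", 4)]
agg = aggregate_two_paths(events)
real = time_respecting_two_paths(events, delta=1)
print(len(agg), sorted(real))  # 4 [('a', 'b', 'c'), ('d', 'b', 'e')]
```

In this toy sequence, only half of the length-two paths through `b` that exist in the aggregated network can actually be traversed in time order, which is the kind of discrepancy the abstract refers to.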

Inference of time-ordered multibody interactions.

Alvarez-Rodriguez, Unai; Petrovic, Luka V; Scholtes, Ingo.

Phys Rev E ; 108(3-1): 034312, 2023 Sep.

Article in English | MEDLINE | ID: mdl-37849178

ABSTRACT

We introduce time-ordered multibody interactions to describe complex systems manifesting temporal as well as multibody dependencies. First, we show how the dynamics of multivariate Markov chains can be decomposed into ensembles of time-ordered multibody interactions. Then, we present an algorithm to extract those interactions from data capturing the system-level dynamics of node states and a measure to characterize the complexity of interaction ensembles. Finally, we experimentally validate the robustness of our algorithm against statistical errors and its efficiency at inferring parsimonious interaction ensembles.

Bayesian inference of transition matrices from incomplete graph data with a topological prior.

Perri, Vincenzo; Petrovic, Luka V; Scholtes, Ingo.

EPJ Data Sci ; 12(1): 48, 2023.

Article in English | MEDLINE | ID: mdl-37840552

ABSTRACT

Many network analysis and graph learning techniques are based on discrete- or continuous-time models of random walks. To apply these methods, it is necessary to infer transition matrices that formalize the underlying stochastic process in an observed graph. For weighted graphs, where weighted edges capture observations of repeated interactions between nodes, it is common to estimate the entries of such transition matrices based on the (relative) weights of edges. However, in real-world settings we are often confronted with incomplete data, which turns the construction of the transition matrix based on a weighted graph into an inference problem. Moreover, we often have access to additional information, which captures topological constraints of the system, i.e., which edges in a weighted graph are (theoretically) possible and which are not. Examples include transportation networks, where we may have access to a small sample of passenger trajectories as well as the physical topology of connections, or a limited set of observed social interactions with additional information on the underlying social structure. Combining these two different sources of information to reliably infer transition matrices from incomplete data on repeated interactions is an important open challenge, with severe implications for the reliability of downstream network analysis tasks. Addressing this issue, we show that including knowledge on such topological constraints can considerably improve the inference of transition matrices, especially in situations where we only have a small number of observed interactions. To this end, we derive an analytically tractable Bayesian method that uses repeated interactions and a topological prior to perform data-efficient inference of transition matrices.
We compare our approach against commonly used frequentist and Bayesian approaches both in synthetic data and in five real-world datasets, and we find that our method recovers the transition probabilities with higher accuracy. Furthermore, we demonstrate that the method is robust even in cases when the knowledge of the topological constraint is partial. Lastly, we show that this higher accuracy improves the results for downstream network analysis tasks like cluster detection and node ranking, which highlights the practical relevance of our method for interdisciplinary data-driven analyses of networked systems.
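The following sketch contrasts the common frequentist estimate mentioned above (normalising each node's outgoing edge weights) with a simple Bayesian alternative that adds a pseudocount `alpha` only to edges that the topology marks as possible, i.e., the posterior mean under a Dirichlet prior restricted to allowed edges. This estimator is an illustrative choice for this example, not the method derived in the paper; `weights` and `adjacency` are hypothetical dense matrices given as nested lists.

```python
def frequentist_transition_matrix(weights):
    """Row-normalise observed edge weights; rows without observations stay zero."""
    matrix = []
    for row in weights:
        total = sum(row)
        matrix.append([w / total if total > 0 else 0.0 for w in row])
    return matrix


def dirichlet_transition_matrix(weights, adjacency, alpha=1.0):
    """Posterior mean with pseudocount `alpha` on topologically possible edges.

    adjacency[i][j] is 1 if the transition i -> j is possible, else 0
    (an assumed encoding of the topological constraint for this sketch).
    """
    matrix = []
    for w_row, a_row in zip(weights, adjacency):
        counts = [w + alpha * a for w, a in zip(w_row, a_row)]
        total = sum(counts)
        matrix.append([c / total if total > 0 else 0.0 for c in counts])
    return matrix


# Hypothetical sparse observations: node 1 was never observed leaving.
weights = [[0, 3, 0], [0, 0, 0], [1, 0, 0]]
adjacency = [[0, 1, 1], [1, 0, 1], [1, 0, 0]]
print(frequentist_transition_matrix(weights))  # row 1 is all zeros
print(dirichlet_transition_matrix(weights, adjacency))
```

With few observations, the frequentist estimate leaves node 1 without any outgoing probability, while the prior spreads mass over the edges the topology allows; as observed weights grow, the pseudocounts matter less.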

Predicting variable-length paths in networked systems using multi-order generative models.

Gote, Christoph; Casiraghi, Giona; Schweitzer, Frank; Scholtes, Ingo.

Appl Netw Sci ; 8(1): 68, 2023.

Article in English | MEDLINE | ID: mdl-37745796

ABSTRACT

Apart from nodes and links, for many networked systems, we have access to data on paths, i.e., collections of temporally ordered variable-length node sequences that are constrained by the system's topology. Understanding the patterns in such data is key to advancing our understanding of the structure and dynamics of complex systems. Moreover, the ability to accurately model and predict paths is important for engineered systems, e.g., to optimise supply chains or provide smart mobility services. Here, we introduce MOGen, a generative modelling framework that enables both next-element and out-of-sample prediction in paths with high accuracy and consistency. It features a model selection approach that automatically determines the optimal model directly from data, effectively making MOGen parameter-free. Using empirical data, we show that our method outperforms state-of-the-art sequence modelling techniques. We further introduce a mathematical formalism that links higher-order models of paths to transition matrices of random walks in multi-layer networks.
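As its name suggests, MOGen is a multi-order model with automatic model selection; the sketch below shows only a simpler building block, a single fixed-order Markov model of paths used for next-element prediction. It is not MOGen, and the function names, the input format (paths as lists of node labels) and the choice of order `k` are assumptions for this example.

```python
from collections import Counter, defaultdict


def fit_kth_order(paths, k):
    """Count which node follows each length-k context in the observed paths."""
    assert k >= 1, "this sketch only handles orders k >= 1"
    counts = defaultdict(Counter)
    for path in paths:
        for i in range(k, len(path)):
            context = tuple(path[i - k:i])
            counts[context][path[i]] += 1
    return counts


def predict_next(counts, prefix, k):
    """Most frequent successor of the last k nodes of `prefix`, or None if unseen."""
    context = tuple(prefix[-k:])
    if len(context) < k or context not in counts:
        return None
    return counts[context].most_common(1)[0][0]


# Hypothetical paths: what follows c depends on where the path came from.
paths = [["a", "c", "d"], ["b", "c", "e"], ["a", "c", "d"]]
first = fit_kth_order(paths, k=1)
second = fit_kth_order(paths, k=2)
print(predict_next(first, ["b", "c"], k=1))   # d
print(predict_next(second, ["b", "c"], k=2))  # e
```

A first-order model cannot tell whether a path through `c` came from `a` or `b`, while a second-order model can; choosing such orders from data is the problem a model selection approach addresses.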

Locating community smells in software development processes using higher-order network centralities.

Gote, Christoph; Perri, Vincenzo; Zingg, Christian; Casiraghi, Giona; Arzig, Carsten; von Gernler, Alexander; Schweitzer, Frank; Scholtes, Ingo.

Soc Netw Anal Min ; 13(1): 129, 2023.

Article in English | MEDLINE | ID: mdl-37829148

ABSTRACT

Community smells are negative patterns in software development teams' interactions that impede their ability to successfully create software. Examples are team members working in isolation, lack of communication and collaboration across departments or sub-teams, or areas of the codebase that only a few team members can work on. Current approaches aim to detect community smells by analysing static network representations of software teams' interaction structures. However, such static representations are insufficient to locate community smells within development processes. Extending beyond the capabilities of traditional social network analysis, we show that higher-order network models provide a robust means of revealing such hidden patterns and complex relationships. To this end, we develop a set of centrality measures based on the MOGen higher-order network model and show their effectiveness in predicting influential nodes using five empirical datasets. We then employ these measures for a comprehensive analysis of a product team at the German IT security company genua GmbH, showcasing our method's success in identifying and locating community smells. Specifically, we uncover critical community smells in two areas of the team's development process. Semi-structured interviews with five team members validate our findings: while the team was aware of one community smell and employed measures to address it, it was not aware of the second. This highlights the potential of our approach as a robust tool for identifying and addressing community smells in software development teams. More generally, our work contributes to the social network analysis field with a powerful set of higher-order network centralities that effectively capture community dynamics and indirect relationships.

From networks to optimal higher-order models of complex systems.

Lambiotte, Renaud; Rosvall, Martin; Scholtes, Ingo.

Nat Phys ; 15(4): 313-320, 2019 Apr.

Article in English | MEDLINE | ID: mdl-30956684

ABSTRACT

Rich data is revealing that complex dependencies between the nodes of a network may escape models based on pairwise interactions. Higher-order network models go beyond these limitations, offering new perspectives for understanding complex systems.

Quantifying the effect of editor-author relations on manuscript handling times.

Sarigöl, Emre; Garcia, David; Scholtes, Ingo; Schweitzer, Frank.

Scientometrics ; 113(1): 609-631, 2017.

Article in English | MEDLINE | ID: mdl-29056793

ABSTRACT

In this article we study to what extent the academic peer review process is influenced by social relations between the authors of a manuscript and the editor handling the manuscript. Taking the open access journal PLOS ONE as a case study, our analysis is based on a data set of more than 100,000 articles published between 2007 and 2015. Using available data on handling editor, submission and acceptance time of manuscripts, we study whether co-authorship relations between authors and the handling editor affect the manuscript handling time, i.e., the time taken between the submission and acceptance of a manuscript. Our analysis reveals (1) that editors handle papers co-authored by previous collaborators significantly more often than expected at random, and (2) that such prior co-author relations are significantly related to faster manuscript handling. To address whether these shorter manuscript handling times can be explained by the quality of publications, we study the number of citations and downloads which accepted papers eventually accumulate. Moreover, we consider the influence of additional (social) factors, such as the editor's experience, the topical similarity between authors and editors, as well as reciprocal citation relations between authors and editors. Our findings show that, even when correcting for other factors like time, experience, and performance, prior co-authorship relations have a large and significant influence on manuscript handling times, speeding up the editorial decision on average by 19 days.

Causality-driven slow-down and speed-up of diffusion in non-Markovian temporal networks.

Scholtes, Ingo; Wider, Nicolas; Pfitzner, René; Garas, Antonios; Tessone, Claudio J; Schweitzer, Frank.

Nat Commun ; 5: 5024, 2014 Sep 24.

Article in English | MEDLINE | ID: mdl-25248462

ABSTRACT

Recent research has highlighted limitations of studying complex systems with time-varying topologies from the perspective of static, time-aggregated networks. Non-Markovian characteristics resulting from the ordering of interactions in temporal networks were identified as one important mechanism that alters causality and affects dynamical processes. So far, an analytical explanation for this phenomenon and for the significant variations observed across different systems has been missing. Here we introduce a methodology that allows us to analytically predict causality-driven changes of diffusion speed in non-Markovian temporal networks. Validating our predictions in six data sets, we show that, compared with the time-aggregated network, non-Markovian characteristics can lead to either a slow-down or a speed-up of diffusion, which can even outweigh the decelerating effect of community structures in the static topology. Thus, non-Markovian properties of temporal networks constitute an important additional dimension of complexity in time-varying complex systems.
