ABSTRACT
DNA methylation provides one of the most widely studied biomarkers of ageing. Since the methylation of CpG dinucleotides functions as a switch in cellular mechanisms, it is plausible to assume that age may be tuned by properly adjusting these switches. However, adjusting hundreds of CpG methylation levels coherently may never be feasible, and changing just a few positions may lead to a biologically unstable state. A prominent example of methylation-based age estimators is Horvath's clock, which is based on 353 CpG dinucleotides and shows a high correlation (not necessarily causation) with chronological age across multiple tissue types. On this small subset of CpG dinucleotides we demonstrate how the adjustment of one methylation level leads to a cascade of changes at other sites. Among the studied subset, we locate the most important CpGs (and related genes) that may have a large influence on the rest of the sub-system. According to our analysis, the structure of this network is considerably more hierarchical than one would expect based on ensembles of uncorrelated connections. Therefore, only a handful of CpGs is enough to modify the system towards a desired state. When propagation of the change over the network is taken into account, the resulting modification in the predicted age can be significantly larger than the effect of isolated CpG perturbations. By adjusting the single most influential CpG site and following the propagation of methylation level changes, we can reach a virtual age reduction of up to 5.74 years, significantly larger than without taking network control into account. Extending our approach to the whole methylation network may identify key nodes that have a controller role in the ageing process.
Subjects
Aging/genetics , DNA Methylation , CpG Islands , Humans
ABSTRACT
BACKGROUND: Willingness to get COVID-19 and seasonal influenza vaccines has not yet been thoroughly investigated together; thus, this study aims to explore both within the general adult population. METHODS: We analysed the responses of 840 Hungarian participants who took part in a nationwide computer-assisted telephone interview survey. The survey included questions on various demographic characteristics, perceived financial status, and willingness to get the two types of vaccine. Descriptive statistics, comparative statistics and word co-occurrence network analysis were conducted. RESULTS: 48.2% of participants were willing to get a COVID-19 vaccine, while this ratio for the seasonal influenza vaccine was only 25.7%. The difference was significant. Regardless of how the participants were grouped, by demographic data or by perceived financial status, the significant difference always persisted. Being older than 59 years significantly increased the willingness to get both vaccines when compared to the middle-aged groups, but not when compared to the younger ones. Having higher education significantly elevated the acceptance of COVID-19 vaccination in comparison to secondary education. The willingness to get any type of COVID-19 vaccine correlated with the willingness to get both influenza and COVID-19 vaccines. Finally, those who were willing to get either vaccine coupled similar words together to describe their thoughts about COVID-19 vaccination. CONCLUSION: The overall results show a clear preference for a COVID-19 vaccine, and there are several similarities in the nature of the willingness to get either type of vaccine.
Subjects
COVID-19 , Influenza Vaccines , Influenza, Human , Adult , COVID-19 Vaccines , Cross-Sectional Studies , Humans , Hungary , Influenza, Human/prevention & control , Middle Aged , SARS-CoV-2 , Seasons , Vaccination
ABSTRACT
INTRODUCTION: The ClinicalTrials.gov website, which is operated by the US government, collects data about clinical trials. AIM: We processed the data related to Hungary, downloaded from the website as XML files. METHOD: Most of the data describe trials performed after 2000, so we obtained an overview of the clinical research of the last 10 to 15 years. As the majority of the data fields are collected as free text, significant data cleaning was needed. RESULTS: The database contained 2863 trials related to Hungary from 189 settlements. Only 20 per cent of the actual research organizations could be identified, as often only an "id" number or a general name was given; thus this information was anonymised in many cases. CONCLUSION: Besides the analysis of the information obtained from this database, our study points out the relevant issues that may influence the international view of Hungarian clinical research. Orv. Hetil., 2017, 158(9), 345-351.
Subjects
Biomedical Research/statistics & numerical data , Clinical Trials as Topic/statistics & numerical data , Databases, Factual , Female , Humans , Hungary , Male
ABSTRACT
How do words change their meaning? Although semantic evolution is driven by a variety of distinct factors, including linguistic, societal, and technological ones, we find that there is one law that holds universally across five major Indo-European languages: that semantic evolution is subdiffusive. Using an automated pipeline of diachronic distributional semantic embedding that controls for underlying symmetries, we show that words follow stochastic trajectories in meaning space with an anomalous diffusion exponent α = 0.45 ± 0.05 across languages, in contrast with diffusing particles that follow α = 1. Randomization methods indicate that preserving temporal correlations in semantic change directions is necessary to recover strongly subdiffusive behavior; however, correlations in change sizes play an important role too. We furthermore show that strong subdiffusion is a robust phenomenon under a wide variety of choices in data analysis and interpretation, such as the choice of fitting an ensemble average of displacements or averaging best-fit exponents of individual word trajectories.
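As an illustration of the quantity reported above, the following minimal sketch estimates an anomalous diffusion exponent α by fitting the mean squared displacement, MSD(t) ~ t^α, in log-log space. It is not the authors' pipeline: the function names `msd_exponent` and `random_walk`, the pooling over start times, and the choice of lags are assumptions made for this example, and the trajectories are synthetic.

```python
import math
import random


def msd_exponent(trajectories, lags):
    """Estimate alpha in MSD(lag) ~ lag**alpha via a log-log least-squares fit.

    trajectories: list of trajectories, each a list of coordinate tuples.
    lags: iterable of positive integer time lags (at least two distinct values).
    """
    log_lags, log_msd = [], []
    for lag in lags:
        squared = []
        for traj in trajectories:
            # Pool squared displacements over all start times and trajectories.
            for start in range(len(traj) - lag):
                a, b = traj[start], traj[start + lag]
                squared.append(sum((q - p) ** 2 for p, q in zip(a, b)))
        log_lags.append(math.log(lag))
        log_msd.append(math.log(sum(squared) / len(squared)))
    mean_x = sum(log_lags) / len(log_lags)
    mean_y = sum(log_msd) / len(log_msd)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_lags, log_msd))
    var = sum((x - mean_x) ** 2 for x in log_lags)
    return cov / var  # slope of the fitted line = estimated alpha


def random_walk(steps, dim, rng):
    """Synthetic trajectory with independent Gaussian steps (normal diffusion)."""
    pos = [0.0] * dim
    path = [tuple(pos)]
    for _ in range(steps):
        pos = [c + rng.gauss(0.0, 1.0) for c in pos]
        path.append(tuple(pos))
    return path


if __name__ == "__main__":
    rng = random.Random(42)
    walks = [random_walk(100, 2, rng) for _ in range(200)]
    # For independent steps the estimate should lie close to 1.
    print("estimated alpha:", round(msd_exponent(walks, range(1, 21)), 2))
```

An exponent well below 1 from such a fit would indicate subdiffusion; the abstract also mentions the alternative of averaging best-fit exponents of individual trajectories, which this sketch does not do.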
Subjects
Language , Semantics , Linguistics , Diffusion , Technology
ABSTRACT
Graph embeddings learn the structure of networks and represent it in low-dimensional vector spaces. Community structure is one of the features that are recognized and reproduced by embeddings. We show that an iterative procedure, in which a graph is repeatedly embedded and its links are reweighted based on the geometric proximity between the nodes, reinforces intra-community links and weakens inter-community links, making the clusters of the initial network more visible and more easily detectable. The geometric separation between the communities can become so strong that even a very simple parsing of the links may recover the communities as isolated components with surprisingly high precision. Furthermore, when used as a pre-processing step, our embedding and reweighting procedure can improve the performance of traditional community detection algorithms.
ABSTRACT
The international scientific community puts an ever-growing emphasis on research excellence and performance evaluation. So does the European Union with its flagship research excellence grant scheme organised by the European Research Council. This paper aims to provide an in-depth analysis of one of the ERC's thematic panels within the social sciences, namely the SH2 "Political Science" panel. The analysis is based on empirical, statistical methods, and network analysis tools to gain insights about the grant winners' publication patterns and their coauthor networks. The results draw up an academic career track of the grantees based on quantitative publication patterns and performance. Besides, a change in authorship can be observed, which is proven by the formation of new groups and intensifying intra-group collaboration patterns in the case of all three grant types. However, the ERC grant serves different functions for the winners of three different categories: for the Starting Grant winners, it offers the possibility to kick off and establish their research group, for the Consolidator Grant winners, it opens up new opportunities to extend their co-authorship network, and for the Advanced Grant winners, it offers the chance to start a new collaboration.
ABSTRACT
Ageing is often characterised by progressive accumulation of damage, and it is one of the most important risk factors for chronic disease development. Epigenetic mechanisms including DNA methylation could functionally contribute to organismal ageing; however, the key functions and biological processes that may govern ageing are still not understood. Although age predictors called epigenetic clocks can accurately estimate the biological age of an individual based on cellular DNA methylation, their models have limited ability to explain the underlying prediction algorithm and the key biological processes controlling ageing. Here we present XAI-AGE, a biologically informed, explainable deep neural network model for accurate biological age prediction across multiple tissue types. We show that XAI-AGE outperforms the first-generation age predictors and achieves similar results to deep learning-based models, while opening up the possibility to infer biologically meaningful insights into the activity of pathways and other abstract biological processes directly from the model.
Subjects
Deep Learning , Algorithms , DNA Methylation , Epigenesis, Genetic
ABSTRACT
The rich set of interactions between individuals in society results in complex community structure, capturing highly connected circles of friends, families or professional cliques in a social network. Thanks to frequent changes in the activity and communication patterns of individuals, the associated social and communication network is subject to constant evolution. Our knowledge of the mechanisms governing the underlying community dynamics is limited, but is essential for a deeper understanding of the development and self-optimization of society as a whole. We have developed an algorithm based on clique percolation that allows us to investigate the time dependence of overlapping communities on a large scale, and thus uncover basic relationships characterizing community evolution. Our focus is on networks capturing the collaboration between scientists and the calls between mobile phone users. We find that large groups persist for longer if they are capable of dynamically altering their membership, suggesting that an ability to change the group composition results in better adaptability. The behaviour of small groups displays the opposite tendency: the condition for stability is that their composition remains unchanged. We also show that knowledge of the time commitment of members to a given community can be used for estimating the community's lifetime. These findings offer insight into the fundamental differences between the dynamics of small groups and large institutions.
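The abstract names clique percolation as the basis of the algorithm. As a rough illustration of that idea on a single network snapshot, the sketch below finds k-clique communities for k = 3: triangles that share an edge are merged, and each merged group of triangles forms one community, so a node can belong to several communities. The function name `triangle_communities` and the toy graph are hypothetical, the restriction to k = 3 is a simplification, and the tracking of communities over time described in the abstract is not attempted here.

```python
from itertools import combinations


def triangle_communities(edges):
    """k = 3 clique percolation on an undirected graph given as an edge list.

    Node labels are assumed to be mutually comparable (e.g. all ints).
    Returns a list of frozensets of nodes; communities may overlap.
    """
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # Enumerate all triangles (3-cliques).
    triangles = set()
    for u in adj:
        for v, w in combinations(sorted(adj[u]), 2):
            if w in adj[v]:
                triangles.add(frozenset((u, v, w)))
    triangles = list(triangles)

    # Union-find over triangles: two triangles are adjacent if they share an edge.
    parent = list(range(len(triangles)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edge_owner = {}
    for idx, tri in enumerate(triangles):
        for pair in combinations(sorted(tri), 2):
            if pair in edge_owner:
                parent[find(idx)] = find(edge_owner[pair])
            else:
                edge_owner[pair] = idx

    # Each connected group of triangles yields one community.
    groups = {}
    for idx, tri in enumerate(triangles):
        groups.setdefault(find(idx), set()).update(tri)
    return [frozenset(g) for g in groups.values()]


if __name__ == "__main__":
    # Two 4-node cliques that overlap only in node 3.
    toy = list(combinations([0, 1, 2, 3], 2)) + list(combinations([3, 4, 5, 6], 2))
    for community in triangle_communities(toy):
        print(sorted(community))
```

In the toy graph the two cliques share a single node but no edge, so they are reported as two separate communities that both contain node 3.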
Subjects
Biological Evolution , Group Processes , Research Personnel , Social Behavior , Authorship , Cell Phone , Communication , Research Personnel/psychology , Time Factors
ABSTRACT
We introduce a new approach to constructing networks with realistic features. Our method, in spite of its conceptual simplicity (it has only two parameters), is capable of generating a wide variety of network types with prescribed statistical properties, e.g., with degree or clustering coefficient distributions of various, very different forms. In turn, these graphs can be used to test hypotheses or as models of actual data. The method is based on a mapping between suitably chosen singular measures defined on the unit square and sparse infinite networks. Such a mapping has the great potential of allowing for graph theoretical results for a variety of network topologies. The main idea of our approach is to go to the infinite limit of the singular measure and the size of the corresponding graph simultaneously. A unique feature of this construction is that with increasing system size the generated graphs become topologically more structured. We present analytic expressions, derived from the parameters of the initial generating measure (which is to be iterated), for such major characteristics of graphs as their degree, clustering coefficient, and assortativity coefficient distributions. The optimal parameters of the generating measure are determined from a simple simulated annealing process. Thus, the present work provides a tool for researchers from a variety of fields (such as biology, computer science, or complex systems) enabling them to create a versatile model of their network data.
Subjects
Algorithms , Fractals , Models, Theoretical , Computer Simulation
ABSTRACT
Finding the optimal embedding of networks into low-dimensional hyperbolic spaces is a challenge that has received considerable interest in recent years, with several different approaches proposed in the literature. In general, these methods take advantage of the exponentially growing volume of the hyperbolic space as a function of the radius from the origin, allowing a (roughly) uniform spatial distribution of the nodes even for scale-free small-world networks, where the connection probability between pairs decays with hyperbolic distance. One of the motivations behind hyperbolic embedding is that optimal placement of the nodes in a hyperbolic space is widely thought to enable efficient navigation on top of the network. Accordingly, one of the measures that can be used to quantify the quality of different embeddings is the fraction of successful greedy paths following a simple navigation protocol based on the hyperbolic coordinates. In the present work, we develop an optimisation scheme for this score in the native disk representation of the hyperbolic space. This optimisation algorithm can either be used as an embedding method on its own, or it can be applied to improve this score for embeddings obtained from other methods. According to our tests on synthetic and real networks, the proposed optimisation can considerably enhance the success rate of greedy paths in several cases, improving the given embedding from the point of view of navigability.
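To make the score mentioned above concrete, here is a minimal sketch of a greedy routing success fraction for nodes with polar coordinates (r, θ) in the native disk representation, assuming curvature -1 and the hyperbolic law of cosines for distances. The routing rule used (always forward to the neighbour closest to the target, and fail when a node is revisited) is one common variant and is an assumption here, as are the function names and the toy graphs; the optimisation scheme itself is not reproduced.

```python
import math


def hyperbolic_distance(p, q):
    """Distance between two points given as (r, theta), curvature -1."""
    (r1, t1), (r2, t2) = p, q
    dtheta = math.pi - abs(math.pi - abs(t1 - t2) % (2 * math.pi))
    arg = (math.cosh(r1) * math.cosh(r2)
           - math.sinh(r1) * math.sinh(r2) * math.cos(dtheta))
    return math.acosh(max(1.0, arg))  # clamp guards against rounding below 1


def greedy_path_succeeds(adj, coords, source, target):
    """Forward greedily towards the target; fail on a dead end or a loop."""
    current = source
    visited = {source}
    while current != target:
        nxt = min(
            adj[current],
            key=lambda n: hyperbolic_distance(coords[n], coords[target]),
            default=None,
        )
        if nxt is None or nxt in visited:
            return False
        visited.add(nxt)
        current = nxt
    return True


def greedy_routing_score(adj, coords):
    """Fraction of ordered source-target pairs reached by greedy routing."""
    nodes = list(adj)
    pairs = [(s, t) for s in nodes for t in nodes if s != t]
    reached = sum(greedy_path_succeeds(adj, coords, s, t) for s, t in pairs)
    return reached / len(pairs)


if __name__ == "__main__":
    # Toy star graph: hub at the origin, leaves on a circle of radius 5.
    adj = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
    coords = {0: (0.0, 0.0), 1: (5.0, 0.0), 2: (5.0, 2.0), 3: (5.0, 4.0)}
    print("greedy routing score:", greedy_routing_score(adj, coords))
```

Because the score depends only on the graph and the coordinates, it can be evaluated for an embedding produced by any method, which is the sense in which the abstract describes it as a quality measure.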
ABSTRACT
Hyperbolic network models have gained considerable attention in recent years, mainly due to their capability of explaining many peculiar features of real-world networks. One of the most widely known models of this type is the popularity-similarity optimisation (PSO) model, working in the native disk representation of the two-dimensional hyperbolic space and generating networks with the small-world property, a scale-free degree distribution, high clustering and strong community structure at the same time. With the motivation of better understanding hyperbolic random graphs, we hereby introduce the dPSO model, a generalisation of the PSO model to an arbitrary integer dimension [Formula: see text]. The analysis of the obtained networks shows that their major structural properties can be affected by the dimension of the underlying hyperbolic space in a non-trivial way. Our extended framework is not only interesting from a theoretical point of view but can also serve as a starting point for the generalisation of already existing two-dimensional hyperbolic embedding techniques.
ABSTRACT
Many complex systems in nature and society can be described in terms of networks capturing the intricate web of connections among the units they are made of. A key question is how to interpret the global organization of such networks as the coexistence of their structural subunits (communities) associated with more highly interconnected parts. Identifying these a priori unknown building blocks (such as functionally related proteins, industrial sectors and groups of people) is crucial to the understanding of the structural and functional properties of networks. The existing deterministic methods used for large networks find separated communities, whereas most of the actual networks are made of highly overlapping cohesive groups of nodes. Here we introduce an approach to analysing the main statistical features of the interwoven sets of overlapping communities that makes a step towards uncovering the modular structure of complex systems. After defining a set of new characteristic quantities for the statistics of communities, we apply an efficient technique for exploring overlapping communities on a large scale. We find that overlaps are significant, and the distributions we introduce reveal universal features of networks. Our studies of collaboration, word-association and protein interaction graphs show that the web of communities has non-trivial correlations and specific scaling properties.
Subjects
Community Networks , Models, Biological , Nature , Humans , Internet , Protein Binding , Saccharomyces cerevisiae/metabolism
ABSTRACT
Several observations indicate the existence of a latent hyperbolic space behind real networks that makes their structure very intuitive, in the sense that the probability of a connection decreases with the hyperbolic distance between the nodes. A remarkable network model generating random graphs along this line is the popularity-similarity optimisation (PSO) model, offering a scale-free degree distribution, high clustering and the small-world property at the same time. These results provide a strong motivation for the development of hyperbolic embedding algorithms that tackle the problem of finding the optimal hyperbolic coordinates of the nodes based on the network structure. A very promising recent approach for hyperbolic embedding is provided by the noncentered minimum curvilinear embedding (ncMCE) method, belonging to the family of coalescent embedding algorithms. This approach offers a high-quality embedding at a low running time. In the present work we propose a further optimisation of the angular coordinates in this framework that seems to reduce the logarithmic loss and increase the greedy routing score of the embedding compared to the original version, thereby adding an extra improvement to the quality of the inferred hyperbolic coordinates.
ABSTRACT
A remarkable approach for grasping the relevant statistical features of real networks with the help of random graphs is offered by hyperbolic models, centred around the idea of placing nodes in a low-dimensional hyperbolic space and connecting node pairs with a probability depending on the hyperbolic distance. It is widely appreciated that these models can generate random graphs that are small-world, highly clustered and scale-free at the same time, thus reproducing the most fundamental common features of real networks. In the present work, we focus on a less well-known property of the popularity-similarity optimisation model and the [Formula: see text] model from this model family, namely that the networks generated by these approaches also contain communities for a wide range of the parameters, which was certainly not intended in the design of the models. We extracted the communities from the studied networks using well-established community finding methods such as Louvain, Infomap and label propagation. The observed high modularity values indicate that the community structure can become very pronounced under certain conditions. In addition, the modules found by the different algorithms show good consistency, implying that these are indeed relevant and apparent structural units. Since the appearance of communities is rather common in networks representing real systems as well, this feature of hyperbolic models makes them even more suitable for describing real networks than thought before.
ABSTRACT
The concept of entropy connects the number of possible configurations with the number of variables in large stochastic systems. Independent or weakly interacting variables make the number of configurations scale exponentially with the number of variables, making the Boltzmann-Gibbs-Shannon entropy extensive. In systems with strongly interacting variables, or with variables driven by history-dependent dynamics, this is no longer true. Here we show that, contrary to the generally held belief, not only strong correlations or history-dependence, but also a sufficiently skewed distribution of visiting probabilities, that is, first-order statistics, plays a role in determining the relation between configuration space size and system size, or, equivalently, the extensive form of generalized entropy. We present a macroscopic formalism describing this interplay between first-order statistics, higher-order statistics, and configuration space growth. We demonstrate that knowing any two strongly restricts the possibilities of the third. We believe that this unified macroscopic picture of emergent degrees of freedom constraining mechanisms provides a step towards finding order in the zoo of strongly interacting complex systems.
ABSTRACT
The hidden variable formalism (based on the assumption of some intrinsic node parameters) has turned out to be a remarkably efficient and powerful approach in describing and analyzing the topology of complex networks. Owing to one of its most advantageous properties, namely its proven ability to reproduce a wide range of different degree distribution forms, it has become a standard tool for generating networks having the scale-free property. One of the most intensively studied versions of this model is based on a thresholding mechanism of the exponentially distributed hidden variables associated with the nodes (intrinsic vertex weights), which gives rise to the emergence of a scale-free network where the degree distribution p(k) ~ k^(-γ) decays with an exponent of γ = 2. Here we propose a generalization and modification of this model by extending the set of connection probabilities and hidden variable distributions that lead to the aforementioned degree distribution, and analyze the conditions leading to the above behavior analytically. In addition, we propose a relaxation of the hard threshold in the connection probabilities, which opens up the possibility of obtaining sparse scale-free networks with an arbitrary scaling exponent.
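The thresholding mechanism described above can be sketched as follows, assuming the connection rule links two nodes whenever the sum of their exponentially distributed hidden variables reaches a threshold θ. This particular rule, the function names `threshold_network` and `degree_sequence`, and the parameter values are assumptions for the example rather than details given in the text; the generalized and relaxed models proposed in the abstract are not implemented.

```python
import random


def threshold_network(n, theta, rate=1.0, seed=0):
    """Generate a hard-threshold hidden variable network.

    Each node i gets a weight w_i drawn from an exponential distribution;
    nodes i and j are linked if w_i + w_j >= theta (assumed rule).
    Returns (weights, edges) with edges as (i, j) pairs, i < j.
    """
    rng = random.Random(seed)
    weights = [rng.expovariate(rate) for _ in range(n)]
    edges = [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if weights[i] + weights[j] >= theta
    ]
    return weights, edges


def degree_sequence(n, edges):
    """Degree of every node, indexed by node id."""
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    return deg


if __name__ == "__main__":
    n, theta = 2000, 8.0
    weights, edges = threshold_network(n, theta, seed=1)
    deg = degree_sequence(n, edges)
    # With a hard threshold the degree is a non-decreasing function of the weight,
    # so the few nodes with the largest weights act as hubs.
    top = sorted(range(n), key=lambda i: weights[i], reverse=True)[:5]
    print("edges:", len(edges))
    print("degrees of the five heaviest nodes:", [deg[i] for i in top])
```

Under this rule a node with weight w is linked to every other node whose weight is at least θ - w, which is why the tail of the weight distribution directly shapes the tail of the degree distribution.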
ABSTRACT
Hierarchical organisation is a prevalent feature of many complex networks appearing in nature and society. A related, interesting, yet less studied question is how a hierarchical network evolves over time. Here we take a data-driven approach and examine the time evolution of the network between the Medical Subject Headings (MeSH) provided by the National Center for Biotechnology Information (NCBI, part of the U.S. National Library of Medicine). The network between the MeSH terms is organised into 16 different, yearly updated hierarchies such as "Anatomy", "Diseases", "Chemicals and Drugs", etc. The natural representation of these hierarchies is given by directed acyclic graphs, composed of links pointing from nodes higher in the hierarchy towards nodes in lower levels. Due to the yearly updates, the structure of these networks is subject to constant evolution: new MeSH terms can appear, terms becoming obsolete can be deleted or merged with other terms, and already existing parts of the network may be rewired. We examine various statistical properties of the time evolution, with a special focus on the attachment and detachment mechanisms of the links, and find a few general features that are characteristic of all MeSH hierarchies. According to the results, the hierarchies investigated display an interesting interplay between non-uniform preferences with respect to multiple different topological and hierarchical properties.
Subjects
Medical Subject Headings , PubMed , Models, Statistical , Time Factors
ABSTRACT
Many physical, biological or social systems are governed by history-dependent dynamics or are composed of strongly interacting units, showing an extreme diversity of microscopic behaviour. Macroscopically, however, they can be efficiently modeled by generalizing concepts of the theory of Markovian, ergodic and weakly interacting stochastic processes. In this paper, we model stochastic processes by a family of generalized Fokker-Planck equations whose stationary solutions are equivalent to the maximum entropy distributions according to generalized entropies. We show that at asymptotically large times and volumes, the scaling exponent of the anomalous diffusion process described by the generalized Fokker-Planck equation and the phase space volume scaling exponent of the generalized entropy bijectively determine each other via a simple algebraic relation. This implies that these basic measures characterizing the transient and the stationary behaviour of the processes provide the same information regarding the asymptotic regime, and consequently, the classifications of the processes given by these two exponents coincide.
ABSTRACT
Hierarchical organization is prevalent in networks representing a wide range of systems in nature and society. An important example is given by the tag hierarchies extracted from large on-line data repositories such as scientific publication archives, file sharing portals, blogs, on-line news portals, etc. The tagging of the stored objects with informative keywords in such repositories has become very common, and in most cases the tags on a given item are free words chosen by the authors independently. Therefore, the relations among keywords appearing in an on-line data repository are unknown in general. However, in most cases the topics and concepts described by these keywords form a latent hierarchy, with the more general topics and categories at the top, and more specialized ones at the bottom. There are several algorithms available for deducing this hierarchy from the statistical features of the keywords. In the present work we apply a recent, co-occurrence-based tag hierarchy extraction method to sets of keywords obtained from four different on-line news portals. The resulting hierarchies show substantial differences not just in the topics rendered as important (being at the top of the hierarchy) or of less interest (categorized low in the hierarchy), but also in the underlying network structure. This reveals discrepancies between the plausible keyword association frameworks in the studied news portals.
Subjects
Data Mining , Internet , Language , Newspapers as Topic
ABSTRACT
Signs of hierarchy are prevalent in a wide range of systems in nature and society. One of the key problems is quantifying the importance of hierarchical organisation in the structure of the network representing the interactions or connections between the fundamental units of the studied system. Although a number of notable methods are already available, the vast majority of them treat all directed acyclic graphs as already maximally hierarchical. Here we propose a hierarchy measure based on random walks on the network. The novelty of our approach is that directed trees corresponding to multi-level pyramidal structures obtain higher hierarchy scores than directed chains and directed stars. Furthermore, in the thermodynamic limit the hierarchy measure of regular trees converges to a well-defined limit depending only on the branching number. When applied to real networks, our method is computationally very efficient, as the result can be evaluated with arbitrary precision by successive multiplications of the transition matrix describing the random walk process. In addition, the tests on real-world networks provided very intuitive results, e.g., the trophic levels obtained from our approach on a food web were highly consistent with former results from ecology.