1.
Entropy (Basel); 23(10), 2021 Sep 28.
Article in English | MEDLINE | ID: mdl-34681995

ABSTRACT

Functional modules can be predicted using genome-wide protein-protein interactions (PPIs) from a systematic perspective. Various graph clustering algorithms have been applied to PPI networks for this task. In particular, the detection of overlapping clusters is necessary because a protein can be involved in multiple functions under different conditions. Graph entropy (GE) is a novel metric for assessing the quality of clusters in a large, complex network. In this study, the unweighted and weighted GE algorithms are evaluated to assess their validity for predicting functional modules. To measure clustering accuracy, the clustering results are compared to protein complexes and Gene Ontology (GO) annotations as references. We demonstrate that the GE algorithm is more accurate at detecting overlapping clusters than competing methods. Moreover, we confirm the biological feasibility of the proteins that occur most frequently in the set of identified clusters. Finally, novel proteins for the additional annotation of GO terms are revealed.
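The abstract does not state how graph entropy is computed. As a rough illustration, the sketch below uses one common unweighted formulation, assumed here rather than taken from the abstract: each vertex contributes the binary entropy of the fraction of its neighbors that fall inside the cluster, and the graph entropy is the sum over all vertices. The toy graph and the function names are hypothetical.

```python
import math

def vertex_entropy(graph, cluster, v):
    """Binary entropy of the share of v's neighbors that lie inside `cluster`."""
    neighbors = graph[v]
    if not neighbors:
        return 0.0
    p_in = sum(1 for u in neighbors if u in cluster) / len(neighbors)
    p_out = 1.0 - p_in
    # Terms with probability 0 contribute nothing to the entropy.
    return -sum(p * math.log2(p) for p in (p_in, p_out) if p > 0)

def graph_entropy(graph, cluster):
    """Sum of vertex entropies over every vertex in the graph (assumed definition)."""
    return sum(vertex_entropy(graph, cluster, v) for v in graph)

# Hypothetical toy network: a triangle a-b-c with a tail c-d-e.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}

tight = graph_entropy(graph, {"a", "b", "c"})
loose = graph_entropy(graph, {"a", "b"})
```

Under this assumed definition, a lower value indicates a cleaner cluster boundary, so the full triangle scores lower than the partial cluster.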

2.
Entropy (Basel); 23(1), 2020 Dec 24.
Article in English | MEDLINE | ID: mdl-33374305

ABSTRACT

A community in a complex network refers to a group of nodes that are densely connected internally but have only sparse connections to the outside. Overlapping community structures are ubiquitous in real-world networks, where each node belongs to at least one community. Therefore, overlapping community detection is an important topic in complex network research. This paper proposes an overlapping community detection algorithm based on membership degree propagation that is driven by both global and local information of the node community. In the method, we introduce the concept of a membership degree, which stores not only the label information but also the degree to which the node belongs to each label. The conventional label propagation process can then be extended to membership degree propagation, with the results mapped directly to the overlapping community division. The method therefore obtains the partition result and identifies overlapping nodes simultaneously, which greatly reduces the computational time. The proposed algorithm was applied to a synthetic Lancichinetti-Fortunato-Radicchi (LFR) dataset and nine real-world datasets and compared with other up-to-date algorithms. The experimental results show that our proposed algorithm is effective and outperforms the comparison methods on most datasets. Our proposed method significantly improved the accuracy and speed of the overlapping node prediction. It can also substantially reduce the computational complexity of community structure detection in general.
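The abstract gives no update rule, so the following is only a generic sketch of the idea of propagating membership degrees instead of single labels; it is not the paper's algorithm. It assumes a synchronous update in which each node averages its neighbors' membership vectors, drops labels below a hypothetical `threshold`, and renormalizes. Nodes that keep more than one label are read as overlapping nodes.

```python
def propagate_once(graph, memberships, threshold=0.3):
    """One synchronous round of membership-degree propagation (illustrative only)."""
    updated = {}
    for node, neighbors in graph.items():
        # Accumulate the membership degrees carried by all neighbors.
        totals = {}
        for nb in neighbors:
            for label, degree in memberships[nb].items():
                totals[label] = totals.get(label, 0.0) + degree
        if not totals:
            # Isolated node: keep its current memberships unchanged.
            updated[node] = dict(memberships[node])
            continue
        norm = sum(totals.values())
        scaled = {label: d / norm for label, d in totals.items()}
        # Prune weak labels, but always keep at least the strongest one.
        kept = {label: d for label, d in scaled.items() if d >= threshold}
        if not kept:
            best = max(scaled, key=scaled.get)
            kept = {best: scaled[best]}
        norm = sum(kept.values())
        updated[node] = {label: d / norm for label, d in kept.items()}
    return updated

# Hypothetical toy network: two triangles that share the node "x".
graph = {
    "a": {"b", "x"}, "b": {"a", "x"},
    "c": {"d", "x"}, "d": {"c", "x"},
    "x": {"a", "b", "c", "d"},
}
seed = {
    "a": {"L": 1.0}, "b": {"L": 1.0},
    "c": {"R": 1.0}, "d": {"R": 1.0},
    "x": {"L": 0.5, "R": 0.5},
}

result = propagate_once(graph, seed)
overlapping = sorted(n for n, m in result.items() if len(m) > 1)
```

In this toy case the seed assignment is already a fixed point of the update: the shared node keeps both labels, and every other node keeps exactly one.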

3.
Sensors (Basel); 19(2), 2019 Jan 10.
Article in English | MEDLINE | ID: mdl-30634718

ABSTRACT

With the enrichment of entity information in the real world, many networks with attributed nodes have been proposed and widely studied. Community detection in these attributed networks is an essential task that aims to find groups whose nodes are much more densely connected to each other than to nodes outside the group. However, many existing community detection methods for attributed networks do not distinguish overlapping communities from non-overlapping communities when designing algorithms. In this paper, we propose a novel and accurate algorithm called Node-similarity-based Multi-Label Propagation Algorithm (NMLPA) for detecting overlapping communities in attributed networks. NMLPA first calculates the similarity between nodes and then propagates multiple labels based on the network structure and the node similarity. Moreover, NMLPA uses a pruning strategy to keep the number of labels per node within a suitable range. Extensive experiments conducted on both synthetic and real-world networks show that our new method significantly outperforms state-of-the-art methods.

4.
Molecules; 23(10), 2018 Oct 13.
Article in English | MEDLINE | ID: mdl-30322177

ABSTRACT

Overlapping structures of protein-protein interaction networks are very prevalent in different biological processes, which reflects the mechanism of sharing common functional components. The overlapping community detection (OCD) algorithm based on central node selection (CNS) is a traditional and accepted algorithm for OCD in networks. The main components of CNS are the central node selection and the clustering procedure. However, the original CNS does not consider the influence among the nodes or the importance of the division of the edges in networks. In this paper, an OCD algorithm based on central edge selection (CES) is proposed for detecting overlapping communities in protein-protein interaction (PPI) networks. Unlike the traditional CNS algorithms for OCD, the proposed algorithm uses community magnetic interference (CMI) to obtain more reasonable central edges in the process of CES, and employs a new distance between a non-central edge and the set of central edges to assign the non-central edge to the correct cluster during the clustering procedure. In addition, the proposed CES improves the overlapping nodes pruning (ONP) strategy to make the division more precise. The experimental results on three benchmark networks and three biological PPI networks of Mus musculus, Escherichia coli, and Saccharomyces cerevisiae show that the CES algorithm performs well.


Subjects
Computational Biology/methods, Escherichia coli/metabolism, Protein Interaction Mapping/methods, Saccharomyces cerevisiae/metabolism, Algorithms, Animals, Escherichia coli Proteins/metabolism, Mice, Protein Interaction Maps, Saccharomyces cerevisiae Proteins/metabolism

5.
PeerJ Comput Sci; 9: e1291, 2023.
Article in English | MEDLINE | ID: mdl-37346513

ABSTRACT

The detection of communities in graph datasets provides insight about a graph's underlying structure and is an important tool for various domains such as social sciences, marketing, traffic forecast, and drug discovery. While most existing algorithms provide fast approaches for community detection, their results usually contain strictly separated communities. However, most datasets would semantically allow for or even require overlapping communities that can only be determined at much higher computational cost. We build on an efficient algorithm, Fox, that detects such overlapping communities. Fox measures the closeness of a node to a community by approximating the count of triangles which that node forms with that community. We propose LazyFox, a multi-threaded adaptation of the Fox algorithm, which provides even faster detection without an impact on community quality. This allows for the analyses of significantly larger and more complex datasets. LazyFox enables overlapping community detection on complex graph datasets with millions of nodes and billions of edges in days instead of weeks. As part of this work, LazyFox's implementation was published and is available as a tool under an MIT licence at https://github.com/TimGarrels/LazyFox.
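The closeness measure mentioned in the abstract can be illustrated with a small sketch. The abstract says Fox approximates the triangle count; the code below instead computes the exact count, purely for illustration, and is not taken from the LazyFox repository. The function name and toy graph are hypothetical.

```python
from itertools import combinations

def triangles_with_community(graph, node, community):
    """Exact number of triangles formed by `node` and two community members."""
    # Community members that are direct neighbors of the node.
    inside = [u for u in graph[node] if u in community and u != node]
    # Each connected pair of such neighbors closes one triangle with the node.
    return sum(1 for u, w in combinations(inside, 2) if w in graph[u])

# Hypothetical toy network: triangle a-b-c, node d attached to b and c, tail d-e.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}
community = {"a", "b", "c"}

closeness_d = triangles_with_community(graph, "d", community)
closeness_e = triangles_with_community(graph, "e", community)
```

Here node `d` forms one triangle with the community (through the edge `b`-`c`), while `e` forms none, so `d` would be the stronger candidate for joining it under a triangle-based criterion.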

6.
Comput Biol Chem; 98: 107670, 2022 Jun.
Article in English | MEDLINE | ID: mdl-35398777

ABSTRACT

Metagenomics is a discipline that studies the genetic material of all microorganisms in a biological environment. In recent years, the interactions between metagenomic microbial communities, horizontal gene transfer, and the dynamic changes of microbial ecosystems have attracted increasing attention. Using community detection algorithms to divide metagenomic microbes into modules is of great significance, and it can guide follow-up research on human-drug-microbe interactions and on drug prediction and development. At present, there are challenges in mining the effective information hidden in large-scale microbial sequence data. The non-linear characteristics and poor scalability of microbial sequence data remain problematic. This paper proposes an end-to-end unsupervised GCN learning model, OTUCD (Operational Classification Unit Community Detection), which divides large-scale metagenomic sequence data into potential gene modules. We construct an OTU network and then perform the subsequent non-overlapping community detection task with graph convolutional networks. Experimental scores show that the community detection performance of this method is better than that of other recent metagenomic algorithms.


Subjects
Metagenomics, Microbiota, Algorithms, Humans, Metagenome, Metagenomics/methods, Microbiota/genetics

6.
bioRxiv; 2021 Oct 11.
Article in English | MEDLINE | ID: mdl-34189530

ABSTRACT

Characterization of protein complexes, i.e. sets of proteins assembling into a single larger physical entity, is important, as such assemblies play many essential roles in cells such as gene regulation. From networks of protein-protein interactions, potential protein complexes can be identified computationally through the application of community detection methods, which flag groups of entities interacting with each other in certain patterns. Most community detection algorithms tend to be unsupervised and assume that communities are dense network subgraphs, which is not always true, as protein complexes can exhibit diverse network topologies. The few existing supervised machine learning methods are serial and can potentially be improved in terms of accuracy and scalability by using better-suited machine learning models and parallel algorithms. Here, we present Super.Complex, a distributed, supervised AutoML-based pipeline for overlapping community detection in weighted networks. We also propose three new evaluation measures for the outstanding issue of comparing sets of learned and known communities satisfactorily. Super.Complex learns a community fitness function from known communities using an AutoML method and applies this fitness function to detect new communities. A heuristic local search algorithm finds maximally scoring communities, and a parallel implementation can be run on a computer cluster for scaling to large networks. On a yeast protein-interaction network, Super.Complex outperforms 6 other supervised and 4 unsupervised methods. Application of Super.Complex to a human protein-interaction network with ~8k nodes and ~60k edges yields 1,028 protein complexes, with 234 complexes linked to SARS-CoV-2, the COVID-19 virus, with 111 uncharacterized proteins present in 103 learned complexes. Super.Complex is generalizable with the ability to improve results by incorporating domain-specific features. Learned community characteristics can also be transferred from existing applications to detect communities in a new application with no known communities. Code and interactive visualizations of learned human protein complexes are freely available at: https://sites.google.com/view/supercomplex/super-complex-v3-0.

8.
Front Genet; 10: 164, 2019.
Article in English | MEDLINE | ID: mdl-30918511

ABSTRACT

Biological networks catalog the complex web of interactions happening between different molecules, typically proteins, within a cell. These networks are known to be highly modular, with groups of proteins associated with specific biological functions. Human diseases often arise from the dysfunction of one or more such proteins of the biological functional group. The ability to identify and automatically extract these modules has implications for understanding the etiology of different diseases as well as the functional roles of different protein modules in disease. The recent DREAM challenge posed the problem of identifying disease modules from six heterogeneous networks of proteins/genes. There exist many community detection algorithms, but not all of them are adaptable to the biological context, as these networks are densely connected and the size of biologically relevant modules is quite small. The contribution of this study is 3-fold: first, we present a comprehensive assessment of many classic community detection algorithms for biological networks to identify non-overlapping communities, and propose heuristics to identify small and structurally well-defined communities (core modules). We evaluated our performance over 180 GWAS datasets. In comparison to traditional approaches, our proposed approach identified 50% more disease-relevant modules. Thus, we show that it is important to identify more compact modules for better performance. Next, we sought to understand the peculiar characteristics of disease-enriched modules and what causes standard community detection algorithms to detect so few of them. We performed a comprehensive analysis of the interaction patterns of known disease genes to understand the structure of disease modules and show that merely considering the known disease gene set as a module does not give good quality clusters, as measured by typical metrics such as modularity and conductance. We go on to present a methodology that leverages these known disease genes and also includes their neighboring nodes in a module, to form good quality clusters and subsequently extract a "gold-standard set" of disease modules. Lastly, we demonstrate, with justification, that "overlapping" community detection algorithms should be the preferred choice for disease module identification since several genes participate in multiple biological functions.
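Conductance, one of the cluster-quality metrics named in the abstract, can be sketched as follows. This uses the standard definition for an unweighted, undirected graph (edges leaving the module divided by the smaller of the two degree volumes); the abstract itself does not specify which variant the authors used, and the toy graph and names are hypothetical.

```python
def conductance(graph, module):
    """Cut size divided by the smaller degree volume; lower means better separated."""
    # Edges with exactly one endpoint inside the module.
    cut = sum(1 for u in module for v in graph[u] if v not in module)
    vol_in = sum(len(graph[u]) for u in module)
    vol_out = sum(len(graph[u]) for u in graph if u not in module)
    denom = min(vol_in, vol_out)
    return cut / denom if denom else 0.0

# Hypothetical toy network: two triangles joined by the single edge c-d.
graph = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}

compact = conductance(graph, {"a", "b", "c"})   # one triangle
straddling = conductance(graph, {"c", "d"})     # the bridge endpoints
```

The triangle is separated from the rest by a single edge and scores about 0.14, whereas the set that straddles the bridge scores about 0.67, which matches the intuition that compact, well-separated modules have low conductance.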

9.
J Neurosci Methods; 318: 47-55, 2019 Apr 15.
Article in English | MEDLINE | ID: mdl-30831137

ABSTRACT

It has been found that specific regions in the brain are dedicated to specific functions. Detection and analysis of the constituent functional networks of the brain is of great importance for understanding brain function and diagnosing some neuropsychiatric illnesses. In this paper, we introduce Non-negative Tensor Factorization (NTF) methods to identify the overlapping communities in brain networks using resting-state functional Magnetic Resonance Imaging (rs-fMRI) data. Instead of taking the average over a group of subjects, we use individual subject connectivity matrices to build the tensor data. The decomposed factors indicate the community membership probabilities and the inter-subject variability indices that model the community strengths over subjects. In contrast to methods based on Non-negative Matrix Factorization (NMF), which are generally applied to the average connectivity matrices, tensor factorization modeling preserves the information conveyed by the individual subjects. The experiments are carried out on simulated data as well as real Human Connectome Project (HCP) rs-fMRI datasets. To evaluate the effectiveness of the proposed framework, we have computed reproducibility over time and groups of subjects. Test-retest reliability is also examined by computing the intra-class correlation coefficient (ICC) index. The results show that the proposed NTF-based frameworks lead to stable and accurate results.


Subjects
Algorithms, Connectome/methods, Magnetic Resonance Imaging/methods, Adult, Bayes Theorem, Female, Humans, Male, Theoretical Models, Reproducibility of Results, Sensitivity and Specificity, Young Adult