1.
Chaos ; 34(8)2024 Aug 01.
Article in English | MEDLINE | ID: mdl-39146454

ABSTRACT

Structures of complex networks are fundamental to system dynamics, where node state and connectivity patterns determine the cost of controlling a system, a key aspect in unraveling complexity. However, minimizing the energy required to control a system with the fewest input nodes remains an open problem. This study investigates the relationship between the structure of closed-connected function modules and control energy. We discovered that small structural adjustments, such as adding a few extended driver nodes, can significantly reduce control energy. Thus, we propose MInimal extended driver nodes in Energetic costs Reduction (MIER). Next, we transform the detection of MIER into a multi-objective optimization problem and use the NSGA-II algorithm to solve it. Compared with the baseline methods, NSGA-II approximates the optimal solution most closely. Through experiments using synthetic and real data, we validate that MIER can exponentially decrease control energy. Furthermore, random perturbation tests confirm the stability of MIER. Subsequently, we applied MIER to three representative scenarios: regulation of differentially expressed genes affected by cancer mutations in the human protein-protein interaction network, trade relations among developed countries in the world trade network, and regulation of body-wall muscle cells by motor neurons in the Caenorhabditis elegans nervous network. The results reveal that the involvement of MIER significantly reduces the control energy required for these original modules from a topological perspective. Additionally, MIER nodes enhance functionality, supplement key nodes, and uncover potential mechanisms. Overall, our work provides practical computational tools for understanding and presenting control strategies in biological, social, and neural systems.


Subjects
Algorithms; Caenorhabditis elegans; Caenorhabditis elegans/metabolism; Humans; Animals; Energy Metabolism; Protein Interaction Maps
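
As a rough illustration of the multi-objective step described in the abstract above, the following sketch filters candidate driver-node sets down to the non-dominated ones (the Pareto front), which is the comparison NSGA-II is built around. The candidate names and their (node count, control energy) values are invented for illustration and are not results from the paper; a full NSGA-II run would add front ranking, crowding distance, and genetic operators, all omitted here.

```python
def dominates(a, b):
    """True if objective vector a is no worse than b in every objective
    and strictly better in at least one (all objectives are minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(candidates):
    """Return the sorted names of candidates that no other candidate dominates."""
    front = []
    for name, objs in candidates.items():
        dominated = any(
            dominates(other, objs)
            for other_name, other in candidates.items()
            if other_name != name
        )
        if not dominated:
            front.append(name)
    return sorted(front)


# Hypothetical candidates: (number of extended driver nodes, control energy).
candidates = {
    "set_a": (1, 9.0),
    "set_b": (2, 4.0),
    "set_c": (3, 4.5),  # dominated by set_b: more nodes and more energy
    "set_d": (4, 1.0),
}

print(pareto_front(candidates))
```

The surviving sets represent different trade-offs between adding nodes and saving energy; choosing one of them is a separate decision that the sketch does not make.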
2.
Front Genet ; 15: 1354208, 2024.
Article in English | MEDLINE | ID: mdl-38463168

ABSTRACT

CTCF-mediated chromatin loops create insulated neighborhoods that constrain promoter-enhancer interactions, serving as a unit of gene regulation. Disruption of CTCF binding sites (CBS) destroys insulated neighborhoods, which in turn can cause dysregulation of the genes they contain. A recent study found that CTCF/cohesin binding sites are a major mutational hotspot in the cancer genome. Mutations can affect CTCF binding, causing the disruption of insulated neighborhoods. Our analysis reveals a significant enrichment of well-known proto-oncogenes in insulated neighborhoods with mutations specifically occurring in anchor regions. It can be assumed that some mutations disrupt CTCF binding, leading to the disruption of insulated neighborhoods and subsequent activation of the proto-oncogenes within them. To explore the consequences of such mutations, we develop DeepCBS, a computational tool capable of analyzing mutations at CTCF binding sites, predicting their influence on insulated neighborhoods, and investigating the potential activation of proto-oncogenes. Furthermore, DeepCBS is applied to somatic mutation data of liver cancer. As a result, 87 mutations that disrupt CTCF binding sites are identified, which leads to the identification of 237 disrupted insulated neighborhoods containing a total of 135 genes. Integrative analysis of gene expression differences in liver cancer further highlights three genes: ARHGEF39, UBE2C and DQX1. Among them, ARHGEF39 and UBE2C have been reported in the literature as potential oncogenes involved in the development of liver cancer. The results indicate that DQX1 may be a potential oncogene in liver cancer and may contribute to tumor immune escape.
In conclusion, DeepCBS is a promising method to analyze the impact of mutations occurring at CTCF binding sites on the insulator function of CTCF, with potential extensions to shed light on the effects of mutations on other functions of CTCF.

3.
NPJ Syst Biol Appl ; 8(1): 47, 2022 11 29.
Article in English | MEDLINE | ID: mdl-36446819

ABSTRACT

Thousands of genes are perturbed by cancer, and these disturbances can be seen in transcriptome, methylation, somatic mutation, and copy number variation omics studies. Understanding their connectivity patterns as an omnigenic neighbourhood in a molecular interaction network (interactome) is a key step towards advancing knowledge of the molecular mechanisms underlying cancers. Here, we introduce a unified connectivity line (CLine) to pinpoint omics-specific omnigenic patterns across 15 curated cancers. Taking advantage of the universality of CLine, we distinguish the peripheral and core genes for each omics aspect. We propose a network-based framework, multi-omics periphery and core (MOPC), to combine peripheral and core genes from different omics into a button-like structure. On the basis of network proximity, we provide evidence that core genes tend to be specifically perturbed in one omics, whereas peripheral genes are diversely perturbed in multiple omics. Moreover, the core of one omics is regulated by multiple omics peripheries. Finally, we take the MOPC as an omnigenic neighbourhood, describe its characteristics, and explore its relative contribution to network-based mechanisms of cancer. We show how multi-omics perturbations percolate through the human interactome and contribute to an integrated periphery and core.


Subjects
DNA Copy Number Variations; Neoplasms; Humans; DNA Copy Number Variations/genetics; Neoplasms/genetics; Transcriptome/genetics
4.
Entropy (Basel) ; 24(7)2022 Jul 15.
Article in English | MEDLINE | ID: mdl-35885201

ABSTRACT

The determination of directed control paths in complex networks is important because control paths indicate the structure of the propagation of control signals through edges. A challenging problem is to identify them in complex networked systems characterized by different types of interactions that form multilayer networks. In this study, we describe a graph pattern called the conserved control path, which allows us to model a common control structure among different types of relations. We present a practical conserved control path detection method (CoPath), which is based on a maximum-weighted matching, to determine the paths that play the most consistent roles in controlling signal transmission in multilayer networks. As a pragmatic application, we demonstrate that the control paths detected in a multilayered pan-cancer network are statistically more consistent. Additionally, they lead to the effective identification of drug targets, thereby demonstrating their power in predicting key pathways that influence multiple cancers.
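
The abstract above does not detail how CoPath works internally, so the sketch below only illustrates the general idea under stated assumptions: each directed edge is weighted by the number of layers in which it occurs, and an exhaustive search (adequate only for toy graphs) picks the matching with the largest total weight. Here a matching in a directed network is a set of edges in which no two share a source and no two share a target. The layer contents and function names are hypothetical.

```python
from collections import Counter


def edge_weights(layers):
    """Weight each directed edge by the number of layers that contain it."""
    counts = Counter()
    for layer in layers:
        counts.update(set(layer))
    return dict(counts)


def max_weight_matching(weights):
    """Exhaustive search for a maximum-weight matching.

    Every node is used at most once as an edge source and at most once
    as an edge target. Exponential time, so suitable for toy graphs only.
    """
    items = sorted(weights.items())
    best = {"weight": 0, "edges": []}

    def search(i, used_src, used_dst, total, chosen):
        if i == len(items):
            if total > best["weight"]:
                best["weight"], best["edges"] = total, list(chosen)
            return
        (u, v), w = items[i]
        search(i + 1, used_src, used_dst, total, chosen)  # skip this edge
        if u not in used_src and v not in used_dst:       # take this edge
            chosen.append((u, v))
            search(i + 1, used_src | {u}, used_dst | {v}, total + w, chosen)
            chosen.pop()

    search(0, frozenset(), frozenset(), 0, [])
    return best["weight"], best["edges"]


# Three hypothetical layers over the same node set.
layers = [
    [("a", "b"), ("b", "c"), ("a", "c")],
    [("a", "b"), ("b", "c")],
    [("a", "b"), ("c", "d")],
]
weight, path_edges = max_weight_matching(edge_weights(layers))
print(weight, path_edges)
```

On this toy input the matched edges chain into a single path from `a` to `d`, which is the kind of structure a control path describes. For realistic network sizes a polynomial-time matching algorithm would replace the exhaustive search.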

5.
Front Genet ; 12: 665416, 2021.
Article in English | MEDLINE | ID: mdl-33968140

ABSTRACT

Multi-omics molecules regulate complex biological processes (CBPs), which reflect the activities of various molecules in living organisms. Meanwhile, applications that represent disease subtypes and cell types have created an urgent need for tools that group samples and infer the associated CBPs. In this paper, we present CBP-JMF, a practical tool primarily for discovering the CBPs that underlie sample groups, such as disease subtypes, in applications. Unlike existing methods, CBP-JMF is based on a joint non-negative matrix tri-factorization framework and is implemented in Python. As a pragmatic application, we apply CBP-JMF to identify CBPs for four subtypes of breast cancer. The result shows significant overlap between genes extracted from CBPs and known subtype pathways. We verify the effectiveness of our tool in detecting CBPs that interpret subtypes of disease.

6.
Brief Bioinform ; 22(6)2021 11 05.
Article in English | MEDLINE | ID: mdl-33956950

ABSTRACT

The 2019 novel coronavirus SARS-CoV-2, the pathogen of COVID-19, has caused a catastrophic pandemic, which has a profound and widespread impact on human lives and the social economy globally. However, the molecular perturbations induced by SARS-CoV-2 infection remain unknown. In this paper, from an omnigenic perspective, we analyze the properties of the neighborhood perturbed by SARS-CoV-2 in the human interactome and disclose the peripheral and core regions of the virus-host network (VHN). We find that the virus-host proteins (VHPs) form a significantly connected VHN, among which highly perturbed proteins aggregate into an observable core region. The non-core region of the VHN forms a large-scale but relatively weakly perturbed periphery. We further validate that the periphery is non-negligible and conducive to identifying comorbidities and detecting drug repurposing candidates for COVID-19. In particular, we put forward a flower model for COVID-19, SARS and H1N1 based on their peripheral regions, and the flower model shows more correlations between COVID-19 and the other two similar diseases in common functional pathways and candidate drugs. Overall, our periphery-core pattern can not only offer insights into the interconnectivity of SARS-CoV-2 VHPs but also facilitate research on therapeutic drugs.


Subjects
COVID-19/genetics; Drug Repositioning; SARS-CoV-2/genetics; COVID-19/pathology; COVID-19/virology; Host-Pathogen Interactions/genetics; Humans; Influenza A Virus, H1N1 Subtype/drug effects; Influenza A Virus, H1N1 Subtype/pathogenicity; SARS-CoV-2/pathogenicity; COVID-19 Drug Treatment
7.
Brief Bioinform ; 22(6)2021 11 05.
Article in English | MEDLINE | ID: mdl-33940590

ABSTRACT

Single-cell clustering is an important part of analyzing single-cell RNA-sequencing data. However, the accuracy and robustness of existing methods are disturbed by noise. One promising approach for addressing this challenge is integrating pathway information, which can alleviate noise and improve performance. In this work, we studied the impact on accuracy and robustness of existing single-cell clustering methods by integrating pathways. We collected 10 state-of-the-art single-cell clustering methods, 26 scRNA-seq datasets and four pathway databases, combined the AUCell method and the similarity network fusion to integrate pathway data and scRNA-seq data, and introduced three accuracy indicators, three noise generation strategies and robustness indicators. Experiments on this framework showed that integrating pathways can significantly improve the accuracy and robustness of most single-cell clustering methods.


Subjects
Algorithms; Databases, Nucleic Acid; Exome Sequencing; RNA-Seq; Single-Cell Analysis; Cluster Analysis
8.
BMC Bioinformatics ; 21(1): 433, 2020 Oct 02.
Article in English | MEDLINE | ID: mdl-33008305

ABSTRACT

BACKGROUND: A precise disease module is conducive to understanding the molecular mechanism of disease causation and identifying drug targets. However, because disease modules are fragmented in the incomplete human interactome, how to determine their connectivity pattern and detect a complete disease neighbourhood based on it is still an open question. RESULTS: In this paper, we perform an exploratory analysis leading to an important observation: through a few intermediate nodes, most separate connected components formed by disease-associated proteins can be effectively connected and eventually form a complete disease module. Based on the topological properties of these intermediate nodes, we propose a connect separate connected components (C3) method to detect a succinct disease module by introducing a relatively small number of intermediate nodes, which allows us to obtain a purer disease module than other methods. We then apply C3 across a large corpus of diseases to validate this connectivity pattern of disease modules. Furthermore, the connectivity of the perturbed genes in multi-omics data such as The Cancer Genome Atlas also fits this pattern. CONCLUSIONS: The C3 tool is not only useful in detecting a clearly defined connected disease neighbourhood for 299 diseases and for cancer with multi-omics data, but also helpful in better understanding the interconnection of phenotypically related genes in different omics data and in studying complex pathological processes.


Subjects
Algorithms; Disease; Asthma/genetics; Breast Neoplasms/genetics; Female; Gene Ontology; Humans; Molecular Sequence Annotation; Protein Interaction Maps; Proteins/metabolism
9.
Article in English | MEDLINE | ID: mdl-29994261

ABSTRACT

Identifying driver modules or pathways is a key challenge to interpret the molecular mechanisms and pathogenesis underlying cancer. An increasing number of studies suggest that rarely mutated genes are important for the development of cancer. However, the driver modules consisting of mutated genes with low-frequency driver mutations are not well characterized. To identify driver modules with rarely mutated genes, we propose a functional similarity index to quantify the functional relationship between rarely mutated genes and other ones in the same module. Then, we develop a method to detect Driver Modules with Rarely mutated Genes (DMRG) by incorporating the functional similarity, coverage and mutual exclusivity. By applying DMRG on TCGA cancer dataset on three networks: HINT+HI2012, iRefIndex and MultiNet, we detect driver modules intersecting with the well-known signalling pathways and protein complexes, such as the cell cycle pathway and the mediator complex. DMRG can also detect driver modules effectively with 20, 40, 60 and 80 percent of samples by random selection. When compared with HotNet2, DMRG detects more rarely mutated cancer genes and has higher pathway enrichment. Overall, DMRG provides an effective method for the identification of driver modules with rarely mutated genes.


Subjects
Computational Biology/methods; Mutation/genetics; Neoplasms/genetics; Algorithms; Cell Cycle/genetics; Databases, Genetic; Humans; Signal Transduction/genetics
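
The abstract above names coverage and mutual exclusivity as ingredients of DMRG but gives no formulas. The sketch below therefore uses one common pair of definitions as an assumption: coverage is the fraction of samples with at least one mutated gene in the module, and mutual exclusivity is the fraction of those covered samples with exactly one mutated module gene. Sample and gene names are placeholders, not data from the paper.

```python
def coverage_and_exclusivity(mutations, module):
    """Score a candidate module against per-sample mutation sets.

    mutations: dict mapping sample id -> set of mutated genes
    module:    set of genes in the candidate driver module
    Returns (coverage, exclusivity) under the assumed definitions.
    """
    hits = [len(genes & module) for genes in mutations.values()]
    covered = sum(1 for h in hits if h >= 1)    # samples hit at least once
    exclusive = sum(1 for h in hits if h == 1)  # samples hit exactly once
    coverage = covered / len(mutations) if mutations else 0.0
    exclusivity = exclusive / covered if covered else 0.0
    return coverage, exclusivity


# Hypothetical cohort of five samples.
mutations = {
    "s1": {"g1"},
    "s2": {"g2"},
    "s3": {"g1", "g2"},
    "s4": {"g9"},
    "s5": set(),
}
module = {"g1", "g2", "g3"}

coverage, exclusivity = coverage_and_exclusivity(mutations, module)
print(coverage, exclusivity)
```

A module scoring high on both measures covers many samples while rarely being hit twice in the same sample; how DMRG combines these with functional similarity is not specified in the abstract and is not modelled here.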
10.
Nat Commun ; 10(1): 2180, 2019 05 16.
Article in English | MEDLINE | ID: mdl-31097707

ABSTRACT

Most combination therapies are developed based on targets of existing drugs, which only represent a small portion of the human proteome. We introduce a network controllability-based method, OptiCon, for de novo identification of synergistic regulators as candidates for combination therapy. These regulators jointly exert maximal control over deregulated genes but minimal control over unperturbed genes in a disease. Using data from three cancer types, we show that 68% of predicted regulators are either known drug targets or have a critical role in cancer development. Predicted regulators are depleted for known proteins associated with side effects. Predicted synergy is supported by disease-specific and clinically relevant synthetic lethal interactions and experimental validation. A significant portion of genes regulated by synergistic regulators participate in dense interactions between co-regulated subnetworks and contribute to therapy resistance. OptiCon represents a general framework for systemic and de novo identification of synergistic regulators underlying a cellular state transition.


Subjects
Antineoplastic Combined Chemotherapy Protocols/pharmacology; Computational Biology/methods; Gene Regulatory Networks/drug effects; Neoplasms/genetics; Protein Interaction Maps/drug effects; A549 Cells; Algorithms; Antineoplastic Combined Chemotherapy Protocols/therapeutic use; Datasets as Topic; Drug Synergism; Drug Therapy, Combination/methods; Female; Gene Expression Profiling; Gene Expression Regulation, Neoplastic/drug effects; Gene Regulatory Networks/genetics; HEK293 Cells; Humans; MCF-7 Cells; Models, Genetic; Molecular Targeted Therapy/methods; Mutation; Neoplasms/drug therapy; Neoplasms/pathology; Protein Interaction Maps/genetics; Signal Transduction/drug effects; Signal Transduction/genetics
11.
Article in English | MEDLINE | ID: mdl-27076463

ABSTRACT

Computational approaches for predicting drug-disease associations by integrating gene expression and biological networks provide great insights into the complex relationships among drugs, targets, disease genes, and diseases at a system level. Hepatocellular carcinoma (HCC) is one of the most common malignant tumors, with a high rate of morbidity and mortality. We provide an integrative framework to predict novel drugs for HCC based on multi-source random walk (PD-MRW). First, based on gene expression and the protein interaction network, we construct a gene-gene weighted interaction network (GWIN). Then, based on multi-source random walk in GWIN, we build a drug-drug similarity network. Finally, based on the known drugs for HCC, we score all drugs in the drug-drug similarity network. The robustness of our predictions, their overlap with those reported in the Comparative Toxicogenomics Database (CTD) and the literature, and their enriched KEGG pathways demonstrate that our approach can effectively identify new drug indications. Specifically, regorafenib (Rank = 9 in the top-20 list) is proven to be effective in Phase I and II clinical trials of HCC, and the Phase III trial is ongoing. It also has 11 overlapping pathways with HCC with lower p-values. Focusing on a particular disease, we believe our approach is more accurate and possesses better scalability.


Subjects
Antineoplastic Agents; Carcinoma, Hepatocellular; Computational Biology/methods; Drug Repositioning/methods; Liver Neoplasms; Antineoplastic Agents/pharmacology; Antineoplastic Agents/therapeutic use; Carcinoma, Hepatocellular/drug therapy; Carcinoma, Hepatocellular/genetics; Carcinoma, Hepatocellular/metabolism; Gene Expression Profiling; Gene Regulatory Networks/drug effects; Gene Regulatory Networks/genetics; Humans; Liver Neoplasms/drug therapy; Liver Neoplasms/genetics; Liver Neoplasms/metabolism; Protein Interaction Maps/drug effects; Protein Interaction Maps/genetics; Transcriptome/drug effects; Transcriptome/genetics
12.
Mol Biosyst ; 12(9): 2921-31, 2016 08 16.
Article in English | MEDLINE | ID: mdl-27426053

ABSTRACT

Although many methods have been proposed to identify driver genes, how to separate driver mutations from passenger mutations is still a challenging problem in cancer genomics. In particular, the detection of driver genes with rare mutations remains poorly solved, with low accuracy. In this study, we present an integrated network-based approach to locate potential driver genes in a cohort of patients. The approach is composed of two steps, a network diffusion step and an aggregated ranking step, and fuses the correlation between gene mutations and gene expression, the relationship between the mutated genes, and the heterogeneous characteristics of the patient mutations. We analyze three cancer datasets: Glioblastoma multiforme, Ovarian cancer and Breast cancer. Our method has not only identified the known driver genes with high-frequency mutations, but also discovered potential driver genes with rare mutations. At the same time, validation by literature search and functional enrichment analysis reveals that the predicted genes are clearly related to these three kinds of cancer.


Subjects
Cell Transformation, Neoplastic/genetics; Computational Biology/methods; Gene Regulatory Networks; Neoplasms/genetics; Oncogenes; Algorithms; Gene Expression Regulation, Neoplastic; Gene Frequency; Genomics/methods; Humans; Mutation; Reproducibility of Results
13.
BMC Syst Biol ; 10(Suppl 4): 111, 2016 Dec 23.
Article in English | MEDLINE | ID: mdl-28155709

ABSTRACT

BACKGROUND: Extracting drug-disease correlations is crucial in unveiling disease mechanisms, as well as in discovering new indications of available drugs, or drug repositioning. Both the interactome and the knowledge of disease-associated and drug-associated genes remain incomplete. RESULTS: We present a new method to predict the associations between drugs and diseases. Our method is based on a module distance, which was originally proposed to calculate distances between modules in the incomplete human interactome. We first map all the disease genes and drug genes to a combined protein interaction network. Then, based on the module distance, we calculate the distances between drug gene sets and disease gene sets, and take the distances as the relationships of drug-disease pairs. We also filter possible false positive drug-disease correlations by p-value. Finally, we validate the top-100 drug-disease associations related to six drugs in the predicted results. CONCLUSION: The overlap between our predicted correlations and those reported in the Comparative Toxicogenomics Database (CTD) and the literature, together with their enriched Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, demonstrates that our approach can not only effectively identify new drug indications, but also provide new insight into drug-disease discovery.


Subjects
Computational Biology/methods; Disease/genetics; Pharmaceutical Preparations/metabolism; Protein Interaction Maps; Drug Discovery; Humans; Molecular Targeted Therapy
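
The abstract above does not spell out the module distance it relies on. As a stand-in, the sketch below assumes a simple "closest" distance: the mean, over drug genes, of the shortest-path length to the nearest disease gene, computed by breadth-first search on an unweighted network. The toy network, gene names, and function names are hypothetical.

```python
from collections import deque


def shortest_lengths(graph, source):
    """Breadth-first search: hop count from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def closest_distance(graph, drug_genes, disease_genes):
    """Mean distance from each drug gene to its nearest disease gene.

    Drug genes with no path to any disease gene are skipped; returns
    None when no drug gene can reach the disease gene set.
    """
    nearest = []
    for gene in drug_genes:
        dist = shortest_lengths(graph, gene)
        reachable = [dist[d] for d in disease_genes if d in dist]
        if reachable:
            nearest.append(min(reachable))
    return sum(nearest) / len(nearest) if nearest else None


# Toy undirected interactome: p1 - p2 - p3 - p4, plus p3 - p5.
graph = {
    "p1": ["p2"],
    "p2": ["p1", "p3"],
    "p3": ["p2", "p4", "p5"],
    "p4": ["p3"],
    "p5": ["p3"],
}

print(closest_distance(graph, {"p1", "p2"}, {"p4", "p5"}))
```

Smaller values place the drug targets nearer the disease genes. The p-value filtering mentioned in the abstract would typically compare such a value against distances for randomly drawn gene sets, which this sketch leaves out.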
15.
PLoS One ; 10(8): e0135491, 2015.
Article in English | MEDLINE | ID: mdl-26284649

ABSTRACT

BACKGROUND: The complexity of biological systems motivates us to use the underlying networks to provide a deep understanding of disease etiology, and human diseases are viewed as perturbations of the dynamic properties of networks. Control theory, which deals with dynamic systems, has been successfully used to capture systems-level knowledge from large amounts of quantitative biological interactions. But from the perspective of system control, the ways by which multiple genetic factors jointly perturb a disease phenotype remain unclear. RESULTS: In this work, we combine tools from control theory and network science to address the diversified control paths in complex networks. The ways by which the disease genes perturb biological systems are then identified and quantified by the control paths in a human regulatory network. Furthermore, as an application, we present a prioritization of candidate genes that uses control path analysis and gene ontology annotation to define similarities. We use leave-one-out cross-validation to evaluate the ability to find gene-disease relationships. The results show performance comparable with previous sophisticated works, especially in directed systems. CONCLUSIONS: Our results inspire a deeper understanding of the molecular mechanisms that drive pathological processes. Diversified control paths offer a basis for integrated intervention techniques, which will ultimately lead to the development of novel therapeutic strategies.


Subjects
Biomarkers/metabolism; Computational Biology/methods; Gene Expression Regulation; Gene Regulatory Networks/genetics; Metabolic Networks and Pathways/genetics; Signal Transduction; Systems Biology; Alzheimer Disease/genetics; Colorectal Neoplasms/genetics; Diabetes Mellitus, Type 2/genetics; Female; Humans; Leukemia/genetics; Molecular Sequence Annotation; Ovarian Neoplasms/genetics; Phenotype
16.
PLoS One ; 10(2): e0115692, 2015.
Article in English | MEDLINE | ID: mdl-25664462

ABSTRACT

BACKGROUND: Phenotypic features associated with genes and diseases play an important role in disease-related studies, and most of the available methods focus solely on the Online Mendelian Inheritance in Man (OMIM) database without considering a controlled vocabulary. The Human Phenotype Ontology (HPO) provides a standardized and controlled vocabulary covering phenotypic abnormalities in human diseases, and has become a comprehensive resource for computational analysis of human disease phenotypes. Most of the existing HPO-based software tools cannot be used offline and provide only a few similarity measures. Therefore, there is a critical need for comprehensive offline software for phenotypic feature similarity based on HPO. RESULTS: HPOSim is an R package for analyzing phenotypic similarity for genes and diseases based on HPO data. Seven commonly used semantic similarity measures are implemented in HPOSim. Enrichment analysis of gene sets and disease sets is also implemented, including hypergeometric enrichment analysis and network ontology analysis (NOA). CONCLUSIONS: HPOSim can be used to predict disease genes and explore disease-related functions of gene modules. HPOSim is open source and freely available at SourceForge (https://sourceforge.net/p/hposim/).


Subjects
Disease/genetics; Gene Ontology; Phenotype; Software; Computational Biology; Databases, Genetic; Genes; Genetic Predisposition to Disease; Humans
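
The hypergeometric enrichment analysis mentioned in the abstract above can be illustrated with the standard upper-tail test. The sketch below is written in Python for consistency with the other examples on this page and is not HPOSim's R code; the function name, parameter names, and counts are made up for illustration.

```python
from math import comb


def hypergeom_pvalue(population, annotated, drawn, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    population: number of genes in the background
    annotated:  background genes annotated with the phenotype term
    drawn:      size of the query gene set
    overlap:    query genes annotated with the term
    """
    upper = min(annotated, drawn)
    total = comb(population, drawn)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, drawn - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 10 background genes, 4 annotated, query of 3, overlap 2.
print(hypergeom_pvalue(10, 4, 3, 2))
```

A small p-value suggests the term is over-represented in the query set. When many terms are tested, a multiple-testing correction is normally applied afterwards, which is not shown here.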
17.
Sci Rep ; 4: 5399, 2014 Jun 23.
Article in English | MEDLINE | ID: mdl-24954137

ABSTRACT

Topological centrality is a significant measure for characterising the relative importance of a node in a complex network. For directed networks that model dynamic processes, however, it is of more practical importance to quantify a vertex's ability to dominate (control or observe) the state of other vertices. In this paper, based on the determination of controllable and observable subspaces under the global minimum-cost condition, we introduce a novel direction-specific index, domination centrality, to assess the intervention capabilities of vertices in a directed network. Statistical studies demonstrate that the domination centrality is, to a great extent, encoded by the underlying network's degree distribution and that most network positions through which one can intervene in a system are vertices with high domination centrality rather than network hubs. To analyse the interaction and functional dependence between vertices when they are used to dominate a network, we define the domination similarity and detect significant functional modules in glossary and metabolic networks through clustering analysis. The experimental results provide strong evidence that our indices are effective and practical in accurately depicting the structure of directed networks.

18.
PLoS One ; 9(1): e87797, 2014.
Article in English | MEDLINE | ID: mdl-24498199

ABSTRACT

Increasing evidence has indicated that long non-coding RNAs (lncRNAs) are implicated in and associated with many complex human diseases. Despite the accumulation of lncRNA-disease associations, only a few studies have examined the roles of these associations in pathogenesis. In this paper, we investigated lncRNA-disease associations from a network view to understand the contribution of these lncRNAs to complex diseases. Specifically, we studied both the properties of the diseases in which the lncRNAs were implicated, and those of the lncRNAs associated with complex diseases. Given that both protein-coding genes and lncRNAs are involved in human diseases, we constructed a coding-non-coding gene-disease bipartite network based on known associations between diseases and disease-causing genes. We then applied a propagation algorithm to uncover the hidden lncRNA-disease associations in this network. The algorithm was evaluated by leave-one-out cross validation on 103 diseases in which at least two genes were known to be involved, and achieved an AUC of 0.7881. Our algorithm successfully predicted 768 potential lncRNA-disease associations between 66 lncRNAs and 193 diseases. Furthermore, our results for Alzheimer's disease, pancreatic cancer, and gastric cancer were verified by other independent studies.


Subjects
Algorithms; Alzheimer Disease/genetics; Gene Regulatory Networks; Pancreatic Neoplasms/genetics; RNA, Long Noncoding/genetics; RNA, Neoplasm/genetics; Stomach Neoplasms/genetics; Alzheimer Disease/metabolism; Humans; Pancreatic Neoplasms/metabolism; RNA, Long Noncoding/metabolism; RNA, Neoplasm/metabolism; Sequence Analysis, RNA; Stomach Neoplasms/metabolism
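
The abstract above refers to "a propagation algorithm" without describing it. As one plausible stand-in, and explicitly an assumption rather than the authors' method, the sketch below runs a random walk with restart on a tiny gene-disease bipartite network: at each step a node keeps a `restart` share of its seed score and passes the rest equally to its neighbours. Node names, the restart value, and the iteration count are all hypothetical.

```python
def propagate(graph, seeds, restart=0.5, iterations=100):
    """Random walk with restart on an undirected network.

    graph: dict mapping node -> list of neighbours (every node needs at
           least one neighbour, and every neighbour must be a key)
    seeds: set of nodes where the walk restarts (must be keys of graph)
    Returns a dict mapping node -> propagated score; scores sum to 1.
    """
    nodes = sorted(graph)
    start = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    scores = dict(start)
    for _ in range(iterations):
        nxt = {n: restart * start[n] for n in nodes}
        for n in nodes:
            share = (1.0 - restart) * scores[n] / len(graph[n])
            for nbr in graph[n]:
                nxt[nbr] += share
        scores = nxt
    return scores


# Toy bipartite chain: disease_x - gene_1 - disease_y - lnc_1.
graph = {
    "disease_x": ["gene_1"],
    "gene_1": ["disease_x", "disease_y"],
    "disease_y": ["gene_1", "lnc_1"],
    "lnc_1": ["disease_y"],
}
scores = propagate(graph, {"disease_x"})
for node in sorted(scores, key=scores.get, reverse=True):
    print(node, round(scores[node], 4))
```

Nodes that end up with high scores but have no known link to the seed disease would be the candidate associations. In a leave-one-out evaluation of the kind the abstract reports, one known association is hidden at a time and its rank among the candidates is recorded.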
19.
BMC Syst Biol ; 7 Suppl 2: S8, 2013.
Article in English | MEDLINE | ID: mdl-24565177

ABSTRACT

BACKGROUND: Protein-protein interactions (PPIs) are crucial in cellular processes. Since current biological experimental techniques are time-consuming and expensive, and their results suffer from incompleteness and noise, developing computational methods and software tools to predict PPIs is necessary. Although several approaches have been proposed, the species supported are often limited and additional data such as homologous interactions in other species, protein sequence and protein expression are often required. Moreover, the predictive abilities of different features for different kinds of PPI data have not been studied. RESULTS: In this paper, we propose ppiPre, an open-source framework for PPI analysis and prediction using a combination of heterogeneous features including three GO-based semantic similarities, one KEGG-based co-pathway similarity and three topology-based similarities. It supports up to twenty species. Only the original PPI data and gold-standard PPI data are required from users. The experiments on binary and co-complex gold-standard yeast PPI data sets show that there are big differences among the predictive abilities of different features on different kinds of PPI data sets. The prediction performance on the two data sets shows that ppiPre is capable of handling PPI data of different kinds and sizes. ppiPre is implemented in the R language and is freely available on CRAN (http://cran.r-project.org/web/packages/ppiPre/). CONCLUSIONS: We applied our framework to both binary and co-complex gold-standard PPI data sets. The detailed analysis of the three GO aspects suggests that different GO aspects should be used on different kinds of data sets, and that combining all three aspects of GO often gives the best result. The analysis also shows that using features based solely on the topology of the PPI network can give a very good result when predicting the co-complex PPI data.
ppiPre provides useful functions for analysing PPI data and can be used to predict PPIs for multiple species.


Subjects
Computational Biology/methods; Protein Interaction Mapping/methods; Gene Ontology; Internet; ROC Curve; Software
20.
Proteome Sci ; 10 Suppl 1: S16, 2012 Jun 21.
Article in English | MEDLINE | ID: mdl-22759574

ABSTRACT

BACKGROUND: Network alignment is one of the most common biological network comparison methods. Aligning protein-protein interaction (PPI) networks of different species is of great importance for detecting evolutionarily conserved pathways or protein complexes across species through the identification of conserved interactions, and for improving our insight into biological systems. The global network alignment (GNA) problem is NP-complete, and only heuristic methods have been proposed so far. Generally, the current GNA methods fall into global heuristic seed-and-extend approaches. These methods cannot obtain the best overall consistent alignment between networks because of their dependence on the chosen local seed. Furthermore, these methods focus on maximizing the number of aligned edges between two networks without considering the original structures of functional modules. METHODS: We present a novel seed selection strategy for global network alignment that constructs pairs of hub nodes of the networks to be aligned into multiple seeds. Beginning from every hub seed, and using the membership similarity of nodes to quantify to what extent the nodes can participate topologically in functional modules associated with the current seed, we align the networks by modules. In this way, functional modules are not damaged during the heuristic alignment process. Our method also efficiently resolves a fatal problem of most conventional algorithms, namely that the initially selected seeds have a direct influence on the alignment result. The similarity measures between network nodes (e.g., proteins) include sequence similarity, centrality similarity, and dynamic membership similarity, and we call our algorithm Multiple Hubs-based Alignment (MHA). RESULTS: When applying our seed selection strategy to several pairs of real PPI networks, we observe that our method strikes a balance, extending the conserved interactions while keeping the functional modules unchanged.
In the case study, we assess the effectiveness of MHA on the alignment of the yeast and fly PPI networks. Our method outperforms state-of-the-art algorithms at detecting conserved functional modules and, in particular, retrieves 86% more conserved interactions than IsoRank. CONCLUSIONS: We believe that our seed selection strategy will lead to more topologically and biologically similar alignment results. It can also be used as a reference for, and complement to, other heuristic methods in seeking more meaningful alignment results.
