ABSTRACT
Protein complexes play a dominant role in cellular organization and function. Prediction of protein complexes from the network of physical interactions between proteins (PPI networks) has thus become an important research area. Recently, many computational approaches have been developed to identify these complexes, and various performance assessment measures have been proposed for evaluating the efficiency of these methods. However, there are many inconsistencies in the definitions and usage of these measures across the literature. To address this issue, we have gathered and presented the most important performance evaluation measures and developed a tool, named CompEvaluator, to critically assess protein complex prediction methods. The tool and documentation are publicly available at https://sourceforge.net/projects/compevaluator/files/.
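As an illustration of the kind of measure discussed above, the following minimal sketch computes an overlap score between a predicted and a reference complex, and derives precision, recall, and F-measure from it. The abstract does not list the measures that CompEvaluator implements, so the choice of the overlap score |A ∩ B|^2 / (|A| |B|), the default threshold of 0.2, and all function names here are assumptions made for illustration only.

```python
def overlap_score(predicted, reference):
    """Overlap score |A ∩ B|^2 / (|A| * |B|) between two protein sets."""
    a, b = set(predicted), set(reference)
    if not a or not b:
        return 0.0
    shared = len(a & b)
    return shared * shared / (len(a) * len(b))


def precision_recall_f(predicted_complexes, reference_complexes, threshold=0.2):
    """Match complexes by overlap score; the threshold is an assumed default."""
    # A predicted complex counts as correct if it matches any reference complex.
    matched_pred = sum(
        any(overlap_score(p, r) >= threshold for r in reference_complexes)
        for p in predicted_complexes
    )
    # A reference complex counts as recovered if any prediction matches it.
    matched_ref = sum(
        any(overlap_score(p, r) >= threshold for p in predicted_complexes)
        for r in reference_complexes
    )
    precision = matched_pred / len(predicted_complexes) if predicted_complexes else 0.0
    recall = matched_ref / len(reference_complexes) if reference_complexes else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Hypothetical toy data: protein identifiers are placeholders.
predicted = [{"A", "B", "C"}, {"X", "Y"}]
reference = [{"A", "B", "C", "D"}, {"P", "Q"}]
scores = precision_recall_f(predicted, reference)
```

Small differences in such definitions, for example the threshold value or whether matching is counted from the predicted or the reference side, are one way that reported results can become hard to compare across studies.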
Subjects
Algorithms, Computational Biology/methods, Theoretical Models, Protein Interaction Maps, Proteins/metabolism, Animals, Evaluation Studies as Topic, Humans, Protein Binding, Proteins/chemistry
ABSTRACT
BACKGROUND: Accurate identification of perturbed signaling pathways based on differentially expressed genes between sample groups is one of the key factors in understanding diseases and druggable targets. Most pathway analysis methods prioritize impacted signaling pathways by incorporating pathway topology using simple graph-based models. Despite their relative success, these models are limited in describing all types of dependencies and interactions that exist in biological pathways. RESULTS: In this work, we propose a new approach based on the formal modeling of signaling pathways. Signaling pathways are formally modeled, and model checking tools are then applied to find the likelihood of perturbation for each pathway in a given condition. Formal methods allow various complex interactions among biological parts to be modeled, which can help reduce the false-positive rate of the proposed approach. Based on this approach, we have developed a tool named Formal model checking based pathway analysis (FoPA). FoPA is compared with three well-known pathway analysis methods, PADOG, CePa, and SPIA, on a benchmark of 36 GEO datasets from various diseases by applying the target pathway technique. This validation technique eliminates the need for possibly biased human assessment of results. In cases where there is no a priori knowledge of all relevant pathways, simulated false inputs (permuted class labels and decoy pathways) are chosen as a set of negative controls to test the false-positive rate of the methods. Finally, to further evaluate the efficiency of FoPA, it is applied to a list of autism-related genes. CONCLUSIONS: The results obtained by the target pathway technique demonstrate that FoPA prioritizes target pathways as well as PADOG does and better than CePa and SPIA. The false-positive rate of finding significant pathways using FoPA is also lower than that of the other compared methods. In addition, FoPA detects more consistent relevant pathways than the other methods. The results of FoPA on autism-related genes highlight the role of the "Renin-angiotensin system" pathway. This pathway has been suggested to have a pivotal role in some neurodegenerative diseases, while little attention has been paid so far to its impact on autism development.
Subjects
Signal Transduction, Software, Autistic Disorder/genetics, Bias, Colorectal Neoplasms/metabolism, Databases as Topic, False Positive Reactions, Humans, Theoretical Models, Signal Transduction/genetics
ABSTRACT
Finding the causal relation between a gene and a disease using experimental approaches is a time-consuming and expensive task. Computational approaches, in contrast, are cost-efficient methods for identifying candidate genes. This article proposes a new heterogeneous biological network embedding approach, named NetEM, to identify disease-associated genes. To evaluate NetEM, we examine six complex diseases: peroxisomal disorders, sarcoma, Graves' disease, lysosomal storage diseases, blood coagulation disorders, and hypertrophic cardiomyopathy. Our experiments indicate that NetEM outperforms three well-known state-of-the-art algorithms, Cardigan, DIAMOnD, and GeneWanderer, in identifying disease genes. As further case studies on real data, we examine TCGA data of invasive lobular breast cancer and CPTAC data of human glioblastoma. This evaluation also indicates the validity of the method. The source code of NetEM and the data are available in the supplementary material of this article.
Subjects
Glioblastoma, Sarcoma, Humans, Algorithms, Computational Biology
ABSTRACT
Perturbation of the normal function of cell signaling pathways often leads to disease. Precise identification and investigation of perturbed signaling pathways is one of the factors that help in understanding disease mechanisms. Pathway analysis methods have been developed to identify perturbed signaling pathways in given conditions. Some of these methods, referred to as topology-based methods, consider pathway topology in their analysis. Most topology-based methods use simple graph-based models to incorporate topology, which have some limitations. We describe a new Pathway Analysis method using Petri net (PAPet) that models signaling pathways with Petri nets, and we propose an algorithm to measure the perturbation of a given pathway under a given condition. Modeling with Petri nets has some advantages and could overcome the shortcomings of simple graph-based models. We illustrate the capabilities of the proposed method using sensitivity, prioritization, mean reciprocal rank, and false-positive rate metrics on 36 real datasets from various diseases. A comparison of PAPet with five pathway analysis methods (FoPA, PADOG, GSEA, CePa, and SPIA) shows that PAPet provides the best compromise among all metrics. In addition, applying the methods to gene expression profiles of normal and pancreatic ductal adenocarcinoma (PDAC) samples shows that PAPet achieves the best rank among them in finding the pathways previously reported for PDAC. The PAPet method is available at https://github.com/fmansoori/PAPET.
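Of the metrics named above, mean reciprocal rank has a standard definition that is easy to sketch: for each dataset, take the reciprocal of the rank at which the known target pathway appears in a method's output, then average over datasets. The input layout below (one ranked list of pathway names per dataset, plus one target pathway per dataset) and the treatment of a missing target as contributing zero are assumptions for illustration, not a description of the PAPet evaluation code.

```python
def mean_reciprocal_rank(rankings, targets):
    """Average of 1/rank of each dataset's target pathway (rank starts at 1)."""
    if not rankings:
        return 0.0
    total = 0.0
    for ranking, target in zip(rankings, targets):
        # Assumption: a target missing from the ranking contributes 0.
        if target in ranking:
            total += 1.0 / (ranking.index(target) + 1)
    return total / len(rankings)


# Hypothetical rankings from one method on three datasets.
rankings = [
    ["pathway_1", "pathway_2", "pathway_3"],  # target ranked 1st
    ["pathway_2", "pathway_1"],               # target ranked 2nd
    ["pathway_3", "pathway_2", "pathway_1"],  # target ranked 3rd
]
targets = ["pathway_1", "pathway_1", "pathway_1"]
mrr = mean_reciprocal_rank(rankings, targets)  # (1 + 1/2 + 1/3) / 3
```

A value close to 1 means the target pathway is usually ranked first; values near 0 mean it is typically buried deep in the list or missing.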
Subjects
Pancreatic Neoplasms, Algorithms, Humans, Pancreatic Neoplasms/genetics, Signal Transduction
ABSTRACT
The entities of real-world networks are connected via different types of connections (i.e., layers). The task of link prediction in multiplex networks is to find missing connections based on both intra-layer and inter-layer correlations. Our observations confirm that in a wide range of real-world multiplex networks, from social to biological and technological, a positive correlation exists between connection probability in one layer and similarity in other layers. Accordingly, a similarity-based, automatic, general-purpose multiplex link prediction method, SimBins, is devised that quantifies the amount of connection uncertainty based on observed inter-layer correlations in a multiplex network. Moreover, SimBins enhances the prediction quality in the target layer by incorporating the effect of link overlap across layers. Applying SimBins to various datasets from diverse domains, we find that it outperforms the compared methods (both baseline and state-of-the-art) in most instances when predicting links. Furthermore, we discuss how SimBins adds only minor computational overhead to the base similarity measures, making it a potentially fast method suitable for large-scale multiplex networks.
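The inter-layer correlation described above can be probed with a short sketch: group node pairs by their similarity in an auxiliary layer and measure, for each group, the fraction of pairs that are linked in the target layer. This is not the SimBins algorithm. The use of the common-neighbors count as the similarity measure, one bin per distinct similarity value, and all function names are assumptions chosen to keep the example small.

```python
from itertools import combinations


def build_adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) edges."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def connection_rate_by_similarity(nodes, target_edges, auxiliary_edges):
    """Map auxiliary-layer similarity -> fraction of pairs linked in target layer."""
    aux = build_adjacency(auxiliary_edges)
    target = {frozenset(edge) for edge in target_edges}
    counts = {}  # similarity value -> (linked pairs, all pairs)
    for u, v in combinations(sorted(nodes), 2):
        # Similarity = number of common neighbors in the auxiliary layer.
        similarity = len(aux.get(u, set()) & aux.get(v, set()))
        linked, total = counts.get(similarity, (0, 0))
        is_linked = frozenset((u, v)) in target
        counts[similarity] = (linked + int(is_linked), total + 1)
    return {s: linked / total for s, (linked, total) in sorted(counts.items())}


# Hypothetical two-layer toy network on four nodes.
nodes = {"a", "b", "c", "d"}
auxiliary_edges = [("a", "c"), ("b", "c"), ("a", "d"), ("b", "d")]
target_edges = [("a", "b"), ("a", "c")]
rates = connection_rate_by_similarity(nodes, target_edges, auxiliary_edges)
```

In this toy network, pairs with two common neighbors in the auxiliary layer are linked in the target layer more often than pairs with none, which is the direction of correlation the abstract reports for real multiplex networks.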
ABSTRACT
Networks are invaluable tools for studying real biological, social, and technological complex systems in which connected elements form a purposeful phenomenon. A higher-resolution image of these systems shows that connections are not confined to a single type but span a variety of types. Multiplex networks encode this complexity with a set of nodes that are connected in different layers via different types of links. A large body of research on the link prediction problem is devoted to finding missing links in single-layer (simplex) networks. In recent years, the problem of link prediction in multiplex networks has gained the attention of researchers from different scientific communities. Although most of these studies suggest that prediction performance can be enhanced by using the information contained in different layers of the network, the exact source of this enhancement remains obscure. Here, using the proposed layer reconstruction method and experiments on real-world multiplex networks from different disciplines, it is shown that similarity with respect to structural features (eigenvectors) is a major source of enhancement for the link prediction task in multiplex networks. Moreover, we characterize how low values of similarity with respect to structural features result in cases where improving prediction performance is substantially hard.
ABSTRACT
Protein-protein interactions (PPIs) are important for understanding the cellular mechanisms of biological functions, but the reliability of PPIs extracted by high-throughput assays is known to be low. To address this, many current methods use multiple pieces of evidence from different sources of information to compute reliability scores for such PPIs. However, they often combine the evidence without taking into account the uncertainty of the evidence values, potential dependencies between the information sources used, and missing values from some information sources. We propose to formulate the task of scoring PPIs using multiple information sources as a multi-criteria decision-making problem that can be solved using data fusion to model potential interactions between the information sources. Using data fusion, the contribution of each information source can be weighted accordingly to systematically score the reliability of PPIs. Our experimental results show that the reliability scores assigned by our data fusion method can effectively classify highly reliable PPIs from multiple information sources, with substantial improvement in scoring over conventional approaches such as the Adjust CD-Distance approach. In addition, the underlying interactions between the information sources used, as well as their relative importance, can also be determined with our data fusion approach. We also show that such knowledge can be used to effectively handle missing values from information sources.
Subjects
Computational Biology/methods, Protein Interaction Mapping/methods, Computer-Assisted Decision Making, Gene Expression, High-Throughput Screening Assays, Reproducibility of Results
ABSTRACT
Graph clustering algorithms are widely used in the analysis of biological networks. Extracting functional modules in protein-protein interaction (PPI) networks is one such use. Most clustering algorithms that focus on finding functional modules try either to find clique-like subnetworks or to grow clusters starting from high-degree vertices as seeds. These algorithms do not distinguish between a biological network and any other network. In the current research, we present a new procedure to find functional modules in PPI networks. Our main idea is to model a biological concept and to use this concept for finding good functional modules in PPI networks. To evaluate the quality of the obtained clusters, we compared the results of our algorithm with those of some other widely used clustering algorithms on three high-throughput PPI networks from Saccharomyces cerevisiae, Homo sapiens, and Caenorhabditis elegans, as well as on some tissue-specific networks. Gene Ontology (GO) analyses were used to compare the results of different algorithms. Each algorithm's result was then compared with GO-term-derived functional modules. We also analyzed the effect of using tissue-specific networks on the quality of the obtained clusters. The experimental results indicate that the new algorithm outperforms most of the others, and that this improvement is more significant when tissue-specific networks are used.