Results 1 - 20 of 21
1.
Entropy (Basel) ; 25(6), 2023 May 24.
Article in English | MEDLINE | ID: mdl-37372185

ABSTRACT

Identifying the driver genes of cancer progression is of great significance in improving our understanding of the causes of cancer and promoting the development of personalized treatment. In this paper, we identify the driver genes at the pathway level via an existing intelligent optimization algorithm, named the Mouth Brooding Fish (MBF) algorithm. Many methods based on the maximum weight submatrix model to identify driver pathways attach equal importance to coverage and exclusivity and assign them equal weight, but those methods ignore the impact of mutational heterogeneity. Here, we use principal component analysis (PCA) to incorporate covariate data to reduce the complexity of the algorithm and construct a maximum weight submatrix model considering different weights of coverage and exclusivity. Using this strategy, the unfavorable effect of mutational heterogeneity is overcome to some extent. Data involving lung adenocarcinoma and glioblastoma multiforme were tested with this method and the results compared with the MDPFinder, Dendrix, and Mutex methods. When the driver pathway size was 10, the recognition accuracy of the MBF method reached 80% in both datasets, and the weight values of the submatrix were 1.7 and 1.89, respectively, which are better than those of the compared methods. At the same time, in the signal pathway enrichment analysis, the important role of the driver genes identified by our MBF method in the cancer signaling pathway is revealed, and the validity of these driver genes is demonstrated from the perspective of their biological effects.
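
The weighting idea in this abstract can be illustrated with a minimal sketch. It starts from the maximum weight submatrix score used by Dendrix, W(M) = coverage - overlap, and adds two coefficients so that coverage and exclusivity need not count equally. The names `alpha` and `beta` and the toy mutation matrix are assumptions for illustration; the abstract does not state the authors' actual weighting scheme.

```python
import numpy as np

def submatrix_weight(mutations, genes, alpha=1.0, beta=1.0):
    """Weighted score of a gene set in a binary patient x gene mutation matrix.

    coverage = number of patients mutated in at least one selected gene
    overlap  = total mutations in the selected genes minus coverage
    alpha and beta are hypothetical weights; alpha = beta = 1 gives the
    unweighted Dendrix-style score.
    """
    sub = np.asarray(mutations)[:, genes]
    coverage = int(np.count_nonzero(sub.any(axis=1)))
    overlap = int(sub.sum()) - coverage
    return alpha * coverage - beta * overlap

# Toy data: 4 patients (rows) x 3 genes (columns), hypothetical values.
toy = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
])
```

Here genes 0 and 2 are mutually exclusive across the toy patients, so they score higher than genes 0 and 1, which co-occur in the third patient.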

2.
BMC Bioinformatics ; 23(1): 199, 2022 May 30.
Article in English | MEDLINE | ID: mdl-35637427

ABSTRACT

BACKGROUND: The accurate characterization of protein functions is critical to understanding life at the molecular level and has a huge impact on biomedicine and pharmaceuticals. Computationally predicting protein function has been studied in the past decades. Plagued by noise and errors in protein-protein interaction (PPI) networks, researchers have turned to the fusion of multi-omics data in recent years. A data model that appropriately integrates network topologies with biological data and preserves their intrinsic characteristics is still a bottleneck and an aspirational goal for protein function prediction. RESULTS: In this paper, we propose the RWRT (Random Walks with Restart on Tensor) method to accomplish protein function prediction by applying bi-random walks on the tensor. RWRT first constructs a functional similarity tensor by combining protein interaction networks with multi-omics data derived from domain annotation and protein complex information. After this, RWRT extends the bi-random walks algorithm from a two-dimensional matrix to the tensor for scoring functional similarity between proteins. Finally, RWRT filters out possible pretenders based on the concept of cohesiveness coefficient and annotates target proteins with functions of the remaining functional partners. Experimental results indicate that RWRT performs significantly better than the state-of-the-art methods and improves the area under the receiver-operating curve (AUROC) by no less than 18%. CONCLUSIONS: The functional similarity tensor offers us an alternative, in that it is a collection of networks sharing the same nodes; however, the edges belong to different categories or represent interactions of different nature. We demonstrate that the tensor-based random walk model can not only discover more partners with similar functions but also effectively escape the constraints of errors in protein interaction networks. 
We believe that the performance of function prediction depends greatly on whether we can extract and exploit proper functional similarity information on protein correlations.


Subjects
Algorithms, Protein Interaction Maps
3.
BMC Bioinformatics ; 23(1): 493, 2022 Nov 18.
Article in English | MEDLINE | ID: mdl-36401161

ABSTRACT

BACKGROUND: Accurate annotation of protein function is the key to understanding life at the molecular level and has great implications for biomedicine and pharmaceuticals. The rapid development of high-throughput technologies has generated huge amounts of protein-protein interaction (PPI) data, which has prompted the emergence of computational methods to determine protein function. Plagued by errors and noise hidden in PPI data, these computational methods have turned to predicting functions by integrating the topology of protein interaction networks and multi-source biological data. Despite the effective improvement of these computational methods, it is still challenging to build a suitable network model for integrating multiplex biological data. RESULTS: In this paper, we constructed a heterogeneous biological network by initially integrating original protein interaction networks, protein-domain association data and protein complexes. To prove the effectiveness of the heterogeneous biological network, we applied the propagation algorithm on this network, and proposed a novel iterative model, named Propagate on Heterogeneous Biological Networks (PHN), to score and rank functions in descending order from all functional partners. Finally, we picked the top L of these predicted functions as candidates to annotate the target protein. Our comprehensive experimental results demonstrated that PHN outperformed seven other competing approaches using cross-validation. Experimental results indicated that PHN performs significantly better than competing methods and improves the Area Under the Receiver-Operating Curve (AUROC) in Biological Process (BP), Molecular Function (MF) and Cellular Components (CC) by no less than 33%, 15% and 28%, respectively. 
CONCLUSIONS: We demonstrated that integrating multi-source data into a heterogeneous biological network can preserve the complex relationship among multiplex biological data and improve the prediction accuracy of protein function by getting rid of the constraints of errors in PPI networks effectively. PHN, our proposed method, is effective for protein function prediction.


Subjects
Algorithms, Protein Interaction Mapping, Protein Interaction Mapping/methods, Molecular Sequence Annotation, Protein Interaction Maps, Proteins/metabolism
4.
Hum Genomics ; 14(1): 14, 2020 04 06.
Article in English | MEDLINE | ID: mdl-32252824

ABSTRACT

BACKGROUND: Essential proteins are an important part of the cell and closely related to the life activities of the cell. Hitherto, Protein-Protein Interaction (PPI) networks have been adopted by many computational methods to predict essential proteins. Most of the current approaches focus mainly on the topological structure of PPI networks. However, those methods relying solely on the PPI network have low detection accuracy for essential proteins. Therefore, it is necessary to integrate the PPI network with other biological information to identify essential proteins. RESULTS: In this paper, we proposed a novel random walk method for identifying essential proteins, called HEPT. A three-dimensional tensor is constructed first by combining the PPI network of Saccharomyces cerevisiae with multiple biological data such as gene ontology annotations and protein domains. Then, based on the newly constructed tensor, we extended the Hyperlink-Induced Topic Search (HITS) algorithm from a two-dimensional to a three-dimensional tensor model that can be utilized to infer essential proteins. Unlike existing state-of-the-art methods, both the importance of proteins and the types of interactions contribute to the essential protein prediction. To evaluate the performance of our newly proposed HEPT method, proteins are ranked in descending order based on their ranking scores computed by our method and other competitive methods. After that, a certain number of the ranked proteins are selected as candidates for essential proteins. According to the list of known essential proteins, the number of true essential proteins is used to judge the performance of each method. Experimental results show that our method can achieve better prediction performance than nine other state-of-the-art methods in identifying essential proteins. 
CONCLUSIONS: Through analysis and experimental results, it is clear that HEPT can effectively improve the prediction accuracy of essential proteins by using the HITS algorithm and combining network topology with gene ontology annotations and protein domains, which provides new insight into multi-data source fusion.


Subjects
Algorithms, Computational Biology/methods, Computer Simulation, Protein Interaction Maps, Saccharomyces cerevisiae Proteins/metabolism, Saccharomyces cerevisiae/metabolism
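
For readers unfamiliar with the HITS algorithm that the entry above extends, a minimal sketch of the standard two-dimensional version follows. It is not the tensor extension described in the abstract; the function name, parameters and toy graph are assumptions for illustration only.

```python
import numpy as np

def hits(adj, max_iter=100, tol=1e-10):
    """Classic HITS on an adjacency matrix where adj[i, j] = 1 means i -> j.

    Returns (hub_scores, authority_scores), each normalised to unit length.
    """
    A = np.asarray(adj, dtype=float)
    hubs = np.ones(A.shape[0])
    auths = np.ones(A.shape[0])
    for _ in range(max_iter):
        new_auths = A.T @ hubs       # good authorities are pointed to by good hubs
        new_hubs = A @ new_auths     # good hubs point to good authorities
        new_auths /= np.linalg.norm(new_auths) or 1.0
        new_hubs /= np.linalg.norm(new_hubs) or 1.0
        delta = np.abs(new_auths - auths).sum() + np.abs(new_hubs - hubs).sum()
        hubs, auths = new_hubs, new_auths
        if delta < tol:
            break
    return hubs, auths

# Toy directed graph (hypothetical): nodes 0 and 1 both point to node 2.
toy_graph = np.array([
    [0, 0, 1],
    [0, 0, 1],
    [0, 0, 0],
])
hub_scores, authority_scores = hits(toy_graph)
```

In the toy graph, node 2 receives all authority weight and nodes 0 and 1 share the hub weight equally. On an undirected PPI network the adjacency matrix is symmetric, so hub and authority scores coincide.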
5.
BMC Bioinformatics ; 21(1): 355, 2020 Aug 12.
Article in English | MEDLINE | ID: mdl-32787776

ABSTRACT

BACKGROUND: The accurate annotation of protein functions is of great significance in elucidating the phenomena of life, treating disease and developing new medicines. Various methods have been developed to facilitate the prediction of these functions by combining protein interaction networks (PINs) with multi-omics data. However, it is still challenging to make full use of multiple biological data to improve the performance of function annotation. RESULTS: We presented NPF (Network Propagation for Functions prediction), an integrative protein function prediction framework assisted by network propagation and functional module detection, for discovering interacting partners with similar functions to target proteins. NPF leverages knowledge of the protein interaction network architecture and multi-omics data, such as domain annotation and protein complex information, to augment protein-protein functional similarity in a propagation manner. We have verified the great potential of NPF for accurately inferring protein functions. According to the comprehensive evaluation of NPF, it delivered a better performance than other competing methods in terms of leave-one-out cross-validation and ten-fold cross-validation. CONCLUSIONS: We demonstrated that network propagation, together with multi-omics data, can discover more partners with similar functions and is not constrained by the "small-world" feature of protein interaction networks. We conclude that the performance of function prediction depends greatly on whether we can extract and exploit proper functional similarity information from protein correlations.


Subjects
Algorithms, Computational Biology/methods, Protein Interaction Maps, Cluster Analysis, Gene Ontology, Protein Binding, Reproducibility of Results, Saccharomyces cerevisiae/metabolism, Saccharomyces cerevisiae Proteins/metabolism
6.
BMC Bioinformatics ; 20(1): 626, 2019 Dec 03.
Article in English | MEDLINE | ID: mdl-31795943

ABSTRACT

BACKGROUND: In recent years, lncRNAs (long-non-coding RNAs) have been shown to be closely related to the occurrence and development of many serious diseases that are harmful to human health. However, most lncRNA-disease associations have not yet been found due to the high cost and time complexity of traditional bio-experiments. Hence, it is urgent and necessary to establish efficient and reasonable computational models to predict potential associations between lncRNAs and diseases. RESULTS: In this manuscript, a novel prediction model called TCSRWRLD is proposed to predict potential lncRNA-disease associations based on an improved random walk with restart. In TCSRWRLD, a heterogeneous lncRNA-disease network is constructed first by combining the integrated similarity of lncRNAs and the integrated similarity of diseases. Then, for each lncRNA/disease node in the newly constructed heterogeneous lncRNA-disease network, a node set called the TCS (Target Convergence Set) is established, consisting of the top 100 disease/lncRNA nodes with minimum average network distances to the disease/lncRNA nodes that have known associations with it. Finally, an improved random walk with restart is implemented on the heterogeneous lncRNA-disease network to infer potential lncRNA-disease associations. The major contribution of this manuscript lies in the introduction of the concept of the TCS, based on which the convergence of TCSRWRLD can be accelerated effectively, since the walker can stop its random walk once the walking probability vectors obtained at the nodes in the TCS, rather than at all nodes in the whole network, have reached a stable state. Simulation results show that TCSRWRLD can achieve a reliable AUC of 0.8712 in Leave-One-Out Cross Validation (LOOCV), which clearly outperforms previous state-of-the-art results. Moreover, case studies of lung cancer and leukemia demonstrate the satisfactory prediction performance of TCSRWRLD as well. 
CONCLUSIONS: Both comparative results and case studies have demonstrated that TCSRWRLD can achieve excellent performance in predicting potential lncRNA-disease associations, which also implies that TCSRWRLD may be a good addition to bioinformatics research in the future.


Subjects
Algorithms, Computational Biology/methods, Genetic Association Studies, Genetic Predisposition to Disease, RNA, Long Noncoding/genetics, Area Under Curve, Humans, Neoplasms/genetics, Probability, RNA, Long Noncoding/metabolism, Reproducibility of Results
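
Several entries in this list build on random walk with restart (RWR). As a point of reference, a minimal sketch of the plain algorithm is given below; it iterates p = (1 - r) W p + r p0 on a column-normalised adjacency matrix until the vector stops changing at every node. It does not implement the TCS-based stopping rule described above, and the function name, restart probability and toy network are assumptions for illustration.

```python
import numpy as np

def random_walk_with_restart(adj, seed, restart=0.3, tol=1e-10, max_iter=1000):
    """Plain RWR from a single seed node; returns the steady-state probabilities."""
    adj = np.asarray(adj, dtype=float)
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0      # leave isolated nodes as zero columns
    W = adj / col_sums                  # column-normalised transition matrix
    p0 = np.zeros(adj.shape[0])
    p0[seed] = 1.0
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * (W @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:   # convergence checked over all nodes
            return p_next
        p = p_next
    return p

# Toy undirected path network 0 - 1 - 2 (hypothetical), walker seeded at node 0.
path = np.array([
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
])
scores = random_walk_with_restart(path, seed=0)
```

Nodes closer to the seed end up with higher probability, which is why the resulting vector can be used to rank candidate associations.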
7.
BMC Bioinformatics ; 20(1): 355, 2019 Jun 24.
Article in English | MEDLINE | ID: mdl-31234779

ABSTRACT

BACKGROUND: Essential proteins are distinctly important for an organism's survival and development and crucial to disease analysis and drug design as well. Large-scale protein-protein interaction (PPI) data sets exist for Saccharomyces cerevisiae, which provides us with a valuable opportunity to identify essential proteins from PPI networks. Many network topology-based computational methods have been designed to detect essential proteins. However, these methods are limited by the completeness of available PPI data. To break out of these restraints, some computational methods have been proposed that integrate PPI networks and multi-source biological data. Despite the progress in research on multiple data fusion, it is still challenging to improve the prediction accuracy of the computational methods. RESULTS: In this paper, we design a novel iterative model for essential protein prediction, named Randomly Walking in the Heterogeneous Network (RWHN). In RWHN, a weighted protein-protein interaction network and a domain-domain association network are first constructed according to the original PPI network and the known protein-domain association network. Then, we establish a new heterogeneous matrix by combining the two constructed networks with the protein-domain association network. Based on the heterogeneous matrix, a transition probability matrix is established by normalization. Finally, an improved PageRank algorithm is applied to the heterogeneous network for essential protein prediction. In order to eliminate the influence of false negatives, information on orthologous proteins and the subcellular localization information of proteins are integrated to initialize the score vector of proteins. In RWHN, the topological, conservation and functional features of essential proteins are all taken into account in the prediction process. 
The experimental results show that RWHN clearly outperforms ten other competing methods in predicting essential proteins. CONCLUSIONS: We demonstrated that integrating multi-source data into a heterogeneous network can preserve the complex relationship among multiple biological data and improve the prediction accuracy of essential proteins. RWHN, our proposed method, is effective for the prediction of essential proteins.


Subjects
Protein Interaction Mapping/methods, Saccharomyces cerevisiae Proteins/metabolism, Saccharomyces cerevisiae/metabolism, Algorithms, Protein Domains, Protein Interaction Maps, Saccharomyces cerevisiae Proteins/chemistry
8.
Hum Genomics ; 10 Suppl 2: 17, 2016 07 25.
Article in English | MEDLINE | ID: mdl-27461193

ABSTRACT

BACKGROUND: Protein complexes play an important role in biological processes. Recent developments in experiments have resulted in the publication of many high-quality, large-scale protein-protein interaction (PPI) datasets, which provide abundant data for computational approaches to the prediction of protein complexes. However, the precision of protein complex prediction still needs to be improved due to the incompleteness and noise in PPI networks. RESULTS: There exist complex and diverse relationships among proteins after integrating multiple sources of biological information. Considering that different types of interactions do not carry the same weight for protein complex prediction, we construct a multi-relationship protein interaction network (MPIN) by integrating PPI network topology with gene ontology annotation information. Then, we design a novel algorithm named MINE (identifying protein complexes based on Multi-relationship protein Interaction NEtwork) to predict protein complexes with high cohesion and low coupling from MPIN. CONCLUSIONS: The experiments on yeast data show that MINE outperforms the current methods in terms of both accuracy and statistical significance.


Subjects
Algorithms, Computational Biology/methods, Protein Interaction Mapping/methods, Protein Interaction Maps, Gene Ontology, Proteins/genetics, Proteins/metabolism, Reproducibility of Results
9.
Hum Genomics ; 10(1): 33, 2016 Sep 27.
Article in English | MEDLINE | ID: mdl-27678214

ABSTRACT

BACKGROUND: Accurate annotation of protein functions is still a big challenge for understanding life in the post-genomic era. Many computational methods based on protein-protein interaction (PPI) networks have been proposed to predict the function of proteins. However, the precision of these predictions still needs to be improved, due to the incompleteness and noise in PPI networks. Integrating network topology and biological information could improve the accuracy of protein function prediction and may also lead to the discovery of multiple interaction types between proteins. Current algorithms generate a single network, which is achieved using a weighted sum of all types of protein interactions. METHOD: The influences of different types of interactions on the prediction of protein functions are not the same. To address this, we construct multilayer protein networks (MPN) by integrating PPI networks, the domains of proteins, and information on protein complexes. In the MPN, there is more than one type of connection between pairs of proteins. Different types of connections reflect different roles and importance in protein function prediction. Based on the MPN, we propose a new protein function prediction method, named function prediction based on multilayer protein networks (FP-MPN). Given an un-annotated protein, the FP-MPN method visits each layer of the MPN in turn and generates a set of candidate neighbors with known functions. A set of predicted functions for the testing protein is then formed and all of these functions are scored and sorted. Each layer contributes differently to the prediction of protein functions. A number of top-ranking functions are selected to annotate the unknown protein. CONCLUSIONS: The method proposed in this paper was a better predictor when used on Saccharomyces cerevisiae protein data than other function prediction methods previously used. 
The proposed FP-MPN method takes different roles of connections in protein function prediction into account to reduce the artificial noise by introducing biological information.

10.
Methods ; 110: 54-63, 2016 11 01.
Article in English | MEDLINE | ID: mdl-27402354

ABSTRACT

Essential proteins are indispensable for the survival of a living organism and play important roles in the emerging field of synthetic biology. Many computational methods have been proposed to identify essential proteins by using the topological features of interactome networks. However, most of these methods ignore the intrinsic biological meaning of proteins. Research shows that essentiality is tied not only to the protein or gene itself, but also to the molecular modules to which that protein belongs. The results of this study reveal the modularity of essential proteins. On the other hand, essential proteins are more evolutionarily conserved than nonessential proteins and frequently bind each other. That is to say, conservatism is another important feature of essential proteins. Multiple networks are constructed by integrating protein-protein interaction (PPI) networks, time course gene expression data and protein domain information. Based on these networks, a new essential protein identification method is proposed based on a combination of modularity and conservatism of proteins. Experimental results show that the proposed method outperforms other essential protein identification methods in terms of the number of essential proteins among the top-ranked candidates.


Subjects
Computational Biology/methods, Protein Interaction Mapping/methods, Protein Interaction Maps/genetics, Algorithms, Gene Expression Regulation/genetics
11.
Sensors (Basel) ; 16(12), 2016 Dec 01.
Article in English | MEDLINE | ID: mdl-27916948

ABSTRACT

Human activity recognition, tracking and classification is an essential trend in assisted living systems that can help support elderly people with their daily activities. Traditional activity recognition approaches depend on vision-based or sensor-based techniques. Recently, a promising new technique has attracted more attention, namely device-free human activity recognition, which neither requires the target to wear or carry a device nor requires cameras to be installed in the monitored area. The device-free technique for activity recognition uses only the signals of common wireless local area network (WLAN) devices available everywhere. In this paper, we present a novel elderly activity recognition system that leverages the fluctuation of wireless signals caused by human motion. We present an efficient method to select the correct data from the Channel State Information (CSI) streams that were neglected in previous approaches. We apply a Principal Component Analysis method that exposes the useful information from raw CSI. Thereafter, Forest Decision (FD) is adopted to classify the proposed activities and achieves a high accuracy rate. Extensive experiments have been conducted in an indoor environment to test the feasibility of the proposed system with a total of five volunteer users. The evaluation shows that the proposed system is applicable and robust to electromagnetic noise.


Subjects
Biosensing Techniques/methods, Wireless Technology, Activities of Daily Living, Aged, Algorithms, Humans, Internet, Pattern Recognition, Automated, Principal Component Analysis
12.
Math Biosci Eng ; 19(6): 6331-6343, 2022 04 20.
Article in English | MEDLINE | ID: mdl-35603404

ABSTRACT

High-throughput biological experiments are expensive and time consuming. In the past few years, many computational methods based on biological information have been proposed and widely used to understand the biological background. However, the processing of biological information data inevitably produces false positive and false negative data, such as the noise in Protein-Protein Interaction (PPI) networks and the noise generated by the integration of a variety of biological information. Handling this noise is key to essential protein prediction. An Identifying Essential Proteins model based on non-negative Matrix Symmetric tri-Factorization and multiple biological information (IEPMSF) is proposed in this paper, which uses only the common-neighbor characteristics of proteins in the PPI network to build a weighted network, and uses the non-negative matrix symmetric tri-factorization method to find more potential interactions between proteins in the network so as to optimize the weighted network. Then, using the subcellular location and lineal homology information, the starting score of proteins is determined, and the random walk with restart algorithm is applied to the optimized network to score and rank each protein. We tested the proposed prediction model against current representative approaches using a public database. Experiments show that the new method identifies essential proteins efficiently. The effectiveness of this method shows that it can substantially reduce the noise that exists in the multi-source biological information itself and that is caused by integrating it.


Subjects
Computational Biology, Protein Interaction Mapping, Algorithms, Protein Interaction Maps, Proteins
13.
PeerJ ; 8: e8316, 2020.
Article in English | MEDLINE | ID: mdl-31915586

ABSTRACT

BACKGROUND: There is no criterion to distinguish synchronous and non-synchronous multiple primary cutaneous melanomas (MPMs). This study aimed to distinguish synchronous and non-synchronous MPMs and compare their survival using the Surveillance, Epidemiology, and End Results database. METHODS: Synchronous and non-synchronous MPMs were distinguished by fitting the double log transformed distribution of the time interval between the first and second primary cutaneous melanomas (TIFtS) through a piecewise linear regression. The overall and melanoma-specific survivals were compared by the Kaplan-Meier method and Cox proportional hazard model through modeling the occurrence of synchronous MPMs as a time-dependent variable. RESULTS: The distribution of TIFtS was composed of three power-law distributions. According to its first inflection point, synchronous MPMs were defined as tumors that occurred within 2 months. The Kaplan-Meier plot revealed significantly inferior survival for synchronous MPMs compared with non-synchronous MPMs (P < 0.0001), and the occurrence of synchronous MPM was a risk factor for overall survival of cutaneous melanoma (CM) (hazard ratio: 2.213; 95% CI [2.087-2.346]; P < 0.0001). CONCLUSIONS: This study provided data-analytic evidence for using 2 months to distinguish synchronous MPMs and non-synchronous MPMs. Furthermore, the occurrence of synchronous MPM was a risk factor for prognosis of patients with CM.
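
The Kaplan-Meier method mentioned in this abstract estimates survival as a running product S(t) = prod(1 - d_i / n_i), where d_i is the number of events at time t_i and n_i the number still at risk. A minimal sketch follows; the function name and the toy follow-up data are assumptions for illustration and are unrelated to the study's data.

```python
def kaplan_meier(times, events):
    """Product-limit estimate; returns [(t, S(t))] at each distinct event time.

    times  : follow-up time for each subject
    events : 1 if the event was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        while i < len(data) and data[i][0] == t:   # group ties at the same time
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed                          # censored subjects leave the risk set
    return curve

# Toy cohort (hypothetical): four subjects, the second one censored at t = 2.
km_curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
```

Censored subjects do not lower the curve themselves, but they shrink the risk set, so later events produce larger drops.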

14.
Front Genet ; 11: 343, 2020.
Article in English | MEDLINE | ID: mdl-32373163

ABSTRACT

The identification of essential proteins can help in understanding the minimum requirements for cell survival and development. Ever-increasing amounts of high-throughput data provide us with opportunities to detect essential proteins from protein interaction networks (PINs). Existing network-based approaches are limited by the poor quality of the underlying PIN data, which exhibits high rates of false positive and false negative results. To overcome this problem, researchers have focused on the prediction of essential proteins by combining PINs with other biological data, which has led to the emergence of various interactions between proteins. It remains challenging, however, to use aggregated multiplex interactions within a single analysis framework to identify essential proteins. In this study, we created a multiplex biological network (MON) by initially integrating PINs, protein domains, and gene expression profiles. Next, we proposed a new approach to discover essential proteins by extending the random walk with restart algorithm to the tensor, which provides a data model representation of the MON. In contrast to existing approaches, the proposed MON approach considers both the importance of nodes and the different types of interactions between proteins during the iteration. MON was implemented to identify essential proteins within two yeast PINs. Our comprehensive experimental results demonstrated that MON outperformed 11 other state-of-the-art approaches in terms of precision-recall curve, jackknife curve, and other criteria.

15.
Front Microbiol ; 10: 676, 2019.
Article in English | MEDLINE | ID: mdl-31024478

ABSTRACT

The survival of human beings is inseparable from microbes. More and more studies have shown that microbes can affect human physiological processes in various ways and are closely related to some human diseases. In this paper, based on known microbe-disease associations, a bidirectional weighted network was first constructed by integrating the schemes of normalized Gaussian interactions and bidirectional recommendations. Then, based on the newly constructed bidirectional network, a computational model called BWNMHMDA was developed to predict potential relationships between microbes and diseases. Finally, in order to evaluate the new prediction model BWNMHMDA, LOOCV and 5-fold cross validation were implemented, and simulation results indicated that BWNMHMDA could achieve reliable AUCs of 0.9127 and 0.8967 ± 0.0027 in these two frameworks respectively, outperforming some state-of-the-art methods. Moreover, case studies of asthma, colorectal carcinoma, and chronic obstructive pulmonary disease were implemented to further assess the performance of BWNMHMDA. Experimental results showed that 10, 9, and 8 of the top 10 predicted microbes have been confirmed by related literature in these three case studies respectively, which also demonstrated that our new model BWNMHMDA could achieve satisfactory prediction performance.

16.
Genes (Basel) ; 10(2), 2019 02 08.
Article in English | MEDLINE | ID: mdl-30744078

ABSTRACT

Recently, an increasing number of studies have indicated that long-non-coding RNAs (lncRNAs) can participate in various crucial biological processes and can also be used as promising biomarkers for the treatment of certain diseases such as coronary artery disease and various cancers. Due to costs and time complexity, the number of possible disease-related lncRNAs that can be verified by traditional biological experiments is very limited. Therefore, in recent years, it has been very popular to use computational models to predict potential disease-lncRNA associations. In this study, we first constructed three kinds of association networks, namely the lncRNA-miRNA association network, the miRNA-disease association network, and the lncRNA-disease correlation network. Then, by integrating these three newly constructed association networks, we constructed an lncRNA-disease weighted association network, which was further updated by adopting the KNN algorithm based on the semantic similarity of diseases and the similarity of lncRNA functions. Thereafter, according to the updated lncRNA-disease weighted association network, a novel computational model called PMFILDA was proposed to infer potential lncRNA-disease associations based on the probability matrix decomposition. Finally, to evaluate the new prediction model PMFILDA, we performed Leave One Out Cross-Validation (LOOCV) based on strongly validated data filtered from MNDR, and the simulation results indicated that the performance of PMFILDA was better than some state-of-the-art methods. Moreover, case studies of breast cancer, lung cancer, and colorectal cancer were implemented to further assess the performance of PMFILDA, and simulation results illustrated that PMFILDA could achieve satisfactory prediction performance as well.


Subjects
Genetic Predisposition to Disease, Genome-Wide Association Study/methods, RNA, Long Noncoding/genetics, Software, Humans
17.
Curr Protein Pept Sci ; 18(11): 1120-1131, 2017.
Article in English | MEDLINE | ID: mdl-28474566

ABSTRACT

Predicting the functions of proteins is a key issue in the post-genomic era. Some experimental methods have been designed to predict protein functions. However, these methods cannot keep pace with the vast amount of sequence data because of their inherent difficulty and expense. To address these problems, many computational methods have been proposed to predict protein functions. In this paper, we provide a comprehensive survey of current techniques for the computational prediction of protein functions. We begin by introducing the formal description of protein function prediction and the evaluation of prediction methods. We then focus on the various available approaches, grouped into supervised and unsupervised methods. Finally, we discuss challenges and future work in this field.


Subjects
Computational Biology/methods , Neural Networks, Computer , Proteins/physiology , Support Vector Machine , Decision Trees , Humans , Machine Learning , Protein Interaction Mapping , Proteins/chemistry
18.
IEEE Trans Nanobioscience ; 15(2): 131-9, 2016 03.
Article in English | MEDLINE | ID: mdl-26955047

ABSTRACT

Automated annotation of protein function is challenging. As the number of sequenced genomes grows rapidly, the overwhelming majority of proteins can only be annotated computationally. Under new conditions or stimuli, not only the number and location of proteins change, but also their interactions. This dynamic feature of protein interactions, however, has not been considered in existing function prediction algorithms. Taking the dynamic nature of protein interactions into account, we construct a dynamic weighted interactome network (DWIN) by integrating a protein-protein interaction (PPI) network with time-course gene expression data, protein domain information, and protein complex information. We then propose a new approach that predicts protein functions from the constructed DWIN. For an unknown protein, the proposed method visits the dynamic networks at different time points and scores the functions derived from all neighbors. Finally, the method selects the top N functions from the ranked candidates to annotate the test protein. Experiments on PPI datasets were conducted to evaluate the effectiveness of the proposed approach in predicting unknown protein functions. The evaluation results demonstrated that the proposed method outperforms other competing methods.
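The neighbor-scoring and top-N selection described in this abstract can be sketched as follows. This is an illustrative simplification rather than the published method: the abstract does not give the scoring formula, so the sketch assumes each neighbor's annotated functions are credited with the edge weight at every time point. The name `predict_functions`, the snapshot layout, and the toy data are all hypothetical.

```python
from collections import defaultdict

def predict_functions(protein, snapshots, annotations, top_n=2):
    """Rank candidate functions for `protein` from its neighbors across time points.

    snapshots: list of {(a, b): weight} dicts, one weighted network per time point.
    annotations: {protein: set of known function labels}.
    """
    scores = defaultdict(float)
    for edges in snapshots:
        for (a, b), weight in edges.items():
            if protein not in (a, b):
                continue
            neighbor = b if a == protein else a
            # Assumed scoring rule: add the edge weight to each neighbor function.
            for func in annotations.get(neighbor, ()):
                scores[func] += weight
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [func for func, _ in ranked[:top_n]]

# Toy data: two time points, made-up proteins and function labels.
snapshots = [
    {("P1", "P2"): 0.9, ("P1", "P3"): 0.2},
    {("P1", "P3"): 0.4, ("P2", "P4"): 0.7},
]
annotations = {"P2": {"GO:A"}, "P3": {"GO:B", "GO:C"}}
top_functions = predict_functions("P1", snapshots, annotations)
print(top_functions)  # ['GO:A', 'GO:B']
```

Summing over snapshots lets an interaction that is present at several time points contribute more than one that appears only once, which is one simple way to reflect the dynamic network idea.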


Subjects
Computational Biology/methods , Protein Interaction Mapping/methods , Protein Interaction Maps/physiology , Algorithms , Databases, Protein , Saccharomyces cerevisiae Proteins/physiology
19.
Article in English | MEDLINE | ID: mdl-26357088

ABSTRACT

Protein complexes play a significant role in understanding the underlying mechanisms of most cellular functions. Recently, many researchers have explored computational methods to identify protein complexes from protein-protein interaction (PPI) networks. One group of methods focuses on detecting locally dense subgraphs, which correspond to protein complexes, by considering local neighbors. The drawback of this kind of approach is that the global information of the network is ignored. Other methods, such as the Markov Clustering algorithm (MCL) and PageRank-Nibble, find protein complexes using random walk techniques that can exploit the global structure of networks. However, these methods ignore the inherent core-attachment structure of protein complexes and treat adjacent nodes equally. In this paper, we design a weighted PageRank-Nibble algorithm that assigns a different probability to each adjacent node, and we propose a novel method named WPNCA to detect protein complexes from PPI networks by using the weighted PageRank-Nibble algorithm and the core-attachment structure. First, WPNCA partitions the PPI network into multiple dense clusters by using the weighted PageRank-Nibble algorithm. Then, the cores of these clusters are detected, and the remaining proteins in the clusters are selected as attachments to form the final predicted protein complexes. Experiments on yeast data show that WPNCA outperforms existing methods in terms of both accuracy and p-value. The software for WPNCA is available at "http://netlab.csu.edu.cn/bioinfomatics/weipeng/WPNCA/download.html".
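To make the idea of a weighted random walk concrete, the sketch below computes a weighted personalized PageRank by plain power iteration. This is a deliberate substitution: PageRank-Nibble approximates personalized PageRank with a local push procedure, and WPNCA's actual weighting scheme is not given in the abstract. The function name `weighted_personalized_pagerank`, the restart probability, and the toy graph are assumptions for illustration only.

```python
def weighted_personalized_pagerank(graph, seed, alpha=0.15, iters=100):
    """Power iteration for a random walk with restart at `seed`.

    graph: {node: {neighbor: weight}}; a step to a neighbor is taken with
    probability proportional to the edge weight, not uniformly.
    """
    scores = {node: 0.0 for node in graph}
    scores[seed] = 1.0
    for _ in range(iters):
        nxt = {node: 0.0 for node in graph}
        nxt[seed] += alpha  # restart mass returns to the seed
        for node, neighbors in graph.items():
            total = sum(neighbors.values())
            if total == 0:
                # Isolated node: send its walking mass back to the seed.
                nxt[seed] += (1 - alpha) * scores[node]
                continue
            for neighbor, weight in neighbors.items():
                nxt[neighbor] += (1 - alpha) * scores[node] * weight / total
        scores = nxt
    return scores

# Toy PPI-like graph: two tight triangles joined by one weak edge (C-D).
graph = {
    "A": {"B": 1.0, "C": 1.0},
    "B": {"A": 1.0, "C": 1.0},
    "C": {"A": 1.0, "B": 1.0, "D": 0.1},
    "D": {"C": 0.1, "E": 1.0, "F": 1.0},
    "E": {"D": 1.0, "F": 1.0},
    "F": {"D": 1.0, "E": 1.0},
}
ppr = weighted_personalized_pagerank(graph, seed="A")
```

With the walk seeded at A, most of the probability mass stays inside the A-B-C triangle because the bridging edge carries little weight, so thresholding or sweeping over the scores would separate that dense cluster from the rest.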


Subjects
Algorithms , Computational Biology/methods , Protein Interaction Mapping/methods , Proteins/metabolism , Databases, Protein , Proteins/chemistry , Software
20.
IEEE Trans Nanobioscience ; 13(4): 415-24, 2014 Dec.
Article in English | MEDLINE | ID: mdl-25122840

ABSTRACT

Many computational methods have been proposed to identify essential proteins by using the topological features of interactome networks. However, the precision of essential protein discovery still needs to be improved. Research shows that the majority of hubs (essential proteins) in the yeast interactome network are essential because of their involvement in essential complex biological modules, and that hubs can be classified into two categories: date hubs and party hubs. In this study, by incorporating gene expression profiles, we propose a new method, named POEM, to predict essential proteins based on overlapping essential modules. In POEM, the original protein interactome network is partitioned into many overlapping essential modules. The frequencies and weighted degrees of proteins in these modules are used to decide which category a protein belongs to. The comparative results show that POEM outperforms the classical centrality measures Degree Centrality (DC), Information Centrality (IC), Eigenvector Centrality (EC), Subgraph Centrality (SC), Betweenness Centrality (BC), Closeness Centrality (CC), and Edge Clustering Coefficient Centrality (NC), as well as two recently proposed essential protein prediction methods, PeC and CoEWC. Experimental results indicate that the precision of predicting essential proteins can be improved by considering the modularity of proteins and integrating gene expression profiles with network topological features.
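The two per-protein quantities mentioned in this abstract, frequency across overlapping modules and weighted degree within them, can be computed as in the sketch below. It only gathers those statistics; how POEM builds the modules and how it turns the statistics into categories is not stated in the abstract, so none of that is reproduced. The name `module_statistics` and the toy modules and edge weights are hypothetical.

```python
from collections import Counter

def module_statistics(modules, weights):
    """Count module memberships and sum within-module edge weights per protein.

    modules: list of sets of protein names (modules may overlap).
    weights: {frozenset({a, b}): weight} for each weighted interaction.
    """
    frequency = Counter()
    weighted_degree = Counter()
    for module in modules:
        for protein in module:
            frequency[protein] += 1
        for pair, weight in weights.items():
            # Only edges with both endpoints inside the module are counted.
            if pair <= module:
                for protein in pair:
                    weighted_degree[protein] += weight
    return frequency, weighted_degree

# Toy data: protein C belongs to both modules.
modules = [{"A", "B", "C"}, {"C", "D"}]
weights = {
    frozenset({"A", "B"}): 0.5,
    frozenset({"B", "C"}): 1.0,
    frozenset({"C", "D"}): 0.25,
}
frequency, weighted_degree = module_statistics(modules, weights)
print(frequency["C"], weighted_degree["C"])  # 2 1.25
```

A protein that appears in several modules, like C here, gets a higher frequency, which is the kind of signal the abstract associates with distinguishing hub categories.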


Subjects
Algorithms , Gene Expression Regulation/physiology , Metabolome/physiology , Models, Biological , Protein Interaction Mapping/methods , Saccharomyces cerevisiae Proteins/metabolism , Saccharomyces cerevisiae/metabolism , Animals , Computer Simulation , Humans , Software