1.
Brief Bioinform ; 23(6), 2022 Nov 19.
Article in English | MEDLINE | ID: mdl-36242566

ABSTRACT

MOTIVATION: Discovering drug-target interactions (DTIs) is a crucial step in drug development, for example in identifying drug side effects and in drug repositioning. Because identifying DTIs through wet-lab biological experiments is time-consuming and costly, many computational approaches have been proposed and have become an efficient way to infer potential interactions. Despite extensive effort, prediction accuracy still needs to be improved. In particular, heterogeneous network-based approaches do not fully consider the complex structure and rich semantic information in these networks. Predicting DTIs efficiently therefore remains a challenge. RESULTS: In this study, we develop a novel method, Multiview heterogeneous information network embedding with Hierarchical Attention mechanisms to discover potential Drug-Target Interactions (MHADTI). First, MHADTI constructs different similarity networks for drugs and targets from their multisource information. Combined with the known DTI network, these yield three drug-target heterogeneous information networks (HINs) with different views. Second, MHADTI learns embeddings of drugs and targets from the multiview HINs with hierarchical attention mechanisms, which comprise node-level, semantic-level and graph-level attention. Finally, MHADTI uses a multilayer perceptron to predict DTIs from the learned deep feature representations. The hierarchical attention mechanisms account for the importance of nodes, meta-paths and graphs when learning the feature representations of drugs and targets, which makes their embeddings more comprehensive. Extensive experimental results demonstrate that MHADTI performs better than other state-of-the-art (SOTA) prediction models. Moreover, analysis of the prediction results for selected drugs and targets of interest further indicates that MHADTI has advantages in discovering DTIs.
AVAILABILITY AND IMPLEMENTATION: https://github.com/pxystudy/MHADTI.
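
The semantic-level step of the hierarchical attention in this abstract can be pictured as a softmax-weighted fusion of per-meta-path embeddings. The sketch below is a minimal illustration only, not the MHADTI implementation: the function names `softmax` and `attention_fuse` and the toy inputs are assumptions, and in a trained model the attention scores would be learned rather than supplied by hand.

```python
import math


def softmax(scores):
    """Turn raw attention scores into weights that sum to 1."""
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(embeddings, scores):
    """Fuse one node's embeddings (one per meta-path or view) by attention.

    embeddings: list of equal-length vectors, one per meta-path (hypothetical).
    scores: one raw importance score per meta-path (hypothetical; learned in practice).
    """
    weights = softmax(scores)
    dim = len(embeddings[0])
    fused = [
        sum(w * emb[d] for w, emb in zip(weights, embeddings))
        for d in range(dim)
    ]
    return fused, weights


# Toy example: two meta-path embeddings of the same drug node.
fused, weights = attention_fuse([[1.0, 0.0], [0.0, 1.0]], [2.0, 0.0])
```

The same weighted-sum pattern could in principle be applied again at the graph level, with one embedding per view instead of one per meta-path.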


Subjects
Drug Repositioning , Neural Networks, Computer , Drug Interactions , Drug Development , Information Services
2.
Brief Bioinform ; 23(1), 2022 Jan 17.
Article in English | MEDLINE | ID: mdl-34929742

ABSTRACT

MOTIVATION: Accumulating evidence indicates that microRNAs (miRNAs) play a crucial role in the pathogenesis and progression of various complex diseases. Inferring disease-associated miRNAs is important for exploring the etiology, diagnosis and treatment of human diseases. Because biological experiments are time-consuming and labor-intensive, effective computational methods have become indispensable for identifying associations between miRNAs and diseases. RESULTS: We present an Ensemble learning framework with Resampling method for MiRNA-Disease Association (ERMDA) prediction to discover potential disease-related miRNAs. First, a resampling strategy builds multiple different balanced training subsets to address the sample imbalance within the database. Then, ERMDA extracts miRNA and disease feature representations by integrating miRNA-miRNA similarities, disease-disease similarities and experimentally verified miRNA-disease association information. Next, a feature selection approach is applied to reduce redundant information and increase the diversity among these subsets. Finally, ERMDA constructs an individual learner on each subset to yield preliminary outcomes, and a soft voting method makes the final decision based on the prediction results of the individual learners. A series of experimental results demonstrates that ERMDA outperforms other state-of-the-art methods on both balanced and unbalanced testing sets. In addition, case studies on three human diseases further confirm ERMDA's ability to identify potential disease-related miRNAs. In conclusion, these experimental results demonstrate that our method can serve as an effective and reliable tool for researchers to explore the regulatory role of miRNAs in complex diseases.
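
The resampling and soft-voting steps in this abstract can be sketched in a few lines. This is an illustration under stated assumptions rather than the ERMDA code: the helper names, the placeholder sample labels and the per-learner probabilities are all hypothetical, and the abstract does not say how many subsets are drawn or how negatives are sampled.

```python
import random


def balanced_subsets(positives, negatives, n_subsets, seed=0):
    """Build balanced training subsets.

    Each subset keeps every known (positive) association and pairs it with
    an equally sized random draw from the much larger pool of unlabeled
    (negative) pairs. Sampling without replacement is an assumption.
    """
    rng = random.Random(seed)
    subsets = []
    for _ in range(n_subsets):
        sampled_negatives = rng.sample(negatives, len(positives))
        subsets.append((list(positives), sampled_negatives))
    return subsets


def soft_vote(probabilities):
    """Average the positive-class probabilities of the individual learners."""
    return sum(probabilities) / len(probabilities)


# Placeholder pairs; the labels are hypothetical, not real associations.
positives = [("miRNA-a", "disease-x"), ("miRNA-b", "disease-y")]
negatives = [("miRNA-%d" % i, "disease-%d" % i) for i in range(10)]

subsets = balanced_subsets(positives, negatives, n_subsets=3)
final_score = soft_vote([0.9, 0.6, 0.75])  # one probability per learner
```

Soft voting averages probabilities instead of counting hard class votes, so a confident learner influences the final decision more than a marginal one.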


Subjects
Disease/genetics , Genetic Association Studies , Machine Learning , MicroRNAs/genetics , Algorithms , Computational Biology , Genetic Predisposition to Disease/genetics , Humans
3.
Brief Bioinform ; 23(5), 2022 Sep 20.
Article in English | MEDLINE | ID: mdl-36070619

ABSTRACT

MOTIVATION: Circular RNA (circRNA) is a class of noncoding RNA with high conservation and stability, and is considered an important disease biomarker and drug target. Accumulating evidence indicates that circRNA plays a crucial role in the pathogenesis and progression of many complex diseases. Because biological experiments are time-consuming and labor-intensive, an accurate computational prediction method has become indispensable for identifying disease-related circRNAs. RESULTS: We presented a hybrid graph representation learning framework, named GraphCDA, for predicting potential circRNA-disease associations. First, a circRNA-circRNA similarity network and a disease-disease similarity network were constructed to characterize the relationships among circRNAs and among diseases, respectively. Second, a hybrid graph embedding model combining Graph Convolutional Networks and Graph Attention Networks was introduced to learn the feature representations of circRNAs and diseases simultaneously. Finally, the learned representations were concatenated and used to build the prediction model for identifying circRNA-disease associations. A series of experimental results demonstrated that GraphCDA outperformed other state-of-the-art methods on several public databases. Moreover, GraphCDA achieved good performance even when only a small number of known circRNA-disease associations were used as the training set. In addition, case studies on several human diseases further confirmed the ability of GraphCDA to predict potential disease-related circRNAs. In conclusion, extensive experimental results indicated that GraphCDA could serve as a reliable tool for exploring the regulatory role of circRNAs in complex diseases.


Subjects
Computational Biology , RNA, Circular , Biomarkers , Computational Biology/methods , Humans , Polymers
4.
Bioinformatics ; 29(11): 1424-32, 2013 Jun 01.
Article in English | MEDLINE | ID: mdl-23572412

ABSTRACT

MOTIVATION: Compared with sequence and structure similarity, functional similarity is more informative for understanding the biological roles and functions of genes. Many important applications in computational molecular biology require functional similarity, such as gene clustering, protein function prediction, protein interaction evaluation and disease gene prioritization. Gene Ontology (GO) is now widely used as the basis for measuring gene functional similarity. Some existing methods combine the semantic similarity scores of single term pairs to estimate gene functional similarity, whereas others compare terms in groups. However, these methods may make error-prone judgments about gene functional similarity. Measuring gene functional similarity reliably remains a challenge. RESULTS: We propose a novel method called SORA to measure gene functional similarity in the GO context. First, SORA computes the information content (IC) of a term using semantic specificity and coverage. Second, SORA measures the IC of a term set by combining the inherited and extended IC of the terms based on the structure of GO. Finally, SORA estimates gene functional similarity using the IC overlap ratio of term sets. SORA is evaluated against five state-of-the-art methods in the file on the public platform for collaborative evaluation of GO-based semantic similarity measures. Careful comparisons show that SORA is superior to the other methods in general. Further analysis suggests that it benefits primarily from the structure of GO, which carries expressive information about gene function. SORA offers an effective and reliable way to compare gene function. AVAILABILITY: The web service of SORA is freely available at http://nclab.hit.edu.cn/SORA/


Subjects
Genes , Molecular Sequence Annotation , Vocabulary, Controlled , Algorithms , Computational Biology/methods , Proteins/genetics , Proteins/physiology , Semantics
5.
IEEE J Biomed Health Inform ; 28(5): 3146-3157, 2024 May.
Article in English | MEDLINE | ID: mdl-38294927

ABSTRACT

Predicting potential drug-disease associations (RDAs) plays a pivotal role in elucidating therapeutic strategies for diseases and in facilitating drug repositioning. However, existing methods rely heavily on limited domain-specific knowledge, which impedes their ability to predict candidate associations between drugs and diseases effectively. Moreover, simply treating unknown drug-disease relationships as negative samples has inherent limitations. To overcome these challenges, we introduce a novel hierarchical negative sampling-based graph contrastive model, termed HSGCLRDA, which aims to predict latent associations between drugs and diseases. HSGCLRDA integrates the association information and the similarities among drugs, diseases and proteins, and constructs a drug-disease-protein heterogeneous network. Subsequently, using a hierarchical structural sampling technique, we establish reliable negative drug-disease samples with PageRank algorithms. Using meta-path aggregation within the heterogeneous network, we derive low-dimensional representations for drugs and diseases, and from them construct global and local feature graphs that capture their interactions comprehensively. To obtain representation information, we adopt a self-supervised graph contrastive approach that uses graph convolutional networks (GCNs) and second-order GCNs to extract feature graph information. Furthermore, we integrate a contrastive cost function derived from the cross-entropy cost function, which allows the whole model to be optimized jointly. Experimental results on benchmark datasets show that HSGCLRDA outperforms various baseline methods in predicting RDAs, and case studies highlight its practical utility in identifying novel potential diseases associated with existing drugs.


Subjects
Algorithms , Computational Biology , Humans , Computational Biology/methods , Machine Learning , Drug Repositioning/methods , Disease/classification , Pharmaceutical Preparations
6.
Comput Biol Med ; 157: 106711, 2023 May.
Article in English | MEDLINE | ID: mdl-36924738

ABSTRACT

Long non-coding RNAs (lncRNAs) play important roles by regulating proteins in many biological processes and life activities. To uncover the molecular mechanisms of lncRNAs, it is necessary to identify their interactions with proteins. Recently, several machine learning methods have been proposed to detect lncRNA-protein interactions according to the distribution of known interactions. The performance of these methods depends largely on (1) how exactly the feature space characterizes the distribution of known interactions and (2) how discriminative the feature space is for distinguishing lncRNA-protein interactions. Because the known interactions may be multiple and complex in form, it remains a challenge to construct a discriminative feature space for lncRNA-protein interactions. To resolve this problem, a novel method named DFRPI was developed in this paper, based on a deep autoencoder and marginal Fisher analysis. First, initial features of lncRNA-protein interactions were extracted from the primary sequences and secondary structures of lncRNAs and proteins. Second, a deep autoencoder was used to learn encoding parameters of the initial features so as to describe the known interactions precisely. Next, marginal Fisher analysis was employed to optimize the encoding parameters so that the features characterize a discriminative feature space for lncRNA-protein interactions. Finally, a random forest-based predictor was trained on the discriminative feature space to detect lncRNA-protein interactions. In a series of experiments, our predictor achieved a precision of 0.920, recall of 0.916, accuracy of 0.918, MCC of 0.836, specificity of 0.920, sensitivity of 0.916 and AUC of 0.906, outperforming the compared methods for predicting lncRNA-protein interactions.
These results suggest that the proposed method can generate a reasonable and effective feature space for distinguishing lncRNA-protein interactions accurately. The code and data are available at https://github.com/D0ub1e-D/DFRPI.
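
For readers who want to relate the reported metrics to one another, the sketch below computes them from a binary confusion matrix. The counts are hypothetical: the abstract does not give the test-set composition, and the values here were chosen only because, on a balanced set of 1000 positives and 1000 negatives, they round to the reported precision, recall, accuracy, specificity and MCC. Sensitivity is the same quantity as recall; AUC cannot be derived from a single confusion matrix and is not computed.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # also called sensitivity
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # Matthews correlation coefficient
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts (assumption): 1000 positives and 1000 negatives.
metrics = binary_metrics(tp=916, fp=80, tn=920, fn=84)
rounded = {name: round(value, 3) for name, value in metrics.items()}
```

That such counts exist is consistent with a balanced evaluation set but does not establish it.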


Subjects
RNA, Long Noncoding , RNA, Long Noncoding/genetics , Algorithms , Machine Learning , Computational Biology/methods
7.
Article in English | MEDLINE | ID: mdl-37498762

ABSTRACT

Circular RNA (circRNA) is a class of noncoding RNA that is highly conserved and exhibits exceptional stability. Because it functions as a microRNA sponge, circRNA has gained significant attention as an essential biomarker and potential drug target in the pathogenesis of several cancers. Although many circRNAs have been found to play a role in cancer resistance, traditional methods for identifying them are time-consuming and expensive. In this context, computational methods offer a promising way to facilitate the discovery process. However, most existing prediction models focus on the association between circRNAs and drug resistance without considering the corresponding disease-related information. Incorporating disease-related information into the prediction of circRNA-drug resistance associations could improve the efficiency and speed of discovering and developing circRNA-targeting drugs. We propose a computational framework, named GraphCDD, for predicting the association between circRNA and drug resistance. Our model uses data from three sources, namely circRNA, disease and drug, to construct three similarity networks that represent the features of circRNAs, diseases and drugs, respectively. We use a multimodal graph neural network to learn efficient representations of circRNAs, diseases and drugs by integrating various types of information, and build a predictive model on top of them. The experimental results validate the effectiveness of our model and indicate that it is a promising method for predicting potential associations between circRNA and drug resistance. The source code and dataset of GraphCDD can be found at https://github.com/Ziqiang-Liu/GraphCDD.

8.
IEEE/ACM Trans Comput Biol Bioinform ; 20(3): 1737-1745, 2023.
Article in English | MEDLINE | ID: mdl-36251906

ABSTRACT

Studies have shown that lncRNA-miRNA interactions can affect cellular expression at the level of gene molecules through a variety of regulatory mechanisms and have important effects on the biological activities of living organisms. Several biomolecular network-based approaches have been proposed to accelerate the identification of lncRNA-miRNA interactions. However, most of these methods cannot fully use the structural and topological information of the lncRNA-miRNA interaction network. In this article, we propose a new method, ISLMI, a prediction model based on information injection and a second-order graph convolution network (SOGCN). The model calculates the sequence similarity and the Gaussian interaction profile kernel similarity of lncRNAs and miRNAs, fuses them to enhance the intrinsic interaction between the nodes, and uses SOGCN to learn second-order representations of the similarity matrix information. At the same time, multiple feature representations obtained with different graph embedding methods are injected into the second-order graph representation. Finally, matrix completion is used to increase model accuracy. The model combines the advantages of different methods and achieves reliable performance in 5-fold cross-validation, significantly improving the prediction of lncRNA-miRNA interactions. In addition, comparisons with several other models confirm the superiority of ISLMI.
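
The Gaussian interaction profile (GIP) kernel similarity named in this abstract is usually computed from the rows of the binary interaction matrix. The sketch below follows the commonly used definition, K(i, j) = exp(-gamma * ||IP(i) - IP(j)||^2) with gamma set to gamma' divided by the mean squared norm of the profiles. Whether ISLMI uses exactly this normalization is an assumption, as are the function name, the default gamma' = 1 and the toy matrix.

```python
import math


def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel between the rows of `profiles`.

    profiles: list of equal-length 0/1 vectors; row i is the interaction
    profile of entity i (for example, one lncRNA against all miRNAs).
    """
    n = len(profiles)
    mean_sq_norm = sum(sum(v * v for v in row) for row in profiles) / n
    if mean_sq_norm == 0:
        raise ValueError("all interaction profiles are empty")
    gamma = gamma_prime / mean_sq_norm  # bandwidth normalization
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum(
                (a - b) ** 2 for a, b in zip(profiles[i], profiles[j])
            )
            kernel[i][j] = math.exp(-gamma * sq_dist)
    return kernel


# Toy interaction matrix (hypothetical): 3 lncRNAs x 3 miRNAs.
toy_profiles = [[1, 0, 1], [1, 0, 1], [0, 1, 0]]
similarity = gip_kernel(toy_profiles)
```

Entities with identical interaction profiles get similarity 1, and the similarity decays as the profiles diverge.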


Subjects
MicroRNAs , RNA, Long Noncoding , MicroRNAs/genetics , MicroRNAs/metabolism , RNA, Long Noncoding/genetics , RNA, Long Noncoding/metabolism , Computational Biology/methods , Algorithms
9.
IEEE/ACM Trans Comput Biol Bioinform ; 19(6): 3395-3403, 2022.
Article in English | MEDLINE | ID: mdl-34543201

ABSTRACT

Recent studies have found that long non-coding RNA (lncRNA), a type of non-coding RNA (ncRNA), is not only involved in many biological processes but also abnormally expressed in many complex diseases. Accurately identifying lncRNA-disease associations is of great significance for understanding lncRNA function and disease mechanisms. In this paper, a deep learning framework named DHNLDA, consisting of a stacked autoencoder (SAE), a multi-scale ResNet and a stacked ensemble module, was constructed to predict lncRNA-disease associations; it integrates multiple biological data sources and constructs feature matrices. The integrated biological data include the similarities and interactions of lncRNAs, diseases and miRNAs. The feature matrices are obtained by node2vec embedding and feature extraction, respectively. Then, the SAE and the multi-scale ResNet are used to learn the complementary information between nodes and to obtain high-level features of node attributes. Finally, the fused high-level features are input into the stacked ensemble module to obtain the predicted lncRNA-disease associations. Five-fold cross-validation shows that DHNLDA reaches an AUC of 0.975, better than existing methods. Case studies of stomach cancer, breast cancer and lung cancer show the strong ability of DHNLDA to discover potential lncRNA-disease associations.


Subjects
Breast Neoplasms , MicroRNAs , RNA, Long Noncoding , Stomach Neoplasms , Humans , Female , RNA, Long Noncoding/genetics , Algorithms , MicroRNAs/genetics , Breast Neoplasms/genetics , Stomach Neoplasms/genetics , Computational Biology/methods
10.
Math Biosci Eng ; 19(5): 4749-4764, 2022 Mar 11.
Article in English | MEDLINE | ID: mdl-35430839

ABSTRACT

Long non-coding RNAs (lncRNAs) play a regulatory role in many biological cells, and recognizing lncRNA-protein interactions helps to reveal the functional mechanisms of lncRNAs. Identifying lncRNA-protein interactions with biological techniques is costly and time-consuming. Here, an ensemble learning framework, RLF-LPI, is proposed to predict lncRNA-protein interactions. The residual LSTM autoencoder module of RLF-LPI, which includes a fusion attention mechanism, extracts latent feature representations and captures the dependencies between sequences and structures obtained by the k-mer method. Finally, the relationship between lncRNA and protein is learned through fuzzy decision. The experimental results show that the accuracy (ACC) of RLF-LPI is 0.912 on the ATH948 dataset and 0.921 on the ZEA22133 dataset, demonstrating that the proposed method predicts lncRNA-protein interactions better than other methods.
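
The k-mer method mentioned in this abstract typically turns a sequence into a fixed-length vector of k-mer frequencies. The sketch below shows that generic featurization; it is not taken from the RLF-LPI code, and the function name, the RNA alphabet, the choice of k and the toy sequence are assumptions.

```python
from itertools import product


def kmer_frequencies(sequence, k=3, alphabet="ACGU"):
    """Return the normalized frequency of every possible k-mer.

    The output has len(alphabet) ** k entries in lexicographic order of the
    alphabet as given. Windows containing unknown symbols are skipped.
    """
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for start in range(len(sequence) - k + 1):
        window = sequence[start:start + k]
        if window in counts:
            counts[window] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


# Toy RNA fragment (hypothetical), featurized with 2-mers: 16 values.
features = kmer_frequencies("ACGUACGU", k=2)
```

For protein sequences the same function could be called with a different alphabet, at the cost of a much larger vector for the same k.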


Subjects
RNA, Long Noncoding , Computational Biology/methods , Machine Learning , RNA, Long Noncoding/genetics , RNA, Long Noncoding/metabolism
11.
IEEE/ACM Trans Comput Biol Bioinform ; 19(3): 1724-1733, 2022.
Article in English | MEDLINE | ID: mdl-33125334

ABSTRACT

Long non-coding RNA (lncRNA) can interact with microRNA (miRNA) and plays an important role in inhibiting or activating the expression of target genes and in the occurrence and development of tumors. A growing number of studies focus on predicting miRNA-lncRNA interactions, mostly through biological experiments and machine learning methods. These methods involve long cycles, high costs and excessive human intervention. In this paper, a data-driven hierarchical deep learning framework is proposed, composed of a capsule network, an independent recurrent neural network with an attention mechanism and a bi-directional long short-term memory network. The framework combines the advantages of the different networks and uses multiple sequence-derived features of the original sequence together with secondary-structure features to mine the dependencies between features, with the aim of obtaining better results. In the experiments, five-fold cross-validation was used to evaluate the performance of the model, which achieved better classification performance than the compared models on the Zea mays data set. In addition, Sorghum, Brachypodium distachyon and bryophyte data sets were used to test the model; the accuracy reached 0.9850, 0.9859 and 0.9777, respectively, which confirmed the model's good generalization ability.


Subjects
Deep Learning , MicroRNAs , RNA, Long Noncoding , Computational Biology/methods , Humans , Machine Learning , MicroRNAs/genetics , RNA, Long Noncoding/genetics , RNA, Long Noncoding/metabolism
12.
IEEE/ACM Trans Comput Biol Bioinform ; 19(4): 2409-2419, 2022.
Article in English | MEDLINE | ID: mdl-34038367

ABSTRACT

RNA-binding proteins (RBPs) are extensively involved in various cellular regulatory processes through their interactions with RNAs. Capturing RBP binding preferences is fundamental for revealing the pathogenesis of complex diseases. Many experimental detection techniques are still time-consuming and labor-intensive, so it is indispensable to develop a computational method with convincing accuracy. In this study, we propose a CNN-BLSTM hybrid deep learning framework, named DeepDW, for predicting RBP binding sites on RNAs from high-order encoding features of RNA sequence and secondary structure. The high-order encoding strategy is used to characterize the dependencies among adjacent nucleotides. In the CNN-BLSTM hybrid model, DeepDW first employs two 1-D convolutional neural networks (CNNs) to learn local features from the high-order encoded matrices of RNA sequence and structure separately, and then applies two bidirectional long short-term memory networks (BLSTMs) to capture global information at a higher level. A series of experiments was carried out on 31 public datasets to evaluate the proposed framework, and DeepDW outperformed the state-of-the-art methods. The results indicate that combining the high-order encoding method with the CNN-BLSTM hybrid model has advantages in identifying RBP-RNA binding sites.
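
One common way to realize a high-order encoding of this kind is to one-hot encode each window of `order` adjacent nucleotides, so that every matrix row reflects a short run of neighbours instead of a single base. The sketch below illustrates that idea only; the abstract does not specify DeepDW's exact scheme, and the function name, the default order, the handling of unknown symbols and the toy sequence are assumptions.

```python
from itertools import product


def high_order_encode(sequence, order=2, alphabet="ACGU"):
    """One-hot encode each window of `order` adjacent nucleotides.

    Returns a matrix with len(sequence) - order + 1 rows and
    len(alphabet) ** order columns. A window containing an unknown symbol
    is left as an all-zero row.
    """
    index = {
        "".join(p): i
        for i, p in enumerate(product(alphabet, repeat=order))
    }
    width = len(alphabet) ** order
    matrix = []
    for start in range(len(sequence) - order + 1):
        row = [0] * width
        column = index.get(sequence[start:start + order])
        if column is not None:
            row[column] = 1
        matrix.append(row)
    return matrix


# Toy sequence (hypothetical): 4 bases give 3 rows of 16 columns at order 2.
encoded = high_order_encode("ACGU", order=2)
```

With `order=1` this reduces to ordinary one-hot encoding; a matrix of this shape is the kind of input a 1-D convolution would slide over.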


Subjects
Neural Networks, Computer , RNA , Binding Sites/genetics , Protein Binding , RNA/genetics , RNA/metabolism , RNA-Binding Proteins/chemistry
13.
Interdiscip Sci ; 12(4): 414-423, 2020 Dec.
Article in English | MEDLINE | ID: mdl-32572768

ABSTRACT

Long non-coding RNA (lncRNA) is a non-coding RNA longer than 200 nucleotides that has no protein-coding function. LncRNA plays a key role in many biological processes. Studying the RNA-binding protein (RBP) binding sites on the lncRNA chain helps to reveal epigenetic and post-transcriptional mechanisms, to explore the physiological and pathological processes of cancer, and to discover new therapeutic breakthroughs. To improve the recognition rate of RBP binding sites and reduce experimental time and cost, many computational methods based on domain knowledge have emerged to predict RBP binding sites. However, these prediction methods treat nucleotides independently and do not take nucleotide statistics into account. In this paper, we use a high-order statistics-based encoding scheme; the encoded lncRNA sequences are then fed into a hybrid deep learning architecture named AC-Caps. It consists of a joint processing layer (composed of an attention mechanism and a convolutional neural network) and a capsule network. The AC-Caps model was evaluated using 31 independent experimental data sets from 12 lncRNA-binding proteins. In experiments, our method achieves excellent performance, with an average area under the curve (AUC) of 0.967 and an average accuracy (ACC) of 92.5%, which exceed HOCCNNLB by 0.014 and 2.3%, iDeepS by 0.261 and 28.9%, and DeepBind by 0.189 and 21.8%, respectively. The results show that the AC-Caps method can reliably process large-scale RBP binding site data on the lncRNA chain, and that its prediction performance is better than that of existing deep-learning models. The source code of AC-Caps and the datasets used in this paper are available at https://github.com/JinmiaoS/AC-Caps.


Subjects
RNA, Long Noncoding/chemistry , Binding Sites , Neural Networks, Computer , Protein Binding , RNA-Binding Proteins/metabolism
14.
Front Genet ; 10: 18, 2019.
Article in English | MEDLINE | ID: mdl-30774646

ABSTRACT

Non-coding RNA (ncRNA) plays important roles in many critical regulatory processes. Many ncRNAs perform their regulatory functions in the form of RNA-protein complexes. Identifying the interactions between ncRNAs and proteins is therefore fundamental to understanding ncRNA function. Given the high cost of experimental techniques, developing an accurate computational prediction model has become an indispensable way to identify ncRNA-protein interactions. A powerful prediction model of ncRNA-protein interaction needs a good feature set for characterizing the interaction. In this paper, a novel method named CFRP is put forward to generate complex features for characterizing ncRNA-protein interaction. To obtain a comprehensive description of ncRNA-protein interaction, complex features are generated by non-linear transformations of the traditional k-mer features of ncRNA and protein sequences. To reduce the dimensionality of the complex features, a group of discriminative features is selected by random forest. To validate the performance of the proposed method, a series of experiments is carried out on several widely used public datasets. Compared with the traditional k-mer features, the CFRP complex features improve the performance of the ncRNA-protein interaction prediction model. The CFRP-based prediction model is also compared with several state-of-the-art methods, and the results show that the proposed method performs better than the others in terms of the evaluation metrics. In conclusion, the complex features generated by CFRP are beneficial for building a powerful prediction model of ncRNA-protein interaction.

15.
FEBS Open Bio ; 5: 251-6, 2015.
Article in English | MEDLINE | ID: mdl-25870785

ABSTRACT

In humans, despite the rapid increase in disease-associated gene discovery, a large proportion of disease-associated genes are still unknown. Many network-based approaches have been used to prioritize disease genes. Many networks, such as the protein-protein interaction (PPI), KEGG, and gene co-expression networks, have been used. Expression quantitative trait loci (eQTLs) have been successfully applied for the determination of genes associated with several diseases. In this study, we constructed an eQTL-based gene-gene co-regulation network (GGCRN) and used it to mine for disease genes. We adopted the random walk with restart (RWR) algorithm to mine for genes associated with Alzheimer disease. Compared to the Human Protein Reference Database (HPRD) PPI network alone, the integrated HPRD PPI and GGCRN networks provided faster convergence and revealed new disease-related genes. Therefore, using the RWR algorithm for integrated PPI and GGCRN is an effective method for disease-associated gene mining.
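
The random walk with restart (RWR) algorithm named in this abstract iterates p(t+1) = (1 - r) * W * p(t) + r * p(0), where W is the column-normalized adjacency matrix of the network and p(0) puts equal probability on the seed genes. The sketch below is a generic implementation, not the study's code: the restart probability, the tolerance and the toy four-node network are assumptions.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.7,
                             tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities of a random walk with restart.

    adjacency: square matrix (list of lists) of non-negative edge weights.
    seeds: set of node indices for the known disease genes.
    restart: probability of jumping back to a seed at each step (assumed).
    """
    n = len(adjacency)
    col_sums = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    # Column-normalize so each column is a transition distribution.
    transition = [
        [adjacency[i][j] / col_sums[j] if col_sums[j] else 0.0
         for j in range(n)]
        for i in range(n)
    ]
    p0 = [1.0 / len(seeds) if i in seeds else 0.0 for i in range(n)]
    p = p0[:]
    for _ in range(max_iter):
        nxt = [
            (1 - restart) * sum(transition[i][j] * p[j] for j in range(n))
            + restart * p0[i]
            for i in range(n)
        ]
        change = sum(abs(a - b) for a, b in zip(nxt, p))
        p = nxt
        if change < tol:
            break
    return p


# Toy network (hypothetical): a path 0 - 1 - 2 - 3 with node 0 as the seed.
path_graph = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = random_walk_with_restart(path_graph, seeds={0})
```

Candidate genes are then ranked by their steady-state score. Integrating two networks, as the study does with PPI and GGCRN, would change only how `adjacency` is built.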

16.
Biomed Res Int ; 2014: 720960, 2014.
Article in English | MEDLINE | ID: mdl-25405206

ABSTRACT

Protein complexes, formed by groups of physically interacting proteins, play a crucial role in cell activities. Great effort has been made to identify protein complexes computationally from protein-protein interaction (PPI) networks. However, prediction accuracy is still far from satisfactory because the topological structures of protein complexes in the PPI network are too complicated. This paper proposes a novel optimization framework, named PLSMC, to detect complexes from PPI networks. The method rests on the observation that two proteins in a common complex are likely to interact. PLSMC uses this relation to determine complexes by a penalized least squares method. PLSMC is applied to several public yeast PPI networks and compared with several state-of-the-art methods. The results indicate that PLSMC outperforms the other methods. In particular, complexes predicted by PLSMC match known complexes with higher accuracy than those of other methods. Furthermore, the predicted complexes have high functional homogeneity.


Subjects
Models, Theoretical , Multiprotein Complexes/chemistry , Protein Interaction Maps , Proteins/chemistry , Algorithms , Computational Biology , Cytoplasm/chemistry , Cytoplasm/genetics , Least-Squares Analysis , Proteins/genetics
17.
Biomed Res Int ; 2014: 641469, 2014.
Article in English | MEDLINE | ID: mdl-24868539

ABSTRACT

In this paper, we propose a novel method, SeekFun, to predict protein function based on a weighted mapping between domains and GO terms. First, a weighted mapping between domains and GO terms is constructed from the GO annotations and domain composition of proteins. The association strength between a domain and a GO term is weighted by symmetrical conditional probability. Second, the mapping is extended along the true paths of the terms based on the GO hierarchy. Finally, the terms associated with resident domains are transferred to the host protein, and the real annotations of the host protein are determined by the association strengths. Our careful comparisons demonstrate that SeekFun outperforms the compared methods in most cases. SeekFun provides a flexible and effective way to predict protein function. It benefits from the well-constructed mapping between domains and GO terms, as well as from a reasonable strategy for inferring the annotations of a protein from those of its domains.
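
Symmetrical conditional probability (SCP) is commonly defined as SCP(d, g) = P(d, g)^2 / (P(d) * P(g)), which equals P(d | g) * P(g | d). The sketch below estimates it from co-occurrence counts over annotated proteins. It assumes SeekFun uses this standard definition, which the abstract does not spell out; the function name and the placeholder identifiers `D1`, `D2`, `G1`, `G2` are hypothetical.

```python
def symmetrical_conditional_probability(proteins, domain, go_term):
    """Estimate SCP(domain, go_term) from annotated proteins.

    proteins: list of (domains, go_terms) pairs, each a set of identifiers.
    With counts n_d, n_g and n_dg over N proteins, the N factors cancel and
    SCP = n_dg ** 2 / (n_d * n_g).
    """
    n_domain = sum(1 for domains, _ in proteins if domain in domains)
    n_term = sum(1 for _, terms in proteins if go_term in terms)
    n_both = sum(
        1 for domains, terms in proteins
        if domain in domains and go_term in terms
    )
    if n_domain == 0 or n_term == 0:
        return 0.0
    return (n_both * n_both) / (n_domain * n_term)


# Placeholder annotations (hypothetical identifiers, not real Pfam/GO IDs).
toy_proteins = [
    ({"D1"}, {"G1"}),
    ({"D1", "D2"}, {"G1", "G2"}),
    ({"D2"}, {"G2"}),
    ({"D1"}, {"G2"}),
]
strength = symmetrical_conditional_probability(toy_proteins, "D1", "G1")
```

The score is 1 only when the domain and the term always occur together, and it falls toward 0 as either one appears without the other.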


Subjects
Computational Biology/methods , Proteins/chemistry , Proteins/physiology , Algorithms , Databases, Protein , Models, Statistical , Probability , Protein Structure, Tertiary , Proteomics , Reproducibility of Results , Software
18.
PLoS One ; 8(8): e70204, 2013.
Article in English | MEDLINE | ID: mdl-23950912

ABSTRACT

BACKGROUND: The identification of human disease-related microRNAs (disease miRNAs) is important for further investigating their involvement in the pathogenesis of diseases. More experimentally validated miRNA-disease associations have accumulated recently. On the basis of these associations, it is essential to predict disease miRNAs for various human diseases, which provides reliable disease miRNA candidates for subsequent experimental studies. METHODOLOGY/PRINCIPAL FINDINGS: It is known that miRNAs with similar functions are often associated with similar diseases and vice versa. Therefore, the functional similarity of two miRNAs has been successfully estimated by measuring the semantic similarity of their associated diseases. To predict disease miRNAs effectively, we calculated the functional similarity by incorporating the information content of disease terms and the phenotype similarity between diseases. Furthermore, members of the same miRNA family or cluster are assigned a higher weight because they are more likely to be associated with similar diseases. A new prediction method, HDMP, based on the weighted k most similar neighbors, is presented for predicting disease miRNAs. Experiments showed that HDMP achieved significantly higher prediction performance than existing methods. In addition, case studies on prostatic neoplasms, breast neoplasms and lung neoplasms showed that HDMP can uncover potential disease miRNA candidates. CONCLUSIONS: The superior performance of HDMP can be attributed to the accurate measurement of miRNA functional similarity, the weight assignment based on miRNA family or cluster, and the effective prediction based on the weighted k most similar neighbors. The online prediction and analysis tool is freely available at http://nclab.hit.edu.cn/hdmpred.
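
The idea of scoring a candidate miRNA from its weighted k most similar neighbors can be sketched as follows. This is not the published HDMP formula: the aggregation rule (a weighted sum of similarities to neighbors already linked to the disease), the family weight of 2.0, the function name and all identifiers are assumptions chosen for illustration.

```python
def knn_relevance(candidate, similarity, disease_mirnas, family,
                  k=2, family_weight=2.0):
    """Score a candidate miRNA for one disease from its k nearest neighbors.

    similarity: dict mapping miRNA -> {other miRNA: functional similarity}.
    disease_mirnas: set of miRNAs already associated with the disease.
    family: dict mapping miRNA -> family or cluster label (may be missing).
    Neighbors in the candidate's own family count `family_weight` times.
    """
    neighbours = sorted(
        (m for m in similarity[candidate] if m != candidate),
        key=lambda m: similarity[candidate][m],
        reverse=True,
    )[:k]
    candidate_family = family.get(candidate)
    score = 0.0
    for m in neighbours:
        if m not in disease_mirnas:
            continue  # only neighbors linked to the disease contribute
        same_family = (
            candidate_family is not None
            and family.get(m) == candidate_family
        )
        weight = family_weight if same_family else 1.0
        score += weight * similarity[candidate][m]
    return score


# Placeholder data (hypothetical identifiers and similarity values).
toy_similarity = {"m1": {"m2": 0.9, "m3": 0.5, "m4": 0.2}}
toy_family = {"m1": "famA", "m2": "famA", "m3": "famB"}
score = knn_relevance("m1", toy_similarity, {"m2", "m4"}, toy_family, k=2)
```

Ranking all unlabeled miRNAs by such a score for a given disease yields the candidate list that case studies would then examine.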


Subjects
MicroRNAs/genetics , Neoplasms/genetics , Algorithms , Humans , Models, Biological , Models, Statistical , Phenotype