Results 1-20 of 62
1.
Brief Bioinform ; 23(3)2022 05 13.
Article in English | MEDLINE | ID: mdl-35511108

ABSTRACT

MOTIVATION: Interactions between transcription factors (TFs) and their target genes form the knowledge foundation for biological research on transcriptional regulation; however, the number of known interactions is still limited by biological techniques. Existing computational methods relevant to TF-target interactions mostly predict binding sites rather than the interactions themselves. To this end, we propose a graph attention-based autoencoder model that predicts TF-target gene interactions from the known TF-target gene interaction network combined with two kinds of gene features, sequence-based and chemical, on the premise that unobserved interactions between transcription factors and target genes can be predicted by learning the patterns of the known ones. To the best of our knowledge, the proposed model is the first attempt to solve this problem by learning patterns from the known TF-target gene interaction network. RESULTS: In this paper, we formulate the prediction of TF-target gene interactions as a link prediction problem on a complex knowledge graph and propose a deep learning model called GraphTGI, which is composed of a graph attention-based encoder and a bilinear decoder. We evaluated the proposed method on a real dataset, and the experimental results show that it achieves an average AUC of 0.8864 ± 0.0057 in 5-fold cross-validation. GraphTGI is expected to predict TF-target gene interactions effectively and efficiently on a large scale. AVAILABILITY: Python code and the datasets used in our studies are available at https://github.com/YanghanWu/GraphTGI.
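The bilinear decoder mentioned above scores a (TF, gene) pair from the two node embeddings. A minimal NumPy sketch, assuming the encoder has already produced the embeddings (random stand-ins here); the function name `bilinear_scores` and all dimensions are hypothetical, not taken from the GraphTGI code:

```python
import numpy as np


def bilinear_scores(z_tf, z_gene, W):
    """Score every (TF, gene) pair as sigmoid(z_tf^T W z_gene).

    z_tf:   (n_tf, d) TF embeddings, assumed to come from the encoder
    z_gene: (n_gene, d) target-gene embeddings
    W:      (d, d) learnable interaction matrix
    Returns an (n_tf, n_gene) matrix of interaction probabilities.
    """
    logits = z_tf @ W @ z_gene.T
    return 1.0 / (1.0 + np.exp(-logits))


# Hypothetical embeddings standing in for the graph-attention encoder output.
rng = np.random.default_rng(0)
z_tf = rng.normal(size=(3, 8))
z_gene = rng.normal(size=(5, 8))
W = rng.normal(scale=0.1, size=(8, 8))
probs = bilinear_scores(z_tf, z_gene, W)
print(probs.shape)  # (3, 5)
```

Because `W` need not be symmetric, the decoder can treat the regulator and target roles differently, which suits directed TF-target edges.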


Subject(s)
Neural Networks, Computer
2.
Bioinformatics ; 39(8)2023 08 01.
Article in English | MEDLINE | ID: mdl-37505483

ABSTRACT

MOTIVATION: Predicting drug-target interactions (DTIs) plays a significant role in facilitating novel drug discovery. Compared with laboratory-based approaches, computational methods for DTI prediction are preferred for their efficiency and low cost. Recently, much attention has been paid to applying graph neural network (GNN) models to discover underlying DTIs from heterogeneous biological information networks (HBINs). Although GNN-based prediction methods achieve better performance, they are prone to over-smoothing when learning the latent representations of drugs and targets from their rich neighborhood information in an HBIN, which reduces their discriminative ability in DTI prediction. RESULTS: In this work, an improved graph representation learning method, iGRLDTI, is proposed to address this issue by capturing more discriminative representations of drugs and targets in a latent feature space. Specifically, iGRLDTI first constructs an HBIN by integrating the biological knowledge of drugs and targets with their interactions. It then adopts a node-dependent local smoothing strategy to adaptively decide the propagation depth of each biomolecule in the HBIN, thus significantly alleviating over-smoothing by enhancing the discriminative ability of the feature representations of drugs and targets. Finally, iGRLDTI uses a Gradient Boosting Decision Tree classifier to predict novel DTIs. Experimental results demonstrate that iGRLDTI performs better than several state-of-the-art computational methods on the benchmark dataset. In addition, our case study indicates that iGRLDTI can successfully identify novel DTIs with more distinguishable features of drugs and targets. AVAILABILITY AND IMPLEMENTATION: Python code and the dataset are available at https://github.com/stevejobws/iGRLDTI/.
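The abstract does not give the smoothing rule itself. The sketch below follows the general node-dependent local smoothing (NDLS) idea on a toy homogeneous graph: each node keeps propagating until its row of the normalised adjacency power is close to the stationary limit, then averages its features over the hops it used. The threshold `eps`, the `max_hops` cap and the hop averaging are assumptions for illustration, not iGRLDTI's published settings, and a real HBIN would mix drug and target nodes.

```python
import numpy as np


def normalized_adjacency(A):
    """Symmetric normalisation with self-loops: D^-1/2 (A + I) D^-1/2."""
    A_tilde = A + np.eye(A.shape[0])
    deg = A_tilde.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(deg)
    return inv_sqrt[:, None] * A_tilde * inv_sqrt[None, :], deg


def node_dependent_smoothing(A, X, eps=0.05, max_hops=10):
    """Propagate features with a per-node depth.

    Node i stops at the first hop k where row i of A_hat^k is within eps
    (L2 norm) of the stationary row; its output is the mean of its
    features over hops 0..k. Assumes a connected graph.
    """
    A_hat, deg = normalized_adjacency(A)
    # Limit of A_hat^k for a connected graph with self-loops.
    s = np.sqrt(deg)
    A_inf = np.outer(s, s) / deg.sum()

    n = A.shape[0]
    depths = np.full(n, max_hops)
    done = np.zeros(n, dtype=bool)
    P = np.eye(n)                 # A_hat^0
    hops = [X.astype(float)]      # A_hat^k X for k = 0..max_hops
    for k in range(1, max_hops + 1):
        P = P @ A_hat
        hops.append(P @ X)
        hit = ~done & (np.linalg.norm(P - A_inf, axis=1) < eps)
        depths[hit] = k
        done |= hit

    stacked = np.stack(hops)      # (max_hops + 1, n, n_features)
    smoothed = np.stack([stacked[: depths[i] + 1, i].mean(axis=0)
                         for i in range(n)])
    return smoothed, depths


# Toy graph: a triangle (nodes 0-2) with a two-node tail (3-4).
A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)
X = np.eye(5)                     # one-hot stand-in features
smoothed, depths = node_dependent_smoothing(A, X)
print(depths)
```

Nodes whose rows reach the stationary state quickly stop early, so they are not smoothed toward the same global vector as everything else.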


Subject(s)
Drug Discovery , Neural Networks, Computer , Computer Simulation , Drug Discovery/methods , Drug Interactions
3.
PLoS Comput Biol ; 19(6): e1011207, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37339154

ABSTRACT

Interactions between transcription factors and target genes form the main part of the human gene regulatory network, yet they remain complicating factors in biological research. Specifically, for nearly half of the interactions recorded in established databases, the interaction type has yet to be confirmed. Although several computational methods exist to predict gene interactions and their types, no method is yet available to predict them based solely on topology information. To this end, we propose a graph-based prediction model called KGE-TGI, trained in a multi-task learning manner on a knowledge graph that we constructed specifically for this problem. The KGE-TGI model relies on topology information rather than being driven by gene expression data. In this paper, we formulate the task of predicting the interaction types of transcription factors and target genes as a multi-label classification problem for link types on a heterogeneous graph, coupled with an inherently related link prediction problem. We constructed a ground-truth dataset as a benchmark and evaluated the proposed method on it. In 5-fold cross-validation experiments, the proposed method achieved average AUC values of 0.9654 and 0.9339 in the link prediction and link type classification tasks, respectively. In addition, a series of comparison experiments shows that introducing knowledge information significantly benefits the prediction and that our method achieves state-of-the-art performance on this problem.


Subject(s)
Pattern Recognition, Automated , Transcription Factors , Humans , Databases, Factual , Transcription Factors/genetics , Gene Regulatory Networks , Proteome , Algorithms , Systems Biology , Gene Ontology
4.
BMC Bioinformatics ; 24(1): 188, 2023 May 08.
Article in English | MEDLINE | ID: mdl-37158823

ABSTRACT

BACKGROUND: Limited knowledge of miRNA-lncRNA interactions is considered an obstacle to revealing the regulatory mechanism. Accumulating evidence on human diseases indicates that the modulation of gene expression is closely related to the interactions between miRNAs and lncRNAs. However, validating such interactions via crosslinking-immunoprecipitation and high-throughput sequencing (CLIP-seq) experiments inevitably costs considerable money and time, and the results are often unsatisfactory. Therefore, more and more computational prediction tools have been developed to offer reliable candidates for better design of further biological experiments. METHODS: In this work, we propose a novel link prediction model based on a Gaussian kernel-based method and a linear optimization algorithm for inferring miRNA-lncRNA interactions (GKLOMLI). Given an observed miRNA-lncRNA interaction network, the Gaussian kernel-based method was employed to output two similarity matrices, one for miRNAs and one for lncRNAs. Based on an integrated matrix combining the similarity matrices and the observed interaction network, a linear optimization-based link prediction model was trained for inferring miRNA-lncRNA interactions. RESULTS: To evaluate the performance of the proposed method, k-fold cross-validation (CV) and leave-one-out CV (LOO-CV) were implemented, with each CV experiment carried out 100 times on a randomly generated training set. The high areas under the curve (AUCs) of 0.8623 ± 0.0027 (2-fold CV), 0.9053 ± 0.0017 (5-fold CV), 0.9151 ± 0.0013 (10-fold CV) and 0.9236 (LOO-CV) illustrate the precision and reliability of the proposed method. CONCLUSION: GKLOMLI, with its high performance, is anticipated to be useful for revealing underlying interactions between miRNAs and their target lncRNAs and for deciphering the potential mechanisms of complex diseases.
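The abstract does not state which Gaussian kernel GKLOMLI uses; a common choice in this literature is the Gaussian interaction profile (GIP) kernel, which treats each row (or column) of the interaction matrix as a binary profile. A minimal sketch under that assumption, with a hypothetical 3 × 4 interaction matrix:

```python
import numpy as np


def gip_kernel(Y, gamma_prime=1.0):
    """Gaussian interaction profile kernel between the rows of Y.

    K[i, j] = exp(-gamma * ||Y[i] - Y[j]||^2), where the bandwidth gamma is
    gamma_prime divided by the mean squared norm of the profiles.
    """
    Y = np.asarray(Y, dtype=float)
    sq_norms = (Y ** 2).sum(axis=1)
    mean_sq = sq_norms.mean()
    gamma = gamma_prime / mean_sq if mean_sq > 0 else 0.0
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * Y @ Y.T
    sq_dist = np.maximum(sq_dist, 0.0)  # clip tiny negative rounding errors
    return np.exp(-gamma * sq_dist)


# Hypothetical interaction matrix: 3 miRNAs (rows) x 4 lncRNAs (columns).
Y = np.array([[1, 0, 1, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 0]])
K_mirna = gip_kernel(Y)      # 3 x 3 miRNA similarity
K_lncrna = gip_kernel(Y.T)   # 4 x 4 lncRNA similarity
print(np.round(K_mirna, 3))
```

Here the mean squared profile norm is 2, so gamma is 0.5; miRNAs 0 and 1, whose profiles differ in a single lncRNA, get a similarity of exp(-0.5), about 0.607.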


Subject(s)
MicroRNAs , RNA, Long Noncoding , Humans , RNA, Long Noncoding/genetics , Reproducibility of Results , Research Design , Algorithms , MicroRNAs/genetics
5.
Brief Bioinform ; 22(5)2021 09 02.
Article in English | MEDLINE | ID: mdl-33693513

ABSTRACT

Proteins interact with each other to play critical roles in many biological processes in cells. Although promising, laboratory experiments usually suffer from being time-consuming and labor-intensive, and the results obtained are often not robust and considerably uncertain. Owing to recent advances in high-throughput technologies, a large amount of proteomics data has been collected, which presents both a significant opportunity and a challenge to develop computational models that predict protein-protein interactions (PPIs) from these data. In this paper, we present a comprehensive survey of recent efforts towards effective computational models for PPI prediction. The survey introduces the algorithms that can be used to learn computational models for predicting PPIs and classifies these models into different categories. To understand their relative merits, the paper discusses different validation schemes and metrics for evaluating prediction performance. Biological databases commonly used in experiments for performance comparison are also described, and their use in a series of extensive experiments comparing different prediction models is discussed. Finally, we present some open issues in PPI prediction for future work and explain how the performance of PPI prediction can be improved if these issues are effectively tackled.


Subject(s)
Computational Biology/methods , Protein Interaction Mapping/methods , Proteins/metabolism , Software , Support Vector Machine , Databases, Genetic , Databases, Protein , Gene Ontology , Humans , Models, Molecular , Protein Conformation , Protein Interaction Domains and Motifs , Protein Interaction Mapping/statistics & numerical data , Proteins/chemistry , Proteins/genetics , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism
6.
Brief Bioinform ; 22(5)2021 09 02.
Article in English | MEDLINE | ID: mdl-33734296

ABSTRACT

Emerging research shows that circular RNA (circRNA) plays a crucial role in the diagnosis, occurrence and prognosis of complex human diseases. Compared with traditional biological experiments, computational methods that fuse multi-source biological data to identify circRNA-disease associations can effectively reduce cost and save time. Considering the limitations of existing computational models, we propose SGANRDA, a semi-supervised generative adversarial network (GAN) model for predicting circRNA-disease associations. The model first fuses the natural language features of circRNA sequences with disease semantic features and the circRNA and disease Gaussian interaction profile kernels, then uses all circRNA-disease pairs to pre-train the GAN and fine-tunes the network parameters on labeled samples. Finally, an extreme learning machine classifier is employed to obtain the prediction result. Compared with previous supervised models, SGANRDA innovatively introduces circRNA sequences and uses all the information of circRNA-disease pairs during pre-training. This step can increase the information content of the features to some extent and reduce the impact of too few known associations on model performance. SGANRDA obtained AUC scores of 0.9411 and 0.9223 in leave-one-out cross-validation and 5-fold cross-validation, respectively. Prediction results on the benchmark dataset show that SGANRDA outperforms other existing models. In addition, 25 of the top 30 circRNA-disease pairs with the highest SGANRDA scores in case studies were verified by recent literature. These experimental results demonstrate that SGANRDA is a useful model for predicting circRNA-disease associations and can provide reliable candidates for biological experiments.
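SGANRDA hands its learned features to an extreme learning machine (ELM). As a sketch of that classifier type only, assuming tanh hidden units, 0/1 labels with a 0.5 decision threshold, and toy two-dimensional features in place of the GAN-derived ones (none of these settings come from the paper):

```python
import numpy as np


class ExtremeLearningMachine:
    """Single-hidden-layer network whose hidden weights stay random.

    Only the output weights are learned, in closed form, as the least-squares
    solution given by the Moore-Penrose pseudo-inverse of the hidden activations.
    """

    def __init__(self, n_hidden=50, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        # Random input weights and biases are drawn once and never updated.
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        self.beta = np.linalg.pinv(self._hidden(X)) @ y
        return self

    def decision_function(self, X):
        return self._hidden(X) @ self.beta

    def predict(self, X):
        return (self.decision_function(X) >= 0.5).astype(int)


# Toy stand-in for fused circRNA-disease pair features: two Gaussian blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2.0, 0.5, size=(50, 2)),
               rng.normal(2.0, 0.5, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50, dtype=float)
elm = ExtremeLearningMachine(n_hidden=20).fit(X, y)
print((elm.predict(X) == y).mean())
```

Because training reduces to one pseudo-inverse, fitting is fast; the trade-off is that results depend on the random hidden layer, so `seed` and `n_hidden` are worth tuning.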


Subject(s)
Deep Learning , Gene Regulatory Networks , Multiple Sclerosis/genetics , Myocardial Infarction/genetics , Neoplasms/genetics , Osteoarthritis/genetics , RNA, Circular/genetics , Area Under Curve , Computational Biology/methods , Databases, Genetic , Datasets as Topic , Gene Expression Regulation , Humans , Multiple Sclerosis/metabolism , Multiple Sclerosis/pathology , Myocardial Infarction/metabolism , Myocardial Infarction/pathology , Neoplasms/classification , Neoplasms/metabolism , Neoplasms/pathology , Osteoarthritis/metabolism , Osteoarthritis/pathology , RNA, Circular/classification , RNA, Circular/metabolism , Risk Factors
7.
Brief Bioinform ; 22(3)2021 05 20.
Article in English | MEDLINE | ID: mdl-32633319

ABSTRACT

MOTIVATION: Identifying microRNAs associated with different diseases as biomarkers is a problem of great medical significance. Existing computational methods for uncovering such microRNA-disease associations (MDAs) are mostly developed under the assumption that similar microRNAs tend to associate with similar diseases. Since this assumption is not always valid, these methods may not be applicable to all kinds of MDAs. Considering that the relationships between long noncoding RNAs (lncRNAs) and different diseases, and the co-regulation relationships between the biological functions of lncRNAs and microRNAs, have been established, we propose a multiview multitask method that makes use of known lncRNA-microRNA interactions to predict MDAs on a large scale. The investigation is performed in the absence of complete information on microRNAs and of any similarity measure for them, and to the best of our knowledge, this work is the first attempt to discover MDAs based on lncRNA-microRNA interactions. RESULTS: In this paper, we develop a deep learning model called MVMTMDA that creates a multiview representation of microRNAs. The model is trained with an end-to-end multitask learning approach so that missing data in the side information can be determined automatically. Experimental results show that the proposed model yields an average area under the ROC curve of 0.8410 ± 0.018, 0.8512 ± 0.012 and 0.8521 ± 0.008 when k is set to 2, 5 and 10, respectively. In addition, we propose a statistical approach to predicting lncRNA-disease associations based on these associations and the MDAs discovered using MVMTMDA. AVAILABILITY: Python code and the datasets used in our studies are available at https://github.com/yahuang1991polyu/MVMTMDA/.
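AUC figures like those above summarise cross-validated ranking quality. A minimal sketch of such an evaluation harness, using a rank-based AUC averaged over folds; `centroid_scorer` is a toy stand-in model, not MVMTMDA, and the stratified split is our assumption rather than the paper's stated protocol:

```python
import numpy as np


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random
    negative (the Mann-Whitney U statistic); ties count as one half."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


def stratified_kfold_auc(X, y, fit_score, k=5, seed=0):
    """Mean and standard deviation of test-fold AUC over a stratified k-fold
    split, so every fold contains both positives and negatives."""
    rng = np.random.default_rng(seed)
    pos = rng.permutation(np.flatnonzero(y == 1))
    neg = rng.permutation(np.flatnonzero(y == 0))
    folds = [np.concatenate(parts) for parts in
             zip(np.array_split(pos, k), np.array_split(neg, k))]
    aucs = []
    for i, test in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        aucs.append(roc_auc(y[test], fit_score(X[train], y[train], X[test])))
    return float(np.mean(aucs)), float(np.std(aucs))


def centroid_scorer(X_train, y_train, X_test):
    """Toy stand-in model: higher score = closer to the positive centroid."""
    mu_pos = X_train[y_train == 1].mean(axis=0)
    mu_neg = X_train[y_train == 0].mean(axis=0)
    return (np.linalg.norm(X_test - mu_neg, axis=1)
            - np.linalg.norm(X_test - mu_pos, axis=1))


rng = np.random.default_rng(0)
X = np.vstack([rng.normal(2.0, 1.0, size=(50, 2)),
               rng.normal(-2.0, 1.0, size=(50, 2))])
y = np.array([1] * 50 + [0] * 50)
for k in (2, 5, 10):
    mean_auc, std_auc = stratified_kfold_auc(X, y, centroid_scorer, k=k)
    print(f"{k}-fold AUC: {mean_auc:.4f} +/- {std_auc:.4f}")
```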


Subject(s)
Disease/genetics , Machine Learning , MicroRNAs , Models, Genetic , RNA, Long Noncoding , Humans , MicroRNAs/genetics , MicroRNAs/metabolism , Predictive Value of Tests , RNA, Long Noncoding/genetics , RNA, Long Noncoding/metabolism
8.
Bioinformatics ; 38(9): 2554-2560, 2022 04 28.
Article in English | MEDLINE | ID: mdl-35266510

ABSTRACT

MOTIVATION: Identifying the target genes of transcription factors (TFs) is of great significance for biomedical research. However, identifying TF-target gene interactions through biological experiments is still time-consuming, expensive and limited to small scale. Existing computational methods for predicting the target genes of TFs mainly predict binding sites rather than direct interactions. To bridge this gap, in this work we propose a deep learning prediction model, named HGETGI, to identify new TF-target gene interactions. Specifically, HGETGI learns the patterns of known interactions between TFs and target genes, complemented with their involvement in different human disease mechanisms. It performs prediction based on random walks for meta-path sampling and node embedding in a skip-gram manner. RESULTS: We evaluated the prediction performance of the proposed method on a real dataset, and the experimental results show that it achieves an average area under the curve of 0.8519 ± 0.0731 in fivefold cross-validation. In addition, we conducted case studies on the predictions for two important TFs, NFKB1 and TP53. As a result, 33 and 32 of the top-40 ranking lists for NFKB1 and TP53, respectively, were confirmed by looking them up in another public database (hTftarget). We envision that HGETGI is feasible and effective for predicting TF-target gene interactions on a large scale. AVAILABILITY AND IMPLEMENTATION: The source code and dataset are available at https://github.com/PGTSING/HGETGI. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
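HGETGI's two steps, meta-path-guided random walks and skip-gram embedding, can be illustrated on a toy heterogeneous graph. The sketch below generates one meta-path walk and the (centre, context) pairs a skip-gram model would train on; the graph, the TF-Gene-Disease-Gene-TF meta-path and the window size are hypothetical, and training the embeddings themselves (for example with a word2vec implementation) is left out:

```python
import random


def metapath_walk(graph, node_type, start, metapath, walk_length, rng):
    """One random walk whose node types follow a symmetric meta-path.

    graph:     dict mapping each node to a list of neighbours
    node_type: dict mapping each node to its type label
    metapath:  type pattern whose first and last types match, e.g.
               ["TF", "Gene", "Disease", "Gene", "TF"]
    """
    if node_type[start] != metapath[0]:
        raise ValueError("walk must start at a node of the meta-path's first type")
    pattern = metapath[1:]          # types to visit, repeated cyclically
    walk = [start]
    while len(walk) < walk_length:
        wanted = pattern[(len(walk) - 1) % len(pattern)]
        candidates = [v for v in graph[walk[-1]] if node_type[v] == wanted]
        if not candidates:          # dead end: stop early
            break
        walk.append(rng.choice(candidates))
    return walk


def skipgram_pairs(walk, window):
    """(centre, context) pairs within `window` positions, as fed to skip-gram."""
    pairs = []
    for i, centre in enumerate(walk):
        lo, hi = max(0, i - window), min(len(walk), i + window + 1)
        pairs.extend((centre, walk[j]) for j in range(lo, hi) if j != i)
    return pairs


# Hypothetical heterogeneous graph: TF-gene interactions plus gene-disease links.
graph = {
    "TF1": ["G1", "G2"], "TF2": ["G2"],
    "G1": ["TF1", "D1"], "G2": ["TF1", "TF2", "D1"],
    "D1": ["G1", "G2"],
}
node_type = {"TF1": "TF", "TF2": "TF", "G1": "Gene", "G2": "Gene", "D1": "Disease"}
rng = random.Random(42)
walk = metapath_walk(graph, node_type, "TF1",
                     ["TF", "Gene", "Disease", "Gene", "TF"], 9, rng)
pairs = skipgram_pairs(walk, window=2)
print(walk)
```

Constraining the walk to a meta-path keeps TF and gene nodes from being visited in arbitrary type orders, so co-occurrence in a window reflects a specific biological relation.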


Subject(s)
Software , Transcription Factors , Humans , Binding Sites , Transcription Factors/metabolism
9.
BMC Bioinformatics ; 23(Suppl 7): 518, 2022 Dec 01.
Article in English | MEDLINE | ID: mdl-36457083

ABSTRACT

BACKGROUND: Self-interacting proteins (SIPs), in which two or more copies of a protein expressed by one gene interact with each other, play a central role in the regulation of most living cells and cellular functions. Although high-throughput experimental techniques can provide abundant SIP data, the experimental identification of SIPs still suffers from several shortcomings: it is time-consuming, costly and inefficient, and it has inherently high false-positive rates. Therefore, it is increasingly important to develop efficient and accurate automatic approaches that supplement experimental methods and help accelerate the prediction of SIPs from protein sequence information. RESULTS: In this paper, we present a novel framework, termed GLCM-WSRC (gray level co-occurrence matrix-weighted sparse representation based classification), for automatically predicting SIPs from the evolutionary information in protein primary sequences. More specifically, we first convert each protein sequence into a Position Specific Scoring Matrix (PSSM) containing its evolutionary information, using the Position Specific Iterated BLAST (PSI-BLAST) tool. Second, using an efficient feature extraction approach, GLCM, we extract salient and invariant feature vectors from the PSSM, and then apply a pre-processing operation, the adaptive synthetic (ADASYN) technique, to balance the SIP dataset and generate new feature vectors for classification. Finally, we employ an efficient and reliable WSRC model to identify SIPs based on the known information of self-interacting and non-interacting proteins.
CONCLUSIONS: Extensive experimental results show that the proposed approach achieves high prediction performance, with 98.10% accuracy on the yeast dataset and 91.51% accuracy on the human dataset, which suggests that the proposed model could be a useful tool for large-scale self-interacting protein prediction and other bioinformatics tasks in the future.
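The GLCM step can be sketched as follows, treating a PSSM as a gray-level image. The number of gray levels, the single horizontal offset and the three texture statistics are illustrative assumptions (the abstract does not list which GLCM statistics GLCM-WSRC uses), and the random matrix stands in for a real PSI-BLAST PSSM:

```python
import numpy as np


def glcm(matrix, levels=8, offset=(0, 1)):
    """Normalised gray-level co-occurrence matrix of a 2-D array.

    Values are linearly quantised into `levels` gray levels; entry (a, b)
    is the fraction of cell pairs (r, c) -> (r + dr, c + dc) with levels a, b.
    """
    m = np.asarray(matrix, dtype=float)
    lo, hi = m.min(), m.max()
    if hi > lo:
        q = np.floor((m - lo) / (hi - lo) * (levels - 1) + 0.5).astype(int)
    else:
        q = np.zeros(m.shape, dtype=int)
    dr, dc = offset
    rows, cols = q.shape
    counts = np.zeros((levels, levels))
    for r in range(max(0, -dr), min(rows, rows - dr)):
        for c in range(max(0, -dc), min(cols, cols - dc)):
            counts[q[r, c], q[r + dr, c + dc]] += 1
    total = counts.sum()
    return counts / total if total else counts


def glcm_features(P):
    """Three common texture statistics of a normalised GLCM."""
    i, j = np.indices(P.shape)
    contrast = (P * (i - j) ** 2).sum()
    energy = (P ** 2).sum()
    homogeneity = (P / (1.0 + np.abs(i - j))).sum()
    return np.array([contrast, energy, homogeneity])


# Random integers standing in for a 30-residue x 20-amino-acid PSSM.
rng = np.random.default_rng(0)
pssm = rng.integers(-5, 6, size=(30, 20))
features = glcm_features(glcm(pssm))
print(features.round(3))
```

Computing the GLCM for several offsets and concatenating the statistics is a common way to capture co-occurrence along both the residue axis and the amino-acid axis.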


Subject(s)
Biological Evolution , Computational Biology , Humans , Amino Acid Sequence , Position-Specific Scoring Matrices , Leukocytes , Saccharomyces cerevisiae/genetics
10.
BMC Genomics ; 22(Suppl 1): 916, 2022 Mar 16.
Article in English | MEDLINE | ID: mdl-35296232

ABSTRACT

BACKGROUND: Recent evidence suggests that human microorganisms participate in important biological activities in the human body. Dysfunction of host-microbiota interactions could lead to complex human disorders. Knowledge of host-microbiota interactions can provide valuable insights into the pathological mechanisms of diseases. However, it is time-consuming and costly to identify disorder-specific microbes from the biological "haystack" by routine wet-lab experiments alone. With developments in next-generation sequencing and omics-based trials, it is imperative to develop computational models for predicting microbe-disease associations on a large scale. RESULTS: Based on known microbe-disease associations derived from the Human Microbe-Disease Association Database (HMDAD), the proposed model shows reliable performance, with areas under the ROC curve (AUC) of 0.9456 and 0.8866 in leave-one-out cross-validation and five-fold cross-validation, respectively. In case studies of colorectal carcinoma, 80% of the top-20 predicted microbes have been experimentally confirmed in the published literature. CONCLUSION: Based on the assumption that functionally similar microbes tend to share similar interaction patterns with human diseases, we propose a group-based computational model of Bayesian disease-oriented ranking to prioritize the most promising microbes associated with various human diseases. Based on the sequence information of genes, two computational approaches (BLAST+ and MEGA 7) are leveraged to measure microbe-microbe similarity from different perspectives. Disease-disease similarity is calculated by capturing hierarchy information from the Medical Subject Headings (MeSH) data. The experimental results illustrate the accuracy and effectiveness of the proposed model. This work is expected to facilitate the characterization and identification of promising microbial biomarkers.


Subject(s)
Algorithms , Bacteria/classification , Computational Biology , RNA, Ribosomal, 16S , Bayes Theorem , Computational Biology/methods , Genes, rRNA , Humans , RNA, Ribosomal, 16S/genetics
11.
Bioinformatics ; 36(3): 851-858, 2020 02 01.
Article in English | MEDLINE | ID: mdl-31397851

ABSTRACT

MOTIVATION: MicroRNA (miRNA) therapeutics are becoming increasingly important. However, aberrant expression of miRNAs is known to cause drug resistance and can become an obstacle to miRNA-based therapeutics. At present, little is known about associations between miRNAs and drug resistance, and there is no computational tool available for predicting such associations. Since miRNAs are known to regulate genes that encode proteins key to drug efficacy, we propose a computational approach, called GCMDR, that learns a three-layer latent factor model to predict miRNA-drug resistance associations. RESULTS: In this paper, we discuss how the problem of predicting such associations can be formulated as a link prediction problem on a bipartite attributed graph. GCMDR uses graph convolution to build a latent factor model that can effectively use high-dimensional miRNA/drug attributes in an end-to-end learning scheme. In addition, GCMDR learns graph embedding features for miRNAs and drugs. We leveraged data from multiple databases storing miRNA expression profiles, drug substructure fingerprints, gene ontology and disease ontology. Performance tests show that GCMDR achieves AUCs of 0.9301 ± 0.0005, 0.9359 ± 0.0006 and 0.9369 ± 0.0003 in 2-fold, 5-fold and 10-fold cross-validation, respectively. Using this model, we show that associations between miRNAs and drug resistance can be reliably predicted by properly introducing useful side information such as miRNA expression profiles and drug structure fingerprints. AVAILABILITY AND IMPLEMENTATION: Python code and the dataset are available at https://github.com/yahuang1991polyu/GCMDR/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
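To make the graph-convolution step concrete, here is a minimal two-layer GCN over a bipartite miRNA-drug graph, followed by an inner-product score between miRNA and drug embeddings. This is a generic sketch, not GCMDR's three-layer latent factor model: the symmetric normalisation, layer sizes, random weights and the assumption that both node types share one attribute width are all illustrative choices.

```python
import numpy as np


def gcn_layer(A, H, W):
    """One graph-convolution layer: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    A_norm = d_inv_sqrt[:, None] * A_tilde * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)


def bipartite_adjacency(B):
    """Square adjacency for a bipartite graph with biadjacency matrix B."""
    n_rows, n_cols = B.shape
    A = np.zeros((n_rows + n_cols, n_rows + n_cols))
    A[:n_rows, n_rows:] = B
    A[n_rows:, :n_rows] = B.T
    return A


# Hypothetical known associations: 2 miRNAs x 3 drugs.
B = np.array([[1, 0, 1],
              [0, 1, 0]], dtype=float)
rng = np.random.default_rng(1)
X = rng.normal(size=(5, 4))          # node attributes, assumed projected to 4 dims
W1 = rng.normal(scale=0.5, size=(4, 8))
W2 = rng.normal(scale=0.5, size=(8, 8))

A = bipartite_adjacency(B)
Z = gcn_layer(A, gcn_layer(A, X, W1), W2)
Z_mirna, Z_drug = Z[:2], Z[2:]
scores = 1.0 / (1.0 + np.exp(-(Z_mirna @ Z_drug.T)))  # latent-factor style scores
print(scores.round(3))
```

Because the final ReLU makes every embedding non-negative, all scores here are at least 0.5; a trained model would usually drop the activation on the last layer or add a bias so that unlikely pairs can score low.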


Subject(s)
MicroRNAs , Algorithms , Area Under Curve , Computational Biology , Drug Resistance
12.
Bioinformatics ; 36(13): 4038-4046, 2020 07 01.
Article in English | MEDLINE | ID: mdl-31793982

ABSTRACT

MOTIVATION: Emerging evidence indicates that circular RNA (circRNA) plays a crucial role in human disease. Using circRNA as a biomarker offers a new perspective on diagnosing diseases and understanding disease pathogenesis. However, detecting circRNA-disease associations by biological experiments alone is often blind, limited to small scale, costly and time-consuming. Therefore, there is an urgent need for reliable computational methods that rapidly infer potential circRNA-disease associations on a large scale and provide the most promising candidates for biological experiments. RESULTS: In this article, we propose an efficient computational method that combines multi-source information with a deep convolutional neural network (CNN) to predict circRNA-disease associations. The method first fuses multi-source information, including disease semantic similarity, disease Gaussian interaction profile kernel similarity and circRNA Gaussian interaction profile kernel similarity, then extracts hidden deep features with the CNN, and finally sends them to an extreme learning machine classifier for prediction. In 5-fold cross-validation, the proposed method achieves 87.21% prediction accuracy and 88.50% sensitivity, with an area under the curve of 86.67%, on the CIRCR2Disease dataset. Compared with the state-of-the-art SVM classifier and other feature extraction methods on the same dataset, the proposed model achieves the best results. In addition, we sought experimental support for the prediction results in the published literature: 7 of the top 15 circRNA-disease pairs with the highest scores were confirmed. These results demonstrate that the proposed model is a suitable method for predicting circRNA-disease associations and can provide reliable candidates for biological experiments.
AVAILABILITY AND IMPLEMENTATION: The source code and datasets explored in this work are available at https://github.com/look0012/circRNA-Disease-association. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Neural Networks, Computer , RNA, Circular , Algorithms , Humans
13.
PLoS Comput Biol ; 16(5): e1007568, 2020 05.
Article in English | MEDLINE | ID: mdl-32433655

ABSTRACT

Accumulating evidence indicates that circular RNAs (circRNAs) are widely involved in the occurrence and development of diseases. Identifying associations between circRNAs and diseases plays a crucial role in exploring the pathogenesis of complex diseases and improving their diagnosis and treatment. However, owing to the complex mechanisms linking circRNAs and diseases, discovering new circRNA-disease associations by biological experiments is expensive and time-consuming. Therefore, there is an increasingly urgent need for computational methods to predict novel circRNA-disease associations. In this study, we propose a computational method called GCNCDA, based on the deep learning algorithm Fast learning with Graph Convolutional Networks (FastGCN), to predict potential disease-associated circRNAs. Specifically, the method first forms a unified descriptor by fusing disease semantic similarity information with disease and circRNA Gaussian Interaction Profile (GIP) kernel similarity information based on known circRNA-disease associations. The FastGCN algorithm is then used to extract the high-level features contained in the fused descriptor. Finally, new circRNA-disease associations are predicted by the Forest by Penalizing Attributes (Forest PA) classifier. In 5-fold cross-validation, GCNCDA achieved 91.2% accuracy and 92.78% sensitivity, with an AUC of 90.90%, on the circR2Disease benchmark dataset. Compared with different classifier models, feature extraction models and other state-of-the-art methods, GCNCDA shows strong competitiveness. Furthermore, we conducted case studies on breast cancer, glioma and colorectal cancer, in which 16, 15 and 17 of the top 20 candidate circRNAs with the highest prediction scores, respectively, were confirmed by relevant literature and databases.
These results suggest that GCNCDA can effectively predict potential circRNA-disease associations and provide highly credible candidates for biological experiments.
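FastGCN replaces the full neighbourhood aggregation in each layer with a Monte Carlo estimate over a sample of nodes drawn by importance sampling. A minimal one-layer sketch on a toy graph, assuming the column-norm sampling distribution described in the FastGCN paper as we understand it; the graph, features and sample count are hypothetical, and the Forest PA classifier is not shown:

```python
import numpy as np


def normalize(A):
    """D^-1/2 (A + I) D^-1/2, the usual GCN propagation matrix."""
    A_tilde = A + np.eye(A.shape[0])
    d = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return d[:, None] * A_tilde * d[None, :]


def fastgcn_layer(A_norm, H, W, n_samples, rng):
    """Estimate ReLU(A_norm @ H @ W) from a sample of source nodes.

    Nodes are drawn with probability q(u) proportional to the squared norm of
    column u of A_norm; dividing each sampled column by q(u) keeps the
    estimate of A_norm @ H unbiased.
    """
    col_sq = (A_norm ** 2).sum(axis=0)
    q = col_sq / col_sq.sum()
    idx = rng.choice(A_norm.shape[1], size=n_samples, replace=True, p=q)
    approx = (A_norm[:, idx] / q[idx]) @ H[idx] / n_samples
    return np.maximum(approx @ W, 0.0)


# Toy graph and features standing in for circRNA/disease descriptors.
rng = np.random.default_rng(0)
A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)
A_norm = normalize(A)
H = rng.normal(size=(5, 3))
W = rng.normal(size=(3, 2))

exact = np.maximum(A_norm @ H @ W, 0.0)
estimate = fastgcn_layer(A_norm, H, W, n_samples=200_000, rng=rng)
print(np.abs(estimate - exact).max())
```

The very large sample here only demonstrates that the estimate converges to the exact layer; in training, FastGCN draws far fewer nodes per layer, trading some variance for much cheaper mini-batches on large graphs.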


Subject(s)
Computational Biology/methods , Forecasting/methods , RNA, Circular/analysis , Algorithms , Breast Neoplasms/genetics , Colorectal Neoplasms/genetics , Data Accuracy , Deep Learning/trends , Glioma/genetics , Humans , MicroRNAs/genetics , Normal Distribution , Risk Factors , Sensitivity and Specificity
14.
PLoS Comput Biol ; 16(5): e1007872, 2020 05.
Article in English | MEDLINE | ID: mdl-32421715

ABSTRACT

Recent research has found that tumor cell invasion, proliferation and other biological processes are controlled by circular RNA. Understanding the associations between circRNAs and diseases is an important way to explore the pathogenesis of complex diseases and promote disease-targeted therapy. Most methods based on the analysis of high-throughput expression data, such as k-mer and PSSM, tend to assume that functionally similar nucleic acids lack direct linear homology; they disregard positional information and quantify only nonlinear sequence relationships. However, in many complex diseases, the nonlinear sequence relationship between pathogenic and ordinary nucleic acids differs little. Therefore, analyzing positional information can help predict the complex associations between circRNAs and diseases. To fill this gap, we propose a new method, named iCDA-CGR, to predict circRNA-disease associations. In particular, we introduce circRNA sequence information and, for the first time in a circRNA-disease prediction model, quantify the nonlinear sequence relationships of circRNAs with Chaos Game Representation (CGR) technology based on biological sequence position information. In cross-validation experiments, our method achieved an AUC of 0.8533, significantly higher than other existing methods. In validation on the independent datasets circ2Disease, circRNADisease and CRDD, the prediction accuracy of iCDA-CGR reached 95.18%, 90.64% and 95.89%, respectively. Moreover, in case studies, 19 of the top 30 circRNA-disease associations predicted by iCDA-CGR on the circRDisease dataset were confirmed by newly published literature. These results demonstrate that iCDA-CGR has outstanding robustness and stability and can provide highly credible candidates for biological experiments.
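Chaos Game Representation maps a nucleotide sequence to points in the unit square, where each point encodes the bases preceding it. A minimal sketch, assuming the common A/C/G/T corner layout, U treated as T, and a 2^k × 2^k counting grid as the feature matrix; iCDA-CGR's exact way of turning CGR into features is not specified in the abstract, and the sequence below is made up:

```python
import numpy as np

# Corner assignment commonly used for nucleotide CGR; RNA U is mapped to T.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """Chaos game representation: from (0.5, 0.5), move halfway toward the
    corner of each successive base. Only A, C, G and T/U are accepted."""
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper().replace("U", "T"):
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return np.array(points)


def cgr_matrix(seq, k=3):
    """Count CGR points on a 2^k x 2^k grid.

    From the k-th point on, each point's cell is fixed by the last k bases,
    so the grid is a k-mer frequency matrix in which k-mers sharing their
    most recent bases land in nearby cells. Earlier points are skipped
    because they do not yet encode a full k-mer.
    """
    n = 2 ** k
    grid = np.zeros((n, n))
    for x, y in cgr_points(seq)[k - 1:]:
        grid[min(int(y * n), n - 1), min(int(x * n), n - 1)] += 1
    return grid


circ = "AUGGCUACGUAGCUAGCUUAGC"   # hypothetical circRNA fragment
grid = cgr_matrix(circ, k=3)
features = grid.ravel() / grid.sum()   # normalised 64-dim feature vector
print(features.shape)  # (64,)
```

Unlike a plain k-mer count vector, the grid layout keeps related k-mers spatially close, which is what lets image-style feature extractors exploit positional structure.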


Subject(s)
Genetic Predisposition to Disease , RNA, Circular/genetics , Computational Biology/methods , Databases, Genetic , Humans , Nonlinear Dynamics
15.
BMC Med Inform Decis Mak ; 21(Suppl 1): 308, 2021 11 04.
Article in English | MEDLINE | ID: mdl-34736437

ABSTRACT

BACKGROUND: Disease-drug associations provide essential information for drug discovery and disease treatment. Many disease-drug associations remain unobserved or unknown, and trials to confirm them are time-consuming and expensive. To better understand and explore these valuable associations, it would be useful to develop computational methods for predicting unobserved disease-drug associations. With the advent of various datasets describing diseases and drugs, it has become more feasible to build a model of the potential correlations between diseases and drugs. RESULTS: In this work, we propose a new prediction method, called LMFDA, which works in several stages. First, it studies drug chemical structures, disease MeSH descriptors, disease-related phenotypic terms and drug-drug interactions. On this basis, similarity networks from different sources are constructed to enrich the representations of drugs and diseases. Based on the fused disease similarity network and drug similarity network, LMFDA calculates an association score for each disease-drug pair in the database. The method performs well on Fdataset and Cdataset, with AUROCs of 91.6% and 92.1%, respectively, higher than many existing computational models. CONCLUSIONS: The novelty of LMFDA lies in introducing multimodal fusion with low-rank tensors to fuse multiple similarity networks, combined with matrix completion, to predict potential associations. We have demonstrated that LMFDA shows excellent network integration ability for accurate disease-drug association inference and achieves substantial improvement over advanced approaches. Overall, experimental results on two real-world network datasets demonstrate that LMFDA delivers excellent detection performance. The results also suggest that refining similarity networks with as much domain knowledge as possible is a promising direction for drug repositioning.


Subject(s)
Computational Biology; Pharmaceutical Preparations; Algorithms; Databases, Factual; Drug Discovery; Drug Repositioning
16.
J Cell Mol Med ; 24(1): 79-87, 2020 01.
Article in English | MEDLINE | ID: mdl-31568653

ABSTRACT

LncRNAs and miRNAs are key molecules in the competing endogenous RNA (ceRNA) mechanism, and their interactions have been found to play important roles in gene regulation. As a supplement to the identification of lncRNA-miRNA interactions from CLIP-seq experiments, in silico prediction can select the most promising candidates for experimental validation. Although developing computational tools for predicting lncRNA-miRNA interactions is of great importance for deciphering the ceRNA mechanism, little effort has been made in this direction. In this paper, we propose an approach based on linear neighbour representation to predict lncRNA-miRNA interactions (LNRLMI). Specifically, we first constructed a bipartite network by combining the known interaction network with similarities based on the expression profiles of lncRNAs and miRNAs. Based on this data integration, a linear neighbour representation method was introduced to construct a prediction model. To evaluate the prediction performance of the proposed model, k-fold cross-validation was performed. As a result, LNRLMI yielded average AUCs of 0.8475 ± 0.0032, 0.8960 ± 0.0015 and 0.9069 ± 0.0014 in 2-fold, 5-fold and 10-fold cross-validation, respectively. A series of comparison experiments with other methods was also conducted, and the results showed that our method is feasible and effective for predicting lncRNA-miRNA interactions by combining different types of useful side information. It is anticipated that LNRLMI could be a useful tool for predicting the non-coding RNA regulatory networks in which lncRNAs and miRNAs are involved.
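The evaluation protocol above (k-fold cross-validation summarized by AUC) can be sketched in Python as follows. This is an illustrative sketch, not the LNRLMI authors' code: `auc` computes the area under the ROC curve as the Mann-Whitney probability that a randomly chosen positive pair outscores a randomly chosen negative one (ties count one half), and `k_fold_splits` is a hypothetical helper that partitions known interactions so that each is held out exactly once. The quadratic pairwise loop is chosen for clarity, not speed.

```python
import random
from typing import List, Sequence, Tuple


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    # Probability that a random positive outscores a random negative; ties count 0.5.
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def k_fold_splits(items: Sequence, k: int, seed: int = 0) -> List[Tuple[list, list]]:
    # Shuffle once with a fixed seed, deal indices into k folds,
    # and return one (train, test) pair per fold.
    idx = list(range(len(items)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    splits = []
    for f in range(k):
        test = [items[i] for i in folds[f]]
        train = [items[i] for g in range(k) if g != f for i in folds[g]]
        splits.append((train, test))
    return splits
```

In a typical run the model would be retrained on each `train` split, the held-out positives would be scored alongside negative (unobserved) pairs, and the per-fold AUCs would be averaged; how negatives are sampled is a design choice the abstract does not specify.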


Subject(s)
Algorithms; Computational Biology/methods; Gene Expression Regulation; Gene Regulatory Networks; MicroRNAs/metabolism; RNA, Long Noncoding/metabolism; RNA, Messenger/metabolism; Area Under Curve; Gene Expression Profiling; Humans; MicroRNAs/genetics; RNA, Long Noncoding/genetics; RNA, Messenger/genetics
17.
Bioinformatics ; 34(5): 812-819, 2018 03 01.
Article in English | MEDLINE | ID: mdl-29069317

ABSTRACT

Motivation: The interaction of miRNAs and lncRNAs is known to be important for gene regulation. However, few computational approaches have been developed to analyze known interactions and predict unknown ones. Given that there is now more evidence suggesting that lncRNA-miRNA interactions are closely related to their relative expression levels in the form of a titration mechanism, we analyzed the patterns in large-scale expression profiles of known lncRNA-miRNA interactions. From these uncovered patterns, we noticed that lncRNAs tend to interact collaboratively with miRNAs of similar expression profiles, and vice versa. Results: By representing known interactions between lncRNAs and miRNAs as a bipartite graph, we propose here a technique, called EPLMI, to construct a prediction model from such a graph. EPLMI performs its tasks based on the assumption that lncRNAs that are highly similar to each other tend to have similar interaction or non-interaction patterns with miRNAs, and vice versa. The effectiveness of the prediction model so constructed has been evaluated using the latest dataset of lncRNA-miRNA interactions. The results show that the prediction model can achieve AUCs of 0.8522 and 0.8447 ± 0.0017 in leave-one-out cross-validation and 5-fold cross-validation, respectively. Using this model, we show that lncRNA-miRNA interactions can be reliably predicted. We also show that it can be used to select the most likely lncRNA targets that specific miRNAs would interact with. We believe that the prediction models discovered by EPLMI can yield great insights for further research on ceRNA regulatory networks. To the best of our knowledge, EPLMI is the first technique developed for large-scale lncRNA-miRNA interaction profiling. Availability and implementation: Matlab codes and dataset are available at https://github.com/yahuang1991polyu/EPLMI/. Contact: yu-an.huang@connect.polyu.hk or zhuhongyou@ms.xjb.ac.cn.
Supplementary information: Supplementary data are available at Bioinformatics online.
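The core assumption of EPLMI (lncRNAs with similar expression profiles share interaction patterns with miRNAs) can be illustrated with the minimal Python sketch below. It is a simplified stand-in, not EPLMI's published formulation: the Pearson correlation of expression profiles serves as lncRNA similarity, negative correlations are clipped to zero (an assumption), and each lncRNA-miRNA pair is scored by the similarity-weighted fraction of the lncRNA's other neighbours that are known to bind that miRNA. The function name `score_pairs` is hypothetical, and the miRNA side ("vice versa") could be scored symmetrically.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    # Pearson correlation; returns 0.0 when either profile is constant.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def score_pairs(lnc_expr: List[List[float]], interactions: List[List[int]]) -> List[List[float]]:
    # interactions[i][j] = 1 if lncRNA i is known to interact with miRNA j.
    # score[i][j] = sum_k w_ik * A[k][j] / sum_k w_ik over the OTHER lncRNAs k,
    # with w_ik = max(pearson(expr_i, expr_k), 0).
    n, m = len(lnc_expr), len(interactions[0])
    scores = [[0.0] * m for _ in range(n)]
    for i in range(n):
        w = [max(pearson(lnc_expr[i], lnc_expr[k]), 0.0) if k != i else 0.0 for k in range(n)]
        total = sum(w)
        if total == 0:
            continue  # no positively correlated neighbours: leave scores at 0
        for j in range(m):
            scores[i][j] = sum(w[k] * interactions[k][j] for k in range(n)) / total
    return scores
```

Excluding lncRNA i from its own neighbourhood means its known edges do not leak into its own scores, which keeps the sketch compatible with leave-one-out style evaluation.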


Subject(s)
Gene Expression Profiling/methods; Gene Expression Regulation; MicroRNAs/metabolism; RNA, Long Noncoding/metabolism; Sequence Analysis, RNA/methods; Algorithms; Area Under Curve; Humans; Sensitivity and Specificity
18.
J Transl Med ; 17(1): 382, 2019 11 20.
Article in English | MEDLINE | ID: mdl-31747915

ABSTRACT

BACKGROUND: In the process of drug development, computational drug repositioning is effective and resource-saving because of its important role in identifying new drug-disease associations. Recent years have witnessed great progress in the field of data mining with the advent of deep learning. An increasing number of deep learning-based techniques have been proposed to develop computational tools in bioinformatics. METHODS: Along this promising direction, we propose here a computational drug repositioning method combining a Sigmoid Kernel and a Convolutional Neural Network (SKCNN), which is able to learn new features that effectively represent drug-disease associations via its hidden layers. Specifically, we first construct a similarity metric for drugs using drug sigmoid similarity and drug structural similarity, and one for diseases using disease sigmoid similarity and disease semantic similarity. Based on the combined similarities of drugs and diseases, we then use SKCNN to learn hidden representations for each drug-disease pair, whose labels are finally predicted by a random forest classifier. RESULTS: A series of experiments was performed for performance evaluation, and the results show that the proposed SKCNN improves prediction accuracy compared with other state-of-the-art approaches. Case studies of two selected diseases were also conducted, which demonstrate the superior performance of our method in terms of the actual discovery of potential drug indications. CONCLUSION: The aim of this study was to establish an effective predictive model for finding new drug-disease associations. These experimental results show that SKCNN can effectively predict associations between drugs and diseases.
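The sigmoid similarity step can be sketched as follows, assuming that each drug (or disease) is represented by its row (or column) of the known drug-disease association matrix and that the kernel takes the standard form K(x, y) = tanh(gamma * <x, y> + c). The defaults gamma = 1/n_features and c = 0, and the element-wise mean used to combine the result with structural (or semantic) similarity, are illustrative assumptions; the abstract does not state SKCNN's parameters or its combination rule.

```python
import math
from typing import List, Optional

Matrix = List[List[float]]


def sigmoid_kernel_matrix(profiles: Matrix, gamma: Optional[float] = None, c: float = 0.0) -> Matrix:
    # K[i][j] = tanh(gamma * <x_i, x_j> + c); gamma defaults to 1 / n_features (assumption).
    g = 1.0 / len(profiles[0]) if gamma is None else gamma
    return [[math.tanh(g * sum(a * b for a, b in zip(x, y)) + c) for y in profiles] for x in profiles]


def combine(a: Matrix, b: Matrix) -> Matrix:
    # Illustrative combination rule: element-wise mean of two similarity matrices.
    return [[(x + y) / 2 for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]
```

For example, with drug profiles [[1, 0, 1], [0, 0, 0]] the second drug has no known associations, so its sigmoid self-similarity is tanh(0) = 0, and combining with a structural similarity of 1.0 on the diagonal yields 0.5. This shows why a second similarity source helps for drugs with sparse association profiles.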


Subject(s)
Algorithms; Disease/genetics; Drug Repositioning; Genetic Association Studies; Area Under Curve; Asthma/genetics; Databases as Topic; Humans; Neural Networks, Computer; Obesity/genetics; ROC Curve; Reproducibility of Results; Support Vector Machine
19.
Molecules ; 24(16)2019 Aug 19.
Article in English | MEDLINE | ID: mdl-31430892

ABSTRACT

The identification of drug-target interactions (DTIs) is a critical step in drug development. Experimental methods for discovering DTIs based on clinical trials are time-consuming, expensive, and challenging. Therefore, as a complement to them, developing new computational methods for predicting novel DTIs is of great significance for saving costs and shortening the development period. In this paper, we present a novel computational model for predicting DTIs, which uses the sequence information of proteins and a rotation forest classifier. Specifically, all of the target protein sequences are first converted to a position-specific scoring matrix (PSSM) to retain evolutionary information. We then use local phase quantization (LPQ) descriptors to extract evolutionary information from the PSSM. In addition, substructure fingerprint information is used to extract the features of the drug. We finally combine drug and protein features to represent each drug-target pair and use a rotation forest classifier to calculate interaction-possibility scores for global DTI prediction. The experimental results indicate that the proposed model is effective, achieving average accuracies of 89.15%, 86.01%, 82.20%, and 71.67% on four datasets (i.e., enzyme, ion channel, G protein-coupled receptor (GPCR), and nuclear receptor), respectively. In addition, we compared the prediction performance of the rotation forest classifier with that of another popular classifier, the support vector machine, on the same datasets. Several previously proposed methods were also implemented on the same datasets for performance comparison. The comparison results demonstrate the superiority of the proposed method over the others. We anticipate that the proposed method can be used as an effective tool for predicting drug-target interactions on a large scale, given protein sequences and drug fingerprints.
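The final feature-assembly step (concatenating a protein descriptor with a drug substructure fingerprint for each drug-target pair) can be sketched as follows. This is a hypothetical illustration: protein vectors are assumed to be precomputed (the PSSM and LPQ extraction steps are not implemented here), drugs are assumed to be given as sets of substructure indices, and `fingerprint_bits`, `pair_features`, and `build_dataset` are invented names. The resulting rows would then be passed to a classifier such as rotation forest, which is not shown.

```python
from typing import Dict, Iterable, List, Sequence, Tuple


def fingerprint_bits(substructure_ids: Iterable[int], n_bits: int) -> List[int]:
    # Binary fingerprint: bit i is 1 if the drug contains substructure i.
    bits = [0] * n_bits
    for s in substructure_ids:
        if not 0 <= s < n_bits:
            raise ValueError(f"substructure index {s} outside 0..{n_bits - 1}")
        bits[s] = 1
    return bits


def pair_features(protein_vec: Sequence[float], drug_bits: Sequence[int]) -> List[float]:
    # One drug-target pair = protein descriptor followed by the drug fingerprint.
    return [float(v) for v in protein_vec] + [float(b) for b in drug_bits]


def build_dataset(
    proteins: Dict[str, Sequence[float]],
    drugs: Dict[str, Sequence[int]],
    pairs: Sequence[Tuple[str, str, int]],
) -> Tuple[List[List[float]], List[int]]:
    # pairs holds (protein_id, drug_id, label) with label 1 = interacting, 0 = not.
    X, y = [], []
    for protein_id, drug_id, label in pairs:
        X.append(pair_features(proteins[protein_id], drugs[drug_id]))
        y.append(label)
    return X, y
```

Concatenation keeps the two feature sources side by side in a fixed order, so every pair vector has the same length (protein descriptor length plus fingerprint length), which most classifiers require.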


Subject(s)
Drug Interactions/physiology; Proteins/metabolism; Amino Acid Sequence; Computational Biology/methods; Computer Simulation; Databases, Protein; Drug Development/methods; Position-Specific Scoring Matrices; Support Vector Machine
20.
Bioinformatics ; 33(5): 733-739, 2017 03 01.
Article in English | MEDLINE | ID: mdl-28025197

ABSTRACT

Motivation: Accumulating clinical observations have indicated that microbes living in the human body are closely associated with a wide range of noninfectious human diseases, which provides promising insights into understanding complex disease mechanisms. Predicting microbe-disease associations could not only boost human disease diagnosis and prognosis, but also improve new drug development. However, little effort has been made to understand and predict human microbe-disease associations on a large scale until now. Results: In this work, we constructed a microbe-human disease association network and further developed a novel computational model of the KATZ measure for Human Microbe-Disease Association prediction (KATZHMDA), based on the assumption that functionally similar microbes tend to have similar interaction and non-interaction patterns with noninfectious diseases, and vice versa. To our knowledge, KATZHMDA is the first tool for microbe-disease association prediction. The reliable prediction performance could be attributed to the use of the KATZ measure and the introduction of Gaussian interaction profile kernel similarity for microbes and diseases. LOOCV and k-fold cross-validation were implemented to evaluate the effectiveness of this novel computational model based on known microbe-disease associations obtained from the HMDAD database. As a result, KATZHMDA achieved reliable performance with average AUCs of 0.8130 ± 0.0054, 0.8301 ± 0.0033 and 0.8382 in 2-fold cross-validation, 5-fold cross-validation and LOOCV, respectively. It is anticipated that KATZHMDA could be used to obtain more novel microbes associated with important noninfectious human diseases and therefore benefit drug discovery and human medical improvement. Availability and Implementation: Matlab codes and dataset explored in this work are available at http://dwz.cn/4oX5mS . Contacts: xingchen@amss.ac.cn or zhuhongyou@gmail.com or wangxuesongcumt@163.com.
Supplementary information: Supplementary data are available at Bioinformatics online.
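The two ingredients named in the abstract, Gaussian interaction profile (GIP) kernel similarity and the KATZ measure, can be combined in a short Python sketch. It builds a heterogeneous network whose adjacency matrix stacks the microbe GIP kernel, the known association matrix A, its transpose, and the disease GIP kernel, then sums walks of length 1 to k, each weighted by beta raised to the walk length. The bandwidth rule (1 divided by the mean squared norm of the interaction profiles), k = 2, and beta = 0.01 are illustrative assumptions, and `katz_scores` is a hypothetical name; this is not the authors' Matlab code.

```python
import math
from typing import List

Matrix = List[List[float]]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def gip_kernel(profiles: Matrix) -> Matrix:
    # K[i][j] = exp(-gamma * ||p_i - p_j||^2), gamma = 1 / mean(||p_i||^2) (assumed bandwidth rule).
    mean_sq = sum(sum(v * v for v in p) for p in profiles) / len(profiles)
    gamma = 1.0 / mean_sq if mean_sq else 0.0
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y))) for y in profiles]
            for x in profiles]


def katz_scores(assoc: Matrix, beta: float = 0.01, k: int = 2) -> Matrix:
    # assoc[i][j] = 1 if microbe i is known to be associated with disease j.
    nm, nd = len(assoc), len(assoc[0])
    km = gip_kernel(assoc)              # microbe-microbe similarity
    kd = gip_kernel(transpose(assoc))   # disease-disease similarity
    at = transpose(assoc)
    # Heterogeneous adjacency: [[KM, A], [A^T, KD]].
    big = [km[i] + list(assoc[i]) for i in range(nm)] + [at[j] + kd[j] for j in range(nd)]
    total = [[0.0] * (nm + nd) for _ in range(nm + nd)]
    power = big
    for length in range(1, k + 1):
        if length > 1:
            power = matmul(power, big)  # walks of the current length
        coef = beta ** length
        total = [[t + coef * p for t, p in zip(tr, pr)] for tr, pr in zip(total, power)]
    # Microbe-disease block (top-right) holds the KATZ association scores.
    return [row[nm:] for row in total[:nm]]
```

A small beta makes short walks dominate, so known associations (direct length-1 links) keep the highest scores, while unobserved pairs reachable through similar microbes or diseases receive smaller but nonzero scores that can be ranked as candidates.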


Subject(s)
Algorithms; Disease; Microbiota/physiology; Models, Biological; Bacteria; Bacterial Physiological Phenomena; Computational Biology/methods; Host-Pathogen Interactions; Humans