Results 1 - 20 of 25
1.
BMC Bioinformatics ; 24(1): 481, 2023 Dec 16.
Article in English | MEDLINE | ID: mdl-38104057

ABSTRACT

BACKGROUND: The rapid emergence of single-cell RNA-seq (scRNA-seq) data presents remarkable opportunities for broad investigations through integration analyses. However, most integration models are black boxes that lack interpretability or are hard to train. RESULTS: To address the above issues, we propose scInterpreter, a deep learning-based interpretable model. scInterpreter substantially outperforms other state-of-the-art (SOTA) models in multiple benchmark datasets. In addition, scInterpreter is extensible and can integrate and annotate atlas scRNA-seq data. We evaluated the robustness of scInterpreter in a variety of situations. Through comparison experiments, we found that with a knowledge prior, the training process can be significantly accelerated. Finally, we conducted interpretability analysis for each dimension (pathway) of cell representation in the embedding space. CONCLUSIONS: The results showed that the cell representations obtained by scInterpreter are full of biological significance. Through weight sorting, we found several new genes related to pathways in PBMC dataset. In general, scInterpreter is an effective and interpretable integration tool. It is expected that scInterpreter will bring great convenience to the study of single-cell transcriptomics.


Subjects
Leukocytes, Mononuclear ; Single-Cell Gene Expression Analysis ; Sequence Analysis, RNA/methods ; Leukocytes, Mononuclear/metabolism ; Single-Cell Analysis/methods ; Gene Expression Profiling/methods ; Cluster Analysis
2.
Brief Bioinform ; 22(2): 2085-2095, 2021 03 22.
Article in English | MEDLINE | ID: mdl-32232320

ABSTRACT

Effectively representing Medical Subject Headings (MeSH) terms, such as diseases and drugs, as discriminative vectors could greatly improve the performance of downstream computational prediction models. However, these terms are often abstract and difficult to quantify. In this paper, we converted the MeSH tree structure into a relationship network and applied several graph embedding algorithms to it to represent these terms. Specifically, the relationship network consists of nodes (MeSH headings) and edges (relationships), and can be constructed from the tree numbers. Then, five graph embedding algorithms, including DeepWalk, LINE, SDNE, LAP and HOPE, were implemented on the relationship network to represent MeSH headings as vectors. To evaluate the performance of the proposed methods, we carried out node classification and relationship prediction tasks. The results show that the MeSH headings characterized by graph embedding algorithms can not only be treated as an independent carrier for representation, but can also be utilized as additional information to enhance the representation ability of vectors. Thus, they can serve as an input and continue to play a significant role in any computational models related to disease, drug, microbe, etc. In addition, our method may inspire researchers to study the representation of terms from this network perspective.
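
The abstract states that the relationship network can be built from MeSH tree numbers but does not give the exact rule. A minimal sketch, assuming that an edge links each tree number to its immediate parent (the same number with its last dotted segment removed): the function name `tree_number_edges` and the toy input are hypothetical and are not taken from the paper.

```python
def tree_number_edges(tree_numbers):
    """Return parent-child edges implied by dotted tree numbers.

    Assumption: the parent of "C04.557.337" is "C04.557". An edge is only
    emitted when the parent is itself present in the input.
    """
    known = set(tree_numbers)
    edges = set()
    for number in known:
        if "." not in number:
            continue  # top-level node: no parent inside the tree
        parent = number.rsplit(".", 1)[0]
        if parent in known:
            edges.add((parent, number))
    return sorted(edges)


# Hypothetical toy input shaped like MeSH tree numbers.
toy_tree_numbers = ["C04", "C04.557", "C04.557.337", "C04.588", "D27.505"]
edges = tree_number_edges(toy_tree_numbers)
for parent, child in edges:
    print(parent, "->", child)
```

The resulting edge list could then be handed to any graph embedding method; "D27.505" yields no edge here because its parent is absent from the toy input.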


Subjects
Algorithms ; Medical Subject Headings ; Computer Simulation ; Drug Delivery Systems ; Genetic Predisposition to Disease ; Humans ; MicroRNAs/genetics ; Semantics
3.
BMC Bioinformatics ; 22(Suppl 5): 622, 2022 Mar 22.
Article in English | MEDLINE | ID: mdl-35317723

ABSTRACT

BACKGROUND: lncRNAs play a critical role in numerous biological processes and life activities, especially diseases. Because traditional wet-lab experiments for identifying undiscovered lncRNA-disease associations are limited by time consumption and labor cost, it is imperative to construct reliable and efficient computational models to complement them in practice. Deep learning technologies have been shown to make impressive contributions in many areas, but their feasibility in bioinformatics has not been adequately verified. RESULTS: In this paper, a machine learning-based model called LDACE is proposed to predict potential lncRNA-disease associations by combining Extreme Learning Machine (ELM) and Convolutional Neural Network (CNN). Specifically, the representation vectors are constructed by integrating multiple types of biological information, including functional similarity and semantic similarity. Then, CNN is applied to mine both local and global features. Finally, ELM is chosen to carry out the prediction task to detect the potential lncRNA-disease associations. The proposed method achieved an Area Under the Receiver Operating Characteristic Curve of 0.9086 in leave-one-out cross-validation and 0.8994 in fivefold cross-validation. In addition, two kinds of case studies based on lung cancer and endometrial cancer indicate the robustness and efficiency of LDACE even in a real environment. CONCLUSIONS: Substantial results demonstrated that the proposed model is expected to be an auxiliary tool to guide and assist biomedical research, and the close integration of deep learning and biological big data will provide life sciences with novel insights.
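
The AUROC values reported above summarize how well held-out association scores rank true pairs above unknown ones. A minimal sketch of that metric, assuming the cross-validation loop has already produced one score per held-out pair: the function name `auroc` and the toy labels and scores are illustrative and do not come from the paper.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive is scored above
    a randomly chosen negative; ties count as one half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy held-out predictions (illustrative only).
toy_labels = [1, 0, 1, 0, 1]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.3]
toy_auc = auroc(toy_labels, toy_scores)
print(round(toy_auc, 4))
```

The pairwise form is quadratic in the number of samples; it is chosen here for readability rather than speed.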


Subjects
RNA, Long Noncoding ; Computational Biology/methods ; Machine Learning ; Neural Networks, Computer ; RNA, Long Noncoding/genetics ; ROC Curve
4.
BMC Med Inform Decis Mak ; 21(Suppl 1): 308, 2021 11 04.
Article in English | MEDLINE | ID: mdl-34736437

ABSTRACT

BACKGROUND: Disease-drug associations provide essential information for drug discovery and disease treatment. Many disease-drug associations remain unobserved or unknown, and trials to confirm these associations are time-consuming and expensive. To better understand and explore these valuable associations, it would be useful to develop computational methods for predicting unobserved disease-drug associations. With the advent of various datasets describing diseases and drugs, it has become more feasible to build a model describing the potential correlation between diseases and drugs. RESULTS: In this work, we propose a new prediction method, called LMFDA, which works in several stages. First, it studies the drug chemical structure, disease MeSH descriptors, disease-related phenotypic terms, and drug-drug interactions. On this basis, similarity networks from different sources are constructed to enrich the representation of drugs and diseases. Based on the fused disease similarity network and drug similarity network, LMFDA calculates the association score of each pair of diseases and drugs in the database. This method achieves good performance on Fdataset and Cdataset, with AUROCs of 91.6% and 92.1%, respectively, higher than many of the existing computational models. CONCLUSIONS: The novelty of LMFDA lies in the introduction of multimodal fusion using low-rank tensors to fuse multiple similarity networks, combined with matrix completion to predict potential associations. We have demonstrated that LMFDA displays excellent network integration ability for accurate disease-drug association inference and achieves substantial improvement over advanced approaches. Overall, experimental results on two real-world network datasets demonstrate that LMFDA is able to deliver excellent detection performance. Results also suggest that refining similarity networks with as much domain knowledge as possible is a promising direction for drug repositioning.


Subjects
Computational Biology ; Pharmaceutical Preparations ; Algorithms ; Databases, Factual ; Drug Discovery ; Drug Repositioning
5.
BMC Bioinformatics ; 21(1): 401, 2020 Sep 10.
Article in English | MEDLINE | ID: mdl-32912137

ABSTRACT

BACKGROUND: As an important non-coding RNA, microRNA (miRNA) plays a significant role in a series of life processes and is closely associated with a variety of human diseases. Hence, identification of potential miRNA-disease associations can make great contributions to the research and treatment of human diseases. However, to our knowledge, many existing computational methods utilize only a single type of known association information between miRNAs and diseases to predict their potential associations, without focusing on their interactions or associations with other types of molecules. RESULTS: In this paper, we propose a network embedding-based method for predicting miRNA-disease associations by preserving behavior and attribute information. First, a heterogeneous network is constructed by integrating known associations among miRNAs, proteins and diseases, and the network representation method Learning Graph Representations with Global Structural Information (GraRep) is implemented to learn the behavior information of miRNAs and diseases in the network. Then, the behavior information of miRNAs and diseases is combined with their attribute information to represent miRNA-disease association pairs. Finally, the prediction model is established based on the Random Forest algorithm. Under five-fold cross-validation, the proposed NEMPD model obtained an average prediction accuracy of 85.41% with a sensitivity of 80.96% and an AUC of 91.58%. Furthermore, the performance of NEMPD is also validated by case studies. Among the top 50 predicted disease-related miRNAs, 48 (breast neoplasms), 47 (colon neoplasms) and 47 (lung neoplasms) were confirmed by two other databases. CONCLUSIONS: The proposed NEMPD model performs well in predicting potential associations between miRNAs and diseases, and has great potential in the field of miRNA-disease association prediction.


Subjects
Breast Neoplasms/diagnosis ; Colonic Neoplasms/diagnosis ; Computational Biology/methods ; Lung Neoplasms/diagnosis ; MicroRNAs/metabolism ; Algorithms ; Area Under Curve ; Breast Neoplasms/genetics ; Colonic Neoplasms/genetics ; Female ; Humans ; Lung Neoplasms/genetics ; MicroRNAs/genetics ; ROC Curve
6.
J Cell Mol Med ; 24(1): 79-87, 2020 01.
Article in English | MEDLINE | ID: mdl-31568653

ABSTRACT

LncRNAs and miRNAs are key molecules in the mechanism of competing endogenous RNAs (ceRNA), and their interactions have been found to play important roles in gene regulation. As a supplement to the identification of lncRNA-miRNA interactions from CLIP-seq experiments, in silico prediction can select the most promising candidates for experimental validation. Although developing computational tools for predicting lncRNA-miRNA interactions is of great importance for deciphering the ceRNA mechanism, little effort has been made in this direction. In this paper, we propose an approach based on linear neighbour representation to predict lncRNA-miRNA interactions (LNRLMI). Specifically, we first constructed a bipartite network by combining the known interaction network and similarities based on expression profiles of lncRNAs and miRNAs. Based on this data integration, a linear neighbour representation method was introduced to construct a prediction model. To evaluate the prediction performance of the proposed model, k-fold cross validations were implemented. As a result, LNRLMI yielded average AUCs of 0.8475 ± 0.0032, 0.8960 ± 0.0015 and 0.9069 ± 0.0014 on 2-fold, 5-fold and 10-fold cross validation, respectively. A series of comparison experiments with other methods were also conducted, and the results showed that our method was feasible and effective in predicting lncRNA-miRNA interactions via a combination of different types of useful side information. It is anticipated that LNRLMI could be a useful tool for predicting the non-coding RNA regulation networks in which lncRNAs and miRNAs are involved.


Subjects
Algorithms ; Computational Biology/methods ; Gene Expression Regulation ; Gene Regulatory Networks ; MicroRNAs/metabolism ; RNA, Long Noncoding/metabolism ; RNA, Messenger/metabolism ; Area Under Curve ; Gene Expression Profiling ; Humans ; MicroRNAs/genetics ; RNA, Long Noncoding/genetics ; RNA, Messenger/genetics
7.
BMC Med Inform Decis Mak ; 20(Suppl 2): 49, 2020 03 18.
Article in English | MEDLINE | ID: mdl-32183788

ABSTRACT

BACKGROUND: The key to modern drug discovery is to find, identify and prepare drug molecular targets. However, due to limitations in throughput, precision and cost, traditional experimental methods are difficult to apply widely to infer potential Drug-Target Interactions (DTIs). Therefore, it is urgent to develop effective computational methods to validate the interactions between drugs and targets. METHODS: We developed a deep learning-based model for DTI prediction. The evolutionary features of proteins are extracted via Position Specific Scoring Matrix (PSSM) and Legendre Moment (LM) and associated with molecular substructure fingerprints of drugs to form feature vectors of drug-target pairs. Then we utilized Sparse Principal Component Analysis (SPCA) to compress the features of drugs and proteins into a uniform vector space. Lastly, a deep long short-term memory (DeepLSTM) network was constructed to carry out prediction. RESULTS: A significant improvement in DTI prediction performance can be observed in the experimental results, with AUCs of 0.9951, 0.9705, 0.9951 and 0.9206, respectively, on four classes of important drug-target datasets. Further experiments preliminarily prove that the proposed characterization scheme has a great advantage in feature expression and recognition. We have also shown that the proposed method can work well with small datasets. CONCLUSION: The results demonstrate that the proposed approach has a great advantage over state-of-the-art drug-target predictors. To the best of our knowledge, this study is the first to test the potential of a deep learning method with memory and Turing completeness in DTI prediction.


Subjects
Deep Learning ; Memory, Short-Term/drug effects ; Neural Networks, Computer ; Pharmaceutical Preparations ; Drug Development ; Humans ; Principal Component Analysis ; Proteins
8.
BMC Genomics ; 20(Suppl 13): 928, 2019 Dec 27.
Article in English | MEDLINE | ID: mdl-31881833

ABSTRACT

BACKGROUND: Identification of protein-protein interactions (PPIs) is crucial for understanding biological processes and investigating the cellular functions of genes. Self-interacting proteins (SIPs) are those in which more than two identical proteins can interact with each other, and they are a specific type of PPI. More and more researchers are paying attention to SIP detection, and several prediction models have been proposed, but some problems remain. Hence, there is an urgent need to explore an efficient computational model for SIP prediction. RESULTS: In this study, we developed an effective model to predict SIPs, called RP-FIRF, which merges the Random Projection (RP) classifier and the Finite Impulse Response Filter (FIRF). More specifically, each protein sequence was first transformed into a Position Specific Scoring Matrix (PSSM) by exploiting Position Specific Iterated BLAST (PSI-BLAST). Then, to effectively extract discriminative SIP features and improve the performance of SIP prediction, the FIRF method was applied to the PSSM. The RP classifier was used to execute the classification and predict novel SIPs. We evaluated the performance of the proposed RP-FIRF model and compared it with the state-of-the-art support vector machine (SVM) on human and yeast datasets. The proposed model achieved high average accuracies of 97.89% and 97.35% on these datasets using five-fold cross-validation. To further evaluate the proposed method, we also compared it with six other existing methods; the experimental results demonstrated that the capacity of our model surpasses that of previous approaches. CONCLUSION: Experimental results show that self-interacting proteins are accurately predicted by the proposed model on human and yeast datasets. This shows that the proposed model can predict SIPs effectively. Thus, the RP-FIRF model is an automatic decision support method which should provide useful insights into the recognition of SIPs.


Subjects
Proteins/metabolism ; Support Vector Machine ; Area Under Curve ; Databases, Protein ; Humans ; Principal Component Analysis ; Protein Interaction Maps ; Proteins/chemistry ; ROC Curve ; Saccharomyces cerevisiae/metabolism ; Saccharomyces cerevisiae Proteins/chemistry ; Saccharomyces cerevisiae Proteins/metabolism
9.
Int J Mol Sci ; 20(4)2019 Feb 21.
Article in English | MEDLINE | ID: mdl-30795499

ABSTRACT

Predicting self-interacting proteins (SIPs) is an important task in bioinformatics. In SIPs, two or more identical proteins expressed from one gene can interact with each other. This plays a major role in the evolution of protein-protein interactions (PPIs) and cellular functions. Owing to the limitations of the experimental identification of self-interacting proteins, it is increasingly important to develop a useful tool for the prediction of SIPs from protein sequence information. Therefore, we propose a novel prediction model called RP-FFT that merges the Random Projection (RP) model and Fast Fourier Transform (FFT) for detecting SIPs. First, each protein sequence was transformed into a Position Specific Scoring Matrix (PSSM) using Position Specific Iterated BLAST (PSI-BLAST). Second, the features of protein sequences were extracted by applying the FFT method to the PSSM. Lastly, we evaluated the performance of RP-FFT and compared the RP classifier with the state-of-the-art support vector machine (SVM) classifier and other existing methods on the human and yeast datasets; under five-fold cross-validation, the RP-FFT model obtained high average accuracies of 96.28% and 91.87% on the human and yeast datasets, respectively. The experimental results demonstrated that our RP-FFT prediction model is reasonable and robust.


Subjects
Fourier Analysis ; Sequence Analysis, Protein/methods ; Support Vector Machine ; Animals ; Binding Sites ; Humans ; Protein Binding ; Saccharomyces cerevisiae Proteins/chemistry
10.
IEEE J Biomed Health Inform ; 27(9): 4611-4622, 2023 09.
Article in English | MEDLINE | ID: mdl-37368803

ABSTRACT

The abuse of traditional antibiotics has led to increased resistance of bacteria and viruses. Efficient therapeutic peptide prediction is critical for peptide drug discovery. However, most of the existing methods only make effective predictions for one class of therapeutic peptides. It is worth noting that currently no predictive method considers sequence length information as a distinct feature of therapeutic peptides. In this article, a novel deep learning approach with matrix factorization for predicting therapeutic peptides (DeepTPpred) that integrates length information is proposed. The matrix factorization layer can learn the latent features of the encoded sequence through a mechanism of first compression and then restoration. The length features of the therapeutic peptide sequences are embedded with the encoded amino acid sequences. To automatically learn therapeutic peptide predictions, these latent features are input into neural networks with a self-attention mechanism. On eight therapeutic peptide datasets, DeepTPpred achieved excellent prediction results. Based on these datasets, we first integrated the eight datasets to obtain a full therapeutic peptide integration dataset. Then, we obtained two functional integration datasets based on the functional similarity of the peptides. Finally, we also conducted experiments on the latest versions of the ACP and CPP datasets. Overall, the experimental results show that our work is effective for the identification of therapeutic peptides.


Subjects
Deep Learning ; Humans ; Peptides/chemistry ; Neural Networks, Computer ; Drug Discovery
11.
Mol Ther Nucleic Acids ; 32: 721-728, 2023 Jun 13.
Article in English | MEDLINE | ID: mdl-37251691

ABSTRACT

Identifying proteins that interact with drug compounds has been recognized as an important part of the drug discovery process. Despite extensive efforts invested in predicting compound-protein interactions (CPIs), existing traditional methods still face several challenges. Computer-aided methods can identify high-quality CPI candidates instantaneously. In this research, a novel model named GraphCPIs is proposed to improve CPI prediction accuracy. First, we establish the adjacency matrix of entities connected to both drugs and proteins from the collected dataset. Then, the feature representations of nodes are obtained using a graph convolutional network and the GraRep embedding model. Finally, an extreme gradient boosting (XGBoost) classifier is used to identify potential CPIs based on the two kinds of stacked features. The results demonstrate that GraphCPIs achieves the best performance, with an average predictive accuracy of 90.09%, an average area under the receiver operating characteristic curve of 0.9572, and an average area under the precision-recall curve of 0.9621. Moreover, comparative experiments reveal that our method surpasses state-of-the-art approaches in accuracy and other indicators in the same experimental environment. We believe that the GraphCPIs model will provide valuable insight for discovering novel candidate drug-related proteins.

12.
IEEE/ACM Trans Comput Biol Bioinform ; 20(5): 2610-2618, 2023.
Article in English | MEDLINE | ID: mdl-35675235

ABSTRACT

Accumulating evidence shows that circular RNAs (circRNAs) play an important role in regulating gene expression and are involved in many complex human diseases. Identifying associations of circRNAs with diseases helps to understand the pathogenesis, treatment and diagnosis of complex diseases. Since inferring circRNA-disease associations by biological experiments is costly and time-consuming, there is an urgent need to develop a computational model to identify these associations. In this paper, we propose a novel method named KNN-NMF, which combines K nearest neighbors with nonnegative matrix factorization to infer associations between circRNAs and diseases. First, we compute the Gaussian Interaction Profile (GIP) kernel similarity of circRNAs and diseases, and the semantic similarity of diseases. Then, new circRNA-disease interaction profiles are established using weighted K nearest neighbors to reduce the impact of false negative associations on prediction performance. Finally, Nonnegative Matrix Factorization is implemented to predict associations of circRNAs with diseases. The experimental results indicate that KNN-NMF outperforms the competing methods under five-fold cross-validation. Moreover, case studies of two common diseases further show that KNN-NMF can identify potential circRNA-disease associations effectively.
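
The Gaussian Interaction Profile kernel named in this abstract can be illustrated on a small association matrix. A minimal sketch, assuming the commonly used form K(i, j) = exp(-gamma * ||IP(i) - IP(j)||^2) with gamma set to `gamma_prime` divided by the mean squared norm of the profiles: the abstract does not state the bandwidth choice, so that normalization, the function name `gip_kernel`, and the toy matrix are all assumptions.

```python
import math


def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel between rows of a 0/1 matrix.

    Each row is the interaction profile of one entity (for example one
    circRNA across all diseases). The bandwidth is normalized by the mean
    squared norm of the profiles (an assumed, commonly used convention).
    """
    norms = [sum(v * v for v in row) for row in profiles]
    mean_norm = sum(norms) / len(norms)
    if mean_norm == 0:
        raise ValueError("all interaction profiles are empty")
    gamma = gamma_prime / mean_norm
    n = len(profiles)
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            kernel[i][j] = math.exp(-gamma * sq_dist)
    return kernel


# Toy association matrix: 3 entities (rows) by 3 partners (columns).
toy_profiles = [
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
]
kernel = gip_kernel(toy_profiles)
for row in kernel:
    print([round(value, 4) for value in row])
```

Applying the same function to the transposed matrix would give the kernel for the other entity type.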

13.
IEEE/ACM Trans Comput Biol Bioinform ; 19(6): 3663-3672, 2022.
Article in English | MEDLINE | ID: mdl-34699364

ABSTRACT

The abuse of traditional antibiotics has led to an increase in the resistance of bacteria and viruses. Similar to the function of antibacterial peptides, bacteriocins are more common as a kind of peptides produced by bacteria that have bactericidal or bacterial effects. More importantly, the marine environment is one of the most abundant resources for extracting marine microbial bacteriocins (MMBs). Identifying bacteriocins from marine microorganisms is a common goal for the development of new drugs. Effective use of MMBs will greatly alleviate the current antibiotic abuse problem. In this work, deep learning is used to identify meaningful MMBs. We propose a random multi-scale convolutional neural network method. In the scale setting, we set a random model to update the scale value randomly. The scale selection method can reduce the contingency caused by artificial setting under certain conditions, thereby making the method more extensive. The results show that the classification performance of the proposed method is better than the state-of-the-art classification methods. In addition, some potential MMBs are predicted, and some different sequence analyses are performed on these candidates. It is worth mentioning that after sequence analysis, the HNH endonucleases of different marine bacteria are considered as potential bacteriocins.


Subjects
Bacteria ; Bacteriocins ; Drug Discovery ; Neural Networks, Computer ; Anti-Bacterial Agents/chemistry ; Bacteria/chemistry ; Bacteriocins/chemistry ; Bacteriocins/classification ; Peptides ; Drug Discovery/methods ; Aquatic Organisms/chemistry ; Sequence Analysis, DNA
14.
Biology (Basel) ; 11(5)2022 May 13.
Article in English | MEDLINE | ID: mdl-35625468

ABSTRACT

The key to new drug discovery and development is first and foremost the search for molecular targets of drugs, thus advancing drug discovery and drug repositioning. However, traditional identification of drug-target interactions (DTIs) is a costly, lengthy, high-risk, and low-success-rate systems project. Therefore, more and more pharmaceutical companies are trying to use computational technologies to screen existing drug molecules and mine new drugs, thereby accelerating new drug development. In the current study, we designed a deep learning computational model, MSPEDTI, based on Molecular Structure and Protein Evolutionary information to predict potential DTIs. The model first fuses protein evolutionary information and drug structure information, then uses a deep learning convolutional neural network (CNN) to mine its hidden features, and finally predicts the associated DTIs with an extreme learning machine (ELM). In cross-validation experiments, MSPEDTI achieved 94.19%, 90.95%, 87.95%, and 86.11% prediction accuracy on the gold-standard datasets of enzymes, ion channels, G-protein-coupled receptors (GPCRs), and nuclear receptors, respectively. MSPEDTI showed competitive ability in ablation experiments and in comparison with previous excellent methods. Additionally, 7 of 10 potential DTIs predicted by MSPEDTI were substantiated by the classical database. These outcomes demonstrate the ability of MSPEDTI to provide reliable drug candidate targets and to strongly facilitate drug repositioning and drug development.

15.
IEEE/ACM Trans Comput Biol Bioinform ; 19(6): 3144-3153, 2022.
Article in English | MEDLINE | ID: mdl-34882561

ABSTRACT

Discovery of transcription factor binding sites (TFBSs) is of primary importance for understanding the underlying binding mechanism and gene regulation process. Growing evidence indicates that, apart from the primary DNA sequences, the DNA shape landscape has a significant influence on transcription factor binding preference. To effectively model the co-influence of sequence and shape features, we emphasize the importance of position information of sequence motif and shape pattern. In this paper, we propose a novel deep learning-based architecture, named hybridShape eDeepCNN, for TFBS prediction which integrates DNA sequence and shape information in a spatially aligned manner. Our model utilizes the power of the multi-layer convolutional neural network and constructs an independent subnetwork to adapt to the distinct data distribution of heterogeneous features. Besides, we explore the usage of continuous embedding vectors as the representation of DNA sequences. Based on the experiments on 20 in-vitro datasets derived from universal protein binding microarrays (uPBMs), we demonstrate the superiority of our proposed method and validate the underlying design logic.


Subjects
DNA-Binding Proteins ; Transcription Factors ; Protein Binding ; Transcription Factors/metabolism ; Binding Sites/genetics ; DNA-Binding Proteins/metabolism ; DNA/chemistry
16.
Cancers (Basel) ; 13(9)2021 Apr 27.
Article in English | MEDLINE | ID: mdl-33925568

ABSTRACT

Identification of drug-target interactions (DTIs) is a significant step in the drug discovery or repositioning process. Compared with the time-consuming and labor-intensive in vivo experimental methods, computational models can provide high-quality DTI candidates in an instant. In this study, we propose a novel method called LGDTI to predict DTIs based on large-scale graph representation learning. LGDTI can capture the local and global structural information of the graph. Specifically, the first-order neighbor information of nodes can be aggregated by the graph convolutional network (GCN); on the other hand, the high-order neighbor information of nodes can be learned by the graph embedding method called DeepWalk. Finally, the two kinds of features are fed into the random forest classifier to train and predict potential DTIs. The results show that our method obtained an area under the receiver operating characteristic curve (AUROC) of 0.9455 and an area under the precision-recall curve (AUPR) of 0.9491 under 5-fold cross-validation. Moreover, we compare the presented method with some existing state-of-the-art methods. These results imply that LGDTI can efficiently and robustly capture undiscovered DTIs. Moreover, the proposed model is expected to bring new inspiration and provide novel perspectives to relevant researchers.
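
DeepWalk, mentioned above as the source of high-order neighbor information, starts by sampling truncated random walks over the graph and then trains a skip-gram model on them. A minimal sketch of the walk-sampling stage only, assuming an unweighted graph stored as adjacency lists: the function name `random_walks`, its parameters, and the toy drug-target graph are hypothetical, and the skip-gram training step is omitted.

```python
import random


def random_walks(adjacency, walk_length, walks_per_node, seed=0):
    """Sample uniform truncated random walks, DeepWalk style.

    adjacency maps each node to a list of neighbours. Every pass shuffles
    the node order and starts one walk from each node.
    """
    rng = random.Random(seed)
    nodes = sorted(adjacency)
    walks = []
    for _ in range(walks_per_node):
        rng.shuffle(nodes)
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbours = adjacency[walk[-1]]
                if not neighbours:
                    break  # dead end: stop this walk early
                walk.append(rng.choice(neighbours))
            walks.append(walk)
    return walks


# Hypothetical toy drug-target graph (undirected, stored in both directions).
toy_graph = {
    "drug_a": ["target_x", "target_y"],
    "drug_b": ["target_y"],
    "target_x": ["drug_a"],
    "target_y": ["drug_a", "drug_b"],
}
walks = random_walks(toy_graph, walk_length=5, walks_per_node=2)
for walk in walks[:3]:
    print(" -> ".join(walk))
```

In a full pipeline the walks would be treated as sentences and the nodes as words, so that nodes sharing walk contexts receive similar vectors.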

17.
ACS Omega ; 5(28): 17022-17032, 2020 Jul 21.
Article in English | MEDLINE | ID: mdl-32715187

ABSTRACT

Analysis of miRNA-target mRNA interaction (MTI) is of crucial significance in discovering new target candidates for miRNAs. However, the biological experiments for identifying MTIs have a high false positive rate and are high-priced, time-consuming, and arduous. It is an urgent task to develop effective computational approaches to enhance the investigation of miRNA-target mRNA relationships. In this study, a novel method called MIPDH is developed for miRNA-mRNA interaction prediction by using DeepWalk on a heterogeneous network. More specifically, MIPDH extracts two kinds of features: a biological behavior feature, learned using a network embedding algorithm on a heterogeneous network constructed from 17 kinds of associations among drug, disease, and 6 kinds of biomolecules, and an attribute feature, learned using the k-mer method on sequences of miRNAs and target mRNAs. Then, a random forest classifier is trained on features that combine the biological behavior feature and the attribute feature. In a 5-fold cross-validation experiment, MIPDH achieved an average accuracy, sensitivity, specificity and AUC of 75.85%, 74.37%, 77.33%, and 0.8044, respectively. To further evaluate the performance of MIPDH, other classifiers and feature descriptors were used for comparison, and MIPDH achieved better performance. Additionally, case studies on hsa-miR-106b-5p, hsa-let-7d-5p, and hsa-let-7e-5p were also implemented. As a result, 14, 9, and 9 out of the top 15 targets that interacted with these miRNAs were verified using the experimental literature or other databases. All these prediction results indicate that MIPDH is an effective method for predicting miRNA-target mRNA interactions.
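
The k-mer method named above turns a sequence into a fixed-length attribute vector. A minimal sketch, assuming normalized 3-mer frequencies over the RNA alphabet: the abstract does not give the value of k or the normalization, so both are assumptions, and the function name `kmer_frequencies` and the toy sequence are illustrative.

```python
from itertools import product

RNA_ALPHABET = "ACGU"


def kmer_frequencies(sequence, k=3):
    """Return normalized k-mer frequencies in lexicographic k-mer order.

    Windows containing characters outside the alphabet are skipped.
    DNA-style input is accepted by mapping T to U.
    """
    kmers = ["".join(p) for p in product(RNA_ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    sequence = sequence.upper().replace("T", "U")
    total = 0
    for i in range(len(sequence) - k + 1):
        window = sequence[i:i + k]
        if window in counts:
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[kmer] / total for kmer in kmers]


# Toy RNA string (illustrative only, not a real miRNA).
toy_sequence = "AUGGCUAUGGC"
features = kmer_frequencies(toy_sequence, k=3)
print(len(features), round(sum(features), 6))
```

With k=3 the vector has 4^3 = 64 entries regardless of sequence length, which is what allows sequences of different lengths to share one classifier input format.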

18.
Commun Biol ; 3(1): 118, 2020 03 13.
Article in English | MEDLINE | ID: mdl-32170157

ABSTRACT

Abundant life activities are maintained by various biomolecule relationships in human cells. However, many previous computational models focus only on isolated objects, without considering that the cell is a complete entity with ample functions. Inspired by holism, we constructed a Molecular Associations Network (MAN) including 9 kinds of relationships among 5 types of biomolecules, and a prediction model called MAN-GF. More specifically, biomolecules can be represented as vectors by an algorithm called biomarker2vec, which combines 2 kinds of information: the attributes learned by k-mer, etc., and the behavior learned by Graph Factorization (GF). Then, a Random Forest classifier is applied for training, validation and testing. MAN-GF obtained a substantial performance with an AUC of 0.9647 and an AUPR of 0.9521 under 5-fold cross-validation. The results imply that MAN-GF, with its overall perspective, can act as an aid in practice. Besides, it holds promise for providing new insight into the regulatory mechanisms.


Subjects
Colonic Neoplasms/metabolism , Computational Biology/methods , MicroRNAs/metabolism , Models, Biological , Protein Interaction Maps , Proteins/metabolism , RNA, Long Noncoding/metabolism , Algorithms , Area Under Curve , Data Accuracy , Data Mining/methods , Humans , ROC Curve , Sensitivity and Specificity
19.
Gigascience ; 9(6)2020 06 01.
Article in English | MEDLINE | ID: mdl-32533701

ABSTRACT

BACKGROUND: The explosive growth of genomic, chemical, and pathological data provides new opportunities and challenges for humans to thoroughly understand life activities in cells. However, there exist few computational models that aggregate various bioentities to comprehensively reveal the physical and functional landscape of biological systems. RESULTS: We constructed a molecular association network, which contains 18 edges (relationships) between 8 nodes (bioentities). Based on this, we propose Bioentity2vec, a new method for representing bioentities, which integrates information about the attributes and behaviors of a bioentity. Applying the random forest classifier, we achieved promising performance on 18 relationships, with an area under the curve of 0.9608 and an area under the precision-recall curve of 0.9572. CONCLUSIONS: Our study shows that constructing a network with rich topological and biological information is important for systematic understanding of the biological landscape at the molecular level. Our results show that Bioentity2vec can effectively represent biological entities and provides easily distinguishable information about classification tasks. Our method is also able to simultaneously predict relationships between single types and multiple types, which will accelerate progress in biological experimental research and industrial product development.
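
The two metrics reported above can be computed from classifier scores as in the following sketch. The area under the ROC curve is computed as the fraction of positive/negative pairs ranked correctly, and the area under the precision-recall curve is estimated by average precision, which is one common estimator and not necessarily the one the authors used. The labels and scores are made-up toy values.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties = 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values at the rank of each positive example."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive example")
    return total / hits


# Toy predictions (hypothetical): 1 = known association, 0 = negative sample.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.1]
print(roc_auc(toy_labels, toy_scores))            # 0.75
print(average_precision(toy_labels, toy_scores))  # (1 + 2/3) / 2
```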


Subjects
Algorithms , Computational Biology/methods , Software , Systems Biology/methods , Gene Expression Profiling/methods , ROC Curve
20.
Article in English | MEDLINE | ID: mdl-32582646

ABSTRACT

Predicting drug-target interactions (DTIs) is crucial in innovative drug discovery, drug repositioning, and other fields. However, predicting DTIs with traditional biological experimental methods has many shortcomings, such as high cost, long time consumption, and low efficiency, which make these methods difficult to apply widely. As a supplement, in silico methods can provide helpful information for predicting DTIs in a timely manner. In this work, a deep walk embedding method is developed for predicting DTIs from a multi-molecular network. More specifically, a multi-molecular network, also called a molecular associations network, is constructed by integrating the associations among drugs, proteins, diseases, lncRNAs, and miRNAs. Then, each node is represented as a behavior feature vector by using a deep walk embedding method. Finally, we compared behavior features with traditional attribute features on an integrated dataset by using various classifiers. The experimental results revealed that the behavior feature performed better across different classifiers, especially with the random forest classifier. It is also demonstrated that the use of behavior information is very helpful for addressing the problem of sequences containing both self-interacting and non-interacting pairs of proteins. This work is not only well suited to predicting DTIs but also provides a new perspective for predicting associations among other biomolecules.
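
The first stage of a deep walk embedding, generating truncated random walks over the network, can be sketched as follows. The node labels, the edge list, and the parameters `walks_per_node` and `walk_length` are hypothetical. The second stage, in which the walks are treated as sentences and fed to a skip-gram model to obtain the behavior feature vectors, is deliberately omitted.

```python
import random


def build_adjacency(edges):
    """Build an undirected adjacency list from (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj


def random_walks(adj, walks_per_node=2, walk_length=5, seed=0):
    """Generate uniform truncated random walks starting from every node."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    walks = []
    for _ in range(walks_per_node):
        rng.shuffle(nodes)  # visit start nodes in a fresh order each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                # Every node in adj has at least one neighbor by construction.
                walk.append(rng.choice(adj[walk[-1]]))
            walks.append(walk)
    return walks


# Toy multi-molecular network (hypothetical node names).
toy_network = [
    ("drug:D1", "protein:P1"),
    ("protein:P1", "disease:X1"),
    ("miRNA:M1", "protein:P1"),
    ("lncRNA:L1", "miRNA:M1"),
]
adjacency = build_adjacency(toy_network)
walks = random_walks(adjacency)
print(walks[0])
```

Because the walks cross node types freely, a drug and a protein that share many network neighbors tend to appear in similar walk contexts, which is the intuition behind using such embeddings as behavior features.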
