Results 1 - 20 of 23
1.
Brief Bioinform ; 25(2), 2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38271483

ABSTRACT

The advent of single-cell sequencing technologies has revolutionized cell biology studies. However, integrative analyses of diverse single-cell data face serious challenges, including technological noise, sample heterogeneity, and different modalities and species. To address these problems, we propose scCorrector, a variational autoencoder-based model that can integrate single-cell data from different studies and map them into a common space. Specifically, we designed a Study Specific Adaptive Normalization for each study in the decoder to implement these features. scCorrector achieves competitive and robust performance compared with state-of-the-art methods and brings novel insights under various circumstances (e.g. various batches, multi-omics, cross-species, and development stages). In addition, the integration of single-cell data and spatial data makes it possible to transfer information between different studies, which greatly expands the narrow range of genes covered by MERFISH technology. In summary, scCorrector can efficiently integrate multi-study single-cell datasets, thereby providing broad opportunities to tackle challenges emerging from noisy resources.

2.
BMC Bioinformatics ; 24(1): 481, 2023 Dec 16.
Article in English | MEDLINE | ID: mdl-38104057

ABSTRACT

BACKGROUND: The rapid emergence of single-cell RNA-seq (scRNA-seq) data presents remarkable opportunities for broad investigations through integration analyses. However, most integration models are black boxes that lack interpretability or are hard to train. RESULTS: To address these issues, we propose scInterpreter, a deep learning-based interpretable model. scInterpreter substantially outperforms other state-of-the-art (SOTA) models on multiple benchmark datasets. In addition, scInterpreter is extensible and can integrate and annotate atlas scRNA-seq data. We evaluated the robustness of scInterpreter in a variety of situations. Through comparison experiments, we found that with a knowledge prior, the training process can be significantly accelerated. Finally, we conducted interpretability analysis for each dimension (pathway) of the cell representation in the embedding space. CONCLUSIONS: The results showed that the cell representations obtained by scInterpreter are biologically meaningful. Through weight sorting, we found several new genes related to pathways in the PBMC dataset. In general, scInterpreter is an effective and interpretable integration tool. It is expected that scInterpreter will bring great convenience to the study of single-cell transcriptomics.


Subjects
Leukocytes, Mononuclear , Single-Cell Gene Expression Analysis , Sequence Analysis, RNA/methods , Leukocytes, Mononuclear/metabolism , Single-Cell Analysis/methods , Gene Expression Profiling/methods , Cluster Analysis
3.
Brief Bioinform ; 22(2): 2085-2095, 2021 03 22.
Article in English | MEDLINE | ID: mdl-32232320

ABSTRACT

Effectively representing Medical Subject Headings (MeSH) terms such as diseases and drugs as discriminative vectors could greatly improve the performance of downstream computational prediction models. However, these terms are often abstract and difficult to quantify. In this paper, we converted the MeSH tree structure into a relationship network and applied several graph embedding algorithms to it to represent these terms. Specifically, the relationship network consists of nodes (MeSH headings) and edges (relationships) and can be constructed from the tree numbers. Then, five graph embedding algorithms, namely DeepWalk, LINE, SDNE, LAP and HOPE, were implemented on the relationship network to represent MeSH headings as vectors. To evaluate the performance of the proposed methods, we carried out node classification and relationship prediction tasks. The results show that the MeSH headings characterized by graph embedding algorithms can not only be treated as an independent carrier for representation, but can also be utilized as additional information to enhance the representation ability of vectors. Thus, they can serve as an input and play a significant role in computational models related to disease, drug, microbe, etc. In addition, our method may inspire relevant researchers to study the representation of terms from this network perspective.


Subjects
Algorithms , Medical Subject Headings , Computer Simulation , Drug Delivery Systems , Genetic Predisposition to Disease , Humans , MicroRNAs/genetics , Semantics
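
Editorial illustration for record 3: the abstract says the relationship network is built from MeSH tree numbers but gives no procedure. The following is a minimal sketch, assuming that an edge links a heading to the heading whose tree number is obtained by dropping the last dot-separated segment. The function name `build_mesh_edges` and the toy input are hypothetical, and the tree numbers shown are for illustration only.

```python
from typing import Dict, List, Set, Tuple


def build_mesh_edges(tree_numbers: Dict[str, List[str]]) -> Set[Tuple[str, str]]:
    """Return (parent_heading, child_heading) edges implied by tree numbers."""
    # Map every tree number back to the heading that owns it.
    owner = {num: heading for heading, nums in tree_numbers.items() for num in nums}
    edges = set()
    for heading, nums in tree_numbers.items():
        for num in nums:
            if "." not in num:
                continue  # top-level number: no parent to link to
            # Assumption: the parent is the tree number minus its last segment.
            parent_num = num.rsplit(".", 1)[0]
            parent = owner.get(parent_num)
            if parent is not None and parent != heading:
                edges.add((parent, heading))
    return edges


# Toy input; headings and numbers are illustrative, not taken from the paper.
toy_tree = {
    "Neoplasms": ["C04"],
    "Neoplasms by Site": ["C04.588"],
    "Breast Neoplasms": ["C04.588.180"],
}
mesh_edges = build_mesh_edges(toy_tree)
```

The resulting edge set could then be passed to any graph embedding method; which library the authors used is not stated in the abstract.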
4.
BMC Bioinformatics ; 22(Suppl 5): 622, 2022 Mar 22.
Article in English | MEDLINE | ID: mdl-35317723

ABSTRACT

BACKGROUND: lncRNAs play a critical role in numerous biological processes and life activities, especially diseases. Because traditional wet experiments for identifying unknown lncRNA-disease associations are limited by time consumption and labor cost, it is imperative to construct reliable and efficient computational models as a complement to practice. Deep learning technologies have been shown to make impressive contributions in many areas, but their feasibility in bioinformatics has not been adequately verified. RESULTS: In this paper, a machine learning-based model called LDACE is proposed to predict potential lncRNA-disease associations by combining an Extreme Learning Machine (ELM) and a Convolutional Neural Network (CNN). Specifically, the representation vectors are constructed by integrating multiple types of biological information, including functional similarity and semantic similarity. Then, the CNN is applied to mine both local and global features. Finally, the ELM is used to carry out the prediction task and detect potential lncRNA-disease associations. The proposed method achieved an area under the receiver operating characteristic curve of 0.9086 in leave-one-out cross-validation and 0.8994 in fivefold cross-validation. In addition, 2 kinds of case studies, based on lung cancer and endometrial cancer, indicate the robustness and efficiency of LDACE in a real-world setting. CONCLUSIONS: The results suggest that the proposed model can serve as an auxiliary tool to guide and assist biomedical research, and that the close integration of deep learning and biological big data will provide the life sciences with novel insights.


Subjects
RNA, Long Noncoding , Computational Biology/methods , Machine Learning , Neural Networks, Computer , RNA, Long Noncoding/genetics , ROC Curve
5.
PLoS Comput Biol ; 16(5): e1007872, 2020 05.
Article in English | MEDLINE | ID: mdl-32421715

ABSTRACT

Recent research has found that tumor cell invasion, proliferation, and other biological processes are controlled by circular RNA (circRNA). Understanding the association between circRNAs and diseases is an important way to explore the pathogenesis of complex diseases and promote disease-targeted therapy. Most methods based on the analysis of high-throughput expression data, such as k-mer and PSSM, tend to assume that functionally similar nucleic acids lack direct linear homology, disregard positional information, and quantify only nonlinear sequence relationships. However, in many complex diseases, the nonlinear sequence relationship between pathogenic and ordinary nucleic acids differs little. Therefore, analyzing positional information can help to predict the complex associations between circRNAs and diseases. To fill this gap, we propose a new method, named iCDA-CGR, to predict circRNA-disease associations. In particular, for the first time in a circRNA-disease prediction model, we introduce circRNA sequence information and quantify the nonlinear sequence relationship of circRNAs using Chaos Game Representation (CGR) technology based on biological sequence position information. In the cross-validation experiment, our method achieved an AUC of 0.8533, which was significantly higher than that of other existing methods. In validation on independent data sets including circ2Disease, circRNADisease and CRDD, the prediction accuracy of iCDA-CGR reached 95.18%, 90.64% and 95.89%, respectively. Moreover, in the case studies, 19 of the top 30 circRNA-disease associations predicted by iCDA-CGR on the circRDisease dataset were confirmed by newly published literature. These results demonstrate that iCDA-CGR has outstanding robustness and stability, and can provide highly credible candidates for biological experiments.


Subjects
Genetic Predisposition to Disease , RNA, Circular/genetics , Computational Biology/methods , Databases, Genetic , Humans , Nonlinear Dynamics
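
Editorial illustration for record 5: Chaos Game Representation maps a sequence to points in the unit square so that each point depends on both the current base and its position in the sequence. The sketch below shows the classic construction only. The corner assignment is a common convention assumed here, and the abstract does not say how iCDA-CGR turns the points into features.

```python
from typing import List, Tuple

# Assumed corner layout (a common convention, not taken from the paper).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "U": (1.0, 0.0)}


def cgr_points(sequence: str) -> List[Tuple[float, float]]:
    """Return the CGR trajectory of an RNA (or DNA) sequence."""
    x, y = 0.5, 0.5  # start at the centre of the unit square
    points = []
    for base in sequence.upper().replace("T", "U"):
        if base not in CORNERS:
            continue  # skip ambiguous symbols such as N
        cx, cy = CORNERS[base]
        # Move halfway from the current point towards the base's corner.
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


cgr_example = cgr_points("ACGU")
# -> [(0.25, 0.25), (0.125, 0.625), (0.5625, 0.8125), (0.78125, 0.40625)]
```

Because every step halves the distance to a corner, two sequences that share a long suffix end up close together, which is how positional information enters the representation.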
6.
BMC Bioinformatics ; 21(1): 60, 2020 Feb 18.
Article in English | MEDLINE | ID: mdl-32070279

ABSTRACT

BACKGROUND: The interactions between non-coding RNAs (ncRNA) and proteins play an essential role in many biological processes. Several high-throughput experimental methods have been applied to detect ncRNA-protein interactions. However, these methods are time-consuming and expensive. Accurate and efficient computational methods can assist and accelerate the study of ncRNA-protein interactions. RESULTS: In this work, we develop a stacking ensemble computational framework, RPI-SE, for effectively predicting ncRNA-protein interactions. More specifically, to fully exploit protein and RNA sequence features, a Position Weight Matrix combined with Legendre Moments is applied to obtain protein evolutionary information. Meanwhile, a k-mer sparse matrix is employed to extract efficient features of ncRNA sequences. Finally, an ensemble learning framework integrating different types of base classifiers is developed to predict ncRNA-protein interactions using these discriminative features. The accuracy and robustness of RPI-SE were evaluated on three benchmark data sets under five-fold cross-validation and compared with other state-of-the-art methods. CONCLUSIONS: The results demonstrate that RPI-SE is competent for the ncRNA-protein interaction prediction task, with high accuracy and robustness. It is anticipated that this work can provide a computational prediction tool to advance biomedical research related to ncRNA-protein interactions.


Subjects
RNA, Untranslated/metabolism , RNA-Binding Proteins/metabolism , Sequence Analysis, Protein/methods , Sequence Analysis, RNA/methods , Position-Specific Scoring Matrices , RNA, Untranslated/chemistry , RNA-Binding Proteins/chemistry
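
Editorial illustration for record 6: the abstract names a k-mer sparse matrix without defining it. One formulation, assumed here, is a 4^k by (L - k + 1) indicator matrix whose entry (row, column) is 1 when the k-mer indexed by the row starts at the sequence position given by the column. The sketch stores only the non-zero entries; the function name and defaults are hypothetical.

```python
from itertools import product
from typing import Dict, Tuple


def kmer_sparse_matrix(sequence: str, k: int = 3,
                       alphabet: str = "ACGU") -> Dict[Tuple[int, int], int]:
    """Sparse indicator matrix: {(kmer_index, start_position): 1}."""
    # Enumerate all k-mers in lexicographic order of the given alphabet.
    index = {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}
    entries = {}
    for pos in range(len(sequence) - k + 1):
        row = index.get(sequence[pos:pos + k])
        if row is not None:  # windows with unknown symbols are skipped
            entries[(row, pos)] = 1
    return entries


kmer_example = kmer_sparse_matrix("ACGUA", k=2)
# -> {(1, 0): 1, (6, 1): 1, (11, 2): 1, (12, 3): 1}
```

Unlike a plain k-mer frequency vector, this layout keeps the position at which each k-mer occurs. Any dimensionality reduction applied afterwards is outside the scope of this sketch.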
7.
J Transl Med ; 18(1): 347, 2020 09 07.
Article in English | MEDLINE | ID: mdl-32894154

ABSTRACT

BACKGROUND: The prediction of potential drug-target interactions (DTIs) not only provides a better comprehension of biological processes but is also critical for identifying new drugs. However, because traditional experiments are expensive and time-consuming, only a small fraction of the interactions between drugs and targets in the database have been verified experimentally. Therefore, it is meaningful and important to develop new computational methods with good performance for DTI prediction. At present, many existing computational methods utilize only a single type of interaction between drugs and proteins, without paying attention to associations with and influences of other types of molecules. METHODS: In this work, we developed a novel network embedding-based heterogeneous information integration model to predict potential drug-target interactions. Firstly, a heterogeneous multi-molecular information network is built by combining the known associations among protein, drug, lncRNA, disease, and miRNA. Secondly, the Large-scale Information Network Embedding (LINE) model is used to learn behavior information (associations with other nodes) of drugs and proteins in the network. Hence, the known drug-protein interaction pairs can be represented as a combination of their attribute information (e.g. protein sequence information and drug molecular fingerprints) and behavior information. Thirdly, the Random Forest classifier is used for training and prediction. RESULTS: Under five-fold cross-validation, our method obtained 85.83% prediction accuracy with 80.47% sensitivity and an AUC of 92.33%. Moreover, in the case studies of three common drugs, 8 (Caffeine), 7 (Clozapine) and 6 (Pioglitazone) of the top 10 candidate targets, respectively, were verified to be associated with the corresponding drugs. CONCLUSIONS: In short, these results indicate that our method can be a powerful tool for predicting potential drug-target interactions and finding unknown targets for certain drugs or unknown drugs for certain targets.


Subjects
MicroRNAs , Pharmaceutical Preparations , RNA, Long Noncoding , Algorithms , Amino Acid Sequence , Proteins
8.
Guang Pu Xue Yu Guang Pu Fen Xi ; 35(8): 2154-8, 2015 Aug.
Article in Chinese | MEDLINE | ID: mdl-26672284

ABSTRACT

To explore the feasibility of predicting soluble solid content (SSC) in sugarcane stalks using near-infrared hyperspectral imaging, two hundred and forty sugarcane stalks from three different varieties were studied. After the raw hyperspectral images of the sugarcane stalks were obtained, the spectral information and textural features were analyzed separately. Prediction models were established using the partial least squares regression (PLSR), principal components regression (PCR) and least squares support vector machine (LS-SVM) algorithms. In addition, three wavelength selection algorithms, namely the successive projections algorithm (SPA), interval partial least squares (iPLS) and uninformative variable elimination (UVE), were analyzed after the partial least squares regression model was built. The results indicate that the partial least squares regression model based on spectral features can be a stable model for predicting SSC: the correlation coefficients (R2) of the calibration and prediction sets are 0.879 and 0.843, and the root mean square errors of the calibration and prediction sets are 0.644 and 0.742, respectively. The 105 wavelengths selected by the UVE algorithm are effective spectral features. The R2 values of the calibration and prediction sets of the corresponding PLSR model are 0.860 and 0.813, and the root mean square errors of the calibration and prediction sets are 0.693 and 0.810, respectively.


Subjects
Saccharum/chemistry , Algorithms , Least-Squares Analysis , Principal Component Analysis , Spectroscopy, Near-Infrared , Support Vector Machine
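
Editorial illustration for record 8: the abstract reports R2 and root mean square error for calibration and prediction sets. The sketch below shows how the two statistics are commonly computed, assuming that R2 denotes the coefficient of determination. The abstract calls it a correlation coefficient, so the authors' exact definition may differ. The sample values are invented for the example.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error between measured and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Invented SSC values, only to exercise the functions.
measured = [18.0, 20.0, 22.0, 24.0]
predicted = [19.0, 19.0, 23.0, 23.0]
example_rmse = rmse(measured, predicted)      # 1.0
example_r2 = r_squared(measured, predicted)   # 0.8
```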
9.
Mol Ther Nucleic Acids ; 32: 721-728, 2023 Jun 13.
Article in English | MEDLINE | ID: mdl-37251691

ABSTRACT

Identifying proteins that interact with drug compounds is recognized as an important part of the drug discovery process. Despite extensive efforts invested in predicting compound-protein interactions (CPIs), existing traditional methods still face several challenges. Computer-aided methods can identify high-quality CPI candidates quickly. In this research, a novel model named GraphCPIs is proposed to improve CPI prediction accuracy. First, we establish the adjacency matrix of entities connected to both drugs and proteins from the collected dataset. Then, the feature representation of nodes is obtained using a graph convolutional network and the GraRep embedding model. Finally, an extreme gradient boosting (XGBoost) classifier is used to identify potential CPIs based on the two kinds of stacked features. The results demonstrate that GraphCPIs achieves the best performance, with an average predictive accuracy of 90.09%, an average area under the receiver operating characteristic curve of 0.9572, and an average area under the precision-recall curve of 0.9621. Moreover, comparative experiments reveal that our method surpasses state-of-the-art approaches in accuracy and other indicators in the same experimental environment. We believe that the GraphCPIs model will provide valuable insight for discovering novel candidate drug-related proteins.
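
Editorial illustration for record 9: the first step described above is building an adjacency matrix over the entities in the dataset. A minimal sketch, assuming an undirected, unweighted graph given as an edge list, is shown below. The node names are placeholders, and the actual node types and weighting used by GraphCPIs are not specified in the abstract.

```python
from typing import List, Sequence, Tuple


def adjacency_matrix(nodes: Sequence[str],
                     edges: Sequence[Tuple[str, str]]) -> List[List[int]]:
    """Symmetric 0/1 adjacency matrix for an undirected graph."""
    position = {node: i for i, node in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for u, v in edges:
        i, j = position[u], position[v]
        matrix[i][j] = 1
        matrix[j][i] = 1  # undirected: mirror the entry
    return matrix


# Placeholder entities, not taken from the paper's dataset.
example_nodes = ["drug_1", "protein_1", "disease_1"]
example_edges = [("drug_1", "protein_1"), ("drug_1", "disease_1")]
example_adj = adjacency_matrix(example_nodes, example_edges)
# -> [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
```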

10.
Article in English | MEDLINE | ID: mdl-35389869

ABSTRACT

DNA-binding proteins (DBPs) play vital roles in the regulation of biological systems. Although there are already many deep learning methods for predicting the sequence specificities of DBPs, they face two challenges. First, classic deep learning methods for DBP prediction usually fail to capture the dependencies between genomic sequences, since their commonly used one-hot codes are mutually orthogonal. Second, these methods usually perform poorly when samples are inadequate. To address these two challenges, we developed a novel language model for mining DBPs using human genomic data and ChIP-seq datasets with decaying learning rates, named DNA Fine-tuned Language Model (DFLM). It can capture the dependencies between genome sequences based on the context of human genomic data and then fine-tune the features for DBP tasks using different ChIP-seq datasets. First, we compared DFLM with existing widely used methods on 69 datasets and achieved excellent performance. Moreover, we conducted comparative experiments on complex DBPs and small datasets. The results show that DFLM still achieved a significant improvement. Finally, through visualization analysis of one-hot encoding and DFLM, we found that one-hot encoding completely cuts off the dependencies within DNA sequences, while DFLM, using language models, represents these dependencies well. Source code is available at: https://github.com/Deep-Bioinfo/DFLM.


Subjects
Algorithms , DNA-Binding Proteins , Humans , Genomics , DNA/genetics , Genome
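
Editorial illustration for record 10: the abstract's point that one-hot codes are mutually orthogonal can be checked directly. The sketch below uses the usual four-dimensional encoding of A, C, G and T (an assumption about the encoding meant) and shows that the dot product of any two different bases is zero, so the encoding itself carries no notion of similarity between symbols.

```python
# Standard one-hot codes for the four DNA bases (assumed encoding).
ONE_HOT = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
}


def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


# Dot product for every ordered pair of bases.
pairwise = {(a, b): dot(ONE_HOT[a], ONE_HOT[b]) for a in ONE_HOT for b in ONE_HOT}
# pairwise[("A", "A")] is 1, pairwise[("A", "C")] is 0, and so on.
```

A learned embedding, by contrast, can place related tokens near each other, which is the motivation the abstract gives for a language-model representation.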
11.
Cancers (Basel) ; 13(9), 2021 Apr 27.
Article in English | MEDLINE | ID: mdl-33925568

ABSTRACT

Identification of drug-target interactions (DTIs) is a significant step in the drug discovery or repositioning process. Compared with time-consuming and labor-intensive in vivo experimental methods, computational models can provide high-quality DTI candidates quickly. In this study, we propose a novel method called LGDTI to predict DTIs based on large-scale graph representation learning. LGDTI can capture the local and global structural information of the graph. Specifically, the first-order neighbor information of nodes is aggregated by a graph convolutional network (GCN), while the high-order neighbor information of nodes is learned by the graph embedding method DeepWalk. Finally, the two kinds of features are fed into a random forest classifier to train and predict potential DTIs. The results show that our method obtained an area under the receiver operating characteristic curve (AUROC) of 0.9455 and an area under the precision-recall curve (AUPR) of 0.9491 under 5-fold cross-validation. Moreover, we compare the presented method with some existing state-of-the-art methods. These results imply that LGDTI can efficiently and robustly capture undiscovered DTIs. The proposed model is also expected to bring new inspiration and provide novel perspectives to relevant researchers.
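
Editorial illustration for record 11: DeepWalk learns node vectors in two stages, first sampling truncated random walks and then training a skip-gram model on them. The sketch below covers only the walk-sampling stage, in pure Python. The graph, the parameter values and the function name are hypothetical, and the skip-gram stage is deliberately left out.

```python
import random
from typing import Dict, List


def random_walks(graph: Dict[str, List[str]], walk_length: int,
                 walks_per_node: int, seed: int = 0) -> List[List[str]]:
    """Sample uniform truncated random walks starting from every node."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        nodes = list(graph)
        rng.shuffle(nodes)  # visit start nodes in a fresh random order each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbours = graph[walk[-1]]
                if not neighbours:
                    break  # dead end: stop this walk early
                walk.append(rng.choice(neighbours))
            walks.append(walk)
    return walks


# Placeholder graph as an adjacency list.
toy_graph = {
    "drug_1": ["protein_1", "protein_2"],
    "protein_1": ["drug_1"],
    "protein_2": ["drug_1", "disease_1"],
    "disease_1": ["protein_2"],
}
walks = random_walks(toy_graph, walk_length=5, walks_per_node=2)
```

Each walk plays the role of a sentence and each node the role of a word. Because a walk can reach nodes several hops away, the resulting vectors reflect the high-order neighborhood mentioned in the abstract.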

12.
IEEE/ACM Trans Comput Biol Bioinform ; 18(6): 2546-2554, 2021.
Article in English | MEDLINE | ID: mdl-32070992

ABSTRACT

A key aim of post-genomic biomedical research is to systematically understand molecules and their interactions in human cells. Multiple biomolecules coordinate to sustain life activities, and interactions between various biomolecules are interconnected. However, existing studies usually focus only on associations between two or a very limited number of types of molecules. In this study, we propose MAN-SDNE, a computational framework based on network representation learning, to predict any intermolecular associations. More specifically, we constructed a large-scale molecular association network of multiple biomolecules in humans by integrating associations among long non-coding RNA, microRNA, protein, drug, and disease, containing 6,528 molecular nodes and 105,546 associations of 9 kinds. Then, each node is represented by its network proximity and attribute features. Furthermore, these features are used to train a Random Forest classifier to predict intermolecular associations. MAN-SDNE achieves remarkable performance, with an AUC of 0.9552 and an AUPR of 0.9338 under five-fold cross-validation. To demonstrate the ability to predict specific types of interactions, a case study predicting lncRNA-protein interactions using MAN-SDNE is also carried out. Experimental results demonstrate that this work offers systematic insight into the synergistic associations between molecules and complex diseases and provides a network-based computational tool to systematically explore intermolecular interactions.


Subjects
Models, Biological , Systems Biology/methods , Computer Simulation , Humans , MicroRNAs/genetics , MicroRNAs/metabolism , Pharmaceutical Preparations/metabolism , RNA, Long Noncoding/genetics , RNA, Long Noncoding/metabolism
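
Editorial illustration for record 12: several abstracts in this listing report results under five-fold cross-validation. The sketch below shows one simple way to produce the splits, assuming a single random shuffle and no stratification. The abstract does not say how the folds were drawn, and the function name is hypothetical.

```python
import random
from typing import List, Tuple


def k_fold_indices(n_samples: int, k: int = 5,
                   seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_indices, test_indices) pairs covering all samples."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


# 23 samples chosen only so that the folds are of unequal size.
splits = k_fold_indices(23, k=5)
```

Each sample is used for testing exactly once, and the reported AUC or accuracy is typically the average over the k test folds.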
13.
Mol Ther Nucleic Acids ; 19: 498-506, 2020 Mar 06.
Article in English | MEDLINE | ID: mdl-31923739

ABSTRACT

Detecting whether a pair of biomolecules associate is of great significance in the study of molecular biology. Hence, computational methods are urgently needed as guidance for practice. However, most previous prediction models, influenced by reductionism, focused on isolated research objects and have their own inherent defects. Inspired by holism, a machine-learning-based framework called MAN-node2vec is proposed to predict multi-type relationships in the molecular associations network (MAN). Specifically, we constructed a large-scale MAN composed of 1,023 miRNAs, 1,649 proteins, 769 long non-coding RNAs (lncRNAs), 1,025 drugs, and 2,062 diseases. Then, each biomolecule in the MAN is represented as a vector by its attribute, learned by k-mer, etc., and its behavior, learned by node2vec. Finally, a random forest classifier is applied to carry out the relationship prediction task. The proposed model achieved reliable performance, with an area under the curve (AUC) of 0.9677 and an area under the precision-recall curve (AUPR) of 0.9562 under 5-fold cross-validation. Also, additional experiments showed that the proposed global model is more competitive than the traditional local method. All of these provide systematic insight into the synergistic interactions between various molecules and diseases. It is anticipated that this work can bring beneficial inspiration and advances to related systems biology and biomedical research.

14.
Comput Struct Biotechnol J ; 18: 2391-2400, 2020.
Article in English | MEDLINE | ID: mdl-33005302

ABSTRACT

Benefiting from advances in high-throughput experimental techniques, information on the important regulatory roles of miRNAs, lncRNAs, and proteins, as well as on their biological properties, is gradually being completed. As key data support for biomedical research, domain knowledge such as the intermolecular relationships increasingly revealed by molecular genome-wide analysis is often used to guide the discovery of potential associations. However, methods that perform network representation learning from the perspective of the global biological network are scarce. Existing methods cover very limited types of molecular associations and are therefore not suitable for more comprehensive analysis of molecular network representation information. In this study, we propose iMDA-BN, a computational model based on the biological network for predicting potential associations between miRNAs and diseases. iMDA-BN has three significant advantages: I) It uses a new method to describe disease and miRNA characteristics, which analyzes node representation information for diseases and miRNAs from the perspective of biological networks. II) It can predict unproven associations even if the miRNAs and diseases do not appear in the biological network. III) It accurately describes miRNA characteristics from biological properties based on high-throughput sequence information. The iMDA-BN predictor achieves an AUC of 0.9145 and an accuracy of 84.49% on the miRNA-disease association baseline dataset, and it also achieves an AUC of 0.8765 and an accuracy of 80.96% when predicting diseases and miRNAs that are unknown to the biological network. Compared to existing miRNA-disease association prediction methods, iMDA-BN has higher accuracy and the advantage of predicting unknown associations. In addition, 45, 49, and 49 of the top 50 miRNA-disease associations with the highest predicted scores were confirmed in the respective case studies.

15.
Commun Biol ; 3(1): 118, 2020 03 13.
Article in English | MEDLINE | ID: mdl-32170157

ABSTRACT

Abundant life activities are maintained by various biomolecule relationships in human cells. However, many previous computational models focus only on isolated objects, without considering that the cell is a complete entity with ample functions. Inspired by holism, we constructed a Molecular Associations Network (MAN) including 9 kinds of relationships among 5 types of biomolecules, and a prediction model called MAN-GF. More specifically, biomolecules can be represented as vectors by an algorithm called biomarker2vec, which combines 2 kinds of information: the attribute, learned by k-mer, etc., and the behavior, learned by Graph Factorization (GF). Then, a Random Forest classifier is applied for training, validation and testing. MAN-GF achieved substantial performance, with an AUC of 0.9647 and an AUPR of 0.9521 under 5-fold cross-validation. The results imply that MAN-GF, with its overall perspective, can act as an aid to practice. In addition, it may provide new insight into the regulatory mechanisms.


Subjects
Colonic Neoplasms/metabolism , Computational Biology/methods , MicroRNAs/metabolism , Models, Biological , Protein Interaction Maps , Proteins/metabolism , RNA, Long Noncoding/metabolism , Algorithms , Area Under Curve , Data Accuracy , Data Mining/methods , Humans , ROC Curve , Sensitivity and Specificity
16.
Article in English | MEDLINE | ID: mdl-32582646

ABSTRACT

Predicting drug-target interactions (DTIs) is crucial in innovative drug discovery, drug repositioning and other fields. However, predicting DTIs using traditional biological experimental methods has many shortcomings, such as high cost, time consumption, and low efficiency, which make these methods difficult to apply widely. As a supplement, in silico methods can provide helpful information for predicting DTIs in a timely manner. In this work, a deep walk embedding method is developed for predicting DTIs from a multi-molecular network. More specifically, a multi-molecular network, also called a molecular associations network, is constructed by integrating the associations among drug, protein, disease, lncRNA, and miRNA. Then, each node is represented as a behavior feature vector using a deep walk embedding method. Finally, we compared behavior features with traditional attribute features on an integrated dataset using various classifiers. The experimental results revealed that the behavior features performed better with different classifiers, especially the random forest classifier. It is also demonstrated that the use of behavior information is very helpful for addressing the problem of sequences containing both self-interacting and non-interacting pairs of proteins. This work is not only well suited to predicting DTIs, but also provides a new perspective for predicting other biomolecular associations.

17.
iScience ; 23(7): 101261, 2020 Jul 24.
Article in English | MEDLINE | ID: mdl-32580123

ABSTRACT

Molecular components that are functionally interdependent in human cells constitute molecular association networks. Disease can be caused by disturbance of multiple molecular interactions. New biomolecular regulatory mechanisms can be revealed by discovering new biomolecular interactions. To this end, a heterogeneous molecular association network is formed by systematically integrating comprehensive associations between miRNAs, lncRNAs, circRNAs, mRNAs, proteins, drugs, microbes, and complex diseases. We propose a machine learning method for predicting intermolecular interactions, named MMI-Pred. More specifically, a network embedding model is developed to fully exploit the network behavior of biomolecules, and attribute features are also calculated. Then, these discriminative features are combined to train a random forest classifier to predict intermolecular interactions. MMI-Pred achieves an outstanding performance of 93.50% accuracy in hybrid association prediction under 5-fold cross-validation. This work provides a systematic landscape and a machine learning method to model and infer complex associations between various biological components.

18.
ACS Omega ; 5(28): 17022-17032, 2020 Jul 21.
Article in English | MEDLINE | ID: mdl-32715187

ABSTRACT

Analysis of miRNA-target mRNA interaction (MTI) is of crucial significance in discovering new target candidates for miRNAs. However, biological experiments for identifying MTIs have a high false positive rate and are expensive, time-consuming, and arduous. It is an urgent task to develop effective computational approaches to enhance the investigation of miRNA-target mRNA relationships. In this study, a novel method called MIPDH is developed for miRNA-mRNA interaction prediction using DeepWalk on a heterogeneous network. More specifically, MIPDH extracts two kinds of features: a biological behavior feature, learned using a network embedding algorithm on a heterogeneous network constructed from 17 kinds of associations among drug, disease, and 6 kinds of biomolecules, and an attribute feature, learned using the k-mer method on the sequences of miRNAs and target mRNAs. Then, a random forest classifier is trained on the combination of the biological behavior feature and the attribute feature. In a 5-fold cross-validation experiment, MIPDH achieved an average accuracy, sensitivity, specificity and AUC of 75.85%, 74.37%, 77.33%, and 0.8044, respectively. To further evaluate the performance of MIPDH, other classifiers and feature descriptors were tested for comparison, and MIPDH achieved better performance. Additionally, case studies on hsa-miR-106b-5p, hsa-let-7d-5p, and hsa-let-7e-5p were carried out. As a result, 14, 9, and 9 of the top 15 targets predicted to interact with these miRNAs, respectively, were verified using the experimental literature or other databases. All these prediction results indicate that MIPDH is an effective method for predicting miRNA-target mRNA interactions.
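
Editorial illustration for record 18: accuracy, sensitivity and specificity, as reported above, are all derived from the four counts of a binary confusion matrix. The sketch below uses the standard definitions. The counts are invented for the example and are unrelated to the paper's data.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard binary metrics from confusion-matrix counts."""
    return {
        # Fraction of all predictions that are correct.
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        # Fraction of true interactions that are recovered (recall).
        "sensitivity": tp / (tp + fn),
        # Fraction of non-interactions that are correctly rejected.
        "specificity": tn / (tn + fp),
    }


# Invented counts: 100 positive pairs and 100 negative pairs.
metrics = classification_metrics(tp=80, fp=30, tn=70, fn=20)
# accuracy 0.75, sensitivity 0.8, specificity 0.7
```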

19.
Gigascience ; 9(6), 2020 06 01.
Article in English | MEDLINE | ID: mdl-32533701

ABSTRACT

BACKGROUND: The explosive growth of genomic, chemical, and pathological data provides new opportunities and challenges for humans to thoroughly understand life activities in cells. However, there exist few computational models that aggregate various bioentities to comprehensively reveal the physical and functional landscape of biological systems. RESULTS: We constructed a molecular association network, which contains 18 edges (relationships) between 8 nodes (bioentities). Based on this, we propose Bioentity2vec, a new method for representing bioentities, which integrates information about the attributes and behaviors of a bioentity. Applying the random forest classifier, we achieved promising performance on 18 relationships, with an area under the curve of 0.9608 and an area under the precision-recall curve of 0.9572. CONCLUSIONS: Our study shows that constructing a network with rich topological and biological information is important for systematic understanding of the biological landscape at the molecular level. Our results show that Bioentity2vec can effectively represent biological entities and provides easily distinguishable information about classification tasks. Our method is also able to simultaneously predict relationships between single types and multiple types, which will accelerate progress in biological experimental research and industrial product development.


Subjects
Algorithms , Computational Biology/methods , Software , Systems Biology/methods , Gene Expression Profiling/methods , ROC Curve
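
Editorial illustration for record 19: the area under the ROC curve quoted in this and other abstracts equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below computes it by that pairwise definition. It is quadratic in the number of samples and meant for illustration only, and the labels and scores are invented.

```python
from typing import Sequence


def auc_score(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Invented labels (1 = known association) and classifier scores.
example_labels = [1, 1, 0, 1, 0, 0]
example_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
example_auc = auc_score(example_labels, example_scores)  # 8 of 9 pairs correct
```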
20.
Cells ; 8(8), 2019 08 09.
Article in English | MEDLINE | ID: mdl-31405040

ABSTRACT

One key issue in the post-genomic era is how to systematically describe the associations between small molecule transcripts or translations inside cells. With the rapid development of high-throughput "omics" technologies, the achieved ability to detect and characterize molecules with other molecule targets opens the possibility of investigating the relationships between different molecules from a global perspective. In this article, a molecular association network (MAN) is constructed and comprehensively analyzed by integrating the associations among miRNA, lncRNA, protein, drug, and disease, in which any kind of potential associations can be predicted. More specifically, each node in MAN can be represented as a vector by combining two kinds of information including the attribute of the node itself (e.g., sequences of ncRNAs and proteins, semantics of diseases and molecular fingerprints of drugs) and the behavior of the node in the complex network (associations with other nodes). A random forest classifier is trained to classify and predict new interactions or associations between biomolecules. In the experiment, the proposed method achieved a superb performance with an area under curve (AUC) of 0.9735 under a five-fold cross-validation, which showed that the proposed method could provide new insight for exploration of the molecular mechanisms of disease and valuable clues for disease treatment.


Subjects
Computational Biology , Disease/genetics , Gene Regulatory Networks , MicroRNAs/genetics , Proteins/genetics , RNA, Long Noncoding/genetics , Humans , Pharmaceutical Preparations