1.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38271483

ABSTRACT

The advent of single-cell sequencing technologies has revolutionized cell biology studies. However, integrative analyses of diverse single-cell data face serious challenges, including technological noise, sample heterogeneity, and different modalities and species. To address these problems, we propose scCorrector, a variational autoencoder-based model that can integrate single-cell data from different studies and map them into a common space. Specifically, we designed a Study Specific Adaptive Normalization for each study in the decoder to implement these features. scCorrector achieves competitive and robust performance compared with state-of-the-art methods and brings novel insights under various circumstances (e.g. various batches, multi-omics, cross-species, and development stages). In addition, the integration of single-cell data and spatial data makes it possible to transfer information between different studies, which greatly expands the narrow range of genes covered by MERFISH technology. In summary, scCorrector can efficiently integrate multi-study single-cell datasets, thereby providing broad opportunities to tackle challenges emerging from noisy resources.

3.
BMC Bioinformatics ; 24(1): 481, 2023 Dec 16.
Article in English | MEDLINE | ID: mdl-38104057

ABSTRACT

BACKGROUND: The rapid emergence of single-cell RNA-seq (scRNA-seq) data presents remarkable opportunities for broad investigations through integration analyses. However, most integration models are black boxes that lack interpretability or are hard to train. RESULTS: To address these issues, we propose scInterpreter, a deep learning-based interpretable model. scInterpreter substantially outperforms other state-of-the-art (SOTA) models on multiple benchmark datasets. In addition, scInterpreter is extensible and can integrate and annotate atlas scRNA-seq data. We evaluated the robustness of scInterpreter in a variety of situations. Through comparison experiments, we found that with a knowledge prior, the training process can be significantly accelerated. Finally, we conducted interpretability analysis for each dimension (pathway) of cell representation in the embedding space. CONCLUSIONS: The results showed that the cell representations obtained by scInterpreter are full of biological significance. Through weight sorting, we found several new genes related to pathways in the PBMC dataset. In general, scInterpreter is an effective and interpretable integration tool. It is expected that scInterpreter will bring great convenience to the study of single-cell transcriptomics.


Subjects
Mononuclear Leukocytes , Single-Cell Gene Expression Analysis , RNA Sequence Analysis/methods , Mononuclear Leukocytes/metabolism , Single-Cell Analysis/methods , Gene Expression Profiling/methods , Cluster Analysis
4.
Biomass Convers Biorefin ; : 1-14, 2023 Mar 18.
Article in English | MEDLINE | ID: mdl-37363205

ABSTRACT

Effective on-site treatment of medical waste has become a weak link in hospitals. Pyrolysis technology is a treatment method for medical waste that can enable rapid disposal in hospital settings and relieve environmental pressure, while also producing high-value products and reducing disposal costs. In this work, the effects of feedstock ratio and temperature on product yield and components of gauze (GA) and medical bottles (MB) co-pyrolysis have been investigated. A higher yield of solid products was obtained by co-pyrolysis of GA and MB at 400 °C. With the addition of MB and an increase in temperature for the co-pyrolysis of GA and MB in a similar ratio, the pyrolysis oil and gas yields gradually increased. According to GC-MS analysis, co-feeding 75% MB to GA improved the alcohol content from 33.21% to a maximum yield of 59.8% at a pyrolysis temperature of 700 °C. The content of aliphatic hydrocarbon reached 38.68% when the pyrolysis temperature and MB addition ratio were 700 °C and 75%, respectively. The GC data show that the main gas components of co-pyrolysis of GA/MB were CH4 and H2, while the pyrolysis of pure GA or MB resulted in CO or CO2. Additionally, the solid carbon products obtained have an excellent pore structure. This strategy can benefit medical waste control and resource utilization for the low-cost disposal of medical waste and the acquisition of high-value resource products.

5.
Mol Ther Nucleic Acids ; 32: 721-728, 2023 Jun 13.
Article in English | MEDLINE | ID: mdl-37251691

ABSTRACT

Identifying proteins that interact with drug compounds has been recognized as an important part of the drug discovery process. Despite extensive efforts that have been invested in predicting compound-protein interactions (CPIs), existing traditional methods still face several challenges. Computer-aided methods can identify high-quality CPI candidates instantaneously. In this research, a novel model named GraphCPIs is proposed to improve CPI prediction accuracy. First, we establish the adjacency matrix of entities connected to both drugs and proteins from the collected dataset. Then, the feature representation of nodes is obtained by using the graph convolutional network and the GraRep embedding model. Finally, an extreme gradient boosting (XGBoost) classifier is used to identify potential CPIs based on the two kinds of stacked features. The results demonstrate that GraphCPIs achieves the best performance, with an average predictive accuracy of 90.09%, an average area under the receiver operating characteristic curve of 0.9572, and an average area under the precision-recall curve of 0.9621. Moreover, comparative experiments reveal that our method surpasses the state-of-the-art approaches in accuracy and other indicators under the same experimental environment. We believe that the GraphCPIs model will provide valuable insight for discovering novel candidate drug-related proteins.
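
The area under the receiver operating characteristic curve reported in the GraphCPIs abstract above can be read as the probability that a randomly chosen interacting pair receives a higher score than a randomly chosen non-interacting pair. The following is a minimal sketch of that pairwise definition, not the authors' evaluation code; `auroc` is a hypothetical helper name, and the labels and scores are made-up illustrative values.

```python
def auroc(labels, scores):
    """AUROC by pairwise comparison: P(score_pos > score_neg), ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Illustrative values only (not data from the paper).
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
print(auroc(labels, scores))
```

The double loop is quadratic in the number of examples; it is chosen here for readability, and a rank-based formulation would be preferable for large datasets.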

6.
Plants (Basel) ; 12(5)2023 Feb 22.
Article in English | MEDLINE | ID: mdl-36903855

ABSTRACT

The AP2/ERF gene family is one of the most conserved and important transcription factor families mainly occurring in plants with various functions in regulating plant biological and physiological processes. However, little comprehensive research has been conducted on the AP2/ERF gene family in Rhododendron (specifically, Rhododendron simsii), an important ornamental plant. The existing whole-genome sequence of Rhododendron provided data to investigate the AP2/ERF genes in Rhododendron on a genome-wide scale. A total of 120 Rhododendron AP2/ERF genes were identified. The phylogenetic analysis showed that RsAP2 genes were classified into five main subfamilies, AP2, ERF, DREB, RAV and soloist. Cis-acting elements involving plant growth regulators, response to abiotic stress and MYB binding sites were detected in the upstream sequences of RsAP2 genes. A heatmap of RsAP2 gene expression levels showed that these genes had different expression patterns in the five developmental stages of Rhododendron flowers. Twenty RsAP2 genes were selected for quantitative RT-PCR experiments to clarify the expression level changes under cold, salt and drought stress treatments, and the results showed that most of the RsAP2 genes responded to these abiotic stresses. This study generated comprehensive information on the RsAP2 gene family and provides a theoretical basis for future genetic improvement.

7.
Article in English | MEDLINE | ID: mdl-35389869

ABSTRACT

DNA-binding proteins (DBPs) play vital roles in the regulation of biological systems. Although there are already many deep learning methods for predicting the sequence specificities of DBPs, they face two challenges. Classic deep learning methods for DBP prediction usually fail to capture the dependencies between genomic sequences since their commonly used one-hot codes are mutually orthogonal. Besides, these methods usually perform poorly when samples are inadequate. To address these two challenges, we developed a novel language model for mining DBPs using human genomic data and ChIP-seq datasets with decaying learning rates, named DNA Fine-tuned Language Model (DFLM). It can capture the dependencies between genome sequences based on the context of human genomic data and then fine-tune the features of DBP tasks using different ChIP-seq datasets. First, we compared DFLM with the existing widely used methods on 69 datasets and achieved excellent performance. Moreover, we conducted comparative experiments on complex DBPs and small datasets. The results show that DFLM still achieved a significant improvement. Finally, through visualization analysis of one-hot encoding and DFLM, we found that one-hot encoding completely cuts off the dependencies of DNA sequences themselves, while DFLM, using language models, can represent the dependency of DNA sequences well. Source code is available at: https://github.com/Deep-Bioinfo/DFLM.


Subjects
Algorithms , DNA-Binding Proteins , Humans , Genomics , DNA/genetics , Genome
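
The DFLM abstract above states that one-hot codes are mutually orthogonal. A small sketch can make that concrete: the dot product between the codes of any two distinct nucleotides is zero, so the encoding by itself expresses no similarity between them. The helper names (`one_hot`, `dot`) are hypothetical and are not taken from the DFLM source code.

```python
BASES = "ACGT"


def one_hot(base):
    """Return the one-hot vector for a single nucleotide."""
    return [1 if base == b else 0 for b in BASES]


def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


# Gram matrix of all pairwise dot products between nucleotide codes.
gram = {(a, b): dot(one_hot(a), one_hot(b)) for a in BASES for b in BASES}

for a in BASES:
    print(a, [gram[(a, b)] for b in BASES])
```

The printed matrix is the identity: each code matches only itself, which is the sense in which the abstract says one-hot encoding cuts off dependencies.
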
8.
PLoS Comput Biol ; 18(3): e1009941, 2022 03.
Article in English | MEDLINE | ID: mdl-35263332

ABSTRACT

Transcription factors (TFs) play an important role in regulating gene expression; thus, the identification of the sites bound by them has become a fundamental step in molecular and cellular biology. In this paper, we developed a deep learning framework leveraging existing fully convolutional neural networks (FCN) to predict TF-DNA binding signals at the base-resolution level (named FCNsignal). The proposed FCNsignal can simultaneously achieve the following tasks: (i) modeling the base-resolution signals of binding regions; (ii) discriminating binding or non-binding regions; (iii) locating TF-DNA binding regions; (iv) predicting binding motifs. Besides, FCNsignal can also be used to predict opening regions across the whole genome. The experimental results on 53 TF ChIP-seq datasets and 6 chromatin accessibility ATAC-seq datasets show that our proposed framework outperforms some existing state-of-the-art methods. In addition, we explored using the trained FCNsignal to locate all potential TF-DNA binding regions on a whole chromosome and predict DNA sequences of arbitrary length, and the results show that our framework can find most of the known binding regions and accept sequences of arbitrary length. Furthermore, we demonstrated the potential ability of our framework to discover causal disease-associated single-nucleotide polymorphisms (SNPs) through a series of experiments.


Subjects
Deep Learning , Binding Sites , Chromatin Immunoprecipitation Sequencing , Protein Binding , Transcription Factors/metabolism
9.
BMC Bioinformatics ; 22(Suppl 5): 622, 2022 Mar 22.
Article in English | MEDLINE | ID: mdl-35317723

ABSTRACT

BACKGROUND: lncRNAs play a critical role in numerous biological processes and life activities, especially diseases. Considering that traditional wet experiments for identifying uncovered lncRNA-disease associations are limited in terms of time consumption and labor cost, it is imperative to construct reliable and efficient computational models as a complement in practice. Deep learning technologies have been shown to make impressive contributions in many areas, but their feasibility in bioinformatics has not been adequately verified. RESULTS: In this paper, a machine learning-based model called LDACE was proposed to predict potential lncRNA-disease associations by combining Extreme Learning Machine (ELM) and Convolutional Neural Network (CNN). Specifically, the representation vectors are constructed by integrating multiple types of biological information including functional similarity and semantic similarity. Then, CNN is applied to mine both local and global features. Finally, ELM is chosen to carry out the prediction task to detect the potential lncRNA-disease associations. The proposed method achieved an area under the receiver operating characteristic curve of 0.9086 in leave-one-out cross-validation and 0.8994 in fivefold cross-validation. In addition, two kinds of case studies based on lung cancer and endometrial cancer indicate the robustness and efficiency of LDACE even in a real environment. CONCLUSIONS: Substantial results demonstrated that the proposed model is expected to be an auxiliary tool to guide and assist biomedical research, and the close integration of deep learning and biological big data will provide life sciences with novel insights.


Subjects
Long Noncoding RNA , Computational Biology/methods , Machine Learning , Computer Neural Networks , Long Noncoding RNA/genetics , ROC Curve
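
The LDACE abstract above reports results under both leave-one-out and fivefold cross-validation. The two schemes differ only in how many folds the samples are divided into, which the sketch below illustrates. It assumes contiguous, unshuffled folds for simplicity; the function name `k_fold_splits` is hypothetical, and the paper's actual splitting procedure is not described in the abstract.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds take one additional sample each.
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# Fivefold cross-validation on 10 samples: each fold holds out 2 samples.
five_fold = list(k_fold_splits(10, 5))

# Leave-one-out cross-validation is the special case k = n_samples.
leave_one_out = list(k_fold_splits(10, 10))

print(five_fold[0])
print(leave_one_out[0])
```

In practice the sample order would usually be shuffled before splitting so that folds are not tied to the order in which associations were collected.
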
10.
Mol Biol Rep ; 49(4): 2641-2653, 2022 Apr.
Article in English | MEDLINE | ID: mdl-35059966

ABSTRACT

BACKGROUND: Rhododendron is an important woody ornamental plant, and breeding varieties with different colors is a key research goal. Although there have been a few reports on the molecular mechanisms of flower colors and color patterning in Rhododendron, it is still largely unknown what factors regulate flower pigmentation in Rhododendron. METHODS AND RESULTS: In this study, the flower color variation cultivar 'Yanzhi Mi' and the wild-type (WT) cultivar 'Dayuanyangjin' were used as research objects, and the pigments and transcriptomes of their petals during five flower development stages were analyzed and compared. The results showed that derivatives of cyanidin, peonidin and pelargonidin might be responsible for the pink color of mutant petals and that the S2 stage was the key stage of flower color formation. In total, 412,910 transcripts and 2780 differentially expressed genes (DEGs) were identified in pairwise comparisons of WT and mutant petals. GO and KEGG enrichment analyses of the DEGs showed that 'DNA-binding transcription factor activity', 'Flavonoid biosynthesis' and 'Phenylpropanoid biosynthesis' were more active in mutant petals. Early anthocyanin pathway candidate DEGs (CHS3-CHS6, CHI, F3Hs and F3'H) were significantly correlated and were more highly expressed in mutant petals than in WT petals in the S2 stage. An R2R3-MYB unigene (TRINITY_DN55156_c1_g2) was upregulated approximately 10.5-fold in 'Yanzhi Mi' petals relative to 'Dayuanyangjin' petals in the S2 stage, and an R2R3-MYB unigene (TRINITY_DN59015_c3_g2) that was significantly downregulated in 'Yanzhi Mi' petals in the S2 stage was found to be closely related to Tca MYB112 in cacao. CONCLUSIONS: Taken together, the results of the present study could shed light on the molecular basis of anthocyanin biosynthesis in two Rhododendron obtusum cultivars and may provide a genetic resource for breeding varieties with different flower colors.


Subjects
Rhododendron , Flowers/genetics , Flowers/metabolism , Gene Expression Profiling , Pigmentation/genetics , Plant Breeding , Rhododendron/genetics
11.
Sci Total Environ ; 821: 153336, 2022 May 15.
Article in English | MEDLINE | ID: mdl-35077791

ABSTRACT

During dust storms, mineral particles are frequently observed to mix with anthropogenic pollutants (APs) and form mixing particles, which exert more complex influences on regional climate than unmixed mineral particles. Even though the mixing particle formation mechanism has received significant attention recently, most studies have focused on the heterogeneous reaction of inorganic APs on a single mineral composition. Here, the heterogeneous reaction mechanism of amine (a proxy of organic APs) with sulfuric acid (SA) on kaolinite (Kao, a proxy of mineral dust), and its contribution to mixing particle formation are investigated under variable atmospheric conditions. Two heterogeneous reactions, Kao-SA-amine and Kao-H2O-SA-amine, in the absence and presence of water, respectively, were comparatively investigated using combined theoretical and experimental methods. The contribution of these two heterogeneous reactions to mixing particle formation was evaluated, exploring the effect of methyl groups (1-3 -CH3), relative humidity (RH) (11-100%) and temperature (220-298.15 K). Water was observed to play a significant role in promoting the heterogeneous reaction of amines with SA on the Kao surface, reducing the formation energy of mixing particles containing ammonium salt converted by SA. Moreover, the promotion effect of water is enhanced with increasing RH and decreasing temperature. For methylamine and dimethylamine containing 1-2 -CH3, the heterogeneous reaction of Kao-H2O-SA-amine contributes more to mixing particle formation. However, for trimethylamine containing 3 -CH3, the heterogeneous reaction of Kao-SA-amine is the dominant source of mixing particle formation. For mixing particles generated from the above two heterogeneous reactions, ammonium salts are expected to be the predominant components; these are strongly hygroscopic and further lead to a significant influence on climate by altering the radiative forcing of mixed particles and participating in cloud condensation nuclei and ice nuclei.


Subjects
Amines , Atmosphere , Clay , Minerals , Sulfuric Acids
12.
Front Plant Sci ; 12: 751771, 2021.
Article in English | MEDLINE | ID: mdl-34868137

ABSTRACT

Cryptomeria fortunei Hooibrenk is an important fast-growing coniferous timber species that is widely used in landscaping. Recently, research on timber quality has gained substantial attention in the field of tree breeding. Wood is the secondary xylem formed by the continuous inward division and differentiation of the vascular cambium; therefore, the development of the vascular cambium is particularly important for wood quality. In this study, we analyzed the transcriptomes of the cambial zone in C. fortunei during different developmental stages using Illumina HiSeq sequencing, focusing on general transcriptome and microRNA (miRNA) data. We performed functional annotation of the differentially expressed genes (DEGs) in the different stages identified by transcriptome sequencing and generated 15 miRNA libraries yielding 4.73 Gb of clean reads. The most common length of the filtered miRNAs was 21nt, accounting for 33.1% of the total filtered reads. We annotated a total of 32 known miRNA families. Some miRNAs played roles in hormone signal transduction (miR159, miR160, and miR166), growth and development (miR166 and miR396), and the coercion response (miR394 and miR395), and degradome sequencing showed potential cleavage sites between miRNAs and target genes. Differential expression of miRNAs and target genes and functional validation of the obtained transcriptome and miRNA data provide a theoretical basis for further elucidating the molecular mechanisms of cellular growth and differentiation, as well as wood formation in the vascular cambium, which will help improve the wood quality of C. fortunei.

13.
Cancers (Basel) ; 13(9)2021 Apr 27.
Article in English | MEDLINE | ID: mdl-33925568

ABSTRACT

Identification of drug-target interactions (DTIs) is a significant step in the drug discovery or repositioning process. Compared with time-consuming and labor-intensive in vivo experimental methods, computational models can provide high-quality DTI candidates in an instant. In this study, we propose a novel method called LGDTI to predict DTIs based on large-scale graph representation learning. LGDTI can capture the local and global structural information of the graph. Specifically, the first-order neighbor information of nodes can be aggregated by the graph convolutional network (GCN); on the other hand, the high-order neighbor information of nodes can be learned by the graph embedding method called DeepWalk. Finally, the two kinds of features are fed into the random forest classifier to train and predict potential DTIs. The results show that our method obtained an area under the receiver operating characteristic curve (AUROC) of 0.9455 and an area under the precision-recall curve (AUPR) of 0.9491 under 5-fold cross-validation. Moreover, we compare the presented method with some existing state-of-the-art methods. These results imply that LGDTI can efficiently and robustly capture undiscovered DTIs. Moreover, the proposed model is expected to bring new inspiration and provide novel perspectives to relevant researchers.
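
The LGDTI abstract above relies on DeepWalk to learn high-order neighbor information. DeepWalk works in two stages: it samples truncated uniform random walks over the graph, then treats each walk as a sentence for a skip-gram model. The sketch below covers only the first stage. The toy graph, node names, and parameter values are hypothetical and are not taken from the paper.

```python
import random


def random_walks(adjacency, walk_length, walks_per_node, seed=0):
    """Sample truncated uniform random walks starting from every node."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        nodes = list(adjacency)
        rng.shuffle(nodes)  # visit start nodes in a fresh order each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adjacency[walk[-1]]
                if not neighbors:
                    break  # dead end: stop this walk early
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks


# Toy undirected drug-target graph (hypothetical node names).
graph = {
    "drug_A": ["target_1", "target_2"],
    "drug_B": ["target_2"],
    "target_1": ["drug_A"],
    "target_2": ["drug_A", "drug_B"],
}

walks = random_walks(graph, walk_length=5, walks_per_node=2)
print(walks[0])
```

Nodes that co-occur within a short window of the same walk end up with similar embeddings after the skip-gram stage, which is how walks longer than one step carry information beyond direct neighbors.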

14.
Brief Bioinform ; 22(2): 2085-2095, 2021 03 22.
Article in English | MEDLINE | ID: mdl-32232320

ABSTRACT

Effectively representing Medical Subject Headings (MeSH) terms such as disease and drug as discriminative vectors could greatly improve the performance of downstream computational prediction models. However, these terms are often abstract and difficult to quantify. In this paper, we converted the MeSH tree structure into a relationship network and applied several graph embedding algorithms to it to represent these terms. Specifically, the relationship network consists of nodes (MeSH headings) and edges (relationships) and can be constructed from the tree numbers. Then, five graph embedding algorithms including DeepWalk, LINE, SDNE, LAP and HOPE were implemented on the relationship network to represent MeSH headings as vectors. In order to evaluate the performance of the proposed methods, we carried out node classification and relationship prediction tasks. The results show that the MeSH headings characterized by graph embedding algorithms can not only be treated as an independent carrier for representation, but can also be utilized as additional information to enhance the representation ability of vectors. Thus, they can serve as an input and continue to play a significant role in any computational models related to disease, drug, microbe, etc. Besides, our method has the potential to inspire relevant researchers to study the representation of terms from this network perspective.


Subjects
Algorithms , Medical Subject Headings , Computer Simulation , Drug Delivery Systems , Genetic Predisposition to Disease , Humans , MicroRNAs/genetics , Semantics
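
The MeSH abstract above says the relationship network is constructed from tree numbers but does not spell out the procedure. One plausible reading, shown here purely as an assumption, is that each dot-separated tree number is linked to its parent, obtained by dropping the last segment. The helper names and the sample tree numbers are illustrative only; for simplicity the sketch uses tree numbers as nodes, whereas a single MeSH heading may carry several tree numbers.

```python
def parent_of(tree_number):
    """Return the parent tree number, or None for a top-level entry."""
    head, sep, _ = tree_number.rpartition(".")
    return head if sep else None


def build_edges(tree_numbers):
    """Link every tree number to its parent when the parent is also present."""
    known = set(tree_numbers)
    edges = []
    for number in sorted(known):
        parent = parent_of(number)
        if parent in known:
            edges.append((parent, number))
    return edges


# Illustrative tree numbers in the dot-separated MeSH style.
tree_numbers = ["C04", "C04.557", "C04.557.337", "C04.588", "D27"]
edges = build_edges(tree_numbers)
print(edges)
```

The resulting edge list can be passed to any graph embedding method; entries whose parent is absent from the input, such as the top-level ones here, simply contribute no edge.
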
15.
IEEE/ACM Trans Comput Biol Bioinform ; 18(6): 2546-2554, 2021.
Article in English | MEDLINE | ID: mdl-32070992

ABSTRACT

A key aim of post-genomic biomedical research is to systematically understand molecules and their interactions in human cells. Multiple biomolecules coordinate to sustain life activities, and interactions between various biomolecules are interconnected. However, existing studies usually focus only on associations between two or very limited types of molecules. In this study, we propose a network representation learning based computational framework, MAN-SDNE, to predict any intermolecular associations. More specifically, we constructed a large-scale molecular association network of multiple biomolecules in humans by integrating associations among long non-coding RNA, microRNA, protein, drug, and disease, containing 6,528 molecular nodes and 9 kinds of associations (105,546 in total). Then, the feature of each node is represented by its network proximity and attribute features. Furthermore, these features are used to train a Random Forest classifier to predict intermolecular associations. MAN-SDNE achieves a remarkable performance with an AUC of 0.9552 and an AUPR of 0.9338 under five-fold cross-validation. To indicate the ability to predict specific types of interactions, a case study for predicting lncRNA-protein interactions using MAN-SDNE is also executed. Experimental results demonstrate that this work offers a systematic insight for understanding the synergistic associations between molecules and complex diseases and provides a network-based computational tool to systematically explore intermolecular interactions.


Subjects
Biological Models , Systems Biology/methods , Computer Simulation , Humans , MicroRNAs/genetics , MicroRNAs/metabolism , Pharmaceutical Preparations/metabolism , Long Noncoding RNA/genetics , Long Noncoding RNA/metabolism
16.
Comput Struct Biotechnol J ; 18: 2391-2400, 2020.
Article in English | MEDLINE | ID: mdl-33005302

ABSTRACT

Benefiting from advances in high-throughput experimental techniques, important regulatory roles of miRNAs, lncRNAs, and proteins, as well as biological property information, are gradually being complemented. As the key data support to promote biomedical research, domain knowledge such as intermolecular relationships that are increasingly revealed by molecular genome-wide analysis is often used to guide the discovery of potential associations. However, the method of performing network representation learning from the perspective of the global biological network is scarce. These methods cover a very limited type of molecular associations and are therefore not suitable for more comprehensive analysis of molecular network representation information. In this study, we propose a computational model based on the Biological network for predicting potential associations between miRNAs and diseases called iMDA-BN. The iMDA-BN has three significant advantages: I) It uses a new method to describe disease and miRNA characteristics which analyzes node representation information for disease and miRNA from the perspective of biological networks. II) It can predict unproven associations even if miRNAs and diseases do not appear in the biological network. III) Accurate description of miRNA characteristics from biological properties based on high-throughput sequence information. The iMDA-BN predictor achieves an AUC of 0.9145 and an accuracy of 84.49% on the miRNA-disease association baseline dataset, and it can also achieve an AUC of 0.8765 and an accuracy of 80.96% when predicting unknown diseases and miRNAs in the biological network. Compared to existing miRNA-disease association prediction methods, iMDA-BN has higher accuracy and the advantage of predicting unknown associations. In addition, 45, 49, and 49 of the top 50 miRNA-disease associations with the highest predicted scores were confirmed in the case studies, respectively.

17.
J Transl Med ; 18(1): 347, 2020 09 07.
Article in English | MEDLINE | ID: mdl-32894154

ABSTRACT

BACKGROUND: The prediction of potential drug-target interactions (DTIs) not only provides a better comprehension of biological processes but also is critical for identifying new drugs. However, because traditional experiments are expensive and highly time-consuming, only a small fraction of the interactions between drugs and targets in the database have been verified experimentally. Therefore, it is meaningful and important to develop new computational methods with good performance for DTI prediction. At present, many existing computational methods only utilize a single type of interaction between drugs and proteins without paying attention to the associations and influences with other types of molecules. METHODS: In this work, we developed a novel network embedding-based heterogeneous information integration model to predict potential drug-target interactions. Firstly, a heterogeneous multi-molecular information network is built by combining the known associations among protein, drug, lncRNA, disease, and miRNA. Secondly, the Large-scale Information Network Embedding (LINE) model is used to learn behavior information (associations with other nodes) of drugs and proteins in the network. Hence, the known drug-protein interaction pairs can be represented as a combination of their own attribute information (e.g. protein sequence information and drug molecular fingerprints) and behavior information. Thirdly, the Random Forest classifier is used for training and prediction. RESULTS: Under five-fold cross-validation, our method obtained 85.83% prediction accuracy and 80.47% sensitivity, with an AUC of 92.33%. Moreover, in the case studies of three common drugs, 8 (Caffeine), 7 (Clozapine) and 6 (Pioglitazone) of the top 10 candidate targets were verified to be associated with the corresponding drugs. CONCLUSIONS: In short, these results indicate that our method can be a powerful tool for predicting potential drug-target interactions and finding unknown targets for certain drugs or unknown drugs for certain targets.


Subjects
MicroRNAs , Pharmaceutical Preparations , Long Noncoding RNA , Algorithms , Amino Acid Sequence , Proteins
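
The drug-target abstract above reports accuracy and sensitivity side by side. Both follow directly from the counts of a binary confusion matrix, as the sketch below shows; specificity is included because it is the matching quantity for the negative class. The function name `binary_metrics` is hypothetical, and the labels are made-up values, not results from the paper.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, sensitivity and specificity from binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative label")
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # share of true interactions recovered
        "specificity": tn / (tn + fp),  # share of non-interactions rejected
    }


# Illustrative labels only: 1 = interacting pair, 0 = non-interacting pair.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Unlike the AUC, these three values depend on the decision threshold used to turn classifier scores into 0/1 predictions.
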
18.
ACS Omega ; 5(28): 17022-17032, 2020 Jul 21.
Article in English | MEDLINE | ID: mdl-32715187

ABSTRACT

Analysis of miRNA-target mRNA interaction (MTI) is of crucial significance in discovering new target candidates for miRNAs. However, the biological experiments for identifying MTIs have a high false positive rate and are expensive, time-consuming, and arduous. It is an urgent task to develop effective computational approaches to enhance the investigation of miRNA-target mRNA relationships. In this study, a novel method called MIPDH is developed for miRNA-mRNA interaction prediction by using DeepWalk on a heterogeneous network. More specifically, MIPDH extracts two kinds of features: a biological behavior feature is learned using a network embedding algorithm on a constructed heterogeneous network derived from 17 kinds of associations among drug, disease, and 6 kinds of biomolecules, and an attribute feature is learned using the k-mer method on sequences of miRNAs and target mRNAs. Then, a random forest classifier is trained on the combination of the biological behavior feature and the attribute feature. In a 5-fold cross-validation experiment, MIPDH achieved an average accuracy, sensitivity, specificity and AUC of 75.85%, 74.37%, 77.33%, and 0.8044, respectively. To further evaluate the performance of MIPDH, other classifiers and feature descriptors are evaluated for comparison, and MIPDH achieves better performance. Additionally, case studies on hsa-miR-106b-5p, hsa-let-7d-5p, and hsa-let-7e-5p are also implemented. As a result, 14, 9, and 9 out of the top 15 targets that interacted with these miRNAs were verified using the experimental literature or other databases. All these prediction results indicate that MIPDH is an effective method for predicting miRNA-target mRNA interactions.
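
The MIPDH abstract above derives its attribute feature with the k-mer method. In its common form, this slides a window of length k along a sequence and records how often each possible k-mer occurs. The sketch below assumes normalized frequencies over the RNA alphabet; the value of k, the normalization, the function name `kmer_frequencies`, and the example sequence are all assumptions, since the abstract does not give these details.

```python
from itertools import product


def kmer_frequencies(sequence, k=3, alphabet="ACGU"):
    """Return normalized k-mer frequencies in a fixed vocabulary order."""
    vocab = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(vocab, 0)
    total = 0
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if kmer in counts:  # skip windows containing unexpected symbols
            counts[kmer] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in vocab]


# Made-up RNA fragment; with k=2 the feature vector has 4**2 = 16 entries.
features = kmer_frequencies("AUGGCUAUGG", k=2)
print(len(features), features)
```

Because the vocabulary order is fixed, sequences of different lengths map to vectors of the same dimension, which is what allows them to be concatenated with network embedding features before classification.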

19.
Gigascience ; 9(6)2020 06 01.
Article in English | MEDLINE | ID: mdl-32533701

ABSTRACT

BACKGROUND: The explosive growth of genomic, chemical, and pathological data provides new opportunities and challenges for humans to thoroughly understand life activities in cells. However, there exist few computational models that aggregate various bioentities to comprehensively reveal the physical and functional landscape of biological systems. RESULTS: We constructed a molecular association network, which contains 18 edges (relationships) between 8 nodes (bioentities). Based on this, we propose Bioentity2vec, a new method for representing bioentities, which integrates information about the attributes and behaviors of a bioentity. Applying the random forest classifier, we achieved promising performance on 18 relationships, with an area under the curve of 0.9608 and an area under the precision-recall curve of 0.9572. CONCLUSIONS: Our study shows that constructing a network with rich topological and biological information is important for systematic understanding of the biological landscape at the molecular level. Our results show that Bioentity2vec can effectively represent biological entities and provides easily distinguishable information about classification tasks. Our method is also able to simultaneously predict relationships between single types and multiple types, which will accelerate progress in biological experimental research and industrial product development.


Subjects
Algorithms , Computational Biology/methods , Software , Systems Biology/methods , Gene Expression Profiling/methods , ROC Curve
20.
Article in English | MEDLINE | ID: mdl-32582646

ABSTRACT

Predicting drug-target interactions (DTIs) is crucial in innovative drug discovery, drug repositioning and other fields. However, predicting DTIs using traditional biological experimental methods has many shortcomings, such as high cost, long time consumption, and low efficiency, which make these methods difficult to apply widely. As a supplement, the in silico method can provide helpful information for predictions of DTIs in a timely manner. In this work, a deep walk embedding method is developed for predicting DTIs from a multi-molecular network. More specifically, a multi-molecular network, also called a molecular associations network, is constructed by integrating the associations among drug, protein, disease, lncRNA, and miRNA. Then, each node can be represented as a behavior feature vector by using a deep walk embedding method. Finally, we compared behavior features with traditional attribute features on an integrated dataset by using various classifiers. The experimental results revealed that the behavior feature performed better on different classifiers, especially on the random forest classifier. It is also demonstrated that the use of behavior information is very helpful for addressing the problem of sequences containing both self-interacting and non-interacting pairs of proteins. This work is not only well suited to predicting DTIs, but also provides a new perspective for the prediction of associations among other biomolecules.
