1.
J Cell Mol Med ; 28(7): e18180, 2024 04.
Article in English | MEDLINE | ID: mdl-38506066

ABSTRACT

Circular RNA (circRNA) is a common non-coding RNA that plays an important role in the diagnosis and therapy of human diseases. Predicting circRNA-disease associations with computational methods can provide a new route to better clinical diagnosis. In this article, we propose a novel ensemble learning method for circRNA-disease association prediction, named ELCDA. First, a heterogeneous association network is constructed by collecting multiple sources of information on circRNAs and diseases, with multiple similarity measures adopted. Then, metapath, matrix factorization and GraphSAGE-based models are used to extract node features from different views, and the final comprehensive features of circRNAs and diseases are obtained via ensemble learning. Finally, a soft voting ensemble strategy is used to integrate the predicted results of all classifiers. The performance of ELCDA is evaluated by fivefold cross-validation and compared with other state-of-the-art methods; the experimental results show that ELCDA outperforms the others. Furthermore, three common diseases are used as case studies, which also demonstrate that ELCDA is an effective method for predicting circRNA-disease associations.
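The soft voting step named in the abstract can be illustrated with a minimal sketch. It assumes each classifier outputs a positive-class probability per candidate pair and that the votes are averaged with equal weight; the function name, the 0.5 threshold and the example scores are hypothetical and are not taken from ELCDA.

```python
from typing import Dict, List


def soft_vote(prob_by_model: Dict[str, List[float]], threshold: float = 0.5) -> List[int]:
    """Average positive-class probabilities across classifiers, then threshold."""
    models = list(prob_by_model.values())
    n_samples = len(models[0])
    if any(len(p) != n_samples for p in models):
        raise ValueError("every classifier must score the same samples")
    # Equal-weight average per sample (an assumption; weights could differ).
    averaged = [sum(p[i] for p in models) / len(models) for i in range(n_samples)]
    return [1 if score >= threshold else 0 for score in averaged]


# Hypothetical scores for three candidate circRNA-disease pairs,
# one list per feature view mentioned in the abstract.
scores = {
    "metapath": [0.9, 0.4, 0.2],
    "matrix_factorization": [0.7, 0.6, 0.1],
    "graphsage": [0.8, 0.3, 0.3],
}
print(soft_vote(scores))  # [1, 0, 0]
```

Averaging probabilities rather than counting hard labels lets a confident classifier outweigh two marginal ones, which is the usual motivation for soft over hard voting.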


Subjects
Machine Learning , Circular RNA , Humans , Circular RNA/genetics , Computational Biology/methods
2.
Brief Bioinform ; 23(3)2022 05 13.
Article in English | MEDLINE | ID: mdl-35323901

ABSTRACT

MOTIVATION: MicroRNAs (miRNAs), as critical regulators, are involved in various fundamental and vital biological processes, and their abnormalities are closely related to human diseases. Predicting disease-related miRNAs is beneficial to uncovering new biomarkers for the prevention, detection, prognosis, diagnosis and treatment of complex diseases. RESULTS: In this study, we propose a multi-view Laplacian regularized deep factorization machine (DeepFM) model, MLRDFM, to predict novel miRNA-disease associations while improving the standard DeepFM. Specifically, MLRDFM improves DeepFM from two aspects: first, MLRDFM takes the relationships among items into consideration by regularizing their embedding features via their similarity-based Laplacians. In this study, miRNA Laplacian regularization integrates four types of miRNA similarity, while disease Laplacian regularization integrates two types of disease similarity. Second, to judiciously train our model, Laplacian eigenmaps are utilized to initialize the weights in the dense embedding layer. The experimental results on the latest HMDD v3.2 dataset show that MLRDFM improves the performance and reduces the overfitting phenomenon of DeepFM. Besides, MLRDFM is greatly superior to the state-of-the-art models in miRNA-disease association prediction in terms of different evaluation metrics with the 5-fold cross-validation. Furthermore, case studies further demonstrate the effectiveness of MLRDFM.
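A similarity-based Laplacian regularizer of the kind described above can be sketched as follows. The sketch assumes the unnormalized Laplacian L = D - S and an equal-weight sum over similarity views; how MLRDFM actually normalizes or weights its views is not stated in the abstract, and all function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def graph_laplacian(similarity: Matrix) -> Matrix:
    """Unnormalized Laplacian L = D - S of a symmetric similarity matrix S."""
    n = len(similarity)
    return [
        [(sum(similarity[i]) if i == j else 0.0) - similarity[i][j] for j in range(n)]
        for i in range(n)
    ]


def laplacian_penalty(embeddings: Matrix, similarity: Matrix) -> float:
    """tr(E^T L E); for symmetric S this equals 0.5 * sum_ij S_ij * ||e_i - e_j||^2."""
    lap = graph_laplacian(similarity)
    n, dim = len(embeddings), len(embeddings[0])
    return sum(
        embeddings[i][k] * lap[i][j] * embeddings[j][k]
        for i in range(n)
        for j in range(n)
        for k in range(dim)
    )


def multi_view_penalty(embeddings: Matrix, similarities: List[Matrix]) -> float:
    """Sum the penalty over several similarity views (equal weights assumed)."""
    return sum(laplacian_penalty(embeddings, s) for s in similarities)


# Two items judged fully similar but embedded far apart incur a penalty of 2.
S = [[0.0, 1.0], [1.0, 0.0]]
E = [[1.0, 0.0], [0.0, 1.0]]
print(laplacian_penalty(E, S))  # 2.0
```

The penalty is zero when similar items share the same embedding and grows with the squared distance between them, so adding it to the training loss pulls the embeddings of similar miRNAs (or diseases) together.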


Subjects
MicroRNAs , Algorithms , Computational Biology/methods , Genetic Predisposition to Disease , Humans , MicroRNAs/genetics
3.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34864856

ABSTRACT

Drug repositioning aims to find novel uses for existing drugs. Among the many types of drug repositioning approaches, predicting drug-drug interactions (DDIs) helps explore the pharmacological functions of drugs and identify potential drugs for novel treatments. A number of models have been applied to predict DDIs. The DDI network, which is constructed from the known DDIs, is a common component of many existing methods. However, the functions of DDIs differ, and thus integrating them in a single DDI graph may overlook some useful information. We propose a graph convolutional network with multi-kernel (GCNMK) to predict potential DDIs. GCNMK adopts two DDI graph kernels for the graph convolutional layers, namely, an increased DDI graph consisting of 'increase'-related DDIs and a decreased DDI graph consisting of 'decrease'-related DDIs. The learned drug features are fed into a block with three fully connected layers for the DDI prediction. We compare various types of drug features and find that the target feature of drugs outperforms all other types of features and their concatenated features. In comparison with three different DDI prediction methods, our proposed GCNMK achieves the best performance in terms of area under the receiver operating characteristic curve and area under the precision-recall curve. In case studies, we identify the top 20 potential DDIs from all unknown DDIs, and the top 10 potential DDIs from the unknown DDIs among breast, colorectal and lung neoplasms-related drugs. Most of them have evidence to support the existence of their interactions. Contact: fangxiang.wu@usask.ca.
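The idea of keeping 'increase'-related and 'decrease'-related DDIs in separate graphs, instead of one merged DDI graph, can be sketched as below. The drug names, the effect labels and the choice of a symmetric (undirected) adjacency matrix are assumptions made for illustration, not details taken from GCNMK.

```python
from typing import Dict, List, Tuple

Matrix = List[List[int]]


def split_ddi_graphs(
    drugs: List[str], interactions: List[Tuple[str, str, str]]
) -> Dict[str, Matrix]:
    """Build one adjacency matrix per interaction type ('increase' / 'decrease')."""
    index = {drug: i for i, drug in enumerate(drugs)}
    graphs: Dict[str, Matrix] = {
        "increase": [[0] * len(drugs) for _ in drugs],
        "decrease": [[0] * len(drugs) for _ in drugs],
    }
    for drug_a, drug_b, effect in interactions:
        if effect not in graphs:
            raise ValueError(f"unknown interaction type: {effect}")
        i, j = index[drug_a], index[drug_b]
        # Interactions are treated as undirected here (an assumption).
        graphs[effect][i][j] = 1
        graphs[effect][j][i] = 1
    return graphs


# Hypothetical drugs and labelled interactions.
drugs = ["drugA", "drugB", "drugC"]
known = [("drugA", "drugB", "increase"), ("drugB", "drugC", "decrease")]
graphs = split_ddi_graphs(drugs, known)
print(graphs["increase"])  # [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
print(graphs["decrease"])  # [[0, 0, 0], [0, 0, 1], [0, 1, 0]]
```

Each matrix can then serve as a separate kernel for a graph convolutional layer, so that the two interaction types do not overwrite each other's signal.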


Subjects
Algorithms , Drug Repositioning , Drug Interactions , ROC Curve
4.
PLoS Comput Biol ; 19(1): e1010812, 2023 01.
Article in English | MEDLINE | ID: mdl-36701288

ABSTRACT

Expressive molecular representations play a critical role in drug design research, and effective methods for learning them help solve related problems in drug discovery, especially drug-drug interaction (DDI) prediction. Recently, much work has used graph neural networks (GNNs) to predict DDIs and learn molecular representations. However, under current GNN architectures, the majority of approaches learn drug molecular representations from a one-dimensional string or a two-dimensional molecular graph structure, while the interaction information between chemical substructures remains rarely explored, and the identification of key substructures that contribute significantly to DDI prediction is neglected. Therefore, we proposed a dual graph neural network named DGNN-DDI to learn drug molecular features by using molecular structure and interactions. Specifically, we first designed a directed message passing neural network with a substructure attention mechanism (SA-DMPNN) to adaptively extract substructures. Second, in order to improve the final features, we separated the drug-drug interactions into pairwise interactions between each drug's unique substructures. Then, the features are used to predict the interaction probability of a DDI tuple. We evaluated DGNN-DDI on a real-world dataset. Compared to state-of-the-art methods, the model improved DDI prediction performance. We also conducted a case study on existing drugs aiming to predict drug combinations that may be effective for the novel coronavirus disease 2019 (COVID-19). Moreover, the visual interpretation results showed that DGNN-DDI was sensitive to the structure information of drugs and able to detect the key substructures for DDIs. These advantages demonstrated that the proposed method enhanced the performance and interpretation capability of DDI prediction modeling.


Subjects
COVID-19 , Humans , Molecular Structure , Drug Interactions , Computer Neural Networks , Probability
5.
PLoS Comput Biol ; 19(12): e1011677, 2023 Dec.
Article in English | MEDLINE | ID: mdl-38055721

ABSTRACT

RNA modification is a post transcriptional modification that occurs in all organisms and plays a crucial role in the stages of RNA life, closely related to many life processes. As one of the newly discovered modifications, N1-methyladenosine (m1A) plays an important role in gene expression regulation, closely related to the occurrence and development of diseases. However, due to the low abundance of m1A, verifying the associations between m1As and diseases through wet experiments requires a great quantity of manpower and resources. In this study, we proposed a computational method for predicting the associations of RNA methylation and disease based on graph convolutional network (RMDGCN) with attention mechanism. We build an adjacency matrix through the collected m1As and diseases associations, and use positive-unlabeled learning to increase the number of positive samples. By extracting the features of m1As and diseases, a heterogeneous network is constructed, and a GCN with attention mechanism is adopted to predict the associations between m1As and diseases. The experimental results indicate that under a 5-fold cross validation, RMDGCN is superior to other methods (AUC = 0.9892 and AUPR = 0.8682). In addition, case studies indicate that RMDGCN can predict the relationships between unknown m1As and diseases. In summary, RMDGCN is an effective method for predicting the associations between m1As and diseases.


Subjects
Learning , RNA Methylation , RNA/genetics , Research Design , Computational Biology , Algorithms
6.
Int J Mol Sci ; 25(10)2024 May 08.
Article in English | MEDLINE | ID: mdl-38791165

ABSTRACT

Studying drug-target interactions (DTIs) is a foundational and crucial phase in drug discovery. Biochemical experiments, while being the most reliable method for determining drug-target affinity (DTA), are time-consuming and costly, making it challenging to meet the current demands for swift and efficient drug development. Consequently, computational DTA prediction methods have emerged as indispensable tools for this research. In this article, we propose a novel deep learning algorithm named GRA-DTA for DTA prediction. Specifically, we introduce a Bidirectional Gated Recurrent Unit (BiGRU) combined with a soft attention mechanism to learn target representations. We employ Graph Sample and Aggregate (GraphSAGE) to learn drug representations. To distinguish the different features of drug and target representations and their dimensional contributions, we merge drug and target representations by an attention neural network (ANN) to learn drug-target pair representations, which are fed into fully connected layers to yield the predicted DTA. The experimental results showed that GRA-DTA achieved mean squared errors of 0.142 and 0.225 and concordance indices of 0.897 and 0.890 on the benchmark datasets KIBA and Davis, respectively, surpassing most state-of-the-art DTA prediction algorithms.
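The two evaluation metrics reported above can be computed as in the following sketch. It uses the common pairwise definition of the concordance index, in which tied predictions count as half concordant; whether GRA-DTA handles ties this way is an assumption, and the affinity values are made up for illustration.

```python
from typing import List


def mean_squared_error(y_true: List[float], y_pred: List[float]) -> float:
    """Average squared difference between measured and predicted affinities."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def concordance_index(y_true: List[float], y_pred: List[float]) -> float:
    """Fraction of comparable pairs whose predicted order matches the true order."""
    concordant = 0.0
    comparable = 0
    for i in range(len(y_true)):
        for j in range(len(y_true)):
            if y_true[i] > y_true[j]:  # only pairs with different true values
                comparable += 1
                if y_pred[i] > y_pred[j]:
                    concordant += 1.0
                elif y_pred[i] == y_pred[j]:
                    concordant += 0.5  # tied predictions count half
    if comparable == 0:
        raise ValueError("no comparable pairs: all true values are equal")
    return concordant / comparable


# Made-up affinities for three drug-target pairs.
measured = [5.0, 6.0, 7.0]
predicted = [5.5, 5.0, 7.5]
print(mean_squared_error(measured, predicted))  # 0.5
print(concordance_index(measured, predicted))   # 2 of 3 pairs ordered correctly
```

Lower is better for the mean squared error and higher is better for the concordance index, where 0.5 corresponds to random ordering and 1.0 to a perfect ranking.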


Subjects
Algorithms , Deep Learning , Computer Neural Networks , Drug Discovery/methods , Humans , Pharmaceutical Preparations/chemistry
7.
Int J Mol Sci ; 25(14)2024 Jul 14.
Article in English | MEDLINE | ID: mdl-39062962

ABSTRACT

Postharvest fibrosis and greening of Toona sinensis buds significantly affect their quality during storage. This study aimed to clarify the effects of low-temperature storage on postharvest red TSB quality harvested in different seasons. Red TSB samples were collected from Guizhou province, China, 21 days after the beginning of spring (Lichun), summer (Lixia), and autumn (Liqiu), and stored at 4 °C in dark conditions. We compared and analyzed the appearance, microstructure, chlorophyll and cellulose content, and expression levels of related genes across different seasons. The results indicated that TSB harvested in spring had a bright, purple-red color, whereas those harvested in summer and autumn were green. All samples lost water and darkened after 1 day of storage. Severe greening occurred in spring-harvested TSB within 3 days, a phenomenon not observed in summer and autumn samples. Microstructural analysis revealed that the cells in the palisade and spongy tissues of spring and autumn TSB settled closely during storage, while summer TSB cells remained loosely aligned. Xylem cells were smallest in spring-harvested TSB and largest in autumn. Prolonged storage led to thickening of the secondary cell walls and pith cell autolysis in the petioles, enlarging the cavity area. Chlorophyll content was higher in leaves than in petioles, while cellulose content was lower in petioles across all seasons. Both chlorophyll and cellulose content increased with storage time. Gene expression analysis showed season-dependent variations and significant increases in the expression of over half of the chlorophyll-related and cellulose-related genes during refrigeration, correlating with the observed changes in chlorophyll and cellulose content. This research provides valuable insights for improving postharvest storage and freshness preservation strategies for red TSB across different seasons.


Subjects
Cellulose , Chlorophyll , Cold Temperature , Seasons , Chlorophyll/metabolism , Cellulose/metabolism , Plant Gene Expression Regulation , China
8.
Zhongguo Zhong Yao Za Zhi ; 49(14): 3758-3768, 2024 Jul.
Article in Chinese | MEDLINE | ID: mdl-39099350

ABSTRACT

The Trihelix transcription factor plays an important role in responses to many abiotic stresses, especially in the signaling pathways of low temperature, drought, flood, salinity, abscisic acid, methyl jasmonate, and other abiotic stresses. However, there are few studies on the Trihelix gene family of ginseng. In this study, 41 Trihelix gene family members were identified and screened from the ginseng genome database, and their physicochemical properties, cis-acting elements, subcellular localization, chromosomal assignment, and abiotic stress-induced expression patterns were analyzed by bioinformatics methods. The results showed that 85% of Trihelix family members of ginseng were located in the nucleus, and the main secondary structures of Trihelix proteins were random coil and α helix. In the promoter regions of Trihelix genes, cis-acting regulatory elements related to abiotic stresses such as low temperature, to hormone response, and to growth and development were identified. Through collinearity analysis of interspecific Trihelix transcription factors between the model plant Arabidopsis thaliana and ginseng, 19 collinear gene pairs were found, and only chromosomes 3, 6, and 12 had no collinear gene pairs. qRT-PCR analysis showed that the expression of GWHGBEIJ010320.1 was significantly up-regulated under low temperature stress, indicating a significant response to this stress. This study lays a foundation for further research on the role of the Trihelix transcription factor of ginseng in abiotic stress, as well as in the growth and development of ginseng.


Subjects
Plant Gene Expression Regulation , Multigene Family , Panax , Phylogeny , Plant Proteins , Physiological Stress , Transcription Factors , Panax/genetics , Panax/chemistry , Plant Proteins/genetics , Plant Proteins/metabolism , Plant Gene Expression Regulation/drug effects , Transcription Factors/genetics , Transcription Factors/metabolism , Physiological Stress/genetics , Genetic Promoter Regions , Gene Expression Profiling
9.
BMC Genomics ; 24(1): 334, 2023 Jun 16.
Article in English | MEDLINE | ID: mdl-37328802

ABSTRACT

BACKGROUND: Panax ginseng is a perennial herb and one of the most widely used traditional medicines in China. During its long growth period, it is affected by various environmental factors. Past studies have shown that growth-regulating factors (GRFs) and GRF-interacting factors (GIFs) are involved in regulating plant growth and development, responding to environmental stress, and responding to the induction of exogenous hormones. However, GRF and GIF transcription factors in ginseng have not been reported. RESULTS: In this study, 20 GRF gene members of ginseng were systematically identified and found to be distributed on 13 chromosomes. The ginseng GIF gene family has only ten members, which are distributed on ten chromosomes. Phylogenetic analysis divided these PgGRFs into six clades and PgGIFs into two clades. In total, 18 of the 20 PgGRFs and eight of the ten PgGIFs are segmental duplications. Most PgGRF and PgGIF gene promoters contain some hormone- and stress- related cis-regulatory elements. Based on the available public RNA-Seq data, the expression patterns of PgGRF and PgGIF genes were analysed from 14 different tissues. The responses of the PgGRF gene to different hormones (6-BA, ABA, GA3, IAA) and abiotic stresses (cold, heat, drought, and salt) were studied. The expression of the PgGRF gene was significantly upregulated under GA3 induction and three weeks of heat treatment. The expression level of the PgGIF gene changed only slightly after one week of heat treatment. CONCLUSIONS: The results of this study may be helpful for further study of the function of PgGRF and PgGIF genes and lay a foundation for further study of their role in the growth and development of Panax ginseng.


Subjects
Panax , Phylogeny , Panax/genetics , Panax/metabolism , Transcription Factors/metabolism , Intercellular Signaling Peptides and Proteins/genetics , Hormones , Plant Gene Expression Regulation , Plant Proteins/genetics , Plant Proteins/metabolism , Gene Expression Profiling
10.
Brief Bioinform ; 22(6)2021 11 05.
Article in English | MEDLINE | ID: mdl-34415289

ABSTRACT

Circular RNAs (circRNAs) are widely expressed in highly diverged eukaryotes. Although circRNAs have been known for many years, their function remains unclear. Interaction with RNA-binding protein (RBP) to influence post-transcriptional regulation is considered to be an important pathway for circRNA function, such as acting as an oncogenic RBP sponge to inhibit cancer. In this study, we design a deep learning framework, CRPBsites, to predict the binding sites of RBPs on circRNAs. In this model, the sequences of variable-length binding sites are transformed into embedding vectors by word2vec model. Bidirectional LSTM is used to encode the embedding vectors of binding sites, and then they are fed into another LSTM decoder for decoding and classification tasks. To train and test the model, we construct four datasets that contain sequences of variable-length binding sites on circRNAs, and each set corresponds to an RBP, which is overexpressed in bladder cancer tissues. Experimental results on four datasets and comparison with other existing models show that CRPBsites has superior performance. Afterwards, we found that there were highly similar binding motifs in the four binding site datasets. Finally, we applied well-trained CRPBsites to identify the binding sites of IGF2BP1 on circCDYL, and the results proved the effectiveness of this method. In conclusion, CRPBsites is an effective prediction model for circRNA-RBP interaction site identification. We hope that CRPBsites can provide valuable guidance for experimental studies on the influence of circRNA on post-transcriptional regulation.
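Before a word2vec model can embed a binding-site sequence, the sequence has to be split into "words". The sketch below assumes overlapping k-mers with k = 3 and stride 1, which is a common choice for nucleotide sequences but is not confirmed by the abstract; the function names and example sequences are hypothetical, and the word2vec training itself is not shown.

```python
from typing import Dict, List


def kmer_tokens(sequence: str, k: int = 3, stride: int = 1) -> List[str]:
    """Split a nucleotide sequence into overlapping k-mer 'words'."""
    seq = sequence.upper()
    # Sequences shorter than k yield an empty token list.
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def build_vocabulary(corpus: List[List[str]]) -> Dict[str, int]:
    """Assign each distinct k-mer an integer id in order of first appearance."""
    vocab: Dict[str, int] = {}
    for sentence in corpus:
        for word in sentence:
            vocab.setdefault(word, len(vocab))
    return vocab


# Hypothetical variable-length binding-site sequences.
sites = ["AUGGCU", "GGCUA"]
corpus = [kmer_tokens(site) for site in sites]
vocab = build_vocabulary(corpus)
print(corpus[0])  # ['AUG', 'UGG', 'GGC', 'GCU']
print(vocab)      # {'AUG': 0, 'UGG': 1, 'GGC': 2, 'GCU': 3, 'CUA': 4}
```

Because every site becomes a list of tokens whose length depends on the sequence, a recurrent encoder such as the bidirectional LSTM mentioned above can consume sites of variable length without truncation.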


Subjects
Base Sequence , Binding Sites , Computational Biology/methods , Deep Learning , Circular RNA/chemistry , RNA-Binding Proteins/chemistry , Algorithms , Genetic Databases , Circular RNA/metabolism , RNA-Binding Proteins/metabolism , ROC Curve , Reproducibility of Results
11.
Brief Bioinform ; 22(4)2021 07 20.
Article in English | MEDLINE | ID: mdl-33341893

ABSTRACT

Studies on the relationships between non-coding RNAs and diseases have been widely carried out in recent years, and a large number of experimental methods and technologies for producing biological data have been developed. However, because of their high labor cost and long production time, computational methods, especially machine learning and deep learning methods, have received a lot of attention and are now commonly used to solve these problems. From a computational point of view, this survey mainly introduces three common non-coding RNAs, i.e. miRNAs, lncRNAs and circRNAs, and the related computational methods for predicting their associations with diseases. First, the mainstream databases for these three non-coding RNAs are introduced in detail. Then, we present several methods for RNA similarity and disease similarity calculations. Next, we investigate ncRNA-disease prediction methods in detail and classify them into five types: network propagation, recommender systems, matrix completion, machine learning and deep learning. Furthermore, we summarize the applications of these five types of computational methods in predicting the associations between diseases and miRNAs, lncRNAs and circRNAs, respectively. Finally, the advantages and limitations of the various methods are identified, and future research directions and challenges are discussed.


Subjects
Computational Biology , Machine Learning , MicroRNAs/genetics , Circular RNA/genetics , Long Noncoding RNA/genetics , RNA Sequence Analysis , Humans
12.
Methods ; 205: 179-190, 2022 09.
Article in English | MEDLINE | ID: mdl-35810958

ABSTRACT

Circular RNA (circRNA) can exert biological functions by interacting with RNA-binding proteins (RBPs), and several deep learning-based methods have been developed to predict RBP binding sites on circRNA. However, most of these methods identify circRNA-RBP binding sites from a single data resource only and cannot provide exact binding sites, providing only the probability value of a sequence fragment. To solve these problems, we propose a binding site localization algorithm that fuses binding sites from multiple databases, and further design a stacked generalization ensemble deep learning model named CirRBP to identify RBP binding sites on circRNA. CirRBP is trained by combining the binding sites from multiple databases and makes predictions by weighted aggregation of the predictions of each sub-model. The results show that CirRBP outperforms every sub-model and the existing online prediction models. For better access to our research results, we develop an open-source web application called CRWS (CircRNA-RBP Web Server). The back-end learning model of CRWS is the stacked generalization ensemble learning model CirRBP, which is based on different deep learning frameworks. Given a full-length circRNA or fragment sequence and a target RBP, CRWS can analyze and provide the exact potential binding sites of the target RBP on the given sequence through the binding site localization algorithm, and visualize them. In addition, CRWS can discover the most widely distributed motif in each RBP dataset. To date, CRWS is the first significant online tool that uses multi-source data to train models and predict exact binding sites. CRWS is publicly and freely available without login requirement at: http://www.bioinformatics.team.


Subjects
Deep Learning , Circular RNA , Algorithms , Binding Sites , Circular RNA/genetics , RNA-Binding Proteins/metabolism
13.
Methods ; 207: 57-64, 2022 11.
Article in English | MEDLINE | ID: mdl-36113743

ABSTRACT

Circular RNAs (circRNAs) are widely expressed in tissues and play a key role in diseases by interacting with RNA binding proteins (RBPs). Because of the high cost of traditional techniques, computational methods have been developed to identify the binding sites between circRNAs and RBPs. Unfortunately, these methods suffer from insufficient feature learning and reliance on a single classification output. To address these limitations, we propose a novel method named circ-pSBLA, which constructs a pseudo-Siamese framework integrating a bi-directional long short-term memory (BiLSTM) network and a soft attention mechanism for circRNA-RBP binding site prediction. A softmax function and CatBoost are each adopted for classification, forming the pseudo-Siamese framework, and circ-pSBLA combines their results to obtain the final output. To validate the effectiveness of circ-pSBLA, we compare it with other state-of-the-art methods and carry out an ablation experiment on 17 sub-datasets. Moreover, we perform motif analysis on 3 sub-datasets. The results show that circ-pSBLA achieves superior performance and outperforms other methods. All supporting source code can be downloaded from https://github.com/gyj9811/circ-pSBLA.
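A soft attention mechanism over a sequence of recurrent hidden states can be sketched as below. The sketch assumes a dot-product score against a learned query vector followed by a softmax; the scoring function that circ-pSBLA actually uses may differ, and the hidden states and query here are hypothetical values rather than BiLSTM outputs.

```python
import math
from typing import List, Tuple


def softmax(values: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(
    hidden_states: List[List[float]], query: List[float]
) -> Tuple[List[float], List[float]]:
    """Return attention weights and the weighted sum (context) of hidden states."""
    scores = [sum(h_k * q_k for h_k, q_k in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [
        sum(w * h[k] for w, h in zip(weights, hidden_states)) for k in range(dim)
    ]
    return weights, context


# Hypothetical hidden states for a three-position sequence.
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = soft_attention(states, query=[2.0, 0.0])
print(weights)  # positions 0 and 2 receive equal, larger weights than position 1
```

The weights sum to one, so the context vector is a convex combination of the hidden states, and positions that align with the query dominate the summary passed to the classifier.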


Subjects
Circular RNA , RNA-Binding Proteins , Circular RNA/genetics , Binding Sites , RNA-Binding Proteins/metabolism , Software
14.
Methods ; 198: 32-44, 2022 02.
Article in English | MEDLINE | ID: mdl-34748953

ABSTRACT

Accumulating studies have discovered that circular RNAs (circRNAs) are closely related to many complex human diseases. Because of this close relationship, circRNAs can be used as good biomarkers for disease diagnosis and as therapeutic targets for treatments. However, the number of experimentally verified circRNA-disease associations is still small, and wet-lab experiments are constrained by their small scale and their cost in time and labour. Therefore, effective computational methods are required to predict associations between circRNAs and diseases, which will provide promising candidates for small-scale biological and clinical experiments. In this paper, we propose novel computational models based on Graph Convolution Networks (GCN) for potential circRNA-disease association prediction. Currently, most of the existing prediction methods use shallow learning algorithms. Instead, the proposed models combine the strengths of deep learning and graphs for the computation. First, they integrate multi-source similarity information into the association network. Next, the models predict potential associations using graph convolution, which explores the important relational knowledge of that network structure. Two circRNA-disease association prediction models, GCN based Node Classification (GCN-NC) and GCN based Link Prediction (GCN-LP), are introduced in this work; they demonstrate promising results in various experiments and outperform other existing methods. Further, a case study shows that some of the predicted results of the novel computational models were confirmed by published literature and that all top results could be verified using gene-gene interaction networks.
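A single graph convolution step can be sketched in plain Python as follows. It assumes the widely used propagation rule H' = ReLU(Â H W) with Â = D^(-1/2) (A + I) D^(-1/2); the abstract does not state which variant GCN-NC and GCN-LP use, and the toy graph, features and weights are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def normalize_adjacency(adj: Matrix) -> Matrix:
    """Add self-loops, then apply symmetric degree normalization."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    degree = [sum(row) for row in a_hat]
    return [
        [a_hat[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
        for i in range(n)
    ]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (n x m) and b (m x p)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def gcn_layer(adj: Matrix, features: Matrix, weights: Matrix) -> Matrix:
    """One layer: aggregate neighbour features, project them, apply ReLU."""
    propagated = matmul(normalize_adjacency(adj), features)
    return [[max(0.0, value) for value in row] for row in matmul(propagated, weights)]


# Toy graph: one circRNA node linked to one disease node.
adjacency = [[0.0, 1.0], [1.0, 0.0]]
features = [[1.0, 0.0], [0.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
print(gcn_layer(adjacency, features, identity))  # [[0.5, 0.5], [0.5, 0.5]]
```

After one layer each node's representation mixes its own features with those of its neighbours, which is how similarity and association information spreads through the network before node classification or link prediction.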


Subjects
Computational Biology , Circular RNA , Algorithms , Computational Biology/methods , Gene Regulatory Networks , Humans , Circular RNA/genetics
15.
Methods ; 203: 378-382, 2022 07.
Article in English | MEDLINE | ID: mdl-34245870

ABSTRACT

The primary sequences of DNA, RNA and protein have been used as the dominant information source of existing machine learning tools, especially for contexts not fully explored by wet-experimental approaches. Since molecular markers are profoundly orchestrated in the living organisms, those markers that cannot be unambiguously recovered from the primary sequence often help to predict other biological events. To the best of our knowledge, there is no current tool to build and deploy machine learning models that consider genomic evidence. We therefore developed the WHISTLE server, the first machine learning platform based on genomic coordinates. It features convenient covariate extraction and model web deployment with 46 distinct genomic features integrated along with the conventional sequence features. We showed that, when predicting m6A sites from SRAMP project, the model integrating genomic features substantially outperformed those based on only sequence features. The WHISTLE server should be a useful tool for studying biological attributes specifically associated with genomic coordinates, and is freely accessible at: www.xjtlu.edu.cn/biologicalsciences/whi2.


Subjects
Machine Learning , RNA , Computational Biology , Genomics , RNA/genetics , RNA/metabolism , RNA Sequence Analysis
16.
Med Res Rev ; 42(1): 441-461, 2022 01.
Article in English | MEDLINE | ID: mdl-34346083

ABSTRACT

Currently, research on multi-omics, such as genomics, proteomics, transcriptomics, microbiome, metabolomics, pathomics, and radiomics, is a hot spot. The relationship between multi-omics data, drugs, and diseases has received extensive attention from researchers. At the same time, multi-omics can effectively predict the diagnosis, prognosis, and treatment of diseases. In essence, these research entities, such as genes, RNAs, proteins, microbes, metabolites, and pathways, as well as pathological and medical imaging data, can all be represented by networks at different levels. Some computer and biology scholars have tried to use computational methods to explore the potential relationships between biological entities. We summarize a comprehensive research strategy: build a multi-omics heterogeneous network covering multimodal data, and use currently popular computational methods to make predictions. In this study, we first introduce methods for calculating the similarity of biological entities at the data level, and then discuss multimodal data fusion and methods of feature extraction. Finally, the challenges and opportunities at this stage are summarized. Some scholars have used such a framework to calculate and predict; we also summarize their work and discuss the challenges. We hope that our review will help scholars who are interested in the fields of bioinformatics, biomedical imaging, and computer research.


Subjects
Microbiota , Transcriptome , Computational Biology , Genomics , Humans , Metabolomics , Microbiota/genetics
17.
Methods ; 192: 25-34, 2021 08.
Article in English | MEDLINE | ID: mdl-32798654

ABSTRACT

Cumulative experimental studies have demonstrated the critical roles of microRNAs (miRNAs) in diverse fundamental and important biological processes, and in the development of numerous complex human diseases. Thus, exploring the relationships between miRNAs and diseases is helpful for understanding the mechanisms, detection, diagnosis, and treatment of complex diseases. As the identification of miRNA-disease associations via traditional biological experiments is time-consuming and expensive, an effective computational prediction method is appealing. In this study, we present a deep learning framework with variational graph auto-encoder for miRNA-disease association prediction (VGAE-MDA). VGAE-MDA first obtains the representations of miRNAs and diseases from the heterogeneous networks constructed from miRNA-miRNA similarity, disease-disease similarity, and known miRNA-disease associations. Then, VGAE-MDA constructs two sub-networks: a miRNA-based network and a disease-based network. Combining the representations based on the heterogeneous network, two variational graph auto-encoders (VGAE) are deployed to calculate the miRNA-disease association scores from the two sub-networks, respectively. Lastly, VGAE-MDA obtains the final predicted association score for a miRNA-disease pair by integrating the scores from these two trained networks. Unlike previous models, VGAE-MDA can mitigate the effect of noise from the random selection of negative samples. Besides, the use of a graph convolutional network (GCN) can naturally incorporate the node features from the graph structure, while the variational autoencoder (VAE) makes use of latent variables to predict associations from the perspective of data distribution. The experimental results show that VGAE-MDA outperforms the state-of-the-art approaches in miRNA-disease association prediction. Besides, the effectiveness of our model has been further demonstrated by case studies.


Subjects
MicroRNAs/genetics , Algorithms , Computational Biology , Humans , Computer Neural Networks
18.
BMC Bioinformatics ; 22(1): 19, 2021 Jan 07.
Article in English | MEDLINE | ID: mdl-33413092

ABSTRACT

BACKGROUND: Circular RNAs (circRNAs) are widely expressed in cells and tissues and are involved in biological processes and human diseases. Recent studies have demonstrated that circRNAs can interact with RNA-binding proteins (RBPs), which is considered an important aspect for investigating the function of circRNAs. RESULTS: In this study, we design a slight variant of the capsule network, called circRB, to identify the sequence specificities of circRNAs binding to RBPs. In this model, the sequence features of circRNAs are extracted by convolution operations, and then two dynamic routing algorithms in a capsule network are employed to discriminate between different binding sites by analysing the convolution features of binding sites. The experimental results show that the circRB method outperforms the existing computational methods. Afterwards, the trained models are applied to detect the sequence motifs on the seven circRNA-RBP bound sequence datasets, and the detected motifs are matched to known human RNA motifs. Some motifs on circular RNAs overlap with those on linear RNAs. Finally, we also predict binding sites on the reported full-length sequences of circRNAs interacting with RBPs, attempting to assist current studies. We hope that our model will contribute to a better understanding of the mechanisms of the interactions between RBPs and circRNAs. CONCLUSION: In view of the limited number of studies on the sequence specificities of circRNA-binding proteins, we designed a classification framework called circRB based on the capsule network. The results show that circRB is an effective method that achieves higher prediction accuracy than other methods.
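Dynamic routing in a capsule network can be sketched as below. The sketch follows the standard routing-by-agreement procedure with the squash nonlinearity from the original capsule network literature; circRB is described as a variant with two routing algorithms, so this is not its exact procedure, and the prediction vectors are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def squash(s: Vector) -> Vector:
    """Shrink a vector to length ||s||^2 / (1 + ||s||^2) while keeping its direction."""
    norm_sq = sum(x * x for x in s)
    if norm_sq == 0.0:
        return [0.0] * len(s)
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq)
    return [scale * x for x in s]


def softmax(values: Vector) -> Vector:
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_routing(u_hat: List[List[Vector]], iterations: int = 3) -> List[Vector]:
    """u_hat[i][j] is input capsule i's prediction for output capsule j."""
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    logits = [[0.0] * n_out for _ in range(n_in)]
    outputs: List[Vector] = []
    for _ in range(iterations):
        # Each input capsule distributes its vote across the output capsules.
        coupling = [softmax(row) for row in logits]
        outputs = [
            squash([
                sum(coupling[i][j] * u_hat[i][j][k] for i in range(n_in))
                for k in range(dim)
            ])
            for j in range(n_out)
        ]
        # Raise the logit where a prediction agrees with the output capsule.
        for i in range(n_in):
            for j in range(n_out):
                logits[i][j] += sum(u * v for u, v in zip(u_hat[i][j], outputs[j]))
    return outputs


# Two input capsules agree on output 0 and disagree on output 1.
predictions = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, -1.0]]]
capsules = dynamic_routing(predictions)
lengths = [math.sqrt(sum(x * x for x in v)) for v in capsules]
print(lengths)  # output 0 is long, output 1 has length 0
```

The length of each output capsule stays below one and can be read as the confidence that the corresponding class, for example "binding site" versus "non-binding site", is present.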


Subjects
Computational Biology/methods , Circular RNA , Algorithms , Binding Sites , Humans , Circular RNA/genetics , Circular RNA/metabolism , RNA-Binding Proteins
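
The dynamic routing mentioned in the circRB abstract can be sketched as follows. This is the standard routing-by-agreement procedure for capsule networks, written in plain Python; the abstract does not describe the two routing variants that circRB uses, so this is an assumed generic form and not the authors' implementation. The names `squash`, `dynamic_routing`, and the toy prediction vectors are hypothetical.

```python
import math


def squash(vec):
    """Shrink a vector so its length lies in [0, 1) while keeping its direction."""
    sq_norm = sum(x * x for x in vec)
    if sq_norm == 0.0:
        return [0.0 for _ in vec]
    scale = sq_norm / (1.0 + sq_norm) / math.sqrt(sq_norm)
    return [scale * x for x in vec]


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_routing(u_hat, n_iter=3):
    """Route predictions u_hat[i][j] (input capsule i, output capsule j)."""
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    logits = [[0.0] * n_out for _ in range(n_in)]
    outputs = []
    for _ in range(n_iter):
        # Coupling coefficients: each input distributes itself over outputs.
        coupling = [softmax(row) for row in logits]
        outputs = []
        for j in range(n_out):
            s_j = [sum(coupling[i][j] * u_hat[i][j][d] for i in range(n_in))
                   for d in range(dim)]
            outputs.append(squash(s_j))
        # Agreement update: raise the logit where prediction and output align.
        for i in range(n_in):
            for j in range(n_out):
                logits[i][j] += sum(a * b for a, b in zip(u_hat[i][j], outputs[j]))
    return outputs


# Toy case: three input capsules agree on output 0 and disagree on output 1.
toy_predictions = [
    [[1.0, 0.0], [1.0, 0.0]],
    [[1.0, 0.0], [-1.0, 0.0]],
    [[1.0, 0.0], [0.0, 0.2]],
]
capsules = dynamic_routing(toy_predictions)
capsule_lengths = [math.sqrt(sum(x * x for x in v)) for v in capsules]
print(capsule_lengths)
```

In a capsule classifier the length of each output capsule is read as the confidence for that class, so the output whose inputs agree ends up with the longer vector.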
19.
BMC Bioinformatics ; 21(1): 229, 2020 Jun 05.
Article in English | MEDLINE | ID: mdl-32503474

ABSTRACT

BACKGROUND: Circular RNA (circRNA) has been extensively identified in cells and tissues, and plays crucial roles in human diseases and biological processes. circRNAs can act as dynamic scaffolding molecules that modulate protein-protein interactions. The interactions between circRNAs and RNA-binding proteins (RBPs) are also deemed to be an essential element underlying the functions of circRNAs. Given the cost-heavy and labor-intensive nature of biological experimental technologies, high-throughput experimental data have enabled the large-scale prediction and analysis of circRNA-RBP interactions. RESULTS: A computational framework is constructed by employing positive-unlabeled learning (P-U learning) to predict unknown circRNA-RBP interaction pairs with the kernel model MFNN (Matrix Factorization with Neural Networks). The neural network is employed to extract the latent factors of circRNAs and RBPs in the interaction matrix, and the P-U learning strategy is applied to alleviate the imbalanced characteristics of the data samples and to predict unknown interaction pairs. For this purpose, the known circRNA-RBP interaction data samples are collected from the circRNAs in cancer cell lines database (CircRic), and the circRNA-RBP interaction matrix is constructed as the input of the model. The experimental results show that kernel MFNN outperforms the other deep kernel models. Interestingly, we find that adding more hidden layers to the neural network does not necessarily improve our model. Finally, the unlabeled interactions are scored using P-U learning with the MFNN kernel, and the predicted interaction pairs are matched against a database of known interactions. The results indicate that our method is an effective model for analyzing circRNA-RBP interactions. CONCLUSION: For the poorly studied circRNA-RBP interactions, we design a prediction framework based only on the interaction matrix by employing matrix factorization and a neural network. We demonstrate that MFNN is an effective method that achieves higher prediction accuracy.


Subjects
Computer Neural Networks , Circular RNA/metabolism , RNA-Binding Proteins/metabolism , Area Under Curve , Tumor Cell Line , Factual Databases , Humans , Neoplasms/genetics , Neoplasms/pathology , ROC Curve
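
The idea of factorizing an interaction matrix while treating unobserved pairs as unlabeled, not as confirmed negatives, can be sketched with a much simpler stand-in for MFNN. The code below is a weighted logistic matrix factorization in which unlabeled entries contribute to the loss with a reduced weight, a common baseline for positive-unlabeled data. It is not the authors' MFNN model (there is no neural network here), and the function names, the weight of 0.3, and the toy matrix are assumptions.

```python
import math
import random


def train_pu_factorization(matrix, rank=2, unlabeled_weight=0.3,
                           lr=0.1, epochs=300, seed=0):
    """Fit row and column latent factors by stochastic gradient descent."""
    rng = random.Random(seed)
    n_rows, n_cols = len(matrix), len(matrix[0])
    row_f = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(n_rows)]
    col_f = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(n_cols)]
    for _ in range(epochs):
        for i in range(n_rows):
            for j in range(n_cols):
                label = matrix[i][j]
                # Known positives count fully; unlabeled pairs are down-weighted.
                weight = 1.0 if label == 1 else unlabeled_weight
                logit = sum(a * b for a, b in zip(row_f[i], col_f[j]))
                grad = weight * (1.0 / (1.0 + math.exp(-logit)) - label)
                for k in range(rank):
                    r, c = row_f[i][k], col_f[j][k]
                    row_f[i][k] -= lr * grad * c
                    col_f[j][k] -= lr * grad * r
    return row_f, col_f


def score(row_f, col_f, i, j):
    """Predicted interaction probability for row i and column j."""
    logit = sum(a * b for a, b in zip(row_f[i], col_f[j]))
    return 1.0 / (1.0 + math.exp(-logit))


# Toy interaction matrix (hypothetical): 1 = known interaction, 0 = unlabeled.
toy_matrix = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
rows, cols = train_pu_factorization(toy_matrix)
positive_scores = [score(rows, cols, i, j)
                   for i in range(4) for j in range(4) if toy_matrix[i][j] == 1]
unlabeled_scores = [score(rows, cols, i, j)
                    for i in range(4) for j in range(4) if toy_matrix[i][j] == 0]
print(sum(positive_scores) / len(positive_scores),
      sum(unlabeled_scores) / len(unlabeled_scores))
```

Ranking the unlabeled pairs by their scores then gives candidate interactions to check against an external database, which mirrors the final step described in the abstract.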
20.
Curr Genomics ; 21(1): 67-76, 2020 Jan.
Article in English | MEDLINE | ID: mdl-32655300

ABSTRACT

INTRODUCTION: N6-methyladenosine (m6A) is one of the most widely studied epigenetic modifications. It plays important roles in various biological processes, such as splicing, RNA localization and degradation, many of which are related to the functions of introns. Although a number of computational approaches have been proposed to predict m6A sites in different species, none of them was optimized for intronic m6A sites. Because existing experimental data overwhelmingly relied on polyA selection in sample preparation, and intronic RNAs are usually underrepresented in the captured RNA library, the accuracy of general m6A site prediction approaches is limited for the task of intronic m6A site prediction. METHODOLOGY: A computational framework, WITMSG, dedicated to the large-scale prediction of intronic m6A RNA methylation sites in humans is proposed here for the first time. Based on the random forest algorithm and using only known intronic m6A sites as the training data, WITMSG takes advantage of both conventional sequence features and a variety of genomic characteristics to improve the prediction of intron-specific m6A sites. RESULTS AND CONCLUSION: WITMSG outperformed competing approaches (trained with all the m6A sites or with intronic m6A sites only) in 10-fold cross-validation (AUC: 0.940) and when tested on independent datasets (AUC: 0.946). WITMSG was also applied intronome-wide in humans to predict all possible intronic m6A sites, and the prediction results are freely accessible at http://rnamd.com/intron/.
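
The evaluation protocol named in the WITMSG abstract, k-fold cross-validation scored by AUC, can be sketched in a few lines. The code below computes AUC as the fraction of positive-negative pairs that are ranked correctly and splits sample indices into folds. The function names and the toy labels and scores are hypothetical and unrelated to the published results; the random forest classifier itself is not reproduced here.

```python
def roc_auc(labels, scores):
    """AUC as the chance that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def k_fold_indices(n_samples, k):
    """Assign sample indices to k folds in round-robin order."""
    return [list(range(n_samples))[fold::k] for fold in range(k)]


# Toy predictions (hypothetical): 1 = methylated site, 0 = non-methylated site.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.4, 0.1]
print(roc_auc(toy_labels, toy_scores))
print(k_fold_indices(10, 3))
```

In a full cross-validation run, each fold would serve once as the held-out test set while the classifier is trained on the remaining folds, and the AUC would be computed on the held-out predictions.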
