ABSTRACT
MOTIVATION: Accumulating evidence indicates that microRNAs (miRNAs) play a crucial role in the pathogenesis and progression of various complex diseases. Inferring disease-associated miRNAs is important for exploring the etiology, diagnosis and treatment of human diseases. Because biological experiments are time-consuming and labor-intensive, effective computational methods have become indispensable for identifying associations between miRNAs and diseases. RESULTS: We present an Ensemble learning framework with Resampling method for MiRNA-Disease Association (ERMDA) prediction to discover potential disease-related miRNAs. First, a resampling strategy builds multiple different balanced training subsets to address the sample imbalance within the database. Then, ERMDA extracts miRNA and disease feature representations by integrating miRNA-miRNA similarities, disease-disease similarities and experimentally verified miRNA-disease association information. Next, a feature selection approach is applied to reduce redundant information and increase the diversity among these subsets. Lastly, ERMDA constructs an individual learner on each subset to yield preliminary outcomes, and a soft voting method makes the final decision based on the prediction results of the individual learners. A series of experiments demonstrates that ERMDA outperforms other state-of-the-art methods on both balanced and unbalanced testing sets. In addition, case studies conducted on three human diseases further confirm ERMDA's capability to identify potential disease-related miRNAs. In conclusion, these experimental results demonstrate that our method can serve as an effective and reliable tool for researchers to explore the regulatory role of miRNAs in complex diseases.
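The resampling and soft voting steps described in this abstract can be illustrated with a minimal sketch. It is not the ERMDA implementation: the function names `balanced_subsets` and `soft_vote` are hypothetical, the resampling is assumed to be random undersampling of unlabeled pairs, and the individual learners are left out, with their probability outputs supplied directly.

```python
import random


def balanced_subsets(positives, negatives, n_subsets, seed=0):
    """Build n_subsets balanced training sets.

    Assumption: every subset keeps all known positive pairs and draws an
    equally sized random sample (without replacement) of unlabeled pairs.
    """
    if len(negatives) < len(positives):
        raise ValueError("expected at least as many unlabeled pairs as positives")
    rng = random.Random(seed)
    return [
        (list(positives), rng.sample(negatives, len(positives)))
        for _ in range(n_subsets)
    ]


def soft_vote(probabilities):
    """Average the per-learner probabilities for each sample.

    probabilities[l][i] is the score that learner l gives to sample i.
    """
    n_learners = len(probabilities)
    n_samples = len(probabilities[0])
    return [
        sum(learner[i] for learner in probabilities) / n_learners
        for i in range(n_samples)
    ]
```

Each subset would be used to train one learner, and `soft_vote` would then combine the learners' predicted probabilities on the test pairs. Averaging probabilities rather than counting hard votes keeps each learner's confidence in the final score.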
Subject(s)
Disease/genetics, Genetic Association Studies, Machine Learning, MicroRNAs/genetics, Algorithms, Computational Biology, Genetic Predisposition to Disease/genetics, Humans
ABSTRACT
Predicting potential drug-disease associations (RDAs) plays a pivotal role in elucidating therapeutic strategies for diseases and in facilitating drug repositioning. However, existing methods rely heavily on limited domain-specific knowledge, which impedes their ability to predict candidate associations between drugs and diseases. Moreover, simply treating unknown drug-disease relationships as negative samples has inherent limitations. To overcome these challenges, we introduce a novel hierarchical negative sampling-based graph contrastive model, termed HSGCLRDA, which aims to predict latent associations between drugs and diseases. HSGCLRDA integrates association information as well as similarities among drugs, diseases and proteins, and constructs a drug-disease-protein heterogeneous network. Subsequently, using a hierarchical structural sampling technique, we establish reliable negative drug-disease samples with PageRank algorithms. Using meta-path aggregation within the heterogeneous network, we derive low-dimensional representations for drugs and diseases, thereby constructing global and local feature graphs that capture their interactions comprehensively. To obtain representation information, we adopt a self-supervised graph contrastive approach that uses graph convolutional networks (GCNs) and second-order GCNs to extract feature graph information. Furthermore, we integrate a contrastive cost function derived from the cross-entropy cost function, facilitating holistic model optimization. Experimental results on benchmark datasets show that HSGCLRDA outperforms various baseline methods in predicting RDAs, and case studies demonstrate its practical utility in identifying novel potential diseases associated with existing drugs.
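The abstract does not say how PageRank scores are turned into negative samples, so the sketch below shows only plain PageRank by power iteration on a small directed graph. The function name `pagerank`, the damping factor of 0.85, and the toy node names are assumptions for illustration and are not taken from HSGCLRDA.

```python
def pagerank(adjacency, damping=0.85, iterations=100, tol=1e-10):
    """Plain PageRank by power iteration.

    adjacency maps each node to a list of its out-neighbors. Nodes without
    out-links spread their score uniformly over all nodes.
    """
    nodes = sorted(set(adjacency) | {v for nbrs in adjacency.values() for v in nbrs})
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        # Score held by dangling nodes is redistributed evenly.
        dangling = sum(rank[u] for u in nodes if not adjacency.get(u))
        new = {u: (1.0 - damping) / n + damping * dangling / n for u in nodes}
        for u in nodes:
            out = adjacency.get(u, [])
            for v in out:
                new[v] += damping * rank[u] / len(out)
        delta = sum(abs(new[u] - rank[u]) for u in nodes)
        rank = new
        if delta < tol:
            break
    return rank
```

One plausible use, stated here only as an assumption, is to rank unlabeled drug-disease pairs by scores computed on the heterogeneous network and keep the lowest-scoring pairs as more reliable negatives.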
Subject(s)
Algorithms, Computational Biology, Humans, Computational Biology/methods, Machine Learning, Drug Repositioning/methods, Disease/classification, Pharmaceutical Preparations
ABSTRACT
The search for potential drug-disease associations (DDAs) can shorten drug development cycles, reduce wasted resources, and accelerate disease treatment by repurposing existing drugs that can control further disease progression. As technologies such as deep learning continue to mature, many researchers use them to predict potential DDAs. DDA prediction remains challenging, and there is room for improvement because of issues such as the small number of known associations and possible noise in the data. To better predict DDAs, we propose a computational approach based on hypergraph learning with subgraph matching (HGDDA). In particular, HGDDA first extracts feature subgraph information from the validated drug-disease association network and proposes a negative sampling strategy based on the similarity network to reduce data imbalance. Second, the hypergraph Unet module is used to extract features. Finally, potential DDAs are predicted by a hypergraph combination module that applies convolution and pooling to the two constructed hypergraphs separately and calculates the difference information between the subgraphs using cosine similarity for node matching. The performance of HGDDA is verified on two standard datasets by 10-fold cross-validation (10-CV), and the results outperform existing drug-disease prediction methods. In addition, to validate the overall utility of the model, the top 10 drugs for a specific disease are predicted in a case study and validated using the CTD database.
Subject(s)
Algorithms, Computational Biology, Factual Databases, Computational Biology/methods
ABSTRACT
Antimicrobial resistance is an increasing threat to human populations. The emergence of multidrug-resistant "superbugs" in mycobacterial infections has further complicated the treatment of patients, resulting in high morbidity and mortality. Early diagnosis and alternative treatments are important for improving the success and cure rates associated with mycobacterial infections, and the use of mycobacteriophages is a potentially good option. Since each bacteriophage has its own host range, mycobacteriophages have the capacity to detect specific mycobacterial isolates. The bacteriolytic properties of bacteriophages also make them attractive for treating infectious diseases; in fact, they have been applied clinically in Eastern Europe for several decades. Mycobacteriophages may therefore also be used to treat mycobacterial infections. This review explores the potential clinical applications of mycobacteriophages, including phage-based diagnosis and phage therapy in mycobacterial infections. Furthermore, this review summarizes the current difficulties in phage therapy, providing insights into new treatment strategies against drug-resistant mycobacteria.
ABSTRACT
Studies have shown that lncRNA-miRNA interactions can affect cellular expression at the gene level through a variety of regulatory mechanisms and have important effects on the biological activities of living organisms. Several biomolecular network-based approaches have been proposed to accelerate the identification of lncRNA-miRNA interactions. However, most of these methods cannot fully utilize the structural and topological information of the lncRNA-miRNA interaction network. In this article, we propose a new method, ISLMI, a prediction model based on information injection and a second-order graph convolution network (SOGCN). The model calculates the sequence similarity and the Gaussian interaction profile kernel similarity between lncRNAs and miRNAs, fuses them to enhance the intrinsic interaction between the nodes, and uses the SOGCN to learn second-order representations of the similarity matrix information. At the same time, multiple feature representations obtained using different graph embedding methods are injected into the second-order graph representation. Finally, matrix completion is used to increase the model accuracy. The model combines the advantages of different methods and achieves reliable performance in 5-fold cross-validation, significantly improving the prediction of lncRNA-miRNA interactions. In addition, comparisons with several other models confirm the superiority of ISLMI.
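The Gaussian interaction profile (GIP) kernel similarity mentioned in this abstract is commonly defined as K(i, j) = exp(-gamma * ||IP(i) - IP(j)||^2), where IP(i) is the binary interaction profile of entity i and gamma is a bandwidth parameter divided by the mean squared norm of all profiles. The sketch below follows that common definition; the abstract does not give ISLMI's exact formula or bandwidth, so the function name `gip_kernel` and the default `gamma_prime=1.0` are assumptions.

```python
import math


def gip_kernel(interactions, gamma_prime=1.0):
    """Gaussian interaction profile kernel over the rows of a binary matrix.

    interactions[i] is the interaction profile of entity i, for example the
    row of an lncRNA in an lncRNA-by-miRNA adjacency matrix.
    """
    n = len(interactions)
    mean_sq_norm = sum(sum(x * x for x in row) for row in interactions) / n
    if mean_sq_norm == 0:
        raise ValueError("all interaction profiles are empty")
    # Bandwidth normalized by the average squared profile norm.
    gamma = gamma_prime / mean_sq_norm
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dist_sq = sum(
                (a - b) ** 2 for a, b in zip(interactions[i], interactions[j])
            )
            kernel[i][j] = math.exp(-gamma * dist_sq)
    return kernel
```

Applying the function to the interaction matrix gives a similarity matrix for one entity type, and applying it to the transposed matrix gives the matrix for the other.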
Subject(s)
MicroRNAs, Long Noncoding RNA, MicroRNAs/genetics, MicroRNAs/metabolism, Long Noncoding RNA/genetics, Long Noncoding RNA/metabolism, Computational Biology/methods, Algorithms
ABSTRACT
Recent studies have found that long non-coding RNAs (lncRNAs), a class of non-coding RNAs (ncRNAs), are not only involved in many biological processes but are also abnormally expressed in many complex diseases. Accurate identification of lncRNA-disease associations is of great significance for understanding lncRNA function and disease mechanisms. In this paper, a deep learning framework named DHNLDA, consisting of a stacked autoencoder (SAE), a multi-scale ResNet and a stacked ensemble module, is constructed to predict lncRNA-disease associations. It integrates multiple biological data sources, including the similarities and interactions of lncRNAs, diseases and miRNAs, and constructs feature matrices from them. The feature matrices are obtained by node2vec embedding and feature extraction, respectively. Then, the SAE and the multi-scale ResNet are used to learn the complementary information between nodes, yielding high-level features of node attributes. Finally, the fused high-level features are input into the stacked ensemble module to obtain the predicted lncRNA-disease associations. The results of five-fold cross-validation show that the AUC of DHNLDA reaches 0.975, which is better than that of existing methods. Case studies of stomach cancer, breast cancer and lung cancer show the strong ability of DHNLDA to discover potential lncRNA-disease associations.
Subject(s)
Breast Neoplasms, MicroRNAs, Long Noncoding RNA, Stomach Neoplasms, Humans, Female, Long Noncoding RNA/genetics, Algorithms, MicroRNAs/genetics, Breast Neoplasms/genetics, Stomach Neoplasms/genetics, Computational Biology/methods
ABSTRACT
Long non-coding RNAs (lncRNAs) play a regulatory role in many cells, and recognizing lncRNA-protein interactions helps to reveal the functional mechanisms of lncRNAs. Identifying lncRNA-protein interactions with biological techniques is costly and time-consuming. Here, an ensemble learning framework, RLF-LPI, is proposed to predict lncRNA-protein interactions. The residual LSTM autoencoder module of RLF-LPI, with a fusion attention mechanism, extracts latent feature representations and captures the dependencies between sequences and structures using the k-mer method. Finally, the relationship between lncRNA and protein is learned through a fuzzy decision method. The experimental results show that the accuracy (ACC) of RLF-LPI is 0.912 on the ATH948 dataset and 0.921 on the ZEA22133 dataset. These results demonstrate that the proposed method predicts lncRNA-protein interactions better than other methods.
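The k-mer method mentioned in this abstract is usually understood as counting every substring of length k in a sequence and using the normalized counts as a fixed-length feature vector. The sketch below illustrates that general idea only. The function name `kmer_frequencies`, the default `k=3`, and the RNA alphabet are assumptions, and the abstract does not state which k or alphabet RLF-LPI uses.

```python
from itertools import product


def kmer_frequencies(sequence, k=3, alphabet="ACGU"):
    """Return normalized k-mer counts in a fixed order.

    The output has len(alphabet) ** k entries, ordered as produced by
    itertools.product over the alphabet. Windows containing symbols outside
    the alphabet are skipped.
    """
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = sequence.upper()
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:
            counts[window] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]
```

Because the vector length depends only on k and the alphabet, sequences of different lengths map to inputs of the same size. Passing a 20-letter amino acid alphabet would give the analogous protein features.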
Subject(s)
Long Noncoding RNA, Computational Biology/methods, Machine Learning, Long Noncoding RNA/genetics, Long Noncoding RNA/metabolism
ABSTRACT
The diversity of the characteristic sequences of anti-cancer peptides has made research difficult. To effectively predict new anti-cancer peptides, this paper proposes GRCI-Net, a feature grouping sequence and spatial dimension-integrated network algorithm for anti-cancer peptide sequence prediction. The main process is as follows. First, we fused and reduced binary structure features and K-mer sparse matrix features through principal component analysis to generate a new set of features. Second, we constructed a new bidirectional long short-term memory network. We used traditional convolution and dilated convolution to acquire features in the spatial dimension using the memory network's grouping sequence model, which is designed to better handle the diversity of anti-cancer peptide feature sequences and to fully learn the contextual information between features. Finally, we fused the grouping sequence features and the spatial dimension-integrated features through two sets of dense network layers, predicted anti-cancer peptides through the sigmoid function, and verified the approach on two public datasets, ACP740 (accuracy of 0.8230) and ACP240 (accuracy of 0.8750). The model code and datasets mentioned in this article are available at https://github.com/YouHongfeng101/ACP-DL.
Subject(s)
Computer Neural Networks, Peptides, Algorithms, Amino Acid Sequence
ABSTRACT
RNA-binding proteins (RBPs) are extensively involved in various cellular regulatory processes through their interactions with RNAs. Capturing RBP binding preferences is fundamental for revealing the pathogenesis of complex diseases. Many experimental detection techniques are still time-consuming and labor-intensive, so it is indispensable to develop a computational method with convincing accuracy. In this study, we propose a CNN-BLSTM hybrid deep learning framework, named DeepDW, for predicting RBP binding sites on RNAs with high-order encoding features of RNA sequence and secondary structure. The high-order encoding strategy is used to characterize the dependencies among adjacent nucleotides. In the CNN-BLSTM hybrid model, DeepDW first employs two 1-D convolutional neural networks (CNNs) to learn local features from the high-order encoded matrices of RNA sequence and structure separately, and then applies two bidirectional long short-term memory networks (BLSTMs) to capture global information at a higher level. A series of experiments on 31 public datasets was carried out to evaluate the proposed framework, and DeepDW outperformed the state-of-the-art methods. The results indicate that combining the high-order encoding method with the CNN-BLSTM hybrid model has advantages in identifying RBP-RNA binding sites.
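One common way to realize a high-order encoding of adjacent nucleotides is to slide a window of `order` consecutive nucleotides along the sequence and one-hot encode each window over all 4^order possible combinations. The sketch below illustrates that idea under stated assumptions: the function name `high_order_one_hot`, the default `order=2`, and the handling of unknown symbols are illustrative choices, not details taken from DeepDW.

```python
from itertools import product


def high_order_one_hot(sequence, order=2, alphabet="ACGU"):
    """Encode each window of `order` adjacent symbols as a one-hot row.

    Returns len(sequence) - order + 1 rows, each of length
    len(alphabet) ** order. Windows with symbols outside the alphabet
    (for example N) are encoded as all zeros.
    """
    index = {"".join(p): i for i, p in enumerate(product(alphabet, repeat=order))}
    dim = len(index)
    seq = sequence.upper()
    rows = []
    for i in range(len(seq) - order + 1):
        row = [0] * dim
        pos = index.get(seq[i:i + order])
        if pos is not None:
            row[pos] = 1
        rows.append(row)
    return rows
```

The resulting matrix has one row per position and could serve as input to a 1-D convolution. Unlike first-order one-hot encoding, each row carries information about neighboring nucleotides.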
Subject(s)
Computer Neural Networks, RNA, Binding Sites/genetics, Protein Binding, RNA/genetics, RNA/metabolism, RNA-Binding Proteins/chemistry
ABSTRACT
Long non-coding RNA (lncRNA) can interact with microRNA (miRNA) and plays an important role in inhibiting or activating the expression of target genes and in the occurrence and development of tumors. A growing number of studies focus on predicting miRNA-lncRNA interactions, mostly through biological experiments and machine learning methods. These methods suffer from long cycles, high costs, and the need for substantial human intervention. In this paper, a data-driven hierarchical deep learning framework is proposed, composed of a capsule network, an independent recurrent neural network with an attention mechanism, and a bi-directional long short-term memory network. This framework combines the advantages of the different networks and uses multiple sequence-derived features of the original sequence together with secondary structure features to mine the dependencies between features, aiming to obtain better results. In the experiments, five-fold cross-validation was used to evaluate the performance of the model, and on the Zea mays data set the model achieved better classification performance than the compared models. In addition, Sorghum, Brachypodium distachyon and bryophyte data sets were used to test the model, and the accuracy reached 0.9850, 0.9859 and 0.9777, respectively, which verified the model's good generalization ability.
Subject(s)
Deep Learning, MicroRNAs, Long Noncoding RNA, Computational Biology/methods, Humans, Machine Learning, MicroRNAs/genetics, Long Noncoding RNA/genetics, Long Noncoding RNA/metabolism
ABSTRACT
Long non-coding RNA (lncRNA) is a class of non-coding RNA longer than 200 nucleotides that has no protein-coding function. LncRNA plays a key role in many biological processes. Studying the RNA-binding protein (RBP) binding sites on the lncRNA chain helps to reveal epigenetic and post-transcriptional mechanisms, to explore the physiological and pathological processes of cancer, and to discover new therapeutic breakthroughs. To improve the recognition rate of RBP binding sites and reduce experimental time and cost, many computational methods based on domain knowledge have emerged to predict RBP binding sites. However, these prediction methods treat nucleotides as independent and do not take nucleotide statistics into account. In this paper, we use a high-order statistics-based encoding scheme, and the encoded lncRNA sequences are then fed into a hybrid deep learning architecture named AC-Caps. It consists of a joint processing layer (composed of an attention mechanism and a convolutional neural network) and a capsule network. The AC-Caps model was evaluated using 31 independent experimental data sets from 12 lncRNA-binding proteins. In experiments, our method achieves excellent performance, with an average area under the curve (AUC) of 0.967 and an average accuracy (ACC) of 92.5%, which are higher than those of HOCCNNLB (by 0.014 and 2.3%), iDeepS (by 0.261 and 28.9%), and DeepBind (by 0.189 and 21.8%). The results show that the AC-Caps method can reliably process large-scale RBP binding site data on the lncRNA chain and that its prediction performance is better than that of existing deep-learning models. The source code of AC-Caps and the datasets used in this paper are available at https://github.com/JinmiaoS/AC-Caps.