Results 1 - 20 of 44
1.
Mol Ther Nucleic Acids ; 35(2): 102187, 2024 Jun 11.
Article in English | MEDLINE | ID: mdl-38706631

ABSTRACT

Long non-coding RNAs (lncRNAs) are important factors involved in biological regulatory networks. Accurately predicting lncRNA-protein interactions (LPIs) is vital for clarifying lncRNA's functions and pathogenic mechanisms. Existing deep learning models have yet to yield satisfactory results in LPI prediction. Recently, graph autoencoders (GAEs) have seen rapid development, excelling in tasks like link prediction and node classification. We employed GAE technology for LPI prediction, devising the FMSRT-LPI model based on path masking and degree regression strategies and thereby achieving satisfactory outcomes. This represents the first known integration of path masking and degree regression strategies into the GAE framework for potential LPI inference. The effectiveness of our FMSRT-LPI model primarily relies on four key aspects. First, within the GAE framework, our model integrates multi-source relationships of lncRNAs and proteins with the topological data of the lncRNA-protein network (LPN). Second, the implemented masking strategy efficiently identifies the LPN's key paths, reconstructs the network, and reduces the impact of redundant or incorrect data. Third, the integrated degree decoder balances degree and structural information, enhancing node representation. Fourth, the PolyLoss function we introduced is more appropriate for LPI prediction tasks. The results on multiple public datasets further demonstrate our model's potential in LPI prediction.
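
The abstract names PolyLoss but does not state which variant is used. As a minimal sketch, assuming the common Poly-1 form (cross-entropy plus a weighted first polynomial term) applied to binary interaction labels, the loss could look like this. The function name `poly1_bce` and the default `epsilon` are illustrative assumptions, not taken from the FMSRT-LPI code.

```python
import numpy as np

def poly1_bce(probs, labels, epsilon=1.0, eps=1e-12):
    """Poly-1 loss for binary labels: cross-entropy plus epsilon * (1 - p_t).

    Assumption: the Poly-1 variant; the abstract only says "PolyLoss".
    """
    probs = np.clip(np.asarray(probs, dtype=float), eps, 1.0 - eps)
    labels = np.asarray(labels, dtype=float)
    # p_t is the probability the model assigns to the true class.
    p_t = labels * probs + (1.0 - labels) * (1.0 - probs)
    cross_entropy = -np.log(p_t)
    # epsilon = 0 recovers plain binary cross-entropy.
    return float(np.mean(cross_entropy + epsilon * (1.0 - p_t)))

# Hypothetical predicted interaction probabilities and true labels.
loss = poly1_bce([0.9, 0.1, 0.6], [1, 0, 1], epsilon=1.0)
```

Setting `epsilon` above zero adds extra penalty to predictions that are only weakly correct, which is one plausible reason such a loss might suit sparse interaction data.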

2.
Comput Biol Med ; 176: 108543, 2024 May 03.
Article in English | MEDLINE | ID: mdl-38744015

ABSTRACT

Proteins play a vital role in various biological processes and achieve their functions through protein-protein interactions (PPIs). Thus, accurate identification of PPI sites is essential. Traditional biological methods for identifying PPIs are costly, labor-intensive, and time-consuming. The development of computational prediction methods for PPI sites offers promising alternatives. Most known deep learning (DL) methods employ layer-wise multi-scale CNNs to extract features from protein sequences. However, these methods usually neglect the spatial positions and hierarchical information embedded within protein sequences, which are crucial for PPI site prediction. In this paper, we propose MR2CPPIS, a novel sequence-based DL model that utilizes the multi-scale Res2Net with coordinate attention mechanism to exploit multi-scale features and enhance PPI site prediction capability. We leverage the multi-scale Res2Net to expand the receptive field for each network layer, thus capturing multi-scale information of protein sequences at a granular level. To further explore the local contextual features of each target residue, we employ a coordinate attention block to characterize the precise spatial position information, enabling the network to effectively extract long-range dependencies. We evaluate our MR2CPPIS on three public benchmark datasets (Dset 72, Dset 186, and PDBset 164), achieving state-of-the-art performance. The source codes are available at https://github.com/YyinGong/MR2CPPIS.

3.
Comput Biol Med ; 174: 108484, 2024 May.
Article in English | MEDLINE | ID: mdl-38643595

ABSTRACT

Accurately identifying cancer driver genes (CDGs) is crucial for guiding cancer treatment and has recently received great attention from researchers. However, the high complexity and heterogeneity of cancer gene regulatory networks limit the prediction accuracy of existing deep learning models. To address this, we introduce a model called SCIS-CDG that utilizes Schur complement graph augmentation and independent subspace feature extraction techniques to effectively predict potential CDGs. Firstly, a random Schur complement strategy is adopted to generate two augmented views of the gene network within a graph contrastive learning framework. Rapid randomization of the random Schur complement strategy enhances the model's generalization and its ability to handle complex networks effectively. Upholding the Schur complement principle in expectation promotes the preservation of the original gene network's vital structure in the augmented views. Subsequently, we employ feature extraction technology using multiple independent subspaces, each trained with independent weights to reduce inter-subspace dependence and improve the model's expressiveness. Concurrently, we introduce a feature expansion component based on the structure of the gene network to address issues arising from the limited dimensionality of node features. Moreover, it can alleviate the challenges posed by the heterogeneity of cancer gene networks to some extent. Finally, we integrate a learnable attention weight mechanism into the graph neural network (GNN) encoder, utilizing feature expansion technology to optimize the significance of various feature levels in the prediction task. Following extensive experimental validation, the SCIS-CDG model has exhibited high efficiency in identifying known CDGs and uncovering potential unknown CDGs in external datasets. In particular, its performance is significantly improved compared with previous conventional GNN models.
The code and data are publicly available at: https://github.com/mxqmxqmxq/SCIS-CDG.
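
To illustrate the Schur complement idea mentioned in the abstract, the following sketch computes the exact Schur complement of a graph Laplacian onto a randomly chosen subset of nodes. This is an assumption-laden simplification: the abstract describes a randomized strategy that matches the Schur complement in expectation, whereas this code uses the exact formula S = L_kk - L_kd L_dd^+ L_dk. The names `laplacian`, `schur_complement_view`, and `keep_fraction` are hypothetical.

```python
import numpy as np

def laplacian(adj):
    """Combinatorial Laplacian D - A of an undirected graph."""
    adj = np.asarray(adj, dtype=float)
    return np.diag(adj.sum(axis=1)) - adj

def schur_complement_view(adj, keep_fraction=0.5, rng=None):
    """Return kept node indices and the Laplacian reduced onto them.

    Exact Schur complement on a random node subset; NOT the authors'
    randomized approximation.
    """
    adj = np.asarray(adj, dtype=float)
    rng = np.random.default_rng(rng)
    n = adj.shape[0]
    n_keep = max(1, int(round(keep_fraction * n)))
    kept = np.sort(rng.choice(n, size=n_keep, replace=False))
    dropped = np.setdiff1d(np.arange(n), kept)
    lap = laplacian(adj)
    l_kk = lap[np.ix_(kept, kept)]
    if dropped.size == 0:
        return kept, l_kk
    l_kd = lap[np.ix_(kept, dropped)]
    l_dd = lap[np.ix_(dropped, dropped)]
    # pinv guards against a singular block (e.g. an isolated dropped component).
    reduced = l_kk - l_kd @ np.linalg.pinv(l_dd) @ l_kd.T
    return kept, reduced

# Toy 4-node cycle standing in for a gene network; two seeds give two views.
cycle = np.array([[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]])
view_a = schur_complement_view(cycle, keep_fraction=0.5, rng=0)
view_b = schur_complement_view(cycle, keep_fraction=0.5, rng=1)
```

For a connected graph the reduced matrix is again a Laplacian (symmetric, rows summing to zero), so it can be read as a smaller weighted graph that summarizes connectivity through the eliminated nodes.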


Subject(s)
Gene Regulatory Networks , Neoplasms , Humans , Neoplasms/genetics , Computational Biology/methods , Deep Learning , Algorithms
4.
Bioinformatics ; 40(5)2024 May 02.
Article in English | MEDLINE | ID: mdl-38648052

ABSTRACT

MOTIVATION: Accurate inference of potential drug-protein interactions (DPIs) aids in understanding drug mechanisms and developing novel treatments. Existing deep learning models, however, struggle with accurate node representation in DPI prediction, limiting their performance. RESULTS: We propose a new computational framework that integrates global and local features of nodes in the drug-protein bipartite graph for efficient DPI inference. Initially, we employ pre-trained models to acquire fundamental knowledge of drugs and proteins and to determine their initial features. Subsequently, the MinHash and HyperLogLog algorithms are utilized to estimate the similarity and set cardinality between drug and protein subgraphs, serving as their local features. Then, an energy-constrained diffusion mechanism is integrated into the transformer architecture, capturing interdependencies between nodes in the drug-protein bipartite graph and extracting their global features. Finally, we fuse the local and global features of nodes and employ multilayer perceptrons to predict the likelihood of potential DPIs. A comprehensive and precise node representation guarantees efficient prediction of unknown DPIs by the model. Various experiments validate the accuracy and reliability of our model, with molecular docking results revealing its capability to identify potential DPIs not present in existing databases. This approach is expected to offer valuable insights for furthering drug repurposing and personalized medicine research. AVAILABILITY AND IMPLEMENTATION: Our code and data are accessible at: https://github.com/ZZCrazy00/DPI.
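
The MinHash step in the abstract can be sketched as follows: each node's neighborhood is treated as a set, and the fraction of matching minimum hash values between two signatures estimates their Jaccard similarity. This is a minimal illustration using only the standard library; the choice of BLAKE2b with a per-function salt, the signature length, and the function names are assumptions, not details from the paper.

```python
import hashlib

def minhash_signature(items, num_hashes=128):
    """MinHash signature of a non-empty set: one minimum per salted hash."""
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(8, "little")  # a distinct salt acts as a distinct hash function
        smallest = min(
            int.from_bytes(
                hashlib.blake2b(str(item).encode("utf-8"), digest_size=8, salt=salt).digest(),
                "little",
            )
            for item in items
        )
        signature.append(smallest)
    return signature

def estimate_jaccard(sig_a, sig_b):
    """Fraction of positions where two signatures agree."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)

# Hypothetical neighbor sets of a drug node and a protein node (true Jaccard = 0.5).
drug_neighbors = set(range(0, 150))
protein_neighbors = set(range(50, 200))
similarity = estimate_jaccard(
    minhash_signature(drug_neighbors, 256),
    minhash_signature(protein_neighbors, 256),
)
```

The estimate becomes tighter as `num_hashes` grows, at the cost of longer signatures.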


Subject(s)
Algorithms , Molecular Docking Simulation , Proteins , Proteins/chemistry , Proteins/metabolism , Pharmaceutical Preparations/chemistry , Pharmaceutical Preparations/metabolism , Computational Biology/methods , Deep Learning
5.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38446739

ABSTRACT

Antimicrobial peptides (AMPs), short peptides with diverse functions, effectively target and combat various organisms. The widespread misuse of chemical antibiotics has led to increasing microbial resistance. Due to their low drug resistance and toxicity, AMPs are considered promising substitutes for traditional antibiotics. While existing deep learning technology enhances AMP generation, it also presents certain challenges. Firstly, AMP generation overlooks the complex interdependencies among amino acids. Secondly, current models fail to integrate crucial tasks like screening, attribute prediction and iterative optimization. Consequently, we develop an integrated deep learning framework, Diff-AMP, that automates AMP generation, identification, attribute prediction and iterative optimization. We innovatively integrate kinetic diffusion and attention mechanisms into the reinforcement learning framework for efficient AMP generation. Additionally, our prediction module incorporates pre-training and transfer learning strategies for precise AMP identification and screening. We employ a convolutional neural network for multi-attribute prediction and a reinforcement learning-based iterative optimization strategy to produce diverse AMPs. This framework automates molecule generation, screening, attribute prediction and optimization, thereby advancing AMP research. We have also deployed Diff-AMP on a web server, with code, data and server details available in the Data Availability section.


Subject(s)
Amino Acids , Antimicrobial Peptides , Anti-Bacterial Agents , Diffusion , Kinetics
6.
Molecules ; 29(6)2024 Mar 10.
Article in English | MEDLINE | ID: mdl-38542866

ABSTRACT

The development of effective inhibitors targeting the Kirsten rat sarcoma viral proto-oncogene (KRASG12D) mutation, a prevalent oncogenic driver in cancer, represents a significant unmet need in precision medicine. In this study, an integrated computational approach combining structure-based virtual screening and molecular dynamics simulation was employed to identify novel noncovalent inhibitors targeting the KRASG12D variant. Through virtual screening of over 1.7 million diverse compounds, potential lead compounds with high binding affinity and specificity were identified using molecular docking and scoring techniques. Subsequently, 200 ns molecular dynamics simulations provided critical insights into the dynamic behavior, stability, and conformational changes of the inhibitor-KRASG12D complexes, facilitating the selection of lead compounds with robust binding profiles. Additionally, in silico absorption, distribution, metabolism, and excretion (ADME) profiling and toxicity predictions were applied to prioritize the lead compounds for further experimental validation. The discovered noncovalent KRASG12D inhibitors show promise as potential candidates for targeted therapy against KRASG12D-driven cancers. This comprehensive computational framework not only expedites the discovery of novel KRASG12D inhibitors but also provides valuable insights for the development of precision treatments tailored to this oncogenic mutation.


Subject(s)
Molecular Dynamics Simulation , Neoplasms , Humans , Proto-Oncogene Proteins p21(ras)/genetics , Molecular Docking Simulation , Mutation
7.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38555479

ABSTRACT

MOTIVATION: Accurately predicting molecular metabolic stability is of great significance to drug research and development, ensuring drug safety and effectiveness. Existing deep learning methods, especially graph neural networks, can reveal the molecular structure of drugs and thus efficiently predict the metabolic stability of molecules. However, most of these methods focus on the message passing between adjacent atoms in the molecular graph, ignoring the relationship between bonds. This makes it difficult for these methods to estimate accurate molecular representations, thereby being limited in molecular metabolic stability prediction tasks. RESULTS: We propose the MS-BACL model based on bond graph augmentation technology and contrastive learning strategy, which can efficiently and reliably predict the metabolic stability of molecules. To our knowledge, this is the first time that bond-to-bond relationships in molecular graph structures have been considered in the task of metabolic stability prediction. We build a bond graph based on 'atom-bond-atom', and the model can simultaneously capture the information of atoms and bonds during the message propagation process. This enhances the model's ability to reveal the internal structure of the molecule, thereby improving the structural representation of the molecule. Furthermore, we perform contrastive learning training based on the molecular graph and its bond graph to learn the final molecular representation. Multiple sets of experimental results on public datasets show that the proposed MS-BACL model outperforms the state-of-the-art model. AVAILABILITY AND IMPLEMENTATION: The code and data are publicly available at https://github.com/taowang11/MS.
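
One plausible reading of the 'atom-bond-atom' construction is a line graph: every bond becomes a node, and two bonds are linked when they share an atom. The sketch below follows that reading; it is an assumption about the general idea, not the MS-BACL implementation, and `build_bond_graph` is a hypothetical name.

```python
from itertools import combinations

def build_bond_graph(bonds):
    """Build a bond graph from a molecular graph.

    bonds: list of (atom_i, atom_j) pairs; the list index is the bond id.
    Returns sorted (bond_a, bond_b) pairs of bonds that share an atom.
    """
    bonds_by_atom = {}
    for bond_index, (atom_i, atom_j) in enumerate(bonds):
        bonds_by_atom.setdefault(atom_i, []).append(bond_index)
        bonds_by_atom.setdefault(atom_j, []).append(bond_index)
    edges = set()
    for incident in bonds_by_atom.values():
        # Every pair of bonds meeting at this atom becomes a bond-graph edge.
        for bond_a, bond_b in combinations(sorted(incident), 2):
            edges.add((bond_a, bond_b))
    return sorted(edges)

# Heavy-atom skeleton of a chain C0-C1-O2: bond 0 and bond 1 meet at atom 1.
chain_edges = build_bond_graph([(0, 1), (1, 2)])
# A central atom 0 with three substituents: all three bonds are mutually adjacent.
star_edges = build_bond_graph([(0, 1), (0, 2), (0, 3)])
```

Message passing on this second graph lets information flow between bonds directly, which is the relationship the abstract says atom-level message passing ignores.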


Subject(s)
Neural Networks, Computer
8.
Article in English | MEDLINE | ID: mdl-38386576

ABSTRACT

Improving the drug development process can expedite the introduction of more novel drugs that cater to the demands of precision medicine. Accurately predicting molecular properties remains a fundamental challenge in drug discovery and development. Currently, a plethora of computer-aided drug discovery (CADD) methods have been widely employed in the field of molecular prediction. However, most of these methods primarily analyze molecules using low-dimensional representations such as SMILES notations, molecular fingerprints, and molecular graph-based descriptors. Only a few approaches have focused on incorporating and utilizing high-dimensional spatial structural representations of molecules. In light of the advancements in artificial intelligence, we introduce a 3D graph-spatial co-representation model called AEGNN-M, which combines two graph neural networks, GAT and EGNN. AEGNN-M enables learning of information from both molecular graph representations and 3D spatial structural representations to predict molecular properties accurately. We conducted experiments on seven public datasets, three regression datasets and 14 breast cancer cell line phenotype screening datasets, comparing the performance of AEGNN-M with state-of-the-art deep learning methods. Extensive experimental results demonstrate the satisfactory performance of the AEGNN-M model. Furthermore, we analyzed the performance impact of different modules within AEGNN-M and the influence of spatial structural representations on the model's performance. The interpretability analysis also revealed the significance of specific atoms in determining particular molecular properties.

9.
Brief Funct Genomics ; 2024 Feb 22.
Article in English | MEDLINE | ID: mdl-38391194

ABSTRACT

MicroRNAs (miRNAs) are found ubiquitously in biological cells and play a pivotal role in regulating the expression of numerous target genes. Therapies centered around miRNAs are emerging as a promising strategy for disease treatment, aiming to intervene in disease progression by modulating abnormal miRNA expressions. The accurate prediction of miRNA-drug resistance (MDR) is crucial for the success of miRNA therapies. Computational models based on deep learning have demonstrated exceptional performance in predicting potential MDRs. However, their effectiveness can be compromised by errors in the data acquisition process, leading to inaccurate node representations. To address this challenge, we introduce the GAM-MDR model, which combines the graph autoencoder (GAE) with random path masking techniques to precisely predict potential MDRs. The reliability and effectiveness of the GAM-MDR model are mainly reflected in two aspects. Firstly, it efficiently extracts the representations of miRNA and drug nodes in the miRNA-drug network. Secondly, our designed random path masking strategy efficiently reconstructs critical paths in the network, thereby reducing the adverse impact of noisy data. To our knowledge, this is the first time that a random path masking strategy has been integrated into a GAE to infer MDRs. Our method was subjected to multiple validations on public datasets and yielded promising results. We are optimistic that our model could offer valuable insights for miRNA therapeutic strategies and deepen the understanding of the regulatory mechanisms of miRNAs. Our data and code are publicly available at GitHub: https://github.com/ZZCrazy00/GAM-MDR.
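
Random path masking can be sketched as short random walks over the miRNA-drug network, with every edge traversed by a walk hidden from the encoder and kept as a reconstruction target. The walk count, walk length, node labels, and the function name `mask_random_paths` below are illustrative assumptions; the abstract does not specify them.

```python
import random

def mask_random_paths(edges, num_walks=2, walk_length=3, seed=0):
    """Split edges into (visible, hidden) by masking edges along random walks.

    edges: list of unique undirected (u, v) pairs with sortable node labels.
    """
    rng = random.Random(seed)
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, []).append(v)
        neighbors.setdefault(v, []).append(u)
    nodes = sorted(neighbors)
    masked = set()
    for _ in range(num_walks):
        current = rng.choice(nodes)
        for _ in range(walk_length):
            nxt = rng.choice(neighbors[current])
            masked.add(frozenset((current, nxt)))  # undirected edge on the path
            current = nxt
    visible = [edge for edge in edges if frozenset(edge) not in masked]
    hidden = [edge for edge in edges if frozenset(edge) in masked]
    return visible, hidden

# Hypothetical miRNA-drug resistance associations.
associations = [("miR-1", "drugA"), ("miR-1", "drugB"), ("miR-2", "drugB"), ("miR-3", "drugC")]
visible_edges, hidden_edges = mask_random_paths(associations, seed=1)
```

In a masked graph autoencoder, the encoder would see only `visible_edges` and the decoder would be trained to recover `hidden_edges`.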

10.
Comput Biol Med ; 171: 108104, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38335821

ABSTRACT

Drug-food interactions (DFIs) crucially impact patient safety and drug efficacy by modifying absorption, distribution, metabolism, and excretion. The application of deep learning for predicting DFIs is promising, yet the development of computational models remains in its early stages. This is mainly due to the complexity of food compounds, challenging dataset developers in acquiring comprehensive ingredient data, often resulting in incomplete or vague food component descriptions. DFI-MS tackles this issue by employing an accurate feature representation method alongside a refined computational model. It innovatively achieves a more precise characterization of food features, a previously daunting task in DFI research. This is accomplished through modules designed for perturbation interactions, feature alignment and domain separation, and inference feedback. These modules extract essential information from features, using a perturbation module and a feature interaction encoder to establish robust representations. The feature alignment and domain separation modules are particularly effective in managing data with diverse frequencies and characteristics. DFI-MS stands out as the first in its field to combine data augmentation, feature alignment, domain separation, and contrastive learning. The flexibility of the inference feedback module allows its application in various downstream tasks. Demonstrating exceptional performance across multiple datasets, DFI-MS represents a significant advancement in food representation technology. Our code and data are available at https://github.com/kkkayle/DFI-MS.


Subject(s)
Food-Drug Interactions , Food , Humans , Supervised Machine Learning
11.
Mol Ther Nucleic Acids ; 35(1): 102103, 2024 Mar 12.
Article in English | MEDLINE | ID: mdl-38261851

ABSTRACT

Inferring small molecule-miRNA associations (MMAs) is crucial for revealing the intricacies of biological processes and disease mechanisms. Deep learning, renowned for its exceptional speed and accuracy, is extensively used for predicting MMAs. However, given their heavy reliance on data, inaccuracies during data collection can make these methods susceptible to noise interference. To address this challenge, we introduce the joint masking and self-supervised (JMSS)-MMA model. This model synergizes graph autoencoders with a probability distribution-based masking strategy, effectively countering the impact of noisy data and enabling precise predictions of unknown MMAs. Operating in a self-supervised manner, it deeply encodes the relationship data of small molecules and miRNA through the graph autoencoder, delving into its latent information. Our masking strategy has successfully reduced data noise, enhancing prediction accuracy. To our knowledge, this is the pioneering integration of a masking strategy with graph autoencoders for MMA prediction. Furthermore, the JMSS-MMA model incorporates a node-degree-based decoder, deepening the understanding of the network's structure. Experiments on two mainstream datasets confirm the model's efficiency and precision, and ablation studies further attest to its robustness. We firmly believe that this model will revolutionize drug development, personalized medicine, and biomedical research.

12.
J Chem Inf Model ; 64(7): 2798-2806, 2024 Apr 08.
Article in English | MEDLINE | ID: mdl-37643082

ABSTRACT

Plant small secretory peptides (SSPs) play an important role in the regulation of biological processes in plants. Accurately predicting SSPs enables efficient exploration of their functions. Traditional experimental verification methods are very reliable and accurate, but they require expensive equipment and a lot of time. Machine learning speeds up the prediction of SSPs, but unstable feature extraction limits this type of method. Therefore, this paper proposes a new feature-correction-based model for SSP recognition in plants, abbreviated as SE-SSP. The model has three main advantages. First, the use of transformer encoders can better reveal implicit features. Second, we design a feature correction module suitable for sequences, named 2-D SENET, to adaptively adjust the features and obtain a more robust feature representation. Third, we stack multiple linear modules to further extract deep information from the sample. In addition, training based on a contrastive learning strategy can alleviate the problem of sparse samples. We conduct experiments on publicly available data sets, and the results verify that our model shows excellent performance. The proposed model can be used as a convenient and effective SSP prediction tool in the future. Our data and code are publicly available at https://github.com/wrab12/SE-SSP/.
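
The name 2-D SENET suggests a squeeze-and-excitation style recalibration. The sketch below shows the generic squeeze-and-excitation idea on a sequence feature map of shape (length, channels): average over positions, pass the summary through a small bottleneck, and rescale each channel by a gate in (0, 1). How the authors adapt this to two dimensions is not described in the abstract, so the layout, weight shapes, and the name `squeeze_excite` are assumptions.

```python
import numpy as np

def squeeze_excite(features, w_reduce, w_expand):
    """Generic squeeze-and-excitation over channels.

    features: (length, channels); w_reduce: (channels, hidden);
    w_expand: (hidden, channels). Returns rescaled features and the gates.
    """
    squeezed = features.mean(axis=0)                     # squeeze: one value per channel
    hidden = np.maximum(0.0, squeezed @ w_reduce)        # bottleneck with ReLU
    gates = 1.0 / (1.0 + np.exp(-(hidden @ w_expand)))   # sigmoid gate per channel
    return features * gates, gates

# Random stand-ins for encoder outputs and learned weights.
rng = np.random.default_rng(0)
sequence_features = rng.normal(size=(10, 8))
recalibrated, channel_gates = squeeze_excite(
    sequence_features,
    rng.normal(scale=0.1, size=(8, 2)),
    rng.normal(scale=0.1, size=(2, 8)),
)
```

In a trained model the two weight matrices would be learned, so the gates would down-weight channels that are unreliable for a given sequence.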


Subject(s)
Electric Power Supplies , Machine Learning , Biological Transport , Peptides , Research Design
13.
J Chem Inf Model ; 64(7): 2912-2920, 2024 Apr 08.
Article in English | MEDLINE | ID: mdl-37920888

ABSTRACT

Deep learning methods can accurately study noncoding RNA-protein interactions (NPIs), which is of great significance in gene regulation, human disease, and other fields. However, the computational method for predicting NPIs in large-scale dynamic ncRNA-protein bipartite graphs is rarely discussed, which is an online modeling and prediction problem. In addition, the results published by researchers on the Web site cannot meet real-time needs due to the large amount of basic data and long update cycles. Therefore, we propose a real-time method based on the dynamic ncRNA-protein bipartite graph learning framework, termed ML-GNN, which can model and predict the NPIs in real time. Our proposed method has the following advantages: first, the meta-learning strategy can alleviate the problem of large prediction errors in sparse neighborhood samples; second, dynamic modeling of newly added data can reduce computational pressure and predict NPIs in real time. In the experiment, we built a dynamic bipartite graph based on 300,000 NPIs from the NPInter v4.0 database. The experimental results indicate that our model achieved excellent performance in multiple experiments. The code for the model is available at https://github.com/taowang11/ML-NPI, and the data can be downloaded freely at http://bigdata.ibp.ac.cn/npinter4.


Subject(s)
RNA, Untranslated , Research Personnel , Humans , Databases, Factual , RNA, Untranslated/genetics
14.
Methods ; 221: 73-81, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38123109

ABSTRACT

Research indicates that miRNAs present in herbal medicines are crucial for identifying disease markers, advancing gene therapy, facilitating drug delivery, and so on. These miRNAs maintain stability in the extracellular environment, making them viable tools for disease diagnosis. They can withstand the digestive processes in the gastrointestinal tract, positioning them as potential carriers for specific oral drug delivery. By engineering plants to generate effective, non-toxic miRNA interference sequences, it's possible to broaden their applicability, including the treatment of diseases such as hepatitis C. Consequently, delving into the miRNA-disease associations (MDAs) within herbal medicines holds immense promise for diagnosing and addressing miRNA-related diseases. In our research, we propose the SGAE-MDA model, which harnesses the strengths of a graph autoencoder (GAE) combined with a semi-supervised approach to uncover potential MDAs in herbal medicines more effectively. Leveraging the GAE framework, the SGAE-MDA model precisely integrates the inherent feature vectors of miRNA and disease nodes with the regulatory data in the miRNA-disease network. Additionally, the proposed semi-supervised learning approach randomly hides part of the structure of the miRNA-disease network and subsequently reconstructs it within the GAE framework. This technique effectively minimizes network noise interference. In comparisons against other leading deep learning models, the results consistently highlighted the superior performance of the proposed SGAE-MDA model. Our code and dataset are available at: https://github.com/22n9n23/SGAE-MDA.


Subject(s)
MicroRNAs , MicroRNAs/genetics , Algorithms , Computational Biology/methods , Supervised Machine Learning , Plant Extracts
15.
IEEE J Biomed Health Inform ; 28(3): 1564-1574, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38153823

ABSTRACT

The prediction of molecular properties remains a challenging task in the field of drug design and development. Recently, there has been a growing interest in the analysis of biological images. Molecular images, as a novel representation, have proven to be competitive, yet they lack explicit information and detailed semantic richness. Conversely, semantic information in SMILES sequences is explicit but lacks spatial structural details. Therefore, in this study, we focus on and explore the relationship between these two types of representations, proposing a novel multimodal architecture named ISMol. ISMol relies on a cross-attention mechanism to extract information representations of molecules from both images and SMILES strings, thereby predicting molecular properties. Evaluation results on 14 small molecule ADMET datasets indicate that ISMol outperforms machine learning (ML) and deep learning (DL) models based on single-modal representations. In addition, we analyze our method through a large number of experiments to test the superiority, interpretability and generalizability of the method. In summary, ISMol offers a powerful deep learning toolbox for drug discovery in a variety of molecular properties.


Subject(s)
Drug Design , Drug Discovery , Humans , Machine Learning , Semantics
16.
BMC Genomics ; 24(1): 742, 2023 Dec 05.
Article in English | MEDLINE | ID: mdl-38053026

ABSTRACT

BACKGROUND: DNA methylation, instrumental in numerous life processes, underscores the paramount importance of its accurate prediction. Recent studies suggest that deep learning, due to its capacity to extract profound insights, provides a more precise DNA methylation prediction. However, issues related to the stability and generalization performance of these models persist. RESULTS: In this study, we introduce an efficient and stable DNA methylation prediction model. This model incorporates a feature fusion approach, adaptive feature correction technology, and a contrastive learning strategy. The proposed model presents several advantages. First, DNA sequences are encoded at four levels to comprehensively capture intricate information across multi-scale and low-span features. Second, we design a sequence-specific feature correction module that adaptively adjusts the weights of sequence features. This enhances the model's stability and scalability, and hence its generality. Third, our contrastive learning strategy mitigates the instability issues resulting from sparse data. To validate our model, we conducted multiple sets of experiments on commonly used datasets, demonstrating the model's robustness and stability. We also merged the various datasets into a single, unified dataset. The experimental outcomes from this combined dataset substantiate the model's robust adaptability. CONCLUSIONS: Our research findings affirm that the StableDNAm model is a general, stable, and effective instrument for DNA methylation prediction. It holds substantial promise for providing invaluable assistance in future methylation-related research and analyses.


Subject(s)
DNA Methylation , Protein Processing, Post-Translational
17.
Math Biosci Eng ; 20(12): 20648-20667, 2023 Nov 15.
Article in English | MEDLINE | ID: mdl-38124569

ABSTRACT

The prediction of long non-coding RNA (lncRNA) subcellular localization is essential to the understanding of its function and involvement in cellular regulation. Traditional biological experimental methods are costly and time-consuming, making computational methods the preferred approach for predicting lncRNA subcellular localization (LSL). However, existing computational methods have limitations due to the structural characteristics of lncRNAs and the uneven distribution of data across subcellular compartments. We propose a discrete wavelet transform (DWT)-based model for predicting LSL, called DlncRNALoc. We construct a physicochemical property matrix of 2-tuple bases based on lncRNA sequences, and we introduce a DWT lncRNA feature extraction method. We use the Synthetic Minority Over-sampling Technique (SMOTE) for oversampling and the local Fisher discriminant analysis (LFDA) algorithm to optimize feature information. The optimized feature vectors are fed into a support vector machine (SVM) to construct a predictive model. DlncRNALoc was evaluated with five-fold cross-validation on three benchmark datasets. Extensive experiments have demonstrated the superiority and effectiveness of the DlncRNALoc model in predicting LSL.
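
As an illustration of DWT-based feature extraction, the sketch below applies a Haar wavelet repeatedly to a numeric profile (for example, one physicochemical property read along a sequence) and summarizes each level with four statistics. The abstract does not name the wavelet family, the number of levels, or the summary statistics, so all of these, along with the names `haar_dwt` and `dwt_features`, are assumptions.

```python
import numpy as np

def haar_dwt(signal):
    """One level of the Haar DWT: (approximation, detail) coefficients."""
    signal = np.asarray(signal, dtype=float)
    if signal.size % 2:
        # Pad odd-length input by repeating the last value.
        signal = np.append(signal, signal[-1])
    pairs = signal.reshape(-1, 2)
    approx = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2.0)
    detail = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2.0)
    return approx, detail

def dwt_features(signal, levels=2):
    """Fixed-length vector: mean, std, max, min per detail level plus final approximation."""
    current = np.asarray(signal, dtype=float)
    features = []
    for _ in range(levels):
        current, detail = haar_dwt(current)
        features += [detail.mean(), detail.std(), detail.max(), detail.min()]
    features += [current.mean(), current.std(), current.max(), current.min()]
    return np.array(features)

# Hypothetical property profile along a short sequence.
profile = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
feature_vector = dwt_features(profile, levels=2)
```

Because the output length depends only on `levels`, sequences of different lengths map to vectors of the same size, which is what a downstream classifier such as an SVM needs.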


Subject(s)
RNA, Long Noncoding , RNA, Long Noncoding/genetics , Wavelet Analysis , Algorithms , Support Vector Machine , Computational Biology/methods
18.
Brief Bioinform ; 24(4)2023 Jul 20.
Article in English | MEDLINE | ID: mdl-37427977

ABSTRACT

Studies have shown that the mechanism of action of many drugs is related to miRNA. In-depth research on the relationship between miRNA and drugs can provide theoretical foundations and practical approaches for various areas, such as drug target discovery, drug repositioning and biomarker research. Traditional biological experiments to test miRNA-drug susceptibility are costly and time-consuming. Thus, sequence- or topology-based deep learning methods are recognized in this field for their efficiency and accuracy. However, these methods have limitations in dealing with sparse topologies and higher-order information of miRNA (drug) features. In this work, we propose GCFMCL, a model for multi-view contrastive learning based on graph collaborative filtering. To the best of our knowledge, this is the first attempt to incorporate a contrastive learning strategy into the graph collaborative filtering framework to predict the sensitivity relationships between miRNAs and drugs. The proposed multi-view contrastive learning method is divided into a topological contrastive objective and a feature contrastive objective: (1) For the homogeneous neighbors of the topological graph, we propose a novel topological contrastive learning method that constructs the contrastive target from the topological neighborhood information of nodes. (2) The proposed model obtains feature contrastive targets from high-order feature information according to the correlation of node features, and mines potential neighborhood relationships in the feature space. The proposed multi-view contrastive learning effectively alleviates the impact of heterogeneous node noise and graph data sparsity in graph collaborative filtering, and significantly enhances the performance of the model. Our study employs a dataset derived from the NoncoRNA and ncDR databases, encompassing 2049 experimentally validated miRNA-drug sensitivity associations. Five-fold cross-validation shows that the Area Under the Curve (AUC), Area Under the Precision-Recall Curve (AUPR) and F1-score (F1) of GCFMCL reach 95.28%, 95.66% and 89.77%, outperforming the state-of-the-art (SOTA) method by margins of 2.73%, 3.42% and 4.96%, respectively. Our code and data can be accessed at https://github.com/kkkayle/GCFMCL.


Subject(s)
Drug Delivery Systems , MicroRNAs , Area Under Curve , Databases, Factual , Drug Discovery , MicroRNAs/genetics
19.
Brief Bioinform ; 24(4)2023 Jul 20.
Article in English | MEDLINE | ID: mdl-37328701

ABSTRACT

Circular RNA (circRNA) is closely associated with human diseases. Accordingly, identifying the associations between human diseases and circRNA can help in disease prevention, diagnosis and treatment. Traditional methods are time consuming and laborious. Meanwhile, computational models can effectively predict potential circRNA-disease associations (CDAs), but are restricted by limited data, resulting in data with high dimension and imbalance. In this study, we propose a model based on automatically selected meta-path and contrastive learning, called the MPCLCDA model. First, the model constructs a new heterogeneous network based on circRNA similarity, disease similarity and known association, via automatically selected meta-path and obtains the low-dimensional fusion features of nodes via graph convolutional networks. Then, contrastive learning is used to optimize the fusion features further, and obtain the node features that make the distinction between positive and negative samples more evident. Finally, circRNA-disease scores are predicted through a multilayer perceptron. The proposed method is compared with advanced methods on four datasets. The average area under the receiver operating characteristic curve, area under the precision-recall curve and F1 score under 5-fold cross-validation reached 0.9752, 0.9831 and 0.9745, respectively. Simultaneously, case studies on human diseases further prove the predictive ability and application value of this method.


Subject(s)
Neural Networks, Computer , RNA, Circular , Humans , RNA, Circular/genetics , ROC Curve , Computational Biology/methods , Algorithms
20.
Comput Biol Med ; 163: 107143, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37339574

ABSTRACT

Non-coding RNA (ncRNA) is a functional RNA molecule that plays a key role in various fundamental biological processes, such as gene regulation. Therefore, studying the connection between ncRNA and proteins holds significant importance in exploring the function of ncRNA. Although many efficient and accurate methods have been developed by modern biological scientists, accurate predictions still pose a major challenge for various issues. In our approach, we utilize a multi-head attention mechanism to merge residual connections, allowing for the automatic learning of ncRNA and protein sequence features. Specifically, the proposed method projects node features into multiple spaces based on multi-head attention mechanism, thereby obtaining different feature interaction patterns in these spaces. By stacking interaction layers, higher-order interaction modes can be derived, while still preserving the initial feature information through the residual connection. This strategy effectively leverages the sequence information of ncRNA and protein, enabling the capture of hidden high-order features. The final experimental results demonstrate the effectiveness of our method, with AUC values of 97.4%, 98.5%, and 94.8% achieved on the NPInter v2.0, RPI807, and RPI488 datasets, respectively. These impressive results solidify our method as a powerful tool for exploring the connection between ncRNAs and proteins. We have uploaded the implementation code on GitHub: https://github.com/ZZCrazy00/MHAM-NPI.
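
The interaction layer described above (project features into several attention spaces, combine the heads, and keep the original information through a residual path) can be sketched in NumPy as follows. The weight shapes, the ReLU at the end, and the name `multi_head_interaction` are assumptions chosen for illustration; the actual layer is defined in the authors' repository.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def multi_head_interaction(x, w_q, w_k, w_v, w_res):
    """One interaction layer with a residual connection.

    x: (n_features, d); w_q, w_k, w_v: (heads, d, d_head);
    w_res: (d, heads * d_head) projects the input for the residual path.
    """
    head_outputs = []
    for q_proj, k_proj, v_proj in zip(w_q, w_k, w_v):
        q, k, v = x @ q_proj, x @ k_proj, x @ v_proj
        # Scaled dot-product attention between feature embeddings in this head's space.
        weights = softmax(q @ k.T / np.sqrt(q.shape[-1]))
        head_outputs.append(weights @ v)
    attended = np.concatenate(head_outputs, axis=-1)
    # Residual path keeps the initial feature information.
    return np.maximum(0.0, attended + x @ w_res)

# Random stand-ins: 5 feature embeddings of size 8, two heads of size 4.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(5, 8))
w_q, w_k, w_v = (rng.normal(size=(2, 8, 4)) for _ in range(3))
w_res = rng.normal(size=(8, 8))
layer_output = multi_head_interaction(embeddings, w_q, w_k, w_v, w_res)
```

Stacking several such layers, each taking the previous output as `x`, is one way to obtain the higher-order interactions the abstract mentions.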


Subject(s)
Proteins , RNA, Untranslated , RNA, Untranslated/genetics , RNA, Untranslated/metabolism , Proteins/metabolism