Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 20 de 107
Filtrar
1.
Brief Bioinform ; 24(1)2023 01 19.
Artículo en Inglés | MEDLINE | ID: mdl-36631407

RESUMEN

Recently, peptide-based drugs have gained unprecedented interest in discovering and developing antifungal drugs due to their high efficacy, broad-spectrum activity, low toxicity and few side effects. However, it is time-consuming and expensive to identify antifungal peptides (AFPs) experimentally. Therefore, computational methods for accurately predicting AFPs are highly required. In this work, we develop AFP-MFL, a novel deep learning model that predicts AFPs only relying on peptide sequences without using any structural information. AFP-MFL first constructs comprehensive feature profiles of AFPs, including contextual semantic information derived from a pre-trained protein language model, evolutionary information, and physicochemical properties. Subsequently, the co-attention mechanism is utilized to integrate contextual semantic information with evolutionary information and physicochemical properties separately. Extensive experiments show that AFP-MFL outperforms state-of-the-art models on four independent test datasets. Furthermore, the SHAP method is employed to explore each feature contribution to the AFPs prediction. Finally, a user-friendly web server of the proposed AFP-MFL is developed and freely accessible at http://inner.wei-group.net/AFPMFL/, which can be considered as a powerful tool for the rapid screening and identification of novel AFPs.


Asunto(s)
Antifúngicos , alfa-Fetoproteínas , Antifúngicos/farmacología , Algoritmos , Péptidos/química , Biología Computacional/métodos
2.
Brief Bioinform ; 24(6)2023 09 22.
Artículo en Inglés | MEDLINE | ID: mdl-37861173

RESUMEN

NcRNA-encoded small peptides (ncPEPs) have recently emerged as promising targets and biomarkers for cancer immunotherapy. Therefore, identifying cancer-associated ncPEPs is crucial for cancer research. In this work, we propose CoraL, a novel supervised contrastive meta-learning framework for predicting cancer-associated ncPEPs. Specifically, the proposed meta-learning strategy enables our model to learn meta-knowledge from different types of peptides and train a promising predictive model even with few labeled samples. The results show that our model is capable of making high-confidence predictions on unseen cancer biomarkers with only five samples, potentially accelerating the discovery of novel cancer biomarkers for immunotherapy. Moreover, our approach remarkably outperforms existing deep learning models on 15 cancer-associated ncPEPs datasets, demonstrating its effectiveness and robustness. Interestingly, our model exhibits outstanding performance when extended for the identification of short open reading frames derived from ncPEPs, demonstrating the strong prediction ability of CoraL at the transcriptome level. Importantly, our feature interpretation analysis discovers unique sequential patterns as the fingerprint for each cancer-associated ncPEPs, revealing the relationship among certain cancer biomarkers that are validated by relevant literature and motif comparison. Overall, we expect CoraL to be a useful tool to decipher the pathogenesis of cancer and provide valuable information for cancer research. The dataset and source code of our proposed method can be found at https://github.com/Johnsunnn/CoraL.


Asunto(s)
Antozoos , Neoplasias , Animales , Antozoos/genética , Neoplasias/genética , Biomarcadores de Tumor/genética , Inmunoterapia , Péptidos/genética , ARN no Traducido
3.
Brief Bioinform ; 24(1)2023 01 19.
Artículo en Inglés | MEDLINE | ID: mdl-36562719

RESUMEN

BACKGROUND: Cell-penetrating peptides (CPPs) have received considerable attention as a means of transporting pharmacologically active molecules into living cells without damaging the cell membrane, and thus hold great promise as future therapeutics. Recently, several machine learning-based algorithms have been proposed for predicting CPPs. However, most existing predictive methods do not consider the agreement (disagreement) between similar (dissimilar) CPPs and depend heavily on expert knowledge-based handcrafted features. RESULTS: In this study, we present SiameseCPP, a novel deep learning framework for automated CPPs prediction. SiameseCPP learns discriminative representations of CPPs based on a well-pretrained model and a Siamese neural network consisting of a transformer and gated recurrent units. Contrastive learning is used for the first time to build a CPP predictive model. Comprehensive experiments demonstrate that our proposed SiameseCPP is superior to existing baseline models for predicting CPPs. Moreover, SiameseCPP also achieves good performance on other functional peptide datasets, exhibiting satisfactory generalization ability.


Asunto(s)
Péptidos de Penetración Celular , Péptidos de Penetración Celular/metabolismo , Algoritmos , Transporte Biológico , Redes Neurales de la Computación , Aprendizaje Automático
4.
Brief Bioinform ; 24(4)2023 07 20.
Artículo en Inglés | MEDLINE | ID: mdl-37225420

RESUMEN

Enzymatic reactions are crucial to explore the mechanistic function of metabolites and proteins in cellular processes and to understand the etiology of diseases. The increasing number of interconnected metabolic reactions allows the development of in silico deep learning-based methods to discover new enzymatic reaction links between metabolites and proteins to further expand the landscape of existing metabolite-protein interactome. Computational approaches to predict the enzymatic reaction link by metabolite-protein interaction (MPI) prediction are still very limited. In this study, we developed a Variational Graph Autoencoders (VGAE)-based framework to predict MPI in genome-scale heterogeneous enzymatic reaction networks across ten organisms. By incorporating molecular features of metabolites and proteins as well as neighboring information in the MPI networks, our MPI-VGAE predictor achieved the best predictive performance compared to other machine learning methods. Moreover, when applying the MPI-VGAE framework to reconstruct hundreds of metabolic pathways, functional enzymatic reaction networks and a metabolite-metabolite interaction network, our method showed the most robust performance among all scenarios. To the best of our knowledge, this is the first MPI predictor by VGAE for enzymatic reaction link prediction. Furthermore, we implemented the MPI-VGAE framework to reconstruct the disease-specific MPI network based on the disrupted metabolites and proteins in Alzheimer's disease and colorectal cancer, respectively. A substantial number of novel enzymatic reaction links were identified. We further validated and explored the interactions of these enzymatic reactions using molecular docking. These results highlight the potential of the MPI-VGAE framework for the discovery of novel disease-related enzymatic reactions and facilitate the study of the disrupted metabolisms in diseases.


Asunto(s)
Aprendizaje Automático , Redes y Vías Metabólicas , Simulación del Acoplamiento Molecular , Fenómenos Fisiológicos Celulares
5.
Bioinformatics ; 40(2)2024 02 01.
Artículo en Inglés | MEDLINE | ID: mdl-38305428

RESUMEN

MOTIVATION: 5-Methylcytosine (5mC), a fundamental element of DNA methylation in eukaryotes, plays a vital role in gene expression regulation, embryonic development, and other biological processes. Although several computational methods have been proposed for detecting the base modifications in DNA like 5mC sites from Nanopore sequencing data, they face challenges including sensitivity to noise, and ignoring the imbalanced distribution of methylation sites in real-world scenarios. RESULTS: Here, we develop NanoCon, a deep hybrid network coupled with contrastive learning strategy to detect 5mC methylation sites from Nanopore reads. In particular, we adopted a contrastive learning module to alleviate the issues caused by imbalanced data distribution in nanopore sequencing, offering a more accurate and robust detection of 5mC sites. Evaluation results demonstrate that NanoCon outperforms existing methods, highlighting its potential as a valuable tool in genomic sequencing and methylation prediction. In addition, we also verified the effectiveness of our representation learning ability on two datasets by visualizing the dimension reduction of the features of methylation and nonmethylation sites from our NanoCon. Furthermore, cross-species and cross-5mC methylation motifs experiments indicated the robustness and the ability to perform transfer learning of our model. We hope this work can contribute to the community by providing a powerful and reliable solution for 5mC site detection in genomic studies. AVAILABILITY AND IMPLEMENTATION: The project code is available at https://github.com/Challis-yin/NanoCon.


Asunto(s)
Nanoporos , Metilación de ADN , Genómica , Genoma , ADN
6.
Bioinformatics ; 40(2)2024 Feb 01.
Artículo en Inglés | MEDLINE | ID: mdl-38305458

RESUMEN

MOTIVATION: Diabetes is a chronic metabolic disorder that has been a major cause of blindness, kidney failure, heart attacks, stroke, and lower limb amputation across the world. To alleviate the impact of diabetes, researchers have developed the next generation of anti-diabetic drugs, known as dipeptidyl peptidase IV inhibitory peptides (DPP-IV-IPs). However, the discovery of these promising drugs has been restricted due to the lack of effective peptide-mining tools. RESULTS: Here, we presented StructuralDPPIV, a deep learning model designed for DPP-IV-IP identification, which takes advantage of both molecular graph features in amino acid and sequence information. Experimental results on the independent test dataset and two wet experiment datasets show that our model outperforms the other state-of-art methods. Moreover, to better study what StructuralDPPIV learns, we used CAM technology and perturbation experiment to analyze our model, which yielded interpretable insights into the reasoning behind prediction results. AVAILABILITY AND IMPLEMENTATION: The project code is available at https://github.com/WeiLab-BioChem/Structural-DPP-IV.


Asunto(s)
Aprendizaje Profundo , Diabetes Mellitus , Humanos , Dipeptidil Peptidasa 4 , Aminoácidos , Péptidos
7.
Nucleic Acids Res ; 51(7): 3017-3029, 2023 04 24.
Artículo en Inglés | MEDLINE | ID: mdl-36796796

RESUMEN

Here, we present DeepBIO, the first-of-its-kind automated and interpretable deep-learning platform for high-throughput biological sequence functional analysis. DeepBIO is a one-stop-shop web service that enables researchers to develop new deep-learning architectures to answer any biological question. Specifically, given any biological sequence data, DeepBIO supports a total of 42 state-of-the-art deep-learning algorithms for model training, comparison, optimization and evaluation in a fully automated pipeline. DeepBIO provides a comprehensive result visualization analysis for predictive models covering several aspects, such as model interpretability, feature analysis and functional sequential region discovery. Additionally, DeepBIO supports nine base-level functional annotation tasks using deep-learning architectures, with comprehensive interpretations and graphical visualizations to validate the reliability of annotated sites. Empowered by high-performance computers, DeepBIO allows ultra-fast prediction with up to million-scale sequence data in a few hours, demonstrating its usability in real application scenarios. Case study results show that DeepBIO provides an accurate, robust and interpretable prediction, demonstrating the power of deep learning in biological sequence functional analysis. Overall, we expect DeepBIO to ensure the reproducibility of deep-learning biological sequence analysis, lessen the programming and hardware burden for biologists and provide meaningful functional insights at both the sequence level and base level from biological sequences alone. DeepBIO is publicly available at https://inner.wei-group.net/DeepBIO.


The development of next-generation sequencing techniques has led to an exponential increase in the amount of biological sequence data accessible. It naturally poses a fundamental challenge­how to build the relationships from such large-scale sequences to their functions. In this work, we present DeepBIO, the first-of-its-kind automated and interpretable deep-learning platform for high-throughput biological sequence functional analysis. It enables researchers to develop new deep-learning architectures to answer any biological question in a fully automated pipeline. We expect DeepBIO to ensure the reproducibility of deep-learning-based biological sequence analysis, lessen the programming and hardware burden for biologists and provide meaningful functional insights at both the sequence level and base level from biological sequences alone.


Asunto(s)
Aprendizaje Profundo , Reproducibilidad de los Resultados , Algoritmos , Secuenciación de Nucleótidos de Alto Rendimiento
8.
Brief Bioinform ; 23(1)2022 01 17.
Artículo en Inglés | MEDLINE | ID: mdl-34882198

RESUMEN

Metastasis is a major cause of cancer morbidity and mortality, and most cancer deaths are caused by cancer metastasis rather than by the primary tumor. The prediction of metastasis based on computational methods has not been explored much in the previous research. In this study, we proposed a graph convolutional network embedded with a graph learning (GL) module, named glmGCN, to predict the distant metastasis of cancer. Both the mRNA and lncRNA expressions were used to provide more genetic information than using the mRNA alone and we used them to construct gene interaction graph representation to consider the effect of genetic interaction. Then, the prediction of the cancer metastasis was performed under a GCN framework, which extracted informative and advanced features from the built non-regular graph structures. Particularly, a GL module was embedded in the proposed glmGCN to learn an optimal graph representation of the gene interaction. We firstly constructed the protein-protein interaction network to represent the initial gene(node) relationship graph. Then, through the GL module, a new graph representation was built which optimally learned the gene interaction strength. Finally, the GCN was adopted to identify the distant metastasis cases. It is worth mentioning that the proposed method pays more attentions on the gene-gene relation than the previous GCN-based method, so more accurate prediction performance can be obtained. The glmGCN was trained based on two types of cancer and was further validated using two other cancer types. A series of experiments have shown that the effectiveness of the proposed method. The implementation for the proposed method is available at https://github.com/RanSuLab/Metastasis-glmGCN.


Asunto(s)
Neoplasias , ARN Largo no Codificante , Humanos , Aprendizaje Automático , Neoplasias/genética , Redes Neurales de la Computación , ARN Largo no Codificante/genética
9.
Brief Bioinform ; 23(2)2022 03 10.
Artículo en Inglés | MEDLINE | ID: mdl-35181793

RESUMEN

Chromosome is composed of many distinct chromatin domains, referred to variably as topological domains or topologically associating domains (TADs). The domains are stable across different cell types and highly conserved across species, thus these chromatin domains have been considered as the basic units of chromosome folding and regarded as an important secondary structure in chromosome organization. However, the identification of TAD boundaries is still a great challenge due to the high cost and low resolution of Hi-C data or experiments. In this study, we propose a novel ensemble learning framework, termed as StackTADB, for predicting the boundaries of TADs. StackTADB integrates four base classifiers including Random Forest, Logistic Regression, K-NearestNeighbor and Support Vector Machine. From the analysis of a series of examinations on the data set in the previous study, it is concluded that StackTADB has optimal performance in six metrics, AUC, Accuracy, MCC, Precision, Recall and F1 score, and it is superior to the existing methods. In addition, the comparison of the performance of multiple features shows that Kmers-based features play an essential role in predicting TADs boundaries of fruit flies, and we also apply the SHapley Additive exPlanations (SHAP) framework to interpret the predictions of StackTADB to identify the reason why Kmers-based features are vital. The experimental results show that the subsequences matching the BEAF-32 motif play a crucial role in predicting the boundaries of TADs. The source code is freely available at https://github.com/HaoWuLab-Bioinformatics/StackTADB and the webserver of StackTADB is freely available at http://hwtad.sdu.edu.cn:8002/StackTADB.


Asunto(s)
Cromatina , Proteínas de Drosophila , Animales , Cromosomas , Proteínas de Unión al ADN/genética , Drosophila/genética , Proteínas de Drosophila/genética , Proteínas del Ojo/genética , Aprendizaje Automático , Programas Informáticos
10.
Brief Bioinform ; 23(2)2022 03 10.
Artículo en Inglés | MEDLINE | ID: mdl-35043144

RESUMEN

Predicting the response of cancer patients to a particular treatment is a major goal of modern oncology and an important step toward personalized treatment. In the practical clinics, the clinicians prefer to obtain the most-suited drugs for a particular patient instead of knowing the exact values of drug sensitivity. Instead of predicting the exact value of drug response, we proposed a deep learning-based method, named Siamese Response Deep Factorization Machines (SRDFM) Network, for personalized anti-cancer drug recommendation, which directly ranks the drugs and provides the most effective drugs. A Siamese network (SN), a type of deep learning network that is composed of identical subnetworks that share the same architecture, parameters and weights, was used to measure the relative position (RP) between drugs for each cell line. Through minimizing the difference between the real RP and the predicted RP, an optimal SN model was established to provide the rank for all the candidate drugs. Specifically, the subnetwork in each side of the SN consists of a feature generation level and a predictor construction level. On the feature generation level, both drug property and gene expression, were adopted to build a concatenated feature vector, which even enables the recommendation for newly designed drugs with only chemical property known. Particularly, we developed a response unit here to generate weighted genetic feature vector to simulate the biological interaction mechanism between a specific drug and the genes. For the predictor construction level, we built this level integrating a factorization machine (FM) component with a deep neural network component. The FM can well handle the discrete chemical information and both low-order and high-order feature interactions could be sufficiently learned. Impressively, the SRDFM works well on both single-drug recommendation and synergic drug combination. Experiment result on both single-drug and synergetic drug data sets have shown the efficiency of the SRDFM. The Python implementation for the proposed SRDFM is available at at https://github.com/RanSuLab/SRDFM Contact: ran.su@tju.edu.cn, gbx@mju.edu.cn and weileyi@sdu.edu.cn.


Asunto(s)
Antineoplásicos , Neoplasias , Antineoplásicos/farmacología , Antineoplásicos/uso terapéutico , Humanos , Neoplasias/tratamiento farmacológico , Neoplasias/genética , Redes Neurales de la Computación
11.
Brief Bioinform ; 23(1)2022 01 17.
Artículo en Inglés | MEDLINE | ID: mdl-34882225

RESUMEN

Recently, machine learning methods have been developed to identify various peptide bio-activities. However, due to the lack of experimentally validated peptides, machine learning methods cannot provide a sufficiently trained model, easily resulting in poor generalizability. Furthermore, there is no generic computational framework to predict the bioactivities of different peptides. Thus, a natural question is whether we can use limited samples to build an effective predictive model for different kinds of peptides. To address this question, we propose Mutual Information Maximization Meta-Learning (MIMML), a novel meta-learning-based predictive model for bioactive peptide discovery. Using few samples from various functional peptides, MIMML can sufficiently learn the discriminative information amongst various functions and characterize functional differences. Experimental results show excellent performance of MIMML though using far fewer training samples as compared to the state-of-the-art methods. We also decipher the latent relationships among different kinds of functions to understand what meta-model learned to improve a specific task. In summary, this study is a pioneering work in the field of functional peptide mining and provides the first-of-its-kind solution for few-sample learning problems in biological sequence analysis, accelerating the new functional peptide discovery. The source codes and datasets are available on https://github.com/TearsWaiting/MIMML.


Asunto(s)
Aprendizaje Automático , Péptidos , Péptidos/química , Programas Informáticos
12.
Bioinformatics ; 39(3)2023 03 01.
Artículo en Inglés | MEDLINE | ID: mdl-36897030

RESUMEN

MOTIVATION: Plant Small Secreted Peptides (SSPs) play an important role in plant growth, development, and plant-microbe interactions. Therefore, the identification of SSPs is essential for revealing the functional mechanisms. Over the last few decades, machine learning-based methods have been developed, accelerating the discovery of SSPs to some extent. However, existing methods highly depend on handcrafted feature engineering, which easily ignores the latent feature representations and impacts the predictive performance. RESULTS: Here, we propose ExamPle, a novel deep learning model using Siamese network and multi-view representation for the explainable prediction of the plant SSPs. Benchmarking comparison results show that our ExamPle performs significantly better than existing methods in the prediction of plant SSPs. Also, our model shows excellent feature extraction ability. Importantly, by utilizing in silicomutagenesis experiment, ExamPle can discover sequential characteristics and identify the contribution of each amino acid for the predictions. The key novel principle learned by our model is that the head region of the peptide and some specific sequential patterns are strongly associated with the SSPs' functions. Thus, ExamPle is expected to be a useful tool for predicting plant SSPs and designing effective plant SSPs. AVAILABILITY AND IMPLEMENTATION: Our codes and datasets are available at https://github.com/Johnsunnn/ExamPle.


Asunto(s)
Aprendizaje Profundo , Péptidos , Aprendizaje Automático , Aminoácidos , Benchmarking
13.
Bioinformatics ; 39(12)2023 12 01.
Artículo en Inglés | MEDLINE | ID: mdl-38015872

RESUMEN

MOTIVATION: Identifying the functional sites of a protein, such as the binding sites of proteins, peptides, or other biological components, is crucial for understanding related biological processes and drug design. However, existing sequence-based methods have limited predictive accuracy, as they only consider sequence-adjacent contextual features and lack structural information. RESULTS: In this study, DeepProSite is presented as a new framework for identifying protein binding site that utilizes protein structure and sequence information. DeepProSite first generates protein structures from ESMFold and sequence representations from pretrained language models. It then uses Graph Transformer and formulates binding site predictions as graph node classifications. In predicting protein-protein/peptide binding sites, DeepProSite outperforms state-of-the-art sequence- and structure-based methods on most metrics. Moreover, DeepProSite maintains its performance when predicting unbound structures, in contrast to competing structure-based prediction methods. DeepProSite is also extended to the prediction of binding sites for nucleic acids and other ligands, verifying its generalization capability. Finally, an online server for predicting multiple types of residue is established as the implementation of the proposed DeepProSite. AVAILABILITY AND IMPLEMENTATION: The datasets and source codes can be accessed at https://github.com/WeiLab-Biology/DeepProSite. The proposed DeepProSite can be accessed at https://inner.wei-group.net/DeepProSite/.


Asunto(s)
Péptidos , Proteínas , Unión Proteica , Proteínas/química , Sitios de Unión , Programas Informáticos
14.
PLoS Comput Biol ; 19(11): e1011597, 2023 Nov.
Artículo en Inglés | MEDLINE | ID: mdl-37956212

RESUMEN

The powerful combination of large-scale drug-related interaction networks and deep learning provides new opportunities for accelerating the process of drug discovery. However, chemical structures that play an important role in drug properties and high-order relations that involve a greater number of nodes are not tackled in current biomedical networks. In this study, we present a general hypergraph learning framework, which introduces Drug-Substructures relationship into Molecular interaction Networks to construct the micro-to-macro drug centric heterogeneous network (DSMN), and develop a multi-branches HyperGraph learning model, called HGDrug, for Drug multi-task predictions. HGDrug achieves highly accurate and robust predictions on 4 benchmark tasks (drug-drug, drug-target, drug-disease, and drug-side-effect interactions), outperforming 8 state-of-the-art task specific models and 6 general-purpose conventional models. Experiments analysis verifies the effectiveness and rationality of the HGDrug model architecture as well as the multi-branches setup, and demonstrates that HGDrug is able to capture the relations between drugs associated with the same functional groups. In addition, our proposed drug-substructure interaction networks can help improve the performance of existing network models for drug-related prediction tasks.


Asunto(s)
Algoritmos , Efectos Colaterales y Reacciones Adversas Relacionados con Medicamentos , Humanos , Benchmarking , Sistemas de Liberación de Medicamentos , Descubrimiento de Drogas
15.
J Chem Inf Model ; 64(7): 2854-2862, 2024 Apr 08.
Artículo en Inglés | MEDLINE | ID: mdl-37565997

RESUMEN

Identifying synergistic drug combinations is fundamentally important to treat a variety of complex diseases while avoiding severe adverse drug-drug interactions. Although several computational methods have been proposed, they highly rely on handcrafted feature engineering and cannot learn better interactive information between drug pairs, easily resulting in relatively low performance. Recently, deep-learning methods, especially graph neural networks, have been widely developed in this area and demonstrated their ability to address complex biological problems. In this study, we proposed AttenSyn, an attention-based deep graph neural network for accurately predicting synergistic drug combinations. In particular, we adopted a graph neural network module to extract high-latent features based on the molecular graphs only and exploited the attention-based pooling module to learn interactive information between drug pairs to strengthen the representations of drug pairs. Comparative results on the benchmark datasets demonstrated that our AttenSyn performs better than the state-of-the-art methods in the prediction of anticancer synergistic drug combinations. Additionally, to provide good interpretability of our model, we explored and visualized some crucial substructures in drugs through attention mechanisms. Furthermore, we also verified the effectiveness of our proposed AttenSyn on two cell lines by visualizing the features of drug combinations learnt from our model, exhibiting satisfactory generalization ability.


Asunto(s)
Benchmarking , Aprendizaje , Línea Celular , Redes Neurales de la Computación
16.
J Chem Inf Model ; 64(3): 1050-1065, 2024 Feb 12.
Artículo en Inglés | MEDLINE | ID: mdl-38301174

RESUMEN

Protein-molecule interactions play a crucial role in various biological functions, with their accurate prediction being pivotal for drug discovery and design processes. Traditional methods for predicting protein-molecule interactions are limited. Some can only predict interactions with a specific molecule, restricting their applicability, while others target multiple molecule types but fail to efficiently process diverse interaction information, leading to complexity and inefficiency. This study presents a novel deep learning model, MucLiPred, equipped with a dual contrastive learning mechanism aimed at improving the prediction of multiple molecule-protein interactions and the identification of potential molecule-binding residues. The residue-level paradigm focuses on differentiating binding from non-binding residues, illuminating detailed local interactions. The type-level paradigm, meanwhile, analyzes overarching contexts of molecule types, like DNA or RNA, ensuring that representations of identical molecule types gravitate closer in the representational space, bolstering the model's proficiency in discerning interaction motifs. This dual approach enables comprehensive multi-molecule predictions, elucidating the relationships among different molecule types and strengthening precise protein-molecule interaction predictions. Empirical evidence demonstrates MucLiPred's superiority over existing models in robustness and prediction accuracy. The integration of dual contrastive learning techniques amplifies its capability to detect potential molecule-binding residues with precision. Further optimization, separating representational and classification tasks, has markedly improved its performance. MucLiPred thus represents a significant advancement in protein-molecule interaction prediction, setting a new precedent for future research in this field.


Asunto(s)
Ácidos Nucleicos , Proteínas , Proteínas/química
17.
J Chem Inf Model ; 64(1): 316-326, 2024 Jan 08.
Artículo en Inglés | MEDLINE | ID: mdl-38135439

RESUMEN

Antimicrobial peptides are peptides that are effective against bacteria and viruses, and the discovery of new antimicrobial peptides is of great importance to human life and health. Although the design of antimicrobial peptides using machine learning methods has achieved good results in recent years, it remains a challenge to learn and design novel antimicrobial peptides with multiple properties of interest from peptide data with certain property labels. To this end, we propose Multi-CGAN, a deep generative model-based architecture that can learn from single-attribute peptide data and generate antimicrobial peptide sequences with multiple attributes that we need, which may have a potentially wide range of uses in drug discovery. In particular, we verified that our Multi-CGAN generated peptides with the desired properties have good performance in terms of generation rate. Moreover, a comprehensive statistical analysis demonstrated that our generated peptides are diverse and have a low probability of being homologous to the training data. Interestingly, we found that the performance of many popular deep learning methods on the antimicrobial peptide prediction task can be improved by using Multi-CGAN to expand the data on the training set of the original task, indicating the high quality of our generated peptides and the robust ability of our method. In addition, we also investigated whether it is possible to directionally generate peptide sequences with specified properties by controlling the input noise sampling for our model.


Asunto(s)
Péptidos Antimicrobianos , Péptidos , Humanos , Péptidos/farmacología , Péptidos/química , Aprendizaje Automático , Descubrimiento de Drogas
18.
J Chem Inf Model ; 64(7): 2807-2816, 2024 Apr 08.
Artículo en Inglés | MEDLINE | ID: mdl-37252890

RESUMEN

Anticancer peptides (ACPs) recently have been receiving increasing attention in cancer therapy due to their low consumption, few adverse side effects, and easy accessibility. However, it remains a great challenge to identify anticancer peptides via experimental approaches, requiring expensive and time-consuming experimental studies. In addition, traditional machine-learning-based methods are proposed for ACP prediction mainly depending on hand-crafted feature engineering, which normally achieves low prediction performance. In this study, we propose CACPP (Contrastive ACP Predictor), a deep learning framework based on the convolutional neural network (CNN) and contrastive learning for accurately predicting anticancer peptides. In particular, we introduce the TextCNN model to extract the high-latent features based on the peptide sequences only and exploit the contrastive learning module to learn more distinguishable feature representations to make better predictions. Comparative results on the benchmark data sets indicate that CACPP outperforms all the state-of-the-art methods in the prediction of anticancer peptides. Moreover, to intuitively show that our model has good classification ability, we visualize the dimension reduction of the features from our model and explore the relationship between ACP sequences and anticancer functions. Furthermore, we also discuss the influence of data set construction on model prediction and explore our model performance on the data sets with verified negative samples.


Asunto(s)
Benchmarking , Efectos Colaterales y Reacciones Adversas Relacionados con Medicamentos , Humanos , Aprendizaje Automático , Redes Neurales de la Computación , Péptidos/farmacología
19.
J Chem Inf Model ; 64(7): 2174-2194, 2024 Apr 08.
Artículo en Inglés | MEDLINE | ID: mdl-37934070

RESUMEN

The discovery of new drugs has important implications for human health. Traditional methods for drug discovery rely on experiments to optimize the structure of lead molecules, which are time-consuming and high-cost. Recently, artificial intelligence has exhibited promising and efficient performance for drug-like molecule generation. In particular, deep generative models achieve great success in de novo generation of drug-like molecules with desired properties, showing massive potential for novel drug discovery. In this study, we review the recent progress of molecule generation using deep generative models, mainly focusing on molecule representations, public databases, data processing tools, and advanced artificial intelligence based molecule generation frameworks. In particular, we present a comprehensive comparison of state-of-the-art deep generative models for molecule generation and a summary of commonly used molecular design strategies. We identify research gaps and challenges of molecule generation such as the need for better databases, missing 3D information in molecular representation, and the lack of high-precision evaluation metrics. We suggest future directions for molecular generation and drug discovery.


Asunto(s)
Inteligencia Artificial , Benchmarking , Humanos , Bases de Datos Factuales , Descubrimiento de Drogas , Diseño de Fármacos
20.
Nucleic Acids Res ; 50(9): 4877-4899, 2022 05 20.
Artículo en Inglés | MEDLINE | ID: mdl-35524568

RESUMEN

With the advent of single-cell RNA sequencing (scRNA-seq), one major challenging is the so-called 'dropout' events that distort gene expression and remarkably influence downstream analysis in single-cell transcriptome. To address this issue, much effort has been done and several scRNA-seq imputation methods were developed with two categories: model-based and deep learning-based. However, comprehensively and systematically comparing existing methods are still lacking. In this work, we use six simulated and two real scRNA-seq datasets to comprehensively evaluate and compare a total of 12 available imputation methods from the following four aspects: (i) gene expression recovering, (ii) cell clustering, (iii) gene differential expression, and (iv) cellular trajectory reconstruction. We demonstrate that deep learning-based approaches generally exhibit better overall performance than model-based approaches under major benchmarking comparison, indicating the power of deep learning for imputation. Importantly, we built scIMC (single-cell Imputation Methods Comparison platform), the first online platform that integrates all available state-of-the-art imputation methods for benchmarking comparison and visualization analysis, which is expected to be a convenient and useful tool for researchers of interest. It is now freely accessible via https://server.wei-group.net/scIMC/.


Asunto(s)
Perfilación de la Expresión Génica , Análisis de Secuencia de ARN , Análisis de la Célula Individual , Benchmarking , Análisis por Conglomerados , Análisis de Secuencia de ARN/métodos , Análisis de la Célula Individual/métodos , Programas Informáticos
SELECCIÓN DE REFERENCIAS
DETALLE DE LA BÚSQUEDA