Search | VHL Regional Portal (Portal Regional da BVS)

1.

Machine Learning Enables Comprehensive Prediction of the Relative Protein Abundance of Multiple Proteins on the Protein Corona.

Fu, Xiuhao; Yang, Chao; Su, Yunyun; Liu, Chunling; Qiu, Haoye; Yu, Yanyan; Su, Gaoxing; Zhang, Qingchen; Wei, Leyi; Cui, Feifei; Zou, Quan; Zhang, Zilong.

Research (Wash D C) ; 7: 0487, 2024.

Article in English | MEDLINE | ID: mdl-39324017

ABSTRACT

Understanding the composition of the protein corona is essential for evaluating the potential biomedical applications of nanoparticles. Relative protein abundance (RPA), the share of the total corona protein contributed by a given protein, is an important parameter for describing the protein corona. For the first time, we comprehensively predicted the RPA of multiple proteins on the protein corona. First, we used multiple machine learning algorithms to predict whether a protein adsorbs to a nanoparticle, a dichotomous (binary) prediction. Then, we selected the three best-performing algorithms from the dichotomous prediction to predict the specific value of RPA, a regression prediction. Meanwhile, we analyzed the advantages and disadvantages of different machine learning algorithms for RPA prediction through interpretability analysis. Finally, we mined important features for RPA prediction, which provide practical suggestions for the preliminary design of protein coronas. The RPA prediction service is available at http://www.bioai-lab.com/PC_ML.
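
As a rough illustration of the quantity being predicted, the sketch below computes RPA as each protein's percentage of the summed corona abundance and then derives a binary adsorbed/not-adsorbed label of the kind used in the dichotomous step. The protein names, abundance values, and the 0.1% cut-off are hypothetical assumptions; the paper's own normalization and labelling rules may differ.

```python
def relative_protein_abundance(abundances: dict[str, float]) -> dict[str, float]:
    """Return each protein's share of the total corona abundance, in percent."""
    total = sum(abundances.values())
    if total <= 0:
        raise ValueError("total abundance must be positive")
    return {protein: 100.0 * value / total for protein, value in abundances.items()}


def adsorption_labels(rpa: dict[str, float], threshold: float = 0.1) -> dict[str, int]:
    """Hypothetical binary labels: 1 if RPA exceeds the assumed threshold, else 0."""
    return {protein: int(value > threshold) for protein, value in rpa.items()}


if __name__ == "__main__":
    # Hypothetical abundances (e.g., normalized spectral counts) for one nanoparticle.
    corona = {"ALB": 60.0, "APOA1": 30.0, "FGB": 9.95, "C3": 0.05}
    rpa = relative_protein_abundance(corona)
    print(rpa)
    print(adsorption_labels(rpa))
```

A regression model would then be fitted to the continuous RPA values themselves, which is the second step the abstract describes.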

2.

mACPpred 2.0: Stacked Deep Learning for Anticancer Peptide Prediction with Integrated Spatial and Probabilistic Feature Representations.

Sangaraju, Vinoth Kumar; Pham, Nhat Truong; Wei, Leyi; Yu, Xue; Manavalan, Balachandran.

J Mol Biol ; 436(17): 168687, 2024 Sep 01.

Article in English | MEDLINE | ID: mdl-39237191

ABSTRACT

Anticancer peptides (ACPs) are naturally occurring molecules with remarkable potential to target and kill cancer cells. However, identifying ACPs based solely on their primary amino acid sequences remains a major hurdle in immunoinformatics. In the past, several web-based machine learning (ML) tools have been proposed to help researchers identify potential ACPs for further testing. Notably, our meta-approach method, mACPpred, introduced in 2019, has significantly advanced ACP research. Given the exponential growth in the number of characterized ACPs, there is now a pressing need for an updated version of mACPpred. To develop mACPpred 2.0, we constructed an up-to-date benchmarking dataset by integrating all publicly available ACP datasets. We employed a large set of feature descriptors, encompassing both conventional descriptors and advanced pre-trained natural language processing (NLP)-based embeddings, and evaluated their ability to discriminate between ACPs and non-ACPs using eleven different classifiers. Subsequently, we employed a stacked deep learning (SDL) approach incorporating one-dimensional convolutional neural network (1D CNN) blocks and hybrid features. These features comprised the top seven NLP-based features and 90 probabilistic features, allowing us to identify hidden patterns within these diverse features and improve the accuracy of our ACP prediction model. This is the first study to integrate spatial and probabilistic feature representations for predicting ACPs. Rigorous cross-validation and independent tests demonstrated that mACPpred 2.0 not only surpassed its predecessor (mACPpred) but also outperformed existing state-of-the-art predictors, highlighting the importance of the advanced feature representations attained through SDL. To facilitate widespread use and accessibility, we have developed a user-friendly web server for mACPpred 2.0, available at https://balalab-skku.org/mACPpred2/.

Subjects

Antineoplastic Agents , Deep Learning , Peptides , Peptides/chemistry , Humans , Antineoplastic Agents/pharmacology , Computational Biology/methods , Software , Amino Acid Sequence , Neural Networks, Computer

3.

Advanced deep learning approaches enable high-throughput biological and biomedicine data analysis.

Wei, Leyi.

Methods ; 230: 116-118, 2024 Oct.

Article in English | MEDLINE | ID: mdl-39154807

Subjects

Deep Learning , Humans , Data Analysis , Computational Biology/methods , High-Throughput Screening Assays/methods , Biomedical Research/methods , Biomedical Research/trends

4.

Ginkgolide C attenuated Western diet-induced non-alcoholic fatty liver disease via increasing AMPK activation.

Xie, Yao; Wei, Leyi; Guo, Jiashi; Jiang, Qingsong; Xiang, Yang; Lin, Yan; Xie, Huang; Yin, Xinru; Gong, Xia; Wan, Jingyuan.

Inflammation ; 2024 Jul 01.

Article in English | MEDLINE | ID: mdl-38954260

ABSTRACT

BACKGROUND: Non-alcoholic steatohepatitis (NASH) is a metabolic dysregulation-related disorder generally characterized by lipid metabolism dysfunction and an excessive inflammatory response. Currently, there are no approved pharmacological interventions specifically designed to manage NASH. Ginkgolide C has been reported to exhibit anti-inflammatory effects and to modulate lipid metabolism. However, the effects of Ginkgolide C on diet-induced NASH remain unclear. METHODS: In this study, NASH was induced in mice by a Western diet (WD), and the mice were treated with different doses of Ginkgolide C with or without Compound C, an adenosine 5′-monophosphate (AMP)-activated protein kinase (AMPK) inhibitor. The effects of Ginkgolide C were evaluated by assessing liver damage, steatosis, fibrosis, and AMPK expression. RESULTS: Ginkgolide C significantly alleviated liver damage, steatosis, and fibrosis in WD-fed mice. In addition, Ginkgolide C markedly improved insulin resistance and attenuated hepatic inflammation. Importantly, Ginkgolide C exerted protective effects by activating the AMPK signaling pathway, and these effects were reversed by AMPK inhibition. CONCLUSION: Ginkgolide C alleviated WD-induced NASH in mice, potentially by activating the AMPK signaling pathway.

5.

CELA-MFP: a contrast-enhanced and label-adaptive framework for multi-functional therapeutic peptides prediction.

Fang, Yitian; Luo, Mingshuang; Ren, Zhixiang; Wei, Leyi; Wei, Dong-Qing.

Brief Bioinform ; 25(4)2024 May 23.

Article in English | MEDLINE | ID: mdl-39038935

ABSTRACT

Functional peptides play crucial roles in various biological processes and hold significant potential in fields such as drug discovery and biotechnology. Accurately predicting the functions of peptides is essential for understanding their diverse effects and designing peptide-based therapeutics. Here, we propose CELA-MFP, a deep learning framework that incorporates feature Contrastive Enhancement and Label Adaptation for predicting Multi-Functional therapeutic Peptides. CELA-MFP uses a protein language model (pLM) to extract features from peptide sequences, which are then fed into a Transformer decoder for function prediction, effectively modeling correlations between different functions. To enhance the representation of each peptide sequence, contrastive learning is employed during training. Experimental results demonstrate that CELA-MFP outperforms state-of-the-art methods on most evaluation metrics for two widely used datasets, MFBP and MFTP. The interpretability of CELA-MFP is demonstrated by visualizing attention patterns in the pLM and the Transformer decoder. Finally, a user-friendly online server implementing CELA-MFP for predicting multi-functional peptides has been established and can be freely accessed at http://dreamai.cmii.online/CELA-MFP.

Subjects

Deep Learning , Peptides , Peptides/chemistry , Computational Biology/methods , Software , Humans , Algorithms , Databases, Protein

6.

Moss-m7G: A Motif-Based Interpretable Deep Learning Method for RNA N7-Methylguanosine Site Prediction.

Zhao, Yanxi; Jin, Junru; Gao, Wenjia; Qiao, Jianbo; Wei, Leyi.

J Chem Inf Model ; 64(15): 6230-6240, 2024 Aug 12.

Article in English | MEDLINE | ID: mdl-39011571

ABSTRACT

N7-methylguanosine (m7G) modification plays a crucial role in various biological processes and is closely associated with the development and progression of many cancers. Accurate identification of m7G modification sites is essential for understanding their regulatory mechanisms and advancing cancer therapy. Previous studies often suffered from insufficient data, underutilization of motif information, and a lack of interpretability. In this work, we designed a novel motif-based interpretable method for m7G modification site prediction, called Moss-m7G, which analyzes RNA sequences from a motif-centric perspective. The word-detection module and motif-embedding module within Moss-m7G extract motif information from sequences, transforming the raw sequences from the base level to the motif level and generating embeddings for these motif sequences. Compared with base sequences, motif sequences contain richer contextual information, which is further analyzed and integrated by a Transformer model. To address the data insufficiency noted in prior research, we constructed a comprehensive m7G dataset for training and testing. Our experimental results affirm the effectiveness and superiority of Moss-m7G in predicting m7G modification sites. Moreover, the word-detection module enhances the interpretability of the model, providing insights into its predictive mechanisms.

Subjects

Deep Learning , Guanosine , Nucleotide Motifs , RNA , Guanosine/analogs & derivatives , Guanosine/chemistry , RNA/chemistry
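
To make the base-level to motif-level transformation concrete, here is a minimal sketch that segments an RNA sequence by greedy longest match against a motif vocabulary and then maps motifs to integer ids for an embedding layer. The vocabulary, the `max_len` parameter, and the greedy strategy are all illustrative assumptions; Moss-m7G's word-detection module is not reproduced here.

```python
def segment_into_motifs(seq: str, vocabulary: set[str], max_len: int = 4) -> list[str]:
    """Greedily split `seq` into the longest vocabulary motifs; unmatched bases stay single."""
    motifs = []
    i = 0
    while i < len(seq):
        # Try the longest candidate first, shrinking until a vocabulary hit (or one base).
        for length in range(min(max_len, len(seq) - i), 0, -1):
            candidate = seq[i:i + length]
            if length == 1 or candidate in vocabulary:
                motifs.append(candidate)
                i += length
                break
    return motifs


def build_motif_index(motif_seqs: list[list[str]]) -> dict[str, int]:
    """Assign each distinct motif an integer id in order of first appearance."""
    index: dict[str, int] = {}
    for motifs in motif_seqs:
        for motif in motifs:
            index.setdefault(motif, len(index))
    return index


if __name__ == "__main__":
    vocab = {"GGAC", "UG", "GA"}  # hypothetical motif vocabulary
    motifs = segment_into_motifs("GGACUGGAC", vocab)
    print(motifs, build_motif_index([motifs]))
```

The resulting token ids could feed a standard embedding layer and Transformer; a learned detector would replace the fixed vocabulary in practice.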

7.

HydrogelFinder: A Foundation Model for Efficient Self-Assembling Peptide Discovery Guided by Non-Peptidal Small Molecules.

Ren, Xuanbai; Wei, Jiaying; Luo, Xiaoli; Liu, Yuansheng; Li, Kenli; Zhang, Qiang; Gao, Xin; Yan, Sizhe; Wu, Xia; Jiang, Xingyue; Liu, Mingquan; Cao, Dongsheng; Wei, Leyi; Zeng, Xiangxiang; Shi, Junfeng.

Adv Sci (Weinh) ; 11(26): e2400829, 2024 Jul.

Article in English | MEDLINE | ID: mdl-38704695

ABSTRACT

Self-assembling peptides have numerous applications in medicine, food chemistry, and nanotechnology. However, their discovery has traditionally been serendipitous rather than driven by rational design. Here, HydrogelFinder, a foundation model for the rational design of self-assembling peptides from scratch, is developed. The model explores self-assembly properties through molecular structure, leveraging 1,377 self-assembling non-peptidal small molecules to navigate chemical space and improve structural diversity. Using HydrogelFinder, 111 peptide candidates are generated and 17 peptides are synthesized, and the self-assembly and biophysical characteristics of nine peptides ranging from 1 to 10 amino acids are subsequently validated experimentally, all within a 19-day workflow. Notably, the two de novo-designed self-assembling peptides demonstrated low cytotoxicity and good biocompatibility, as confirmed by live/dead assays. This work highlights the capacity of HydrogelFinder to diversify the design of self-assembling peptides through non-peptidal small molecules, offering a powerful toolkit and paradigm for future peptide discovery.

Subjects

Peptides , Peptides/chemistry

8.

Editorial: Artificial intelligence in drug discovery and development.

Wei, Leyi; Zou, Quan; Zeng, Xiangxiang.

Methods ; 226: 133-137, 2024 Jun.

Article in English | MEDLINE | ID: mdl-38582311

Subjects

Artificial Intelligence , Drug Discovery , Drug Discovery/methods , Humans , Drug Development/methods , Drug Development/trends

9.

MucLiPred: Multi-Level Contrastive Learning for Predicting Nucleic Acid Binding Residues of Proteins.

Zhang, Jiashuo; Wang, Ruheng; Wei, Leyi.

J Chem Inf Model ; 64(3): 1050-1065, 2024 02 12.

Article in English | MEDLINE | ID: mdl-38301174

ABSTRACT

Protein-molecule interactions play a crucial role in various biological functions, and their accurate prediction is pivotal for drug discovery and design. Traditional methods for predicting protein-molecule interactions are limited: some can only predict interactions with a specific molecule, restricting their applicability, while others target multiple molecule types but fail to process diverse interaction information efficiently, leading to complexity and inefficiency. This study presents a novel deep learning model, MucLiPred, equipped with a dual contrastive learning mechanism, operating at the residue level and the type level, aimed at improving the prediction of multiple molecule-protein interactions and the identification of potential molecule-binding residues. The residue-level paradigm focuses on differentiating binding from non-binding residues, illuminating detailed local interactions. The type-level paradigm, meanwhile, analyzes the overarching context of molecule types, such as DNA or RNA, ensuring that representations of the same molecule type lie closer together in the representation space and bolstering the model's ability to discern interaction motifs. This dual approach enables comprehensive multi-molecule predictions, elucidating the relationships among different molecule types and strengthening precise protein-molecule interaction predictions. Empirical evidence demonstrates MucLiPred's superiority over existing models in robustness and prediction accuracy. The integration of dual contrastive learning techniques amplifies its capability to detect potential molecule-binding residues with precision. Further optimization, separating the representation and classification tasks, has markedly improved its performance. MucLiPred thus represents a significant advancement in protein-molecule interaction prediction, setting a new precedent for future research in this field.

Subjects

Nucleic Acids , Proteins , Proteins/chemistry
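
The type-level idea, pulling together representations that share a molecule type, is commonly implemented with a supervised contrastive loss. The sketch below is one such loss in the style of Khosla et al.'s SupCon, written in plain Python with cosine similarity; it is an assumption that MucLiPred uses this exact formulation, and the temperature of 0.1 is illustrative.

```python
import math


def _cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def supervised_contrastive_loss(embeddings: list[list[float]], labels: list[str],
                                temperature: float = 0.1) -> float:
    """Mean SupCon loss over anchors that have at least one positive (same label)."""
    n = len(embeddings)
    total, anchors = 0.0, 0
    for i in range(n):
        # Temperature-scaled similarities from anchor i to every other sample.
        sims = {j: _cosine(embeddings[i], embeddings[j]) / temperature
                for j in range(n) if j != i}
        log_denom = math.log(sum(math.exp(s) for s in sims.values()))
        positives = [j for j in sims if labels[j] == labels[i]]
        if not positives:
            continue  # anchors without positives contribute nothing
        # -mean over positives of log(exp(s_ip) / sum_a exp(s_ia))
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


if __name__ == "__main__":
    emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    print(supervised_contrastive_loss(emb, ["DNA", "DNA", "RNA", "RNA"]))
```

The loss is small when embeddings with the same label (here, molecule type) already cluster together and large when they do not.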

10.

StructuralDPPIV: a novel deep learning model based on atom structure for predicting dipeptidyl peptidase-IV inhibitory peptides.

Wang, Ding; Jin, Junru; Li, Zhongshen; Wang, Yu; Fan, Mushuang; Liang, Sirui; Su, Ran; Wei, Leyi.

Bioinformatics ; 40(2)2024 02 01.

Article in English | MEDLINE | ID: mdl-38305458

ABSTRACT

MOTIVATION: Diabetes is a chronic metabolic disorder and a major cause of blindness, kidney failure, heart attacks, stroke, and lower-limb amputation worldwide. To alleviate the impact of diabetes, researchers have developed the next generation of anti-diabetic drugs, known as dipeptidyl peptidase IV inhibitory peptides (DPP-IV-IPs). However, the discovery of these promising drugs has been restricted by the lack of effective peptide-mining tools. RESULTS: Here, we present StructuralDPPIV, a deep learning model designed for DPP-IV-IP identification that takes advantage of both the molecular graph features of amino acids and sequence information. Experimental results on the independent test dataset and two wet-lab experimental datasets show that our model outperforms other state-of-the-art methods. Moreover, to better understand what StructuralDPPIV learns, we analyzed the model using class activation mapping (CAM) and perturbation experiments, which yielded interpretable insights into the reasoning behind its predictions. AVAILABILITY AND IMPLEMENTATION: The project code is available at https://github.com/WeiLab-BioChem/Structural-DPP-IV.

Subjects

Deep Learning , Diabetes Mellitus , Humans , Dipeptidyl Peptidase 4 , Amino Acids , Peptides
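
A perturbation experiment of the general kind mentioned above can be illustrated with simple occlusion: substitute one residue at a time and record how much the model's score drops. In this sketch `toy_score` is a hypothetical stand-in for a trained StructuralDPPIV model (it merely counts prolines), and the glycine substitute is an assumption; the paper's actual perturbation protocol may differ.

```python
from typing import Callable


def toy_score(peptide: str) -> float:
    """Hypothetical scorer: fraction of prolines (illustration only, not a real model)."""
    return peptide.count("P") / len(peptide)


def perturbation_importance(peptide: str, score: Callable[[str], float],
                            substitute: str = "G") -> list[float]:
    """Score drop when each position is replaced by `substitute` (larger = more important)."""
    base = score(peptide)
    drops = []
    for i in range(len(peptide)):
        mutated = peptide[:i] + substitute + peptide[i + 1:]
        drops.append(base - score(mutated))
    return drops


if __name__ == "__main__":
    print(perturbation_importance("IPIP", toy_score))
```

With a real model, positions whose substitution causes the largest drop would be read as the residues driving the prediction.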

11.

NanoCon: contrastive learning-based deep hybrid network for nanopore methylation detection.

Yin, Chenglin; Wang, Ruheng; Qiao, Jianbo; Shi, Hua; Duan, Hongliang; Jiang, Xinbo; Teng, Saisai; Wei, Leyi.

Bioinformatics ; 40(2)2024 02 01.

Article in English | MEDLINE | ID: mdl-38305428

ABSTRACT

MOTIVATION: 5-Methylcytosine (5mC), a fundamental element of DNA methylation in eukaryotes, plays a vital role in gene expression regulation, embryonic development, and other biological processes. Although several computational methods have been proposed for detecting DNA base modifications such as 5mC sites from Nanopore sequencing data, they face challenges including sensitivity to noise and neglect of the imbalanced distribution of methylation sites in real-world scenarios. RESULTS: Here, we develop NanoCon, a deep hybrid network coupled with a contrastive learning strategy to detect 5mC methylation sites from Nanopore reads. In particular, we adopted a contrastive learning module to alleviate the issues caused by the imbalanced data distribution in nanopore sequencing, offering more accurate and robust detection of 5mC sites. Evaluation results demonstrate that NanoCon outperforms existing methods, highlighting its potential as a valuable tool for genomic sequencing and methylation prediction. In addition, we verified the effectiveness of NanoCon's learned representations on two datasets by visualizing dimensionality-reduced features of methylated and non-methylated sites. Furthermore, cross-species and cross-motif 5mC methylation experiments indicated the robustness and transfer-learning ability of our model. We hope this work can contribute to the community by providing a powerful and reliable solution for 5mC site detection in genomic studies. AVAILABILITY AND IMPLEMENTATION: The project code is available at https://github.com/Challis-yin/NanoCon.

Subjects

Nanopores , DNA Methylation , Genomics , Genome , DNA

12.

CACPP: A Contrastive Learning-Based Siamese Network to Identify Anticancer Peptides Based on Sequence Only.

Yang, Xuetong; Jin, Junru; Wang, Ruheng; Li, Zhongshen; Wang, Yu; Wei, Leyi.

J Chem Inf Model ; 64(7): 2807-2816, 2024 Apr 08.

Article in English | MEDLINE | ID: mdl-37252890

ABSTRACT

Anticancer peptides (ACPs) have recently received increasing attention in cancer therapy due to their low consumption, few adverse side effects, and easy accessibility. However, identifying anticancer peptides experimentally remains a great challenge, as it requires expensive and time-consuming studies. In addition, traditional machine-learning methods for ACP prediction depend mainly on hand-crafted feature engineering and typically achieve low prediction performance. In this study, we propose CACPP (Contrastive ACP Predictor), a deep learning framework based on a convolutional neural network (CNN) and contrastive learning for accurately predicting anticancer peptides. In particular, we use the TextCNN model to extract high-level latent features from the peptide sequences alone and exploit a contrastive learning module to learn more distinguishable feature representations for better predictions. Comparative results on the benchmark datasets indicate that CACPP outperforms all state-of-the-art methods in predicting anticancer peptides. Moreover, to show the model's classification ability intuitively, we visualize the dimensionality reduction of the features from our model and explore the relationship between ACP sequences and anticancer functions. Furthermore, we discuss the influence of dataset construction on model prediction and explore our model's performance on datasets with verified negative samples.

Subjects

Benchmarking , Drug-Related Side Effects and Adverse Reactions , Humans , Machine Learning , Neural Networks, Computer , Peptides/pharmacology
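
A Siamese setup of the kind named in the CACPP title typically trains on pairs with the margin-based contrastive loss of Hadsell, Chopra, and LeCun (2006): same-class pairs are pulled together, and different-class pairs are pushed at least a margin apart. The sketch below assumes Euclidean distance and a margin of 1.0; CACPP's exact loss and pairing scheme are not specified in the abstract.

```python
import math


def contrastive_pair_loss(u: list[float], v: list[float], same_class: bool,
                          margin: float = 1.0) -> float:
    """Margin-based contrastive loss for one pair of embeddings."""
    d = math.dist(u, v)  # Euclidean distance between the two branch outputs
    if same_class:
        return 0.5 * d ** 2  # pull same-class pairs together
    return 0.5 * max(0.0, margin - d) ** 2  # push different-class pairs beyond the margin


def batch_loss(pairs: list[tuple[list[float], list[float], bool]],
               margin: float = 1.0) -> float:
    """Mean loss over (embedding_a, embedding_b, same_class) pairs."""
    return sum(contrastive_pair_loss(a, b, s, margin) for a, b, s in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical 2-D embeddings of an ACP/ACP pair and an ACP/non-ACP pair.
    pairs = [([0.0, 0.0], [0.1, 0.0], True), ([0.0, 0.0], [0.5, 0.0], False)]
    print(batch_loss(pairs))
```

Different-class pairs already farther apart than the margin contribute zero loss, so training effort concentrates on hard negatives.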

13.

AttenSyn: An Attention-Based Deep Graph Neural Network for Anticancer Synergistic Drug Combination Prediction.

Wang, Tianshuo; Wang, Ruheng; Wei, Leyi.

J Chem Inf Model ; 64(7): 2854-2862, 2024 Apr 08.

Article in English | MEDLINE | ID: mdl-37565997

ABSTRACT

Identifying synergistic drug combinations is fundamentally important for treating a variety of complex diseases while avoiding severe adverse drug-drug interactions. Although several computational methods have been proposed, they rely heavily on handcrafted feature engineering and cannot effectively learn the interaction information between drug pairs, which often results in relatively low performance. Recently, deep learning methods, especially graph neural networks, have been widely developed in this area and have demonstrated their ability to address complex biological problems. In this study, we propose AttenSyn, an attention-based deep graph neural network for accurately predicting synergistic drug combinations. In particular, we adopt a graph neural network module to extract high-level latent features from the molecular graphs alone and exploit an attention-based pooling module to learn interaction information between drug pairs, strengthening the representations of drug pairs. Comparative results on the benchmark datasets demonstrate that AttenSyn performs better than state-of-the-art methods in predicting anticancer synergistic drug combinations. Additionally, to provide interpretability, we explored and visualized crucial substructures in drugs through attention mechanisms. Furthermore, we verified the effectiveness of AttenSyn on two cell lines by visualizing the features of drug combinations learned by our model, which exhibited satisfactory generalization ability.

Subjects

Benchmarking , Learning , Cell Line , Neural Networks, Computer
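
Attention-based pooling replaces a plain mean over atom embeddings with a learned weighted sum. The sketch below scores each atom embedding by a dot product with a query vector, normalizes the scores with softmax, and returns the weighted sum. As one simple way to let a drug's pooled representation depend on its partner, the query here is the partner drug's mean embedding; that choice, and the toy embeddings, are assumptions, and AttenSyn's actual module is not reproduced here.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_pool(nodes: list[list[float]]) -> list[float]:
    return [sum(col) / len(nodes) for col in zip(*nodes)]


def attention_pool(nodes: list[list[float]],
                   query: list[float]) -> tuple[list[float], list[float]]:
    """Weight each node embedding by softmax(node . query); return (pooled, weights)."""
    weights = softmax([sum(a * b for a, b in zip(h, query)) for h in nodes])
    pooled = [sum(w * h[k] for w, h in zip(weights, nodes)) for k in range(len(query))]
    return pooled, weights


def pair_representation(drug_a: list[list[float]],
                        drug_b: list[list[float]]) -> list[float]:
    """Pool each drug's atoms using the partner drug's mean embedding as the query."""
    pooled_a, _ = attention_pool(drug_a, mean_pool(drug_b))
    pooled_b, _ = attention_pool(drug_b, mean_pool(drug_a))
    return pooled_a + pooled_b  # concatenation, e.g. as input to a synergy classifier


if __name__ == "__main__":
    drug_a = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical atom embeddings
    drug_b = [[0.5, 0.5], [1.0, 0.0]]
    print(pair_representation(drug_a, drug_b))
```

The attention weights returned by `attention_pool` are also what one would inspect to highlight influential substructures.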

14.

Deep Generative Models in De Novo Drug Molecule Generation.

Pang, Chao; Qiao, Jianbo; Zeng, Xiangxiang; Zou, Quan; Wei, Leyi.

J Chem Inf Model ; 64(7): 2174-2194, 2024 Apr 08.

Article in English | MEDLINE | ID: mdl-37934070

ABSTRACT

The discovery of new drugs has important implications for human health. Traditional drug discovery relies on experiments to optimize the structure of lead molecules, which is time-consuming and costly. Recently, artificial intelligence has exhibited promising and efficient performance in generating drug-like molecules. In particular, deep generative models have achieved great success in the de novo generation of drug-like molecules with desired properties, showing massive potential for novel drug discovery. In this study, we review recent progress in molecule generation using deep generative models, focusing mainly on molecule representations, public databases, data processing tools, and advanced artificial intelligence-based molecule generation frameworks. In particular, we present a comprehensive comparison of state-of-the-art deep generative models for molecule generation and a summary of commonly used molecular design strategies. We identify research gaps and challenges in molecule generation, such as the need for better databases, missing 3D information in molecular representations, and the lack of high-precision evaluation metrics. Finally, we suggest future directions for molecule generation and drug discovery.

Subjects

Artificial Intelligence , Benchmarking , Humans , Databases, Factual , Drug Discovery , Drug Design

15.

Multi-CGAN: Deep Generative Model-Based Multiproperty Antimicrobial Peptide Design.

Yu, Haoqing; Wang, Ruheng; Qiao, Jianbo; Wei, Leyi.

J Chem Inf Model ; 64(1): 316-326, 2024 01 08.

Article in English | MEDLINE | ID: mdl-38135439

ABSTRACT

Antimicrobial peptides are peptides that are effective against bacteria and viruses, and the discovery of new antimicrobial peptides is of great importance to human life and health. Although the design of antimicrobial peptides using machine learning methods has achieved good results in recent years, learning to design novel antimicrobial peptides with multiple properties of interest from peptide data carrying only certain property labels remains a challenge. To this end, we propose Multi-CGAN, a deep generative model-based architecture that can learn from single-attribute peptide data and generate antimicrobial peptide sequences with multiple desired attributes, which may have a wide range of uses in drug discovery. In particular, we verified that Multi-CGAN achieves a good generation rate for peptides with the desired properties. Moreover, a comprehensive statistical analysis demonstrated that our generated peptides are diverse and have a low probability of being homologous to the training data. Interestingly, we found that the performance of many popular deep learning methods on the antimicrobial peptide prediction task can be improved by using Multi-CGAN to augment the training set of the original task, indicating the high quality of our generated peptides and the robustness of our method. In addition, we investigated whether peptide sequences with specified properties can be generated directionally by controlling the input noise sampling of our model.

Subjects

Antimicrobial Peptides , Peptides , Humans , Peptides/pharmacology , Peptides/chemistry , Machine Learning , Drug Discovery

16.

A general hypergraph learning algorithm for drug multi-task predictions in micro-to-macro biomedical networks.

Jin, Shuting; Hong, Yue; Zeng, Li; Jiang, Yinghui; Lin, Yuan; Wei, Leyi; Yu, Zhuohang; Zeng, Xiangxiang; Liu, Xiangrong.

PLoS Comput Biol ; 19(11): e1011597, 2023 Nov.

Article in English | MEDLINE | ID: mdl-37956212

ABSTRACT

The powerful combination of large-scale drug-related interaction networks and deep learning provides new opportunities for accelerating drug discovery. However, current biomedical networks do not capture chemical structures, which play an important role in drug properties, or high-order relations that involve more than two nodes. In this study, we present a general hypergraph learning framework, which introduces Drug-Substructure relationships into Molecular interaction Networks to construct a micro-to-macro drug-centric heterogeneous network (DSMN), and we develop a multi-branch HyperGraph learning model, called HGDrug, for Drug multi-task predictions. HGDrug achieves highly accurate and robust predictions on 4 benchmark tasks (drug-drug, drug-target, drug-disease, and drug-side-effect interactions), outperforming 8 state-of-the-art task-specific models and 6 general-purpose conventional models. Experimental analysis verifies the effectiveness and rationality of the HGDrug architecture and its multi-branch setup, and demonstrates that HGDrug captures the relations between drugs associated with the same functional groups. In addition, the proposed drug-substructure interaction networks can help improve the performance of existing network models on drug-related prediction tasks.

Subjects

Algorithms , Drug-Related Side Effects and Adverse Reactions , Humans , Benchmarking , Drug Delivery Systems , Drug Discovery
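
In a drug-substructure hypergraph, each substructure can act as a hyperedge joining every drug that contains it, so a single edge can link more than two nodes. The sketch below builds that incidence structure and runs one round of mean node-to-hyperedge-to-node message passing. The drug and substructure names are hypothetical, and this simple averaging scheme stands in for, and is not, HGDrug's multi-branch model.

```python
def build_incidence(drug_substructures: dict[str, set[str]]) -> dict[str, list[str]]:
    """Map each substructure (hyperedge) to the sorted list of drugs (nodes) it joins."""
    edges: dict[str, list[str]] = {}
    for drug, subs in drug_substructures.items():
        for sub in subs:
            edges.setdefault(sub, []).append(drug)
    return {sub: sorted(drugs) for sub, drugs in edges.items()}


def hypergraph_mean_pass(features: dict[str, list[float]],
                         edges: dict[str, list[str]]) -> dict[str, list[float]]:
    """One round of mean aggregation: nodes -> hyperedges -> nodes."""
    dim = len(next(iter(features.values())))
    # Step 1: each hyperedge takes the mean of its member drugs' features.
    edge_msg = {e: [sum(features[d][k] for d in ds) / len(ds) for k in range(dim)]
                for e, ds in edges.items()}
    # Step 2: each drug averages the messages of the hyperedges it belongs to.
    updated = {}
    for drug, feat in features.items():
        incident = [edge_msg[e] for e, ds in edges.items() if drug in ds]
        if not incident:
            updated[drug] = list(feat)  # drugs in no hyperedge keep their features
            continue
        updated[drug] = [sum(m[k] for m in incident) / len(incident) for k in range(dim)]
    return updated


if __name__ == "__main__":
    subs = {"drugA": {"benzene", "carboxyl"}, "drugB": {"benzene"}, "drugC": {"amine"}}
    feats = {"drugA": [1.0, 0.0], "drugB": [0.0, 1.0], "drugC": [2.0, 2.0]}
    print(hypergraph_mean_pass(feats, build_incidence(subs)))
```

Drugs that share a hyperedge (here, the benzene substructure) move toward each other in feature space, which loosely mirrors the abstract's point that HGDrug relates drugs sharing functional groups.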

17.

DeepProSite: structure-aware protein binding site prediction using ESMFold and pretrained language model.

Fang, Yitian; Jiang, Yi; Wei, Leyi; Ma, Qin; Ren, Zhixiang; Yuan, Qianmu; Wei, Dong-Qing.

Bioinformatics ; 39(12)2023 12 01.

Article in English | MEDLINE | ID: mdl-38015872

ABSTRACT

MOTIVATION: Identifying the functional sites of a protein, such as binding sites for proteins, peptides, or other biological components, is crucial for understanding related biological processes and for drug design. However, existing sequence-based methods have limited predictive accuracy, as they consider only sequence-adjacent contextual features and lack structural information. RESULTS: In this study, we present DeepProSite, a new framework for identifying protein binding sites that uses both protein structure and sequence information. DeepProSite first generates protein structures with ESMFold and sequence representations with pretrained language models. It then uses a Graph Transformer and formulates binding site prediction as graph node classification. In predicting protein-protein/peptide binding sites, DeepProSite outperforms state-of-the-art sequence- and structure-based methods on most metrics. Moreover, DeepProSite maintains its performance when predicting on unbound structures, in contrast to competing structure-based methods. DeepProSite is also extended to predicting binding sites for nucleic acids and other ligands, verifying its generalization capability. Finally, an online server implementing DeepProSite for predicting multiple types of binding residues has been established. AVAILABILITY AND IMPLEMENTATION: The datasets and source code can be accessed at https://github.com/WeiLab-Biology/DeepProSite. The DeepProSite server can be accessed at https://inner.wei-group.net/DeepProSite/.

Subjects

Peptides , Proteins , Protein Binding , Proteins/chemistry , Binding Sites , Software

18.

ConPep: Prediction of peptide contact maps with pre-trained biological language model and multi-view feature extracting strategy.

Wei, Qingxin; Wang, Ruheng; Jiang, Yi; Wei, Leyi; Sun, Yu; Geng, Jie; Su, Ran.

Comput Biol Med ; 167: 107631, 2023 12.

Article in English | MEDLINE | ID: mdl-37948966

ABSTRACT

The accurate prediction of peptide contact maps remains challenging because of the difficulty of obtaining interaction information between residues in short sequences. To address this challenge, we propose ConPep, a deep learning framework for predicting peptide contact maps from sequences alone. To fully incorporate the sequential semantic information between residues in peptide sequences, we use a pre-trained biological language model that transfers prior knowledge from large-scale databases. Additionally, to extract and integrate sequential local information and residue-based global correlations, our model incorporates a Bidirectional Gated Recurrent Unit and attention mechanisms, which obtain multi-view features and thus enhance the accuracy and robustness of the predictions. Comparative results on independent tests demonstrate that our method significantly outperforms state-of-the-art methods, even for short peptides. Notably, our method exhibits superior performance at the sequence level, suggesting its robustness compared with multiple sequence alignment (MSA)-based methods. We expect this work to be meaningful and to facilitate the wide use of our method.

Subjects

Algorithms , Proteins , Proteins/chemistry , Computational Biology/methods , Peptides , Language , Databases, Protein
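
For reference, a residue contact map is usually defined by a distance cut-off between residue pairs; a common convention (used in CASP) marks residues i and j as in contact when their Cβ atoms (Cα for glycine) lie within 8 Å. The sketch below computes such a binary map from coordinates. The coordinates are made up, the `min_separation` filter is an optional illustrative parameter, and whether ConPep uses exactly this definition is an assumption.

```python
import math


def contact_map(coords: list[tuple[float, float, float]], cutoff: float = 8.0,
                min_separation: int = 1) -> list[list[int]]:
    """Binary L x L contact map: 1 if two residues lie within `cutoff` angstroms
    and are at least `min_separation` positions apart in sequence."""
    n = len(coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) < cutoff:
                cmap[i][j] = cmap[j][i] = 1  # contact maps are symmetric
    return cmap


if __name__ == "__main__":
    # Hypothetical C-beta coordinates for a three-residue peptide.
    coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (20.0, 0.0, 0.0)]
    for row in contact_map(coords):
        print(row)
```

A sequence-only predictor such as the one described above would output a probability for each (i, j) entry, trained against maps like this one derived from known structures.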

19.

MolCAP: Molecular Chemical reActivity Pretraining and prompted-finetuning enhanced molecular representation learning.

Wang, Yu; Zhang, Jingjie; Jin, Junru; Wei, Leyi.

Comput Biol Med ; 167: 107666, 2023 12.

Article in English | MEDLINE | ID: mdl-37956623

ABSTRACT

Molecular representation learning (MRL) is a fundamental task in drug discovery. However, previous deep learning (DL) methods focus excessively on learning robust intra-molecular representations through mask-dominated pretraining frameworks, neglecting the abundant chemical-reactivity relationships between molecules, which have been shown to be a determining factor in various molecular property prediction tasks. Here, we present MolCAP, a graph-pretraining Transformer based on chemical reactivity (IMR) knowledge with prompted finetuning, to promote MRL. Results show that MolCAP outperforms comparative methods based on traditional molecular pretraining frameworks on 13 publicly available molecular datasets across a diversity of biomedical tasks. Prompted by MolCAP, even basic graph neural networks achieve surprising performance that surpasses previous models, indicating the promise of applying reactivity information to MRL. In addition, manually designed molecular templates can potentially uncover dataset bias. Overall, we expect MolCAP to yield more chemically meaningful insights across the entire drug discovery process.

Subjects

Drug Discovery , Learning , Neural Networks, Computer

20.

Retrosynthesis prediction with an interpretable deep-learning framework based on molecular assembly tasks.

Wang, Yu; Pang, Chao; Wang, Yuzhe; Jin, Junru; Zhang, Jingjie; Zeng, Xiangxiang; Su, Ran; Zou, Quan; Wei, Leyi.

Nat Commun ; 14(1): 6155, 2023 Oct 03.

Article in English | MEDLINE | ID: mdl-37788995

ABSTRACT

Automating retrosynthesis with artificial intelligence expedites organic chemistry research in digital laboratories. However, most existing deep learning approaches are hard to explain, acting as a "black box" that offers few insights. Here, we propose RetroExplainer, which formulates the retrosynthesis task as a molecular assembly process consisting of several retrosynthetic actions guided by deep learning. To ensure robust performance, we propose three units: a multi-sense and multi-scale Graph Transformer, structure-aware contrastive learning, and dynamic adaptive multi-task learning. Results on 12 large-scale benchmark datasets demonstrate the effectiveness of RetroExplainer, which outperforms state-of-the-art single-step retrosynthesis approaches. In addition, the molecular assembly process gives our model good interpretability, allowing transparent decision-making and quantitative attribution. When extended to multi-step retrosynthesis planning, RetroExplainer identified 101 pathways, in which 86.9% of the single-step reactions correspond to those already reported in the literature. As a result, RetroExplainer is expected to offer valuable insights for reliable, high-throughput, and high-quality organic synthesis in drug development.
