ABSTRACT
Antimicrobial peptides (AMPs) are short peptides with diverse functions that can effectively target and combat a range of organisms. The widespread misuse of chemical antibiotics has led to increasing microbial resistance, and because of their low propensity for drug resistance and their low toxicity, AMPs are considered promising substitutes for traditional antibiotics. Existing deep learning technology has improved AMP generation, but it also presents challenges. First, AMP generation overlooks the complex interdependencies among amino acids. Second, current models fail to integrate crucial tasks such as screening, attribute prediction and iterative optimization. We therefore developed an integrated deep learning framework, Diff-AMP, that automates AMP generation, identification, attribute prediction and iterative optimization. We integrate kinetic diffusion and attention mechanisms into a reinforcement learning framework for efficient AMP generation. Our prediction module incorporates pre-training and transfer learning strategies for precise AMP identification and screening. We use a convolutional neural network for multi-attribute prediction and a reinforcement learning-based iterative optimization strategy to produce diverse AMPs. By automating molecule generation, screening, attribute prediction and optimization, this framework advances AMP research. Diff-AMP is also deployed on a web server; code, data and server details are available in the Data Availability section.
Subjects
Amino Acids; Antimicrobial Peptides; Anti-Bacterial Agents; Diffusion; Kinetics
ABSTRACT
Predicting the drug response of cancer cell lines is crucial for advancing personalized cancer treatment, yet it remains challenging because of tumor heterogeneity and individual diversity. In this study, we present a deep learning-based framework named Deep neural network Integrating Prior Knowledge (DIPK), which uses self-supervised techniques to integrate multiple sources of information, including gene interaction relationships, gene expression profiles and molecular topologies, to improve prediction accuracy and robustness. We demonstrate that DIPK outperforms existing methods on both known and novel cells and drugs, underscoring the importance of gene interaction relationships in drug response prediction. DIPK also extends to single-cell RNA sequencing data, showing its capability for single-cell-level response prediction and cell identification. We further assess the applicability of DIPK to clinical data: DIPK accurately predicted a higher response to paclitaxel in the pathological complete response (pCR) group than in the residual disease group, consistent with the better response of the pCR group to this chemotherapy compound. We believe that integrating DIPK into clinical decision-making has the potential to improve individualized treatment strategies for cancer patients.
Subjects
Deep Learning; Neoplasms; Humans; Neural Networks, Computer; Neoplasms/drug therapy; Neoplasms/genetics; Cell Line
ABSTRACT
Graph neural networks based on deep learning have been extensively applied to molecular property prediction because of their powerful feature learning ability and good performance. However, most of them are black boxes that cannot give a reasonable explanation of the underlying prediction mechanism, which seriously reduces trust in neural network-based prediction models. Here we propose a novel graph neural network named the iteratively focused graph network (IFGN), which uses a multistep focus mechanism to gradually identify the key atoms or groups in a molecule that are closely related to the predicted properties. Combining the multistep focus mechanism with visualization also yields multistep interpretations, giving a deeper understanding of the model's predictive behavior. On all eight datasets studied, the IFGN model achieved good prediction performance, indicating that the multistep focus mechanism noticeably improves model performance in addition to increasing interpretability. For researchers' convenience, a website (http://graphadmet.cn/works/IFGN) was also developed and can be used free of charge.
Subjects
Learning; Neural Networks, Computer; Humans; Research Personnel
ABSTRACT
Studies have shown that the mechanism of action of many drugs is related to miRNAs. In-depth research on the relationship between miRNAs and drugs can provide theoretical foundations and practical approaches for areas such as drug target discovery, drug repositioning and biomarker research. Traditional biological experiments to test miRNA-drug sensitivity are costly and time-consuming, so sequence- or topology-based deep learning methods are valued in this field for their efficiency and accuracy. However, these methods have limitations in dealing with sparse topologies and higher-order information in miRNA and drug features. In this work, we propose GCFMCL, a multi-view contrastive learning model based on graph collaborative filtering. To the best of our knowledge, this is the first attempt to incorporate a contrastive learning strategy into the graph collaborative filtering framework to predict sensitivity relationships between miRNAs and drugs. The proposed multi-view contrastive learning method is divided into a topological contrastive objective and a feature contrastive objective: (1) for the homogeneous neighbors of the topological graph, we propose a novel topological contrastive learning method that constructs the contrastive target from the topological neighborhood information of nodes; (2) the model obtains feature contrastive targets from high-order feature information according to the correlation of node features, and mines potential neighborhood relationships in the feature space. The proposed multi-view contrastive learning effectively alleviates the impact of heterogeneous node noise and graph data sparsity in graph collaborative filtering, and significantly improves model performance. Our study uses a dataset derived from the NoncoRNA and ncDR databases, comprising 2049 experimentally validated miRNA-drug sensitivity associations.
Five-fold cross-validation shows that the Area Under the Curve (AUC), Area Under the Precision-Recall Curve (AUPR) and F1-score (F1) of GCFMCL reach 95.28%, 95.66% and 89.77%, respectively, outperforming the state-of-the-art (SOTA) method by margins of 2.73%, 3.42% and 4.96%. Our code and data are available at https://github.com/kkkayle/GCFMCL.
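The abstract does not specify GCFMCL's contrastive objectives. As a hedged illustration of the general idea, the sketch below implements a standard InfoNCE loss in NumPy. It assumes that the topological view and the feature view each give one embedding per node, that the two embeddings of the same node form a positive pair, and that all other nodes in the batch act as negatives. The function name `info_nce`, the temperature `tau` and the random embeddings are hypothetical.

```python
import numpy as np

def info_nce(z1: np.ndarray, z2: np.ndarray, tau: float = 0.5) -> float:
    """InfoNCE loss: row i of z1 and row i of z2 are two views of the same node."""
    # L2-normalise each embedding so dot products are cosine similarities
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / tau                        # (n, n) similarity matrix
    # numerically stable row-wise log-softmax
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # positives are on the diagonal; every other node in the row acts as a negative
    return float(-np.mean(np.diag(log_prob)))


rng = np.random.default_rng(0)
topo_view = rng.normal(size=(8, 16))                      # hypothetical topology-view embeddings
feat_view = topo_view + 0.05 * rng.normal(size=(8, 16))   # hypothetical feature-view embeddings
aligned_loss = info_nce(topo_view, feat_view)
random_loss = info_nce(topo_view, rng.normal(size=(8, 16)))
print(f"aligned views: {aligned_loss:.3f}, unrelated views: {random_loss:.3f}")
```

Well-aligned views give a much lower loss than unrelated ones, and minimizing this loss pulls a node's two views together.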
Subjects
Drug Delivery Systems; MicroRNAs; Area Under Curve; Databases, Factual; Drug Discovery; MicroRNAs/genetics
ABSTRACT
Rapid and accurate prediction of drug-target affinity can accelerate and improve the drug discovery process. Recent studies show that deep learning models may be able to provide fast and accurate drug-target affinity predictions. However, existing deep learning models each have drawbacks that make the task difficult to complete satisfactorily: complex-based models rely heavily on the time-consuming docking process, and complex-free models lack interpretability. In this study, we introduce a novel drug-target affinity prediction model that draws on knowledge distillation and uses feature-fusion inputs to make fast, accurate and explainable predictions. We benchmarked the model on public affinity prediction and virtual screening datasets. The results show that it outperformed previous state-of-the-art models and achieved performance comparable to previous complex-based models. Finally, we studied the model's interpretability through visualization and found that it provides meaningful explanations of pairwise interactions. We believe this model can further improve drug-target affinity prediction thanks to its higher accuracy and reliable interpretability.
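The abstract does not give the distillation objective. As a hedged illustration only, one common way to distill a regression model is to train a student against both the experimental affinities $y_i$ and a teacher's predictions $\hat{y}^{T}_{i}$. Here $\hat{y}^{S}_{i}$ is the student's prediction, $N$ the number of training pairs and $\alpha$ a hypothetical weighting hyperparameter. The abstract does not say which models act as teacher and student. One plausible reading, and it is only an assumption, is a teacher with access to complex information and a complex-free student.

```latex
\mathcal{L}_{\text{student}}
  = \alpha \, \frac{1}{N}\sum_{i=1}^{N}\bigl(\hat{y}^{S}_{i} - y_{i}\bigr)^{2}
  + (1-\alpha) \, \frac{1}{N}\sum_{i=1}^{N}\bigl(\hat{y}^{S}_{i} - \hat{y}^{T}_{i}\bigr)^{2},
  \qquad 0 \le \alpha \le 1
```

With $\alpha = 1$ the loss reduces to ordinary mean squared error on the labels. Smaller values of $\alpha$ pull the student toward the teacher's outputs.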
Subjects
Benchmarking; Drug Discovery; Drug Delivery Systems
ABSTRACT
Messenger RNA (mRNA) is vital for post-transcriptional gene regulation, acting as the direct template for protein synthesis. However, the available methods for predicting mRNA subcellular localization still need improvement, and few existing algorithms can annotate an mRNA sequence with multiple localizations. In this work, we propose mRNA-CLA, an innovative multi-label subcellular localization prediction framework for mRNA that uses deep learning with a multi-head self-attention mechanism. The framework employs a multi-scale convolutional layer to extract sequence features across different regions and a self-attention mechanism designed explicitly for each sequence. Combined with Position Weight Matrices (PWMs) derived from the convolutional layers, the model supports interpretable analysis. In particular, we perform a base-level analysis of mRNA sequences from diverse subcellular localizations to determine the nucleotide specificity at each site. Our evaluations demonstrate that mRNA-CLA substantially outperforms existing methods and tools.
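The abstract describes multi-label prediction with self-attention but not the exact architecture. The NumPy sketch below shows only the two ideas that make the task multi-label. The first is self-attention over sequence positions. The second is an independent sigmoid per compartment, so one mRNA can be assigned several localizations at once. It makes several simplifying assumptions: it uses a single attention head instead of multiple heads and omits the multi-scale convolutional layer. All weights, sizes and the 0.5 threshold are hypothetical and untrained.

```python
import numpy as np

def one_hot(seq: str) -> np.ndarray:
    """Encode an RNA/DNA string as a (length, 4) matrix over A, C, G, T (U is mapped to T)."""
    index = {base: i for i, base in enumerate("ACGT")}
    x = np.zeros((len(seq), 4))
    for pos, base in enumerate(seq.upper().replace("U", "T")):
        if base in index:                        # ambiguous bases such as N stay all-zero
            x[pos, index[base]] = 1.0
    return x

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention over sequence positions."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # stabilise the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v

def predict_localizations(seq, params, threshold=0.5):
    """Multi-label output: one independent sigmoid per compartment, so several can be 'on'."""
    h = self_attention(one_hot(seq), params["wq"], params["wk"], params["wv"])
    pooled = h.mean(axis=0)                      # average over positions
    logits = pooled @ params["w_out"] + params["b_out"]
    probs = 1.0 / (1.0 + np.exp(-logits))
    return probs, probs >= threshold

rng = np.random.default_rng(0)
d_model, n_labels = 8, 5                         # hypothetical sizes
params = {
    "wq": rng.normal(size=(4, d_model)),
    "wk": rng.normal(size=(4, d_model)),
    "wv": rng.normal(size=(4, d_model)),
    "w_out": rng.normal(size=(d_model, n_labels)),
    "b_out": np.zeros(n_labels),
}
probs, labels = predict_localizations("AUGGCCAUUGUAAUGGGCCGCUGA", params)
```

A softmax over compartments would force the probabilities to compete and sum to 1. Per-label sigmoids are the usual choice when labels are not mutually exclusive.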
Subjects
Deep Learning; RNA, Messenger; RNA, Messenger/genetics; RNA, Messenger/metabolism; Computational Biology/methods; Neural Networks, Computer; Humans; Algorithms
ABSTRACT
The lack of effective and safe analgesics for chronic pain management has been a major public health problem for many years. Analgesic peptides have recently shown significant therapeutic potential because they lack opioid-related adverse effects. Programmed cell death protein 1 (PD-1) is widely expressed in neurons, and activation of PD-1 by PD-L1 modulates neuronal excitability and evokes significant analgesic effects, making PD-1 a promising target for pain treatment. However, no small-molecule analgesic peptides targeting PD-1 have yet been reported. Here, we identified the peptide H-20 by high-throughput screening. In vitro, H-20 bound PD-1 with micromolar affinity, evoked phosphorylation of Src homology 2 domain-containing tyrosine phosphatase 1 (SHP-1), and diminished nociceptive signals in dorsal root ganglion (DRG) neurons. Preemptive treatment with H-20 effectively attenuated perceived pain in naïve wild-type (WT) mice. Spinal administration of H-20 produced effective and longer-lasting analgesia in multiple preclinical pain models, with reduced or absent tolerance, abuse liability, constipation, itch and motor coordination impairment. In summary, our findings suggest that H-20 is a promising candidate drug for ameliorating chronic pain in the clinic.
Subjects
Analgesics; Chronic Pain; Peptides; Programmed Cell Death 1 Receptor; Analgesics/pharmacology; Analgesics, Opioid; Animals; Chronic Pain/drug therapy; Ganglia, Spinal/metabolism; Mice; Peptides/pharmacology; Programmed Cell Death 1 Receptor/metabolism
ABSTRACT
Malignant tumors have increasing morbidity and high mortality, and their occurrence and development are complex processes. Advances in sequencing technologies have given us a better understanding of the underlying genetic and molecular mechanisms in tumors. In recent years, spatial transcriptomics sequencing technologies have developed rapidly and allow gene expression to be quantified and visualized in the spatial context of tissues. Compared with traditional transcriptomics technologies, spatial transcriptomics technologies not only detect gene expression levels in cells but also reveal the spatial location of gene expression within tissues, the cell composition of biological tissues and the interactions between cells. Here we summarize the development of spatial transcriptomics technologies and tools and their applications in cancer research. We also discuss the limitations and challenges of current spatial transcriptomics approaches, as well as future developments and prospects.
Subjects
Gene Expression Profiling; Neoplasms; Transcriptome; Humans; Neoplasms/genetics; Neoplasms/pathology; Animals; Gene Expression Regulation, Neoplastic; Computational Biology/methods; Biomarkers, Tumor/genetics
ABSTRACT
Aberrant canonical NF-κB signaling has been implicated in diseases such as autoimmune disorders and cancer. Directly disrupting the interaction between NEMO and IKKα/β has been developed as a novel way to inhibit NF-κB overactivation. Peptides are a potential solution for disrupting protein-protein interactions (PPIs); however, they typically suffer from poor in vivo stability and limited tissue permeability, which hampers their widespread use as chemical biology tools and potential therapeutics. In this work, decafluorobiphenyl-cysteine SNAr chemistry, molecular modeling and biological validation enabled the development of peptide PPI inhibitors. The resulting cyclic peptide specifically inhibited canonical NF-κB signaling in vitro and in vivo, and showed good metabolic stability, anti-inflammatory effects and low cytotoxicity. Importantly, our results also revealed that cyclic peptides have considerable potential for treating acute lung injury (ALI), and confirmed that the decafluorobiphenyl-based cyclization strategy enhances the biological activity of peptide NEMO-IKKα/β inhibitors. Moreover, this work provides a promising method for developing peptide PPI inhibitors.
Subjects
Acute Lung Injury; I-kappa B Kinase; Lipopolysaccharides; Peptides, Cyclic; I-kappa B Kinase/metabolism; I-kappa B Kinase/antagonists & inhibitors; Acute Lung Injury/drug therapy; Acute Lung Injury/chemically induced; Acute Lung Injury/metabolism; Animals; Mice; Peptides, Cyclic/chemistry; Peptides, Cyclic/pharmacology; Humans; NF-kappa B/metabolism; Protein Binding; Cyclization
ABSTRACT
Plant small secretory peptides (SSPs) play an important role in regulating biological processes in plants, and accurately predicting SSPs enables efficient exploration of their functions. Traditional experimental verification methods are reliable and accurate, but they require expensive equipment and a lot of time. Machine learning methods speed up SSP prediction, but unstable feature extraction limits them further. This paper therefore proposes a new feature-correction-based model for SSP recognition in plants, abbreviated as SE-SSP. The model has three main advantages. First, transformer encoders better reveal implicit features. Second, a feature correction module suited to sequences, named 2-D SENET, adaptively adjusts the features to obtain a more robust feature representation. Third, multiple stacked linear modules further extract deep information from the samples. In addition, training with a contrastive learning strategy alleviates the problem of sparse samples. We conducted experiments on publicly available datasets, and the results show that our model performs excellently. The proposed model can serve as a convenient and effective SSP prediction tool. Our data and code are publicly available at https://github.com/wrab12/SE-SSP/.
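The abstract does not describe the internal design of 2-D SENET. For orientation, the sketch below shows a standard squeeze-and-excitation (SE) block applied to a sequence feature map of shape (positions, channels). The block averages each channel over positions, passes the result through a small bottleneck MLP and rescales the channels with sigmoid gates. This is an assumed, simplified stand-in for the paper's module. The reduction ratio `r` and all weights are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_recalibrate(features: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """Squeeze-and-excitation over the channel axis of a (length, channels) feature map.

    w1: (channels, channels // r) and w2: (channels // r, channels) form the bottleneck MLP.
    """
    squeezed = features.mean(axis=0)            # squeeze: one summary value per channel
    hidden = np.maximum(squeezed @ w1, 0.0)     # ReLU bottleneck
    gates = sigmoid(hidden @ w2)                # excitation: channel weights in (0, 1)
    return features * gates                     # rescale every position channel-wise

rng = np.random.default_rng(1)
length, channels, r = 50, 16, 4                 # hypothetical sizes; r is the reduction ratio
feats = rng.normal(size=(length, channels))
w1 = rng.normal(size=(channels, channels // r))
w2 = rng.normal(size=(channels // r, channels))
recalibrated = se_recalibrate(feats, w1, w2)
```

Every gate lies between 0 and 1, so the block can only damp or keep channels. This is what allows it to adaptively down-weight less informative features.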
Subjects
Electric Power Supplies; Machine Learning; Biological Transport; Peptides; Research Design
ABSTRACT
Deep learning methods can accurately characterize noncoding RNA-protein interactions (NPIs), which are of great significance in gene regulation, human disease and other fields. However, computational methods for predicting NPIs in large-scale, dynamic ncRNA-protein bipartite graphs, an online modeling and prediction problem, are rarely discussed. In addition, results that researchers publish on websites cannot meet real-time needs because of the large amount of underlying data and long update cycles. We therefore propose ML-GNN, a real-time method based on a dynamic ncRNA-protein bipartite graph learning framework that can model and predict NPIs in real time. The method has two advantages: first, a meta-learning strategy alleviates the large prediction errors for samples with sparse neighborhoods; second, dynamically modeling newly added data reduces the computational load and enables real-time NPI prediction. In the experiments, we built a dynamic bipartite graph from 300,000 NPIs in the NPInter v4.0 database. The experimental results indicate that our model performed excellently across multiple experiments. The code for the model is available at https://github.com/taowang11/ML-NPI, and the data can be downloaded freely at http://bigdata.ibp.ac.cn/npinter4.
Subjects
RNA, Untranslated; Research Personnel; Humans; Databases, Factual; RNA, Untranslated/genetics
ABSTRACT
Deep learning (DL)-based de novo molecular design has recently attracted significant attention. Although numerous DL-based generative models have been developed successfully for designing novel compounds, most of the generated molecules lack sufficiently novel scaffolds or strong drug-like profiles. These issues may not be fully captured by metrics commonly used to assess molecular generative models, such as novelty, diversity and the quantitative estimate of drug-likeness (QED) score. To address these limitations, we propose a genetic algorithm-guided generative model called GARel (genetic algorithm-based receptor-ligand interaction generator), a novel framework for training a DL-based generative model to produce drug-like molecules with novel scaffolds. To train GARel efficiently, we used a dense net to update the parameters based on molecules with novel scaffolds and drug-like features. To demonstrate GARel's capability, we used it to design inhibitors for three targets: AA2AR, EGFR and SARS-CoV-2. The results indicate that GARel-generated molecules have more diverse and novel scaffolds, more desirable physicochemical properties and more favorable docking scores. Compared with other generative models, GARel makes significant progress in balancing novelty and drug-likeness, providing a promising direction for the further development of DL-based de novo design methodology with potential impact on drug discovery.
Subjects
Drug Design; RNA, Viral; Ligands; Algorithms; Drug Discovery
ABSTRACT
Predicting drug-target interactions (DTIs) is a crucial task in drug discovery, but traditional wet-lab experiments are costly and time-consuming. Recently, deep learning has emerged as a promising tool for accelerating DTI prediction owing to its strong performance. However, models trained on limited known DTI data struggle to generalize to novel drug-target pairs. In this work, we propose E-DIS, a strategy that trains an ensemble of models capturing both domain-generic and domain-specific features, in order to learn diverse domain features and adapt them to out-of-distribution data. Multiple experts were trained on different domains to capture and align domain-specific information from various distributions without accessing any data from unseen domains. By capturing diverse features, E-DIS provides a comprehensive representation of proteins and ligands. Experimental results on four benchmark datasets in both in-domain and cross-domain settings demonstrated that E-DIS significantly improved model performance and domain generalization compared with existing methods. By combining domain-generic and domain-specific features, our approach represents a significant advance in DTI prediction and enhances the generalization ability of DTI prediction models.
Subjects
Deep Learning; Drug Discovery; Proteins; Drug Discovery/methods; Proteins/chemistry; Proteins/metabolism; Ligands; Pharmaceutical Preparations/chemistry; Pharmaceutical Preparations/metabolism; Protein Domains
ABSTRACT
Leucine-rich repeat kinase 2 (LRRK2) has been identified as a promising drug target for the treatment of Parkinson's disease (PD). This study focuses on optimizing the activity of LRRK2 inhibitors using alchemical relative binding free energy (RBFE) calculations. We first assessed various free energy calculation methods across different LRRK2 kinase inhibitor scaffolds. The results indicate that alchemical free energy calculations are promising for prospective predictions on LRRK2 inhibitors, especially for the aminopyrimidine scaffold, with an RMSE of 1.15 kcal mol⁻¹ and a Pearson correlation coefficient (Rp) of 0.83. We then optimized a potent LRRK2 kinase inhibitor with a novel scaffold identified in previous virtual screenings. Guided by alchemical RBFE predictions, this optimization led to the discovery of compound LY2023-001. This compound, with a [1,2,4]triazolo[5,6-b]indole scaffold, exhibited enhanced inhibitory activity against G2019S LRRK2 (IC50 = 12.9 nM). Molecular dynamics (MD) simulations revealed that LY2023-001 formed stable hydrogen bonds with Glu1948 and Ala1950 in the G2019S LRRK2 protein. In addition, its phenyl substituents engaged in strong electrostatic interactions with Lys1906 and van der Waals interactions with Leu1885, Phe1890, Val1893, Ile1933, Met1947, Leu1949, Leu2001, Ala2016 and Asp2017. Our findings underscore the potential of computational methods for the successful optimization of small molecules and offer important insights for the development of novel LRRK2 inhibitors.
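The RMSE and Rp values above compare calculated and experimental binding free energies. The following minimal pure-Python sketch computes both metrics. The five energy values are invented for illustration and are not data from this study.

```python
import math

def rmse(predicted, experimental):
    """Root-mean-square error, in the same unit as the inputs (e.g. kcal/mol)."""
    n = len(predicted)
    return math.sqrt(sum((p - e) ** 2 for p, e in zip(predicted, experimental)) / n)

def pearson_r(x, y):
    """Pearson correlation coefficient (often reported as Rp)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Hypothetical binding free energies (kcal/mol), not data from the study
predicted = [-9.1, -10.2, -8.4, -11.0, -9.7]
experimental = [-9.5, -9.8, -8.9, -10.6, -10.1]
print(f"RMSE = {rmse(predicted, experimental):.2f} kcal/mol, "
      f"Rp = {pearson_r(predicted, experimental):.2f}")
```

RMSE measures absolute error in energy units, while Rp only measures how well the ranking and trend agree. This is why free energy benchmarks usually report both.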
Subjects
Leucine-Rich Repeat Serine-Threonine Protein Kinase-2; Molecular Dynamics Simulation; Protein Kinase Inhibitors; Thermodynamics; Leucine-Rich Repeat Serine-Threonine Protein Kinase-2/antagonists & inhibitors; Leucine-Rich Repeat Serine-Threonine Protein Kinase-2/metabolism; Leucine-Rich Repeat Serine-Threonine Protein Kinase-2/chemistry; Protein Kinase Inhibitors/chemistry; Protein Kinase Inhibitors/pharmacology; Humans; Hydrogen Bonding; Protein Binding; Molecular Structure; Molecular Docking Simulation
ABSTRACT
PARP1 is a multifaceted component of DNA repair and chromatin remodeling, making it an effective target for cancer therapy. A recently reported proteolysis-targeting chimera (PROTAC) could effectively degrade PARP1 through the ubiquitin-proteasome pathway, expanding the therapeutic applications of PARP1 blockade. In this study, building on our previous work, we designed and synthesized a series of nitrogen heterocyclic PROTACs guided by ternary complex simulation analysis. These efforts yielded a potent PARP1 degrader, D6 (DC50 = 25.23 nM). Its high selectivity is attributed to multiple specific interactions between its nitrogen heterocyclic linker and the PARP1-CRBN PPI surface. D6 also exhibited strong cytotoxicity against the triple-negative breast cancer cell line MDA-MB-231 (IC50 = 1.04 µM). Proteomic results showed that D6 exerts its antitumor effect by intensifying DNA damage and intercepting the CDC25C-CDK1 axis, which halts cell cycle transition in triple-negative breast cancer cells. In vivo, D6 showed promising pharmacokinetic (PK) properties with moderate oral absorption. It also effectively inhibited tumor growth (TGI rate = 71.4% at 40 mg/kg) without other signs of toxicity in MDA-MB-231 tumor-bearing mice. In summary, we have identified a potent PARP1 PROTAC with an original scaffold that provides a novel intervention strategy for the treatment of triple-negative breast cancer.
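The abstract reports a TGI rate but not its formula. One widely used definition compares mean tumor volumes at the end of treatment. It is shown here as an assumption, not as the authors' stated method:

```latex
\mathrm{TGI}\,(\%) = \left(1 - \frac{\bar{V}_{\text{treated}}}{\bar{V}_{\text{control}}}\right) \times 100
```

Under this definition, TGI = 71.4% means that the mean tumor volume in the D6 group was about 28.6% of the control mean (1 - 0.714 = 0.286). Some studies instead use the change in volume from the start of dosing, (1 - ΔV_treated/ΔV_control) × 100, which can give a different value for the same experiment.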
Subjects
Triple Negative Breast Neoplasms; Humans; Mice; Animals; Triple Negative Breast Neoplasms/pathology; Proteomics; Cell Proliferation; Cell Cycle Checkpoints; Nitrogen; Cell Line, Tumor; cdc25 Phosphatases; Poly (ADP-Ribose) Polymerase-1; CDC2 Protein Kinase
ABSTRACT
BACKGROUND: DNA methylation is instrumental in numerous life processes, which makes its accurate prediction highly important. Recent studies suggest that deep learning provides more precise DNA methylation prediction because it can extract informative features. However, problems with the stability and generalization performance of these models persist. RESULTS: In this study, we introduce StableDNAm, an efficient and stable DNA methylation prediction model that combines a feature fusion approach, adaptive feature correction and a contrastive learning strategy. The model has several advantages. First, DNA sequences are encoded at four levels to comprehensively capture intricate information across multi-scale and low-span features. Second, a sequence-specific feature correction module adaptively adjusts the weights of sequence features, improving the model's stability and scalability (generality). Third, the contrastive learning strategy mitigates the instability caused by sparse data. To validate the model, we conducted multiple sets of experiments on commonly used datasets, demonstrating its robustness and stability. We also merged various datasets into a single unified dataset, and the results on this combined dataset confirm the model's robust adaptability. CONCLUSIONS: Our findings show that StableDNAm is a general, stable and effective tool for DNA methylation prediction. It holds substantial promise for assisting future methylation-related research and analyses.
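The abstract states that sequences are encoded at four levels but does not list them. As a hypothetical illustration of multi-level sequence encoding, the sketch below combines a position-level one-hot view with sequence-level 1-, 2- and 3-mer frequency views. These four choices are assumptions and are not necessarily the levels StableDNAm uses.

```python
from itertools import product

BASES = "ACGT"

def one_hot_positions(seq: str) -> list[list[int]]:
    """Position-level view: one 4-dimensional indicator vector per nucleotide."""
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]

def kmer_frequencies(seq: str, k: int) -> list[float]:
    """Sequence-level view: normalised counts of all 4**k k-mers in a fixed order."""
    kmers = ["".join(p) for p in product(BASES, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:                  # windows containing N etc. are skipped
            counts[window] += 1
    total = max(sum(counts.values()), 1)      # avoid division by zero on short inputs
    return [counts[m] / total for m in kmers]

def encode(seq: str) -> dict:
    """Hypothetical four-level encoding: one-hot plus 1-, 2- and 3-mer frequencies."""
    return {
        "one_hot": one_hot_positions(seq),
        "k1": kmer_frequencies(seq, 1),
        "k2": kmer_frequencies(seq, 2),
        "k3": kmer_frequencies(seq, 3),
    }

views = encode("ACGTACGTCG")
```

The one-hot view keeps exact positions, and the k-mer views summarize composition at increasing context lengths. A model can then fuse these complementary inputs.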
Subjects
DNA Methylation; Protein Processing, Post-Translational
ABSTRACT
Deep learning is an important branch of artificial intelligence that has been successfully applied to medicine and two-dimensional ligand design. Generating three-dimensional (3D) ligands in the 3D pocket of a protein target is an interesting and challenging problem for deep learning-based drug design. Here, we introduce the MolAICal software, which generates 3D drugs in the 3D pockets of protein targets by combining the merits of deep learning models and classical algorithms. MolAICal contains two main modules for 3D drug design. The first module uses a genetic algorithm, a deep learning model trained on fragments of FDA-approved drugs, and Vinardo score fitting based on the PDBbind database. The second module uses a deep learning generative model trained on drug-like molecules from the ZINC database, together with molecular docking performed by invoking AutoDock Vina automatically. In addition, MolAICal applies Lipinski's rule of five, pan-assay interference compounds (PAINS), synthetic accessibility (SA) and other user-defined rules to filter out unwanted ligands. To demonstrate MolAICal's drug design modules, the membrane protein glucagon receptor and the non-membrane protein SARS-CoV-2 main protease were chosen as drug targets. The results show that MolAICal can generate diverse and novel ligands with good binding scores and appropriate XLOGP values. We believe that MolAICal can exploit the advantages of both deep learning models and classical programming for designing 3D drugs in protein pockets. MolAICal is free for any nonprofit use and is accessible at https://molaical.github.io.
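Lipinski's rule of five is one of the ligand filters named above. The pure-Python sketch below shows how such a filter can be applied. It assumes that descriptor values (molecular weight, logP such as an XLOGP value, and hydrogen-bond donor and acceptor counts) have already been computed elsewhere, for example by a cheminformatics toolkit. In Lipinski's original formulation, poor absorption is more likely when more than one criterion is violated, so the sketch tolerates one violation by default. Whether MolAICal uses this tolerance or a stricter one is not stated. The PAINS and SA filters are not shown, and the class, function and ligand names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Descriptors:
    """Precomputed descriptors for one candidate ligand (values come from elsewhere)."""
    mol_weight: float      # molecular weight in Da
    logp: float            # octanol-water partition coefficient, e.g. an XLOGP value
    h_bond_donors: int
    h_bond_acceptors: int

def lipinski_violations(d: Descriptors) -> int:
    """Count how many of Lipinski's four rule-of-five criteria are broken."""
    return sum([
        d.mol_weight > 500,
        d.logp > 5,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    ])

def passes_rule_of_five(d: Descriptors, max_violations: int = 1) -> bool:
    """Keep a ligand if it breaks at most `max_violations` criteria."""
    return lipinski_violations(d) <= max_violations

# Hypothetical generated ligands
candidates = {
    "ligand_a": Descriptors(352.4, 2.7, 2, 5),
    "ligand_b": Descriptors(648.8, 6.3, 1, 9),
}
kept = [name for name, d in candidates.items() if passes_rule_of_five(d)]
```

Making `max_violations` a parameter lets the same filter run in the permissive (one violation allowed) or strict (zero violations) mode.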
Subjects
Algorithms; Artificial Intelligence; Drug Design; Proteins/chemistry; Software; Databases, Protein; Quantitative Structure-Activity Relationship
ABSTRACT
Producing expressive molecular representations is a fundamental challenge in artificial intelligence-driven drug discovery. Graph neural networks (GNNs) have emerged as a powerful technique for modeling molecular data. However, previous supervised approaches usually suffer from scarce labeled data and poor generalization. Here, we propose MPG, a novel graph-based molecular pre-training deep learning framework that learns molecular representations from large-scale unlabeled molecules. In MPG, we propose MolGNet, a powerful GNN for modeling molecular graphs, and design an effective self-supervised strategy for pre-training the model at both the node and graph levels. After pre-training on 11 million unlabeled molecules, we show that MolGNet captures valuable chemical insights and produces interpretable representations. The pre-trained MolGNet can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of drug discovery tasks on 14 benchmark datasets, including molecular property prediction, drug-drug interaction and drug-target interaction. The pre-trained MolGNet in MPG has the potential to become an advanced molecular encoder in the drug discovery pipeline.
Subjects
Databases, Chemical; Drug Delivery Systems; Drug Discovery; Models, Molecular; Neural Networks, Computer
ABSTRACT
MOTIVATION: Computational methods accelerate drug discovery and play an important role in biomedicine, for example in molecular property prediction and compound-protein interaction (CPI) identification. A key challenge is learning useful molecular representations. In earlier years, molecular properties were mainly calculated by quantum mechanics or predicted by traditional machine learning methods, which require expert knowledge and are often labor-intensive. Today, graph neural networks receive significant attention because of their powerful ability to learn representations from graph data. Nevertheless, current graph-based methods have limitations that need to be addressed, such as large numbers of parameters and insufficient extraction of bond information. RESULTS: In this study, we propose a graph-based approach, named triplet message networks (TrimNet), that uses a novel triplet message mechanism to learn molecular representations efficiently. We show that TrimNet can accurately complete multiple molecular representation learning tasks, including quantum property, bioactivity, physiology and CPI prediction, with significantly fewer parameters. In the experiments, TrimNet outperforms the previous state-of-the-art method by a significant margin on various datasets. Besides having few parameters and high prediction accuracy, TrimNet can focus on the atoms essential to the target properties, providing a clear interpretation of its predictions. These advantages establish TrimNet as a powerful and useful computational tool for the challenging problem of molecular representation learning. AVAILABILITY: The quantum and drug datasets are available on the MoleculeNet website: http://moleculenet.ai. The source code is available on GitHub: https://github.com/yvquanli/trimnet. CONTACT: xjyao@lzu.edu.cn, songsen@tsinghua.edu.cn.
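The abstract does not give TrimNet's equations. The NumPy sketch below is a generic illustration of the idea the abstract names. Each message is computed from a triplet (source atom, bond, target atom) and weighted by attention before updating the target atom. All function names, weight shapes and sizes are hypothetical. Sharing one small projection across all triplets keeps the parameter count low, which matches the spirit of TrimNet's parameter efficiency but is not necessarily its actual design.

```python
import numpy as np

def triplet_message_pass(h, edges, bond_feats, w_msg, attn):
    """One round of attention-weighted message passing over (atom, bond, atom) triplets.

    h: (n_atoms, d) atom features; edges: list of directed (src, dst) pairs;
    bond_feats: (n_edges, d_e); w_msg: (2 * d + d_e, d); attn: (d,).
    """
    src = [s for s, _ in edges]
    dst = [t for _, t in edges]
    # each message sees the source atom, the bond and the target atom together
    triplets = np.concatenate([h[src], bond_feats, h[dst]], axis=1)
    messages = np.tanh(triplets @ w_msg)              # (n_edges, d)
    scores = messages @ attn                          # one relevance score per message
    update = np.zeros_like(h)
    for atom in range(h.shape[0]):
        incoming = [k for k, t in enumerate(dst) if t == atom]
        if not incoming:
            continue                                  # isolated atoms receive no update
        s = scores[incoming]
        weights = np.exp(s - s.max())
        weights /= weights.sum()                      # softmax over this atom's incoming messages
        update[atom] = weights @ messages[incoming]
    return h + update                                 # residual connection

rng = np.random.default_rng(2)
d, d_e = 8, 3                                         # hypothetical feature sizes
h = rng.normal(size=(4, d))                           # atoms 0-1-2 bonded in a chain; atom 3 isolated
edges = [(0, 1), (1, 0), (1, 2), (2, 1)]              # each bond stored in both directions
bond_feats = rng.normal(size=(len(edges), d_e))
w_msg = 0.1 * rng.normal(size=(2 * d + d_e, d))
attn = rng.normal(size=d)
h_next = triplet_message_pass(h, edges, bond_feats, w_msg, attn)
```

Attention weights like `weights` show which neighboring triplets dominate each atom's update. This is one way a model of this kind can expose the atoms it focuses on.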
Subjects
Drug Discovery; Machine Learning; Software
ABSTRACT
Accurately estimating protein-ligand binding affinity remains a key challenge in computer-aided drug design (CADD). In many cases, binding affinities predicted by classical scoring functions (SFs) have been shown to correlate poorly with experimentally measured biological activities. In the past few years, machine learning (ML)-based SFs have gradually emerged as potential alternatives and have outperformed classical SFs in a series of studies. In this study, to better understand the potential of classical SFs, we conducted a comparative assessment of 25 commonly used SFs. Their scoring power was systematically estimated by using state-of-the-art ML methods, instead of the original multiple linear regression, to refit the individual energy terms. The results show that the newly developed ML-based SFs consistently performed better than the classical ones. In particular, gradient boosting decision tree (GBDT) and random forest (RF) models gave the best predictions in most cases. The newly developed ML-based SFs were also tested on another benchmark modified from PDBbind v2007, and the impacts of structural and sequence similarity were evaluated. The results indicated that the superiority of the ML-based SFs could be fully guaranteed when the training set contained sufficiently similar targets. Moreover, the effect of combining features from multiple SFs was explored, and the results indicated that combining NNscore2.0 with one to four other classical SFs yielded the best scoring power. However, no generic target-specific SF or SF combination could be derived.
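The study refits the energy terms of each classical SF and replaces the original multiple linear regression with ML models. The NumPy sketch below shows only the linear baseline, which fits one weight per energy term plus an intercept by least squares. The three energy terms and the synthetic data are hypothetical.

```python
import numpy as np

def fit_mlr(energy_terms: np.ndarray, affinity: np.ndarray) -> np.ndarray:
    """Least-squares weights (last entry = intercept) mapping energy terms to affinity."""
    design = np.column_stack([energy_terms, np.ones(len(energy_terms))])
    coef, *_ = np.linalg.lstsq(design, affinity, rcond=None)
    return coef

def predict_mlr(coef: np.ndarray, energy_terms: np.ndarray) -> np.ndarray:
    """Apply fitted weights and intercept to new energy-term vectors."""
    design = np.column_stack([energy_terms, np.ones(len(energy_terms))])
    return design @ coef

# Synthetic example: three hypothetical energy terms (e.g. vdW, H-bond, desolvation)
rng = np.random.default_rng(3)
terms = rng.normal(size=(200, 3))
true_weights = np.array([1.5, -2.0, 0.5])
affinity = terms @ true_weights + 0.3
coef = fit_mlr(terms, affinity)
pred = predict_mlr(coef, terms)
r = np.corrcoef(pred, affinity)[0, 1]
```

In the ML variant, a nonlinear regressor replaces `fit_mlr` and `predict_mlr`. Examples are scikit-learn's `GradientBoostingRegressor` or `RandomForestRegressor`, both in `sklearn.ensemble` and used via `.fit(X, y)` and `.predict(X)`. The hyperparameters used in the study are not given here.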