Results 1 - 20 of 34
1.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-38960407

ABSTRACT

The optimization of therapeutic antibodies through traditional techniques, such as candidate screening via hybridoma or phage display, is resource-intensive and time-consuming. In recent years, computational and artificial intelligence-based methods have been actively developed to accelerate and improve the development of therapeutic antibodies. In this study, we developed an end-to-end sequence-based deep learning model, termed AttABseq, for the predictions of the antigen-antibody binding affinity changes connected with antibody mutations. AttABseq is a highly efficient and generic attention-based model by utilizing diverse antigen-antibody complex sequences as the input to predict the binding affinity changes of residue mutations. The assessment on the three benchmark datasets illustrates that AttABseq is 120% more accurate than other sequence-based models in terms of the Pearson correlation coefficient between the predicted and experimental binding affinity changes. Moreover, AttABseq also either outperforms or competes favorably with the structure-based approaches. Furthermore, AttABseq consistently demonstrates robust predictive capabilities across a diverse array of conditions, underscoring its remarkable capacity for generalization across a wide spectrum of antigen-antibody complexes. It imposes no constraints on the quantity of altered residues, rendering it particularly applicable in scenarios where crystallographic structures remain unavailable. The attention-based interpretability analysis indicates that the causal effects of point mutations on antibody-antigen binding affinity changes can be visualized at the residue level, which might assist automated antibody sequence optimization. We believe that AttABseq provides a fiercely competitive answer to therapeutic antibody optimization.
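
The abstract above evaluates models by the Pearson correlation coefficient between predicted and experimental binding affinity changes. As a minimal sketch of how that metric is computed, the following uses only the Python standard library. The function name `pearson_r` and the numeric values are hypothetical and invented for illustration; they are not data from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances, left unnormalized because the factors cancel.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical affinity changes (illustrative values only, not from the paper).
experimental = [0.5, -1.2, 2.0, 0.1, 1.4]
predicted = [0.7, -0.9, 1.6, 0.3, 1.0]
r = pearson_r(experimental, predicted)
print(f"Pearson r = {r:.3f}")
```

A value near 1 means the predictions rank and scale the mutations much like the experiments do; a value near 0 means no linear relationship.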


Subjects
Antigen-Antibody Complex, Deep Learning, Antigen-Antibody Complex/chemistry, Antigens/chemistry, Antigens/genetics, Antigens/metabolism, Antigens/immunology, Antibody Affinity, Amino Acid Sequence, Computational Biology/methods, Humans, Mutation, Antibodies/chemistry, Antibodies/immunology, Antibodies/genetics, Antibodies/metabolism
2.
Brief Bioinform ; 23(3)2022 05 13.
Article in English | MEDLINE | ID: mdl-35438145

ABSTRACT

Molecular property prediction models based on machine learning algorithms have become important tools to triage unpromising lead molecules in the early stages of drug discovery. Compared with the mainstream descriptor- and graph-based methods for molecular property predictions, SMILES-based methods can directly extract molecular features from SMILES without human expert knowledge, but they require more powerful algorithms for feature extraction and a larger amount of data for training, which makes SMILES-based methods less popular. Here, we show the great potential of pre-training in promoting the predictions of important pharmaceutical properties. By utilizing three pre-training tasks based on atom feature prediction, molecular feature prediction and contrastive learning, a new pre-training method K-BERT, which can extract chemical information from SMILES like chemists, was developed. The calculation results on 15 pharmaceutical datasets show that K-BERT outperforms well-established descriptor-based (XGBoost) and graph-based (Attentive FP and HRGCN+) models. In addition, we found that the contrastive learning pre-training task enables K-BERT to 'understand' SMILES not limited to canonical SMILES. Moreover, the general fingerprints K-BERT-FP generated by K-BERT exhibit comparative predictive power to MACCS on 15 pharmaceutical datasets and can also capture molecular size and chirality information that traditional binary fingerprints cannot capture. Our results illustrate the great potential of K-BERT in the practical applications of molecular property predictions in drug discovery.


Subjects
Algorithms, Machine Learning, Humans, Knowledge Bases, Pharmaceutical Preparations, Research Design
3.
Brief Bioinform ; 23(2)2022 03 10.
Article in English | MEDLINE | ID: mdl-35062020

ABSTRACT

Accurate prediction of atomic partial charges with high-level quantum mechanics (QM) methods suffers from high computational cost. Numerous feature-engineered machine learning (ML)-based predictors with favorable computability and reliability have been developed as alternatives. However, extensive expertise effort was needed for feature engineering of atom chemical environment, which may consequently introduce domain bias. In this study, SuperAtomicCharge, a data-driven deep graph learning framework, was proposed to predict three important types of partial charges (i.e. RESP, DDEC4 and DDEC78) derived from high-level QM calculations based on the structures of molecules. SuperAtomicCharge was designed to simultaneously exploit the 2D and 3D structural information of molecules, which was proved to be an effective way to improve the prediction accuracy of the model. Moreover, a simple transfer learning strategy and a multitask learning strategy based on self-supervised descriptors were also employed to further improve the prediction accuracy of the proposed model. Compared with the latest baselines, including one GNN-based predictor and two ML-based predictors, SuperAtomicCharge showed better performance on all the three external test sets and had better usability and portability. Furthermore, the QM partial charges of new molecules predicted by SuperAtomicCharge can be efficiently used in drug design applications such as structure-based virtual screening, where the predicted RESP and DDEC4 charges of new molecules showed more robust scoring and screening power than the commonly used partial charges. Finally, two tools including an online server (http://cadd.zju.edu.cn/deepchargepredictor) and the source code command lines (https://github.com/zjujdj/SuperAtomicCharge) were developed for the easy access of the SuperAtomicCharge services.


Subjects
Deep Learning, Drug Design, Machine Learning, Reproducibility of Results, Software
4.
J Chem Inf Model ; 64(13): 5016-5027, 2024 Jul 08.
Article in English | MEDLINE | ID: mdl-38920330

ABSTRACT

The intricate interaction between major histocompatibility complexes (MHCs) and antigen peptides with diverse amino acid sequences plays a pivotal role in immune responses and T cell activity. In recent years, deep learning (DL)-based models have emerged as promising tools for accelerating antigen peptide screening. However, most of these models solely rely on one-dimensional amino acid sequences, overlooking crucial information required for the three-dimensional (3-D) space binding process. In this study, we propose TransfIGN, a structure-based DL model that is inspired by our previously developed framework, Interaction Graph Network (IGN), and incorporates sequence information from transformers to predict the interactions between HLA-A*02:01 and antigen peptides. Our model, trained on a comprehensive data set containing 61,816 sequences with 9051 binding affinity labels and 56,848 eluted ligand labels, achieves an area under the curve (AUC) of 0.893 on the binary data set, better than state-of-the-art sequence-based models trained on larger data sets such as NetMHCpan4.1, ANN, and TransPHLA. Furthermore, when evaluated on the IEDB weekly benchmark data sets, our predictions (AUC = 0.816) are better than those of the recommended methods like the IEDB consensus (AUC = 0.795). Notably, the interaction weight matrices generated by our method highlight the strong interactions at specific positions within peptides, emphasizing the model's ability to provide physical interpretability. This capability to unveil binding mechanisms through intricate structural features holds promise for new immunotherapeutic avenues.


Subjects
Deep Learning, HLA-A2 Antigen, Peptides, HLA-A2 Antigen/chemistry, HLA-A2 Antigen/metabolism, Peptides/chemistry, Peptides/metabolism, Humans, Protein Binding, Molecular Models, Amino Acid Sequence, Protein Conformation
5.
J Chem Inf Model ; 64(6): 2112-2124, 2024 Mar 25.
Article in English | MEDLINE | ID: mdl-38483249

ABSTRACT

Cyclic peptides have emerged as a highly promising class of therapeutic molecules owing to their favorable pharmacokinetic properties, including stability and permeability. Currently, many clinically approved cyclic peptides are derived from natural products or their derivatives, and the development of molecular docking techniques for cyclic peptide discovery holds great promise for expanding the applications and potential of this class of molecules. Given the availability of numerous docking programs, there is a pressing need for a systematic evaluation of their performance, specifically on protein-cyclic peptide systems. In this study, we constructed an extensive benchmark data set called CPSet, consisting of 493 protein-cyclic peptide complexes. Based on this data set, we conducted a comprehensive evaluation of 10 docking programs, including Rosetta, AutoDock CrankPep, and eight protein-small molecule docking programs (i.e., AutoDock, AutoDock Vina, Glide, GOLD, LeDock, rDock, MOE, and Surflex). The evaluation encompassed the assessment of the sampling power, docking power, and scoring power of these programs. The results revealed that all of the tested protein-small molecule docking programs successfully sampled the binding conformations when using the crystal conformations as the initial structures. Among them, rDock exhibited outstanding performance, achieving a remarkable 94.3% top-100 sampling success rate. However, few programs achieved successful predictions of the binding conformations using tLEaP-generated conformations as the initial structures. Within this scheme, AutoDock CrankPep yielded the highest top-100 sampling success rate of 29.6%. Rosetta's scoring function outperformed the others in selecting optimal conformations, resulting in an impressive top-1 docking success rate of 87.6%.
Nevertheless, all the tested scoring functions displayed limited performance in predicting binding affinity, with MOE@Affinity dG exhibiting the highest Pearson's correlation coefficient of 0.378. It is therefore suggested to use an appropriate combination of different docking programs for given tasks in real applications. We expect that this work will offer valuable insights into selecting the appropriate docking programs for protein-cyclic peptide complexes.
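
The top-1 and top-100 success rates quoted in the abstract above are rank-based metrics: a complex counts as a success if at least one of its first N ranked poses is close enough to the reference structure. A minimal sketch follows, assuming an RMSD cutoff of 2.0 Å. That cutoff is a common convention, not a value stated in the abstract, and the function name and pose data are hypothetical.

```python
def top_n_success_rate(rmsds_by_complex, n, threshold=2.0):
    """Fraction of complexes with at least one of the first n poses within threshold.

    rmsds_by_complex maps a complex id to its pose RMSDs (in angstroms),
    ordered by rank with the best-ranked pose first. The 2.0 default
    threshold is an assumption, not a value from the abstract.
    """
    if not rmsds_by_complex:
        raise ValueError("no complexes given")
    hits = sum(
        1
        for rmsds in rmsds_by_complex.values()
        if any(r <= threshold for r in rmsds[:n])
    )
    return hits / len(rmsds_by_complex)


# Hypothetical RMSD values, invented for illustration only.
poses = {
    "complex_a": [3.1, 1.8, 4.0],
    "complex_b": [1.2, 5.5, 6.0],
    "complex_c": [7.4, 6.9, 2.5],
    "complex_d": [2.9, 3.3, 1.9],
}
top1 = top_n_success_rate(poses, n=1)  # only complex_b succeeds
top3 = top_n_success_rate(poses, n=3)  # complex_a, complex_b and complex_d succeed
print(top1, top3)
```

The gap between a small N and a large N roughly separates ranking quality from sampling quality: a program can generate a near-native pose yet fail to rank it first.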


Subjects
Cyclic Peptides, Proteins, Cyclic Peptides/metabolism, Molecular Docking Simulation, Protein Binding, Proteins/chemistry, Molecular Conformation, Ligands
6.
J Chem Inf Model ; 64(8): 3222-3236, 2024 Apr 22.
Article in English | MEDLINE | ID: mdl-38498003

ABSTRACT

Liver microsomal stability, a crucial aspect of metabolic stability, significantly impacts practical drug discovery. However, current models for predicting liver microsomal stability are based on limited molecular information from a single species. To address this limitation, we constructed the largest public database of compounds from three common species: human, rat, and mouse. Subsequently, we developed a series of classification models using both traditional descriptor-based and classic graph-based machine learning (ML) algorithms. Remarkably, the best-performing models for the three species achieved Matthews correlation coefficients (MCCs) of 0.616, 0.603, and 0.574, respectively, on the test set. Furthermore, through the construction of consensus models based on these individual models, we have demonstrated their superior predictive performance in comparison with the existing models of the same type. To explore the similarities and differences in the properties of liver microsomal stability among multispecies molecules, we conducted preliminary interpretative explorations using the Shapley additive explanations (SHAP) and atom heatmap approaches for the models and misclassified molecules. Additionally, we further investigated representative structural modifications and substructures that decrease the liver microsomal stability in different species using the matched molecule pair analysis (MMPA) method and substructure extraction techniques. The established prediction models, along with insightful interpretation information regarding liver microsomal stability, will significantly contribute to enhancing the efficiency of exploring practical drugs for development.
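
The classification models in the abstract above are scored with the Matthews correlation coefficient (MCC), which is computed from the four cells of a binary confusion matrix. The sketch below uses the standard MCC formula. The confusion-matrix counts are hypothetical, and returning 0.0 for a zero denominator is a common convention adopted here as an assumption.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Convention (assumption): report 0.0 when a row or column is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts for a stable/unstable classifier (illustration only).
mcc = matthews_corrcoef(tp=80, tn=70, fp=30, fn=20)
print(f"MCC = {mcc:.3f}")
```

Unlike plain accuracy, MCC stays informative when the two classes are imbalanced, which is one reason it is often preferred for stability and toxicity classifiers.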


Subjects
Artificial Intelligence, Liver Microsomes, Liver Microsomes/metabolism, Animals, Mice, Rats, Humans, Machine Learning, Drug Discovery/methods, Pharmaceutical Preparations/metabolism, Pharmaceutical Preparations/chemistry
7.
J Chem Inf Model ; 2024 Jun 26.
Article in English | MEDLINE | ID: mdl-38920405

ABSTRACT

Artificial intelligence (AI)-aided drug design has demonstrated unprecedented effects on modern drug discovery, but there is still an urgent need for user-friendly interfaces that bridge the gap between these sophisticated tools and scientists, particularly those who are less computer savvy. Herein, we present DrugFlow, an AI-driven one-stop platform that offers a clean, convenient, and cloud-based interface to streamline early drug discovery workflows. By seamlessly integrating a range of innovative AI algorithms, covering molecular docking, quantitative structure-activity relationship modeling, molecular generation, ADMET (absorption, distribution, metabolism, excretion and toxicity) prediction, and virtual screening, DrugFlow can offer effective AI solutions for almost all crucial stages in early drug discovery, including hit identification and hit/lead optimization. We hope that the platform can provide sufficiently valuable guidance to aid real-world drug design and discovery. The platform is available at https://drugflow.com.

8.
Phys Chem Chem Phys ; 26(13): 10323-10335, 2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38501198

ABSTRACT

Ribonucleic acid (RNA)-ligand interactions play a pivotal role in a wide spectrum of biological processes, ranging from protein biosynthesis to cellular reproduction. This recognition has prompted the broader acceptance of RNA as a viable candidate for drug targets. Delving into the atomic-scale understanding of RNA-ligand interactions holds paramount importance in unraveling intricate molecular mechanisms and further contributing to RNA-based drug discovery. Computational approaches, particularly molecular docking, offer an efficient way of predicting the interactions between RNA and small molecules. However, the accuracy and reliability of these predictions heavily depend on the performance of scoring functions (SFs). In contrast to the majority of SFs used in RNA-ligand docking, the end-point binding free energy calculation methods, such as molecular mechanics/generalized Born surface area (MM/GBSA) and molecular mechanics/Poisson Boltzmann surface area (MM/PBSA), stand as theoretically more rigorous approaches. Yet, the evaluation of their effectiveness in predicting both binding affinities and binding poses within RNA-ligand systems remains unexplored. This study first reported the performance of MM/PBSA and MM/GBSA with diverse solvation models, interior dielectric constants (εin) and force fields in the context of binding affinity prediction for 29 RNA-ligand complexes. MM/GBSA is based on short (5 ns) molecular dynamics (MD) simulations in an explicit solvent with the YIL force field; the GBGBn2 model with higher interior dielectric constant (εin = 12, 16 or 20) yields the best correlation (Rp = -0.513), which outperforms the best correlation (Rp = -0.317, rDock) offered by various docking programs. Then, the efficacy of MM/GBSA in identifying the near-native binding poses from the decoys was assessed based on 56 RNA-ligand complexes. 
However, it is evident that MM/GBSA has limitations in accurately predicting binding poses for RNA-ligand systems, particularly compared with notably proficient docking programs like rDock and PLANTS. The best top-1 success rate achieved by MM/GBSA rescoring is 39.3%, which falls below the best results given by docking programs (50%, PLANTS). This study represents the first evaluation of MM/PBSA and MM/GBSA for RNA-ligand systems and is expected to provide valuable insights into their successful application to RNA targets.
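
For readers unfamiliar with the end-point methods named in the abstract above, the commonly used textbook decomposition of the MM/PBSA and MM/GBSA binding free energy is sketched below. It is the general form of the method, not a formula quoted from the abstract, and the symbol names are the conventional ones.

```latex
\begin{align}
\Delta G_{\mathrm{bind}} &= G_{\mathrm{complex}} - G_{\mathrm{receptor}} - G_{\mathrm{ligand}} \\
                         &\approx \Delta E_{\mathrm{MM}} + \Delta G_{\mathrm{solv}} - T\Delta S \\
\Delta E_{\mathrm{MM}}   &= \Delta E_{\mathrm{int}} + \Delta E_{\mathrm{ele}} + \Delta E_{\mathrm{vdW}} \\
\Delta G_{\mathrm{solv}} &= \Delta G_{\mathrm{PB/GB}} + \Delta G_{\mathrm{SA}}
\end{align}
```

Here the polar solvation term is obtained from a Poisson-Boltzmann (PB) or generalized Born (GB) model, and the nonpolar term from the solvent-accessible surface area (SA). The interior dielectric constant mentioned in the abstract enters through the electrostatic and polar solvation terms.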


Subjects
Molecular Dynamics Simulation, RNA, Molecular Docking Simulation, Ligands, Reproducibility of Results, Protein Binding, Thermodynamics, Binding Sites
9.
Brief Bioinform ; 22(5)2021 09 02.
Article in English | MEDLINE | ID: mdl-33866354

ABSTRACT

Accurate predictions of druggability and bioactivities of compounds are desirable to reduce the high cost and time of drug discovery. After more than five decades of continuing developments, quantitative structure-activity relationship (QSAR) methods have been established as indispensable tools that facilitate fast, reliable and affordable assessments of physicochemical and biological properties of compounds in drug-discovery programs. Currently, there are mainly two types of QSAR methods, descriptor-based methods and graph-based methods. The former is developed based on predefined molecular descriptors, whereas the latter is developed based on simple atomic and bond information. In this study, we presented a simple but highly efficient modeling method by combining molecular graphs and molecular descriptors as the input of a modified graph neural network, called hyperbolic relational graph convolution network plus (HRGCN+). The evaluation results show that HRGCN+ achieves state-of-the-art performance on 11 drug-discovery-related datasets. We also explored the impact of the addition of traditional molecular descriptors on the predictions of graph-based methods, and found that the addition of molecular descriptors can indeed boost the predictive power of graph-based methods. The results also highlight the strong anti-noise capability of our method. In addition, our method provides a way to interpret models at both the atom and descriptor levels, which can help medicinal chemists extract hidden information from complex datasets. We also offer an HRGCN+'s online prediction service at https://quantum.tencent.com/hrgcn/.


Subjects
Algorithms, Computational Biology/methods, Drug Discovery/methods, Neural Networks (Computer), Organic Chemicals/chemistry, Quantitative Structure-Activity Relationship, Artificial Intelligence, Computer Graphics, Computer Simulation, Drug Design, Chemical Models, Molecular Structure, Organic Chemicals/pharmacology
10.
Brief Bioinform ; 22(5)2021 09 02.
Article in English | MEDLINE | ID: mdl-33822874

ABSTRACT

Tuberculosis (TB) is an infectious disease caused by Mycobacterium tuberculosis (Mtb) and it has been one of the top 10 causes of death globally. Drug-resistant tuberculosis (XDR-TB), extensively resistant to the commonly used first-line drugs, has emerged as a major challenge to TB treatment. Hence, it is quite necessary to discover novel drug candidates for TB treatment. In this study, based on different types of molecular representations, four machine learning (ML) algorithms, including support vector machine, random forest (RF), extreme gradient boosting (XGBoost) and deep neural networks (DNN), were used to develop classification models to distinguish Mtb inhibitors from noninhibitors. The results demonstrate that the XGBoost model exhibits the best prediction performance. Then, two consensus strategies were employed to integrate the predictions from multiple models. The evaluation results illustrate that the consensus model by stacking the RF, XGBoost and DNN predictions offers the best predictions with area under the receiver operating characteristic curve of 0.842 and 0.942 for the 10-fold cross-validated training set and external test set, respectively. Besides, the association between the important descriptors and the bioactivities of molecules was interpreted by using the Shapley additive explanations method. Finally, an online webserver called ChemTB (http://cadd.zju.edu.cn/chemtb/) was developed, and it offers a freely available computational tool to detect potential Mtb inhibitors.
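
The abstract above reports the area under the receiver operating characteristic curve (AUC) for a consensus of several models. The sketch below shows how AUC can be computed as a rank statistic and how per-model scores can be combined. Note that it uses simple score averaging, which is a plainer technique than the stacking described in the abstract. All names, labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_consensus(*model_scores):
    """Element-wise mean of per-model scores (averaging, not stacking)."""
    return [sum(vals) / len(vals) for vals in zip(*model_scores)]


# Hypothetical inhibitor (1) / noninhibitor (0) labels and model scores.
labels = [1, 1, 1, 0, 0, 0]
rf = [0.9, 0.4, 0.7, 0.5, 0.2, 0.1]
xgb = [0.8, 0.6, 0.3, 0.4, 0.3, 0.2]
dnn = [0.7, 0.25, 0.6, 0.3, 0.1, 0.3]
consensus = average_consensus(rf, xgb, dnn)
print(roc_auc(labels, rf), roc_auc(labels, consensus))
```

In this toy data each model misranks a different compound, so the averaged scores separate the classes better than any single model. Real data sets will not always behave this way.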


Subjects
Antitubercular Agents/analysis, Antitubercular Agents/pharmacology, Drug Discovery/methods, Mycobacterium tuberculosis/drug effects, Neural Networks (Computer), Support Vector Machine, Antitubercular Agents/therapeutic use, Area Under Curve, Data Accuracy, Humans, Biological Models, ROC Curve, Reproducibility of Results, Multidrug-Resistant Tuberculosis/drug therapy, Multidrug-Resistant Tuberculosis/microbiology
11.
Brief Bioinform ; 22(4)2021 07 20.
Article in English | MEDLINE | ID: mdl-33313673

ABSTRACT

Although a wide variety of machine learning (ML) algorithms have been utilized to learn quantitative structure-activity relationships (QSARs), there is no agreed single best algorithm for QSAR learning. Therefore, a comprehensive understanding of the performance characteristics of popular ML algorithms used in QSAR learning is highly desirable. In this study, five linear algorithms [linear function Gaussian process regression (linear-GPR), linear function support vector machine (linear-SVM), partial least squares regression (PLSR), multiple linear regression (MLR) and principal component regression (PCR)], three analogizers [radial basis function support vector machine (rbf-SVM), K-nearest neighbor (KNN) and radial basis function Gaussian process regression (rbf-GPR)], six symbolists [extreme gradient boosting (XGBoost), Cubist, random forest (RF), multiple adaptive regression splines (MARS), gradient boosting machine (GBM), and classification and regression tree (CART)] and two connectionists [principal component analysis artificial neural network (pca-ANN) and deep neural network (DNN)] were employed to learn the regression-based QSAR models for 14 public data sets comprising nine physicochemical properties and five toxicity endpoints. The results show that rbf-SVM, rbf-GPR, XGBoost and DNN generally illustrate better performances than the other algorithms. The overall performances of different algorithms can be ranked from the best to the worst as follows: rbf-SVM > XGBoost > rbf-GPR > Cubist > GBM > DNN > RF > pca-ANN > MARS > linear-GPR ≈ KNN > linear-SVM ≈ PLSR > CART ≈ PCR ≈ MLR. In terms of prediction accuracy and computational efficiency, SVM and XGBoost are recommended to the regression learning for small data sets, and XGBoost is an excellent choice for large data sets. We then investigated the performances of the ensemble models by integrating the predictions of multiple ML algorithms. 
The results illustrate that the ensembles of two or three algorithms in different categories can indeed improve the predictions of the best individual ML algorithms.


Subjects
Biological Models, Neural Networks (Computer), Support Vector Machine, Animals, Cyprinidae, Daphnia, Tetrahymena pyriformis
12.
Sensors (Basel) ; 23(7)2023 Mar 23.
Article in English | MEDLINE | ID: mdl-37050447

ABSTRACT

The Dadu River flows through the mountainous areas of southwestern China, one of the most hazard-prone regions, which has long suffered from frequent geohazards. The early identification of landslides in this region is urgently needed, especially after the recent Luding earthquake (MS 6.8). While conventional ground-based monitoring techniques are limited by the complex terrain conditions in these alpine valley regions, space interferometric synthetic aperture radar (InSAR) provides an incomparable advantage in obtaining surface deformation with high precision and over a wide area, which is very useful for long-term and slow geohazard monitoring. In this study, more than 500 Sentinel-1 SAR images with four frames acquired during 2017-2022 were collected to detect the hidden landslide regions from the Jinchuan to Ebian Section along the Dadu River, based on joint-scatterer InSAR (JS-InSAR) and small baseline subset (SBAS) techniques. The results showed that our method could be successfully applied for landslide monitoring in complex mountainous regions. Furthermore, 143 potential landslide regions spreading over an 800 km area along the Dadu River were extracted by integrating the deformation measurements and optical images. Our study can provide a reference for large-scale geological hazard surveys in mountainous areas, and the InSAR technique is recommended to the local government for future long-term monitoring applications in the Dadu River Basin.

13.
Bioinformatics ; 37(22): 4255-4257, 2021 11 18.
Article in English | MEDLINE | ID: mdl-34009308

ABSTRACT

SUMMARY: High-level quantum mechanics (QM) methods are no doubt the most reliable approaches for the prediction of atomic charges, but they usually need very large computational resources, which hinders the use of high-quality atomic charges in large-scale molecular modeling, such as high-throughput virtual screening. To solve this problem, several algorithms based on machine learning (ML) have been developed to fit high-level QM atomic charges. Here, we proposed DeepChargePredictor, a web server that is able to generate the high-level QM atomic charges for small molecules based on two state-of-the-art ML algorithms developed in our group, namely AtomPathDescriptor and DeepAtomicCharge. These two algorithms were seamlessly integrated into the platform with the capability to predict three kinds of charges (i.e. RESP, AM1-BCC and DDEC) widely used in structure-based drug design. Moreover, we have comprehensively evaluated the performance of these charges generated by DeepChargePredictor for large-scale drug design applications, such as end-point binding free energy calculations and virtual screening, which all show reliable or even better performance compared with the baseline methods. AVAILABILITY AND IMPLEMENTATION: The data in the article can be obtained on the web page http://cadd.zju.edu.cn/deepchargepredictor/publication. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms, Computers, Molecular Models, Physics, Machine Learning
14.
Acta Pharmacol Sin ; 43(6): 1605-1615, 2022 Jun.
Article in English | MEDLINE | ID: mdl-34667293

ABSTRACT

Decaprenylphosphoryl-β-D-ribose oxidase (DprE1) plays an important role in the biosynthesis of the mycobacterial cell wall. DprE1 inhibitors have shown great potential in the development of new regimens for tuberculosis (TB) treatment. In this study, an integrated molecular modeling strategy, which combined computational bioactivity fingerprints and structure-based virtual screening, was employed to identify potential DprE1 inhibitors. Two lead compounds (B2 and H3) that could inhibit DprE1 and thus kill Mycobacterium smegmatis in vitro were identified. Moreover, compound H3 showed potent inhibitory activity against Mycobacterium tuberculosis in vitro (MICMtb = 1.25 µM) and low cytotoxicity against mouse embryo fibroblast NIH-3T3 cells. Our research provided an effective strategy to discover novel anti-TB lead compounds.


Subjects
Antitubercular Agents, Mycobacterium tuberculosis, Animals, Antitubercular Agents/pharmacology, Antitubercular Agents/therapeutic use, Bacterial Proteins, Mice, Molecular Models
15.
J Chem Inf Model ; 61(6): 2844-2856, 2021 06 28.
Article in English | MEDLINE | ID: mdl-34014672

ABSTRACT

The molecular mechanics/generalized Born surface area (MM/GBSA) method has been widely used in end-point binding free energy prediction in structure-based drug design (SBDD). However, in practice, it is often regarded as a disputed method, mostly because of its system dependence. Here, by incorporating machine-learning optimization, we developed a novel version of MM/GBSA, named variable atomic dielectric MM/GBSA (VAD-MM/GBSA), by assigning variable dielectric constants directly to the protein/ligand atoms. The new strategy exhibits markedly improved accuracy in binding affinity calculations for various protein-ligand systems and is promising for use in the postprocessing of structure-based virtual screening. Moreover, VAD-MM/GBSA outperformed Prime MM/GBSA in Schrödinger software and showed remarkable predictive performance for specific protein targets, such as POL polyprotein, human immunodeficiency virus type 1 (HIV-1) protease, etc. Our study showed that the VAD-MM/GBSA method, with little extra computational overhead, provides a potential replacement for the MM/GBSA in AMBER software. An online web server of VAD-MM/GBSA has been developed and is now available at http://cadd.zju.edu.cn/vdgb.


Subjects
Molecular Dynamics Simulation, Proteins, Entropy, Humans, Ligands, Protein Binding, Proteins/metabolism, Thermodynamics
16.
Chem Sci ; 14(6): 1557-1568, 2023 Feb 08.
Article in English | MEDLINE | ID: mdl-36794194

ABSTRACT

Generation of representative conformations for small molecules is a fundamental task in cheminformatics and computer-aided drug discovery, but capturing the complex distribution of conformations that contains multiple low energy minima is still a great challenge. Deep generative modeling, aiming to learn complex data distributions, is a promising approach to tackle the conformation generation problem. Here, inspired by stochastic dynamics and recent advances in generative modeling, we developed SDEGen, a novel conformation generation model based on stochastic differential equations. Compared with existing conformation generation methods, it enjoys the following advantages: (1) high model capacity to capture multimodal conformation distribution, thereby searching for multiple low-energy conformations of a molecule quickly, (2) higher conformation generation efficiency, almost ten times faster than the state-of-the-art score-based model, ConfGF, and (3) a clear physical interpretation to learn how a molecule evolves in a stochastic dynamics system starting from noise and eventually relaxing to the conformation that falls in low energy minima. Extensive experiments demonstrate that SDEGen has surpassed existing methods in different tasks for conformation generation, interatomic distance distribution prediction, and thermodynamic property estimation, showing great potential for real-world applications.

17.
J Cheminform ; 15(1): 63, 2023 Jul 04.
Article in English | MEDLINE | ID: mdl-37403155

ABSTRACT

Machine learning-based scoring functions (MLSFs) have shown potential for improving virtual screening capabilities over classical scoring functions (SFs). Due to the high computational cost in the process of feature generation, the numbers of descriptors used in MLSFs and the characterization of protein-ligand interactions are always limited, which may affect the overall accuracy and efficiency. Here, we propose a new SF called TB-IECS (theory-based interaction energy component score), which combines energy terms from Smina and NNScore version 2, and utilizes the eXtreme Gradient Boosting (XGBoost) algorithm for model training. In this study, the energy terms decomposed from 15 traditional SFs were firstly categorized based on their formulas and physicochemical principles, and 324 feature combinations were generated accordingly. Five best feature combinations were selected for further evaluation of the model performance in regard to the selection of feature vectors with various length, interaction types and ML algorithms. The virtual screening power of TB-IECS was assessed on the datasets of DUD-E and LIT-PCBA, as well as seven target-specific datasets from the ChemDiv database. The results showed that TB-IECS outperformed classical SFs including Glide SP and Dock, and effectively balanced the efficiency and accuracy for practical virtual screening.

18.
Nat Commun ; 14(1): 2585, 2023 May 04.
Article in English | MEDLINE | ID: mdl-37142585

ABSTRACT

Graph neural networks (GNNs) have been widely used in molecular property prediction, but explaining their black-box predictions is still a challenge. Most existing explanation methods for GNNs in chemistry focus on attributing model predictions to individual nodes, edges or fragments that are not necessarily derived from a chemically meaningful segmentation of molecules. To address this challenge, we propose a method named substructure mask explanation (SME). SME is based on well-established molecular segmentation methods and provides an interpretation that aligns with the understanding of chemists. We apply SME to elucidate how GNNs learn to predict aqueous solubility, genotoxicity, cardiotoxicity and blood-brain barrier permeation for small molecules. SME provides interpretations that are consistent with the understanding of chemists, alerts them to unreliable performance, and guides them in structural optimization for target properties. Hence, we believe that SME empowers chemists to confidently mine structure-activity relationships (SARs) from reliable GNNs through a transparent inspection of how GNNs pick up useful signals when learning from data.


Subjects
Blood-Brain Barrier , Cardiotoxicity , Humans , DNA Damage , Neural Networks, Computer , Records
19.
Chem Sci ; 14(8): 2054-2069, 2023 Feb 22.
Article in English | MEDLINE | ID: mdl-36845922

ABSTRACT

Metalloproteins play indispensable roles in various biological processes ranging from reaction catalysis to free radical scavenging, and they are also pertinent to numerous pathologies including cancer, HIV infection, neurodegeneration, and inflammation. Discovery of high-affinity ligands for metalloproteins powers the treatment of these pathologies. Extensive efforts have been made to develop in silico approaches, such as molecular docking and machine learning (ML)-based models, for fast identification of ligands binding to heterogeneous proteins, but few of them have exclusively concentrated on metalloproteins. In this study, we first compiled the largest metalloprotein-ligand complex dataset containing 3079 high-quality structures, and systematically evaluated the scoring and docking powers of three competitive docking tools (i.e., PLANTS, AutoDock Vina and Glide SP) for metalloproteins. Then, a structure-based deep graph model called MetalProGNet was developed to predict metalloprotein-ligand interactions. In the model, the coordination interactions between metal ions and protein atoms and the interactions between metal ions and ligand atoms were explicitly modelled through graph convolution. The binding features were then predicted by the informative molecular binding vector learned from a noncovalent atom-atom interaction network. The evaluation on the internal metalloprotein test set, the independent ChEMBL dataset towards 22 different metalloproteins and the virtual screening dataset indicated that MetalProGNet outperformed various baselines. Finally, a noncovalent atom-atom interaction masking technique was employed to interpret MetalProGNet, and the learned knowledge accords with our understanding of physics.

20.
J Chem Theory Comput ; 19(16): 5633-5647, 2023 Aug 22.
Article in English | MEDLINE | ID: mdl-37480347

ABSTRACT

Nucleic acid (NA)-ligand interactions are of paramount importance in a variety of biological processes, including cellular reproduction and protein biosynthesis, and therefore, NAs have been broadly recognized as potential drug targets. Understanding NA-ligand interactions at the atomic scale is essential for investigating the molecular mechanism and further assisting in NA-targeted drug discovery. Molecular docking is one of the predominant computational approaches for predicting the interactions between NAs and small molecules. Despite the availability of versatile docking programs, their performance profiles for NA-ligand complexes have not been thoroughly characterized. In this study, we first compiled the largest structure-based NA-ligand binding data set to date, containing 800 noncovalent NA-ligand complexes with clearly identified ligands. Based on this extensive data set, eight frequently used docking programs, including six protein-ligand docking programs (LeDock, Surflex-Dock, UCSF Dock6, AutoDock, AutoDock Vina, and PLANTS) and two specific NA-ligand docking programs (rDock and RLDOCK), were systematically evaluated in terms of binding pose and binding affinity predictions. The results demonstrated that some protein-ligand docking programs, specifically PLANTS and LeDock, produced results better than or comparable to those of the specialized NA-ligand docking programs. Among the programs evaluated, PLANTS, rDock, and LeDock showed the highest performance in binding pose prediction, and their top-1 and best root-mean-square deviation (RMSD) success rates were as follows: PLANTS (35.93% and 76.05%), rDock (27.25% and 72.16%), and LeDock (27.40% and 64.37%). In contrast to the moderate level of binding pose prediction, few programs were successful in binding affinity prediction, and the best correlation (Rp = -0.461) was observed with PLANTS. Finally, further comparison with the latest NA-ligand docking program (NLDock) on four well-established data sets revealed that PLANTS and LeDock outperformed NLDock in terms of binding pose prediction on all data sets, demonstrating their significant potential for NA-ligand docking. To the best of our knowledge, this study is the most comprehensive evaluation of popular molecular docking programs for NA-ligand systems.
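
The evaluation metrics named in this abstract (top-1 and best RMSD success rates, and a Pearson correlation Rp) can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not the authors' evaluation code: poses are assumed to have matched atom ordering, no symmetry correction is applied, and the 2.0 Å success cutoff is a common convention that the abstract does not state. All function names are hypothetical.

```python
import math


def rmsd(pose_a, pose_b):
    """RMSD between two poses given as equal-length lists of (x, y, z) tuples."""
    if len(pose_a) != len(pose_b):
        raise ValueError("poses must have the same number of atoms")
    sq = sum((a - b) ** 2
             for atom_a, atom_b in zip(pose_a, pose_b)
             for a, b in zip(atom_a, atom_b))
    return math.sqrt(sq / len(pose_a))


def success_rates(ranked_rmsds, cutoff=2.0):
    """Return (top-1, best) success rates.

    ranked_rmsds holds, per complex, the RMSDs of the predicted poses to the
    reference pose, ordered by docking score (best-scored pose first).
    The cutoff of 2.0 is an assumed threshold in angstroms.
    """
    n = len(ranked_rmsds)
    top1 = sum(1 for r in ranked_rmsds if r[0] <= cutoff) / n
    best = sum(1 for r in ranked_rmsds if min(r) <= cutoff) / n
    return top1, best


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Made-up RMSD values for four complexes, purely for illustration.
    toy = [[1.0, 3.0], [3.0, 1.5], [4.0, 5.0], [0.5, 0.4]]
    print(success_rates(toy))
```

The top-1 rate only credits a complex when the best-scored pose is within the cutoff, while the best rate credits it when any generated pose is, which is why the second number in each reported pair is the larger one.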


Subjects
Drug Discovery , Nucleic Acids , Ligands , Molecular Docking Simulation