1.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-38960407

ABSTRACT

The optimization of therapeutic antibodies through traditional techniques, such as candidate screening via hybridoma or phage display, is resource-intensive and time-consuming. In recent years, computational and artificial intelligence-based methods have been actively developed to accelerate and improve the development of therapeutic antibodies. In this study, we developed an end-to-end sequence-based deep learning model, termed AttABseq, for the predictions of the antigen-antibody binding affinity changes connected with antibody mutations. AttABseq is a highly efficient and generic attention-based model by utilizing diverse antigen-antibody complex sequences as the input to predict the binding affinity changes of residue mutations. The assessment on the three benchmark datasets illustrates that AttABseq is 120% more accurate than other sequence-based models in terms of the Pearson correlation coefficient between the predicted and experimental binding affinity changes. Moreover, AttABseq also either outperforms or competes favorably with the structure-based approaches. Furthermore, AttABseq consistently demonstrates robust predictive capabilities across a diverse array of conditions, underscoring its remarkable capacity for generalization across a wide spectrum of antigen-antibody complexes. It imposes no constraints on the quantity of altered residues, rendering it particularly applicable in scenarios where crystallographic structures remain unavailable. The attention-based interpretability analysis indicates that the causal effects of point mutations on antibody-antigen binding affinity changes can be visualized at the residue level, which might assist automated antibody sequence optimization. We believe that AttABseq provides a fiercely competitive answer to therapeutic antibody optimization.


Subject(s)
Antigen-Antibody Complex, Deep Learning, Antigen-Antibody Complex/chemistry, Antigens/chemistry, Antigens/genetics, Antigens/metabolism, Antigens/immunology, Antibody Affinity, Amino Acid Sequence, Computational Biology/methods, Humans, Mutation, Antibodies/chemistry, Antibodies/immunology, Antibodies/genetics, Antibodies/metabolism
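
The AttABseq abstract above reports accuracy as the Pearson correlation coefficient between predicted and experimental binding affinity changes. As a minimal sketch of how that metric is computed (the function name and the numbers are illustrative assumptions, not data or code from the paper):

```python
import math


def pearson_r(predicted, experimental):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(predicted) != len(experimental) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_e = sum(experimental) / n
    # Covariance and variances, left unnormalised because the 1/n factors cancel.
    cov = sum((p - mean_p) * (e - mean_e) for p, e in zip(predicted, experimental))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_e = sum((e - mean_e) ** 2 for e in experimental)
    if var_p == 0 or var_e == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_p * var_e)


# Hypothetical binding affinity changes in kcal/mol, for illustration only.
predicted_ddg = [0.5, 1.2, -0.3, 2.1, 0.0]
experimental_ddg = [0.7, 1.0, -0.5, 1.8, 0.2]
print(round(pearson_r(predicted_ddg, experimental_ddg), 3))
```

A value near 1 means the predicted changes rank and scale like the measured ones; the coefficient says nothing about systematic offsets, so it is usually read alongside an error metric.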
2.
Brief Bioinform ; 23(2)2022 03 10.
Article in English | MEDLINE | ID: mdl-35062020

ABSTRACT

Accurate prediction of atomic partial charges with high-level quantum mechanics (QM) methods suffers from high computational cost. Numerous feature-engineered machine learning (ML)-based predictors with favorable computability and reliability have been developed as alternatives. However, extensive expertise effort was needed for feature engineering of atom chemical environment, which may consequently introduce domain bias. In this study, SuperAtomicCharge, a data-driven deep graph learning framework, was proposed to predict three important types of partial charges (i.e. RESP, DDEC4 and DDEC78) derived from high-level QM calculations based on the structures of molecules. SuperAtomicCharge was designed to simultaneously exploit the 2D and 3D structural information of molecules, which was proved to be an effective way to improve the prediction accuracy of the model. Moreover, a simple transfer learning strategy and a multitask learning strategy based on self-supervised descriptors were also employed to further improve the prediction accuracy of the proposed model. Compared with the latest baselines, including one GNN-based predictor and two ML-based predictors, SuperAtomicCharge showed better performance on all the three external test sets and had better usability and portability. Furthermore, the QM partial charges of new molecules predicted by SuperAtomicCharge can be efficiently used in drug design applications such as structure-based virtual screening, where the predicted RESP and DDEC4 charges of new molecules showed more robust scoring and screening power than the commonly used partial charges. Finally, two tools including an online server (http://cadd.zju.edu.cn/deepchargepredictor) and the source code command lines (https://github.com/zjujdj/SuperAtomicCharge) were developed for the easy access of the SuperAtomicCharge services.


Subject(s)
Deep Learning, Drug Design, Machine Learning, Reproducibility of Results, Software
3.
Brief Bioinform ; 23(3)2022 05 13.
Article in English | MEDLINE | ID: mdl-35438145

ABSTRACT

Molecular property prediction models based on machine learning algorithms have become important tools to triage unpromising lead molecules in the early stages of drug discovery. Compared with the mainstream descriptor- and graph-based methods for molecular property predictions, SMILES-based methods can directly extract molecular features from SMILES without human expert knowledge, but they require more powerful algorithms for feature extraction and a larger amount of data for training, which makes SMILES-based methods less popular. Here, we show the great potential of pre-training in promoting the predictions of important pharmaceutical properties. By utilizing three pre-training tasks based on atom feature prediction, molecular feature prediction and contrastive learning, a new pre-training method K-BERT, which can extract chemical information from SMILES like chemists, was developed. The calculation results on 15 pharmaceutical datasets show that K-BERT outperforms well-established descriptor-based (XGBoost) and graph-based (Attentive FP and HRGCN+) models. In addition, we found that the contrastive learning pre-training task enables K-BERT to 'understand' SMILES not limited to canonical SMILES. Moreover, the general fingerprints K-BERT-FP generated by K-BERT exhibit comparative predictive power to MACCS on 15 pharmaceutical datasets and can also capture molecular size and chirality information that traditional binary fingerprints cannot capture. Our results illustrate the great potential of K-BERT in the practical applications of molecular property predictions in drug discovery.


Subject(s)
Algorithms, Machine Learning, Humans, Knowledge Bases, Pharmaceutical Preparations, Research Design
4.
J Chem Inf Model ; 64(13): 5016-5027, 2024 Jul 08.
Article in English | MEDLINE | ID: mdl-38920330

ABSTRACT

The intricate interaction between major histocompatibility complexes (MHCs) and antigen peptides with diverse amino acid sequences plays a pivotal role in immune responses and T cell activity. In recent years, deep learning (DL)-based models have emerged as promising tools for accelerating antigen peptide screening. However, most of these models solely rely on one-dimensional amino acid sequences, overlooking crucial information required for the three-dimensional (3-D) space binding process. In this study, we propose TransfIGN, a structure-based DL model that is inspired by our previously developed framework, Interaction Graph Network (IGN), and incorporates sequence information from transformers to predict the interactions between HLA-A*02:01 and antigen peptides. Our model, trained on a comprehensive data set containing 61,816 sequences with 9051 binding affinity labels and 56,848 eluted ligand labels, achieves an area under the curve (AUC) of 0.893 on the binary data set, better than state-of-the-art sequence-based models trained on larger data sets such as NetMHCpan4.1, ANN, and TransPHLA. Furthermore, when evaluated on the IEDB weekly benchmark data sets, our predictions (AUC = 0.816) are better than those of the recommended methods like the IEDB consensus (AUC = 0.795). Notably, the interaction weight matrices generated by our method highlight the strong interactions at specific positions within peptides, emphasizing the model's ability to provide physical interpretability. This capability to unveil binding mechanisms through intricate structural features holds promise for new immunotherapeutic avenues.


Subject(s)
Deep Learning, HLA-A2 Antigen, Peptides, HLA-A2 Antigen/chemistry, HLA-A2 Antigen/metabolism, Peptides/chemistry, Peptides/metabolism, Humans, Protein Binding, Molecular Models, Amino Acid Sequence, Protein Conformation
5.
J Chem Inf Model ; 64(6): 2112-2124, 2024 03 25.
Article in English | MEDLINE | ID: mdl-38483249

ABSTRACT

Cyclic peptides have emerged as a highly promising class of therapeutic molecules owing to their favorable pharmacokinetic properties, including stability and permeability. Currently, many clinically approved cyclic peptides are derived from natural products or their derivatives, and the development of molecular docking techniques for cyclic peptide discovery holds great promise for expanding the applications and potential of this class of molecules. Given the availability of numerous docking programs, there is a pressing need for a systematic evaluation of their performance, specifically on protein-cyclic peptide systems. In this study, we constructed an extensive benchmark data set called CPSet, consisting of 493 protein-cyclic peptide complexes. Based on this data set, we conducted a comprehensive evaluation of 10 docking programs, including Rosetta, AutoDock CrankPep, and eight protein-small molecule docking programs (i.e., AutoDock, AutoDock Vina, Glide, GOLD, LeDock, rDock, MOE, and Surflex). The evaluation encompassed the assessment of the sampling power, docking power, and scoring power of these programs. The results revealed that all of the tested protein-small molecule docking programs successfully sampled the binding conformations when using the crystal conformations as the initial structures. Among them, rDock exhibited outstanding performance, achieving a remarkable 94.3% top-100 sampling success rate. However, few programs achieved successful predictions of the binding conformations using tLEaP-generated conformations as the initial structures. Within this scheme, AutoDock CrankPep yielded the highest top-100 sampling success rate of 29.6%. Rosetta's scoring function outperformed the others in selecting optimal conformations, resulting in an impressive top-1 docking success rate of 87.6%.
Nevertheless, all the tested scoring functions displayed limited performance in predicting binding affinity, with MOE@Affinity dG exhibiting the highest Pearson's correlation coefficient of 0.378. It is therefore suggested to use an appropriate combination of different docking programs for given tasks in real applications. We expect that this work will offer valuable insights into selecting the appropriate docking programs for protein-cyclic peptide complexes.


Subject(s)
Cyclic Peptides, Proteins, Cyclic Peptides/metabolism, Molecular Docking Simulation, Protein Binding, Proteins/chemistry, Molecular Conformation, Ligands
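
The CPSet abstract above reports top-100 sampling and top-1 docking success rates. A minimal sketch of a top-N success rate follows, assuming each complex comes with a ranked list of pose RMSDs and that a pose counts as successful at or below 2.0 Å; the abstract does not state the cutoff, so the threshold, the function name and the data are all assumptions:

```python
def top_n_success_rate(rmsds_per_complex, n, threshold=2.0):
    """Fraction of complexes with a pose at or below `threshold` among the first n.

    rmsds_per_complex: one list per complex, holding pose RMSDs (in Å) in the
    order the docking program ranked them. The 2.0 Å default is an assumed,
    commonly used cutoff, not a value taken from the abstract.
    """
    if not rmsds_per_complex:
        raise ValueError("no complexes given")
    hits = sum(
        1
        for rmsds in rmsds_per_complex
        if any(r <= threshold for r in rmsds[:n])
    )
    return hits / len(rmsds_per_complex)


# Hypothetical ranked pose RMSDs for four complexes, for illustration only.
poses = [
    [3.1, 1.4, 5.0],
    [0.9, 2.5, 4.2],
    [6.3, 4.8, 3.9],
    [2.7, 2.2, 1.8],
]
print(top_n_success_rate(poses, n=1))  # 0.25: only the second complex succeeds
print(top_n_success_rate(poses, n=3))  # 0.75: the third complex never succeeds
```

Under this reading, the gap between a high top-100 rate and a lower top-1 rate separates sampling (a good pose exists somewhere in the list) from scoring (the good pose is ranked first).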
6.
J Chem Inf Model ; 64(8): 3222-3236, 2024 04 22.
Article in English | MEDLINE | ID: mdl-38498003

ABSTRACT

Liver microsomal stability, a crucial aspect of metabolic stability, significantly impacts practical drug discovery. However, current models for predicting liver microsomal stability are based on limited molecular information from a single species. To address this limitation, we constructed the largest public database of compounds from three common species: human, rat, and mouse. Subsequently, we developed a series of classification models using both traditional descriptor-based and classic graph-based machine learning (ML) algorithms. Remarkably, the best-performing models for the three species achieved Matthews correlation coefficients (MCCs) of 0.616, 0.603, and 0.574, respectively, on the test set. Furthermore, through the construction of consensus models based on these individual models, we have demonstrated their superior predictive performance in comparison with the existing models of the same type. To explore the similarities and differences in the properties of liver microsomal stability among multispecies molecules, we conducted preliminary interpretative explorations using the Shapley additive explanations (SHAP) and atom heatmap approaches for the models and misclassified molecules. Additionally, we further investigated representative structural modifications and substructures that decrease the liver microsomal stability in different species using the matched molecule pair analysis (MMPA) method and substructure extraction techniques. The established prediction models, along with insightful interpretation information regarding liver microsomal stability, will significantly contribute to enhancing the efficiency of exploring practical drugs for development.


Subject(s)
Artificial Intelligence, Liver Microsomes, Liver Microsomes/metabolism, Animals, Mice, Rats, Humans, Machine Learning, Drug Discovery/methods, Pharmaceutical Preparations/metabolism, Pharmaceutical Preparations/chemistry
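
The liver microsomal stability abstract above scores its classifiers by the Matthews correlation coefficient (MCC). A minimal sketch of the binary MCC computed from the confusion matrix is given here; the function name and labels are illustrative assumptions, not the authors' code or data:

```python
import math


def matthews_corrcoef(y_true, y_pred):
    """Matthews correlation coefficient for binary labels encoded as 0 and 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Convention: report 0 when a row or column of the confusion matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical stable (1) / unstable (0) labels, for illustration only.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
print(round(matthews_corrcoef(y_true, y_pred), 3))  # 0.333
```

MCC ranges from -1 to 1 and uses all four confusion-matrix cells, which is why it is often preferred over accuracy when the classes are imbalanced.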
7.
J Chem Inf Model ; 64(14): 5381-5391, 2024 Jul 22.
Article in English | MEDLINE | ID: mdl-38920405

ABSTRACT

Artificial intelligence (AI)-aided drug design has demonstrated unprecedented effects on modern drug discovery, but there is still an urgent need for user-friendly interfaces that bridge the gap between these sophisticated tools and scientists, particularly those who are less computer savvy. Herein, we present DrugFlow, an AI-driven one-stop platform that offers a clean, convenient, and cloud-based interface to streamline early drug discovery workflows. By seamlessly integrating a range of innovative AI algorithms, covering molecular docking, quantitative structure-activity relationship modeling, molecular generation, ADMET (absorption, distribution, metabolism, excretion and toxicity) prediction, and virtual screening, DrugFlow can offer effective AI solutions for almost all crucial stages in early drug discovery, including hit identification and hit/lead optimization. We hope that the platform can provide sufficiently valuable guidance to aid real-world drug design and discovery. The platform is available at https://drugflow.com.


Subject(s)
Artificial Intelligence, Drug Discovery, Drug Discovery/methods, Molecular Docking Simulation, Quantitative Structure-Activity Relationship, Algorithms, Drug Design, Software, Humans, Cloud Computing
8.
Phys Chem Chem Phys ; 26(13): 10323-10335, 2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38501198

ABSTRACT

Ribonucleic acid (RNA)-ligand interactions play a pivotal role in a wide spectrum of biological processes, ranging from protein biosynthesis to cellular reproduction. This recognition has prompted the broader acceptance of RNA as a viable candidate for drug targets. Delving into the atomic-scale understanding of RNA-ligand interactions holds paramount importance in unraveling intricate molecular mechanisms and further contributing to RNA-based drug discovery. Computational approaches, particularly molecular docking, offer an efficient way of predicting the interactions between RNA and small molecules. However, the accuracy and reliability of these predictions heavily depend on the performance of scoring functions (SFs). In contrast to the majority of SFs used in RNA-ligand docking, the end-point binding free energy calculation methods, such as molecular mechanics/generalized Born surface area (MM/GBSA) and molecular mechanics/Poisson Boltzmann surface area (MM/PBSA), stand as theoretically more rigorous approaches. Yet, the evaluation of their effectiveness in predicting both binding affinities and binding poses within RNA-ligand systems remains unexplored. This study first reported the performance of MM/PBSA and MM/GBSA with diverse solvation models, interior dielectric constants (εin) and force fields in the context of binding affinity prediction for 29 RNA-ligand complexes. MM/GBSA is based on short (5 ns) molecular dynamics (MD) simulations in an explicit solvent with the YIL force field; the GBGBn2 model with higher interior dielectric constant (εin = 12, 16 or 20) yields the best correlation (Rp = -0.513), which outperforms the best correlation (Rp = -0.317, rDock) offered by various docking programs. Then, the efficacy of MM/GBSA in identifying the near-native binding poses from the decoys was assessed based on 56 RNA-ligand complexes. 
However, it is evident that MM/GBSA has limitations in accurately predicting binding poses for RNA-ligand systems, particularly compared with notably proficient docking programs like rDock and PLANTS. The best top-1 success rate achieved by MM/GBSA rescoring is 39.3%, which falls below the best results given by docking programs (50%, PLANTS). This study represents the first evaluation of MM/PBSA and MM/GBSA for RNA-ligand systems and is expected to provide valuable insights into their successful application to RNA targets.


Subject(s)
Molecular Dynamics Simulation, RNA, Molecular Docking Simulation, Ligands, Reproducibility of Results, Protein Binding, Thermodynamics, Binding Sites
9.
Brief Bioinform ; 22(5)2021 09 02.
Article in English | MEDLINE | ID: mdl-33866354

ABSTRACT

Accurate predictions of druggability and bioactivities of compounds are desirable to reduce the high cost and time of drug discovery. After more than five decades of continuing developments, quantitative structure-activity relationship (QSAR) methods have been established as indispensable tools that facilitate fast, reliable and affordable assessments of physicochemical and biological properties of compounds in drug-discovery programs. Currently, there are mainly two types of QSAR methods, descriptor-based methods and graph-based methods. The former is developed based on predefined molecular descriptors, whereas the latter is developed based on simple atomic and bond information. In this study, we presented a simple but highly efficient modeling method by combining molecular graphs and molecular descriptors as the input of a modified graph neural network, called hyperbolic relational graph convolution network plus (HRGCN+). The evaluation results show that HRGCN+ achieves state-of-the-art performance on 11 drug-discovery-related datasets. We also explored the impact of the addition of traditional molecular descriptors on the predictions of graph-based methods, and found that the addition of molecular descriptors can indeed boost the predictive power of graph-based methods. The results also highlight the strong anti-noise capability of our method. In addition, our method provides a way to interpret models at both the atom and descriptor levels, which can help medicinal chemists extract hidden information from complex datasets. We also offer an online prediction service for HRGCN+ at https://quantum.tencent.com/hrgcn/.


Subject(s)
Algorithms, Computational Biology/methods, Drug Discovery/methods, Neural Networks (Computer), Organic Chemicals/chemistry, Quantitative Structure-Activity Relationship, Artificial Intelligence, Computer Graphics, Computer Simulation, Drug Design, Chemical Models, Molecular Structure, Organic Chemicals/pharmacology
10.
Brief Bioinform ; 22(5)2021 09 02.
Article in English | MEDLINE | ID: mdl-33822874

ABSTRACT

Tuberculosis (TB) is an infectious disease caused by Mycobacterium tuberculosis (Mtb) and it has been one of the top 10 causes of death globally. Drug-resistant tuberculosis (XDR-TB), extensively resistant to the commonly used first-line drugs, has emerged as a major challenge to TB treatment. Hence, it is quite necessary to discover novel drug candidates for TB treatment. In this study, based on different types of molecular representations, four machine learning (ML) algorithms, including support vector machine, random forest (RF), extreme gradient boosting (XGBoost) and deep neural networks (DNN), were used to develop classification models to distinguish Mtb inhibitors from noninhibitors. The results demonstrate that the XGBoost model exhibits the best prediction performance. Then, two consensus strategies were employed to integrate the predictions from multiple models. The evaluation results illustrate that the consensus model by stacking the RF, XGBoost and DNN predictions offers the best predictions with area under the receiver operating characteristic curve of 0.842 and 0.942 for the 10-fold cross-validated training set and external test set, respectively. Besides, the association between the important descriptors and the bioactivities of molecules was interpreted by using the Shapley additive explanations method. Finally, an online webserver called ChemTB (http://cadd.zju.edu.cn/chemtb/) was developed, and it offers a freely available computational tool to detect potential Mtb inhibitors.


Subject(s)
Antitubercular Agents/analysis, Antitubercular Agents/pharmacology, Drug Discovery/methods, Mycobacterium tuberculosis/drug effects, Neural Networks (Computer), Support Vector Machine, Antitubercular Agents/therapeutic use, Area Under Curve, Data Accuracy, Humans, Biological Models, ROC Curve, Reproducibility of Results, Multidrug-Resistant Tuberculosis/drug therapy, Multidrug-Resistant Tuberculosis/microbiology
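
The ChemTB abstract above summarises its consensus model by the area under the receiver operating characteristic curve. As a minimal sketch, that area can be computed as the probability that a randomly chosen inhibitor is scored above a randomly chosen noninhibitor, with ties counting one half; the function name, labels and scores below are illustrative assumptions, not results from the paper:

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0  # positive correctly ranked above negative
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical inhibitor (1) / noninhibitor (0) labels and model scores.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
print(roc_auc(labels, scores))  # 0.75
```

The pairwise form is quadratic in the number of examples, which is fine for a small illustration; library implementations typically sort the scores instead.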
11.
Brief Bioinform ; 22(4)2021 07 20.
Article in English | MEDLINE | ID: mdl-33313673

ABSTRACT

Although a wide variety of machine learning (ML) algorithms have been utilized to learn quantitative structure-activity relationships (QSARs), there is no agreed single best algorithm for QSAR learning. Therefore, a comprehensive understanding of the performance characteristics of popular ML algorithms used in QSAR learning is highly desirable. In this study, five linear algorithms [linear function Gaussian process regression (linear-GPR), linear function support vector machine (linear-SVM), partial least squares regression (PLSR), multiple linear regression (MLR) and principal component regression (PCR)], three analogizers [radial basis function support vector machine (rbf-SVM), K-nearest neighbor (KNN) and radial basis function Gaussian process regression (rbf-GPR)], six symbolists [extreme gradient boosting (XGBoost), Cubist, random forest (RF), multiple adaptive regression splines (MARS), gradient boosting machine (GBM), and classification and regression tree (CART)] and two connectionists [principal component analysis artificial neural network (pca-ANN) and deep neural network (DNN)] were employed to learn the regression-based QSAR models for 14 public data sets comprising nine physicochemical properties and five toxicity endpoints. The results show that rbf-SVM, rbf-GPR, XGBoost and DNN generally illustrate better performances than the other algorithms. The overall performances of different algorithms can be ranked from the best to the worst as follows: rbf-SVM > XGBoost > rbf-GPR > Cubist > GBM > DNN > RF > pca-ANN > MARS > linear-GPR ≈ KNN > linear-SVM ≈ PLSR > CART ≈ PCR ≈ MLR. In terms of prediction accuracy and computational efficiency, SVM and XGBoost are recommended to the regression learning for small data sets, and XGBoost is an excellent choice for large data sets. We then investigated the performances of the ensemble models by integrating the predictions of multiple ML algorithms. 
The results illustrate that the ensembles of two or three algorithms in different categories can indeed improve the predictions of the best individual ML algorithms.


Subject(s)
Biological Models, Neural Networks (Computer), Support Vector Machine, Animals, Cyprinidae, Daphnia, Tetrahymena pyriformis
12.
Sensors (Basel) ; 23(7)2023 Mar 23.
Article in English | MEDLINE | ID: mdl-37050447

ABSTRACT

The Dadu River flows through the mountainous areas of southwestern China, one of the most hazard-prone regions, which has long suffered from frequent geohazards. The early identification of landslides in this region is urgently needed, especially after the recent Luding earthquake (MS 6.8). While conventional ground-based monitoring techniques are limited by the complex terrain conditions in these alpine valley regions, space interferometric synthetic aperture radar (InSAR) provides an incomparable advantage in obtaining surface deformation with high precision and over a wide area, which is very useful for long-term and slow geohazard monitoring. In this study, more than 500 Sentinel-1 SAR images with four frames acquired during 2017-2022 were collected to detect the hidden landslide regions from the Jinchuan to Ebian Section along the Dadu River, based on joint-scatterer InSAR (JS-InSAR) and small baseline subset (SBAS) techniques. The results showed that our method could be successfully applied for landslide monitoring in complex mountainous regions. Furthermore, 143 potential landslide regions spreading over an 800 km area along the Dadu River were extracted by integrating the deformation measurements and optical images. Our study can provide a reference for large-scale geological hazard surveys in mountainous areas, and the InSAR technique will be encouraged for the local government in future long-term monitoring applications in the Dadu River Basin.

13.
Bioinformatics ; 37(22): 4255-4257, 2021 11 18.
Article in English | MEDLINE | ID: mdl-34009308

ABSTRACT

SUMMARY: High-level quantum mechanics (QM) methods are no doubt the most reliable approaches for the prediction of atomic charges, but it usually needs very large computational resources, which apparently hinders the use of high-quality atomic charges in large-scale molecular modeling, such as high-throughput virtual screening. To solve this problem, several algorithms based on machine-learning (ML) have been developed to fit high-level QM atomic charges. Here, we proposed DeepChargePredictor, a web server that is able to generate the high-level QM atomic charges for small molecules based on two state-of-the-art ML algorithms developed in our group, namely AtomPathDescriptor and DeepAtomicCharge. These two algorithms were seamlessly integrated into the platform with the capability to predict three kinds of charges (i.e. RESP, AM1-BCC and DDEC) widely used in structure-based drug design. Moreover, we have comprehensively evaluated the performance of these charges generated by DeepChargePredictor for large-scale drug design applications, such as end-point binding free energy calculations and virtual screening, which all show reliable or even better performance compared with the baseline methods. AVAILABILITY AND IMPLEMENTATION: The data in the article can be obtained on the web page http://cadd.zju.edu.cn/deepchargepredictor/publication. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms, Computers, Molecular Models, Physics, Machine Learning
14.
Acta Pharmacol Sin ; 43(6): 1605-1615, 2022 Jun.
Article in English | MEDLINE | ID: mdl-34667293

ABSTRACT

Decaprenylphosphoryl-β-D-ribose oxidase (DprE1) plays important roles in the biosynthesis of the mycobacterial cell wall. DprE1 inhibitors have shown great potential in the development of new regimens for tuberculosis (TB) treatment. In this study, an integrated molecular modeling strategy, which combined computational bioactivity fingerprints and structure-based virtual screening, was employed to identify potential DprE1 inhibitors. Two lead compounds (B2 and H3) that could inhibit DprE1 and thus kill Mycobacterium smegmatis in vitro were identified. Moreover, compound H3 showed potent inhibitory activity against Mycobacterium tuberculosis in vitro (MIC_Mtb = 1.25 µM) and low cytotoxicity against mouse embryo fibroblast NIH-3T3 cells. Our research provided an effective strategy to discover novel anti-TB lead compounds.


Subject(s)
Antitubercular Agents, Mycobacterium tuberculosis, Animals, Antitubercular Agents/pharmacology, Antitubercular Agents/therapeutic use, Bacterial Proteins, Mice, Molecular Models
15.
J Chem Inf Model ; 61(6): 2844-2856, 2021 06 28.
Article in English | MEDLINE | ID: mdl-34014672

ABSTRACT

The molecular mechanics/generalized Born surface area (MM/GBSA) has been widely used in end-point binding free energy prediction in structure-based drug design (SBDD). However, in practice, it is usually being treated as a disputed method mostly because of its system dependence. Here, combining with machine-learning optimization, we developed a novel version of MM/GBSA, named variable atomic dielectric MM/GBSA (VAD-MM/GBSA), by assigning variable dielectric constants directly to the protein/ligand atoms. The new strategy exhibits markedly improved accuracy in binding affinity calculations for various protein-ligand systems and is promising to be used in the postprocessing of structure-based virtual screening. Moreover, VAD-MM/GBSA outperformed prime MM/GBSA in Schrödinger software and showed remarkable predictive performance for specific protein targets, such as POL polyprotein, human immunodeficiency virus type 1 (HIV-1) protease, etc. Our study showed that the VAD-MM/GBSA method with little extra computational overhead provides a potential replacement of the MM/GBSA in AMBER software. An online web server of VAD-MMGBSA has been developed and is now available at http://cadd.zju.edu.cn/vdgb.


Subject(s)
Molecular Dynamics Simulation, Proteins, Entropy, Humans, Ligands, Protein Binding, Proteins/metabolism, Thermodynamics
16.
Chem Sci ; 2024 Aug 19.
Article in English | MEDLINE | ID: mdl-39170720

ABSTRACT

The identification of targets for candidate molecules is a pivotal stride in the drug development journey, encompassing lead discovery, drug repurposing, and the scrutiny of potential off-target or side effects. Consequently, enhancing the precision of target prediction has significant implications. Moreover, current target prediction methods primarily rely on the principle of ligand-based chemical similarity, lacking the capture of novel compound-target relationships based on ligand high-level characterization similarity. Therefore, in this context, we introduce a pioneering algorithm known as the Fused Multiple Biological Signatures (FMBS) strategy. This approach leverages a Bayesian framework to amalgamate 25 predictable biological space characterizations of molecules to predict novel targets through scaffold hopping, thereby improving target prediction accuracy and providing a versatile tool for a wide range of small-molecule target prediction. When juxtaposed with alternative target prediction methods, FMBS showcases notable efficacy, outperforming traditional descriptors. Through an analysis of scaffold hopping cases, we elucidate how FMBS attains heightened accuracy by assimilating comprehensive and complementary high-dimensional signatures, thereby underscoring its potential in unearthing novel compound-target relationships. The findings underscore that our approach adeptly pinpoints promising candidate targets, thereby expediting drug mechanism exploration through the integration of multiple high-level characterizations.

17.
Nat Commun ; 15(1): 7348, 2024 Aug 27.
Article in English | MEDLINE | ID: mdl-39187482

ABSTRACT

Annotating active sites in enzymes is crucial for advancing multiple fields including drug discovery, disease research, enzyme engineering, and synthetic biology. Despite the development of numerous automated annotation algorithms, a significant trade-off between speed and accuracy limits their large-scale practical applications. We introduce EasIFA, an enzyme active site annotation algorithm that fuses latent enzyme representations from the Protein Language Model and 3D structural encoder, and then aligns protein-level information with the knowledge of enzymatic reactions using a multi-modal cross-attention framework. EasIFA outperforms BLASTp with a 10-fold speed increase and improved recall, precision, f1 score, and MCC by 7.57%, 13.08%, 9.68%, and 0.1012, respectively. It also surpasses empirical-rule-based algorithm and other state-of-the-art deep learning annotation method based on PSSM features, achieving a speed increase ranging from 650 to 1400 times while enhancing annotation quality. This makes EasIFA a suitable replacement for conventional tools in both industrial and academic settings. EasIFA can also effectively transfer knowledge gained from coarsely annotated enzyme databases to smaller, high-precision datasets, highlighting its ability to model sparse and high-quality databases. Additionally, EasIFA shows potential as a catalytic site monitoring tool for designing enzymes with desired functions beyond their natural distribution.


Subject(s)
Algorithms , Catalytic Domain , Deep Learning , Enzymes , Enzymes/metabolism , Enzymes/chemistry , Databases, Protein , Molecular Sequence Annotation/methods , Computational Biology/methods

18.
J Cheminform ; 15(1): 63, 2023 Jul 04.
Article in English | MEDLINE | ID: mdl-37403155

ABSTRACT

Machine learning-based scoring functions (MLSFs) have shown potential for improving virtual screening capabilities over classical scoring functions (SFs). Because feature generation is computationally expensive, the number of descriptors used in MLSFs and the characterization of protein-ligand interactions are always limited, which may affect overall accuracy and efficiency. Here, we propose a new SF called TB-IECS (theory-based interaction energy component score), which combines energy terms from Smina and NNScore version 2 and uses the eXtreme Gradient Boosting (XGBoost) algorithm for model training. In this study, the energy terms decomposed from 15 traditional SFs were first categorized based on their formulas and physicochemical principles, and 324 feature combinations were generated accordingly. The five best feature combinations were selected for further evaluation of model performance with regard to feature vectors of various lengths, interaction types, and ML algorithms. The virtual screening power of TB-IECS was assessed on the DUD-E and LIT-PCBA datasets, as well as on seven target-specific datasets from the ChemDiv database. The results showed that TB-IECS outperformed classical SFs including Glide SP and Dock, and effectively balanced efficiency and accuracy for practical virtual screening.

19.
Nat Commun ; 14(1): 2585, 2023 May 04.
Article in English | MEDLINE | ID: mdl-37142585

ABSTRACT

Graph neural networks (GNNs) have been widely used in molecular property prediction, but explaining their black-box predictions is still a challenge. Most existing explanation methods for GNNs in chemistry focus on attributing model predictions to individual nodes, edges, or fragments that are not necessarily derived from a chemically meaningful segmentation of molecules. To address this challenge, we propose a method named substructure mask explanation (SME). SME is based on well-established molecular segmentation methods and provides an interpretation that aligns with the understanding of chemists. We apply SME to elucidate how GNNs learn to predict aqueous solubility, genotoxicity, cardiotoxicity, and blood-brain barrier permeation for small molecules. SME provides an interpretation that is consistent with the understanding of chemists, alerts them to unreliable performance, and guides them in structural optimization for target properties. Hence, we believe that SME empowers chemists to confidently mine structure-activity relationships (SAR) from reliable GNNs through a transparent inspection of how GNNs pick up useful signals when learning from data.
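The abstract does not give SME's attribution formula. One plausible reading of a mask-based explanation is a mask-and-compare (occlusion-style) attribution: predict on the whole molecule, predict again with one substructure masked, and assign the difference to that substructure. The sketch below illustrates only this idea. The toy model, its weights, and the fragment names are invented stand-ins for a trained GNN and a real segmentation scheme.

```python
def mask_attributions(fragments, predict):
    """Attribute a prediction to each fragment by masking it in turn.

    `predict` is any callable that maps a frozenset of present fragments
    to a number; here it stands in for a trained property model.
    """
    present = frozenset(fragments)
    full_prediction = predict(present)
    return {
        fragment: full_prediction - predict(present - {fragment})
        for fragment in fragments
    }


def toy_solubility_model(present):
    """Hypothetical scorer: per-fragment weights plus one interaction term."""
    weights = {"hydroxyl": 1.0, "phenyl": -2.0, "carboxyl": 1.5}
    score = sum(weights.get(fragment, 0.0) for fragment in present)
    # Interaction term: masking either partner also removes this bonus.
    if "hydroxyl" in present and "carboxyl" in present:
        score += 0.5
    return score


if __name__ == "__main__":
    molecule = ["hydroxyl", "phenyl", "carboxyl"]
    for fragment, value in mask_attributions(molecule, toy_solubility_model).items():
        print(f"{fragment}: {value:+.2f}")
```

Because the masked units are whole fragments rather than single atoms or bonds, the resulting scores map directly onto groups a chemist might consider replacing.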


Subject(s)
Blood-Brain Barrier , Cardiotoxicity , Humans , DNA Damage , Neural Networks, Computer , Records

20.
Chem Sci ; 14(6): 1557-1568, 2023 Feb 08.
Article in English | MEDLINE | ID: mdl-36794194

ABSTRACT

Generation of representative conformations for small molecules is a fundamental task in cheminformatics and computer-aided drug discovery, but capturing the complex distribution of conformations that contains multiple low-energy minima is still a great challenge. Deep generative modeling, which aims to learn complex data distributions, is a promising approach to the conformation generation problem. Here, inspired by stochastic dynamics and recent advances in generative modeling, we developed SDEGen, a novel conformation generation model based on stochastic differential equations. Compared with existing conformation generation methods, it offers the following advantages: (1) high model capacity to capture multimodal conformation distributions, allowing it to find multiple low-energy conformations of a molecule quickly, (2) higher conformation generation efficiency, almost ten times faster than the state-of-the-art score-based model, ConfGF, and (3) a clear physical interpretation of how a molecule evolves in a stochastic dynamics system, starting from noise and eventually relaxing to a conformation in a low-energy minimum. Extensive experiments demonstrate that SDEGen surpasses existing methods in different tasks for conformation generation, interatomic distance distribution prediction, and thermodynamic property estimation, showing great potential for real-world applications.
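The physical picture in point (3), a system that starts from noise and relaxes into low-energy minima under stochastic dynamics, can be illustrated with a one-dimensional toy problem. The sketch below integrates overdamped Langevin dynamics, dx = -U'(x) dt + sqrt(2T) dW, with the Euler-Maruyama scheme on a double-well energy U(x) = (x^2 - 1)^2. It is not SDEGen's learned model: the energy function, the temperature, the step size, and all names are assumptions chosen for illustration.

```python
import math
import random


def grad_energy(x):
    """Derivative of the toy double-well energy U(x) = (x**2 - 1)**2."""
    return 4.0 * x * (x * x - 1.0)


def euler_maruyama_sample(n_samples=200, n_steps=2000, dt=0.01,
                          temperature=0.05, seed=0):
    """Relax Gaussian noise toward the minima of U with Euler-Maruyama steps."""
    rng = random.Random(seed)
    # Standard deviation of the Brownian increment scaled by sqrt(2T).
    noise_scale = math.sqrt(2.0 * temperature * dt)
    # Start every trajectory from pure noise.
    samples = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    for _ in range(n_steps):
        samples = [
            x - grad_energy(x) * dt + noise_scale * rng.gauss(0.0, 1.0)
            for x in samples
        ]
    return samples


if __name__ == "__main__":
    final = euler_maruyama_sample()
    left = sum(1 for x in final if x < 0.0)
    print(f"{left} samples near x = -1, {len(final) - left} near x = +1")
```

The toy energy has two minima, at x = -1 and x = +1, so independent trajectories can settle in either one. This is the multimodal behavior that point (1) describes for molecules with several low-energy conformations.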
