Results 1 - 20 of 34
1.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-38960407

ABSTRACT

The optimization of therapeutic antibodies through traditional techniques, such as candidate screening via hybridoma or phage display, is resource-intensive and time-consuming. In recent years, computational and artificial intelligence-based methods have been actively developed to accelerate and improve the development of therapeutic antibodies. In this study, we developed an end-to-end sequence-based deep learning model, termed AttABseq, for the predictions of the antigen-antibody binding affinity changes connected with antibody mutations. AttABseq is a highly efficient and generic attention-based model by utilizing diverse antigen-antibody complex sequences as the input to predict the binding affinity changes of residue mutations. The assessment on the three benchmark datasets illustrates that AttABseq is 120% more accurate than other sequence-based models in terms of the Pearson correlation coefficient between the predicted and experimental binding affinity changes. Moreover, AttABseq also either outperforms or competes favorably with the structure-based approaches. Furthermore, AttABseq consistently demonstrates robust predictive capabilities across a diverse array of conditions, underscoring its remarkable capacity for generalization across a wide spectrum of antigen-antibody complexes. It imposes no constraints on the quantity of altered residues, rendering it particularly applicable in scenarios where crystallographic structures remain unavailable. The attention-based interpretability analysis indicates that the causal effects of point mutations on antibody-antigen binding affinity changes can be visualized at the residue level, which might assist automated antibody sequence optimization. We believe that AttABseq provides a fiercely competitive answer to therapeutic antibody optimization.
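
As an illustration of the metric named in the abstract above, the following is a minimal sketch of the Pearson correlation coefficient between predicted and experimental binding affinity changes. The function name `pearson_r` and the sample values are hypothetical; they are not taken from the paper or from the AttABseq code.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances are left unnormalized; the factors cancel in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical affinity changes (illustrative numbers only).
    experimental = [0.5, 1.2, -0.3, 2.1, 0.0]
    predicted = [0.4, 1.0, -0.1, 1.8, 0.3]
    print(round(pearson_r(experimental, predicted), 3))
```

A value near 1 would indicate that predictions rank and scale with the measurements; a value near 0 indicates no linear relationship.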


Subjects
Antigen-Antibody Complex , Deep Learning , Antigen-Antibody Complex/chemistry , Antigens/chemistry , Antigens/genetics , Antigens/metabolism , Antigens/immunology , Antibody Affinity , Amino Acid Sequence , Computational Biology/methods , Humans , Mutation , Antibodies/chemistry , Antibodies/immunology , Antibodies/genetics , Antibodies/metabolism
2.
J Chem Inf Model ; 64(13): 5016-5027, 2024 Jul 08.
Article in English | MEDLINE | ID: mdl-38920330

ABSTRACT

The intricate interaction between major histocompatibility complexes (MHCs) and antigen peptides with diverse amino acid sequences plays a pivotal role in immune responses and T cell activity. In recent years, deep learning (DL)-based models have emerged as promising tools for accelerating antigen peptide screening. However, most of these models solely rely on one-dimensional amino acid sequences, overlooking crucial information required for the three-dimensional (3-D) space binding process. In this study, we propose TransfIGN, a structure-based DL model that is inspired by our previously developed framework, Interaction Graph Network (IGN), and incorporates sequence information from transformers to predict the interactions between HLA-A*02:01 and antigen peptides. Our model, trained on a comprehensive data set containing 61,816 sequences with 9051 binding affinity labels and 56,848 eluted ligand labels, achieves an area under the curve (AUC) of 0.893 on the binary data set, better than state-of-the-art sequence-based models trained on larger data sets such as NetMHCpan4.1, ANN, and TransPHLA. Furthermore, when evaluated on the IEDB weekly benchmark data sets, our predictions (AUC = 0.816) are better than those of the recommended methods like the IEDB consensus (AUC = 0.795). Notably, the interaction weight matrices generated by our method highlight the strong interactions at specific positions within peptides, emphasizing the model's ability to provide physical interpretability. This capability to unveil binding mechanisms through intricate structural features holds promise for new immunotherapeutic avenues.
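
The area under the ROC curve (AUC) quoted in the abstract above can be read as the probability that a randomly chosen binder is scored higher than a randomly chosen non-binder. The sketch below computes it that way by direct pair counting. The function name `roc_auc` and the 0/1 label convention are assumptions for illustration, not part of TransfIGN.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: iterable of 0/1 class labels (1 = binder); this encoding is assumed.
    scores: iterable of predicted scores, higher meaning more likely positive.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical labels and scores (illustrative only).
    print(roc_auc([1, 1, 0, 0], [0.9, 0.3, 0.4, 0.1]))
```

Pair counting is quadratic in the number of examples, which is fine for a small illustration; a rank-based formula would be preferable for large data sets.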


Subjects
Deep Learning , HLA-A2 Antigen , Peptides , HLA-A2 Antigen/chemistry , HLA-A2 Antigen/metabolism , Peptides/chemistry , Peptides/metabolism , Humans , Protein Binding , Models, Molecular , Amino Acid Sequence , Protein Conformation
3.
J Chem Inf Model ; 2024 Jun 26.
Article in English | MEDLINE | ID: mdl-38920405

ABSTRACT

Artificial intelligence (AI)-aided drug design has demonstrated unprecedented effects on modern drug discovery, but there is still an urgent need for user-friendly interfaces that bridge the gap between these sophisticated tools and scientists, particularly those who are less computer savvy. Herein, we present DrugFlow, an AI-driven one-stop platform that offers a clean, convenient, and cloud-based interface to streamline early drug discovery workflows. By seamlessly integrating a range of innovative AI algorithms, covering molecular docking, quantitative structure-activity relationship modeling, molecular generation, ADMET (absorption, distribution, metabolism, excretion and toxicity) prediction, and virtual screening, DrugFlow can offer effective AI solutions for almost all crucial stages in early drug discovery, including hit identification and hit/lead optimization. We hope that the platform can provide sufficiently valuable guidance to aid real-world drug design and discovery. The platform is available at https://drugflow.com.

4.
J Chem Inf Model ; 64(8): 3222-3236, 2024 Apr 22.
Article in English | MEDLINE | ID: mdl-38498003

ABSTRACT

Liver microsomal stability, a crucial aspect of metabolic stability, significantly impacts practical drug discovery. However, current models for predicting liver microsomal stability are based on limited molecular information from a single species. To address this limitation, we constructed the largest public database of compounds from three common species: human, rat, and mouse. Subsequently, we developed a series of classification models using both traditional descriptor-based and classic graph-based machine learning (ML) algorithms. Remarkably, the best-performing models for the three species achieved Matthews correlation coefficients (MCCs) of 0.616, 0.603, and 0.574, respectively, on the test set. Furthermore, through the construction of consensus models based on these individual models, we have demonstrated their superior predictive performance in comparison with the existing models of the same type. To explore the similarities and differences in the properties of liver microsomal stability among multispecies molecules, we conducted preliminary interpretative explorations using the Shapley additive explanations (SHAP) and atom heatmap approaches for the models and misclassified molecules. Additionally, we further investigated representative structural modifications and substructures that decrease the liver microsomal stability in different species using the matched molecule pair analysis (MMPA) method and substructure extraction techniques. The established prediction models, along with insightful interpretation information regarding liver microsomal stability, will significantly contribute to enhancing the efficiency of exploring practical drugs for development.
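
For readers unfamiliar with the Matthews correlation coefficient (MCC) reported in the abstract above, here is a minimal sketch of how it is computed from a binary confusion matrix. The function name `mcc` and the 0/1 label encoding are assumptions made for this example; the sketch is not the authors' evaluation code.

```python
import math


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels encoded as 0/1."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 0:
            tn += 1
        elif t == 0 and p == 1:
            fp += 1
        else:
            # Remaining case under the assumed 0/1 encoding: t == 1, p == 0.
            fn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: return 0 when any marginal count is zero.
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical stable (1) / unstable (0) labels, illustrative only.
    print(mcc([1, 1, 0, 0, 1, 0], [1, 0, 0, 0, 1, 1]))
```

MCC ranges from -1 to 1 and, unlike plain accuracy, stays informative when the two classes are imbalanced.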


Subjects
Artificial Intelligence , Microsomes, Liver , Microsomes, Liver/metabolism , Animals , Mice , Rats , Humans , Machine Learning , Drug Discovery/methods , Pharmaceutical Preparations/metabolism , Pharmaceutical Preparations/chemistry
5.
Phys Chem Chem Phys ; 26(13): 10323-10335, 2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38501198

ABSTRACT

Ribonucleic acid (RNA)-ligand interactions play a pivotal role in a wide spectrum of biological processes, ranging from protein biosynthesis to cellular reproduction. This recognition has prompted the broader acceptance of RNA as a viable candidate for drug targets. Delving into the atomic-scale understanding of RNA-ligand interactions holds paramount importance in unraveling intricate molecular mechanisms and further contributing to RNA-based drug discovery. Computational approaches, particularly molecular docking, offer an efficient way of predicting the interactions between RNA and small molecules. However, the accuracy and reliability of these predictions heavily depend on the performance of scoring functions (SFs). In contrast to the majority of SFs used in RNA-ligand docking, the end-point binding free energy calculation methods, such as molecular mechanics/generalized Born surface area (MM/GBSA) and molecular mechanics/Poisson Boltzmann surface area (MM/PBSA), stand as theoretically more rigorous approaches. Yet, the evaluation of their effectiveness in predicting both binding affinities and binding poses within RNA-ligand systems remains unexplored. This study first reported the performance of MM/PBSA and MM/GBSA with diverse solvation models, interior dielectric constants (εin) and force fields in the context of binding affinity prediction for 29 RNA-ligand complexes. MM/GBSA is based on short (5 ns) molecular dynamics (MD) simulations in an explicit solvent with the YIL force field; the GBGBn2 model with higher interior dielectric constant (εin = 12, 16 or 20) yields the best correlation (Rp = -0.513), which outperforms the best correlation (Rp = -0.317, rDock) offered by various docking programs. Then, the efficacy of MM/GBSA in identifying the near-native binding poses from the decoys was assessed based on 56 RNA-ligand complexes. 
However, it is evident that MM/GBSA has limitations in accurately predicting binding poses for RNA-ligand systems, particularly compared with notably proficient docking programs like rDock and PLANTS. The best top-1 success rate achieved by MM/GBSA rescoring is 39.3%, which falls below the best results given by docking programs (50%, PLANTS). This study represents the first evaluation of MM/PBSA and MM/GBSA for RNA-ligand systems and is expected to provide valuable insights into their successful application to RNA targets.


Subjects
Molecular Dynamics Simulation , RNA , Molecular Docking Simulation , Ligands , Reproducibility of Results , Protein Binding , Thermodynamics , Binding Sites
6.
J Chem Inf Model ; 64(6): 2112-2124, 2024 Mar 25.
Article in English | MEDLINE | ID: mdl-38483249

ABSTRACT

Cyclic peptides have emerged as a highly promising class of therapeutic molecules owing to their favorable pharmacokinetic properties, including stability and permeability. Currently, many clinically approved cyclic peptides are derived from natural products or their derivatives, and the development of molecular docking techniques for cyclic peptide discovery holds great promise for expanding the applications and potential of this class of molecules. Given the availability of numerous docking programs, there is a pressing need for a systematic evaluation of their performance, specifically on protein-cyclic peptide systems. In this study, we constructed an extensive benchmark data set called CPSet, consisting of 493 protein-cyclic peptide complexes. Based on this data set, we conducted a comprehensive evaluation of 10 docking programs, including Rosetta, AutoDock CrankPep, and eight protein-small molecule docking programs (i.e., AutoDock, AutoDock Vina, Glide, GOLD, LeDock, rDock, MOE, and Surflex). The evaluation encompassed the assessment of the sampling power, docking power, and scoring power of these programs. The results revealed that all of the tested protein-small molecule docking programs successfully sampled the binding conformations when using the crystal conformations as the initial structures. Among them, rDock exhibited outstanding performance, achieving a remarkable 94.3% top-100 sampling success rate. However, few programs achieved successful predictions of the binding conformations using tLEaP-generated conformations as the initial structures. Within this scheme, AutoDock CrankPep yielded the highest top-100 sampling success rate of 29.6%. Rosetta's scoring function outperformed the others in selecting optimal conformations, resulting in an impressive top-1 docking success rate of 87.6%.
Nevertheless, all the tested scoring functions displayed limited performance in predicting binding affinity, with MOE@Affinity dG exhibiting the highest Pearson's correlation coefficient of 0.378. It is therefore suggested to use an appropriate combination of different docking programs for given tasks in real applications. We expect that this work will offer valuable insights into selecting the appropriate docking programs for protein-cyclic peptide complexes.


Subjects
Peptides, Cyclic , Proteins , Peptides, Cyclic/metabolism , Molecular Docking Simulation , Protein Binding , Proteins/chemistry , Molecular Conformation , Ligands
7.
Chem Sci ; 14(43): 12166-12181, 2023 Nov 08.
Article in English | MEDLINE | ID: mdl-37969589

ABSTRACT

Contemporary structure-based molecular generative methods have demonstrated their potential to model the geometric and energetic complementarity between ligands and receptors, thereby facilitating the design of molecules with favorable binding affinity and target specificity. Despite the introduction of deep generative models for molecular generation, the atom-wise generation paradigm that partially contradicts chemical intuition limits the validity and synthetic accessibility of the generated molecules. Additionally, the dependence of deep learning models on large-scale structural data has hindered their adaptability across different targets. To overcome these challenges, we present a novel search-based framework, 3D-MCTS, for structure-based de novo drug design. Distinct from prevailing atom-centric methods, 3D-MCTS employs a fragment-based molecular editing strategy. The fragments decomposed from small-molecule drugs are recombined under predefined retrosynthetic rules, offering improved drug-likeness and synthesizability, overcoming the inherent limitations of atom-based approaches. Leveraging multi-threaded parallel simulations combined with a real-time energy constraint-based pruning strategy, 3D-MCTS achieves remarkable efficiency. At a fixed computational cost, it outperforms other state-of-the-art (SOTA) methods by producing molecules with enhanced binding affinity. Furthermore, its fragment-based approach ensures the generation of more dependable binding conformations, exhibiting a success rate 43.6% higher than that of other SOTAs. This advantage becomes even more pronounced when handling targets that significantly deviate from the training dataset. 3D-MCTS is capable of achieving thirty times more hits with high binding affinity than traditional virtual screening methods, which demonstrates the superior ability of 3D-MCTS to explore chemical space. 
Moreover, the flexibility of our framework makes it easy to incorporate domain knowledge during the process, thereby enabling the generation of molecules with desirable pharmacophores and enhanced binding affinity. The adaptability of 3D-MCTS is further showcased in metalloprotein applications, highlighting its potential across various drug design scenarios.

8.
Research (Wash D C) ; 6: 0231, 2023.
Article in English | MEDLINE | ID: mdl-37849643

ABSTRACT

Effective synthesis planning powered by deep learning (DL) can significantly accelerate the discovery of new drugs and materials. However, most DL-assisted synthesis planning methods offer either none or very limited capability to recommend suitable reaction conditions (RCs) for their reaction predictions. Currently, the prediction of RCs with a DL framework is hindered by several factors, including: (a) lack of a standardized dataset for benchmarking, (b) lack of a general prediction model with powerful representation, and (c) lack of interpretability. To address these issues, we first created 2 standardized RC datasets covering a broad range of reaction classes and then proposed a powerful and interpretable Transformer-based RC predictor named Parrot. Through careful design of the model architecture, pretraining method, and training strategy, Parrot improved the overall top-3 prediction accuracy on catalysis, solvents, and other reagents by as much as 13.44%, compared to the best previous model on a newly curated dataset. Additionally, the mean absolute error of the predicted temperatures was reduced by about 4 °C. Furthermore, Parrot manifests strong generalization capacity with superior cross-chemical-space prediction accuracy. Attention analysis indicates that Parrot effectively captures crucial chemical information and exhibits a high level of interpretability in the prediction of RCs. The proposed model Parrot exemplifies how modern neural network architecture when appropriately pretrained can be versatile in making reliable, generalizable, and interpretable recommendation for RCs even when the underlying training dataset may still be limited in diversity.

9.
J Chem Theory Comput ; 19(16): 5633-5647, 2023 Aug 22.
Article in English | MEDLINE | ID: mdl-37480347

ABSTRACT

Nucleic acid (NA)-ligand interactions are of paramount importance in a variety of biological processes, including cellular reproduction and protein biosynthesis, and therefore, NAs have been broadly recognized as potential drug targets. Understanding NA-ligand interactions at the atomic scale is essential for investigating the molecular mechanism and further assisting in NA-targeted drug discovery. Molecular docking is one of the predominant computational approaches for predicting the interactions between NAs and small molecules. Despite the availability of versatile docking programs, their performance profiles for NA-ligand complexes have not been thoroughly characterized. In this study, we first compiled the largest structure-based NA-ligand binding data set to date, containing 800 noncovalent NA-ligand complexes with clearly identified ligands. Based on this extensive data set, eight frequently used docking programs, including six protein-ligand docking programs (LeDock, Surflex-Dock, UCSF Dock6, AutoDock, AutoDock Vina, and PLANTS) and two specific NA-ligand docking programs (rDock and RLDOCK), were systematically evaluated in terms of binding pose and binding affinity predictions. The results demonstrated that some protein-ligand docking programs, specifically PLANTS and LeDock, produced more promising or comparable results compared with the specialized NA-ligand docking programs. Among the programs evaluated, PLANTS, rDock, and LeDock showed the highest performance in binding pose prediction, and their top-1 and best root-mean-square deviation (rmsd) success rates were as follows: PLANTS (35.93 and 76.05%), rDock (27.25 and 72.16%), and LeDock (27.40 and 64.37%). Compared with the moderate level of binding pose prediction, few programs were successful in binding affinity prediction, and the best correlation (Rp = -0.461) was observed with PLANTS. 
Finally, further comparison with the latest NA-ligand docking program (NLDock) on four well-established data sets revealed that PLANTS and LeDock outperformed NLDock in terms of binding pose prediction on all data sets, demonstrating their significant potential for NA-ligand docking. To the best of our knowledge, this study is the most comprehensive evaluation of popular molecular docking programs for NA-ligand systems.
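
The "top-1" and "best rmsd" success rates in the abstract above can be sketched as follows: a complex counts as a top-1 success if the best-scored pose is within an RMSD cutoff of the native pose, and as a best-pose success if any generated pose is. The function name `success_rates`, the input layout, and the 2.0 Å default cutoff are assumptions for illustration; the abstract does not state the cutoff used in the study.

```python
def success_rates(rmsds_per_complex, threshold=2.0):
    """Return (top-1 success rate, best-pose success rate).

    rmsds_per_complex: one list per complex holding pose RMSDs (in Å) to the
        native pose, ordered by docking score with the best-scored pose first.
    threshold: RMSD cutoff for a "successful" pose; 2.0 Å is an assumed default.
    """
    n = len(rmsds_per_complex)
    if n == 0:
        raise ValueError("no complexes supplied")
    # Top-1: only the best-scored pose is checked.
    top1 = sum(1 for poses in rmsds_per_complex if poses and poses[0] <= threshold)
    # Best pose: any sampled pose may satisfy the cutoff.
    best = sum(1 for poses in rmsds_per_complex if poses and min(poses) <= threshold)
    return top1 / n, best / n


if __name__ == "__main__":
    # Hypothetical RMSD values for three complexes (illustrative only).
    example = [[1.0, 5.0], [3.0, 1.5], [4.0, 6.0]]
    print(success_rates(example))
```

The gap between the two rates separates sampling quality (was a near-native pose generated at all) from scoring quality (was it ranked first).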


Subjects
Drug Discovery , Nucleic Acids , Ligands , Molecular Docking Simulation
10.
J Cheminform ; 15(1): 63, 2023 Jul 04.
Article in English | MEDLINE | ID: mdl-37403155

ABSTRACT

Machine learning-based scoring functions (MLSFs) have shown potential for improving virtual screening capabilities over classical scoring functions (SFs). Due to the high computational cost in the process of feature generation, the numbers of descriptors used in MLSFs and the characterization of protein-ligand interactions are always limited, which may affect the overall accuracy and efficiency. Here, we propose a new SF called TB-IECS (theory-based interaction energy component score), which combines energy terms from Smina and NNScore version 2, and utilizes the eXtreme Gradient Boosting (XGBoost) algorithm for model training. In this study, the energy terms decomposed from 15 traditional SFs were firstly categorized based on their formulas and physicochemical principles, and 324 feature combinations were generated accordingly. Five best feature combinations were selected for further evaluation of the model performance in regard to the selection of feature vectors with various length, interaction types and ML algorithms. The virtual screening power of TB-IECS was assessed on the datasets of DUD-E and LIT-PCBA, as well as seven target-specific datasets from the ChemDiv database. The results showed that TB-IECS outperformed classical SFs including Glide SP and Dock, and effectively balanced the efficiency and accuracy for practical virtual screening.

11.
Nat Commun ; 14(1): 2585, 2023 May 04.
Article in English | MEDLINE | ID: mdl-37142585

ABSTRACT

Graph neural networks (GNNs) have been widely used in molecular property prediction, but explaining their black-box predictions is still a challenge. Most existing explanation methods for GNNs in chemistry focus on attributing model predictions to individual nodes, edges or fragments that are not necessarily derived from a chemically meaningful segmentation of molecules. To address this challenge, we propose a method named substructure mask explanation (SME). SME is based on well-established molecular segmentation methods and provides an interpretation that aligns with the understanding of chemists. We apply SME to elucidate how GNNs learn to predict aqueous solubility, genotoxicity, cardiotoxicity and blood-brain barrier permeation for small molecules. SME provides interpretation that is consistent with the understanding of chemists, alerts them to unreliable performance, and guides them in structural optimization for target properties. Hence, we believe that SME empowers chemists to confidently mine structure-activity relationship (SAR) from reliable GNNs through a transparent inspection on how GNNs pick up useful signals when learning from data.


Subjects
Blood-Brain Barrier , Cardiotoxicity , Humans , DNA Damage , Neural Networks, Computer , Records
12.
Sensors (Basel) ; 23(7)2023 Mar 23.
Article in English | MEDLINE | ID: mdl-37050447

ABSTRACT

The Dadu River flows through the mountainous areas of southwestern China, one of the most hazard-prone regions, which has long suffered from frequent geohazards. The early identification of landslides in this region is urgently needed, especially after the recent Luding earthquake (MS 6.8). While conventional ground-based monitoring techniques are limited by the complex terrain conditions in these alpine valley regions, space interferometric synthetic aperture radar (InSAR) provides an incomparable advantage in obtaining surface deformation with high precision and over a wide area, which is very useful for long-term and slow geohazard monitoring. In this study, more than 500 Sentinel-1 SAR images with four frames acquired during 2017-2022 were collected to detect the hidden landslide regions from the Jinchuan to Ebian Section along the Dadu River, based on joint-scatterer InSAR (JS-InSAR) and small baseline subset (SBAS) techniques. The results showed that our method could be successfully applied for landslide monitoring in complex mountainous regions. Furthermore, 143 potential landslide regions spreading over an 800 km area along the Dadu River were extracted by integrating the deformation measurements and optical images. Our study can provide a reference for large-scale geological hazard surveys in mountainous areas, and the InSAR technique is recommended to the local government for future long-term monitoring applications in the Dadu River Basin.

13.
Chem Sci ; 14(8): 2054-2069, 2023 Feb 22.
Article in English | MEDLINE | ID: mdl-36845922

ABSTRACT

Metalloproteins play indispensable roles in various biological processes ranging from reaction catalysis to free radical scavenging, and they are also pertinent to numerous pathologies including cancer, HIV infection, neurodegeneration, and inflammation. Discovery of high-affinity ligands for metalloproteins powers the treatment of these pathologies. Extensive efforts have been made to develop in silico approaches, such as molecular docking and machine learning (ML)-based models, for fast identification of ligands binding to heterogeneous proteins, but few of them have exclusively concentrated on metalloproteins. In this study, we first compiled the largest metalloprotein-ligand complex dataset containing 3079 high-quality structures, and systematically evaluated the scoring and docking powers of three competitive docking tools (i.e., PLANTS, AutoDock Vina and Glide SP) for metalloproteins. Then, a structure-based deep graph model called MetalProGNet was developed to predict metalloprotein-ligand interactions. In the model, the coordination interactions between metal ions and protein atoms and the interactions between metal ions and ligand atoms were explicitly modelled through graph convolution. The binding features were then predicted by the informative molecular binding vector learned from a noncovalent atom-atom interaction network. The evaluation on the internal metalloprotein test set, the independent ChEMBL dataset towards 22 different metalloproteins and the virtual screening dataset indicated that MetalProGNet outperformed various baselines. Finally, a noncovalent atom-atom interaction masking technique was employed to interpret MetalProGNet, and the learned knowledge accords with our understanding of physics.

14.
Chem Sci ; 14(6): 1557-1568, 2023 Feb 08.
Article in English | MEDLINE | ID: mdl-36794194

ABSTRACT

Generation of representative conformations for small molecules is a fundamental task in cheminformatics and computer-aided drug discovery, but capturing the complex distribution of conformations that contains multiple low energy minima is still a great challenge. Deep generative modeling, aiming to learn complex data distributions, is a promising approach to tackle the conformation generation problem. Here, inspired by stochastic dynamics and recent advances in generative modeling, we developed SDEGen, a novel conformation generation model based on stochastic differential equations. Compared with existing conformation generation methods, it enjoys the following advantages: (1) high model capacity to capture multimodal conformation distribution, thereby searching for multiple low-energy conformations of a molecule quickly, (2) higher conformation generation efficiency, almost ten times faster than the state-of-the-art score-based model, ConfGF, and (3) a clear physical interpretation to learn how a molecule evolves in a stochastic dynamics system starting from noise and eventually relaxing to the conformation that falls in low energy minima. Extensive experiments demonstrate that SDEGen has surpassed existing methods in different tasks for conformation generation, interatomic distance distribution prediction, and thermodynamic property estimation, showing great potential for real-world applications.

15.
Nat Comput Sci ; 3(10): 849-859, 2023 Oct.
Article in English | MEDLINE | ID: mdl-38177756

ABSTRACT

Highly effective de novo design is a grand challenge of computer-aided drug discovery. Practical structure-specific three-dimensional molecule generations have started to emerge in recent years, but most approaches treat the target structure as a conditional input to bias the molecule generation and do not fully learn the detailed atomic interactions that govern the molecular conformation and stability of the binding complexes. The omission of these fine details leads to many models having difficulty in outputting reasonable molecules for a variety of therapeutic targets. Here, to address this challenge, we formulate a model, called SurfGen, that designs molecules in a fashion closely resembling the figurative key-and-lock principle. SurfGen comprises two equivariant neural networks, Geodesic-GNN and Geoatom-GNN, which capture the topological interactions on the pocket surface and the spatial interaction between ligand atoms and surface nodes, respectively. SurfGen outperforms other methods in a number of benchmarks, and its high sensitivity on the pocket structures enables an effective generative-model-based solution to the thorny issue of mutation-induced drug resistance.


Subjects
Drug Discovery , Neural Networks, Computer , Drug Discovery/methods , Molecular Conformation
16.
J Med Chem ; 65(18): 12482-12496, 2022 Sep 22.
Article in English | MEDLINE | ID: mdl-36065998

ABSTRACT

Many deep learning (DL)-based molecular generative models have been proposed to design novel molecules. These models may perform well on benchmarks, but they usually do not take real-world constraints into account, such as available training data set, synthetic accessibility, and scaffold diversity in drug discovery. In this study, a new algorithm, ChemistGA, was proposed by combining the traditional heuristic algorithm with DL, in which the crossover of the traditional genetic algorithm (GA) was redefined by DL in conjunction with GA, and an innovative backcrossing operation was implemented to generate desired molecules. Our results clearly show that ChemistGA not only retains the strength of the traditional GA but also greatly enhances the synthetic accessibility and success rate of the generated molecules with desired properties. Calculations on the two benchmarks illustrate that ChemistGA achieves impressive performance among the state-of-the-art baselines, and it opens a new avenue for the application of generative models to real-world drug discovery scenarios.


Subjects
Algorithms , Drug Discovery , Drug Design , Models, Molecular
17.
Research (Wash D C) ; 2022: 9873564, 2022.
Article in English | MEDLINE | ID: mdl-35958111

ABSTRACT

Covalent ligands have attracted increasing attention due to their unique advantages, such as long residence time, high selectivity, and strong binding affinity. They also show promise for targets where previous efforts to identify noncovalent small molecule inhibitors have failed. However, our limited knowledge of covalent binding sites has hindered the discovery of novel ligands. Therefore, developing in silico methods to identify covalent binding sites is highly desirable. Here, we propose DeepCoSI, the first structure-based deep graph learning model to identify ligandable covalent sites in the protein. By integrating the characterization of the binding pocket and the interactions between each cysteine and the surrounding environment, DeepCoSI achieves state-of-the-art predictive performance. The validation on two external test sets that mimic real application scenarios shows that DeepCoSI has a strong ability to distinguish ligandable sites from the others. Finally, we profiled the entire set of protein structures in the RCSB Protein Data Bank (PDB) with DeepCoSI to evaluate the ligandability of each cysteine for covalent ligand design, and made the predicted data publicly available on a website.

18.
J Med Chem ; 65(11): 7918-7932, 2022 Jun 09.
Article in English | MEDLINE | ID: mdl-35642777

ABSTRACT

Development of accurate machine-learning-based scoring functions (MLSFs) for structure-based virtual screening against a given target requires a large unbiased dataset with structurally diverse actives and decoys. However, most datasets for the development of MLSFs were designed for traditional SFs and may suffer from hidden biases and data insufficiency. Hereby, we developed a new approach named Topology-based and Conformation-based decoys generation (TocoDecoy), which integrates two strategies to generate decoys by tweaking the actives for a specific target, to generate unbiased and expandable datasets for training and benchmarking MLSFs. For hidden bias evaluation, the performance of InteractionGraphNet (IGN) trained on the TocoDecoy, LIT-PCBA, and DUD-E-like datasets was assessed. The results illustrate that the IGN model trained on the TocoDecoy dataset is competitive with that trained on the LIT-PCBA dataset but remarkably outperforms that trained on the DUD-E dataset, suggesting that the decoys in TocoDecoy are unbiased for training and benchmarking MLSFs.


Subjects
Benchmarking, Machine Learning, Ligands, Molecular Conformation
19.
Eur J Med Chem ; 237: 114382, 2022 Jul 05.
Article in English | MEDLINE | ID: mdl-35483323

ABSTRACT

Glucocorticoids (GCs) are the most commonly used anti-inflammatory drugs. However, their excellent therapeutic effects are often accompanied by undesirable side effects. To discover selective glucocorticoid receptor modulators (SGRMs) that preferentially induce transrepression with little or no transactivation activity, a structure-based virtual screening combining molecular docking and InteractionGraphNet (IGN) rescoring was performed, and compound HP210 was identified. HP210 did not induce the transactivation functions of GR while still acting on the NF-κB-mediated tethered transrepression function (IC50 = 2.32 µM), and suppressed the secretion of the pro-inflammatory cytokines IL-1β and IL-6. Compared with dexamethasone, HP210 showed no cross-activity with the phylogenetically related mineralocorticoid receptor and progesterone receptor and no significant effect on osteoprotegerin, exhibiting a reduced side-effect profile. Then, guided by molecular dynamics simulations and binding free energy calculations, compound HP210_b4, with over two-fold higher transrepression activity (IC50 = 0.99 µM), was discovered. This study reports a group of non-steroidal new-scaffold SGRMs, providing valuable clues for the development of novel anti-inflammatory drugs.
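
As a quick arithmetic check on the "over two-fold" statement, the fold change can be computed from the two IC50 values quoted in the abstract. This is only an illustrative sketch: the function name `fold_improvement` is an assumption introduced here, and it treats a lower IC50 as proportionally higher potency.

```python
def fold_improvement(ic50_parent_um: float, ic50_analog_um: float) -> float:
    """Return how many times lower the analog's IC50 is than the parent's.

    Hypothetical helper for illustration; both values must be in the same unit.
    """
    if ic50_parent_um <= 0 or ic50_analog_um <= 0:
        raise ValueError("IC50 values must be positive")
    return ic50_parent_um / ic50_analog_um


# IC50 values (in µM) as quoted in the abstract.
HP210_IC50_UM = 2.32
HP210_B4_IC50_UM = 0.99

fold = fold_improvement(HP210_IC50_UM, HP210_B4_IC50_UM)
print(f"HP210_b4 vs HP210: {fold:.2f}-fold lower IC50")
```

The ratio comes to roughly 2.3, which is consistent with the abstract's wording.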


Subjects
Glucocorticoids, Glucocorticoid Receptors, Anti-Inflammatory Agents/pharmacology, Glucocorticoids/pharmacology, Molecular Docking Simulation, NF-kappa B/metabolism, Glucocorticoid Receptors/chemistry
20.
Brief Bioinform ; 23(3)2022 05 13.
Article in English | MEDLINE | ID: mdl-35438145

ABSTRACT

Molecular property prediction models based on machine learning algorithms have become important tools to triage unpromising lead molecules in the early stages of drug discovery. Compared with the mainstream descriptor- and graph-based methods for molecular property predictions, SMILES-based methods can directly extract molecular features from SMILES without human expert knowledge, but they require more powerful algorithms for feature extraction and a larger amount of data for training, which makes SMILES-based methods less popular. Here, we show the great potential of pre-training in promoting the predictions of important pharmaceutical properties. By utilizing three pre-training tasks based on atom feature prediction, molecular feature prediction and contrastive learning, a new pre-training method, K-BERT, which can extract chemical information from SMILES as chemists do, was developed. The calculation results on 15 pharmaceutical datasets show that K-BERT outperforms well-established descriptor-based (XGBoost) and graph-based (Attentive FP and HRGCN+) models. In addition, we found that the contrastive learning pre-training task enables K-BERT to 'understand' SMILES strings beyond canonical SMILES. Moreover, the general fingerprints K-BERT-FP generated by K-BERT exhibit predictive power comparable to MACCS on 15 pharmaceutical datasets and can also capture molecular size and chirality information that traditional binary fingerprints cannot capture. Our results illustrate the great potential of K-BERT in the practical applications of molecular property predictions in drug discovery.
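
The point about binary fingerprints missing size and chirality can be illustrated with a deliberately simplified toy. The sketch below is not MACCS and not K-BERT-FP: the key set `ELEMENT_KEYS` and both functions are assumptions made up for illustration, and the string matching only works for the simple SMILES shown (it would mis-handle two-letter elements such as Cl).

```python
# Hypothetical presence/absence keys; real fingerprints use substructure patterns.
ELEMENT_KEYS = ("C", "N", "O", "S")


def binary_fingerprint(smiles: str) -> tuple:
    """One bit per key: 1 if the symbol occurs anywhere in the SMILES string."""
    return tuple(int(key in smiles) for key in ELEMENT_KEYS)


def count_fingerprint(smiles: str) -> tuple:
    """Occurrence count per key, which retains a crude notion of size."""
    return tuple(smiles.count(key) for key in ELEMENT_KEYS)


ethanol, hexanol = "CCO", "CCCCCCO"
# The two alanine enantiomers differ only in the @ / @@ chirality tag.
enantiomer_a, enantiomer_b = "C[C@H](N)C(=O)O", "C[C@@H](N)C(=O)O"

# Size: the bit vectors coincide, while the counts differ.
print(binary_fingerprint(ethanol) == binary_fingerprint(hexanol))
print(count_fingerprint(ethanol), count_fingerprint(hexanol))

# Chirality: neither toy representation separates the enantiomers.
print(binary_fingerprint(enantiomer_a) == binary_fingerprint(enantiomer_b))
print(count_fingerprint(enantiomer_a) == count_fingerprint(enantiomer_b))
```

In this toy, molecules of different size collapse to the same bit vector and the two enantiomers are indistinguishable, which is the kind of information loss the abstract says a learned fingerprint can avoid.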


Subjects
Algorithms, Machine Learning, Humans, Knowledge Bases, Pharmaceutical Preparations, Research Design