Search | VHL Infectious and Parasitic Diseases

1.

CMGN: a conditional molecular generation net to design target-specific molecules with desired properties.

Yang, Minjian; Sun, Hanyu; Liu, Xue; Xue, Xi; Deng, Yafeng; Wang, Xiaojian.

Brief Bioinform ; 24(4)2023 07 20.

Article in English | MEDLINE | ID: mdl-37193672

ABSTRACT

The rational design of chemical entities with desired properties for a specific target is a long-standing challenge in drug design. Generative neural networks have emerged as a powerful approach to sample novel molecules with specific properties, termed inverse drug design. However, generating molecules with biological activity against certain targets and predefined drug properties remains challenging. Here, we propose a conditional molecular generation net (CMGN), the backbone of which is a bidirectional and autoregressive transformer. CMGN applies large-scale pretraining for molecular understanding and navigates the chemical space for specified targets by fine-tuning with corresponding datasets. Additionally, fragments and properties were trained to recover molecules to learn the structure-properties relationships. Our model crisscrosses the chemical space for specific targets and properties that control fragment-growth processes. Case studies demonstrated the advantages and utility of our model in fragment-to-lead processes and multi-objective lead optimization. The results presented in this paper illustrate that CMGN has the potential to accelerate the drug discovery process.

Subjects

Drug Design; Drug Discovery; Learning; Neural Networks, Computer; Receptor Protein-Tyrosine Kinases

2.

Comprehensive assessment of nine target prediction web services: which should we choose for target fishing?

Ji, Kai-Yue; Liu, Chong; Liu, Zhao-Qian; Deng, Ya-Feng; Hou, Ting-Jun; Cao, Dong-Sheng.

Brief Bioinform ; 24(2)2023 03 19.

Article in English | MEDLINE | ID: mdl-36681902

ABSTRACT

Identification of potential targets for known bioactive compounds and novel synthetic analogs is of considerable significance. In silico target fishing (TF) has become an alternative strategy because of the expensive and laborious wet-lab experiments, explosive growth of bioactivity data and rapid development of high-throughput technologies. However, these TF methods are based on different algorithms, molecular representations and training datasets, which may lead to different results when predicting the same query molecules. This can be confusing for practitioners in practical applications. Therefore, this study systematically evaluated nine popular ligand-based TF methods based on target and ligand-target pair statistical strategies, which will help practitioners make choices among multiple TF methods. The evaluation results showed that SwissTargetPrediction was the best method to produce the most reliable predictions while enriching more targets. High-recall similarity ensemble approach (SEA) was able to find real targets for more compounds compared with other TF methods. Therefore, SwissTargetPrediction and SEA can be considered as primary selection methods in future studies. In addition, the results showed that k = 5 was the optimal number of experimental candidate targets. Finally, a novel ensemble TF method based on consensus voting is proposed to improve the prediction performance. The precision of the ensemble TF method outperforms the individual TF method, indicating that the ensemble TF method can more effectively identify real targets within a given top-k threshold. The results of this study can be used as a reference to guide practitioners in selecting the most effective methods in computational drug discovery.

Subjects

Algorithms; Ligands
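
The consensus-voting idea in the abstract of entry 2 can be illustrated with a minimal sketch. This is not the authors' implementation: the voting rule (one vote per method for each target in its own top-k list, ties broken alphabetically), the method names, and the target lists are all assumptions made for illustration.

```python
from collections import Counter


def consensus_top_k(predictions, k=5):
    """Rank targets by how many methods place them in their own top-k list.

    predictions maps a (hypothetical) method name to its ranked target list.
    """
    votes = Counter()
    for ranked_targets in predictions.values():
        # Each method contributes at most one vote per target.
        votes.update(set(ranked_targets[:k]))
    # Sort by vote count (descending), then by name for a deterministic order.
    ranked = sorted(votes.items(), key=lambda item: (-item[1], item[0]))
    return [target for target, _ in ranked[:k]]


def precision_at_k(predicted, true_targets):
    """Fraction of the predicted targets that are known true targets."""
    if not predicted:
        return 0.0
    return sum(1 for t in predicted if t in true_targets) / len(predicted)


# Invented example data; real services return scored lists, not bare names.
predictions = {
    "method_a": ["EGFR", "ABL1", "SRC", "KDR", "MET", "JAK2"],
    "method_b": ["ABL1", "EGFR", "KIT", "SRC", "LCK"],
    "method_c": ["EGFR", "KIT", "BRAF", "ABL1", "CDK2"],
}

consensus = consensus_top_k(predictions, k=5)
print(consensus)
print(precision_at_k(consensus, {"EGFR", "ABL1"}))
```

The default k = 5 mirrors the number of candidate targets the abstract reports as optimal; any other threshold can be passed in.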

3.

ML-PLIC: a web platform for characterizing protein-ligand interactions and developing machine learning-based scoring functions.

Zhang, Xujun; Shen, Chao; Wang, Tianyue; Deng, Yafeng; Kang, Yu; Li, Dan; Hou, Tingjun; Pan, Peichen.

Brief Bioinform ; 24(5)2023 09 20.

Article in English | MEDLINE | ID: mdl-37738401

ABSTRACT

Cracking the entangling code of protein-ligand interaction (PLI) is of great importance to structure-based drug design and discovery. Different physical and biochemical representations can be used to describe PLI such as energy terms and interaction fingerprints, which can be analyzed by machine learning (ML) algorithms to create ML-based scoring functions (MLSFs). Here, we propose the ML-based PLI capturer (ML-PLIC), a web platform that automatically characterizes PLI and generates MLSFs to identify the potential binders of a specific protein target through virtual screening (VS). ML-PLIC comprises five modules, including Docking for ligand docking, Descriptors for PLI generation, Modeling for MLSF training, Screening for VS and Pipeline for the integration of the aforementioned functions. We validated the MLSFs constructed by ML-PLIC in three benchmark datasets (Directory of Useful Decoys-Enhanced, Active as Decoys and TocoDecoy), demonstrating accuracy outperforming traditional docking tools and competitive performance to the deep learning-based SF, and provided a case study of the Serine/threonine-protein kinase WEE1 in which MLSFs were developed by using the ML-based VS pipeline in ML-PLIC. Underpinning the latest version of ML-PLIC is a powerful platform that incorporates physical and biological knowledge about PLI, leveraging PLI characterization and MLSF generation into the design of structure-based VS pipeline. The ML-PLIC web platform is now freely available at http://cadd.zju.edu.cn/plic/.

Subjects

Algorithms; Benchmarking; Ligands; Drug Design; Machine Learning

4.

Reducing false positive rate of docking-based virtual screening by active learning.

Wang, Lei; Shi, Shao-Hua; Li, Hui; Zeng, Xiang-Xiang; Liu, Su-You; Liu, Zhao-Qian; Deng, Ya-Feng; Lu, Ai-Ping; Hou, Ting-Jun; Cao, Dong-Sheng.

Brief Bioinform ; 24(1)2023 01 19.

Article in English | MEDLINE | ID: mdl-36642412

ABSTRACT

Machine learning-based scoring functions (MLSFs) have become a very favorable alternative to classical scoring functions because of their potential superior screening performance. However, the information of negative data used to construct MLSFs was rarely reported in the literature, and meanwhile the putative inactive molecules recorded in existing databases usually have obvious bias from active molecules. Here we proposed an easy-to-use method named AMLSF that combines active learning using negative molecular selection strategies with MLSF, which can iteratively improve the quality of inactive sets and thus reduce the false positive rate of virtual screening. We chose energy auxiliary terms learning as the MLSF and validated our method on eight targets in the diverse subset of DUD-E. For each target, we screened the IterBioScreen database by AMLSF and compared the screening results with those of the four control models. The results illustrate that the number of active molecules in the top 1000 molecules identified by AMLSF was significantly higher than those identified by the control models. In addition, the free energy calculation results for the top 10 molecules screened out by the AMLSF, null model and control models based on DUD-E also proved that more active molecules can be identified, and the false positive rate can be reduced by AMLSF.

Subjects

Proteins; Proteins/metabolism; Databases, Factual; Ligands; Molecular Docking Simulation; Protein Binding

5.

PROTAC-DB 2.0: an updated database of PROTACs.

Weng, Gaoqi; Cai, Xuanyan; Cao, Dongsheng; Du, Hongyan; Shen, Chao; Deng, Yafeng; He, Qiaojun; Yang, Bo; Li, Dan; Hou, Tingjun.

Nucleic Acids Res ; 51(D1): D1367-D1372, 2023 01 06.

Article in English | MEDLINE | ID: mdl-36300631

ABSTRACT

Proteolysis targeting chimeras (PROTACs), which harness the ubiquitin-proteasome system to selectively induce targeted protein degradation, represent an emerging therapeutic technology with the potential to modulate traditional undruggable targets. Over the past few years, this technology has moved from academia to industry and more than 10 PROTACs have been advanced into clinical trials. However, designing potent PROTACs with desirable drug-like properties still remains a great challenge. Here, we report an updated online database, PROTAC-DB 2.0, which is a repository of structural and experimental data about PROTACs. In this 2nd release, we expanded the number of PROTACs to 3270, which corresponds to a 96% expansion over the first version. Meanwhile, the numbers of warheads (small molecules targeting the proteins of interest), linkers, and E3 ligands (small molecules recruiting E3 ligases) have increased to over 360, 1500 and 80, respectively. In addition, given the importance and the limited number of the crystal target-PROTAC-E3 ternary complex structures, we provide the predicted ternary complex structures for PROTACs with good degradation capability using our PROTAC-Model method. To further facilitate the analysis of PROTAC data, a new filtering strategy based on the E3 ligases is also added. PROTAC-DB 2.0 is available online at http://cadd.zju.edu.cn/protacdb/.

Subjects

Databases, Protein; Proteasome Endopeptidase Complex; Proteolysis; Proteasome Endopeptidase Complex/metabolism; Proteins/metabolism; Ubiquitin/metabolism; Ubiquitin-Protein Ligases/metabolism

6.

Cross-Modal Retrieval Between ¹³C NMR Spectra and Structures Based on Focused Libraries.

Sun, Hanyu; Xue, Xi; Liu, Xue; Hu, Hai-Yu; Deng, Yafeng; Wang, Xiaojian.

Anal Chem ; 96(15): 5763-5770, 2024 04 16.

Article in English | MEDLINE | ID: mdl-38564366

ABSTRACT

Library matching by comparing carbon-13 nuclear magnetic resonance (13C NMR) spectra with spectral data in the library is a crucial method for compound identification. In our previous paper, we introduced a deep contrastive learning system called CReSS, which used a library that contained more structures. However, CReSS has two limitations: there were no unknown structures in the library, and a redundant library reduces the structure-elucidation accuracy. Herein, we replaced the oversized traditional libraries with focused libraries containing a small number of molecules. A previously reported generative model, CMGNet, was used to generate focused libraries for CReSS. The combined model achieved a Top-10 accuracy of 54.03% when tested on 6,471 13C NMR spectra. In comparison, CReSS with a random reference structure library achieved an accuracy of only 9.17%. Furthermore, to expand the advantages of the focused libraries, we proposed SAmpRNN, which is a recurrent neural network (RNN). With the large focused library amplified by SAmpRNN, the structure-identification accuracy of the model increased in 70.0% of the 30 random example cases. In general, cross-modal retrieval between 13C NMR spectra and structures based on focused libraries (CFLS) achieved high accuracy and provided more accurate candidate structures than traditional libraries for compound identification.

Subjects

Magnetic Resonance Imaging; Magnetic Resonance Spectroscopy

7.

Genetic Algorithm-Based Receptor Ligand: A Genetic Algorithm-Guided Generative Model to Boost the Novelty and Drug-Likeness of Molecules in a Sampling Chemical Space.

Wang, Mingyang; Wu, Zhengjian; Wang, Jike; Weng, Gaoqi; Kang, Yu; Pan, Peichen; Li, Dan; Deng, Yafeng; Yao, Xiaojun; Bing, Zhitong; Hsieh, Chang-Yu; Hou, Tingjun.

J Chem Inf Model ; 64(4): 1213-1228, 2024 Feb 26.

Article in English | MEDLINE | ID: mdl-38302422

ABSTRACT

Deep learning-based de novo molecular design has recently gained significant attention. While numerous DL-based generative models have been successfully developed for designing novel compounds, the majority of the generated molecules lack sufficiently novel scaffolds or high drug-like profiles. The aforementioned issues may not be fully captured by commonly used metrics for the assessment of molecular generative models, such as novelty, diversity, and quantitative estimation of the drug-likeness score. To address these limitations, we proposed a genetic algorithm-guided generative model called GARel (genetic algorithm-based receptor-ligand interaction generator), a novel framework for training a DL-based generative model to produce drug-like molecules with novel scaffolds. To efficiently train the GARel model, we utilized dense net to update the parameters based on molecules with novel scaffolds and drug-like features. To demonstrate the capability of the GARel model, we used it to design inhibitors for three targets: AA2AR, EGFR, and SARS-Cov2. The results indicate that GARel-generated molecules feature more diverse and novel scaffolds and possess more desirable physicochemical properties and favorable docking scores. Compared with other generative models, GARel makes significant progress in balancing novelty and drug-likeness, providing a promising direction for the further development of DL-based de novo design methodology with potential impacts on drug discovery.

Subjects

Drug Design; RNA, Viral; Ligands; Algorithms; Drug Discovery

8.

DrugFlow: An AI-Driven One-Stop Platform for Innovative Drug Discovery.

Shen, Chao; Song, Jianfei; Hsieh, Chang-Yu; Cao, Dongsheng; Kang, Yu; Ye, Wenling; Wu, Zhenxing; Wang, Jike; Zhang, Odin; Zhang, Xujun; Zeng, Hao; Cai, Heng; Chen, Yu; Chen, Linkang; Luo, Hao; Zhao, Xinda; Jian, Tianye; Chen, Tong; Jiang, Dejun; Wang, Mingyang; Ye, Qing; Wu, Jialu; Du, Hongyan; Shi, Hui; Deng, Yafeng; Hou, Tingjun.

J Chem Inf Model ; 64(14): 5381-5391, 2024 Jul 22.

Article in English | MEDLINE | ID: mdl-38920405

ABSTRACT

Artificial intelligence (AI)-aided drug design has demonstrated unprecedented effects on modern drug discovery, but there is still an urgent need for user-friendly interfaces that bridge the gap between these sophisticated tools and scientists, particularly those who are less computer savvy. Herein, we present DrugFlow, an AI-driven one-stop platform that offers a clean, convenient, and cloud-based interface to streamline early drug discovery workflows. By seamlessly integrating a range of innovative AI algorithms, covering molecular docking, quantitative structure-activity relationship modeling, molecular generation, ADMET (absorption, distribution, metabolism, excretion and toxicity) prediction, and virtual screening, DrugFlow can offer effective AI solutions for almost all crucial stages in early drug discovery, including hit identification and hit/lead optimization. We hope that the platform can provide sufficiently valuable guidance to aid real-world drug design and discovery. The platform is available at https://drugflow.com.

Subjects

Artificial Intelligence; Drug Discovery; Drug Discovery/methods; Molecular Docking Simulation; Quantitative Structure-Activity Relationship; Algorithms; Drug Design; Software; Humans; Cloud Computing

9.

Assessing the performance of MM/PBSA and MM/GBSA methods. 10. Prediction reliability of binding affinities and binding poses for RNA-ligand complexes.

Jiang, Dejun; Du, Hongyan; Zhao, Huifeng; Deng, Yafeng; Wu, Zhenxing; Wang, Jike; Zeng, Yundian; Zhang, Haotian; Wang, Xiaorui; Wang, Ercheng; Hou, Tingjun; Hsieh, Chang-Yu.

Phys Chem Chem Phys ; 26(13): 10323-10335, 2024 Mar 27.

Article in English | MEDLINE | ID: mdl-38501198

ABSTRACT

Ribonucleic acid (RNA)-ligand interactions play a pivotal role in a wide spectrum of biological processes, ranging from protein biosynthesis to cellular reproduction. This recognition has prompted the broader acceptance of RNA as a viable candidate for drug targets. Delving into the atomic-scale understanding of RNA-ligand interactions holds paramount importance in unraveling intricate molecular mechanisms and further contributing to RNA-based drug discovery. Computational approaches, particularly molecular docking, offer an efficient way of predicting the interactions between RNA and small molecules. However, the accuracy and reliability of these predictions heavily depend on the performance of scoring functions (SFs). In contrast to the majority of SFs used in RNA-ligand docking, the end-point binding free energy calculation methods, such as molecular mechanics/generalized Born surface area (MM/GBSA) and molecular mechanics/Poisson Boltzmann surface area (MM/PBSA), stand as theoretically more rigorous approaches. Yet, the evaluation of their effectiveness in predicting both binding affinities and binding poses within RNA-ligand systems remains unexplored. This study first reported the performance of MM/PBSA and MM/GBSA with diverse solvation models, interior dielectric constants (εin) and force fields in the context of binding affinity prediction for 29 RNA-ligand complexes. MM/GBSA is based on short (5 ns) molecular dynamics (MD) simulations in an explicit solvent with the YIL force field; the GBGBn2 model with higher interior dielectric constant (εin = 12, 16 or 20) yields the best correlation (Rp = -0.513), which outperforms the best correlation (Rp = -0.317, rDock) offered by various docking programs. Then, the efficacy of MM/GBSA in identifying the near-native binding poses from the decoys was assessed based on 56 RNA-ligand complexes. However, it is evident that MM/GBSA has limitations in accurately predicting binding poses for RNA-ligand systems, particularly compared with notably proficient docking programs like rDock and PLANTS. The best top-1 success rate achieved by MM/GBSA rescoring is 39.3%, which falls below the best results given by docking programs (50%, PLANTS). This study represents the first evaluation of MM/PBSA and MM/GBSA for RNA-ligand systems and is expected to provide valuable insights into their successful application to RNA targets.

Subjects

Molecular Dynamics Simulation; RNA; Molecular Docking Simulation; Ligands; Reproducibility of Results; Protein Binding; Thermodynamics; Binding Sites
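
The top-1 success rates quoted in the abstract of entry 9 can be read as simple fractions of the 56 complexes. The sketch below shows one common way such a rate is computed. The 2.0 Å RMSD cutoff for a "near-native" pose is an assumed convention that the abstract does not state, and the hit counts of 22 and 28 are inferred from the reported percentages rather than taken from the paper.

```python
def top1_success_rate(top1_rmsds, cutoff=2.0):
    """Percentage of complexes whose top-ranked pose is near-native.

    top1_rmsds holds, per complex, the RMSD (in angstroms) between the
    best-scored pose and the native pose. The cutoff is an assumption.
    """
    hits = sum(1 for rmsd in top1_rmsds if rmsd <= cutoff)
    return 100.0 * hits / len(top1_rmsds)


# Synthetic RMSD values chosen only to reproduce the inferred hit counts.
rescoring = [1.0] * 22 + [5.0] * 34   # 22 of 56 complexes succeed
docking = [1.0] * 28 + [5.0] * 28     # 28 of 56 complexes succeed

print(round(top1_success_rate(rescoring), 1))
print(round(top1_success_rate(docking), 1))
```

With 56 complexes, 22 successes round to the 39.3% reported for MM/GBSA rescoring and 28 successes give the 50% reported for PLANTS.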

10.

Advances in the Application of Artificial Intelligence-Based Spectral Data Interpretation: A Perspective.

Xue, Xi; Sun, Hanyu; Yang, Minjian; Liu, Xue; Hu, Hai-Yu; Deng, Yafeng; Wang, Xiaojian.

Anal Chem ; 95(37): 13733-13745, 2023 Sep 19.

Article in English | MEDLINE | ID: mdl-37688541

ABSTRACT

The interpretation of spectral data, including mass, nuclear magnetic resonance, infrared, and ultraviolet-visible spectra, is critical for obtaining molecular structural information. The development of advanced sensing technology has multiplied the amount of available spectral data. Chemical experts must use basic principles corresponding to the spectral information generated by molecular fragments and functional groups. This is a time-consuming process that requires a solid professional knowledge base. In recent years, the rapid development of computer science and its applications in cheminformatics and the emergence of computer-aided expert systems have greatly reduced the difficulty in analyzing large quantities of data. For expert systems, however, the problem-solving strategy must be known in advance or extracted by human experts and translated into algorithms. Gratifyingly, the development of artificial intelligence (AI) methods has shown great promise for solving such problems. Traditional algorithms, including the latest neural network algorithms, have shown great potential for both extracting useful information and processing massive quantities of data. This Perspective highlights recent innovations covering all of the emerging AI-based spectral interpretation techniques. In addition, the main limitations and current obstacles are presented, and the corresponding directions for further research are proposed. Moreover, this Perspective gives the authors' personal outlook on the development and future applications of spectral interpretation.

11.

Conditional Molecular Generation Net Enables Automated Structure Elucidation Based on ¹³C NMR Spectra and Prior Knowledge.

Yao, Lin; Yang, Minjian; Song, Jianfei; Yang, Zhuo; Sun, Hanyu; Shi, Hui; Liu, Xue; Ji, Xiangyang; Deng, Yafeng; Wang, Xiaojian.

Anal Chem ; 95(12): 5393-5401, 2023 Mar 28.

Article in English | MEDLINE | ID: mdl-36926883

ABSTRACT

Structure elucidation of unknown compounds based on nuclear magnetic resonance (NMR) remains a challenging problem in both synthetic organic and natural product chemistry. Library matching has been an efficient method to assist structure elucidation. However, it is limited by the coverage of libraries. In addition, prior knowledge such as molecular fragments is neglected. To solve the problem, we propose a conditional molecular generation net (CMGNet) to allow input of multiple sources of information. CMGNet uses not only 13C NMR spectrum data as input but also molecular formulas and molecular fragments as input conditions. Our model applies large-scale pretraining for molecular understanding and fine-tuning on two NMR spectral data sets of different granularity levels to accommodate structure elucidation tasks. CMGNet generates structures based on 13C NMR data, molecular formula, and fragment information, with a recovery rate of 94.17% in the top 10 recommendations. In addition, the generative model performed well in the generation of various classes of compounds and in the structural revision task. CMGNet has a deep understanding of molecular connectivities from 13C NMR, molecular formula, and fragments, paving the way for a new paradigm of deep learning-assisted inverse problem-solving.

12.

From Black Boxes to Actionable Insights: A Perspective on Explainable Artificial Intelligence for Scientific Discovery.

Wu, Zhenxing; Chen, Jihong; Li, Yitong; Deng, Yafeng; Zhao, Haitao; Hsieh, Chang-Yu; Hou, Tingjun.

J Chem Inf Model ; 63(24): 7617-7627, 2023 Dec 25.

Article in English | MEDLINE | ID: mdl-38079566

ABSTRACT

The application of Explainable Artificial Intelligence (XAI) in the field of chemistry has garnered growing interest for its potential to justify the prediction of black-box machine learning models and provide actionable insights. We first survey a range of XAI techniques adapted for chemical applications and categorize them based on the technical details of each methodology. We then present a few case studies to illustrate the practical utility of XAI, such as identifying carcinogenic molecules and guiding molecular optimizations, in order to provide chemists with concrete examples of ways to take full advantage of XAI-augmented machine learning for chemistry. Despite the initial success of XAI in chemistry, we still face the challenges of developing more reliable explanations, assuring robustness against adversarial actions, and customizing the explanation for different applications and needs of the diverse scientific community. Finally, we discuss the emerging role of large language models like GPT in generating natural language explanations and the specific challenges associated with them. We advocate that addressing the aforementioned challenges and actively embracing new techniques may contribute to establishing machine learning as an indispensable technique for chemistry in this digital era.

Subjects

Artificial Intelligence; Machine Learning; Language

13.

CODD-Pred: A Web Server for Efficient Target Identification and Bioactivity Prediction of Small Molecules.

Yin, Xiaodan; Wang, Xiaorui; Li, Yuquan; Wang, Jike; Wang, Yuwei; Deng, Yafeng; Hou, Tingjun; Liu, Huanxiang; Luo, Pei; Yao, Xiaojun.

J Chem Inf Model ; 63(20): 6169-6176, 2023 10 23.

Article in English | MEDLINE | ID: mdl-37820365

ABSTRACT

Target identification and bioactivity prediction are critical steps in the drug discovery process. Here we introduce CODD-Pred (COmprehensive Drug Design Predictor), an online web server with well-curated data sets from the GOSTAR database, which is designed with a dual purpose of predicting potential protein drug targets and computing bioactivity values of small molecules. We first designed a double molecular graph perception (DMGP) framework for target prediction based on a large library of 646 498 small molecules interacting with 640 human targets. The framework achieved a top-5 accuracy of over 80% for hitting at least one target on both external validation sets. Additionally, its performance on the external validation set comprising 200 molecules surpassed that of four existing target prediction servers. Second, we collected 56 targets closely related to the occurrence and development of cancer, metabolic diseases, and inflammatory immune diseases and developed a multi-model self-validation activity prediction (MSAP) framework that enables accurate bioactivity quantification predictions for small-molecule ligands of these 56 targets. CODD-Pred is a handy tool for rapid evaluation and optimization of small molecules with specific target activity. CODD-Pred is freely accessible at http://codd.iddd.group/.

Subjects

Computers; Proteins; Humans; Proteins/chemistry; Drug Design; Drug Discovery; Databases, Factual

14.

Improved GNNs for Log D7.4 Prediction by Transferring Knowledge from Low-Fidelity Data.

Duan, Yan-Jing; Fu, Li; Zhang, Xiao-Chen; Long, Teng-Zhi; He, Yuan-Hang; Liu, Zhao-Qian; Lu, Ai-Ping; Deng, Ya-Feng; Hsieh, Chang-Yu; Hou, Ting-Jun; Cao, Dong-Sheng.

J Chem Inf Model ; 63(8): 2345-2359, 2023 04 24.

Article in English | MEDLINE | ID: mdl-37000044

ABSTRACT

The n-octanol/buffer solution distribution coefficient at pH = 7.4 (log D7.4) is an indicator of lipophilicity, and it influences a wide variety of absorption, distribution, metabolism, excretion, and toxicity (ADMET) properties and druggability of compounds. In log D7.4 prediction, graph neural networks (GNNs) can uncover subtle structure-property relationships (SPRs) by automatically extracting features from molecular graphs that facilitate the learning of SPRs, but their performances are often limited by the small size of available datasets. Herein, we present a transfer learning strategy called pretraining on computational data and then fine-tuning on experimental data (PCFE) to fully exploit the predictive potential of GNNs. PCFE works by pretraining a GNN model on 1.71 million computational log D data (low-fidelity data) and then fine-tuning it on 19,155 experimental log D7.4 data (high-fidelity data). The experiments for three GNN architectures (graph convolutional network (GCN), graph attention network (GAT), and Attentive FP) demonstrated the effectiveness of PCFE in improving GNNs for log D7.4 predictions. Moreover, the optimal PCFE-trained GNN model (cx-Attentive FP, test R² = 0.909) outperformed four excellent descriptor-based models (random forest (RF), gradient boosting (GB), support vector machine (SVM), and extreme gradient boosting (XGBoost)). The robustness of the cx-Attentive FP model was also confirmed by evaluating the models with different training data sizes and dataset splitting strategies. Therefore, we developed a webserver and defined the applicability domain for this model. The webserver (http://tools.scbdd.com/chemlogd/) provides free log D7.4 prediction services. In addition, the important descriptors for log D7.4 were detected by the Shapley additive explanations (SHAP) method, and the most relevant substructures of log D7.4 were identified by the attention mechanism. Finally, the matched molecular pair analysis (MMPA) was performed to summarize the contributions of common chemical substituents to log D7.4, including a variety of hydrocarbon groups, halogen groups, heteroatoms, and polar groups. In conclusion, we believe that the cx-Attentive FP model can serve as a reliable tool to predict log D7.4 and hope that pretraining on low-fidelity data can help GNNs make accurate predictions of other endpoints in drug discovery.

Subjects

Drug Discovery; Halogens; 1-Octanol; Learning; Neural Networks, Computer

15.

Yellow Carbon Dots for Fluorescent Water Sensing, Relative Humidity Sensing, and Anticounterfeiting Applications.

Deng, Yafeng; Huang, Shaoyun; Li, Jinli; Zhou, Yihua; Qian, Jun.

J Fluoresc ; 33(6): 2273-2280, 2023 Nov.

Article in English | MEDLINE | ID: mdl-37017894

ABSTRACT

Most fluorescent probes based on changes in the fluorescence color or intensity of carbon dots (CDs) are still used for detection in solution, but practical fluorescence detection applications require detection in the solid state. Therefore, a CDs-based fluorescence sensing device is designed in this paper, which can be used for water detection in liquid and solid states. Using oPD as a single precursor, yellow fluorescent CDs (y-CDs) were prepared by a hydrothermal method; their solvent-sensitive properties make them usable for water detection and anti-counterfeiting. First, y-CDs can be used to visually and intelligently detect the water content in ethanol. Second, they can be used to detect the relative humidity (RH) of the environment when combined with cellulose to form a fluorescent film. Finally, y-CDs can also be used as a fluorescent material for fluorescence anti-counterfeiting.

16.

ALipSol: An Attention-Driven Mixture-of-Experts Model for Lipophilicity and Solubility Prediction.

Wu, Jialu; Wang, Junmei; Wu, Zhenxing; Zhang, Shengyu; Deng, Yafeng; Kang, Yu; Cao, Dongsheng; Hsieh, Chang-Yu; Hou, Tingjun.

J Chem Inf Model ; 62(23): 5975-5987, 2022 Dec 12.

Article in English | MEDLINE | ID: mdl-36417544

ABSTRACT

Lipophilicity (logD) and aqueous solubility (logSw) play a central role in drug development. The accurate prediction of these properties remains to be solved due to data scarcity. Current methodologies neglect the intrinsic relationships between physicochemical properties and usually ignore the ionization effects. Here, we propose an attention-driven mixture-of-experts (MoE) model named ALipSol, which explicitly reproduces the hierarchy of task relationships. We adopt the principle of divide-and-conquer by breaking down the complex end point (logD or logSw) into simpler ones (acidic pKa, basic pKa, and logP) and allocating a specific expert network for each subproblem. Subsequently, we implement transfer learning to extract knowledge from related tasks, thus alleviating the dilemma of limited data. Additionally, we substitute the gating network with an attention mechanism to better capture the dynamic task relationships on a per-example basis. We adopt local fine-tuning and consensus prediction to further boost model performance. Extensive evaluation experiments verify the success of the ALipSol model, which achieves RMSE improvement of 8.04%, 2.49%, 8.57%, 12.8%, and 8.60% on the Lipop, ESOL, AqSolDB, external logD, and external logS data sets, respectively, compared with Attentive FP and the state-of-the-art in silico tools. In particular, our model yields more significant advantages (Welch's t-test) for small training data, implying its high robustness and generalizability. The interpretability analysis proves that the atom contributions learned by ALipSol are more reasonable compared with the vanilla Attentive FP, and the substitution effects in benzene derivatives agreed well with empirical constants, revealing the potential of our model to extract useful patterns from data and provide guidance for lead optimization.

Subjects

Water; Solubility; Water/chemistry
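
The abstract of entry 16 describes replacing a gating network with an attention mechanism that weights expert networks per example. The sketch below is a generic illustration of that idea using scaled dot-product attention over scalar expert outputs. It is not the ALipSol architecture: the query and key vectors, their dimensions, and the numeric values are hypothetical, and only the three expert names (acidic pKa, basic pKa, logP) come from the abstract.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_mixture(query, expert_keys, expert_outputs):
    """Combine expert outputs with per-example attention weights.

    query: feature vector of one molecule (hypothetical).
    expert_keys: one key vector per expert (hypothetical).
    expert_outputs: one scalar prediction per expert.
    """
    scale = math.sqrt(len(query))
    scores = [
        sum(q * k for q, k in zip(query, key)) / scale
        for key in expert_keys
    ]
    weights = softmax(scores)
    prediction = sum(w * o for w, o in zip(weights, expert_outputs))
    return prediction, weights


# Invented numbers for the three experts named in the abstract.
experts = ["acidic pKa", "basic pKa", "logP"]
query = [0.2, 1.0]
keys = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
outputs = [4.5, 9.1, 2.3]

prediction, weights = attention_mixture(query, keys, outputs)
for name, weight in zip(experts, weights):
    print(f"{name}: {weight:.3f}")
print(f"combined: {prediction:.3f}")
```

Because the weights depend on the query, a different molecule yields a different mix of experts, which is the per-example behavior the abstract attributes to the attention mechanism.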

17.

Cross-Modal Retrieval between ¹³C NMR Spectra and Structures for Compound Identification Using Deep Contrastive Learning.

Yang, Zhuo; Song, Jianfei; Yang, Minjian; Yao, Lin; Zhang, Jiahua; Shi, Hui; Ji, Xiangyang; Deng, Yafeng; Wang, Xiaojian.

Anal Chem ; 93(50): 16947-16955, 2021 12 21.

Article in English | MEDLINE | ID: mdl-34841854

ABSTRACT

Library matching using carbon-13 nuclear magnetic resonance (13C NMR) spectra has been a popular method adopted in compound identification systems. However, the usability of existing approaches has been restricted as enlarging a library containing both a chemical structure and spectrum is a costly and time-consuming process. Therefore, we propose a fundamentally different, novel approach to match 13C NMR spectra directly against a molecular structure library. We develop a cross-modal retrieval between spectrum and structure (CReSS) system using deep contrastive learning, which allows us to search a molecular structure library using the 13C NMR spectrum of a compound. In the test of searching 41,494 13C NMR spectra against a reference structure library containing 10.4 million compounds, CReSS reached a recall@10 accuracy of 91.64% and a processing speed of 0.114 s per query spectrum. When further incorporating a filter with a molecular weight tolerance of 5 Da, CReSS achieved a new remarkable recall@10 of 98.39%. Furthermore, CReSS has potential in detecting scaffolds of novel structures and demonstrates great performance for the task of structural revision. CReSS is built and developed to bridge the gap between 13C NMR spectra and structures and could be generally applicable in compound identification.

Subjects

Magnetic Resonance Spectroscopy
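
The recall@10 figures and the 5 Da molecular weight filter reported above can be made concrete with a toy retrieval loop. This is a minimal sketch, not the CReSS implementation: it assumes spectra and structures have already been embedded into a shared vector space, ranks by cosine similarity, and uses made-up two-dimensional embeddings and hypothetical names throughout.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def retrieve(query_vec, library, k=10, query_mw=None, mw_tol=5.0):
    """Return the ids of the k library entries most similar to the query.

    library holds (compound_id, embedding, molecular_weight) tuples.
    If query_mw is given, entries outside +/- mw_tol Da are dropped first.
    """
    candidates = [
        (cid, emb) for cid, emb, mw in library
        if query_mw is None or abs(mw - query_mw) <= mw_tol
    ]
    ranked = sorted(candidates, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [cid for cid, _ in ranked[:k]]


def recall_at_k(queries, library, k=10, use_mw=False):
    """Fraction of queries whose true compound appears in the top k results."""
    hits = 0
    for true_id, vec, mw in queries:
        top = retrieve(vec, library, k=k, query_mw=mw if use_mw else None)
        hits += true_id in top
    return hits / len(queries)


# Toy data (hypothetical): compound B is a close decoy with a very different mass.
library = [("A", [1.0, 0.0], 100.0), ("B", [0.9, 0.1], 300.0), ("C", [0.0, 1.0], 102.0)]
queries = [("A", [0.8, 0.2], 100.0)]
```

In this toy case the decoy B outranks the true compound A on similarity alone, and the mass filter removes it, which mirrors how a molecular weight tolerance can raise recall.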

18.

Comprehensive, Open-Source, and Automated Workflow for Multisite λ-Dynamics in Lead Optimization.

Hu, Renling; Zhang, Jintu; Kang, Yu; Wang, Zhe; Pan, Peichen; Deng, Yafeng; Hsieh, Chang-Yu; Hou, Tingjun.

J Chem Theory Comput ; 20(3): 1465-1478, 2024 Feb 13.

Article in English | MEDLINE | ID: mdl-38300792

ABSTRACT

Multisite λ-dynamics (MSLD) is a highly efficient binding free energy calculation method that samples multiple ligands in a single round by assigning different λ values to the alchemical part of each ligand. This method holds great promise for lead optimization (LO) in drug discovery. However, the complex data preparation and simulation process limits its widespread application in diverse protein-ligand systems. To address this challenge, we developed a comprehensive, open-source, and automated workflow for MSLD calculations based on the BLaDE dynamics engine. This workflow incorporates the Ligand Internal and Cartesian coordinate reconstruction-based alignment algorithm (LIC-align) and an optimized maximum common substructure (MCS) search algorithm to accurately generate MSLD multiple topologies with ideal perturbation patterns. Furthermore, our workflow is highly modularized, allowing straightforward integration and extension of various simulation techniques, and is highly accessible to nonexperts. This workflow was validated by calculating the relative binding free energies of large-scale congeneric ligands, many of which have large perturbing groups. The agreement between the calculations and experiments was excellent, with an average unsigned error of 1.08 ± 0.47 kcal/mol. More than 57.1% of the ligands had an error of less than 1.0 kcal/mol, and the perturbations of 6 targets were fully connected via the calculations, while those of 2 targets were connected via both calculations and experimental data. The Pearson correlation coefficient reached 0.88, indicating that the MSLD workflow provides accurate predictions that can guide lead optimization in drug discovery. We also examined the impact of single-site versus multisite perturbations, ligand grouping by perturbing group size, and the position of the anchor atom on the MSLD performance. By integrating our proposed LIC-align and optimized MCS search algorithm along with the coping strategies to handle challenging molecular substructures, our workflow can handle many realistic scenarios more reasonably than all previously published methods. Moreover, we observed that our MSLD workflow achieved accuracy similar to that of free energy perturbation (FEP) while improving computational efficiency by more than one order of magnitude. These findings provide valuable insights and strategies for further MSLD development, making MSLD a competitive tool for lead optimization.

Subjects

Molecular Dynamics Simulation, Proteins, Thermodynamics, Ligands, Workflow, Proteins/chemistry, Protein Binding
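
The accuracy metrics quoted in the abstract (average unsigned error, share of ligands within 1.0 kcal/mol, Pearson correlation) follow standard definitions, sketched below. The free energy values are invented for illustration and are not taken from the paper; the function names are likewise hypothetical.

```python
import math


def average_unsigned_error(pred, expt):
    """Mean absolute difference between predicted and experimental values."""
    return sum(abs(p - e) for p, e in zip(pred, expt)) / len(pred)


def fraction_within(pred, expt, threshold=1.0):
    """Share of ligands whose absolute error is below the threshold (kcal/mol)."""
    return sum(abs(p - e) < threshold for p, e in zip(pred, expt)) / len(pred)


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Made-up binding free energies in kcal/mol, for illustration only.
predicted = [-9.0, -8.0, -7.0]
experimental = [-8.5, -8.0, -7.5]

aue = average_unsigned_error(predicted, experimental)
within = fraction_within(predicted, experimental)
r = pearson_r(predicted, experimental)
```

A high correlation and a low average error measure different things: the toy predictions above rank the ligands perfectly while still being off by 0.5 kcal/mol at both ends.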

19.

Enhancing Generic Reaction Yield Prediction through Reaction Condition-Based Contrastive Learning.

Yin, Xiaodan; Hsieh, Chang-Yu; Wang, Xiaorui; Wu, Zhenxing; Ye, Qing; Bao, Honglei; Deng, Yafeng; Chen, Hongming; Luo, Pei; Liu, Huanxiang; Hou, Tingjun; Yao, Xiaojun.

Research (Wash D C) ; 7: 0292, 2024.

Article in English | MEDLINE | ID: mdl-38213662

ABSTRACT

Deep learning (DL)-driven efficient synthesis planning may profoundly transform the paradigm for designing novel pharmaceuticals and materials. However, the progress of many DL-assisted synthesis planning (DASP) algorithms has suffered from the lack of reliable automated pathway evaluation tools. As a critical metric for evaluating chemical reactions, accurate prediction of reaction yields helps improve the practicality of DASP algorithms in real-world scenarios. Currently, accurately predicting yields of interesting reactions still faces numerous challenges, mainly including the absence of high-quality generic reaction yield datasets and robust generic yield predictors. To compensate for the limitations of high-throughput yield datasets, we curated a generic reaction yield dataset containing 12 reaction categories and rich reaction condition information. Subsequently, by utilizing 2 pretraining tasks based on chemical reaction masked language modeling and contrastive learning, we proposed a powerful bidirectional encoder representations from transformers (BERT)-based reaction yield predictor named Egret. It achieved comparable or even superior performance to the best previous models on 4 benchmark datasets and established state-of-the-art performance on the newly curated dataset. We found that reaction-condition-based contrastive learning enhances the model's sensitivity to reaction conditions, and Egret is capable of capturing subtle differences between reactions involving identical reactants and products but different reaction conditions. Furthermore, we proposed a new scoring function that incorporated Egret into the evaluation of multistep synthesis routes. Test results showed that yield-incorporated scoring facilitated the prioritization of literature-supported high-yield reaction pathways for target molecules. In addition, through a meta-learning strategy, we further improved the reliability of the model's prediction for reaction types with limited data and lower data quality. Our results suggest that Egret holds the potential to become an essential component of the next-generation DASP tools.
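
The abstract does not specify the yield-incorporated scoring function, so the sketch below substitutes the simplest plausible stand-in: a route's overall yield is the product of its predicted step yields, and routes are ranked by that product. The route names and yield values are hypothetical, and in practice the per-step yields would come from a predictor such as Egret.

```python
from math import prod


def overall_yield(step_yields):
    """Overall yield of a linear route: the product of its step yields (fractions in 0..1)."""
    return prod(step_yields)


def rank_routes(routes):
    """Return route names ordered from highest to lowest overall yield."""
    return sorted(routes, key=lambda name: overall_yield(routes[name]), reverse=True)


# Hypothetical predicted step yields for two candidate routes to the same target.
routes = {
    "route_a": [0.90, 0.80],
    "route_b": [0.95, 0.50, 0.90],
}
ranking = rank_routes(routes)
```

Because yields multiply, one poor step dominates: route_b contains the single best step but still ranks below the shorter route_a.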

20.

OptADMET: a web-based tool for substructure modifications to improve ADMET properties of lead compounds.

Yi, Jiacai; Shi, Shaohua; Fu, Li; Yang, Ziyi; Nie, Pengfei; Lu, Aiping; Wu, Chengkun; Deng, Yafeng; Hsieh, Changyu; Zeng, Xiangxiang; Hou, Tingjun; Cao, Dongsheng.

Nat Protoc ; 19(4): 1105-1121, 2024 Apr.

Article in English | MEDLINE | ID: mdl-38263521

ABSTRACT

Lead optimization is a crucial step in the drug discovery process, which aims to design potential drug candidates from biologically active hits. During lead optimization, active hits undergo modifications to improve their absorption, distribution, metabolism, excretion and toxicity (ADMET) profiles. Medicinal chemists face key questions regarding which compound(s) should be synthesized next and how to balance multiple ADMET properties. Reliable transformation rules from multiple experimental analyses are critical to improve this decision-making process. We developed OptADMET ( https://cadd.nscc-tj.cn/deploy/optadmet/ ), an integrated web-based platform that provides chemical transformation rules for 32 ADMET properties and leverages prior experimental data for lead optimization. The multiproperty transformation rule database contains a total of 41,779 validated transformation rules generated from the analysis of 177,191 reliable experimental datasets. Additionally, 146,450 rules were generated by analyzing 239,194 molecular data predictions. OptADMET provides the ADMET profiles of all optimized molecules from the queried molecule and enables the prediction of desirable substructure transformations and subsequent validation of drug candidates. OptADMET is based on matched molecular pairs analysis derived from synthetic chemistry, thus providing improved practicality over other methods. OptADMET is designed for use by both experimental and computational scientists.

Subjects

Drug Discovery, Internet, Factual Databases
