Search | Biblioteca Virtual em Saúde Fiocruz

1.

ChemMORT: an automatic ADMET optimization platform using deep learning and multi-objective particle swarm optimization.

Yi, Jia-Cai; Yang, Zi-Yi; Zhao, Wen-Tao; Yang, Zhi-Jiang; Zhang, Xiao-Chen; Wu, Cheng-Kun; Lu, Ai-Ping; Cao, Dong-Sheng.

Brief Bioinform ; 25(2)2024 Jan 22.

Article in English | MEDLINE | ID: mdl-38385872

ABSTRACT

Drug discovery and development constitute a laborious and costly undertaking. The success of a drug hinges not only on good efficacy but also on acceptable absorption, distribution, metabolism, elimination, and toxicity (ADMET) properties. Overall, up to 50% of drug development failures have been attributed to undesirable ADMET profiles. As a multi-parameter objective, the optimization of ADMET properties is extremely challenging owing to the vast chemical space and limited human expert knowledge. In this study, a freely available platform called Chemical Molecular Optimization, Representation and Translation (ChemMORT) is developed for the optimization of multiple ADMET endpoints without the loss of potency (https://cadd.nscc-tj.cn/deploy/chemmort/). ChemMORT contains three modules: Simplified Molecular Input Line Entry System (SMILES) Encoder, Descriptor Decoder and Molecular Optimizer. The SMILES Encoder generates a molecular representation as a 512-dimensional vector, and the Descriptor Decoder translates this representation back to the corresponding molecular structure with high accuracy. Based on the reversible molecular representation and a particle swarm optimization strategy, the Molecular Optimizer can be used to effectively optimize undesirable ADMET properties without the loss of bioactivity, which essentially accomplishes inverse QSAR design. The constrained multi-objective optimization of a poly(ADP-ribose) polymerase-1 inhibitor is provided as a case study to explore the utility of ChemMORT.
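To make the particle swarm optimization idea concrete, the following is a minimal sketch of a generic swarm search over a continuous vector. It is not the ChemMORT implementation: the function names, the hyperparameters, the 4-dimensional search space (instead of the 512-dimensional representation), and the weighted-sum scalarization of two toy penalties are all assumptions made for illustration, and the toy penalties stand in for real ADMET and potency predictors.

```python
import random


def pso_minimize(objective, dim, n_particles=20, n_iters=100,
                 inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Generic particle swarm minimizer (illustrative, hypothetical defaults)."""
    rng = random.Random(seed)
    # Random starting positions, zero starting velocities.
    pos = [[rng.uniform(-1.0, 1.0) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Each particle remembers its own best position and value.
    best_pos = [p[:] for p in pos]
    best_val = [objective(p) for p in pos]
    # The swarm shares one global best.
    g = min(range(n_particles), key=lambda i: best_val[i])
    gbest_pos, gbest_val = best_pos[g][:], best_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity blends momentum, pull toward the personal best,
                # and pull toward the global best.
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (gbest_pos[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < gbest_val:
                    gbest_val, gbest_pos = val, pos[i][:]
    return gbest_pos, gbest_val


def toy_penalty(x):
    """Hypothetical stand-in for property predictors (lower is better)."""
    property_penalty = sum((xi - 0.5) ** 2 for xi in x)  # "undesirable property"
    drift_penalty = sum(xi ** 2 for xi in x)             # "stay near the lead"
    # Assumption: two objectives are combined by an equal-weight sum.
    return 0.5 * property_penalty + 0.5 * drift_penalty


best_vector, best_score = pso_minimize(toy_penalty, dim=4)
print(best_vector, best_score)
```

In a setting like the one the abstract describes, each candidate vector would be decoded back to a structure and scored by property models; here the score is a closed-form toy function so the sketch runs without any chemistry libraries.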

Subjects

Deep Learning , Humans , Drug Development , Drug Discovery , Poly(ADP-ribose) Polymerase Inhibitors

2.

ChemFH: an integrated tool for screening frequent false positives in chemical biology and drug discovery.

Shi, Shaohua; Fu, Li; Yi, Jiacai; Yang, Ziyi; Zhang, Xiaochen; Deng, Youchao; Wang, Wenxuan; Wu, Chengkun; Zhao, Wentao; Hou, Tingjun; Zeng, Xiangxiang; Lyu, Aiping; Cao, Dongsheng.

Nucleic Acids Res ; 52(W1): W439-W449, 2024 Jul 05.

Article in English | MEDLINE | ID: mdl-38783035

ABSTRACT

High-throughput screening rapidly tests an extensive array of chemical compounds to identify hit compounds for specific biological targets in drug discovery. However, false-positive results disrupt hit compound screening, wasting time and resources. To address this, we propose ChemFH, an integrated online platform facilitating rapid virtual evaluation of potential false positives, including colloidal aggregators, spectroscopic interference compounds, firefly luciferase inhibitors, chemically reactive compounds, promiscuous compounds, and other assay interferences. By leveraging a dataset containing 823 391 compounds, we constructed high-quality prediction models using multi-task directed message-passing network (DMPNN) architectures combined with uncertainty estimation, yielding an average AUC value of 0.91. Furthermore, ChemFH incorporated 1441 representative alert substructures derived from the collected data and ten commonly used frequent hitter screening rules. ChemFH was validated with an external set of 75 compounds. Subsequently, the virtual screening capability of ChemFH was successfully confirmed through its application to five virtual screening libraries. Furthermore, ChemFH underwent additional validation on two natural products and FDA-approved drugs, yielding reliable and accurate results. ChemFH is a comprehensive, reliable, and computationally efficient screening pipeline that facilitates the identification of true positive results in assays, contributing to enhanced efficiency and success rates in drug discovery. ChemFH is freely available via https://chemfh.scbdd.com/.

Subjects

Drug Discovery , High-Throughput Screening Assays , Software , Drug Discovery/methods , High-Throughput Screening Assays/methods , Drug Evaluation, Preclinical/methods , False Positive Reactions , Small Molecule Libraries/pharmacology , Small Molecule Libraries/chemistry , Humans

3.

ADMETlab 3.0: an updated comprehensive online ADMET prediction platform enhanced with broader coverage, improved performance, API functionality and decision support.

Fu, Li; Shi, Shaohua; Yi, Jiacai; Wang, Ningning; He, Yuanhang; Wu, Zhenxing; Peng, Jinfu; Deng, Youchao; Wang, Wenxuan; Wu, Chengkun; Lyu, Aiping; Zeng, Xiangxiang; Zhao, Wentao; Hou, Tingjun; Cao, Dongsheng.

Nucleic Acids Res ; 52(W1): W422-W431, 2024 Jul 05.

Article in English | MEDLINE | ID: mdl-38572755

ABSTRACT

ADMETlab 3.0 is the second updated version of the web server that provides a comprehensive and efficient platform for evaluating ADMET-related parameters as well as physicochemical properties and medicinal chemistry characteristics involved in the drug discovery process. This new release addresses the limitations of the previous version and offers broader coverage, improved performance, API functionality, and decision support. For supporting data and endpoints, this version includes 119 features, an increase of 31 compared to the previous version. The updated number of entries is 1.5 times larger than in the previous version, with over 400 000 entries. ADMETlab 3.0 incorporates a multi-task DMPNN architecture coupled with molecular descriptors, a method that not only guarantees calculation speed for each endpoint simultaneously, but also achieves superior performance in terms of accuracy and robustness. In addition, an API has been introduced to meet the growing demand for programmatic access to large amounts of data in ADMETlab 3.0. Moreover, this version includes uncertainty estimates in the prediction results, aiding in the confident selection of candidate compounds for further studies and experiments. ADMETlab 3.0 is publicly accessible without the need for registration at: https://admetlab3.scbdd.com.

Subjects

Drug Discovery , Internet , Software , Drug Discovery/methods , Humans , Pharmaceutical Preparations/chemistry , Pharmaceutical Preparations/metabolism

4.

Comprehensive assessment of nine target prediction web services: which should we choose for target fishing?

Ji, Kai-Yue; Liu, Chong; Liu, Zhao-Qian; Deng, Ya-Feng; Hou, Ting-Jun; Cao, Dong-Sheng.

Brief Bioinform ; 24(2)2023 03 19.

Article in English | MEDLINE | ID: mdl-36681902

ABSTRACT

Identification of potential targets for known bioactive compounds and novel synthetic analogs is of considerable significance. In silico target fishing (TF) has become an alternative strategy because of the expensive and laborious wet-lab experiments, explosive growth of bioactivity data and rapid development of high-throughput technologies. However, these TF methods are based on different algorithms, molecular representations and training datasets, which may lead to different results when predicting the same query molecules. This can be confusing for practitioners in practical applications. Therefore, this study systematically evaluated nine popular ligand-based TF methods based on target and ligand-target pair statistical strategies, which will help practitioners choose among multiple TF methods. The evaluation results showed that SwissTargetPrediction was the best method, producing the most reliable predictions while enriching more targets. The high-recall similarity ensemble approach (SEA) was able to find real targets for more compounds than the other TF methods. Therefore, SwissTargetPrediction and SEA can be considered as primary selection methods in future studies. In addition, the results showed that k = 5 was the optimal number of experimental candidate targets. Finally, a novel ensemble TF method based on consensus voting is proposed to improve the prediction performance. The precision of the ensemble TF method exceeds that of the individual TF methods, indicating that the ensemble TF method can more effectively identify real targets within a given top-k threshold. The results of this study can be used as a reference to guide practitioners in selecting the most effective methods in computational drug discovery.
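As an illustration of consensus voting within a top-k threshold, the sketch below counts how many methods place each target in their top k and ranks targets by that count. The abstract does not specify the voting scheme, so the one-vote-per-method rule, the alphabetical tie-break, the function name and the method and target labels are all hypothetical.

```python
from collections import Counter


def consensus_top_k(predictions, k=5):
    """Rank targets by how many methods list them in their top k.

    predictions maps a method name to its ranked list of target IDs.
    Assumption: one vote per method per target; ties broken alphabetically.
    """
    votes = Counter()
    for ranked_targets in predictions.values():
        # set() ensures a method cannot vote twice for the same target.
        votes.update(set(ranked_targets[:k]))
    ordered = sorted(votes, key=lambda target: (-votes[target], target))
    return ordered[:k]


# Hypothetical ranked outputs from three target-fishing methods.
example_predictions = {
    "method_a": ["T1", "T2", "T3"],
    "method_b": ["T2", "T4", "T1"],
    "method_c": ["T2", "T5", "T6"],
}
print(consensus_top_k(example_predictions, k=2))
```

With k = 2 in this toy input, only T2 appears in every method's top two, so it is ranked first; the remaining single-vote targets are ordered by name.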

Subjects

Algorithms , Ligands

5.

Graph deep learning enabled spatial domains identification for spatial transcriptomics.

Liu, Teng; Fang, Zhao-Yu; Li, Xin; Zhang, Li-Ning; Cao, Dong-Sheng; Yin, Ming-Zhu.

Brief Bioinform ; 24(3)2023 05 19.

Article in English | MEDLINE | ID: mdl-37080761

ABSTRACT

Advances in spatially resolved transcriptomics (ST) technologies help biologists comprehensively understand organ function and tissue microenvironment. Accurate spatial domain identification is the foundation for delineating genome heterogeneity and cellular interaction. Motivated by this perspective, a graph deep learning (GDL) based spatial clustering approach is constructed in this paper. First, the deep graph infomax module embedded with a residual gated graph convolutional neural network is leveraged to process the gene expression profiles and spatial positions in ST. Then, the Bayesian Gaussian mixture model is applied to the latent embeddings to generate spatial domains. Experiments show that the presented method is superior to other state-of-the-art GDL-enabled techniques on multiple ST datasets. The codes and dataset used in this manuscript are summarized at https://github.com/narutoten520/SCGDL.

Subjects

Deep Learning , Transcriptome , Bayes Theorem , Gene Expression Profiling , Cell Communication

6.

DKADE: a novel framework based on deep learning and knowledge graph for identifying adverse drug events and related medications.

Feng, Ze-Ying; Wu, Xue-Hong; Ma, Jun-Long; Li, Min; He, Ge-Fei; Cao, Dong-Sheng; Yang, Guo-Ping.

Brief Bioinform ; 24(4)2023 07 20.

Article in English | MEDLINE | ID: mdl-37344167

ABSTRACT

Adverse drug events (ADEs) are common in clinical practice and can cause significant harm to patients and increase resource use. Natural language processing (NLP) has been applied to automate ADE detection, but NLP systems become less adaptable when drug entities are missing or multiple medications are specified in clinical narratives. Additionally, no Chinese-language NLP system has been developed for ADE detection due to the complexity of Chinese semantics, despite >10 million cases of drug-related adverse events occurring annually in China. To address these challenges, we propose DKADE, a deep learning and knowledge graph-based framework for identifying ADEs. DKADE infers missing drug entities and evaluates their correlations with ADEs by combining medication orders and existing drug knowledge. Moreover, DKADE can automatically screen for new adverse drug reactions. Experimental results show that DKADE achieves an overall F1-score value of 91.13%. Furthermore, the adaptability of DKADE is validated using real-world external clinical data. In summary, DKADE is a powerful tool for studying drug safety and automating adverse event monitoring.

Subjects

Deep Learning , Drug-Related Side Effects and Adverse Reactions , Humans , Pattern Recognition, Automated , Semantics , Natural Language Processing

7.

Reducing false positive rate of docking-based virtual screening by active learning.

Wang, Lei; Shi, Shao-Hua; Li, Hui; Zeng, Xiang-Xiang; Liu, Su-You; Liu, Zhao-Qian; Deng, Ya-Feng; Lu, Ai-Ping; Hou, Ting-Jun; Cao, Dong-Sheng.

Brief Bioinform ; 24(1)2023 01 19.

Article in English | MEDLINE | ID: mdl-36642412

ABSTRACT

Machine learning-based scoring functions (MLSFs) have become a very favorable alternative to classical scoring functions because of their potential superior screening performance. However, the information of negative data used to construct MLSFs was rarely reported in the literature, and meanwhile the putative inactive molecules recorded in existing databases usually have obvious bias from active molecules. Here we proposed an easy-to-use method named AMLSF that combines active learning using negative molecular selection strategies with MLSF, which can iteratively improve the quality of inactive sets and thus reduce the false positive rate of virtual screening. We chose energy auxiliary terms learning as the MLSF and validated our method on eight targets in the diverse subset of DUD-E. For each target, we screened the IterBioScreen database by AMLSF and compared the screening results with those of the four control models. The results illustrate that the number of active molecules in the top 1000 molecules identified by AMLSF was significantly higher than those identified by the control models. In addition, the free energy calculation results for the top 10 molecules screened out by the AMLSF, null model and control models based on DUD-E also proved that more active molecules can be identified, and the false positive rate can be reduced by AMLSF.

Subjects

Proteins , Proteins/metabolism , Databases, Factual , Ligands , Molecular Docking Simulation , Protein Binding

8.

Comprehensive evaluation of deep and graph learning on drug-drug interactions prediction.

Lin, Xuan; Dai, Lichang; Zhou, Yafang; Yu, Zu-Guo; Zhang, Wen; Shi, Jian-Yu; Cao, Dong-Sheng; Zeng, Li; Chen, Haowen; Song, Bosheng; Yu, Philip S; Zeng, Xiangxiang.

Brief Bioinform ; 24(4)2023 07 20.

Article in English | MEDLINE | ID: mdl-37401373

ABSTRACT

Recent advances and achievements of artificial intelligence (AI) as well as deep and graph learning models have established their usefulness in biomedical applications, especially in drug-drug interactions (DDIs). DDIs refer to a change in the effect of one drug due to the presence of another drug in the human body, and they play an essential role in drug discovery and clinical research. DDI prediction through traditional clinical trials and experiments is an expensive and time-consuming process. To correctly apply advanced AI and deep learning, developers and users face various challenges such as the availability and encoding of data resources, and the design of computational methods. This review summarizes chemical structure based, network based, natural language processing based and hybrid methods, providing an updated and accessible guide for the broad research and development community with different domain knowledge. We introduce widely used molecular representations and describe the theoretical frameworks of graph neural network models for representing molecular structures. We present the advantages and disadvantages of deep and graph learning methods by performing comparative experiments. We discuss the potential technical challenges and highlight future directions of deep and graph learning models for accelerating DDI prediction.

Subjects

Artificial Intelligence , Neural Networks, Computer , Humans , Drug Interactions , Natural Language Processing , Drug Discovery

9.

Assembling spatial clustering framework for heterogeneous spatial transcriptomics data with GRAPHDeep.

Liu, Teng; Fang, Zhaoyu; Li, Xin; Zhang, Lining; Cao, Dong-Sheng; Li, Min; Yin, Mingzhu.

Bioinformatics ; 40(1)2024 01 02.

Article in English | MEDLINE | ID: mdl-38243703

ABSTRACT

MOTIVATION: Spatial clustering is essential and challenging in spatial transcriptomics data analysis for unraveling tissue microenvironment and biological function. Graph neural networks are promising for processing gene expression profiles and spatial location information in spatial transcriptomics to generate latent representations. However, choosing an appropriate graph deep learning module and graph neural network necessitates further exploration and investigation. RESULTS: In this article, we present GRAPHDeep to assemble a spatial clustering framework for heterogeneous spatial transcriptomics data. Through integrating 2 graph deep learning modules and 20 graph neural networks, the most appropriate combination is decided for each dataset. The constructed spatial clustering method is compared with state-of-the-art algorithms to demonstrate its effectiveness and superiority. The significant new findings include: (i) the number of genes or proteins of spatial omics data is quite crucial in spatial clustering algorithms; (ii) the variational graph autoencoder is more suitable for spatial clustering tasks than the deep graph infomax module; (iii) UniMP, SAGE, SuperGAT, GATv2, GCN, and TAG are the recommended graph neural networks for spatial clustering tasks; and (iv) the graph neural network used in the existing spatial clustering frameworks is not the best candidate. This study could be regarded as desirable guidance for choosing an appropriate graph neural network for spatial clustering. AVAILABILITY AND IMPLEMENTATION: The source code of GRAPHDeep is available at https://github.com/narutoten520/GRAPHDeep. The studied spatial omics data are available at https://zenodo.org/record/8141084.
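Choosing the most appropriate combination of module and network for each dataset amounts to an exhaustive search over all pairings. The sketch below shows that selection step only; it is not the GRAPHDeep code. The `score` callable, the abbreviations `DGI` and `VGAE`, and the numeric scores are hypothetical placeholders, and a real score would come from training each pairing and measuring clustering quality.

```python
from itertools import product


def best_combination(modules, networks, score):
    """Return the (module, network) pair with the highest score.

    score(module, network) is assumed to return a clustering-quality
    value where higher is better.
    """
    return max(product(modules, networks), key=lambda pair: score(*pair))


# Hypothetical scores for one dataset; values are invented for illustration.
toy_scores = {
    ("DGI", "GCN"): 0.41, ("DGI", "SAGE"): 0.44, ("DGI", "UniMP"): 0.47,
    ("VGAE", "GCN"): 0.52, ("VGAE", "SAGE"): 0.55, ("VGAE", "UniMP"): 0.50,
}

winner = best_combination(
    modules=["DGI", "VGAE"],
    networks=["GCN", "SAGE", "UniMP"],
    score=lambda module, network: toy_scores[(module, network)],
)
print(winner)
```

With 2 modules and 20 networks, as in the abstract, the same loop would evaluate 40 pairings per dataset.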

Subjects

Algorithms , Gene Expression Profiling , Neural Networks, Computer , Software , Cluster Analysis

10.

Comprehensive evaluation of molecule property prediction with ChatGPT.

Cai, Xibao; Lai, Houtim; Wang, Xing; Wang, Longyue; Liu, Wei; Wang, Yijun; Wang, Zixu; Cao, Dongsheng; Zeng, Xiangxiang.

Methods ; 222: 133-141, 2024 Feb.

Article in English | MEDLINE | ID: mdl-38242382

ABSTRACT

The versatility of ChatGPT in performing a diverse range of tasks has elicited considerable interest in its potential applications within professional fields. Taking drug discovery as a testbed, this paper provides a comprehensive evaluation of ChatGPT's ability in molecule property prediction. The study focuses on three aspects: 1) Effects of different prompt settings, where we investigate the impact of varying prompts on the prediction outcomes of ChatGPT; 2) Comprehensive evaluation of molecule property prediction, where we evaluate 53 ADMET-related endpoints; 3) Analysis of ChatGPT's potential and limitations, where we make comparisons with models tailored for molecule property prediction, thus gaining a more accurate understanding of ChatGPT's capabilities and limitations in this area. Through this evaluation, we find that 1) With appropriate prompt settings, ChatGPT can attain satisfactory prediction outcomes that are competitive with specialized models designed for those tasks. 2) Prompt settings significantly affect ChatGPT's performance. Among all prompt settings, the strategy for selecting few-shot examples has the greatest impact on results. Scaffold sampling greatly outperforms random sampling. 3) The capacity of ChatGPT to accomplish high-precision predictions is significantly influenced by the quality of examples provided, which may constrain its practical applicability in real-world scenarios. This work highlights ChatGPT's potential and limitations in molecule property prediction, which we hope can inspire future design and evaluation of Large Language Models within scientific domains.

Subjects

Drug Discovery , Research Design

11.

PROTAC-DB 2.0: an updated database of PROTACs.

Weng, Gaoqi; Cai, Xuanyan; Cao, Dongsheng; Du, Hongyan; Shen, Chao; Deng, Yafeng; He, Qiaojun; Yang, Bo; Li, Dan; Hou, Tingjun.

Nucleic Acids Res ; 51(D1): D1367-D1372, 2023 01 06.

Article in English | MEDLINE | ID: mdl-36300631

ABSTRACT

Proteolysis targeting chimeras (PROTACs), which harness the ubiquitin-proteasome system to selectively induce targeted protein degradation, represent an emerging therapeutic technology with the potential to modulate traditional undruggable targets. Over the past few years, this technology has moved from academia to industry and more than 10 PROTACs have been advanced into clinical trials. However, designing potent PROTACs with desirable drug-like properties still remains a great challenge. Here, we report an updated online database, PROTAC-DB 2.0, which is a repository of structural and experimental data about PROTACs. In this 2nd release, we expanded the number of PROTACs to 3270, which corresponds to a 96% expansion over the first version. Meanwhile, the numbers of warheads (small molecules targeting the proteins of interest), linkers, and E3 ligands (small molecules recruiting E3 ligases) have increased to over 360, 1500 and 80, respectively. In addition, given the importance and the limited number of the crystal target-PROTAC-E3 ternary complex structures, we provide the predicted ternary complex structures for PROTACs with good degradation capability using our PROTAC-Model method. To further facilitate the analysis of PROTAC data, a new filtering strategy based on the E3 ligases is also added. PROTAC-DB 2.0 is available online at http://cadd.zju.edu.cn/protacdb/.

Subjects

Databases, Protein , Proteasome Endopeptidase Complex , Proteolysis , Proteasome Endopeptidase Complex/metabolism , Proteins/metabolism , Ubiquitin/metabolism , Ubiquitin-Protein Ligases/metabolism

12.

fastDRH: a webserver to predict and analyze protein-ligand complexes based on molecular docking and MM/PB(GB)SA computation.

Wang, Zhe; Pan, Hong; Sun, Huiyong; Kang, Yu; Liu, Huanxiang; Cao, Dongsheng; Hou, Tingjun.

Brief Bioinform ; 23(5)2022 09 20.

Article in English | MEDLINE | ID: mdl-35580866

ABSTRACT

Predicting the native or near-native binding pose of a small molecule within a protein binding pocket is an extremely important task in structure-based drug design, especially in the hit-to-lead and lead optimization phases. In this study, fastDRH, a free, open-access web server, was developed to predict and analyze protein-ligand complex structures. In the fastDRH server, AutoDock Vina and AutoDock-GPU docking engines, structure-truncated MM/PB(GB)SA free energy calculation procedures and multiple-pose-based per-residue energy decomposition analysis are integrated into a user-friendly and multifunctional online platform. Benefiting from the modular architecture, users can flexibly use one or more of three features, including molecular docking, docking pose rescoring and hotspot residue prediction, and obtain the key information from a result analysis panel supported by 3Dmol.js and Apache ECharts. In terms of protein-ligand binding mode prediction, the integrated structure-truncated MM/PB(GB)SA rescoring procedures exhibit a success rate of >80% in the benchmark, which is much better than that of AutoDock Vina (~70%). For hotspot residue identification, our multiple-pose-based per-residue energy decomposition analysis strategy is a more reliable solution than one using only a single pose, and the performance of our solution has been experimentally validated in several drug discovery projects. To summarize, the fastDRH server is a useful tool for predicting the ligand binding mode and the hotspot residues of a protein for ligand binding. The fastDRH server is accessible free of charge at http://cadd.zju.edu.cn/fastdrh/.

Subjects

Proteins , Binding Sites , Entropy , Ligands , Molecular Docking Simulation , Protein Binding , Proteins/chemistry

13.

ABC-Net: a divide-and-conquer based deep learning architecture for SMILES recognition from molecular images.

Zhang, Xiao-Chen; Yi, Jia-Cai; Yang, Guo-Ping; Wu, Cheng-Kun; Hou, Ting-Jun; Cao, Dong-Sheng.

Brief Bioinform ; 23(2)2022 03 10.

Article in English | MEDLINE | ID: mdl-35212357

ABSTRACT

Structural information for chemical compounds is often described by pictorial images in most scientific documents, which cannot be easily understood and manipulated by computers. This dilemma makes optical chemical structure recognition (OCSR) an essential tool for automatically mining knowledge from an enormous amount of literature. However, existing OCSR methods fall far short of our expectations for realistic requirements due to their poor recovery accuracy. In this paper, we developed a deep neural network model named ABC-Net (Atom and Bond Center Network) to predict graph structures directly. Based on the divide-and-conquer principle, we propose to model an atom or a bond as a single point in the center. In this way, we can leverage a fully convolutional neural network (CNN) to generate a series of heat-maps to identify these points and predict relevant properties, such as atom types, atom charges, bond types and other properties. Thus, the molecular structure can be recovered by assembling the detected atoms and bonds. Our approach integrates all the detection and property prediction tasks into a single fully CNN, which is scalable and capable of processing molecular images quite efficiently. Experimental results demonstrate that our method could achieve a significant improvement in recognition performance compared with publicly available tools. The proposed method could be considered as a promising solution to OCSR problems and a starting point for the acquisition of molecular information in the literature.

Subjects

Deep Learning , Molecular Structure , Neural Networks, Computer

14.

BioNet: a large-scale and heterogeneous biological network model for interaction prediction with graph convolution.

Yang, Xi; Wang, Wei; Ma, Jing-Lun; Qiu, Yan-Long; Lu, Kai; Cao, Dong-Sheng; Wu, Cheng-Kun.

Brief Bioinform ; 23(1)2022 01 17.

Article in English | MEDLINE | ID: mdl-34849567

ABSTRACT

MOTIVATION: Understanding chemical-gene interactions (CGIs) is crucial for screening drugs. Wet experiments are usually costly and laborious, which limits relevant studies to a small scale. In contrast, computational studies enable efficient in-silico exploration. For the CGI prediction problem, a common method is to perform systematic analyses on a heterogeneous network involving various biomedical entities. Recently, graph neural networks have become popular in the field of relation prediction. However, the inherent heterogeneous complexity of biological interaction networks and the massive amount of data pose enormous challenges. This paper aims to develop a data-driven model that is capable of learning latent information from the interaction network and making correct predictions. RESULTS: We developed BioNet, a deep biological network model with a graph encoder-decoder architecture. The graph encoder utilizes graph convolution to learn latent information embedded in complex interactions among chemicals, genes, diseases and biological pathways. The learning process consists of two consecutive steps. The embedded information learnt by the encoder is then employed to make multi-type interaction predictions between chemicals and genes with a tensor decomposition decoder based on the RESCAL algorithm. BioNet includes 79 325 entities as nodes, and 34 005 501 relations as edges. To train such a massive deep graph model, BioNet introduces a parallel training algorithm utilizing multiple Graphics Processing Units (GPUs). The evaluation experiments indicated that BioNet exhibits outstanding prediction performance with a best area under the Receiver Operating Characteristic (ROC) curve of 0.952, which significantly surpasses state-of-the-art methods. For further validation, top predicted CGIs of cancer and COVID-19 by BioNet were verified by external curated data and published literature.
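For readers unfamiliar with RESCAL-style decoding, the standard RESCAL model scores a (head, relation, tail) triple with the bilinear form e_head^T R e_tail, where each relation type has its own square matrix R. The sketch below evaluates that form in plain Python. It illustrates the general RESCAL scoring rule only, not BioNet's decoder; the two-dimensional embeddings and the relation matrix are made-up values, and any further transformation of the score is left out.

```python
def rescal_score(head, relation, tail):
    """Bilinear RESCAL score: head^T * relation * tail.

    head and tail are embedding vectors (lists of floats);
    relation is a square matrix (list of rows) for one relation type.
    """
    # First project the tail embedding through the relation matrix ...
    projected = [sum(r * t for r, t in zip(row, tail)) for row in relation]
    # ... then take the dot product with the head embedding.
    return sum(h * p for h, p in zip(head, projected))


# Hypothetical 2-dimensional embeddings for a chemical and a gene.
chemical = [1.0, 0.0]
gene = [0.0, 1.0]
# Hypothetical matrix for one interaction type; it is not symmetric, so the
# score depends on which entity is the head and which is the tail.
interaction = [[0.0, 2.0],
               [3.0, 0.0]]

print(rescal_score(chemical, interaction, gene))
print(rescal_score(gene, interaction, chemical))
```

Because each relation type gets its own matrix, the same pair of entity embeddings can receive different scores for different interaction types, which is what makes the form suitable for multi-type prediction.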

Subjects

Computational Biology , Computer Simulation , Models, Biological , Neural Networks, Computer

15.

Comprehensive assessment of deep generative architectures for de novo drug design.

Wang, Mingyang; Sun, Huiyong; Wang, Jike; Pang, Jinping; Chai, Xin; Xu, Lei; Li, Honglin; Cao, Dongsheng; Hou, Tingjun.

Brief Bioinform ; 23(1)2022 01 17.

Article in English | MEDLINE | ID: mdl-34929743

ABSTRACT

Recently, deep learning (DL)-based de novo drug design has become a new trend in pharmaceutical research, and numerous DL-based methods have been developed for the generation of novel compounds with desired properties. However, a comprehensive understanding of the advantages and disadvantages of these methods is still lacking. In this study, the performances of different generative models were evaluated by analyzing the properties of the generated molecules in different scenarios, such as goal-directed (rediscovery, optimization and scaffold hopping of active compounds) and target-specific (generation of novel compounds for a given target) tasks. Overall, the DL-based models have significant advantages over the baseline models built by the traditional methods in learning the physicochemical property distributions of the training sets and may be more suitable for target-specific tasks. However, neither the baselines nor the DL-based generative models can fully exploit the scaffolds of the training sets, and the molecules generated by the DL-based methods even have lower scaffold diversity than those generated by the traditional models. Moreover, our assessment illustrates that the DL-based methods do not exhibit obvious advantages over the genetic algorithm-based baselines in goal-directed tasks. We believe that our study provides valuable guidance for the effective use of generative models in de novo drug design.

Subjects

Drug Design , Drug Discovery/methods , Algorithms , Deep Learning

16.

Knowledge-based BERT: a method to extract molecular features like computational chemists.

Wu, Zhenxing; Jiang, Dejun; Wang, Jike; Zhang, Xujun; Du, Hongyan; Pan, Lurong; Hsieh, Chang-Yu; Cao, Dongsheng; Hou, Tingjun.

Brief Bioinform ; 23(3)2022 05 13.

Article in English | MEDLINE | ID: mdl-35438145

ABSTRACT

Molecular property prediction models based on machine learning algorithms have become important tools to triage unpromising lead molecules in the early stages of drug discovery. Compared with the mainstream descriptor- and graph-based methods for molecular property predictions, SMILES-based methods can directly extract molecular features from SMILES without human expert knowledge, but they require more powerful algorithms for feature extraction and a larger amount of data for training, which makes SMILES-based methods less popular. Here, we show the great potential of pre-training in promoting the predictions of important pharmaceutical properties. By utilizing three pre-training tasks based on atom feature prediction, molecular feature prediction and contrastive learning, a new pre-training method, K-BERT, which can extract chemical information from SMILES like chemists, was developed. The calculation results on 15 pharmaceutical datasets show that K-BERT outperforms well-established descriptor-based (XGBoost) and graph-based (Attentive FP and HRGCN+) models. In addition, we found that the contrastive learning pre-training task enables K-BERT to 'understand' SMILES not limited to canonical SMILES. Moreover, the general fingerprints K-BERT-FP generated by K-BERT exhibit predictive power comparable to MACCS on 15 pharmaceutical datasets and can also capture molecular size and chirality information that traditional binary fingerprints cannot capture. Our results illustrate the great potential of K-BERT in the practical applications of molecular property predictions in drug discovery.

Subjects

Algorithms , Machine Learning , Humans , Knowledge Bases , Pharmaceutical Preparations , Research Design

17.

Out-of-the-box deep learning prediction of quantum-mechanical partial charges by graph representation and transfer learning.

Jiang, Dejun; Sun, Huiyong; Wang, Jike; Hsieh, Chang-Yu; Li, Yuquan; Wu, Zhenxing; Cao, Dongsheng; Wu, Jian; Hou, Tingjun.

Brief Bioinform ; 23(2)2022 03 10.

Article in English | MEDLINE | ID: mdl-35062020

ABSTRACT

Accurate prediction of atomic partial charges with high-level quantum mechanics (QM) methods suffers from high computational cost. Numerous feature-engineered machine learning (ML)-based predictors with favorable computability and reliability have been developed as alternatives. However, extensive expert effort was needed for feature engineering of the atomic chemical environment, which may consequently introduce domain bias. In this study, SuperAtomicCharge, a data-driven deep graph learning framework, was proposed to predict three important types of partial charges (i.e. RESP, DDEC4 and DDEC78) derived from high-level QM calculations based on the structures of molecules. SuperAtomicCharge was designed to simultaneously exploit the 2D and 3D structural information of molecules, which proved to be an effective way to improve the prediction accuracy of the model. Moreover, a simple transfer learning strategy and a multitask learning strategy based on self-supervised descriptors were also employed to further improve the prediction accuracy of the proposed model. Compared with the latest baselines, including one GNN-based predictor and two ML-based predictors, SuperAtomicCharge showed better performance on all three external test sets and had better usability and portability. Furthermore, the QM partial charges of new molecules predicted by SuperAtomicCharge can be efficiently used in drug design applications such as structure-based virtual screening, where the predicted RESP and DDEC4 charges of new molecules showed more robust scoring and screening power than the commonly used partial charges. Finally, two tools, an online server (http://cadd.zju.edu.cn/deepchargepredictor) and the source code command lines (https://github.com/zjujdj/SuperAtomicCharge), were developed for easy access to the SuperAtomicCharge services.

Subjects

Deep Learning; Drug Design; Machine Learning; Reproducibility of Results; Software

18.

Predicting Elimination of Small-Molecule Drug Half-Life in Pharmacokinetics Using Ensemble and Consensus Machine Learning Methods.

Fan, Jianing; Shi, Shaohua; Xiang, Hong; Fu, Li; Duan, Yanjing; Cao, Dongsheng; Lu, Hongwei.

J Chem Inf Model ; 64(8): 3080-3092, 2024 Apr 22.

Article in English | MEDLINE | ID: mdl-38563433

ABSTRACT

Half-life is a significant pharmacokinetic parameter included in the excretion phase of absorption, distribution, metabolism, and excretion. It is one of the key factors for the successful marketing of drug candidates. Therefore, predicting half-life is of great significance in drug design. In this study, we employed eXtreme Gradient Boosting (XGBoost), random forest (RF), gradient boosting machine (GBM), and support vector machine (SVM) to build quantitative structure-activity relationship (QSAR) models on 3512 compounds, evaluated model performance using root-mean-square error (RMSE), R², and mean absolute error (MAE) metrics, and interpreted features with SHapley Additive exPlanations (SHAP). Furthermore, we developed consensus models by integrating the four individual models and validated their performance using a Y-randomization test and applicability domain analysis. Finally, matched molecular pair analysis was used to extract the transformation rules. Our results revealed that XGBoost outperformed the other individual models (RMSE = 0.176, R² = 0.845, MAE = 0.141). The consensus model integrating all four models further enhanced prediction performance (RMSE = 0.172, R² = 0.856, MAE = 0.138). We evaluated the reliability, robustness, and generalization ability via the Y-randomization test and applicability domain analysis. Meanwhile, we utilized SHAP to interpret features and employed matched molecular pair analysis to extract chemical transformation rules that provide suggestions for optimizing drug structure. In conclusion, we believe that the consensus model developed in this study serves as a reliable tool to evaluate half-life in drug discovery, and the chemical transformation rules derived in this study could provide valuable suggestions in drug discovery.
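
The three regression metrics named in this abstract, and the idea of a consensus over individual models, can be illustrated with a minimal sketch. The abstract does not say how the four models are integrated, so the unweighted mean used here is an assumption; the function names and the toy numbers are hypothetical and are not taken from the paper.

```python
import math

def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def consensus_prediction(per_model_predictions):
    """Average the predictions of several models, compound by compound.

    Assumption: an unweighted mean. The paper may combine models differently.
    """
    n_models = len(per_model_predictions)
    return [sum(values) / n_models for values in zip(*per_model_predictions)]

# Toy data (hypothetical): two models that err in opposite directions.
y_true = [1.0, 2.0, 3.0]
model_a = [1.0, 2.0, 4.0]
model_b = [1.0, 2.0, 2.0]
consensus = consensus_prediction([model_a, model_b])

print("model A  :", rmse(y_true, model_a), mae(y_true, model_a), r2(y_true, model_a))
print("consensus:", rmse(y_true, consensus), mae(y_true, consensus), r2(y_true, consensus))
```

In this toy case the two models' errors cancel, which is the intuition behind consensus modeling; on real data the gain is usually smaller, as the reported RMSE values (0.176 versus 0.172) suggest.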

Subjects

Machine Learning; Quantitative Structure-Activity Relationship; Half-Life; Pharmaceutical Preparations/chemistry; Pharmaceutical Preparations/metabolism; Small Molecule Libraries/chemistry; Pharmacokinetics; Support Vector Machine

19.

Comprehensive Review of Drug-Drug Interaction Prediction Based on Machine Learning: Current Status, Challenges, and Opportunities.

Wang, Ning-Ning; Zhu, Bei; Li, Xin-Liang; Liu, Shao; Shi, Jian-Yu; Cao, Dong-Sheng.

J Chem Inf Model ; 64(1): 96-109, 2024 Jan 08.

Article in English | MEDLINE | ID: mdl-38132638

ABSTRACT

Detecting drug-drug interactions (DDIs) is an essential step in drug development and drug administration. Given the shortcomings of current experimental methods, the machine learning (ML) approach has become a reliable alternative, attracting extensive attention from the academic and industrial fields. With the rapid development of computational science and the growing popularity of cross-disciplinary research, a large number of DDI prediction studies based on ML methods have been published in recent years. To give an insight into the current situation and future direction of DDI prediction research, we systematically review these studies from three aspects: (1) the classic DDI databases, mainly including databases of drugs, side effects, and DDI information; (2) commonly used drug attributes, which focus on chemical, biological, and phenotypic attributes for representing drugs; (3) popular ML approaches, such as shallow learning-based, deep learning-based, recommender system-based, and knowledge graph-based methods for DDI detection. For each section, related studies are described, summarized, and compared. In the end, we summarize the research status of DDI prediction based on ML methods and point out the existing issues, future challenges, potential opportunities, and subsequent research directions.

Subjects

Knowledge Bases; Machine Learning; Drug Interactions; Pharmaceutical Preparations; Databases, Factual

20.

Enhancing Multi-species Liver Microsomal Stability Prediction through Artificial Intelligence.

Long, Teng-Zhi; Jiang, De-Jun; Shi, Shao-Hua; Deng, You-Chao; Wang, Wen-Xuan; Cao, Dong-Sheng.

J Chem Inf Model ; 64(8): 3222-3236, 2024 Apr 22.

Article in English | MEDLINE | ID: mdl-38498003

ABSTRACT

Liver microsomal stability, a crucial aspect of metabolic stability, significantly impacts practical drug discovery. However, current models for predicting liver microsomal stability are based on limited molecular information from a single species. To address this limitation, we constructed the largest public database of compounds from three common species: human, rat, and mouse. Subsequently, we developed a series of classification models using both traditional descriptor-based and classic graph-based machine learning (ML) algorithms. Remarkably, the best-performing models for the three species achieved Matthews correlation coefficients (MCCs) of 0.616, 0.603, and 0.574, respectively, on the test set. Furthermore, through the construction of consensus models based on these individual models, we have demonstrated their superior predictive performance in comparison with the existing models of the same type. To explore the similarities and differences in the properties of liver microsomal stability among multispecies molecules, we conducted preliminary interpretative explorations using the Shapley additive explanations (SHAP) and atom heatmap approaches for the models and misclassified molecules. Additionally, we further investigated representative structural modifications and substructures that decrease the liver microsomal stability in different species using the matched molecule pair analysis (MMPA) method and substructure extraction techniques. The established prediction models, along with insightful interpretation information regarding liver microsomal stability, will significantly contribute to enhancing the efficiency of exploring practical drugs for development.
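
For readers unfamiliar with the Matthews correlation coefficient (MCC) reported in this abstract, the following is a minimal sketch of how it is computed from binary labels. The label encoding (1 for stable, 0 for unstable), the function name `mcc`, and the toy data are assumptions made for illustration only; returning 0.0 when the denominator vanishes is a common convention, not something stated in the paper.

```python
import math

def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (1 = positive, 0 = negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention (assumption): report 0.0 when any marginal count is zero.
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator

# Toy labels (hypothetical): 1 = stable, 0 = unstable.
labels = [1, 1, 0, 0]
print(mcc(labels, [1, 1, 0, 0]))  # perfect agreement
print(mcc(labels, [1, 0, 1, 0]))  # no better than chance
print(mcc(labels, [0, 0, 1, 1]))  # perfect disagreement
```

MCC ranges from -1 to 1 and stays informative when the stable and unstable classes are imbalanced, which is one common reason to prefer it over plain accuracy for this kind of classification task.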

Subjects

Artificial Intelligence; Microsomes, Liver; Microsomes, Liver/metabolism; Animals; Mice; Rats; Humans; Machine Learning; Drug Discovery/methods; Pharmaceutical Preparations/metabolism; Pharmaceutical Preparations/chemistry
