Improved GNNs for Logâ¯<i>D</i><sub>7.4</sub> Prediction by Transferring Knowledge from Low-Fidelity Data.

Duan, Yan-Jing; Fu, Li; Zhang, Xiao-Chen; Long, Teng-Zhi; He, Yuan-Hang; Liu, Zhao-Qian; Lu, Ai-Ping; Deng, Ya-Feng; Hsieh, Chang-Yu; Hou, Ting-Jun; Cao, Dong-Sheng

Affiliation

Duan YJ; Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China.
Fu L; Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China.
Zhang XC; School of Information Technology, Shangqiu Normal University, Shangqiu 476000, Henan, P. R. China.
Long TZ; Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China.
He YH; Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China.
Liu ZQ; Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China.
Lu AP; Institute for Advancing Translational Medicine in Bone & Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong SAR 999077, P. R. China.
Deng YF; CarbonSilicon AI Technology Co., Ltd., Hangzhou 310018, Zhejiang, P. R. China.
Hsieh CY; Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China.
Hou TJ; Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China.
Cao DS; Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P. R. China.

J Chem Inf Model; 63(8): 2345-2359, 2023 Apr 24.

Article in English | MEDLINE | ID: mdl-37000044

ABSTRACT

The n-octanol/buffer distribution coefficient at pH 7.4 (log D7.4) is an indicator of lipophilicity; it influences a wide range of absorption, distribution, metabolism, excretion, and toxicity (ADMET) properties and the druggability of compounds. For log D7.4 prediction, graph neural networks (GNNs) can uncover subtle structure-property relationships (SPRs) by automatically extracting features from molecular graphs, but their performance is often limited by the small size of the available datasets. Herein, we present a transfer learning strategy, pretraining on computational data and then fine-tuning on experimental data (PCFE), to fully exploit the predictive potential of GNNs. PCFE pretrains a GNN model on 1.71 million computationally derived log D values (low-fidelity data) and then fine-tunes it on 19,155 experimental log D7.4 values (high-fidelity data). Experiments with three GNN architectures (graph convolutional network (GCN), graph attention network (GAT), and Attentive FP) demonstrated that PCFE improves GNN predictions of log D7.4. Moreover, the best PCFE-trained GNN model (cx-Attentive FP, test-set R² = 0.909) outperformed four strong descriptor-based models (random forest (RF), gradient boosting (GB), support vector machine (SVM), and extreme gradient boosting (XGBoost)). The robustness of the cx-Attentive FP model was also confirmed by evaluating models trained with different training set sizes and dataset splitting strategies. Based on this model, we developed a webserver and defined its applicability domain. The webserver (http://tools.scbdd.com/chemlogd/) provides free log D7.4 prediction services. In addition, the descriptors most important for log D7.4 were identified with the Shapley additive explanations (SHAP) method, and the substructures most relevant to log D7.4 were identified through the attention mechanism.
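The core PCFE idea (pretrain on abundant low-fidelity labels, then fine-tune on scarce high-fidelity labels) can be illustrated with a toy sketch. It is not the authors' implementation. Assumptions: a linear regressor on hypothetical 3-feature vectors stands in for the GNN; all data are synthetic, with the "computational" labels generated by a deliberately biased version of the "experimental" rule; the data sizes, learning rate, and epoch counts are arbitrary choices. The helper names (`train`, `make_data`, `r2_score`) are hypothetical.

```python
import random

# Toy stand-in for PCFE (pretrain on computational data, fine-tune on experimental data).
# Assumptions: a linear regressor on hypothetical 3-feature vectors replaces the GNN,
# and all data below are synthetic; nothing here reproduces the paper's models or data.


def predict(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train(data, w, b, lr=0.1, epochs=100):
    """Full-batch gradient descent on mean squared error, starting from (w, b)."""
    w = list(w)
    n = len(data)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in data:
            err = predict(w, b, x) - y
            for j, xj in enumerate(x):
                grad_w[j] += 2.0 * err * xj / n
            grad_b += 2.0 * err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def r2_score(w, b, data):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    ys = [y for _, y in data]
    y_mean = sum(ys) / len(ys)
    ss_res = sum((y - predict(w, b, x)) ** 2 for x, y in data)
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def make_data(rng, n, coef, bias, noise):
    """Synthetic (features, label) pairs from a linear rule plus Gaussian noise."""
    data = []
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0) for _ in coef]
        y = sum(c * xi for c, xi in zip(coef, x)) + bias + rng.gauss(0.0, noise)
        data.append((x, y))
    return data


rng = random.Random(0)
HIGH_COEF, HIGH_BIAS = [1.0, -0.5, 0.3], 0.2  # hypothetical "experimental" rule
LOW_COEF, LOW_BIAS = [0.9, -0.4, 0.3], 0.0    # biased "computational" approximation

low_fidelity = make_data(rng, 500, LOW_COEF, LOW_BIAS, noise=0.2)  # abundant
high_train = make_data(rng, 20, HIGH_COEF, HIGH_BIAS, noise=0.1)   # scarce
high_test = make_data(rng, 200, HIGH_COEF, HIGH_BIAS, noise=0.1)

# Baseline: train on the small high-fidelity set from scratch (zero initialization).
w_s, b_s = train(high_train, [0.0, 0.0, 0.0], 0.0, epochs=10)

# PCFE-style: pretrain on low-fidelity data, then fine-tune with the same small budget.
w_pre, b_pre = train(low_fidelity, [0.0, 0.0, 0.0], 0.0, epochs=100)
w_ft, b_ft = train(high_train, w_pre, b_pre, epochs=10)

r2_scratch = r2_score(w_s, b_s, high_test)
r2_pcfe = r2_score(w_ft, b_ft, high_test)
print(f"Test R2, scratch: {r2_scratch:.3f}  pretrain + fine-tune: {r2_pcfe:.3f}")
```

The fine-tuning budget is deliberately short so the effect of initialization is visible: on this convex linear problem, enough epochs would drive both runs to the same least-squares solution. The toy therefore shows only the head start that a low-fidelity prior provides, not the learned-representation transfer that deep GNNs can benefit from.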
Finally, matched molecular pair analysis (MMPA) was performed to summarize the contributions of common chemical substituents to log D7.4, including a variety of hydrocarbon groups, halogen groups, heteroatoms, and polar groups. In conclusion, we believe that the cx-Attentive FP model can serve as a reliable tool for predicting log D7.4, and we hope that pretraining on low-fidelity data can help GNNs make accurate predictions for other endpoints in drug discovery.
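The MMPA step summarizes how a substituent swap on an otherwise identical scaffold shifts log D7.4. A minimal sketch of that aggregation follows, under stated assumptions. Molecules are taken as already fragmented into a shared core plus one variable substituent; in practice this fragmentation is done with a cheminformatics toolkit such as RDKit's rdMMPA module. The `records` values are invented for illustration and are not data from the paper. `transformation_effects` is a hypothetical helper name.

```python
from collections import defaultdict
from itertools import permutations
from statistics import mean

# Hypothetical, pre-fragmented records: (core, substituent, log D7.4).
# Values are invented for illustration only; they are not data from the paper.
records = [
    ("core_A", "H", 1.2),
    ("core_A", "Cl", 1.9),
    ("core_A", "OH", 0.6),
    ("core_B", "H", 2.0),
    ("core_B", "Cl", 2.6),
    ("core_B", "OH", 1.5),
    ("core_C", "H", 0.4),
    ("core_C", "Cl", 1.1),
]


def transformation_effects(records):
    """Mean change in log D7.4 for each substituent swap (from -> to) on a shared core.

    Returns {(from_sub, to_sub): (mean_delta, number_of_pairs)}.
    """
    by_core = defaultdict(dict)
    for core, sub, logd in records:
        by_core[core][sub] = logd
    deltas = defaultdict(list)
    for subs in by_core.values():
        # Every ordered pair of molecules sharing a core forms a matched molecular pair.
        for a, b in permutations(subs, 2):
            deltas[(a, b)].append(subs[b] - subs[a])
    return {t: (mean(d), len(d)) for t, d in deltas.items()}


if __name__ == "__main__":
    effects = transformation_effects(records)
    for (src, dst), (delta, n) in sorted(effects.items()):
        print(f"{src} -> {dst}: mean delta log D7.4 = {delta:+.2f} (n = {n})")
```

Averaging over many pairs with different cores is what lets MMPA attribute a typical log D7.4 shift to the substituent itself rather than to any single scaffold. Reporting the pair count alongside each mean shows how much evidence supports it.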

Subjects

Drug Discovery; Halogens; 1-Octanol; Learning; Neural Networks, Computer

Full text: 1 | Database: MEDLINE | Main subject: Drug Discovery / Halogens | Language: English | Year of publication: 2023 | Document type: Article