MG-BERT: leveraging unsupervised atomic representation learning for molecular property prediction.
Zhang, Xiao-Chen; Wu, Cheng-Kun; Yang, Zhi-Jiang; Wu, Zhen-Xing; Yi, Jia-Cai; Hsieh, Chang-Yu; Hou, Ting-Jun; Cao, Dong-Sheng.
Affiliation
  • Zhang XC; State Key Laboratory of High-Performance Computing, School of Computer Science, National University of Defense Technology, China.
  • Wu CK; State Key Laboratory of High-Performance Computing, School of Computer Science, National University of Defense Technology, China.
  • Yang ZJ; Xiangya School of Pharmaceutical Sciences, Central South University, China.
  • Wu ZX; College of Pharmaceutical Sciences, Zhejiang University, China.
  • Yi JC; State Key Laboratory of High-Performance Computing, School of Computer Science, National University of Defense Technology, China.
  • Hsieh CY; Tencent Quantum Laboratory, China.
  • Hou TJ; College of Pharmaceutical Sciences, Zhejiang University, China.
  • Cao DS; Xiangya School of Pharmaceutical Sciences, Central South University, China.
Brief Bioinform; 22(6), 2021 Nov 05.
Article in En | MEDLINE | ID: mdl-33951729
ABSTRACT
MOTIVATION:

Accurate and efficient prediction of molecular properties is one of the fundamental issues in drug design and discovery pipelines. Traditional feature engineering-based approaches require extensive expertise in feature design and selection. With the development of artificial intelligence (AI) technologies, data-driven methods have shown clear advantages over feature engineering-based methods in many domains. Nevertheless, when applied to molecular property prediction, AI models usually suffer from the scarcity of labeled data and show poor generalization ability.

RESULTS:

In this study, we proposed molecular graph BERT (MG-BERT), which integrates the local message-passing mechanism of graph neural networks (GNNs) into the powerful BERT model to facilitate learning from molecular graphs. Furthermore, an effective self-supervised learning strategy named masked atoms prediction was proposed to pretrain the MG-BERT model on a large amount of unlabeled data and mine contextual information in molecules. We found that the pretrained MG-BERT model can generate context-sensitive atomic representations and transfer the learned knowledge to the prediction of a variety of molecular properties. The experimental results show that the pretrained MG-BERT model, with little additional fine-tuning, consistently outperforms state-of-the-art methods on all 11 ADMET datasets. Moreover, MG-BERT leverages attention mechanisms to focus on the atomic features essential to the target property, providing excellent interpretability for the trained model. MG-BERT does not require any hand-crafted features as input and is more reliable due to its interpretability, providing a novel framework for developing state-of-the-art models for a wide range of drug discovery tasks.
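The core ideas described above (a BERT-style encoder whose self-attention is restricted to bonded atoms, pretrained by masking and recovering atom types) can be illustrated with a minimal sketch. The code below is not the authors' implementation: the class names (MGBertSketch, GraphMaskedSelfAttention), the vocabulary size, layer dimensions and 15% mask ratio are illustrative assumptions, and a real pipeline would also need proper molecular featurization (e.g. atom/adjacency extraction from SMILES) and the paper's global supernode, which are omitted here.

```python
# Minimal sketch (assumed, not the authors' code): graph-masked self-attention
# plus a masked-atom-prediction pretraining head, in plain PyTorch.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_ATOM_TYPES = 120          # assumed vocabulary: atomic numbers + special tokens
MASK_TOKEN = NUM_ATOM_TYPES - 1

class GraphMaskedSelfAttention(nn.Module):
    """Multi-head self-attention where atom i may only attend to bonded atoms j."""
    def __init__(self, dim, num_heads):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x, adj):
        # adj: (B, N, N), 1 where a bond exists; self-loops must be included so
        # every atom has at least one allowed position. True in attn_mask = blocked.
        block = (adj == 0).repeat_interleave(self.attn.num_heads, dim=0)
        out, _ = self.attn(x, x, x, attn_mask=block)
        return out

class MGBertLayer(nn.Module):
    """Transformer encoder layer with adjacency-restricted (GNN-like) attention."""
    def __init__(self, dim=256, num_heads=4):
        super().__init__()
        self.attn = GraphMaskedSelfAttention(dim, num_heads)
        self.norm1 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x, adj):
        x = self.norm1(x + self.attn(x, adj))
        return self.norm2(x + self.ffn(x))

class MGBertSketch(nn.Module):
    def __init__(self, dim=256, depth=6, num_heads=4):
        super().__init__()
        self.embed = nn.Embedding(NUM_ATOM_TYPES, dim)
        self.layers = nn.ModuleList(MGBertLayer(dim, num_heads) for _ in range(depth))
        self.mask_head = nn.Linear(dim, NUM_ATOM_TYPES)   # masked-atom prediction
        self.prop_head = nn.Linear(dim, 1)                # fine-tuning head (unused here)

    def forward(self, atom_ids, adj):
        h = self.embed(atom_ids)                          # (B, N, dim)
        for layer in self.layers:
            h = layer(h, adj)
        return h

    def pretrain_loss(self, atom_ids, adj, mask_ratio=0.15):
        # Masked-atom prediction: hide a fraction of atoms and recover their types.
        target = atom_ids.clone()
        mask = torch.rand_like(atom_ids, dtype=torch.float) < mask_ratio
        corrupted = atom_ids.masked_fill(mask, MASK_TOKEN)
        logits = self.mask_head(self(corrupted, adj))     # (B, N, NUM_ATOM_TYPES)
        return F.cross_entropy(logits[mask], target[mask])
```

Restricting attention to bonded neighbors is what makes the encoder behave like local GNN message passing; fine-tuning for an ADMET endpoint would reuse the pretrained encoder and train a small readout head (here prop_head) on a pooled or supernode representation.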

Full text: 1 Collection: 01-internacional Database: MEDLINE Main subject: Neural Networks, Computer / Models, Theoretical Type of study: Prognostic_studies / Risk_factors_studies Language: En Journal: Brief Bioinform Journal subject: BIOLOGIA / INFORMATICA MEDICA Year: 2021 Document type: Article Affiliation country: China