Self-supervised learning with chemistry-aware fragmentation for effective molecular property prediction.
Xie, Ailin; Zhang, Ziqiao; Guan, Jihong; Zhou, Shuigeng.
Affiliation
  • Xie A; Shanghai Key Lab of Intelligent Information Processing, and School of Computer Science, Fudan University, 200438 Shanghai, China.
  • Zhang Z; Shanghai Key Lab of Intelligent Information Processing, and School of Computer Science, Fudan University, 200438 Shanghai, China.
  • Guan J; Department of Computer Science and Technology, Tongji University, 201804 Shanghai, China.
  • Zhou S; Shanghai Key Lab of Intelligent Information Processing, and School of Computer Science, Fudan University, 200438 Shanghai, China.
Brief Bioinform; 24(5), 2023 Sep 20.
Article in English | MEDLINE | ID: mdl-37598424
ABSTRACT
Molecular property prediction (MPP) is a fundamental task in AI-aided drug discovery (AIDD). Recent studies have shown that self-supervised learning (SSL) is a promising way to produce molecular representations that cope with the widely recognized data scarcity problem in AIDD. Because specific substructures of a molecule play important roles in determining its properties, molecular representations learned by deep learning models are expected to attach more importance to such substructures, implicitly or explicitly, to achieve better predictive performance. However, few SSL pre-trained models for MPP in the literature have focused on such substructures. To address this gap, this paper presents Chemistry-Aware Fragmentation for Effective MPP (CAFE-MPP for short), a method built on the self-supervised contrastive learning framework. First, a novel fragment-based molecular graph (FMG) is designed to represent the topological relationships between the chemistry-aware substructures that constitute a molecule. Then, with well-designed hard negative pairs, a fragment encoder is pre-trained at the fragment level by contrastive learning to extract representations for the nodes in FMGs. Finally, a Graphormer model is used to produce molecular representations for MPP from the fragment embeddings. Experiments on 11 benchmark datasets show that CAFE-MPP achieves state-of-the-art performance on 7 of the 11 datasets and the second-best performance on 3 datasets, compared with six notable self-supervised methods. Further investigations also demonstrate that CAFE-MPP learns to embed molecules into representations that implicitly contain information about fragments highly correlated with molecular properties, and that it can alleviate the over-smoothing problem of graph neural networks.
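The abstract does not give the loss used for fragment-level pre-training. As a rough illustration of how contrastive learning with hard negatives generally works, the sketch below implements a generic InfoNCE-style loss in plain Python. The function names (`cosine`, `info_nce`), the temperature value, and the toy vectors are assumptions made for this example and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss for one anchor embedding.

    Hypothetical helper, not the paper's objective: the loss is
    -log( exp(s_pos) / (exp(s_pos) + sum(exp(s_neg))) ),
    where each s is a cosine similarity divided by the temperature.
    """
    pos_logit = cosine(anchor, positive) / temperature
    logits = [pos_logit] + [cosine(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum - pos_logit


# Toy embeddings (assumed values): a hard negative lies close to the anchor,
# an easy negative points in the opposite direction.
anchor = [1.0, 0.0]
positive = [0.9, 0.1]
hard_negative = [0.8, 0.3]
easy_negative = [-1.0, 0.0]

loss_hard = info_nce(anchor, positive, [hard_negative])
loss_easy = info_nce(anchor, positive, [easy_negative])
```

In this toy setting the hard negative yields a noticeably larger loss than the easy one, which is the usual motivation for constructing hard negative pairs: they supply a stronger training signal than negatives the encoder already separates easily.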

Full text: 1 Database: MEDLINE Main subject: Benchmarking / Drug Discovery Language: En Publication year: 2023 Document type: Article
