Complementary multi-modality molecular self-supervised learning via non-overlapping masking for property prediction.
Brief Bioinform; 25(4), 2024 May 23.
Article | MEDLINE | ID: mdl-38801702
ABSTRACT
Self-supervised learning plays an important role in molecular representation learning because labeled molecular data are usually limited in tasks such as chemical property prediction and virtual screening. However, most existing molecular pre-training methods focus on a single modality of molecular data, leaving the complementary information of two important modalities, SMILES strings and molecular graphs, underexplored. In this study, we propose an effective multi-modality self-supervised learning framework for molecular SMILES and graphs. Specifically, SMILES data and graph data are first tokenized so that they can be processed by a unified Transformer-based backbone network, which is trained with a masked reconstruction strategy. In addition, we introduce a specialized non-overlapping masking strategy to encourage fine-grained interaction between the two modalities. Experimental results show that our framework achieves state-of-the-art performance on a series of molecular property prediction tasks, and a detailed ablation study demonstrates the efficacy of both the multi-modality framework and the masking strategy.
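The abstract does not spell out how the non-overlapping masks are sampled, but the core idea can be illustrated with a minimal sketch: for token positions aligned across the SMILES and graph views, draw two disjoint mask sets so that no position is hidden in both modalities, encouraging each view to reconstruct tokens visible only in the other. The function name, the 15% mask ratio, and the assumption of a shared position index are illustrative choices, not details from the paper.

```python
import random

def non_overlapping_masks(n_tokens, mask_ratio=0.15, seed=0):
    # Sketch of a non-overlapping masking strategy (assumed details):
    # given n_tokens positions shared by the SMILES and graph views,
    # shuffle the positions once and carve out two disjoint slices,
    # one masked in the SMILES view and one masked in the graph view.
    rng = random.Random(seed)
    n_mask = max(1, int(n_tokens * mask_ratio))
    positions = list(range(n_tokens))
    rng.shuffle(positions)
    smiles_mask = set(positions[:n_mask])          # hidden in SMILES view
    graph_mask = set(positions[n_mask:2 * n_mask])  # hidden in graph view
    return smiles_mask, graph_mask
```

Because the two slices come from one shuffled list, they are disjoint by construction, so every masked SMILES token remains visible in the graph view and vice versa; a masked-reconstruction loss on both views then rewards cross-modal information flow.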
Keywords
Full text:
1
Database:
MEDLINE
Main subject:
Supervised Machine Learning
Language:
En
Publication year:
2024
Document type:
Article