An effective self-supervised framework for learning expressive molecular global representations to drug discovery.

Li, Pengyong; Wang, Jun; Qiao, Yixuan; Chen, Hao; Yu, Yihuan; Yao, Xiaojun; Gao, Peng; Xie, Guotong; Song, Sen

Affiliation

Li P; Department of Biomedical Engineering at Tsinghua University, China.
Wang J; Ping An Healthcare Technology, Chaoyang, 100027 Beijing, China.
Qiao Y; Operations Research and Cybernetics at Beijing University of Technology, China.
Chen H; Cybernetics at Beijing University of Technology, China.
Yu Y; Beijing University of Biomedical Engineering, China.
Yao X; Analytical Chemistry and Chemoinformatics at Lanzhou University, China.
Gao P; Ping An Healthcare Technology, Chaoyang, 100027 Beijing, China.
Xie G; Ping An Healthcare Technology, Chaoyang, 100027 Beijing, China.
Song S; Tsinghua Laboratory of Brain and Intelligence and Department of Biomedical Engineering, Tsinghua University, Haidian, 100084 Beijing, China.

Brief Bioinform; 22(6), 2021-11-05.

Article in English | MEDLINE | ID: mdl-33940598

ABSTRACT

Producing expressive molecular representations is a fundamental challenge in artificial intelligence-driven drug discovery. Graph neural networks (GNNs) have emerged as a powerful technique for modeling molecular data. However, previous supervised approaches usually suffer from the scarcity of labeled data and from poor generalization. Here, we propose a molecular pre-training graph-based deep learning framework, named MPG, that learns molecular representations from large-scale unlabeled molecules. In MPG, we propose a GNN for modeling molecular graphs, named MolGNet, and design a self-supervised strategy for pre-training the model at both the node and graph levels. After pre-training on 11 million unlabeled molecules, we show that MolGNet captures valuable chemical insights and produces interpretable representations. The pre-trained MolGNet can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of drug discovery tasks, including molecular property prediction, drug-drug interaction prediction and drug-target interaction prediction, on 14 benchmark datasets. The pre-trained MolGNet in MPG has the potential to become an advanced molecular encoder in the drug discovery pipeline.
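
The pattern the abstract describes, a graph encoder that produces a molecule-level representation and is then topped with a single output layer, can be sketched in a few lines. This is a toy illustration in plain Python and is not the MolGNet architecture or the MPG pre-training strategy. The function names, the one-hot atom features, the random weights standing in for pre-trained parameters, and the example molecule are all assumptions made for this sketch.

```python
# Toy sketch only: NOT MolGNet. All names and weights here are hypothetical.
import math
import random


def message_passing(node_feats, edges, weight):
    """One round of neighbor-sum aggregation, then a linear map and tanh."""
    dim = len(node_feats[0])
    # Start from each node's own features, then add its neighbors' features.
    agg = [list(f) for f in node_feats]
    for i, j in edges:  # bonds are treated as undirected
        for k in range(dim):
            agg[i][k] += node_feats[j][k]
            agg[j][k] += node_feats[i][k]
    out_dim = len(weight[0])
    # Shared weight matrix (dim x out_dim) applied to every node.
    return [
        [math.tanh(sum(a[k] * weight[k][o] for k in range(dim)))
         for o in range(out_dim)]
        for a in agg
    ]


def readout(node_embs):
    """Mean-pool node embeddings into one graph-level vector."""
    n = len(node_embs)
    return [sum(col) / n for col in zip(*node_embs)]


def output_layer(graph_emb, w, b):
    """The single additional linear layer used for a downstream task."""
    return sum(g * wi for g, wi in zip(graph_emb, w)) + b


# Hypothetical molecule: heavy atoms of ethanol (C-C-O), one-hot types [C, O].
node_feats = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
edges = [(0, 1), (1, 2)]

rng = random.Random(0)
# Random values stand in for weights that pre-training would provide.
encoder_weight = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(2)]
head_w = [rng.uniform(-1, 1) for _ in range(4)]
head_b = 0.0

node_embs = message_passing(node_feats, edges, encoder_weight)  # node level
graph_emb = readout(node_embs)                                  # graph level
prediction = output_layer(graph_emb, head_w, head_b)            # task head
print(len(node_embs), len(graph_emb), prediction)
```

In a fine-tuning setting of the kind the abstract mentions, the encoder weights would be initialized from pre-training and only the small output layer would be new; here both are random, so the printed prediction carries no chemical meaning.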

Subject(s)

Databases, Chemical; Drug Delivery Systems; Drug Discovery; Models, Molecular; Neural Networks, Computer

Keywords

deep learning; graph neural network; molecular representation; self-supervised learning

Full text: 1
Collection: 01-international
Database: MEDLINE
Main subject: Models, Molecular / Neural Networks, Computer / Drug Delivery Systems / Drug Discovery / Databases, Chemical
Study type: Prognostic_studies
Language: English
Journal: Brief Bioinform
Journal subject: BIOLOGY / MEDICAL INFORMATICS
Year: 2021
Document type: Article
Affiliation country: China
Publication country: United Kingdom
