Mix-Key: graph mixup with key structures for molecular property prediction.

Jiang, Tianyi; Wang, Zeyu; Yu, Wenchao; Wang, Jinhuan; Yu, Shanqing; Bao, Xiaoze; Wei, Bin; Xuan, Qi

Jiang T; Institute of Cyberspace Security, College of Information Engineering, Zhejiang University of Technology, 310023, Hangzhou, China.
Wang Z; Binjiang Institute of Artificial Intelligence, Zhejiang University of Technology, 310056, Hangzhou, China.
Yu W; Institute of Cyberspace Security, College of Information Engineering, Zhejiang University of Technology, 310023, Hangzhou, China.
Wang J; Binjiang Institute of Artificial Intelligence, Zhejiang University of Technology, 310056, Hangzhou, China.
Yu S; College of Pharmaceutical Science & Collaborative Innovation Center of Yangtze River Delta Region Green Pharmaceuticals, Zhejiang University of Technology, 310014, Hangzhou, China.
Bao X; Institute of Cyberspace Security, College of Information Engineering, Zhejiang University of Technology, 310023, Hangzhou, China.
Wei B; Binjiang Institute of Artificial Intelligence, Zhejiang University of Technology, 310056, Hangzhou, China.
Xuan Q; Institute of Cyberspace Security, College of Information Engineering, Zhejiang University of Technology, 310023, Hangzhou, China.

Brief Bioinform; 25(3), 2024 Mar 27.

Article in English | MEDLINE | ID: mdl-38706318

ABSTRACT

Molecular property prediction faces the challenge of limited labeled data as it necessitates a series of specialized experiments to annotate target molecules. Data augmentation techniques can effectively address the issue of data scarcity. In recent years, Mixup has achieved significant success in traditional domains such as image processing. However, its application in molecular property prediction is relatively limited due to the irregular, non-Euclidean nature of graphs and the fact that minor variations in molecular structures can lead to alterations in their properties. To address these challenges, we propose a novel data augmentation method called Mix-Key tailored for molecular property prediction. Mix-Key aims to capture crucial features of molecular graphs, focusing separately on the molecular scaffolds and functional groups. By generating isomers that are relatively invariant to the scaffolds or functional groups, we effectively preserve the core information of molecules. Additionally, to capture interactive information between the scaffolds and functional groups while ensuring correlation between the original and augmented graphs, we introduce molecular fingerprint similarity and node similarity. Through these steps, Mix-Key determines the mixup ratio between the original graph and two isomers, thus generating more informative augmented molecular graphs. We extensively validate our approach on molecular datasets of different scales with several Graph Neural Network architectures. The results demonstrate that Mix-Key consistently outperforms other data augmentation methods in enhancing molecular property prediction on several datasets.
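
The abstract does not state how the mixup ratio is computed, so the following is only a minimal sketch of the general idea: standard mixup forms a convex combination of two samples, and here that is extended to three inputs (the original graph's feature vector and two augmented variants) with weights derived from a similarity score. The Tanimoto similarity on fingerprint bit sets is a standard measure; the weighting rule, the `keep` parameter, and all function names are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 1.0  # two empty fingerprints are treated as identical
    return len(a & b) / len(a | b)


def mixup_weights(
    sim_scaffold: float, sim_group: float, keep: float = 0.5
) -> Tuple[float, float, float]:
    """Hypothetical weighting rule (an assumption, not the paper's formula).

    The original sample keeps weight `keep`; the remaining 1 - keep is split
    between the two augmented variants in proportion to their similarity to
    the original. The three weights always sum to 1.
    """
    rest = 1.0 - keep
    total = sim_scaffold + sim_group
    if total == 0.0:
        # No similarity signal: split the remainder evenly.
        return keep, rest / 2.0, rest / 2.0
    return keep, rest * sim_scaffold / total, rest * sim_group / total


def mix(vectors: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Convex combination of equal-length feature vectors."""
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]


# Toy fingerprints given as sets of bit indices (illustrative values only).
fp_original = {1, 2, 3}
fp_scaffold_variant = {2, 3, 4}
fp_group_variant = {1, 5, 6, 7}

weights = mixup_weights(
    tanimoto(fp_original, fp_scaffold_variant),
    tanimoto(fp_original, fp_group_variant),
)

# Toy graph-level feature vectors for the original and the two variants.
mixed = mix([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], weights)
```

In this sketch a variant that stays closer to the original molecule contributes more to the mixed sample, which is one plausible way to keep the augmented graph correlated with the original; the method described in the abstract additionally uses node similarity, which is not modeled here.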

Subject(s)

Algorithms; Molecular Structure; Computational Biology/methods; Software

Keywords

data augmentation; graph neural network; molecular graph; molecular property prediction