ABSTRACT
Artificial intelligence-based molecular property prediction plays a key role in the design of molecules such as bioactive compounds and functional materials. In this study, we propose a self-supervised pretraining deep learning (DL) framework, called functional group bidirectional encoder representations from transformers (FG-BERT), pretrained on ~1.45 million unlabeled drug-like molecules, to learn meaningful molecular representations from functional groups (FGs). The pretrained FG-BERT framework can be fine-tuned to predict molecular properties. Compared with state-of-the-art (SOTA) machine learning and DL methods, FG-BERT shows high performance in predicting molecular properties in tasks involving physical chemistry, biophysics, and physiology across 44 benchmark datasets. In addition, FG-BERT uses attention mechanisms to focus on the FG features that are critical to the target properties, thereby providing excellent interpretability for downstream training tasks. Collectively, FG-BERT requires no handcrafted features as input and has excellent interpretability, providing an out-of-the-box framework for developing SOTA models for a variety of molecular discovery tasks, especially drug discovery.
Subjects
Algorithms, Artificial Intelligence, Benchmarking, Machine Learning
ABSTRACT
Small-molecule antioxidants can inhibit or retard oxidation reactions and protect cells against free radical damage, and thus play a key role in food, cosmetics, pharmaceuticals, the environment, and materials. Antioxidant discovery is still driven mainly by experiment, and computationally assisted antioxidant discovery is rarely reported. In this study, a functional-group-based alternating multitask self-supervised molecular representation learning method is proposed to simultaneously predict the antioxidant activities of small molecules in eight commonly used in vitro antioxidant assays. Extensive evaluation shows that, compared with the baseline models, the multitask FG-BERT model achieves the best overall predictive performance, with the highest average F1 score, balanced accuracy (BA), area under the receiver operating characteristic curve (ROC-AUC), and area under the precision-recall curve (PRC-AUC) on the test sets: 0.860, 0.880, 0.954, and 0.937, respectively. The Y-scrambling test results further demonstrate that the model's performance did not arise by chance and that it has reliable predictive capabilities. Additionally, the excellent interpretability of the multitask FG-BERT model makes it easy to identify key structural fragments or groups that contribute significantly to the antioxidant effect of a given molecule. Finally, an online antioxidant activity prediction platform called AOP (freely available at https://aop.idruglab.cn/) and its local version were developed based on the high-quality multitask FG-BERT model for experts and nonexperts in the field. We anticipate that it will contribute to the discovery of novel small-molecule antioxidants.
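The idea behind the Y-scrambling test mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: a small k-nearest-neighbor classifier stands in for the multitask FG-BERT model, the data are synthetic two-dimensional points rather than molecules, and the names `y_scrambling_test`, `make_toy_data`, `nearest_neighbors`, and `knn_predict` are hypothetical. The sketch also assumes one common variant of the test, in which only the training labels are shuffled and each scrambled model is scored against the true test labels using balanced accuracy (BA).

```python
import math
import random


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (0/1)."""
    pos = [p for t, p in zip(y_true, y_pred) if t == 1]
    neg = [p for t, p in zip(y_true, y_pred) if t == 0]
    sensitivity = sum(p == 1 for p in pos) / len(pos)
    specificity = sum(p == 0 for p in neg) / len(neg)
    return (sensitivity + specificity) / 2


def nearest_neighbors(x_train, x_test, k=5):
    """For each test point, return the indices of its k closest training points."""
    neighbors = []
    for q in x_test:
        order = sorted(range(len(x_train)), key=lambda i: math.dist(q, x_train[i]))
        neighbors.append(order[:k])
    return neighbors


def knn_predict(neighbors, y_train):
    """Majority vote over precomputed neighbors (use an odd k to avoid ties)."""
    return [1 if 2 * sum(y_train[i] for i in idx) > len(idx) else 0
            for idx in neighbors]


def y_scrambling_test(x_train, y_train, x_test, y_test, n_rounds=100, k=5, seed=0):
    """Compare the true-label score with scores from label-shuffled training sets."""
    rng = random.Random(seed)
    # Features never change, so the neighbor lists are computed only once.
    neighbors = nearest_neighbors(x_train, x_test, k)
    true_score = balanced_accuracy(y_test, knn_predict(neighbors, y_train))
    scrambled_scores = []
    for _ in range(n_rounds):
        shuffled = list(y_train)
        rng.shuffle(shuffled)  # break the link between features and labels
        scrambled_scores.append(
            balanced_accuracy(y_test, knn_predict(neighbors, shuffled)))
    # Empirical p-value with the usual +1 correction.
    n_as_good = sum(s >= true_score for s in scrambled_scores)
    p_value = (n_as_good + 1) / (n_rounds + 1)
    return true_score, scrambled_scores, p_value


def make_toy_data(n_per_class, seed):
    """Synthetic stand-in data: two Gaussian clusters, one per class."""
    rng = random.Random(seed)
    x, y = [], []
    for label, center in ((1, 2.0), (0, -2.0)):
        for _ in range(n_per_class):
            x.append((rng.gauss(center, 1.0), rng.gauss(center, 1.0)))
            y.append(label)
    return x, y


x_train, y_train = make_toy_data(60, seed=1)
x_test, y_test = make_toy_data(40, seed=2)
true_score, scrambled_scores, p_value = y_scrambling_test(
    x_train, y_train, x_test, y_test)
print(f"true-label BA:     {true_score:.3f}")
print(f"mean scrambled BA: {sum(scrambled_scores) / len(scrambled_scores):.3f}")
print(f"empirical p-value: {p_value:.3f}")
```

If the model has learned a real structure-activity relationship, the true-label score should sit well above the scrambled scores, which are expected to cluster around the chance level of 0.5 for BA. A true-label score that falls inside the scrambled distribution would suggest the apparent performance arose by chance.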