Efficient Exploration of Chemical Compound Space Using Active Learning for Prediction of Thermodynamic Properties of Alkane Molecules.
Xiang, Yan; Tang, Yu-Hang; Gong, Zheng; Liu, Hongyi; Wu, Liang; Lin, Guang; Sun, Huai.
Affiliation
  • Xiang Y; School of Chemistry and Chemical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China.
  • Tang YH; Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States.
  • Gong Z; NVIDIA Corporation, Santa Clara, California 95051, United States.
  • Liu H; School of Chemistry and Chemical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China.
  • Wu L; School of Chemistry and Chemical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China.
  • Lin G; School of Chemistry and Chemical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China.
  • Sun H; Department of Mathematics & School of Mechanical Engineering, Purdue University, West Lafayette, Indiana 47907, United States.
J Chem Inf Model; 63(21): 6515-6524, 2023 Nov 13.
Article in En | MEDLINE | ID: mdl-37857374
ABSTRACT
We introduce an exploratory active learning (AL) algorithm using Gaussian process regression and marginalized graph kernel (GPR-MGK) to sample chemical compound space (CCS) at minimal cost. Targeting 251,728 enumerated alkane molecules with 4-19 carbon atoms, we applied the AL algorithm to select a diverse and representative set of molecules and then conducted high-throughput molecular simulations on these selected molecules. To demonstrate the power of the AL algorithm, we built directed message-passing neural networks (D-MPNN) using simulation data as the training set to predict liquid densities, heat capacities, and vaporization enthalpies of the CCS. Validations show that D-MPNN models built on the smallest training set considered in this work, which consists of 313 molecules or 0.124% of the original CCS, predict the properties with R² > 0.99 against the computational data and R² > 0.94 against the experimental data. The advantage of the presented AL algorithm is that the predicted uncertainty of GPR depends on only the molecular structures, which renders it compatible with high-throughput data generation.
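The abstract's closing point, that the GPR predictive uncertainty depends only on the molecular structures, can be illustrated with a minimal sketch. It is not the authors' implementation: the RBF kernel on random feature vectors is a stand-in assumption for the marginalized graph kernel, and the function names, the noise term, the fixed selection budget, and the greedy maximum-variance rule are all hypothetical choices made for illustration. The sketch shows that the GP posterior variance is computed from the kernel matrix alone, so candidates can be ranked and selected before any simulation data exist.

```python
import numpy as np


def rbf_kernel(X, length_scale=1.0):
    """Stand-in similarity kernel on feature vectors (assumption: the paper
    uses a marginalized graph kernel on molecular graphs instead)."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-0.5 * np.maximum(d2, 0.0) / length_scale**2)


def posterior_variance(K, selected, noise=1e-6):
    """GP posterior variance of every candidate given the selected indices.

    Only the kernel matrix K enters the formula; no property labels are used.
    """
    prior = np.diag(K).copy()
    if not selected:
        return prior
    K_ss = K[np.ix_(selected, selected)] + noise * np.eye(len(selected))
    K_as = K[:, selected]
    # var_i = k(i, i) - k(i, S) K_SS^{-1} k(S, i)
    solved = np.linalg.solve(K_ss, K_as.T)
    return prior - np.sum(K_as * solved.T, axis=1)


def explore(K, n_select, seed_index=0):
    """Greedy exploration: repeatedly add the most uncertain candidate."""
    selected = [seed_index]
    while len(selected) < n_select:
        var = posterior_variance(K, selected)
        var[selected] = -np.inf  # never pick the same candidate twice
        selected.append(int(np.argmax(var)))
    return selected


# Hypothetical candidate pool: 200 random 4-dimensional descriptors.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
K = rbf_kernel(X)
picked = explore(K, n_select=10)
print(picked)
```

Because the selection loop never reads a property value, the whole training set can be chosen first and then passed to batch simulations in one go, which is the compatibility with high-throughput data generation that the abstract describes.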

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Neural Networks, Computer / Alkanes Language: En Publication year: 2023 Document type: Article
