Conditional Molecular Generation Net Enables Automated Structure Elucidation Based on 13C NMR Spectra and Prior Knowledge.
Yao, Lin; Yang, Minjian; Song, Jianfei; Yang, Zhuo; Sun, Hanyu; Shi, Hui; Liu, Xue; Ji, Xiangyang; Deng, Yafeng; Wang, Xiaojian.
Affiliation
  • Yao L; CarbonSilicon AI Technology Co., Ltd., Beijing 100080, China.
  • Yang M; State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China.
  • Song J; CarbonSilicon AI Technology Co., Ltd., Beijing 100080, China.
  • Yang Z; CarbonSilicon AI Technology Co., Ltd., Beijing 100080, China.
  • Sun H; State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China.
  • Shi H; CarbonSilicon AI Technology Co., Ltd., Beijing 100080, China.
  • Liu X; State Key Laboratory of Bioactive Substances and Functions of Natural Medicines, Institute of Materia Medica, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100050, China.
  • Ji X; Department of Automation, Tsinghua University, Beijing 100084, China.
  • Deng Y; CarbonSilicon AI Technology Co., Ltd., Beijing 100080, China.
  • Wang X; Department of Automation, Tsinghua University, Beijing 100084, China.
Anal Chem; 95(12): 5393-5401, 2023 Mar 28.
Article in En | MEDLINE | ID: mdl-36926883
ABSTRACT
Structure elucidation of unknown compounds based on nuclear magnetic resonance (NMR) spectroscopy remains a challenging problem in both synthetic organic and natural product chemistry. Library matching has been an efficient way to assist structure elucidation, but it is limited by the coverage of the libraries, and it neglects prior knowledge such as known molecular fragments. To address these limitations, we propose a conditional molecular generation net (CMGNet) that accepts multiple sources of information: in addition to 13C NMR spectral data, it takes the molecular formula and molecular fragments as input conditions. Our model applies large-scale pretraining for molecular understanding, followed by fine-tuning on two NMR spectral data sets of different granularity levels to accommodate structure elucidation tasks. Given 13C NMR data, the molecular formula, and fragment information, CMGNet generates candidate structures and achieves a recovery rate of 94.17% within its top 10 recommendations. In addition, the generative model performed well in generating various classes of compounds and in the structural revision task. CMGNet learns molecular connectivity from 13C NMR data, molecular formula, and fragments, paving the way for a new paradigm of deep learning-assisted inverse problem-solving.
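
The headline metric, a recovery rate within the top 10 recommendations, can be read as top-k accuracy: the fraction of test spectra for which the true structure appears among the first k generated candidates. The sketch below is a minimal illustration under stated assumptions and is not the authors' evaluation code. The function name `top_k_recovery_rate` is hypothetical, and it assumes every structure has already been converted to a canonical SMILES string upstream, so that exact string equality means "same molecule".

```python
from typing import Sequence


def top_k_recovery_rate(
    ranked_candidates: Sequence[Sequence[str]],
    true_structures: Sequence[str],
    k: int = 10,
) -> float:
    """Return the fraction of test cases whose true structure is among the first k candidates.

    Assumption (not from the paper): all structures are canonical SMILES strings,
    canonicalized beforehand, so string equality stands in for structural identity.
    """
    if len(ranked_candidates) != len(true_structures):
        raise ValueError("need one ranked candidate list per test spectrum")
    if not true_structures:
        raise ValueError("empty evaluation set")
    # Count a hit when the ground-truth structure appears in the top-k slice.
    hits = sum(
        1
        for candidates, truth in zip(ranked_candidates, true_structures)
        if truth in candidates[:k]
    )
    return hits / len(true_structures)


# Hypothetical toy example: three spectra, each with a ranked list of candidates.
preds = [["CCO", "COC"], ["CC(=O)O", "OC=O"], ["c1ccccc1"]]
truth = ["COC", "CCN", "c1ccccc1"]
print(top_k_recovery_rate(preds, truth, k=1))  # only the third case is a top-1 hit
print(top_k_recovery_rate(preds, truth, k=2))  # first and third cases are hits
```

Under this reading, a 94.17% top-10 recovery rate means the correct structure was ranked somewhere in the first ten candidates for about 94 of every 100 test cases; how the paper canonicalizes and compares structures is not stated in this abstract.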

Full text: 1 Database: MEDLINE Language: En Year: 2023 Type: Article