Data undersampling models for the efficient rule-based retrosynthetic planning.
Park, Min Sik; Lee, Dongseon; Kwon, Youngchun; Kim, Eunji; Choi, Youn-Suk.
Affiliation
  • Park MS; Autonomous Material Development Lab, Samsung Advanced Institute of Technology, Samsung Electronics, 130 Samsung-ro, Suwon, Gyeonggi-do 16678, Republic of Korea. ms91.park@samsung.com.
  • Lee D; Autonomous Material Development Lab, Samsung Advanced Institute of Technology, Samsung Electronics, 130 Samsung-ro, Suwon, Gyeonggi-do 16678, Republic of Korea. ms91.park@samsung.com.
  • Kwon Y; Autonomous Material Development Lab, Samsung Advanced Institute of Technology, Samsung Electronics, 130 Samsung-ro, Suwon, Gyeonggi-do 16678, Republic of Korea. ms91.park@samsung.com.
  • Kim E; Department of Computer Science and Engineering, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul, Republic of Korea.
  • Choi YS; Autonomous Material Development Lab, Samsung Advanced Institute of Technology, Samsung Electronics, 130 Samsung-ro, Suwon, Gyeonggi-do 16678, Republic of Korea. ms91.park@samsung.com.
Phys Chem Chem Phys ; 23(46): 26510-26518, 2021 Dec 01.
Article in English | MEDLINE | ID: mdl-34807202
ABSTRACT
Computer-aided retrosynthetic planning for organic molecules, which is based on a large synthetic database, is a significant part of the recent development of autonomous robotic chemists. As in other AI fields, however, the class imbalance problem in the dataset affects the prediction performance of retrosynthetic paths. Here, we demonstrate that applying undersampling models to the imbalanced reaction dataset can improve the prediction of retrosynthetic templates for target molecules. For undersampling based on similarity, random, and dissimilarity clustering of the molecular structures of products, the top-1 prediction accuracy improves by 13.8%, 13.1%, and 5.4%, and the top-10 prediction accuracy improves by 8.8%, 6.9%, and 2.4%, respectively. These results demonstrate the importance of a deep understanding of the statistical distribution, internal structure, and sampling of the training dataset. For practical applications, a target-oriented undersampling model is proposed and confirmed by improvements of 9.3% and 4.2% in the top-1 and top-10 accuracies, respectively.
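The abstract does not spell out the sampling procedure, so the following is only a minimal sketch of the general idea: capping the number of training examples per template class by random undersampling, and scoring predictions with top-k accuracy. The function names, the cap value, and the toy labels are hypothetical and are not taken from the paper; the similarity- and dissimilarity-based clustering variants are not reproduced here.

```python
import random
from collections import defaultdict


def undersample(labels, cap, seed=0):
    """Return sorted indices that keep at most `cap` examples per class.

    Hypothetical helper: plain random undersampling, not the paper's
    clustering-based selection.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    kept = []
    for indices in by_class.values():
        # Only over-represented classes are reduced; rare classes stay whole.
        if len(indices) > cap:
            indices = rng.sample(indices, cap)
        kept.extend(indices)
    return sorted(kept)


def top_k_accuracy(scores, true_labels, k):
    """Fraction of examples whose true class is among the k highest scores."""
    hits = 0
    for row, truth in zip(scores, true_labels):
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        hits += truth in ranked[:k]
    return hits / len(true_labels)


# Toy imbalanced template labels (assumed data): 8 x T1, 3 x T2, 1 x T3.
labels = ["T1"] * 8 + ["T2"] * 3 + ["T3"]
kept = undersample(labels, cap=3)
balanced = [labels[i] for i in kept]

# Toy class scores for two examples over three template classes.
scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]]
truth = [2, 0]
top1 = top_k_accuracy(scores, truth, k=1)
top2 = top_k_accuracy(scores, truth, k=2)
```

In this sketch the cap trades off how much of the majority template classes is discarded against how balanced the training set becomes; a real pipeline would choose it from the observed class distribution.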

Full text: 1 Collection: 01-internacional Database: MEDLINE Study type: Prognostic_studies Language: En Journal: Phys Chem Chem Phys Journal subject: BIOPHYSICS / CHEMISTRY Year: 2021 Document type: Article

...