ABSTRACT
Despite the availability and accuracy of modern spectroscopic characterization, the use of spectral information in chemical machine learning is still primitive. Here, we report an automatic process based on optical character recognition that uses spectral information as molecular descriptors by directly transforming experimental spectrum images into machine-readable vectors. We demonstrate its machine learning application on the reaction yield dataset of Pd-catalyzed Buchwald-Hartwig cross-coupling with aryl halides. We also show that predicted spectra can serve as an alternative encoding source to support model training.
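To illustrate what a vector descriptor derived from a spectrum might look like, the following is a minimal sketch, not the pipeline reported here. It assumes the peaks have already been extracted from the spectrum image (the optical character recognition step is not shown) and simply bins them into a fixed-length vector. The function name `peaks_to_vector`, the binning scheme, and the example peak list are all hypothetical.

```python
from typing import List, Tuple


def peaks_to_vector(
    peaks: List[Tuple[float, float]],
    lo: float,
    hi: float,
    n_bins: int,
) -> List[float]:
    """Bin (position, intensity) peaks into a fixed-length vector.

    Assumption: a simple equal-width binning over [lo, hi]; peaks
    outside that window are ignored.
    """
    if hi <= lo or n_bins <= 0:
        raise ValueError("need hi > lo and n_bins > 0")
    width = (hi - lo) / n_bins
    vector = [0.0] * n_bins
    for position, intensity in peaks:
        if not (lo <= position <= hi):
            continue  # skip peaks outside the window
        # Clamp so that position == hi falls into the last bin.
        index = min(int((position - lo) / width), n_bins - 1)
        vector[index] += intensity
    return vector


# Hypothetical peak list: (chemical shift in ppm, relative intensity).
example_peaks = [(7.26, 1.0), (7.31, 2.0), (2.35, 3.0), (12.5, 1.0)]
vector = peaks_to_vector(example_peaks, lo=0.0, hi=10.0, n_bins=10)
print(vector)
```

A vector of this kind has the same length for every molecule, so it can be concatenated with other descriptors and passed to a standard regression model for yield prediction.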
ABSTRACT
Asymmetric hydrogenation of olefins is one of the most powerful asymmetric transformations in molecular synthesis. Although several privileged catalyst scaffolds are available, catalyst development for asymmetric hydrogenation is still a time- and resource-consuming process because a predictive catalyst design strategy is lacking. Targeting the data-driven design of asymmetric catalysis, we herein report the development of a standardized database that contains detailed information on over 12,000 asymmetric hydrogenations of olefins from the literature. This database provides a valuable platform for machine learning applications in asymmetric catalysis. Based on this database, we developed a hierarchical learning approach that yields a predictive machine learning model using only dozens of enantioselectivity data points for the target olefin. This approach offers a useful solution to the few-shot learning problem and will facilitate reaction optimization for new olefin substrates in catalyst screening.