On the use of real-world datasets for reaction yield prediction.
Saebi, Mandana; Nan, Bozhao; Herr, John E; Wahlers, Jessica; Guo, Zhichun; Zuranski, Andrzej M; Kogej, Thierry; Norrby, Per-Ola; Doyle, Abigail G; Chawla, Nitesh V; Wiest, Olaf.
Affiliations
  • Saebi M; Department of Computer Science and Engineering and Lucy Family Institute for Data and Society, University of Notre Dame Notre Dame IN 46556 USA nchawla@nd.edu.
  • Nan B; Department of Chemistry and Biochemistry, University of Notre Dame Notre Dame IN 46556 USA owiest@nd.edu.
  • Herr JE; Department of Chemistry and Biochemistry, University of Notre Dame Notre Dame IN 46556 USA owiest@nd.edu.
  • Wahlers J; Department of Chemistry and Biochemistry, University of Notre Dame Notre Dame IN 46556 USA owiest@nd.edu.
  • Guo Z; Department of Computer Science and Engineering and Lucy Family Institute for Data and Society, University of Notre Dame Notre Dame IN 46556 USA nchawla@nd.edu.
  • Zuranski AM; Department of Chemistry, Princeton University Princeton New Jersey 08544 USA.
  • Kogej T; Molecular AI, Discovery Sciences, R&D, AstraZeneca Pepparedsleden 1, SE-431 83 Mölndal Gothenburg Sweden.
  • Norrby PO; Data Science and Modelling, Pharmaceutical Sciences, R&D, AstraZeneca Pepparedsleden 1, SE-431 83 Mölndal Gothenburg Sweden.
  • Doyle AG; Department of Chemistry, Princeton University Princeton New Jersey 08544 USA.
  • Chawla NV; Department of Chemistry and Biochemistry, University of California Los Angeles California 90095 USA.
  • Wiest O; Department of Computer Science and Engineering and Lucy Family Institute for Data and Society, University of Notre Dame Notre Dame IN 46556 USA nchawla@nd.edu.
Chem Sci ; 14(19): 4997-5005, 2023 May 17.
Article in English | MEDLINE | ID: mdl-37206399
ABSTRACT
The lack of publicly available, large, and unbiased datasets is a key bottleneck for the application of machine learning (ML) methods in synthetic chemistry. Data from electronic laboratory notebooks (ELNs) could provide less biased, large datasets, but no such datasets have been made publicly available. The first real-world dataset from the ELNs of a large pharmaceutical company is disclosed and its relationship to high-throughput experimentation (HTE) datasets is described. For chemical yield predictions, a key task in chemical synthesis, an attributed graph neural network (AGNN) performs as well as or better than the best previous models on two HTE datasets for the Suzuki-Miyaura and Buchwald-Hartwig reactions. However, training the AGNN on an ELN dataset does not lead to a predictive model. The implications of using ELN data for training ML-based models are discussed in the context of yield predictions.

Full text: 1 Collection: 01-international Database: MEDLINE Study type: Prognostic_studies / Risk_factors_studies Language: En Journal: Chem Sci Year: 2023 Document type: Article
