Dataset Design for Building Models of Chemical Reactivity.
Raghavan, Priyanka; Haas, Brittany C; Ruos, Madeline E; Schleinitz, Jules; Doyle, Abigail G; Reisman, Sarah E; Sigman, Matthew S; Coley, Connor W.
Affiliation
  • Raghavan P; Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States.
  • Haas BC; Department of Chemistry, University of Utah, Salt Lake City, Utah 84112, United States.
  • Ruos ME; Department of Chemistry & Biochemistry, University of California, Los Angeles, Los Angeles, California 90095, United States.
  • Schleinitz J; Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, United States.
  • Doyle AG; Department of Chemistry & Biochemistry, University of California, Los Angeles, Los Angeles, California 90095, United States.
  • Reisman SE; Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, United States.
  • Sigman MS; Department of Chemistry, University of Utah, Salt Lake City, Utah 84112, United States.
  • Coley CW; Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States.
ACS Cent Sci ; 9(12): 2196-2204, 2023 Dec 27.
Article in En | MEDLINE | ID: mdl-38161380
ABSTRACT
Models can codify our understanding of chemical reactivity and serve a useful purpose in the development of new synthetic processes via, for example, evaluating hypothetical reaction conditions or in silico substrate tolerance. Perhaps the most determining factor is the composition of the training data and whether it is sufficient to train a model that can make accurate predictions over the full domain of interest. Here, we discuss the design of reaction datasets in ways that are conducive to data-driven modeling, emphasizing the idea that training set diversity and model generalizability rely on the choice of molecular or reaction representation. We additionally discuss the experimental constraints associated with generating common types of chemistry datasets and how these considerations should influence dataset design and model building.

Full text: 1 | Collection: 01-internacional | Database: MEDLINE | Type of study: Prognostic_studies | Language: En | Journal: ACS Cent Sci | Year: 2023 | Document type: Article
