Deductive machine learning models for product identification.

Jin, Tianfan; Zhao, Qiyuan; Schofield, Andrew B; Savoie, Brett M

Affiliation

Jin T; Department of Chemical Engineering, Purdue University West Lafayette USA bsavoie@purdue.edu.
Zhao Q; Department of Chemical Engineering, Purdue University West Lafayette USA bsavoie@purdue.edu.
Schofield AB; Department of Chemical Engineering, Purdue University West Lafayette USA bsavoie@purdue.edu.
Savoie BM; Department of Chemical Engineering, Purdue University West Lafayette USA bsavoie@purdue.edu.

Chem Sci ; 15(30): 11995-12005, 2024 Jul 31.

Article in English | MEDLINE | ID: mdl-39092129

ABSTRACT

Deductive solution strategies are required in prediction scenarios that are underdetermined, when contradictory information is available, or, more generally, wherever one-to-many non-functional mappings occur. In contrast, most contemporary machine learning (ML) in the chemical sciences is inductive learning from example, with a fixed set of features. Chemical workflows are replete with situations requiring deduction, including many aspects of lab automation and spectral interpretation. Here, a general strategy is described for designing and training machine learning models capable of deduction that consists of combining individual inductive models into a larger deductive network. The training and testing of these models are demonstrated on the task of deducing reaction products from a mixture of spectral sources. The resulting models can distinguish between intended and unintended reaction outcomes and identify starting material based on a mixture of spectral sources. The models also perform well on tasks that they were not directly trained on, such as performing structural inference using real rather than simulated spectral inputs, predicting minor products from named organic chemistry reactions, identifying reagents and isomers as plausible impurities, and handling missing or conflicting information. A new dataset of 1 124 043 simulated spectra that were generated to train these models is also distributed with this work. These findings demonstrate that deductive bottlenecks for chemical problems are not fundamentally insuperable for ML models.

Full text: 1 | Collections: 01-international | Database: MEDLINE | Language: En | Journal: Chem Sci | Publication year: 2024 | Document type: Article
