Automated Chemical Reaction Extraction from Scientific Literature.

Guo, Jiang; Ibanez-Lopez, A Santiago; Gao, Hanyu; Quach, Victor; Coley, Connor W; Jensen, Klavs F; Barzilay, Regina

Affiliation

Guo J; Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, Massachusetts 02139, United States.
Ibanez-Lopez AS; Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, Massachusetts 02139, United States.
Gao H; Department of Chemical Engineering, MIT, Cambridge, Massachusetts 02139, United States.
Quach V; Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, Massachusetts 02139, United States.
Coley CW; Department of Chemical Engineering, MIT, Cambridge, Massachusetts 02139, United States.
Jensen KF; Department of Chemical Engineering, MIT, Cambridge, Massachusetts 02139, United States.
Barzilay R; Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, Massachusetts 02139, United States.

J Chem Inf Model; 62(9): 2035-2045, 2022 05 09.

Article in English | MEDLINE | ID: mdl-34115937

ABSTRACT

Access to structured chemical reaction data is of key importance for chemists in performing bench experiments and in modern applications like computer-aided drug design. Existing reaction databases are generally populated by human curators through manual abstraction from published literature (e.g., patents and journals), which is time-consuming and labor-intensive, especially with the exponential growth of chemical literature in recent years. In this study, we focus on developing automated methods for extracting reactions from chemical literature. We consider journal publications as the target source of information: compared to patents, they are more comprehensive and better represent the latest developments in chemistry; however, they are less formulaic in their descriptions of reactions. To implement the reaction extraction system, we first devised a chemical reaction schema consisting primarily of a central product and a set of associated reaction roles such as reactants, catalyst, and solvent. We formulate the task as a structure prediction problem and solve it with a two-stage deep learning framework consisting of product extraction and reaction role labeling. Both models are built upon Transformer-based encoders, which are adaptively pretrained using domain- and task-relevant unlabeled data. Our models are shown to be both effective and data-efficient, achieving an F1 score of 76.2% in product extraction and 78.7% in role extraction with only hundreds of annotated reactions.
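
The schema and the two-stage output described in the abstract can be pictured with a small data structure. The following is only an illustrative sketch, not the authors' implementation: the names `Reaction`, `group_by_product`, and `f1_score`, the role strings, and the example spans are all hypothetical, and the exact-match F1 shown here is an assumed scoring rule, since the abstract does not state how predictions are matched against annotations.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set, Tuple


@dataclass
class Reaction:
    """One extracted reaction: a central product plus its labeled roles."""
    product: str
    # Role name -> text spans. The role names used below are hypothetical.
    roles: Dict[str, List[str]] = field(default_factory=dict)


def group_by_product(
    products: Iterable[str],
    role_labels: Iterable[Tuple[str, str, str]],
) -> List[Reaction]:
    """Assemble stage-1 products and stage-2 (product, role, span) triples."""
    reactions = {p: Reaction(product=p) for p in products}
    for product, role, span in role_labels:
        # Ignore role labels attached to a product that stage 1 did not find.
        if product in reactions:
            reactions[product].roles.setdefault(role, []).append(span)
    return list(reactions.values())


def f1_score(predicted: Set[str], gold: Set[str]) -> float:
    """Exact-match F1 between predicted and gold items (assumed scoring rule)."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Made-up example spans, for illustration only.
example = group_by_product(
    ["compound 3a"],
    [
        ("compound 3a", "reactant", "aldehyde 1a"),
        ("compound 3a", "solvent", "THF"),
    ],
)
```

Keying every role to a product mirrors the "central product" framing of the schema: a sentence that mentions several products yields several `Reaction` records, each with its own set of roles.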

Subject(s)

Databases, Factual; Humans

Full text: 1 Collection: 01-internacional Database: MEDLINE Main subject: Databases, Factual Type of study: Prognostic_studies Limit: Humans Language: En Journal: J Chem Inf Model Journal subject: MEDICAL INFORMATICS / CHEMISTRY Year: 2022 Document type: Article Affiliation country: United States
