RxnScribe: A Sequence Generation Model for Reaction Diagram Parsing.
J Chem Inf Model; 63(13): 4030-4041, 2023 07 10.
Article in En | MEDLINE | ID: mdl-37368970
ABSTRACT
Reaction diagram parsing is the task of extracting reaction schemes from a diagram in the chemistry literature. The reaction diagrams can be arbitrarily complex; thus, robustly parsing them into structured data is an open challenge. In this paper, we present RxnScribe, a machine learning model for parsing reaction diagrams of varying styles. We formulate this structured prediction task with a sequence generation approach, which condenses the traditional pipeline into an end-to-end model. We train RxnScribe on a dataset of 1378 diagrams and evaluate it with cross-validation, achieving an 80.0% soft match F1 score, with significant improvements over previous models. Our code and data are publicly available at https://github.com/thomas0809/RxnScribe.
Full text: 1
Collection: 01-internacional
Database: MEDLINE
Main subject: Machine Learning
Type of study: Prognostic_studies
Language: En
Journal: J Chem Inf Model
Journal subject: Medical Informatics / Chemistry
Year: 2023
Type: Article
Affiliation country: United States