Development of a novel fingerprint for chemical reactions and its application to large-scale reaction classification and similarity.
J Chem Inf Model; 55(1): 39-53, 2015 Jan 26.
Article | En | MEDLINE | ID: mdl-25541888
Fingerprint methods applied to molecules have proven to be useful for similarity determination and as inputs to machine-learning models. Here, we present the development of a new fingerprint for chemical reactions and validate its usefulness in building machine-learning models and in similarity assessment. Our final fingerprint is constructed as the difference of the atom-pair fingerprints of products and reactants and includes agents via calculated physicochemical properties. We validated the fingerprints on a large data set of reactions text-mined from granted United States patents from the last 40 years that have been classified using a substructure-based expert system. We applied machine learning to build a 50-class predictive model for reaction-type classification that correctly predicts 97% of the reactions in an external test set. Impressive accuracies were also observed when applying the classifier to reactions from an in-house electronic laboratory notebook. The performance of the novel fingerprint for assessing reaction similarity was evaluated by a cluster analysis that recovered 48 out of 50 of the reaction classes with a median F-score of 0.63 for the clusters. The data sets used for training and primary validation as well as all python scripts required to reproduce the analysis are provided in the Supporting Information.
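As an illustration of the "difference of the atom-pair fingerprints of products and reactants" described in the abstract, the following is a minimal sketch in plain Python. It is not the authors' implementation: the function name `difference_fingerprint` and the feature labels are hypothetical stand-ins for real atom-pair descriptors, and the agent-derived physicochemical properties mentioned in the abstract are not modeled here.

```python
from collections import Counter


def difference_fingerprint(reactant_features, product_features):
    """Return product feature counts minus reactant feature counts.

    Each argument is a list of molecules, and each molecule is a list of
    feature labels (hypothetical stand-ins for atom-pair descriptors).
    Features whose counts cancel out are dropped from the result.
    """
    diff = Counter()
    for features in product_features:
        diff.update(features)      # add counts from the product side
    for features in reactant_features:
        diff.subtract(features)    # remove counts from the reactant side
    return {label: count for label, count in diff.items() if count != 0}


# Toy esterification-like example with made-up feature labels.
reactants = [
    ["C=O|O-H:1", "C|C=O:1"],  # acid (assumed labels)
    ["C|O-H:1"],               # alcohol (assumed labels)
]
products = [
    ["C=O|O-C:1", "C|C=O:1", "C|O-C:1"],  # ester (assumed labels)
]

fp = difference_fingerprint(reactants, products)
print(fp)
```

Features that are unchanged by the reaction (here `C|C=O:1`) cancel, so the remaining positive and negative counts describe only what the transformation created or destroyed.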
Full text: 1
Collection: 01-internacional
Database: MEDLINE
Main subject: Artificial Intelligence / Databases, Chemical / Models, Chemical
Type of study: Prognostic_studies
Language: En
Journal: J Chem Inf Model
Journal subject: MEDICAL INFORMATICS / CHEMISTRY
Year: 2015
Document type: Article
Affiliation country: Switzerland