Impact of Chemist-In-The-Loop Molecular Representations on Machine Learning Outcomes.
Wills, Todd J; Polshakov, Dmitrii A; Robinson, Matthew C; Lee, Alpha A.
Affiliations
  • Wills TJ; CAS, P.O. Box 3012, Columbus, Ohio 43210-0012, United States.
  • Polshakov DA; CAS, P.O. Box 3012, Columbus, Ohio 43210-0012, United States.
  • Robinson MC; PostEra Inc., 1209 Orange Street, Wilmington, Delaware 19801, United States.
  • Lee AA; PostEra Inc., 1209 Orange Street, Wilmington, Delaware 19801, United States.
J Chem Inf Model; 60(10): 4449-4456, 2020 Oct 26.
Article in English | MEDLINE | ID: mdl-32786696
ABSTRACT
The development of molecular descriptors is a central challenge in cheminformatics. Most approaches rely either on algorithms that extract atomic environments or on end-to-end machine learning. However, a looming question is how these approaches compare with the critical eye of trained chemists. The CAS fingerprint engages expert chemists to curate chemical motifs that they judge could influence bioactivity. In this paper, we benchmark the CAS fingerprint against commonly used fingerprints on a well-established benchmark set of 88 targets. We show that the CAS fingerprint outperforms most of the commonly used molecular fingerprints. Analysis of the CAS fingerprint reveals that experts tend to select features that are rarely reported in the literature, though not all rare features are selected. Our analysis also shows that the CAS fingerprint provides a source of information distinct from that of other commonly used fingerprints. These results suggest that human expert insights do have predictive power and highlight the importance of a chemist-in-the-loop approach in the era of machine learning.
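To make the idea of a curated motif fingerprint and the "rare feature" analysis more concrete, here is a minimal sketch in Python. It is not the CAS fingerprint or the authors' analysis: the motif vocabulary (`VOCAB`), the toy molecules, and the helper names `to_bits`, `feature_frequencies`, and `rare_features` are all hypothetical. It assumes motif detection (for example, substructure matching) has already been done upstream, so each molecule arrives as a set of motif labels. Each molecule is then encoded as a presence/absence bit vector over the curated vocabulary, and features set in only a small fraction of a corpus are flagged as rare.

```python
# Minimal sketch of a curated motif (presence/absence) fingerprint.
# ASSUMPTIONS: the motif labels below are hypothetical placeholders, not real
# CAS fingerprint features; motif detection is assumed to happen upstream.

VOCAB = ["sulfonamide", "beta_lactam", "aryl_halide", "carboxylic_acid", "azide"]


def to_bits(motifs: set[str], vocab: list[str]) -> list[int]:
    """Encode a molecule's detected motifs as a bit vector over the vocabulary."""
    return [1 if m in motifs else 0 for m in vocab]


def feature_frequencies(fps: list[list[int]], vocab: list[str]) -> dict[str, float]:
    """Return the fraction of molecules in which each motif bit is set."""
    if not fps:
        raise ValueError("need at least one fingerprint")
    n = len(fps)
    counts = [sum(fp[i] for fp in fps) for i in range(len(vocab))]
    return {m: c / n for m, c in zip(vocab, counts)}


def rare_features(freqs: dict[str, float], threshold: float) -> list[str]:
    """List motifs whose corpus frequency falls below the threshold, sorted by name."""
    return sorted(m for m, f in freqs.items() if f < threshold)


# Hypothetical toy corpus: each molecule is the set of motifs detected in it.
mols = [
    {"sulfonamide", "aryl_halide"},
    {"carboxylic_acid", "aryl_halide"},
    {"aryl_halide"},
    {"beta_lactam", "carboxylic_acid"},
]
fps = [to_bits(m, VOCAB) for m in mols]
freqs = feature_frequencies(fps, VOCAB)
print(rare_features(freqs, threshold=0.3))
```

In this toy corpus the motifs set in fewer than 30% of molecules are `azide`, `beta_lactam`, and `sulfonamide`. The study frames rarity in terms of how often features are reported in the literature, so corpus frequency here is only a stand-in for that notion.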
Subjects

Full text: 1 | Collections: 01-internacional | Database: MEDLINE | Main subject: Algorithms / Machine Learning | Study type: Prognostic_studies | Language: En | Publication year: 2020 | Document type: Article