Machine learning models and performance dependency on 2D chemical descriptor space for retention time prediction of pharmaceuticals.
Beck, Armen G; Fine, Jonathan; Aggarwal, Pankaj; Regalado, Erik L; Levorse, Dorothy; De Jesus Silva, Jordan; Sherer, Edward C.
Affiliation
  • Beck AG; Analytical Research & Development, MRL, Merck & Co., Inc., Rahway, NJ 07065, USA.
  • Fine J; Analytical Research & Development, MRL, Merck & Co., Inc., Rahway, NJ 07065, USA.
  • Aggarwal P; Analytical Research & Development, MRL, Merck & Co., Inc., Rahway, NJ 07065, USA. Electronic address: pankaj.aggarwal@merck.com.
  • Regalado EL; Analytical Research & Development, MRL, Merck & Co., Inc., Rahway, NJ 07065, USA.
  • Levorse D; Analytical Research & Development, MRL, Merck & Co., Inc., Rahway, NJ 07065, USA.
  • De Jesus Silva J; Analytical Research & Development, MRL, Merck & Co., Inc., Rahway, NJ 07065, USA.
  • Sherer EC; Analytical Research & Development, MRL, Merck & Co., Inc., Rahway, NJ 07065, USA.
J Chromatogr A ; 1730: 465109, 2024 Aug 16.
Article in English | MEDLINE | ID: mdl-38968662
ABSTRACT
The predictive modeling of liquid chromatography methods can be an invaluable asset, potentially saving countless hours of labor while also reducing solvent consumption and waste. Tasks such as physicochemical screening and preliminary method screening systems where large amounts of chromatography data are collected from fast and routine operations are particularly well suited for both leveraging large datasets and benefiting from predictive models. Therefore, the generation of predictive models for retention time is an active area of development. However, for these predictive models to gain acceptance, researchers first must have confidence in model performance and the computational cost of building them should be minimal. In this study, a simple and cost-effective workflow for the development of machine learning models to predict retention time using only Molecular Operating Environment 2D descriptors as input for support vector regression is developed. Furthermore, we investigated the relative performance of models based on molecular descriptor space by utilizing uniform manifold approximation and projection and clustering with Gaussian mixture models to identify chemically distinct clusters. Results outlined herein demonstrate that local models trained on clusters in chemical space perform equivalently when compared to models trained on all data. Through 10-fold cross-validation on a comprehensive set containing 67,950 of our company's proprietary analytes, these models achieved coefficients of determination of 0.84 and 3 % error in terms of retention time. This promising statistical significance is found to translate from cross-validation to prospective prediction on an external test set of pharmaceutically relevant analytes. The observed equivalency of global and local modeling of large datasets is retained with METLIN's SMRT dataset, thereby confirming the wider applicability of the developed machine learning workflows for global models.
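The comparison described in the abstract, a global model trained on all data versus local models trained per cluster, each scored by 10-fold cross-validation, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the data are synthetic, a one-descriptor least-squares line stands in for support vector regression on MOE 2D descriptors, the cluster labels are given directly rather than derived from UMAP and a Gaussian mixture model, and mean absolute percentage error is only one possible reading of "error in terms of retention time". It is not the authors' implementation.

```python
import random
import statistics


def fit_line(xs, ys):
    """Least-squares fit of y = a*x + b (a stand-in for SVR, not the paper's model)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx


def cross_val_predict(xs, ys, k=10, seed=0):
    """Return out-of-fold predictions from k-fold cross-validation."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal folds
    preds = [None] * len(xs)
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in fold:
            preds[i] = a * xs[i] + b
    return preds


def r_squared(y_true, y_pred):
    """Coefficient of determination."""
    mean_y = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mean_pct_error(y_true, y_pred):
    """Mean absolute percentage error (assumed error metric)."""
    return 100.0 * statistics.fmean(
        [abs(t - p) / t for t, p in zip(y_true, y_pred)]
    )


# Synthetic "descriptor" x and "retention time" y in three hypothetical clusters.
rng = random.Random(42)
xs, ys, labels = [], [], []
for cluster, center in enumerate([1.0, 4.0, 7.0]):
    for _ in range(100):
        x = rng.gauss(center, 0.5)
        xs.append(x)
        ys.append(2.0 + 0.8 * x + rng.gauss(0.0, 0.2))
        labels.append(cluster)

# Global model: one cross-validated model over all analytes.
global_preds = cross_val_predict(xs, ys)

# Local models: one cross-validated model per cluster, predictions pooled.
local_preds = [None] * len(xs)
for c in sorted(set(labels)):
    members = [i for i, lab in enumerate(labels) if lab == c]
    sub = cross_val_predict([xs[i] for i in members], [ys[i] for i in members])
    for i, p in zip(members, sub):
        local_preds[i] = p

global_r2 = r_squared(ys, global_preds)
local_r2 = r_squared(ys, local_preds)
global_err = mean_pct_error(ys, global_preds)
local_err = mean_pct_error(ys, local_preds)
print(f"global: R2={global_r2:.3f}, error={global_err:.1f}%")
print(f"local:  R2={local_r2:.3f}, error={local_err:.1f}%")
```

Pooling the out-of-fold predictions before computing the metrics keeps the global and local scores on the same set of analytes, so the two numbers are directly comparable. On this synthetic data the two approaches are expected to score similarly because every cluster follows the same underlying relationship; real descriptor data need not behave that way.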

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Machine Learning Language: En Journal: J Chromatogr A Publication year: 2024 Document type: Article Affiliation country: United States Publication country: Netherlands
