Machine learning models and performance dependency on 2D chemical descriptor space for retention time prediction of pharmaceuticals.

Beck, Armen G; Fine, Jonathan; Aggarwal, Pankaj; Regalado, Erik L; Levorse, Dorothy; De Jesus Silva, Jordan; Sherer, Edward C

Affiliation

Beck AG; Analytical Research & Development, MRL, Merck & Co., Inc., Rahway, NJ 07065, USA.
Fine J; Analytical Research & Development, MRL, Merck & Co., Inc., Rahway, NJ 07065, USA.
Aggarwal P; Analytical Research & Development, MRL, Merck & Co., Inc., Rahway, NJ 07065, USA. Electronic address: pankaj.aggarwal@merck.com.
Regalado EL; Analytical Research & Development, MRL, Merck & Co., Inc., Rahway, NJ 07065, USA.
Levorse D; Analytical Research & Development, MRL, Merck & Co., Inc., Rahway, NJ 07065, USA.
De Jesus Silva J; Analytical Research & Development, MRL, Merck & Co., Inc., Rahway, NJ 07065, USA.
Sherer EC; Analytical Research & Development, MRL, Merck & Co., Inc., Rahway, NJ 07065, USA.

J Chromatogr A; 1730: 465109, 2024 Aug 16.

Article in English | MEDLINE | ID: mdl-38968662

ABSTRACT

The predictive modeling of liquid chromatography methods can be an invaluable asset, potentially saving countless hours of labor while also reducing solvent consumption and waste. Tasks such as physicochemical screening and preliminary method screening, in which large amounts of chromatography data are collected from fast, routine operations, are particularly well suited both to leveraging large datasets and to benefiting from predictive models. Therefore, the generation of predictive models for retention time is an active area of development. However, for these predictive models to gain acceptance, researchers must first have confidence in model performance, and the computational cost of building them should be minimal. In this study, we develop a simple and cost-effective workflow for building machine learning models that predict retention time using only Molecular Operating Environment (MOE) 2D descriptors as input to support vector regression. Furthermore, we investigated how model performance depends on molecular descriptor space by using uniform manifold approximation and projection (UMAP) and Gaussian mixture model clustering to identify chemically distinct clusters. The results outlined herein demonstrate that local models trained on clusters in chemical space perform equivalently to models trained on all data. In 10-fold cross-validation on a comprehensive set of 67,950 of our company's proprietary analytes, these models achieved a coefficient of determination of 0.84 and a retention time error of 3 %. This promising performance was found to carry over from cross-validation to prospective prediction on an external test set of pharmaceutically relevant analytes. The observed equivalence of global and local modeling of large datasets also holds for METLIN's SMRT dataset, confirming the wider applicability of the developed machine learning workflows for global models.
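The evaluation protocol summarized in the abstract (10-fold cross-validation scored by the coefficient of determination and a percent error, comparing a global model with cluster-local models) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' workflow: the descriptors are synthetic stand-ins for MOE 2D descriptors, cluster labels are supplied directly rather than produced by UMAP followed by Gaussian mixture clustering, a k-nearest-neighbours regressor substitutes for support vector regression, and the abstract's "% error" is assumed to be mean absolute percentage error. All function names are hypothetical.

```python
import math
import random
import statistics


def make_synthetic_data(n_per_cluster=100, seed=42):
    """Hypothetical stand-in for MOE 2D descriptors: two well-separated clusters
    of 3-descriptor vectors, with retention time (min) from a noisy linear rule."""
    rng = random.Random(seed)
    X, y, labels = [], [], []
    for cluster, center in enumerate((0.0, 5.0)):
        for _ in range(n_per_cluster):
            x = [center + rng.gauss(0.0, 1.0) for _ in range(3)]
            rt = 2.0 + 0.3 * x[0] + 0.2 * x[1] - 0.1 * x[2] + rng.gauss(0.0, 0.05)
            X.append(x)
            y.append(rt)
            labels.append(cluster)
    return X, y, labels


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mean_abs_pct_error(y_true, y_pred):
    """Assumed error metric: mean absolute percentage error (%)."""
    return 100.0 * statistics.fmean(abs(t - p) / t for t, p in zip(y_true, y_pred))


def knn_predict(train_X, train_y, query, k=5):
    """Stand-in regressor (k-nearest neighbours), NOT the SVR used in the study."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))[:k]
    return statistics.fmean(train_y[i] for i in nearest)


def cross_validate(X, y, labels, n_folds=10, local=False, seed=0):
    """n-fold cross-validation of a global model (all training analytes) or
    local models (only training analytes from the query's own cluster)."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    y_true, y_pred = [], []
    for test in folds:
        test_set = set(test)
        train = [i for i in idx if i not in test_set]
        for i in test:
            if local:
                pool = [j for j in train if labels[j] == labels[i]]
            else:
                pool = train
            pred = knn_predict([X[j] for j in pool], [y[j] for j in pool], X[i])
            y_true.append(y[i])
            y_pred.append(pred)
    return r2_score(y_true, y_pred), mean_abs_pct_error(y_true, y_pred)


if __name__ == "__main__":
    X, y, labels = make_synthetic_data()
    for name, local in (("global", False), ("local", True)):
        r2, mape = cross_validate(X, y, labels, local=local)
        print(f"{name}: R^2 = {r2:.3f}, MAPE = {mape:.1f} %")
```

Because the two synthetic clusters lie far apart in descriptor space, the global model's nearest neighbours already come from the query analyte's own cluster, so global and local scores come out essentially the same here. That is a property of this toy setup and says nothing by itself about the proprietary or SMRT data.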

Subjects

Machine Learning; Pharmaceutical Preparations/analysis; Pharmaceutical Preparations/chemistry; Chromatography, Liquid/methods; Support Vector Machine; Cluster Analysis

Keywords

Gaussian mixture models; Liquid chromatography; Retention time prediction; Support vector regression; Uniform Manifold Approximation & Projection

Database: MEDLINE | Main subject: Machine Learning | Language: English | Journal: J Chromatogr A | Publication year: 2024 | Document type: Article | Country of affiliation: United States | Country of publication: Netherlands
