Quantifying the Benefits of Imputation over QSAR Methods in Toxicology Data Modeling.

Whitehead, Thomas M; Strickland, Joel; Conduit, Gareth J; Borrel, Alexandre; Mucs, Daniel; Baskerville-Abraham, Irene

Affiliation

Whitehead TM; Intellegens Ltd., The Studio, Chesterton Mill, Cambridge CB4 3NP, United Kingdom.
Strickland J; Intellegens Ltd., The Studio, Chesterton Mill, Cambridge CB4 3NP, United Kingdom.
Conduit GJ; Intellegens Ltd., The Studio, Chesterton Mill, Cambridge CB4 3NP, United Kingdom.
Borrel A; Inotiv, Research Triangle Park, North Carolina 27560, United States.
Mucs D; Scientific and Regulatory Affairs, JT International SA, 8, rue Kazem Radjavi, 1202 Geneva, Switzerland.
Baskerville-Abraham I; Scientific and Regulatory Affairs, JT International SA, 8, rue Kazem Radjavi, 1202 Geneva, Switzerland.

J Chem Inf Model; 64(7): 2624-2636, 2024 Apr 08.

Article in English | MEDLINE | ID: mdl-38091381

ABSTRACT

Imputation machine learning (ML) surpasses traditional approaches in modeling toxicity data. The method was tested on an open-source data set comprising approximately 2500 ingredients with limited in vitro and in vivo data obtained from the OECD QSAR Toolbox. By leveraging the relationships between different toxicological end points, imputation extracts more valuable information from each data point compared to well-established single end point methods, such as ML-based Quantitative Structure Activity Relationship (QSAR) approaches, providing a final improvement of up to around 0.2 in the coefficient of determination. A significant aspect of this methodology is its resilience to the inclusion of extraneous chemical or experimental data. While additional data typically introduces a considerable level of noise and can hinder performance of single end point QSAR modeling, imputation models remain unaffected. This implies a reduction in the need for laborious manual preprocessing tasks such as feature selection, thereby making data preparation for ML analysis more efficient. This successful test, conducted on open-source data, validates the efficacy of imputation approaches in toxicity data analysis. This work opens the way for applying similar methods to other types of sparse toxicological data matrices, and so we discuss the development of regulatory authority guidelines to accept imputation models, a key aspect for the wider adoption of these methods.
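The core idea in the abstract, that a model allowed to use measured values of related end points can extract more from each data point than a descriptor-only single end point model, can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the data are synthetic, the names (`descriptor`, `endpoint_a`, `endpoint_b`, `fit_linear`) are hypothetical, and plain least-squares regression stands in for the imputation ML method, which the abstract does not specify. It is not the authors' model and does not reproduce their results.

```python
import random


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_linear(rows, targets):
    """Ordinary least squares with intercept, via the normal equations."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    # Augmented matrix [X^T X | X^T y]
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * t for x, t in zip(X, targets))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def predict(coef, rows):
    return [coef[0] + sum(c * v for c, v in zip(coef[1:], r)) for r in rows]


# Synthetic compounds (assumption): one chemical descriptor plus a latent
# biological factor that the descriptor does not capture.
rng = random.Random(0)
n = 400
descriptor = [rng.gauss(0, 1) for _ in range(n)]
latent = [rng.gauss(0, 1) for _ in range(n)]
endpoint_a = [d + z + rng.gauss(0, 0.1) for d, z in zip(descriptor, latent)]
endpoint_b = [0.5 * d + z + rng.gauss(0, 0.1) for d, z in zip(descriptor, latent)]

train, test = range(0, 300), range(300, n)

# Single end point model: predict end point B from the descriptor alone.
coef_single = fit_linear([[descriptor[i]] for i in train],
                         [endpoint_b[i] for i in train])
pred_single = predict(coef_single, [[descriptor[i]] for i in test])

# Imputation-style model: also use the measured value of end point A.
coef_imputed = fit_linear([[descriptor[i], endpoint_a[i]] for i in train],
                          [endpoint_b[i] for i in train])
pred_imputed = predict(coef_imputed,
                       [[descriptor[i], endpoint_a[i]] for i in test])

y_test = [endpoint_b[i] for i in test]
r2_single = r2_score(y_test, pred_single)
r2_imputed = r2_score(y_test, pred_imputed)
print(f"single end point R^2: {r2_single:.3f}")
print(f"imputation-style R^2: {r2_imputed:.3f}")
```

The gap between the two scores in this sketch is far larger than the improvement of around 0.2 reported in the abstract, because the synthetic end points were deliberately built to share a strong latent factor; the sketch shows only the direction of the effect, not its realistic size.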

Subjects

Quantitative Structure-Activity Relationship; Toxicology; Toxicology/methods

Full text: 1 | Database: MEDLINE | Main subject: Toxicology / Quantitative Structure-Activity Relationship | Language: En | Publication year: 2024 | Document type: Article
