Unlocking the Potential of Clustering and Classification Approaches: Navigating Supervised and Unsupervised Chemical Similarity.

Mansouri, Kamel; Taylor, Kyla; Auerbach, Scott; Ferguson, Stephen; Frawley, Rachel; Hsieh, Jui-Hua; Jahnke, Gloria; Kleinstreuer, Nicole; Mehta, Suril; Moreira-Filho, José T; Parham, Fred; Rider, Cynthia; Rooney, Andrew A; Wang, Amy; Sutherland, Vicki

Affiliation

Mansouri K; Division of Translational Toxicology, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA.
Taylor K; Division of Translational Toxicology, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA.
Auerbach S; Division of Translational Toxicology, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA.
Ferguson S; Division of Translational Toxicology, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA.
Frawley R; Division of Translational Toxicology, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA.
Hsieh JH; Division of Translational Toxicology, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA.
Jahnke G; Division of Translational Toxicology, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA.
Kleinstreuer N; Division of Translational Toxicology, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA.
Mehta S; Division of Translational Toxicology, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA.
Moreira-Filho JT; Division of Translational Toxicology, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA.
Parham F; Division of Translational Toxicology, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA.
Rider C; Division of Translational Toxicology, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA.
Rooney AA; Division of Translational Toxicology, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA.
Wang A; Division of Translational Toxicology, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA.
Sutherland V; Division of Translational Toxicology, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA.

Environ Health Perspect ; 132(8): 85002, 2024 Aug.

Article in English | MEDLINE | ID: mdl-39106156

ABSTRACT

BACKGROUND:

The field of toxicology has witnessed substantial advancements in recent years, particularly with the adoption of new approach methodologies (NAMs) to understand and predict chemical toxicity. Class-based methods such as clustering and classification are key to NAMs development and application, aiding the understanding of hazard and risk concerns associated with groups of chemicals without additional laboratory work. Advances in computational chemistry, data generation and availability, and machine learning algorithms represent important opportunities for continued improvement of these techniques to optimize their utility for specific regulatory and research purposes. However, due to their intricacy, deep understanding and careful selection are imperative to align the adequate methods with their intended applications.

OBJECTIVES:

This commentary aims to deepen the understanding of class-based approaches by elucidating the pivotal role of chemical similarity (structural and biological) in clustering and classification approaches (CCAs). It addresses the dichotomy between general end point-agnostic similarity, often entailing unsupervised analysis, and end point-specific similarity necessitating supervised learning. The goal is to highlight the nuances of these approaches, their applications, and common misuses.

DISCUSSION:

Understanding similarity is pivotal in toxicological research involving CCAs. The effectiveness of these approaches depends on the right definition and measure of similarity, which varies based on the context and objectives of the study. This choice is influenced by how chemical structures are represented and by the respective labels indicating biological activity, if applicable. The distinction between unsupervised clustering and supervised classification methods is vital, requiring the use of end point-agnostic vs. end point-specific similarity definitions. Separate use or combination of these methods requires careful consideration to prevent bias and ensure relevance for the goal of the study. Unsupervised methods use end point-agnostic similarity measures to uncover general structural patterns and relationships, aiding hypothesis generation and facilitating exploration of datasets without the need for predefined labels or explicit guidance. Conversely, supervised techniques demand end point-specific similarity to group chemicals into predefined classes or to train classification models, allowing accurate predictions for new chemicals. Misuse can arise when unsupervised methods are applied to end point-specific contexts, like analog selection in read-across, leading to erroneous conclusions. This commentary provides insights into the significance of similarity and its role in supervised classification and unsupervised clustering approaches. https://doi.org/10.1289/EHP14001.
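The contrast between end point-agnostic and end point-specific similarity can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the commentary: the toy fingerprints (sets of "on" bit indices), the activity labels, the helper names `tanimoto`, `informative_bits`, and `endpoint_similarity`, and the simple frequency-gap rule used to pick label-associated bits. It is one possible way to show the idea, not the authors' method.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0  # convention assumed here: two empty fingerprints count as identical
    return len(a & b) / len(union)


def informative_bits(fingerprints, labels, min_gap=0.5):
    """Supervised step (hypothetical rule): keep bits whose frequency differs
    between actives (label 1) and inactives (label 0) by at least min_gap."""
    actives = [fp for fp, y in zip(fingerprints, labels) if y == 1]
    inactives = [fp for fp, y in zip(fingerprints, labels) if y == 0]
    kept = set()
    for bit in set().union(*fingerprints):
        f_act = sum(bit in fp for fp in actives) / len(actives)
        f_inact = sum(bit in fp for fp in inactives) / len(inactives)
        if abs(f_act - f_inact) >= min_gap:
            kept.add(bit)
    return kept


def endpoint_similarity(a, b, bits):
    """Tanimoto similarity restricted to the label-associated bits."""
    return tanimoto(set(a) & bits, set(b) & bits)


# Toy data (invented): bit 9 stands in for a substructure linked to activity.
fps = {
    "A": {1, 2, 3, 4, 9},
    "B": {1, 2, 3, 4},
    "C": {5, 6, 7, 9},
    "D": {5, 6, 7},
}
labels = {"A": 1, "B": 0, "C": 1, "D": 0}

names = list(fps)
bits = informative_bits([fps[n] for n in names], [labels[n] for n in names])

agnostic_ab = tanimoto(fps["A"], fps["B"])
agnostic_ac = tanimoto(fps["A"], fps["C"])
specific_ab = endpoint_similarity(fps["A"], fps["B"], bits)
specific_ac = endpoint_similarity(fps["A"], fps["C"], bits)

print("agnostic  A-B:", agnostic_ab, " A-C:", agnostic_ac)
print("specific  A-B:", specific_ab, " A-C:", specific_ac)
```

In this toy set, chemical B is the closest overall structural neighbor of A, yet it lacks the one bit associated with activity, so the label-informed measure ranks C above B. This mirrors the read-across concern raised above: an analog chosen by general structural similarity alone may not share the features relevant to the end point.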

Subject(s)

Machine Learning; Cluster Analysis; Unsupervised Machine Learning; Toxicology/methods; Algorithms

Full text: 1 Collection: 01-international Database: MEDLINE Main subject: Machine Learning Language: English Journal: Environ Health Perspect Year: 2024 Document type: Article Affiliation country: United States Publication country: United States
