Systematic Approaches for the Encoding of Chemical Groups: A Case Study.
Karamertzanis, Panagiotis G; Patlewicz, Grace; Sannicola, Marta; Paul-Friedman, Katie; Shah, Imran.
Affiliation
  • Karamertzanis PG; Computational Assessment and Alternative Methods, European Chemicals Agency (ECHA), Telakkakatu 6, Helsinki 00150, Finland.
  • Patlewicz G; Center for Computational Toxicology and Exposure (CCTE), US EPA, 109 TW Alexander Dr, Research Triangle Park, North Carolina 27711, United States.
  • Sannicola M; Computational Assessment and Alternative Methods, European Chemicals Agency (ECHA), Telakkakatu 6, Helsinki 00150, Finland.
  • Paul-Friedman K; Center for Computational Toxicology and Exposure (CCTE), US EPA, 109 TW Alexander Dr, Research Triangle Park, North Carolina 27711, United States.
  • Shah I; Center for Computational Toxicology and Exposure (CCTE), US EPA, 109 TW Alexander Dr, Research Triangle Park, North Carolina 27711, United States.
Chem Res Toxicol ; 37(4): 600-619, 2024 Apr 15.
Article in English | MEDLINE | ID: mdl-38498310
ABSTRACT
Regulatory authorities aim to organize substances into groups to facilitate prioritization within hazard and risk assessment processes. Often, such chemical groupings are not explicitly defined by structural rules or physicochemical property information. This is largely due to how these groupings are developed, namely, through a manual expert curation process, which in turn makes updating and refining groupings as new substances are evaluated a practical challenge. Herein, machine learning methods were leveraged to build models that could preliminarily assign substances to predefined groups. A set of 86 groupings containing 2,184 substances, as published on the European Chemicals Agency (ECHA) website, was mapped to the U.S. Environmental Protection Agency (EPA) Distributed Structure-Searchable Toxicity (DSSTox) database content to extract chemical and structural information. Substances were represented using Morgan fingerprints, and two machine learning approaches, k-nearest neighbor (kNN) and random forest (RF), were used to classify test substances into the 56 groups that contained at least 10 substances with a structural representation in the data set. These led to mean 5-fold cross-validation test accuracies (average F1 scores) of 0.781 and 0.853, respectively. With a 9% improvement, the RF classifier was significantly more accurate than kNN (p-value = 0.001). The approach offers promise as a means of initially profiling new substances into predefined groups to facilitate prioritization efforts and streamline the assessment of new substances when earlier groupings are available. The algorithm to fit and use these models has been made available in the accompanying repository, thereby enabling both use of the produced models and refitting of these models as new groupings are made available by regulatory authorities or industry.
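To make the classification step in the abstract more concrete, the following is a minimal, standard-library sketch of assigning a substance to a predefined group by k-nearest-neighbor voting over fingerprint bits, scored with an averaged F1. It is not the authors' implementation or the code from their repository. The toy fingerprints (sets of on-bit indices standing in for Morgan fingerprints), the group names, the use of Tanimoto similarity, the value of `k`, and the choice of macro averaging for F1 are all assumptions made for illustration.

```python
from collections import Counter


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def knn_predict(query, training, k=3):
    """Return the majority group among the k training items most similar to query.

    `training` is a list of (fingerprint, group) pairs. Ties in the vote go to
    the group encountered first, i.e. the one with the more similar neighbor.
    """
    ranked = sorted(training, key=lambda item: tanimoto(query, item[0]), reverse=True)
    votes = Counter(group for _, group in ranked[:k])
    return votes.most_common(1)[0][0]


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-group F1 scores (macro averaging is an assumption)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: each fingerprint is the set of bits that are switched on.
TRAINING = [
    (frozenset({1, 2, 3, 4}), "group_A"),
    (frozenset({1, 2, 3, 5}), "group_A"),
    (frozenset({2, 3, 4, 6}), "group_A"),
    (frozenset({10, 11, 12, 13}), "group_B"),
    (frozenset({10, 11, 12, 14}), "group_B"),
    (frozenset({11, 12, 13, 15}), "group_B"),
]

if __name__ == "__main__":
    new_substance = frozenset({1, 2, 3})
    print(knn_predict(new_substance, TRAINING, k=3))
```

In practice the fingerprints would be computed from chemical structures with a cheminformatics toolkit, and the random forest model reported in the abstract would come from a machine learning library; both are omitted here so that the sketch runs without third-party packages.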
Full text: 1 | Collection: 01-international | Database: MEDLINE | Main subject: Algorithms / Machine Learning | Country/Region as subject: North America | Language: En | Journal: Chem Res Toxicol | Journal subject: TOXICOLOGY | Year: 2024 | Document type: Article | Affiliation country: Finland
...