Results 1 - 20 of 132
1.
Methods Mol Biol ; 2834: 131-149, 2025.
Article in English | MEDLINE | ID: mdl-39312163

ABSTRACT

In the rapidly evolving landscape of artificial intelligence (AI) and machine learning (ML), understanding and correctly applying the concept of the applicability domain (AD) has become essential. This chapter begins with an introduction and background on the critical area of AD. It then covers the definition of the applicability domain and the different methodologies associated with it, laying a solid foundation for further exploration. A detailed examination of AD's role within the framework of AI and ML is undertaken, supported by in-depth theoretical foundations. The chapter then delineates the various measures of AD in AI and ML, offering insights into methods like DA index (κ, γ, δ), class probability estimation, and techniques involving local vicinity, boosting, classification neural networks, and subgroup discovery (SGD), among others. We also discuss a series of AD methods employed in Quantitative Structure-Activity Relationship (QSAR) studies. Lastly, the diverse applications of AD are addressed, underlining its widespread influence across different sectors. Because a good amount of literature already exists on the AD of QSAR modeling, this chapter is intended to offer a thorough understanding of AD and its applications particularly in AI and ML, leading to more informed research and decision-making in these fields.


Subject(s)
Artificial Intelligence , Machine Learning , Quantitative Structure-Activity Relationship , Neural Networks, Computer , Humans , Algorithms
2.
J Cheminform ; 16(1): 98, 2024 Aug 12.
Article in English | MEDLINE | ID: mdl-39129016

ABSTRACT

The exponential growth of data is challenging for humans because their ability to analyze data is limited. Especially in chemistry, there is a demand for tools that can visualize molecular datasets in a convenient graphical way. We propose a new, ready-to-use, multi-tool, and open-source framework for visualizing and navigating chemical space. This framework adheres to the low-code/no-code (LCNC) paradigm, providing a KNIME node, a web-based tool, and a Python package, making it accessible to a broad cheminformatics community. The core technique of the MolCompass framework employs a pre-trained parametric t-SNE model. We demonstrate how this framework can be adapted for the visualisation of chemical space and visual validation of binary classification QSAR/QSPR models, revealing their weaknesses and identifying model cliffs. All parts of the framework are publicly available on GitHub, providing accessibility to the broad scientific community. Scientific contribution: We provide an open-source, ready-to-use set of tools for the visualization of chemical space. These tools can be insightful for chemists to analyze compound datasets and for the visual validation of QSAR/QSPR models.

3.
Mol Inform ; 43(7): e202400018, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38803302

ABSTRACT

The growing interest in chemoinformatic model uncertainty calls for a summary of the most widely used regression techniques and how to estimate their reliability. Regression models learn a mapping from the space of explanatory variables to the space of continuous output values. Among other limitations, the predictive performance of the model is restricted by the training data used for model fitting. Identification of unusual objects by outlier detection methods can improve model performance. Additionally, proper model evaluation necessitates defining the limitations of the model, often called the applicability domain. Comparable to certain classifiers, some regression techniques come with built-in methods or augmentations to quantify their (un)certainty, while others rely on generic procedures. The theoretical background of their working principles and how to deduce specific and general definitions for their domain of applicability shall be explained.


Subject(s)
Cheminformatics , Cheminformatics/methods , Regression Analysis
4.
Front Toxicol ; 6: 1359507, 2024.
Article in English | MEDLINE | ID: mdl-38742231

ABSTRACT

In the European regulatory context, rodent in vivo studies are the predominant source of neurotoxicity information. Although they form a cornerstone of neurotoxicological assessments, they are costly and the topic of ethical debate. While the public expects chemicals and products to be safe for the developing and mature nervous systems, considerable numbers of chemicals in commerce have not, or only to a limited extent, been assessed for their potential to cause neurotoxicity. As such, there is a societal push toward the replacement of animal models with in vitro or alternative methods. New approach methods (NAMs) can contribute to the regulatory knowledge base, increase chemical safety, and modernize chemical hazard and risk assessment. Provided they reach an acceptable level of regulatory relevance and reliability, NAMs may be considered as replacements for specific in vivo studies. The European Partnership for the Assessment of Risks from Chemicals (PARC) addresses challenges to the development and implementation of NAMs in chemical risk assessment. In collaboration with regulatory agencies, Project 5.2.1e (Neurotoxicity) aims to develop and evaluate NAMs for developmental neurotoxicity (DNT) and adult neurotoxicity (ANT) and to understand the applicability domain of specific NAMs for the detection of endocrine disruption and epigenetic perturbation. To speed up assay time and reduce costs, we identify early indicators of later-onset effects. Ultimately, we will assemble second-generation developmental neurotoxicity and first-generation adult neurotoxicity test batteries, both of which aim to provide regulatory hazard and risk assessors and industry stakeholders with robust, speedy, lower-cost, and informative next-generation hazard and risk assessment tools.

5.
J Cheminform ; 16(1): 65, 2024 May 30.
Article in English | MEDLINE | ID: mdl-38816859

ABSTRACT

This study describes the development and evaluation of six new models for predicting physical-chemical (PC) properties that are highly relevant for chemical hazard, exposure, and risk estimation: solubility (in water SW and octanol SO), vapor pressure (VP), and the octanol-water (KOW), octanol-air (KOA), and air-water (KAW) partition ratios. The models are implemented in the Iterative Fragment Selection Quantitative Structure-Activity Relationship (IFSQSAR) Python package, Version 1.1.0. These models are implemented as Poly-Parameter Linear Free Energy Relationship (PPLFER) equations which combine experimentally calibrated system parameters and solute descriptors predicted with QSPRs. Two other ancillary models have been developed and implemented, a QSPR for Molar Volume (MV) and a classifier for the physical state of chemicals at room temperature. The IFSQSAR methods for characterizing applicability domain (AD) and calculating uncertainty estimates expressed as 95% prediction intervals (PI) for predicted properties are described and tested on 9,000 measured partition ratios and 4,000 VP and SW values. The measured data are external to IFSQSAR training and validation datasets and are used to assess the predictivity of the models for "novel chemicals" in an unbiased manner. The 95% PIs calculated from validation datasets for partition ratios needed to be scaled by a factor of 1.25 to capture 95% of the external data. Predictions for VP and SW are more uncertain, primarily due to the challenges in differentiating their physical state (i.e., liquids or solids) at room temperature. The prediction accuracy of the models for log KOW, log KAW and log KOA of novel, data-poor chemicals is estimated to be in the range of 0.7 to 1.4 root mean squared error of prediction (RMSEP), with RMSEP in the range 1.7-1.8 for log VP and log SW.
Scientific contribution: New partitioning models integrate empirical PPLFER equations and QSARs, allowing for seamless integration of experimental data and model predictions. This work tests the real predictivity of the models for novel chemicals which are not in the model training or external validation datasets.
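
The interval-scaling step mentioned in this abstract can be illustrated with a minimal sketch. It is not the IFSQSAR implementation; the function names and the toy numbers are hypothetical. It assumes each prediction comes with a symmetric 95% PI half-width, measures the fraction of external errors the intervals capture, and then finds the smallest common scaling factor that reaches the target coverage.

```python
import math

def interval_coverage(errors, half_widths, scale=1.0):
    """Fraction of prediction errors that fall inside the (scaled) intervals."""
    hits = sum(abs(e) <= scale * h for e, h in zip(errors, half_widths))
    return hits / len(errors)

def scale_for_coverage(errors, half_widths, target=0.95):
    """Smallest common factor by which the intervals must be widened
    so that at least `target` of the errors are captured."""
    ratios = sorted(abs(e) / h for e, h in zip(errors, half_widths))
    needed = math.ceil(target * len(ratios))  # number of points to capture
    return ratios[needed - 1]

# Toy external set (hypothetical values): 20 errors, identical half-widths.
errors = [i * 0.25 for i in range(1, 21)]
half_widths = [4.0] * 20
factor = scale_for_coverage(errors, half_widths)
print(interval_coverage(errors, half_widths), factor)
```

In this toy case the unscaled intervals capture fewer points than intended, and the returned factor is the amount of widening needed, analogous in spirit to the factor of 1.25 reported in the abstract.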

6.
Regul Toxicol Pharmacol ; 149: 105619, 2024 May.
Article in English | MEDLINE | ID: mdl-38614220

ABSTRACT

The Xenopus Eleutheroembryonic Thyroid Assay (XETA) was recently published as an OECD Test Guideline for detecting chemicals acting on the thyroid axis. However, the OECD validation did not cover all mechanisms that can potentially be detected by the XETA. This study was therefore initiated to investigate and consolidate the applicability domain of the XETA regarding the following mechanisms: thyroid hormone receptor (THR) agonism, sodium-iodide symporter (NIS) inhibition, thyroperoxidase (TPO) inhibition, deiodinase (DIO) inhibition, glucocorticoid receptor (GR) agonism, and uridine 5'-diphospho-glucuronosyltransferase (UDPGT) induction. In total, 22 chemicals identified as thyroid-active or -inactive in Amphibian Metamorphosis Assays (AMAs) were tested using the XETA OECD Test Guideline. The comparison showed that both assays are highly concordant in identifying chemicals with mechanisms of action related to THR agonism, DIO inhibition, and GR agonism. They also consistently identified the UDPGT inducers as thyroid inactive. NIS inhibition, investigated using sodium perchlorate, was not detected in the XETA. TPO inhibition requires further mechanistic investigations as the reference chemicals tested resulted in opposing response directions in the XETA and AMA. This study contributes to refining the applicability domain of the XETA, thereby helping to clarify the conditions where it can be used as an ethical alternative to the AMA.


Subject(s)
Biological Assay , Endocrine Disruptors , Metamorphosis, Biological , Symporters , Thyroid Gland , Animals , Thyroid Gland/drug effects , Thyroid Gland/metabolism , Metamorphosis, Biological/drug effects , Biological Assay/methods , Endocrine Disruptors/toxicity , Xenopus laevis , Receptors, Thyroid Hormone/metabolism , Receptors, Thyroid Hormone/agonists , Iodide Peroxidase/metabolism
7.
J Hazard Mater ; 469: 133989, 2024 May 05.
Article in English | MEDLINE | ID: mdl-38461660

ABSTRACT

Drinking water disinfection can result in the formation of disinfection byproducts (DBPs; > 700 have been identified to date), many of which are reportedly cytotoxic, genotoxic, or developmentally toxic. Analyzing the toxicity levels of these contaminants experimentally is challenging; however, a predictive model could rapidly and effectively assess their toxicity. In this study, machine learning models were developed to predict DBP cytotoxicity based on their chemical information and exposure experiments. The Random Forest model achieved the best performance (coefficient of determination of 0.62 and root mean square error of 0.63) among all the algorithms screened. Also, the results of a probabilistic model demonstrated reliable model predictions. According to the model interpretation, halogen atoms are the most prominent features for DBP cytotoxicity compared to other chemical substructures. The presence of iodine and bromine is associated with increased cytotoxicity levels, while the presence of chlorine is linked to a reduction in cytotoxicity levels. Other factors including chemical substructures (CC, N, CN, and 6-member ring), cell line, and exposure duration can significantly affect the cytotoxicity of DBPs. The similarity calculation indicated that the model has a large applicability domain and can provide reliable predictions for DBPs with unknown cytotoxicity. Finally, this study showed the effectiveness of data augmentation in the scenario of data scarcity.


Subject(s)
Disinfectants , Drinking Water , Water Pollutants, Chemical , Water Purification , Animals , Cricetinae , Disinfection , Disinfectants/toxicity , Disinfectants/analysis , Halogenation , Water Pollutants, Chemical/toxicity , Water Pollutants, Chemical/analysis , Halogens , Chlorine , Drinking Water/analysis , CHO Cells
8.
Toxicol Res (Camb) ; 13(1): tfae020, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38496320

ABSTRACT

To analyze the persistence and ecotoxicological impact of veterinary pharmaceuticals on different terrestrial species, veterinary pharmaceuticals of different classes (n = 37) with soil degradation data (DT50) were gathered and subjected to QSAR and q-RASAR model development. The models were developed from 2D descriptors under the Organisation for Economic Co-operation and Development (OECD) guidelines, using multiple linear regression along with a genetic algorithm. All developed QSAR and q-RASAR models were statistically significant (internal: R²adj = 0.721-0.861, Q²LOO = 0.609-0.757; external: Q²Fn = 0.597-0.933, MAEext = 0.174-0.260). Further, the leverage approach to the applicability domain assured the models' reliability. The veterinary pharmaceuticals with no experimental values were classified based on their persistence level. Further, the terrestrial toxicity of persistent veterinary pharmaceuticals was analyzed using toxicity prediction by computer-assisted technology and in-house built quantitative structure-toxicity relationship models to prioritize the toxic and persistent veterinary pharmaceuticals. This study will be helpful in estimating the persistence and toxicity of existing and upcoming veterinary pharmaceuticals.
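
The leverage approach named in this abstract can be sketched as follows. This is an illustrative example, not the authors' code: the leverage of a compound with descriptor vector x is h = xᵀ(XᵀX)⁻¹x, where X is the training descriptor matrix with an intercept column. The warning threshold h* = 3(p + 1)/n is a convention commonly used with Williams plots and is an assumption here, since the abstract does not state the cutoff. All function names are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination
    with partial pivoting (a is a square list of lists)."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] for i in range(n)]

def leverage(train, query):
    """Leverage h = x^T (X^T X)^{-1} x, with an intercept column prepended."""
    X = [[1.0] + list(row) for row in train]
    x = [1.0] + list(query)
    k = len(x)
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    z = solve(xtx, x)  # z = (X^T X)^{-1} x
    return sum(xi * zi for xi, zi in zip(x, z))

def warning_leverage(train):
    """Assumed cutoff h* = 3(p + 1)/n for n compounds and p descriptors."""
    n, p = len(train), len(train[0])
    return 3.0 * (p + 1) / n

# Toy training set with a single descriptor (hypothetical values).
train = [[0.0], [1.0], [2.0], [3.0]]
for query in ([1.5], [10.0]):
    h = leverage(train, query)
    print(query, h, "inside AD" if h <= warning_leverage(train) else "outside AD")
```

A query near the centre of the training descriptors gets a low leverage, while one far from it exceeds the cutoff and would be flagged as an extrapolation.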

9.
J Comput Aided Mol Des ; 38(1): 9, 2024 Feb 14.
Article in English | MEDLINE | ID: mdl-38351144

ABSTRACT

Notwithstanding the wide adoption of the OECD principles (or best practices) for QSAR modeling, disparities between in silico predictions and experimental results are frequent, suggesting that model predictions are often too optimistic. Of these OECD principles, the applicability domain (AD) estimation has been recognized in several reports in the literature to be one of the most challenging, implying that the actual reliability measures of model predictions are often unreliable. Applying tree-based error analysis workflows on 5 QSAR models reported in the literature and available in the QsarDB repository, i.e., androgen receptor bioactivity (agonists, antagonists, and binders, respectively) and membrane permeability (highest membrane permeability and the intrinsic permeability), we demonstrate that predictions erroneously tagged as reliable (AD prediction errors) overwhelmingly correspond to instances in subspaces (cohorts) with the highest prediction error rates, highlighting the inhomogeneity of the AD space. In this sense, we call for more stringent AD analysis guidelines which require the incorporation of model error analysis schemes, to provide critical insight on the reliability of underlying AD algorithms. Additionally, any selected AD method should be rigorously validated to demonstrate its suitability for the model space over which it is applied. These steps will ultimately contribute to more accurate estimations of the reliability of model predictions. Finally, error analysis may also be useful in "rational" model refinement in that data expansion efforts and model retraining are focused on cohorts with the highest error rates.


Subject(s)
Algorithms , Quantitative Structure-Activity Relationship , Reproducibility of Results
10.
Int J Mol Sci ; 25(3), 2024 Jan 23.
Article in English | MEDLINE | ID: mdl-38338650

ABSTRACT

The Ames/quantitative structure-activity relationship (QSAR) International Challenge Projects, held during 2014-2017 and 2020-2022, evaluated the performance of various predictive models. Despite the significant insights gained, the rules allowing participants to select prediction targets introduced ambiguity in model performance evaluation. This reanalysis identified the highest-performing prediction model, assuming a 100% coverage rate (COV) for all prediction target compounds and an estimated performance variation due to changes in COV. All models from both projects were evaluated using balanced accuracy (BA), the Matthews correlation coefficient (MCC), the F1 score (F1), and the first principal component (PC1). After normalizing the COV, a correlation analysis with these indicators was conducted, and the evaluation index for all prediction models in terms of the COV was estimated. In total, using 109 models, the model with the highest estimated BA (76.9) at 100% COV was MMI-VOTE1, as reported by Meiji Pharmaceutical University (MPU). The best models for MCC, F1, and PC1 were all MMI-STK1, also reported by MPU. All the models reported by MPU ranked in the top four. MMI-STK1 was estimated to have F1 scores of 59.2, 61.5, and 63.1 at COV levels of 90%, 60%, and 30%, respectively. These findings highlight the current state and potential of the Ames prediction technology.


Subject(s)
Quantitative Structure-Activity Relationship , Humans , Mutagenicity Tests , Correlation of Data
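
For reference, the indicators named in this abstract (BA, MCC, F1) and the coverage rate can be computed from a confusion matrix as in the sketch below. The standard textbook definitions are used; the function names and the example counts are hypothetical and do not come from the challenge data. Values are returned as fractions, whereas the abstract reports them as percentages.

```python
import math

def classification_metrics(tp, fp, tn, fn):
    """Balanced accuracy, F1 score and Matthews correlation coefficient
    from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ba = (sensitivity + specificity) / 2
    f1 = 2 * tp / (2 * tp + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"BA": ba, "F1": f1, "MCC": mcc}

def coverage(n_predicted, n_total):
    """Coverage rate (COV): share of target compounds a model gave a call for."""
    return n_predicted / n_total

# Hypothetical model that predicted 100 of 125 target compounds.
print(classification_metrics(tp=40, fp=10, tn=40, fn=10))
print(coverage(100, 125))
```

Because a model may decline to predict compounds it considers out of domain, metrics computed only on the predicted subset are not directly comparable across models with different COV, which is the ambiguity the reanalysis addresses.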
11.
Toxicology ; 503: 153743, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38341018

ABSTRACT

Skin sensitization assessment has progressed from the use of animal models towards the application of New Approach Methodologies (NAMs). Several skin sensitization NAMs are accepted for regulatory use, but a majority relies on submerged in vitro cell cultures that limit their applicability domain, posing challenges for testing hydrophobic chemicals and mixtures. A newly developed three-dimensional (3D) Nrf2 reporter epidermis model for skin sensitization assessment is reported. This NAM may help to overcome these limitations. The NAM combines the in vivo-like biology and exposure conditions of 3D epidermis models with the reliability, convenience, and cost-effectiveness of secreted reporter gene technology. The Keap1-Nrf2-ARE pathway was chosen as the reporter gene read-out, as it is induced by most skin sensitizers and already adopted in OECD Test guideline 442D. Immortalized human primary keratinocytes (Ker-CT) were stably transfected with the pIGB-Nrf2-SEAP vector to construct an Nrf2 reporter cell line. Ker-CT Nrf2 reporter cells showed negligible basal expression of the Secreted Embryonic Alkaline Phosphatase (SEAP) reporter, which was induced 13.5-fold by exposure to the skin sensitizer cinnamic aldehyde (CA). Co-exposure to CA and the Nrf2 inhibitor glucocorticoid clobetasol propionate significantly suppressed the CA-induced SEAP expression, confirming dependence of the SEAP expression on Nrf2 activation. Using air-liquid interface and animal constituent free culture conditions, the Ker-CT Nrf2 reporter cells differentiated to stratified 3D epidermis models with an in vivo-like skin architecture and functional skin barrier. Evaluation of a Ker-CT Nrf2 reporter cell-based 2D assay by testing 10 conventional reference chemicals showed a predictive accuracy for skin sensitization potential of 80% and 70% compared to LLNA and human data in two independent laboratories and a high intra- and interlaboratory reproducibility.
Moreover, the 3D epidermis models predicted 3 sensitizing and 2 non-sensitizing reference chemicals correctly in a first proof-of-concept study. Further investigations foresee the testing of additional chemicals, including hydrophobic compounds and mixtures to confirm the potential of the 3D epidermis models to broaden the applicability domain for NAM-based skin sensitization assessment.


Subject(s)
Dermatitis, Allergic Contact , NF-E2-Related Factor 2 , Animals , Humans , NF-E2-Related Factor 2/metabolism , Kelch-Like ECH-Associated Protein 1/metabolism , Reproducibility of Results , Epidermis/metabolism , Keratinocytes/metabolism , Skin/metabolism , Animal Testing Alternatives , Local Lymph Node Assay
12.
Environ Sci Technol ; 58(4): 1944-1953, 2024 Jan 30.
Article in English | MEDLINE | ID: mdl-38240238

ABSTRACT

Tissue-to-blood partition coefficients (Ptb) are key parameters for assessing toxicokinetics of xenobiotics in organisms, yet their experimental data were lacking. Experimental methods for measuring Ptb values are inefficient, underscoring the urgent need for prediction models. However, most existing models failed to fully exploit Ptb data from diverse sources, and their applicability domain (AD) was limited. The current study developed a multimodal model capable of processing and integrating textual (categorical features) and numerical information (molecular descriptors/fingerprints) to simultaneously predict Ptb values across various species, tissues, blood matrices, and measurement methods. Artificial neural network algorithms with embedding layers were used for the multimodal modeling. The corresponding unimodal models were developed for comparison. Results showed that the multimodal model outperformed unimodal models. To enhance the reliability of the model, a method considering categorical features, weighted molecular similarity density, and weighted inconsistency in molecular activities of structure-activity landscapes was used to characterize the AD. The model constrained by the AD exhibited better prediction accuracy for the validation set, with the determination coefficient, root mean-square error, and mean absolute error being 0.843, 0.276, and 0.213 log units, respectively. The multimodal model coupled with the AD characterization can serve as an efficient tool for internal exposure assessment of chemicals.


Subject(s)
Fishes , Quantitative Structure-Activity Relationship , Animals , Reproducibility of Results , Mammals , Neural Networks, Computer
13.
Environ Sci Technol ; 2024 Jan 23.
Article in English | MEDLINE | ID: mdl-38263624

ABSTRACT

A significant number of chemicals registered in national and regional chemical inventories require assessments of their potential "hazard" concerns posed to humans and ecological receptors. This warrants knowledge of their partitioning and reactivity properties, which are often predicted by quantitative structure-property relationships (QSPRs) and other semiempirical relationships. It is imperative to evaluate the applicability domain (AD) of these tools to ensure their suitability for assessment purposes. Here, we investigate the extent to which the ADs of commonly used QSPRs and semiempirical relationships cover seven partitioning and reactivity properties of a chemical "space" comprising 81,000+ organic chemicals registered in regulatory and academic chemical inventories. Our findings show that around or more than half of the chemicals studied are covered by at least one of the commonly used QSPRs. The investigated QSPRs demonstrate adequate AD coverage for organochlorides and organobromines but limited AD coverage for chemicals containing fluorine and phosphorus. These QSPRs exhibit limited AD coverage for atmospheric reactivity, biodegradation, and octanol-air partitioning, particularly for ionizable organic chemicals compared to nonionizable ones, challenging assessments of environmental persistence, bioaccumulation capability, and long-range transport potential. We also find that a predictive tool's AD coverage of chemicals depends on how the AD is defined, for example, by the distance of a predicted chemical from the centroid of the training chemicals or by the presence or absence of structural features.
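
The closing point of this abstract, that AD coverage depends on how the AD is defined, can be illustrated with a small sketch of the two definitions it mentions. Everything here is hypothetical: the descriptor values, the fragment labels, and the choice of the largest training-set distance as the cutoff are assumptions for illustration, not the rules used by any particular QSPR tool.

```python
import math

def centroid(rows):
    """Column-wise mean of the training descriptor vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def in_ad_by_distance(train, query):
    """Distance-based AD: inside if the query is no farther from the
    training centroid than the most distant training chemical (assumed cutoff)."""
    c = centroid(train)
    limit = max(math.dist(row, c) for row in train)
    return math.dist(query, c) <= limit

def in_ad_by_fragments(train_fragments, query_fragments):
    """Structure-based AD: inside only if every structural feature of the
    query also occurs somewhere in the training set."""
    known = set().union(*train_fragments)
    return set(query_fragments) <= known

# Hypothetical training data: two descriptors and a fragment set per chemical.
train_desc = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
train_frag = [{"C-Cl", "aromatic"}, {"C-Br"}, {"aromatic"}, {"C-Cl"}]

# A fluorinated query with unremarkable descriptor values.
print(in_ad_by_distance(train_desc, [1.2, 0.9]))           # True
print(in_ad_by_fragments(train_frag, {"C-F", "aromatic"}))  # False
```

The same query chemical is counted as covered under the distance definition and as not covered under the structural-feature definition, which is why coverage statistics should always state the AD definition behind them.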

14.
J Hazard Mater ; 465: 133355, 2024 Mar 05.
Article in English | MEDLINE | ID: mdl-38198864

ABSTRACT

The development of accurate and interpretable models for predicting reaction constants of organic compounds with hydroxyl radicals is vital for advancing quantitative structure-activity relationships (QSAR) in pollutant degradation. Methods like molecular descriptors, molecular fingerprinting, and group contribution methods have limitations, as traditional machine learning struggles to capture all intramolecular information simultaneously. To address this, we established an integrated graph neural network (GNN) with approximately 12 million learnable parameters. GNN represents atoms as nodes and chemical bonds as edges, thus transforming molecules into graph structures, effectively capturing microscopic properties while depicting atom connectivity in non-Euclidean space. Using a dataset of 1401 pollutants, we developed an integrated GNN model with Bayesian optimization; the model achieves root mean square errors of 0.165, 0.172, and 0.189 on the training, validation, and test datasets, respectively. Furthermore, we assess molecular structure similarity using molecular fingerprints to enhance the model's applicability. Finally, we propose a gradient weight mapping method for model explainability, uncovering the key functional groups in chemical reactions from an artificial intelligence perspective, which could advance chemistry through the computing power of artificial intelligence.
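
A fingerprint-based similarity check of the kind this abstract alludes to is often done with the Tanimoto coefficient. The sketch below is a generic illustration, not the authors' procedure: fingerprints are represented as sets of on-bit indices, and both the toy bit sets and the 0.5 threshold are hypothetical assumptions.

```python
def tanimoto(a, b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    a, b = set(a), set(b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def nearest_similarity(train_fps, query_fp):
    """Similarity of the query to its most similar training compound."""
    return max(tanimoto(fp, query_fp) for fp in train_fps)

def in_domain(train_fps, query_fp, threshold=0.5):
    """Treat a query as inside the AD if at least one training compound
    is at least `threshold` similar (the threshold is an assumed value)."""
    return nearest_similarity(train_fps, query_fp) >= threshold

# Hypothetical fingerprints as sets of on-bit indices.
train_fps = [{1, 2, 3}, {7, 8, 9, 10}]
print(nearest_similarity(train_fps, {2, 3, 4}))  # 0.5
print(in_domain(train_fps, {20, 21}))            # False
```

In practice the fingerprints would come from a cheminformatics toolkit, and the threshold would be tuned against how prediction error grows as similarity to the training set falls.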

15.
Environ Res ; 241: 117603, 2024 Jan 15.
Article in English | MEDLINE | ID: mdl-37939805

ABSTRACT

Tissue-to-blood partition coefficients (Ptb) are crucial for assessing the distribution of chemicals in organisms. Given the lack of experimental data and laborious nature of experimental methods, there is an urgent need to develop efficient predictive models. With the help of machine learning algorithms, i.e., random forest (RF) and artificial neural network (ANN), this study developed multi-task (MT) models that can simultaneously predict Ptb values for various mammalian tissues, including liver, muscle, brain, lung, and adipose. Single-task (ST) models using partial least squares regression, RF, and ANN algorithms for each endpoint were established for comparison. Overall, the performances of MT models were superior to those of ST models. The MT model using ANN algorithms showed the highest prediction accuracy with determination coefficients ranging from 0.704 to 0.886, root mean square errors between 0.223 and 0.410, and mean absolute errors ranging from 0.178 to 0.285 log units. Results showed that lipophilicity and polarizability of molecules significantly influence their partition behavior in organisms. Applicability domains (ADs) of the models were characterized by weighted molecular similarity density and weighted inconsistency in molecular activities of structure-activity landscapes. When constrained by ADs, the models displayed enhanced predictive accuracy, making them valuable tools for the risk assessment and management of chemicals.


Subject(s)
Algorithms , Neural Networks, Computer , Animals , Machine Learning , Mammals , Liver
16.
Environ Sci Technol ; 57(44): 16906-16917, 2023 Nov 07.
Article in English | MEDLINE | ID: mdl-37897806

ABSTRACT

In silico models for predicting physicochemical properties and environmental fate parameters are necessary for the sound management of chemicals. This study employed graph attention network (GAT) algorithms to construct such models on 15 end points. The results showed that the GAT models outperformed the previous state-of-the-art models, and their performance was not influenced by the presence or absence of compounds with certain structures. Molecular similarity density (ρs) was found to be a key metric characterizing data set modelability, in addition to the proportion of compounds at activity cliffs. By introducing molecular graph (MG) contrastive learning, MG-based ρs and molecular inconsistency in activities (IA) were calculated and employed for characterizing the structure-activity landscape (SAL)-based applicability domain ADSAL{ρs, IA}. The GAT models coupled with ADSAL{ρs, IA} significantly improved the prediction coefficient of determination (R2) on all the end points by an average of 14.4% and enabled all the end points to have R2 > 0.9, which could hardly be achieved previously. The models were employed to screen persistent, mobile, and/or bioaccumulative chemicals from inventories consisting of about 10^6 chemicals. Given the current state-of-the-art model performance and coverage of the various environmental end points, the constructed models with ADSAL{ρs, IA} may serve as benchmarks for future efforts to improve modeling efficacy.


Subject(s)
Algorithms , Benchmarking , Computer Simulation
17.
Environ Sci Technol ; 57(46): 18236-18245, 2023 Nov 21.
Article in English | MEDLINE | ID: mdl-37749748

ABSTRACT

The application of deep learning (DL) models for screening environmental estrogens (EEs) for the sound management of chemicals has garnered significant attention. However, the currently available DL model for screening EEs lacks both a transparent decision-making process and effective applicability domain (AD) characterization, making the reliability of its prediction results uncertain and limiting its practical applications. To address this issue, a graph neural network (GNN) model was developed to screen EEs, achieving accuracy rates of 88.9% and 92.5% on the internal and external test sets, respectively. The decision-making process of the GNN model was explored through the network-like similarity graphs (NSGs) based on the model features (FT). We discovered that the accuracy of the predictions is dependent on the feature distribution of compounds in NSGs. An AD characterization method called ADFT was proposed, which excludes predictions falling outside of the model's prediction range, leading to a 15% improvement in the F1 score of the GNN model. The GNN model with the AD method may serve as an efficient tool for screening EEs, identifying 800 potential EEs in the Inventory of Existing Chemical Substances of China. Additionally, this study offers new insights into comprehending the decision-making process of DL models.


Subject(s)
Estrogens , Neural Networks, Computer , Reproducibility of Results , China , Uncertainty
18.
Regul Toxicol Pharmacol ; 144: 105486, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37633327

ABSTRACT

The Ames assay is required by regulatory agencies worldwide to assess the mutagenic potential risk of consumer products. In addition to this in vitro assay, in silico approaches have been widely used to predict Ames test results, as outlined in the International Council for Harmonization (ICH) guidelines. Building on this in silico approach, here we describe DeepAmes, a high-performance and robust model developed with a novel deep learning (DL) approach for potential utility in regulatory science. DeepAmes was developed with a large and consistent Ames dataset (>10,000 compounds) and was compared with five other standard Machine Learning (ML) methods. Using a test set of 1,543 compounds, DeepAmes was the best performer in predicting the outcome of the Ames assay. In addition, DeepAmes yielded the best and most stable performance up to when compounds were >30% outside of the applicability domain (AD). Regarding the potential for regulatory application, a revised version of DeepAmes achieved a much-improved sensitivity of 0.87, up from 0.47. In conclusion, DeepAmes provides a DL-powered predictive model for the results of Ames tests; with its defined AD and clear context of use, DeepAmes has potential for utility in regulatory application.


Subject(s)
Deep Learning , Mutagens/toxicity , Mutagenesis , Mutagenicity Tests/methods
19.
Artif Intell Chem ; 1(1), 2023 Jun.
Article in English | MEDLINE | ID: mdl-37583465

ABSTRACT

Neural Network (NN) models have the potential to speed up the drug discovery process and reduce its failure rates. The success of NN models requires uncertainty quantification (UQ) as drug discovery explores chemical space beyond the training data distribution. Standard NN models do not provide uncertainty information. Some methods require changing the NN architecture or training procedure, limiting the selection of NN models. Moreover, predictive uncertainty can come from different sources. It is important to have the ability to separately model different types of predictive uncertainty, as the model can take assorted actions depending on the source of uncertainty. In this paper, we examine UQ methods that estimate different sources of predictive uncertainty for NN models aiming at protein-ligand binding prediction. We use our prior knowledge on chemical compounds to design the experiments. By utilizing a visualization method, we create non-overlapping and chemically diverse partitions from a collection of chemical compounds. These partitions are used as training and test set splits to explore NN model uncertainty. We demonstrate how the uncertainties estimated by the selected methods describe different sources of uncertainty under different partitions and featurization schemes and the relationship to prediction error.

20.
Chemosphere ; 340: 139965, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37633602

ABSTRACT

This work aimed to verify whether it is possible to extend the applicability domain (AD) of existing QSPR (Quantitative Structure-Property Relationship) models by employing a strategy involving additional quantum-chemical calculations. As case studies, we selected two published QSPR models for PFAS: one for water solubility (logSW) and one for vapor pressure (logVP). We aimed to enlarge the set of compounds used to build the models by applying factorial planning to plan the augmentation of this set based on the compounds' structural features (descriptors). Next, we used the COSMO-RS model to calculate the logSW and logVP for selected chemicals. This allowed filling gaps in the experimental data for further training of the QSPR models. We improved the published models by significantly extending the number of compounds for which theoretical predictions are reliable (i.e., extending the AD). Additionally, we performed external validation that had not been carried out for the original models. To test the effectiveness of the AD extension, we screened 4519 PFAS from the NORMAN Database. The number of compounds outside the domain was reduced compared with the original models for both properties. Our work shows that combining physics-based methods with data-driven models can significantly improve the performance of predictions of phys-chem properties relevant for chemical risk assessment.


Subject(s)
Asteraceae , Fluorocarbons , Vapor Pressure , Solubility , Water