ABSTRACT
In our earlier work (Golden et al., 2021), we showed 70-80% accuracies for several skin sensitization computational tools using human data. Here, we expanded the data set using the NICEATM human skin sensitization database to create a final data set of 1355 discrete chemicals (largely negative, ~70%). Using this expanded data set, we analyzed model performance and evaluated mispredictions using Toxtree (v 3.1.0), OECD QSAR Toolbox (v 4.5), the CAESAR model (v 2.1.7) in VEGA (v 1.2.0 BETA), and a k-nearest-neighbor (kNN) classification approach. We show that the accuracy on this data set was lower than previous estimates, with balanced accuracies of 63% and 65% for Toxtree and OECD QSAR Toolbox, respectively, 46% for VEGA, and 59% for the kNN approach; the lower accuracy is likely due to the higher percentage of nonsensitizing chemicals. Two hundred eighty-seven chemicals were mispredicted by both Toxtree and OECD QSAR Toolbox, which was approximately 20% of the entire data set, and 84% of these were false positives. The absence or presence of metabolic simulation in OECD QSAR Toolbox made no overall difference. While Toxtree is known for overpredicting, 60% of the chemicals in the data set had no alert for skin sensitization, and a substantial number of these chemicals were in fact sensitizers, pointing to sensitization mechanisms not recognized by Toxtree. Interestingly, we observed that chemicals with more than one Toxtree alert were more likely to be nonsensitizers. Finally, the kNN approach tended to mispredict different chemicals than either OECD QSAR Toolbox or Toxtree, suggesting that there was additional information to be garnered from a kNN approach. Overall, the results demonstrate that while there is merit in structural alerts as well as QSAR or read-across approaches (perhaps even more so in their combination), additional improvement will require a more nuanced understanding of mechanisms of skin sensitization.
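Because the data set is roughly 70% nonsensitizers, balanced accuracy (the mean of sensitivity and specificity) can differ noticeably from plain accuracy. The following is a minimal sketch of that distinction, not the authors' evaluation code; the function names and the confusion-matrix counts are hypothetical and chosen only to mimic a 30/70 class split, and they are not results from the study.

```python
def accuracy(tp, fn, tn, fp):
    """Fraction of all chemicals that were classified correctly."""
    return (tp + tn) / (tp + fn + tn + fp)


def balanced_accuracy(tp, fn, tn, fp):
    """Mean of sensitivity (recall on sensitizers) and
    specificity (recall on nonsensitizers)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Hypothetical counts (NOT from the study): 30 sensitizers, 70 nonsensitizers.
# Model A finds half of the sensitizers and 90% of the nonsensitizers.
model_a = dict(tp=15, fn=15, tn=63, fp=7)
# Model B labels every chemical as a nonsensitizer.
model_b = dict(tp=0, fn=30, tn=70, fp=0)

for name, counts in (("A", model_a), ("B", model_b)):
    print(name, accuracy(**counts), balanced_accuracy(**counts))
```

In this toy setting, the always-negative model B still reaches a plain accuracy equal to the majority-class fraction, while its balanced accuracy falls to chance level, which is why balanced accuracy is the more informative figure on a largely negative data set.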