Pesquisa | Portal Regional da BVS

1.

Interpreting Neural Network Models for Toxicity Prediction by Extracting Learned Chemical Features.

Walter, Moritz; Webb, Samuel J; Gillet, Valerie J.

J Chem Inf Model ; 64(9): 3670-3688, 2024 May 13.

Artigo em Inglês | MEDLINE | ID: mdl-38686880

RESUMO

Neural network models have become a popular machine-learning technique for the toxicity prediction of chemicals. However, due to their complex structure, it is difficult to understand predictions made by these models which limits confidence. Current techniques to tackle this problem such as SHAP or integrated gradients provide insights by attributing importance to the input features of individual compounds. While these methods have produced promising results in some cases, they do not shed light on how representations of compounds are transformed in hidden layers, which constitute how neural networks learn. We present a novel technique to interpret neural networks which identifies chemical substructures in training data found to be responsible for the activation of hidden neurons. For individual test compounds, the importance of hidden neurons is determined, and the associated substructures are leveraged to explain the model prediction. Using structural alerts for mutagenicity from the Derek Nexus expert system as ground truth, we demonstrate the validity of the approach and show that model explanations are competitive with and complementary to explanations obtained from an established feature attribution method.

Assuntos

Redes Neurais de Computação , Aprendizado de Máquina

2.

Analysis of the benefits of imputation models over traditional QSAR models for toxicity prediction.

Walter, Moritz; Allen, Luke N; de la Vega de León, Antonio; Webb, Samuel J; Gillet, Valerie J.

J Cheminform ; 14(1): 32, 2022 Jun 07.

Artigo em Inglês | MEDLINE | ID: mdl-35672779

RESUMO

Recently, imputation techniques have been adapted to predict activity values among sparse bioactivity matrices, showing improvements in predictive performance over traditional QSAR models. These models are able to use experimental activity values for auxiliary assays when predicting the activity of a test compound on a specific assay. In this study, we tested three different multi-task imputation techniques on three classification-based toxicity datasets: two of small scale (12 assays each) and one large scale with 417 assays. Moreover, we analyzed in detail the improvements shown by the imputation models. We found that test compounds that were dissimilar to training compounds, as well as test compounds with a large number of experimental values for other assays, showed the largest improvements. We also investigated the impact of sparsity on the improvements seen as well as the relatedness of the assays being considered. Our results show that even a small amount of additional information can provide imputation methods with a strong boost in predictive performance over traditional single task and multi-task predictive models.

3.

Self organising hypothesis networks: a new approach for representing and structuring SAR knowledge.

Hanser, Thierry; Barber, Chris; Rosser, Edward; Vessey, Jonathan D; Webb, Samuel J; Werner, Stéphane.

J Cheminform ; 6: 21, 2014.

Artigo em Inglês | MEDLINE | ID: mdl-24959206

RESUMO

BACKGROUND: Combining different sources of knowledge to build improved structure activity relationship models is not easy owing to the variety of knowledge formats and the absence of a common framework to interoperate between learning techniques. Most of the current approaches address this problem by using consensus models that operate at the prediction level. We explore the possibility to directly combine these sources at the knowledge level, with the aim to harvest potentially increased synergy at an earlier stage. Our goal is to design a general methodology to facilitate knowledge discovery and produce accurate and interpretable models. RESULTS: To combine models at the knowledge level, we propose to decouple the learning phase from the knowledge application phase using a pivot representation (lingua franca) based on the concept of hypothesis. A hypothesis is a simple and interpretable knowledge unit. Regardless of its origin, knowledge is broken down into a collection of hypotheses. These hypotheses are subsequently organised into hierarchical network. This unification permits to combine different sources of knowledge into a common formalised framework. The approach allows us to create a synergistic system between different forms of knowledge and new algorithms can be applied to leverage this unified model. This first article focuses on the general principle of the Self Organising Hypothesis Network (SOHN) approach in the context of binary classification problems along with an illustrative application to the prediction of mutagenicity. CONCLUSION: It is possible to represent knowledge in the unified form of a hypothesis network allowing interpretable predictions with performances comparable to mainstream machine learning techniques. This new approach offers the potential to combine knowledge from different sources into a common framework in which high level reasoning and meta-learning can be applied; these latter perspectives will be explored in future work.

4.

Emerging pattern mining to aid toxicological knowledge discovery.

Sherhod, Richard; Judson, Philip N; Hanser, Thierry; Vessey, Jonathan D; Webb, Samuel J; Gillet, Valerie J.

J Chem Inf Model ; 54(7): 1864-79, 2014 Jul 28.

Artigo em Inglês | MEDLINE | ID: mdl-24873983

RESUMO

Knowledge-based systems for toxicity prediction are typically based on rules, known as structural alerts, that describe relationships between structural features and different toxic effects. The identification of structural features associated with toxicological activity can be a time-consuming process and often requires significant input from domain experts. Here, we describe an emerging pattern mining method for the automated identification of activating structural features in toxicity data sets that is designed to help expedite the process of alert development. We apply the contrast pattern tree mining algorithm to generate a set of emerging patterns of structural fragment descriptors. Using the emerging patterns it is possible to form hierarchical clusters of compounds that are defined by the presence of common structural features and represent distinct chemical classes. The method has been tested on a large public in vitro mutagenicity data set and a public hERG channel inhibition data set and is shown to be effective at identifying common toxic features and recognizable classes of toxicants. We also describe how knowledge developers can use emerging patterns to improve the specificity and sensitivity of an existing expert system.

Assuntos

Mineração de Dados/métodos , Toxicologia , Algoritmos , Determinação de Ponto Final , Canais de Potássio Éter-A-Go-Go/antagonistas & inibidores , Testes de Mutagenicidade , Bloqueadores dos Canais de Potássio/toxicidade

5.

Feature combination networks for the interpretation of statistical machine learning models: application to Ames mutagenicity.

Webb, Samuel J; Hanser, Thierry; Howlin, Brendan; Krause, Paul; Vessey, Jonathan D.

J Cheminform ; 6(1): 8, 2014 Mar 25.

Artigo em Inglês | MEDLINE | ID: mdl-24661325

RESUMO

BACKGROUND: A new algorithm has been developed to enable the interpretation of black box models. The developed algorithm is agnostic to learning algorithm and open to all structural based descriptors such as fragments, keys and hashed fingerprints. The algorithm has provided meaningful interpretation of Ames mutagenicity predictions from both random forest and support vector machine models built on a variety of structural fingerprints.A fragmentation algorithm is utilised to investigate the model's behaviour on specific substructures present in the query. An output is formulated summarising causes of activation and deactivation. The algorithm is able to identify multiple causes of activation or deactivation in addition to identifying localised deactivations where the prediction for the query is active overall. No loss in performance is seen as there is no change in the prediction; the interpretation is produced directly on the model's behaviour for the specific query. RESULTS: Models have been built using multiple learning algorithms including support vector machine and random forest. The models were built on public Ames mutagenicity data and a variety of fingerprint descriptors were used. These models produced a good performance in both internal and external validation with accuracies around 82%. The models were used to evaluate the interpretation algorithm. Interpretation was revealed that links closely with understood mechanisms for Ames mutagenicity. CONCLUSION: This methodology allows for a greater utilisation of the predictions made by black box models and can expedite further study based on the output for a (quantitative) structure activity model. Additionally the algorithm could be utilised for chemical dataset investigation and knowledge extraction/human SAR development.

RESUMO

Assuntos

RESUMO

RESUMO

RESUMO

Assuntos

RESUMO

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA