
Generalized Read-Across prediction using genra-py.

Shah, Imran; Tate, Tia; Patlewicz, Grace.

Bioinformatics ; 37(19): 3380-3381, 2021 Oct 11.

Article in English | MEDLINE | ID: mdl-33772575

ABSTRACT

MOTIVATION: Generalized Read-Across (GenRA) is a data-driven approach to estimating physico-chemical, biological or eco-toxicological properties of chemicals by inference from analogues. GenRA attempts to mimic a human expert's manual read-across reasoning for filling data gaps about new chemicals from known chemicals, using an interpretable and automated approach based on nearest neighbors. A key objective of GenRA is to systematically explore different choices of input data selection and neighborhood definition in order to objectively evaluate the predictive performance of automated read-across estimates of chemical properties.

RESULTS: We have implemented genra-py as a Python package that can be freely used for chemical safety analysis and risk assessment applications. Automated read-across prediction in genra-py conforms to the estimator design pattern of the scikit-learn machine learning library, making it easy to use and to integrate into computational pipelines. We demonstrate the data-driven application of genra-py to two key human health risk assessment problems: hazard identification and point of departure estimation.

AVAILABILITY AND IMPLEMENTATION: The package is available from github.com/i-shah/genra-py.
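The nearest-neighbor idea described in this abstract can be illustrated with a minimal sketch. It is not the genra-py API: the class name `ReadAcrossSketch`, its methods, and the choice of Jaccard similarity over sets of "on" fingerprint bits are assumptions made here for illustration. The sketch only mirrors the fit/predict shape of an estimator and scores a query chemical by the similarity-weighted average of its nearest neighbors' labels.

```python
from typing import Iterable, List, Set


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Jaccard similarity between two sets of 'on' fingerprint bits."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class ReadAcrossSketch:
    """Hypothetical similarity-weighted k-NN with a fit/predict interface.

    Illustrative only; this is not the genra-py implementation.
    """

    def __init__(self, n_neighbors: int = 3):
        self.n_neighbors = n_neighbors

    def fit(self, X: Iterable[Iterable[int]], y: Iterable[int]) -> "ReadAcrossSketch":
        # Store the known (source) chemicals and their binary outcomes.
        self.X_ = [set(x) for x in X]
        self.y_ = list(y)
        return self

    def score_one(self, x: Iterable[int]) -> float:
        """Similarity-weighted mean outcome of the k most similar analogues."""
        query = set(x)
        pairs = [(jaccard(query, xi), yi) for xi, yi in zip(self.X_, self.y_)]
        nearest = sorted(pairs, key=lambda p: p[0], reverse=True)[: self.n_neighbors]
        total = sum(s for s, _ in nearest)
        return sum(s * yi for s, yi in nearest) / total if total else 0.0

    def predict(self, X: Iterable[Iterable[int]], threshold: float = 0.5) -> List[int]:
        # A chemical is called positive when its weighted score reaches the threshold.
        return [int(self.score_one(x) >= threshold) for x in X]


if __name__ == "__main__":
    model = ReadAcrossSketch(n_neighbors=2).fit(
        [{1, 2, 3}, {1, 2, 4}, {7, 8, 9}, {7, 8, 10}], [1, 1, 0, 0]
    )
    print(model.predict([{1, 2, 5}, {7, 8, 11}]))
```

Because the score is a weighted average of neighbor outcomes, each prediction can be traced back to the specific analogues that produced it, which is the interpretability property the abstract emphasizes.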

A Comparison of Machine Learning Approaches for predicting Hepatotoxicity potential using Chemical Structure and Targeted Transcriptomic Data.

Tate, Tia; Patlewicz, Grace; Shah, Imran.

Comput Toxicol ; 29: 1-14, 2024 Mar.

Article in English | MEDLINE | ID: mdl-38993502

ABSTRACT

Animal toxicity testing is time and resource intensive, making it difficult to keep pace with the number of substances requiring assessment. Machine learning (ML) models that use chemical structure information and high-throughput experimental data can be helpful in predicting potential toxicity. However, much of the toxicity data used to train ML models is biased, with an unequal balance of positives and negatives, primarily because substances selected for in vivo testing are expected to elicit some toxicity effect. To investigate the impact of this bias on predictive performance, various sampling approaches were used to balance in vivo toxicity data as part of a supervised ML workflow to predict hepatotoxicity outcomes from chemical structure and/or targeted transcriptomic data. From the chronic, subchronic, developmental, multigenerational reproductive, and subacute repeat-dose testing toxicity outcomes with a minimum of 50 positive and 50 negative substances, 18 different study-toxicity outcome combinations were evaluated in up to 7 ML models. These included Artificial Neural Networks, Random Forests, Bernoulli Naïve Bayes, Gradient Boosting, and Support Vector classification algorithms, which were compared with a local approach, Generalised Read-Across (GenRA), a similarity-weighted k-Nearest Neighbour (k-NN) method. The mean cross-validation (CV) F1 performance for unbalanced data across all classifiers and descriptors for chronic liver effects was 0.735 (0.0395 SD). Mean CV F1 performance dropped to 0.639 (0.073 SD) with over-sampling approaches, though the poorer performance of k-NN approaches in some cases contributed to the observed decrease (mean CV F1 performance excluding k-NN was 0.697 (0.072 SD)). With under-sampling approaches, the mean CV F1 was 0.523 (0.083 SD). For developmental liver effects, the mean CV F1 performance was much lower, at 0.089 (0.111 SD) for unbalanced approaches and 0.149 (0.084 SD) for under-sampling. Over-sampling approaches led to an increase in mean CV F1 performance (0.234 (0.107 SD)) for developmental liver toxicity. Model performance was found to be dependent on dataset, model type, balancing approach and feature selection. Accordingly, ML workflows tailored to predicting toxicity should consider class imbalance and rely on simpler classifiers first.
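The balancing step and the F1 metric discussed in this abstract can be sketched in a few lines. The abstract does not say which sampling methods were used, so the random over-sampling and random under-sampling below are assumed as the simplest instances of each family, and the function names `resample` and `f1_score` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for the positive class: 2*TP / (2*TP + FP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def resample(X: Sequence, y: Sequence[int], mode: str, seed: int = 0) -> Tuple[List, List[int]]:
    """Balance a binary dataset by random over- or under-sampling (assumed methods)."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    if not minority:
        raise ValueError("both classes must be present")
    if mode == "over":
        # Duplicate randomly chosen minority samples until the classes match.
        extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
        keep = majority + minority + extra
    elif mode == "under":
        # Keep a random subset of the majority class equal in size to the minority.
        keep = minority + rng.sample(majority, len(minority))
    else:
        raise ValueError("mode must be 'over' or 'under'")
    rng.shuffle(keep)
    return [X[i] for i in keep], [y[i] for i in keep]


if __name__ == "__main__":
    X, y = list(range(10)), [1] * 8 + [0] * 2
    for mode in ("over", "under"):
        _, yb = resample(X, y, mode)
        print(mode, len(yb), sum(yb))
```

In a cross-validated workflow, resampling of this kind would normally be applied to the training folds only, so that duplicated samples do not leak into the held-out fold used to compute F1.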

Repeat-dose toxicity prediction with Generalized Read-Across (GenRA) using targeted transcriptomic data: A proof-of-concept case study.

Tate, Tia; Wambaugh, John; Patlewicz, Grace; Shah, Imran.

Comput Toxicol ; 19: 1-12, 2021 Aug 01.

Article in English | MEDLINE | ID: mdl-37309449

ABSTRACT

Read-across is a data gap filling technique utilized to predict the toxicity of a target chemical using data from similar analogues. Recent efforts such as Generalized Read-Across (GenRA) facilitate automated read-across predictions for untested chemicals. GenRA makes predictions of toxicity outcomes based on "neighboring" chemicals characterized by chemical and bioactivity fingerprints. Here we investigated the impact of biological similarities on neighborhood formation and read-across performance in predicting hazard (based on repeat-dose testing outcomes from US EPA ToxRefDB v2.0). We used targeted transcriptomic data on 93 genes for 1060 chemicals in HepaRG™ cells that measure nuclear receptor activation, xenobiotic metabolism, cellular stress, cell cycle progression, and apoptosis. Transcriptomic similarity between chemicals was calculated using binary hit-calls from concentration-response data for each gene. We evaluated GenRA performance in predicting ToxRefDB v2.0 hazard outcomes using the area under the Receiver Operating Characteristic (ROC) curve (AUC) for the baseline approach (chemical fingerprints) versus transcriptomic fingerprints and a combination of both (hybrid). For all endpoints, there were significant but only modest improvements in ROC AUC scores of 0.01 (2.1%) and 0.04 (7.3%) with transcriptomic and hybrid descriptors, respectively. However, for liver-specific toxicity endpoints, ROC AUC scores improved by 10% and 17% for transcriptomic and hybrid descriptors, respectively. Our findings suggest that using hybrid descriptors formed by combining chemical and targeted transcriptomic information can improve in vivo toxicity predictions in the right context.
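As a rough illustration of the evaluation described in this abstract, the sketch below builds a hybrid descriptor and computes ROC AUC from prediction scores. How the study combined chemical and transcriptomic fingerprints is not stated in the abstract, so simple concatenation of the two bit sets is an assumption here, and the names `hybrid_fingerprint` and `roc_auc` are hypothetical.

```python
from typing import Iterable, Sequence, Set


def hybrid_fingerprint(chem_bits: Iterable[int], gene_hits: Iterable[int], n_chem_bits: int) -> Set[int]:
    """Concatenate two binary fingerprints, stored as sets of 'on' indices.

    Gene hit-call indices are shifted past the chemical bits so the two
    index ranges do not collide (assumed combination scheme).
    """
    return set(chem_bits) | {n_chem_bits + g for g in gene_hits}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a positive outranks a negative.

    Ties between a positive and a negative score count as one half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Chemical bits 0 and 2 are on (out of 4); gene 1 is a hit.
    print(sorted(hybrid_fingerprint({0, 2}, {1}, n_chem_bits=4)))
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

The pairwise form of AUC used here is quadratic in the number of chemicals, which is adequate for a dataset of about a thousand substances; a rank-based formulation would scale better.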
