Results 1 - 11 of 11
1.
Chem Res Toxicol ; 34(2): 365-384, 2021 02 15.
Article in English | MEDLINE | ID: mdl-33351593

ABSTRACT

Adverse drug reactions (ADRs) are undesired effects of medicines that can harm patients and are a significant source of attrition in drug development. ADRs are anticipated by routinely screening drugs against secondary pharmacology protein panels. However, there is still a lack of quantitative information on the links between these off-target proteins and the reporting of ADRs in humans. Here, we present a systematic analysis of associations between measured and predicted in vitro bioactivities of drugs and adverse events (AEs) in humans from two sources of data: the Side Effect Resource, derived from clinical trials, and the Food and Drug Administration Adverse Event Reporting System, derived from postmarketing surveillance. The ratio of a drug's therapeutic unbound plasma concentration over the drug's in vitro potency against a given protein was used to select proteins most likely to be relevant to in vivo effects. In examining individual target bioactivities as predictors of AEs, we found a trade-off between the positive predictive value and the fraction of drugs with AEs that can be detected. However, considering sets of multiple targets for the same AE can help identify a greater fraction of AE-associated drugs. Of the 45 targets with statistically significant associations to AEs, 30 are included on existing safety target panels. The remaining 15 targets include 9 carbonic anhydrases, of which CA5B is significantly associated with cholestatic jaundice. We include the full quantitative data on associations between measured and predicted in vitro bioactivities and AEs in humans in this work, which can be used to make a more informed selection of safety profiling targets.
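The selection criterion and the trade-off described above can be illustrated with a small sketch. It assumes that each drug has one unbound therapeutic plasma concentration and one in vitro potency value (e.g. an IC50) against the target under study, both in the same molar unit. The `DrugRecord` type, the toy drugs and the cutoffs are hypothetical illustrations, not data or thresholds from the study.

```python
from dataclasses import dataclass

@dataclass
class DrugRecord:
    # Hypothetical record: unbound therapeutic plasma concentration and in vitro
    # potency (e.g. IC50) against one target, both in the same molar unit.
    name: str
    cu_plasma: float
    potency: float
    has_ae: bool  # True if the drug is reported with the adverse event of interest

def exposure_ratio(drug: DrugRecord) -> float:
    """Unbound plasma concentration over in vitro potency; larger values suggest in vivo target engagement."""
    return drug.cu_plasma / drug.potency

def ppv_and_detected_fraction(drugs, cutoff):
    """Flag drugs whose ratio >= cutoff; return (PPV, fraction of AE-associated drugs flagged)."""
    flagged = [d for d in drugs if exposure_ratio(d) >= cutoff]
    ae_drugs = [d for d in drugs if d.has_ae]
    true_pos = sum(1 for d in flagged if d.has_ae)
    ppv = true_pos / len(flagged) if flagged else float("nan")
    detected = true_pos / len(ae_drugs) if ae_drugs else float("nan")
    return ppv, detected

# Toy data (hypothetical): ratios are 2.0, 0.5, 0.25, 0.05 and 0.2.
drugs = [
    DrugRecord("a", 1.0, 0.5, True),
    DrugRecord("b", 1.0, 2.0, True),
    DrugRecord("c", 1.0, 4.0, False),
    DrugRecord("d", 1.0, 20.0, True),
    DrugRecord("e", 1.0, 5.0, False),
]

for cutoff in (0.1, 1.0):
    ppv, detected = ppv_and_detected_fraction(drugs, cutoff)
    print(f"cutoff={cutoff}: PPV={ppv:.2f}, detected={detected:.2f}")
# cutoff=0.1: PPV=0.50, detected=0.67
# cutoff=1.0: PPV=1.00, detected=0.33
```

On this toy set, raising the cutoff raises the PPV but lowers the fraction of AE-associated drugs that are detected, which is the single-target trade-off the abstract reports.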


Subject(s)
Pharmaceutical Preparations/chemistry , Proteins/analysis , Clinical Trials as Topic , Humans , Molecular Structure , Pharmaceutical Preparations/blood , Proteins/antagonists & inhibitors , United States , United States Food and Drug Administration
2.
Mutagenesis ; 34(1): 3-16, 2019 03 06.
Article in English | MEDLINE | ID: mdl-30357358

ABSTRACT

The International Conference on Harmonization (ICH) M7 guideline allows the use of in silico approaches for predicting Ames mutagenicity for the initial assessment of impurities in pharmaceuticals. This is the first international guideline that addresses the use of quantitative structure-activity relationship (QSAR) models in lieu of actual toxicological studies for human health assessment. Therefore, QSAR models for Ames mutagenicity now require higher predictive power for identifying mutagenic chemicals. To increase the predictive power of QSAR models, larger experimental datasets from reliable sources are required. The Division of Genetics and Mutagenesis, National Institute of Health Sciences (DGM/NIHS) of Japan recently established a unique proprietary Ames mutagenicity database containing 12140 new chemicals that have not been previously used for developing QSAR models. The DGM/NIHS provided this Ames database to QSAR vendors to validate and improve their QSAR tools. The Ames/QSAR International Challenge Project was initiated in 2014 with 12 QSAR vendors testing 17 QSAR tools against these compounds in three phases. We now present the final results. All tools were considerably improved by participation in this project. Most tools achieved >50% sensitivity (positive prediction among all Ames positives) and predictive power (accuracy) was as high as 80%, almost equivalent to the inter-laboratory reproducibility of Ames tests. To further increase the predictive power of QSAR tools, accumulation of additional Ames test data is required as well as re-evaluation of some previous Ames test results. Indeed, some Ames-positive or Ames-negative chemicals may have previously been incorrectly classified because of methodological weakness, resulting in false-positive or false-negative predictions by QSAR tools. These incorrect data hamper prediction and are a source of noise in the development of QSAR models. 
It is thus essential to establish a large benchmark database consisting only of well-validated Ames test results to build more accurate QSAR models.
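The two performance figures quoted above both come from a binary confusion matrix. Sensitivity is the fraction of Ames-positive chemicals that the tool predicts as positive, and accuracy is the fraction of all chemicals predicted correctly. A minimal sketch with hypothetical toy labels (1 = Ames positive):

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FN, TN, FP) for binary labels where 1 = Ames positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp, fn, tn, fp

def sensitivity(y_true, y_pred):
    """Positive predictions among all experimentally Ames-positive chemicals."""
    tp, fn, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn)

def accuracy(y_true, y_pred):
    """Correct predictions among all chemicals."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fn + tn + fp)

# Hypothetical experimental Ames calls and QSAR predictions for ten chemicals.
ames_results = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
qsar_calls = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
print(sensitivity(ames_results, qsar_calls))  # 0.75
print(accuracy(ames_results, qsar_calls))     # 0.8
```

Because accuracy pools both classes, its value shifts with the proportion of Ames positives in the test set, whereas sensitivity only considers the positives.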


Subject(s)
Mutagenesis/drug effects , Mutagens/toxicity , Quantitative Structure-Activity Relationship , Computer Simulation , Databases, Factual , Humans , Japan , Mutagenicity Tests
3.
Regul Toxicol Pharmacol ; 76: 7-20, 2016 Apr.
Article in English | MEDLINE | ID: mdl-26708083

ABSTRACT

The relative wealth of bacterial mutagenicity data available in the public literature means that in silico quantitative/qualitative structure activity relationship (QSAR) systems can readily be built for this endpoint. A good means of evaluating the performance of such systems is to use private unpublished data sets, which generally represent a more distinct chemical space than publicly available test sets and, as a result, provide a greater challenge to the model. However, raw performance metrics should not be the only factor considered when judging this type of software since expert interpretation of the results obtained may allow for further improvements in predictivity. Enough information should be provided by a QSAR to allow the user to make general, scientifically-based arguments in order to assess and overrule predictions when necessary. With all this in mind, we sought to validate the performance of the statistics-based in vitro bacterial mutagenicity prediction system Sarah Nexus (version 1.1) against private test data sets supplied by nine different pharmaceutical companies. The results of these evaluations were then analysed in order to identify findings presented by the model which would be useful for the user to take into consideration when interpreting the results and making their final decision about the mutagenic potential of a given compound.


Subject(s)
Models, Statistical , Mutagenesis , Mutagenicity Tests/statistics & numerical data , Mutation , Quantitative Structure-Activity Relationship , Algorithms , Animals , DNA, Bacterial/drug effects , DNA, Bacterial/genetics , Databases, Factual , Decision Support Techniques , Humans , Reproducibility of Results , Risk Assessment , Software
4.
J Chem Inf Model ; 54(7): 1864-79, 2014 Jul 28.
Article in English | MEDLINE | ID: mdl-24873983

ABSTRACT

Knowledge-based systems for toxicity prediction are typically based on rules, known as structural alerts, that describe relationships between structural features and different toxic effects. The identification of structural features associated with toxicological activity can be a time-consuming process and often requires significant input from domain experts. Here, we describe an emerging pattern mining method for the automated identification of activating structural features in toxicity data sets that is designed to help expedite the process of alert development. We apply the contrast pattern tree mining algorithm to generate a set of emerging patterns of structural fragment descriptors. Using the emerging patterns it is possible to form hierarchical clusters of compounds that are defined by the presence of common structural features and represent distinct chemical classes. The method has been tested on a large public in vitro mutagenicity data set and a public hERG channel inhibition data set and is shown to be effective at identifying common toxic features and recognizable classes of toxicants. We also describe how knowledge developers can use emerging patterns to improve the specificity and sensitivity of an existing expert system.
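An emerging pattern is a set of features whose support increases markedly from one class to another; the increase is measured by the growth rate (here, support among toxic compounds divided by support among non-toxic compounds). The sketch below finds such patterns by brute-force enumeration of small fragment sets. It is not the contrast pattern tree mining algorithm used in the paper, and the fragment names, support threshold and growth-rate threshold are hypothetical.

```python
from itertools import combinations

def support(itemset, transactions):
    """Fraction of compounds (fragment sets) that contain every fragment in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def emerging_patterns(pos, neg, min_support=0.3, min_growth=2.0, max_size=2):
    """Brute-force search for fragment sets with high growth rate from neg to pos."""
    items = sorted(set().union(*pos, *neg))
    results = []
    for k in range(1, max_size + 1):
        for combo in combinations(items, k):
            pattern = frozenset(combo)
            sp = support(pattern, pos)
            if sp < min_support:
                continue  # too rare among toxic compounds to be a useful alert
            sn = support(pattern, neg)
            growth = float("inf") if sn == 0 else sp / sn
            if growth >= min_growth:
                results.append((pattern, sp, sn, growth))
    return results

# Hypothetical fragment descriptors for toxic and non-toxic compounds.
toxic = [{"nitro", "aromatic_ring"}, {"nitro", "aromatic_ring", "amine"},
         {"epoxide"}, {"nitro", "aromatic_ring"}]
nontoxic = [{"aromatic_ring"}, {"aromatic_ring", "amine"},
            {"amine"}, {"carboxylic_acid"}]

for pattern, sp, sn, growth in emerging_patterns(toxic, nontoxic):
    print(sorted(pattern), sp, sn, growth)
```

With these toy sets, `nitro` and `{aromatic_ring, nitro}` occur only in toxic compounds (infinite growth rate), while `aromatic_ring` on its own is rejected because its growth rate is only 1.5. A dedicated miner such as a contrast pattern tree avoids this exhaustive enumeration on realistic descriptor sets.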


Subject(s)
Data Mining/methods , Toxicology , Algorithms , Endpoint Determination , Ether-A-Go-Go Potassium Channels/antagonists & inhibitors , Mutagenicity Tests , Potassium Channel Blockers/toxicity
5.
J Cheminform ; 16(1): 43, 2024 Apr 15.
Article in English | MEDLINE | ID: mdl-38622648

ABSTRACT

Multiple metrics are used when assessing and validating the performance of quantitative structure-activity relationship (QSAR) models. In the case of binary classification, balanced accuracy is a metric to assess the global performance of such models. In contrast to accuracy, balanced accuracy does not depend on the respective prevalence of the two categories in the test set that is used to validate a QSAR classifier. As such, balanced accuracy is used to overcome the effect of imbalanced test sets on the model's perceived accuracy. Matthews' correlation coefficient (MCC), an alternative global performance metric, is also known to mitigate the imbalance of the test set. However, in contrast to the balanced accuracy, MCC remains dependent on the respective prevalence of the predicted categories. For simplicity, the rest of this work is based on the positive prevalence. The MCC value may be underestimated at high or extremely low positive prevalence. This dependence makes comparisons between experiments using test sets with different positive prevalences more challenging and may lead to incorrect interpretations. The concept of balanced metrics beyond balanced accuracy is, to the best of our knowledge, not yet described in the cheminformatic literature. Therefore, after describing the relevant literature, this manuscript will first formally define a confusion matrix, sensitivity and specificity and then present, with synthetic data, the danger of comparing performance metrics under nonconstant prevalence. Second, it will demonstrate that balanced accuracy is the performance metric accuracy calibrated to a test set with a positive prevalence of 50% (i.e., a balanced test set). This concept of balanced accuracy will then be extended to the MCC after showing its dependency on the positive prevalence. Applying the same concept to any other performance metric and widening it to the concept of calibrated metrics will then be briefly discussed.
We will show that, like balanced accuracy, any balanced performance metric may be expressed as a function of the well-known values of sensitivity and specificity. Finally, a tale of two MCCs will exemplify the use of this concept of balanced MCC versus MCC with four use cases using synthetic data. SCIENTIFIC CONTRIBUTION: This work provides a formal, unified framework for understanding prevalence dependence in model validation metrics, deriving balanced metric expressions beyond balanced accuracy, and demonstrating their practical utility for common use cases. In contrast to prior literature, it introduces the derived confusion matrix to express metrics as functions of sensitivity, specificity and prevalence without needing additional coefficients. The manuscript extends the concept of balanced metrics to Matthews' correlation coefficient and other widely used performance indicators, enabling robust comparisons under prevalence shifts.
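The derived confusion matrix can be written as fractions of the test set: TP = p·Se, FN = p·(1 - Se), TN = (1 - p)·Sp and FP = (1 - p)·(1 - Sp), where p is the positive prevalence, Se the sensitivity and Sp the specificity. Substituting these into the usual MCC formula and fixing p = 0.5 gives a balanced MCC that depends only on Se and Sp. The closed form in this sketch is our own substitution under that assumption, not a formula quoted from the paper.

```python
import math

def derived_confusion(se, sp, prev):
    """Confusion-matrix fractions (TP, FN, TN, FP) from sensitivity, specificity and positive prevalence."""
    tp = prev * se
    fn = prev * (1 - se)
    tn = (1 - prev) * sp
    fp = (1 - prev) * (1 - sp)
    return tp, fn, tn, fp

def mcc(tp, fn, tn, fp):
    """Matthews' correlation coefficient; returns 0.0 when a marginal total is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom > 0 else 0.0

def mcc_at_prevalence(se, sp, prev):
    """MCC of a classifier with fixed Se and Sp on a test set with the given positive prevalence."""
    return mcc(*derived_confusion(se, sp, prev))

def balanced_accuracy(se, sp):
    # Accuracy (TP + TN) evaluated at prev = 0.5.
    return (se + sp) / 2

def balanced_mcc(se, sp):
    # MCC at prev = 0.5, simplified to (Se + Sp - 1) / sqrt((1 + Se - Sp) * (1 - Se + Sp)).
    denom = math.sqrt((1 + se - sp) * (1 - se + sp))
    return (se + sp - 1) / denom if denom > 0 else 0.0

se, sp = 0.8, 0.9
print(round(balanced_mcc(se, sp), 2))                # 0.7
print(round(mcc_at_prevalence(se, sp, 0.1), 2))      # 0.56
```

For Se = 0.8 and Sp = 0.9, the balanced MCC is about 0.70, whereas the same classifier scored on a test set with 10% positives reaches an MCC of only about 0.56, even though its sensitivity and specificity are unchanged.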

6.
Curr Opin Struct Biol ; 79: 102545, 2023 04.
Article in English | MEDLINE | ID: mdl-36804704

ABSTRACT

Federated Learning enables machine learning across multiple sources of data and alleviates the risk of leaking private information between partners, thereby encouraging knowledge sharing and collaborative modelling. Hence, Federated Learning opens the way to a new generation of improved models. Domains involving molecular informatics, like Drug Discovery, are progressively adopting Federated Learning; this review describes the main projects and applications of Federated Learning for molecular discovery with a special focus on their benefits and the remaining challenges. All the studies demonstrate a real benefit of Federated Learning, namely improved model performance and an extended applicability domain thanks to knowledge aggregation. The selected publications also reveal several remaining challenges to be addressed to fully exploit Federated Learning.
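To make the idea of learning across partners without pooling their data concrete, here is a sketch of federated averaging (FedAvg), one widely used aggregation scheme; the review is not limited to it. Each hypothetical client fits a one-parameter linear model y ≈ w·x on its own data, and only the fitted weight is returned to be averaged, weighted by the client's sample count. The model, data, learning rate and round counts are illustrative assumptions.

```python
def local_train(w, data, lr=0.05, epochs=10):
    """Gradient descent on the mean squared error of y ~ w * x, using only this client's data."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def fed_avg(client_weights, client_sizes):
    """Average the clients' weights, weighted by how many samples each client holds."""
    total = sum(client_sizes)
    return sum(w * n for w, n in zip(client_weights, client_sizes)) / total

def federated_training(clients, rounds=20):
    w = 0.0  # shared global model
    for _ in range(rounds):
        # Each client starts from the global weight and trains privately.
        local_weights = [local_train(w, data) for data in clients]
        # Only weights and sample counts are shared, never the (x, y) pairs.
        w = fed_avg(local_weights, [len(data) for data in clients])
    return w

# Two hypothetical partners whose private data both follow y = 2x.
clients = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(0.5, 1.0), (1.0, 2.0), (1.5, 3.0)],
]
w_global = federated_training(clients)
print(round(w_global, 3))  # 2.0
```

In this sketch the raw (x, y) pairs never leave `local_train`; each partner discloses only its updated weight and sample count. Real deployments add secure aggregation and richer models, which is where many of the remaining challenges mentioned above arise.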


Subject(s)
Drug Discovery , Machine Learning
7.
J Cheminform ; 15(1): 47, 2023 Apr 17.
Article in English | MEDLINE | ID: mdl-37069675

ABSTRACT

INTRODUCTION AND METHODOLOGY: Pairs of similar compounds that only differ by a small structural modification but exhibit a large difference in their binding affinity for a given target are known as activity cliffs (ACs). It has been hypothesised that QSAR models struggle to predict ACs and that ACs thus form a major source of prediction error. However, the AC-prediction power of modern QSAR methods and its quantitative relationship to general QSAR-prediction performance is still underexplored. We systematically construct nine distinct QSAR models by combining three molecular representation methods (extended-connectivity fingerprints, physicochemical-descriptor vectors and graph isomorphism networks) with three regression techniques (random forests, k-nearest neighbours and multilayer perceptrons); we then use each resulting model to classify pairs of similar compounds as ACs or non-ACs and to predict the activities of individual molecules in three case studies: dopamine receptor D2, factor Xa, and SARS-CoV-2 main protease. RESULTS AND CONCLUSIONS: Our results provide strong support for the hypothesis that indeed QSAR models frequently fail to predict ACs. We observe low AC-sensitivity amongst the evaluated models when the activities of both compounds are unknown, but a substantial increase in AC-sensitivity when the actual activity of one of the compounds is given. Graph isomorphism features are found to be competitive with or superior to classical molecular representations for AC-classification and can thus be employed as baseline AC-prediction models or simple compound-optimisation tools. For general QSAR-prediction, however, extended-connectivity fingerprints still consistently deliver the best performance amongst the tested input representations. A potential future pathway to improve QSAR-modelling performance might be the development of techniques to increase AC-sensitivity.
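The AC classification described above can be sketched as follows: a pair counts as an AC when the two compounds are structurally similar and their activities differ by a large amount, and AC-sensitivity is the fraction of true ACs that a model's predicted activities also flag as ACs. This sketch uses Tanimoto similarity on fingerprint bit sets with hypothetical thresholds (similarity >= 0.7, activity difference >= 2 log units); the study's own pair definition and thresholds may differ, and the compounds and activity values are toy data.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bits (0.0 if both are empty)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def is_activity_cliff(fp1, fp2, act1, act2, sim_threshold=0.7, act_threshold=2.0):
    """Similar structures (hypothetical threshold) with a large activity gap (log units)."""
    return tanimoto(fp1, fp2) >= sim_threshold and abs(act1 - act2) >= act_threshold

def ac_sensitivity(pairs, fps, true_act, pred_act):
    """Fraction of experimentally defined ACs that the predicted activities also classify as ACs."""
    true_acs = [(i, j) for i, j in pairs
                if is_activity_cliff(fps[i], fps[j], true_act[i], true_act[j])]
    if not true_acs:
        return float("nan")
    hits = sum(1 for i, j in true_acs
               if is_activity_cliff(fps[i], fps[j], pred_act[i], pred_act[j]))
    return hits / len(true_acs)

# Toy fingerprints (bit indices) and activities (e.g. pKi values).
fps = {
    "A": set(range(1, 10)),
    "B": set(range(1, 9)) | {10},
    "C": set(range(1, 10)) | {11},
}
true_act = {"A": 5.0, "B": 7.5, "C": 5.2}
pred_act = {"A": 5.0, "B": 7.2, "C": 5.3}
pairs = [("A", "B"), ("A", "C"), ("B", "C")]
print(ac_sensitivity(pairs, fps, true_act, pred_act))  # 0.5
```

Here the true ACs are (A, B) and (B, C); the predicted activities reproduce only the first cliff, giving an AC-sensitivity of 0.5 even though every individual prediction is within 0.3 log units of the measured value.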

9.
J Cheminform ; 11(1): 9, 2019 Feb 02.
Article in English | MEDLINE | ID: mdl-30712151

ABSTRACT

In this paper, we explore the impact of combining different in silico prediction approaches and data sources on the predictive performance of the resulting system. We use inhibition of the hERG ion channel target as the endpoint for this study as it constitutes a key safety concern in drug development and a potential cause of attrition. We will show that combining data sources can improve the relevance of the training set with regard to the target chemical space, leading to improved performance. Similarly, we will demonstrate that combining multiple statistical models together, and with expert systems, can lead to positive synergistic effects when taking into account the confidence in the predictions of the merged systems. The best combinations analyzed display good hERG predictivity. Finally, this work demonstrates the suitability of the Self Organising Hypothesis Network (SOHN) methodology for building models in the context of receptor-based endpoints like hERG inhibition when using the appropriate pharmacophoric descriptors.

10.
J Cheminform ; 6(1): 8, 2014 Mar 25.
Article in English | MEDLINE | ID: mdl-24661325

ABSTRACT

BACKGROUND: A new algorithm has been developed to enable the interpretation of black box models. The developed algorithm is agnostic to the learning algorithm and open to all structure-based descriptors such as fragments, keys and hashed fingerprints. The algorithm has provided meaningful interpretation of Ames mutagenicity predictions from both random forest and support vector machine models built on a variety of structural fingerprints. A fragmentation algorithm is utilised to investigate the model's behaviour on specific substructures present in the query. An output is formulated summarising causes of activation and deactivation. The algorithm is able to identify multiple causes of activation or deactivation in addition to identifying localised deactivations where the prediction for the query is active overall. No loss in performance is seen as there is no change in the prediction; the interpretation is produced directly on the model's behaviour for the specific query. RESULTS: Models have been built using multiple learning algorithms including support vector machine and random forest. The models were built on public Ames mutagenicity data and a variety of fingerprint descriptors were used. These models performed well in both internal and external validation, with accuracies around 82%. The models were used to evaluate the interpretation algorithm. The resulting interpretations link closely with understood mechanisms of Ames mutagenicity. CONCLUSION: This methodology allows for a greater utilisation of the predictions made by black box models and can expedite further study based on the output for a (quantitative) structure activity model. Additionally, the algorithm could be utilised for chemical dataset investigation and knowledge extraction/human SAR development.

11.
J Cheminform ; 6: 21, 2014.
Article in English | MEDLINE | ID: mdl-24959206

ABSTRACT

BACKGROUND: Combining different sources of knowledge to build improved structure activity relationship models is not easy owing to the variety of knowledge formats and the absence of a common framework to interoperate between learning techniques. Most of the current approaches address this problem by using consensus models that operate at the prediction level. We explore the possibility of directly combining these sources at the knowledge level, with the aim of harvesting potentially increased synergy at an earlier stage. Our goal is to design a general methodology to facilitate knowledge discovery and produce accurate and interpretable models. RESULTS: To combine models at the knowledge level, we propose to decouple the learning phase from the knowledge application phase using a pivot representation (lingua franca) based on the concept of hypothesis. A hypothesis is a simple and interpretable knowledge unit. Regardless of its origin, knowledge is broken down into a collection of hypotheses. These hypotheses are subsequently organised into a hierarchical network. This unification makes it possible to combine different sources of knowledge into a common formalised framework. The approach allows us to create a synergistic system between different forms of knowledge, and new algorithms can be applied to leverage this unified model. This first article focuses on the general principle of the Self Organising Hypothesis Network (SOHN) approach in the context of binary classification problems along with an illustrative application to the prediction of mutagenicity. CONCLUSION: It is possible to represent knowledge in the unified form of a hypothesis network allowing interpretable predictions with performances comparable to mainstream machine learning techniques. This new approach offers the potential to combine knowledge from different sources into a common framework in which high level reasoning and meta-learning can be applied; these latter perspectives will be explored in future work.
