Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 16 de 16
Filtrar
Mais filtros










Base de dados
Intervalo de ano de publicação
1.
Bioorg Med Chem ; 46: 116388, 2021 Sep 15.
Artigo em Inglês | MEDLINE | ID: mdl-34488021

RESUMO

The vast majority of approved drugs are metabolized by the five major cytochrome P450 (CYP) isozymes, 1A2, 2C9, 2C19, 2D6 and 3A4. Inhibition of CYP isozymes can cause drug-drug interactions with severe pharmacological and toxicological consequences. Computational methods for the fast and reliable prediction of the inhibition of CYP isozymes by small molecules are therefore of high interest and relevance to pharmaceutical companies and a host of other industries, including the cosmetics and agrochemical industries. Today, a large number of machine learning models for predicting the inhibition of the major CYP isozymes by small molecules are available. With this work we aim to go beyond the coverage of existing models, by combining data from several major public and proprietary sources. More specifically, we used up to 18815 compounds with measured bioactivities to train random forest classification models for the individual CYP isozymes. A major advantage of the new data collection over existing ones is the better representation of the minority class, the CYP inhibitors. With the new data collection we achieved inhibitor-to-non-inhibitor ratios in the order of 1:1 (CYP1A2) to 1:3 (CYP2D6). We show that our models reach competitive performance on external data, with Matthews correlation coefficients (MCCs) ranging from 0.62 (CYP2C19) to 0.70 (CYP2D6), and areas under the receiver operating characteristic curve (AUCs) between 0.89 (CYP2C19) and 0.92 (CYPs 2D6 and 3A4). Importantly, the models show a high level of robustness, reflected in a good predictivity also for compounds that are structurally dissimilar to the compounds represented in the training data. The best models presented in this work are freely accessible for academic research via a web service.

2.
Int J Mol Sci ; 22(15)2021 Jul 21.
Artigo em Inglês | MEDLINE | ID: mdl-34360558

RESUMO

Experimental screening of large sets of compounds against macromolecular targets is a key strategy to identify novel bioactivities. However, large-scale screening requires substantial experimental resources and is time-consuming and challenging. Therefore, small to medium-sized compound libraries with a high chance of producing genuine hits on an arbitrary protein of interest would be of great value to fields related to early drug discovery, in particular biochemical and cell research. Here, we present a computational approach that incorporates drug-likeness, predicted bioactivities, biological space coverage, and target novelty, to generate optimized compound libraries with maximized chances of producing genuine hits for a wide range of proteins. The computational approach evaluates drug-likeness with a set of established rules, predicts bioactivities with a validated, similarity-based approach, and optimizes the composition of small sets of compounds towards maximum target coverage and novelty. We found that, in comparison to the random selection of compounds for a library, our approach generates substantially improved compound sets. Quantified as the "fitness" of compound libraries, the calculated improvements ranged from +60% (for a library of 15,000 compounds) to +184% (for a library of 1000 compounds). The best of the optimized compound libraries prepared in this work are available for download as a dataset bundle ("BonMOLière").


Assuntos
Algoritmos , Descoberta de Drogas , Ensaios de Triagem em Larga Escala/normas , Proteínas/química , Bibliotecas de Moléculas Pequenas/química , Bibliotecas de Moléculas Pequenas/farmacologia , Avaliação Pré-Clínica de Medicamentos , Ensaios de Triagem em Larga Escala/métodos , Humanos
3.
Molecules ; 26(15)2021 Aug 02.
Artigo em Inglês | MEDLINE | ID: mdl-34361831

RESUMO

The interaction of small organic molecules such as drugs, agrochemicals, and cosmetics with cytochrome P450 enzymes (CYPs) can lead to substantial changes in the bioavailability of active substances and hence consequences with respect to pharmacological efficacy and toxicity. Therefore, efficient means of predicting the interactions of small organic molecules with CYPs are of high importance to a host of different industries. In this work, we present a new set of machine learning models for the classification of xenobiotics into substrates and non-substrates of nine human CYP isozymes: CYPs 1A2, 2A6, 2B6, 2C8, 2C9, 2C19, 2D6, 2E1, and 3A4. The models are trained on an extended, high-quality collection of known substrates and non-substrates and have been subjected to thorough validation. Our results show that the models yield competitive performance and are favorable for the detection of CYP substrates. In particular, a new consensus model reached high performance, with Matthews correlation coefficients (MCCs) between 0.45 (CYP2C8) and 0.85 (CYP3A4), although at the cost of coverage. The best models presented in this work are accessible free of charge via the "CYPstrate" module of the New E-Resource for Drug Discovery (NERDD).


Assuntos
Sistema Enzimático do Citocromo P-450/metabolismo , Aprendizado de Máquina , Xenobióticos/classificação , Xenobióticos/metabolismo , Animais , Humanos , Especificidade por Substrato
4.
Pharmaceuticals (Basel) ; 14(8)2021 Aug 11.
Artigo em Inglês | MEDLINE | ID: mdl-34451887

RESUMO

In recent years, a number of machine learning models for the prediction of the skin sensitization potential of small organic molecules have been reported and become available. These models generally perform well within their applicability domains but, as a result of the use of molecular fingerprints and other non-intuitive descriptors, the interpretability of the existing models is limited. The aim of this work is to develop a strategy to replace the non-intuitive features by predicted outcomes of bioassays. We show that such replacement is indeed possible and that as few as ten interpretable, predicted bioactivities are sufficient to reach competitive performance. On a holdout data set of 257 compounds, the best model ("Skin Doctor CP:Bio") obtained an efficiency of 0.82 and an MCC of 0.52 (at the significance level of 0.20). Skin Doctor CP:Bio is available free of charge for academic research. The modeling strategies explored in this work are easily transferable and could be adopted for the development of more interpretable machine learning models for the prediction of the bioactivity and toxicity of small organic compounds.

5.
Chem Res Toxicol ; 34(2): 330-344, 2021 Feb 15.
Artigo em Inglês | MEDLINE | ID: mdl-33295759

RESUMO

Skin sensitization potential or potency is an important end point in the safety assessment of new chemicals and new chemical mixtures. Formerly, animal experiments such as the local lymph node assay (LLNA) were the main form of assessment. Today, however, the focus lies on the development of nonanimal testing approaches (i.e., in vitro and in chemico assays) and computational models. In this work, we investigate, based on publicly available LLNA data, the ability of aggregated, Mondrian conformal prediction classifiers to differentiate between non- sensitizing and sensitizing compounds as well as between two levels of skin sensitization potential (weak to moderate sensitizers, and strong to extreme sensitizers). The advantage of the conformal prediction framework over other modeling approaches is that it assigns compounds to activity classes only if a defined minimum level of confidence is reached for the individual predictions. This eliminates the need for applicability domain criteria that often are arbitrary in their nature and less flexible. Our new binary classifier, named Skin Doctor CP, differentiates nonsensitizers from sensitizers with a higher reliability-to-efficiency ratio than the corresponding nonconformal prediction workflow that we presented earlier. When tested on a set of 257 compounds at the significance levels of 0.10 and 0.30, the model reached an efficiency of 0.49 and 0.92, and an accuracy of 0.83 and 0.75, respectively. In addition, we developed a ternary classification workflow to differentiate nonsensitizers, weak to moderate sensitizers, and strong to extreme sensitizers. Although this model achieved satisfactory overall performance (accuracies of 0.90 and 0.73, and efficiencies of 0.42 and 0.90, at significance levels 0.10 and 0.30, respectively), it did not obtain satisfying class-wise results (at a significance level of 0.30, the validities obtained for nonsensitizers, weak to moderate sensitizers, and strong to extreme sensitizers were 0.70, 0.58, and 0.63, respectively). We argue that the model is, in consequence, unable to reliably identify strong to extreme sensitizers and suggest that other ternary models derived from the currently accessible LLNA data might suffer from the same problem. Skin Doctor CP is available via a public web service at https://nerdd.zbh.uni-hamburg.de/skinDoctorII/.

6.
Bioinformatics ; 36(4): 1291-1292, 2020 02 15.
Artigo em Inglês | MEDLINE | ID: mdl-32077475

RESUMO

SUMMARY: The New E-Resource for Drug Discovery (NERDD) is a quickly expanding web portal focused on the provision of peer-reviewed in silico tools for drug discovery. NERDD currently hosts tools for predicting the sites of metabolism (FAME) and metabolites (GLORY) of small organic molecules, for flagging compounds that are likely to interfere with biological assays (Hit Dexter), and for identifying natural products and natural product derivatives in large compound collections (NP-Scout). Several additional models and components are currently in development. AVAILABILITY AND IMPLEMENTATION: The NERDD web server is available at https://nerdd.zbh.uni-hamburg.de. Most tools are also available as software packages for local installation.


Assuntos
Produtos Biológicos , Descoberta de Drogas , Simulação por Computador , Computadores , Internet , Software
7.
Mol Inform ; 39(4): e1900103, 2020 04.
Artigo em Inglês | MEDLINE | ID: mdl-31663691

RESUMO

Protein flexibility and solvation pose major challenges to docking algorithms and scoring functions. One established strategy for addressing these challenges is to use multiple protein conformations for docking (all-against-all ensemble docking). Recent studies have shown that the performance of ensemble docking can be improved by selecting the most relevant protein structures for docking. In search for a robust approach to protein structure selection, we have come up with an integrated mAchine Learning AnD DockINg approach (ALADDIN). ALADDIN employs a battery of random forest classifiers to select, individually for each compound of interest, from an ensemble of protein structures, the single most suitable protein structure for docking. ALADDIN outperformed the best single-structure docking runs, ensemble docking and a similarity-based docking approach on three out of four investigated targets, with up to 0.15, 0.11 and 0.16 higher area under the receiver operating characteristic curve (AUC) values, respectively. Only in the case of cytochrome P450 3A4, ALADDIN, like any of the other tested approaches, failed to obtain decent performance. ALADDIN can be particularly useful for structure-based virtual screening of malleable proteins, including kinases, some viral enzymes and anti-targets.


Assuntos
Aprendizado de Máquina , Simulação de Acoplamento Molecular , Proteínas/química , Algoritmos , Conformação Proteica
8.
Int J Mol Sci ; 20(19)2019 Sep 28.
Artigo em Inglês | MEDLINE | ID: mdl-31569429

RESUMO

The ability to predict the skin sensitization potential of small organic molecules is of high importance to the development and safe application of cosmetics, drugs and pesticides. One of the most widely accepted methods for predicting this hazard is the local lymph node assay (LLNA). The goal of this work was to develop in silico models for the prediction of the skin sensitization potential of small molecules that go beyond the state of the art, with larger LLNA data sets and, most importantly, a robust and intuitive definition of the applicability domain, paired with additional indicators of the reliability of predictions. We explored a large variety of molecular descriptors and fingerprints in combination with random forest and support vector machine classifiers. The most suitable models were tested on holdout data, on which they yielded competitive performance (Matthews correlation coefficients up to 0.52; accuracies up to 0.76; areas under the receiver operating characteristic curves up to 0.83). The most favorable models are available via a public web service that, in addition to predictions, provides assessments of the applicability domain and indicators of the reliability of the individual predictions.


Assuntos
Imunização , Ensaio Local de Linfonodo , Aprendizado de Máquina , Pele/efeitos dos fármacos , Pele/imunologia , Cosméticos/efeitos adversos , Efeitos Colaterais e Reações Adversas Relacionados a Medicamentos , Mimetismo Molecular , Prognóstico , Reprodutibilidade dos Testes
9.
J Chem Inf Model ; 59(8): 3400-3412, 2019 08 26.
Artigo em Inglês | MEDLINE | ID: mdl-31361490

RESUMO

In this work we present the third generation of FAst MEtabolizer (FAME 3), a collection of extra trees classifiers for the prediction of sites of metabolism (SoMs) in small molecules such as drugs, druglike compounds, natural products, agrochemicals, and cosmetics. FAME 3 was derived from the MetaQSAR database ( Pedretti et al. J. Med. Chem. 2018 , 61 , 1019 ), a recently published data resource on xenobiotic metabolism that contains more than 2100 substrates annotated with more than 6300 experimentally confirmed SoMs related to redox reactions, hydrolysis and other nonredox reactions, and conjugation reactions. In tests with holdout data, FAME 3 models reached competitive performance, with Matthews correlation coefficients (MCCs) ranging from 0.50 for a global model covering phase 1 and phase 2 metabolism, to 0.75 for a focused model for phase 2 metabolism. A model focused on cytochrome P450 metabolism yielded an MCC of 0.57. Results from case studies with several synthetic compounds, natural products, and natural product derivatives demonstrate the agreement between model predictions and literature data even for molecules with structural patterns clearly distinct from those present in the training data. The applicability domains of the individual models were estimated by a new, atom-based distance measure (FAMEscore) that is based on a nearest-neighbor search in the space of atom environments. FAME 3 is available via a public web service at https://nerdd.zbh.uni-hamburg.de/ and as a self-contained Java software package, free for academic and noncommercial research.


Assuntos
Produtos Biológicos/metabolismo , Biologia Computacional/métodos , Enzimas/metabolismo , Sítios de Ligação , Bases de Dados de Produtos Farmacêuticos , Enzimas/química
10.
Front Chem ; 7: 402, 2019.
Artigo em Inglês | MEDLINE | ID: mdl-31249827

RESUMO

Computational prediction of xenobiotic metabolism can provide valuable information to guide the development of drugs, cosmetics, agrochemicals, and other chemical entities. We have previously developed FAME 2, an effective tool for predicting sites of metabolism (SoMs). In this work, we focus on the prediction of the chemical structures of metabolites, in particular metabolites of xenobiotics. To this end, we have developed a new tool, GLORY, which combines SoM prediction with FAME 2 and a new collection of rules for metabolic reactions mediated by the cytochrome P450 enzyme family. GLORY has two modes: MaxEfficiency and MaxCoverage. For MaxEfficiency mode, the use of predicted SoMs to restrict the locations in the molecule at which the reaction rules could be applied was explored. For MaxCoverage mode, the predicted SoM probabilities were instead used to develop a new scoring approach for the predicted metabolites. With this scoring approach, GLORY achieves a recall of 0.83 and can predict at least one known metabolite within the top three ranked positions for 76% of the molecules of a new, manually curated test set. GLORY is freely available as a web server at https://acm.zbh.uni-hamburg.de/glory/, and the datasets and reaction rules are provided in the Supplementary Material.

11.
J Chem Inf Model ; 59(3): 1030-1043, 2019 03 25.
Artigo em Inglês | MEDLINE | ID: mdl-30624935

RESUMO

Assay interference caused by small molecules continues to pose a significant challenge for early drug discovery. A number of rule-based and similarity-based approaches have been derived that allow the flagging of potentially "badly behaving compounds", "bad actors", or "nuisance compounds". These compounds are typically aggregators, reactive compounds, and/or pan-assay interference compounds (PAINS), and many of them are frequent hitters. Hit Dexter is a recently introduced machine learning approach that predicts frequent hitters independent of the underlying physicochemical mechanisms (including also the binding of compounds based on "privileged scaffolds" to multiple binding sites). Here we report on the development of a second generation of machine learning models which now covers both primary screening assays and confirmatory dose-response assays. Protein sequence clustering was newly introduced to minimize the overrepresentation of structurally and functionally related proteins. The models correctly classified compounds of large independent test sets as (highly) promiscuous or nonpromiscuous with Matthews correlation coefficient (MCC) values of up to 0.64 and area under the receiver operating characteristic curve (AUC) values of up to 0.96. The models were also utilized to characterize sets of compounds with specific biological and physicochemical properties, such as dark chemical matter, aggregators, compounds from a high-throughput screening library, drug-like compounds, approved drugs, potential PAINS, and natural products. Among the most interesting outcomes is that the new Hit Dexter models predict the presence of large fractions of (highly) promiscuous compounds among approved drugs. Importantly, predictions of the individual Hit Dexter models are generally in good agreement and consistent with those of Badapple, an established statistical model for the prediction of frequent hitters. The new Hit Dexter 2.0 web service, available at http://hitdexter2.zbh.uni-hamburg.de , not only provides user-friendly access to all machine learning models presented in this work but also to similarity-based methods for the prediction of aggregators and dark chemical matter as well as a comprehensive collection of available rule sets for flagging frequent hitters and compounds including undesired substructures.


Assuntos
Aprendizado de Máquina , Preparações Farmacêuticas/química , Proteínas/química , Sítios de Ligação , Bases de Dados de Produtos Farmacêuticos , Ensaios de Triagem em Larga Escala/métodos , Modelos Moleculares , Ligação Proteica , Curva ROC , Bibliotecas de Moléculas Pequenas/química
12.
Biomolecules ; 9(2)2019 01 24.
Artigo em Inglês | MEDLINE | ID: mdl-30682850

RESUMO

Natural products (NPs) remain the most prolific resource for the development of smallmolecule drugs. Here we report a new machine learning approach that allows the identification of natural products with high accuracy. The method also generates similarity maps, which highlight atoms that contribute significantly to the classification of small molecules as a natural product or synthetic molecule. The method can hence be utilized to (i) identify natural products in large molecular libraries, (ii) quantify the natural product-likeness of small molecules, and (iii) visualize atoms in small molecules that are characteristic of natural products or synthetic molecules. The models are based on random forest classifiers trained on data sets consisting of more than 265,000 to 322,000 natural products and synthetic molecules. Two-dimensional molecular descriptors, MACCS keys and Morgan2 fingerprints were explored. On an independent test set the models reached areas under the receiver operating characteristic curve (AUC) of 0.997 and Matthews correlation coefficients (MCCs) of 0.954 and higher. The method was further tested on data from the Dictionary of Natural Products, ChEMBL and other resources. The best-performing models are accessible as a free web service at http://npscout.zbh.uni-hamburg.de/npscout.


Assuntos
Produtos Biológicos/química , Aprendizado de Máquina , Bibliotecas de Moléculas Pequenas/química , Estrutura Molecular
14.
ChemMedChem ; 13(6): 564-571, 2018 03 20.
Artigo em Inglês | MEDLINE | ID: mdl-29285887

RESUMO

False-positive assay readouts caused by badly behaving compounds-frequent hitters, pan-assay interference compounds (PAINS), aggregators, and others-continue to pose a major challenge to experimental screening. There are only a few in silico methods that allow the prediction of such problematic compounds. We report the development of Hit Dexter, two extremely randomized trees classifiers for the prediction of compounds likely to trigger positive assay readouts either by true promiscuity or by assay interference. The models were trained on a well-prepared dataset extracted from the PubChem Bioassay database, consisting of approximately 311 000 compounds tested for activity on at least 50 proteins. Hit Dexter reached MCC and AUC values of up to 0.67 and 0.96 on an independent test set, respectively. The models are expected to be of high value, in particular to medicinal chemists and biochemists who can use Hit Dexter to identify compounds for which extra caution should be exercised with positive assay readouts. Hit Dexter is available as a free web service at http://hitdexter.zbh. uni-hamburg.de.


Assuntos
Ensaios de Triagem em Larga Escala/métodos , Aprendizado de Máquina , Simulação por Computador , Bases de Dados Factuais , Reações Falso-Positivas , Bibliotecas de Moléculas Pequenas/química , Bibliotecas de Moléculas Pequenas/farmacologia
15.
J Chem Inf Model ; 57(8): 1832-1846, 2017 08 28.
Artigo em Inglês | MEDLINE | ID: mdl-28782945

RESUMO

We report on the further development of FAst MEtabolizer (FAME; J. Chem. Inf. MODEL: 2013, 53, 2896-2907), a collection of random forest models for the prediction of sites of metabolism (SoMs) of xenobiotics. A broad set of descriptors was explored, from simple 2D descriptors such as those used in FAME, to quantum chemical descriptors employed in some of the most accurate models for SoM prediction currently available. In line with the original FAME approach, our objective was to keep things simple and to come up with accurate and robust models that are based on a small number of 2D descriptors. We found that circular descriptions of atoms and their environments with such descriptors in combination with an extremely randomized trees algorithm can yield models that perform equally well compared to more complex approaches. Thorough evaluation experiments on an independent test set showed that the best of these models obtained a Matthews correlation coefficient, area under the receiver operating characteristic curve, and Top-2 accuracy of 0.57, 0.91 and 94.1%, respectively. Models for the prediction of isoform-specific regioselectivity of CYP 3A4, 2D6, and 2C9 were also developed and showed competitive performance. The best models have been integrated into a newly developed software package (FAME 2), which is available free of charge from the authors.


Assuntos
Biologia Computacional/métodos , Sistema Enzimático do Citocromo P-450/metabolismo , Aprendizado de Máquina , Software , Estereoisomerismo , Especificidade por Substrato , Xenobióticos/química , Xenobióticos/metabolismo
16.
Chemphyschem ; 18(6): 596-609, 2017 Mar 17.
Artigo em Inglês | MEDLINE | ID: mdl-28092133

RESUMO

Attaching an organometallic unit to a dithienylethene (DTE) molecular switch can allow one to vary its switching and spectroscopic properties, and to create switchable magnetic properties. In this work, two different dithienylethene molecular switches are used as a bridge between two cobalt sandwich units. The only difference between the switching cores is in the size of the cycloalkene ring connecting both thiophene rings. The complexes present different oxidation states for the cobalt atoms, which are demonstrated to determine the switching reaction. The UV/Vis measurements show that while the Co(I) complexes undergo the switching reaction, the Co(II,III) complexes switch poorly. Kohn-Sham density functional theory calculations indicate diabatic ring-closure mechanisms and a large number of excited states hindering the cyclization reaction and favoring the relaxation to the open form of the molecular switch.

SELEÇÃO DE REFERÊNCIAS
DETALHE DA PESQUISA
...