1 - 15 of 15
1.
J Chem Inf Model ; 64(7): 2331-2344, 2024 Apr 08.
Article En | MEDLINE | ID: mdl-37642660

Federated multipartner machine learning has been touted as an appealing and efficient method to increase the effective training data volume and thereby the predictivity of models, particularly when the generation of training data is resource-intensive. In the landmark MELLODDY project, indeed, each of ten pharmaceutical companies realized aggregated improvements on its own classification or regression models through federated learning. To this end, they leveraged a novel implementation extending multitask learning across partners, on a platform audited for privacy and security. The experiments involved an unprecedented cross-pharma data set of 2.6+ billion confidential experimental activity data points, documenting 21+ million physical small molecules and 40+ thousand assays in on-target and secondary pharmacodynamics and pharmacokinetics. Appropriate complementary metrics were developed to evaluate the predictive performance in the federated setting. In addition to predictive performance increases in labeled space, the results point toward an extended applicability domain in federated learning. Increases in collective training data volume, including by means of auxiliary data resulting from single concentration high-throughput and imaging assays, continued to boost predictive performance, albeit with a saturating return. Markedly higher improvements were observed for the pharmacokinetics and safety panel assay-based task subsets.


Benchmarking , Quantitative Structure-Activity Relationship , Biological Assay , Machine Learning
2.
J Med Chem ; 66(20): 14047-14060, 2023 10 26.
Article En | MEDLINE | ID: mdl-37815201

Early in silico assessment of the potential of a series of compounds to deliver a drug is one of the major challenges in computer-assisted drug design. The goal is to identify the right chemical series out of a large chemical space and then to prioritize the molecules with the highest potential to become a drug. Although multiple approaches to assess compounds have been developed over decades, the quality of these predictors is often insufficient, and compounds that satisfy the respective estimates are not necessarily druglike. Here, we report a novel deep learning approach that leverages large-scale predictions of ∼100 ADMET assays to assess the potential of a compound to become a relevant drug candidate. The resulting score, which we termed bPK score, substantially outperforms previous approaches and showed strong discriminative performance on data sets where previous approaches did not.


Computer Simulation
3.
J Cheminform ; 13(1): 96, 2021 Dec 07.
Article En | MEDLINE | ID: mdl-34876230

With the increase in applications of machine learning methods in drug design and related fields, the challenge of designing sound test sets becomes increasingly prominent. The goal is a realistic split of chemical structures (compounds) between training, validation and test sets, such that the performance on the test set is meaningful for inferring the performance in a prospective application. This challenge is interesting and relevant in its own right, but it is even more complex in a federated machine learning approach, where multiple partners jointly train a model under privacy-preserving conditions and chemical structures must not be shared between the participating parties. In this work we discuss three methods which provide a splitting of a data set and are applicable in a federated privacy-preserving setting, namely: a. locality-sensitive hashing (LSH), b. sphere exclusion clustering, c. scaffold-based binning (scaffold network). To evaluate these splitting methods, we consider the following quality criteria (compared to random splitting): bias in prediction performance, classification label and data imbalance, and similarity distance between the test and training set compounds. The main findings of the paper are: a. both sphere exclusion clustering and scaffold-based binning result in high-quality splitting of the data sets; b. in terms of compute costs, sphere exclusion clustering is very expensive in the federated privacy-preserving setting.
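The bit-sampling idea behind locality-sensitive hashing can be illustrated with a minimal Python sketch. It assumes each partner represents a compound as the set of on-bit indices of a structural fingerprint and that all partners agree on the same list of probe positions (`probe_bits`, a hypothetical parameter); the SHA-256 hashing and modulo fold mapping are illustrative choices, not the procedure used in the paper. Because the fold depends only on the compound's own fingerprint and the shared parameters, compounds that agree at the probe positions land in the same fold at every partner without any structure being exchanged.

```python
import hashlib
from typing import Iterable, Sequence


def lsh_fold(on_bits: Iterable[int], probe_bits: Sequence[int], n_folds: int = 5) -> int:
    """Assign a compound to a fold via bit-sampling LSH (illustrative sketch).

    on_bits:    indices of set bits in the compound's fingerprint (e.g. ECFP).
    probe_bits: fingerprint positions shared by all partners (hypothetical choice).
    n_folds:    number of folds to split into.
    """
    bits = set(on_bits)
    # The LSH key is the on/off pattern at the probe positions; compounds that
    # agree at these positions (often structurally similar ones) share a key.
    key = "".join("1" if b in bits else "0" for b in probe_bits)
    # A deterministic hash lets every partner compute the same fold locally.
    digest = hashlib.sha256(key.encode("ascii")).digest()
    return int.from_bytes(digest[:8], "big") % n_folds


if __name__ == "__main__":
    probes = [3, 17, 42, 101, 256]
    a = lsh_fold({3, 42, 900}, probes)
    b = lsh_fold({3, 42, 512}, probes)  # differs only outside the probe positions
    print(a, b, a == b)
```

Choosing probe positions among frequently set fingerprint bits would make the key more informative; how many positions to use trades fold granularity against how strongly similar compounds are kept together.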

4.
Molecules ; 26(22)2021 Nov 18.
Article En | MEDLINE | ID: mdl-34834051

Machine learning models predicting the bioactivity of chemical compounds are nowadays among the standard tools of cheminformaticians and computational medicinal chemists. Multi-task and federated learning are promising machine learning approaches that allow privacy-preserving usage of large amounts of data from diverse sources, which is crucial for achieving good generalization and high performance. Using large, real-world data sets from six pharmaceutical companies, here we investigate different strategies for averaging weighted task loss functions to train multi-task bioactivity classification models. The weighting strategies should be suitable for federated learning and ensure that learning efforts are well distributed even if data are diverse. Comparing several approaches using weights that depend on the number of sub-tasks per assay, task size, and class balance, respectively, we find that a simple sub-task weighting approach leads to robust model performance for all investigated data sets and is especially suited for federated learning.


Drug Discovery/methods , Machine Learning , Drug Design , Humans , Small Molecule Libraries/chemistry , Small Molecule Libraries/pharmacology
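One plausible reading of the sub-task weighting in the preceding abstract is sketched below in Python with NumPy: each task derived from an assay (e.g. the same assay binarized at several activity thresholds) receives weight 1/k, where k is the number of sub-tasks of that assay, so that every assay contributes equally to the averaged loss. The function names, the NaN convention for missing labels, and the masked binary cross-entropy are assumptions for illustration, not the paper's implementation.

```python
import numpy as np


def subtask_weights(assay_of_task):
    """Weight each task by 1 / (number of sub-tasks belonging to its assay)."""
    counts = {}
    for assay in assay_of_task:
        counts[assay] = counts.get(assay, 0) + 1
    return np.array([1.0 / counts[assay] for assay in assay_of_task])


def weighted_multitask_bce(y_true, y_prob, weights, eps=1e-7):
    """Weighted average over tasks of the masked binary cross-entropy.

    y_true: (n_compounds, n_tasks) array of 1/0 labels, NaN where unlabeled.
    y_prob: predicted probabilities with the same shape.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)
    mask = ~np.isnan(y_true)
    y = np.where(mask, y_true, 0.0)
    bce = -(y * np.log(y_prob) + (1 - y) * np.log(1 - y_prob))
    bce = np.where(mask, bce, 0.0)  # ignore unlabeled compound/task pairs
    # Mean loss per task over its labeled compounds (0 for tasks with no labels).
    n_labeled = mask.sum(axis=0)
    per_task = np.divide(bce.sum(axis=0), n_labeled,
                         out=np.zeros(bce.shape[1]), where=n_labeled > 0)
    return float((weights * per_task).sum() / weights.sum())


if __name__ == "__main__":
    w = subtask_weights(["assay_A", "assay_A", "assay_B"])
    print(w)  # [0.5 0.5 1. ]
    loss = weighted_multitask_bce([[1, np.nan, 0]], [[0.5, 0.5, 0.5]], w)
    print(loss)
```

Because the weights depend only on task metadata, each partner can compute them locally, which is one reason such a scheme is easy to use in a federated setting.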
5.
J Chem Inf Model ; 61(3): 1444-1456, 2021 03 22.
Article En | MEDLINE | ID: mdl-33661004

The understanding of the mechanism-of-action (MoA) of compounds and the prediction of potential drug targets play an important role in small-molecule drug discovery. The aim of this work was to compare chemical and cell morphology information for bioactivity prediction. The comparison was performed using bioactivity data from the ExCAPE database, image data (in the form of CellProfiler features) from the Cell Painting data set (the largest publicly available data set of cell images with ∼30,000 compound perturbations), and extended connectivity fingerprints (ECFPs) using the multitask Bayesian matrix factorization (BMF) approach Macau. We found that the BMF Macau and random forest (RF) performance were overall similar when ECFPs were used as compound descriptors. However, BMF Macau outperformed RF in 159 out of 224 targets (71%) when image data were used as compound information. Using BMF Macau, 100 (corresponding to about 45%) and 90 (about 40%) of the 224 targets were predicted with high predictive performance (AUC > 0.8) with ECFP data and image data as side information, respectively. There were targets better predicted by image data as side information, such as β-catenin, and others better predicted by fingerprint-based side information, such as proteins belonging to the G-protein-Coupled Receptor 1 family, which could be rationalized from the underlying data distributions in each descriptor domain. In conclusion, both cell morphology changes and chemical structure information contain information about compound bioactivity, which is also partially complementary, and can hence contribute to in silico MoA analysis.


Drug Discovery , Proteins , Bayes Theorem , Computer Simulation , Databases, Factual
6.
J Cheminform ; 12(1): 26, 2020 Apr 19.
Article En | MEDLINE | ID: mdl-33430964

Artificial intelligence (AI) is undergoing a revolution thanks to the breakthroughs of machine learning algorithms in computer vision, speech recognition, natural language processing and generative modelling. Recent works on publicly available pharmaceutical data showed that AI methods are highly promising for drug target prediction. However, the quality of public data might differ from that of industry data because of different labs reporting measurements, different measurement techniques, fewer samples, and less diverse and specialized assays. As part of a European funded project (ExCAPE) that brought together expertise from the pharmaceutical industry, machine learning, and high-performance computing, we investigated how well machine learning models obtained from public data can be transferred to internal pharmaceutical industry data. Our results show that machine learning models trained on public data can indeed maintain their predictive power to a large degree when applied to industry data. Moreover, we observed that deep learning models outperformed comparable models trained with other machine learning algorithms when applied to internal pharmaceutical company datasets. To our knowledge, this is the first large-scale study evaluating the potential of machine learning, and especially deep learning, directly at the level of industry-scale settings, and moreover investigating the transferability of publicly learned target prediction models towards industrial bioactivity prediction pipelines.

7.
ChemMedChem ; 14(20): 1795-1802, 2019 10 17.
Article En | MEDLINE | ID: mdl-31479198

A significant challenge in high-throughput screening (HTS) campaigns is the identification of assay technology interference compounds. A Compound Interfering with an Assay Technology (CIAT) gives false readouts in many assays. CIATs are often considered viable hits and investigated in follow-up studies, thus impeding research and wasting resources. In this study, we developed a machine-learning (ML) model to predict CIATs for three assay technologies. The model was trained on known CIATs and non-CIATs (NCIATs) identified in artefact assays and described by their 2D structural descriptors. Conventional methods for identifying CIATs are based on statistical analysis of historical primary screening data and do not consider experimental assays identifying CIATs. Our results show successful prediction of CIATs for existing and novel compounds and provide a complementary and wider set of predicted CIATs compared to BSF, a published structure-independent model, and to the PAINS substructural filters. Our analysis is an example of how well-curated datasets can provide powerful predictive models despite their relatively small size.


High-Throughput Screening Assays , Organic Chemicals/chemistry , Databases, Factual , Machine Learning , Models, Molecular , Molecular Structure , Particle Size
8.
J Cheminform ; 11(1): 54, 2019 Aug 08.
Article En | MEDLINE | ID: mdl-31396716

This study aims at improving upon existing activity prediction methods by augmenting chemical structure fingerprints with bioactivity-based fingerprints derived from high-throughput screening (HTS) data (HTSFPs), thereby showcasing the benefits of combining different descriptor types. This type of descriptor would be applied in an iterative screening scenario for more targeted compound set selection. The HTSFPs were generated from HTS data obtained from PubChem and combined with an ECFP4 structural fingerprint. The bioactivity-structure hybrid (BaSH) fingerprint was benchmarked against the individual ECFP4 and HTSFP fingerprints. Their performance was evaluated via retrospective analysis of a subset of the PubChem HTS data. Results showed that the BaSH fingerprint has improved predictive performance as well as scaffold hopping capability. The BaSH fingerprint identified unique compounds compared to both the ECFP4 and the HTSFP fingerprint, indicating synergistic effects between the two fingerprints. A feature importance analysis showed that a small subset of the HTSFP features contributes most to the overall performance of the BaSH fingerprint. This hybrid approach allows for activity prediction of compounds with only sparse HTSFPs due to the supporting effect of the structural fingerprint.
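The abstract does not state how the two descriptors are combined, so the Python sketch below assumes simple concatenation: a binary ECFP4 bit vector followed by an HTS fingerprint of per-assay activity scores, with untested assays set to 0. The function name and the missing-value convention are hypothetical. With this layout, a compound with a sparse HTSFP still carries a complete structural part, which mirrors the supporting effect described above.

```python
import numpy as np


def hybrid_fingerprint(ecfp_bits, hts_scores):
    """Concatenate a structural and a bioactivity fingerprint (hypothetical BaSH-style layout).

    ecfp_bits:  0/1 vector, e.g. a 1024-bit ECFP4.
    hts_scores: per-assay activity scores (e.g. z-scores); NaN where the compound was not tested.
    """
    structural = np.asarray(ecfp_bits, dtype=float)
    # Untested assays carry no signal, so they are encoded as 0 (an assumption).
    bioactivity = np.nan_to_num(np.asarray(hts_scores, dtype=float), nan=0.0)
    return np.concatenate([structural, bioactivity])


if __name__ == "__main__":
    ecfp = np.zeros(1024)
    ecfp[[5, 77, 300]] = 1
    hts = np.array([2.1, np.nan, -0.4])  # only two of three assays measured
    fp = hybrid_fingerprint(ecfp, hts)
    print(fp.shape)  # (1027,)
```

The resulting vector can be fed to any standard classifier; scaling the HTS block relative to the binary bits is a further design choice that this sketch leaves open.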

9.
J Chem Inf Model ; 59(3): 962-972, 2019 03 25.
Article En | MEDLINE | ID: mdl-30408959

The volume of high-throughput screening data has considerably increased since the beginning of the era of automated biochemical and cell-based assays. This information-rich data source provides tremendous repurposing opportunities for data mining. It was recently shown that biochemical or cell-based assay results can be compiled into so-called high-throughput fingerprints (HTSFPs), a new type of descriptor describing molecular bioactivity profiles, which can be applied in virtual screening, iterative screening, and target deconvolution. However, studies around HTSFPs and machine learning have so far mainly focused on predicting the outcome of molecules in single high-throughput assays, and the modeling of compounds' biochemical assay activities toward a panel of target proteins has not yet been reported. In this article, we compare how our in-house HTSFPs perform at this task when combined with multitask deep learning versus the single-task support vector machine method, both in terms of hit identification and of scaffold hopping potential. The performance of the two HTSFP models was reported relative to that of multitask deep learning and support vector machine models built with the structural descriptors ECFP. Moreover, we investigated the effect of high-throughput screening false positives and negatives on the performance of the generated models. Our results showed that the two fingerprints yielded similar performance and diverse hits with very little overlap, thus demonstrating the orthogonality of bioactivity profile-based descriptors with structural descriptors. Therefore, modeling compound activity data using ECFPs together with HTSFPs increases the scaffold hopping potential of the predictive models.


Drug Evaluation, Preclinical/methods , High-Throughput Screening Assays/methods , Machine Learning , Neural Networks, Computer
10.
J Chem Inf Model ; 58(5): 1094-1103, 2018 05 29.
Article En | MEDLINE | ID: mdl-29697977

In this work, a comprehensive analysis of the local geometrical and physicochemical properties of a type III allosteric pocket located between the regulatory αC helix and the activation loop of protein kinases was made by comparing available crystal structures in the structural kinome. We first explored the structural kinome to outline the possible conformations of this site. Subsequently we characterized the positions of cocrystallized ligands of the structural kinome with respect to the structural variability of the allosteric site. Then, we searched for kinase structures with similar allosteric site conformation. The search returned 26 kinases with a DFG-in/αC-out conformation potentially prone to bind allosteric inhibitors, as well as different scaffolds that can be useful starting points for the design of new inhibitors. These promising allosteric pockets were probed by performing molecular docking of known active compounds taken from ChEMBL. Interestingly, none of the active compounds reported in ChEMBL had a purely allosteric binding mode, and none of the ATP-competitive ligands had chemical moieties extending into the allosteric pocket in more than two-thirds of the investigated kinases, indicating that the allosteric pocket is accessible but still largely unexplored by available inhibitors. Finally, we compared the physicochemical properties of the allosteric site in the structural kinome and discussed the peculiar and conserved features. These analyses may help the design of allosteric ligands tailored toward the intended kinase(s).


Chemical Phenomena , Genomics , Molecular Dynamics Simulation , Protein Kinases/chemistry , Protein Kinases/metabolism , Allosteric Site , Ligands , Protein Conformation, alpha-Helical , Protein Domains , Protein Kinase Inhibitors/pharmacology , Protein Kinases/genetics
11.
Planta Med ; 84(5): 304-310, 2018 Mar.
Article En | MEDLINE | ID: mdl-29100267

Recently, we have demonstrated that site comparison methodology using flavonoid biosynthetic enzymes as the query could automatically identify structural features common to different flavonoid-binding proteins, allowing for the identification of flavonoid targets such as protein kinases. With the aim of further validating the hypothesis that biosynthetic enzymes and therapeutic targets can contain a similar natural product imprint, we collected a set of 159 crystallographic structures representing 38 natural product biosynthetic enzymes by searching the Protein Data Bank. Each enzyme structure was used as a query to screen a repository of approximately 10 000 ligandable sites by active site similarity. We report a full analysis of the screening results and highlight three retrospective examples where the natural product validates the method, thereby revealing novel structural relationships between natural product biosynthetic enzymes and putative protein targets of the natural product. Prospectively, our work provides a list of up to 64 potential novel targets for 25 well-characterized natural products.


Biological Products/metabolism , Catalytic Domain , Databases, Protein , Enzymes/chemistry , Bacterial Proteins/chemistry , Bacterial Proteins/metabolism , Binding Sites , Biological Products/chemistry , Biosynthetic Pathways , Crystallography , Enzymes/metabolism , Fungal Proteins/chemistry , Fungal Proteins/metabolism , Ligands , Molecular Structure , Plant Proteins/chemistry , Plant Proteins/metabolism , Retrospective Studies
12.
Front Pharmacol ; 8: 298, 2017.
Article En | MEDLINE | ID: mdl-28588497

Drug repurposing has become an important branch of drug discovery. Several computational approaches that help to uncover new repurposing opportunities and aid the discovery process have been put forward, or adapted from previous applications. A number of successful examples are now available. Overall, future developments will greatly benefit from integration of different methods, approaches and disciplines. Steps forward in this direction are expected to help to clarify, and therefore to rationally predict, new drug-target, target-disease, and ultimately drug-disease associations.

13.
Future Med Chem ; 8(15): 1871-1885, 2016 Oct.
Article En | MEDLINE | ID: mdl-27629811

AIM: We question the level of detail required in protein 3D-representation to detect site similarity which is relevant for polypharmacology prediction. RESULTS: We modified the in-house program SiteAlign to replace generic pharmacophoric descriptors of cavity-lining amino acids by descriptors accounting for solvent exposure. Benchmarking the novel, atom-based method (SiteAlign2) revealed no global improvement of performance. However, in the rare cases with no sequence or global structure similarity between the compared proteins, SiteAlign2 was more successful when backbone atoms were key determinants of ligand binding. CONCLUSION: SiteAlign suits the comparison of binding sites for close or distant homologs. SiteAlign2 provides a better insight into the physical model of site similarity between nonhomologs, but at the expense of an increased sensitivity to atomic coordinates.

14.
Planta Med ; 81(6): 467-73, 2015 Apr.
Article En | MEDLINE | ID: mdl-25719942

Natural products are made by nature through interaction with biosynthetic enzymes. They also exert their effect as drugs by interaction with proteins. To address the question "Do biosynthetic enzymes and therapeutic targets share common mechanisms for the molecular recognition of natural products?", we compared the active sites of five flavonoid biosynthetic enzymes to 8077 ligandable binding sites in the Protein Data Bank using two three-dimensional methods (SiteAlign and Shaper). Virtual screenings efficiently retrieved known flavonoid targets, in particular protein kinases. Consistent performance across variable site descriptions (presence/absence of water, variable boundaries, or small structural changes) indicated that the methods are robust and thus well suited for the identification of potential target proteins of natural products. Finally, our results suggested that flavonoid binding is not primarily driven by shape, but rather by the recognition of common anchoring points.


Enzymes/metabolism , Flavonoids/biosynthesis , Proteins/metabolism , Binding Sites , Databases, Protein , Enzymes/chemistry , Flavonoids/metabolism , Protein Conformation , Proteins/chemistry
15.
J Chem Inf Model ; 52(9): 2410-21, 2012 Sep 24.
Article En | MEDLINE | ID: mdl-22920885

Selectivity is a key factor in drug development. In this paper, we queried the Protein Data Bank to better understand the reasons for the promiscuity of bioactive compounds. We assembled a data set of >1000 pairs of three-dimensional structures of complexes between a "drug-like" ligand (as its physicochemical properties overlap those of approved drugs) and two distinct "druggable" protein targets (as their binding sites are likely to accommodate "drug-like" ligands). Studying the similarity between the ligand-binding sites in the different targets revealed that the lack of selectivity of a ligand can be due (i) to the fact that Nature has created the same binding pocket in different proteins, which do not otherwise necessarily share sequence or fold similarity, or (ii) to specific characteristics of the ligand itself. In particular, we demonstrated that many ligands can adapt to different protein environments by changing their conformation, by using different chemical moieties to anchor to different targets, or by adopting unusual extreme binding modes (e.g., only apolar contacts between the ligand and the protein, even though polar groups are present on the ligand or at the protein surface). Lastly, we provided new elements in support of recent studies suggesting that the promiscuity of a ligand might be inferred from its molecular complexity.


Menu Planning , Proteins/metabolism , Binding Sites , Computer Graphics , Ligands