Results 1 - 20 of 53
1.
J Cheminform ; 16(1): 40, 2024 Apr 07.
Article in English | MEDLINE | ID: mdl-38582911

ABSTRACT

Poly ADP-ribose polymerase 1 (PARP1) is an attractive therapeutic target for cancer treatment. Machine-learning scoring functions constitute a promising approach to discovering novel PARP1 inhibitors. Cutting-edge PARP1-specific machine-learning scoring functions were investigated using semi-synthetic training data from docking activity-labelled molecules: known PARP1 inhibitors, confirmed inactives, and hard-to-discriminate decoys property-matched to the inhibitors with generative graph neural networks. We further made test sets harder by including only molecules dissimilar to those in the training set. Comprehensive analysis of these datasets using five supervised learning algorithms, with protein-ligand fingerprints extracted from docking poses as well as ligand-only features, revealed one highly predictive scoring function: the PARP1-specific support vector machine-based regressor employing PLEC fingerprints. It achieved a high Normalized Enrichment Factor at the top 1% on the hardest test set (NEF1% = 0.588, median of 10 repetitions) and was more predictive than any other scoring function investigated, especially the classical scoring function employed as the baseline.
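The NEF1% metric quoted above can be made concrete with a short sketch. It assumes the common definition NEFx% = EFx% / EFmax,x%, where EF is the hit rate among the top x% of ranked molecules divided by the hit rate in the whole set, and EFmax is the best EF achievable given the number of actives. The function name and the rounding-up of the top-subset size are assumptions, not the paper's code.

```python
import math
from typing import Sequence


def normalized_enrichment_factor(scores: Sequence[float], labels: Sequence[int],
                                 top_fraction: float = 0.01) -> float:
    """NEF at a given top fraction: EF divided by the maximum achievable EF.

    Assumes higher score = more likely active and labels are 1 (active) or 0 (inactive).
    Rounding the top-subset size up (ceil) is a hypothetical choice.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    n_total = len(scores)
    n_actives = sum(labels)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need at least one molecule and one active")
    # round() guards against float noise such as 0.07 * 100 = 7.000000000000001.
    n_top = max(1, math.ceil(round(top_fraction * n_total, 9)))
    # Rank molecules from best to worst score and keep the top subset.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    actives_in_top = sum(label for _, label in ranked[:n_top])
    base_rate = n_actives / n_total
    # EF = hit rate in the top subset relative to the hit rate in the whole set.
    ef = (actives_in_top / n_top) / base_rate
    # Best case: the top subset holds as many actives as can fit in it.
    ef_max = (min(n_top, n_actives) / n_top) / base_rate
    return ef / ef_max
```

Because NEF divides by the best achievable EF, it always lies between 0 and 1, which helps when comparing test sets with different proportions of actives.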

2.
Health Data Sci ; 4: 0108, 2024.
Article in English | MEDLINE | ID: mdl-38486621

ABSTRACT

Background: Gemcitabine is a first-line chemotherapy for pancreatic adenocarcinoma (PAAD), but many PAAD patients do not respond to gemcitabine-containing treatments. Being able to predict such nonresponders would hence permit the undelayed administration of more promising treatments while sparing those patients gemcitabine's life-threatening side effects. Unfortunately, the few predictors of PAAD patient response to this drug are weak, and none of them yet exploits the power of machine learning (ML). Methods: Here, we applied ML to predict the response of PAAD patients to gemcitabine from the molecular profiles of their tumors. More concretely, we collected diverse molecular profiles of PAAD patient tumors along with the corresponding clinical data (gemcitabine responses and clinical features) from the Genomic Data Commons resource. Systematically combining 8 tumor profiles with 16 classification algorithms yielded 128 ML models, each of which was evaluated by multiple 10-fold cross-validations. Results: Only 7 of these 128 models were predictive, which underlines the importance of carrying out such a large-scale analysis to avoid missing the most predictive models. Among them were random forest using 4 selected mRNAs [0.44 Matthews correlation coefficient (MCC), 0.785 receiver operating characteristic-area under the curve (ROC-AUC)] and XGBoost combining 12 DNA methylation probes (0.32 MCC, 0.697 ROC-AUC). By contrast, the hENT1 marker performed much worse, at random level (practically 0 MCC, 0.5 ROC-AUC). Despite not being trained to predict prognosis (overall and progression-free survival), these ML models were also able to anticipate this patient outcome. Conclusions: We release these promising ML models so that they can be evaluated prospectively on other gemcitabine-treated PAAD patients.
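The two metrics reported above (MCC and ROC-AUC) can be computed from scratch as in the minimal pure-Python sketch below. The function names are hypothetical, responders are assumed to be the positive class (label 1), and ROC-AUC is computed through its rank interpretation (the probability that a randomly chosen positive scores higher than a randomly chosen negative) rather than by tracing the curve.

```python
import math
from typing import Sequence


def matthews_corrcoef(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """MCC from binary labels (1 = responder, 0 = non-responder); 0.0 when undefined."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: return 0 when any row or column of the confusion matrix is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as P(random positive outscores random negative); ties count as 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

A predictor that gives every patient the same label and score gets an MCC of 0 and a ROC-AUC of 0.5, matching the random-level values quoted above. The pairwise AUC loop is quadratic, which is fine for cohort-sized data.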

3.
J Adv Res ; 2024 Jan 26.
Article in English | MEDLINE | ID: mdl-38280715

ABSTRACT

INTRODUCTION: Small-molecule Programmed Cell Death Protein 1/Programmed Death-Ligand 1 (PD1/PDL1) inhibition via PDL1 dimerization has the potential to lead to inexpensive drugs with better cancer patient outcomes and milder side effects. However, this therapeutic approach has proven challenging, with only one PDL1 dimerizer reaching early clinical trials so far. There is hence a need for fast and accurate methods to develop alternative PDL1 dimerizers. OBJECTIVES: We aim to show that structure-based virtual screening (SBVS) based on PDL1-specific machine-learning (ML) scoring functions (SFs) is a powerful drug design tool for detecting PD1/PDL1 inhibitors via PDL1 dimerization. METHODS: By incorporating the latest MLSF advances, we generated and evaluated PDL1-specific MLSFs (classifiers and inactive-enriched regressors) on two demanding test sets. RESULTS: 60 PDL1-specific MLSFs (30 classifiers and 30 regressors) were generated. Our large-scale analysis provides highly predictive PDL1-specific MLSFs that benefitted from training with large volumes of docked inactives and enabling inactive-enriched regression. CONCLUSION: PDL1-specific MLSFs strongly outperformed generic SFs of various types on this target and are released here without restrictions.

4.
Nature ; 624(7991): 252, 2023 Dec.
Article in English | MEDLINE | ID: mdl-38086935
5.
Nat Protoc ; 18(11): 3460-3511, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37845361

ABSTRACT

Structure-based virtual screening (SBVS) via docking has been used to discover active molecules for a range of therapeutic targets. Chemical and protein data sets that contain integrated bioactivity information have increased both in number and in size. Artificial intelligence and, more concretely, its machine-learning (ML) branch, including deep learning, have effectively exploited these data sets to build scoring functions (SFs) for SBVS against targets with an atomic-resolution 3D model (e.g., generated by X-ray crystallography or predicted by AlphaFold2). Often outperforming their generic and non-ML counterparts, target-specific ML-based SFs represent the state of the art for SBVS. Here, we present a comprehensive and user-friendly protocol to build and rigorously evaluate these new SFs for SBVS. This protocol is organized into four sections: (i) using a public benchmark of a given target to evaluate an existing generic SF; (ii) preparing experimental data for a target from public repositories; (iii) partitioning data into a training set and a test set for subsequent target-specific ML modeling; and (iv) generating and evaluating target-specific ML SFs by using the prepared training-test partitions. All necessary code and input/output data related to three example targets (acetylcholinesterase, HMG-CoA reductase, and peroxisome proliferator-activated receptor-α) are available at https://github.com/vktrannguyen/MLSF-protocol , can be run by using a single computer within 1 week and make use of easily accessible software/programs (e.g., Smina, CNN-Score, RF-Score-VS and DeepCoy) and web resources. Our aim is to provide practical guidance on how to augment training data to enhance SBVS performance, how to identify the most suitable supervised learning algorithm for a data set, and how to build an SF with the highest likelihood of discovering target-active molecules within a given compound library.


Subject(s)
Acetylcholinesterase , Artificial Intelligence , Ligands , Machine Learning , Algorithms , Molecular Docking Simulation
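Step (iii) of the protocol above, partitioning data into a training set and a test set, is often combined with a similarity constraint so that test molecules are not near-copies of training molecules. The sketch below is a hypothetical helper, not code from the MLSF-protocol repository: it represents each fingerprint as a set of on-bit indices and keeps only test molecules whose highest Tanimoto similarity to any training molecule falls below a cutoff. The 0.4 default is an illustrative assumption.

```python
from typing import FrozenSet, List, Sequence

Fingerprint = FrozenSet[int]  # on-bit indices of a binary fingerprint (assumed representation)


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two binary fingerprints."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def dissimilar_test_indices(train: Sequence[Fingerprint], test: Sequence[Fingerprint],
                            max_similarity: float = 0.4) -> List[int]:
    """Indices of test molecules whose nearest training neighbour is below the cutoff.

    The 0.4 cutoff is a hypothetical default, not a value taken from the protocol.
    """
    kept = []
    for i, fp in enumerate(test):
        # Similarity to the single most similar training molecule.
        nearest = max((tanimoto(fp, ref) for ref in train), default=0.0)
        if nearest < max_similarity:
            kept.append(i)
    return kept
```

Filtering on the nearest neighbour, rather than on average similarity, is the stricter choice: one close analogue in the training set is enough to exclude a test molecule.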
6.
Biomater Sci ; 11(17): 5797-5808, 2023 Aug 22.
Article in English | MEDLINE | ID: mdl-37401742

ABSTRACT

The delivery of genetic material (DNA and RNA) to cells can cure a wide range of diseases but is limited by the delivery efficiency of the carrier system. Poly β-amino esters (pBAEs) are promising polymer-based vectors that form polyplexes with negatively charged oligonucleotides, enabling cell membrane uptake and gene delivery. pBAE backbone polymer chemistry, as well as terminal oligopeptide modifications, define cellular uptake and transfection efficiency in a given cell line, along with nanoparticle size and polydispersity. Moreover, uptake and transfection efficiency of a given polyplex formulation also vary from cell type to cell type. Therefore, finding the optimal formulation leading to high uptake in a new cell line relies on trial and error, and requires time and resources. Machine learning (ML) is an ideal in silico screening tool to learn the non-linearities of complex data sets, like the one presented herein, with the aim of predicting cellular internalisation of pBAE polyplexes. A library of pBAE nanoparticles was fabricated and its uptake was studied in 4 different cell lines, on which various ML models were successfully trained. The best performing models were found to be gradient-boosted trees and neural networks. The gradient-boosted trees model was then analysed using SHapley Additive exPlanations, to interpret the model and gain an understanding of the important features and their impact on the predicted outcome.


Subject(s)
Nanoparticles , Polymers , Transfection , DNA , Gene Transfer Techniques , Cell Line
7.
Biomolecules ; 13(3)2023 03 08.
Article in English | MEDLINE | ID: mdl-36979433

ABSTRACT

Machine learning-based models have been widely used in the early drug-design pipeline. To validate these models, cross-validation strategies have been employed, including those using clustering of molecules in terms of their chemical structures. However, poor clustering of compounds compromises such validation, especially on test molecules dissimilar to those in the training set. This study aims at finding the best way to cluster the molecules screened by the National Cancer Institute (NCI)-60 project by comparing hierarchical, Taylor-Butina, and uniform manifold approximation and projection (UMAP) clustering methods. The best-performing algorithm can then be used to generate clusters for model validation strategies. This study also aims at measuring the impact of removing outlier molecules prior to the clustering step. Clustering results are evaluated using three well-known clustering quality metrics. In addition, we compute an average similarity matrix to assess the quality of each cluster. The results show variation in clustering quality from method to method. The clusters obtained by the hierarchical and Taylor-Butina methods are more computationally expensive to use in cross-validation strategies, and both methods cluster the molecules poorly. In contrast, the UMAP method provides the best quality, and therefore we recommend it to analyze this highly valuable dataset.


Subject(s)
Algorithms , Machine Learning , United States , National Cancer Institute (U.S.) , Cluster Analysis , Drug Design
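For readers unfamiliar with Taylor-Butina clustering, one of the methods compared above, the sketch below shows the sphere-exclusion idea on Tanimoto distances (1 minus similarity): the molecule with the most neighbours within a distance cutoff becomes a centroid, its not-yet-assigned neighbours join its cluster, and the process repeats. Fingerprints are assumed to be sets of on-bit indices, and the 0.6 distance cutoff is an illustrative assumption, not the study's setting.

```python
from typing import FrozenSet, List, Sequence

Fingerprint = FrozenSet[int]  # on-bit indices of a binary fingerprint (assumed representation)


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two binary fingerprints."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def butina_clusters(fps: Sequence[Fingerprint], distance_cutoff: float = 0.6) -> List[List[int]]:
    """Taylor-Butina clustering; returns clusters as index lists with the centroid first."""
    n = len(fps)
    # Neighbour lists: all other molecules within the distance cutoff (pairwise, O(n^2)).
    neighbours = [
        [j for j in range(n) if j != i and 1.0 - tanimoto(fps[i], fps[j]) <= distance_cutoff]
        for i in range(n)
    ]
    # Candidate centroids are visited from most to fewest neighbours (ties by index).
    order = sorted(range(n), key=lambda i: (-len(neighbours[i]), i))
    assigned = [False] * n
    clusters = []
    for i in order:
        if assigned[i]:
            continue
        # The centroid plus its neighbours not yet claimed by an earlier cluster.
        members = [i] + [j for j in neighbours[i] if not assigned[j]]
        for j in members:
            assigned[j] = True
        clusters.append(members)
    return clusters
```

Molecules with no neighbours end up as singleton clusters. The all-pairs neighbour step scales quadratically with the number of molecules.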
8.
J Chem Inf Model ; 63(5): 1401-1405, 2023 03 13.
Article in English | MEDLINE | ID: mdl-36848585

ABSTRACT

We discuss how data unbiasing and simple methods such as protein-ligand Interaction FingerPrint (IFP) can overestimate virtual screening performance. We also show that IFP is strongly outperformed by target-specific machine-learning scoring functions, which were not considered in a recent report concluding that simple methods were better than machine-learning scoring functions at virtual screening.


Subject(s)
Ligands , Proteins , Proteins/chemistry , Machine Learning
9.
Adv Sci (Weinh) ; 9(24): e2201501, 2022 08.
Article in English | MEDLINE | ID: mdl-35785523

ABSTRACT

Doxorubicin is a common treatment for breast cancer. However, not all patients respond to this drug, which sometimes causes life-threatening side effects. Accurately anticipating doxorubicin-resistant patients would therefore make it possible to spare them this risk while considering alternative treatments without delay. Stratifying patients based on molecular markers in their pretreatment tumors is a promising approach to advance toward this ambitious goal, but single-gene markers such as HER2 expression have not been shown to be sufficiently predictive. The recent availability of matched doxorubicin-response and diverse molecular profiles across breast cancer patients now permits analysis at a much larger scale. 16 machine learning algorithms and 8 molecular profiles are systematically evaluated on the same cohort of patients. Only 2 of the 128 resulting models are substantially predictive, showing that they can easily be missed by a standard-scale analysis. The best model is a classification and regression tree (CART) nonlinearly combining 4 selected miRNA isoforms to predict doxorubicin response (median Matthews correlation coefficient (MCC) and area under the curve (AUC) of 0.56 and 0.80, respectively). By contrast, HER2 expression is significantly less predictive (median MCC and AUC of 0.14 and 0.57, respectively). As the predictive accuracy of this CART model increases with larger training sets, updating it with future data should result in even better accuracy.


Subject(s)
Breast Neoplasms , MicroRNAs , Algorithms , Breast Neoplasms/drug therapy , Breast Neoplasms/genetics , Doxorubicin/therapeutic use , Female , Humans , Machine Learning , MicroRNAs/genetics
10.
Curr Res Struct Biol ; 4: 206-210, 2022.
Article in English | MEDLINE | ID: mdl-35769111

ABSTRACT

The interaction between PD1 and its ligand PDL1 has been shown to render tumor cells resistant to apoptosis and promote tumor progression. An innovative mechanism to inhibit the PD1/PDL1 interaction is PDL1 dimerization induced by small-molecule PDL1 binders. Structure-based virtual screening is a promising approach to discovering such small-molecule PD1/PDL1 inhibitors. Here we investigate which type of generic scoring functions is most suitable to tackle this problem. We consider CNN-Score, an ensemble of convolutional neural networks, as the representative of machine-learning scoring functions. We also evaluate Smina, a commonly used classical scoring function, and IFP, a top structural fingerprint similarity scoring function. These three types of scoring functions were evaluated on two test sets sharing the same set of small-molecule PD1/PDL1 inhibitors, but using different types of inactives: either true inactives (molecules with no in vitro PD1/PDL1 inhibition activity) or assumed inactives (property-matched decoy molecules generated from each active). On both test sets, CNN-Score performed much better than Smina, which in turn strongly outperformed IFP. That CNN-Score also performed best on the test set with true inactives, which precludes any possibility of exploiting decoy bias, demonstrates its predictive value for PDL1. These results suggest that re-scoring Smina-docked molecules with CNN-Score is a promising structure-based virtual screening method to discover new small-molecule inhibitors of this therapeutic target.

12.
Biomedicines ; 9(10)2021 Sep 26.
Article in English | MEDLINE | ID: mdl-34680436

ABSTRACT

(1) Background: Inter-tumour heterogeneity is one of cancer's most fundamental features. Patient stratification based on drug response prediction is hence needed for effective anti-cancer therapy. However, single-gene markers of response are rare and/or may fail to achieve a significant impact in the clinic. Machine Learning (ML) is emerging as a particularly promising complementary approach to precision oncology. (2) Methods: Here we leverage comprehensive Patient-Derived Xenograft (PDX) pharmacogenomic data sets with dimensionality-reducing ML algorithms for this purpose. (3) Results: Combining multiple gene alterations via ML leads to better discrimination between sensitive and resistant PDXs in 19 of the 26 analysed cases. Highly predictive ML models employing concise gene lists were found for three cases: paclitaxel (breast cancer), binimetinib (breast cancer) and cetuximab (colorectal cancer). Interestingly, each of these multi-gene ML models identifies some treatment-responsive PDXs not harbouring the best actionable mutation for that case. Thus, ML multi-gene predictors generally have far fewer false negatives than the corresponding single-gene marker. (4) Conclusions: As PDXs often recapitulate clinical outcomes, these results suggest that many more patients could benefit from precision oncology if ML algorithms were also applied to existing clinical pharmacogenomics data, especially those algorithms generating classifiers combining data-selected gene alterations.

14.
Brief Bioinform ; 22(6)2021 11 05.
Article in English | MEDLINE | ID: mdl-34368843

ABSTRACT

A central goal of precision oncology is to administer an optimal drug treatment to each cancer patient. A common preclinical approach to tackle this problem has been to characterize the tumors of patients at the molecular and drug response levels, and employ the resulting datasets for predictive in silico modeling (mostly using machine learning). Understanding how and why the different variants of these datasets are generated is an important component of this process. This review focuses on providing such an introduction, aimed at scientists with little previous exposure to this research area.


Subject(s)
Biomarkers, Tumor , Computational Biology/methods , Neoplasms/etiology , Neoplasms/metabolism , Pharmacogenetics/methods , Animals , Antineoplastic Agents/pharmacology , Antineoplastic Agents/therapeutic use , Biopsy , Cell Line, Tumor , Databases, Genetic , Disease Models, Animal , Drug Resistance, Neoplasm , Epigenomics/methods , Gene Expression Profiling/methods , Genomics/methods , High-Throughput Screening Assays , Humans , Neoplasms/drug therapy , Neoplasms/pathology , Precision Medicine/methods , Proteomics/methods
15.
Sci Immunol ; 6(61)2021 07 09.
Article in English | MEDLINE | ID: mdl-34244313

ABSTRACT

Conventional type 1 dendritic cells (cDC1s) are critical for antitumor immunity. They acquire antigens from dying tumor cells and cross-present them to CD8+ T cells, promoting the expansion of tumor-specific cytotoxic T cells. However, the signaling pathways that govern the antitumor functions of cDC1s in immunogenic tumors are poorly understood. Using single-cell transcriptomics to examine the molecular pathways regulating intratumoral cDC1 maturation, we found nuclear factor κB (NF-κB) and interferon (IFN) pathways to be highly enriched in a subset of functionally mature cDC1s. We identified an NF-κB-dependent and IFN-γ-regulated gene network in cDC1s, including cytokines and chemokines specialized in the recruitment and activation of cytotoxic T cells. By mapping the trajectory of intratumoral cDC1 maturation, we demonstrated the dynamic reprogramming of tumor-infiltrating cDC1s by NF-κB and IFN signaling pathways. This maturation process was perturbed by specific inactivation of either NF-κB or IFN regulatory factor 1 (IRF1) in cDC1s, resulting in impaired expression of IFN-γ-responsive genes and consequently a failure to efficiently recruit and activate antitumoral CD8+ T cells. Last, we demonstrate the relevance of these findings to patients with melanoma, showing that activation of the NF-κB/IRF1 axis in association with cDC1s is linked with improved clinical outcome. The NF-κB/IRF1 axis in cDC1s may therefore represent an important focal point for the development of new diagnostic and therapeutic approaches to improve cancer immunotherapy.


Subject(s)
Dendritic Cells/immunology , Interferon Regulatory Factor-1/immunology , Melanoma/immunology , NF-kappa B/immunology , Skin Neoplasms/immunology , Animals , Female , Gene Expression Regulation, Neoplastic , Humans , Interferon Regulatory Factor-1/genetics , Interferon-gamma/immunology , Kaplan-Meier Estimate , Male , Melanoma/genetics , Melanoma/mortality , Mice, Transgenic , NF-kappa B/genetics , Skin Neoplasms/genetics , Skin Neoplasms/mortality
16.
Curr Opin Chem Biol ; 65: 28-34, 2021 12.
Article in English | MEDLINE | ID: mdl-34052776

ABSTRACT

As more bioactivity and protein structure data become available, scoring functions (SFs) using machine learning (ML) to leverage these data sets continue to gain further accuracy and broader applicability. Advances in our understanding of the optimal ways to train and evaluate these ML-based SFs have introduced further improvements. One of these advances is how to select the most suitable decoys (molecules assumed inactive) to train or test an ML-based SF on a given target. We also review the latest applications of ML-based SFs for prospective structure-based virtual screening (SBVS), with a focus on the observed improvement over those using classical SFs. Finally, we provide recommendations for future prospective SBVS studies based on the findings of recent methodological studies.


Subject(s)
Machine Learning , Proteins , Ligands , Molecular Docking Simulation , Proteins/chemistry
17.
Brief Bioinform ; 22(3)2021 05 20.
Article in English | MEDLINE | ID: mdl-32568385

ABSTRACT

Larger training datasets have been shown to improve the accuracy of machine learning (ML)-based scoring functions (SFs) for structure-based virtual screening (SBVS). In addition, massive test sets for SBVS, known as ultra-large compound libraries, have been demonstrated to enable the fast discovery of selective drug leads with low-nanomolar potency. This proof-of-concept was carried out on two targets using a single docking tool along with its SF. It is thus unclear whether this high level of performance would generalise to other targets, docking tools and SFs. We found that screening a larger compound library results in more potent actives being identified in all six additional targets using a different docking tool along with its classical SF. Furthermore, we established that a way to improve the potency of the retrieved molecules further is to rank them with more accurate ML-based SFs (we found this to be true in four of the six targets; the difference was not significant in the remaining two targets). A 3-fold increase in average hit rate across targets was also achieved by the ML-based SFs. Lastly, we observed that classical and ML-based SFs often find different actives, which supports using both types of SFs on those targets.


Subject(s)
Databases, Protein , Machine Learning , Molecular Docking Simulation , Proteins/chemistry , Proteins/genetics
18.
Biomolecules ; 10(11)2020 11 19.
Article in English | MEDLINE | ID: mdl-33227945

ABSTRACT

Background and purpose: Identifying the macromolecular targets of drug molecules is a fundamental aspect of drug discovery and pharmacology. Several drugs remain without known targets (orphan) despite large-scale in silico and in vitro target prediction efforts. Ligand-centric chemical-similarity-based methods for in silico target prediction have been found to be particularly powerful, but the question remains of whether they are able to discover targets for target-orphan drugs. Experimental Approach: We used one of these in silico methods, MolTarPred, to carry out a target prediction analysis for two orphan drugs: actarit and malotilate. The top target predicted for each drug was carbonic anhydrase II (CAII). Each drug was therefore quantitatively evaluated for CAII inhibition to validate these two prospective predictions. Key Results: Actarit showed in vitro concentration-dependent inhibition of CAII activity with submicromolar potency (IC50 = 422 nM) whilst no consistent inhibition was observed for malotilate. Among the other 25 targets predicted for actarit, RORγ (RAR-related orphan receptor-gamma) is promising in that it is strongly related to actarit's indication, rheumatoid arthritis (RA). Conclusion and Implications: This study is a proof-of-concept of the utility of MolTarPred for the fast and cost-effective identification of targets of orphan drugs. Furthermore, the mechanism of action of actarit as an anti-RA agent can now be re-examined from a CAII-inhibitor perspective, given existing relationships between this target and RA. Moreover, the confirmed CAII-actarit association supports investigating the repositioning of actarit on other CAII-linked indications (e.g., hypertension, epilepsy, migraine, anemia and bone, eye and cardiac disorders).


Subject(s)
Anti-Inflammatory Agents/administration & dosage , Antirheumatic Agents/administration & dosage , Carbonic Anhydrase II/antagonists & inhibitors , Carbonic Anhydrase II/metabolism , Phenylacetates/administration & dosage , Proof of Concept Study , Arthritis, Rheumatoid/drug therapy , Arthritis, Rheumatoid/enzymology , Dose-Response Relationship, Drug , Drug Delivery Systems/methods , Humans , Reproducibility of Results
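Ligand-centric, similarity-based target prediction of the kind described above can be sketched as a nearest-neighbour ranking: each candidate target is scored by the highest Tanimoto similarity between the query drug and any of that target's known ligands. This is a generic illustration under assumed inputs (fingerprints as sets of on-bit indices and a hypothetical target-to-ligands mapping), not MolTarPred's actual scoring scheme.

```python
from typing import Dict, FrozenSet, List, Sequence, Tuple

Fingerprint = FrozenSet[int]  # on-bit indices of a binary fingerprint (assumed representation)


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two binary fingerprints."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def rank_targets(query: Fingerprint,
                 ligands_by_target: Dict[str, Sequence[Fingerprint]]) -> List[Tuple[str, float]]:
    """Rank targets by the highest similarity between the query and any known ligand."""
    scored = [
        (target, max((tanimoto(query, lig) for lig in ligands), default=0.0))
        for target, ligands in ligands_by_target.items()
    ]
    # Most similar target first; ties broken alphabetically for reproducibility.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))
```

The top-ranked target is then the candidate to test experimentally, as was done for CAII above. Targets without annotated ligands score 0 and sink to the bottom.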
20.
Biomolecules ; 10(6)2020 06 26.
Article in English | MEDLINE | ID: mdl-32604779

ABSTRACT

In silico models to predict which tumors will respond to a given drug are necessary for Precision Oncology. However, predictive models are only available for a handful of cases (each case being a given drug acting on tumors of a specific cancer type). One way to generate predictive models for the remaining cases is to use suitable machine learning algorithms that are yet to be applied to existing in vitro pharmacogenomics datasets. Here, we apply XGBoost, an algorithm that is advantageous for these high-dimensional problems, integrated with a stringent feature selection approach. Thus, we identified and validated 118 predictive models for 62 drugs across five cancer types by exploiting four molecular profiles (sequence mutations, copy-number alterations (CNA), gene expression (GEX), and DNA methylation). Predictive models were found in each cancer type and with every molecular profile. On average, no omics profile or cancer type obtained models with higher predictive accuracy than the rest. However, within a given cancer type, some molecular profiles were overrepresented among predictive models. For instance, CNA profiles were predictive in breast invasive carcinoma (BRCA) cell lines, but not in small cell lung cancer (SCLC) cell lines, where GEX and DNA methylation profiles were the most predictive. Lastly, we identified the best XGBoost model per cancer type and analyzed their selected features. For each model, some of the genes in the selected list had already been found to be individually linked to the response to that drug, providing additional evidence of the usefulness of these models and the merits of the feature selection scheme.


Subject(s)
Antineoplastic Agents/therapeutic use , Machine Learning , Models, Statistical , Neoplasms/drug therapy , Computational Biology , Humans