Results 1 - 20 of 53
1.
Nature ; 624(7991): 252, 2023 Dec.
Article in English | MEDLINE | ID: mdl-38086935
2.
Brief Bioinform ; 22(3)2021 05 20.
Article in English | MEDLINE | ID: mdl-32568385

ABSTRACT

Larger training datasets have been shown to improve the accuracy of machine learning (ML)-based scoring functions (SFs) for structure-based virtual screening (SBVS). In addition, massive test sets for SBVS, known as ultra-large compound libraries, have been demonstrated to enable the fast discovery of selective drug leads with low-nanomolar potency. This proof-of-concept was carried out on two targets using a single docking tool along with its SF. It is thus unclear whether this high level of performance would generalise to other targets, docking tools and SFs. We found that screening a larger compound library results in more potent actives being identified in all six additional targets using a different docking tool along with its classical SF. Furthermore, we established that a way to improve the potency of the retrieved molecules further is to rank them with more accurate ML-based SFs (we found this to be true in four of the six targets; the difference was not significant in the remaining two targets). A 3-fold increase in average hit rate across targets was also achieved by the ML-based SFs. Lastly, we observed that classical and ML-based SFs often find different actives, which supports using both types of SFs on those targets.


Subjects
Protein Databases, Machine Learning, Molecular Docking Simulation, Proteins/chemistry, Proteins/genetics
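
The hit-rate comparison in this abstract can be illustrated with a minimal sketch. It assumes that hit rate means the fraction of true actives among the top-ranked molecules, that the classical SF outputs docking energies (lower is better) and that the ML-based SF outputs a predicted affinity (higher is better). The function name `hit_rate` and all scores and labels are hypothetical toy values, not data from the study.

```python
def hit_rate(scores, is_active, top_n, higher_is_better=True):
    """Fraction of true actives among the top_n molecules ranked by score."""
    if top_n <= 0 or top_n > len(scores):
        raise ValueError("top_n must be between 1 and the number of molecules")
    # Rank molecule indices from best to worst score.
    order = sorted(range(len(scores)), key=lambda i: scores[i],
                   reverse=higher_is_better)
    return sum(1 for i in order[:top_n] if is_active[i]) / top_n


# Hypothetical toy library of six molecules (illustrative values only).
classical_scores = [-9.1, -8.7, -8.5, -7.9, -7.2, -6.8]  # lower is better
ml_scores = [6.2, 7.9, 5.1, 8.3, 7.5, 4.9]               # higher is better
labels = [False, True, False, True, True, False]         # True = active

classical_hr = hit_rate(classical_scores, labels, top_n=3, higher_is_better=False)
ml_hr = hit_rate(ml_scores, labels, top_n=3, higher_is_better=True)
print(f"classical SF hit rate: {classical_hr:.2f}, ML SF hit rate: {ml_hr:.2f}")
```

Ranking the same docked molecules with both score types also shows which actives each SF retrieves, which is how the complementarity noted in the abstract could be inspected.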
3.
Brief Bioinform ; 22(6)2021 11 05.
Article in English | MEDLINE | ID: mdl-34368843

ABSTRACT

A central goal of precision oncology is to administer an optimal drug treatment to each cancer patient. A common preclinical approach to tackle this problem has been to characterize the tumors of patients at the molecular and drug response levels, and employ the resulting datasets for predictive in silico modeling (mostly using machine learning). Understanding how and why the different variants of these datasets are generated is an important component of this process. This review focuses on providing such an introduction, aimed at scientists with little previous exposure to this research area.


Subjects
Tumor Biomarkers, Computational Biology/methods, Neoplasms/etiology, Neoplasms/metabolism, Pharmacogenetics/methods, Animals, Antineoplastic Agents/pharmacology, Antineoplastic Agents/therapeutic use, Biopsy, Tumor Cell Line, Genetic Databases, Animal Disease Models, Antineoplastic Drug Resistance, Epigenomics/methods, Gene Expression Profiling/methods, Genomics/methods, High-Throughput Screening Assays, Humans, Neoplasms/drug therapy, Neoplasms/pathology, Precision Medicine/methods, Proteomics/methods
4.
J Chem Inf Model ; 63(5): 1401-1405, 2023 03 13.
Article in English | MEDLINE | ID: mdl-36848585

ABSTRACT

We discuss how data unbiasing and simple methods such as protein-ligand Interaction FingerPrint (IFP) can overestimate virtual screening performance. We also show that IFP is strongly outperformed by target-specific machine-learning scoring functions, which were not considered in a recent report concluding that simple methods were better than machine-learning scoring functions at virtual screening.


Subjects
Ligands, Proteins, Proteins/chemistry, Machine Learning
5.
Bioinformatics ; 35(20): 3989-3995, 2019 10 15.
Article in English | MEDLINE | ID: mdl-30873528

ABSTRACT

MOTIVATION: Studies have shown that the accuracy of random forest (RF)-based scoring functions (SFs), such as RF-Score-v3, increases with more training samples, whereas that of classical SFs, such as X-Score, does not. Nevertheless, the impact of the similarity between training and test samples on this matter has not been studied in a systematic manner. It is therefore unclear how these SFs would perform when only trained on protein-ligand complexes that are highly dissimilar or highly similar to the test set. It is also unclear whether SFs based on machine learning algorithms other than RF can also improve accuracy with increasing training set size and to what extent they learn from dissimilar or similar training complexes. RESULTS: We present a systematic study to investigate how the accuracy of classical and machine-learning SFs varies with protein-ligand complex similarities between training and test sets. We considered three types of similarity metrics, based on the comparison of either protein structures, protein sequences or ligand structures. Regardless of the similarity metric, we found that incorporating a larger proportion of similar complexes to the training set did not make classical SFs more accurate. In contrast, RF-Score-v3 was able to outperform X-Score even when trained on just 32% of the most dissimilar complexes, showing that its superior performance owes considerably to learning from dissimilar training complexes to those in the test set. In addition, we generated the first SF employing Extreme Gradient Boosting (XGBoost), XGB-Score, and observed that it also improves with training set size while outperforming the rest of SFs. Given the continuous growth of training datasets, the development of machine-learning SFs has become very appealing. AVAILABILITY AND IMPLEMENTATION: https://github.com/HongjianLi/MLSF. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Machine Learning, Ligands, Protein Binding, Proteins
7.
Drug Discov Today Technol ; 32-33: 81-87, 2019 Dec.
Article in English | MEDLINE | ID: mdl-33386098

ABSTRACT

Interest in docking technologies has grown in parallel with the ever-increasing number and diversity of 3D models for macromolecular therapeutic targets. Structure-Based Virtual Screening (SBVS) aims at leveraging these experimental structures to discover the necessary starting points for the drug discovery process. It is now established that Machine Learning (ML) can strongly enhance the predictive accuracy of scoring functions for SBVS by exploiting large datasets from targets, molecules and their associations. However, with greater choice, the question of which ML-based scoring function is the most suitable for prospective use on a given target has gained importance. Here we analyse two approaches to select an existing scoring function for the target along with a third approach consisting of generating a scoring function tailored to the target. These analyses required discussing the limitations of popular SBVS benchmarks, the alternatives to benchmark scoring functions for SBVS and how to generate them or use them with freely-available software.


Subjects
Drug Discovery, Machine Learning, Pharmaceutical Preparations/chemistry, Structure-Activity Relationship, Humans
8.
Nucleic Acids Res ; 44(W1): W436-41, 2016 Jul 08.
Article in English | MEDLINE | ID: mdl-27106057

ABSTRACT

Ligand-based Virtual Screening (VS) methods aim at identifying molecules with a similar activity profile across phenotypic and macromolecular targets to that of a query molecule used as search template. VS using 3D similarity methods has the advantage of biasing this search toward active molecules with innovative chemical scaffolds, which are highly sought after in drug design to provide novel leads with improved properties over the query molecule (e.g. patentable, of lower toxicity or increased potency). Ultrafast Shape Recognition (USR) has demonstrated excellent performance in the discovery of molecules with previously-unknown phenotypic or target activity, with retrospective studies suggesting that its pharmacophoric extension (USRCAT) should obtain even better hit rates once it is used prospectively. Here we present USR-VS (http://usr.marseille.inserm.fr/), the first web server using these two validated ligand-based 3D methods for large-scale prospective VS. In about 2 s, 93.9 million 3D conformers, expanded from 23.1 million purchasable molecules, are screened and the 100 most similar molecules among them in terms of 3D shape and pharmacophoric properties are shown. USR-VS functionality also provides interactive visualization of the similarity of the query molecule against the hit molecules as well as vendor information to purchase selected hits in order to be experimentally tested.


Subjects
Preclinical Drug Evaluation/methods, Internet, Pharmaceutical Preparations/analysis, Pharmaceutical Preparations/chemistry, Software, Drug Design, Fluspirilene/chemistry, Indoles/chemistry, Ligands, Reproducibility of Results, Sulfonamides/chemistry, Vemurafenib
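
The abstract does not spell out how USR compares shapes, so the following is only a sketch of the commonly described formulation: each conformer is summarised by 12 numbers (three moments of the atomic distance distributions to four reference points), and similarity is the inverse of one plus the mean absolute descriptor difference. The moment definitions used here (mean, standard deviation, cube root of the third central moment) are an assumption, since implementations differ, and the coordinates are made-up toy values. This is not the USR-VS server code, and it omits the pharmacophoric extension (USRCAT).

```python
import math


def _moments(dists):
    """Mean, standard deviation and cube root of the third central moment."""
    n = len(dists)
    mean = sum(dists) / n
    var = sum((d - mean) ** 2 for d in dists) / n
    third = sum((d - mean) ** 3 for d in dists) / n
    skew = math.copysign(abs(third) ** (1.0 / 3.0), third)
    return [mean, math.sqrt(var), skew]


def usr_descriptor(coords):
    """12-number shape descriptor from a list of (x, y, z) atom coordinates."""
    n = len(coords)
    ctd = tuple(sum(c[k] for c in coords) / n for k in range(3))  # centroid
    d_ctd = [math.dist(c, ctd) for c in coords]
    cst = coords[d_ctd.index(min(d_ctd))]   # atom closest to the centroid
    fct = coords[d_ctd.index(max(d_ctd))]   # atom farthest from the centroid
    d_fct = [math.dist(c, fct) for c in coords]
    ftf = coords[d_fct.index(max(d_fct))]   # atom farthest from fct
    desc = []
    for ref in (ctd, cst, fct, ftf):
        desc.extend(_moments([math.dist(c, ref) for c in coords]))
    return desc


def usr_similarity(a, b):
    """Similarity in (0, 1]; 1 means identical descriptors."""
    return 1.0 / (1.0 + sum(abs(x - y) for x, y in zip(a, b)) / len(a))


# Hypothetical toy conformers (coordinates in angstroms, illustrative only).
mol_a = [(0.0, 0.0, 0.0), (1.4, 0.2, 0.0), (2.1, 1.3, 0.4),
         (3.3, 1.1, 1.2), (0.3, -1.2, 0.5)]
mol_b = [(x + 5.0, y - 2.0, z + 3.0) for x, y, z in mol_a]  # translated copy
mol_c = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0),
         (4.5, 0.0, 0.0), (6.0, 0.0, 0.0)]                  # linear chain

sim_ab = usr_similarity(usr_descriptor(mol_a), usr_descriptor(mol_b))
sim_ac = usr_similarity(usr_descriptor(mol_a), usr_descriptor(mol_c))
print(f"a vs translated a: {sim_ab:.4f}, a vs linear chain: {sim_ac:.4f}")
```

Because the descriptor depends only on interatomic distances, no alignment of the two molecules is needed, which is consistent with the screening speed reported in the abstract.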
9.
BMC Bioinformatics ; 17(Suppl 11): 308, 2016 Sep 22.
Article in English | MEDLINE | ID: mdl-28185549

ABSTRACT

BACKGROUND: Pose generation error is usually quantified as the difference between the geometry of the pose generated by the docking software and that of the same molecule co-crystallised with the considered protein. Surprisingly, the impact of this error on binding affinity prediction is yet to be systematically analysed across diverse protein-ligand complexes. RESULTS: Against commonly-held views, we have found that pose generation error generally has a small impact on the accuracy of binding affinity prediction. This is also true for large pose generation errors and it is not only observed with machine-learning scoring functions, but also with classical scoring functions such as AutoDock Vina. Furthermore, we propose a procedure to correct a substantial part of this error which consists of calibrating the scoring functions with re-docked, rather than co-crystallised, poses. In this way, the relationship between Vina-generated protein-ligand poses and their binding affinities is directly learned. As a result, test set performance after this error-correcting procedure is much closer to that of predicting the binding affinity in the absence of pose generation error (i.e. on crystal structures). We evaluated several strategies, obtaining better results for those using a single docked pose per ligand than those using multiple docked poses per ligand. CONCLUSIONS: Binding affinity prediction is often carried out on the docked pose of a known binder rather than its co-crystallised pose. Our results suggest that pose generation error is in general far less damaging for binding affinity prediction than is currently believed. Another contribution of our study is the proposal of a procedure that largely corrects for this error. The resulting machine-learning scoring function is freely available at http://istar.cse.cuhk.edu.hk/rf-score-4.tgz and http://ballester.marseille.inserm.fr/rf-score-4.tgz .


Subjects
Molecular Docking Simulation/standards, Nuclear Proteins/metabolism, Pyrazines/metabolism, Software, Transcription Factors/metabolism, Humans, Ligands, Nuclear Proteins/chemistry, Protein Binding, Protein Conformation, Pyrazines/chemistry, Transcription Factors/chemistry
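
The geometric difference between a docked and a co-crystallised pose is typically reported as a root-mean-square deviation (RMSD) over matched atoms. The sketch below assumes that both poses list the same atoms in the same order and share the receptor's coordinate frame, so no superposition is applied; it also ignores molecular symmetry. The function name `pose_rmsd` and the coordinates are hypothetical, and the abstract does not state that this exact calculation was used.

```python
import math


def pose_rmsd(docked, crystal):
    """RMSD in the receptor frame between two poses of the same ligand.

    Both arguments are lists of (x, y, z) tuples with matching atom order.
    """
    if not docked or len(docked) != len(crystal):
        raise ValueError("poses must be non-empty and have the same atom count")
    squared = sum((a - b) ** 2
                  for p, q in zip(docked, crystal)
                  for a, b in zip(p, q))
    return math.sqrt(squared / len(docked))


# Hypothetical three-atom ligand (coordinates in angstroms, illustrative only).
crystal_pose = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
docked_pose = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0), (0.0, 1.0, 2.0)]  # shifted 2 A in z

rmsd_value = pose_rmsd(docked_pose, crystal_pose)
print(f"pose generation error (RMSD): {rmsd_value:.2f} A")
```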
10.
Molecules ; 20(6): 10947-62, 2015 Jun 12.
Article in English | MEDLINE | ID: mdl-26076113

ABSTRACT

Docking scoring functions can be used to predict the strength of protein-ligand binding. It is widely believed that training a scoring function with low-quality data is detrimental for its predictive performance. Nevertheless, there is a surprising lack of systematic validation experiments in support of this hypothesis. In this study, we investigated to what extent training a scoring function with data containing low-quality structural and binding data is detrimental for predictive performance. We actually found that low-quality data is not only non-detrimental, but beneficial for the predictive performance of machine-learning scoring functions, though the improvement is less important than that coming from high-quality data. Furthermore, we observed that classical scoring functions are not able to effectively exploit data beyond an early threshold, regardless of its quality. This demonstrates that exploiting a larger data volume is more important for the performance of machine-learning scoring functions than restricting to a smaller set of higher data quality.


Subjects
Theoretical Models, Structure-Activity Relationship
11.
BMC Bioinformatics ; 15: 291, 2014 Aug 27.
Article in English | MEDLINE | ID: mdl-25159129

ABSTRACT

BACKGROUND: State-of-the-art protein-ligand docking methods are generally limited by the traditionally low accuracy of their scoring functions, which are used to predict binding affinity and thus vital for discriminating between active and inactive compounds. Despite intensive research over the years, classical scoring functions have reached a plateau in their predictive performance. These assume a predetermined additive functional form for some sophisticated numerical features, and use standard multivariate linear regression (MLR) on experimental data to derive the coefficients. RESULTS: In this study we show that such a simple functional form is detrimental for the prediction performance of a scoring function, and replacing linear regression by machine learning techniques like random forest (RF) can improve prediction performance. We investigate the conditions of applying RF under various contexts and find that given sufficient training samples RF manages to comprehensively capture the non-linearity between structural features and measured binding affinities. Incorporating more structural features and training with more samples can both boost RF performance. In addition, we analyze the importance of structural features to binding affinity prediction using the RF variable importance tool. Lastly, we use Cyscore, a top performing empirical scoring function, as a baseline for comparison study. CONCLUSIONS: Machine-learning scoring functions are fundamentally different from classical scoring functions because the former circumvent the fixed functional form relating structural features with binding affinities. RF, but not MLR, can effectively exploit more structural features and more training samples, leading to higher prediction performance. The future availability of more X-ray crystal structures will further widen the performance gap between RF-based and MLR-based scoring functions. This further stresses the importance of substituting RF for MLR in scoring function development.


Subjects
Artificial Intelligence, Computational Biology/methods, Proteins/metabolism, Ligands, Linear Models, Protein Binding
12.
J Chem Inf Model ; 54(3): 944-55, 2014 Mar 24.
Article in English | MEDLINE | ID: mdl-24528282

ABSTRACT

Predicting the binding affinities of large sets of diverse molecules against a range of macromolecular targets is an extremely challenging task. The scoring functions that attempt such computational prediction are essential for exploiting and analyzing the outputs of docking, which is in turn an important tool in problems such as structure-based drug design. Classical scoring functions assume a predetermined theory-inspired functional form for the relationship between the variables that describe an experimentally determined or modeled structure of a protein-ligand complex and its binding affinity. The inherent problem of this approach is in the difficulty of explicitly modeling the various contributions of intermolecular interactions to binding affinity. New scoring functions based on machine-learning regression models, which are able to exploit effectively much larger amounts of experimental data and circumvent the need for a predetermined functional form, have already been shown to outperform a broad range of state-of-the-art scoring functions in a widely used benchmark. Here, we investigate the impact of the chemical description of the complex on the predictive power of the resulting scoring function using a systematic battery of numerical experiments. The latter resulted in the most accurate scoring function to date on the benchmark. Strikingly, we also found that a more precise chemical description of the protein-ligand complex does not generally lead to a more accurate prediction of binding affinity. We discuss four factors that may contribute to this result: modeling assumptions, codependence of representation and regression, data restricted to the bound state, and conformational heterogeneity in data.


Subjects
Ligands, Proteins/metabolism, Artificial Intelligence, Computational Biology, Protein Databases, Biological Models, Molecular Models, Protein Binding, Protein Conformation, Proteins/chemistry
13.
J Comput Aided Mol Des ; 28(2): 89-97, 2014 Feb.
Article in English | MEDLINE | ID: mdl-24554192

ABSTRACT

The p53 protein, known as the guardian of the genome, is mutated or deleted in approximately 50% of human tumors. In the rest of the cancers, p53 is expressed in its wild-type form, but its function is inhibited by direct binding with the murine double minute 2 (MDM2) protein. Therefore, inhibition of the p53-MDM2 interaction, leading to the activation of the tumor suppressor p53 protein, presents a fundamentally novel therapeutic strategy against several types of cancers. The present study utilized ultrafast shape recognition (USR), a virtual screening technique based on ligand-receptor 3D shape complementarity, to screen the DrugBank database for novel p53-MDM2 inhibitors. Specifically, using the 3D shape of one of the most potent crystal ligands of MDM2, MI-63, as the query molecule, six compounds were identified as potential p53-MDM2 inhibitors. These six USR hits were then subjected to molecular modeling investigations through flexible receptor docking followed by comparative binding energy analysis. These studies suggested a potential role of the USR-selected molecules as p53-MDM2 inhibitors. This was further supported by experimental tests showing that the treatment of human colon tumor cells with the top USR hit, telmisartan, led to a dose-dependent cell growth inhibition in a p53-dependent manner. It is noteworthy that telmisartan has a long history of safe human use as an approved anti-hypertension drug and thus may present an immediate clinical potential as a cancer therapeutic. Furthermore, it could also serve as a structurally-novel lead molecule for the development of more potent, small-molecule p53-MDM2 inhibitors against a variety of cancers. Importantly, the present study demonstrates that the adopted USR-based virtual screening protocol is a useful tool for hit identification in the domain of small molecule p53-MDM2 inhibitors.


Subjects
Antitumor Drug Screening Assays/methods, Computer-Assisted Image Processing/methods, Proto-Oncogene Proteins c-mdm2/antagonists & inhibitors, Tumor Suppressor Protein p53/antagonists & inhibitors, Benzimidazoles/chemistry, Benzimidazoles/pharmacology, Benzoates/chemistry, Benzoates/pharmacology, Tumor Cell Line/drug effects, Cell Proliferation/drug effects, Factual Databases, Drug Dose-Response Relationship, Humans, Imidazoles/chemistry, Imidazoles/pharmacology, Ligands, Molecular Models, Molecular Docking Simulation, Piperazines/chemistry, Piperazines/pharmacology, Prospective Studies, Proto-Oncogene Proteins c-mdm2/chemistry, Telmisartan, Tumor Suppressor Protein p53/chemistry
14.
Health Data Sci ; 4: 0108, 2024.
Article in English | MEDLINE | ID: mdl-38486621

ABSTRACT

Background: Gemcitabine is a first-line chemotherapy for pancreatic adenocarcinoma (PAAD), but many PAAD patients do not respond to gemcitabine-containing treatments. Being able to predict such nonresponders would hence permit the undelayed administration of more promising treatments while sparing those patients the life-threatening side effects of gemcitabine. Unfortunately, the few predictors of PAAD patient response to this drug are weak, none of them yet exploiting the power of machine learning (ML). Methods: Here, we applied ML to predict the response of PAAD patients to gemcitabine from the molecular profiles of their tumors. More concretely, we collected diverse molecular profiles of PAAD patient tumors along with the corresponding clinical data (gemcitabine responses and clinical features) from the Genomic Data Commons resource. From systematically combining 8 tumor profiles with 16 classification algorithms, each of the resulting 128 ML models was evaluated by multiple 10-fold cross-validations. Results: Only 7 of these 128 models were predictive, which underlines the importance of carrying out such a large-scale analysis to avoid missing the most predictive models. These were here random forest using 4 selected mRNAs [0.44 Matthews correlation coefficient (MCC), 0.785 receiver operating characteristic-area under the curve (ROC-AUC)] and XGBoost combining 12 DNA methylation probes (0.32 MCC, 0.697 ROC-AUC). By contrast, the hENT1 marker obtained much worse random-level performance (practically 0 MCC, 0.5 ROC-AUC). Despite not being trained to predict prognosis (overall and progression-free survival), these ML models were also able to anticipate this patient outcome. Conclusions: We release these promising ML models so that they can be evaluated prospectively on other gemcitabine-treated PAAD patients.
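
Two quantities in this abstract lend themselves to a short worked sketch: the size of the model grid (8 profiles combined with 16 algorithms gives 128 models) and the Matthews correlation coefficient used to score each model. The profile and algorithm names below are placeholders, the confusion-matrix counts are invented for illustration, and the helper `mcc_from_counts` is hypothetical; none of this reproduces the study's models or results.

```python
import itertools
import math


def mcc_from_counts(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # conventional value when any marginal total is zero
    return (tp * tn - fp * fn) / denom


# Placeholder names: the abstract only gives the counts (8 and 16).
profiles = [f"profile_{i}" for i in range(1, 9)]
algorithms = [f"algorithm_{j}" for j in range(1, 17)]
model_grid = list(itertools.product(profiles, algorithms))
print(f"models to evaluate: {len(model_grid)}")

# Invented confusion matrices for two hypothetical classifiers.
random_like = mcc_from_counts(tp=5, tn=5, fp=5, fn=5)  # no better than chance
moderate = mcc_from_counts(tp=6, tn=7, fp=3, fn=4)
print(f"random-like MCC: {random_like:.2f}, moderate MCC: {moderate:.2f}")
```

An MCC of 0 corresponds to random-level prediction and 1 to perfect prediction, which is the scale on which the abstract contrasts the ML models with the hENT1 marker.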

15.
J Cheminform ; 16(1): 40, 2024 Apr 07.
Article in English | MEDLINE | ID: mdl-38582911

ABSTRACT

Poly ADP-ribose polymerase 1 (PARP1) is an attractive therapeutic target for cancer treatment. Machine-learning scoring functions constitute a promising approach to discovering novel PARP1 inhibitors. Cutting-edge PARP1-specific machine-learning scoring functions were investigated using semi-synthetic training data from docking activity-labelled molecules: known PARP1 inhibitors, hard-to-discriminate decoys property-matched to them with generative graph neural networks and confirmed inactives. We further made test sets harder by including only molecules dissimilar to those in the training set. Comprehensive analysis of these datasets using five supervised learning algorithms, protein-ligand fingerprints extracted from docking poses, and ligand-only features revealed one highly predictive scoring function. This is the PARP1-specific support vector machine-based regressor, when employing PLEC fingerprints, which achieved a high Normalized Enrichment Factor at the top 1% on the hardest test set (NEF1% = 0.588, median of 10 repetitions), and was more predictive than any other investigated scoring function, especially the classical scoring function employed as baseline.
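
The abstract reports NEF1% without giving a formula, so the sketch below rests on an assumed definition: the enrichment factor in the top fraction divided by the maximum enrichment factor attainable for that fraction, which simplifies to the number of actives retrieved in the top fraction over the smaller of the top-list size and the total number of actives. The function name, the rounding rule for the top-list size and the toy library are all hypothetical.

```python
def normalized_enrichment_factor(scores, is_active, fraction=0.01,
                                 higher_is_better=True):
    """NEF at a given top fraction, assuming NEF = EF / EF_max.

    EF     = (hits_top / n_top) / (n_actives / n_total)
    EF_max = (min(n_top, n_actives) / n_top) / (n_actives / n_total)
    so NEF = hits_top / min(n_top, n_actives), which lies between 0 and 1.
    """
    n_total = len(scores)
    n_actives = sum(1 for flag in is_active if flag)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need at least one molecule and one active")
    n_top = max(1, round(fraction * n_total))  # assumed rounding rule
    order = sorted(range(n_total), key=lambda i: scores[i],
                   reverse=higher_is_better)
    hits_top = sum(1 for i in order[:n_top] if is_active[i])
    return hits_top / min(n_top, n_actives)


# Hypothetical library: 200 molecules, 10 actives, molecule 0 scored highest.
toy_scores = [float(200 - i) for i in range(200)]
toy_active_ids = {0, 7, 15, 30, 42, 77, 101, 150, 180, 199}
toy_labels = [i in toy_active_ids for i in range(200)]

nef_1pct = normalized_enrichment_factor(toy_scores, toy_labels, fraction=0.01)
print(f"NEF1% on the toy library: {nef_1pct:.3f}")
```

Normalising by the maximum attainable value makes results comparable across test sets with different active-to-inactive ratios, which a raw enrichment factor is not.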

16.
J Adv Res ; 2024 Jan 26.
Article in English | MEDLINE | ID: mdl-38280715

ABSTRACT

INTRODUCTION: Small-molecule Programmed Cell Death Protein 1/Programmed Death-Ligand 1 (PD1/PDL1) inhibition via PDL1 dimerization has the potential to lead to inexpensive drugs with better cancer patient outcomes and milder side effects. However, this therapeutic approach has proven challenging, with only one PDL1 dimerizer reaching early clinical trials so far. There is hence a need for fast and accurate methods to develop alternative PDL1 dimerizers. OBJECTIVES: We aim to show that structure-based virtual screening (SBVS) based on PDL1-specific machine-learning (ML) scoring functions (SFs) is a powerful drug design tool for detecting PD1/PDL1 inhibitors via PDL1 dimerization. METHODS: By incorporating the latest MLSF advances, we generated and evaluated PDL1-specific MLSFs (classifiers and inactive-enriched regressors) on two demanding test sets. RESULTS: 60 PDL1-specific MLSFs (30 classifiers and 30 regressors) were generated. Our large-scale analysis provides highly predictive PDL1-specific MLSFs that benefitted from training with large volumes of docked inactives and enabling inactive-enriched regression. CONCLUSION: PDL1-specific MLSFs strongly outperformed generic SFs of various types on this target and are released here without restrictions.

17.
Biomolecules ; 13(3)2023 03 08.
Article in English | MEDLINE | ID: mdl-36979433

ABSTRACT

Machine learning-based models have been widely used in the early drug-design pipeline. To validate these models, cross-validation strategies have been employed, including those using clustering of molecules in terms of their chemical structures. However, the poor clustering of compounds will compromise such validation, especially on test molecules dissimilar to those in the training set. This study aims at finding the best way to cluster the molecules screened by the National Cancer Institute (NCI)-60 project by comparing hierarchical, Taylor-Butina, and uniform manifold approximation and projection (UMAP) clustering methods. The best-performing algorithm can then be used to generate clusters for model validation strategies. This study also aims at measuring the impact of removing outlier molecules prior to the clustering step. Clustering results are evaluated using three well-known clustering quality metrics. In addition, we compute an average similarity matrix to assess the quality of each cluster. The results show variation in clustering quality from method to method. The clusters obtained by the hierarchical and Taylor-Butina methods are more computationally expensive to use in cross-validation strategies, and both cluster the molecules poorly. In contrast, the UMAP method provides the best quality, and therefore we recommend it to analyze this highly valuable dataset.


Subjects
Algorithms, Machine Learning, United States, National Cancer Institute (U.S.), Cluster Analysis, Drug Design
18.
Nat Protoc ; 18(11): 3460-3511, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37845361

ABSTRACT

Structure-based virtual screening (SBVS) via docking has been used to discover active molecules for a range of therapeutic targets. Chemical and protein data sets that contain integrated bioactivity information have increased both in number and in size. Artificial intelligence and, more concretely, its machine-learning (ML) branch, including deep learning, have effectively exploited these data sets to build scoring functions (SFs) for SBVS against targets with an atomic-resolution 3D model (e.g., generated by X-ray crystallography or predicted by AlphaFold2). Often outperforming their generic and non-ML counterparts, target-specific ML-based SFs represent the state of the art for SBVS. Here, we present a comprehensive and user-friendly protocol to build and rigorously evaluate these new SFs for SBVS. This protocol is organized into four sections: (i) using a public benchmark of a given target to evaluate an existing generic SF; (ii) preparing experimental data for a target from public repositories; (iii) partitioning data into a training set and a test set for subsequent target-specific ML modeling; and (iv) generating and evaluating target-specific ML SFs by using the prepared training-test partitions. All necessary code and input/output data related to three example targets (acetylcholinesterase, HMG-CoA reductase, and peroxisome proliferator-activated receptor-α) are available at https://github.com/vktrannguyen/MLSF-protocol , can be run by using a single computer within 1 week and make use of easily accessible software/programs (e.g., Smina, CNN-Score, RF-Score-VS and DeepCoy) and web resources. Our aim is to provide practical guidance on how to augment training data to enhance SBVS performance, how to identify the most suitable supervised learning algorithm for a data set, and how to build an SF with the highest likelihood of discovering target-active molecules within a given compound library.


Subjects
Acetylcholinesterase, Artificial Intelligence, Ligands, Machine Learning, Algorithms, Molecular Docking Simulation
19.
Biomater Sci ; 11(17): 5797-5808, 2023 Aug 22.
Article in English | MEDLINE | ID: mdl-37401742

ABSTRACT

The delivery of genetic material (DNA and RNA) to cells can cure a wide range of diseases but is limited by the delivery efficiency of the carrier system. Poly β-amino esters (pBAEs) are promising polymer-based vectors that form polyplexes with negatively charged oligonucleotides, enabling cell membrane uptake and gene delivery. pBAE backbone polymer chemistry, as well as terminal oligopeptide modifications, define cellular uptake and transfection efficiency in a given cell line, along with nanoparticle size and polydispersity. Moreover, uptake and transfection efficiency of a given polyplex formulation also vary from cell type to cell type. Therefore, finding the optimal formulation leading to high uptake in a new cell line is dictated by trial and error, and requires time and resources. Machine learning (ML) is an ideal in silico screening tool to learn the non-linearities of complex data sets, like the one presented herein, with the aim of predicting cellular internalisation of pBAE polyplexes. A library of pBAE nanoparticles was fabricated and the uptake studied in 4 different cell lines, on which various ML models were successfully trained. The best performing models were found to be gradient-boosted trees and neural networks. The gradient-boosted trees model was then analysed using SHapley Additive exPlanations, to interpret the model and gain an understanding into the important features and their impact on the predicted outcome.


Subjects
Nanoparticles, Polymers, Transfection, DNA, Gene Transfer Techniques, Cell Line
20.
Curr Res Struct Biol ; 4: 206-210, 2022.
Article in English | MEDLINE | ID: mdl-35769111

ABSTRACT

The interaction between PD1 and its ligand PDL1 has been shown to render tumor cells resistant to apoptosis and promote tumor progression. An innovative mechanism to inhibit the PD1/PDL1 interaction is PDL1 dimerization induced by small-molecule PDL1 binders. Structure-based virtual screening is a promising approach to discovering such small-molecule PD1/PDL1 inhibitors. Here we investigate which type of generic scoring functions is most suitable to tackle this problem. We consider CNN-Score, an ensemble of convolutional neural networks, as the representative of machine-learning scoring functions. We also evaluate Smina, a commonly used classical scoring function, and IFP, a top structural fingerprint similarity scoring function. These three types of scoring functions were evaluated on two test sets sharing the same set of small-molecule PD1/PDL1 inhibitors, but using different types of inactives: either true inactives (molecules with no in vitro PD1/PDL1 inhibition activity) or assumed inactives (property-matched decoy molecules generated from each active). On both test sets, CNN-Score performed much better than Smina, which in turn strongly outperformed IFP. The fact that the latter was the case, despite precluding any possibility of exploiting decoy bias, demonstrates the predictive value of CNN-Score for PDL1. These results suggest that re-scoring Smina-docked molecules with CNN-Score is a promising structure-based virtual screening method to discover new small-molecule inhibitors of this therapeutic target.
