Search | BVS - Ministry of Health

A Small Step Toward Generalizability: Training a Machine Learning Scoring Function for Structure-Based Virtual Screening.

Scantlebury, Jack; Vost, Lucy; Carbery, Anna; Hadfield, Thomas E; Turnbull, Oliver M; Brown, Nathan; Chenthamarakshan, Vijil; Das, Payel; Grosjean, Harold; von Delft, Frank; Deane, Charlotte M.

J Chem Inf Model ; 63(10): 2960-2974, 2023 05 22.

Article in English | MEDLINE | ID: mdl-37166179

ABSTRACT

Over the past few years, many machine learning-based scoring functions for predicting the binding of small molecules to proteins have been developed. Their objective is to approximate the distribution which takes two molecules as input and outputs the energy of their interaction. Only a scoring function that accounts for the interatomic interactions involved in binding can accurately predict binding affinity on unseen molecules. However, many scoring functions make predictions based on data set biases rather than an understanding of the physics of binding. These scoring functions perform well when tested on targets similar to those in the training set but fail to generalize to dissimilar targets. To test what a machine learning-based scoring function has learned, one can apply input attribution, a technique for identifying which features are important to a model when it makes a prediction on a particular data point. If a model successfully learns something beyond data set biases, attribution should give insight into the important binding interactions that are taking place. We built a machine learning-based scoring function, PointVS, that aimed to avoid the influence of bias via thorough train and test data set filtering, and show that it achieves performance comparable to other leading methods on the Comparative Assessment of Scoring Functions 2016 (CASF-2016) benchmark. We then use the CASF-2016 test set to perform attribution and find that the bonds identified as important by PointVS, unlike those extracted from other scoring functions, have a high correlation with those found by a distance-based interaction profiler. We then show that attribution can be used to extract important binding pharmacophores from a given protein target when supplied with a number of bound structures. We use this information to perform fragment elaboration and see improvements in docking scores compared to using structural information from a traditional, data-based approach. This not only provides definitive proof that the scoring function has learned to identify some important binding interactions but also constitutes the first deep learning-based method for extracting structural information from a target for molecule design.
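
As a rough illustration of the input attribution idea described in the abstract, the sketch below uses occlusion: each ligand atom is removed in turn and the resulting drop in score is taken as that atom's importance. This is not the attribution method used for PointVS, and `toy_score` is a hypothetical stand-in scoring function (a simple distance-weighted contact count) introduced only so the example runs on its own.

```python
import math


def toy_score(ligand_atoms, protein_atoms, cutoff=4.0):
    """Hypothetical scoring function (an assumption, not PointVS).

    Each ligand-protein atom pair closer than `cutoff` angstroms
    contributes (cutoff - distance) to the score.
    """
    total = 0.0
    for lig in ligand_atoms:
        for prot in protein_atoms:
            distance = math.dist(lig, prot)
            if distance < cutoff:
                total += cutoff - distance
    return total


def occlusion_attribution(score_fn, ligand_atoms, protein_atoms):
    """Return one importance value per ligand atom.

    The importance of atom i is the score of the full complex minus
    the score obtained with atom i removed from the ligand.
    """
    baseline = score_fn(ligand_atoms, protein_atoms)
    importances = []
    for i in range(len(ligand_atoms)):
        masked = ligand_atoms[:i] + ligand_atoms[i + 1:]
        importances.append(baseline - score_fn(masked, protein_atoms))
    return importances


# Toy complex: the first ligand atom sits 2 angstroms from the only
# protein atom; the second ligand atom is far away from it.
ligand = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
protein = [(0.0, 0.0, 2.0)]
attributions = occlusion_attribution(toy_score, ligand, protein)
```

In this toy complex only the first ligand atom contacts the protein, so it receives all of the attribution. Comparing such per-atom values with the contacts reported by a distance-based interaction profiler is the kind of check the abstract describes.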

Subjects

Machine Learning , Proteins , Protein Binding , Ligands , Proteins/chemistry , Databases, Protein , Molecular Docking Simulation

SAMPL7 protein-ligand challenge: A community-wide evaluation of computational methods against fragment screening and pose-prediction.

Grosjean, Harold; Isik, Mehtap; Aimon, Anthony; Mobley, David; Chodera, John; von Delft, Frank; Biggin, Philip C.

J Comput Aided Mol Des ; 36(4): 291-311, 2022 04.

Article in English | MEDLINE | ID: mdl-35426591

ABSTRACT

A novel crystallographic fragment screening data set was generated and used in the SAMPL7 protein-ligand challenge. The SAMPL challenges prospectively assess the predictive power of methods involved in computer-aided drug design. Computational methods are now widely applied to fragment molecules in the search for new drugs. However, there is little in the way of systematic validation specifically for fragment-based approaches. We have performed a large crystallographic high-throughput fragment screen against the therapeutically relevant second bromodomain of the Pleckstrin-homology domain interacting protein (PHIP2) that revealed 52 different fragments bound across 4 distinct sites, 47 of which were bound to the pharmacologically relevant acetylated lysine (Kac) binding site. These data were used to assess computational screening, binding pose prediction and follow-up enumeration. In screening, all submissions performed no better than random. Pose prediction success rates (defined as less than 2 Å root-mean-square deviation (RMSD) against heavy atom crystal positions) ranged between 0 and 25%, and only very few follow-up compounds were deemed viable candidates from a medicinal-chemistry perspective based on a common molecular descriptors analysis. The tight deadlines imposed during the challenge led to a small number of submissions, suggesting that the accuracy of rapidly responsive workflows remains limited. In addition, the application of these methods to reproduce crystallographic fragment data still appears to be very challenging. The results show that there is room for improvement in the development of computational tools, particularly when applied to fragment-based drug design.
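
The pose prediction success criterion stated in the abstract (heavy-atom RMSD below 2 Å against the crystal pose) can be sketched as follows. This is a minimal illustration, not the challenge's evaluation code: it assumes that both poses are in the same coordinate frame, that atoms are listed in matching order, and that no symmetry correction is needed. The function names are hypothetical.

```python
import math


def heavy_atom_rmsd(predicted, reference):
    """RMSD in angstroms between two poses.

    Each pose is a list of (x, y, z) heavy-atom coordinates, with atoms
    in the same order in both lists (an assumption of this sketch).
    """
    if not predicted or len(predicted) != len(reference):
        raise ValueError("poses must be non-empty and have equal atom counts")
    squared_sum = sum(
        (p - r) ** 2
        for atom_pred, atom_ref in zip(predicted, reference)
        for p, r in zip(atom_pred, atom_ref)
    )
    return math.sqrt(squared_sum / len(predicted))


def pose_success_rate(pose_pairs, cutoff=2.0):
    """Fraction of (predicted, reference) pairs with RMSD below `cutoff`."""
    hits = sum(
        1 for pred, ref in pose_pairs if heavy_atom_rmsd(pred, ref) < cutoff
    )
    return hits / len(pose_pairs)


# Two toy predictions for the same two-atom crystal pose: one shifted by
# 1 angstrom along z (a success), one shifted by 3 angstroms (a failure).
crystal = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
near = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
far = [(0.0, 0.0, 3.0), (1.0, 0.0, 3.0)]
rate = pose_success_rate([(near, crystal), (far, crystal)])
```

Here one of the two toy predictions falls under the 2 Å cutoff, giving a success rate of 0.5. Real evaluations typically also account for symmetry-equivalent atom orderings, which this sketch omits.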

Subjects

Drug Design , Proteins , Binding Sites , Ligands , Protein Binding , Proteins/chemistry
