1.
J Chem Inf Model ; 64(12): 4651-4660, 2024 Jun 24.
Article in English | MEDLINE | ID: mdl-38847393

ABSTRACT

We present a novel and interpretable approach for assessing small-molecule binding using context explanation networks. Given the specific structure of a protein/ligand complex, our CENsible scoring function uses a deep convolutional neural network to predict the contributions of precalculated terms to the overall binding affinity. We show that CENsible can effectively distinguish active vs inactive compounds for many systems. Its primary benefit over related machine-learning scoring functions, however, is that it retains interpretability, allowing researchers to identify the contribution of each precalculated term to the final affinity prediction, with implications for subsequent lead optimization.
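
The abstract describes an affinity prediction that decomposes into per-term contributions. A minimal sketch of that idea, assuming the network outputs one weight per precalculated term and the affinity is the weighted sum; the term names and numbers are hypothetical placeholders, not CENsible's actual terms or outputs:

```python
def explainable_affinity(term_values: dict[str, float],
                         term_weights: dict[str, float]) -> tuple[float, dict[str, float]]:
    """Return the predicted affinity and each term's contribution (weight * value)."""
    contributions = {name: term_weights[name] * value
                     for name, value in term_values.items()}
    return sum(contributions.values()), contributions


# Hypothetical precalculated terms for one protein/ligand complex.
terms = {"term_1": 2.0, "term_2": 0.5, "term_3": 1.5}
# Hypothetical context-dependent weights; in a context explanation network these
# would be predicted by the CNN from the structure of the complex.
weights = {"term_1": 0.25, "term_2": 1.0, "term_3": 0.5}

affinity, parts = explainable_affinity(terms, weights)
print(affinity)  # 1.75
print(parts)     # {'term_1': 0.5, 'term_2': 0.5, 'term_3': 0.75}
```

Because the prediction is a plain sum, each entry of `parts` can be read directly as that term's share of the final score, which is the interpretability property the abstract emphasizes.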


Subject(s)
Neural Networks, Computer; Protein Binding; Proteins; Small Molecule Libraries; Ligands; Small Molecule Libraries/chemistry; Small Molecule Libraries/pharmacology; Small Molecule Libraries/metabolism; Proteins/chemistry; Proteins/metabolism; Machine Learning
2.
J Chem Inf Model ; 64(7): 2488-2495, 2024 Apr 08.
Article in English | MEDLINE | ID: mdl-38113513

ABSTRACT

Deep learning methods that predict protein-ligand binding have recently been used for structure-based virtual screening. Many such models have been trained using protein-ligand complexes with known crystal structures and activities from the PDBbind data set. However, because PDBbind includes only 20K complexes, models typically fail to generalize to new targets, and their performance is on par with that of models trained with only ligand information. Conversely, the ChEMBL database contains a wealth of chemical activity information but includes no information about binding poses. We introduce BigBind, a data set that maps ChEMBL activity data to proteins from the CrossDocked data set. BigBind comprises 583K ligand activities and includes 3D structures of the protein binding pockets. Additionally, we augmented the data by adding an equal number of putative inactives for each target. Using these data, we developed Banana (basic neural network for binding affinity), a neural network-based model to classify active from inactive compounds, defined by a 10 µM cutoff. Our model achieved an AUC of 0.72 on BigBind's test set, while a ligand-only model achieved an AUC of 0.59. Furthermore, Banana achieved competitive performance on the LIT-PCBA benchmark (median EF1% 1.81) while running 16,000 times faster than molecular docking with Gnina. We suggest that Banana, as well as other models trained on this data set, will significantly improve the outcomes of prospective virtual screening tasks.
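
The active/inactive labeling and AUC evaluation mentioned above can be sketched as follows. This assumes activities are given in µM and that a value at or below the 10 µM cutoff counts as active (the abstract does not say how values exactly at the cutoff are handled); the activities and scores are made-up examples. AUC is computed as the probability that a randomly chosen active outscores a randomly chosen inactive, with ties counted as half.

```python
def label_active(activity_um: float, cutoff_um: float = 10.0) -> int:
    """1 (active) if the activity in µM is at or below the cutoff, else 0 (inactive)."""
    return int(activity_um <= cutoff_um)


def roc_auc(scores: list[float], labels: list[int]) -> float:
    """AUC as the fraction of (active, inactive) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one active and one inactive")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


activities_um = [0.5, 3.0, 25.0, 100.0, 8.0]       # made-up measurements
labels = [label_active(a) for a in activities_um]  # [1, 1, 0, 0, 1]
model_scores = [0.9, 0.4, 0.6, 0.1, 0.7]           # made-up model outputs
print(roc_auc(model_scores, labels))               # 5 of 6 pairs ranked correctly
```

The pairwise loop is quadratic and meant only to make the definition explicit; a rank-based formula gives the same value in O(n log n).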


Subject(s)
Proteins; Ubiquitin-Protein Ligases; Molecular Docking Simulation; Ligands; Prospective Studies; Proteins/chemistry; Protein Binding; Ubiquitin-Protein Ligases/metabolism
3.
Biophys J ; 2023 Dec 15.
Article in English | MEDLINE | ID: mdl-38104241

ABSTRACT

Protein structure predictions from deep learning models like AlphaFold2, despite their remarkable accuracy, are likely insufficient for direct use in downstream tasks like molecular docking. The functionality of such models could be improved with a combination of increased accuracy and physical intuition. We propose a new method to train deep learning protein structure prediction models using molecular dynamics force fields to work toward these goals. Our custom PyTorch loss function, OpenMM-Loss, represents the potential energy of a predicted structure. OpenMM-Loss can be applied to any all-atom representation of a protein structure capable of mapping into our software package, SidechainNet. We demonstrate our method's efficacy by fine-tuning OpenFold. We show that subsequently predicted protein structures, both before and after a relaxation procedure, exhibit comparable accuracy while displaying lower potential energy and improved structural quality as assessed by MolProbity metrics.
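
To illustrate the general idea of using a potential energy as a training loss, here is a toy sketch with a single harmonic bond term standing in for a full force field. It does not use OpenMM or the OpenMM-Loss interface; `harmonic_bond_energy`, `combined_loss`, the force constant, and the `energy_weight` mixing factor are all hypothetical. A real force field adds angle, torsion, and nonbonded terms, and training would also need gradients from an autodiff framework.

```python
import math


def harmonic_bond_energy(coords, bonds, k=1.0):
    """Toy potential: sum of k * (r - r0)**2 over bonds given as (i, j, r0)."""
    energy = 0.0
    for i, j, r0 in bonds:
        r = math.dist(coords[i], coords[j])
        energy += k * (r - r0) ** 2
    return energy


def combined_loss(pred, target, bonds, energy_weight=0.1):
    """Mean squared per-atom coordinate error plus a weighted energy penalty."""
    mse = sum(math.dist(a, b) ** 2 for a, b in zip(pred, target)) / len(pred)
    return mse + energy_weight * harmonic_bond_energy(pred, bonds)


# Two atoms predicted 1.5 apart, with an illustrative ideal bond length of 1.0.
pred = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
target = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
bonds = [(0, 1, 1.0)]
print(harmonic_bond_energy(pred, bonds))   # 0.25
print(combined_loss(pred, target, bonds))  # coordinates match, so only the energy term remains
```

The point of the combined form is that a prediction can match its reference coordinates and still be penalized for physically strained geometry.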

4.
J Chem Inf Model ; 63(21): 6598-6607, 2023 11 13.
Article in English | MEDLINE | ID: mdl-37903507

ABSTRACT

Conformer generation, the assignment of realistic 3D coordinates to a small molecule, is fundamental to structure-based drug design. Conformational ensembles are required for rigid-body matching algorithms, such as shape-based or pharmacophore approaches, and even methods that treat the ligand flexibly, such as docking, depend on the quality of the provided conformations because they do not sample all degrees of freedom (e.g., they sample only torsions). Here, we empirically elucidate some general principles about the size, diversity, and quality of the conformational ensembles needed to get the best performance in common structure-based drug discovery tasks. In many cases, our findings may parallel "common knowledge" well known to practitioners in the field. Nonetheless, we feel that it is valuable to quantify these conformational effects while reproducing and expanding upon previous studies. Specifically, we investigate the performance of a state-of-the-art generative deep learning approach versus a more classical geometry-based approach; the effect of energy minimization as a postprocessing step; the effect of ensemble size (maximum number of conformers) and construction (filtering by root-mean-square deviation for diversity); and how these choices influence the ability to recapitulate bioactive conformations and to perform pharmacophore screening and molecular docking.
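
RMSD-based diversity filtering of an ensemble can be sketched as a greedy pass. This assumes conformers share atom ordering and are already superimposed (no alignment or symmetry handling), and that lower-energy conformers are considered first; the threshold, cap, coordinates, and energies are illustrative, not those used in the study.

```python
import math


def rmsd(a, b):
    """RMSD between two conformers with identical atom order (no superposition)."""
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)) / len(a))


def diverse_subset(conformers, energies, rmsd_threshold=0.5, max_conformers=10):
    """Visit conformers from lowest to highest energy; keep one only if it is more than
    rmsd_threshold away from every conformer kept so far, up to max_conformers."""
    order = sorted(range(len(conformers)), key=lambda i: energies[i])
    kept = []
    for i in order:
        if len(kept) >= max_conformers:
            break
        if all(rmsd(conformers[i], conformers[j]) > rmsd_threshold for j in kept):
            kept.append(i)
    return kept


conformers = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.1, 0.0, 0.0)],  # nearly identical to conformer 0
    [(0.0, 0.0, 0.0), (0.0, 2.0, 0.0)],  # clearly different
]
energies = [1.0, 0.5, 2.0]
print(diverse_subset(conformers, energies))  # [1, 2]
```

Raising `rmsd_threshold` yields smaller, more diverse ensembles, while `max_conformers` caps ensemble size; these are the two construction knobs the abstract refers to.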


Subject(s)
Algorithms; Drug Design; Models, Molecular; Molecular Docking Simulation; Molecular Conformation; Ligands
5.
J Comput Aided Mol Des ; 38(1): 3, 2023 Dec 08.
Article in English | MEDLINE | ID: mdl-38062207

ABSTRACT

Determination of the bound pose of a ligand is a critical first step in many in silico drug discovery tasks. Molecular docking is the main tool for the prediction of non-covalent binding of a protein and ligand system. Molecular docking pipelines often use information from only one ligand binding to the protein, despite the commonly held hypothesis that different ligands share binding interactions when bound to the same receptor. Here we describe Open-ComBind, an easy-to-use, open-source version of the ComBind molecular docking pipeline that leverages information from multiple ligands without known bound structures to enhance pose selection. We first create distributions of feature similarities between ligand pose pairs, comparing near-native poses with all sampled docked poses. These distributions capture the likelihood of observing similar features, such as hydrogen bonds or hydrophobic contacts, in different pose configurations. These similarity distributions are then combined with a per-ligand docking score to enhance overall pose selection by 5% and 4.5% for high-affinity and congeneric series helper ligands, respectively. Open-ComBind reduces the average RMSD of ligands in our benchmark dataset by 9.0%. We provide Open-ComBind as an easy-to-use command-line tool and Python API to increase pose prediction performance at www.github.com/drewnutt/open_combind.


Subject(s)
Drug Design; Proteins; Molecular Docking Simulation; Protein Binding; Ligands; Proteins/chemistry; Binding Sites
6.
J Chem Inf Model ; 62(8): 1819-1829, 2022 04 25.
Article in English | MEDLINE | ID: mdl-35380443

ABSTRACT

The lead optimization phase of drug discovery refines an initial hit molecule for desired properties, especially potency. Synthesis and experimental testing of the small perturbations during this refinement can be quite costly and time-consuming. Relative binding free energy (RBFE, also referred to as ΔΔG) methods allow the estimation of binding free energy changes after small changes to a ligand scaffold. Here, we propose and evaluate a Siamese convolutional neural network (CNN) for the prediction of RBFE between two bound ligands. We show that our multitask loss is able to improve on a previous state-of-the-art Siamese network for RBFE prediction via increased regularization of the latent space. The Siamese network architecture is well suited to the prediction of RBFE in comparison to a standard CNN trained on the same data (Pearson's R of 0.553 and 0.5, respectively). When evaluated on a left-out protein family, our Siamese CNN shows variability in its RBFE predictive performance depending on the protein family being evaluated (Pearson's R ranging from -0.44 to 0.97). RBFE prediction performance can be improved during generalization by injecting only a few examples (few-shot learning) from the evaluation data set during model training.
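
The Siamese idea, one shared encoder applied to both ligands in a pair, can be sketched with a toy linear encoder. Here ΔΔG is taken as the difference of the two encoded outputs, which makes predictions antisymmetric when the ligands are swapped; this is an illustrative assumption, not the paper's CNN architecture or multitask loss, and the weights and feature vectors are hypothetical. A Pearson's R helper is included because the abstract reports that metric.

```python
import math

SHARED_WEIGHTS = [0.5, -1.0, 0.25]  # hypothetical stand-in for learned parameters


def encode(features):
    """Shared tower: the same weights embed every protein-ligand complex."""
    return sum(w * x for w, x in zip(SHARED_WEIGHTS, features))


def predict_ddg(complex_a, complex_b):
    """Siamese-style relative prediction: ddG(A -> B) = f(B) - f(A)."""
    return encode(complex_b) - encode(complex_a)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between predictions and experiment."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


lig_a = [1.0, 2.0, 3.0]  # hypothetical feature vectors for two bound ligands
lig_b = [2.0, 0.0, 1.0]
print(predict_ddg(lig_a, lig_b))  # 2.0
print(predict_ddg(lig_b, lig_a))  # -2.0 (swapping the pair flips the sign)
```

Sharing weights between the two branches is what lets the model compare ligands on a common scale; subtracting the outputs is just one simple way to turn that into a relative prediction.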


Subject(s)
Neural Networks, Computer; Proteins; Drug Discovery; Entropy; Ligands; Proteins/chemistry
7.
Proteins ; 89(11): 1489-1496, 2021 11.
Article in English | MEDLINE | ID: mdl-34213059

ABSTRACT

Despite recent advancements in deep learning methods for protein structure prediction and representation, little focus has been directed at the simultaneous inclusion and prediction of protein backbone and sidechain structure information. We present SidechainNet, a new dataset that directly extends the ProteinNet dataset. SidechainNet includes angle and atomic coordinate information capable of describing all heavy atoms of each protein structure and can be extended by users to include new protein structures as they are released. In this article, we provide background information on the availability of protein structure data and the significance of ProteinNet. Thereafter, we argue for the potentially beneficial inclusion of sidechain information through SidechainNet, describe the process by which we organize SidechainNet, and provide a software package (https://github.com/jonathanking/sidechainnet) for data manipulation and training with machine learning models.


Subject(s)
Amino Acids/chemistry; Machine Learning; Proteins/chemistry; Software; Amino Acid Sequence; Datasets as Topic; Neural Networks, Computer; Protein Conformation
8.
Molecules ; 26(23)2021 Dec 04.
Article in English | MEDLINE | ID: mdl-34885952

ABSTRACT

Virtual screening (predicting which compounds within a specified compound library bind to a target molecule, typically a protein) is a fundamental task in the field of drug discovery. Doing virtual screening well provides tangible practical benefits, including reduced drug development costs, faster time to therapeutic viability, and fewer unforeseen side effects. As with most applied computational tasks, the algorithms currently used to perform virtual screening feature inherent tradeoffs between speed and accuracy. Furthermore, even theoretically rigorous, computationally intensive methods may fail to account for important effects relevant to whether a given compound will ultimately be usable as a drug. Here we investigate the virtual screening performance of the recently released Gnina molecular docking software, which uses deep convolutional networks to score protein-ligand structures. We find, on average, that Gnina outperforms conventional empirical scoring. The default scoring in Gnina outperforms the empirical AutoDock Vina scoring function on 89 of the 117 targets of the DUD-E and LIT-PCBA virtual screening benchmarks, with a median 1% early enrichment factor that is more than twice that of Vina. However, we also find that issues of bias linger in these sets, even when they are not used directly to train models, and this bias obscures the extent to which machine learning models achieve their performance through a sophisticated interpretation of molecular interactions rather than by fitting to non-informative, simplistic property distributions.
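
The 1% early enrichment factor (EF1%) cited above compares the active rate in the top-ranked 1% of a library with the active rate in the whole library. A minimal sketch, assuming higher scores rank better and that the top-fraction size is rounded to the nearest integer (benchmarks differ in how they handle rounding and ties); the library below is made up.

```python
def enrichment_factor(scores, labels, fraction=0.01):
    """EF at `fraction`: active rate in the top-ranked subset / active rate overall.

    Assumes higher scores are better; tied scores keep their input order (stable sort).
    """
    n = len(scores)
    total_actives = sum(labels)
    if total_actives == 0:
        raise ValueError("library contains no actives")
    n_top = max(1, round(n * fraction))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    actives_top = sum(label for _, label in ranked[:n_top])
    return (actives_top / n_top) / (total_actives / n)


# Made-up library: 200 compounds, 10 actives, 2 of them ranked in the top 2 (top 1%).
scores = [float(i) for i in range(200)]
active_ids = {199, 198, 0, 1, 2, 3, 4, 5, 6, 7}
labels = [1 if i in active_ids else 0 for i in range(200)]
print(enrichment_factor(scores, labels))  # 20.0, the maximum possible EF1% here
```

An EF of 1 means the ranking is no better than random at that depth, which is why a median EF1% more than twice Vina's is a meaningful gap.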


Subject(s)
Drug Design; Drug Discovery; Software; Deep Learning; Drug Design/methods; Drug Discovery/methods; Humans; Molecular Docking Simulation
9.
J Chem Educ ; 97(10): 3872-3876, 2020 Oct 13.
Article in English | MEDLINE | ID: mdl-36035779

ABSTRACT

Classroom response systems are an important tool in many active learning pedagogies. They support real-time feedback on student learning and promote student engagement, even in large classrooms, by allowing instructors to solicit an answer to a question from all students and show the results. Existing classroom response systems are general purpose and not tailored to the specific needs of a chemistry classroom. In particular, it is not easy to deploy molecular representations except as static images. Here we present the 3Dmol.js learning environment, a classroom response system that uses the open source web-based 3Dmol.js JavaScript framework to provide interactive viewing and querying of 3D molecules. 3Dmol.js is available under a BSD 3-clause open source license, and the learning environment features are all available through http://3dmol.csb.pitt.edu/ without any software installation required.

10.
J Comput Aided Mol Des ; 33(1): 19-34, 2019 01.
Article in English | MEDLINE | ID: mdl-29992528

ABSTRACT

We assess the ability of our convolutional neural network (CNN)-based scoring functions to perform several common tasks in the domain of drug discovery. These include correctly identifying ligand poses near and far from the true binding mode when given a set of reference receptors, and classifying ligands as active or inactive using structural information. We use the CNN to re-score or refine poses generated using a conventional scoring function, AutoDock Vina, and compare the performance of each of these methods to using the conventional scoring function alone. Furthermore, we assess several ways of choosing appropriate reference receptors in the context of the D3R 2017 community benchmarking challenge. We find that our CNN scoring function outperforms Vina on most tasks without requiring manual inspection by a knowledgeable operator, but that the pose prediction target chosen for the challenge, Cathepsin S, was particularly challenging for de novo docking. However, the CNN provided best-in-class performance on several virtual screening tasks, underscoring the relevance of deep learning to the field of drug discovery.


Subject(s)
Cathepsins/chemistry; Molecular Docking Simulation; Neural Networks, Computer; Algorithms; Binding Sites; Databases, Protein; Drug Discovery/methods; Ligands; Protein Binding; Protein Conformation; Structure-Activity Relationship
11.
Nucleic Acids Res ; 44(W1): W442-8, 2016 07 08.
Article in English | MEDLINE | ID: mdl-27095195

ABSTRACT

Pharmit (http://pharmit.csb.pitt.edu) provides an online, interactive environment for the virtual screening of large compound databases using pharmacophores, molecular shape and energy minimization. Users can import, create and edit virtual screening queries in an interactive browser-based interface. Queries are specified in terms of a pharmacophore, a spatial arrangement of the essential features of an interaction, and molecular shape. Search results can be further ranked and filtered using energy minimization. In addition to a number of pre-built databases of popular compound libraries, users may submit their own compound libraries for screening. Pharmit uses state-of-the-art sub-linear algorithms to provide interactive screening of millions of compounds. Queries typically take a few seconds to a few minutes depending on their complexity. This allows users to iteratively refine their search during a single session. The easy access to large chemical datasets provided by Pharmit simplifies and accelerates structure-based drug design. Pharmit is available under a dual BSD/GPL open-source license.
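
A pharmacophore query as described above can be checked against a single ligand pose with a simple predicate. This sketch assumes the ligand is already posed in the query's coordinate frame and represents each query feature as a hypothetical `(kind, center, radius)` tuple; a screening engine such as Pharmit must also search over alignments and conformers, which is not shown.

```python
import math


def matches_pharmacophore(query, ligand_features):
    """True if every query feature (kind, center, radius) is satisfied by at least one
    ligand feature of the same kind lying within that feature's tolerance sphere.

    Simplification: one ligand feature may satisfy several query points; a stricter
    matcher would require a one-to-one assignment.
    """
    return all(
        any(kind == lig_kind and math.dist(center, pos) <= radius
            for lig_kind, pos in ligand_features)
        for kind, center, radius in query
    )


# Hypothetical two-point query: a hydrogen-bond donor and an aromatic ring 4 Å apart.
query = [("donor", (0.0, 0.0, 0.0), 1.0), ("aromatic", (4.0, 0.0, 0.0), 1.5)]
pose_1 = [("donor", (0.5, 0.0, 0.0)), ("aromatic", (4.0, 1.0, 0.0))]
pose_2 = [("acceptor", (0.0, 0.0, 0.0)), ("aromatic", (4.0, 0.0, 0.0))]
print(matches_pharmacophore(query, pose_1))  # True
print(matches_pharmacophore(query, pose_2))  # False: no donor feature
```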


Subject(s)
Databases, Chemical; Drug Evaluation, Preclinical/methods; Internet; Pharmaceutical Preparations/chemistry; Software; User-Computer Interface; Algorithms; CSK Tyrosine-Protein Kinase; Databases, Protein; Drug Design; Thermodynamics; src-Family Kinases/chemistry; src-Family Kinases/metabolism
12.
J Chem Inf Model ; 57(4): 942-957, 2017 04 24.
Article in English | MEDLINE | ID: mdl-28368587

ABSTRACT

Computational approaches to drug discovery can reduce the time and cost associated with experimental assays and enable the screening of novel chemotypes. Structure-based drug design methods rely on scoring functions to rank and predict binding affinities and poses. The ever-expanding amount of protein-ligand binding and structural data enables the use of deep machine learning techniques for protein-ligand scoring. We describe convolutional neural network (CNN) scoring functions that take as input a comprehensive three-dimensional (3D) representation of a protein-ligand interaction. A CNN scoring function automatically learns the key features of protein-ligand interactions that correlate with binding. We train and optimize our CNN scoring functions to discriminate between correct and incorrect binding poses and known binders and nonbinders. We find that our CNN scoring function outperforms the AutoDock Vina scoring function when ranking poses both for pose prediction and virtual screening.
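
A 3D grid representation of a protein-ligand complex can be sketched by depositing a density for each atom into a channel for its atom type. The Gaussian density, channel names, grid size, and resolution below are assumptions chosen for illustration, not the exact parameters or density function of the published CNN scoring functions.

```python
import math


def voxelize(atoms, channels, grid_size=8, resolution=1.0, sigma=1.0):
    """Return grid[channel][i][j][k]: each atom adds a Gaussian density,
    exp(-d^2 / (2 sigma^2)), to the channel for its type. The grid is centered on the origin."""
    origin = -(grid_size - 1) * resolution / 2.0
    axis = [origin + i * resolution for i in range(grid_size)]
    grid = [[[[0.0] * grid_size for _ in range(grid_size)] for _ in range(grid_size)]
            for _ in channels]
    for atom_type, (ax, ay, az) in atoms:
        c = channels.index(atom_type)
        for i, x in enumerate(axis):
            for j, y in enumerate(axis):
                for k, z in enumerate(axis):
                    d2 = (x - ax) ** 2 + (y - ay) ** 2 + (z - az) ** 2
                    grid[c][i][j][k] += math.exp(-d2 / (2.0 * sigma ** 2))
    return grid


# Hypothetical channels separating ligand carbon from receptor nitrogen.
channels = ["ligand_C", "receptor_N"]
atoms = [("ligand_C", (0.0, 0.0, 0.0)), ("receptor_N", (1.0, 0.0, 0.0))]
grid = voxelize(atoms, channels, grid_size=3)
print(grid[0][1][1][1])  # 1.0: the ligand carbon sits exactly on the central voxel
```

Keeping protein and ligand atom types in separate channels lets a convolutional network learn interaction patterns between them rather than just overall shape.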


Subject(s)
Computational Biology/methods; Neural Networks, Computer; Proteins/metabolism; Drug Evaluation, Preclinical; Ligands; Models, Molecular; Protein Conformation; Proteins/chemistry; User-Computer Interface
13.
J Comput Aided Mol Des ; 30(9): 761-771, 2016 09.
Article in English | MEDLINE | ID: mdl-27592011

ABSTRACT

We assess the performance of several machine learning-based scoring methods at protein-ligand pose prediction, virtual screening, and binding affinity prediction. The methods and the manner in which they were trained make them sufficiently diverse to evaluate the utility of various strategies for training set curation and binding pose generation, but they share a novel approach to classification in the context of protein-ligand scoring. Rather than explicitly using structural data such as affinity values or information extracted from crystal binding poses for training, we instead exploit the abundance of data available from high-throughput screening to approach the problem as one of discriminating binders from non-binders. We evaluate the performance of our various scoring methods in the 2015 D3R Grand Challenge and find that although the merits of some features of our approach remain inconclusive, our scoring methods performed comparably to a state-of-the-art scoring function that was fit to binding affinity data.


Subject(s)
Computational Biology/methods; Machine Learning; Molecular Docking Simulation; Proteins/chemistry; Algorithms; Binding Sites; HSP90 Heat-Shock Proteins/chemistry; Humans; Ligands; Prospective Studies; Protein Binding
14.
Knowl Inf Syst ; 43(1): 157-180, 2015 Apr 01.
Article in English | MEDLINE | ID: mdl-26085707

ABSTRACT

We describe a novel algorithm for bulk-loading an index with high-dimensional data and apply it to the problem of volumetric shape matching. Our matching and packing algorithm is a general approach for packing data according to a similarity metric. First, an approximate k-nearest neighbor graph is constructed using vantage-point initialization, an improvement over previous work that decreases construction time while improving the quality of the approximation. Then, graph matching is iteratively performed to pack related items closely together. The end result is a dense index with good performance. We define a new query specification for shape matching that uses minimum and maximum shape constraints to explicitly specify the spatial requirements of the desired shape. This specification provides a natural language for performing volumetric shape matching and is readily supported by the geometry-based similarity search (GSS) tree, an indexing structure that maintains explicit representations of volumetric shape. We describe our implementation of a GSS tree for volumetric shape matching and provide a comprehensive evaluation of parameter sensitivity, performance, and scalability. Compared to previous bulk-loading algorithms, we find that matching and packing can construct, in the same amount of time, a GSS-tree index that is denser, flatter, and better performing, with an observed average performance improvement of 2×.

15.
J Comput Chem ; 35(25): 1824-34, 2014 Sep 30.
Article in English | MEDLINE | ID: mdl-25049193

ABSTRACT

Shape-based virtual screening is an established and effective method for identifying small molecules that are similar in shape and function to a reference ligand. We describe a new method of shape-based virtual screening, volumetric aligned molecular shapes (VAMS). VAMS uses efficient data structures to encode and search molecular shapes. We demonstrate that VAMS is an effective method for shape-based virtual screening and that it can be successfully used as a prefilter to accelerate more computationally demanding search algorithms. Unique to VAMS is a novel minimum/maximum shape constraint query for precisely specifying the desired molecular shape. Shape constraint searches in VAMS are particularly efficient, and millions of shapes can be searched in a fraction of a second. We compare the performance of VAMS with two other shape-based virtual screening algorithms on a benchmark of 102 protein targets comprising more than 32 million molecular shapes and find that VAMS provides a competitive trade-off between run-time performance and virtual screening performance.
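
The minimum/maximum shape constraint reduces to two subset tests on voxelized shapes: a candidate must cover every voxel of the minimum shape and stay inside the maximum shape. A minimal sketch, assuming shapes are sets of integer voxel coordinates; the shape names are hypothetical, and the linear scan in `screen` is only for illustration, since VAMS relies on dedicated data structures to avoid checking every shape.

```python
Voxel = tuple[int, int, int]


def satisfies_shape_constraint(shape: set[Voxel], min_shape: set[Voxel],
                               max_shape: set[Voxel]) -> bool:
    """min_shape ⊆ shape ⊆ max_shape: cover the required volume, stay inside the allowed one."""
    return min_shape <= shape <= max_shape


def screen(library: dict[str, set[Voxel]], min_shape: set[Voxel],
           max_shape: set[Voxel]) -> list[str]:
    """Linear scan returning names of shapes that satisfy the constraint (illustration only)."""
    return [name for name, shape in library.items()
            if satisfies_shape_constraint(shape, min_shape, max_shape)]


min_shape = {(0, 0, 0)}
max_shape = {(0, 0, 0), (1, 0, 0), (0, 1, 0)}
library = {
    "fits": {(0, 0, 0), (1, 0, 0)},
    "too_big": {(0, 0, 0), (2, 0, 0)},  # occupies a voxel outside max_shape
    "too_small": {(1, 0, 0)},           # misses the required voxel
}
print(screen(library, min_shape, max_shape))  # ['fits']
```

Unlike a similarity score, this query is a hard filter: a shape either lies between the two bounds or it does not, which is what makes it suitable as a fast prefilter.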


Subject(s)
Computer Simulation; Drug Evaluation, Preclinical/methods; Molecular Structure; Algorithms; Ligands; Models, Molecular; Time Factors
16.
Nucleic Acids Res ; 40(Web Server issue): W409-14, 2012 Jul.
Article in English | MEDLINE | ID: mdl-22553363

ABSTRACT

ZINCPharmer (http://zincpharmer.csb.pitt.edu) is an online interface for searching the purchasable compounds of the ZINC database using the Pharmer pharmacophore search technology. A pharmacophore describes the spatial arrangement of the essential features of an interaction. Compounds that match a well-defined pharmacophore serve as potential lead compounds for drug discovery. ZINCPharmer provides tools for constructing and refining pharmacophore hypotheses directly from molecular structure. A search of 176 million conformers of 18.3 million compounds typically takes less than a minute. The results can be immediately viewed, or the aligned structures may be downloaded for off-line analysis. ZINCPharmer enables the rapid and interactive search of purchasable chemical space.


Subject(s)
Drug Discovery; Software; Computer Graphics; Databases, Factual; Internet; Ligands; Models, Molecular; Pharmaceutical Preparations/chemistry; Protein Conformation; Protein Interaction Mapping; User-Computer Interface
17.
Nucleic Acids Res ; 40(Web Server issue): W387-92, 2012 Jul.
Article in English | MEDLINE | ID: mdl-22523085

ABSTRACT

PocketQuery (http://pocketquery.csb.pitt.edu) is a web interface for exploring the properties of protein-protein interaction (PPI) interfaces, with a focus on discovering promising starting points for small-molecule design. PocketQuery rapidly focuses attention on the key interacting residues of an interaction using a 'druggability' score that estimates how likely it is that chemically mimicking a cluster of interface residues would yield a small-molecule inhibitor of the interaction. These residue clusters are chemical starting points that can be seamlessly exported to a pharmacophore-based drug discovery workflow. PocketQuery is updated weekly to contain all applicable PPI structures deposited in the Protein Data Bank and allows users to upload their own custom structures for analysis.


Subject(s)
Drug Design; Protein Interaction Mapping; Software; Internet; Models, Molecular; Multiprotein Complexes/chemistry
18.
ArXiv ; 2024 May 08.
Article in English | MEDLINE | ID: mdl-38764591

ABSTRACT

Diffusion generative models have emerged as a powerful framework for addressing problems in structural biology and structure-based drug design. These models operate directly on 3D molecular structures. Due to the unfavorable scaling of graph neural networks (GNNs) with graph size as well as the relatively slow inference speeds inherent to diffusion models, many existing molecular diffusion models rely on coarse-grained representations of protein structure to make training and inference feasible. However, such coarse-grained representations discard essential information for modeling molecular interactions and impair the quality of generated structures. In this work, we present a novel GNN-based architecture for learning latent representations of molecular structure. When trained end-to-end with a diffusion model for de novo ligand design, our model achieves comparable performance to one with an all-atom protein representation while exhibiting a 3-fold reduction in inference time.

19.
ArXiv ; 2024 Apr 30.
Article in English | MEDLINE | ID: mdl-38745704

ABSTRACT

Deep generative models that produce novel molecular structures have the potential to facilitate chemical discovery. Diffusion models currently achieve state-of-the-art performance for 3D molecule generation. In this work, we explore the use of flow matching, a recently proposed generative modeling framework that generalizes diffusion models, for the task of de novo molecule generation. Flow matching provides flexibility in model design; however, the framework is predicated on the assumption of continuously valued data. 3D de novo molecule generation requires jointly sampling continuous and categorical variables such as atom position and atom type. We extend the flow matching framework to categorical data by constructing flows that are constrained to exist on a continuous representation of categorical data known as the probability simplex. We call this extension SimplexFlow. We explore the use of SimplexFlow for de novo molecule generation. However, we find that, in practice, a simpler approach that makes no accommodations for the categorical nature of the data yields equivalent or superior performance. As a result of these experiments, we present FlowMol, a flow matching model for 3D de novo molecule generation that achieves improved performance over prior flow matching methods, and we raise important questions about the design of prior distributions for achieving strong performance in flow matching models. Code and trained models for reproducing this work are available at https://github.com/dunni3/FlowMol.
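
The linear interpolation path commonly used as a flow matching regression target can be sketched for both kinds of variables mentioned above. The atom position follows a straight path from a Gaussian prior sample to the data; the atom type moves from a uniformly sampled point on the probability simplex toward a one-hot vector, and because every intermediate point is a convex combination of two simplex points, it stays on the simplex. This is a generic illustration, not the SimplexFlow or FlowMol construction; the function names and the 4-type vocabulary are hypothetical.

```python
import random


def interpolate(x0, x1, t):
    """Linear path x_t = (1 - t) * x0 + t * x1 and its constant target velocity x1 - x0."""
    xt = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    velocity = [b - a for a, b in zip(x0, x1)]
    return xt, velocity


def sample_simplex(k, rng):
    """Uniform point on the (k-1)-simplex: normalized i.i.d. exponentials (flat Dirichlet)."""
    draws = [rng.expovariate(1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


rng = random.Random(0)

# Continuous variable: an atom position, prior sample from a standard Gaussian.
pos_prior = [rng.gauss(0.0, 1.0) for _ in range(3)]
pos_data = [1.2, -0.4, 0.0]
pos_t, pos_velocity = interpolate(pos_prior, pos_data, t=0.3)

# Categorical variable: atom type over a hypothetical 4-type vocabulary.
type_prior = sample_simplex(4, rng)
type_data = [0.0, 1.0, 0.0, 0.0]  # one-hot target
type_t, type_velocity = interpolate(type_prior, type_data, t=0.3)

# Intermediate points are convex combinations of simplex points, so they stay on the simplex.
print(round(sum(type_t), 9), min(type_t) >= 0.0)  # 1.0 True
```

In training, a network would be asked to predict the velocity from the noisy intermediate state; that model and its loss are omitted here.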

20.
bioRxiv ; 2024 Apr 23.
Article in English | MEDLINE | ID: mdl-38712044

ABSTRACT

Embeddings from protein language models (PLMs) capture intricate patterns in protein sequences, enabling more accurate and efficient prediction of protein properties. Incorporating protein structure information as direct input into PLMs improves the predictive ability of protein embeddings on downstream tasks. In this work, we demonstrate that indirectly infusing structure information into PLMs also leads to performance gains on structure-related tasks. The key difference between this framework and others is that, at inference time, the model does not require access to structure to produce its embeddings.
