Search | VHL Regional Portal

Sunseri, Jocelyn; Koes, David Ryan.

Molecules ; 26(23), 2021 Dec 04.

Article in English | MEDLINE | ID: mdl-34885952

ABSTRACT

Virtual screening, i.e., predicting which compounds within a specified compound library bind to a target molecule (typically a protein), is a fundamental task in the field of drug discovery. Doing virtual screening well provides tangible practical benefits, including reduced drug development costs, faster time to therapeutic viability, and fewer unforeseen side effects. As with most applied computational tasks, the algorithms currently used to perform virtual screening feature inherent tradeoffs between speed and accuracy. Furthermore, even theoretically rigorous, computationally intensive methods may fail to account for important effects relevant to whether a given compound will ultimately be usable as a drug. Here we investigate the virtual screening performance of the recently released Gnina molecular docking software, which uses deep convolutional networks to score protein-ligand structures. We find, on average, that Gnina outperforms conventional empirical scoring. The default scoring in Gnina outperforms the empirical AutoDock Vina scoring function on 89 of the 117 targets of the DUD-E and LIT-PCBA virtual screening benchmarks with a median 1% early enrichment factor that is more than twice that of Vina. However, we also find that issues of bias linger in these sets, even when not used directly to train models, and this bias obfuscates to what extent machine learning models are achieving their performance through a sophisticated interpretation of molecular interactions versus fitting to non-informative simplistic property distributions.
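The 1% early enrichment factor mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function name `enrichment_factor` is hypothetical, higher scores are assumed to mean "more likely active" (Vina energies would need to be negated first), and the size of the top fraction is rounded to the nearest whole compound, which is one of several possible conventions.

```python
def enrichment_factor(scores, labels, fraction=0.01):
    """Hit rate in the top-scored fraction divided by the overall hit rate.

    scores: one score per compound (assumption: higher means better).
    labels: 1 for a known active, 0 for an inactive or decoy.
    """
    if len(scores) != len(labels) or not scores:
        raise ValueError("scores and labels must be non-empty and equal in length")
    n_total = len(scores)
    n_actives = sum(labels)
    if n_actives == 0:
        raise ValueError("at least one active is required")

    # Number of compounds in the top fraction (assumption: round, minimum 1).
    n_top = max(1, int(round(n_total * fraction)))

    # Rank compounds from best to worst score and count actives at the top.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top_actives = sum(label for _, label in ranked[:n_top])

    return (top_actives / n_top) / (n_actives / n_total)
```

An enrichment factor of 1 corresponds to random ranking; with 1% of the library selected, the maximum possible value is 100.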

Subject(s)

Drug Design , Drug Discovery , Software , Deep Learning , Drug Design/methods , Drug Discovery/methods , Humans , Molecular Docking Simulation

GNINA 1.0: molecular docking with deep learning.

McNutt, Andrew T; Francoeur, Paul; Aggarwal, Rishal; Masuda, Tomohide; Meli, Rocco; Ragoza, Matthew; Sunseri, Jocelyn; Koes, David Ryan.

J Cheminform ; 13(1): 43, 2021 Jun 09.

Article in English | MEDLINE | ID: mdl-34108002

ABSTRACT

Molecular docking computationally predicts the conformation of a small molecule when binding to a receptor. Scoring functions are a vital piece of any molecular docking pipeline as they determine the fitness of sampled poses. Here we describe and evaluate the 1.0 release of the Gnina docking software, which utilizes an ensemble of convolutional neural networks (CNNs) as a scoring function. We also explore an array of parameter values for Gnina 1.0 to optimize docking performance and computational cost. Docking performance, as evaluated by the percentage of targets where the top pose is better than 2Å root mean square deviation (Top1), is compared to AutoDock Vina scoring when utilizing explicitly defined binding pockets or whole protein docking. GNINA, utilizing a CNN scoring function to rescore the output poses, outperforms AutoDock Vina scoring on redocking and cross-docking tasks when the binding pocket is defined (Top1 increases from 58% to 73% and from 27% to 37%, respectively) and when the whole protein defines the binding pocket (Top1 increases from 31% to 38% and from 12% to 16%, respectively). The derived ensemble of CNNs generalizes to unseen proteins and ligands and produces scores that correlate well with the root mean square deviation to the known binding pose. We provide the 1.0 version of GNINA under an open source license for use as a molecular docking tool at https://github.com/gnina/gnina .
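As a rough illustration of the Top1 metric described in this abstract, the sketch below computes a root mean square deviation between two poses and the percentage of targets whose top-ranked pose falls under the 2 Å threshold. It is a simplified assumption-laden example, not the Gnina evaluation pipeline: the function names are hypothetical, atoms are assumed to be listed in the same order in both poses, and no symmetry correction is applied.

```python
import math


def rmsd(pose_a, pose_b):
    """RMSD between two poses given as equal-length lists of (x, y, z) tuples.

    Assumption: atom i in pose_a corresponds to atom i in pose_b.
    """
    if len(pose_a) != len(pose_b) or not pose_a:
        raise ValueError("poses must be non-empty and have the same atom count")
    squared = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(pose_a, pose_b)
    )
    return math.sqrt(squared / len(pose_a))


def top1_success_rate(top_pose_rmsd_by_target, threshold=2.0):
    """Percentage of targets whose top-ranked pose has RMSD below the threshold."""
    if not top_pose_rmsd_by_target:
        raise ValueError("at least one target is required")
    successes = sum(1 for value in top_pose_rmsd_by_target.values() if value < threshold)
    return 100.0 * successes / len(top_pose_rmsd_by_target)
```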

Three-Dimensional Convolutional Neural Networks and a Cross-Docked Data Set for Structure-Based Drug Design.

Francoeur, Paul G; Masuda, Tomohide; Sunseri, Jocelyn; Jia, Andrew; Iovanisci, Richard B; Snyder, Ian; Koes, David R.

J Chem Inf Model ; 60(9): 4200-4215, 2020 09 28.

Article in English | MEDLINE | ID: mdl-32865404

ABSTRACT

One of the main challenges in drug discovery is predicting protein-ligand binding affinity. Recently, machine learning approaches have made substantial progress on this task. However, current methods of model evaluation are overly optimistic in measuring generalization to new targets, and there does not exist a standard data set of sufficient size to compare performance between models. We present a new data set for structure-based machine learning, the CrossDocked2020 set, with 22.5 million poses of ligands docked into multiple similar binding pockets across the Protein Data Bank, and perform a comprehensive evaluation of grid-based convolutional neural network (CNN) models on this data set. We also demonstrate how the partitioning of the training data and test data can impact the results of models trained with the PDBbind data set, how performance improves by adding more lower-quality training data, and how training with docked poses imparts pose sensitivity to the predicted affinity of a complex. Our best performing model, an ensemble of five densely connected CNNs, achieves a root mean squared error of 1.42 and Pearson R of 0.612 on the affinity prediction task, an AUC of 0.956 at binding pose classification, and a 68.4% accuracy at pose selection on the CrossDocked2020 set. By providing data splits for clustered cross-validation and the raw data for the CrossDocked2020 set, we establish the first standardized data set for training machine learning models to recognize ligands in noncognate target structures while also greatly expanding the number of poses available for training. In order to facilitate community adoption of this data set for benchmarking protein-ligand binding affinity prediction, we provide our models, weights, and the CrossDocked2020 set at https://github.com/gnina/models.
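The two affinity-prediction metrics reported in this abstract, root mean squared error and Pearson R, can be sketched as follows. This is a generic standard-library illustration with hypothetical function names, not the evaluation code distributed with the CrossDocked2020 set.

```python
import math


def rmse(predicted, observed):
    """Root mean squared error between predicted and observed affinities."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and equal in length")
    return math.sqrt(
        sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(predicted)
    )


def pearson_r(predicted, observed):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(predicted) != len(observed) or len(predicted) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_p = sum(predicted) / len(predicted)
    mean_o = sum(observed) / len(observed)
    covariance = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    if var_p == 0 or var_o == 0:
        raise ValueError("correlation is undefined for constant input")
    return covariance / math.sqrt(var_p * var_o)
```

The two metrics are complementary: predictions can correlate perfectly with the observed values while still carrying a large RMSE if they are systematically scaled or shifted.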

Subject(s)

Drug Design , Neural Networks, Computer , Databases, Protein , Ligands , Protein Binding

libmolgrid: Graphics Processing Unit Accelerated Molecular Gridding for Deep Learning Applications.

Sunseri, Jocelyn; Koes, David R.

J Chem Inf Model ; 60(3): 1079-1084, 2020 03 23.

Article in English | MEDLINE | ID: mdl-32049525

ABSTRACT

We describe libmolgrid, a general-purpose library for representing three-dimensional molecules using multidimensional arrays of voxelized molecular data. libmolgrid provides functionality for sampling batches of data suited to machine learning workflows, and it also supports temporal and spatial recurrences over that data to facilitate work with convolutional and recurrent neural networks. It was designed for seamless integration with popular deep learning frameworks and features optimized performance by leveraging graphics processing units (GPUs). libmolgrid is a free and open source project (GPLv2) that aims to democratize grid-based modeling in computational chemistry.
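To make the idea of a voxelized molecular representation concrete, here is a minimal pure-Python sketch that spreads each atom as a Gaussian density over a cubic grid, with one channel per atom type. It deliberately does not use or imitate the libmolgrid API: the function `voxelize`, its parameters, and the Gaussian density choice are all assumptions made for illustration, and a real implementation would run on a GPU rather than in nested loops.

```python
import math


def voxelize(atoms, center, dim=8, resolution=1.0, n_channels=2, sigma=1.0):
    """Return a grid indexed as grid[channel][i][j][k].

    atoms: list of (x, y, z, channel) tuples; channel is an atom-type index.
    center: (x, y, z) of the grid center.
    dim: number of grid points along each axis.
    resolution: spacing between grid points, in the same units as the coordinates.
    """
    half_extent = (dim - 1) * resolution / 2.0
    origin = tuple(c - half_extent for c in center)
    grid = [
        [[[0.0] * dim for _ in range(dim)] for _ in range(dim)]
        for _ in range(n_channels)
    ]
    for x, y, z, channel in atoms:
        for i in range(dim):
            gx = origin[0] + i * resolution
            for j in range(dim):
                gy = origin[1] + j * resolution
                for k in range(dim):
                    gz = origin[2] + k * resolution
                    dist_sq = (gx - x) ** 2 + (gy - y) ** 2 + (gz - z) ** 2
                    # Gaussian density contribution of this atom at this grid point.
                    grid[channel][i][j][k] += math.exp(-dist_sq / (2.0 * sigma ** 2))
    return grid
```

The resulting four-dimensional array (channels by three spatial axes) has the shape that a 3D convolutional network typically expects as input.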

Subject(s)

Deep Learning , Machine Learning , Neural Networks, Computer

Convolutional neural network scoring and minimization in the D3R 2017 community challenge.

Sunseri, Jocelyn; King, Jonathan E; Francoeur, Paul G; Koes, David Ryan.

J Comput Aided Mol Des ; 33(1): 19-34, 2019 01.

Article in English | MEDLINE | ID: mdl-29992528

ABSTRACT

We assess the ability of our convolutional neural network (CNN)-based scoring functions to perform several common tasks in the domain of drug discovery. These include correctly identifying ligand poses near and far from the true binding mode when given a set of reference receptors and classifying ligands as active or inactive using structural information. We use the CNN to re-score or refine poses generated using a conventional scoring function, Autodock Vina, and compare the performance of each of these methods to using the conventional scoring function alone. Furthermore, we assess several ways of choosing appropriate reference receptors in the context of the D3R 2017 community benchmarking challenge. We find that our CNN scoring function outperforms Vina on most tasks without requiring manual inspection by a knowledgeable operator, but that the pose prediction target chosen for the challenge, Cathepsin S, was particularly challenging for de novo docking. However, the CNN provided best-in-class performance on several virtual screening tasks, underscoring the relevance of deep learning to the field of drug discovery.

Subject(s)

Cathepsins/chemistry , Molecular Docking Simulation , Neural Networks, Computer , Algorithms , Binding Sites , Databases, Protein , Drug Discovery/methods , Ligands , Protein Binding , Protein Conformation , Structure-Activity Relationship

Protein-Ligand Scoring with Convolutional Neural Networks.

Ragoza, Matthew; Hochuli, Joshua; Idrobo, Elisa; Sunseri, Jocelyn; Koes, David Ryan.

J Chem Inf Model ; 57(4): 942-957, 2017 04 24.

Article in English | MEDLINE | ID: mdl-28368587

ABSTRACT

Computational approaches to drug discovery can reduce the time and cost associated with experimental assays and enable the screening of novel chemotypes. Structure-based drug design methods rely on scoring functions to rank and predict binding affinities and poses. The ever-expanding amount of protein-ligand binding and structural data enables the use of deep machine learning techniques for protein-ligand scoring. We describe convolutional neural network (CNN) scoring functions that take as input a comprehensive three-dimensional (3D) representation of a protein-ligand interaction. A CNN scoring function automatically learns the key features of protein-ligand interactions that correlate with binding. We train and optimize our CNN scoring functions to discriminate between correct and incorrect binding poses and known binders and nonbinders. We find that our CNN scoring function outperforms the AutoDock Vina scoring function when ranking poses both for pose prediction and virtual screening.

Subject(s)

Computational Biology/methods , Neural Networks, Computer , Proteins/metabolism , Drug Evaluation, Preclinical , Ligands , Models, Molecular , Protein Conformation , Proteins/chemistry , User-Computer Interface

Open source molecular modeling.

Pirhadi, Somayeh; Sunseri, Jocelyn; Koes, David Ryan.

J Mol Graph Model ; 69: 127-43, 2016 09.

Article in English | MEDLINE | ID: mdl-27631126

ABSTRACT

The success of molecular modeling and computational chemistry efforts is, by definition, dependent on quality software applications. Open source software development provides many advantages to users of modeling applications, not the least of which is that the software is free and completely extendable. In this review we categorize, enumerate, and describe available open source software packages for molecular modeling and computational chemistry. An updated online version of this catalog can be found at https://opensourcemolecularmodeling.github.io.

Subject(s)

Models, Molecular , Drug Evaluation, Preclinical , Internet , Ligands , Quantitative Structure-Activity Relationship , Quantum Theory , Software , Thermodynamics

A D3R prospective evaluation of machine learning for protein-ligand scoring.

Sunseri, Jocelyn; Ragoza, Matthew; Collins, Jasmine; Koes, David Ryan.

J Comput Aided Mol Des ; 30(9): 761-771, 2016 09.

Article in English | MEDLINE | ID: mdl-27592011

ABSTRACT

We assess the performance of several machine learning-based scoring methods at protein-ligand pose prediction, virtual screening, and binding affinity prediction. The methods and the manner in which they were trained make them sufficiently diverse to evaluate the utility of various strategies for training set curation and binding pose generation, but they share a novel approach to classification in the context of protein-ligand scoring. Rather than explicitly using structural data such as affinity values or information extracted from crystal binding poses for training, we instead exploit the abundance of data available from high-throughput screening to approach the problem as one of discriminating binders from non-binders. We evaluate the performance of our various scoring methods in the 2015 D3R Grand Challenge and find that although the merits of some features of our approach remain inconclusive, our scoring methods performed comparably to a state-of-the-art scoring function that was fit to binding affinity data.

Subject(s)

Computational Biology/methods , Machine Learning , Molecular Docking Simulation , Proteins/chemistry , Algorithms , Binding Sites , HSP90 Heat-Shock Proteins/chemistry , Humans , Ligands , Prospective Studies , Protein Binding

Pharmit: interactive exploration of chemical space.

Sunseri, Jocelyn; Koes, David Ryan.

Nucleic Acids Res ; 44(W1): W442-8, 2016 07 08.

Article in English | MEDLINE | ID: mdl-27095195

ABSTRACT

Pharmit (http://pharmit.csb.pitt.edu) provides an online, interactive environment for the virtual screening of large compound databases using pharmacophores, molecular shape and energy minimization. Users can import, create and edit virtual screening queries in an interactive browser-based interface. Queries are specified in terms of a pharmacophore, a spatial arrangement of the essential features of an interaction, and molecular shape. Search results can be further ranked and filtered using energy minimization. In addition to a number of pre-built databases of popular compound libraries, users may submit their own compound libraries for screening. Pharmit uses state-of-the-art sub-linear algorithms to provide interactive screening of millions of compounds. Queries typically take a few seconds to a few minutes depending on their complexity. This allows users to iteratively refine their search during a single session. The easy access to large chemical datasets provided by Pharmit simplifies and accelerates structure-based drug design. Pharmit is available under a dual BSD/GPL open-source license.

Subject(s)

Databases, Chemical , Drug Evaluation, Preclinical/methods , Internet , Pharmaceutical Preparations/chemistry , Software , User-Computer Interface , Algorithms , CSK Tyrosine-Protein Kinase , Databases, Protein , Drug Design , Thermodynamics , src-Family Kinases/chemistry , src-Family Kinases/metabolism
