ABSTRACT
On average, an approved drug currently costs US$2-3 billion and takes more than 10 years to develop1. In part, this is due to expensive and time-consuming wet-laboratory experiments, poor initial hit compounds and the high attrition rates in the (pre-)clinical phases. Structure-based virtual screening has the potential to mitigate these problems. With structure-based virtual screening, the quality of the hits improves with the number of compounds screened2. However, despite the fact that large databases of compounds exist, the ability to carry out large-scale structure-based virtual screening on computer clusters in an accessible, efficient and flexible manner has remained difficult. Here we describe VirtualFlow, a highly automated and versatile open-source platform with perfect scaling behaviour that is able to prepare and efficiently screen ultra-large libraries of compounds. VirtualFlow is able to use a variety of the most powerful docking programs. Using VirtualFlow, we prepared one of the largest and freely available ready-to-dock ligand libraries, with more than 1.4 billion commercially available molecules. To demonstrate the power of VirtualFlow, we screened more than 1 billion compounds and identified a set of structurally diverse molecules that bind to KEAP1 with submicromolar affinity. One of the lead inhibitors (iKeap1) engages KEAP1 with nanomolar affinity (dissociation constant (Kd) = 114 nM) and disrupts the interaction between KEAP1 and the transcription factor NRF2. This illustrates the potential of VirtualFlow to access vast regions of the chemical space and identify molecules that bind with high affinity to target proteins.
Subjects
Drug Discovery/methods , Preclinical Drug Evaluation/methods , Molecular Docking Simulation/methods , Software , User-Computer Interface , Access to Information , Automation/methods , Automation/standards , Cloud Computing , Computer Simulation , Chemical Databases , Drug Discovery/standards , Preclinical Drug Evaluation/standards , Kelch-Like ECH-Associated Protein 1/antagonists & inhibitors , Kelch-Like ECH-Associated Protein 1/chemistry , Kelch-Like ECH-Associated Protein 1/metabolism , Ligands , Molecular Docking Simulation/standards , Molecular Targeted Therapy , NF-E2-Related Factor 2/metabolism , Reproducibility of Results , Software/standards , Thermodynamics
ABSTRACT
The docking program PLANTS, which is based on an ant colony optimization (ACO) algorithm, has many advanced features for molecular docking. Among them are multiple scoring functions, the possibility to model explicit displaceable water molecules, and the inclusion of experimental constraints. Here, we add support for PLANTS to VirtualFlow (VirtualFlow Ants), which provides a valuable method for primary virtual screenings and rescoring procedures. Furthermore, we have added support for ligand libraries in the MOL2 format, as well as on-the-fly conversion of ligand libraries from the PDBQT format to the MOL2 format, giving VirtualFlow Ants increased flexibility regarding ligand libraries. The on-the-fly conversion is carried out with Open Babel and the program SPORES. We applied VirtualFlow Ants to a test system involving KEAP1 on the Google Cloud with up to 128,000 CPUs, and the observed scaling behavior is approximately linear. Furthermore, we adjusted several central docking parameters of PLANTS (such as the speed parameter or the number of ants) and screened 10 million compounds for each of the 10 resulting docking scenarios. We analyzed their docking scores and average docking times, which are key factors in virtual screenings. The possibility of carrying out ultra-large virtual screenings with PLANTS via VirtualFlow Ants opens new avenues in computational drug discovery.
Subjects
Algorithms , Artificial Intelligence , Computational Biology/methods , Molecular Docking Simulation , Kelch-Like ECH-Associated Protein 1/chemistry , Kelch-Like ECH-Associated Protein 1/metabolism , Ligands , NF-E2-Related Factor 2/chemistry , NF-E2-Related Factor 2/metabolism , Protein Binding , Protein Conformation , Reproducibility of Results , Thermodynamics
ABSTRACT
Markov state models are to date the gold standard for modeling molecular kinetics since they enable the identification and analysis of metastable states and related kinetics in a very instructive manner. The state-of-the-art Markov state modeling methods and tools are very well developed for the modeling of reversible processes in closed equilibrium systems. On the contrary, they are largely not well suited to deal with nonreversible or even nonautonomous processes of nonequilibrium systems. Thus, we generalized the common Robust Perron Cluster Cluster Analysis (PCCA+) method to enable straightforward modeling of nonequilibrium systems as well. The resulting Generalized PCCA (G-PCCA) method readily handles equilibrium as well as nonequilibrium data by utilizing real Schur vectors instead of eigenvectors. This is implemented in the G-PCCA algorithm that enables the semiautomatic coarse graining of molecular kinetics. G-PCCA is not limited to the detection of metastable states but also enables the identification and modeling of cyclic processes. This is demonstrated by three typical examples of nonreversible systems.
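The role of real Schur vectors mentioned above can be illustrated with a minimal sketch. The 3-state transition matrix `P` is a hypothetical example, not data from the article, and the sketch only shows why a real Schur decomposition is convenient for nonreversible chains; it does not implement G-PCCA itself, which involves further steps that are not shown here.

```python
import numpy as np
from scipy.linalg import schur

# Hypothetical 3-state transition matrix with a dominant cycle 0 -> 1 -> 2 -> 0.
# It is row-stochastic but does not satisfy detailed balance.
P = np.array([
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
    [0.8, 0.1, 0.1],
])

# Nonreversible dynamics can produce complex eigenvalues (and eigenvectors).
eigenvalues = np.linalg.eigvals(P)
has_complex_eigenvalues = bool(np.any(np.abs(eigenvalues.imag) > 1e-8))

# The real Schur decomposition P = Q R Q^T stays in real arithmetic:
# Q is orthogonal and R is quasi-upper-triangular
# (2x2 diagonal blocks stand in for complex-conjugate eigenvalue pairs).
R, Q = schur(P, output="real")

print("complex eigenvalues present:", has_complex_eigenvalues)
print("Schur vectors (columns of Q):")
print(Q)
```

Because the columns of `Q` are real and orthonormal even when the eigenvectors are complex, they can serve as a basis for a clustering step in cases where eigenvectors would be awkward to use.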
ABSTRACT
Given a time-dependent stochastic process with trajectories x(t) in a space Ω, there may be sets such that the corresponding trajectories only very rarely cross the boundaries of these sets. We can analyze such a process in terms of metastability or coherence. Metastable sets M are defined in space, M ⊂ Ω, and coherent sets M(t) ⊂ Ω are defined in space and time. Hence, if we extend the space Ω by the time variable t, coherent sets are metastable sets in Ω×[0,∞) of an appropriate space-time process. This relation can be exploited, because there already exist spectral algorithms for the identification of metastable sets. In this article, we show that these well-established spectral algorithms (like PCCA+, Perron Cluster Cluster Analysis) also identify coherent sets of non-autonomous dynamical systems. For the identification of coherent sets, one has to compute a discretization (a matrix T) of the transfer operator of the process using a space-time discretization scheme. The article gives an overview of different time-discretization schemes and shows their applicability in two different fields of application.
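As a rough illustration of what a space-time discretization of the transfer operator can look like, the sketch below counts transitions between (time slice, spatial box) pairs for an ensemble of trajectories. The function name `space_time_transition_matrix`, the box edges, and the random-walk data are all hypothetical; this is one simple counting scheme under those assumptions, not necessarily one of the schemes discussed in the article.

```python
import numpy as np

def space_time_transition_matrix(trajectories, edges):
    """Estimate a matrix T on the extended state space (time slice, box).

    trajectories: array of shape (n_traj, n_steps), one row per trajectory.
    edges: sorted inner box edges; they define len(edges) + 1 spatial boxes.
    """
    n_traj, n_steps = trajectories.shape
    n_boxes = len(edges) + 1
    boxes = np.digitize(trajectories, edges)  # box index in 0..n_boxes-1
    n_states = n_steps * n_boxes
    counts = np.zeros((n_states, n_states))
    for t in range(n_steps - 1):
        src = t * n_boxes + boxes[:, t]            # state at time slice t
        dst = (t + 1) * n_boxes + boxes[:, t + 1]  # state at time slice t + 1
        np.add.at(counts, (src, dst), 1.0)
    row_sums = counts.sum(axis=1, keepdims=True)
    # Row-normalize visited states; unvisited states keep an all-zero row.
    return np.divide(counts, row_sums, out=np.zeros_like(counts),
                     where=row_sums > 0)

# Hypothetical data: 500 one-dimensional random walks over 4 time slices.
rng = np.random.default_rng(0)
trajectories = np.cumsum(rng.normal(size=(500, 4)), axis=1)
edges = np.array([-1.0, 0.0, 1.0])  # 4 spatial boxes
T = space_time_transition_matrix(trajectories, edges)
print(T.shape)
```

In this simple scheme the rows belonging to the final time slice are zero, because no transitions leave it; a practical scheme would have to decide how to close the time axis, which is exactly where different time-discretization choices come in.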
ABSTRACT
Molecular dynamics (MD) simulations face challenging problems since the time scales of interest often are much longer than what is possible to simulate; and even if sufficiently long simulations are possible the complex nature of the resulting simulation data makes interpretation difficult. Markov State Models (MSMs) help to overcome these problems by making experimentally relevant time scales accessible via coarse grained representations that also allow for convenient interpretation. However, standard set-based MSMs exhibit some caveats limiting their approximation quality and statistical significance. One of the main caveats results from the fact that typical MD trajectories repeatedly re-cross the boundary between the sets used to build the MSM which causes statistical bias in estimating the transition probabilities between these sets. In this article, we present a set-free approach to MSM building utilizing smooth overlapping ansatz functions instead of sets and an adaptive refinement approach. This kind of meshless discretization helps to overcome the recrossing problem and yields an adaptive refinement procedure that allows us to improve the quality of the model while exploring state space and inserting new ansatz functions into the MSM.
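The idea of replacing sets with smooth overlapping ansatz functions can be sketched as follows. The Gaussian-based memberships, the parameter `alpha`, the centers, and the synthetic trajectory are assumptions made for illustration; the estimator shown is one simple correlation-based choice and is not claimed to be the article's exact construction, and the adaptive refinement step is not shown.

```python
import numpy as np

def soft_memberships(x, centers, alpha):
    """Smooth, overlapping membership functions that sum to one at every x."""
    d2 = (x[:, None] - centers[None, :]) ** 2
    # Shift by the row minimum before exponentiating to avoid underflow.
    w = np.exp(-alpha * (d2 - d2.min(axis=1, keepdims=True)))
    return w / w.sum(axis=1, keepdims=True)

def soft_transition_matrix(x, centers, alpha, lag=1):
    """Row-stochastic matrix from time-lagged correlations of memberships."""
    chi = soft_memberships(x, centers, alpha)
    C = chi[:-lag].T @ chi[lag:]  # C[i, j] = sum_t chi_i(x_t) * chi_j(x_{t+lag})
    return C / C.sum(axis=1, keepdims=True)

# Hypothetical 1D trajectory from a simple autoregressive process.
rng = np.random.default_rng(1)
x = np.zeros(2000)
for t in range(1, len(x)):
    x[t] = 0.9 * x[t - 1] + 0.3 * rng.normal()

centers = np.array([-1.0, 0.0, 1.0])
P = soft_transition_matrix(x, centers, alpha=2.0, lag=1)
print(P)
```

Because each point contributes fractionally to several ansatz functions, a trajectory that hovers near a boundary no longer produces a burst of spurious hard transitions, which is the intuition behind mitigating the recrossing problem.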
ABSTRACT
The joint analysis of two datasets [Formula: see text] and [Formula: see text] that describe the same phenomenon (e.g. the cellular state) but measure disjoint sets of variables (e.g. mRNA vs. protein levels) is currently challenging. Traditional methods typically analyze single interaction patterns such as variance or covariance. However, problem-tailored external knowledge may contain multiple different kinds of information about the interaction between the measured variables. We introduce MIASA, a holistic framework for the joint analysis of multiple different variables. It consists of assembling multiple different kinds of information, such as similarity vs. association, expressed in terms of interaction-scores or distances, for subsequent clustering/classification. In addition, our framework includes a novel qualitative Euclidean embedding method (qEE-Transition) which enables using Euclidean-distance/vector-based clustering/classification methods on datasets that have a non-Euclidean-based interaction structure. As an alternative to conventional optimization-based multidimensional scaling methods, which are prone to uncertainties, our qEE-Transition generates a new vector representation for each element of the dataset union [Formula: see text] in a common Euclidean space while strictly preserving the original ordering of the assembled interaction-distances. To demonstrate our work, we applied the framework to three types of simulated datasets: samples from families of distributions, samples from correlated random variables, and time-courses of statistical moments for three different types of stochastic two-gene interaction models. We then compared different clustering methods with vs. without the qEE-Transition. For all examples, we found that the qEE-Transition followed by Ward clustering had superior performance compared to non-agglomerative clustering methods but had a varied performance against ultrametric-based agglomerative methods.
We also tested the qEE-Transition followed by supervised and unsupervised machine learning methods and found promising results; however, more work is needed for optimal parametrization of these methods. As a future perspective, our framework points to the importance of further development and validation of distance-distribution models aiming to capture multiple complex interactions between different variables.
Subjects
Algorithms , Cluster Analysis , Humans , Computational Biology/methods
ABSTRACT
A decomposition of a molecular conformational space into sets or functions (states) allows for a reduced description of the dynamical behavior in terms of transition probabilities between these states. Spectral clustering of the corresponding transition probability matrix can then reveal metastabilities. The more states are used for the decomposition, the smaller the risk of covering multiple conformations with one state, which would make these conformations indistinguishable. However, since the computational complexity of the clustering algorithm increases quadratically with the number of states, it is desirable to have as few states as possible. To balance these two contradictory goals, we present an algorithm for an adaptive decomposition of the position space starting from a very coarse decomposition. The algorithm is applied to small data classification problems, where it is shown to be superior to commonly used algorithms, e.g., k-means. We also applied this algorithm to the conformation analysis of a tripeptide molecule, where six-dimensional time series are successfully analyzed.
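The spectral clustering step described above can be illustrated on a small example. The 4-state matrix, the eigenvalue threshold of 0.9, and the sign-based split are hypothetical simplifications chosen for illustration (the matrix is symmetric so that `numpy.linalg.eigh` applies); the sketch does not reproduce the adaptive decomposition algorithm of the article.

```python
import numpy as np

# Hypothetical symmetric transition matrix: states {0, 1} and {2, 3} mix
# quickly internally, while transitions between the two groups are rare.
P = np.array([
    [0.89, 0.10, 0.01, 0.00],
    [0.10, 0.89, 0.00, 0.01],
    [0.01, 0.00, 0.89, 0.10],
    [0.00, 0.01, 0.10, 0.89],
])

# Eigen-decomposition, sorted so the largest eigenvalue comes first.
eigenvalues, eigenvectors = np.linalg.eigh(P)
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Eigenvalues close to 1 indicate metastable sets (threshold is an assumption).
n_metastable = int(np.sum(eigenvalues > 0.9))

# For two metastable sets, the sign pattern of the second eigenvector
# separates them; the overall sign is arbitrary, so label values may swap.
labels = (eigenvectors[:, 1] > 0).astype(int)

print("eigenvalues:", np.round(eigenvalues, 3))
print("number of metastable sets:", n_metastable)
print("labels:", labels)
```

The cost of the eigen-decomposition grows with the number of states, which is why the abstract stresses keeping the decomposition as coarse as the conformations allow.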
Subjects
Molecular Dynamics Simulation , Oligopeptides/analysis , Algorithms , Protein Conformation
ABSTRACT
Opioids are essential pharmaceuticals due to their analgesic properties; however, lethal side effects, addiction, and opioid tolerance are extremely challenging. The development of novel molecules targeting the [Formula: see text]-opioid receptor (MOR) in inflamed, but not in healthy, tissue could significantly reduce these unwanted effects. Finding such novel molecules can be achieved by maximizing the binding affinity to the MOR at acidic pH while minimizing it at neutral pH, thus combining two conflicting objectives. Here, this multi-objective optimal affinity approach is presented, together with a virtual drug discovery pipeline for its practical implementation. When applied to finding pH-specific drug candidates, it combines protonation state-dependent structure and ligand preparation with high-throughput virtual screening. We employ this pipeline to characterize a set of MOR agonists, identifying a morphine-like opioid derivative with higher predicted binding affinities to the MOR at low pH compared to neutral pH. Our results also confirm existing experimental evidence that NFEPP, a previously described fentanyl derivative with reduced side effects, and recently reported [Formula: see text]-fluorofentanyls and -morphines show an increased specificity for the MOR at acidic pH when compared to fentanyl and morphine. We further applied our approach to screen a >50K ligand library, identifying novel molecules with pH-specific predicted binding affinities to the MOR. The presented differential docking pipeline can be applied to perform multi-objective affinity optimization to identify safer and more specific drug candidates at large scale.
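The two conflicting objectives described above can be made concrete with a small Pareto-front sketch. The ligand names and docking scores are invented for illustration, and the sketch assumes the common docking convention that a lower (more negative) score means stronger predicted binding; it is not the article's pipeline.

```python
def pareto_front(acidic, neutral):
    """Return indices of ligands not dominated by any other ligand.

    Objectives (assumed convention: lower score = stronger predicted binding):
      - minimize the score at acidic pH (strong binding wanted),
      - maximize the score at neutral pH (weak binding wanted).
    """
    front = []
    for i in range(len(acidic)):
        dominated = False
        for j in range(len(acidic)):
            if j == i:
                continue
            no_worse = acidic[j] <= acidic[i] and neutral[j] >= neutral[i]
            better = acidic[j] < acidic[i] or neutral[j] > neutral[i]
            if no_worse and better:
                dominated = True
                break
        if not dominated:
            front.append(i)
    return front

# Hypothetical docking scores for five ligands.
names = ["A", "B", "C", "D", "E"]
acidic = [-9.0, -8.5, -8.0, -7.0, -9.5]
neutral = [-6.0, -5.0, -7.5, -4.0, -9.0]

front = pareto_front(acidic, neutral)
# Ligand C is dominated by A (A binds more strongly at acidic pH and
# more weakly at neutral pH), so it is excluded from the front.
print([names[i] for i in front])
```

Ligands on the front represent different trade-offs between the two objectives; choosing among them requires an additional criterion, such as the difference between the two scores.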
ABSTRACT
Virtual screening-based approaches to discover initial hit and lead compounds have the potential to reduce both the cost and time of early drug discovery stages, as well as to find inhibitors for even challenging target sites such as protein-protein interfaces. In this review, we provide an overview of the progress that has been made in virtual screening methodology and technology on multiple fronts in recent years. The advent of ultra-large virtual screens, in which hundreds of millions to billions of compounds are screened, has proven to be a powerful approach to discover highly potent hit compounds. However, these developments are just the tip of the iceberg, with new technologies and methods emerging to propel the field forward. Examples include novel machine-learning approaches, which can reduce the computational costs of virtual screening dramatically, while progress in quantum-mechanical approaches can increase the accuracy of predictions of various small molecule properties.
Subjects
Deep Learning , Drug Discovery/methods , Ligands , Machine Learning , Proteins
ABSTRACT
The unparalleled global effort to combat the continuing severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic over the last year has resulted in promising prophylactic measures. However, a need still exists for cheap, effective therapeutics, and targeting multiple points in the viral life cycle could help tackle the current, as well as future, coronaviruses. Here, we leverage our recently developed, ultra-large-scale in silico screening platform, VirtualFlow, to search for inhibitors that target SARS-CoV-2. In this unprecedented structure-based virtual campaign, we screened roughly 1 billion molecules against each of 40 different target sites on 17 different potential viral and host targets. In addition to targeting the active sites of viral enzymes, we also targeted critical auxiliary sites such as functionally important protein-protein interactions.
ABSTRACT
Structure-based virtual screening approaches have the ability to dramatically reduce the time and costs associated with the discovery of new drug candidates. Studies have shown that the true hit rate of virtual screenings improves with the scale of the screened ligand libraries. Therefore, we have recently developed an open-source drug discovery platform (VirtualFlow), which is able to routinely carry out ultra-large virtual screenings. One of the primary challenges of molecular docking arises when the protein is highly dynamic or when the structure of the protein cannot be captured by a static pose. To accommodate protein dynamics, we report the extension of VirtualFlow to allow the docking of ligands with a grey wolf optimization algorithm via the docking program GWOVina, which substantially improves the quality and efficiency of flexible receptor docking compared to AutoDock Vina. We demonstrate the linear scaling behavior of VirtualFlow utilizing GWOVina up to 128 000 CPUs. The newly supported docking method will be valuable for drug discovery projects in which protein dynamics and flexibility play a significant role.
ABSTRACT
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), previously known as 2019 novel coronavirus (2019-nCoV), has spread rapidly across the globe, creating an unparalleled global health burden and spurring a deepening economic crisis. As of July 7th, 2020, almost seven months into the outbreak, there are no approved vaccines and few treatments available. Developing drugs that target multiple points in the viral life cycle could serve as a strategy to tackle the current as well as future coronavirus pandemics. Here we leverage the power of our recently developed in silico screening platform, VirtualFlow, to identify inhibitors that target SARS-CoV-2. VirtualFlow is able to efficiently harness the power of computing clusters and cloud-based computing platforms to carry out ultra-large scale virtual screens. In this unprecedented structure-based multi-target virtual screening campaign, we have used VirtualFlow to screen an average of approximately 1 billion molecules against each of 40 different target sites on 17 different potential viral and host targets in the cloud. In addition to targeting the active sites of viral enzymes, we also target critical auxiliary sites such as functionally important protein-protein interaction interfaces. This multi-target approach not only increases the likelihood of finding a potent inhibitor, but could also help identify a collection of anti-coronavirus drugs that would retain efficacy in the face of viral mutation. Drugs belonging to different regimen classes could be combined to develop possible combination therapies, and top hits that bind at highly conserved sites would be potential candidates for further development as coronavirus drugs. Here, we present the top 200 in silico hits for each target site. 
While in-house experimental validation of some of these compounds is currently underway, we want to make this array of potential inhibitor candidates available to researchers worldwide in consideration of the pressing need for fast-tracked drug development.
ABSTRACT
Markov state models (MSMs) have received an unabated increase in popularity in recent years, as they are very well suited for the identification and analysis of metastable states and related kinetics. However, the state-of-the-art Markov state modeling methods and tools enforce the fulfillment of a detailed balance condition, restricting their applicability to equilibrium MSMs. To date, they are unsuitable to deal with general dominant data structures including cyclic processes, which are essentially associated with nonequilibrium systems. To overcome this limitation, we developed a generalization of the common robust Perron Cluster Cluster Analysis (PCCA+) method, termed generalized PCCA (G-PCCA). This method handles equilibrium and nonequilibrium simulation data, utilizing Schur vectors instead of eigenvectors. G-PCCA is not limited to the detection of metastable states but enables the identification of dominant structures in a general sense, unraveling cyclic processes. This is exemplified by application of G-PCCA to nonequilibrium molecular dynamics data of the amyloid β (1-40) peptide, periodically driven by an oscillating electric field.
Subjects
Amyloid beta-Peptides/chemistry , Peptide Fragments/chemistry , Algorithms , Cluster Analysis , Electricity , Kinetics , Markov Chains , Molecular Dynamics Simulation
ABSTRACT
With the help of theoretical calculations, we explain the phenomenon of nonplanarity of crystalline alternariol. We find that the different orientations of the hydroxyl groups of alternariol influence its planarity and aromaticity and lead to different twists of the structure. The presence of the intramolecular hydrogen bond stabilizes the planar geometry, while the loss of the bond results in a twist of over 14°. This effect is thought to be involved in the cutting of DNA strands by alternariol.