RESUMO
Recent advances in cryo-electron microscopy (cryo-EM) have enabled modeling macromolecular complexes that are essential components of the cellular machinery. The density maps derived from cryo-EM experiments are often integrated with manual, knowledge-driven or artificial intelligence-driven and physics-guided computational methods to build, fit, and refine molecular structures. Going beyond a single stationary-structure determination scheme, it is becoming more common to interpret the experimental data with an ensemble of models that contributes to an average observation. Hence, there is a need to decide on the quality of an ensemble of protein structures on-the-fly while refining them against the density maps. We introduce such an adaptive decision-making scheme during the molecular dynamics flexible fitting (MDFF) of biomolecules. Using RADICAL-Cybertools, the new RADICAL augmented MDFF implementation (R-MDFF) is examined in high-performance computing environments for refinement of two prototypical protein systems, adenylate kinase and carbon monoxide dehydrogenase. For these test cases, use of multiple replicas in flexible fitting with adaptive decision making in R-MDFF improves the overall correlation to the density by 40% relative to the refinements of the brute-force MDFF. The improvements are particularly significant at high, 2-3 Å map resolutions. More importantly, the ensemble model captures key features of biologically relevant molecular dynamics that are inaccessible to a single-model interpretation. Finally, the pipeline is applicable to systems of growing sizes, which is demonstrated using ensemble refinement of capsid proteins from the chimpanzee adenovirus. The overhead for decision making remains low and robust to computing environments. The software is publicly available on GitHub and includes a short user guide to install R-MDFF on different computing environments, from local Linux-based workstations to high-performance computing environments.
Assuntos
Inteligência Artificial , Simulação de Dinâmica Molecular , Microscopia Crioeletrônica , Microscopia Eletrônica , Adenilato QuinaseRESUMO
We seek to completely revise current models of airborne transmission of respiratory viruses by providing never-before-seen atomic-level views of the SARS-CoV-2 virus within a respiratory aerosol. Our work dramatically extends the capabilities of multiscale computational microscopy to address the significant gaps that exist in current experimental methods, which are limited in their ability to interrogate aerosols at the atomic/molecular level and thus obscure our understanding of airborne transmission. We demonstrate how our integrated data-driven platform provides a new way of exploring the composition, structure, and dynamics of aerosols and aerosolized viruses, while driving simulation method development along several important axes. We present a series of initial scientific discoveries for the SARS-CoV-2 Delta variant, noting that the full scientific impact of this work has yet to be realized.
RESUMO
Despite the recent availability of vaccines against the acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the search for inhibitory therapeutic agents has assumed importance especially in the context of emerging new viral variants. In this paper, we describe the discovery of a novel noncovalent small-molecule inhibitor, MCULE-5948770040, that binds to and inhibits the SARS-Cov-2 main protease (Mpro) by employing a scalable high-throughput virtual screening (HTVS) framework and a targeted compound library of over 6.5 million molecules that could be readily ordered and purchased. Our HTVS framework leverages the U.S. supercomputing infrastructure achieving nearly 91% resource utilization and nearly 126 million docking calculations per hour. Downstream biochemical assays validate this Mpro inhibitor with an inhibition constant (Ki) of 2.9 µM (95% CI 2.2, 4.0). Furthermore, using room-temperature X-ray crystallography, we show that MCULE-5948770040 binds to a cleft in the primary binding site of Mpro forming stable hydrogen bond and hydrophobic interactions. We then used multiple µs-time scale molecular dynamics (MD) simulations and machine learning (ML) techniques to elucidate how the bound ligand alters the conformational states accessed by Mpro, involving motions both proximal and distal to the binding site. Together, our results demonstrate how MCULE-5948770040 inhibits Mpro and offers a springboard for further therapeutic design.
Assuntos
COVID-19 , Inibidores de Proteases , Antivirais , Proteases 3C de Coronavírus , Humanos , Simulação de Acoplamento Molecular , Simulação de Dinâmica Molecular , Ácido Orótico/análogos & derivados , Piperazinas , SARS-CoV-2RESUMO
We develop a generalizable AI-driven workflow that leverages heterogeneous HPC resources to explore the time-dependent dynamics of molecular systems. We use this workflow to investigate the mechanisms of infectivity of the SARS-CoV-2 spike protein, the main viral infection machinery. Our workflow enables more efficient investigation of spike dynamics in a variety of complex environments, including within a complete SARS-CoV-2 viral envelope simulation, which contains 305 million atoms and shows strong scaling on ORNL Summit using NAMD. We present several novel scientific discoveries, including the elucidation of the spike's full glycan shield, the role of spike glycans in modulating the infectivity of the virus, and the characterization of the flexible interactions between the spike and the human ACE2 receptor. We also demonstrate how AI can accelerate conformational sampling across different systems and pave the way for the future application of such methods to additional studies in SARS-CoV-2 and other molecular systems.
RESUMO
BACKGROUND: Resistance to chemotherapy and molecularly targeted therapies is a major factor in limiting the effectiveness of cancer treatment. In many cases, resistance can be linked to genetic changes in target proteins, either pre-existing or evolutionarily selected during treatment. Key to overcoming this challenge is an understanding of the molecular determinants of drug binding. Using multi-stage pipelines of molecular simulations we can gain insights into the binding free energy and the residence time of a ligand, which can inform both stratified and personal treatment regimes and drug development. To support the scalable, adaptive and automated calculation of the binding free energy on high-performance computing resources, we introduce the High-throughput Binding Affinity Calculator (HTBAC). HTBAC uses a building block approach in order to attain both workflow flexibility and performance. RESULTS: We demonstrate close to perfect weak scaling to hundreds of concurrent multi-stage binding affinity calculation pipelines. This permits a rapid time-to-solution that is essentially invariant of the calculation protocol, size of candidate ligands and number of ensemble simulations. CONCLUSIONS: As such, HTBAC advances the state of the art of binding affinity calculations and protocols. HTBAC provides the platform to enable scientists to study a wide range of cancer drugs and candidate ligands in order to support personalized clinical decision making based on genome sequencing and drug discovery.
Assuntos
Ensaios de Triagem em Larga Escala/métodos , Ligação Proteica/fisiologia , HumanosRESUMO
Protein-ligand docking is a computational method for identifying drug leads. The method is capable of narrowing a vast library of compounds down to a tractable size for downstream simulation or experimental testing and is widely used in drug discovery. While there has been progress in accelerating scoring of compounds with artificial intelligence, few works have bridged these successes back to the virtual screening community in terms of utility and forward-looking development. We demonstrate the power of high-speed ML models by scoring 1 billion molecules in under a day (50 k predictions per GPU seconds). We showcase a workflow for docking utilizing surrogate AI-based models as a pre-filter to a standard docking workflow. Our workflow is ten times faster at screening a library of compounds than the standard technique, with an error rate less than 0.01% of detecting the underlying best scoring 0.1% of compounds. Our analysis of the speedup explains that another order of magnitude speedup must come from model accuracy rather than computing speed. In order to drive another order of magnitude of acceleration, we share a benchmark dataset consisting of 200 million 3D complex structures and 2D structure scores across a consistent set of 13 million "in-stock" molecules over 15 receptors, or binding sites, across the SARS-CoV-2 proteome. We believe this is strong evidence for the community to begin focusing on improving the accuracy of surrogate models to improve the ability to screen massive compound libraries 100 × or even 1000 × faster than current techniques and reduce missing top hits. The technique outlined aims to be a fast drop-in replacement for docking for screening billion-scale molecular libraries.
Assuntos
COVID-19 , SARS-CoV-2 , Humanos , SARS-CoV-2/metabolismo , Inteligência Artificial , Simulação de Acoplamento Molecular , Ligantes , Proteínas/metabolismoRESUMO
This new dataset is an ensemble of solar photovoltaic energy production simulations over the continental US. The simulations are carried out in three steps. First, a weather forecast system is used for the predictions of incoming insolation; then, forecast ensembles with 21 members are generated using the Analog Ensemble technique; finally, each ensemble member is used to simulate 13 different solar panels. In total, there are 21 × 13 = 273 simulated scenarios. Simulations are carried out for the entire year 2019, with a temporal resolution of one hour, and a spatial resolution of 12 km. The data provide a high spatio-temporal analysis of the power production under different weather and engineering scenarios. The size of the entire dataset is about 1 TB but can be openly accessed by days and scenarios. Details on how to access and use such a dataset are provided in this article.
RESUMO
Molecular dynamics or MD simulation is gradually maturing into a tool for constructing in vivo models of living cells in atomistic details. The feasibility of such models is bolstered by integrating the simulations with data from microscopic, tomographic and spectroscopic experiments on exascale supercomputers, facilitated by the use of deep learning technologies. Over time, MD simulation has evolved from tens of thousands of atoms to over 100 million atoms comprising an entire cell organelle, a photosynthetic chromatophore vesicle from a purple bacterium. In this chapter, we present a step-by-step outline for preparing, executing and analyzing such large-scale MD simulations of biological systems that are essential to life processes. All scripts are provided via GitHub.
Assuntos
Bactérias/citologia , Cromatóforos Bacterianos/química , Biologia Computacional/métodos , Bactérias/química , Aprendizado Profundo , Simulação de Dinâmica MolecularRESUMO
The race to meet the challenges of the global pandemic has served as a reminder that the existing drug discovery process is expensive, inefficient and slow. There is a major bottleneck screening the vast number of potential small molecules to shortlist lead compounds for antiviral drug development. New opportunities to accelerate drug discovery lie at the interface between machine learning methods, in this case, developed for linear accelerators, and physics-based methods. The two in silico methods, each have their own advantages and limitations which, interestingly, complement each other. Here, we present an innovative infrastructural development that combines both approaches to accelerate drug discovery. The scale of the potential resulting workflow is such that it is dependent on supercomputing to achieve extremely high throughput. We have demonstrated the viability of this workflow for the study of inhibitors for four COVID-19 target proteins and our ability to perform the required large-scale calculations to identify lead antiviral compounds through repurposing on a variety of supercomputers.
RESUMO
We seek to completely revise current models of airborne transmission of respiratory viruses by providing never-before-seen atomic-level views of the SARS-CoV-2 virus within a respiratory aerosol. Our work dramatically extends the capabilities of multiscale computational microscopy to address the significant gaps that exist in current experimental methods, which are limited in their ability to interrogate aerosols at the atomic/molecular level and thus ob-scure our understanding of airborne transmission. We demonstrate how our integrated data-driven platform provides a new way of exploring the composition, structure, and dynamics of aerosols and aerosolized viruses, while driving simulation method development along several important axes. We present a series of initial scientific discoveries for the SARS-CoV-2 Delta variant, noting that the full scientific impact of this work has yet to be realized. ACM REFERENCE FORMAT: Abigail Dommer 1 , Lorenzo Casalino 1 , Fiona Kearns 1 , Mia Rosenfeld 1 , Nicholas Wauer 1 , Surl-Hee Ahn 1 , John Russo, 2 Sofia Oliveira 3 , Clare Morris 1 , AnthonyBogetti 4 , AndaTrifan 5,6 , Alexander Brace 5,7 , TerraSztain 1,8 , Austin Clyde 5,7 , Heng Ma 5 , Chakra Chennubhotla 4 , Hyungro Lee 9 , Matteo Turilli 9 , Syma Khalid 10 , Teresa Tamayo-Mendoza 11 , Matthew Welborn 11 , Anders Christensen 11 , Daniel G. A. Smith 11 , Zhuoran Qiao 12 , Sai Krishna Sirumalla 11 , Michael O'Connor 11 , Frederick Manby 11 , Anima Anandkumar 12,13 , David Hardy 6 , James Phillips 6 , Abraham Stern 13 , Josh Romero 13 , David Clark 13 , Mitchell Dorrell 14 , Tom Maiden 14 , Lei Huang 15 , John McCalpin 15 , Christo- pherWoods 3 , Alan Gray 13 , MattWilliams 3 , Bryan Barker 16 , HarindaRajapaksha 16 , Richard Pitts 16 , Tom Gibbs 13 , John Stone 6 , Daniel Zuckerman 2 *, Adrian Mulholland 3 *, Thomas MillerIII 11,12 *, ShantenuJha 9 *, Arvind Ramanathan 5 *, Lillian Chong 4 *, Rommie Amaro 1 *. 2021. #COVIDisAirborne: AI-Enabled Multiscale Computational Microscopy ofDeltaSARS-CoV-2 in a Respiratory Aerosol. In Supercomputing '21: International Conference for High Perfor-mance Computing, Networking, Storage, and Analysis . ACM, New York, NY, USA, 14 pages. https://doi.org/finalDOI.
RESUMO
We develop a generalizable AI-driven workflow that leverages heterogeneous HPC resources to explore the time-dependent dynamics of molecular systems. We use this workflow to investigate the mechanisms of infectivity of the SARS-CoV-2 spike protein, the main viral infection machinery. Our workflow enables more efficient investigation of spike dynamics in a variety of complex environments, including within a complete SARS-CoV-2 viral envelope simulation, which contains 305 million atoms and shows strong scaling on ORNL Summit using NAMD. We present several novel scientific discoveries, including the elucidation of the spike's full glycan shield, the role of spike glycans in modulating the infectivity of the virus, and the characterization of the flexible interactions between the spike and the human ACE2 receptor. We also demonstrate how AI can accelerate conformational sampling across different systems and pave the way for the future application of such methods to additional studies in SARS-CoV-2 and other molecular systems.
RESUMO
Cloud computing has been increasingly adopted by users and providers to promote a flexible, scalable and tailored access to computing resources. Nonetheless, the consolidation of this paradigm has uncovered some of its limitations. Initially devised by corporations with direct control over large amounts of computational resources, cloud computing is now being endorsed by organizations with limited resources or with a more articulated, less direct control over these resources. The challenge for these organizations is to leverage the benefits of cloud computing while dealing with limited and often widely distributed computing resources. This study focuses on the adoption of cloud computing by higher education institutions and addresses two main issues: flexible and on-demand access to a large amount of storage resources, and scalability across a heterogeneous set of cloud infrastructures. The proposed solutions leverage a federated approach to cloud resources in which users access multiple and largely independent cloud infrastructures through a highly customizable broker layer. This approach allows for a uniform authentication and authorization infrastructure, a fine-grained policy specification and the aggregation of accounting and monitoring. Within a loosely coupled federation of cloud infrastructures, users can access vast amount of data without copying them across cloud infrastructures and can scale their resource provisions when the local cloud resources become insufficient.